The Saddest Equation in Data Science
Prepare a box of tissues! I’m about to drop a truth bomb about statistics and data science that’ll bring tears to your eyes.
INFERENCE = DATA + ASSUMPTIONS. In other words, statistics does not give you truth.
Common myths
Here are some standard misconceptions:
“If I find the right equations, I can know the unknown.”
“If I math at my data hard enough, I can reduce my uncertainty.”
“Statistics can transform data into truth!”
They sound like fairytales, don’t they? That’s because they are!
Painful truths
There is no magic in the world that lets you make something out of nothing, so abandon that hope now. That’s not what statistics is about. Take it from a statistician. (As a bonus, this article might save you from wasting a decade of your life studying the dark arts of statistics to chase that elusive dream.)
Unfortunately, there are plenty of charlatans out there who may try to convince you otherwise. They’ll pull a classic bullying move on you: “You don’t understand the equations I’m clobbering you with, so bow before my superiority and do what I say!”
Resist those posers.
Don’t land with a splat, Icarus!
Think of statistical inference (“statistics” for short) as an Icarus-like leap from what we know (our sample data) to what we don’t (our population parameter).
In statistics, what you know is not what you wish you knew.
Perhaps you want tomorrow’s facts, but you only have the past to inform you. (It’s so annoying when we can’t remember the future, right?) Perhaps you want to know what all your potential users think of your product, but you can only ask a hundred of them. Then you’re dealing with uncertainty!
It’s not magic, it’s assumptions
How can you possibly leap from what you know to what you don’t? You need a bridge to cross that chasm… and that bridge is assumptions. Which brings me back to the most painful equation in all of data science: DATA + ASSUMPTIONS = PREDICTION.
DATA + ASSUMPTIONS = PREDICTION
(Feel free to replace the word “prediction” with “inference” or “forecast” if you like — they’re all the same thing here: a statement about something you can’t know for sure.)
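To make the equation concrete, here is a minimal sketch in Python. The survey numbers are made up for illustration, and the function name `wald_interval` is my own label, not something from a particular library. The data are 62 "yes" answers out of 100 users asked. The assumptions (a simple random sample, independent answers, a normal approximation) are what turn those 100 answers into a statement about all potential users.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Approximate 95% interval for a population proportion.

    ASSUMPTIONS baked in (none of them are facts):
      - the n people asked are a simple random sample of all users
      - their answers are independent of one another
      - n is large enough for the normal approximation to be reasonable
    """
    p_hat = successes / n                              # DATA: what we observed
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width      # PREDICTION about everyone


# Hypothetical survey: 62 of 100 users say they like the product.
low, high = wald_interval(62, 100)
print(f"Estimated share of all users who like it: {low:.2f} to {high:.2f}")
```

If any of the listed assumptions fails (say, you only asked your most loyal customers), the arithmetic still runs without complaint, but the interval no longer says much about the users you didn't ask.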
What’s an assumption?
If we knew all the facts (and we knew that our facts were actually true facts), we wouldn’t need assumptions (or statisticians). Assumptions are the ugly patches you use to bridge the gap between what you know and what you wish you knew. They’re hacks you have to use to make the math work out when you’re missing the facts.
Assumptions are ugly band-aids you put over the parts where information is missing.
Should I put it more bluntly? An assumption is not a fact, it’s some nonsense you make up precisely because you’ve got gaping holes in your knowledge. If you’re in the habit of bullying people with your overconfidence intervals, take a moment to remind yourself that it’s a stretch to refer to anything based on assumptions as truth. It’s best to start treating the whole thing as a personal decision-making tool that is imperfect but better than nothing (in specific situations).
Statistics is your attempt to do your best in an uncertain world.
There are always assumptions.
Assumptions are part of decision-making
Show me an “assumption-free” real-world decision and I’ll rattle off a host of implicit assumptions you’re not even aware you’re making.
Examples: When you read a newspaper, did you assume all the facts were checked? When you made your plans for 2020, did you assume there would be no global pandemic? If you analyzed data, did you assume the information was captured without errors? Did you assume that your random number generator is random? (They usually aren’t.) When you chose to make an online purchase, did you assume the right amount would be withdrawn from your bank account? What about the last snack you had, did you assume it wouldn’t poison you? When you took medicine, did you *know* anything about its long-term safety and efficacy… or did you assume?
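The random number generator example is easy to check for yourself. The sketch below uses Python's standard `random` module, which is a deterministic pseudorandom generator (a Mersenne Twister); the helper name `draws` and the seed value are arbitrary choices of mine. Give it the same seed twice and it hands back exactly the same "random" dice rolls.

```python
import random


def draws(seed, count=5):
    """Roll a six-sided die `count` times using a generator seeded with `seed`."""
    rng = random.Random(seed)  # separate generator, so global state is untouched
    return [rng.randint(1, 6) for _ in range(count)]


first = draws(2020)
second = draws(2020)

# Same seed, same sequence: the output is fully determined by the seed.
print(first == second)  # True
```

Treating such numbers as random is an assumption that is good enough for most simulations, which is why it is so easy to forget you are making it.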
Like it or not, assumptions are part of decision-making.
Like it or not, assumptions are always part of decision-making. A proper foray into real-world data should contain a host of written-down assumptions where the data scientist comes clean about corners they had to cut.
Even if you choose to steer clear of statistics, you’re probably using assumptions to guide your actions. To stay safe, it’s crucial that you keep track of the assumptions that your decisions are based on.
How the statistical “magic” happens
The field of statistics gives you a whole arsenal of tools for formalizing your assumptions and combining them with evidence to make reasonable decisions. (Catch my 8-minute intro to stats here.)
It’s preposterous to expect an analysis involving uncertainty and probability to be a source of truth-with-a-capital-T.
Yep, that’s how the statistical “magic” happens. You choose which assumptions you’re willing to live with, then you combine them with data to take reasonable actions on the basis of that unholy union. That’s all statistics is.
That’s why an analysis involving uncertainty and probability could never be a source of truth-with-a-capital-T. There is no secret dark art that can do that for you.
Two people can come to completely different valid conclusions from the same data! All it takes is using different assumptions.
It’s also why two people can come to completely different valid conclusions from the same data! All it takes is using different assumptions. Statistics gives you a tool for making decisions more thoughtfully, but there’s no single right way to use it. It’s a personal decision-making tool.
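Here is a small sketch of how that can play out, using a Bayesian beta-binomial update as one convenient way to write assumptions down (this is my choice of illustration, not the only way to formalize them). The data, both priors, the 50% launch threshold, and the names `posterior_mean` and `decide` are all hypothetical.

```python
from fractions import Fraction


def posterior_mean(successes, trials, prior_a, prior_b):
    """Posterior mean of a proportion under a Beta(prior_a, prior_b) prior."""
    return Fraction(prior_a + successes, prior_a + prior_b + trials)


def decide(estimate, threshold=Fraction(1, 2)):
    """Hypothetical decision rule: launch only if the estimate beats the threshold."""
    return "launch" if estimate > threshold else "hold"


# Same made-up DATA for both analysts: 6 of 10 surveyed users liked the product.
likes, asked = 6, 10

# Analyst A's ASSUMPTION: no prior opinion, i.e. a uniform Beta(1, 1) prior.
analyst_a = posterior_mean(likes, asked, 1, 1)

# Analyst B's ASSUMPTION: products like this usually flop, i.e. a Beta(2, 18) prior.
analyst_b = posterior_mean(likes, asked, 2, 18)

print(f"Analyst A: {float(analyst_a):.3f} -> {decide(analyst_a)}")
print(f"Analyst B: {float(analyst_b):.3f} -> {decide(analyst_b)}")
```

Analyst A's estimate lands above one half and Analyst B's lands below it, so they reach opposite decisions from identical data. Neither made an arithmetic mistake; they simply started from different assumptions.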
A study is only as good as the assumptions you’ll make about it.
What about science?
What does it mean when a scientist uses statistics to come to a conclusion? Simply that they’ve formed an opinion and have made the decision to share it with the world. That’s not a bad thing — it’s a scientist’s job to form opinions reluctantly, which makes me feel better about assuming that they’re worth listening to.
It’s a scientist’s job to form opinions reluctantly.
I’m a huge fan of taking advice from those who have more expertise and information than I do, but I never let myself confuse their opinions with facts. But while many scientists are well-versed in working with probability, I’ve seen other scientists make enough statistical mess to last several lifetimes. Opinions could not (and should not) convince someone who’s not willing to make the assumption that those opinions were arrived at competently from a blend of evidence and mutually-palatable untested assumptions.
If you’d like to hear more of my musings on science and scientists, read this.
In summary
It’s best to think of statistics as the science of changing your mind under uncertainty. It’s a framework to help you make thoughtful decisions when you lack information… and there’s no single right way to use it.
And no, it doesn’t give you the facts you need; it gives you what you need to cope with not having those facts in the first place. The entire point is to help you do your best in an uncertain world.
To do that, you’ll have to start making assumptions.
Next up
In follow-up articles, I’ll write about where assumptions come from, how to pick “good” assumptions, and what it means to test an assumption. If these topics intrigue you, your retweets are my favorite motivation for writing.
In the meantime, most of the links in this article take you to my other musings. Can’t choose? Try one of these:
Translated from: https://towardsdatascience.com/the-saddest-equation-in-data-science-e60e7819b63f