Data Science and Statistics: Statistics in Data Science

Published: 2023/12/1

Statistics

Statistics is used to work through complex real-world problems so that data scientists and analysts can look for meaningful patterns and changes in data. Put simply, statistics can be used to draw meaningful insights from data by performing mathematical computations on it. Various statistical functions, principles, and algorithms are applied to analyze raw data, build a statistical model, and infer or predict an outcome. The purpose of this article is to give a broad overview of the fundamentals of statistics that you'll need to start your data science journey.

Data Types

Numerical:

Data expressed with digits; it is measurable. It can be either discrete (a finite or countable set of values) or continuous (infinitely many values within a range).

Categorical:

Qualitative data classified into categories. It can be nominal (no order) or ordinal (ordered data).

Measures of Central Tendency

Mean: The average of a dataset.
Median: The middle of an ordered dataset; less susceptible to outliers.
Mode: The most common value in a dataset; only relevant for discrete data.
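
As a small illustration of the three measures, the sketch below uses Python's standard `statistics` module on a made-up list of daily order counts (the data and variable names are assumptions for this example, not part of the original article). The single large value shows why the median is less affected by outliers than the mean.

```python
import statistics

# Hypothetical sample: daily order counts, with one outlier (95).
orders = [12, 15, 11, 15, 14, 13, 95]

mean_value = statistics.mean(orders)      # arithmetic average, pulled upward by the outlier
median_value = statistics.median(orders)  # middle value of the sorted data
mode_value = statistics.mode(orders)      # most frequent value

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

Here the mean lands well above most of the observations, while the median stays close to the typical day.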

Measures of Variability

Range: The difference between the highest and lowest value in a dataset.
Variance (σ²): Measures how spread out a set of data is relative to the mean.
Standard Deviation (σ): Another measure of how spread out the numbers in a dataset are; it is the square root of the variance.
Z-score: Determines the number of standard deviations a data point is from the mean.
R-Squared: A statistical measure of fit that indicates how much of the variation of a dependent variable is explained by the independent variable(s); only useful for simple linear regression.
Adjusted R-squared: A modified version of R-squared that has been adjusted for the number of predictors in the model; it increases if a new term improves the model more than would be expected by chance, and decreases otherwise.
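
The measures above can be sketched in a few lines of Python. The data, the predicted values, and the helper names `r_squared` and `adjusted_r_squared` are hypothetical and chosen only for illustration; the variance and standard deviation here are the population versions (dividing by n), which is one of two common conventions.

```python
import statistics

# Hypothetical sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(data) - min(data)
mu = statistics.mean(data)
variance = statistics.pvariance(data)  # population variance (divides by n)
std_dev = statistics.pstdev(data)      # square root of the population variance
z_scores = [(x - mu) / std_dev for x in data]


def r_squared(y_true, y_pred):
    """Share of the variation in y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalize R-squared for using k predictors on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical observed values and predictions from a one-predictor model.
y_true = [1, 2, 3, 4, 5]
y_pred = [1.1, 1.9, 3.2, 3.9, 4.9]
r2 = r_squared(y_true, y_pred)
adj_r2 = adjusted_r_squared(r2, n=len(y_true), k=1)

print(value_range, variance, std_dev, z_scores[-1], r2, adj_r2)
```

Because the adjustment factor (n - 1)/(n - k - 1) grows with the number of predictors k, the adjusted value is never larger than the plain R-squared.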

Measurement of Relationships between Variables

Covariance: Measures the joint variability of two (or more) variables. If it is positive, the variables tend to move in the same direction; if it is negative, they tend to move in opposite directions; and if it is zero, they have no linear relationship with each other.
Correlation: Measures the strength of the relationship between two variables and ranges from -1 to 1; it is the normalized version of covariance. Generally, a correlation of +/- 0.7 represents a strong relationship between two variables. On the flip side, correlations between -0.3 and 0.3 indicate that there is little to no relationship between the variables.
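
To make the link between the two measures concrete, here is a minimal sketch that computes the sample covariance by hand and then normalizes it into a Pearson correlation. The function names and the study-hours data are assumptions made for this example.

```python
import math


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data: hours studied and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]

cov_value = covariance(hours, scores)
corr_value = correlation(hours, scores)

print("covariance:", cov_value)
print("correlation:", corr_value)
```

The covariance depends on the units of both variables, whereas the correlation is unit-free, which is why thresholds such as 0.7 or 0.3 are stated for correlation only.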

Probability Distribution Functions

Probability Density Function (PDF): A function for continuous data where the value at any point can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample.
Probability Mass Function (PMF): A function for discrete data that gives the probability of a given value occurring.
Cumulative Distribution Function (CDF): A function that tells us the probability that a random variable is less than or equal to a certain value; the integral of the PDF.
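
The following sketch contrasts the three functions. It builds the PMF of a fair six-sided die as a dictionary, derives a CDF from it by summation, and uses `statistics.NormalDist` from the Python standard library for a continuous PDF and CDF. The die example and the names `die_pmf` and `die_cdf` are illustrative assumptions.

```python
from fractions import Fraction
from statistics import NormalDist

# PMF of a fair six-sided die: each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def die_cdf(x):
    """P(X <= x): sum the PMF over all faces up to and including x."""
    return sum(p for face, p in die_pmf.items() if face <= x)


# Continuous case: the standard normal distribution.
standard_normal = NormalDist(mu=0, sigma=1)
density_at_zero = standard_normal.pdf(0)  # a density, not a probability
prob_below_zero = standard_normal.cdf(0)  # P(X <= 0)

print(die_cdf(4), density_at_zero, prob_below_zero)
```

Note that a PDF value is a density and can exceed 1 for narrow distributions; only areas under the curve, as returned by the CDF, are probabilities.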

Continuous Data Distributions

Uniform Distribution: A probability distribution where all outcomes are equally likely.
Normal/Gaussian Distribution: Commonly referred to as the bell curve and related to the central limit theorem; the standard normal distribution has a mean of 0 and a standard deviation of 1.
T-Distribution: A probability distribution used to estimate population parameters when the sample size is small and/or when the population variance is unknown.
Chi-Square Distribution: The distribution of the chi-square statistic.
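
The connection between the uniform distribution, the normal distribution, and the central limit theorem can be seen in a short simulation. This is only a sketch under assumed settings (samples of size 30, 2000 repetitions, a fixed seed): averages of uniform draws cluster around 0.5 with a spread close to the value the theorem predicts.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30      # assumed size of each sample
REPETITIONS = 2000    # assumed number of samples


def sample_mean(n):
    """Mean of n independent draws from the uniform distribution on [0, 1]."""
    return statistics.fmean([rng.uniform(0, 1) for _ in range(n)])


means = [sample_mean(SAMPLE_SIZE) for _ in range(REPETITIONS)]

center = statistics.fmean(means)
spread = statistics.stdev(means)

# Uniform(0, 1) has standard deviation sqrt(1/12); the central limit
# theorem predicts sample means with standard deviation sigma / sqrt(n).
predicted_spread = math.sqrt(1 / 12) / math.sqrt(SAMPLE_SIZE)

print(center, spread, predicted_spread)
```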

Discrete Data Distributions

Poisson Distribution: A probability distribution that expresses the probability of a given number of events occurring within a fixed time period.
Binomial Distribution: A probability distribution of the number of successes in a sequence of n independent experiments, each with its own Boolean-valued outcome (success with probability p, failure with probability 1-p).
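
Both distributions have closed-form probability mass functions, which the sketch below writes out directly with the standard `math` module. The function names and the example parameters are assumptions chosen for illustration.

```python
import math


def poisson_pmf(k, lam):
    """P(exactly k events) when events occur at an average rate lam per interval."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def binomial_pmf(k, n, p):
    """P(exactly k successes) in n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Hypothetical examples.
p_no_calls = poisson_pmf(0, lam=2)       # no events when 2 are expected on average
p_two_heads = binomial_pmf(2, n=4, p=0.5)  # exactly 2 heads in 4 fair coin flips

print(p_no_calls, p_two_heads)
```

Summing `binomial_pmf(k, n, p)` over k from 0 to n gives 1, which is a quick sanity check on any PMF implementation.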

Moments

Moments describe different aspects of the nature and shape of a distribution. The first moment is the mean, the second (central) moment is the variance, the third (standardized) moment is the skewness, and the fourth (standardized) moment is the kurtosis.
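
A minimal sketch of how these moments can be computed from data is shown below. It uses population (divide-by-n) central moments, and the helper names and sample lists are assumptions for this example; statistical libraries often apply additional bias corrections.

```python
def central_moment(data, k):
    """k-th central moment: average of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Third standardized moment; 0 for symmetric data."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data):
    """Fourth standardized moment (not excess kurtosis)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


symmetric = [1, 2, 3, 4, 5]      # hypothetical symmetric sample
right_skewed = [1, 1, 1, 2, 10]  # hypothetical sample with a long right tail

print(skewness(symmetric), kurtosis(symmetric), skewness(right_skewed))
```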

Probability

Conditional Probability [P(A|B)] is the probability of an event occurring, given that another event has occurred.

Independent events are events whose outcome does not influence the probability of the outcome of another event; P(A|B) = P(A).

Mutually exclusive events are events that cannot occur at the same time; P(A|B) = 0.

Bayes' Theorem: A mathematical formula for determining conditional probability. "The probability of A given B is equal to the probability of B given A times the probability of A over the probability of B", that is, P(A|B) = P(B|A) P(A) / P(B).
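
As a worked example of Bayes' Theorem, the sketch below estimates the probability of having a condition given a positive test. All of the numbers (1% prevalence, 95% detection rate, 5% false positive rate) and the function name are hypothetical and chosen only to illustrate the arithmetic.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical inputs.
p_disease = 0.01            # P(A): prior probability of the condition
p_pos_given_disease = 0.95  # P(B|A): positive test when the condition is present
p_pos_given_healthy = 0.05  # positive test when the condition is absent

# P(B) by the law of total probability.
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

p_disease_given_pos = bayes_posterior(p_pos_given_disease, p_disease, p_pos)

print(p_pos, p_disease_given_pos)
```

With these assumed numbers, a positive result still leaves the condition fairly unlikely, because the low prior probability dominates.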

Accuracy

True positive: detects the condition when the condition is present.
True negative: does not detect the condition when the condition is absent.
False positive: detects the condition when the condition is absent.
False negative: does not detect the condition when the condition is present.
Sensitivity: also known as recall; measures the ability of a test to detect the condition when the condition is present; sensitivity = TP/(TP+FN)
Specificity: measures the ability of a test to correctly exclude the condition when the condition is absent; specificity = TN/(TN+FP)
Predictive value positive: also known as precision; the proportion of positives that correspond to the presence of the condition; PVP = TP/(TP+FP)
Predictive value negative: the proportion of negatives that correspond to the absence of the condition; PVN = TN/(TN+FN)
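
The four formulas above translate directly into code. The sketch below takes the four counts of a confusion matrix and returns each ratio; the function name, dictionary keys, and the counts themselves are assumptions for this example.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Compute the four ratios from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # recall
        "specificity": tn / (tn + fp),
        "pvp": tp / (tp + fp),          # precision
        "pvn": tn / (tn + fn),
    }


# Hypothetical screening results for 1000 people.
metrics = diagnostic_metrics(tp=90, tn=850, fp=50, fn=10)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```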

Hypothesis Testing and Statistical Significance

Null Hypothesis: The hypothesis that sample observations result purely from chance.
Alternative Hypothesis: The hypothesis that sample observations are influenced by some non-random cause.
P-value: The probability of obtaining results at least as extreme as the observed results of a test, assuming that the null hypothesis is correct; a smaller p-value means that there is stronger evidence in favor of the alternative hypothesis.
Alpha: The significance level; the probability of rejecting the null hypothesis when it is true, also known as a Type 1 error.
Beta: The probability of a Type 2 error, that is, of failing to reject a false null hypothesis.

Steps to Hypothesis Testing

State the null and alternative hypotheses

Determine the test size (significance level); is it a one-tailed or two-tailed test?

Compute the test statistic and the p-value

Analyze the results and either reject or fail to reject the null hypothesis
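
The four steps can be sketched with a two-tailed one-sample z-test, which is only one of many possible tests and assumes the population standard deviation is known. The function name, the default alpha of 0.05, and the sample numbers are all hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-tailed z-test of H0: the true mean equals pop_mean."""
    # Step 3: test statistic and p-value.
    z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 4: reject H0 only if the p-value falls below the significance level.
    return z, p_value, p_value < alpha


# Steps 1 and 2 (assumed for this example):
# H0: mean = 100, H1: mean != 100, alpha = 0.05, two-tailed.
z, p_value, reject = one_sample_z_test(sample_mean=103, pop_mean=100, pop_sd=15, n=100)

print("z:", z)
print("p-value:", p_value)
print("reject H0:", reject)
```

A result of `reject == False` would mean failing to reject the null hypothesis, which is not the same as showing that it is true.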

Translated from: https://www.includehelp.com/data-science/statistics.aspx