无偏估计与自由度
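I no longer remember how I originally learned probability theory and mathematical statistics. Lately I keep running into a small question I can't figure out: why does the unbiased estimator of the sample variance divide by N-1?
I looked it up on Wikipedia: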
Estimating variance
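Suppose $X_1, \ldots, X_n$ are independent and identically distributed random variables with expectation $\mu$ and variance $\sigma^2$. Let

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

be the "sample average", and let

$$S^2 = \frac{1}{n}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2$$

be a "sample variance". Then $S^2$ is a "biased estimator" of $\sigma^2$ because

$$E\left[S^2\right] = \frac{n-1}{n}\,\sigma^2 \neq \sigma^2.$$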
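To see the $(n-1)/n$ factor concretely, here is a small exact check in Python. It is a sketch under stated assumptions: a toy population `[0, 1, 2]` whose values are equally likely, and samples drawn with replacement so the draws are i.i.d. The population and the helper names are illustrative choices, not from Wikipedia. Because every possible sample is enumerated with `fractions.Fraction`, the expectations are exact rather than simulated.

```python
from fractions import Fraction
from itertools import product


def population_variance(values):
    """sigma^2 of a finite population of integers whose values are equally likely."""
    mu = Fraction(sum(values), len(values))
    return sum((Fraction(v) - mu) ** 2 for v in values) / len(values)


def expected_sample_variance(values, n, ddof):
    """Exact expectation of sum((X_i - Xbar)^2) / (n - ddof) over every i.i.d.
    sample of size n drawn with replacement from `values`.
    ddof=0 gives the biased S^2; ddof=1 gives the N-1 version."""
    samples = list(product(values, repeat=n))  # all len(values)**n equally likely samples
    total = Fraction(0)
    for sample in samples:
        xbar = Fraction(sum(sample), n)
        ss = sum((Fraction(x) - xbar) ** 2 for x in sample)  # sum of squared deviations
        total += ss / (n - ddof)
    return total / len(samples)


population = [0, 1, 2]  # hypothetical toy population
n = 3
sigma2 = population_variance(population)
print(sigma2)                                      # 2/3
print(expected_sample_variance(population, n, 0))  # 4/9, i.e. (n-1)/n * sigma^2
print(expected_sample_variance(population, n, 1))  # 2/3, i.e. exactly sigma^2
```

Dividing by `n` lands on $\frac{2}{3}\cdot\frac{2}{3} = \frac{4}{9}$, short of $\sigma^2$ by exactly the factor $(n-1)/n$, while dividing by `n - 1` recovers $\sigma^2$.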
Note that when a transformation is applied to an unbiased estimator, the result is not necessarily itself an unbiased estimate of its corresponding population statistic. That is, for a non-linear function $f$ and an unbiased estimator $U$ of a parameter $p$, $f(U)$ is usually not an unbiased estimator of $f(p)$. For example, the square root of the unbiased estimator of the population variance is not an unbiased estimator of the population standard deviation.
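The square-root example can be checked exactly on a tiny case. This sketch assumes a hypothetical population `[0, 1]` with equally likely values (so $\sigma^2 = 1/4$ and $\sigma = 1/2$) and samples of size 2 drawn with replacement. It enumerates all four samples, confirms that the N-1 variance $s^2$ averages to $\sigma^2$, and shows that $s = \sqrt{s^2}$ averages to less than $\sigma$.

```python
import math
from fractions import Fraction
from itertools import product

population = [0, 1]  # hypothetical toy population: sigma^2 = 1/4, sigma = 1/2
n = 2

mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in population) / len(population)
sigma = math.sqrt(sigma2)

samples = list(product(population, repeat=n))  # 4 equally likely samples
s2_values = []
for sample in samples:
    xbar = Fraction(sum(sample), n)
    # unbiased sample variance: divide by n - 1
    s2_values.append(sum((Fraction(x) - xbar) ** 2 for x in sample) / (n - 1))

mean_s2 = sum(s2_values) / len(samples)                       # E[s^2], exact
mean_s = sum(math.sqrt(v) for v in s2_values) / len(samples)  # E[s], float

print(mean_s2, sigma2)  # 1/4 1/4: s^2 is unbiased for sigma^2
print(mean_s, sigma)    # about 0.354 vs 0.5: sqrt(s^2) underestimates sigma
```

Since the square root is concave, averaging before or after taking it gives different answers (Jensen's inequality), which is why $E[s] < \sigma$ here.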
Bias is not the only consideration when choosing a statistic, however. Bias refers to the central tendency of the sampling distribution of a statistic, but the variance of the sampling distribution can also be an important consideration. Specifically, statistics with smaller sampling variances will yield greater statistical power. For example, while $S^2$ above is more biased than the traditional sample calculation

$$S^2_{\text{sample}} = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2,$$

$S^2$ has a lower estimation variability than $S^2_{\text{sample}}$ because the denominator dividing the sum of squares is larger in the calculation of $S^2$, resulting in a smaller scale of final values, and therefore lower estimation variability, than that of $S^2_{\text{sample}}$. Practically, this demonstrates that for some applications (where the amount of bias can be equated between groups/conditions) it is possible that a biased estimator can prove to be a more powerful, and therefore useful, statistic.
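The variability claim follows from $S^2 = \frac{n-1}{n}\,S^2_{\text{sample}}$, so the sampling variance of $S^2$ is $\left(\frac{n-1}{n}\right)^2$ times that of $S^2_{\text{sample}}$. The sketch below checks this exactly by enumeration. As before, the equally likely toy population and the function name are illustrative assumptions; the code compares only sampling variances, not overall estimation error.

```python
from fractions import Fraction
from itertools import product


def sampling_variances(values, n):
    """Exact sampling variance of the biased S^2 (divide by n) and of
    S^2_sample (divide by n - 1), over all i.i.d. samples of size n drawn
    with replacement from an equally likely finite population of integers."""
    biased, unbiased = [], []
    for sample in product(values, repeat=n):
        xbar = Fraction(sum(sample), n)
        ss = sum((Fraction(x) - xbar) ** 2 for x in sample)
        biased.append(ss / n)
        unbiased.append(ss / (n - 1))

    def variance(estimates):
        m = sum(estimates) / len(estimates)
        return sum((e - m) ** 2 for e in estimates) / len(estimates)

    return variance(biased), variance(unbiased)


n = 3
var_biased, var_unbiased = sampling_variances([0, 1, 2], n)  # hypothetical population
print(var_biased < var_unbiased)                            # True
print(var_biased == Fraction(n - 1, n) ** 2 * var_unbiased)  # True
```

The trade-off is the one the excerpt describes: the larger denominator shrinks the spread of the estimates at the cost of pulling their average below $\sigma^2$.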
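Degrees of freedom (df): when a population parameter is estimated from a sample statistic, the number of values in the sample that are independent, that is, free to vary, is called the degrees of freedom of that statistic.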
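For example, when estimating the population mean, all n values in the sample are added together, and each of them is independent of the others: drawing any one value does not affect the rest (which is exactly what random sampling requires). So every value in the data set is independent. Since the degrees of freedom is the number of independent values used to estimate a population parameter, and the mean is estimated from n independent values, its degrees of freedom is n.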
But then why, when a sample is used to estimate the population variance, does the variance have (n-1) degrees of freedom?
$$s^2 = \frac{\sum (X - m)^2}{n}$$
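This formula shows that the population variance is computed from the deviations of each value from the population mean m, so m has to be fixed before the variance can be computed. In practice m is unknown and is replaced by the sample mean, which is computed from the same n values. Once that mean is fixed, it can no longer vary independently: the variance is constrained by the mean and loses one opportunity to vary freely, so one is subtracted from n.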
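The constraint argument can be made quantitative with a standard derivation. It is sketched here assuming $X_1, \ldots, X_n$ are i.i.d. with mean $\mu$ (the $m$ in the formula above) and variance $\sigma^2$, and writing $\bar{X}$ for the sample mean computed from the same data. Adding and subtracting $\bar{X}$ inside the square and using $\sum_{i}(X_i - \bar{X}) = 0$ gives

$$\sum_{i=1}^{n}(X_i-\mu)^2 = \sum_{i=1}^{n}(X_i-\bar{X})^2 + n(\bar{X}-\mu)^2.$$

Since $E\left[(X_i-\mu)^2\right] = \sigma^2$ and $\operatorname{Var}(\bar{X}) = \sigma^2/n$, taking expectations yields

$$E\left[\sum_{i=1}^{n}(X_i-\bar{X})^2\right] = n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\,\sigma^2.$$

So dividing the sum of squared deviations from $\bar{X}$ by $n-1$ rather than $n$ gives an estimator whose expectation is exactly $\sigma^2$; the single linear constraint $\sum_{i}(X_i - \bar{X}) = 0$ is the degree of freedom that was used up.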
Suppose a sample has two values, X1 = 10 and X2 = 20, and we want to use this sample to estimate the population variance. The sample mean is:
$$X_m = \frac{\sum X}{n} = \frac{10 + 20}{2} = 15$$
Now suppose we know $X_m = 15$ and $X_1 = 10$. From the formula $X_m = \sum X / n$, we get:
$$X_2 = 2X_m - X_1 = 2 \times 15 - 10 = 20$$
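So in a two-value sample, once the mean and one of the values are known, the other value can no longer vary freely, and the sample's degrees of freedom drop by one, to (n-1). By the same reasoning, in a set of n values, once the mean and all the preceding values are known, the last value is fixed and cannot vary independently, so the number of values in the sample that can vary freely is (n-1).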
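The "last value is forced" argument is a one-line computation. The Python sketch below uses two hypothetical helpers (not from any library): one recovers the last value from the mean and the first n - 1 values, and the other lists the deviations from the mean, which always sum to zero. That zero-sum constraint is what removes one degree of freedom.

```python
def last_value(mean, known_values, n):
    """Hypothetical helper: once the mean of n values and the first n - 1 values
    are fixed, the last value is forced to n * mean - sum(known_values)."""
    if len(known_values) != n - 1:
        raise ValueError("expected exactly n - 1 known values")
    return n * mean - sum(known_values)


def deviations_from_mean(values):
    """Hypothetical helper: deviations X_i - mean. They sum to zero
    (up to floating-point rounding), the one linear constraint on the data."""
    m = sum(values) / len(values)
    return [x - m for x in values]


print(last_value(15, [10], 2))         # 20, matching X2 in the example above
print(last_value(4, [1, 5], 3))        # 6, since (1 + 5 + 6) / 3 == 4
print(deviations_from_mean([10, 20]))  # [-5.0, 5.0]
```

Only n - 1 of the deviations can be chosen freely; the last one is whatever makes the total zero.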
Summary