UA MATH567 High-Dimensional Statistics, Topic 0: Why Do We Need High-Dimensional Statistical Theory? A Linear Discriminant Analysis Example
- Fundamentals of linear discriminant analysis
- Theory
- Algorithm
Fundamentals of linear discriminant analysis
Theory
Recall the binary hypothesis testing problem: the goal is to decide whether an observation $x \in \mathbb{R}^d$ comes from population $P_1$ or from $P_2$. In statistical theory, the Neyman-Pearson lemma shows that the likelihood ratio test is optimal, i.e., the test statistic and rejection region derived from $\log \frac{P_2(x)}{P_1(x)}$ are optimal. Now consider the setting of linear discriminant analysis: suppose the two populations are $N(\mu_1,\Sigma)$ and $N(\mu_2,\Sigma)$. Given an observation $x \in \mathbb{R}^d$, the log-likelihood ratio is (see my earlier post for the multivariate normal density):
$$\begin{aligned}
\log \frac{P_1(x)}{P_2(x)} &= \log \frac{(2\pi)^{-d/2}|\Sigma|^{-1/2}\exp\left(-\frac{1}{2}(x-\mu_1)'\Sigma^{-1}(x-\mu_1)\right)}{(2\pi)^{-d/2}|\Sigma|^{-1/2}\exp\left(-\frac{1}{2}(x-\mu_2)'\Sigma^{-1}(x-\mu_2)\right)} \\
&= \frac{1}{2}(x-\mu_2)'\Sigma^{-1}(x-\mu_2)-\frac{1}{2}(x-\mu_1)'\Sigma^{-1}(x-\mu_1) \\
&= (\mu_1-\mu_2)'\Sigma^{-1}x-\frac{1}{2}\left(\mu_1'\Sigma^{-1}\mu_1-\mu_2'\Sigma^{-1}\mu_2\right) \\
&= (\mu_1-\mu_2)'\Sigma^{-1}\left(x-\frac{\mu_1+\mu_2}{2}\right)=:\Psi(x)
\end{aligned}$$
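This identity can be checked numerically: the difference of the two Gaussian log-densities should coincide with $\Psi(x)$. The sketch below uses arbitrary illustrative values of $\mu_1$, $\mu_2$, $\Sigma$ (not values from the lecture):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d = 3
# arbitrary illustrative parameters
mu1, mu2 = rng.normal(size=d), rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)  # symmetric positive definite

x = rng.normal(size=d)
# left-hand side: log P1(x) - log P2(x)
lhs = (multivariate_normal.logpdf(x, mu1, Sigma)
       - multivariate_normal.logpdf(x, mu2, Sigma))
# right-hand side: Psi(x) = (mu1 - mu2)' Sigma^{-1} (x - (mu1 + mu2)/2)
rhs = (mu1 - mu2) @ np.linalg.solve(Sigma, x - (mu1 + mu2) / 2)
print(np.isclose(lhs, rhs))
```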
If $\Psi(x)>0$, we classify $x$ as a sample from $P_1$; if $\Psi(x)\le 0$, we classify $x$ as a sample from $P_2$. With equal priors, the misclassification probability of this rule is

$$\mathrm{Err}(\Psi)=\frac{1}{2}P_1[\Psi(x)\le 0]+\frac{1}{2}P_2[\Psi(x)>0]$$
Define

$$\gamma=\sqrt{(\mu_1-\mu_2)'\Sigma^{-1}(\mu_1-\mu_2)}$$
First compute the first probability. If $x \sim N(\mu_1,\Sigma)$, then

$$E\Psi(x)=(\mu_1-\mu_2)'\Sigma^{-1}\left(\mu_1-\frac{\mu_1+\mu_2}{2}\right)=\frac{\gamma^2}{2},\qquad \mathrm{Var}(\Psi(x))=(\mu_1-\mu_2)'\Sigma^{-1}\Sigma\Sigma^{-1}(\mu_1-\mu_2)=\gamma^2$$
Hence $\Psi(x)\mid P_1 \sim N(\gamma^2/2,\,\gamma^2)$, so

$$P_1[\Psi(x)\le 0]=P_1\left(\frac{\Psi(x)-\gamma^2/2}{\gamma}\le -\frac{\gamma}{2}\right)=\Phi(-\gamma/2)$$
Next compute the second probability. If $x \sim N(\mu_2,\Sigma)$, then

$$E\Psi(x)=(\mu_1-\mu_2)'\Sigma^{-1}\left(\mu_2-\frac{\mu_1+\mu_2}{2}\right)=-\frac{\gamma^2}{2},\qquad \mathrm{Var}(\Psi(x))=(\mu_1-\mu_2)'\Sigma^{-1}\Sigma\Sigma^{-1}(\mu_1-\mu_2)=\gamma^2$$
Hence $\Psi(x)\mid P_2 \sim N(-\gamma^2/2,\,\gamma^2)$, so

$$P_2[\Psi(x)> 0]=P_2\left(\frac{\Psi(x)+\gamma^2/2}{\gamma}> \frac{\gamma}{2}\right)=1-\Phi(\gamma/2)=\Phi(-\gamma/2)$$
Combining the two cases,

$$\mathrm{Err}(\Psi)=\Phi(-\gamma/2)$$
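A quick Monte Carlo experiment confirms this formula; the dimension, mean shift, and sample size below are illustrative choices of mine, not values from the lecture:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
d, n = 5, 200_000
mu1 = np.zeros(d)
mu2 = np.full(d, 0.8)  # illustrative mean shift
Sigma = np.eye(d)      # illustrative covariance

Sinv = np.linalg.inv(Sigma)
diff, mid = mu1 - mu2, (mu1 + mu2) / 2
gamma = np.sqrt(diff @ Sinv @ diff)

def psi(X):
    # Psi(x) = (mu1 - mu2)' Sigma^{-1} (x - (mu1 + mu2)/2), row-wise
    return (X - mid) @ Sinv @ diff

X1 = rng.multivariate_normal(mu1, Sigma, size=n)  # samples from P1
X2 = rng.multivariate_normal(mu2, Sigma, size=n)  # samples from P2
err = 0.5 * np.mean(psi(X1) <= 0) + 0.5 * np.mean(psi(X2) > 0)
print(err, norm.cdf(-gamma / 2))  # the two numbers should be close
```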
Algorithm
Recall the decision rule from the theory section: if $\Psi(x)>0$ we classify $x$ as a sample from $P_1$; if $\Psi(x)\le 0$, as a sample from $P_2$. To apply this rule we need to compute

$$\Psi(x)=(\mu_1-\mu_2)'\Sigma^{-1}\left(x-\frac{\mu_1+\mu_2}{2}\right)$$
which requires estimates of the two means and of the covariance matrix. Suppose we have $n_1+n_2$ samples, the first $n_1$ from population $P_1$ and the last $n_2$ from population $P_2$. Introduce the sample means

$$\hat\mu_1=\frac{1}{n_1}\sum_{i=1}^{n_1}x_i,\qquad \hat\mu_2=\frac{1}{n_2}\sum_{i=n_1+1}^{n_1+n_2}x_i$$
and the pooled sample covariance matrix

$$\hat\Sigma=\frac{\sum_{i=1}^{n_1}(x_i-\hat\mu_1)(x_i-\hat\mu_1)'+\sum_{i=n_1+1}^{n_1+n_2}(x_i-\hat\mu_2)(x_i-\hat\mu_2)'}{n_1+n_2-2}$$
Substituting these three estimates into $\Psi(x)$ yields the Fisher linear discriminant function:

$$\hat{\Psi}(x)=(\hat\mu_1-\hat\mu_2)'\hat\Sigma^{-1}\left(x-\frac{\hat\mu_1+\hat\mu_2}{2}\right)$$
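A minimal plug-in implementation of $\hat\Psi$ might look as follows; the function name `fisher_lda` and the toy means/sample sizes are my own choices for illustration:

```python
import numpy as np

def fisher_lda(X1, X2):
    """Fit the Fisher linear discriminant from two labeled samples.
    Returns a function x -> hat_Psi(x); hat_Psi(x) > 0 assigns x to P1."""
    n1, n2 = len(X1), len(X2)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # pooled sample covariance with n1 + n2 - 2 in the denominator
    S = ((X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)) / (n1 + n2 - 2)
    w = np.linalg.solve(S, m1 - m2)  # hat_Sigma^{-1} (hat_mu1 - hat_mu2)
    mid = (m1 + m2) / 2
    return lambda x: (x - mid) @ w

rng = np.random.default_rng(2)
X1 = rng.multivariate_normal([1, 0], np.eye(2), size=100)   # sample from P1
X2 = rng.multivariate_normal([-1, 0], np.eye(2), size=100)  # sample from P2
classify = fisher_lda(X1, X2)
# a point near mu1 should get hat_Psi > 0, i.e., be assigned to P1
print(classify(np.array([1.0, 0.0])) > 0)
```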
The misclassification probability of the Fisher linear discriminant is

$$\mathrm{Err}(\hat\Psi)=\frac{1}{2}P_1[\hat\Psi(x)\le 0]+\frac{1}{2}P_2[\hat\Psi(x)>0]$$
An important question is whether the error probability of the Fisher linear discriminant can equal the theoretical error probability of the oracle rule, or at least exceed it only slightly. Kolmogorov, the founding father of modern probability theory, analyzed this problem: if $d/n_i \to \alpha$ for $i=1,2$, $\left\|\hat\mu_1-\hat\mu_2\right\|_2 \to_p \gamma$, and the covariance matrix is the identity, then

$$\mathrm{Err}(\hat\Psi)\to_p \Phi\left(-\frac{\gamma^2}{2\sqrt{\gamma^2+\alpha^2}}\right)$$
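The high-dimensional effect is easy to reproduce in a small simulation. The sketch below uses illustrative sizes of my choosing ($d=200$, $n_1=n_2=400$, so $\alpha=0.5$, identity covariance, $\gamma=2$); since $\Sigma=I$, the exact error of each fitted rule can be computed in closed form, and it visibly exceeds the classical prediction $\Phi(-\gamma/2)$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
d, n = 200, 400                       # alpha = d/n = 0.5, illustrative
mu1 = np.zeros(d)
mu2 = np.full(d, 2.0 / np.sqrt(d))    # ||mu1 - mu2||_2 = 2, so gamma = 2
gamma = 2.0

def trial():
    X1 = rng.normal(size=(n, d)) + mu1   # Sigma = I
    X2 = rng.normal(size=(n, d)) + mu2
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S = ((X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)) / (2 * n - 2)
    w = np.linalg.solve(S, m1 - m2)
    mid = (m1 + m2) / 2
    # exact error of the fitted linear rule under the true Gaussians:
    # w'(x - mid) is N(w'(mu_k - mid), ||w||^2) under P_k since Sigma = I
    s = np.sqrt(w @ w)
    e1 = norm.cdf(-(w @ (mu1 - mid)) / s)  # P1[hat_Psi <= 0]
    e2 = norm.cdf((w @ (mu2 - mid)) / s)   # P2[hat_Psi > 0]
    return 0.5 * (e1 + e2)

emp = np.mean([trial() for _ in range(5)])
print(emp, norm.cdf(-gamma / 2))  # fitted-rule error vs classical Phi(-gamma/2)
```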
Let us briefly look at the empirical results:

- Left panel: the larger the mean shift, the farther apart the two populations and the less likely a misclassification. The empirical error is only slightly above the theoretical one, so the classical theory is fairly credible there.
- Right panel: a larger $\alpha$ means a larger $d/n_i$, i.e., a higher-dimensional problem. The empirical results deteriorate as $\alpha$ grows, while the classical theory predicts no change; this is why we need a new theory to explain high-dimensional statistical problems.