统计学习方法 (Statistical Learning Methods), Chapter 4 Exercises (repost + reformatted + my own notes)
4.1 Use maximum likelihood estimation to derive the prior probability estimate (4.8) and the conditional probability estimate (4.9) in the naive Bayes method.
First, equation (4.8):
$$P(Y=c_k)=\frac{\sum_{i=1}^N I(y_i=c_k)}{N}$$
################### Proof ###############################
Below, $a_{jl}$ denotes the $l$-th possible value of the $j$-th feature, and $x_i^{(j)}$ denotes the $j$-th feature of the $i$-th sample; both appear in equation (4.9):
$$P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl},\,y_i=c_k)}{\sum_{i=1}^N I(y_i=c_k)}$$
Let $p=P(Y=c_k)$. This is equivalent to drawing $N$ samples i.i.d., the $i$-th of which has label $y_i$. The likelihood is
$$P(y_1,y_2,\dots,y_N)=p^{\sum_{i=1}^N I(y_i=c_k)}\,(1-p)^{\sum_{i=1}^N I(y_i\neq c_k)}$$
To maximize the likelihood, set its derivative with respect to $p$ to zero:
$$\frac{dP(y_1,y_2,\dots,y_N)}{dp}=\sum_{i=1}^N I(y_i=c_k)\,p^{\sum_{i=1}^N I(y_i=c_k)-1}(1-p)^{\sum_{i=1}^N I(y_i\neq c_k)}-\sum_{i=1}^N I(y_i\neq c_k)\,(1-p)^{\sum_{i=1}^N I(y_i\neq c_k)-1}p^{\sum_{i=1}^N I(y_i=c_k)}$$
$$=p^{\left[\sum_{i=1}^N I(y_i=c_k)\right]-1}(1-p)^{\left[\sum_{i=1}^N I(y_i\neq c_k)\right]-1}\left[(1-p)\sum_{i=1}^N I(y_i=c_k)-p\sum_{i=1}^N I(y_i\neq c_k)\right]=0$$
Since $0<p<1$, the bracketed factor must vanish:
$$(1-p)\sum_{i=1}^N I(y_i=c_k)-p\sum_{i=1}^N I(y_i\neq c_k)=0$$
Expanding and using $\sum_{i=1}^N I(y_i=c_k)+\sum_{i=1}^N I(y_i\neq c_k)=N$ gives
$$\sum_{i=1}^N I(y_i=c_k)=p\left(\sum_{i=1}^N I(y_i=c_k)+\sum_{i=1}^N I(y_i\neq c_k)\right)=pN$$
Therefore
$$p=P(Y=c_k)=\frac{\sum_{i=1}^N I(y_i=c_k)}{N}\qquad ①$$
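As a cross-check (an equivalent route, not the one taken above), one can maximize the log-likelihood instead; writing $m=\sum_{i=1}^N I(y_i=c_k)$, the algebra is shorter:
$$\log P(y_1,\dots,y_N)=m\log p+(N-m)\log(1-p)$$
$$\frac{d}{dp}\log P=\frac{m}{p}-\frac{N-m}{1-p}=0\;\Rightarrow\;p=\frac{m}{N}$$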
This completes the proof of (4.8).
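As a quick numerical companion to (4.8) (my own sketch, not from the original post; the function and variable names are assumptions), counting class frequencies reproduces the estimate:

```python
from collections import Counter

def prior_mle(labels):
    """MLE of the prior P(Y=c_k) per eq. (4.8): class count divided by N."""
    n = len(labels)
    counts = Counter(labels)
    return {c: counts[c] / n for c in counts}

labels = ["spam", "ham", "spam", "spam", "ham"]
print(prior_mle(labels))  # {'spam': 0.6, 'ham': 0.4}
```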
###############################################
Next, prove (4.9):
$$P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl},\,y_i=c_k)}{\sum_{i=1}^N I(y_i=c_k)}$$
where $j\in[1,n]$, $l\in[1,S_j]$, $k\in[1,K]$.
############## Proof #########
By the same maximum likelihood argument as for (4.8), applied to the joint event $\{Y=c_k,\,X^{(j)}=a_{jl}\}$, the joint probability estimate is
$$P(Y=c_k,\,X^{(j)}=a_{jl})=\frac{\sum_{i=1}^N I(y_i=c_k,\,x_i^{(j)}=a_{jl})}{N}\qquad ②$$
By the definition of conditional probability, the left-hand side of the equation to be proved is
$$P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{P(Y=c_k,\,X^{(j)}=a_{jl})}{P(Y=c_k)}\qquad ③$$
Next, substitute ① into the denominator of ③ and ② into the numerator of ③, which gives:
$$P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{\frac{\sum_{i=1}^N I(y_i=c_k,\,x_i^{(j)}=a_{jl})}{N}}{\frac{\sum_{i=1}^N I(y_i=c_k)}{N}}=\frac{\sum_{i=1}^N I(y_i=c_k,\,x_i^{(j)}=a_{jl})}{\sum_{i=1}^N I(y_i=c_k)}$$
This completes the proof of (4.9).
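A minimal sketch of (4.9) in code (again my own illustration, not from the post; `features` is assumed to be a list of feature vectors such that `x[j]` is the $j$-th feature):

```python
def cond_mle(features, labels, j, a, c):
    """MLE of P(X^(j)=a | Y=c) per eq. (4.9): among samples with label c,
    the fraction whose j-th feature equals a."""
    in_class = [x for x, y in zip(features, labels) if y == c]
    if not in_class:  # no samples of class c: the estimate is undefined
        return 0.0
    return sum(x[j] == a for x in in_class) / len(in_class)

features = [("sunny", "hot"), ("rainy", "mild"), ("sunny", "mild")]
labels = ["yes", "no", "yes"]
print(cond_mle(features, labels, j=0, a="sunny", c="yes"))  # 1.0
```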
####### Proof of (4.11) ########################
Assume the prior is uniform over the $K$ classes; then:
$$p=\frac{1}{K}\;\Rightarrow\;pK-1=0\qquad(1)$$
In addition, from ①, i.e. equation (4.8), we have:
$$pN-\sum_{i=1}^N I(y_i=c_k)=0\qquad(2)$$
Note: strictly speaking, the $p$ in (1) and the $p$ in (2) are not the same quantity. The $p$ in (1) refers to a perfectly uniform class distribution, whereas the $p$ in (2) is the value obtained from the actual sample distribution.
Take the weighted combination $(1)\cdot\lambda+(2)=0$:
$$\lambda(pK-1)+pN-\sum_{i=1}^N I(y_i=c_k)=0$$
Solving for $p$:
$$P(Y=c_k)=\frac{\sum_{i=1}^N I(y_i=c_k)+\lambda}{N+K\lambda}$$
This completes the proof of (4.11).
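A sketch of the smoothed prior (4.11), with $\lambda=1$ as the default (Laplace smoothing); the names here are mine, not the book's:

```python
def prior_smoothed(labels, classes, lam=1.0):
    """Smoothed prior per eq. (4.11): (count of class c + lambda) / (N + K*lambda)."""
    n, k = len(labels), len(classes)
    return {c: (sum(y == c for y in labels) + lam) / (n + k * lam)
            for c in classes}

labels = ["spam", "ham", "spam", "spam", "ham"]
print(prior_smoothed(labels, classes=["spam", "ham"]))
# {'spam': 0.571..., 'ham': 0.428...}  i.e. (3+1)/(5+2) and (2+1)/(5+2)
```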
###############################
####### Proof of (4.10) ########
From (4.9), the maximum likelihood estimate is:
$$p=P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl},\,y_i=c_k)}{\sum_{i=1}^N I(y_i=c_k)}$$
$$\Rightarrow\;p\sum_{i=1}^N I(y_i=c_k)-\sum_{i=1}^N I(x_i^{(j)}=a_{jl},\,y_i=c_k)=0\qquad(3)$$
Notice that (4.10) is very similar to (4.9), except that both its numerator and denominator carry an extra smoothing term. Introduce a smoothing condition: given $Y=c_k$, since the $j$-th feature has $S_j$ possible values, assume each of those values occurs in the same number of samples. Then:
$$p=P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{1}{S_j}\;\Rightarrow\;p\,S_j-1=0\qquad(4)$$
Take the weighted combination $(3)+\lambda\cdot(4)=0$:
$$p\left[\sum_{i=1}^N I(y_i=c_k)+S_j\lambda\right]-\lambda-\sum_{i=1}^N I(x_i^{(j)}=a_{jl},\,y_i=c_k)=0$$
Solving for $p$:
$$p=\frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl},\,y_i=c_k)+\lambda}{\sum_{i=1}^N I(y_i=c_k)+S_j\lambda}$$
This completes the proof of (4.10).
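Finally, a sketch of the smoothed conditional (4.10); `s_j` is the number of distinct values of feature $j$ and $\lambda$ defaults to 1 (again my own illustrative code, not from the original post):

```python
def cond_smoothed(features, labels, j, a, c, s_j, lam=1.0):
    """Smoothed conditional per eq. (4.10):
    (count(x^(j)=a, y=c) + lambda) / (count(y=c) + S_j*lambda)."""
    num = sum(x[j] == a and y == c for x, y in zip(features, labels)) + lam
    den = sum(y == c for y in labels) + s_j * lam
    return num / den

features = [("sunny", "hot"), ("rainy", "mild"), ("sunny", "mild")]
labels = ["yes", "no", "yes"]
print(cond_smoothed(features, labels, j=0, a="rainy", c="yes", s_j=2))
# (0+1)/(2+2) = 0.25, instead of the zero probability the unsmoothed MLE would give
```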
總結
以上是生活随笔為你收集整理的统计学习方法第四章课后习题(转载+重新排版+自己解读)的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 统计学习方法-第二章课后习题答案整理
- 下一篇: SVM入门(八)松弛变量(转)