Logistic Regression Formula Derivation
I used to think the logistic regression formula (the sigmoid function) was artificially contrived. Only after taking a machine learning course did I learn that it is actually backed by mathematical theory, namely Bayes' theorem.
Why the Sigmoid Function?
Recall the Bayes classifier, which is a generative learning method. To obtain $P(Y|X)$, it decomposes the posterior into $P(Y)$ and $P(X|Y)$ and estimates those two quantities from the dataset. This raises a question: can we estimate $P(Y|X)$ directly?
In the logistic regression model, we make the following assumptions:
- Let $X$ be a real-valued vector of $n$ features, $\langle X_1, X_2, \dots, X_n \rangle$.
- Let $Y$ be a Boolean-valued variable.
- Assume all the $X_i$ are conditionally independent given $Y$, i.e.
$$P(X|Y) = P(X_1, X_2, \dots, X_n|Y) = P(X_1|Y)P(X_2|Y) \dots P(X_n|Y)$$
- Assume that, given $Y=y_k$, each $X_i$ follows a Gaussian distribution $N(\mu_{ik}, \sigma_i)$, i.e.
$$(X_i|Y=y_k) \sim N(\mu_{ik}, \sigma_i)$$
$$P(X_i|Y=0) = \frac{1}{\sqrt{2\pi}\sigma_i}\exp\left\{-\frac{(X_i-\mu_{i0})^2}{2\sigma_i^2}\right\}$$
- Assume the class prior on $Y$ is a Bernoulli distribution, i.e.
$$P(Y=1)=\pi, \quad P(Y=0)=1-\pi$$
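These generative assumptions can be simulated directly. The sketch below draws a dataset from a Gaussian naive Bayes model; the parameter values for $\pi$, $\mu_{ik}$, and $\sigma_i$ are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for n = 2 features (illustrative values only)
pi = 0.4                      # P(Y = 1)
mu0 = np.array([0.0, 1.0])    # mu_{i0}: class-0 means
mu1 = np.array([2.0, -1.0])   # mu_{i1}: class-1 means
sigma = np.array([1.0, 0.5])  # sigma_i: per-feature std dev, shared by both classes

# Sample from the generative model: first Y ~ Bernoulli(pi),
# then each X_i ~ N(mu_{iY}, sigma_i), independently given Y
m = 1000
y = rng.random(m) < pi
X = np.where(y[:, None], mu1, mu0) + sigma * rng.standard_normal((m, 2))
```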
With these assumptions in place, we can derive the formula.
$$
\begin{aligned}
P(Y=1|X) &= \frac{P(X|Y=1)P(Y=1)}{P(X)} \\
&= \frac{P(X|Y=1)P(Y=1)}{P(X|Y=1)P(Y=1)+P(X|Y=0)P(Y=0)} \\
&= \frac{1}{1+\frac{P(X|Y=0)P(Y=0)}{P(X|Y=1)P(Y=1)}} \\
&= \frac{1}{1+\frac{1-\pi}{\pi}\cdot\frac{P(X|Y=0)}{P(X|Y=1)}} \\
&= \frac{1}{1+\frac{1-\pi}{\pi}\cdot\frac{\prod_i P(X_i|Y=0)}{\prod_i P(X_i|Y=1)}} \\
&= \frac{1}{1+\exp\left\{\ln\left(\frac{1-\pi}{\pi}\cdot\frac{\prod_i P(X_i|Y=0)}{\prod_i P(X_i|Y=1)}\right)\right\}} \\
&= \frac{1}{1+\exp\left\{\ln\left(\frac{1-\pi}{\pi}\right)+\ln\left(\frac{\prod_i P(X_i|Y=0)}{\prod_i P(X_i|Y=1)}\right)\right\}} \\
&= \frac{1}{1+\exp\left\{\ln\left(\frac{1-\pi}{\pi}\right)+\sum_i\left(\ln P(X_i|Y=0)-\ln P(X_i|Y=1)\right)\right\}}
\end{aligned}
$$
Now look at the term inside the sum in the denominator:
$$\ln P(X_i|Y=0) - \ln P(X_i|Y=1)$$
Since we have the probability density functions
$$P(X_i|Y=0) = \frac{1}{\sqrt{2\pi}\sigma_i}\exp\left\{-\frac{(X_i-\mu_{i0})^2}{2\sigma_i^2}\right\}, \qquad
P(X_i|Y=1) = \frac{1}{\sqrt{2\pi}\sigma_i}\exp\left\{-\frac{(X_i-\mu_{i1})^2}{2\sigma_i^2}\right\}$$
it follows that
$$
\begin{aligned}
\ln P(X_i|Y=0) - \ln P(X_i|Y=1) &= -\frac{(X_i-\mu_{i0})^2}{2\sigma_i^2} + \frac{(X_i-\mu_{i1})^2}{2\sigma_i^2} \\
&= \frac{-(X_i-\mu_{i0})^2+(X_i-\mu_{i1})^2}{2\sigma_i^2} \\
&= \frac{-X_i^2+2X_i\mu_{i0}-\mu_{i0}^2+X_i^2-2X_i\mu_{i1}+\mu_{i1}^2}{2\sigma_i^2} \\
&= \frac{2(\mu_{i0}-\mu_{i1})X_i-\mu_{i0}^2+\mu_{i1}^2}{2\sigma_i^2} \\
&= \frac{\mu_{i0}-\mu_{i1}}{\sigma_i^2}X_i + \frac{-\mu_{i0}^2+\mu_{i1}^2}{2\sigma_i^2}
\end{aligned}
$$
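This identity (the log-density difference of two equal-variance Gaussians is linear in $X_i$) is easy to sanity-check numerically. The values of $\mu_{i0}$, $\mu_{i1}$, and $\sigma_i$ below are hypothetical:

```python
import math

# Hypothetical parameter values (illustration only)
mu0, mu1, sigma = 0.5, 2.0, 1.5

def log_density_diff(x):
    """ln P(x|Y=0) - ln P(x|Y=1): the normalizers cancel, leaving only the exponents."""
    return (-(x - mu0) ** 2 + (x - mu1) ** 2) / (2 * sigma ** 2)

def linear_form(x):
    """The derived linear expression w*x + intercept."""
    w = (mu0 - mu1) / sigma ** 2
    b = (-mu0 ** 2 + mu1 ** 2) / (2 * sigma ** 2)
    return w * x + b

# The two expressions agree at any x
for x in (-1.0, 0.0, 3.7):
    assert math.isclose(log_density_diff(x), linear_form(x))
```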
Therefore,
$$
\begin{aligned}
P(Y=1|X) &= \frac{1}{1+\exp\left\{\ln\left(\frac{1-\pi}{\pi}\right)+\sum_i\left(\ln P(X_i|Y=0)-\ln P(X_i|Y=1)\right)\right\}} \\
&= \frac{1}{1+\exp\left\{\ln\left(\frac{1-\pi}{\pi}\right)+\sum_i\left(\frac{\mu_{i0}-\mu_{i1}}{\sigma_i^2}X_i+\frac{-\mu_{i0}^2+\mu_{i1}^2}{2\sigma_i^2}\right)\right\}} \\
&= \frac{1}{1+\exp\left\{\ln\left(\frac{1-\pi}{\pi}\right)+\sum_i\frac{-\mu_{i0}^2+\mu_{i1}^2}{2\sigma_i^2}+\sum_i\frac{\mu_{i0}-\mu_{i1}}{\sigma_i^2}X_i\right\}}
\end{aligned}
$$
Let
$$w_0 = \ln\left(\frac{1-\pi}{\pi}\right)+\sum_i\frac{-\mu_{i0}^2+\mu_{i1}^2}{2\sigma_i^2}$$
$$w_i = \frac{\mu_{i0}-\mu_{i1}}{\sigma_i^2}$$
Then
$$P(Y=1|X)=\frac{1}{1+\exp\left\{w_0+\sum_i w_iX_i\right\}}$$
That is,
$$P(Y=1|X)=\frac{1}{1+e^{wX+b}}$$
which, after absorbing the sign into the weights, is exactly the familiar sigmoid function $\sigma(z)=\frac{1}{1+e^{-z}}$.
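As a final check on the derivation, one can verify numerically (here for a single feature, with hypothetical parameter values) that the sigmoid with the derived weights $w_0$, $w_i$ reproduces the posterior computed directly from Bayes' rule:

```python
import math

# Hypothetical 1-D parameters (illustration only)
pi, mu0, mu1, sigma = 0.3, 0.0, 2.0, 1.0

def gauss(x, mu, s):
    """Gaussian density N(mu, s) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * s ** 2)) / (math.sqrt(2 * math.pi) * s)

def posterior_bayes(x):
    """P(Y=1|X=x) computed directly from Bayes' rule."""
    p1 = gauss(x, mu1, sigma) * pi
    p0 = gauss(x, mu0, sigma) * (1 - pi)
    return p1 / (p0 + p1)

def posterior_sigmoid(x):
    """P(Y=1|X=x) via the derived weights w0 and w1."""
    w0 = math.log((1 - pi) / pi) + (-mu0 ** 2 + mu1 ** 2) / (2 * sigma ** 2)
    w1 = (mu0 - mu1) / sigma ** 2
    return 1 / (1 + math.exp(w0 + w1 * x))

# The two computations agree at any x
for x in (-2.0, 0.5, 3.0):
    assert math.isclose(posterior_bayes(x), posterior_sigmoid(x))
```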
Loss Function Derivation
We would like to use maximum likelihood estimation (MLE), but $P(\langle X_i, y_i \rangle | w)$ is hard to estimate, since the dataset rarely contains enough such joint samples. We therefore adopt the weaker maximum conditional likelihood estimation (MCLE) and maximize $P(Y=y_i|X_i, w)$ instead.
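A minimal sketch of MCLE in practice, not from the original post: minimize the negative conditional log-likelihood $-\sum_i \ln P(Y=y_i|X_i,w)$ by plain gradient descent. The learning rate, step count, and toy data below are all illustrative assumptions; the sigmoid here uses the standard $1/(1+e^{-z})$ convention, with the sign absorbed into the weights.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def nll(w, b, X, y):
    """Negative conditional log-likelihood: -sum_i ln P(Y=y_i | X_i, w)."""
    p = sigmoid(X @ w + b)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit(X, y, lr=0.1, steps=500):
    """Plain gradient descent on the mean NLL (illustrative, not production code)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / len(y)  # gradient of mean NLL w.r.t. w
        b -= lr * np.mean(p - y)          # gradient of mean NLL w.r.t. b
    return w, b

# Toy 1-D data: class 1 tends to have larger x
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(0, 1, (50, 1)), rng.normal(3, 1, (50, 1))])
y = np.concatenate([np.zeros(50), np.ones(50)])
w, b = fit(X, y)
acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
```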
To take this further, we would need a concrete, real-world scenario.