Lecture 3: Logistic Regression & Regularization

Published: 2025/3/21

Contents of this lecture:

Logistic Regression

=========================

(1) Classification

(2) Hypothesis Representation

(3) Decision Boundary

(4) Cost Function

(5) Simplified Cost Function and Gradient Descent

(6) Parameter Optimization in Matlab

(7) Multiclass classification : One-vs-all

The problem of overfitting and how to solve it

=========================

(8) The problem of overfitting

(9) Cost Function

(10) Regularized Linear Regression

(11) Regularized Logistic Regression

This lecture covers logistic regression and the use of regularization to address overfitting. Both are very important, and logistic regression is one of the most commonly used tools in machine learning. The two parts are explained in turn below.

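Part 1: Logistic Regression

/*************(1)~(2) Classification / Hypothesis Representation***********/

Suppose we want to predict, from the tumor size, whether a patient's tumor is malignant or benign.

Eight data points are given, as follows:

Suppose linear regression gives the hypothesis shown as the pink line in the figure above. We can then pick a threshold of 0.5 to make predictions:

y=1, if h(x)>=0.5

y=0, if h(x)<0.5

That is, take the point where the fitted line reaches 0.5 and project it down onto the tumor-size axis: points to its right are predicted y=1 and points to its left are predicted y=0, which classifies this data well.

But what if the data set looks like this instead?

In this case, suppose linear regression fits the blue line. The boundary obtained from the 0.5 threshold on this line no longer classifies the data well, because the points do not satisfy

y=1, if h(x)>=0.5

y=0, if h(x)<0.5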

So we introduce the logistic regression model:

The sigmoid function, also called the logistic function, is the function g(z) shown in the figure above.

When z>=0, g(z)>=0.5; when z<0, g(z)<0.5
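The two properties of g(z) just stated can be checked numerically. This is a minimal MATLAB/Octave sketch: the handle name `sigmoid` and the sample values in `z` are our own choices for illustration, and g(z) = 1/(1+e^(-z)) is the standard definition of the logistic function, assumed to be what the missing figure showed.

```matlab
% Sigmoid (logistic) function g(z) = 1 / (1 + e^(-z)), applied element-wise.
% The handle name "sigmoid" is an illustrative choice.
sigmoid = @(z) 1 ./ (1 + exp(-z));

z = [-5 -1 0 1 5];   % sample inputs, assumed for illustration
g = sigmoid(z);

% g(0) is 0.5; g(z) >= 0.5 for z >= 0 and g(z) < 0.5 for z < 0.
disp([z; g]);
```

Because g is monotonically increasing and passes through 0.5 at z=0, thresholding g(z) at 0.5 is the same as checking the sign of z.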

From the formula in the figure below, given the data x and the parameters θ, the probabilities of y=0 and y=1 sum to 1.
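The figure with this formula is missing from this copy. In the usual notation for logistic regression (assumed here to match what the figure showed), the hypothesis is read as a probability:

```latex
\begin{align}
P(y=1 \mid x;\theta) &= h_\theta(x) = g(\theta^T x) \\
P(y=0 \mid x;\theta) &= 1 - h_\theta(x) \\
P(y=0 \mid x;\theta) + P(y=1 \mid x;\theta) &= 1
\end{align}
```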

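/*****************************(3) Decision Boundary**************************/

The decision boundary is the boundary, determined by h(x), that separates the region where we predict y=1 from the region where we predict y=0.

As shown in the figure below, suppose the hypothesis has the form h(x)=g(θ0+θ1x1+θ2x2) with parameters θ=[-3,1,1]^T. Then

predict Y=1, if -3+x1+x2>=0

predict Y=0, if -3+x1+x2<0

which happens to classify the data set shown in the figure well.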
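The rule above can be tried on a few points. In this MATLAB/Octave sketch θ=[-3,1,1]^T comes from the text, while the sample points in `X` and the variable names are made up for illustration.

```matlab
% Parameters from the text: h(x) = g(theta0 + theta1*x1 + theta2*x2).
theta = [-3; 1; 1];
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Sample points (x1, x2), one per row, assumed for illustration.
X = [1 1; 2 2; 3 0; 0.5 0.5; 4 1];
Xb = [ones(size(X, 1), 1), X];   % prepend the intercept feature x0 = 1

h = sigmoid(Xb * theta);         % estimated probability that y = 1
pred = h >= 0.5;                 % equivalent to -3 + x1 + x2 >= 0
disp([X, h, pred]);
```

The point (3, 0) lies exactly on the line x1+x2=3, so h=0.5 there and the rule assigns it to Y=1.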

Another Example:

answer:

Besides linear boundaries there are also nonlinear decision boundaries, for example:

In the figure below, the decision boundary that separates the classes is a circle of radius 1:
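A circular boundary of radius 1 can be produced by adding squared features. The sketch below assumes the hypothesis h(x)=g(θ0+θ1x1+θ2x2+θ3x1^2+θ4x2^2) with θ=[-1,0,0,1,1]^T. Neither this form nor these values appear in the surviving text; they are simply one choice that yields the unit circle x1^2+x2^2=1.

```matlab
% Assumed parameters: predict y = 1 when -1 + x1^2 + x2^2 >= 0.
theta = [-1; 0; 0; 1; 1];

% Map raw points (x1, x2) to the features [1, x1, x2, x1^2, x2^2].
features = @(X) [ones(size(X, 1), 1), X(:, 1), X(:, 2), X(:, 1).^2, X(:, 2).^2];

% Sample points, assumed for illustration.
X = [0 0; 0.5 0.5; 1 0; 1 1; -2 0];

% g(z) >= 0.5 exactly when z >= 0, so the sign of z decides the class.
pred = features(X) * theta >= 0;
disp([X, pred]);
```

Points inside the circle are predicted y=0, and points on or outside it are predicted y=1.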

/********************(4)~(5) Simplified cost function and gradient descent <very important>*******************/

This part covers the simplified cost function and how to implement gradient descent for logistic regression.

Suppose y in our data only takes the values 0 and 1. For a logistic regression model we have hθ(x)=g(θ^T x), and the cost function is defined as follows:

Since y only takes the values 0 and 1, it can be written as

If you don't believe it, substitute y=0 and y=1 in turn: you will find that this J(θ) agrees with the Cost(hθ(x),y) above. What remains is to find the θ that minimizes J(θ).
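The formulas themselves were in figures that did not survive. The standard forms, which we assume are the ones the figures showed, are the piecewise cost and its single-expression equivalent:

```latex
\[
\mathrm{Cost}(h_\theta(x), y) =
\begin{cases}
-\log\bigl(h_\theta(x)\bigr) & \text{if } y = 1 \\
-\log\bigl(1 - h_\theta(x)\bigr) & \text{if } y = 0
\end{cases}
\]
\[
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\bigl(h_\theta(x^{(i)}), y^{(i)}\bigr)
          = -\frac{1}{m} \sum_{i=1}^{m} \Bigl[ y^{(i)} \log h_\theta(x^{(i)}) + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr]
\]
```

With y=1 the second term inside the brackets vanishes, and with y=0 the first one does, which is exactly the substitution check described above.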

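Chapter 1 already covered how to apply gradient descent, that is, the Repeat part in the figure below, which updates all components of θ simultaneously. The derivative of J(θ) can be obtained from the expression below, and the result is shown handwritten in the figure:

Now substitute it into Repeat:

At this point we are surprised to find that it is the same formula we obtained in Chapter 1.

In other words, as the figure below shows, whether h(x) is linear or a logistic regression model, we get the following parameter update procedure.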
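The figures are missing here, so as a sketch of what they most likely contained (the standard result, assumed rather than copied from the original), the derivative and the update are:

```latex
\[
\frac{\partial}{\partial \theta_j} J(\theta)
  = \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr) x_j^{(i)}
\]
\[
\text{Repeat: } \quad
\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr) x_j^{(i)}
\qquad \text{(simultaneously for all } j\text{)}
\]
```

The expression has the same form as for linear regression. The only difference is hidden inside h: here h_θ(x) = g(θ^T x) rather than θ^T x.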

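So how do we do this with vectorization? In other words, instead of using a for loop to update each θj one by one, we use a single matrix multiplication to update the whole θ at once. That is, we solve the following problem: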
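One possible vectorized form is θ := θ − (α/m)·X^T·(g(Xθ) − y), where each row of X is one example with x0=1 in the first column. The MATLAB/Octave sketch below uses toy data, a learning rate and an iteration count that are all assumptions for illustration.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Toy data, assumed for illustration: intercept column plus one centered feature.
X = [1 -1.5; 1 -0.5; 1 0.5; 1 1.5];
y = [0; 0; 1; 1];
m = size(X, 1);

alpha = 0.1;            % learning rate (assumed)
theta = zeros(2, 1);

for iter = 1:1000
    % A single matrix expression updates every theta_j simultaneously.
    theta = theta - (alpha / m) * (X' * (sigmoid(X * theta) - y));
end

pred = sigmoid(X * theta) >= 0.5;   % predictions on the training points
disp(theta');
```

The product X' * (sigmoid(X * theta) - y) stacks the sums over i for every j into one column vector, so no inner loop over j is needed.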

The formula above gives the update for the parameter vector θ. One more question: Lecture 2 discussed how to judge whether the learning rate α is well chosen, so how do we judge it for logistic regression?

Q: Suppose you are running gradient descent to fit a logistic regression model with parameter θ ∈ R^(n+1). Which of the following is a reasonable way to make sure the learning rate α is set properly and that gradient descent is running correctly?

A：
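The answer figure is missing from this copy. A common check, and we assume it is the one intended, is to record J(θ) after every iteration and confirm that it keeps decreasing. A minimal MATLAB/Octave sketch with made-up data and settings:

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Logistic regression cost J(theta); X has one example per row, y is 0/1.
cost = @(theta, X, y) -mean(y .* log(sigmoid(X * theta)) + ...
                            (1 - y) .* log(1 - sigmoid(X * theta)));

% Toy data and settings, assumed for illustration.
X = [1 -1.5; 1 -0.5; 1 0.5; 1 1.5];
y = [0; 0; 1; 1];
m = size(X, 1);
alpha = 0.1;
numIters = 200;

theta = zeros(2, 1);
J = zeros(numIters, 1);
for iter = 1:numIters
    theta = theta - (alpha / m) * (X' * (sigmoid(X * theta) - y));
    J(iter) = cost(theta, X, y);     % record the cost after each update
end

% To inspect the curve: plot(1:numIters, J); xlabel('iteration'); ylabel('J');
decreasing = all(diff(J) < 0);       % true when J fell on every iteration
disp(decreasing);
```

If J(θ) rises or oscillates, α is probably too large. If it falls very slowly, α may be too small.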

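/*************(6) Parameter Optimization in Matlab***********/

This part introduces some optimizations for logistic regression so that the parameters can be fitted faster. It implements, in MATLAB, the computation of the optimal parameters with a gradient-based method.

First, note that besides gradient descent there are many other methods we can use. In the figure below, the left side lists three other methods and the right side lists the advantages and disadvantages they share: there is no need to choose the learning rate α and they are faster, but they are more complex.

In other words, MATLAB already implements some methods for optimizing the parameters θ, so all we need to do is write the cost function and tell the system which method to use for the optimization. For example, we use 'GradObj': Use the GradObj option to specify that FUN also returns a second output argument G that is the partial derivatives of the function df/dX, at the point X.

As shown in the figure above, given the parameters θ, we need to supply the cost function, where

jVal is the value of the cost function. For example, suppose we fit a regression to the two points (x1,x2,y)=(1,0,5) and (0,1,5) with the hypothesis hθ(x)=θ1x1+θ2x2;
then the cost function J(θ) is: jVal=(theta(1)-5)^2+(theta(2)-5)^2;

In each iteration the parameters θ are updated in the manner of gradient descent: θ(i) := θ(i) − α·gradient(i), where gradient(i) is the partial derivative of J(θ) with respect to θi. In this example gradient(1)=2*(theta(1)-5) and gradient(2)=2*(theta(2)-5), as in the code below:

The function costFunction defines jVal=J(θ) and the gradient with respect to the two θ: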

function [jVal, gradient] = costFunction(theta)
%COSTFUNCTION Cost and gradient for the two-parameter example.
%   jVal is J(theta); gradient(i) is the partial derivative of J
%   with respect to theta(i).
jVal = (theta(1)-5)^2 + (theta(2)-5)^2;
gradient = zeros(2,1);
% partial derivatives of J with respect to theta(1) and theta(2)
gradient(1) = 2 * (theta(1)-5);
gradient(2) = 2 * (theta(2)-5);
end

Write the function Gradient_descent to optimize the parameters:

function [optTheta, functionVal, exitFlag] = Gradient_descent()
%GRADIENT_DESCENT Minimize costFunction with fminunc, using its gradient.
%   'GradObj','on' tells fminunc that costFunction also returns the gradient.
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1)   % no semicolon, so the initial value is printed
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
end

Calling it from the MATLAB command window gives the optimized parameters (θ1,θ2)=(5,5), i.e. hθ(x)=θ1x1+θ2x2=5*x1+5*x2

[optTheta,functionVal,exitFlag] = Gradient_descent()
initialTheta =
     0
     0
Local minimum found.
Optimization completed because the size of the gradient is less than
the default value of the function tolerance.
<stopping criteria details>
optTheta =
     5
     5
functionVal =
     0
exitFlag =
     1

The final result shows the optimized parameters optTheta=[5,5] and functionVal = costFunction(after the iterations) = 0

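/*****************************(7) Multi-class Classification One-vs-all**************************/

The one-vs-all method applies binary classification to multi-class classification.

For example, to classify into K classes, take one class as positive and the other (K-1) classes together as negative. Doing this for each class means optimizing the parameters of K hypotheses hθ(x), and each hθ(x) obtained gives the probability that, given θ and x, x belongs to the positive class.

With this approach, given an input vector x, the class with the largest hθ(x) is the class that x is assigned to.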
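The prediction step can be sketched as follows. The matrix `Theta` stands for K=3 classifiers that have already been trained. Its values, like the input `x`, are invented purely for illustration.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% One column of parameters per class; rows are [intercept; x1; x2].
% These values are made up and do not come from a trained model.
Theta = [ 0  0 -1;
          1 -1  0;
          0  1 -1 ];

x = [1; 2; 0.5];               % input vector with intercept feature x0 = 1

probs = sigmoid(Theta' * x);   % probs(k) = h for class k
[~, k] = max(probs);           % predicted class: the largest h wins
disp(probs');
disp(k);
```

Each entry of `probs` comes from a separate binary classifier, so the entries need not sum to 1. Only their ranking matters for the prediction.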

Part 2: The problem of overfitting and how to solve it

/************(8) The problem of overfitting***********/

The Problem of overfitting:

Overfitting is illustrated by the rightmost plot in the figure below. Both kinds of model covered above (logistic regression and linear regression) can overfit. One figure for each follows:

<Linear Regression>:

<Logistic Regression>:

How do we address overfitting? There are two methods:

1. Reduce the number of features (decide manually which features to keep, or let an algorithm select them)

2. Regularization (keep all the features, but make the parameters of some features very small)

Below we explain regularization in detail.

For the linear regression model, our problem is to minimize

Written in matrix form, this is

i.e. the loss function can be written as

from which we get:

After regularization, however, we have:
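The formulas for this passage were images that are missing from this copy. As a hedged reconstruction in standard notation (X is the m×(n+1) design matrix, y the vector of targets, and the exact layout of the original is an assumption), the steps are:

```latex
\begin{align}
J(\theta) &= \frac{1}{2m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)^2
           = \frac{1}{2m} (X\theta - y)^T (X\theta - y) \\
\nabla_\theta J(\theta) = 0 \;&\Rightarrow\; \theta = (X^T X)^{-1} X^T y \\
\text{with regularization:} \quad
\theta &= \bigl( X^T X + \lambda L \bigr)^{-1} X^T y,
\qquad L = \mathrm{diag}(0, 1, \dots, 1)
\end{align}
```

The zero in the first diagonal entry of L reflects the convention that θ0 is not penalized.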

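/************(9) Cost Function***********/
Regularization works as follows: give θ3 and θ4 very large coefficients in the cost function, so that minimizing the cost function produces very small θ3 and θ4.

Written as a formula, a penalty term over θ1~θn is added to the cost function: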
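The formula itself is missing here. The usual regularized cost for linear regression, which we assume is the one meant, is:

```latex
\[
J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)^2
          + \lambda \sum_{j=1}^{n} \theta_j^2 \right]
\]
```

The penalty sum starts at j=1, so θ0 is left out, and λ controls the trade-off between fitting the data and keeping the parameters small.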

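Pay attention to how λ is set. See the following question:

A: A very large λ drives all of the penalized parameters θ1~θn to ≈0, so the hypothesis underfits.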

Next, we go through the regularization steps for linear regression and logistic regression separately.

/************(10) Regularized Linear Regression***********/

<Linear regression>:

First, let us see how to apply gradient descent to update the parameters using the cost function above.

For θ0 there is no penalty term, so the update formula is the same as before.

For every other θj, the derivative of J(θ) gains an extra term (λ/m)*θj, see the figure below:
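The two update rules can be written in one vectorized step by adding the penalty gradient to every component except the first. In this MATLAB/Octave sketch the toy data, α, λ and the iteration count are assumptions for illustration.

```matlab
% Toy data, assumed for illustration: intercept column plus one feature (y = 1 + 2x).
X = [1 0; 1 1; 1 2; 1 3];
y = [1; 3; 5; 7];
m = size(X, 1);

alpha  = 0.1;   % learning rate (assumed)
lambda = 1;     % regularization strength (assumed)
theta  = zeros(2, 1);

for iter = 1:2000
    grad = (1 / m) * (X' * (X * theta - y));    % gradient of the unpenalized cost
    reg  = (lambda / m) * [0; theta(2:end)];    % extra (lambda/m)*theta_j, none for theta0
    theta = theta - alpha * (grad + reg);
end

disp(theta');
```

Without the penalty the fit would be θ=[1;2]. With λ=1 the slope θ1 shrinks to about 1.67, and θ0 moves to 1.5 to compensate.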

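If we do not use gradient descent (gradient descent + regularization) but instead solve for θ by matrix computation (the normal equation), that is, find the θ that minimizes J(θ), we set the derivative of J(θ) with respect to every θj to 0, which gives the following formula:

It has also been proved that, for λ>0, the matrix inside the parentheses in the formula above is invertible.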
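A sketch of the regularized normal equation in MATLAB/Octave, assuming the usual form θ = (X^T X + λL)^(-1) X^T y, where L is the identity matrix with its first diagonal entry set to 0. The data and λ are made up for illustration.

```matlab
% Toy data, assumed for illustration: intercept column plus one feature.
X = [1 0; 1 1; 1 2; 1 3];
y = [1; 3; 5; 7];
lambda = 1;                      % regularization strength (assumed)

n = size(X, 2) - 1;              % number of features, not counting the intercept
L = eye(n + 1);
L(1, 1) = 0;                     % theta0 is not penalized

% Solve the linear system with backslash instead of forming an explicit inverse.
theta = (X' * X + lambda * L) \ (X' * y);
disp(theta');
```

For this data the result is θ=[1.5; 1.6667], the minimizer of the regularized cost for these assumed values.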

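/************(11) Regularized Logistic Regression***********/

<Logistic regression>:

The cost function of logistic regression and the overfitting case were covered earlier, as shown in the figure below:

As with linear regression, we add a penalty term on θ to J(θ) to suppress overfitting:

Using gradient descent, taking the partial derivative of J(θ) with respect to each θj gives the update:

Here we find that the θ update has the same form as for linear regression; only hθ(x) is different.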
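The figures with these formulas are missing. In standard notation, assumed to match the original, the regularized cost and the resulting updates are:

```latex
\[
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Bigl[ y^{(i)} \log h_\theta(x^{(i)})
          + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr]
          + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2
\]
\begin{align}
\theta_0 &:= \theta_0 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr) x_0^{(i)} \\
\theta_j &:= \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr) x_j^{(i)}
            + \frac{\lambda}{m} \theta_j \right], \qquad j = 1, \dots, n
\end{align}
```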

When using regularized logistic regression, which of these is the best way to monitor whether gradient descent is working correctly?

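Similar to the earlier example of calling MATLAB, we can define the cost function for logistic regression as shown below:

In the figure, jVal is the expression for the cost function, whose last term is the penalty on the parameters θ. Below it are the gradients with respect to each θj: θ0 is not part of the penalty term, so its gradient is unchanged, while θ1~θn each gain an extra term (λ/m)*θj.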
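Since the figure is missing, here is one way such a cost function might look in MATLAB/Octave. The function name `costFunctionReg` and the extra arguments `X`, `y` and `lambda` are hypothetical choices of ours, not taken from the original figure.

```matlab
function [jVal, gradient] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Regularized logistic regression cost and gradient (sketch).
%   Hypothetical helper: X has one example per row with x0 = 1 in the
%   first column, y holds 0/1 labels, lambda is the penalty weight.
m = size(X, 1);
h = 1 ./ (1 + exp(-X * theta));                         % sigmoid hypothesis

% Cost: cross-entropy plus the penalty on theta1..thetan (theta0 excluded).
penalty = (lambda / (2 * m)) * sum(theta(2:end) .^ 2);
jVal = -mean(y .* log(h) + (1 - y) .* log(1 - h)) + penalty;

% Gradient: unchanged for theta0, extra (lambda/m)*theta_j for the rest.
gradient = (1 / m) * (X' * (h - y));
gradient(2:end) = gradient(2:end) + (lambda / m) * theta(2:end);
end
```

It could then be passed to the optimizer in the same way as the earlier example, for instance `fminunc(@(t) costFunctionReg(t, X, y, lambda), initialTheta, options)` with `options = optimset('GradObj','on','MaxIter',100)`.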

With this, regularization can be used to address overfitting in both linear regression and logistic regression.
