Thanks to Yu Jianmin (于建民) for this contribution. When reposting, please credit the source: 數盟社區 (Dataunion community).
An important part of machine learning is finding the optimal parameter solution. This article summarizes the common search methods and gives simple Python 2.7 implementations for each. It is a little long, so please bear with it.
Finding the optimal parameters means searching a region of parameter space for the set of parameters that satisfies the constraints. Pictured concretely: if the cost function is bowl-shaped, we want the parameter values at the very bottom of the bowl (the minimum cost) as the optimal solution. How do we reach that bottom, starting from some initial value and moving closer to the optimum step by step? Optimization methods provide ways of approaching the optimum, and at their core they all come down to two things: the direction in which to move toward the optimum, and the step size (how far to move on each step).
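Written as a formula, every method below repeats a step of the form

\theta_{k+1} = \theta_k + \alpha_k d_k

where d_k is the search direction and \alpha_k the step length; the methods differ only in how these two quantities are chosen.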
Optimization theory splits into linear and nonlinear optimization. Linear optimization, also called linear programming, has an objective function and constraints that are expressed linearly, e.g. Y = αX. Nonlinear optimization deals with nonlinear objectives and constraints; it includes gradient methods, Newton's method, quasi-Newton methods (DFP/BFGS), constrained variable-metric methods (SQP), the Lagrange multiplier method, trust-region methods, and so on.
Algorithm principles and brief derivations
Steepest descent (gradient descent)
Use the gradient to find the direction of fastest descent; its magnitude is the maximum rate of change.
Gradient: of all the directional derivatives, the one along which the change is greatest.
Gradient direction: the direction in which a scalar field increases fastest.
Gradient magnitude: the maximum rate of change.
Update: move the parameters along the negative gradient direction (toward the optimum).
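Concretely, with learning rate α and cost function J(θ), the batch update is

\theta_{k+1} = \theta_k - \alpha \, \nabla J(\theta_k)

and for the least-squares cost J(\theta) = \frac{1}{2m}\sum_i (y_i - x_i^{\top}\theta)^2 used in the code below, \nabla J(\theta) = -\frac{1}{m} X^{\top}(y - X\theta).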
Gradient descent
Pros: simple and intuitive, easy to understand.
Cons: descent is slow, and the parameters sometimes oscillate around the optimum without ever terminating.
Newton's method
Newton's method derives its parameter update by expanding the objective in a Taylor series up to second order.
It adjusts both the direction and the size of the parameter update (the Newton direction).
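Setting the derivative of that second-order model to zero gives the Newton update

\theta_{k+1} = \theta_k - \big[\nabla^2 f(\theta_k)\big]^{-1} \nabla f(\theta_k)

so the Newton direction D_k = -[\nabla^2 f(\theta_k)]^{-1}\nabla f(\theta_k) fixes both the direction and the step (the implicit step length is 1).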
Newton's method
Pros: for a positive-definite quadratic function a single iteration reaches the minimum; the descent is far more purposeful.
Cons: requires the objective to be twice differentiable; convergence depends strongly on the choice of starting point; every iteration must compute the Hessian matrix, which is expensive; and when solving for Dk the linear system can be singular or ill-conditioned, so Dk may be unobtainable or may not be a descent direction.
Damped Newton method
This is an improvement on Newton's method: at each new iterate, Dk is taken as the search direction and a one-dimensional line search is performed to find the step-length control α.
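In formulas, the damped Newton step keeps the Newton direction but searches for the step length:

D_k = -\big[\nabla^2 f(\theta_k)\big]^{-1}\nabla f(\theta_k), \qquad \alpha_k = \arg\min_{\alpha} f(\theta_k + \alpha D_k), \qquad \theta_{k+1} = \theta_k + \alpha_k D_k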
Damped Newton method
Pros: the update is adjusted so that the iteration always moves in a descending direction.
Cons: the same as Newton's method.
A brief look at one-dimensional search methods
For a one-dimensional unconstrained problem min F(α), the numerical iterative methods that locate the minima (or maxima) of F(α) are called one-dimensional search methods. Common ones are trial-point methods (golden-section, Fibonacci, bisection, grid search) and interpolation methods (Newton's method, parabolic interpolation).
(1) Determine the interval [a, b] containing the optimum (advance-and-retreat method)
Idea: starting from an initial point α0, step forward or backward with step size h and probe three points f(α0+h), f(α0), f(α0−h) until they follow a high, low, high pattern (see the sketch below).
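As a minimal standalone sketch of that bracketing idea (the function name here is illustrative and assumes f is unimodal; the same logic appears inline in Myfunction_QNGD further down):

def bracket_minimum(f, alpha0=0.0, h=0.1):
    ## advance-and-retreat: return an interval [a, b] containing a minimizer of f
    a1, a2 = alpha0, alpha0 + h
    f1, f2 = f(a1), f(a2)
    while True:
        if f1 > f2:
            h = 2 * h              ## still descending: double the step
        else:
            h = -h                 ## went uphill right away: reverse direction
            a1, a2 = a2, a1
            f1, f2 = f2, f1
        a3 = a2 + h
        f3 = f(a3)
        if f3 > f2:                ## f fell and then rose again: minimum is bracketed
            return min(a1, a3), max(a1, a3)
        a1, a2 = a2, a3            ## keep moving in the current direction
        f1, f2 = f2, f3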
(2) Find the minimum inside [a, b] (golden-section method and bisection); a sketch of the golden-section search follows.
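A matching golden-section sketch (again with an illustrative name; in this article f(α) is simply CostJ evaluated at θ + α·Dk):

def golden_section(f, a, b, tol=0.01):
    ## shrink [a, b] by the golden ratio until it is shorter than tol
    while abs(b - a) > tol:
        a1 = a + 0.382 * (b - a)
        a2 = a + 0.618 * (b - a)
        if f(a1) > f(a2):
            a = a1                 ## minimum lies in [a1, b]
        else:
            b = a2                 ## minimum lies in [a, a2]
    return (a + b) / 2.0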
Something to think about: in practice, how do we choose [a, b], and what does the function f look like? For this article the overall goal is to find the optimal θ that minimizes the cost CostJ, so f = CostJ.
Quasi-Newton methods – DFP
Because Newton's method computes second derivatives and is therefore expensive, other quantities (first derivatives) are used to approximate the inverse of the Hessian. Expand f(x) at X_{k+1} as a second-order Taylor series.
Depending on how H is constructed, different quasi-Newton methods result; the DFP construction is the following:
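With s_k = \theta_{k+1} - \theta_k and y_k = \nabla f(\theta_{k+1}) - \nabla f(\theta_k), the DFP update of the inverse-Hessian approximation H_k is

H_{k+1} = H_k + \frac{s_k s_k^{\top}}{s_k^{\top} y_k} - \frac{H_k y_k y_k^{\top} H_k}{y_k^{\top} H_k y_k}

which is exactly what the terms z1/z2 - z3/z4 compute in Myfunction_DFP1 and Myfunction_DFP2 below.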
Quasi-Newton method DFP:
Pros: cuts out the second-order computation, so the amount of work drops substantially.
Quasi-Newton methods – BFGS
If the construction is the following instead, the method is BFGS.
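The standard BFGS update of the inverse-Hessian approximation is

H_{k+1} = H_k + \left(1 + \frac{y_k^{\top} H_k y_k}{s_k^{\top} y_k}\right)\frac{s_k s_k^{\top}}{s_k^{\top} y_k} - \frac{s_k y_k^{\top} H_k + H_k y_k s_k^{\top}}{s_k^{\top} y_k}

(note that the code below subtracts only the s_k y_k^{\top} H_k half of the last term).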
Quasi-Newton methods are among the most effective algorithms for unconstrained optimization.
Python implementation of the algorithms
Python 2.7 needs pandas, numpy, scipy and matplotlib installed.
A simple way to install these modules on Windows 7 from .exe installers:
numpy – http://sourceforge.net/projects/numpy/files/ (a reasonably recent scipy can also be found there)
scipy – http://download.csdn.net/detail/caanyee/8241305
pandas – https://pypi.python.org/packages/2.7/p/pandas/pandas-0.12.0.win32-py2.7.exe#md5=80b0b9b891842ef4bdf451ac07b368e5
test.py
# -*- coding: utf-8 -*-
'''
time: 2015.06.03
author: yujianmin
objection: BGD / SGD / mini-batch GD / QNGD / DFP / BFGS
Implements batch gradient descent and per-sample (stochastic) gradient descent; steepest descent, Newton's method, damped Newton, and the quasi-Newton DFP and BFGS methods
'''
import pandas as pd
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
data = pd.read_csv("C:\\Users\\yujianmin\\Desktop\\python\\arraydataR.csv")
print(data.ix[1:5, :])
dataArray = np.array(data)
'''
x = dataArray[:, 0]
y = dataArray[:, 1]
plt.plot(x, y, 'o')
plt.title('data is like this')
plt.xlabel('x feature')
plt.ylabel('y label')
plt.show()
'''
def Myfunction_BGD(data, alpha, numIter, eplise):
    ''' Batch Gradient Descent
    :type data: array        :param data: contain x and y(label)
    :type step: int/float numeric    :param step: length of step when update the theta
    '''
    nCol = data.shape[1]-1
    nRow = data.shape[0]
    print nCol
    print nRow
    x = data[:, :nCol]
    print x[1:5, :]
    z = np.ones(nRow).reshape(nRow, 1)
    x = np.hstack((z, x))    ## hstack merges columns like cbind in R; vstack is like rbind
    y = data[:, (nCol)].reshape(nRow, 1)
    #theta = np.random.random(nCol+1).reshape(nCol+1, 1)
    theta = np.ones(nCol+1).reshape(nCol+1, 1)
    i = 0
    costJ = []
    #eplise = 0.4
    while i < numIter:
        H = np.dot(x, theta)
        J = (np.sum((y-H)**2))/(2*nRow)
        print('Itering %d ;cost is:%f' % (i+1, J))
        costJ.append(J)
        Gradient = (np.dot(np.transpose(y-H), x))/nRow    ## note: this is -dJ/dtheta
        Gradient = Gradient.reshape(nCol+1, 1)
        if np.sum(np.fabs(Gradient)) <= eplise:
            return theta, costJ
        else:
            ## update
            theta = theta + alpha * Gradient
        i = i + 1
    return theta, costJ

def Myfunction_SGD(data, alpha, numIter, eplise):
    ''' Stochastic Gradient Descent
    :type data: array        :param data: contain x and y(label)
    :type step: int/float numeric    :param step: length of step when update the theta
    '''
    nCol = data.shape[1]-1
    nRow = data.shape[0]
    print nCol
    print nRow
    x = data[:, :nCol]
    print x[1:5, :]
    z = np.ones(nRow).reshape(nRow, 1)
    x = np.hstack((z, x))
    y = data[:, (nCol)].reshape(nRow, 1)
    #theta = np.random.random(nCol+1).reshape(nCol+1, 1)
    theta = np.ones(nCol+1).reshape(nCol+1, 1)
    Loop = 0
    costJ = []
    while Loop < numIter:
        H = np.dot(x, theta)
        J = np.sum((y-H)**2)/(2*nRow)
        print('Itering %d ;cost is:%f' % (Loop+1, J))
        costJ.append(J)
        i = 0
        while i < nRow:    ## one pass over the samples, updating theta after each one
            Gradient = (y[i] - np.dot(x[i], theta)) * x[i]
            Gradient = Gradient.reshape(nCol+1, 1)
            theta = theta + alpha * Gradient
            i = i + 1
        #eplise = 0.4
        Gradient = (np.dot(np.transpose(y-H), x))/nRow
        if np.sum(np.fabs(Gradient)) <= eplise:
            return theta, costJ
        Loop = Loop + 1
    return theta, costJ

def Myfunction_NGD1(data, alpha, numIter, eplise):
    ''' Newton Gradient Descent -- theta := theta - alpha*[f'']^(-1)*f'
    :type data: array        :param data: contain x and y(label)
    :type step: int/float numeric    :param step: length of step when update the theta
    :reference: http://www.doc88.com/p-145660070193.html
    :hessian = transpose(x) * x
    '''
    nCol = data.shape[1]-1
    nRow = data.shape[0]
    print nCol
    print nRow
    x = data[:, :nCol]
    print x[1:5, :]
    z = np.ones(nRow).reshape(nRow, 1)
    x = np.hstack((z, x))
    y = data[:, (nCol)].reshape(nRow, 1)
    #theta = np.random.random(nCol+1).reshape(nCol+1, 1)
    theta = np.ones(nCol+1).reshape(nCol+1, 1)
    i = 0
    costJ = []
    while i < numIter:
        H = np.dot(x, theta)
        J = (np.sum((y-H)**2))/(2*nRow)
        ## update
        print('Itering %d ;cost is:%f' % (i+1, J))
        costJ.append(J)
        Gradient = (np.dot(np.transpose(y-H), x))/nRow
        Gradient = Gradient.reshape(nCol+1, 1)
        #eplise = 0.4
        if np.sum(np.fabs(Gradient)) <= eplise:
            return theta, costJ
        Hessian = np.dot(np.transpose(x), x)/nRow
        theta = theta + alpha * np.dot(np.linalg.inv(Hessian), Gradient)
        #theta = theta + np.dot(np.linalg.inv(Hessian), Gradient)
        i = i + 1
    return theta, costJ

def Myfunction_NGD2(data, alpha, numIter, eplise):
    ''' Newton Gradient Descent -- theta := theta - [f'']^(-1)*f'
    :type data: array        :param data: contain x and y(label)
    :type step: int/float numeric    :param step: length of step when update the theta
    :reference: http://www.doc88.com/p-145660070193.html
    :hessian = transpose(x) * x
    '''
    nCol = data.shape[1]-1
    nRow = data.shape[0]
    print nCol
    print nRow
    x = data[:, :nCol]
    print x[1:5, :]
    z = np.ones(nRow).reshape(nRow, 1)
    x = np.hstack((z, x))
    y = data[:, (nCol)].reshape(nRow, 1)
    #theta = np.random.random(nCol+1).reshape(nCol+1, 1)
    theta = np.ones(nCol+1).reshape(nCol+1, 1)
    i = 0
    costJ = []
    while i < numIter:
        H = np.dot(x, theta)
        J = (np.sum((y-H)**2))/(2*nRow)
        ## update
        print('Itering %d ;cost is:%f' % (i+1, J))
        costJ.append(J)
        Gradient = (np.dot(np.transpose(y-H), x))/nRow
        Gradient = Gradient.reshape(nCol+1, 1)
        #eplise = 0.4
        if np.sum(np.fabs(Gradient)) <= eplise:
            return theta, costJ
        Hessian = np.dot(np.transpose(x), x)/nRow
        theta = theta + np.dot(np.linalg.inv(Hessian), Gradient)    ## full Newton step, alpha = 1
        i = i + 1
    return theta, costJ

def Myfunction_QNGD(data, alpha, numIter, eplise):
    ''' Newton Gradient Descent -- theta := theta + alpha*Dk
        -- alpha is searched by the advance-and-retreat method plus golden section
    :type data: array        :param data: contain x and y(label)
    :type step: int/float numeric    :param step: length of step when update the theta
    :reference: http://www.doc88.com/p-145660070193.html
    :hessian = transpose(x) * x
    '''
    nCol = data.shape[1]-1
    nRow = data.shape[0]
    print nCol
    print nRow
    x = data[:, :nCol]
    print x[1:5, :]
    z = np.ones(nRow).reshape(nRow, 1)
    x = np.hstack((z, x))
    y = data[:, (nCol)].reshape(nRow, 1)
    #theta = np.random.random(nCol+1).reshape(nCol+1, 1)
    theta = np.ones(nCol+1).reshape(nCol+1, 1)
    i = 0
    costJ = []
    #eplise = 0.4
    while i < numIter:
        H = np.dot(x, theta)
        J = (np.sum((y-H)**2))/(2*nRow)
        ## update
        print('Itering %d ;cost is:%f' % (i+1, J))
        costJ.append(J)
        Gradient = (np.dot(np.transpose(y-H), x))/nRow
        Gradient = Gradient.reshape(nCol+1, 1)
        if np.sum(np.fabs(Gradient)) <= eplise:
            return theta, costJ
        else:
            Hessian = np.dot(np.transpose(x), x)/nRow
            Dk = - np.dot(np.linalg.inv(Hessian), Gradient)    ## 'Gradient' is -J', so the line search below may settle on a negative alpha
            ## find an interval [a,b] which contains the optimal alpha
            ## the optimal alpha minimises f(theta + alpha*Dk)
            alpha0 = 0
            h = np.random.random(1)
            alpha1 = alpha0
            alpha2 = alpha0 + h
            theta1 = theta + alpha1 * Dk
            theta2 = theta + alpha2 * Dk
            f1 = (np.sum((y-np.dot(x, theta1))**2))/(2*nRow)
            f2 = (np.sum((y-np.dot(x, theta2))**2))/(2*nRow)
            Loop = 1
            a = 0
            b = 0
            while Loop > 0:
                print(' find [a,b] loop is %d' % Loop)
                Loop = Loop + 1
                if f1 > f2:
                    h = 2*h
                else:
                    h = -h
                    (alpha1, alpha2) = (alpha2, alpha1)
                    (f1, f2) = (f2, f1)
                alpha3 = alpha2 + h
                theta3 = theta + alpha3 * Dk
                f3 = (np.sum((y-np.dot(x, theta3))**2))/(2*nRow)
                print('f3 - f2 is %f' % (f3-f2))
                if f3 > f2:
                    a = min(alpha1, alpha3)
                    b = max(alpha1, alpha3)
                    break
                if f3 <= f2:
                    alpha1 = alpha2
                    alpha2 = alpha3
                    f1 = f2
                    f2 = f3
            ## find the optimal alpha in [a,b] with the golden-section method
            e = 0.01
            while Loop > 0:
                alpha1 = a + 0.382 * (b - a)
                alpha2 = a + 0.618 * (b - a)
                theta1 = theta + alpha1 * Dk
                theta2 = theta + alpha2 * Dk
                f1 = (np.sum((y-np.dot(x, theta1))**2))/(2*nRow)
                f2 = (np.sum((y-np.dot(x, theta2))**2))/(2*nRow)
                if f1 > f2:
                    a = alpha1
                if f1 < f2:
                    b = alpha2
                if np.fabs(a-b) <= e:
                    alpha = (a+b)/2
                    break
            print('optimal alpha is %f' % alpha)
            theta = theta + alpha * Dk
        i = i + 1
    return theta, costJ

def Myfunction_DFP2(data, alpha, numIter, eplise):
    ''' DFP -- theta := theta + alpha * Dk
        -- alpha is searched by the golden-section method, satisfying argmin{f(theta+alpha*Dk)}
    :type data: array        :param data: contain x and y(label)
    :type step: int/float numeric    :param step: length of step when update the theta
    :reference: http://blog.pfan.cn/miaowei/52925.html
    :reference: http://max.book118.com/html/2012/1025/3119007.shtm  ## important ##
    :hessian is estimated by DFP method.
    '''
    nCol = data.shape[1]-1
    nRow = data.shape[0]
    print nCol
    print nRow
    x = data[:, :nCol]
    print x[1:5, :]
    z = np.ones(nRow).reshape(nRow, 1)
    x = np.hstack((z, x))
    y = data[:, (nCol)].reshape(nRow, 1)
    #theta = np.random.random(nCol+1).reshape(nCol+1, 1)
    theta = np.ones(nCol+1).reshape(nCol+1, 1)
    i = 0
    costJ = []
    Hessian = np.eye(nCol+1)    ## initial estimate of the inverse Hessian
    H = np.dot(x, theta)
    J = (np.sum((y-H)**2))/(2*nRow)
    #costJ.append(J)
    Gradient = (np.dot(np.transpose(y-H), x))/nRow    ## note: this is -dJ/dtheta
    Gradient = Gradient.reshape(nCol+1, 1)
    Dk = - Gradient
    #eplise = 0.4
    while i < numIter:
        if(np.sum(np.fabs(Dk)) <= eplise):    ## stop condition ##
            return theta, costJ
        else:
            ## find alpha that minimises f(thetaK + alpha * Dk)
            ## first find an interval [a,b] which contains the optimal alpha
            alpha0 = 0
            h = np.random.random(1)
            alpha1 = alpha0
            alpha2 = alpha0 + h
            theta1 = theta + alpha1 * Dk
            theta2 = theta + alpha2 * Dk
            f1 = (np.sum((y-np.dot(x, theta1))**2))/(2*nRow)
            f2 = (np.sum((y-np.dot(x, theta2))**2))/(2*nRow)
            Loop = 1
            a = 0
            b = 0
            while Loop > 0:
                print(' find [a,b] loop is %d' % Loop)
                Loop = Loop + 1
                if f1 > f2:
                    h = 2*h
                else:
                    h = -h
                    (alpha1, alpha2) = (alpha2, alpha1)
                    (f1, f2) = (f2, f1)
                alpha3 = alpha2 + h
                theta3 = theta + alpha3 * Dk
                f3 = (np.sum((y-np.dot(x, theta3))**2))/(2*nRow)
                print('f3 - f2 is %f' % (f3-f2))
                if f3 > f2:
                    a = min(alpha1, alpha3)
                    b = max(alpha1, alpha3)
                    break
                if f3 <= f2:
                    alpha1 = alpha2
                    alpha2 = alpha3
                    f1 = f2
                    f2 = f3
            ## then find the optimal alpha in [a,b] with the golden-section method
            e = 0.01
            while Loop > 0:
                alpha1 = a + 0.382 * (b - a)
                alpha2 = a + 0.618 * (b - a)
                theta1 = theta + alpha1 * Dk
                theta2 = theta + alpha2 * Dk
                f1 = (np.sum((y-np.dot(x, theta1))**2))/(2*nRow)
                f2 = (np.sum((y-np.dot(x, theta2))**2))/(2*nRow)
                if f1 > f2:
                    a = alpha1
                if f1 < f2:
                    b = alpha2
                if np.fabs(a-b) <= e:
                    alpha = (a+b)/2
                    break
            print('optimal alpha is %f' % alpha)
            theta_old = theta
            theta = theta + alpha * Dk
            ## update the Hessian matrix ##
            H = np.dot(x, theta)
            J = (np.sum((y-H)**2))/(2*nRow)
            ## update
            print('Itering %d ;cost is:%f' % (i+1, J))
            costJ.append(J)
            ## here the inverse Hessian is estimated (DFP update) ##
            # sk = ThetaNew - ThetaOld = alpha * inv(H) * Gradient
            sk = theta - theta_old
            # yk = DelX(k+1) - DelX(k)
            DelXK = - (np.dot(np.transpose(y-np.dot(x, theta)), x))/nRow
            DelXk = - (np.dot(np.transpose(y-np.dot(x, theta_old)), x))/nRow
            yk = (DelXK - DelXk).reshape(nCol+1, 1)
            # z1 = (sk * sk')   # a matrix
            # z2 = (sk' * yk)   # a value
            z1 = sk * np.transpose(sk)
            z2 = np.dot(np.transpose(sk), yk)
            # z3 = (H * yk * yk' * H)   # a matrix
            # z4 = (yk' * H * yk)       # a value
            z3 = np.dot(np.dot(np.dot(Hessian, yk), np.transpose(yk)), Hessian)
            z4 = np.dot(np.dot(np.transpose(yk), Hessian), yk)
            DHessian = z1/z2 - z3/z4
            Hessian = Hessian + DHessian
            Dk = - np.dot(Hessian, DelXK.reshape(nCol+1, 1))
            i = i + 1
    return theta, costJ

def Myfunction_DFP1(data, alpha, numIter, eplise):
    ''' DFP -- theta := theta + alpha * Dk
        -- alpha is fixed ##
    :type data: array        :param data: contain x and y(label)
    :type step: int/float numeric    :param step: length of step when update the theta
    :reference: http://blog.pfan.cn/miaowei/52925.html
    :reference: http://max.book118.com/html/2012/1025/3119007.shtm  ## important ##
    :hessian is estimated by DFP method.
    '''
    nCol = data.shape[1]-1
    nRow = data.shape[0]
    print nCol
    print nRow
    x = data[:, :nCol]
    print x[1:5, :]
    z = np.ones(nRow).reshape(nRow, 1)
    x = np.hstack((z, x))
    y = data[:, (nCol)].reshape(nRow, 1)
    #theta = np.random.random(nCol+1).reshape(nCol+1, 1)
    theta = np.ones(nCol+1).reshape(nCol+1, 1)
    i = 0
    costJ = []
    Hessian = np.eye(nCol+1)
    H = np.dot(x, theta)
    J = (np.sum((y-H)**2))/(2*nRow)
    #costJ.append(J)
    Gradient = (np.dot(np.transpose(y-H), x))/nRow
    Gradient = Gradient.reshape(nCol+1, 1)
    Dk = - Gradient
    #eplise = 0.4
    while i < numIter:
        if(np.sum(np.fabs(Dk)) <= eplise):    ## stop condition ##
            return theta, costJ
        else:
            ## find alpha that minimises f(thetaK + alpha * Dk)
            ## here, for simplicity, alpha is the fixed parameter 'alpha'
            alpha = alpha
            theta_old = theta
            theta = theta + alpha * Dk
            ## update the Hessian matrix ##
            H = np.dot(x, theta)
            J = (np.sum((y-H)**2))/(2*nRow)
            ## update
            print('Itering %d ;cost is:%f' % (i+1, J))
            costJ.append(J)
            ## here the inverse Hessian is estimated (DFP update) ##
            # sk = ThetaNew - ThetaOld = alpha * inv(H) * Gradient
            sk = theta - theta_old
            # yk = DelX(k+1) - DelX(k)
            DelXK = - (np.dot(np.transpose(y-np.dot(x, theta)), x))/nRow
            DelXk = - (np.dot(np.transpose(y-np.dot(x, theta_old)), x))/nRow
            yk = (DelXK - DelXk).reshape(nCol+1, 1)
            # z1 = (sk * sk')   # a matrix
            # z2 = (sk' * yk)   # a value
            z1 = sk * np.transpose(sk)
            z2 = np.dot(np.transpose(sk), yk)
            # z3 = (H * yk * yk' * H)   # a matrix
            # z4 = (yk' * H * yk)       # a value
            z3 = np.dot(np.dot(np.dot(Hessian, yk), np.transpose(yk)), Hessian)
            z4 = np.dot(np.dot(np.transpose(yk), Hessian), yk)
            DHessian = z1/z2 - z3/z4
            Hessian = Hessian + DHessian
            Dk = - np.dot(Hessian, DelXK.reshape(nCol+1, 1))
            i = i + 1
    return theta, costJ

def Myfunction_BFGS1(data, alpha, numIter, eplise):
    ''' BFGS
    :type data: array        :param data: contain x and y(label)
    :type step: int/float numeric    :param step: length of step when update the theta
    :reference: http://blog.pfan.cn/miaowei/52925.html
    :reference: http://max.book118.com/html/2012/1025/3119007.shtm  ## important ##
    :hessian is estimated by BFGS method.
    '''
    nCol = data.shape[1]-1
    nRow = data.shape[0]
    print nCol
    print nRow
    x = data[:, :nCol]
    print x[1:5, :]
    z = np.ones(nRow).reshape(nRow, 1)
    x = np.hstack((z, x))
    y = data[:, (nCol)].reshape(nRow, 1)
    #theta = np.random.random(nCol+1).reshape(nCol+1, 1)
    theta = np.ones(nCol+1).reshape(nCol+1, 1)
    i = 0
    costJ = []
    Hessian = np.eye(nCol+1)
    H = np.dot(x, theta)
    J = (np.sum((y-H)**2))/(2*nRow)
    #costJ.append(J)
    Gradient = (np.dot(np.transpose(y-H), x))/nRow
    Gradient = Gradient.reshape(nCol+1, 1)
    Dk = - Gradient
    #eplise = 0.4
    while i < numIter:
        if(np.sum(np.fabs(Dk)) <= eplise):    ## stop condition ##
            return theta, costJ
        else:
            ## find alpha that minimises J(thetaK + alpha * Dk)
            ## here, for simplicity, alpha is the fixed parameter 'alpha'
            alpha = alpha
            theta_old = theta
            theta = theta + alpha * Dk
            ## update the Hessian matrix ##
            H = np.dot(x, theta)
            J = (np.sum((y-H)**2))/(2*nRow)
            ## update
            print('Itering %d ;cost is:%f' % (i+1, J))
            costJ.append(J)
            ## here the inverse Hessian is estimated (BFGS update) ##
            # sk = ThetaNew - ThetaOld = alpha * inv(H) * Gradient
            sk = theta - theta_old
            # yk = DelX(k+1) - DelX(k)
            DelXK = - (np.dot(np.transpose(y-np.dot(x, theta)), x))/nRow
            DelXk = - (np.dot(np.transpose(y-np.dot(x, theta_old)), x))/nRow
            yk = (DelXK - DelXk).reshape(nCol+1, 1)
            # z1 = yk' * H * yk   # a value
            # z2 = (sk' * yk)     # a value
            z1 = np.dot(np.dot(np.transpose(yk), Hessian), yk)
            z2 = np.dot(np.transpose(sk), yk)
            # z3 = sk * sk'       # a matrix
            # z4 = sk * yk' * H   # a matrix
            z3 = np.dot(sk, np.transpose(sk))
            z4 = np.dot(np.dot(sk, np.transpose(yk)), Hessian)
            DHessian = (1+z1/z2) * (z3/z2) - z4/z2    ## only the sk*yk'*H half of the textbook BFGS correction is subtracted here
            Hessian = Hessian + DHessian
            Dk = - np.dot(Hessian, DelXK.reshape(nCol+1, 1))
            i = i + 1
    return theta, costJ

def Myfunction_BFGS2(data, alpha, numIter, eplise):
    ''' BFGS
    :type data: array        :param data: contain x and y(label)
    :type step: int/float numeric    :param step: length of step when update the theta
    :reference: http://blog.pfan.cn/miaowei/52925.html
    :reference: http://max.book118.com/html/2012/1025/3119007.shtm  ## important ##
    :hessian is estimated by BFGS method.
    '''
    nCol = data.shape[1]-1
    nRow = data.shape[0]
    print nCol
    print nRow
    x = data[:, :nCol]
    print x[1:5, :]
    z = np.ones(nRow).reshape(nRow, 1)
    x = np.hstack((z, x))
    y = data[:, (nCol)].reshape(nRow, 1)
    #theta = np.random.random(nCol+1).reshape(nCol+1, 1)
    theta = np.ones(nCol+1).reshape(nCol+1, 1)
    i = 0
    costJ = []
    Hessian = np.eye(nCol+1)
    H = np.dot(x, theta)
    J = (np.sum((y-H)**2))/(2*nRow)
    #costJ.append(J)
    Gradient = (np.dot(np.transpose(y-H), x))/nRow
    Gradient = Gradient.reshape(nCol+1, 1)
    Dk = - Gradient
    #eplise = 0.4
    while i < numIter:
        if(np.sum(np.fabs(Dk)) <= eplise):    ## stop condition ##
            return theta, costJ
        else:
            ## find alpha that minimises J(thetaK + alpha * Dk)
            alpha = alpha
            ## the one-dimensional search for alpha is kept here but disabled
            '''
            alpha0 = 0
            h = np.random.random(1)
            alpha1 = alpha0
            alpha2 = alpha0 + h
            theta1 = theta + alpha1 * Dk
            theta2 = theta + alpha2 * Dk
            f1 = (np.sum((y-np.dot(x, theta1))**2))/(2*nRow)
            f2 = (np.sum((y-np.dot(x, theta2))**2))/(2*nRow)
            Loop = 1
            a = 0
            b = 0
            while Loop > 0:
                print(' find [a,b] loop is %d' % Loop)
                Loop = Loop + 1
                if f1 > f2:
                    h = 2*h
                else:
                    h = -h
                    (alpha1, alpha2) = (alpha2, alpha1)
                    (f1, f2) = (f2, f1)
                alpha3 = alpha2 + h
                theta3 = theta + alpha3 * Dk
                f3 = (np.sum((y-np.dot(x, theta3))**2))/(2*nRow)
                print('f3 - f2 is %f' % (f3-f2))
                if f3 > f2:
                    a = min(alpha1, alpha3)
                    b = max(alpha1, alpha3)
                    break
                if f3 <= f2:
                    alpha1 = alpha2
                    alpha2 = alpha3
                    f1 = f2
                    f2 = f3
            ## find the optimal alpha in [a,b] with the golden-section method
            e = 0.01
            while Loop > 0:
                alpha1 = a + 0.382 * (b - a)
                alpha2 = a + 0.618 * (b - a)
                theta1 = theta + alpha1 * Dk
                theta2 = theta + alpha2 * Dk
                f1 = (np.sum((y-np.dot(x, theta1))**2))/(2*nRow)
                f2 = (np.sum((y-np.dot(x, theta2))**2))/(2*nRow)
                if f1 > f2:
                    a = alpha1
                if f1 < f2:
                    b = alpha2
                if np.fabs(a-b) <= e:
                    alpha = (a+b)/2
                    break
            print('optimal alpha is %f' % alpha)
            '''
            ## Get Dk and update Hessian
            theta_old = theta
            theta = theta + alpha * Dk
            ## update the Hessian matrix ##
            H = np.dot(x, theta)
            J = (np.sum((y-H)**2))/(2*nRow)
            ## update
            print('Itering %d ;cost is:%f' % (i+1, J))
            costJ.append(J)
            ## here the inverse Hessian is estimated (BFGS update) ##
            # sk = ThetaNew - ThetaOld = alpha * inv(H) * Gradient
            sk = theta - theta_old
            # yk = DelX(k+1) - DelX(k)
            DelXK = - (np.dot(np.transpose(y-np.dot(x, theta)), x))/nRow
            DelXk = - (np.dot(np.transpose(y-np.dot(x, theta_old)), x))/nRow
            yk = (DelXK - DelXk).reshape(nCol+1, 1)
            # z1 = yk' * H * yk   # a value
            # z2 = (sk' * yk)     # a value
            z1 = np.dot(np.dot(np.transpose(yk), Hessian), yk)
            z2 = np.dot(np.transpose(sk), yk)
            # z3 = sk * sk'       # a matrix
            # z4 = sk * yk' * H   # a matrix
            z3 = np.dot(sk, np.transpose(sk))
            z4 = np.dot(np.dot(sk, np.transpose(yk)), Hessian)
            DHessian = (1+z1/z2) * (z3/z2) - z4/z2
            Hessian = Hessian + DHessian
            Dk = - np.dot(Hessian, DelXK.reshape(nCol+1, 1))
            i = i + 1
    return theta, costJ

## test ##
num = 10000
#theta, costJ = Myfunction_BGD(dataArray, alpha=0.0005, numIter=num, eplise=0.4) ##
#theta, costJ = Myfunction_SGD(dataArray, alpha=0.00005, numIter=num, eplise=0.4)
#theta, costJ = Myfunction_NGD1(dataArray, alpha=0.0005, numIter=num, eplise=0.4) ## alpha is fixed ##
#theta, costJ = Myfunction_NGD2(dataArray, alpha=0.0005, numIter=num, eplise=0.4) ## alpha is 1 ##
#theta, costJ = Myfunction_QNGD(dataArray, alpha=0.0005, numIter=num, eplise=0.4) ## alpha is searched ##
#theta, costJ = Myfunction_DFP1(dataArray, alpha=0.0005, numIter=num, eplise=0.4) ## alpha is fixed ##
#theta, costJ = Myfunction_DFP2(dataArray, alpha=0.0005, numIter=num, eplise=0.4) ## alpha is searched ##
theta, costJ = Myfunction_BFGS1(dataArray, alpha=0.0005, numIter=num, eplise=0.4) ## alpha is fixed ##
print theta
klen = len(costJ)
leng = np.linspace(1, klen, klen)
plt.plot(leng, costJ)
plt.show()
Experimental data and results
The data, in CSV format:
0 28.22401669
1 33.24921693
2 35.82084277
3 36.87096878
4 30.98488531
5 38.78221296
6 38.46753324
7 41.96065845
8 36.82656413
9 35.5081121
10 35.74647181
11 36.17110987
12 37.51165999
13 41.27109257
14 44.03842677
15 48.03001705
16 45.50401843
17 45.02635608
18 51.70574034
19 46.76359881
20 52.6487595
21 48.81383593
22 50.69451254
23 55.54200403
24 54.55639586
25 53.19036223
26 58.89269091
27 54.78884251
28 57.9033951
29 62.21114967
30 64.51025468
31 62.20710537
32 62.94736304
33 60.30447933
34 65.32044406
35 65.82903452
36 66.37872216
37 69.75640553
38 66.02112594
39 65.87119039
40 74.27209751
41 67.57661628
42 73.19444088
43 69.4533117
44 74.91129817
45 71.21187609
46 77.0962545
47 81.95066837
48 78.04636838
49 83.42842526
50 80.40217563
51 78.68650206
52 82.91395215
53 85.09663115
54 88.71540907
55 87.73955
56 89.18654776
57 91.09337441
58 83.95614422
59 93.30683179
60 93.27618596
61 88.07859238
62 89.10667856
63 95.61443666
64 93.39899106
65 94.38258758
66 96.87641802
67 96.87896946
68 97.0094412
69 100.076115
70 104.7619905
71 100.7917093
72 99.85523362
73 106.9018494
74 103.6061063
75 103.4105058
76 106.4304576
77 110.7357249
78 107.0420455
79 107.2834221
80 113.9299496
81 111.2187627
82 116.4100596
83 108.0237256
84 112.7773592
85 117.3464957
86 117.1976807
87 120.0538521
88 114.4584964
89 122.2860022
Results
The horizontal axis is the number of iterations, the vertical axis is the cost.
[Figure] Batch Gradient Descent
[Figure] Stochastic Gradient Descent
[Figure] Newton's method, fixed alpha = 1
[Figure] Newton's method, fixed alpha = 0.0005
[Figure] DFP, with alpha obtained by one-dimensional search
[Figure] Damped Newton, with alpha obtained by one-dimensional search
Summary
Whatever the optimization method, it is an attempt to find the direction in which the cost falls fastest together with a suitable step length.
About the author: Yu Jianmin (于建民), interested in data mining and pattern recognition. My blog.
from: http://dataunion.org/20714.html