Thanks to Yu Jianmin (于建民) for this contribution. When reposting, please credit the source: 數盟社區 (Dataunion community).
An important part of machine learning is finding the optimal parameter solution. This article summarizes the common search methods and gives simple Python 2.7 implementations for each. It is a little long, so please bear with it.
Finding the optimal parameters means searching a region of parameter space for the set of parameters that satisfies the constraints. Pictured concretely: if the cost function is bowl-shaped, we want the parameter values at the very bottom of the bowl (the minimum cost) as the optimal solution. How do we reach that bottom, starting from some initial value and moving closer to the optimum step by step? Optimization methods provide ways of approaching the optimum, and at their core they all come down to two things: the direction in which to move toward the optimum, and the step size (how far to move on each step).
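Written as a formula, every method below repeats a step of the form

\theta_{k+1} = \theta_k + \alpha_k d_k

where d_k is the search direction and \alpha_k the step length; the methods differ only in how these two quantities are chosen.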
Optimization theory splits into linear and nonlinear optimization. Linear optimization, also called linear programming, has an objective function and constraints that are expressed linearly, e.g. Y = αX. Nonlinear optimization deals with nonlinear objectives and constraints; it includes gradient methods, Newton's method, quasi-Newton methods (DFP/BFGS), constrained variable-metric methods (SQP), the Lagrange multiplier method, trust-region methods, and so on.
Algorithm principles and brief derivations
Steepest descent (gradient descent)
Use the gradient to find the direction of fastest descent; its magnitude is the maximum rate of change.
Gradient: of all the directional derivatives, the one along which the change is greatest.
Gradient direction: the direction in which a scalar field increases fastest.
Gradient magnitude: the maximum rate of change.
Update: move the parameters along the negative gradient direction (toward the optimum).
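Concretely, with learning rate α and cost function J(θ), the batch update is

\theta_{k+1} = \theta_k - \alpha \, \nabla J(\theta_k)

and for the least-squares cost J(\theta) = \frac{1}{2m}\sum_i (y_i - x_i^{\top}\theta)^2 used in the code below, \nabla J(\theta) = -\frac{1}{m} X^{\top}(y - X\theta).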
Gradient descent
Pros: simple and intuitive, easy to understand.
Cons: descent is slow, and the parameters sometimes oscillate around the optimum without ever terminating.
Newton's method
Newton's method derives its parameter update by expanding the objective in a Taylor series up to second order.
It adjusts both the direction and the size of the parameter update (the Newton direction).
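Setting the derivative of that second-order model to zero gives the Newton update

\theta_{k+1} = \theta_k - \big[\nabla^2 f(\theta_k)\big]^{-1} \nabla f(\theta_k)

so the Newton direction D_k = -[\nabla^2 f(\theta_k)]^{-1}\nabla f(\theta_k) fixes both the direction and the step (the implicit step length is 1).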
Newton's method
Pros: for a positive-definite quadratic function a single iteration reaches the minimum; the descent is far more purposeful.
Cons: requires the objective to be twice differentiable; convergence depends strongly on the choice of starting point; every iteration must compute the Hessian matrix, which is expensive; and when solving for Dk the linear system can be singular or ill-conditioned, so Dk may be unobtainable or may not be a descent direction.
Damped Newton method
This is an improvement on Newton's method: at each new iterate, Dk is taken as the search direction and a one-dimensional line search is performed to find the step-length control α.
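In formulas, the damped Newton step keeps the Newton direction but searches for the step length:

D_k = -\big[\nabla^2 f(\theta_k)\big]^{-1}\nabla f(\theta_k), \qquad \alpha_k = \arg\min_{\alpha} f(\theta_k + \alpha D_k), \qquad \theta_{k+1} = \theta_k + \alpha_k D_k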
Damped Newton method
Pros: the update is adjusted so that the iteration always moves in a descending direction.
Cons: the same as Newton's method.
A brief look at one-dimensional search methods
For a one-dimensional unconstrained problem min F(α), the numerical iterative methods that locate the minima (or maxima) of F(α) are called one-dimensional search methods. Common ones are trial-point methods (golden-section, Fibonacci, bisection, grid search) and interpolation methods (Newton's method, parabolic interpolation).
(1) Determine the interval [a, b] containing the optimum (advance-and-retreat method)
Idea: starting from an initial point α0, step forward or backward with step size h and probe three points f(α0+h), f(α0), f(α0−h) until they follow a high, low, high pattern (see the sketch below).
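As a minimal standalone sketch of that bracketing idea (the function name here is illustrative and assumes f is unimodal; the same logic appears inline in Myfunction_QNGD further down):

def bracket_minimum(f, alpha0=0.0, h=0.1):
    ## advance-and-retreat: return an interval [a, b] containing a minimizer of f
    a1, a2 = alpha0, alpha0 + h
    f1, f2 = f(a1), f(a2)
    while True:
        if f1 > f2:
            h = 2 * h              ## still descending: double the step
        else:
            h = -h                 ## went uphill right away: reverse direction
            a1, a2 = a2, a1
            f1, f2 = f2, f1
        a3 = a2 + h
        f3 = f(a3)
        if f3 > f2:                ## f fell and then rose again: minimum is bracketed
            return min(a1, a3), max(a1, a3)
        a1, a2 = a2, a3            ## keep moving in the current direction
        f1, f2 = f2, f3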
(2) Find the minimum inside [a, b] (golden-section method and bisection); a sketch of the golden-section search follows.
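A matching golden-section sketch (again with an illustrative name; in this article f(α) is simply CostJ evaluated at θ + α·Dk):

def golden_section(f, a, b, tol=0.01):
    ## shrink [a, b] by the golden ratio until it is shorter than tol
    while abs(b - a) > tol:
        a1 = a + 0.382 * (b - a)
        a2 = a + 0.618 * (b - a)
        if f(a1) > f(a2):
            a = a1                 ## minimum lies in [a1, b]
        else:
            b = a2                 ## minimum lies in [a, a2]
    return (a + b) / 2.0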
Something to think about: in practice, how do we choose [a, b], and what does the function f look like? For this article the overall goal is to find the optimal θ that minimizes the cost CostJ, so f = CostJ.
Quasi-Newton methods – DFP
Because Newton's method computes second derivatives and is therefore expensive, other quantities (first derivatives) are used to approximate the inverse of the Hessian. Expand f(x) at X_{k+1} as a second-order Taylor series.
Depending on how H is constructed, different quasi-Newton methods result; the DFP construction is the following:
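With s_k = \theta_{k+1} - \theta_k and y_k = \nabla f(\theta_{k+1}) - \nabla f(\theta_k), the DFP update of the inverse-Hessian approximation H_k is

H_{k+1} = H_k + \frac{s_k s_k^{\top}}{s_k^{\top} y_k} - \frac{H_k y_k y_k^{\top} H_k}{y_k^{\top} H_k y_k}

which is exactly what the terms z1/z2 - z3/z4 compute in Myfunction_DFP1 and Myfunction_DFP2 below.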
Quasi-Newton method DFP:
Pros: cuts out the second-order computation, so the amount of work drops substantially.
Quasi-Newton methods – BFGS
If the construction is the following instead, the method is BFGS.
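The standard BFGS update of the inverse-Hessian approximation is

H_{k+1} = H_k + \left(1 + \frac{y_k^{\top} H_k y_k}{s_k^{\top} y_k}\right)\frac{s_k s_k^{\top}}{s_k^{\top} y_k} - \frac{s_k y_k^{\top} H_k + H_k y_k s_k^{\top}}{s_k^{\top} y_k}

(note that the code below subtracts only the s_k y_k^{\top} H_k half of the last term).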
Quasi-Newton methods are among the most effective algorithms for unconstrained optimization.
Python implementation of the algorithms
Python 2.7 needs pandas, numpy, scipy and matplotlib installed.
A simple way to install these modules on Windows 7 from .exe installers:
numpy – http://sourceforge.net/projects/numpy/files/ (a reasonably recent scipy can also be found there)
scipy – http://download.csdn.net/detail/caanyee/8241305
pandas – https://pypi.python.org/packages/2.7/p/pandas/pandas-0.12.0.win32-py2.7.exe#md5=80b0b9b891842ef4bdf451ac07b368e5
test.py
# -*- coding: utf-8 -*-
'''
time: 2015.06.03
author: yujianmin
objection: BGD / SGD / mini-batch GD / QNGD / DFP / BFGS
Implements batch gradient descent and per-sample (stochastic) gradient descent; steepest descent, Newton's method, damped Newton, and the quasi-Newton DFP and BFGS methods
'''
import pandas as pd
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
data = pd.read_csv("C:\\Users\\yujianmin\\Desktop\\python\\arraydataR.csv")
print(data.ix[1:5, :])
dataArray = np.array(data)
'''
x = dataArray[:, 0]
y = dataArray[:, 1]
plt.plot(x, y, 'o')
plt.title('data is like this')
plt.xlabel('x feature')
plt.ylabel('y label')
plt.show()
'''
def Myfunction_BGD(data, alpha, numIter, eplise):
    ''' Batch Gradient Descent
    :type data: array        :param data: contain x and y(label)
    :type step: int/float numeric    :param step: length of step when update the theta
    '''
    nCol = data.shape[1]-1
    nRow = data.shape[0]
    print nCol
    print nRow
    x = data[:, :nCol]
    print x[1:5, :]
    z = np.ones(nRow).reshape(nRow, 1)
    x = np.hstack((z, x))    ## hstack merges columns like cbind in R; vstack is like rbind
    y = data[:, (nCol)].reshape(nRow, 1)
    #theta = np.random.random(nCol+1).reshape(nCol+1, 1)
    theta = np.ones(nCol+1).reshape(nCol+1, 1)
    i = 0
    costJ = []
    #eplise = 0.4
    while i < numIter:
        H = np.dot(x, theta)
        J = (np.sum((y-H)**2))/(2*nRow)
        print('Itering %d ;cost is:%f' % (i+1, J))
        costJ.append(J)
        Gradient = (np.dot(np.transpose(y-H), x))/nRow    ## note: this is -dJ/dtheta
        Gradient = Gradient.reshape(nCol+1, 1)
        if np.sum(np.fabs(Gradient)) <= eplise:
            return theta, costJ
        else:
            ## update
            theta = theta + alpha * Gradient
        i = i + 1
    return theta, costJ

def Myfunction_SGD(data, alpha, numIter, eplise):
    ''' Stochastic Gradient Descent
    :type data: array        :param data: contain x and y(label)
    :type step: int/float numeric    :param step: length of step when update the theta
    '''
    nCol = data.shape[1]-1
    nRow = data.shape[0]
    print nCol
    print nRow
    x = data[:, :nCol]
    print x[1:5, :]
    z = np.ones(nRow).reshape(nRow, 1)
    x = np.hstack((z, x))
    y = data[:, (nCol)].reshape(nRow, 1)
    #theta = np.random.random(nCol+1).reshape(nCol+1, 1)
    theta = np.ones(nCol+1).reshape(nCol+1, 1)
    Loop = 0
    costJ = []
    while Loop < numIter:
        H = np.dot(x, theta)
        J = np.sum((y-H)**2)/(2*nRow)
        print('Itering %d ;cost is:%f' % (Loop+1, J))
        costJ.append(J)
        i = 0
        while i < nRow:    ## one pass over the samples, updating theta after each one
            Gradient = (y[i] - np.dot(x[i], theta)) * x[i]
            Gradient = Gradient.reshape(nCol+1, 1)
            theta = theta + alpha * Gradient
            i = i + 1
        #eplise = 0.4
        Gradient = (np.dot(np.transpose(y-H), x))/nRow
        if np.sum(np.fabs(Gradient)) <= eplise:
            return theta, costJ
        Loop = Loop + 1
    return theta, costJ

def Myfunction_NGD1(data, alpha, numIter, eplise):
    ''' Newton Gradient Descent -- theta := theta - alpha*[f'']^(-1)*f'
    :type data: array        :param data: contain x and y(label)
    :type step: int/float numeric    :param step: length of step when update the theta
    :reference: http://www.doc88.com/p-145660070193.html
    :hessian = transpose(x) * x
    '''
    nCol = data.shape[1]-1
    nRow = data.shape[0]
    print nCol
    print nRow
    x = data[:, :nCol]
    print x[1:5, :]
    z = np.ones(nRow).reshape(nRow, 1)
    x = np.hstack((z, x))
    y = data[:, (nCol)].reshape(nRow, 1)
    #theta = np.random.random(nCol+1).reshape(nCol+1, 1)
    theta = np.ones(nCol+1).reshape(nCol+1, 1)
    i = 0
    costJ = []
    while i < numIter:
        H = np.dot(x, theta)
        J = (np.sum((y-H)**2))/(2*nRow)
        ## update
        print('Itering %d ;cost is:%f' % (i+1, J))
        costJ.append(J)
        Gradient = (np.dot(np.transpose(y-H), x))/nRow
        Gradient = Gradient.reshape(nCol+1, 1)
        #eplise = 0.4
        if np.sum(np.fabs(Gradient)) <= eplise:
            return theta, costJ
        Hessian = np.dot(np.transpose(x), x)/nRow
        theta = theta + alpha * np.dot(np.linalg.inv(Hessian), Gradient)
        #theta = theta + np.dot(np.linalg.inv(Hessian), Gradient)
        i = i + 1
    return theta, costJ

def Myfunction_NGD2(data, alpha, numIter, eplise):
    ''' Newton Gradient Descent -- theta := theta - [f'']^(-1)*f'
    :type data: array        :param data: contain x and y(label)
    :type step: int/float numeric    :param step: length of step when update the theta
    :reference: http://www.doc88.com/p-145660070193.html
    :hessian = transpose(x) * x
    '''
    nCol = data.shape[1]-1
    nRow = data.shape[0]
    print nCol
    print nRow
    x = data[:, :nCol]
    print x[1:5, :]
    z = np.ones(nRow).reshape(nRow, 1)
    x = np.hstack((z, x))
    y = data[:, (nCol)].reshape(nRow, 1)
    #theta = np.random.random(nCol+1).reshape(nCol+1, 1)
    theta = np.ones(nCol+1).reshape(nCol+1, 1)
    i = 0
    costJ = []
    while i < numIter:
        H = np.dot(x, theta)
        J = (np.sum((y-H)**2))/(2*nRow)
        ## update
        print('Itering %d ;cost is:%f' % (i+1, J))
        costJ.append(J)
        Gradient = (np.dot(np.transpose(y-H), x))/nRow
        Gradient = Gradient.reshape(nCol+1, 1)
        #eplise = 0.4
        if np.sum(np.fabs(Gradient)) <= eplise:
            return theta, costJ
        Hessian = np.dot(np.transpose(x), x)/nRow
        theta = theta + np.dot(np.linalg.inv(Hessian), Gradient)    ## full Newton step, alpha = 1
        i = i + 1
    return theta, costJ

def Myfunction_QNGD(data, alpha, numIter, eplise):
    ''' Newton Gradient Descent -- theta := theta + alpha*Dk
        -- alpha is searched by the advance-and-retreat method plus golden section
    :type data: array        :param data: contain x and y(label)
    :type step: int/float numeric    :param step: length of step when update the theta
    :reference: http://www.doc88.com/p-145660070193.html
    :hessian = transpose(x) * x
    '''
    nCol = data.shape[1]-1
    nRow = data.shape[0]
    print nCol
    print nRow
    x = data[:, :nCol]
    print x[1:5, :]
    z = np.ones(nRow).reshape(nRow, 1)
    x = np.hstack((z, x))
    y = data[:, (nCol)].reshape(nRow, 1)
    #theta = np.random.random(nCol+1).reshape(nCol+1, 1)
    theta = np.ones(nCol+1).reshape(nCol+1, 1)
    i = 0
    costJ = []
    #eplise = 0.4
    while i < numIter:
        H = np.dot(x, theta)
        J = (np.sum((y-H)**2))/(2*nRow)
        ## update
        print('Itering %d ;cost is:%f' % (i+1, J))
        costJ.append(J)
        Gradient = (np.dot(np.transpose(y-H), x))/nRow
        Gradient = Gradient.reshape(nCol+1, 1)
        if np.sum(np.fabs(Gradient)) <= eplise:
            return theta, costJ
        else:
            Hessian = np.dot(np.transpose(x), x)/nRow
            Dk = - np.dot(np.linalg.inv(Hessian), Gradient)    ## 'Gradient' is -J', so the line search below may settle on a negative alpha
            ## find an interval [a,b] which contains the optimal alpha
            ## the optimal alpha minimises f(theta + alpha*Dk)
            alpha0 = 0
            h = np.random.random(1)
            alpha1 = alpha0
            alpha2 = alpha0 + h
            theta1 = theta + alpha1 * Dk
            theta2 = theta + alpha2 * Dk
            f1 = (np.sum((y-np.dot(x, theta1))**2))/(2*nRow)
            f2 = (np.sum((y-np.dot(x, theta2))**2))/(2*nRow)
            Loop = 1
            a = 0
            b = 0
            while Loop > 0:
                print(' find [a,b] loop is %d' % Loop)
                Loop = Loop + 1
                if f1 > f2:
                    h = 2*h
                else:
                    h = -h
                    (alpha1, alpha2) = (alpha2, alpha1)
                    (f1, f2) = (f2, f1)
                alpha3 = alpha2 + h
                theta3 = theta + alpha3 * Dk
                f3 = (np.sum((y-np.dot(x, theta3))**2))/(2*nRow)
                print('f3 - f2 is %f' % (f3-f2))
                if f3 > f2:
                    a = min(alpha1, alpha3)
                    b = max(alpha1, alpha3)
                    break
                if f3 <= f2:
                    alpha1 = alpha2
                    alpha2 = alpha3
                    f1 = f2
                    f2 = f3
            ## find the optimal alpha in [a,b] with the golden-section method
            e = 0.01
            while Loop > 0:
                alpha1 = a + 0.382 * (b - a)
                alpha2 = a + 0.618 * (b - a)
                theta1 = theta + alpha1 * Dk
                theta2 = theta + alpha2 * Dk
                f1 = (np.sum((y-np.dot(x, theta1))**2))/(2*nRow)
                f2 = (np.sum((y-np.dot(x, theta2))**2))/(2*nRow)
                if f1 > f2:
                    a = alpha1
                if f1 < f2:
                    b = alpha2
                if np.fabs(a-b) <= e:
                    alpha = (a+b)/2
                    break
            print('optimal alpha is %f' % alpha)
            theta = theta + alpha * Dk
        i = i + 1
    return theta, costJ

def Myfunction_DFP2(data, alpha, numIter, eplise):
    ''' DFP -- theta := theta + alpha * Dk
        -- alpha is searched by the golden-section method, satisfying argmin{f(theta+alpha*Dk)}
    :type data: array        :param data: contain x and y(label)
    :type step: int/float numeric    :param step: length of step when update the theta
    :reference: http://blog.pfan.cn/miaowei/52925.html
    :reference: http://max.book118.com/html/2012/1025/3119007.shtm  ## important ##
    :hessian is estimated by DFP method.
    '''
    nCol = data.shape[1]-1
    nRow = data.shape[0]
    print nCol
    print nRow
    x = data[:, :nCol]
    print x[1:5, :]
    z = np.ones(nRow).reshape(nRow, 1)
    x = np.hstack((z, x))
    y = data[:, (nCol)].reshape(nRow, 1)
    #theta = np.random.random(nCol+1).reshape(nCol+1, 1)
    theta = np.ones(nCol+1).reshape(nCol+1, 1)
    i = 0
    costJ = []
    Hessian = np.eye(nCol+1)    ## initial estimate of the inverse Hessian
    H = np.dot(x, theta)
    J = (np.sum((y-H)**2))/(2*nRow)
    #costJ.append(J)
    Gradient = (np.dot(np.transpose(y-H), x))/nRow    ## note: this is -dJ/dtheta
    Gradient = Gradient.reshape(nCol+1, 1)
    Dk = - Gradient
    #eplise = 0.4
    while i < numIter:
        if(np.sum(np.fabs(Dk)) <= eplise):    ## stop condition ##
            return theta, costJ
        else:
            ## find alpha that minimises f(thetaK + alpha * Dk)
            ## first find an interval [a,b] which contains the optimal alpha
            alpha0 = 0
            h = np.random.random(1)
            alpha1 = alpha0
            alpha2 = alpha0 + h
            theta1 = theta + alpha1 * Dk
            theta2 = theta + alpha2 * Dk
            f1 = (np.sum((y-np.dot(x, theta1))**2))/(2*nRow)
            f2 = (np.sum((y-np.dot(x, theta2))**2))/(2*nRow)
            Loop = 1
            a = 0
            b = 0
            while Loop > 0:
                print(' find [a,b] loop is %d' % Loop)
                Loop = Loop + 1
                if f1 > f2:
                    h = 2*h
                else:
                    h = -h
                    (alpha1, alpha2) = (alpha2, alpha1)
                    (f1, f2) = (f2, f1)
                alpha3 = alpha2 + h
                theta3 = theta + alpha3 * Dk
                f3 = (np.sum((y-np.dot(x, theta3))**2))/(2*nRow)
                print('f3 - f2 is %f' % (f3-f2))
                if f3 > f2:
                    a = min(alpha1, alpha3)
                    b = max(alpha1, alpha3)
                    break
                if f3 <= f2:
                    alpha1 = alpha2
                    alpha2 = alpha3
                    f1 = f2
                    f2 = f3
            ## then find the optimal alpha in [a,b] with the golden-section method
            e = 0.01
            while Loop > 0:
                alpha1 = a + 0.382 * (b - a)
                alpha2 = a + 0.618 * (b - a)
                theta1 = theta + alpha1 * Dk
                theta2 = theta + alpha2 * Dk
                f1 = (np.sum((y-np.dot(x, theta1))**2))/(2*nRow)
                f2 = (np.sum((y-np.dot(x, theta2))**2))/(2*nRow)
                if f1 > f2:
                    a = alpha1
                if f1 < f2:
                    b = alpha2
                if np.fabs(a-b) <= e:
                    alpha = (a+b)/2
                    break
            print('optimal alpha is %f' % alpha)
            theta_old = theta
            theta = theta + alpha * Dk
            ## update the Hessian matrix ##
            H = np.dot(x, theta)
            J = (np.sum((y-H)**2))/(2*nRow)
            ## update
            print('Itering %d ;cost is:%f' % (i+1, J))
            costJ.append(J)
            ## here the inverse Hessian is estimated (DFP update) ##
            # sk = ThetaNew - ThetaOld = alpha * inv(H) * Gradient
            sk = theta - theta_old
            # yk = DelX(k+1) - DelX(k)
            DelXK = - (np.dot(np.transpose(y-np.dot(x, theta)), x))/nRow
            DelXk = - (np.dot(np.transpose(y-np.dot(x, theta_old)), x))/nRow
            yk = (DelXK - DelXk).reshape(nCol+1, 1)
            # z1 = (sk * sk')   # a matrix
            # z2 = (sk' * yk)   # a value
            z1 = sk * np.transpose(sk)
            z2 = np.dot(np.transpose(sk), yk)
            # z3 = (H * yk * yk' * H)   # a matrix
            # z4 = (yk' * H * yk)       # a value
            z3 = np.dot(np.dot(np.dot(Hessian, yk), np.transpose(yk)), Hessian)
            z4 = np.dot(np.dot(np.transpose(yk), Hessian), yk)
            DHessian = z1/z2 - z3/z4
            Hessian = Hessian + DHessian
            Dk = - np.dot(Hessian, DelXK.reshape(nCol+1, 1))
            i = i + 1
    return theta, costJ

def Myfunction_DFP1(data, alpha, numIter, eplise):
    ''' DFP -- theta := theta + alpha * Dk
        -- alpha is fixed ##
    :type data: array        :param data: contain x and y(label)
    :type step: int/float numeric    :param step: length of step when update the theta
    :reference: http://blog.pfan.cn/miaowei/52925.html
    :reference: http://max.book118.com/html/2012/1025/3119007.shtm  ## important ##
    :hessian is estimated by DFP method.
    '''
    nCol = data.shape[1]-1
    nRow = data.shape[0]
    print nCol
    print nRow
    x = data[:, :nCol]
    print x[1:5, :]
    z = np.ones(nRow).reshape(nRow, 1)
    x = np.hstack((z, x))
    y = data[:, (nCol)].reshape(nRow, 1)
    #theta = np.random.random(nCol+1).reshape(nCol+1, 1)
    theta = np.ones(nCol+1).reshape(nCol+1, 1)
    i = 0
    costJ = []
    Hessian = np.eye(nCol+1)
    H = np.dot(x, theta)
    J = (np.sum((y-H)**2))/(2*nRow)
    #costJ.append(J)
    Gradient = (np.dot(np.transpose(y-H), x))/nRow
    Gradient = Gradient.reshape(nCol+1, 1)
    Dk = - Gradient
    #eplise = 0.4
    while i < numIter:
        if(np.sum(np.fabs(Dk)) <= eplise):    ## stop condition ##
            return theta, costJ
        else:
            ## find alpha that minimises f(thetaK + alpha * Dk)
            ## here, for simplicity, alpha is the fixed parameter 'alpha'
            alpha = alpha
            theta_old = theta
            theta = theta + alpha * Dk
            ## update the Hessian matrix ##
            H = np.dot(x, theta)
            J = (np.sum((y-H)**2))/(2*nRow)
            ## update
            print('Itering %d ;cost is:%f' % (i+1, J))
            costJ.append(J)
            ## here the inverse Hessian is estimated (DFP update) ##
            # sk = ThetaNew - ThetaOld = alpha * inv(H) * Gradient
            sk = theta - theta_old
            # yk = DelX(k+1) - DelX(k)
            DelXK = - (np.dot(np.transpose(y-np.dot(x, theta)), x))/nRow
            DelXk = - (np.dot(np.transpose(y-np.dot(x, theta_old)), x))/nRow
            yk = (DelXK - DelXk).reshape(nCol+1, 1)
            # z1 = (sk * sk')   # a matrix
            # z2 = (sk' * yk)   # a value
            z1 = sk * np.transpose(sk)
            z2 = np.dot(np.transpose(sk), yk)
            # z3 = (H * yk * yk' * H)   # a matrix
            # z4 = (yk' * H * yk)       # a value
            z3 = np.dot(np.dot(np.dot(Hessian, yk), np.transpose(yk)), Hessian)
            z4 = np.dot(np.dot(np.transpose(yk), Hessian), yk)
            DHessian = z1/z2 - z3/z4
            Hessian = Hessian + DHessian
            Dk = - np.dot(Hessian, DelXK.reshape(nCol+1, 1))
            i = i + 1
    return theta, costJ

def Myfunction_BFGS1(data, alpha, numIter, eplise):
    ''' BFGS
    :type data: array        :param data: contain x and y(label)
    :type step: int/float numeric    :param step: length of step when update the theta
    :reference: http://blog.pfan.cn/miaowei/52925.html
    :reference: http://max.book118.com/html/2012/1025/3119007.shtm  ## important ##
    :hessian is estimated by BFGS method.
    '''
    nCol = data.shape[1]-1
    nRow = data.shape[0]
    print nCol
    print nRow
    x = data[:, :nCol]
    print x[1:5, :]
    z = np.ones(nRow).reshape(nRow, 1)
    x = np.hstack((z, x))
    y = data[:, (nCol)].reshape(nRow, 1)
    #theta = np.random.random(nCol+1).reshape(nCol+1, 1)
    theta = np.ones(nCol+1).reshape(nCol+1, 1)
    i = 0
    costJ = []
    Hessian = np.eye(nCol+1)
    H = np.dot(x, theta)
    J = (np.sum((y-H)**2))/(2*nRow)
    #costJ.append(J)
    Gradient = (np.dot(np.transpose(y-H), x))/nRow
    Gradient = Gradient.reshape(nCol+1, 1)
    Dk = - Gradient
    #eplise = 0.4
    while i < numIter:
        if(np.sum(np.fabs(Dk)) <= eplise):    ## stop condition ##
            return theta, costJ
        else:
            ## find alpha that minimises J(thetaK + alpha * Dk)
            ## here, for simplicity, alpha is the fixed parameter 'alpha'
            alpha = alpha
            theta_old = theta
            theta = theta + alpha * Dk
            ## update the Hessian matrix ##
            H = np.dot(x, theta)
            J = (np.sum((y-H)**2))/(2*nRow)
            ## update
            print('Itering %d ;cost is:%f' % (i+1, J))
            costJ.append(J)
            ## here the inverse Hessian is estimated (BFGS update) ##
            # sk = ThetaNew - ThetaOld = alpha * inv(H) * Gradient
            sk = theta - theta_old
            # yk = DelX(k+1) - DelX(k)
            DelXK = - (np.dot(np.transpose(y-np.dot(x, theta)), x))/nRow
            DelXk = - (np.dot(np.transpose(y-np.dot(x, theta_old)), x))/nRow
            yk = (DelXK - DelXk).reshape(nCol+1, 1)
            # z1 = yk' * H * yk   # a value
            # z2 = (sk' * yk)     # a value
            z1 = np.dot(np.dot(np.transpose(yk), Hessian), yk)
            z2 = np.dot(np.transpose(sk), yk)
            # z3 = sk * sk'       # a matrix
            # z4 = sk * yk' * H   # a matrix
            z3 = np.dot(sk, np.transpose(sk))
            z4 = np.dot(np.dot(sk, np.transpose(yk)), Hessian)
            DHessian = (1+z1/z2) * (z3/z2) - z4/z2    ## only the sk*yk'*H half of the textbook BFGS correction is subtracted here
            Hessian = Hessian + DHessian
            Dk = - np.dot(Hessian, DelXK.reshape(nCol+1, 1))
            i = i + 1
    return theta, costJ

def Myfunction_BFGS2(data, alpha, numIter, eplise):
    ''' BFGS
    :type data: array        :param data: contain x and y(label)
    :type step: int/float numeric    :param step: length of step when update the theta
    :reference: http://blog.pfan.cn/miaowei/52925.html
    :reference: http://max.book118.com/html/2012/1025/3119007.shtm  ## important ##
    :hessian is estimated by BFGS method.
    '''
    nCol = data.shape[1]-1
    nRow = data.shape[0]
    print nCol
    print nRow
    x = data[:, :nCol]
    print x[1:5, :]
    z = np.ones(nRow).reshape(nRow, 1)
    x = np.hstack((z, x))
    y = data[:, (nCol)].reshape(nRow, 1)
    #theta = np.random.random(nCol+1).reshape(nCol+1, 1)
    theta = np.ones(nCol+1).reshape(nCol+1, 1)
    i = 0
    costJ = []
    Hessian = np.eye(nCol+1)
    H = np.dot(x, theta)
    J = (np.sum((y-H)**2))/(2*nRow)
    #costJ.append(J)
    Gradient = (np.dot(np.transpose(y-H), x))/nRow
    Gradient = Gradient.reshape(nCol+1, 1)
    Dk = - Gradient
    #eplise = 0.4
    while i < numIter:
        if(np.sum(np.fabs(Dk)) <= eplise):    ## stop condition ##
            return theta, costJ
        else:
            ## find alpha that minimises J(thetaK + alpha * Dk)
            alpha = alpha
            ## the one-dimensional search for alpha is kept here but disabled
            '''
            alpha0 = 0
            h = np.random.random(1)
            alpha1 = alpha0
            alpha2 = alpha0 + h
            theta1 = theta + alpha1 * Dk
            theta2 = theta + alpha2 * Dk
            f1 = (np.sum((y-np.dot(x, theta1))**2))/(2*nRow)
            f2 = (np.sum((y-np.dot(x, theta2))**2))/(2*nRow)
            Loop = 1
            a = 0
            b = 0
            while Loop > 0:
                print(' find [a,b] loop is %d' % Loop)
                Loop = Loop + 1
                if f1 > f2:
                    h = 2*h
                else:
                    h = -h
                    (alpha1, alpha2) = (alpha2, alpha1)
                    (f1, f2) = (f2, f1)
                alpha3 = alpha2 + h
                theta3 = theta + alpha3 * Dk
                f3 = (np.sum((y-np.dot(x, theta3))**2))/(2*nRow)
                print('f3 - f2 is %f' % (f3-f2))
                if f3 > f2:
                    a = min(alpha1, alpha3)
                    b = max(alpha1, alpha3)
                    break
                if f3 <= f2:
                    alpha1 = alpha2
                    alpha2 = alpha3
                    f1 = f2
                    f2 = f3
            ## find the optimal alpha in [a,b] with the golden-section method
            e = 0.01
            while Loop > 0:
                alpha1 = a + 0.382 * (b - a)
                alpha2 = a + 0.618 * (b - a)
                theta1 = theta + alpha1 * Dk
                theta2 = theta + alpha2 * Dk
                f1 = (np.sum((y-np.dot(x, theta1))**2))/(2*nRow)
                f2 = (np.sum((y-np.dot(x, theta2))**2))/(2*nRow)
                if f1 > f2:
                    a = alpha1
                if f1 < f2:
                    b = alpha2
                if np.fabs(a-b) <= e:
                    alpha = (a+b)/2
                    break
            print('optimal alpha is %f' % alpha)
            '''
            ## Get Dk and update Hessian
            theta_old = theta
            theta = theta + alpha * Dk
            ## update the Hessian matrix ##
            H = np.dot(x, theta)
            J = (np.sum((y-H)**2))/(2*nRow)
            ## update
            print('Itering %d ;cost is:%f' % (i+1, J))
            costJ.append(J)
            ## here the inverse Hessian is estimated (BFGS update) ##
            # sk = ThetaNew - ThetaOld = alpha * inv(H) * Gradient
            sk = theta - theta_old
            # yk = DelX(k+1) - DelX(k)
            DelXK = - (np.dot(np.transpose(y-np.dot(x, theta)), x))/nRow
            DelXk = - (np.dot(np.transpose(y-np.dot(x, theta_old)), x))/nRow
            yk = (DelXK - DelXk).reshape(nCol+1, 1)
            # z1 = yk' * H * yk   # a value
            # z2 = (sk' * yk)     # a value
            z1 = np.dot(np.dot(np.transpose(yk), Hessian), yk)
            z2 = np.dot(np.transpose(sk), yk)
            # z3 = sk * sk'       # a matrix
            # z4 = sk * yk' * H   # a matrix
            z3 = np.dot(sk, np.transpose(sk))
            z4 = np.dot(np.dot(sk, np.transpose(yk)), Hessian)
            DHessian = (1+z1/z2) * (z3/z2) - z4/z2
            Hessian = Hessian + DHessian
            Dk = - np.dot(Hessian, DelXK.reshape(nCol+1, 1))
            i = i + 1
    return theta, costJ

## test ##
num = 10000
#theta, costJ = Myfunction_BGD(dataArray, alpha=0.0005, numIter=num, eplise=0.4) ##
#theta, costJ = Myfunction_SGD(dataArray, alpha=0.00005, numIter=num, eplise=0.4)
#theta, costJ = Myfunction_NGD1(dataArray, alpha=0.0005, numIter=num, eplise=0.4) ## alpha is fixed ##
#theta, costJ = Myfunction_NGD2(dataArray, alpha=0.0005, numIter=num, eplise=0.4) ## alpha is 1 ##
#theta, costJ = Myfunction_QNGD(dataArray, alpha=0.0005, numIter=num, eplise=0.4) ## alpha is searched ##
#theta, costJ = Myfunction_DFP1(dataArray, alpha=0.0005, numIter=num, eplise=0.4) ## alpha is fixed ##
#theta, costJ = Myfunction_DFP2(dataArray, alpha=0.0005, numIter=num, eplise=0.4) ## alpha is searched ##
theta, costJ = Myfunction_BFGS1(dataArray, alpha=0.0005, numIter=num, eplise=0.4) ## alpha is fixed ##
print theta
klen = len(costJ)
leng = np.linspace(1, klen, klen)
plt.plot(leng, costJ)
plt.show()
Experimental data and results
The data, in CSV format:
0 28.22401669
1 33.24921693
2 35.82084277
3 36.87096878
4 30.98488531
5 38.78221296
6 38.46753324
7 41.96065845
8 36.82656413
9 35.5081121
10 35.74647181
11 36.17110987
12 37.51165999
13 41.27109257
14 44.03842677
15 48.03001705
16 45.50401843
17 45.02635608
18 51.70574034
19 46.76359881
20 52.6487595
21 48.81383593
22 50.69451254
23 55.54200403
24 54.55639586
25 53.19036223
26 58.89269091
27 54.78884251
28 57.9033951
29 62.21114967
30 64.51025468
31 62.20710537
32 62.94736304
33 60.30447933
34 65.32044406
35 65.82903452
36 66.37872216
37 69.75640553
38 66.02112594
39 65.87119039
40 74.27209751
41 67.57661628
42 73.19444088
43 69.4533117
44 74.91129817
45 71.21187609
46 77.0962545
47 81.95066837
48 78.04636838
49 83.42842526
50 80.40217563
51 78.68650206
52 82.91395215
53 85.09663115
54 88.71540907
55 87.73955
56 89.18654776
57 91.09337441
58 83.95614422
59 93.30683179
60 93.27618596
61 88.07859238
62 89.10667856
63 95.61443666
64 93.39899106
65 94.38258758
66 96.87641802
67 96.87896946
68 97.0094412
69 100.076115
70 104.7619905
71 100.7917093
72 99.85523362
73 106.9018494
74 103.6061063
75 103.4105058
76 106.4304576
77 110.7357249
78 107.0420455
79 107.2834221
80 113.9299496
81 111.2187627
82 116.4100596
83 108.0237256
84 112.7773592
85 117.3464957
86 117.1976807
87 120.0538521
88 114.4584964
89 122.2860022
Results
The horizontal axis is the number of iterations, the vertical axis is the cost.
[Figure] Batch Gradient Descent
[Figure] Stochastic Gradient Descent
[Figure] Newton's method, fixed alpha = 1
[Figure] Newton's method, fixed alpha = 0.0005
[Figure] DFP, with alpha obtained by one-dimensional search
[Figure] Damped Newton, with alpha obtained by one-dimensional search
Summary
Whatever the optimization method, it is an attempt to find the direction in which the cost falls fastest together with a suitable step length.
About the author: Yu Jianmin (于建民), interested in data mining and pattern recognition. My blog.
from: http://dataunion.org/20714.html