Deep Learning Notes 2: Gradients and the Gradient Method
The enumerate() function combines an iterable (such as a list, tuple, or string) into an indexed sequence, yielding both each element and its index; it is typically used inside a for loop.
Implementing the functions
Consider the function f(x0, x1) = x0² + x1².
The partial derivatives can be computed like this:
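The original code listing is missing here, so the following is a minimal sketch. It assumes a central-difference helper named numerical_diff and evaluates the partial derivatives at the point (3.0, 4.0); both the helper name and the point are illustrative assumptions.

```python
import numpy as np

def numerical_diff(f, x):
    """Central-difference numerical derivative of a one-variable function."""
    h = 1e-4  # 0.0001
    return (f(x + h) - f(x - h)) / (2 * h)

# Partial derivative with respect to x0 at (x0, x1) = (3.0, 4.0):
# fix x1 = 4.0 and differentiate the resulting one-variable function.
def function_tmp1(x0):
    return x0 ** 2 + 4.0 ** 2

# Partial derivative with respect to x1 at (3.0, 4.0): fix x0 = 3.0.
def function_tmp2(x1):
    return 3.0 ** 2 + x1 ** 2

print(numerical_diff(function_tmp1, 3.0))  # approximately 6.0
print(numerical_diff(function_tmp2, 4.0))  # approximately 8.0
```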
The gradient can be implemented like this:
```python
import numpy as np

def _numerical_gradient_no_batch(f, x):
    h = 1e-4  # 0.0001
    grad = np.zeros_like(x)

    for idx in range(x.size):
        tmp_val = x[idx]
        x[idx] = float(tmp_val) + h
        fxh1 = f(x)  # f(x+h)

        x[idx] = tmp_val - h
        fxh2 = f(x)  # f(x-h)

        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val  # restore the original value

    return grad

def numerical_gradient(f, X):
    """f is a function and X is a NumPy array; the numerical derivative is
    taken with respect to each element of X."""
    if X.ndim == 1:
        return _numerical_gradient_no_batch(f, X)
    else:
        grad = np.zeros_like(X)
        for idx, x in enumerate(X):
            grad[idx] = _numerical_gradient_no_batch(f, x)
        return grad
```
Now use it to compute the gradient at (3, 4):
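The original output is not shown in the note; a minimal usage sketch, defining function_2 the same way as in the gradient-descent listing further below:

```python
def function_2(x):
    return x[0] ** 2 + x[1] ** 2

# gradient of f(x0, x1) = x0^2 + x1^2 at the point (3.0, 4.0)
print(numerical_gradient(function_2, np.array([3.0, 4.0])))  # approximately [6. 8.]
```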
The direction indicated by the gradient is, at each point, the direction in which the function value decreases the most.
The gradient method
The main task of machine learning is to find optimal parameters during learning, and a neural network likewise has to find its optimal parameters (weights and biases) during training. Here, the optimal parameters are the parameter values at which the loss function takes its minimum. The method that cleverly uses gradients to search for the minimum of a function is the gradient method.
The gradient is 0 at a function's local minima, at its global minimum, and at the points known as saddle points.
The process of repeatedly advancing along the gradient direction and gradually reducing the function value is the gradient method. The gradient method that searches for a minimum is called gradient descent, and the one that searches for a maximum is called gradient ascent. In neural networks (deep learning), the gradient method generally means gradient descent.
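In symbols, the standard update rule for f(x0, x1) is the following, where η is the learning rate (lr in the code below); the update is repeated step_num times:

$$x_0 \leftarrow x_0 - \eta \frac{\partial f}{\partial x_0}, \qquad x_1 \leftarrow x_1 - \eta \frac{\partial f}{\partial x_1}$$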
Implementing gradient descent in Python:
```python
import numpy as np
import matplotlib.pylab as plt

def _numerical_gradient_no_batch(f, x):
    h = 1e-4  # 0.0001
    grad = np.zeros_like(x)

    for idx in range(x.size):
        tmp_val = x[idx]
        x[idx] = float(tmp_val) + h
        fxh1 = f(x)  # f(x+h)

        x[idx] = tmp_val - h
        fxh2 = f(x)  # f(x-h)

        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val  # restore the original value

    return grad

def numerical_gradient(f, X):
    if X.ndim == 1:  # ndim returns the number of dimensions of the array as a single integer
        return _numerical_gradient_no_batch(f, X)
    else:
        grad = np.zeros_like(X)
        for idx, x in enumerate(X):
            grad[idx] = _numerical_gradient_no_batch(f, x)
        return grad

def gradient_descent(f, init_x, lr=0.01, step_num=100):
    x = init_x
    x_history = []

    for i in range(step_num):
        x_history.append(x.copy())

        grad = numerical_gradient(f, x)
        x -= lr * grad

    return x, np.array(x_history)

def function_2(x):
    return x[0]**2 + x[1]**2

init_x = np.array([-3.0, 4.0])

lr = 0.1
step_num = 20
x, x_history = gradient_descent(function_2, init_x, lr=lr, step_num=step_num)

plt.plot([-5, 5], [0, 0], '--b')
plt.plot([0, 0], [-5, 5], '--b')
plt.plot(x_history[:, 0], x_history[:, 1], 'o')

plt.xlim(-3.5, 3.5)
plt.ylim(-4.5, 4.5)
plt.xlabel("X0")
plt.ylabel("X1")
plt.show()
```
init_x = np.array([-3.0, 4.0]) sets the initial value to (-3, 4); the final result of the search is very close to (0, 0).
def gradient_descent(f, init_x, lr=0.01, step_num=100):
The first argument f is the function to be optimized, the second is the initial value, the third is the learning rate, and the fourth is the number of repetitions of the gradient method. This function can be used to find a local minimum of a function and, if all goes well, the global minimum.
If lr is too small or too large, good results cannot be obtained.
When lr is too large, e.g. lr = 10, the value diverges to a very large number; when lr is too small, e.g. lr = 1e-8, the search ends with the value barely updated from the initial point.
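The original outputs are not reproduced in the note; a minimal sketch to verify the behavior, reusing gradient_descent and function_2 from the listing above (the exact numbers vary, but the qualitative behavior is certain for this function):

```python
# Learning rate too large (lr = 10): the updates overshoot the minimum and
# the value blows up instead of converging.
x_big, _ = gradient_descent(function_2, np.array([-3.0, 4.0]), lr=10.0, step_num=100)
print(x_big)    # a very large value, nowhere near (0, 0)

# Learning rate too small (lr = 1e-8): each update is tiny, so the search
# ends with x barely changed from the initial value.
x_small, _ = gradient_descent(function_2, np.array([-3.0, 4.0]), lr=1e-8, step_num=100)
print(x_small)  # still approximately [-3.  4.]
```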
A parameter such as the learning rate is called a hyperparameter. In contrast to the weight parameters of a neural network, which are obtained automatically from the training data by the learning algorithm, hyperparameters such as the learning rate are **set manually.** In general, several candidate values have to be tried in order to find a setting that lets learning proceed well.
The gradient of a neural network
Here, the gradient means the gradient of the loss function with respect to the weight parameters.
The implementation of neural-network learning uses the mini-batch learning introduced earlier: a portion of the training data is chosen at random (called a mini-batch), and the parameters are then updated with the gradient method using that mini-batch.
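As a small, self-contained illustration of "the gradient of the loss with respect to a weight parameter", here is a sketch that is not part of the original note: the 2x3 matrix W, the input x, the one-hot label t, and the simplified single-sample softmax / cross-entropy are all chosen just for this example. It uses the numerical_gradient defined above, and the dummy-argument trick is the same one the TwoLayerNet class below uses in its numerical_gradient method.

```python
import numpy as np

def softmax(a):
    a = a - np.max(a)                      # shift for numerical stability
    return np.exp(a) / np.sum(np.exp(a))

def cross_entropy_error(y, t):
    return -np.sum(t * np.log(y + 1e-7))   # single-sample cross entropy

W = np.random.randn(2, 3)                  # a 2x3 weight matrix
x = np.array([0.6, 0.9])                   # one input sample
t = np.array([0.0, 0.0, 1.0])              # one-hot correct label

def loss_W(dummy_W):
    # The argument is ignored: numerical_gradient perturbs W in place,
    # so reading the enclosing W already reflects each perturbation.
    y = softmax(np.dot(x, W))
    return cross_entropy_error(y, t)

# dL/dW has the same 2x3 shape as W: each entry tells how the loss changes
# when the corresponding weight is nudged.
dW = numerical_gradient(loss_W, W)
print(dW)
```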
common.functions.py:
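The listing for this file is not reproduced in the note. Below is a minimal sketch of the four functions the TwoLayerNet class further down actually uses (sigmoid, sigmoid_grad, softmax, cross_entropy_error), written the way they are commonly implemented; treat it as an assumption rather than the exact original file.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # derivative of the sigmoid, expressed in terms of the pre-activation x
    return (1.0 - sigmoid(x)) * sigmoid(x)

def softmax(x):
    if x.ndim == 2:
        x = x - np.max(x, axis=1, keepdims=True)   # row-wise shift for numerical stability
        x = np.exp(x)
        return x / np.sum(x, axis=1, keepdims=True)
    x = x - np.max(x)
    return np.exp(x) / np.sum(np.exp(x))

def cross_entropy_error(y, t):
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)

    # if t is one-hot encoded, convert it to class indices
    if t.size == y.size:
        t = t.argmax(axis=1)

    batch_size = y.shape[0]
    return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size
```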
common.gradient.py:
```python
import numpy as np

def _numerical_gradient_1d(f, x):
    h = 1e-4  # 0.0001
    grad = np.zeros_like(x)

    for idx in range(x.size):
        tmp_val = x[idx]
        x[idx] = float(tmp_val) + h
        fxh1 = f(x)  # f(x+h)

        x[idx] = tmp_val - h
        fxh2 = f(x)  # f(x-h)

        grad[idx] = (fxh1 - fxh2) / (2*h)
        x[idx] = tmp_val  # restore the original value

    return grad

def numerical_gradient_2d(f, X):
    if X.ndim == 1:
        return _numerical_gradient_1d(f, X)
    else:
        grad = np.zeros_like(X)
        for idx, x in enumerate(X):
            grad[idx] = _numerical_gradient_1d(f, x)
        return grad

def numerical_gradient(f, x):
    h = 1e-4  # 0.0001
    grad = np.zeros_like(x)

    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = float(tmp_val) + h
        fxh1 = f(x)  # f(x+h)

        x[idx] = tmp_val - h
        fxh2 = f(x)  # f(x-h)

        grad[idx] = (fxh1 - fxh2) / (2*h)
        x[idx] = tmp_val  # restore the original value
        it.iternext()

    return grad
```

The enumerate function:
```python
>>> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
>>> list(enumerate(seasons))
[(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]
```
Implementing a two-layer neural network as a class:
```python
import numpy as np
from common.functions import *
from common.gradient import numerical_gradient

class TwoLayerNet:

    def __init__(self, input_size, hidden_size, output_size, weight_init_std=0.01):
        # initialize the weights
        self.params = {}
        self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size)
        self.params['b2'] = np.zeros(output_size)

    def predict(self, x):  # recognition (inference); the argument x is the image data
        W1, W2 = self.params['W1'], self.params['W2']
        b1, b2 = self.params['b1'], self.params['b2']

        a1 = np.dot(x, W1) + b1
        z1 = sigmoid(a1)
        a2 = np.dot(z1, W2) + b2
        y = softmax(a2)

        return y

    # x: input data, t: supervision (label) data
    def loss(self, x, t):  # value of the loss function; x is the image data, t is the correct label
        y = self.predict(x)
        return cross_entropy_error(y, t)

    def accuracy(self, x, t):  # recognition accuracy
        y = self.predict(x)
        y = np.argmax(y, axis=1)
        t = np.argmax(t, axis=1)

        accuracy = np.sum(y == t) / float(x.shape[0])
        return accuracy

    # x: input data, t: supervision (label) data
    def numerical_gradient(self, x, t):  # gradients of the weight parameters
        loss_W = lambda W: self.loss(x, t)

        grads = {}
        grads['W1'] = numerical_gradient(loss_W, self.params['W1'])
        grads['b1'] = numerical_gradient(loss_W, self.params['b1'])
        grads['W2'] = numerical_gradient(loss_W, self.params['W2'])
        grads['b2'] = numerical_gradient(loss_W, self.params['b2'])

        return grads

    def gradient(self, x, t):  # gradients of the weight parameters; a fast version of numerical_gradient
        W1, W2 = self.params['W1'], self.params['W2']
        b1, b2 = self.params['b1'], self.params['b2']
        grads = {}

        batch_num = x.shape[0]

        # forward
        a1 = np.dot(x, W1) + b1
        z1 = sigmoid(a1)
        a2 = np.dot(z1, W2) + b2
        y = softmax(a2)

        # backward
        dy = (y - t) / batch_num
        grads['W2'] = np.dot(z1.T, dy)
        grads['b2'] = np.sum(dy, axis=0)

        da1 = np.dot(dy, W2.T)
        dz1 = sigmoid_grad(a1) * da1
        grads['W1'] = np.dot(x.T, dz1)
        grads['b1'] = np.sum(dz1, axis=0)

        return grads
```

params['W1']: the weights of the first layer
params['b1']: the bias of the first layer
grads: a dictionary variable that holds the gradients (the return value of numerical_gradient)
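To connect this class back to the mini-batch learning described above, here is a minimal training-loop sketch. The MNIST loader dataset.mnist.load_mnist, the batch size, the learning rate, and the iteration count are illustrative assumptions rather than part of the original note.

```python
import numpy as np
from dataset.mnist import load_mnist   # MNIST loader assumed available alongside the common package

(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

iters_num = 10000      # number of gradient updates
batch_size = 100       # mini-batch size
learning_rate = 0.1
train_size = x_train.shape[0]

for i in range(iters_num):
    # 1. randomly choose a mini-batch
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    # 2. compute the gradient of the loss on the mini-batch
    grad = network.gradient(x_batch, t_batch)   # network.numerical_gradient works too, but is far slower

    # 3. gradient-descent update of every parameter
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]
```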
Summary