Andrew Ng Deep Learning Notes 2 - Course1-Week2 [Neural Network Basics: Loss Function, Gradient Descent]
Using the simplest case of a single-layer network with multiple inputs and one output, this post works through logistic regression to explain forward and backward propagation, the loss and cost functions, gradient descent, and vectorization.
1. Binary Classification
In a binary classification problem, the output is a discrete value (1 or 0).
Example: Cat vs Non-Cat
The goal is to train a classifier whose input is an image, represented by a feature vector x, and which predicts whether the corresponding label y is 1 or 0; in this case, whether the image is a cat image (1) or a non-cat image (0).
An image is stored in the computer as three separate matrices corresponding to the red, green, and blue color channels of the image. The three matrices have the same size as the image; for example, if the cat image has a resolution of 64 x 64 pixels, each of the three (RGB) matrices is 64 x 64.
The value in each cell is a pixel intensity, and these values are used to build an n-dimensional feature vector. In pattern recognition and machine learning, a feature vector represents an object, in this case cat or non-cat.
To create the feature vector x, the pixel intensity values are unrolled (reshaped) for each color channel. The dimension of the input feature vector is nx = 64 x 64 x 3 = 12288.
Notation used in this post:
A single example: (x, y)
Number of training examples: m
Number of features per example: nx
All examples stacked one per column: X.shape = (nx, m); Y.shape = (1, m)
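As an illustration of these shapes, here is a minimal sketch of flattening an image dataset into X (the dataset here is random and the variable names are illustrative, not from the original post):

```python
import numpy as np

# Hypothetical example: m = 209 images of size 64 x 64 x 3
m = 209
train_set_x_orig = np.random.randint(0, 256, size=(m, 64, 64, 3))  # raw pixel values
train_set_y = np.random.randint(0, 2, size=(1, m))                 # labels, 0 or 1

# Flatten each image into a column of length nx = 64*64*3 = 12288 and scale to [0, 1]
X = train_set_x_orig.reshape(m, -1).T / 255.0   # shape (12288, m): one example per column
Y = train_set_y                                  # shape (1, m)

assert X.shape == (64 * 64 * 3, m)
assert Y.shape == (1, m)
```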
2. Logistic Regression
Logistic regression is a learning algorithm used in supervised learning problems where the output labels y are all either zero or one. The goal of logistic regression is to minimize the error between its predictions and the training data.
Given an image represented by a feature vector x, the algorithm will evaluate the probability of a cat being in that image.
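Concretely, given an input x with parameters w and b, logistic regression outputs:

```latex
\hat{y} = P(y = 1 \mid x) = \sigma(w^{T}x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```

The sigmoid squashes w^T x + b into (0, 1), so ŷ can be read as a probability.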
3. Logistic Regression Loss and Cost Function
Convex function: there is only one local optimum, so the optimum found is the global optimum.
Non-convex function: there are many local optima, so the optimum found is not necessarily the global optimum.
The squared-error loss L(ŷ, y) = (ŷ - y)² is non-convex when used with logistic regression; the loss L used in this post (given below) is convex.
The superscript (i) denotes the i-th example.
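Concretely, the loss for one example and the cost averaged over m examples are:

```latex
\begin{aligned}
L(\hat{y}, y) &= -\bigl(y \log \hat{y} + (1-y)\log(1-\hat{y})\bigr) \\
J(w, b) &= \frac{1}{m}\sum_{i=1}^{m} L\bigl(\hat{y}^{(i)}, y^{(i)}\bigr)
\end{aligned}
```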
4. Gradient Descent
Gradient descent is used to find the minimum of the cost function; after many iterations it yields the values of w and b. The accompanying figure (omitted here) shows how the iterations descend to the global optimum.
Each iteration updates both w and b; α denotes the learning rate.
In code, the partial derivatives are usually written as dw and db.
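Written out, each iteration performs:

```latex
w := w - \alpha \frac{\partial J(w, b)}{\partial w}, \qquad b := b - \alpha \frac{\partial J(w, b)}{\partial b}
```

so that in code the update reads w = w - alpha * dw and b = b - alpha * db.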
5. Derivatives
Anyone who has studied a little calculus knows what a derivative is; it can simply be thought of as a slope.
6. Computation Graph: a simple example of forward and backward propagation
The forward pass (shown in the course figure) computes u = b·c, v = a + u, and J = 3v; the backward computation is as follows:
because: dJ/dv = 3, dv/da = 1, dv/du = 1, du/db = c = 2, du/dc = b = 3
so: da = dJ/dv · dv/da = 3 · 1 = 3
db = dJ/dv · dv/du · du/db = 3 · 1 · 2 = 6
dc = dJ/dv · dv/du · du/dc = 3 · 1 · 3 = 9
計(jì)算偏導(dǎo)用鏈?zhǔn)椒▌t(chain rule)。
7. Gradient Descent for Logistic Regression
For a single example, the steps are (the formulas are summarized in the block below):
Compute da and dz.
Compute dw1, dw2, and db.
Update w and b.
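A summary of the single-example computation for two features, using the course's shorthand da = dL/da and dz = dL/dz, with a = ŷ = σ(z):

```latex
\begin{aligned}
z &= w_1 x_1 + w_2 x_2 + b, \qquad a = \hat{y} = \sigma(z) \\
da &= \frac{\partial L}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a}, \qquad
dz = \frac{\partial L}{\partial z} = a - y \\
dw_1 &= x_1\,dz, \qquad dw_2 = x_2\,dz, \qquad db = dz \\
w_1 &:= w_1 - \alpha\,dw_1, \qquad w_2 := w_2 - \alpha\,dw_2, \qquad b := b - \alpha\,db
\end{aligned}
```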
For m examples, we differentiate the cost function instead, which amounts to averaging the per-example gradients; a loop-based sketch follows.
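A non-vectorized sketch of one gradient-descent step over m examples with two features, mirroring the course's pseudocode (all names here are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step_loop(w, b, X, Y, alpha):
    """One gradient-descent step using explicit loops. X: (2, m), Y: (1, m), w: (2, 1)."""
    m = X.shape[1]
    J, dw1, dw2, db = 0.0, 0.0, 0.0, 0.0
    for i in range(m):
        z = w[0, 0] * X[0, i] + w[1, 0] * X[1, i] + b
        a = sigmoid(z)
        J += -(Y[0, i] * np.log(a) + (1 - Y[0, i]) * np.log(1 - a))
        dz = a - Y[0, i]                 # dL/dz for example i
        dw1 += X[0, i] * dz
        dw2 += X[1, i] * dz
        db += dz
    J, dw1, dw2, db = J / m, dw1 / m, dw2 / m, db / m   # average over m examples
    w = w - alpha * np.array([[dw1], [dw2]])
    b = b - alpha * db
    return w, b, J
```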
8. Vectorization
A for loop can take hundreds of times longer to run than the vectorized equivalent, and training usually involves a large amount of data, so for loops should be avoided as much as possible. Python's numpy provides vectorization, i.e., matrix operations, which greatly speeds up the program.
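A quick, machine-dependent way to see this gap (this mirrors the vectorization demo from the course video; exact timings will vary):

```python
import time
import numpy as np

n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)

tic = time.time()
c = np.dot(a, b)                      # vectorized dot product
print("vectorized:", 1000 * (time.time() - tic), "ms")

tic = time.time()
c = 0.0
for i in range(n):                    # explicit for loop
    c += a[i] * b[i]
print("for loop:  ", 1000 * (time.time() - tic), "ms")
```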
Shapes: X: (nx, m), Y: (1, m), w: (nx, 1), b: scalar
Python code:
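The original code is not included in the text of the post, so the following is a sketch of a vectorized implementation consistent with the shapes listed above (the function and variable names are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    """Vectorized forward/backward pass. w: (nx, 1), b: scalar, X: (nx, m), Y: (1, m)."""
    m = X.shape[1]
    # Forward propagation
    A = sigmoid(np.dot(w.T, X) + b)                                # (1, m)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    # Backward propagation
    dZ = A - Y                                                     # (1, m)
    dw = np.dot(X, dZ.T) / m                                       # (nx, 1)
    db = np.sum(dZ) / m                                            # scalar
    return dw, db, cost

# Usage: a small gradient-descent loop with illustrative hyperparameters
nx, m, alpha = 4, 10, 0.01
w, b = np.zeros((nx, 1)), 0.0
X, Y = np.random.randn(nx, m), np.random.randint(0, 2, size=(1, m))
for _ in range(100):
    dw, db, cost = propagate(w, b, X, Y)
    w -= alpha * dw
    b -= alpha * db
```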
9. Python Broadcasting and Programming Notes
Broadcasting (examples below):
Matrix plus/minus/times/divided-by a vector or scalar: the vector or scalar is automatically expanded to a matrix of the same size as the matrix.
Vector plus/minus/times/divided-by a scalar: the scalar is automatically expanded to a vector of the same size as the vector.
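A few concrete numpy examples of these rules (the values are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])                  # (2, 3) matrix

# Matrix op row vector: the (1, 3) vector is expanded to (2, 3)
print(A + np.array([[100.0, 200.0, 300.0]]))

# Matrix op column vector: the (2, 1) vector is expanded to (2, 3)
print(A * np.array([[10.0], [20.0]]))

# Vector op scalar: the scalar is expanded to the vector's shape
v = np.array([1.0, 2.0, 3.0])
print(v + 100.0)

# Typical use: normalize each column so it sums to 1; (2, 3) / (1, 3) broadcasts
print(A / A.sum(axis=0, keepdims=True))
```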
Programming notes:
# Generate 5 Gaussian random numbers; this is a rank-1 array with shape (5,)
a = np.random.randn(5)
# To define a proper (5, 1) or (1, 5) vector, use the explicit form:
a = np.random.randn(5, 1)
b = np.random.randn(1, 5)
# Use assert to check the shape of a vector or array; if the condition fails, the program stops here, which helps catch shape bugs early.
assert(a.shape == (5, 1))
# Use reshape to set an array to the shape you need (reshape returns a new array)
a = a.reshape((5, 1))

10. Understanding the Logistic Loss and Cost Function
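The body of this section is not reproduced in the text of the post; a brief sketch of the course's argument is that the loss can be read as a negative log-likelihood. Interpreting ŷ as P(y = 1 | x):

```latex
\begin{aligned}
P(y \mid x) &= \hat{y}^{\,y}\,(1-\hat{y})^{\,1-y} \\
\log P(y \mid x) &= y\log\hat{y} + (1-y)\log(1-\hat{y}) = -L(\hat{y}, y) \\
\log \prod_{i=1}^{m} P\bigl(y^{(i)} \mid x^{(i)}\bigr) &= -\sum_{i=1}^{m} L\bigl(\hat{y}^{(i)}, y^{(i)}\bigr)
\end{aligned}
```

Maximizing the log-likelihood of the training labels is therefore equivalent to minimizing the cost J(w, b), up to the 1/m scaling factor.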