Training a neural network involves three key ingredients:
1. Define the score function
2. Define the loss function
3. Optimization
In practice, raw pixel values are rarely used to train a neural network directly. Instead, we work with image features, for example: color histograms, HOG features, and Bag of Words.
HOG extraction: the image is scanned one 8*8 cell at a time and the edge orientation within each cell is recorded; the orientations are binned into 9 directions. Each image patch is resized to 32*32, and the HOG algorithm is applied to every resized patch.
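As an illustration of this feature pipeline (a minimal sketch, not the author's exact code), assuming scikit-image is available; the random patch and its 48*48 starting size are made up, while the 32*32 resize, 8*8 cells, and 9 orientation bins follow the description above:

import numpy as np
from skimage.feature import hog
from skimage.transform import resize

patch = np.random.rand(48, 48)           # stand-in for one grayscale image patch
patch = resize(patch, (32, 32))          # resize the patch to 32*32
features = hog(patch,
               orientations=9,           # bin edge directions into 9 orientations
               pixels_per_cell=(8, 8),   # scan the patch in 8*8 cells
               cells_per_block=(1, 1))
print(features.shape)                    # one HOG descriptor per patch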
There are two ways to compute the gradient: one is simple but rather slow; the other is fast but more error-prone, so it needs to be verified against the simple one. The example below uses the simple (numerical) approach. First, define the per-example loss function L_i_vectorized(x, y, W):
import numpy as np

def L_i_vectorized(x, y, W):
    """
    A faster half-vectorized implementation. half-vectorized
    refers to the fact that for a single example the implementation contains
    no for loops, but there is still one loop over the examples (outside this function)
    """
    delta = 1.0
    scores = W.dot(x)
    # compute the margins for all classes in one vector operation
    margins = np.maximum(0, scores - scores[y] + delta)
    # on y-th position scores[y] - scores[y] canceled and gave delta. We want
    # to ignore the y-th position and only consider margin on max wrong class
    margins[y] = 0
    loss_i = np.sum(margins)
    return loss_i
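For instance, a quick check on made-up data (the shapes mirror CIFAR-10 with a bias dimension appended; the class label is arbitrary):

W = np.random.rand(10, 3073) * 0.001     # 10 classes, 3072 pixels + 1 bias
x = np.append(np.random.rand(3072), 1)   # one example with the bias term appended
y = 3                                    # its (made-up) correct class
print(L_i_vectorized(x, y, W))           # a single non-negative loss value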
Then compute the average loss over all training examples:
def L(X, Y, W):
    """
    Loss averaged over the whole training set (one Python loop over the examples):
    - X holds all the training examples as rows (e.g. 50,000 x 3072 in CIFAR-10)
    - Y is an array of integers specifying the correct class (e.g. a 50,000-D array)
    - W are the weights (e.g. 10 x 3073)
    """
    Loss_sum = 0
    num_train = X.shape[0]                  # 50,000 for CIFAR-10
    for i in range(num_train):
        x = np.append(X[i], 1)              # append the bias dimension
        y = Y[i]
        loss_i = L_i_vectorized(x, y, W)
        Loss_sum += loss_i
    return Loss_sum / num_train             # average loss over the training set

def CIFAR10_loss_fun(W):
    return L(Xtr, Ytr, W)                   # Xtr, Ytr: CIFAR-10 training data and labels
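The implementation above still loops over the examples in Python. For reference, a fully vectorized variant (no explicit loop) could look like the sketch below; L_vectorized and its delta argument are names introduced here for illustration, not part of the original code:

def L_vectorized(X, Y, W, delta=1.0):
    # X: (N, 3072) examples as rows; Y: (N,) integer labels; W: (10, 3073) weights
    N = X.shape[0]
    Xb = np.hstack([X, np.ones((N, 1))])              # append the bias dimension
    scores = Xb.dot(W.T)                              # (N, 10) class scores
    correct = scores[np.arange(N), Y]                 # score of the correct class
    margins = np.maximum(0, scores - correct[:, None] + delta)
    margins[np.arange(N), Y] = 0                      # ignore the correct class
    return np.sum(margins) / N                        # average hinge loss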
Computing the gradient:
def eval_numerical_gradient(f, x):
    """
    A naive implementation of the numerical gradient of f at x.
    - f should be a function that takes a single argument
    - x is the point (numpy array) to evaluate the gradient at
    """
    fx = f(x)                 # evaluate function value at the original point
    grad = np.zeros(x.shape)
    h = 0.00001

    # iterate over all indexes in x
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        # evaluate function at x + h
        ix = it.multi_index
        old_value = x[ix]
        x[ix] = old_value + h          # increment by h
        fxh = f(x)                     # evaluate f(x + h)
        x[ix] = old_value              # restore to previous value (very important!)

        # compute the partial derivative
        grad[ix] = (fxh - fx) / h      # the slope
        it.iternext()                  # step to the next dimension

    return grad
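As a quick sanity check (added here for illustration), the routine can be tested on a function with a known gradient; for f(x) = sum(x**2) the true gradient is 2*x:

x = np.random.randn(5)
grad = eval_numerical_gradient(lambda v: np.sum(v ** 2), x)
print(grad)      # numerical estimate
print(2 * x)     # analytic gradient; the two should agree closely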
Finally, tune to find the most suitable step size:
W = np.random.rand(10, 3073) * 0.001               # random weight matrix
df = eval_numerical_gradient(CIFAR10_loss_fun, W)  # get the gradient
loss_original = CIFAR10_loss_fun(W)                # the original loss
print('original loss: %f' % (loss_original,))

# let's see the effect of multiple step sizes
for step_size_log in [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1]:
    step_size = 10 ** step_size_log
    W_new = W - step_size * df                     # new position in the weight space
    loss_new = CIFAR10_loss_fun(W_new)
    print('for step size %f new loss: %f' % (step_size, loss_new))
Running the code above shows that the loss is smallest when the step size is 10^-6. This brings up one of the most important hyperparameter choices in training neural networks: the step size. If the step size is too small, the optimization takes a very long time; if it is too large, the process speeds up but becomes risky, since an update may overshoot and increase the loss. The details will be discussed later. In the example above we used the simple numerical gradient, which is easy to implement but computationally expensive. The second, more efficient approach is to derive the gradient analytically with calculus. This is much faster, but more error-prone in practice, so the analytic gradient is usually compared against the numerical gradient to verify the implementation. This procedure is called a gradient check. Taking the SVM as an example, if we compute the gradient with calculus, we use the following formula: