
UFLDL Tutorial: Exercise: Sparse Autoencoder

Published: 2023/12/13 | Programming Q&A | 27 | 豆豆


Like PCA, an autoencoder can be used to reduce the dimensionality of the features.

Some MATLAB functions

bsxfun: C = bsxfun(fun, A, B) applies the element-wise binary operation fun to arrays A and B. fun is a function handle, written with the '@' symbol, to a built-in function (for example @plus or @minus), an M-file function or an anonymous function. A and B normally have the same size. Where they differ, the dimension must have size 1 in one of the two arrays, and that singleton dimension is expanded to match the other array. For example, in bsxfun(@minus, A, mean(A)), mean(A) is the row vector of column means. It is expanded to the size of A, and each element of A then has its column mean subtracted.

rand: uniformly distributed pseudorandom numbers in the open interval (0, 1). rand(m,n) returns an m×n matrix. rand(m,n,'double') or rand(m,n,'single') selects the precision. rand(s,m,n) draws the numbers from the RandStream object s (a random number stream) instead of the global stream.

randn: standard normally distributed pseudorandom numbers (mean 0, variance 1), with the same syntax as rand.

randi: uniformly distributed pseudorandom integers. randi(iMax) returns one integer from the closed interval [1, iMax]. randi(iMax,m,n) returns an m×n matrix of such integers. r = randi([iMin,iMax],m,n) returns an m×n matrix of integers from [iMin, iMax].

exist: checks whether a name exists. For example, exist('opt_normalize', 'var') checks whether a variable named opt_normalize exists; 'var' restricts the search to variables.

colormap: sets (or queries) the colormap of the current figure.

floor: floor(A) rounds each element of A down to the largest integer not greater than it.

ceil: ceil(A) rounds each element of A up to the smallest integer not less than it.

imagesc: displays an array as an image, like image, but scales the data onto the colormap. For example, imagesc(array, 'EraseMode', 'none', [-1 1]) maps the value range [-1, 1] onto the full current colormap and displays array. Values outside that range get the colors at the ends of the colormap. 'EraseMode', 'none' means the background is not erased; the 'EraseMode' property exists only in older MATLAB releases.

repmat: replicates a matrix. B = repmat(A,m,n) tiles m×n copies of A, so B has size [size(A,1)*m, size(A,2)*n].

---------------------------------

Using @ in MATLAB

Q: What does f=@(x)acos(x) mean, and what does @ stand for?
A: f is a function handle, and @ is the operator that creates function handles. f=@(x)acos(x) is equivalent to creating the function file:
% f.m
function y=f(x)
y=acos(x);
Likewise, the statement
xsqual=@(x)1/2.*(x==-1/2)+1.*(x>-1/2&x<1/2)+1/2.*(x==1/2);
is equivalent to creating the function file:
% xsqual.m
function y=xsqual(x)
y=1/2.*(x==-1/2)+1.*(x>-1/2&x<1/2)+1/2.*(x==1/2);

Advantages of function handles:
① Speed. When a function is called by name, MATLAB has to search its path, and the Set Path dialog shows that the path contains many directories. A handle already refers to the function. For a function that your program calls often, a handle therefore avoids repeating that search on every call.
② Convenience. A handle can be passed around like a variable. Suppose you create a handle to a function in one directory. It can still be called after you change to another directory, without copying the function file, because the handle already records where the function is.
In short, a function handle acts like a name for the function, somewhat like a function pointer (or reference) in C/C++.
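To make the expansion and tiling rules above concrete, here is a minimal pure-Python sketch. It is an illustration only, since MATLAB does all of this natively. Matrices are lists of rows. The helper names mean_cols, bsxfun_minus, repmat and randi are hypothetical stand-ins that mirror the MATLAB functions; they are not a real library API.

```python
import random

# Matrices are lists of rows. These helpers are hypothetical illustrations of the
# MATLAB semantics described above, not a real library API.

def mean_cols(A):
    """Column means of A as a 1 x n matrix, like MATLAB mean(A) for a matrix."""
    m = len(A)
    return [[sum(row[j] for row in A) / m for j in range(len(A[0]))]]

def bsxfun_minus(A, B):
    """A - B element-wise; each dimension of B must equal A's or be 1.
    A size-1 dimension of B is expanded, as bsxfun(@minus, A, B) does."""
    rows, cols = len(A), len(A[0])
    if len(B) not in (1, rows) or len(B[0]) not in (1, cols):
        raise ValueError("non-singleton dimensions must agree")
    return [[A[i][j] - B[i if len(B) > 1 else 0][j if len(B[0]) > 1 else 0]
             for j in range(cols)] for i in range(rows)]

def repmat(A, m, n):
    """m x n tiling of A, so the result is (rows*m) x (cols*n), like repmat(A, m, n)."""
    return [row * n for _ in range(m) for row in A]

def randi(imax, m, n):
    """m x n matrix of integers drawn uniformly from the closed range [1, imax]."""
    return [[random.randint(1, imax) for _ in range(n)] for _ in range(m)]
```

For example, with A = [[1, 2], [3, 6]], bsxfun_minus(A, mean_cols(A)) gives [[-1.0, -2.0], [1.0, 2.0]]: each column has had its mean removed, which is exactly what bsxfun(@minus, A, mean(A)) does.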

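Key formulas (numbered as in the comments of sparseAutoencoderCost.m)

Formula (1), forward propagation, with a^(1) = x and the sigmoid f(z) = 1/(1+e^(-z)):

$$z^{(2)} = W^{(1)} x + b^{(1)},\quad a^{(2)} = f\big(z^{(2)}\big),\quad z^{(3)} = W^{(2)} a^{(2)} + b^{(2)},\quad h_{W,b}(x) = a^{(3)} = f\big(z^{(3)}\big)$$

Formula (2), average activation of hidden unit j over the m training examples:

$$\hat\rho_j = \frac{1}{m}\sum_{i=1}^{m} a_j^{(2)}\big(x^{(i)}\big)$$

Formula (3), KL divergence between the target sparsity ρ and ρ̂_j:

$$\mathrm{KL}(\rho\,\|\,\hat\rho_j) = \rho\log\frac{\rho}{\hat\rho_j} + (1-\rho)\log\frac{1-\rho}{1-\hat\rho_j}$$

Formula (4), cost without the sparsity term (squared error plus weight decay):

$$J(W,b) = \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\Big\|h_{W,b}\big(x^{(i)}\big) - x^{(i)}\Big\|^2 + \frac{\lambda}{2}\sum_{l=1}^{2}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(W_{ji}^{(l)}\big)^2$$

Formula (5), sparse cost, where s_2 is the number of hidden units:

$$J_{\mathrm{sparse}}(W,b) = J(W,b) + \beta\sum_{j=1}^{s_2}\mathrm{KL}(\rho\,\|\,\hat\rho_j)$$

Formula (6), sparsity term added to the residual of hidden unit i:

$$\beta\left(-\frac{\rho}{\hat\rho_i} + \frac{1-\rho}{1-\hat\rho_i}\right)$$

Formula (7), output-layer residual (the target is the input x):

$$\delta^{(3)} = -\big(x - a^{(3)}\big)\odot f'\big(z^{(3)}\big),\qquad f'\big(z^{(3)}\big) = a^{(3)}\odot\big(1 - a^{(3)}\big)$$

Formula (8), hidden-layer residual:

$$\delta^{(2)} = \left(\big(W^{(2)}\big)^{T}\delta^{(3)} + \beta\left(-\frac{\rho}{\hat\rho} + \frac{1-\rho}{1-\hat\rho}\right)\right)\odot a^{(2)}\odot\big(1 - a^{(2)}\big)$$

Formula (9)

Formula (10)

Formula (11)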
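The sparsity part of the cost is sketched below in pure Python, following formulas (2), (3) and (6) as they are used in sparseAutoencoderCost.m. The names average_activation, kl_sparsity and kl_sparsity_delta are hypothetical. The activations a2 are assumed to be a list with one row per hidden unit and one column per training example.

```python
import math

def average_activation(a2):
    """rho_hat: mean activation of each hidden unit.
    a2 holds one row per hidden unit and one column per training example."""
    return [sum(row) / len(row) for row in a2]

def kl_sparsity(rho, rho_hat):
    """sum_j KL(rho || rho_hat_j) for Bernoulli variables with means rho and rho_hat_j."""
    return sum(rho * math.log(rho / p) + (1 - rho) * math.log((1 - rho) / (1 - p))
               for p in rho_hat)

def kl_sparsity_delta(rho, rho_hat, beta):
    """beta * dKL/d(rho_hat_j): the per-unit term added to W2' * delta3 before
    multiplying by f'(z2) (the sparsity_penalty term in sparseAutoencoderCost.m)."""
    return [beta * (-rho / p + (1 - rho) / (1 - p)) for p in rho_hat]
```

Because ρ̂_j averages over the whole batch, the sparsity term can only be computed after a forward pass over all m examples. This is why the exercise evaluates the cost on the full patches matrix at once.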

Derivation of the residual of node i in layer l during backpropagation:

Step 3 of the backpropagation derivation in the tutorial is not derived by Ng himself; the Chinese translator supplied a derivation. I derived this step differently from the translator:

Tutorial recap and the translator's derivation of step 3
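As a sketch, here is the standard chain-rule argument for the residual $\delta_i^{(l)} = \partial J/\partial z_i^{(l)}$ of a single training example. It is the textbook route, not necessarily the author's own derivation. It assumes $z_j^{(l+1)} = \sum_i W_{ji}^{(l)} a_i^{(l)} + b_j^{(l)}$ and $a_i^{(l)} = f(z_i^{(l)})$, with $z_i^{(l)}$ influencing $J$ only through $a_i^{(l)}$:

```latex
\begin{aligned}
\delta_i^{(l)} &= \frac{\partial J}{\partial z_i^{(l)}}
  = \sum_{j=1}^{s_{l+1}} \frac{\partial J}{\partial z_j^{(l+1)}}\,
    \frac{\partial z_j^{(l+1)}}{\partial a_i^{(l)}}\,
    \frac{\partial a_i^{(l)}}{\partial z_i^{(l)}}
  = \Big(\sum_{j=1}^{s_{l+1}} W_{ji}^{(l)}\,\delta_j^{(l+1)}\Big)\, f'\big(z_i^{(l)}\big), \\
\delta_i^{(n_l)} &= \frac{\partial}{\partial z_i^{(n_l)}}\,\frac{1}{2}\big\|y - f\big(z^{(n_l)}\big)\big\|^2
  = -\big(y_i - a_i^{(n_l)}\big)\, f'\big(z_i^{(n_l)}\big), \\
f'(z) &= f(z)\big(1 - f(z)\big) \quad \text{for } f(z) = \frac{1}{1 + e^{-z}}, \\
\frac{\partial J}{\partial W_{ij}^{(l)}} &= a_j^{(l)}\,\delta_i^{(l+1)}, \qquad
\frac{\partial J}{\partial b_i^{(l)}} = \delta_i^{(l+1)}.
\end{aligned}
```

With the sparsity penalty, every hidden activation $a_i^{(2)}(x^{(k)})$ also enters $\hat\rho_i$ with weight $1/m$. This adds $\beta\big(-\rho/\hat\rho_i + (1-\rho)/(1-\hat\rho_i)\big)$ inside the parentheses for $l = 2$. The factor $1/m$ is applied when the per-example gradients are averaged, as in sparseAutoencoderCost.m.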

Experiment basics

Implementing the sparse autoencoder essentially comes down to computing the network's cost function and its partial derivatives:
1. Compute the input value (z in the code) and the output value (a in the code, the sigmoid of z) of every node in the network.
2. Use the z and a values to compute the error term of every node (delta in the code).
3. Express the cost function and its partial derivatives in terms of the a, z and delta values computed above.

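Step 1 runs forward: the computation proceeds input layer → hidden layer → output layer. Step 2 runs backward, which is where the name backpropagation (BP) comes from: the error terms are computed in the direction output layer → hidden layer. No error term is needed for the input layer itself.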
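The three steps can be sketched in pure Python for a single-hidden-layer sigmoid autoencoder, including weight decay and the sparsity penalty. This is an illustrative translation, not the exercise's code. The names sparse_ae_cost and matvec are hypothetical, matrices are lists of rows, and the inputs are a list of sample vectors rather than a 64×10000 matrix.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, x):
    """W (list of rows) times vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sparse_ae_cost(W1, b1, W2, b2, X, lam, rho, beta):
    """Sparse-autoencoder cost and gradients for sigmoid units, target = input.
    W1: hidden x visible, W2: visible x hidden, X: list of m input vectors."""
    m, h, v = len(X), len(b1), len(b2)
    # Step 1, forward (input -> hidden -> output): z = W a + b, a = f(z)
    A2 = [[sigmoid(z + b) for z, b in zip(matvec(W1, x), b1)] for x in X]
    A3 = [[sigmoid(z + b) for z, b in zip(matvec(W2, a2), b2)] for a2 in A2]
    rho_hat = [sum(a2[j] for a2 in A2) / m for j in range(h)]
    # cost = mean squared error + weight decay + beta * sum_j KL(rho || rho_hat_j)
    sq = sum(0.5 * sum((o - t) ** 2 for o, t in zip(a3, x)) for a3, x in zip(A3, X)) / m
    decay = lam / 2 * (sum(w * w for r in W1 for w in r) + sum(w * w for r in W2 for w in r))
    kl = sum(rho * math.log(rho / p) + (1 - rho) * math.log((1 - rho) / (1 - p)) for p in rho_hat)
    cost = sq + decay + beta * kl
    sparse_term = [beta * (-rho / p + (1 - rho) / (1 - p)) for p in rho_hat]
    # Step 2, backward (output -> hidden): residuals, accumulated into averaged gradients
    gW1 = [[0.0] * len(W1[0]) for _ in range(h)]
    gW2 = [[0.0] * h for _ in range(v)]
    gb1, gb2 = [0.0] * h, [0.0] * v
    for x, a2, a3 in zip(X, A2, A3):
        d3 = [(o - t) * o * (1 - o) for o, t in zip(a3, x)]
        d2 = [(sum(W2[i][j] * d3[i] for i in range(v)) + sparse_term[j]) * a2[j] * (1 - a2[j])
              for j in range(h)]
        for i in range(v):
            gb2[i] += d3[i] / m
            for j in range(h):
                gW2[i][j] += d3[i] * a2[j] / m
        for j in range(h):
            gb1[j] += d2[j] / m
            for k in range(len(x)):
                gW1[j][k] += d2[j] * x[k] / m
    # weight decay contributes lambda * W to each weight gradient
    gW1 = [[g + lam * w for g, w in zip(gr, wr)] for gr, wr in zip(gW1, W1)]
    gW2 = [[g + lam * w for g, w in zip(gr, wr)] for gr, wr in zip(gW2, W2)]
    return cost, gW1, gb1, gW2, gb2
```

As a check, perturb any single weight or bias by ±ε and take the central difference of the returned cost. The result should reproduce the corresponding gradient entry closely; this is the idea behind the gradient checking used in the exercise.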

Steps

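1. Generate the training set. From 10 images of 512×512 pixels, randomly sample 10000 patches of 8×8 pixels, then normalize them to obtain the training set patches. See sampleIMAGES.m.
2. Compute the cost function J_sparse(W,b) and its gradient. See sparseAutoencoderCost.m.
3. Use computeNumericalGradient.m to compute an approximate gradient (EPSILON = 10^-4), and use checkNumericalGradient.m to check that the gradient code from the previous step is correct. First, take the function h(x1, x2) = x1^2 + 3·x1·x2 at the point [4, 10]. Compare its analytic gradient there with the gradient computed by the method in computeNumericalGradient.m. If the relative difference norm(numgrad-grad)/norm(numgrad+grad) is below 10^-9, the method in computeNumericalGradient.m is correct. Then compute the gradient of J_sparse(W,b) with computeNumericalGradient.m and compare it with the gradient from sparseAutoencoderCost.m. If their relative difference is below 10^-9, sparseAutoencoderCost.m is correct.
4. Train the sparse autoencoder with L-BFGS; see the minFunc folder. Note that the minFunc implementation of L-BFGS supplied with the exercise may not be used commercially. For commercial use, the fminlbfgs function can be used instead; it is slower than minFunc. When initializing the parameter vector θ (containing W(1), W(2), b(1), b(2)), the entries of W(1) and W(2) are drawn uniformly from [-r, r] with r = sqrt(6)/sqrt(n_in + n_out + 1). Here n_in is the number of hidden units and n_out the number of output units (25 + 64 for this network). b(1) and b(2) are initialized to 0.
5. Visualize the result. Run train.m, the main script, to train the sparse autoencoder and plot the learned weights. The plot shows which features the algorithm has learned from the images.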
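The initialization in step 4 can be sketched in Python as follows. It assumes the bound r = sqrt(6)/sqrt(hiddenSize + visibleSize + 1) and the unrolling order W1(:), W2(:), b1, b2 that sparseAutoencoderCost.m expects when it reshapes theta. The names init_range and initialize_parameters are hypothetical stand-ins for the exercise's initializeParameters.m.

```python
import math
import random

def init_range(hidden_size, visible_size):
    """r = sqrt(6) / sqrt(hiddenSize + visibleSize + 1), the bound described in step 4."""
    return math.sqrt(6) / math.sqrt(hidden_size + visible_size + 1)

def initialize_parameters(hidden_size, visible_size):
    """W1, W2 ~ Uniform[-r, r], b1 = b2 = 0, unrolled into one flat theta
    in the order W1(:), W2(:), b1, b2."""
    r = init_range(hidden_size, visible_size)
    W1 = [[random.uniform(-r, r) for _ in range(visible_size)] for _ in range(hidden_size)]
    W2 = [[random.uniform(-r, r) for _ in range(hidden_size)] for _ in range(visible_size)]

    def unroll(W):
        # MATLAB's W(:) stacks the columns (column-major order)
        return [W[i][j] for j in range(len(W[0])) for i in range(len(W))]

    return unroll(W1) + unroll(W2) + [0.0] * hidden_size + [0.0] * visible_size
```

With hiddenSize = 25 and visibleSize = 64, theta has 2·25·64 + 25 + 64 = 3289 entries and r = sqrt(6/90) ≈ 0.258. Small symmetric random weights break the symmetry between hidden units. The zero biases keep the initial activations near 0.5.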

Code and comments

train.m

(1) Call sampleIMAGES to cut many patches out of the given images
(2) Call display_network to show a random selection of the extracted patches as a grid
(3) Gradient checking. This part only tests whether the functions are correct and could be moved into a separate function checkSparseAutoencoderCost:
① Compute the network's cost and gradient with sparseAutoencoderCost
② Compute the gradient numerically with computeNumericalGradient (first verify this gradient function itself with checkNumericalGradient)
③ Compare the gradients from ① and ② to decide whether sparseAutoencoderCost is correct
Once sparseAutoencoderCost is known to be correct, checkSparseAutoencoderCost does not need to be run during actual training
(4) Train the network with L-BFGS to obtain the optimized weights and biases
(5) Visualize the training result

% http://blog.csdn.net/jiandanjinxin/article/details/72875977
%% CS294A/CS294W Programming Assignment Starter Code

%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  programming assignment. You will need to complete the code in sampleIMAGES.m,
%  sparseAutoencoderCost.m and computeNumericalGradient.m.
%  For the purpose of completing the assignment, you do not need to
%  change the code in this file.
%
%%======================================================================
%% STEP 0: Here we provide the relevant parameters values that will
%  allow your sparse autoencoder to get good filters; you do not need to
%  change the parameters below.

visibleSize = 8*8;     % number of input units
hiddenSize = 25;       % number of hidden units
sparsityParam = 0.01;  % desired average activation of the hidden units.
                       % (This was denoted by the Greek alphabet rho, which looks like a lower-case "p",
                       %  in the lecture notes).
lambda = 0.0001;       % weight decay parameter
beta = 3;              % weight of sparsity penalty term

%%======================================================================
%% STEP 1: Implement sampleIMAGES
%
%  After implementing sampleIMAGES, the display_network command should
%  display a random sample of 200 patches from the dataset

patches = sampleIMAGES;
figure
% randi(size(patches,2),200,1) is a 200x1 vector of random column indices in [1, 10000],
% so 200 randomly chosen patches are displayed
display_network(patches(:,randi(size(patches,2),200,1)),8)
title('sampleIMAGES')

%  Obtain random parameters theta
theta = initializeParameters(hiddenSize, visibleSize);

%%======================================================================
%% STEP 2: Implement sparseAutoencoderCost
%
%  You can implement all of the components (squared error cost, weight decay term,
%  sparsity penalty) in the cost function at once, but it may be easier to do
%  it step-by-step and run gradient checking (see STEP 3) after each step.  We
%  suggest implementing the sparseAutoencoderCost function using the following steps:
%
%  (a) Implement forward propagation in your neural network, and implement the
%      squared error term of the cost function.  Implement backpropagation to
%      compute the derivatives.   Then (using lambda=beta=0), run Gradient Checking
%      to verify that the calculations corresponding to the squared error cost
%      term are correct.
%
%  (b) Add in the weight decay term (in both the cost function and the derivative
%      calculations), then re-run Gradient Checking to verify correctness.
%
%  (c) Add in the sparsity penalty term, then re-run Gradient Checking to
%      verify correctness.
%
%  Feel free to change the training settings when debugging your
%  code.  (For example, reducing the training set size or
%  number of hidden units may make your code run faster; and setting beta
%  and/or lambda to zero may be helpful for debugging.)  However, in your
%  final submission of the visualized weights, please use parameters we
%  gave in Step 0 above.

% compute the cost function and its gradient
[cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, lambda, ...
                                     sparsityParam, beta, patches);

%%======================================================================
%% STEP 3: Gradient Checking
%
% Hint: If you are debugging your code, performing gradient checking on smaller models
% and smaller training sets (e.g., using only 10 training examples and 1-2 hidden
% units) may speed things up.

% First, lets make sure your numerical gradient computation is correct for a
% simple function.  After you have implemented computeNumericalGradient.m,
% run the following:
checkNumericalGradient();

% Now we can use it to check your cost function and derivative calculations
% for the sparse autoencoder.
% Numerical approximation of the gradient (calls the autoencoder cost function)
numgrad = computeNumericalGradient( @(x) sparseAutoencoderCost(x, visibleSize, ...
                                                  hiddenSize, lambda, ...
                                                  sparsityParam, beta, ...
                                                  patches), theta);

% Use this to visually compare the gradients side by side
disp('   numgrad      grad')
disp([numgrad grad]);

% Compare numerically computed gradients with the ones obtained from backpropagation
diff = norm(numgrad-grad)/norm(numgrad+grad);
disp(diff); % Should be small. In our implementation, these values are
            % usually less than 1e-9.

            % When you got this working, Congratulations!!!

%%======================================================================
%% STEP 4: After verifying that your implementation of
%  sparseAutoencoderCost is correct, You can start training your sparse
%  autoencoder with minFunc (L-BFGS).

%  Randomly initialize the parameters
theta = initializeParameters(hiddenSize, visibleSize);

%  Use minFunc to minimize the function
addpath minFunc/
options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost
                          % function. Generally, for minFunc to work, you
                          % need a function pointer with two outputs: the
                          % function value and the gradient. In our problem,
                          % sparseAutoencoderCost.m satisfies this.
options.maxIter = 400;    % Maximum number of iterations of L-BFGS to run
options.display = 'on';

% opttheta is the vector of all weights and biases of the trained network
[opttheta, cost] = minFunc( @(p) sparseAutoencoderCost(p, ...
                                   visibleSize, hiddenSize, ...
                                   lambda, sparsityParam, ...
                                   beta, patches), ...
                              theta, options);

%%======================================================================
%% STEP 5: Visualization
% weight matrix of the first layer
W1 = reshape(opttheta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
figure;
display_network(W1', 12);
title('Visualization of Weight1')
print -djpeg weights.jpg   % save the visualization to a file

sampleIMAGES.m

function patches = sampleIMAGES()
% sampleIMAGES
% Returns 10000 patches for training

load IMAGES;    % load images from disk
% figure; imshow3D(IMAGES)   % optional preview; imshow3D is a third-party File Exchange viewer, not built into MATLAB

patchsize = 8;  % we'll use 8x8 patches
numpatches = 10000;

% Initialize patches with zeros.  Your code will fill in this matrix--one
% column per patch, 10000 columns.
patches = zeros(patchsize*patchsize, numpatches);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Fill in the variable called "patches" using data
%  from IMAGES.
%
%  IMAGES is a 3D array containing 10 images
%  For instance, IMAGES(:,:,6) is a 512x512 array containing the 6th image,
%  and you can type "imagesc(IMAGES(:,:,6)), colormap gray;" to visualize
%  it. (The contrast on these images look a bit off because they have
%  been preprocessed using using "whitening."  See the lecture notes for
%  more details.) As a second example, IMAGES(21:30,21:30,1) is an image
%  patch corresponding to the pixels in the block (21,21) to (30,30) of
%  Image 1
tic
image_size = size(IMAGES);
% 1 x numpatches vector of random top-left row indices in [1, image_size(1)-patchsize+1]
i = randi(image_size(1)-patchsize+1, 1, numpatches);
j = randi(image_size(2)-patchsize+1, 1, numpatches);   % random top-left column indices
k = randi(image_size(3), 1, numpatches);                % which of the 10 images to use
for num = 1:numpatches
    patches(:,num) = reshape(IMAGES(i(num):i(num)+patchsize-1, j(num):j(num)+patchsize-1, k(num)), patchsize*patchsize, 1);
end
toc

%% ---------------------------------------------------------------
% For the autoencoder to work well we need to normalize the data
% Specifically, since the output of the network is bounded between [0,1]
% (due to the sigmoid activation function), we have to make sure
% the range of pixel values is also bounded between [0,1]
patches = normalizeData(patches);

end


%% ---------------------------------------------------------------
function patches = normalizeData(patches)

% Squash data to [0.1, 0.9] since we use sigmoid as the activation
% function in the output layer

% Remove DC (mean of images): subtract each patch's (column's) mean from its pixels
patches = bsxfun(@minus, patches, mean(patches));

% Truncate to +/-3 standard deviations and scale to -1 to 1
pstd = 3 * std(patches(:));                        % threshold = 3 standard deviations of all pixel values
patches = max(min(patches, pstd), -pstd) / pstd;   % clip to [-pstd, pstd], then scale to [-1, 1]
% (for roughly Gaussian data about 99.7% of the values lie within 3 standard deviations, so few are clipped)

% Rescale from [-1,1] to [0.1,0.9]
patches = (patches + 1) * 0.4 + 0.1;

end
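The three normalization steps in normalizeData can be sketched in Python as below. The name normalize_data is hypothetical. The patches are assumed to be a list of patches, each a list of pixel values (one MATLAB column each), rather than a 64×10000 matrix. statistics.stdev is used because, like MATLAB's std, it normalizes by N-1.

```python
import statistics

def normalize_data(patches):
    """patches: list of patches, each a list of pixel values (one MATLAB column each)."""
    # 1) remove the DC component: subtract each patch's own mean
    centered = [[p - sum(col) / len(col) for p in col] for col in patches]
    # 2) threshold at 3 standard deviations of all values (N-1 normalization, like MATLAB std)
    pstd = 3 * statistics.stdev([p for col in centered for p in col])
    # clip to [-pstd, pstd] and scale to [-1, 1]; 3) map [-1, 1] linearly to [0.1, 0.9]
    return [[(max(min(p, pstd), -pstd) / pstd + 1) * 0.4 + 0.1 for p in col]
            for col in centered]
```

A flat patch maps to 0.5 everywhere. A pixel more than three standard deviations away from its patch mean is clipped to 0.1 or 0.9. This keeps every target inside the open range of the sigmoid output.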

sparseAutoencoderCost.m

function [cost,grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...
                                             lambda, sparsityParam, beta, data)
% Computes the cost function of the network and its gradient
% visibleSize: the number of input units (probably 64)
% hiddenSize: the number of hidden units (probably 25)
% lambda: weight decay parameter
% sparsityParam: The desired average activation for the hidden units (denoted in the lecture
%                           notes by the greek alphabet rho, which looks like a lower-case "p").
% beta: weight of sparsity penalty term
% data: Our 64x10000 matrix containing the training data.  So, data(:,i) is the i-th training example.
% theta: parameter vector containing W1, W2, b1 and b2

% The input theta is a vector (because minFunc expects the parameters to be a vector).
% We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this
% follows the notation convention of the lecture notes.
% (the long vector is split into each layer's weight matrix and bias vector)

W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);

% Cost and gradient variables (your code needs to compute these values).
% Here, we initialize them to zeros.
cost = 0;
W1grad = zeros(size(W1));
W2grad = zeros(size(W2));
b1grad = zeros(size(b1));
b2grad = zeros(size(b2));

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute the cost/optimization objective J_sparse(W,b) for the Sparse Autoencoder,
%                and the corresponding gradients W1grad, W2grad, b1grad, b2grad.
%
% W1grad, W2grad, b1grad and b2grad should be computed using backpropagation.
% Note that W1grad has the same dimensions as W1, b1grad has the same dimensions
% as b1, etc.  Your code should set W1grad to be the partial derivative of J_sparse(W,b) with
% respect to W1.  I.e., W1grad(i,j) should be the partial derivative of J_sparse(W,b)
% with respect to the input parameter W1(i,j).  Thus, W1grad should be equal to the term
% [(1/m) \Delta W^{(1)} + \lambda W^{(1)}] in the last block of pseudo-code in Section 2.2
% of the lecture notes (and similarly for W2grad, b1grad, b2grad).
%
% Stated differently, if we were using batch gradient descent to optimize the parameters,
% the gradient descent update to W1 would be W1 := W1 - alpha * W1grad, and similarly for W2, b1, b2.
%

%% 1. Forward propagation, formula (1)
% (formula numbers are kept as comments; printing them with disp would run on every
%  cost evaluation, thousands of times during gradient checking)
data_size = size(data);
active_value2 = repmat(b1, 1, data_size(2));
active_value3 = repmat(b2, 1, data_size(2));
active_value2 = sigmoid(W1*data + active_value2);           % layer-2 activations
active_value3 = sigmoid(W2*active_value2 + active_value3);  % layer-3 activations

%% 2. Cost function and error terms
% squared-error term: average over all samples of the per-sample cost
ave_square = sum(sum((active_value3-data).^2)./2)/data_size(2);
% weight decay term: lambda/2 times the sum of squares of all weights
weight_decay = lambda/2*(sum(sum(W1.^2)) + sum(sum(W2.^2)));
% formula (2): average activation of each hidden unit (25x1), i.e. row sums of active_value2 divided by m
p_real = sum(active_value2,2)./data_size(2);
p_para = repmat(sparsityParam, hiddenSize, 1);   % sparsity parameter rho (25x1)
% sparsity penalty: beta times the sum of the KL divergences of all hidden units;
% the expression inside sum(...) is formula (3)
sparsity = beta.*sum(p_para.*log(p_para./p_real) + (1-p_para).*log((1-p_para)./(1-p_real)));
% formulas (4) and (5): total cost
cost = ave_square + weight_decay + sparsity;

% formula (7): residual of layer 3 (output layer)
delta3 = (active_value3-data).*(active_value3).*(1-active_value3);
average_sparsity = repmat(sum(active_value2,2)./data_size(2), 1, data_size(2)); % rho_hat of each hidden unit (25x10000)
default_sparsity = repmat(sparsityParam, hiddenSize, data_size(2));              % rho (25x10000)
% formula (6): sparsity term of the hidden-layer residual
sparsity_penalty = beta.*(-(default_sparsity./average_sparsity) + ((1-default_sparsity)./(1-average_sparsity)));
% formula (8): residual of layer 2 (hidden layer), with the sparsity term added
delta2 = (W2'*delta3 + sparsity_penalty).*((active_value2).*(1-active_value2));

%% 3. Backpropagation: gradients of the cost w.r.t. each layer's weights and biases
W2grad = delta3*active_value2'./data_size(2) + lambda.*W2;   % W2 gradient
W1grad = delta2*data'./data_size(2) + lambda.*W1;            % W1 gradient
b2grad = sum(delta3,2)./data_size(2);                        % b2 gradient
b1grad = sum(delta2,2)./data_size(2);                        % b1 gradient

%-------------------------------------------------------------------
% After computing the cost and gradient, we will convert the gradients back
% to a vector format (suitable for minFunc).  Specifically, we will unroll
% your gradient matrices into a vector.

grad = [W1grad(:) ; W2grad(:) ; b1grad(:) ; b2grad(:)];

end

%%
% %% Alternative implementation with bsxfun (commented out; it would replace the body above)
% %% Forward propagation
% m = size(data, 2);   % number of training examples
% a1 = data;
% z2 = bsxfun(@plus, W1*a1, b1);
% a2 = sigmoid(z2);
% z3 = bsxfun(@plus, W2*a2, b2);
% a3 = sigmoid(z3);
%
% %% Network error
% % error term J1 = average per-sample cost
% y = data;                % desired output of the network
% Ei = sum((a3-y).^2)/2;   % cost of each sample
% J1 = sum(Ei)/m;
% % regularization term J2 = sum of squares of all weights
% J2 = sum(W1(:).^2) + sum(W2(:).^2);
% % sparsity term J3 = sum of the KL divergences of all hidden units
% rho_hat = sum(a2,2)/m;
% KL = sum(sparsityParam*log(sparsityParam./rho_hat) + ...
%          (1-sparsityParam)*log((1-sparsityParam)./(1-rho_hat)));
% J3 = KL;
% % cost of the network
% cost = J1 + lambda*J2/2 + beta*J3;
%
% %% Backpropagation: residual delta of each layer
% delta3 = -(data-a3).*dsigmoid(z3);
% spare_delta = beta*(-sparsityParam./rho_hat + (1-sparsityParam)./(1-rho_hat));
% delta2 = bsxfun(@plus, W2'*delta3, spare_delta).*dsigmoid(z2);   % sparsity term added here
%
% %% Gradients of the cost w.r.t. each layer's weights and biases
% W1grad = delta2*a1'/m + lambda*W1;
% W2grad = delta3*a2'/m + lambda*W2;
% b1grad = sum(delta2,2)/m;
% b2grad = sum(delta3,2)/m;

%-------------------------------------------------------------------
% Here's an implementation of the sigmoid function, which you may find useful
% in your computation of the costs and the gradients.  This inputs a (row or
% column) vector (say (z1, z2, z3)) and returns (f(z1), f(z2), f(z3)).

function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end

%% Derivative of the sigmoid function (used by the alternative implementation above)
% function dsigm = dsigmoid(x)
%     sigx = sigmoid(x);
%     dsigm = sigx.*(1-sigx);
% end

computeNumericalGradient.m

function numgrad = computeNumericalGradient(J, theta)
% numgrad = computeNumericalGradient(J, theta)
% theta: a vector of parameters (here W1, W2, b1 and b2 unrolled into one vector)
% J: a function that outputs a real-number. Calling y = J(theta) will return the
% function value at theta.

% Initialize numgrad with zeros
numgrad = zeros(size(theta));

%% ---------- YOUR CODE HERE --------------------------------------
% Instructions:
% Implement numerical gradient checking, and return the result in numgrad.
% (See Section 2.3 of the lecture notes.)
% You should write code so that numgrad(i) is (the numerical approximation to) the
% partial derivative of J with respect to the i-th input argument, evaluated at theta.
% I.e., numgrad(i) should be the (approximately) the partial derivative of J with
% respect to theta(i).
%
% Hint: You will probably want to compute the elements of numgrad one at a time.
EPSILON = 0.0001;
for i = 1:numel(theta)
    theta_plus = theta;
    theta_minu = theta;
    theta_plus(i) = theta_plus(i) + EPSILON;
    theta_minu(i) = theta_minu(i) - EPSILON;
    % central difference
    numgrad(i) = (J(theta_plus) - J(theta_minu))/(2*EPSILON);
end

%% ---------------------------------------------------------------
end

checkNumericalGradient.m

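Gradient checking is an essential technique when writing machine-learning algorithms: it verifies that a hand-written cost function computes its gradient correctly.
The cost function has two jobs: computing the cost, and computing the gradient of the cost with respect to the parameters.
In practice, gradient checking is used together with the cost function, and can be put into a separate test function checkCost():
① Choose a set of samples and initial parameter values
② Compute grad with the cost function
③ Compute the numerical approximation numGrad with computeNumericalGradient
④ Check whether grad and numGrad are close: if diff is below 1e-6, the cost function is very likely correct; otherwise, check the cost function
diff = norm(numGrad-grad)/norm(numGrad+grad);
disp(diff);
Once you are sure the cost function is correct, disable the gradient-checking code; otherwise it wastes a lot of time.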

function [] = checkNumericalGradient()
% Checks the numerical-gradient code in computeNumericalGradient.m (the sparse
% autoencoder cost itself is checked afterwards, in STEP 3 of train.m).
% This code can be used to check your numerical gradient implementation
% in computeNumericalGradient.m
% It analytically evaluates the gradient of a very simple function called
% simpleQuadraticFunction (see below) and compares the result with your numerical
% solution. Your numerical gradient implementation is incorrect if
% your numerical solution deviates too much from the analytical solution.

% Evaluate the function and gradient at x = [4; 10]; (Here, x is a 2d vector.)
x = [4; 10];
[value, grad] = simpleQuadraticFunction(x);

% Use your code to numerically compute the gradient of simpleQuadraticFunction at x.
% (The notation "@simpleQuadraticFunction" denotes a pointer to a function.)
numgrad = computeNumericalGradient(@simpleQuadraticFunction, x);

% Visually examine the two gradient computations.  The two columns
% you get should be very similar.
disp([numgrad grad]);
fprintf('The above two columns you get should be very similar.\n(Left-Your Numerical Gradient, Right-Analytical Gradient)\n\n');

% Evaluate the norm of the difference between two solutions.
% If you have a correct implementation, and assuming you used EPSILON = 0.0001
% in computeNumericalGradient.m, then diff below should be 2.1452e-12
diff = norm(numgrad-grad)/norm(numgrad+grad);
disp(diff);
fprintf('Norm of the difference between numerical and analytical gradient (should be < 1e-9)\n\n');
end


function [value,grad] = simpleQuadraticFunction(x)
% this function accepts a 2D vector as input.
% Its outputs are:
%   value: h(x1, x2) = x1^2 + 3*x1*x2
%   grad: A 2x1 vector that gives the partial derivatives of h with respect to x1 and x2
% Note that when we pass @simpleQuadraticFunction(x) to computeNumericalGradients, we're assuming
% that computeNumericalGradients will use only the first returned value of this function.

value = x(1)^2 + 3*x(1)*x(2);

grad = zeros(2, 1);
grad(1) = 2*x(1) + 3*x(2);
grad(2) = 3*x(1);

end

% %% Alternative body for computeNumericalGradient
% numgrad = zeros(size(theta));  % Initialize numgrad with zeros
% n = size(theta,1);             % theta(1),...,theta(n)
% EPSILON = 1e-4;
%
% %% calculate the partial derivative of J with respect to theta(i)
% for i = 1:n
%     theta_add = zeros(n,1);
%     theta_add(i) = EPSILON;
%     numgrad(i) = (J(theta + theta_add) - J(theta-theta_add))./EPSILON/2;
% end
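The same check can be sketched in Python, assuming a plain list stands in for the column vector theta. The names compute_numerical_gradient, simple_quadratic and relative_diff are hypothetical. They mirror computeNumericalGradient.m, simpleQuadraticFunction and the diff line of checkNumericalGradient.m.

```python
import math

def compute_numerical_gradient(J, theta, epsilon=1e-4):
    """Central-difference approximation of the gradient of J at theta:
    numgrad[i] = (J(theta + epsilon*e_i) - J(theta - epsilon*e_i)) / (2*epsilon)."""
    numgrad = []
    for i in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[i] += epsilon
        minus[i] -= epsilon
        numgrad.append((J(plus) - J(minus)) / (2 * epsilon))
    return numgrad

def simple_quadratic(x):
    """h(x1, x2) = x1^2 + 3*x1*x2 and its analytic gradient."""
    value = x[0] ** 2 + 3 * x[0] * x[1]
    grad = [2 * x[0] + 3 * x[1], 3 * x[0]]
    return value, grad

def relative_diff(a, b):
    """norm(a - b) / norm(a + b), the measure printed by checkNumericalGradient."""
    num = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    den = math.sqrt(sum((p + q) ** 2 for p, q in zip(a, b)))
    return num / den

x = [4.0, 10.0]
_, grad = simple_quadratic(x)
numgrad = compute_numerical_gradient(lambda t: simple_quadratic(t)[0], x)
print(numgrad, grad, relative_diff(numgrad, grad))
```

The loop needs two evaluations of J per parameter. For the 3289-parameter autoencoder, each evaluation also runs over all 10000 patches. This is why the check should be disabled once the cost function has been verified.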

References

UFLDL Tutorial

Exercise: Sparse Autoencoder

Deep Learning 1: UFLDL Tutorial: Sparse Autoencoder exercise (Stanford University deep learning tutorial)

Deep learning: 9 (Sparse Autoencoder exercise)

UFLDL tutorial answers (1): Exercise: Sparse_Autoencoder

UFLDL tutorial (1): sparseae_exercise

Gradient checking!

Andrew Ng's open course
