Execution Flow of Handwritten Digit Recognition with Caffe

Published: 2023/11/27 | 豆豆

In an earlier post, http://blog.csdn.net/fengbingchun/article/details/50987185 , I recognized handwritten digits by following the examples that ship with Caffe. This post walks through the execution flow in detail and presents a simplified implementation. For training on the MNIST dataset with Caffe, see http://blog.csdn.net/fengbingchun/article/details/68065338 . The flow is as follows:

1. All layers are registered first. This runs the constructor of class LayerRegisterer in layer_factory.hpp, together with the static functions AddCreator and Registry of class LayerRegistry. For details on layer registration in Caffe, see http://blog.csdn.net/fengbingchun/article/details/54310956
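The registration mechanism in step 1 can be illustrated with a small self-contained sketch. It reuses the names LayerRegisterer, LayerRegistry, AddCreator and Registry from the text, but everything else (the Layer interface, the CreateLayer signature, the ReLULayer class) is a simplified assumption for illustration and not Caffe's actual code.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal layer interface (not Caffe's real Layer class).
struct Layer {
  virtual ~Layer() = default;
  virtual std::string type() const = 0;
};

// Simplified stand-in for LayerRegistry: maps a type name to a creator function.
class LayerRegistry {
 public:
  using Creator = std::function<std::unique_ptr<Layer>()>;

  // A function-local static avoids static-initialization-order problems.
  static std::map<std::string, Creator>& Registry() {
    static std::map<std::string, Creator> registry;
    return registry;
  }

  static void AddCreator(const std::string& type, Creator creator) {
    assert(Registry().count(type) == 0 && "layer type already registered");
    Registry()[type] = std::move(creator);
  }

  // Returns a new layer of the given type, or nullptr if the type is unknown.
  static std::unique_ptr<Layer> CreateLayer(const std::string& type) {
    auto it = Registry().find(type);
    if (it == Registry().end()) {
      return nullptr;
    }
    return it->second();
  }
};

// Simplified stand-in for LayerRegisterer: its constructor performs the registration.
struct LayerRegisterer {
  LayerRegisterer(const std::string& type, LayerRegistry::Creator creator) {
    LayerRegistry::AddCreator(type, std::move(creator));
  }
};

// Hypothetical example layer.
struct ReLULayer : Layer {
  std::string type() const override { return "ReLU"; }
};

// A global object: its constructor runs before main() and registers the layer.
static LayerRegisterer g_relu_registerer(
    "ReLU", [] { return std::unique_ptr<Layer>(new ReLULayer()); });
```

Because the registerer is a global object, the layer type is already available by the time a network definition is parsed, which is why registration appears as the first step of the flow.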

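2. Specify whether execution uses CPU or GPU mode.

3. Specify the required .prototxt and .caffemodel files. Note that the .prototxt used here (lenet_train_test_.prototxt) differs in content from the one used for training (lenet_train_test.prototxt). The .caffemodel file is the binary file produced at the end of training, lenet_iter_10000.caffemodel, which stores the weights and biases of all layers. The content of lenet_train_test_.prototxt is: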

name: "LeNet" # network name
layer { # memory required: (784+1)*4=3140
  name: "data" # layer name
  type: "MemoryData" # layer type; data enters Caffe through data layers, and this one reads data directly from memory
  top: "data" # top name, shape: 1 1 28 28 (784)
  top: "label" # top name, shape: 1 (1); it seems to serve no real purpose here beyond adding a top blob, but it cannot be removed
  memory_data_param { # in-memory data parameters
    batch_size: 1 # number of images to recognize per batch
    channels: 1 # number of channels of the image to recognize
    height: 28 # height of the image to recognize
    width: 28 # width of the image to recognize
  }
  transform_param { # image preprocessing parameters
    scale: 0.00390625 # scale pixel values by 1/256 into the range [0, 1)
  }
}
layer { # memory required: 11520*4=46080
  name: "conv1" # layer name
  type: "Convolution" # layer type: convolution layer
  bottom: "data" # bottom name
  top: "conv1" # top name, shape: 1 20 24 24 (11520)
  param { # specifies training parameters (weights)
    lr_mult: 1 # multiplier on the global learning rate
  }
  param { # specifies training parameters (biases)
    lr_mult: 2 # multiplier on the global learning rate
  }
  convolution_param { # convolution parameters
    num_output: 20 # number of output feature maps
    kernel_size: 5 # kernel size (the kernel values are the weights)
    stride: 1 # stride
    weight_filler { # the filler for the weights
      type: "xavier" # initialize weights with the xavier filler
    }
    bias_filler { # the filler for the bias
      type: "constant" # initialize biases with a constant
    }
  }
}
layer { # memory required: 2880*4=11520
  name: "pool1" # layer name
  type: "Pooling" # layer type: pooling layer
  bottom: "conv1" # bottom name
  top: "pool1" # top name, shape: 1 20 12 12 (2880)
  pooling_param { # pooling parameters
    pool: MAX # pooling method: max pooling
    kernel_size: 2 # window size
    stride: 2 # stride
  }
}
layer { # memory required: 3200*4=12800
  name: "conv2" # layer name
  type: "Convolution" # layer type: convolution layer
  bottom: "pool1" # bottom name
  top: "conv2" # top name, shape: 1 50 8 8 (3200)
  param { # specifies training parameters (weights)
    lr_mult: 1 # multiplier on the global learning rate
  }
  param { # specifies training parameters (biases)
    lr_mult: 2 # multiplier on the global learning rate
  }
  convolution_param { # convolution parameters
    num_output: 50 # number of output feature maps
    kernel_size: 5 # kernel size (the kernel values are the weights)
    stride: 1 # stride
    weight_filler { # the filler for the weights
      type: "xavier" # initialize weights with the xavier filler
    }
    bias_filler { # the filler for the bias
      type: "constant" # initialize biases with a constant
    }
  }
}
layer { # memory required: 800*4=3200
  name: "pool2" # layer name
  type: "Pooling" # layer type: pooling layer
  bottom: "conv2" # bottom name
  top: "pool2" # top name, shape: 1 50 4 4 (800)
  pooling_param { # pooling parameters
    pool: MAX # pooling method: max pooling
    kernel_size: 2 # window size
    stride: 2 # stride
  }
}
layer { # memory required: 500*4=2000
  name: "ip1" # layer name
  type: "InnerProduct" # layer type: fully connected layer
  bottom: "pool2" # bottom name
  top: "ip1" # top name, shape: 1 500 (500)
  param { # specifies training parameters (weights)
    lr_mult: 1 # multiplier on the global learning rate
  }
  param { # specifies training parameters (biases)
    lr_mult: 2 # multiplier on the global learning rate
  }
  inner_product_param { # fully connected layer parameters
    num_output: 500 # number of outputs
    weight_filler { # the filler for the weights
      type: "xavier" # initialize weights with the xavier filler
    }
    bias_filler { # the filler for the bias
      type: "constant" # initialize biases with a constant
    }
  }
}
# ReLU: given an input value x, the ReLU layer computes the output as x if x > 0 and
# negative_slope * x if x <= 0. When the negative slope parameter is not set,
# it is equivalent to the standard ReLU function of taking max(x, 0).
# It also supports in-place computation, meaning that the bottom and
# the top blob can be the same to reduce memory consumption.
layer { # memory required: 500*4=2000
  name: "relu1" # layer name
  type: "ReLU" # layer type
  bottom: "ip1" # bottom name
  top: "ip1" # top name (in-place), shape: 1 500 (500)
}
layer { # memory required: 10*4=40
  name: "ip2" # layer name
  type: "InnerProduct" # layer type: fully connected layer
  bottom: "ip1" # bottom name
  top: "ip2" # top name, shape: 1 10 (10)
  param { # specifies training parameters (weights)
    lr_mult: 1 # multiplier on the global learning rate
  }
  param { # specifies training parameters (biases)
    lr_mult: 2 # multiplier on the global learning rate
  }
  inner_product_param { # fully connected layer parameters
    num_output: 10 # number of outputs
    weight_filler { # the filler for the weights
      type: "xavier" # initialize weights with the xavier filler
    }
    bias_filler { # the filler for the bias
      type: "constant" # initialize biases with a constant
    }
  }
}
layer { # memory required: 10*4=40
  name: "prob" # layer name
  type: "Softmax" # layer type
  bottom: "ip2" # bottom name
  top: "prob" # top name, shape: 1 10 (10)
}
# total memory required (bytes): 3140+46080+11520+12800+3200+2000+2000+40+40=80820
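The shapes and memory figures in the prototxt comments can be reproduced with a little arithmetic. The following is a minimal sketch, assuming no padding, 4-byte floats, and that the in-place relu1 blob is counted separately, as the comments above do. The function names are hypothetical helpers, not Caffe APIs.

```cpp
#include <cassert>
#include <cstddef>

// Output size along one spatial dimension of a convolution without padding.
inline int ConvOutputSize(int input, int kernel, int stride) {
  return (input - kernel) / stride + 1;
}

// Output size along one spatial dimension of a pooling window without padding.
// The division is rounded up; for this network it is exact anyway.
inline int PoolOutputSize(int input, int kernel, int stride) {
  return (input - kernel + stride - 1) / stride + 1;
}

// Sum of the top-blob sizes listed in the prototxt comments, in bytes.
inline std::size_t LeNetTopBlobBytes() {
  const std::size_t kBytesPerFloat = 4;  // assumption: 32-bit float
  const int conv1 = ConvOutputSize(28, 5, 1);     // 24
  const int pool1 = PoolOutputSize(conv1, 2, 2);  // 12
  const int conv2 = ConvOutputSize(pool1, 5, 1);  // 8
  const int pool2 = PoolOutputSize(conv2, 2, 2);  // 4

  std::size_t count = 0;
  count += 1 * 28 * 28 + 1;      // data (784) + label (1)
  count += 20 * conv1 * conv1;   // conv1: 11520
  count += 20 * pool1 * pool1;   // pool1: 2880
  count += 50 * conv2 * conv2;   // conv2: 3200
  count += 50 * pool2 * pool2;   // pool2: 800
  count += 500;                  // ip1
  count += 500;                  // relu1 (in-place on ip1, counted as in the comments)
  count += 10;                   // ip2
  count += 10;                   // prob
  return count * kBytesPerFloat;
}
```

With these helpers, LeNetTopBlobBytes() evaluates to 80820, matching the total in the last comment line of the prototxt.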

lenet_train_test_.prototxt can be visualized with Netscope ( http://ethereon.github.io/netscope/quickstart.html ):

[Figure: Netscope visualization of lenet_train_test_.prototxt]

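Differences between lenet_train_test.prototxt (used for training) and lenet_train_test_.prototxt (used for recognition):

(1) Data layer: training uses a Data layer, which loads data into the network from LMDB storage; recognition uses a MemoryData layer, which loads data directly from memory.

(2) Accuracy layer: used only during training, to compute the accuracy on the test set.

(3) Output layer, Softmax vs. SoftmaxWithLoss: training uses SoftmaxWithLoss, which outputs the loss value; recognition uses Softmax, which outputs the probabilities of the 10 digit classes.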
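To make the role of the Softmax output layer concrete, here is a minimal stand-alone sketch of the softmax function, which turns the 10 raw scores produced by ip2 into probabilities that sum to 1. It is an illustration of the math only, not Caffe's implementation; the function name Softmax is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Converts raw scores into probabilities: p_i = exp(s_i) / sum_j exp(s_j).
// Subtracting the maximum score first keeps exp() from overflowing
// and does not change the result.
std::vector<float> Softmax(const std::vector<float>& scores) {
  assert(!scores.empty());
  const float max_score = *std::max_element(scores.begin(), scores.end());
  std::vector<float> probs(scores.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < scores.size(); ++i) {
    probs[i] = std::exp(scores[i] - max_score);
    sum += probs[i];
  }
  for (float& p : probs) {
    p /= sum;
  }
  return probs;
}
```

The ordering of the scores is preserved, so the index of the largest probability is the same as the index of the largest raw score.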

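4. Create and initialize the Net object. There are two ways: pass a string argument (the path of the .prototxt file), or pass a NetParameter argument.

5. Call Net's CopyTrainedLayersFrom function to load the binary .caffemodel file produced by training, i.e. lenet_iter_10000.caffemodel. There are again two ways: pass a string argument (the path of the .caffemodel file), or pass a NetParameter argument.

6. Retrieve the Net properties needed later for recognition:

(1) Call Net's blob_by_name function to get the number of channels, width and height required of the input image.

(2) Call Net's output_blobs function to get the number and size of the output blobs. Note: there are 2 output blobs here. The first is label, with count 1; the second is prob, with count 10, which holds the probabilities of the recognition result.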

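7. Run the handwritten digit recognition:

(1) Read the image with OpenCV's imread function.

(2) Convert the color space and resize the image to match the input requirements obtained from the Net.

(3) MNIST training images have a white foreground on a black background, whereas the input images here have a black foreground on a white background, so the image must be inverted.

(4) Pass the image data into the Net. There are two ways: the Reset function of class MemoryDataLayer, or the AddMatVector function of class MemoryDataLayer, which takes Mat arguments.

(5) Call Net's ForwardPrefilled function to run the forward pass.

(6) Output the recognition result. Note: the forward pass returns two blobs; the data in the second blob holds the final probabilities, and the index of the maximum value is the recognized digit.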
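Sub-steps (3) and (6), together with the 1/256 scaling used by the network, can be sketched without Caffe or OpenCV on a plain pixel buffer. This is a minimal illustration under the assumption of 8-bit grayscale input; InvertAndScale and ArgMax are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Inverts 8-bit grayscale pixels (black-on-white becomes white-on-black, as in MNIST)
// and scales them by 1/256 so the values fall in [0, 1).
std::vector<float> InvertAndScale(const std::vector<std::uint8_t>& pixels) {
  const float kScale = 0.00390625f;  // 1/256, same value as in the prototxt
  std::vector<float> out;
  out.reserve(pixels.size());
  for (std::uint8_t p : pixels) {
    out.push_back(static_cast<float>(255 - p) * kScale);
  }
  return out;
}

// Returns the index of the largest probability, or -1 for an empty input.
int ArgMax(const std::vector<float>& probs) {
  int best_index = -1;
  float best_value = -1.0f;  // probabilities are non-negative
  for (std::size_t j = 0; j < probs.size(); ++j) {
    if (probs[j] > best_value) {
      best_value = probs[j];
      best_index = static_cast<int>(j);
    }
  }
  return best_index;
}
```

For example, a white background pixel (255) maps to 0.0 and a black stroke pixel (0) maps to 255/256, and ArgMax over the 10 values of the prob blob yields the recognized digit.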

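8. Using lenet_train_test_.prototxt, analyze the weights, biases and number of neurons of each layer. There are 9 layers in total:

(1) data (data layer): no weights or biases; number of neurons 1*1*28*28+1=785.

(2) conv1 (convolution layer): kernel size 5*5, 20 output feature maps, 20 kernels, output feature map size 24*24; trainable parameters (weights + biases) 20*1*5*5+20=520; number of neurons 1*20*24*24=11520.

(3) pool1 (downsampling layer): window size 2*2, 20 output feature maps, output feature map size 12*12; number of neurons 1*20*12*12=2880. Caffe's MAX pooling layer has no trainable parameters; the count 1*20+20=40 applies to the subsampling layer of the classic LeNet-5, which has one coefficient and one bias per feature map.

(4) conv2 (convolution layer): kernel size 5*5, 50 output feature maps, 50*20 kernels, output feature map size 8*8; trainable parameters (weights + biases) 50*20*5*5+50=25050; number of neurons 1*50*8*8=3200.

(5) pool2 (downsampling layer): window size 2*2, 50 output feature maps, output feature map size 4*4; number of neurons 1*50*4*4=800. As with pool1, MAX pooling has no trainable parameters; the count 1*50+50=100 applies only to a LeNet-5 style subsampling layer.

(6) ip1 (fully connected layer): window size 1*1, 500 outputs, 500*800 weights, output size 1*1; trainable parameters (weights + biases) 500*800*1*1+500=400500; number of neurons 1*500*1*1=500.

(7) relu1: computed in-place on ip1.

(8) ip2 (fully connected layer): window size 1*1, 10 outputs, 10*500 weights, output size 1*1; trainable parameters (weights + biases) 10*500*1*1+10=5010; number of neurons 1*10*1*1=10.

(9) prob (output layer): number of neurons 1*10*1*1+1=11.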
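The parameter counts for the convolution and fully connected layers follow one formula each, so they are easy to check in code. The sketch below uses hypothetical helper names and counts only conv1, conv2, ip1 and ip2.

```cpp
#include <cassert>

// Weights (num_output * in_channels * kernel * kernel) plus one bias per output map.
long ConvParams(long num_output, long in_channels, long kernel) {
  return num_output * in_channels * kernel * kernel + num_output;
}

// Weights (num_output * num_input) plus one bias per output.
long InnerProductParams(long num_output, long num_input) {
  return num_output * num_input + num_output;
}

// Sum over the four layers with trainable parameters in this network.
long LeNetTrainableParams() {
  return ConvParams(20, 1, 5)             // conv1: 520
       + ConvParams(50, 20, 5)            // conv2: 25050
       + InnerProductParams(500, 800)     // ip1: 400500 (input is pool2, 50*4*4=800)
       + InnerProductParams(10, 500);     // ip2: 5010
}
```

Summing the four layers gives 431080 trainable values; ip1 alone accounts for most of them.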

The simplified test code for handwritten digit recognition is as follows:

// Header locations may differ across Caffe and OpenCV versions.
#include <cstdio>
#include <string>
#include <vector>
#include <opencv2/opencv.hpp>
#include <caffe/caffe.hpp>
#include <caffe/layers/memory_data_layer.hpp>

int mnist_predict()
{
	caffe::Caffe::set_mode(caffe::Caffe::CPU);

	const std::string param_file{ "E:/GitCode/Caffe_Test/test_data/model/mnist/lenet_train_test_.prototxt" };
	const std::string trained_filename{ "E:/GitCode/Caffe_Test/test_data/model/mnist/lenet_iter_10000.caffemodel" };
	const std::string image_path{ "E:/GitCode/Caffe_Test/test_data/images/" };

	// There are two ways to instantiate the net
	// 1. Pass arguments of type std::string
	caffe::Net<float> caffe_net(param_file, caffe::TEST);
	caffe_net.CopyTrainedLayersFrom(trained_filename);

	// 2. Pass arguments of type caffe::NetParameter
	//caffe::NetParameter net_param1, net_param2;
	//caffe::ReadNetParamsFromTextFileOrDie(param_file, &net_param1);
	//net_param1.mutable_state()->set_phase(caffe::TEST);
	//caffe::Net<float> caffe_net(net_param1);
	//caffe::ReadNetParamsFromBinaryFileOrDie(trained_filename, &net_param2);
	//caffe_net.CopyTrainedLayersFrom(net_param2);

	int num_inputs = caffe_net.input_blobs().size(); // 0 ??
	const boost::shared_ptr<caffe::Blob<float> > blob_by_name = caffe_net.blob_by_name("data");
	int image_channel = blob_by_name->channels();
	int image_height = blob_by_name->height();
	int image_width = blob_by_name->width();

	int num_outputs = caffe_net.num_outputs();
	const std::vector<caffe::Blob<float>*> output_blobs = caffe_net.output_blobs();
	int require_blob_index{ -1 };
	const int digit_category_num{ 10 };
	// Pick the output blob that holds the 10 class probabilities
	for (size_t i = 0; i < output_blobs.size(); ++i) {
		if (output_blobs[i]->count() == digit_category_num)
			require_blob_index = static_cast<int>(i);
	}
	if (require_blob_index == -1) {
		fprintf(stderr, "output blob doesn't match\n");
		return -1;
	}

	std::vector<int> target{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
	std::vector<int> result;

	for (auto num : target) {
		std::string str = std::to_string(num);
		str += ".png";
		str = image_path + str;

		cv::Mat mat = cv::imread(str.c_str(), 1);
		if (!mat.data) {
			fprintf(stderr, "load image error: %s\n", str.c_str());
			return -1;
		}

		// Match the channel count and size required by the net
		if (image_channel == 1)
			cv::cvtColor(mat, mat, CV_BGR2GRAY);
		else if (image_channel == 4)
			cv::cvtColor(mat, mat, CV_BGR2BGRA);

		cv::resize(mat, mat, cv::Size(image_width, image_height));
		// Invert: MNIST digits are white on a black background
		cv::bitwise_not(mat, mat);

		// There are two ways to load the image data into the net
		boost::shared_ptr<caffe::MemoryDataLayer<float> > memory_data_layer =
			boost::static_pointer_cast<caffe::MemoryDataLayer<float>>(caffe_net.layer_by_name("data"));

		// 1. Use the Reset function of class MemoryDataLayer
		mat.convertTo(mat, CV_32FC1, 0.00390625);
		float dummy_label[1] {0};
		memory_data_layer->Reset((float*)(mat.data), dummy_label, 1);

		// 2. Use the AddMatVector function of class MemoryDataLayer
		//std::vector<cv::Mat> patches{mat}; // set the patch for testing
		//std::vector<int> labels(patches.size());
		//memory_data_layer->AddMatVector(patches, labels); // push vector<Mat> to data layer

		float loss{ 0.0 };
		const std::vector<caffe::Blob<float>*>& results = caffe_net.ForwardPrefilled(&loss); // Net forward
		const float* output = results[require_blob_index]->cpu_data();

		// The index of the largest probability is the recognized digit
		float tmp{ -1 };
		int pos{ -1 };

		fprintf(stderr, "actual digit is: %d\n", num);
		for (int j = 0; j < 10; j++) {
			printf("Probability to be Number %d is: %.3f\n", j, output[j]);
			if (tmp < output[j]) {
				pos = j;
				tmp = output[j];
			}
		}

		result.push_back(pos);
	}

	for (auto i = 0; i < 10; i++)
		fprintf(stderr, "actual digit is: %d, result digit is: %d\n", target[i], result[i]);

	fprintf(stderr, "predict finish\n");
	return 0;
}

The test results are as follows:

[Figure: screenshot of the test output]

GitHub: https://github.com/fengbingchun/Caffe_Test
