Training a Model on the MNIST Dataset
1. Dataset Preparation
For more details, see: Caffe: LMDB and Data Conversion.
MNIST is a database of handwritten digits maintained by deep-learning pioneer Yann LeCun. It was originally used to recognize handwritten digits on checks and has since become a standard introductory dataset for deep learning. LeNet, arguably the earliest CNN model, was designed specifically for MNIST recognition.
MNIST provides 60,000 training samples and 10,000 test samples. Each sample is a 28*28 grayscale image of a handwritten digit from 0 to 9, so there are 10 classes.
1) The data can be downloaded from the official MNIST website.
2) Or run the download and conversion scripts that ship with Caffe (`$CAFFE_ROOT` denotes the root directory of the Caffe source tree):

    cd $CAFFE_ROOT
    ./data/mnist/get_mnist.sh
    ./examples/mnist/create_mnist.sh
2. LeNet: Training and Testing the MNIST Classification Model
2.1 The LeNet Classification Model
We will train with the LeNet network, which is known to work well on digit classification tasks. The design of LeNet contains the essence of CNNs that are still used in larger models such as the ones in ImageNet. In general, it consists of a convolutional layer followed by a pooling layer, another convolution layer followed by a pooling layer, and then two fully connected layers similar to the conventional multilayer perceptrons. We have defined the layers in `$CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt`.
2.2 Defining the MNIST Network Architecture
This section explains the LeNet model definition for MNIST handwritten digit recognition, `lenet_train_test.prototxt`. Caffe uses protobuf messages whose schema is defined in `$CAFFE_ROOT/src/caffe/proto/caffe.proto`. Specifically, we will write a protobuf message of type `caffe::NetParameter` (or, in Python, `caffe.proto.caffe_pb2.NetParameter`).
To start, give the network a name:
    name: "LeNet"
2.2.1 Data Layer
In this demo, the MNIST data created in LMDB format is read through a data layer defined as follows:
    layer {
      name: "mnist"
      type: "Data"
      transform_param {
        scale: 0.00390625
      }
      data_param {
        source: "mnist_train_lmdb"
        backend: LMDB
        batch_size: 64
      }
      top: "data"
      top: "label"
    }
This layer has name `mnist` and type `Data`, and it reads the data from the given LMDB source. The batch size is 64, and the incoming pixels are scaled so that they fall in the range [0, 1). Why 0.00390625? It is 1 divided by 256. Finally, this layer produces two blobs: the `data` blob and the `label` blob.
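To make the scaling concrete, here is a minimal NumPy sketch (not part of Caffe; the array merely stands in for one raw 28x28 MNIST image) showing what `scale: 0.00390625` does to the pixel values:

    import numpy as np

    # Stand-in for one raw MNIST image: 28x28 pixels with values in [0, 255].
    raw = np.random.randint(0, 256, size=(28, 28)).astype(np.uint8)

    # The data layer multiplies every pixel by scale = 1/256 = 0.00390625,
    # so the scaled values always fall in [0, 1).
    scaled = raw.astype(np.float32) * 0.00390625

    print(scaled.min(), scaled.max())   # both strictly below 1.0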
2.2.2 Convolution Layer
The convolution layer is defined as follows:
    layer {
      name: "conv1"
      type: "Convolution"
      param { lr_mult: 1 }
      param { lr_mult: 2 }
      convolution_param {
        num_output: 20
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
      bottom: "data"
      top: "conv1"
    }
This layer takes the `data` blob provided by the data layer and produces the `conv1` blob. It outputs 20 channels, with a convolutional kernel size of 5 carried out with stride 1.
The fillers allow us to randomly initialize the weights and biases. For the weight filler, we use the `xavier` algorithm, which automatically determines the scale of initialization based on the number of input and output neurons. For the bias filler, we simply initialize it as a constant, with the default value 0.
`lr_mult` is the learning-rate multiplier for the layer's learnable parameters. Here the weights are learned at the same rate as the learning rate given by the solver at runtime, while the biases are learned at twice that rate; this usually leads to better convergence.
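As a rough illustration of the `xavier` weight filler, the sketch below draws conv1 weights from a uniform distribution whose range is derived from the fan-in; treat the exact constant as an assumption about Caffe's filler rather than a specification:

    import numpy as np

    # conv1 weights have shape (num_output, channels, kernel_h, kernel_w) = (20, 1, 5, 5).
    fan_in = 1 * 5 * 5                       # number of inputs feeding each conv1 output
    scale = np.sqrt(3.0 / fan_in)            # uniform range derived from the fan-in

    weights = np.random.uniform(-scale, scale, size=(20, 1, 5, 5)).astype(np.float32)
    biases = np.zeros(20, dtype=np.float32)  # "constant" bias filler, default value 0

    print(round(float(scale), 3))            # ~0.346 for conv1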
2.2.3 Pooling Layer
The pooling layer is easier to define:
    layer {
      name: "pool1"
      type: "Pooling"
      pooling_param {
        kernel_size: 2
        stride: 2
        pool: MAX
      }
      bottom: "conv1"
      top: "pool1"
    }
This performs max pooling with a kernel size of 2 and a stride of 2, so neighboring pooling regions do not overlap.
Similarly, you can write the second convolution and pooling layers; see `$CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt` for the details.
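To see how the spatial dimensions evolve through these layers, here is a small Python sketch (independent of Caffe) applying the standard output-size formula; the resulting shapes match the `Top shape` lines Caffe prints at startup (see the training log later on this page):

    def out_size(size, kernel, stride=1, pad=0):
        # Standard output-size formula shared by convolution and pooling.
        return (size + 2 * pad - kernel) // stride + 1

    h = 28                    # MNIST input: 1x28x28
    h = out_size(h, 5)        # conv1: 5x5 kernel, stride 1 -> 24  (blob 20x24x24)
    h = out_size(h, 2, 2)     # pool1: 2x2 max pool, stride 2 -> 12 (blob 20x12x12)
    h = out_size(h, 5)        # conv2: 5x5 kernel, stride 1 -> 8   (blob 50x8x8)
    h = out_size(h, 2, 2)     # pool2: 2x2 max pool, stride 2 -> 4  (blob 50x4x4)

    print(h, 50 * h * h)      # 4, 800 -> 800 values are fed into the fully connected layer ip1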
2.2.4 Fully Connected Layer
Writing a fully connected layer is also simple:
    layer {
      name: "ip1"
      type: "InnerProduct"
      param { lr_mult: 1 }
      param { lr_mult: 2 }
      inner_product_param {
        num_output: 500
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
      bottom: "pool2"
      top: "ip1"
    }
This defines a fully connected layer (known in Caffe as an `InnerProduct` layer) with 500 outputs. All the other lines should look familiar from the layers above, right?
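Functionally, an inner-product layer is just a matrix-vector product plus a bias. A minimal NumPy sketch of what ip1 computes for a single example (the 800 comes from flattening the 50x4x4 `pool2` blob):

    import numpy as np

    x = np.random.rand(800).astype(np.float32)        # flattened pool2 blob: 50 * 4 * 4 = 800
    W = np.random.randn(500, 800).astype(np.float32)  # the layer's learned weights
    b = np.zeros(500, dtype=np.float32)               # the layer's learned biases

    ip1 = W.dot(x) + b                                # one fully connected (inner product) pass
    print(ip1.shape)                                  # (500,)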
2.2.5 ReLU Layer
The ReLU layer is also simple:
    layer {
      name: "relu1"
      type: "ReLU"
      bottom: "ip1"
      top: "ip1"
    }
Since ReLU is an element-wise operation, we can compute it *in-place* to save memory. This is achieved by simply giving the same name to the bottom and top blobs. Of course, do NOT use duplicated blob names for other layer types!
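A quick NumPy illustration of why an in-place ReLU is safe: each output element depends only on the corresponding input element, so the input buffer can simply be overwritten:

    import numpy as np

    ip1 = np.array([-1.5, 0.0, 2.3, -0.2], dtype=np.float32)  # pretend ip1 activations
    np.maximum(ip1, 0, out=ip1)   # ReLU computed in place: the same buffer is overwritten
    print(ip1)                    # [0.  0.  2.3 0. ]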
After the ReLU layer, we write another innerproduct layer:
    layer {
      name: "ip2"
      type: "InnerProduct"
      param { lr_mult: 1 }
      param { lr_mult: 2 }
      inner_product_param {
        num_output: 10
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
      bottom: "ip1"
      top: "ip2"
    }
2.2.6 Loss Layer
Finally, the loss layer:
    layer {
      name: "loss"
      type: "SoftmaxWithLoss"
      bottom: "ip2"
      bottom: "label"
    }
The `softmax_loss` layer implements both the softmax and the multinomial logistic loss (which saves time and improves numerical stability). It takes two blobs, the first being the prediction and the second being the `label` provided by the data layer (remember it?). It does not produce any output; all it does is compute the loss value, report it when backpropagation starts, and initiate the gradient with respect to `ip2`. This is where all magic starts.
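To make the loss concrete, here is a small NumPy sketch of the combined computation for one example (softmax over the 10 `ip2` scores, then the negative log-probability of the true label); subtracting the maximum before exponentiating is the usual numerical-stability trick mentioned above:

    import numpy as np

    ip2 = np.random.randn(10).astype(np.float32)  # the 10 class scores produced by ip2
    label = 3                                     # ground-truth digit from the data layer

    shifted = ip2 - ip2.max()                     # shift by the max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()

    loss = -np.log(probs[label])                  # multinomial logistic loss for this example
    print(loss)                                   # Caffe averages this value over the batch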
2.2.7 Note: Layer Rules
A layer definition can include rules for whether and when the layer is included in the network, for example:
    layer {
      // ...layer definition...
      include: { phase: TRAIN }
    }
This is a rule that controls layer inclusion in the network based on the network's current state.
You can refer to `$CAFFE_ROOT/src/caffe/proto/caffe.proto` for more information about layer rules and model schema.
In the example above, this layer will be included only in the `TRAIN` phase.
If we change `TRAIN` to `TEST`, the layer will be used only in the test phase.
By default, a layer has no rules, in which case it is always included in the network.
Thus, `lenet_train_test.prototxt` has two `DATA` layers defined (with different `batch_size`), one for the training phase and one for the testing phase.
Also, there is an `Accuracy` layer which is included only in the `TEST` phase, reporting the model accuracy every `test_interval` iterations (500 here), as defined in `lenet_solver.prototxt`.
The complete, annotated network definition is as follows:
    # Network name
    name: "LeNet"

    # Training data layer
    #   source: mnist_train_lmdb, batch_size: 64
    #   outputs: data blob, label blob
    #   transform: scale normalization, 0.00390625 = 1/256
    layer {
      name: "mnist"
      type: "Data"
      top: "data"
      top: "label"
      include {
        phase: TRAIN
      }
      transform_param {
        scale: 0.00390625
      }
      data_param {
        source: "examples/mnist/mnist_train_lmdb"
        batch_size: 64
        backend: LMDB
      }
    }

    # Test data layer
    #   source: mnist_test_lmdb, batch_size: 100
    #   outputs: data blob, label blob
    #   transform: scale normalization
    layer {
      name: "mnist"
      type: "Data"
      top: "data"
      top: "label"
      include {
        phase: TEST
      }
      transform_param {
        scale: 0.00390625
      }
      data_param {
        source: "examples/mnist/mnist_test_lmdb"
        batch_size: 100
        backend: LMDB
      }
    }

    # Convolution layer conv1
    #   input:  data blob
    #   output: conv1 blob
    #   parameters: 20 feature maps with 5x5 kernels, stride 1;
    #     weights initialized with xavier, biases with constant (default 0)
    #   learning rates: weights at 1x the base rate base_lr, biases at 2x
    layer {
      name: "conv1"
      type: "Convolution"
      bottom: "data"
      top: "conv1"
      param {
        lr_mult: 1
      }
      param {
        lr_mult: 2
      }
      convolution_param {
        num_output: 20
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
    }

    # Pooling layer pool1
    #   input:  conv1 blob
    #   output: pool1 blob
    #   max pooling with a 2x2 kernel and stride 2
    layer {
      name: "pool1"
      type: "Pooling"
      bottom: "conv1"
      top: "pool1"
      pooling_param {
        pool: MAX
        kernel_size: 2
        stride: 2
      }
    }

    # Convolution layer conv2: 50 feature maps, 5x5 kernels, stride 1
    layer {
      name: "conv2"
      type: "Convolution"
      bottom: "pool1"
      top: "conv2"
      param {
        lr_mult: 1
      }
      param {
        lr_mult: 2
      }
      convolution_param {
        num_output: 50
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
    }

    # Pooling layer pool2: max pooling, 2x2 kernel, stride 2
    layer {
      name: "pool2"
      type: "Pooling"
      bottom: "conv2"
      top: "pool2"
      pooling_param {
        pool: MAX
        kernel_size: 2
        stride: 2
      }
    }

    # Fully connected layer ip1
    #   input:  pool2 blob
    #   output: ip1 blob
    #   500 outputs; weights initialized with xavier, biases with constant (default 0)
    #   learning rates: weights at base_lr, biases at base_lr * 2
    layer {
      name: "ip1"
      type: "InnerProduct"
      bottom: "pool2"
      top: "ip1"
      param {
        lr_mult: 1
      }
      param {
        lr_mult: 2
      }
      inner_product_param {
        num_output: 500
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
    }

    # Nonlinear activation layer relu1
    #   input:  ip1 blob
    #   output: ip1 blob (still ip1: ReLU works element-wise, so computing
    #   it in place saves memory)
    layer {
      name: "relu1"
      type: "ReLU"
      bottom: "ip1"
      top: "ip1"
    }

    # Fully connected layer ip2
    #   input:  ip1 blob
    #   output: ip2 blob (the network output used for the final prediction)
    layer {
      name: "ip2"
      type: "InnerProduct"
      bottom: "ip1"
      top: "ip2"
      param {
        lr_mult: 1
      }
      param {
        lr_mult: 2
      }
      inner_product_param {
        num_output: 10
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
    }

    # Accuracy layer, used only in the TEST phase
    #   inputs: ip2 blob, label blob
    #   output: accuracy
    layer {
      name: "accuracy"
      type: "Accuracy"
      bottom: "ip2"
      bottom: "label"
      top: "accuracy"
      include {
        phase: TEST
      }
    }

    # Loss layer: SoftmaxWithLoss
    #   inputs: ip2 blob, label blob
    #   output: loss blob, the final loss
    layer {
      name: "loss"
      type: "SoftmaxWithLoss"
      bottom: "ip2"
      bottom: "label"
      top: "loss"
    }
3. Defining the MNIST Solver
Check out the comments explaining each line in the prototxt `$CAFFE_ROOT/examples/mnist/lenet_solver.prototxt`:
    # The train/test net protocol buffer definition
    net: "examples/mnist/lenet_train_test.prototxt"
    # test_iter specifies how many forward passes the test should carry out.
    # In the case of MNIST, we have test batch size 100 and 100 test iterations,
    # covering the full 10,000 testing images.
    test_iter: 100
    # Carry out testing every 500 training iterations.
    test_interval: 500
    # The base learning rate, momentum and the weight decay of the network.
    base_lr: 0.01
    momentum: 0.9
    weight_decay: 0.0005
    # The learning rate policy
    lr_policy: "inv"
    gamma: 0.0001
    power: 0.75
    # Display every 100 iterations
    display: 100
    # The maximum number of iterations
    max_iter: 10000
    # snapshot intermediate results
    snapshot: 5000
    snapshot_prefix: "examples/mnist/lenet"
    # solver mode: CPU or GPU
    solver_mode: GPU
In summary: `test_iter: 100` together with the test batch size of 100 covers the full 10,000 test images on every test pass; testing runs every 500 training iterations; the base learning rate is 0.01 with momentum 0.9 and weight decay 0.0005; the `inv` policy gradually decays the learning rate over the 10,000 training iterations; progress is printed every 100 iterations; and a model snapshot is written every 5,000 iterations. The solver runs on the GPU here.
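As a quick check of the `inv` policy (Caffe decays the rate as `base_lr * (1 + gamma * iter) ^ (-power)`), the following Python snippet reproduces the learning rate that the training log below reports at iteration 100:

    base_lr, gamma, power = 0.01, 0.0001, 0.75

    def inv_lr(iteration):
        # Learning rate under the "inv" policy with the settings above.
        return base_lr * (1.0 + gamma * iteration) ** (-power)

    print(inv_lr(100))    # ~0.00992565, matching "Iteration 100, lr = 0.00992565" in the log
    print(inv_lr(10000))  # the (much lower) rate reached at the final iteration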
4. Training and Testing the Model
After writing the network definition protobuf and the solver protobuf, simply run `train_lenet.sh`, or equivalently:
    cd $CAFFE_ROOT
    ./examples/mnist/train_lenet.sh
`train_lenet.sh` is a simple script, but here is a quick explanation: the main tool for training is the `caffe` binary with the action `train` and the solver protobuf text file as its argument.
When you run the code, you will see a lot of messages flying by like this:

    I1203 net.cpp:66] Creating Layer conv1
    I1203 net.cpp:76] conv1 <- data
    I1203 net.cpp:101] conv1 -> conv1
    I1203 net.cpp:116] Top shape: 20 24 24
    I1203 net.cpp:127] conv1 needs backward computation.

These messages tell you the details about each layer, its connections and its output shape, which may be helpful in debugging. After the initialization, the training will start:

    I1203 net.cpp:142] Network initialization done.
    I1203 solver.cpp:36] Solver scaffolding done.
    I1203 solver.cpp:44] Solving LeNet

Based on the solver setting, we will print the training loss function every 100 iterations, and test the network every 500 iterations. You will see messages like this:

    I1203 solver.cpp:204] Iteration 100, lr = 0.00992565
    I1203 solver.cpp:66] Iteration 100, loss = 0.26044
    ...
    I1203 solver.cpp:84] Testing net
    I1203 solver.cpp:111] Test score #0: 0.9785
    I1203 solver.cpp:111] Test score #1: 0.0606671

For each training iteration, `lr` is the learning rate of that iteration, and `loss` is the training function. For the output of the testing phase, score 0 is the accuracy, and score 1 is the testing loss function.

And after a few minutes, you are done!

    I1203 solver.cpp:84] Testing net
    I1203 solver.cpp:111] Test score #0: 0.9897
    I1203 solver.cpp:111] Test score #1: 0.0324599
    I1203 solver.cpp:126] Snapshotting to lenet_iter_10000
    I1203 solver.cpp:133] Snapshotting solver state to lenet_iter_10000.solverstate
    I1203 solver.cpp:78] Optimization Done.

The final model, stored as a binary protobuf file, is written to

    lenet_iter_10000

which you can deploy as a trained model in your application, if you are training on a real-world application dataset.
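If the Python bindings (pycaffe) are built, the same run can also be driven from Python instead of the shell script; a minimal sketch, assuming the default example paths and that it is launched from `$CAFFE_ROOT`:

    import caffe

    caffe.set_mode_gpu()   # or caffe.set_mode_cpu() if no GPU is available

    # SGDSolver reads the net definition and all hyper-parameters from the solver prototxt.
    solver = caffe.SGDSolver('examples/mnist/lenet_solver.prototxt')

    # Run the full optimization: max_iter iterations, with testing and snapshots as configured.
    solver.solve()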
To switch the computation between CPU and GPU, you only need to change the `solver_mode` field in `lenet_solver.prototxt`; the field is an enum whose underlying values are 0 for CPU and 1 for GPU:

    # solver mode: CPU or GPU
    solver_mode: CPU
How do you reduce the learning rate at fixed steps instead? Have a look at `lenet_multistep_solver.prototxt`.