【TensorFlow】Notes 3: The MNIST Digit Recognition Problem
Table of Contents
- I. MNIST Data Processing
- 1. Dataset Overview
- 2. Getting the Data
- II. Training the Neural Network and Comparing Different Models
- 1. Training a Neural Network with TF
- 2. Using Validation Data to Judge Model Performance
- 3. Comparing Different Models
- III. Variable Management
- 1. tf.get_variable()
- 2. Managing with tf.variable_scope()
- 3. Improving the Forward Propagation
- IV. TF Model Persistence
- 1. Implementing Persistence in Code
- (1) Saving ckpt files
- (2) Loading a saved TF model
- (3) Saving or loading a subset of variables
- (4) Renaming variables when saving or loading
- (5) Example
- (6) Saving everything in one file
- 2. Persistence Mechanism and Data Format
- V. A Best-Practice TF Example
- 1. mnist_inference.py
- 2. mnist_train.py
- 3. mnist_eval.py
I. MNIST Data Processing

1. Dataset Overview

The MNIST dataset is a subset of the NIST dataset. It contains 60,000 images as training data and 10,000 images as test data. Every image in MNIST represents one of the digits 0-9. All images are 28×28 pixels, and the digit always appears in the center of the image.

2. Getting the Data

```python
from tensorflow.examples.tutorials.mnist import input_data

path = './datasets/MNIST_DATA'
mnist = input_data.read_data_sets(path, one_hot=True)

print("Training data size: ", mnist.train.num_examples)
print("Validating data size: ", mnist.validation.num_examples)
print("Testing data size: ", mnist.test.num_examples)

# mnist.train.next_batch reads a small subset of the training data
# to be used as one training batch.
batch_size = 100
xs, ys = mnist.train.next_batch(batch_size)
print("X shape: ", xs.shape)
print("Y shape: ", ys.shape)
```

Output:

```
Training data size:  55000
Validating data size:  5000
Testing data size:  10000
X shape:  (100, 784)
Y shape:  (100, 10)
```

Each element of the pixel matrix takes a value in [0, 1], where 0 represents the white background and 1 represents the black foreground.

The class returned by input_data.read_data_sets provides the mnist.train.next_batch function, which reads a small subset of the training data to serve as one training batch.
II. Training the Neural Network and Comparing Different Models

1. Training a Neural Network with TF

In terms of network structure, deep learning needs activation functions to make the model non-linear, and one or more hidden layers to make the network deeper, so that it can handle complex problems.

When training the network, one usually uses an exponentially decaying learning rate, regularization to avoid overfitting, and a moving-average model to make the final model more robust.
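Before walking through the full program, the short sketch below shows these three techniques in isolation (this sketch is mine, not part of the original program; all shapes and constants in it are illustrative assumptions). The complete training program from the notes follows after it.

```python
import tensorflow as tf

# Sketch (TF 1.x): exponentially decaying learning rate, L2 regularization,
# and an exponential moving average (EMA) of a parameter.
# The shapes and constants below are illustrative, not taken from the program that follows.
global_step = tf.Variable(0, trainable=False)

learning_rate = tf.train.exponential_decay(
    0.8,          # base learning rate
    global_step,  # current training step
    1000,         # decay once every 1000 steps
    0.99)         # decay rate

w = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1), name="w")
l2_loss = tf.contrib.layers.l2_regularizer(0.0001)(w)   # L2 penalty on the weights

ema = tf.train.ExponentialMovingAverage(0.99, global_step)
ema_op = ema.apply([w])   # maintains a shadow copy of w

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([learning_rate, l2_loss]))   # current learning rate and L2 penalty
    sess.run(ema_op)
    print(sess.run(ema.average(w)).shape)       # the shadow variable has the same shape as w
```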
```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

path = './datasets/MNIST_DATA'

# MNIST dataset constants.
INPUT_NODE = 784   # 28*28
OUT_NODE = 10      # digits 0~9

# Network hyperparameters.
LAYER1_NODE = 500             # one hidden layer with 500 nodes
BATCH_SIZE = 100
LEARNING_RATE_BASE = 0.8      # base learning rate
LEARNING_RATE_DECAY = 0.99    # learning-rate decay rate
REGULARIZATION_RATE = 0.0001  # coefficient of the regularization term in the loss
TRAINING_STEPS = 30000        # number of training steps
MOVING_AVERAGE_DECAY = 0.99   # moving-average decay rate


# A helper function that computes the forward propagation result given the input and
# all parameters. It defines a three-layer fully connected network with ReLU activation:
# the hidden layer provides the multi-layer structure and ReLU provides the non-linearity.
# It also accepts a class that computes parameter averages, which makes it easy to use
# the moving-average model at test time.
def inference(input_tensor, avg_class, weights1, biases1, weights2, biases2):
    # When no moving-average class is provided, use the current parameter values.
    if avg_class == None:
        # Forward propagation of the hidden layer with ReLU activation.
        layer1 = tf.nn.relu(tf.matmul(input_tensor, weights1) + biases1)
        # Forward propagation of the output layer.
        # The loss function computes softmax together with the cross entropy, so no
        # activation is needed here. Leaving out softmax does not change the prediction,
        # because prediction only compares the relative size of the output nodes.
        return tf.matmul(layer1, weights2) + biases2
    else:
        # First compute the moving averages of the variables with avg_class.average,
        # then compute the forward propagation result.
        layer1 = tf.nn.relu(tf.matmul(input_tensor, avg_class.average(weights1)) + avg_class.average(biases1))
        return tf.matmul(layer1, avg_class.average(weights2)) + avg_class.average(biases2)


# The training process.
def train(mnist):
    x = tf.placeholder(tf.float32, [None, INPUT_NODE], name='x-input')
    y_ = tf.placeholder(tf.float32, [None, OUT_NODE], name='y-input')

    # Hidden-layer parameters.
    weights1 = tf.Variable(tf.truncated_normal([INPUT_NODE, LAYER1_NODE], stddev=0.1))
    biases1 = tf.Variable(tf.constant(0.1, shape=[LAYER1_NODE]))
    # Output-layer parameters.
    weights2 = tf.Variable(tf.truncated_normal([LAYER1_NODE, OUT_NODE], stddev=0.1))
    biases2 = tf.Variable(tf.constant(0.1, shape=[OUT_NODE]))

    # Forward propagation with the current parameter values. The moving-average class
    # is None here, so the function does not use the averaged parameters.
    y = inference(x, None, weights1, biases1, weights2, biases2)

    # Variable that stores the number of training steps. It does not need a moving
    # average, so it is declared as not trainable (trainable=False). The step counter
    # is usually declared as a non-trainable variable.
    global_step = tf.Variable(0, trainable=False)

    # Initialize the moving-average class with the decay rate and the step counter.
    # Passing the step counter speeds up the updates of the averages early in training.
    variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    # Apply the moving average to all variables that represent network parameters.
    # tf.trainable_variables returns the elements of GraphKeys.TRAINABLE_VARIABLES,
    # i.e. all variables that were not declared with trainable=False.
    variable_averages_op = variable_averages.apply(tf.trainable_variables())

    # Forward propagation result using the moving averages.
    # The moving average does not change the variable itself; it keeps a shadow variable
    # with the averaged value, so the average function must be called explicitly.
    average_y = inference(x, variable_averages, weights1, biases1, weights2, biases2)

    # Cross-entropy loss: sparse_softmax_cross_entropy_with_logits().
    # The first argument is the forward propagation result without the softmax layer,
    # the second is the correct answer of the training data.
    # tf.argmax() gives the class index of the correct answer.
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    # Average cross entropy over the current batch.
    cross_entropy_mean = tf.reduce_mean(cross_entropy)

    # L2 regularization.
    regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
    # Usually only the weights, not the biases, are regularized.
    regularization = regularizer(weights1) + regularizer(weights2)
    # Loss function.
    loss = cross_entropy_mean + regularization

    # Exponentially decaying learning rate.
    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE,                     # base learning rate, decayed as training progresses
        global_step,                            # current training step
        mnist.train.num_examples / BATCH_SIZE,  # steps needed to go through all training data once
        LEARNING_RATE_DECAY)                    # learning-rate decay rate

    # Optimize the loss with tf.train.GradientDescentOptimizer.
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

    # Every training step needs to update the parameters through back propagation and
    # also update the moving average of every parameter. To run several operations at
    # once, TF provides tf.control_dependencies and tf.group.
    # train_op = tf.group(train_step, variable_averages_op)  # equivalent
    with tf.control_dependencies([train_step, variable_averages_op]):
        train_op = tf.no_op(name='train')

    # Check whether the forward propagation with the moving-average model is correct.
    # tf.argmax(average_y, 1) computes the predicted answer of every example.
    # average_y is a batch_size*10 array; each row is the forward propagation result of
    # one example. The second argument "1" means the max is taken within each row,
    # giving a 1-D array of length batch_size whose entries are the predicted digits.
    # tf.equal compares the two tensors element-wise and returns True/False.
    correct_prediction = tf.equal(tf.argmax(average_y, 1), tf.argmax(y_, 1))
    # Cast the booleans to floats and average them: the accuracy on this batch of data.
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Create a session and start training.
    with tf.Session() as sess:
        tf.global_variables_initializer().run()

        # Validation data, used during training to roughly judge when to stop
        # and how well the training is going.
        validate_feed = {x: mnist.validation.images, y_: mnist.validation.labels}
        # Test data, used only as the final evaluation of the model.
        test_feed = {x: mnist.test.images, y_: mnist.test.labels}

        # Train iteratively.
        for i in range(TRAINING_STEPS):
            # Every 1000 steps, report the result on the validation set.
            if i % 1000 == 0:
                # Accuracy of the moving-average model on the validation data.
                # MNIST is small enough that all validation data can be processed at once;
                # for simplicity this example does not split it into smaller batches.
                # With a more complex model or a larger validation set, such a large batch
                # could make the computation too slow or even run out of memory.
                validate_acc = sess.run(accuracy, feed_dict=validate_feed)
                print("After %d training step(s), validation accuracy "
                      "using average model is %g " % (i, validate_acc))

            # Generate the training batch for this step and run one training step.
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            sess.run(train_op, feed_dict={x: xs, y_: ys})

        # After training, report the final accuracy on the test data.
        test_acc = sess.run(accuracy, feed_dict=test_feed)
        print("After %d training step(s), test accuracy "
              "using average model is %g" % (TRAINING_STEPS, test_acc))


def main(argv=None):
    mnist = input_data.read_data_sets(path, one_hot=True)
    train(mnist)


if __name__ == "__main__":
    tf.app.run()
```

Output:
```
(TensorFlow deprecation warnings, dataset-extraction messages, and GPU device logs omitted)
After 0 training step(s), validation accuracy using average model is 0.1332
After 1000 training step(s), validation accuracy using average model is 0.9774
After 2000 training step(s), validation accuracy using average model is 0.9806
After 3000 training step(s), validation accuracy using average model is 0.982
After 4000 training step(s), validation accuracy using average model is 0.983
After 5000 training step(s), validation accuracy using average model is 0.9836
After 6000 training step(s), validation accuracy using average model is 0.9834
After 7000 training step(s), validation accuracy using average model is 0.9828
After 8000 training step(s), validation accuracy using average model is 0.9832
After 9000 training step(s), validation accuracy using average model is 0.9826
After 10000 training step(s), validation accuracy using average model is 0.9824
After 11000 training step(s), validation accuracy using average model is 0.9822
After 12000 training step(s), validation accuracy using average model is 0.9842
After 13000 training step(s), validation accuracy using average model is 0.9836
After 14000 training step(s), validation accuracy using average model is 0.9842
After 15000 training step(s), validation accuracy using average model is 0.9834
After 16000 training step(s), validation accuracy using average model is 0.9838
After 17000 training step(s), validation accuracy using average model is 0.9844
After 18000 training step(s), validation accuracy using average model is 0.984
After 19000 training step(s), validation accuracy using average model is 0.9834
After 20000 training step(s), validation accuracy using average model is 0.9838
After 21000 training step(s), validation accuracy using average model is 0.9844
After 22000 training step(s), validation accuracy using average model is 0.984
After 23000 training step(s), validation accuracy using average model is 0.984
After 24000 training step(s), validation accuracy using average model is 0.9844
After 25000 training step(s), validation accuracy using average model is 0.9848
After 26000 training step(s), validation accuracy using average model is 0.9838
After 27000 training step(s), validation accuracy using average model is 0.9844
After 28000 training step(s), validation accuracy using average model is 0.984
After 29000 training step(s), validation accuracy using average model is 0.9842
After 30000 training step(s), test accuracy using average model is 0.9842
```

Early in training, the model's performance on the validation set keeps improving; later the accuracy starts to fluctuate, which indicates that the model is already close to a minimum, and the iteration ends.
2. Using Validation Data to Judge Model Performance

Initial hyperparameters: at the beginning of the program above, seven different hyperparameters are set: the initial learning rate, the learning-rate decay rate, the number of hidden-layer nodes, the number of training steps, the batch size, the regularization coefficient, and the moving-average decay rate.

How to choose them: in general they have to be tuned experimentally.

The difficulty: although the final quality of a model is judged on the test data, the test data cannot be used directly for tuning; otherwise the model would overfit the test set and lose its ability to judge unseen data. The test data must therefore stay unseen during training.

Solutions:

- Split part of the training data off as a validation set and use it to evaluate the model under different hyperparameter settings. With massive amounts of data, this validation-set approach is the one generally used (see the sketch after this list).
- Use cross-validation. However, training a neural network is already time-consuming, so cross-validation would take a very long time and is rarely used.
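As a minimal sketch of the first option: read_data_sets can carve the validation set directly out of the training data. The validation_size value below is an illustrative assumption, not a setting taken from the original program.

```python
from tensorflow.examples.tutorials.mnist import input_data

# Sketch: hold out 5,000 of the 60,000 training images as a validation set.
mnist = input_data.read_data_sets('./datasets/MNIST_DATA',
                                  one_hot=True,
                                  validation_size=5000)

print(mnist.train.num_examples)       # 55000 examples used for gradient updates
print(mnist.validation.num_examples)  # 5000 examples used only for tuning hyperparameters
print(mnist.test.num_examples)        # 10000 examples reserved for the final evaluation
```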
Accuracy of the model on the validation data and on the test data at different numbers of training steps:

==> every 1000 steps, print the accuracy of the moving-average model on both the validation data and the test data (a sketch of the change is shown below).
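A sketch of that change inside the training loop of train() above (my reconstruction, not the exact code used to produce the output below; it assumes accuracy, validate_feed and test_feed are defined as in the previous program):

```python
# Inside "for i in range(TRAINING_STEPS):" of the train() function above.
if i % 1000 == 0:
    validate_acc = sess.run(accuracy, feed_dict=validate_feed)
    test_acc = sess.run(accuracy, feed_dict=test_feed)
    print("After %d training step(s), validation accuracy using average model is %g, "
          "test accuracy using average model is %g" % (i, validate_acc, test_acc))
```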
輸出結(jié)果
After 0 training step(s), validation accuracy using average model is 0.0592, test accuracy using average model is 0.058 After 1000 training step(s), validation accuracy using average model is 0.9754, test accuracy using average model is 0.9764 After 2000 training step(s), validation accuracy using average model is 0.981, test accuracy using average model is 0.981 After 3000 training step(s), validation accuracy using average model is 0.982, test accuracy using average model is 0.9824 After 4000 training step(s), validation accuracy using average model is 0.9842, test accuracy using average model is 0.9825 After 5000 training step(s), validation accuracy using average model is 0.984, test accuracy using average model is 0.9841 After 6000 training step(s), validation accuracy using average model is 0.9852, test accuracy using average model is 0.984 .... After 25000 training step(s), validation accuracy using average model is 0.9842, test accuracy using average model is 0.9839 After 26000 training step(s), validation accuracy using average model is 0.984, test accuracy using average model is 0.983 After 27000 training step(s), validation accuracy using average model is 0.9848, test accuracy using average model is 0.9841 After 28000 training step(s), validation accuracy using average model is 0.9846, test accuracy using average model is 0.9837 After 29000 training step(s), validation accuracy using average model is 0.9846, test accuracy using average model is 0.9845分析:
驗(yàn)證數(shù)據(jù)和測(cè)試數(shù)據(jù)上的準(zhǔn)確率趨勢(shì)基本一致,且他們的相關(guān)系數(shù)(correlation coefficient)大于0.9999。意味著可以通過(guò)模型在驗(yàn)證數(shù)據(jù)上的表現(xiàn)來(lái)判斷一個(gè)模型的優(yōu)劣。
==》前提:驗(yàn)證數(shù)據(jù)分布可以很好代表測(cè)試數(shù)據(jù)分布。
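Such a correlation can be checked with a few lines of NumPy; the accuracy arrays below are placeholders, not the values from the run above.

```python
import numpy as np

# Placeholder accuracy curves recorded every 1000 steps; substitute the real
# validation and test accuracies collected during training.
val_acc = np.array([0.9754, 0.981, 0.982, 0.9842, 0.984, 0.9852])
test_acc = np.array([0.9764, 0.981, 0.9824, 0.9825, 0.9841, 0.984])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is the
# correlation coefficient between the two curves.
print(np.corrcoef(val_acc, test_acc)[0, 1])
```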
3. Comparing Different Models

In the design of the network structure, activation functions and hidden layers are needed; when optimizing the network, an exponentially decaying learning rate, a regularized loss function, and a moving-average model can be used in addition.

Impact of the different optimization methods:

- The moving-average model and the exponentially decaying learning rate both limit how fast the parameters are updated. On this dataset the model converges very quickly, so their impact here is small.
- For more complex problems, where training does not converge as quickly, the moving-average model and the exponentially decaying learning rate play a much bigger role.
- Regularization brings a more noticeable improvement.

Summary: these optimization methods do improve the model, and the more complex the model, the more noticeable the effect.
III. Variable Management

When the network structure is complex and has many parameters, a better way of passing and managing them is needed.

TensorFlow provides a mechanism for creating or retrieving a variable through its name, so that different functions can use a variable directly by name instead of passing it around as an argument.

==> Implementation: tf.get_variable() and tf.variable_scope()

1. tf.get_variable()

tf.get_variable() creates or retrieves a variable.

It retrieves a variable through its name; when used to create a variable, it is essentially equivalent to tf.Variable():

```python
v = tf.get_variable("v", shape=[1], initializer=tf.constant_initializer(1.0))
v = tf.Variable(tf.constant(1.0, shape=[1]), name="v")
```

TensorFlow provides 7 different initializer functions:
| Initializer | Function | Main parameters |
| --- | --- | --- |
| tf.constant_initializer | Initializes the variable to a given constant | the constant value |
| tf.random_normal_initializer | Initializes the variable to random values drawn from a normal distribution | mean and standard deviation of the normal distribution |
| tf.truncated_normal_initializer | Initializes the variable to random values drawn from a normal distribution, but values more than two standard deviations from the mean are re-drawn | mean and standard deviation of the normal distribution |
| tf.random_uniform_initializer | Initializes the variable to random values drawn from a uniform distribution | minimum and maximum values |
| tf.uniform_unit_scaling_initializer | Initializes the variable to values drawn from a uniform distribution without affecting the magnitude of the output | factor (a coefficient multiplied into the random values) |
| tf.zeros_initializer | Sets the variable to all zeros | variable shape |
| tf.ones_initializer | Sets the variable to all ones | variable shape |
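For instance, two of these initializers used with tf.get_variable (a small sketch; the variable names and shapes are arbitrary examples):

```python
import tensorflow as tf

# Sketch: truncated-normal and zeros initializers from the table above.
w = tf.get_variable("w", shape=[784, 500],
                    initializer=tf.truncated_normal_initializer(stddev=0.1))
b = tf.get_variable("b", shape=[500],
                    initializer=tf.zeros_initializer())

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(b)[:5])   # first five entries, all 0.0
```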
Notes on tf.get_variable:

- The variable name is a required argument of tf.get_variable. The function first tries to create a variable with that name (for example v); if the creation fails (for instance because a variable with the same name already exists), the program raises an error. This prevents variables from being reused unintentionally.
- To retrieve an already created variable with tf.get_variable, a context manager has to be created with tf.variable_scope that explicitly states that, within this context, tf.get_variable should fetch variables that have already been created.
2. Managing with tf.variable_scope()

The following code shows how tf.variable_scope controls whether tf.get_variable fetches a variable that has already been created.

```python
# Create a variable named "v" inside the namespace "foo".
with tf.variable_scope("foo"):
    v = tf.get_variable("v", [1], initializer=tf.constant_initializer(1.0))

# A variable named "v" already exists in namespace "foo",
# so the following code raises an error:
with tf.variable_scope("foo"):
    v = tf.get_variable("v", [1])

# When the context manager is created with reuse=True,
# tf.get_variable fetches the already created variable directly.
with tf.variable_scope("foo", reuse=True):
    v1 = tf.get_variable("v", [1])
    print(v == v1)  # True: v and v1 are the same TensorFlow variable

# With reuse=True, tf.variable_scope can only fetch variables that already exist.
# No variable "v" has been created in namespace "bar" yet,
# so the following code raises an error:
with tf.variable_scope("bar", reuse=True):
    v = tf.get_variable("v", [1])
```

Semantics of tf.get_variable under tf.variable_scope:
- When tf.variable_scope creates the context manager with reuse=None or reuse=False, tf.get_variable creates new variables; if a variable with the same name already exists, tf.get_variable raises an error. Note that tf.variable_scope contexts can be nested.
- When tf.variable_scope creates the context manager with reuse=True, every tf.get_variable inside that context fetches an already created variable, and raises an error if the variable does not exist.
- With variable management, variables no longer have to be passed as arguments between functions. When the network structure becomes more complex and has more parameters, this way of managing variables greatly improves the readability of the program.
Example of nested scopes:

```python
with tf.variable_scope("root"):
    # Get the value of reuse in the current context manager.
    print(tf.get_variable_scope().reuse)              # False
    with tf.variable_scope("foo", reuse=True):
        print(tf.get_variable_scope().reuse)          # True
        with tf.variable_scope("bar"):
            print(tf.get_variable_scope().reuse)      # True
            with tf.variable_scope("bar1"):
                print(tf.get_variable_scope().reuse)  # True
    print(tf.get_variable_scope().reuse)              # False
```

Output:

```
False
True
True
True
False
```

Conclusion: in nested scopes, a scope whose reuse argument is set to True reports True; a scope that does not specify reuse inherits the value from the enclosing scope, and at the outermost level the default is False.
Using tf.variable_scope to manage variable namespaces:

```python
v1 = tf.get_variable("v", [1])
print(v1.name)
# output: v:0
# "v" is the variable name;
# "0" means this variable is the first output of the operation that generates it.

with tf.variable_scope("foo"):
    v2 = tf.get_variable("v", [1])
    print(v2.name)
    # output: foo/v:0

with tf.variable_scope("foo"):
    with tf.variable_scope("bar"):
        v3 = tf.get_variable("v", [1])
        print(v3.name)
        # output: foo/bar/v:0
        # the namespace names are prepended to the variable name
    v4 = tf.get_variable("v1", [1])
    print(v4.name)
    # output: foo/v1:0

with tf.variable_scope("", reuse=True):
    v5 = tf.get_variable("foo/bar/v", [1])
    print(v5 == v3)
    # output: True
    v6 = tf.get_variable("foo/v1", [1])
    print(v6 == v4)
    # output: True
```

3. Improving the Forward Propagation
```python
def inference(input_tensor, reuse=False):
    # Variables and forward propagation of the first layer.
    with tf.variable_scope('layer1', reuse=reuse):
        # The reuse argument decides whether to create new variables or fetch the
        # existing ones. New variables are created the first time the network is built;
        # afterwards the function can simply be called with reuse=True, so the variables
        # no longer need to be passed in each time.
        weights = tf.get_variable("weights", [INPUT_NODE, LAYER1_NODE],
                                  initializer=tf.truncated_normal_initializer(stddev=0.1))
        biases = tf.get_variable("biases", [LAYER1_NODE],
                                 initializer=tf.constant_initializer(0.0))
        layer1 = tf.nn.relu(tf.matmul(input_tensor, weights) + biases)

    # Variables and forward propagation of the second layer.
    with tf.variable_scope('layer2', reuse=reuse):
        weights = tf.get_variable("weights", [LAYER1_NODE, OUT_NODE],
                                  initializer=tf.truncated_normal_initializer(stddev=0.1))
        biases = tf.get_variable("biases", [OUT_NODE],
                                 initializer=tf.constant_initializer(0.0))
        layer2 = tf.matmul(layer1, weights) + biases

    return layer2


x = tf.placeholder(tf.float32, [None, INPUT_NODE], name='x-input')
y = inference(x)

# When the trained network is later used for inference,
# it can be called directly as inference(new_x, True).
new_x = ...
new_y = inference(new_x, True)
```

IV. TF Model Persistence
To reuse training results, the trained model has to be saved so it can be loaded again later; that is, the neural network model needs to be persisted.

1. Implementing Persistence in Code

Implementation: the tf.train.Saver class

(1) Saving ckpt files

```python
import tensorflow as tf

# Declare two variables and compute their sum.
v1 = tf.Variable(tf.constant(1.0, shape=[1]), name="v1")
v2 = tf.Variable(tf.constant(2.0, shape=[1]), name="v2")
result = v1 + v2

init_op = tf.global_variables_initializer()
# Declare the tf.train.Saver class used to save the model.
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init_op)
    # Save the model to the file model.ckpt.
    saver.save(sess, "./model.ckpt")
```

TensorFlow models are generally stored in files with the .ckpt suffix. Although the program above specifies only a single file path, three files appear in that directory, because TensorFlow saves the structure of the computation graph and the parameter values on the graph separately:
- The first file, model.ckpt.meta, stores the structure of the TensorFlow computation graph.
- The second file, model.ckpt, stores the value of every variable in the program.
- The last file, checkpoint, keeps the list of all model files in the directory (the files are inspected in the sketch below).
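As a quick check, the sketch below (assuming the saving code above has already been run in the current directory) inspects the checkpoint file and the variables stored in model.ckpt:

```python
import tensorflow as tf

# The checkpoint file records the path of the most recent model in the directory.
ckpt_state = tf.train.get_checkpoint_state("./")
print(ckpt_state.model_checkpoint_path)       # e.g. ./model.ckpt

# NewCheckpointReader lists the variables and values stored in the ckpt file.
reader = tf.train.NewCheckpointReader("./model.ckpt")
print(reader.get_variable_to_shape_map())     # {'v1': [1], 'v2': [1]}
print(reader.get_tensor("v1"))                # [1.]
```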
(2) Loading a saved TF model

```python
import tensorflow as tf

# Load the model.
v1 = tf.Variable(tf.constant(1.0, shape=[1]), name="v1")
v2 = tf.Variable(tf.constant(2.0, shape=[1]), name="v2")
result = v1 + v2

saver = tf.train.Saver()

with tf.Session() as sess:
    saver.restore(sess, "./model.ckpt")
    print(sess.run(result))
    # output: [ 3.]
```

The loading code does not run the variable initialization; instead, the variable values are loaded from the saved model.
If you do not want to define the operations on the graph again, you can also load the persisted graph directly:

```python
import tensorflow as tf

# Load the computation graph.
saver = tf.train.import_meta_graph("./model.ckpt.meta")

with tf.Session() as sess:
    # Load all variables.
    saver.restore(sess, "./model.ckpt")
    # Fetch a tensor through its name.
    print(sess.run(tf.get_default_graph().get_tensor_by_name("add:0")))
    # output: [ 3.]
```

(3) Saving or loading a subset of variables
Scenario: you may have a previously trained five-layer network but now want to try a six-layer one. The parameters of the first five layers can then be loaded directly into the new model, and only the last layer has to be trained from scratch.

Implementation: when constructing the tf.train.Saver class, a list of the variables to save or load can be provided. For example, building the saver in the loading code with saver = tf.train.Saver([v1]) means that only variable v1 is loaded; running the loading code modified this way produces an uninitialized-variable error for v2:
```
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value v2
```
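A sketch of the transfer scenario described above: restore only the variables of the layers being reused and initialize the new layer separately. The scope names and the checkpoint path are illustrative assumptions, not code from the original notes.

```python
import tensorflow as tf

# Old, already-trained layer whose parameters will be reused.
with tf.variable_scope('layer1'):
    w1 = tf.get_variable("weights", [784, 500])
# New layer that will be trained from scratch.
with tf.variable_scope('new_layer'):
    w2 = tf.get_variable("weights", [500, 10])

# Saver that only restores the variables of the reused layer.
reuse_saver = tf.train.Saver(
    tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='layer1'))

with tf.Session() as sess:
    # Initialize only the new layer's variables...
    sess.run(tf.variables_initializer([w2]))
    # ...and load the reused layer from the old checkpoint (assumed to exist).
    reuse_saver.restore(sess, "./old_model.ckpt")
```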
(4) Renaming variables when saving or loading

```python
v11 = tf.Variable(tf.constant(1.0, shape=[1]), name="other-v1")
v22 = tf.Variable(tf.constant(2.0, shape=[1]), name="other-v2")

# Loading the model with a plain tf.train.Saver here would fail with a
# "variable not found" error.

# A dictionary that renames the variables makes it possible to load the original model.
# It states that the variable originally named "v1" should be loaded into v11
# (whose own name is "other-v1"), and "v2" into v22.
saver = tf.train.Saver({"v1": v11, "v2": v22})
```

The main purpose of renaming is to make it easy to use the moving averages of variables. In TensorFlow, the moving average of each variable is maintained by a shadow variable, so obtaining the moving average really means reading that shadow variable. If the shadow variable is mapped directly onto the variable itself when loading the model, the trained model can be used without calling any extra function to fetch the moving averages.
(5) Example

An example of saving a model that uses moving averages:

```python
import tensorflow as tf

# 1. Use a moving average.
vv = tf.Variable(0, dtype=tf.float32, name="v")
# Before the moving-average model is declared there is only one variable v,
# so the following loop prints only "v:0".
for variables in tf.global_variables():
    print(variables.name)
    # v:0

ema = tf.train.ExponentialMovingAverage(0.99)
maintain_average_op = ema.apply(tf.global_variables())
# After the moving-average model is declared, TensorFlow automatically creates a shadow variable.
for variables in tf.global_variables():
    print(variables.name)
    # v:0
    # v/ExponentialMovingAverage:0

# 2. Save the moving-average model.
saver = tf.train.Saver()
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    sess.run(tf.assign(vv, 10))
    sess.run(maintain_average_op)
    # Saving stores both variables: v:0 and v/ExponentialMovingAverage:0.
    saver.save(sess, "./model/model1.ckpt")
    print(sess.run([vv, ema.average(vv)]))
    # [10.0, 0.099999905]

# 3. Load the moving-average model.
vv1 = tf.Variable(0, dtype=tf.float32, name="v")
# Through renaming, the moving average of the original variable v is loaded directly into vv1.
saver = tf.train.Saver({"v/ExponentialMovingAverage": vv1})
with tf.Session() as sess:
    saver.restore(sess, "./model/model1.ckpt")
    print(sess.run(vv1))
```

To rename the moving-average variables when loading, the variables_to_restore function of tf.train.ExponentialMovingAverage generates the renaming dictionary that tf.train.Saver needs:
```python
import tensorflow as tf

v = tf.Variable(0, dtype=tf.float32, name="v")
ema = tf.train.ExponentialMovingAverage(0.99)

print(ema.variables_to_restore())
# {'v/ExponentialMovingAverage': <tf.Variable 'v:0' shape=() dtype=float32_ref>}

saver = tf.train.Saver(ema.variables_to_restore())
with tf.Session() as sess:
    saver.restore(sess, "./model/model1.ckpt")
    print(sess.run(v))
    # 0.099999905
```

(6) Saving everything in one file
To save the variable values and the graph structure together in a single file, TF provides the convert_variables_to_constants function, which converts the variables in the computation graph, together with their values, into constants, so that the whole model can be stored in one file.

```python
import tensorflow as tf
from tensorflow.python.framework import graph_util

v1 = tf.Variable(tf.constant(1.0, shape=[1]), name="v1")
v2 = tf.Variable(tf.constant(2.0, shape=[1]), name="v2")
result = v1 + v2

init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    # Export the GraphDef part of the current computation graph; it is all that is
    # needed to compute from the input layer to the output layer.
    graph_def = tf.get_default_graph().as_graph_def()
    # Convert the variables in the graph and their values into constants,
    # and remove the nodes that are not needed.
    output_graph_def = graph_util.convert_variables_to_constants(sess, graph_def, ["add"])
    # Write the exported model to a file.
    with tf.gfile.GFile("./model/combined_model.pb", "wb") as f:
        f.write(output_graph_def.SerializeToString())
```

```python
# Load the model.
import tensorflow as tf
from tensorflow.python.platform import gfile

with tf.Session() as sess:
    model_filename = "./model/combined_model.pb"
    # Read the saved model file and parse it into the corresponding GraphDef Protocol Buffer.
    with gfile.FastGFile(model_filename, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

    # Load the graph stored in graph_def into the current graph.
    # return_elements=["add:0"] gives the names of the tensors to return.
    # When saving, the name of the computation node was given ("add");
    # when loading, the name of the tensor is given ("add:0").
    result = tf.import_graph_def(graph_def, return_elements=["add:0"])
    print(sess.run(result))
    # [array([3.], dtype=float32)]
```

2. Persistence Mechanism and Data Format
- TensorFlow is a programming system that expresses computation as graphs; every computation in a TensorFlow program is expressed as a node on the computation graph.
- TensorFlow uses a meta graph (MetaGraph) to record the information of the nodes in the computation graph and the metadata needed to run them.
- The meta graph is defined by the MetaGraphDef Protocol Buffer, and the content of MetaGraphDef makes up the first file written during persistence ==> the .meta file.

The meta graph (MetaGraph) mainly records five categories of information.

The file that stores the MetaGraph information ends with .meta by default. It is a binary file and cannot be inspected directly, but TensorFlow provides the export_meta_graph function to export the MetaGraphDef Protocol Buffer in JSON format:

```python
import tensorflow as tf

v1 = tf.Variable(tf.constant(1.0, shape=[1]), name="v1")
v2 = tf.Variable(tf.constant(2.0, shape=[1]), name="v2")
result = v1 + v2

saver = tf.train.Saver()
saver.export_meta_graph("./model/model1.ckpt.meta.json", as_text=True)
```

The information stored in the meta graph:
- meta_info_def: records the metadata of the computation graph (version number, tags, and so on) and information about all of the operations used in the program.
- graph_def: records the nodes of the computation graph. Since meta_info_def already contains the information about every operation, graph_def only records how the operations are connected.
- saver_def: records the parameters needed to persist the model, such as the file name, the names of the save and restore operations, and the saving frequency.
- collection_def: the underlying implementation of the collections maintained in the computation graph; it is a mapping from collection names to collection contents.
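As a small sketch (reusing the graph defined above), the same four properties can also be inspected in Python by letting the saver's export_meta_graph return the MetaGraphDef instead of writing it to a file:

```python
import tensorflow as tf

v1 = tf.Variable(tf.constant(1.0, shape=[1]), name="v1")
v2 = tf.Variable(tf.constant(2.0, shape=[1]), name="v2")
result = v1 + v2
saver = tf.train.Saver()

# Without a filename, export_meta_graph just returns the MetaGraphDef protocol buffer.
meta_graph = saver.export_meta_graph()

print(meta_graph.meta_info_def.tensorflow_version)  # TF version that produced the graph
print(len(meta_graph.graph_def.node))               # number of nodes in the computation graph
print(meta_graph.saver_def.filename_tensor_name)    # e.g. save/Const:0
print(list(meta_graph.collection_def.keys()))       # e.g. ['trainable_variables', 'variables']
```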
V. A Best-Practice TF Example

- Separate the different functional modules: splitting training and evaluation into two independent programs makes each component more flexible. For example, the training program can keep writing out the latest trained model, while the evaluation program checks the accuracy of the newest model every so often; whenever a model performs better, it can be handed over to the product.
- In addition, the forward propagation is factored out into a separate library function. Since the forward propagation is needed both during training and during testing, wrapping it in a library function is convenient and guarantees that training and testing use exactly the same forward-propagation code.

1. mnist_inference.py

Defines the forward propagation and the parameters of the neural network.
```python
# -*- coding:utf-8 -*-
import tensorflow as tf

# 1. Parameters describing the network structure.
INPUT_NODE = 784   # 28*28
OUTPUT_NODE = 10
LAYER1_NODE = 500


# 2. Get a weight variable through tf.get_variable.
def get_weight_variable(shape, regularizer):
    weights = tf.get_variable("weights", shape,
                              initializer=tf.truncated_normal_initializer(stddev=0.1))
    # When a regularizer is given, add the regularization loss to the collection 'losses'.
    if regularizer != None:
        tf.add_to_collection('losses', regularizer(weights))
    return weights


# 3. Forward propagation of the network.
def inference(input_tensor, regularizer):
    with tf.variable_scope('layer1'):
        weights = get_weight_variable([INPUT_NODE, LAYER1_NODE], regularizer)
        biases = tf.get_variable("biases", [LAYER1_NODE],
                                 initializer=tf.constant_initializer(0.0))
        layer1 = tf.nn.relu(tf.matmul(input_tensor, weights) + biases)

    with tf.variable_scope('layer2'):
        weights = get_weight_variable([LAYER1_NODE, OUTPUT_NODE], regularizer)
        biases = tf.get_variable("biases", [OUTPUT_NODE],
                                 initializer=tf.constant_initializer(0.0))
        layer2 = tf.matmul(layer1, weights) + biases

    return layer2
```

2. mnist_train.py
Defines the training process of the neural network.

```python
# -*- coding:utf-8 -*-
import os
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the constants and the forward-propagation function defined in mnist_inference.py.
import mnist_inference

# 1. Parameters of the training process.
BATCH_SIZE = 100
LEARNING_RATE_BASE = 0.8
LEARNING_RATE_DECAY = 0.99
REGULARAZTION_RATE = 0.0001
TRANING_STEPS = 30000
MOVING_AVERAGE_DECAY = 0.99

MODEL_SAVE_PATH = "./model/"
MODEL_NAME = "model_mnist.ckpt"


# 2. The training process.
def train(mnist):
    # Input and output placeholders.
    x = tf.placeholder(tf.float32, [None, mnist_inference.INPUT_NODE], name='x-input')
    y_ = tf.placeholder(tf.float32, [None, mnist_inference.OUTPUT_NODE], name='y-input')

    regularizer = tf.contrib.layers.l2_regularizer(REGULARAZTION_RATE)
    y = mnist_inference.inference(x, regularizer)
    global_step = tf.Variable(0, trainable=False)

    # Loss function, learning rate, moving-average op and training op.
    variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    variable_averages_op = variable_averages.apply(tf.trainable_variables())

    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    cross_entropy_mean = tf.reduce_mean(cross_entropy)
    loss = cross_entropy_mean + tf.add_n(tf.get_collection('losses'))

    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE, global_step,
        mnist.train.num_examples / BATCH_SIZE,
        LEARNING_RATE_DECAY)
    train_step = tf.train.GradientDescentOptimizer(learning_rate)\
        .minimize(loss, global_step=global_step)
    with tf.control_dependencies([train_step, variable_averages_op]):
        train_op = tf.no_op(name='train')

    # Initialize the TensorFlow persistence class.
    saver = tf.train.Saver()
    with tf.Session() as sess:
        tf.global_variables_initializer().run()

        # Training loop.
        for i in range(TRANING_STEPS):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            _, loss_value, step = sess.run([train_op, loss, global_step],
                                           feed_dict={x: xs, y_: ys})
            if i % 1000 == 0:
                print("After %d training step(s), loss on training batch is %g." % (step, loss_value))
                saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), global_step=global_step)


def main(argv=None):
    mnist = input_data.read_data_sets("./datasets/MNIST_DATA", one_hot=True)
    train(mnist)


if __name__ == '__main__':
    tf.app.run()
```

Output:
```
(GPU device logs omitted)
After 1 training step(s), loss on training batch is 2.84809.
After 1001 training step(s), loss on training batch is 0.326219.
After 2001 training step(s), loss on training batch is 0.179183.
After 3001 training step(s), loss on training batch is 0.131504.
After 4001 training step(s), loss on training batch is 0.113809.
After 5001 training step(s), loss on training batch is 0.10388.
...
After 22001 training step(s), loss on training batch is 0.0408138.
After 23001 training step(s), loss on training batch is 0.0397483.
After 24001 training step(s), loss on training batch is 0.0349427.
After 25001 training step(s), loss on training batch is 0.0382821.
After 26001 training step(s), loss on training batch is 0.0368644.
After 27001 training step(s), loss on training batch is 0.0366197.
After 28001 training step(s), loss on training batch is 0.0426658.
After 29001 training step(s), loss on training batch is 0.0358222.
```

3. mnist_eval.py
Defines the evaluation process.

```python
# -*- coding:utf-8 -*-
import time
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

import mnist_inference
import mnist_train

# Load the newest model every 10 seconds and test its accuracy on the validation data.
# Interval between two evaluations, in seconds.
EVAL_INTERVAL_SECS = 10


def evaluate(mnist):
    with tf.Graph().as_default() as g:
        x = tf.placeholder(tf.float32, [None, mnist_inference.INPUT_NODE], name='x-input')
        y_ = tf.placeholder(tf.float32, [None, mnist_inference.OUTPUT_NODE], name='y-input')
        validate_feed = {x: mnist.validation.images, y_: mnist.validation.labels}

        # The regularization loss is irrelevant during evaluation, so the regularizer is None.
        y = mnist_inference.inference(x, None)

        # Accuracy: tf.argmax(y, 1) gives the predicted class of each input example.
        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

        # Load the model through variable renaming.
        variable_averages = tf.train.ExponentialMovingAverage(mnist_train.MOVING_AVERAGE_DECAY)
        variable_to_restore = variable_averages.variables_to_restore()
        saver = tf.train.Saver(variable_to_restore)

        # Every EVAL_INTERVAL_SECS seconds, compute the accuracy to monitor how it
        # changes while training is still going on.
        while True:
            with tf.Session() as sess:
                # tf.train.get_checkpoint_state finds the newest model in the directory
                # through the checkpoint file.
                ckpt = tf.train.get_checkpoint_state(mnist_train.MODEL_SAVE_PATH)
                if ckpt and ckpt.model_checkpoint_path:
                    # Load the model.
                    saver.restore(sess, ckpt.model_checkpoint_path)
                    # Get the number of training steps at which the model was saved from the file name.
                    global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
                    accuracy_score = sess.run(accuracy, feed_dict=validate_feed)
                    print("After %s training step(s), validation accuracy = %g" % (global_step, accuracy_score))
                else:
                    print("No checkpoint file found")
                    return
            time.sleep(EVAL_INTERVAL_SECS)


def main(argv=None):
    mnist = input_data.read_data_sets("./datasets/MNIST_DATA", one_hot=True)
    evaluate(mnist)


if __name__ == '__main__':
    tf.app.run()
```

Output:
```
After 29001 training step(s), validation accuracy = 0.985
```