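A Detailed Guide to Implementing LSTM in TensorFlow
I won't explain in detail what an LSTM is. Andrew Ng's video course covers it very well, and I wrote up the lecture content in "Andrew Ng's Sequence Models: Notes, Part 1". There are also many good explanations online, such as "Introduction to LSTM" and "Understanding LSTM Networks".
Understanding the idea is fairly easy, but I still ran into many problems once I started writing code. Most blog posts online do not explain clearly how to set the cell parameters. After reading a great many articles I finally figured it out, so I am writing it down to save others some detours.
The figure above shows a single LSTM cell. It can be used in many RNN structures; the most common are probably one-to-many and many-to-many.
The many-to-many structure is described below:
Implementation 1: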
    # Requires TensorFlow 1.x (tf.contrib was removed in 2.x)
    import tensorflow as tf
    import numpy as np
    from tensorflow.contrib import rnn


    def add_layer(inputs, in_size, out_size, activation_function=None):
        # A single fully connected layer
        weights = tf.Variable(tf.random_normal([in_size, out_size]))
        biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
        wx_b = tf.matmul(inputs, weights) + biases
        if activation_function is None:
            outputs = wx_b
        else:
            outputs = activation_function(wx_b)
        return outputs


    n_words = 15
    embedding_size = 8
    hidden_size = 8  # hidden_size is usually set equal to embedding_size
    batch_size = 3
    time_steps = 5

    # Simulated embedding matrix W
    w = tf.Variable(tf.random_normal([n_words, embedding_size], stddev=0.01))
    # Simulated training sentences: 3 sentences of 5 words each, shape (3, 5, 1)
    sentence = tf.Variable(np.arange(15).reshape(batch_size, time_steps, 1))
    # Map each word to a vector of size 8, shape (3, 5, 1, 8)
    input_s = tf.nn.embedding_lookup(w, sentence)
    input_s = tf.reshape(input_s, [-1, time_steps, embedding_size])  # shape (3, 5, 8)

    with tf.name_scope("LSTM"):
        lstm_cell = rnn.BasicLSTMCell(hidden_size, state_is_tuple=True, name='lstm_layer')
        h_0 = tf.zeros([batch_size, hidden_size])  # shape (3, 8)
        c_0 = tf.zeros([batch_size, hidden_size])  # shape (3, 8)
        state = rnn.LSTMStateTuple(c=c_0, h=h_0)  # initial state
        outputs = []
        for i in range(time_steps):  # sentence length
            if i > 0:
                # Cells with the same name share the same weights;
                # enable variable reuse to avoid problems caused by name clashes
                tf.get_variable_scope().reuse_variables()
            # output: [batch_size, hidden_size], shape (3, 8)
            output, state = lstm_cell(input_s[:, i, :], state)
            # outputs: time_steps tensors of [batch_size, hidden_size], i.e. (5, 3, 8)
            outputs.append(output)
        # path: [batch_size, hidden_size * time_steps], shape (3, 40)
        path = tf.concat(outputs, 1)
        # path_embedding: [batch_size, embedding_size]
        path_embedding = add_layer(path, time_steps * hidden_size, embedding_size)

    with tf.Session() as s:
        s.run(tf.global_variables_initializer())
        # The tensors are small, so printing a few of them shows what is going on
        print(s.run(outputs))
        print(s.run(path_embedding))

For example, suppose a batch contains 64 sentences, each sentence has 20 words, each word vector has length 200, and the hidden layer has 128 units.
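Then for one batch the input tensor has shape [64, 20, 200]. For each sentence, h_t and c_t are vectors of length 128 (so [64, 128] for the whole batch), and the LSTM cell's parameter matrix has shape [128+200, 4x128], plus a bias of length 4x128.
At time step 1, the first word of each of the 64 sentences is the input, i.e. a [64, 200] matrix. It is concatenated with the previous hidden state h, so the input matrix becomes [64, 200+128]. This is multiplied by the parameter matrix [200+128, 4x128], giving an output of [64, 4x128], which means each yellow box in the figure (each of the four gate layers) outputs [64, 128]. The yellow boxes are then combined through element-wise operations that do not change the dimensions, so the result is still [64, 128]. In other words, each sentence comes out of the LSTM cell as a vector of length 128. Everything the cell outputs is a vector, including C_t and h_t, and their length is always the hidden_size of the current LSTM cell. So cell_output has shape [64, 128].
The following time steps repeat exactly the same operation, so outputs has shape [20, 64, 128].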
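The shape bookkeeping above can be checked with a small NumPy sketch. This is only an illustration under stated assumptions, not TensorFlow's actual implementation: the names `lstm_step`, `kernel` and `bias` are made up here, the order in which the four gate blocks are split is chosen for illustration, and the forget-gate bias offset that TensorFlow adds is left out.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, kernel, bias):
    """One LSTM time step for a whole batch (illustrative helper).

    x_t: [batch, input_size]; h_prev, c_prev: [batch, hidden_size]
    kernel: [input_size + hidden_size, 4 * hidden_size]; bias: [4 * hidden_size]
    """
    concat = np.concatenate([x_t, h_prev], axis=1)  # [batch, input + hidden]
    gates = concat @ kernel + bias                  # [batch, 4 * hidden]
    i, g, f, o = np.split(gates, 4, axis=1)         # four [batch, hidden] blocks
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h_t = sigmoid(o) * np.tanh(c_t)
    return h_t, c_t


batch_size, time_steps, input_size, hidden_size = 64, 20, 200, 128
rng = np.random.default_rng(0)
inputs = rng.normal(size=(batch_size, time_steps, input_size))  # random stand-in data
kernel = rng.normal(scale=0.01, size=(input_size + hidden_size, 4 * hidden_size))
bias = np.zeros(4 * hidden_size)

h = np.zeros((batch_size, hidden_size))
c = np.zeros((batch_size, hidden_size))
outputs = []
for t in range(time_steps):
    # Feed the t-th word of every sentence, reusing the same kernel and bias
    h, c = lstm_step(inputs[:, t, :], h, c, kernel, bias)
    outputs.append(h)
outputs = np.stack(outputs)

print(kernel.shape)             # (328, 512)
print(outputs.shape)            # (20, 64, 128)
print(kernel.size + bias.size)  # 168448
```

The same kernel and bias are used at every time step, which is why the number of parameters depends only on the input size and hidden_size, not on the sentence length or the batch size.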
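The softmax layer acts as a fully connected layer: it maps outputs onto the vocab_size words, and the cross-entropy loss is computed on the result.
The LSTM parameter matrix and the parameters of the fully connected layer are then updated according to that loss.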
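As a rough sketch of this projection step, the NumPy code below flattens outputs, applies one fully connected layer and computes the mean cross-entropy. Everything in it is assumed for illustration: vocab_size is set to a hypothetical 1000, the LSTM outputs and target word ids are random stand-ins, and the names `w_out`, `b_out` and `softmax_cross_entropy` are not from the original code. The parameter update itself is left to the framework's optimizer.

```python
import numpy as np


def softmax_cross_entropy(logits, targets):
    """Mean cross-entropy; logits: [n, vocab_size], targets: [n] integer word ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()


# vocab_size is a hypothetical value chosen for this sketch
time_steps, batch_size, hidden_size, vocab_size = 20, 64, 128, 1000
rng = np.random.default_rng(0)
outputs = rng.normal(size=(time_steps, batch_size, hidden_size))     # stand-in LSTM outputs
targets = rng.integers(0, vocab_size, size=time_steps * batch_size)  # stand-in word ids

# Fully connected layer, zero-initialized so the starting loss is easy to check
w_out = np.zeros((hidden_size, vocab_size))
b_out = np.zeros(vocab_size)

logits = outputs.reshape(-1, hidden_size) @ w_out + b_out  # [20 * 64, vocab_size]
loss = softmax_cross_entropy(logits, targets)

print(logits.shape)  # (1280, 1000)
print(np.isclose(loss, np.log(vocab_size)))  # True
```

With all-zero weights every word is equally likely, so the loss starts at log(vocab_size); training should push it below that value.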
Implementation 2:
Test data: https://pan.baidu.com/s/1j9sgPmWUHM5boM5ekj3Q2w (extraction code: go3f)
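The implementation that follows trains on sliding windows of a single series, where each target window is the input window shifted one step ahead. If the spreadsheet is not at hand, the windowing can be tried out on a synthetic series. The sketch below is TensorFlow-free and makes its own assumptions: the sine wave is a stand-in for the real data, and `make_windows` is a hypothetical helper name.

```python
import numpy as np


def make_windows(series, time_step):
    """Return (x, y) windows where y is x shifted one step ahead.

    series: array of shape [n, 1]
    x, y:   arrays of shape [n - time_step - 1, time_step, 1]
    """
    xs, ys = [], []
    for i in range(len(series) - time_step - 1):
        xs.append(series[i:i + time_step])
        ys.append(series[i + 1:i + time_step + 1])
    return np.array(xs), np.array(ys)


# Synthetic stand-in for the spreadsheet column (an assumption, not the real data)
data = np.sin(np.linspace(0, 50, 799)).reshape(-1, 1)
mean, std = data.mean(), data.std()
normalized = (data - mean) / std  # standardize, as the implementation does

train_x, train_y = make_windows(normalized, time_step=96)
print(train_x.shape, train_y.shape)  # (702, 96, 1) (702, 96, 1)

# Undo the standardization to get back to the original scale
restored = train_y[0] * std + mean
```

Each window already has the [time_step, input_size] layout that the placeholders expect, and multiplying by the standard deviation and adding the mean is how predictions are mapped back to the original scale before plotting.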
    import os

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import tensorflow as tf

    data = pd.read_excel("seq_data.xlsx")  # read the sequence data
    data = data.values[1:800]  # keep rows 1 through 799
    normalize_data = (data - np.mean(data)) / np.std(data)  # standardize the data
    s = np.std(data)
    m = np.mean(data)
    time_step = 96  # length of each sequence window
    rnn_unit = 8  # number of hidden units
    lstm_layers = 2  # number of stacked cells
    batch_size = 7  # number of windows per batch
    input_size = 1  # input dimension
    output_size = 1  # output dimension
    lr = 0.006  # learning rate

    # os.path.join keeps the checkpoint path valid on both Windows and Linux
    os.makedirs('model_save1', exist_ok=True)
    ckpt_path = os.path.join('model_save1', 'modle.ckpt')

    train_x, train_y = [], []
    for i in range(len(data) - time_step - 1):
        x = normalize_data[i:i + time_step]
        y = normalize_data[i + 1:i + time_step + 1]  # x shifted one step ahead
        train_x.append(x.tolist())
        train_y.append(y.tolist())

    X = tf.placeholder(tf.float32, [None, time_step, input_size])  # shape (?, time_step, input_size)
    Y = tf.placeholder(tf.float32, [None, time_step, output_size])  # shape (?, time_step, output_size)
    weights = {'in': tf.Variable(tf.random_normal([input_size, rnn_unit])),
               'out': tf.Variable(tf.random_normal([rnn_unit, 1]))}
    biases = {'in': tf.Variable(tf.constant(0.1, shape=[rnn_unit, ])),
              'out': tf.Variable(tf.constant(0.1, shape=[1, ]))}


    def lstm(batch):
        w_in = weights['in']
        b_in = biases['in']
        # Project every input value to rnn_unit dimensions before the LSTM
        inputs = tf.reshape(X, [-1, input_size])
        input_rnn = tf.matmul(inputs, w_in) + b_in
        input_rnn = tf.reshape(input_rnn, [-1, time_step, rnn_unit])
        cell = tf.nn.rnn_cell.MultiRNNCell(
            [tf.nn.rnn_cell.BasicLSTMCell(rnn_unit) for i in range(lstm_layers)])
        init_state = cell.zero_state(batch, dtype=tf.float32)
        output_rnn, final_states = tf.nn.dynamic_rnn(
            cell, input_rnn, initial_state=init_state, dtype=tf.float32)
        # Project every time step's output back to a single value
        output = tf.reshape(output_rnn, [-1, rnn_unit])
        w_out = weights['out']
        b_out = biases['out']
        pred = tf.matmul(output, w_out) + b_out
        return pred, final_states


    def train_lstm():
        global batch_size
        with tf.variable_scope("sec_lstm"):
            pred, _ = lstm(batch_size)
        loss = tf.reduce_mean(tf.square(tf.reshape(pred, [-1]) - tf.reshape(Y, [-1])))
        train_op = tf.train.AdamOptimizer(lr).minimize(loss)
        saver = tf.train.Saver(tf.global_variables())
        loss_list = []
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            for i in range(100):  # more iterations may give a better result
                start = 0
                end = start + batch_size
                # Only full batches are used, because the initial state is built for batch_size
                while end < len(train_x):
                    _, loss_ = sess.run([train_op, loss],
                                        feed_dict={X: train_x[start:end], Y: train_y[start:end]})
                    start += batch_size
                    end = end + batch_size
                loss_list.append(loss_)
                if i % 10 == 0:
                    print("Number of iterations:", i, " loss:", loss_list[-1])
                    # Save only when the loss improved over the previous iteration
                    if i > 0 and loss_list[-2] > loss_list[-1]:
                        saver.save(sess, ckpt_path)
            print("The train has finished")


    train_lstm()


    def prediction():
        # Reuse the trained LSTM variables, this time with a batch size of 1
        with tf.variable_scope("sec_lstm", reuse=tf.AUTO_REUSE):
            pred, _ = lstm(1)
        saver = tf.train.Saver(tf.global_variables())
        with tf.Session() as sess:
            saver.restore(sess, ckpt_path)
            predict = []
            for i in range(0, np.shape(train_x)[0]):
                next_seq = sess.run(pred, feed_dict={X: [train_x[i]]})
                predict.append(next_seq[-1])  # keep only the last predicted step
            plt.figure()
            plt.plot(list(range(len(data))), data, color='b')
            # Undo the standardization before plotting the predictions
            plt.plot(list(range(time_step + 1, np.shape(train_x)[0] + 1 + time_step)),
                     [value * s + m for value in predict], color='r')
            plt.show()


    prediction()

References:
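Building an LSTM with TensorFlow (基于TensorFlow構建LSTM)
TensorFlow in Practice: The Structure of LSTM and the Parameters Inside the Cell (TensorFlow實戰:LSTM的結構與cell中的參數)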