Implementing LSTM in TensorFlow: A Detailed Guide

Published: 2024/7/5 · 豆豆


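I won't explain in detail what an LSTM is. Andrew Ng's video course covers it very well, and I wrote up the lecture content in my notes, "Andrew Ng's Sequence Models: Notes, Part 1". There are also many good explanations online, for example "Introduction to LSTM" and "Understanding LSTM Networks".

Understanding the idea is fairly easy, but I still ran into many problems once I started writing code. Most blog posts online do not explain clearly how the cell parameters should be set. After reading a great many articles I finally figured it out, so I am writing it down to save others a few detours.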

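The figure above shows a single LSTM cell. It can be used in many RNN structures; the most common are probably one-to-many and many-to-many.

The many-to-many structure is described below: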

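batch_size: the batch size, i.e. batch_size sentences are trained at the same time.

time_steps: the number of time steps, i.e. the length of a sentence.

embedding_size: the length of the vector that represents each word of a sentence.

hidden_size: the number of hidden units. An LSTM cell is itself a small neural network (the figure above is one LSTM cell), and each small yellow box in it is a layer with hidden_size hidden units. With four such boxes, the LSTM cell has 4*hidden_size hidden units in total.

Each LSTM cell outputs C and h. Both are vectors, and their length is the hidden_size of that cell.

n_words: the number of words in the corpus.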
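To make the 4*hidden_size remark concrete, here is a small sketch that derives the parameter shapes of one LSTM cell from embedding_size and hidden_size. The helper name lstm_param_shapes is made up for illustration, and it assumes the common fused layout in which the input and the previous h are concatenated and multiplied by a single matrix holding all four gates.

```python
def lstm_param_shapes(embedding_size, hidden_size):
    """Return (kernel_shape, bias_shape, n_params) for one LSTM cell.

    Assumption: fused layout, i.e. [x_t, h_prev] is multiplied by one
    kernel whose columns hold the four gate layers side by side.
    """
    kernel_shape = (embedding_size + hidden_size, 4 * hidden_size)
    bias_shape = (4 * hidden_size,)
    n_params = kernel_shape[0] * kernel_shape[1] + bias_shape[0]
    return kernel_shape, bias_shape, n_params


# Values used in the first implementation of this article
kernel_shape, bias_shape, n_params = lstm_param_shapes(embedding_size=8, hidden_size=8)
print(kernel_shape, bias_shape, n_params)  # (16, 32) (32,) 544
```

Note that batch_size and time_steps do not appear here: they change the shape of the data, not the number of parameters.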

Implementation 1:

# Written for TensorFlow 1.x (tf.contrib is not available in 2.x)
import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn


def add_layer(inputs, in_size, out_size, activation_function=None):
    # A single fully connected layer
    weights = tf.Variable(tf.random_normal([in_size, out_size]))
    biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    wx_b = tf.matmul(inputs, weights) + biases
    if activation_function is None:
        outputs = wx_b
    else:
        outputs = activation_function(wx_b)
    return outputs


n_words = 15
embedding_size = 8
hidden_size = 8  # hidden_size is usually set equal to embedding_size
batch_size = 3
time_steps = 5

# Stand-in for the embedding matrix W
w = tf.Variable(tf.random_normal([n_words, embedding_size], stddev=0.01))
# Mock training sentences: 3 sentences of 5 words each, shape (3,5,1)
sentence = tf.Variable(np.arange(15).reshape(batch_size, time_steps, 1))
# Map every word id to a vector of size 8, shape (3,5,1,8)
input_s = tf.nn.embedding_lookup(w, sentence)
input_s = tf.reshape(input_s, [-1, time_steps, embedding_size])  # shape (3,5,8)

with tf.name_scope("LSTM"):
    lstm_cell = rnn.BasicLSTMCell(hidden_size, state_is_tuple=True, name='lstm_layer')
    h_0 = tf.zeros([batch_size, hidden_size])  # shape (3,8)
    c_0 = tf.zeros([batch_size, hidden_size])  # shape (3,8)
    state = rnn.LSTMStateTuple(c=c_0, h=h_0)  # initial state
    outputs = []
    for i in range(time_steps):  # one step per word of the sentence
        if i > 0:
            # Cells with the same name share the same weights; enable
            # variable reuse so that the repeated name causes no problems
            tf.get_variable_scope().reuse_variables()
        # output: [batch_size, hidden_size], shape (3,8)
        output, state = lstm_cell(input_s[:, i, :], state)
        # outputs: [time_steps, batch_size, hidden_size], shape (5,3,8)
        outputs.append(output)
    # path: [batch_size, hidden_size * time_steps], shape (3,40)
    path = tf.concat(outputs, 1)
    # path_embedding: [batch_size, embedding_size]
    path_embedding = add_layer(path, time_steps * hidden_size, embedding_size)

with tf.Session() as s:
    s.run(tf.global_variables_initializer())
    # The tensors are all small, so printing a few of them
    # makes it easy to see what each operation does
    print(s.run(outputs))
    print(s.run(path_embedding))

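For example, suppose one batch contains 64 sentences, each sentence has 20 words, each word vector has length 200, and the hidden layer has 128 units.
Then the input tensor for one batch has shape [64,20,200], h_t and c_t each have length 128 per sentence, and the LSTM cell's parameter matrix has shape [128+200,4x128].
At time step 1, the first word of each of the 64 sentences is the input, i.e. a [64,200] matrix. It is concatenated with the hidden state h from the previous step, so the input becomes [64,200+128]. This is multiplied by the parameter matrix [200+128,4x128], giving [64,4x128], which means each yellow box outputs [64,128]. The boxes are then combined by a few operations that do not change the shape, so the output is still [64,128]. In other words, every sentence comes out of the LSTM cell as a vector of length 128. Both C_t and h_t are such vectors, and their length is the hidden_size of the cell. This tells us that cell_output has shape [64,128].
The remaining time steps repeat exactly the same operation, so outputs has shape [20,64,128].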
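The shape bookkeeping above can be checked with a minimal NumPy sketch of a single LSTM step unrolled over 20 time steps. This is an illustration under stated assumptions, not TensorFlow's implementation: the function names, the random data, and the order in which the four gate blocks are split out of the matrix product are all choices made here.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, kernel, bias):
    """One LSTM step. x_t: [batch, input]; h_prev, c_prev: [batch, hidden]."""
    # Concatenate input and previous h, then one matmul for all four gates
    z = np.concatenate([x_t, h_prev], axis=1) @ kernel + bias  # [batch, 4*hidden]
    # Split into four [batch, hidden] blocks (gate order is an assumption)
    i, g, f, o = np.split(z, 4, axis=1)
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # element-wise, shape unchanged
    h_t = sigmoid(o) * np.tanh(c_t)
    return h_t, c_t


batch_size, time_steps, embedding_size, hidden_size = 64, 20, 200, 128
rng = np.random.default_rng(0)
inputs = rng.normal(size=(batch_size, time_steps, embedding_size))  # [64,20,200]
kernel = rng.normal(scale=0.1, size=(embedding_size + hidden_size, 4 * hidden_size))
bias = np.zeros(4 * hidden_size)

h = np.zeros((batch_size, hidden_size))
c = np.zeros((batch_size, hidden_size))
outputs = []
for t in range(time_steps):
    # Feed the t-th word of every sentence; the same kernel is reused each step
    h, c = lstm_step(inputs[:, t, :], h, c, kernel, bias)
    outputs.append(h)
outputs = np.stack(outputs)

print(kernel.shape, h.shape, outputs.shape)  # (328, 512) (64, 128) (20, 64, 128)
```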
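The softmax layer acts as a fully connected layer: it maps outputs onto the vocab_size words, and the cross-entropy loss is computed on the result.
The loss is then used to update both the LSTM parameter matrix and the parameters of the fully connected layer.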
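As a rough sketch of that projection step, the NumPy code below flattens a [20,64,128] outputs tensor, maps it to vocab_size logits, and computes the mean cross-entropy. The vocabulary size of 1000, the random stand-in for outputs, and the random target ids are assumptions for illustration. The output weights are set to zero on purpose, so the prediction is uniform and the loss must equal log(vocab_size), which gives an easy sanity check. The gradient update itself is not shown.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy. logits: [n, vocab_size]; labels: [n] integer word ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


time_steps, batch_size, hidden_size, vocab_size = 20, 64, 128, 1000
rng = np.random.default_rng(0)
outputs = rng.normal(size=(time_steps, batch_size, hidden_size))  # stand-in for LSTM outputs
targets = rng.integers(0, vocab_size, size=(time_steps, batch_size))  # stand-in word ids

w_out = np.zeros((hidden_size, vocab_size))  # zero weights give a uniform prediction
b_out = np.zeros(vocab_size)

# Fully connected layer applied to every (time step, sentence) pair
logits = outputs.reshape(-1, hidden_size) @ w_out + b_out  # [20*64, vocab_size]
loss = softmax_cross_entropy(logits, targets.reshape(-1))
print(logits.shape, loss, np.log(vocab_size))
```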

Implementation 2:

Test data: https://pan.baidu.com/s/1j9sgPmWUHM5boM5ekj3Q2w (extraction code: go3f)

# Written for TensorFlow 1.x
import os

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

data = pd.read_excel("seq_data.xlsx")  # read the sequence data
data = data.values[1:800]  # keep rows 1 to 799
m = np.mean(data)
s = np.std(data)
normalize_data = (data - m) / s  # standardize the data

time_step = 96  # length of each sequence segment
rnn_unit = 8  # number of hidden units
lstm_layers = 2  # number of stacked cells
batch_size = 7  # number of sequence segments per batch
input_size = 1  # input dimension
output_size = 1  # output dimension
lr = 0.006  # learning rate
# os.path.join keeps the checkpoint path valid on both Windows and Linux
ckpt_path = os.path.join('model_save1', 'modle.ckpt')

# Each target segment is the input segment shifted forward by one step
train_x, train_y = [], []
for i in range(len(data) - time_step - 1):
    x = normalize_data[i:i + time_step]
    y = normalize_data[i + 1:i + time_step + 1]
    train_x.append(x.tolist())
    train_y.append(y.tolist())

X = tf.placeholder(tf.float32, [None, time_step, input_size])  # shape (?, time_step, input_size)
Y = tf.placeholder(tf.float32, [None, time_step, output_size])  # shape (?, time_step, output_size)
weights = {'in': tf.Variable(tf.random_normal([input_size, rnn_unit])),
           'out': tf.Variable(tf.random_normal([rnn_unit, 1]))}
biases = {'in': tf.Variable(tf.constant(0.1, shape=[rnn_unit, ])),
          'out': tf.Variable(tf.constant(0.1, shape=[1, ]))}


def lstm(batch):
    w_in = weights['in']
    b_in = biases['in']
    # Project every input value to rnn_unit dimensions before the LSTM
    inputs = tf.reshape(X, [-1, input_size])
    input_rnn = tf.matmul(inputs, w_in) + b_in
    input_rnn = tf.reshape(input_rnn, [-1, time_step, rnn_unit])
    cell = tf.nn.rnn_cell.MultiRNNCell(
        [tf.nn.rnn_cell.BasicLSTMCell(rnn_unit) for _ in range(lstm_layers)])
    init_state = cell.zero_state(batch, dtype=tf.float32)
    output_rnn, final_states = tf.nn.dynamic_rnn(
        cell, input_rnn, initial_state=init_state, dtype=tf.float32)
    output = tf.reshape(output_rnn, [-1, rnn_unit])
    w_out = weights['out']
    b_out = biases['out']
    pred = tf.matmul(output, w_out) + b_out
    return pred, final_states


def train_lstm():
    with tf.variable_scope("sec_lstm"):
        pred, _ = lstm(batch_size)
    loss = tf.reduce_mean(tf.square(tf.reshape(pred, [-1]) - tf.reshape(Y, [-1])))
    train_op = tf.train.AdamOptimizer(lr).minimize(loss)
    saver = tf.train.Saver(tf.global_variables())
    os.makedirs(os.path.dirname(ckpt_path), exist_ok=True)
    loss_list = []
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(100):  # increase the number of iterations for a better result
            start = 0
            end = start + batch_size
            while end < len(train_x):
                _, loss_ = sess.run([train_op, loss],
                                    feed_dict={X: train_x[start:end], Y: train_y[start:end]})
                start += batch_size
                end = end + batch_size
            loss_list.append(loss_)
            if i % 10 == 0:
                print("Number of iterations:", i, " loss:", loss_list[-1])
                # A checkpoint is written only when the loss has decreased
                if i > 0 and loss_list[-2] > loss_list[-1]:
                    saver.save(sess, ckpt_path)
        print("The train has finished")


train_lstm()


def prediction():
    with tf.variable_scope("sec_lstm", reuse=tf.AUTO_REUSE):
        pred, _ = lstm(1)
    saver = tf.train.Saver(tf.global_variables())
    with tf.Session() as sess:
        saver.restore(sess, ckpt_path)
        predict = []
        for i in range(len(train_x)):
            next_seq = sess.run(pred, feed_dict={X: [train_x[i]]})
            predict.append(next_seq[-1, 0])  # keep only the last predicted step
    plt.figure()
    plt.plot(list(range(len(data))), data, color='b')
    # Undo the standardization before plotting the predictions
    plt.plot(list(range(time_step + 1, len(train_x) + 1 + time_step)),
             [value * s + m for value in predict], color='r')
    plt.show()


prediction()
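The way the second implementation builds its training pairs is easier to see on toy data. The sketch below repeats the same sliding-window construction on a 10-point series with time_step=4; the helper name make_windows and the toy series are assumptions for illustration. Each target window is the input window shifted forward by one step, and multiplying by s and adding m recovers the original scale.

```python
import numpy as np


def make_windows(series, time_step):
    """Build (x, y) pairs where y is x shifted one step into the future."""
    xs, ys = [], []
    for i in range(len(series) - time_step - 1):
        xs.append(series[i:i + time_step])
        ys.append(series[i + 1:i + time_step + 1])
    return np.array(xs), np.array(ys)


series = np.arange(10, dtype=np.float64).reshape(-1, 1)  # toy data, shape (10, 1)
m, s = series.mean(), series.std()
normalized = (series - m) / s  # same standardization as in the article

train_x, train_y = make_windows(normalized, time_step=4)
print(train_x.shape, train_y.shape)  # (5, 4, 1) (5, 4, 1)

# The last value of the first target window, mapped back to the original scale
print(train_y[0, -1, 0] * s + m)
```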

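References:

Building an LSTM with TensorFlow (基于TensorFlow構建LSTM)
TensorFlow in Practice: The Structure of LSTM and the Parameters in a Cell (TensorFlow實戰：LSTM的結構與cell中的參數)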
