
Implementing LSTM-based text classification in TensorFlow

Published: 2025/3/16


http://blog.csdn.net/u010223750/article/details/53334313?locationNum=7&fps=1

Introduction

After studying TensorFlow for a while, I wanted a project to practice on and remembered a text classification example from the Theano tutorial. This week I reimplemented it in TensorFlow. It felt quite different from Theano, so it is worth writing down a summary.

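Model description

The classification model is quite simple: essentially a single-layer LSTM. A multi-layer model is also possible, and stacking layers is particularly easy in TensorFlow. The figure below shows the model.

[Figure: model architecture]

A brief explanation of the figure: each word is embedded and then fed into the LSTM layer, which is a standard LSTM. Running over the sequence produces t hidden-state vectors, one per time step, and a mean pooling layer averages these into a single vector. In the code below, that vector is passed to a softmax layer for classification.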
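
The mean pooling step can be illustrated outside TensorFlow. The following is a minimal NumPy sketch, not the author's code: the function name `masked_mean_pool` and the toy values are made up for illustration, and it assumes the same time-major layout that the model uses for its mask (`[num_step, batch]`).

```python
import numpy as np


def masked_mean_pool(outputs, mask):
    """Average LSTM outputs over the real (non-padding) time steps.

    outputs: float array, shape [num_step, batch, hidden]
    mask:    float array, shape [num_step, batch]; 1.0 = real token, 0.0 = padding
    """
    masked = outputs * mask[:, :, None]      # zero out the padded steps
    lengths = mask.sum(axis=0)[:, None]      # [batch, 1], true sequence lengths
    return masked.sum(axis=0) / lengths      # [batch, hidden]


# Toy input (hypothetical values): 3 time steps, 2 sequences, hidden size 2.
outputs = np.array([
    [[1.0, 2.0], [10.0, 20.0]],
    [[3.0, 4.0], [30.0, 40.0]],
    [[5.0, 6.0], [99.0, 99.0]],   # the second sequence is padding at t=2
])
mask = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 0.0]])
pooled = masked_mean_pool(outputs, mask)
print(pooled)
```

Dividing by the mask sum rather than by `num_step` keeps padded positions from diluting the average. A sequence whose mask is all zeros would cause a division by zero, so empty sequences should be filtered out beforehand.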

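TensorFlow implementation

I have not been using TensorFlow for long and am still feeling my way around, but having used Theano before, symbolic programming is not unfamiliar, so getting started with TensorFlow was fairly easy. Still, TensorFlow differs from Theano in many ways, some of which are mentioned here.
The main model code is as follows: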

import tensorflow as tf
import numpy as np


class RNN_Model(object):

    def __init__(self, config, is_training=True):
        self.keep_prob = config.keep_prob
        # Kept in a non-trainable variable so it can be reassigned before each batch.
        self.batch_size = tf.Variable(0, dtype=tf.int32, trainable=False)

        num_step = config.num_step
        self.input_data = tf.placeholder(tf.int32, [None, num_step])   # [batch, num_step]
        self.target = tf.placeholder(tf.int64, [None])                 # [batch]
        self.mask_x = tf.placeholder(tf.float32, [num_step, None])     # [num_step, batch]

        class_num = config.class_num
        hidden_neural_size = config.hidden_neural_size
        vocabulary_size = config.vocabulary_size
        embed_dim = config.embed_dim
        hidden_layer_num = config.hidden_layer_num
        self.new_batch_size = tf.placeholder(tf.int32, shape=[], name="new_batch_size")
        self._batch_size_update = tf.assign(self.batch_size, self.new_batch_size)

        # build LSTM network
        lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_neural_size, forget_bias=0.0, state_is_tuple=True)
        if self.keep_prob < 1:
            lstm_cell = tf.nn.rnn_cell.DropoutWrapper(
                lstm_cell, output_keep_prob=self.keep_prob
            )

        cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * hidden_layer_num, state_is_tuple=True)

        self._initial_state = cell.zero_state(self.batch_size, dtype=tf.float32)

        # embedding layer
        with tf.device("/cpu:0"), tf.name_scope("embedding_layer"):
            embedding = tf.get_variable("embedding", [vocabulary_size, embed_dim], dtype=tf.float32)
            inputs = tf.nn.embedding_lookup(embedding, self.input_data)

        if self.keep_prob < 1:
            inputs = tf.nn.dropout(inputs, self.keep_prob)

        # Unroll the LSTM for num_step steps, sharing variables after the first step.
        out_put = []
        state = self._initial_state
        with tf.variable_scope("LSTM_layer"):
            for time_step in range(num_step):
                if time_step > 0:
                    tf.get_variable_scope().reuse_variables()
                (cell_output, state) = cell(inputs[:, time_step, :], state)
                out_put.append(cell_output)

        # The list of per-step outputs is packed to [num_step, batch, hidden]; padded steps are zeroed.
        out_put = out_put * self.mask_x[:, :, None]

        with tf.name_scope("mean_pooling_layer"):
            out_put = tf.reduce_sum(out_put, 0) / (tf.reduce_sum(self.mask_x, 0)[:, None])

        with tf.name_scope("Softmax_layer_and_output"):
            softmax_w = tf.get_variable("softmax_w", [hidden_neural_size, class_num], dtype=tf.float32)
            softmax_b = tf.get_variable("softmax_b", [class_num], dtype=tf.float32)
            self.logits = tf.matmul(out_put, softmax_w) + softmax_b

        with tf.name_scope("loss"):
            self.loss = tf.nn.sparse_softmax_cross_entropy_with_logits(self.logits + 1e-10, self.target)
            self.cost = tf.reduce_mean(self.loss)

        with tf.name_scope("accuracy"):
            self.prediction = tf.argmax(self.logits, 1)
            correct_prediction = tf.equal(self.prediction, self.target)
            self.correct_num = tf.reduce_sum(tf.cast(correct_prediction, tf.float32))
            self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")

        # add summaries
        # Note: tf.scalar_summary, tf.histogram_summary and tf.merge_summary are the pre-1.0 names
        # (later tf.summary.scalar, tf.summary.histogram and tf.summary.merge).
        loss_summary = tf.scalar_summary("loss", self.cost)
        accuracy_summary = tf.scalar_summary("accuracy_summary", self.accuracy)

        if not is_training:
            return

        self.globle_step = tf.Variable(0, name="globle_step", trainable=False)
        self.lr = tf.Variable(0.0, trainable=False)

        tvars = tf.trainable_variables()
        grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars),
                                          config.max_grad_norm)

        # Keep track of gradient values and sparsity (optional)
        grad_summaries = []
        for g, v in zip(grads, tvars):
            if g is not None:
                grad_hist_summary = tf.histogram_summary("{}/grad/hist".format(v.name), g)
                sparsity_summary = tf.scalar_summary("{}/grad/sparsity".format(v.name), tf.nn.zero_fraction(g))
                grad_summaries.append(grad_hist_summary)
                grad_summaries.append(sparsity_summary)
        self.grad_summaries_merged = tf.merge_summary(grad_summaries)

        self.summary = tf.merge_summary([loss_summary, accuracy_summary, self.grad_summaries_merged])

        # The original called apply_gradients twice; one call is enough.
        optimizer = tf.train.GradientDescentOptimizer(self.lr)
        self.train_op = optimizer.apply_gradients(zip(grads, tvars))

        self.new_lr = tf.placeholder(tf.float32, shape=[], name="new_learning_rate")
        self._lr_update = tf.assign(self.lr, self.new_lr)

    def assign_new_lr(self, session, lr_value):
        session.run(self._lr_update, feed_dict={self.new_lr: lr_value})

    def assign_new_batch_size(self, session, batch_size_value):
        session.run(self._batch_size_update, feed_dict={self.new_batch_size: batch_size_value})

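The model is not complicated, so I will not explain it line by line. While debugging I fell into several TensorFlow pitfalls, which are worth describing separately.

Pitfall 1: TensorFlow's LSTM implementation
TensorFlow ships with several ready-made LSTM implementation classes that are easy to use, and several variants are available, including Basic, Bi-Directional and so on.
This code uses BasicLSTM: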

# build LSTM network
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_neural_size, forget_bias=0.0, state_is_tuple=True)
if self.keep_prob < 1:
    lstm_cell = tf.nn.rnn_cell.DropoutWrapper(
        lstm_cell, output_keep_prob=self.keep_prob
    )

cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * hidden_layer_num, state_is_tuple=True)

self._initial_state = cell.zero_state(self.batch_size, dtype=tf.float32)

out_put = []
state = self._initial_state
with tf.variable_scope("LSTM_layer"):
    for time_step in range(num_step):
        if time_step > 0:
            tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        out_put.append(cell_output)

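In this code, initializing tf.nn.rnn_cell.BasicLSTMCell only requires the number of hidden units. The initial state of the LSTM network then has to be set up: self._initial_state = cell.zero_state(self.batch_size,dtype=tf.float32). This line is confusing at first glance, and I did not initially know what it meant. After experimenting and reading the source, I found that for an LSTM cell it returns a tuple of two zero tensors of shape batch_size × hidden_neural_size, which are the LSTM's initial cell state and hidden state (with MultiRNNCell, one such pair per layer).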
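
To make the role of the zero state concrete, here is a small NumPy sketch of a single standard LSTM step starting from an all-zero `(c, h)` pair. It is an illustration of the usual LSTM update, not TensorFlow's implementation: the helper names `zero_state` and `lstm_step`, the weight layout and the random weights are all assumptions chosen for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def zero_state(batch_size, hidden_size):
    """Initial (c, h) pair: two zero matrices of shape [batch_size, hidden_size]."""
    c0 = np.zeros((batch_size, hidden_size), dtype=np.float32)
    h0 = np.zeros((batch_size, hidden_size), dtype=np.float32)
    return c0, h0


def lstm_step(x, state, W, b, forget_bias=0.0):
    """One standard LSTM step. W: [input_dim + hidden, 4 * hidden], b: [4 * hidden]."""
    c, h = state
    gates = np.concatenate([x, h], axis=1) @ W + b
    # input gate, candidate values, forget gate, output gate
    i, j, f, o = np.split(gates, 4, axis=1)
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)
    return new_h, (new_c, new_h)


# Hypothetical sizes and random weights, for illustration only.
batch_size, input_dim, hidden_size = 4, 3, 5
rng = np.random.RandomState(0)
W = rng.uniform(-0.1, 0.1, size=(input_dim + hidden_size, 4 * hidden_size))
b = np.zeros(4 * hidden_size)

state = zero_state(batch_size, hidden_size)
x = rng.uniform(-1.0, 1.0, size=(batch_size, input_dim))
output, state = lstm_step(x, state, W, b)
print(output.shape, state[0].shape, state[1].shape)
```

Because both halves of the state have a leading `batch_size` dimension, the state has to be rebuilt or resized whenever the batch size changes, which is why the model stores `batch_size` in an assignable variable.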

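Pitfall 2: both zero_state and the number of loop steps num_step must be specified in advance
This is rather painful. It means that variable-length input in TensorFlow has to be padded so that every sequence has the same length. Because of the dataset, however, not every batch can have the same size, so batch_size has to be set dynamically before each run; the assign_new_batch_size function in the code does this. The num_step parameter, on the other hand, cannot be set dynamically (possibly I just did not find a way, but using tf.Variable() definitely does not work). The reason is that the Python for loop unrolls the LSTM while the graph is being built, so the step count must be a plain Python integer at that point. I had no choice but to pad the whole dataset to a fixed length, and once padding is used, a mask matrix is required in the computation.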
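
A padding helper along these lines could produce the inputs for `input_data` and `mask_x`. This is a sketch under stated assumptions rather than the author's preprocessing code: the name `pad_batch`, the padding id `0` and the choice to truncate over-long sequences are all hypothetical.

```python
def pad_batch(sequences, num_step, pad_id=0):
    """Pad or truncate each sequence to num_step and build the matching mask.

    Returns (padded, mask):
      padded: batch-major nested list, shape [batch, num_step], for input_data
      mask:   time-major nested list, shape [num_step, batch], for mask_x
    """
    padded = []
    lengths = []
    for seq in sequences:
        seq = list(seq)[:num_step]                       # truncate if too long
        lengths.append(len(seq))
        padded.append(seq + [pad_id] * (num_step - len(seq)))
    # mask[t][b] is 1.0 while step t is inside sequence b, else 0.0
    mask = [[1.0 if t < n else 0.0 for n in lengths] for t in range(num_step)]
    return padded, mask


# Toy batch of word indices (hypothetical values).
batch = [[4, 7, 9], [5], [8, 2, 6, 3, 1]]
padded, mask = pad_batch(batch, num_step=4)
print(padded)
print(mask)
```

The mask is built time-major on purpose, since the model declares `mask_x` with shape `[num_step, None]` while `input_data` is `[None, num_step]`. The batch size fed to `assign_new_batch_size` would then simply be `len(padded)`.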

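Pitfall 3: cost returns NaN
A NaN cost usually arises because a zero value appears on the logits side when computing the cross-entropy, so the form commonly recommended on Stack Overflow is sparse_softmax_cross_entropy_with_logits(self.logits+1e-10,self.target). Note, however, that softmax is unchanged when the same constant is added to every logit, so this offset does not alter the loss. The epsilon advice applies to a hand-written log of a probability, as in log(p + 1e-10), where p can underflow to zero.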
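
The difference between the two situations can be checked in plain Python. The sketch below is illustrative only: the function names are made up, and it stands in for the tensor operations rather than reproducing TensorFlow's loss.

```python
import math


def naive_cross_entropy(logits, label, eps=0.0):
    """Hand-written -log(softmax(logits)[label]); eps guards against log(0)."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    prob = exps[label] / total
    return -math.log(prob + eps)


def stable_cross_entropy(logits, label):
    """Same loss via log-sum-exp; never forms a probability that can round to 0."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


logits = [0.0, -1000.0]   # softmax probability of class 1 underflows to exactly 0.0

try:
    naive_cross_entropy(logits, label=1)
    naive_failed = False
except ValueError:
    # math.log(0.0) raises in plain Python; with tensors the log of 0 is -inf,
    # which then propagates as inf or NaN.
    naive_failed = True

patched = naive_cross_entropy(logits, label=1, eps=1e-10)   # finite, but clipped
stable = stable_cross_entropy(logits, label=1)              # exact value

# Adding the same constant to every logit leaves the loss unchanged.
base = stable_cross_entropy([2.0, 1.0, 0.1], label=0)
shifted = stable_cross_entropy([2.0 + 1e-10, 1.0 + 1e-10, 0.1 + 1e-10], label=0)
print(naive_failed, patched, stable, abs(base - shifted))
```

If the loss is already computed from logits in a fused, numerically stable way, a remaining NaN is more likely to come from elsewhere, for example a learning rate that is too high or a division by a zero mask sum in the pooling layer. These are possibilities to check, not a diagnosis of the author's run.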

Training and results

Experimental setup:
TensorFlow: 1.1
Platform: macOS
Dataset: subject dataset. The data has been preprocessed, so each word is represented by its index in the vocabulary.
Thanks to TensorBoard, the training process of every parameter can be visualized. The training results are shown below:

Training-set results: [figure]
Validation-set results: [figure]
Loss during training: [figure]
Per-parameter training results: [figure]

The final accuracy on the test set was about 85%, which is not bad.

Comparing TensorFlow and Theano

TensorFlow and Theano are two currently popular deep learning frameworks. They are very similar yet also different; below I compare them based on my own experience.

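Ease of use

In terms of ease of use, TensorFlow is far more convenient than Theano. Theano was developed by a group of academics, whereas TensorFlow was developed by Google and is more industry-oriented. TensorFlow directly integrates many methods from academia: RNN and LSTM cells are built in, as are parameter update methods such as gradient descent and Adadelta. In Theano you have to write these yourself, so naturally the difficulty is different.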

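Flexibility

In terms of flexibility, Theano beats TensorFlow. Precisely because of the previous point, Theano's barrier to entry is somewhat higher, but this also gives it more elasticity: you can implement any network structure you define. This is not to say TensorFlow cannot do so; it can, but after using TensorFlow for a long time, your ability to write custom structures, such as modifying the internals of an LSTM, tends to get rusty. Theano has no such constraint.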

Fault tolerance
Personally I think Theano is more tolerant than TensorFlow. To define a variable in Theano you only specify its type, such as imatrix or ivector, without any dimensions; as long as the input data matches the network graph, it works. TensorFlow, in contrast, requires some parameters to be specified in advance (such as num_step in the code above). Theano is therefore considerably more forgiving, although this has a downside: it can make code harder to debug.

Code
The code for this article is available here. Please credit the source when reposting.
25/11/2016, Beijing

Reposted from: https://www.cnblogs.com/DjangoBlog/p/7279988.html
