CS224n Course Assignment 3 Reference Answers
Assignment #3 Solution, by Jonariguez
The code for all coding problems has been uploaded to github/CS224n/Jonariguez.
The code for all coding problems can also be found in the .py files under the Assignment3_Code folder.
Solution:
ii) Using only the word itself is much like a purely statistics-based approach: the model handles low-frequency and out-of-vocabulary words poorly, and a word by itself can be ambiguous, so we cannot tell whether it is an entity or which entity type it is (for example, "Washington" can be a person or a location).
iii) Context words, part-of-speech tags, and similar features.
Solution:
i) Work out the shapes of all variables:
$x^{(t)} \in \mathbb{R}^{1\times V}$
$x^{(t)}L \in \mathbb{R}^{1\times D}$
$e^{(t)} \in \mathbb{R}^{1\times (2w+1)D}$
$h^{(t)} \in \mathbb{R}^{1\times H}$
$W \in \mathbb{R}^{(2w+1)D\times H}$
$\hat{y}^{(t)} \in \mathbb{R}^{1\times C}$
$U \in \mathbb{R}^{H\times C}$
$b_1 \in \mathbb{R}^{1\times H}$
$b_2 \in \mathbb{R}^{1\times C}$
ii) The complexity of predicting a single word is:

$e^{(t)} = [x^{(t-w)}L, \ldots, x^{(t)}L, \ldots, x^{(t+w)}L] \rightarrow O(wV)$
$h^{(t)} = \mathrm{ReLU}(e^{(t)}W + b_1) \rightarrow O(wDH)$
$\hat{y}^{(t)} = \mathrm{softmax}(h^{(t)}U + b_2) \rightarrow O(HC)$
$J = CE(y^{(t)}, \hat{y}^{(t)}) = -\sum_i y_i^{(t)} \log(\hat{y}_i^{(t)}) \rightarrow O(C)$

So the complexity per word is $O(wV + wDH + HC)$.
For a sentence of length $T$ the complexity is $O(T(wV + wDH + HC))$.
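To make the shapes above concrete, here is a small NumPy sketch of one forward step of the window-based model; the sizes V, D, H, C, w are made-up toy values, not the assignment's configuration.

    import numpy as np

    V, D, H, C, w = 10, 4, 6, 5, 1           # toy sizes: vocab, embed dim, hidden, classes, window
    L = np.random.randn(V, D)                # embedding matrix
    W = np.random.randn((2 * w + 1) * D, H)
    U = np.random.randn(H, C)
    b1 = np.zeros(H)
    b2 = np.zeros(C)

    window_ids = [3, 7, 2]                   # indices of x^{(t-w)}, ..., x^{(t)}, ..., x^{(t+w)}
    e_t = L[window_ids].reshape(1, -1)       # e^{(t)}: shape (1, (2w+1)D)
    h_t = np.maximum(0, e_t @ W + b1)        # h^{(t)}: shape (1, H), ReLU
    scores = h_t @ U + b2
    y_hat = np.exp(scores) / np.exp(scores).sum()   # \hat{y}^{(t)}: shape (1, C), softmax

    assert e_t.shape == (1, (2 * w + 1) * D)
    assert h_t.shape == (1, H)
    assert y_hat.shape == (1, C)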
Solution:
In Python 3, import StringIO with `from io import StringIO`.
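A minimal sketch of a guarded import that works under both Python 2 and Python 3:

    try:
        from StringIO import StringIO   # Python 2
    except ImportError:
        from io import StringIO         # Python 3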
Solution:
i) ① In the window-based model $W \in \mathbb{R}^{(2w+1)D\times H}$, while in the RNN $W_x \in \mathbb{R}^{D\times H}$;
② The RNN has an additional matrix $W_h \in \mathbb{R}^{H\times H}$.
ii) $\mathcal{O}((D+H)\cdot H\cdot T)$: each time step costs $O(DH)$ for $e^{(t)}W_x$ plus $O(H^2)$ for $h^{(t-1)}W_h$, and there are $T$ steps.
Solution:
ii) ① The $F_1$ score is less intuitive and less direct to interpret.
② Computing the $F_1$ score requires the whole corpus, which makes mini-batch training and parallelization difficult.
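As an illustration of point ②, entity-level F1 must be computed from counts accumulated over the whole evaluation set; it is not an average of per-batch F1 scores. A minimal sketch with made-up counts:

    def f1_from_counts(tp, fp, fn):
        # precision, recall and F1 from corpus-level counts
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)

    # Two "batches" with (tp, fp, fn) counts
    batches = [(40, 10, 20), (5, 15, 5)]

    # Corpus-level: accumulate counts first, then compute F1 once
    tp, fp, fn = (sum(c) for c in zip(*batches))
    corpus_f1 = f1_from_counts(tp, fp, fn)

    # Averaging per-batch F1 gives a different (and misleading) number
    avg_f1 = sum(f1_from_counts(*b) for b in batches) / len(batches)
    print(corpus_f1, avg_f1)   # the two values differ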
""" __call__函數(shù)的含義:假設實例化了一個該類的對象instan,那么instan(inputs,state)其實就會調(diào)用__call__()函數(shù),這樣在__call__()函數(shù)中實現(xiàn)前向傳播,調(diào)用就很方便 """ def __call__(self, inputs, state, scope=None):"""Updates the state using the previous @state and @inputs.Remember the RNN equations are:h_t = sigmoid(x_t W_x + h_{t-1} W_h + b)TODO: In the code below, implement an RNN cell using @inputs(x_t above) and the state (h_{t-1} above).- Define W_x, W_h, b to be variables of the apporiate shapeusing the `tf.get_variable' functions. Make sure you usethe names "W_x", "W_h" and "b"!- Compute @new_state (h_t) defined aboveTips:- Remember to initialize your matrices using the xavierinitialization as before.Args:inputs: is the input vector of size [None, self.input_size]state: is the previous state vector of size [None, self.state_size]scope: is the name of the scope to be used when defining the variables inside.Returns:a pair of the output vector and the new state vector."""scope = scope or type(self).__name__# It's always a good idea to scope variables in functions lest they# be defined elsewhere!with tf.variable_scope(scope):### YOUR CODE HERE (~6-10 lines)W_x = tf.get_variable('W_x',[self.input_size,self._state_size],dtype=tf.float32,initializer=tf.contrib.layers.xavier_initializer())W_h = tf.get_variable('W_h',[self._state_size,self._state_size],dtype=tf.float32,initializer=tf.contrib.layers.xavier_initializer())b = tf.get_variable('b',[self._state_size],dtype=tf.float32,initializer=tf.contrib.layers.xavier_initializer())new_state = tf.nn.sigmoid(tf.matmul(inputs,W_x)+tf.matmul(state,W_h)+b)### END YOUR CODE #### For an RNN , the output and state are the same (N.B. this# isn't true for an LSTM, though we aren't using one of those in# our assignment)output = new_statereturn output, new_state解:
i) Without a mask vector, the positions t > T, which are not really part of the sentence, are still counted in the final loss, inflating it. Since both x and y at these padded positions are 0, the model learned this way is biased toward the trivial mapping of zero input to the null/padding label.
In other words, the loss produced by the trailing zero vectors also affects the gradient updates of the earlier hidden states.
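A tiny NumPy illustration of this point (the numbers are made up, and this is not the assignment's actual loss code): averaging the cross-entropy over padded positions as well dilutes the loss with the "easy" padding tokens, while masking keeps only the real ones.

    import numpy as np

    # Per-token cross-entropy losses for one padded sentence of max_length 6,
    # where only the first 4 tokens are real.
    losses = np.array([1.2, 0.8, 1.5, 0.9, 0.1, 0.1])   # padded positions are "easy" (zero input, 'O' label)
    mask   = np.array([True, True, True, True, False, False])

    unmasked_loss = losses.mean()          # diluted by the padded positions
    masked_loss   = losses[mask].mean()    # only real tokens contribute
    print(unmasked_loss, masked_loss)      # about 0.77 vs 1.10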
ii)
def pad_sequences(data, max_length):
    ret = []

    # Use this zero vector when padding sequences.
    zero_vector = [0] * Config.n_features
    zero_label = 4  # corresponds to the 'O' tag

    for sentence, labels in data:
        ### YOUR CODE HERE (~4-6 lines)
        mask = [True] * len(sentence)
        if len(sentence) >= max_length:
            sentence_pad = sentence[:max_length]
            labels_pad = labels[:max_length]
            mask_pad = mask[:max_length]
        else:
            pad_n = max_length - len(sentence)
            sentence_pad = sentence + [zero_vector] * pad_n
            labels_pad = labels + [zero_label] * pad_n
            mask_pad = mask + [False] * pad_n
        ret.append((sentence_pad, labels_pad, mask_pad))
        ### END YOUR CODE ###
    return ret

def add_placeholders(self):
    ### YOUR CODE HERE (~4-6 lines)
    self.input_placeholder = tf.placeholder(tf.int32, [None, self.max_length, self.config.n_features], name='input')
    self.labels_placeholder = tf.placeholder(tf.int32, [None, self.max_length], name='label')
    self.mask_placeholder = tf.placeholder(tf.bool, [None, self.max_length], name='mask')
    self.dropout_placeholder = tf.placeholder(tf.float32, name='dropout')
    ### END YOUR CODE

def add_embedding(self):
    ### YOUR CODE HERE (~4-6 lines)
    # Note: use the pretrained word vectors.
    embed = tf.Variable(self.pretrained_embeddings)
    embeddings = tf.nn.embedding_lookup(embed, self.input_placeholder)
    embeddings = tf.reshape(embeddings, [-1, self.max_length, self.config.n_features * self.config.embed_size])
    ### END YOUR CODE
    return embeddings

def add_training_op(self, loss):
    ### YOUR CODE HERE (~1-2 lines)
    train_op = tf.train.AdamOptimizer(self.config.lr).minimize(loss)
    ### END YOUR CODE
    return train_op

def add_prediction_op(self):
    x = self.add_embedding()
    dropout_rate = self.dropout_placeholder

    preds = []  # Predicted output at each timestep should go here!

    # Use the cell defined below. For Q2, we will just be using the
    # RNNCell you defined, but for Q3, we will run this code again
    # with a GRU cell!
    if self.config.cell == "rnn":
        cell = RNNCell(Config.n_features * Config.embed_size, Config.hidden_size)
    elif self.config.cell == "gru":
        cell = GRUCell(Config.n_features * Config.embed_size, Config.hidden_size)
    else:
        raise ValueError("Unsupported cell type: " + self.config.cell)

    # Define U and b2 as variables.
    # Initialize state as vector of zeros.
    ### YOUR CODE HERE (~4-6 lines)
    with tf.variable_scope('output'):
        U = tf.get_variable('U', [self.config.hidden_size, self.config.n_classes],
                            initializer=tf.contrib.layers.xavier_initializer())
        b2 = tf.get_variable('b2', [self.config.n_classes],
                             initializer=tf.constant_initializer(0))
    # Initialize h_0. Its last dimension is clearly hidden_size; its first dimension
    # should be batch_size, which is not hard-coded but taken from the first
    # dimension of x's shape.
    x_shape = tf.shape(x)
    new_state = tf.zeros((x_shape[0], self.config.hidden_size))
    ### END YOUR CODE

    with tf.variable_scope("RNN"):
        # 1. To run the RNN we need a cell, i.e. an instance of the RNNCell class from
        #    q2_rnn_cell.py (already created above).
        # 2. Recall that RNNCell.__call__(inputs, state, scope) defines W_x, W_h and b
        #    inside variable_scope(scope). The first call to the cell creates that
        #    variable scope; every later call must invoke
        #    tf.get_variable_scope().reuse_variables() so the same W_x, W_h and b are
        #    reused rather than redefined.
        # 3. Define the constant h_0 as the initial hidden state; it is a constant,
        #    not a trainable variable.
        # 4. Compute the remaining steps following the equations in the docstring and
        #    append each output to preds.
        for time_step in range(self.max_length):
            ### YOUR CODE HERE (~6-10 lines)
            if time_step > 0:
                tf.get_variable_scope().reuse_variables()
            # o_t, h_t = cell(x_t, h_{t-1})
            # x[:, time_step, :]: the first ':' takes the whole batch, time_step selects
            # the word at this position, and the last ':' takes all of its features,
            # i.e. the features of the time_step-th word for the entire batch.
            output_state, new_state = cell(x[:, time_step, :], new_state, 'rnn-hidden')
            # o_drop_t = Dropout(o_t, dropout_rate)
            output_dropout = tf.nn.dropout(output_state, keep_prob=dropout_rate)
            # y_t = o_drop_t U + b_2
            y_t = tf.matmul(output_dropout, U) + b2
            preds.append(y_t)
            ### END YOUR CODE

    # Make sure to reshape @preds here.
    ### YOUR CODE HERE (~2-4 lines)
    # Work out the shape of preds: it is a list of length self.max_length, and each
    # element is one batch's output of shape [batch_size, n_classes], so preds has
    # shape [max_length, batch_size, n_classes] before stacking.
    # Use tf.stack instead of the removed tf.pack
    # (https://blog.csdn.net/qq_33655521/article/details/83750546).
    preds = tf.stack(preds, axis=1)
    ### END YOUR CODE

    assert preds.get_shape().as_list() == [None, self.max_length, self.config.n_classes], \
        "predictions are not of the right shape. Expected {}, got {}".format(
            [None, self.max_length, self.config.n_classes], preds.get_shape().as_list())
    return preds

def add_loss_op(self, preds):
    ### YOUR CODE HERE (~2-4 lines)
    # Use the mask to keep only the real (non-padding) preds and labels,
    # then compute the cross entropy as usual.
    # (An earlier attempt that looped over the mask in plain Python was dropped in
    # favor of tf.boolean_mask.)
    mask_preds = tf.boolean_mask(preds, self.mask_placeholder)
    mask_label = tf.boolean_mask(self.labels_placeholder, self.mask_placeholder)
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=mask_label, logits=mask_preds))
    ### END YOUR CODE
    return loss

Results:
DEBUG:Token-level confusion matrix:
go\gu   PER      ORG      LOC      MISC     O
PER     2968.00  26.00    75.00    10.00    70.00
ORG     111.00   1663.00  99.00    81.00    138.00
LOC     27.00    66.00    1938.00  22.00    41.00
MISC    32.00    33.00    47.00    1054.00  102.00
O       34.00    36.00    22.00    30.00    42637.00

DEBUG:Token-level scores:
label   acc   prec  rec   f1
PER     0.99  0.94  0.94  0.94
ORG     0.99  0.91  0.79  0.85
LOC     0.99  0.89  0.93  0.91
MISC    0.99  0.88  0.83  0.86
O       0.99  0.99  1.00  0.99
micro   0.99  0.98  0.98  0.98
macro   0.99  0.92  0.90  0.91
not-O   0.99  0.91  0.89  0.90

INFO:Entity level P/R/F1: 0.85/0.86/0.86

Solution:
i) ① When sentences are long, gradients easily vanish;
② The model cannot use information from later in the sentence when labeling the current word.
ii) ① Add GRU-style gating units;
② Use a bidirectional RNN (biRNN); see the sketch below.
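For point ②, a bidirectional RNN can be built in TensorFlow 1.x with tf.nn.bidirectional_dynamic_rnn. The sketch below uses the built-in BasicRNNCell rather than the assignment's custom cell, and the tensor names and sizes are made up for illustration.

    import tensorflow as tf

    hidden_size, max_length, feat_size = 300, 120, 50
    inputs = tf.placeholder(tf.float32, [None, max_length, feat_size])
    seq_len = tf.placeholder(tf.int32, [None])   # true (unpadded) length of each sentence

    cell_fw = tf.nn.rnn_cell.BasicRNNCell(hidden_size)
    cell_bw = tf.nn.rnn_cell.BasicRNNCell(hidden_size)

    # outputs is a pair (forward, backward), each of shape [batch, max_length, hidden_size]
    (out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
        cell_fw, cell_bw, inputs, sequence_length=seq_len, dtype=tf.float32)

    # Concatenate both directions so each position sees left and right context.
    bi_output = tf.concat([out_fw, out_bw], axis=-1)   # [batch, max_length, 2 * hidden_size]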
RNN
GRU
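For reference, the update rules being compared are, in one common formulation (the gate and variable names follow the usual convention and may differ slightly from the assignment handout):

RNN:
$h^{(t)} = \sigma\left(x^{(t)}W_x + h^{(t-1)}W_h + b\right)$

GRU:
$z^{(t)} = \sigma\left(x^{(t)}U_z + h^{(t-1)}W_z + b_z\right)$
$r^{(t)} = \sigma\left(x^{(t)}U_r + h^{(t-1)}W_r + b_r\right)$
$\tilde{h}^{(t)} = \tanh\left(x^{(t)}U_h + (r^{(t)} \circ h^{(t-1)})W_h + b_h\right)$
$h^{(t)} = z^{(t)} \circ h^{(t-1)} + (1 - z^{(t)}) \circ \tilde{h}^{(t)}$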
Solution:
i) Both the vanilla RNN and the GRU suffer from vanishing gradients, but the RNN's gradients vanish faster; gradient clipping does not help here, because it addresses exploding gradients rather than vanishing ones.
ii) The GRU can effectively mitigate vanishing gradients: when the update gate stays close to 1, $h^{(t-1)}$ is copied through almost unchanged, so gradients can flow across many time steps without shrinking.
Results:

DEBUG:Token-level confusion matrix:
go\gu   PER      ORG      LOC      MISC     O
PER     2998.00  20.00    17.00    24.00    90.00
ORG     140.00   1639.00  75.00    108.00   130.00
LOC     59.00    82.00    1868.00  39.00    46.00
MISC    42.00    21.00    31.00    1045.00  129.00
O       26.00    42.00    9.00     37.00    42645.00

DEBUG:Token-level scores:
label   acc   prec  rec   f1
PER     0.99  0.92  0.95  0.93
ORG     0.99  0.91  0.78  0.84
LOC     0.99  0.93  0.89  0.91
MISC    0.99  0.83  0.82  0.83
O       0.99  0.99  1.00  0.99
micro   0.99  0.98  0.98  0.98
macro   0.99  0.92  0.89  0.90
not-O   0.99  0.91  0.88  0.89

INFO:Entity level P/R/F1: 0.85/0.86/0.86

Summary