CS224n Course Assignment 3 Reference Answers
Assignment #3 Solution, by Jonariguez
The code for all coding problems has been uploaded to github/CS224n/Jonariguez.
The code for all coding problems can also be found in the .py files under the Assignment3_Code folder.
Solution:
ii) Using only the word itself is much like a purely statistics-based approach: the model handles low-frequency and out-of-vocabulary words poorly, and a word by itself can be ambiguous, so we cannot tell whether it is an entity or which entity type it is (for example, "Washington" can be a person or a location).
iii) Context words, part-of-speech tags, and similar features.
Solution:
i) Work out the shapes of all variables:
$x^{(t)} \in \mathbb{R}^{1\times V}$
$x^{(t)}L \in \mathbb{R}^{1\times D}$
$e^{(t)} \in \mathbb{R}^{1\times (2w+1)D}$
$h^{(t)} \in \mathbb{R}^{1\times H}$
$W \in \mathbb{R}^{(2w+1)D\times H}$
$\hat{y}^{(t)} \in \mathbb{R}^{1\times C}$
$U \in \mathbb{R}^{H\times C}$
$b_1 \in \mathbb{R}^{1\times H}$
$b_2 \in \mathbb{R}^{1\times C}$
ii) The complexity of predicting a single word is:

$e^{(t)} = [x^{(t-w)}L, \ldots, x^{(t)}L, \ldots, x^{(t+w)}L] \rightarrow O(wV)$
$h^{(t)} = \mathrm{ReLU}(e^{(t)}W + b_1) \rightarrow O(wDH)$
$\hat{y}^{(t)} = \mathrm{softmax}(h^{(t)}U + b_2) \rightarrow O(HC)$
$J = CE(y^{(t)}, \hat{y}^{(t)}) = -\sum_i y_i^{(t)} \log(\hat{y}_i^{(t)}) \rightarrow O(C)$

So the complexity per word is $O(wV + wDH + HC)$.
For a sentence of length $T$ the complexity is $O(T(wV + wDH + HC))$.
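To make the shapes above concrete, here is a small NumPy sketch of one forward step of the window-based model; the sizes V, D, H, C, w are made-up toy values, not the assignment's configuration.

    import numpy as np

    V, D, H, C, w = 10, 4, 6, 5, 1           # toy sizes: vocab, embed dim, hidden, classes, window
    L = np.random.randn(V, D)                # embedding matrix
    W = np.random.randn((2 * w + 1) * D, H)
    U = np.random.randn(H, C)
    b1 = np.zeros(H)
    b2 = np.zeros(C)

    window_ids = [3, 7, 2]                   # indices of x^{(t-w)}, ..., x^{(t)}, ..., x^{(t+w)}
    e_t = L[window_ids].reshape(1, -1)       # e^{(t)}: shape (1, (2w+1)D)
    h_t = np.maximum(0, e_t @ W + b1)        # h^{(t)}: shape (1, H), ReLU
    scores = h_t @ U + b2
    y_hat = np.exp(scores) / np.exp(scores).sum()   # \hat{y}^{(t)}: shape (1, C), softmax

    assert e_t.shape == (1, (2 * w + 1) * D)
    assert h_t.shape == (1, H)
    assert y_hat.shape == (1, C)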
Solution:
In Python 3, import StringIO with `from io import StringIO`.
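A minimal sketch of a guarded import that works under both Python 2 and Python 3:

    try:
        from StringIO import StringIO   # Python 2
    except ImportError:
        from io import StringIO         # Python 3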
Solution:
i) ① In the window-based model $W \in \mathbb{R}^{(2w+1)D\times H}$, while in the RNN $W_x \in \mathbb{R}^{D\times H}$;
② The RNN has an additional matrix $W_h \in \mathbb{R}^{H\times H}$.
ii) $\mathcal{O}((D+H)\cdot H\cdot T)$: each time step costs $O(DH)$ for $e^{(t)}W_x$ plus $O(H^2)$ for $h^{(t-1)}W_h$, and there are $T$ steps.
Solution:
ii) ① The $F_1$ score is less intuitive and less direct to interpret.
② Computing the $F_1$ score requires the whole corpus, which makes mini-batch training and parallelization difficult.
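As an illustration of point ②, entity-level F1 must be computed from counts accumulated over the whole evaluation set; it is not an average of per-batch F1 scores. A minimal sketch with made-up counts:

    def f1_from_counts(tp, fp, fn):
        # precision, recall and F1 from corpus-level counts
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)

    # Two "batches" with (tp, fp, fn) counts
    batches = [(40, 10, 20), (5, 15, 5)]

    # Corpus-level: accumulate counts first, then compute F1 once
    tp, fp, fn = (sum(c) for c in zip(*batches))
    corpus_f1 = f1_from_counts(tp, fp, fn)

    # Averaging per-batch F1 gives a different (and misleading) number
    avg_f1 = sum(f1_from_counts(*b) for b in batches) / len(batches)
    print(corpus_f1, avg_f1)   # the two values differ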
""" __call__函數(shù)的含義:假設實例化了一個該類的對象instan,那么instan(inputs,state)其實就會調(diào)用__call__()函數(shù),這樣在__call__()函數(shù)中實現(xiàn)前向傳播,調(diào)用就很方便 """ def __call__(self, inputs, state, scope=None):"""Updates the state using the previous @state and @inputs.Remember the RNN equations are:h_t = sigmoid(x_t W_x + h_{t-1} W_h + b)TODO: In the code below, implement an RNN cell using @inputs(x_t above) and the state (h_{t-1} above).- Define W_x, W_h, b to be variables of the apporiate shapeusing the `tf.get_variable' functions. Make sure you usethe names "W_x", "W_h" and "b"!- Compute @new_state (h_t) defined aboveTips:- Remember to initialize your matrices using the xavierinitialization as before.Args:inputs: is the input vector of size [None, self.input_size]state: is the previous state vector of size [None, self.state_size]scope: is the name of the scope to be used when defining the variables inside.Returns:a pair of the output vector and the new state vector."""scope = scope or type(self).__name__# It's always a good idea to scope variables in functions lest they# be defined elsewhere!with tf.variable_scope(scope):### YOUR CODE HERE (~6-10 lines)W_x = tf.get_variable('W_x',[self.input_size,self._state_size],dtype=tf.float32,initializer=tf.contrib.layers.xavier_initializer())W_h = tf.get_variable('W_h',[self._state_size,self._state_size],dtype=tf.float32,initializer=tf.contrib.layers.xavier_initializer())b = tf.get_variable('b',[self._state_size],dtype=tf.float32,initializer=tf.contrib.layers.xavier_initializer())new_state = tf.nn.sigmoid(tf.matmul(inputs,W_x)+tf.matmul(state,W_h)+b)### END YOUR CODE #### For an RNN , the output and state are the same (N.B. this# isn't true for an LSTM, though we aren't using one of those in# our assignment)output = new_statereturn output, new_state解:
i) Without a mask vector, the positions t > T, which are not really part of the sentence, are still counted in the final loss, inflating it. Since both x and y at these padded positions are 0, the model learned this way is biased toward the trivial mapping of zero input to the null/padding label.
In other words, the loss produced by the trailing zero vectors also affects the gradient updates of the earlier hidden states.
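A tiny NumPy illustration of this point (the numbers are made up, and this is not the assignment's actual loss code): averaging the cross-entropy over padded positions as well dilutes the loss with the "easy" padding tokens, while masking keeps only the real ones.

    import numpy as np

    # Per-token cross-entropy losses for one padded sentence of max_length 6,
    # where only the first 4 tokens are real.
    losses = np.array([1.2, 0.8, 1.5, 0.9, 0.1, 0.1])   # padded positions are "easy" (zero input, 'O' label)
    mask   = np.array([True, True, True, True, False, False])

    unmasked_loss = losses.mean()          # diluted by the padded positions
    masked_loss   = losses[mask].mean()    # only real tokens contribute
    print(unmasked_loss, masked_loss)      # about 0.77 vs 1.10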
ii)
def pad_sequences(data, max_length):
    ret = []

    # Use this zero vector when padding sequences.
    zero_vector = [0] * Config.n_features
    zero_label = 4  # corresponds to the 'O' tag

    for sentence, labels in data:
        ### YOUR CODE HERE (~4-6 lines)
        mask = [True] * len(sentence)
        if len(sentence) >= max_length:
            sentence_pad = sentence[:max_length]
            labels_pad = labels[:max_length]
            mask_pad = mask[:max_length]
        else:
            pad_n = max_length - len(sentence)
            sentence_pad = sentence + [zero_vector] * pad_n
            labels_pad = labels + [zero_label] * pad_n
            mask_pad = mask + [False] * pad_n
        ret.append((sentence_pad, labels_pad, mask_pad))
        ### END YOUR CODE ###
    return ret

def add_placeholders(self):
    ### YOUR CODE HERE (~4-6 lines)
    self.input_placeholder = tf.placeholder(tf.int32, [None, self.max_length, self.config.n_features], name='input')
    self.labels_placeholder = tf.placeholder(tf.int32, [None, self.max_length], name='label')
    self.mask_placeholder = tf.placeholder(tf.bool, [None, self.max_length], name='mask')
    self.dropout_placeholder = tf.placeholder(tf.float32, name='dropout')
    ### END YOUR CODE

def add_embedding(self):
    ### YOUR CODE HERE (~4-6 lines)
    # Note: use the pretrained word vectors.
    embed = tf.Variable(self.pretrained_embeddings)
    embeddings = tf.nn.embedding_lookup(embed, self.input_placeholder)
    embeddings = tf.reshape(embeddings, [-1, self.max_length, self.config.n_features * self.config.embed_size])
    ### END YOUR CODE
    return embeddings

def add_training_op(self, loss):
    ### YOUR CODE HERE (~1-2 lines)
    train_op = tf.train.AdamOptimizer(self.config.lr).minimize(loss)
    ### END YOUR CODE
    return train_op

def add_prediction_op(self):
    x = self.add_embedding()
    dropout_rate = self.dropout_placeholder

    preds = []  # Predicted output at each timestep should go here!

    # Use the cell defined below. For Q2, we will just be using the
    # RNNCell you defined, but for Q3, we will run this code again
    # with a GRU cell!
    if self.config.cell == "rnn":
        cell = RNNCell(Config.n_features * Config.embed_size, Config.hidden_size)
    elif self.config.cell == "gru":
        cell = GRUCell(Config.n_features * Config.embed_size, Config.hidden_size)
    else:
        raise ValueError("Unsupported cell type: " + self.config.cell)

    # Define U and b2 as variables.
    # Initialize state as vector of zeros.
    ### YOUR CODE HERE (~4-6 lines)
    with tf.variable_scope('output'):
        U = tf.get_variable('U', [self.config.hidden_size, self.config.n_classes],
                            initializer=tf.contrib.layers.xavier_initializer())
        b2 = tf.get_variable('b2', [self.config.n_classes],
                             initializer=tf.constant_initializer(0))
    # Initialize h_0. Its last dimension is clearly hidden_size; its first dimension
    # should be batch_size, which is not hard-coded but taken from the first
    # dimension of x's shape.
    x_shape = tf.shape(x)
    new_state = tf.zeros((x_shape[0], self.config.hidden_size))
    ### END YOUR CODE

    with tf.variable_scope("RNN"):
        # 1. To run the RNN we need a cell, i.e. an instance of the RNNCell class from
        #    q2_rnn_cell.py (already created above).
        # 2. Recall that RNNCell.__call__(inputs, state, scope) defines W_x, W_h and b
        #    inside variable_scope(scope). The first call to the cell creates that
        #    variable scope; every later call must invoke
        #    tf.get_variable_scope().reuse_variables() so the same W_x, W_h and b are
        #    reused rather than redefined.
        # 3. Define the constant h_0 as the initial hidden state; it is a constant,
        #    not a trainable variable.
        # 4. Compute the remaining steps following the equations in the docstring and
        #    append each output to preds.
        for time_step in range(self.max_length):
            ### YOUR CODE HERE (~6-10 lines)
            if time_step > 0:
                tf.get_variable_scope().reuse_variables()
            # o_t, h_t = cell(x_t, h_{t-1})
            # x[:, time_step, :]: the first ':' takes the whole batch, time_step selects
            # the word at this position, and the last ':' takes all of its features,
            # i.e. the features of the time_step-th word for the entire batch.
            output_state, new_state = cell(x[:, time_step, :], new_state, 'rnn-hidden')
            # o_drop_t = Dropout(o_t, dropout_rate)
            output_dropout = tf.nn.dropout(output_state, keep_prob=dropout_rate)
            # y_t = o_drop_t U + b_2
            y_t = tf.matmul(output_dropout, U) + b2
            preds.append(y_t)
            ### END YOUR CODE

    # Make sure to reshape @preds here.
    ### YOUR CODE HERE (~2-4 lines)
    # Work out the shape of preds: it is a list of length self.max_length, and each
    # element is one batch's output of shape [batch_size, n_classes], so preds has
    # shape [max_length, batch_size, n_classes] before stacking.
    # Use tf.stack instead of the removed tf.pack
    # (https://blog.csdn.net/qq_33655521/article/details/83750546).
    preds = tf.stack(preds, axis=1)
    ### END YOUR CODE

    assert preds.get_shape().as_list() == [None, self.max_length, self.config.n_classes], \
        "predictions are not of the right shape. Expected {}, got {}".format(
            [None, self.max_length, self.config.n_classes], preds.get_shape().as_list())
    return preds

def add_loss_op(self, preds):
    ### YOUR CODE HERE (~2-4 lines)
    # Use the mask to keep only the real (non-padding) preds and labels,
    # then compute the cross entropy as usual.
    # (An earlier attempt that looped over the mask in plain Python was dropped in
    # favor of tf.boolean_mask.)
    mask_preds = tf.boolean_mask(preds, self.mask_placeholder)
    mask_label = tf.boolean_mask(self.labels_placeholder, self.mask_placeholder)
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=mask_label, logits=mask_preds))
    ### END YOUR CODE
    return loss

Results:
DEBUG:Token-level confusion matrix:
go\gu   PER      ORG      LOC      MISC     O
PER     2968.00  26.00    75.00    10.00    70.00
ORG     111.00   1663.00  99.00    81.00    138.00
LOC     27.00    66.00    1938.00  22.00    41.00
MISC    32.00    33.00    47.00    1054.00  102.00
O       34.00    36.00    22.00    30.00    42637.00

DEBUG:Token-level scores:
label   acc   prec  rec   f1
PER     0.99  0.94  0.94  0.94
ORG     0.99  0.91  0.79  0.85
LOC     0.99  0.89  0.93  0.91
MISC    0.99  0.88  0.83  0.86
O       0.99  0.99  1.00  0.99
micro   0.99  0.98  0.98  0.98
macro   0.99  0.92  0.90  0.91
not-O   0.99  0.91  0.89  0.90

INFO:Entity level P/R/F1: 0.85/0.86/0.86

Solution:
i) ① When sentences are long, gradients easily vanish;
② The model cannot use information from later in the sentence when labeling the current word.
ii) ① Add GRU-style gating units;
② Use a bidirectional RNN (biRNN); see the sketch below.
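For point ②, a bidirectional RNN can be built in TensorFlow 1.x with tf.nn.bidirectional_dynamic_rnn. The sketch below uses the built-in BasicRNNCell rather than the assignment's custom cell, and the tensor names and sizes are made up for illustration.

    import tensorflow as tf

    hidden_size, max_length, feat_size = 300, 120, 50
    inputs = tf.placeholder(tf.float32, [None, max_length, feat_size])
    seq_len = tf.placeholder(tf.int32, [None])   # true (unpadded) length of each sentence

    cell_fw = tf.nn.rnn_cell.BasicRNNCell(hidden_size)
    cell_bw = tf.nn.rnn_cell.BasicRNNCell(hidden_size)

    # outputs is a pair (forward, backward), each of shape [batch, max_length, hidden_size]
    (out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
        cell_fw, cell_bw, inputs, sequence_length=seq_len, dtype=tf.float32)

    # Concatenate both directions so each position sees left and right context.
    bi_output = tf.concat([out_fw, out_bw], axis=-1)   # [batch, max_length, 2 * hidden_size]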
RNN
GRU
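For reference, the update rules being compared are, in one common formulation (the gate and variable names follow the usual convention and may differ slightly from the assignment handout):

RNN:
$h^{(t)} = \sigma\left(x^{(t)}W_x + h^{(t-1)}W_h + b\right)$

GRU:
$z^{(t)} = \sigma\left(x^{(t)}U_z + h^{(t-1)}W_z + b_z\right)$
$r^{(t)} = \sigma\left(x^{(t)}U_r + h^{(t-1)}W_r + b_r\right)$
$\tilde{h}^{(t)} = \tanh\left(x^{(t)}U_h + (r^{(t)} \circ h^{(t-1)})W_h + b_h\right)$
$h^{(t)} = z^{(t)} \circ h^{(t-1)} + (1 - z^{(t)}) \circ \tilde{h}^{(t)}$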
Solution:
i) Both the vanilla RNN and the GRU suffer from vanishing gradients, but the RNN's gradients vanish faster; gradient clipping does not help here, because it addresses exploding gradients rather than vanishing ones.
ii) The GRU can effectively mitigate vanishing gradients: when the update gate stays close to 1, $h^{(t-1)}$ is copied through almost unchanged, so gradients can flow across many time steps without shrinking.
Results:

DEBUG:Token-level confusion matrix:
go\gu   PER      ORG      LOC      MISC     O
PER     2998.00  20.00    17.00    24.00    90.00
ORG     140.00   1639.00  75.00    108.00   130.00
LOC     59.00    82.00    1868.00  39.00    46.00
MISC    42.00    21.00    31.00    1045.00  129.00
O       26.00    42.00    9.00     37.00    42645.00

DEBUG:Token-level scores:
label   acc   prec  rec   f1
PER     0.99  0.92  0.95  0.93
ORG     0.99  0.91  0.78  0.84
LOC     0.99  0.93  0.89  0.91
MISC    0.99  0.83  0.82  0.83
O       0.99  0.99  1.00  0.99
micro   0.99  0.98  0.98  0.98
macro   0.99  0.92  0.89  0.90
not-O   0.99  0.91  0.88  0.89

INFO:Entity level P/R/F1: 0.85/0.86/0.86

Summary