Translation: Deep Learning Quizzes (Course 1, Week 3)
Introduction
This post translates the quiz assignments from deeplearning.ai's Deep Learning specialization; the remaining quizzes will be translated over the coming weeks. The specialization has five courses in total.
Translator: 黃海廣 (Huang Haiguang)
This installment covers Course 1, Weeks 3 and 4:
Course 1: Neural Networks and Deep Learning
Week 3 Quiz - Shallow Neural Networks
1. Which of the following are true? (Check all that apply.) Note: only the correct options are listed.
【★】$X$ is a matrix in which each column is one training example.
【★】$a^{[2]}_4$ is the activation output by the 4th neuron of the 2nd layer.
【★】$a^{[2](12)}$ denotes the activation vector of the 2nd layer for the 12th training example.
【★】$a^{[2]}$ denotes the activation vector of the 2nd layer.
2. The tanh activation usually works better than the sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data better for the next layer. True/False?
【★】True
【 】False
Note: You can check [this post](https://stats.stackexchange.com/a/101563/169377) and [this paper](http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf).
As seen in lecture, the output of tanh is between -1 and 1; it thus centers the data, which makes learning simpler for the next layer.
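As a quick numerical illustration (a minimal sketch, not part of the original quiz; the seed and sizes are arbitrary), passing the same zero-mean pre-activations through tanh and sigmoid shows that tanh outputs stay roughly zero-centered while sigmoid outputs cluster around 0.5:

```python
import numpy as np

np.random.seed(0)
z = np.random.randn(5, 1000)            # zero-mean pre-activations for 5 hidden units

tanh_out = np.tanh(z)                   # values in (-1, 1), roughly centered at 0
sigmoid_out = 1 / (1 + np.exp(-z))      # values in (0, 1), centered near 0.5

print("mean of tanh outputs:   ", tanh_out.mean())     # close to 0
print("mean of sigmoid outputs:", sigmoid_out.mean())  # close to 0.5
```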
3. Which of these is a correct vectorized implementation of forward propagation for layer $l$, where $1 \le l \le L$? Note: only the correct option is listed; it consists of the following pair of equations.
【★】$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$
【★】$A^{[l]} = g^{[l]}(Z^{[l]})$
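A minimal numpy sketch of this single vectorized step (the sizes, the 0.01 scaling, and the choice of tanh as $g^{[l]}$ are illustrative assumptions, not part of the quiz):

```python
import numpy as np

np.random.seed(1)
m, n_prev, n_l = 10, 3, 4                 # batch size, units in layer l-1, units in layer l

A_prev = np.random.randn(n_prev, m)       # A^[l-1], shape (n^[l-1], m)
W = np.random.randn(n_l, n_prev) * 0.01   # W^[l],   shape (n^[l], n^[l-1])
b = np.zeros((n_l, 1))                    # b^[l],   shape (n^[l], 1), broadcast over the m columns

Z = np.dot(W, A_prev) + b                 # Z^[l] = W^[l] A^[l-1] + b^[l]
A = np.tanh(Z)                            # A^[l] = g^[l](Z^[l])

print(Z.shape, A.shape)                   # both (4, 10)
```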
4. You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer?
【 】 ReLU
【 】 Leaky ReLU
【★】 sigmoid
【 】 tanh
Note: The output value of a sigmoid function can be readily interpreted as a probability.
Sigmoid outputs a value between 0 and 1, which makes it a very good choice for binary classification: classify as 0 if the output is less than 0.5 and as 1 if it is more than 0.5. This could also be done with tanh, but it is less convenient because its output lies between -1 and 1.
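For example (an illustrative sketch with made-up pre-activations, not from the quiz), the sigmoid output can be read as $P(y=1\mid x)$ and thresholded at 0.5:

```python
import numpy as np

z_out = np.array([[-2.0, 0.3, 1.5, -0.1]])   # hypothetical pre-activations of the output unit
y_hat = 1 / (1 + np.exp(-z_out))             # sigmoid outputs in (0, 1), read as P(y=1|x)
predictions = (y_hat > 0.5).astype(int)      # 1 = cucumber, 0 = watermelon

print(y_hat)
print(predictions)                           # [[0 1 1 0]]
```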
5. Consider the following code:
A = np.random.randn(4, 3)
B = np.sum(A, axis=1, keepdims=True)
What will B.shape be?
B.shape = (4, 1). We use keepdims=True to make sure that B.shape is (4, 1) rather than (4,); it makes our code more rigorous.
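Running the snippet yourself (a quick check you can paste into a Python session) makes the shape difference explicit:

```python
import numpy as np

A = np.random.randn(4, 3)

B = np.sum(A, axis=1, keepdims=True)   # shape (4, 1): the summed axis is kept as a length-1 dimension
C = np.sum(A, axis=1)                  # shape (4,): a rank-1 array, easier to misuse in broadcasting

print(B.shape)   # (4, 1)
print(C.shape)   # (4,)
```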
6. Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements are true? (Check all that apply.)
【★】Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent, each neuron in the layer will be computing the same thing as the other neurons.
【 】Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent they will learn to compute different things because we have "broken symmetry".
【 】Each neuron in the first hidden layer will compute the same thing, but neurons in different layers will compute different things, thus we have accomplished "symmetry breaking" as described in lecture.
【 】The first hidden layer's neurons will perform different computations from each other even in the first iteration; their parameters will thus keep evolving in their own way.
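A small sketch of the symmetry problem (the 2-4-1 architecture, tanh hidden activation, and logistic output are illustrative assumptions): with all-zero initialization, every hidden unit computes the same activation and receives the same gradient, so the rows of W1 remain identical after any number of updates.

```python
import numpy as np

np.random.seed(2)
X = np.random.randn(2, 5)                        # 2 features, 5 examples
Y = (np.random.rand(1, 5) > 0.5).astype(float)
m = X.shape[1]

# Zero initialization for a 2-4-1 network
W1, b1 = np.zeros((4, 2)), np.zeros((4, 1))
W2, b2 = np.zeros((1, 4)), np.zeros((1, 1))

# Forward pass: every hidden unit computes exactly the same thing
Z1 = W1 @ X + b1
A1 = np.tanh(Z1)
A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))

# Backward pass (logistic loss): the gradients of the hidden units are identical too
dZ2 = A2 - Y
dW2 = dZ2 @ A1.T / m
dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
dW1 = dZ1 @ X.T / m

print(np.allclose(A1, A1[0]))    # True: all 4 hidden units produce the same activations
print(np.allclose(dW1, dW1[0]))  # True: all rows of dW1 are equal, so symmetry is never broken
```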
7. Logistic regression's weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to "break symmetry". True/False?
【 】True
【★】False
Note: Logistic regression does not have a hidden layer. If you initialize the weights to zeros, the first example x fed into logistic regression will output zero, but the derivatives of logistic regression depend on the input x (because there is no hidden layer), which is not zero. So at the second iteration the weight values follow x's distribution and are different from each other if x is not a constant vector.
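A small numerical sketch of this point (the toy data, seed, and learning rate are illustrative assumptions): even with w = 0, the gradient of logistic regression depends on X, so after one update the weights are non-zero and differ from one another.

```python
import numpy as np

np.random.seed(3)
X = np.random.randn(3, 8)                        # 3 features, 8 examples
Y = (np.random.rand(1, 8) > 0.5).astype(float)
m = X.shape[1]

w = np.zeros((3, 1))                             # all-zero initialization
b = 0.0
alpha = 0.1

a = 1 / (1 + np.exp(-(w.T @ X + b)))             # first forward pass: a = 0.5 for every example
dw = X @ (a - Y).T / m                           # gradient depends on X, so it is not zero
w = w - alpha * dw

print(dw.ravel())   # non-zero, with components following X's distribution
print(w.ravel())    # the weights already differ from each other after one update
```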
8. You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(..,..)*1000. What will happen?
【 】It doesn't matter. As long as you initialize the weights randomly, gradient descent is not affected by whether the weights are large or small.
【 】This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set the learning rate α to be very small to prevent divergence; this will slow down learning.
【 】This will cause the inputs of the tanh to also be very large, causing the units to be "highly activated" and thus speed up learning compared to if the weights had to start from small values.
【★】This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow.
Note: tanh becomes flat for large values, which drives its gradient close to zero. This slows down the optimization algorithm.
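A quick sketch of the saturation effect (the factor 1000 comes from the question; the sizes and seed are illustrative): with huge weights the tanh inputs are huge, so the local derivative $1 - \tanh^2(z)$ collapses to essentially zero.

```python
import numpy as np

np.random.seed(4)
X = np.random.randn(3, 10)

W_small = np.random.randn(4, 3) * 0.01
W_large = np.random.randn(4, 3) * 1000

for name, W in [("small init", W_small), ("large init", W_large)]:
    Z = W @ X
    A = np.tanh(Z)
    local_grad = 1 - A ** 2                      # derivative of tanh evaluated at Z
    print(name, "mean |dtanh/dz| =", local_grad.mean())
# small init: close to 1 (healthy gradients); large init: essentially 0 (saturated, so learning is slow)
```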
9. Consider the following 1-hidden-layer neural network (two input features, a hidden layer with 4 units, and a single output unit). Note: only the correct options are listed.
【★】$b^{[1]}$ will have shape (4, 1)
【★】$W^{[1]}$ will have shape (4, 2)
【★】$W^{[2]}$ will have shape (1, 4)
【★】$b^{[2]}$ will have shape (1, 1)
10. In the same network as the previous question, what are the dimensions of $Z^{[1]}$ and $A^{[1]}$? Note: only the correct option is listed.
【★】$Z^{[1]}$ and $A^{[1]}$ are both (4, m)
Note: in general, $Z^{[l]}$ and $A^{[l]}$ have shape $(n^{[l]}, m)$, where m is the number of training examples.
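A runnable sanity check of these shapes for the same 2-input, 4-hidden-unit, 1-output network (the batch size m = 7 is an arbitrary assumption):

```python
import numpy as np

np.random.seed(5)
n_x, n_h, n_y, m = 2, 4, 1, 7

X = np.random.randn(n_x, m)
W1, b1 = np.random.randn(n_h, n_x) * 0.01, np.zeros((n_h, 1))   # (4, 2) and (4, 1)
W2, b2 = np.random.randn(n_y, n_h) * 0.01, np.zeros((n_y, 1))   # (1, 4) and (1, 1)

Z1 = W1 @ X + b1
A1 = np.tanh(Z1)

print(W1.shape, b1.shape, W2.shape, b2.shape)  # (4, 2) (4, 1) (1, 4) (1, 1)
print(Z1.shape, A1.shape)                      # (4, 7) (4, 7): both are (4, m)
```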
Week 4 Quiz - Key Concepts on Deep Neural Networks
1. What is the "cache" used for in our implementation of forward propagation and backward propagation?
【 】It is used to cache the intermediate values of the cost function during training.
【★】We use it to pass variables computed during forward propagation to the corresponding backward propagation step. It contains useful values for backward propagation to compute derivatives.
【 】It is used to keep track of the hyperparameters that we are searching over, to speed up computation.
【 】We use it to pass variables computed during backward propagation to the corresponding forward propagation step. It contains useful values for forward propagation to compute activations.
Note: the "cache" records values from the forward propagation units and sends them to the backward propagation units, because they are needed to compute derivatives via the chain rule.
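A minimal sketch of the cache idea (the function names and the single linear step are illustrative, not the course's exact implementation): the forward step stashes exactly the values the matching backward step will need.

```python
import numpy as np

def linear_forward(A_prev, W, b):
    Z = W @ A_prev + b
    cache = (A_prev, W, b)          # stored during forward propagation for the backward pass
    return Z, cache

def linear_backward(dZ, cache):
    A_prev, W, b = cache            # the cached forward values are needed for the chain rule
    m = A_prev.shape[1]
    dW = dZ @ A_prev.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

np.random.seed(6)
A_prev = np.random.randn(3, 5)
W, b = np.random.randn(4, 3), np.zeros((4, 1))
Z, cache = linear_forward(A_prev, W, b)
dA_prev, dW, db = linear_backward(np.random.randn(4, 5), cache)
print(Z.shape, dW.shape, db.shape, dA_prev.shape)   # (4, 5) (4, 3) (4, 1) (3, 5)
```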
2. Among the following, which ones are "hyperparameters"? (Check all that apply.) Note: only the correct options are listed.
【★】size of the hidden layers $n^{[l]}$
【★】learning rate α
【★】number of iterations
【★】number of layers $L$ in the neural network
Note: You can check this Quora post or this blog post.
3. Which of the following statements is true?
【★】The deeper layers of a neural network are typically computing more complex features of the input than the earlier layers.
【 】The earlier layers of a neural network are typically computing more complex features of the input than the deeper layers.
Note: You can check the lecture videos; Andrew used a CNN example to explain this.
4. Vectorization allows you to compute forward propagation in an $L$-layer neural network without an explicit for-loop (or any other explicit iterative loop) over the layers l = 1, 2, …, L. True/False?
【 】True
【★】False
Note: We cannot avoid the for-loop iteration over the computations across layers.
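A short sketch of why (the layer sizes, seed, and single tanh activation are illustrative assumptions): each matrix product is vectorized over the m examples, but the pass through the layers l = 1, …, L is still an explicit Python loop.

```python
import numpy as np

np.random.seed(7)
layer_dims = [5, 4, 3, 1]             # n_x followed by the sizes of layers 1..L
m = 6
A = np.random.randn(layer_dims[0], m)

params = {}
for l in range(1, len(layer_dims)):
    params['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    params['b' + str(l)] = np.zeros((layer_dims[l], 1))

# Forward propagation: vectorized over examples, but still a for-loop over the layers
for l in range(1, len(layer_dims)):
    Z = params['W' + str(l)] @ A + params['b' + str(l)]
    A = np.tanh(Z)                    # one activation for every layer, just for the shape check
    print("layer", l, "A shape:", A.shape)
```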
5. Assume we store the values of $n^{[l]}$ in an array called layer_dims, as follows: layer_dims = [n_x, 4, 3, 2, 1]. So layer 1 has 4 hidden units, layer 2 has 3 hidden units, and so on. Which of the following for-loops will allow you to initialize the parameters for the model?
for i in range(1, len(layer_dims)):
    parameter['W' + str(i)] = np.random.randn(layer_dims[i], layer_dims[i - 1]) * 0.01
    parameter['b' + str(i)] = np.random.randn(layer_dims[i], 1) * 0.01
6. Consider the following neural network. Note: only the correct option is listed.
【★】The number of layers $L$ is 4. The number of hidden layers is 3.
Note: The input layer ($a^{[0]}$) does not count.
As seen in lecture, the number of layers is counted as the number of hidden layers + 1. The input and output layers are not counted as hidden layers.
7. During forward propagation, in the forward function for a layer $l$ you need to know what the activation function in that layer is (sigmoid, tanh, ReLU, etc.). During backpropagation, the corresponding backward function also needs to know what the activation function for layer $l$ is, since the gradient depends on it. True/False?
【★】True
【 】False
Note: During backpropagation you need to know which activation was used in forward propagation in order to compute the correct derivative.
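A small sketch of why the backward step must know the activation (the dispatch-by-name helper is an illustrative choice, not the course's API): g'(z) differs per activation, and dZ = dA * g'(Z).

```python
import numpy as np

def activation_backward(dA, Z, activation):
    """Compute dZ = dA * g'(Z); the right derivative depends on which g was used in the forward pass."""
    if activation == "sigmoid":
        s = 1 / (1 + np.exp(-Z))
        return dA * s * (1 - s)
    if activation == "tanh":
        return dA * (1 - np.tanh(Z) ** 2)
    if activation == "relu":
        return dA * (Z > 0)
    raise ValueError("unknown activation")

np.random.seed(8)
Z = np.random.randn(4, 5)
dA = np.random.randn(4, 5)
for g in ("sigmoid", "tanh", "relu"):
    print(g, activation_backward(dA, Z, g).mean())   # same dA and Z, different dZ for each activation
```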
8. There are certain functions with the following properties:
(i) To compute the function using a shallow network circuit, you will need a large network (where we measure size by the number of logic gates in the network), but (ii) to compute it using a deep network circuit, you need only an exponentially smaller network. True/False?
【★】True
【 】False
Note: See the lectures; exactly the same idea was explained there.
9. Consider the following 2-hidden-layer neural network (4 input features, hidden layers with 4 and 3 units, and a single output unit). Which of the following statements are true? (Check all that apply.) Note: only the correct options are listed.
【★】$W^{[1]}$ will have shape (4, 4)
【★】$b^{[1]}$ will have shape (4, 1)
【★】$W^{[2]}$ will have shape (3, 4)
【★】$b^{[2]}$ will have shape (3, 1)
【★】$b^{[3]}$ will have shape (1, 1)
【★】$W^{[3]}$ will have shape (1, 3)
Note: these follow from the general formulas $W^{[l]}: (n^{[l]}, n^{[l-1]})$ and $b^{[l]}: (n^{[l]}, 1)$.
10. Whereas the previous question used a specific network, what, in the general case, is the dimension of $W^{[l]}$, the weight matrix associated with layer $l$? Note: only the correct option is listed.
【★】$W^{[l]}$ has shape $(n^{[l]}, n^{[l-1]})$
Note: this is the general formula for the shape of the weight matrix of any layer.
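A short check of the general rule (the layer sizes reuse the network from question 9; everything else is illustrative): for every layer, $W^{[l]}$ comes out with shape $(n^{[l]}, n^{[l-1]})$ and $b^{[l]}$ with shape $(n^{[l]}, 1)$.

```python
import numpy as np

layer_dims = [4, 4, 3, 1]    # n_x = 4, then layers of 4, 3, and 1 units (as in question 9)

for l in range(1, len(layer_dims)):
    W = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    b = np.zeros((layer_dims[l], 1))
    assert W.shape == (layer_dims[l], layer_dims[l - 1])   # (n^[l], n^[l-1])
    assert b.shape == (layer_dims[l], 1)                   # (n^[l], 1)
    print("W" + str(l), W.shape, " b" + str(l), b.shape)
```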