5 Perspectives to Why Dropout Works So Well
Dropout works by randomly blocking off a fraction of neurons in a layer during training. Then, during prediction (after training), Dropout does not block any neurons. The results of this practice have been enormously successful — competition-winning networks almost always make Dropout an essential part of the architecture.
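To make the mechanics concrete, below is a minimal NumPy sketch of a dropout layer. It uses the "inverted dropout" variant found in most modern frameworks, where surviving activations are rescaled during training so that nothing needs to change at prediction time; the function name and the specific rate are illustrative rather than details from the original article.

```python
import numpy as np

def dropout_forward(activations, rate, training, rng):
    """A minimal (inverted) dropout layer: during training, zero out a random
    fraction of the incoming activations and rescale the survivors so their
    expected value is unchanged; during prediction, pass everything through."""
    if not training:
        return activations                        # no neurons are blocked at inference
    mask = rng.random(activations.shape) >= rate  # True where the neuron survives
    return activations * mask / (1.0 - rate)      # rescale to keep the expectation stable

rng = np.random.default_rng(0)
x = np.array([0.2, 1.5, -0.7, 0.9, 0.4])
print(dropout_forward(x, rate=0.4, training=True, rng=rng))   # some entries zeroed, rest scaled up
print(dropout_forward(x, rate=0.4, training=False, rng=rng))  # unchanged during prediction
```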
It can be a bit confusing to understand why Dropout works at all. For one, we are essentially inserting randomness into the model, so one would expect its predictions to vary widely as important nodes are blocked. In such a volatile environment, it is difficult to imagine how any useful information could be propagated. Furthermore, how does a network adapted to such a random environment perform well when the randomness is suddenly eliminated during prediction?
There are many perspectives on why Dropout works, and while many of them are interconnected and related, taken together they give a holistic and deep understanding of why the method has been so successful.
Here’s one approach: because the network is trained in an environment where nodes may be randomly blocked, there are two possibilities:
- The node that is blocked is a ‘bad node’, or a node that does not provide any information. In this case, the network’s other nodes receive a positive signal through backpropagation and are able to learn better in the absence of the bad node.
- The node that is blocked is a ‘good node’, or a node that provides important information for prediction. In this case, the network must learn a separate representation of the data in other neurons.
In this view of Dropout, the network can benefit no matter which nodes Dropout blocks. This perspective sees the method as a disrupter of sorts, an externally introduced source of randomness that stirs up accelerated learning.
Another perspective treats Dropout as an ensemble. In the often successful Random Forest algorithm, several Decision Trees are trained on randomly selected subsets of the data, a process known as bagging. By incorporating randomness into the model, its variance is actually suppressed. For an intuitive understanding, consider the following data, a sine wave with lots of normally distributed noise:
From this data, we build dozens of approximator curves, each constructed from randomly selected points along the original curve. These approximator curves are then aggregated by averaging, and the result is a much cleaner curve:
The transparent lines are the approximator curves.

Bagging works well with high-variance data because it is an instance in which it is possible to fight fire with fire (noise with more noise). In this case, each approximator is built from random parts of the curve and neglects the remaining data points, and averaging the approximators contributes to a lower variance.
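As a rough sketch of this experiment (the number of curves, the subset size, and the use of piecewise-linear interpolation are my own assumptions rather than details from the article), the snippet below builds dozens of approximator curves from random subsets of a noisy sine wave and averages them into a cleaner estimate:

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x) + rng.normal(scale=0.5, size=x.shape)    # a sine wave with heavy noise

# Build dozens of approximator curves, each from a random subset of the points.
curves = []
for _ in range(50):
    idx = np.sort(rng.choice(len(x), size=40, replace=False))
    curves.append(np.interp(x, x[idx], y[idx]))         # piecewise-linear approximator

averaged = np.mean(curves, axis=0)                       # aggregate by averaging

# The averaged curve should track sin(x) much more closely than the raw noisy data.
print("raw data MSE: ", np.mean((y - np.sin(x)) ** 2))
print("averaged MSE: ", np.mean((averaged - np.sin(x)) ** 2))
```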
The same idea can be applied to Dropout. When there are hundreds or even thousands of signals coming in from the previous layer in deep neural networks, especially towards the beginning of training, there is bound to be lots of variance and perhaps incorrect signals. By randomly selecting subsets of the previous signals and passing them on, Dropout acts as an approximator and leaves a more purified signal for backpropagation.
We can take this perspective further. Every time Dropout is re-applied in an iteration, one could argue that a new network is being created. In bagging with, say, Decision Trees, each model has a different architecture, and it is the aggregation of these different feature maps and specialties in subsets of the data that allows for a rich understanding of the entire feature space. The final model is composed of the learnings of its sub-models.
In each training iteration, a ‘new network’ is created and the weights are updated to reflect the learnings of that network. Although the way this is done (more one-dimensional than two-dimensional) is different, it essentially performs the same task as an ensemble. After enough iterations, the network learns to find so-called ‘universal weights’, or parameters that perform well regardless of changes to the architecture. Like ensembles, Dropout allows networks to learn from the composition of many more detailed and focused networks.
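A toy illustration of this ensemble view is sketched below: each random mask defines a different ‘thinned’ subnetwork, and averaging the outputs of many such subnetworks converges to the single deterministic pass used at prediction time. This is written in the original paper's style of scaling by the keep probability at test time (the mirror image of the inverted variant shown earlier), and the layer sizes are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(7)
W = rng.normal(size=(8, 3))            # weights of a toy layer: 8 inputs -> 3 outputs
x = rng.normal(size=8)                 # a single input vector
keep = 0.5                             # keep probability (half the inputs are dropped)

# Each random mask defines a different 'thinned' subnetwork.
samples = [(x * (rng.random(8) < keep)) @ W for _ in range(20000)]
ensemble_mean = np.mean(samples, axis=0)

# Deterministic prediction-time pass: scale the inputs by the keep probability.
test_time = (x * keep) @ W

print(ensemble_mean)                   # the two agree closely: the scaled pass
print(test_time)                       # approximates the average over subnetworks
```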
Dropout is also seen as a form of regularization, a family of methods for preventing neural networks from overfitting. By randomly cutting off part of the signal flowing from one layer to the next, we prevent an overly detailed rush of numbers to the end of the network, which would otherwise be answered by an equally complex flow of updates through backpropagation.
Another perspective on Dropout has roots in the overfitting problem, with the fundamental idea that networks overfit because they try to update millions of parameters all at the same time. When neural networks are initialized, their parameters are not yet accustomed to the dataset and begin exploring the error landscape. When all of this individual exploration is summed in a massive network, it rushes like a tsunami towards backpropagation, and the network rapidly develops and quickly overfits.
Dropout — especially Dropout implemented extensively through a deep network and with a high fraction of dropped neurons (40 to 50 percent) — lets the network learn in a slower and more gradual format, updating the network part-by-part in a stochastic way.
Each new randomly selected portion of the network to be updated must not only update itself but also be conscious of the other, previously updated parameters. Hence, although it may seem paradoxical, adding randomness helps the model learn in a more controlled fashion.
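As a sketch of what such a configuration might look like in practice, the PyTorch model below applies Dropout after every hidden layer with rates in the 40 to 50 percent range described above; the layer sizes and exact rates are illustrative choices rather than prescriptions from the article.

```python
import torch
import torch.nn as nn

# A deep feed-forward network with Dropout applied extensively and aggressively.
model = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(512, 256), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(256, 128), nn.ReLU(), nn.Dropout(p=0.4),
    nn.Linear(128, 10),
)

x = torch.randn(32, 784)   # a dummy batch of inputs

model.train()              # training mode: dropout masks are active
out_train = model(x)

model.eval()               # evaluation mode: no neurons are blocked
out_eval = model(x)
```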
All images created by author.
Translated from: https://towardsdatascience.com/5-perspectives-to-why-dropout-works-so-well-1c10617b8028