Deep Learning: Generative Adversarial Networks (GAN)
Contents
Background (why?)
Concept
Adversarial Attacks
Model Architecture
Regularization Techniques
References
Background (why?)
In classification tasks, training machine learning and deep learning models requires large amounts of real-world data. In some cases, however, there are practical limits on how much real data can be acquired, or the time and human resources needed to collect it are themselves constrained.
Concept
Generative adversarial networks (GANs) were first proposed by Goodfellow et al. and are now widely used in computer vision (CV), image processing, and many other fields.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. 2672–2680.
Proposed in 2014, GANs have become increasingly active in recent years. They are mainly used for data augmentation: an implicit generative model learns to produce artificial, natural-looking samples that mimic real-world data, thereby enlarging a training set beyond the data actually collected [122].
[122] K. G. Hartmann, R. T. Schirrmeister, and T. Ball, “EEG-GAN: Generative adversarial networks for electroencephalographic (EEG) brain signals,” arXiv preprint arXiv:1806.01875, 2018.
Adversarial Attacks
Despite their excellent performance, deep learning models are vulnerable to adversarial attacks, in which small, carefully designed perturbations (often hard to detect by the human eye or by computer programs) are added to benign examples, misleading the model and causing a drastic drop in performance. This phenomenon was first discovered in computer vision in 2014 [159] and quickly attracted wide attention [160], [161], [162].
Zhang and Wu [164] were the first to study adversarial attacks in EEG-based BCIs. They considered three different attack scenarios:
1) White-box attacks, where the attacker has access to all information of the target model, including its architecture and parameters;
2) Black-box attacks, where the attacker can observe the target model’s responses to inputs;
3) Gray-box attacks, where the attacker knows some but not all information about the target model, e.g., the training data on which the target model was tuned, but not its architecture or parameters.
They showed that three popular CNN models in EEG-based BCIs, i.e., EEGNet [165], DeepCNN and ShallowCNN [166], can all be effectively attacked in all three scenarios.
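The source does not include attack code, but the white-box setting is commonly illustrated with the fast gradient sign method (FGSM), a one-step attack that perturbs each input in the direction of the sign of the loss gradient. Below is a minimal PyTorch sketch, assuming a differentiable classifier `model`, a batch of inputs `x` with labels `y`, and a hypothetical perturbation budget `epsilon`:

```python
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                epsilon: float = 0.01) -> torch.Tensor:
    """Fast gradient sign method: a one-step white-box attack."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # Move each input a small step in the direction that increases the loss.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()
```

In the black-box and gray-box scenarios the attacker cannot compute this gradient directly and typically resorts to surrogate models or query-based gradient estimates.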
References:
[159] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” arXiv preprint arXiv:1312.6199, 2013.
[164] X. Zhang and D. Wu, “On the vulnerability of CNN classifiers in EEG-based BCIs,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 27, no. 5, pp. 814–825, 2019.
Model Architecture
A GAN consists of two neural networks, a generator and a discriminator, trained simultaneously and competing with each other in a zero-sum game framework. The generative network learns to map from a latent space to a data distribution of interest, while the discriminator network distinguishes candidates produced by the generator from the true data distribution. The generative network's training objective is to increase the error rate of the discriminator network, i.e., to “fool” the discriminator into identifying novel generated candidates as not synthesized (as part of the true data distribution) [9]. Specifically, the framework trains two models simultaneously: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data. Through this adversarial process, the generator is evaluated iteratively until it best fits the real data distribution. The objective function for jointly training the two networks is:

$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$

The framework is illustrated in Fig. 7.
- Generator network: tries to fool the discriminator by generating real-looking samples. Specifically, the generator (θg) wants to minimize the objective so that D(G(z)) is close to 1, i.e., the discriminator is fooled into believing the generated G(z) is real.
- Discriminator network: tries to distinguish real samples from generated ones. Specifically, the discriminator (θd) wants to maximize the objective so that D(x) is close to 1 (real) and D(G(z)) is close to 0 (fake).
The point of training these two networks jointly is to obtain a pretrained generator from which collections of samples can be drawn and used for other tasks (e.g., classification).
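To make the alternating objectives above concrete, here is a minimal PyTorch sketch of one GAN training step; the architectures, latent dimension, and learning rates are illustrative assumptions rather than values from the text:

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784  # assumed sizes, e.g., flattened 28x28 images

# Generator G: latent vector z -> fake sample; discriminator D: sample -> P(real).
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                  nn.Linear(128, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real: torch.Tensor):
    batch = real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator step: push D(x) toward 1 and D(G(z)) toward 0.
    fake = G(torch.randn(batch, latent_dim)).detach()  # freeze G while updating D
    loss_d = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: push D(G(z)) toward 1, i.e., fool the discriminator.
    loss_g = bce(D(G(torch.randn(batch, latent_dim))), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

The generator step here uses the non-saturating loss (maximize log D(G(z)) rather than minimize log(1 − D(G(z)))), the practical variant recommended in the original GAN paper because it gives stronger gradients early in training.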
Regularization Techniques
To address the overfitting problem, we introduce additional information into the network using three suitable regularization techniques: L1 regularization, L2 regularization, and the dropout technique.
1) L1 regularization of the network. It is known that even though the number of possible interactions among configuration options is exponential, a very large portion of the potential interactions has no influence on the performance of software systems [16]. This means that only a small number of parameters have a significant impact on the model; in other words, the parameters of the neural network may be sparse. L1 regularization implements feature selection by assigning zero weight to insignificant input features and non-zero weight to useful ones, so we can use it to enforce this condition. As shown in Eq. 6, the idea of L1 regularization is to add an L1 penalty on the parameters of every hidden layer.
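Eq. 6 is referenced but not reproduced here, so the sketch below uses the standard form of the L1 penalty, λ·Σ|w|, added to the task loss; `lambda_l1` is a hypothetical hyperparameter:

```python
import torch

def l1_penalty(model: torch.nn.Module, lambda_l1: float = 1e-4) -> torch.Tensor:
    # Standard L1 term: lambda times the sum of absolute parameter values.
    return lambda_l1 * sum(p.abs().sum() for p in model.parameters())

# loss = task_loss + l1_penalty(model)  # added before calling loss.backward()
```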
2) L2 regularization of the network. Although L2 regularization does not produce sparsity, it forces the weights to be small. By allocating a different weight to each feature, L2 regularization reflects the fact that different features have different impacts on the output. It is arguably the most popular technique in machine learning for combating overfitting. In our model, we apply L2 regularization to the parameters of every hidden layer; the formula is given in Eq. 6.
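Analogously, the standard L2 penalty λ·Σw² can be added explicitly, or, for plain SGD, obtained equivalently through the optimizer's `weight_decay` argument; a sketch with an assumed `lambda_l2`:

```python
import torch

def l2_penalty(model: torch.nn.Module, lambda_l2: float = 1e-4) -> torch.Tensor:
    # Standard L2 term: lambda times the sum of squared parameter values.
    return lambda_l2 * sum((p ** 2).sum() for p in model.parameters())

# Equivalent for vanilla SGD:
# torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```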
3) Dropout technique of the network. A deep neural network for software systems with a large number of parameters is computationally heavy and prone to serious overfitting. Dropout is another technique for addressing this problem: during training, it randomly drops units (along with their connections) from the neural network, which prevents units from co-adapting too much. Training with dropout amounts to sampling from an exponential number of different “thinned” networks; at test time, the effect of averaging the predictions of all these thinned networks is easily approximated by using a single unthinned network with scaled-down weights. This significantly reduces overfitting and gives major improvements over other regularization methods [29]. With this benefit in mind, we apply the dropout technique in every hidden layer.

After applying these three regularization techniques and conducting experiments, we select the best one for our network. According to our experiments (described in Section 4.6), L2 regularization performs best among the three, so we choose L2 regularization in our PERF-AL network.
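A sketch of dropout applied after every hidden layer in PyTorch; the layer sizes and the drop probability p=0.5 are illustrative assumptions. Switching between `train()` and `eval()` toggles the random thinning used during training versus the single deterministic network used at test time:

```python
import torch.nn as nn

# Dropout after each hidden layer; p=0.5 is an assumed drop probability.
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(64, 1),
)

model.train()  # training mode: units are randomly zeroed each forward pass
model.eval()   # test mode: dropout is a no-op (PyTorch rescales during training)
```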
References
Yangyang Shu, Yulei Sui, Hongyu Zhang, and Guandong Xu. 2020. Perf-AL: Performance Prediction for Configurable Software through Adversarial Learning. In ESEM '20: ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), October 8–9, 2020, Bari, Italy. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3382494.3410677