English, Please: Self-Attention Generative Adversarial Networks (SAGAN)
Introduction
In my effort to better understand the concept of self-attention, I tried dissecting one of its particular use cases in one of my current deep learning subtopic interests: Generative Adversarial Networks (GANs). As I delved deeply into the Self-Attention GAN (or “SAGAN”) research paper, while following similar implementations in PyTorch and TensorFlow in parallel, I noticed how exhausting it could get to power through the formality and the mathematically intense blocks to arrive at a clear intuition of the paper’s contents. Although I get that formal papers are written that way for precision of language, I do think there’s a need for bite-sized versions that define the prerequisite knowledge needed and also lay down the advantages and disadvantages candidly.
In this article, I am going to try to make a computationally efficient interpretation of the SAGAN without reducing too much of the accuracy for the “hacky” people out there who want to just get started (Wow, so witty).
So, here’s how I’m going to do it:
- What do I need to know?
- What is it? Who made it?
- What does it solve? Advantages and Disadvantages?
- Possible further studies?
- Source/s
What do I need to know?
- Basic Machine Learning and Deep Learning concepts (dense layers, activation functions, optimizers, backpropagation, normalization, etc.)
- Vanilla GAN
- Other GANs: Deep Convolutional GAN (DCGAN), Wasserstein GAN (WGAN)
- Convolutional Neural Networks: intuition, limitations, and relational inductive biases (just think of these as assumptions)
- Spectral Norms and the Power Iteration Method
- Two Time-Scale Update Rule (TTUR)
- Self-Attention
First and foremost, basic concepts are always necessary. Let's just leave it at that, haha. Moving on, a working understanding of the game mechanics of classical GAN training would be quite handy. In practice, I think most versions of GANs are now trained with convolutional layers and a non-saturating or Wasserstein loss, so learning about DCGANs and WGANs is very useful. Also, understanding that CNNs make a locality assumption is key to seeing why self-attention is useful in SAGANs (or in general). For the people who get restless without the proof (a.k.a. math nerds), it would be helpful to check out spectral norms and the power iteration method, an eigenvector approximation algorithm, beforehand. As for TTUR, honestly this is just having two separate learning rates for your generator and discriminator models. Feel free to check out the paper on attention too, even though I'll only be going through it lightly.
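If the power iteration method sounds mysterious, here is a minimal sketch of the idea (my own illustration, not code from any of the papers): repeatedly applying a matrix and its transpose to a random vector converges toward its top singular vectors, and the corresponding singular value is the spectral norm.

```python
import numpy as np

def spectral_norm_power_iteration(W, n_iters=20):
    """Approximate the largest singular value (spectral norm) of W."""
    u = np.random.randn(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    return u @ W @ v  # sigma ~= u^T W v

W = np.random.randn(64, 128)
print(spectral_norm_power_iteration(W))        # power iteration estimate
print(np.linalg.svd(W, compute_uv=False)[0])   # exact value, for comparison
```

Spectral normalization then just divides the weight matrix by this estimate; in practice libraries typically run only one power iteration per forward pass, so the extra cost is negligible.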
What is it? Who made it?
Essentially, SAGAN is a convolutional GAN that uses a self-attention layer/block in the generator model, applies spectral normalization to both the generator and the discriminator, and trains via the two time-scale update rule (TTUR) and the hinge version of the adversarial loss. Everything else is common GAN practice: using a tanh activation at the end of the generator model, using leaky ReLU in the discriminator, and just generally using Adam as your optimizer. This architecture was created by Han Zhang, Ian Goodfellow, Dimitris Metaxas and Augustus Odena.
If you looked through the prerequisites, this definition would be pretty straightforward.
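To make the hinge loss concrete, here is a short sketch of the standard hinge GAN objective as it is usually written in PyTorch (my transcription, not the paper's code; `d_real` and `d_fake` are assumed to be raw, unbounded discriminator scores on real and generated batches):

```python
import torch.nn.functional as F

def d_hinge_loss(d_real, d_fake):
    # The discriminator is only rewarded up to a margin: push real scores
    # above +1 and fake scores below -1, then stop.
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake):
    # The generator simply tries to raise the discriminator's score on its samples.
    return -d_fake.mean()
```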
[Figure: hinge version of the adversarial loss used in the paper]

What does it solve? Advantages and Disadvantages?
To start, an attention module is something incorporated into your model so that it can use all of your input's information (global access) for the output in a way that is not too computationally expensive. Self-attention is just the specific version wherein your query, key and value vectors all come from the same input. In the figure below, these are the f, g and h functions. Primarily used in NLP, attention has found its way to CNNs and GANs because of the locality assumption that CNNs make. Since CNNs and previous convolution-based GANs use a small window to predict the next layer, outputs with complex geometry (e.g. dogs, full-body photos, etc.) are harder to generate compared to pictures of oceans, skies and other backgrounds. I've also read that previous GANs had a harder time generating images in multi-class situations, but I need to read up more on that. Now, self-attention makes it possible to have global access to input information, giving the generator the ability to learn from all feature locations.
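A minimal sketch of such a block in PyTorch might look like the following (my own condensed version based on common SAGAN implementations, so treat the `channels // 8` bottleneck and the exact layer names as assumptions; the 1x1 convolutions play the role of f, g and h, and `gamma` is a learnable scalar that starts at zero so the network relies on local features first):

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """Self-attention over the spatial positions of a feature map."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Conv2d(channels, channels // 8, kernel_size=1)  # query
        self.g = nn.Conv2d(channels, channels // 8, kernel_size=1)  # key
        self.h = nn.Conv2d(channels, channels, kernel_size=1)       # value
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, height, width = x.shape
        n = height * width
        q = self.f(x).view(b, -1, n)                          # B x C/8 x N
        k = self.g(x).view(b, -1, n)                          # B x C/8 x N
        v = self.h(x).view(b, c, n)                           # B x C   x N
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)   # B x N x N attention map
        out = (v @ attn.transpose(1, 2)).view(b, c, height, width)
        return self.gamma * out + x                           # residual connection
```

In practice you would drop something like `SelfAttention2d(256)` between two convolutional blocks operating on a mid-resolution feature map.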
[Figure: ⊗ just means matrix multiplication. The first part shows how the previous layer is converted into three pieces (query, key and value) using 1x1 convolutions.]

Another thing about SAGAN is that it uses spectral normalization on both the generator and the discriminator for better conditioning. What spectral normalization does is allow fewer discriminator updates per generator update, by limiting the spectral norm of the weight matrices to constrain the Lipschitz constant of the network function. That's a mouthful, but you can just imagine it to be a more powerful normalization technique. Lastly, SAGANs use the two time-scale update rule to address slow-learning discriminators. Typically, the discriminator starts with a higher learning rate to avoid mode collapse.
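Both of those tricks are only a few lines in practice. Here is a minimal sketch, assuming PyTorch (the tiny stand-in models, and the learning rates of 1e-4 for the generator versus 4e-4 for the discriminator, are my assumptions based on values commonly quoted for SAGAN, not settings copied from the paper's code):

```python
import torch
from torch import nn
from torch.nn.utils import spectral_norm

# Spectral normalization: wrap each weight-bearing layer when you build the model.
def sn_conv(in_ch, out_ch):
    return spectral_norm(nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1))

# Tiny stand-in models, just to keep the example self-contained.
generator = nn.Sequential(nn.ConvTranspose2d(128, 3, kernel_size=4), nn.Tanh())
discriminator = nn.Sequential(sn_conv(3, 64), nn.LeakyReLU(0.1), sn_conv(64, 1))

# TTUR: two separate Adam optimizers, with the discriminator learning faster.
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.0, 0.9))
d_opt = torch.optim.Adam(discriminator.parameters(), lr=4e-4, betas=(0.0, 0.9))
```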
Possible further studies?
As of the moment, I'm personally having a difficult time generating 256x256 images, either because of the computational expense or because of something I don't fully understand about the capacity or nuances of the model. Has anyone tried progressively growing a SAGAN?
Thanks for reading! I hope you enjoyed! I would love to do more of these so feedback is very much welcome. :)
Source/s
Self-Attention GAN Paper
Spectral Normalization for GANs
Translated from: https://medium.com/swlh/english-please-self-attention-generative-adversarial-networks-sagan-852aba463ac