《GANs in Action》 Chapter 1: Introduction to GANs
These are reading notes for chapter 1 of GANs in Action.
Chapter 1. Introduction to GANs
This chapter covers
- An overview of Generative Adversarial Networks
- What makes this class of machine learning algorithms special
- Some of the exciting GAN applications that this book covers
Notes: this chapter covers an overview of GANs, what makes them special, and GAN applications.
The notion of whether machines can think is older than the computer itself. In 1950, the famed mathematician, logician, and computer scientist Alan Turing—perhaps best known for his role in decoding the Nazi wartime enciphering machine, Enigma—penned a paper that would immortalize his name for generations to come, “Computing Machinery and Intelligence.”
In the paper, Turing proposed a test he called the imitation game, better known today as the Turing test. In this hypothetical scenario, an unknowing observer talks with two counterparts behind a closed door: one, a fellow human; the other, a computer. Turing reasons that if the observer is unable to tell which is the person and which is the machine, the computer passed the test and must be deemed intelligent.
Notes: in the Turing test, a computer and a human sit behind a closed door while a tester converses with both; if the tester cannot tell the human from the computer, the computer is deemed intelligent.
Anyone who has attempted to engage in a dialogue with an automated chatbot or a voice-powered intelligent assistant knows that computers have a long way to go to pass this deceptively simple test. However, in other tasks, computers have not only matched human performance but also surpassed it—even in areas that were until recently considered out of reach for even the smartest algorithms, such as superhumanly accurate face recognition or mastering the game of Go.[1]
[1]:See “Surpassing Human-Level Face Verification Performance on LFW with GaussianFace,” by Chaochao Lu and Xiaoou Tang, 2014, https://arXiv.org/abs/1404.3840. See also the New York Times article “Google’s AlphaGo Defeats Chinese Go Master in Win for A.I.,” by Paul Mozur, 2017, http://mng.bz/07WJ.
Notes: although artificial intelligence still has a long way to go, it already surpasses humans in some areas, such as face recognition and Go.
Machine learning algorithms are great at recognizing patterns in existing data and using that insight for tasks such as classification (assigning the correct category to an example) and regression (estimating a numerical value based on a variety of inputs). When asked to generate new data, however, computers have struggled. An algorithm can defeat a chess grandmaster, estimate stock price movements, and classify whether a credit card transaction is likely to be fraudulent. In contrast, any attempt at making small talk with Amazon’s Alexa or Apple’s Siri is doomed. Indeed, humanity’s most basic and essential capacities—including a convivial conversation or the crafting of an original creation—can leave even the most sophisticated supercomputers in digital spasms.
Notes: earlier machine learning algorithms excelled at classification and regression on existing data but performed poorly at generating new data.
This all changed in 2014 when Ian Goodfellow, then a PhD student at the University of Montreal, invented Generative Adversarial Networks (GANs). This technique has enabled computers to generate realistic data by using not one, but two, separate neural networks. GANs were not the first computer program used to generate data, but their results and versatility set them apart from all the rest. GANs have achieved remarkable results that had long been considered virtually impossible for artificial systems, such as the ability to generate fake images with real-world-like quality, turn a scribble into a photograph-like image, or turn video footage of a horse into a running zebra—all without the need for vast troves of painstakingly labeled training data.
Notes: in 2014, PhD student Ian Goodfellow proposed Generative Adversarial Networks (GANs). A GAN is composed of two neural networks and has proven versatile and widely applicable for generating new data.
A telling example of how far machine data generation has been able to advance thanks to GANs is the synthesis of human faces, illustrated in figure 1.1. As recently as 2014, when GANs were invented, the best that machines could produce was a blurred countenance—and even that was celebrated as a groundbreaking success. By 2017, just three years later, advances in GANs enabled computers to synthesize fake faces whose quality rivals high-resolution portrait photographs. In this book, we look under the hood of the algorithm that made all this possible.
Notes: a telling example is human face synthesis, shown in figure 1.1: GANs can now generate high-resolution face images.
Figure 1.1. Progress in human face generation
(Source: “The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation,” by Miles Brundage et al., 2018, https://arxiv.org/abs/1802.07228.)
1.1. What are Generative Adversarial Networks?
Generative Adversarial Networks (GANs) are a class of machine learning techniques that consist of two simultaneously trained models: one (the Generator) trained to generate fake data, and the other (the Discriminator) trained to discern the fake data from real examples.
Notes: a GAN contains a Generator and a Discriminator; the former generates fake images, and the latter tries to pick the fakes out.
The word generative indicates the overall purpose of the model: creating new data. The data that a GAN will learn to generate depends on the choice of the training set. For example, if we want a GAN to synthesize images that look like Leonardo da Vinci’s, we would use a training dataset of da Vinci’s artwork.
Notes: generative: the Generator produces data similar to the training set; for example, using da Vinci's works as the training set to synthesize images in da Vinci's style.
The term adversarial points to the game-like, competitive dynamic between the two models that constitute the GAN framework: the Generator and the Discriminator. The Generator’s goal is to create examples that are indistinguishable from the real data in the training set. In our example, this means producing paintings that look just like da Vinci’s. The Discriminator’s objective is to distinguish the fake examples produced by the Generator from the real examples coming from the training dataset. In our example, the Discriminator plays the role of an art expert assessing the authenticity of paintings believed to be da Vinci’s. The two networks are continually trying to outwit each other: the better the Generator gets at creating convincing data, the better the Discriminator needs to be at distinguishing real examples from the fake ones.
Notes: adversarial: the Generator strives to produce fakes that pass for real, while the Discriminator strives to tell real from fake; the two contend like a forger and an authenticator.
Finally, the word networks indicates the class of machine learning models most commonly used to represent the Generator and the Discriminator: neural networks. Depending on the complexity of the GAN implementation, these can range from simple feed-forward neural networks (as you’ll see in chapter 3) to convolutional neural networks (as you’ll see in chapter 4) or even more complex variants, such as the U-Net (as you’ll see in chapter 9).
Notes: networks: the Generator and the Discriminator are generally built as two neural networks, which may be feed-forward networks (chapter 3), convolutional networks (chapter 4), or more complex variants such as the U-Net (chapter 9).
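To make the three terms concrete, here is a minimal NumPy sketch of a Generator and a Discriminator as simple feed-forward networks. All layer sizes and names are illustrative assumptions (a 100-dimensional noise vector mapped to a flattened 28×28 image), not code from the book:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a 100-dim noise vector z mapped to a
# 784-dim "image" (e.g., a flattened 28x28 handwritten digit).
NOISE_DIM, DATA_DIM, HIDDEN = 100, 784, 128

def init_layer(n_in, n_out):
    """Small random weights and zero biases for one dense layer."""
    return rng.normal(0, 0.02, (n_in, n_out)), np.zeros(n_out)

# Generator: z -> fake example x*  (tanh keeps "pixels" in [-1, 1])
G_w1, G_b1 = init_layer(NOISE_DIM, HIDDEN)
G_w2, G_b2 = init_layer(HIDDEN, DATA_DIM)

def generator(z):
    h = np.maximum(0, z @ G_w1 + G_b1)           # ReLU hidden layer
    return np.tanh(h @ G_w2 + G_b2)

# Discriminator: x -> probability that the example is real
D_w1, D_b1 = init_layer(DATA_DIM, HIDDEN)
D_w2, D_b2 = init_layer(HIDDEN, 1)

def discriminator(x):
    h = np.maximum(0, x @ D_w1 + D_b1)
    return 1 / (1 + np.exp(-(h @ D_w2 + D_b2)))  # sigmoid output

z = rng.normal(size=(16, NOISE_DIM))             # batch of noise vectors
x_fake = generator(z)                            # 16 fake examples x*
p_real = discriminator(x_fake)                   # Discriminator's verdicts
```

Untrained, these two networks do nothing useful yet; the adversarial training described in section 1.2 is what turns them into a forger and an expert.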
1.2. How do GANs work?
The mathematics underpinning GANs are complex (as you’ll explore in later chapters, especially chapters 3 and 5); fortunately, many real-world analogies can make GANs easier to understand. Previously, we discussed the example of an art forger (the Generator) trying to fool an art expert (the Discriminator). The more convincing the fake paintings the forger makes, the better the art expert must be at determining their authenticity. This is true in the reverse situation as well: the better the art expert is at telling whether a particular painting is genuine, the more the forger must improve to avoid being caught red-handed.
Notes: the mathematics behind GANs is complex; the analogy of a forger of da Vinci works versus an authentication expert is vivid. The Generator (forger) and the Discriminator (expert) push each other to improve during training.
Another metaphor often used to describe GANs—one that Ian Goodfellow himself likes to use—is that of a criminal (the Generator) who forges money, and a detective (the Discriminator) who tries to catch him. The more authentic-looking the counterfeit bills become, the better the detective must be at detecting them, and vice versa.
Notes: another analogy is a counterfeiter of banknotes versus a detective.
In more technical terms, the Generator’s goal is to produce examples that capture the characteristics of the training dataset, so much so that the samples it generates look indistinguishable from the training data. The Generator can be thought of as an object recognition model in reverse. Object recognition algorithms learn the patterns in images to discern an image’s content. Instead of recognizing the patterns, the Generator learns to create them essentially from scratch; indeed, the input into the Generator is often no more than a vector of random numbers.
Notes: in technical terms, the Generator's input is a random vector; during training it captures the characteristics of the training data and generates samples that are hard to tell from real. The Discriminator captures those same characteristics in order to spot the fake samples.
The Generator learns through the feedback it receives from the Discriminator’s classifications. The Discriminator’s goal is to determine whether a particular example is real (coming from the training dataset) or fake (created by the Generator). Accordingly, each time the Discriminator is fooled into classifying a fake image as real, the Generator knows it did something well. Conversely, each time the Discriminator correctly rejects a Generator-produced image as fake, the Generator receives the feedback that it needs to improve.
The Discriminator continues to improve as well. Like any classifier, it learns from how far its predictions are from the true labels (real or fake). So, as the Generator gets better at producing realistic-looking data, the Discriminator gets better at telling fake data from the real, and both networks continue to improve simultaneously.
Notes: when the Discriminator is fooled by a fake, the Generator knows it did well; when the Discriminator correctly rejects a fake, the Generator receives the feedback it needs to improve. The Discriminator improves from its own prediction errors in the same way.
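The feedback described above ("how far its predictions are from the true labels") is usually measured with a binary cross-entropy loss. As a sketch, the standard GAN losses can be written as:

```latex
% Discriminator loss: binary cross-entropy with real examples labeled 1
% and fake examples labeled 0:
\mathcal{L}_D = -\,\mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
                -\,\mathbb{E}_{z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
% Generator loss (the non-saturating variant commonly used in practice):
\mathcal{L}_G = -\,\mathbb{E}_{z}\bigl[\log D(G(z))\bigr]
```

Here D is the Discriminator, G the Generator, and z the random noise vector; chapters 3 and 5 develop these objectives in full.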
Table 1.1 summarizes the key takeaways about the two GAN subnetworks.
1.3. GANs in action
Now that you have a high-level understanding of GANs and their constituent networks, let’s take a closer look at the system in action. Imagine that our goal is to teach a GAN to produce realistic-looking handwritten digits. (You’ll learn to implement such a model in chapter 3 and expand on it in chapter 4.) Figure 1.2 illustrates the core GAN architecture.
Notes: figure 1.2 shows the core GAN architecture.
Figure 1.2. The two GAN subnetworks, their inputs and outputs, and their interactions
Let’s walk through the details of the diagram:
1. Training dataset— The dataset of real examples that we want the Generator to learn to emulate with near-perfect quality. In this case, the dataset consists of images of handwritten digits. This dataset serves as input (x) to the Discriminator network.
2. Random noise vector— The raw input (z) to the Generator network. This input is a vector of random numbers that the Generator uses as a starting point for synthesizing fake examples.
3. Generator network— The Generator takes in a vector of random numbers (z) as input and outputs fake examples (x*). Its goal is to make the fake examples it produces indistinguishable from the real examples in the training dataset.
4. Discriminator network— The Discriminator takes as input either a real example (x) coming from the training set or a fake example (x*) produced by the Generator. For each example, the Discriminator determines and outputs the probability of whether the example is real.
5. Iterative training/tuning— For each of the Discriminator’s predictions, we determine how good it is—much as we would for a regular classifier—and use the results to iteratively tune the Discriminator and the Generator networks through backpropagation:
- The Discriminator’s weights and biases are updated to maximize its classification accuracy (maximizing the probability of correct prediction: x as real and x* as fake).
- The Generator’s weights and biases are updated to maximize the probability that the Discriminator misclassifies x* as real.
Notes:
1. Real training data serves as the Discriminator's input x.
2. A random noise vector z is the Generator's input, used to produce fake images.
3. The Generator network takes z as input and outputs a fake example x*.
4. The Discriminator network takes a real example x or a fake example x* as input and outputs the probability that the example is real.
5. Iterative training/tuning updates both networks.
1.3.1. GAN training
Learning about the purpose of the various GAN components may feel like looking at a snapshot of an engine: it cannot be understood fully until we see it in motion. That’s what this section is all about. First, we present the GAN training algorithm; then, we illustrate the training process so you can see the architecture diagram in action.
Notes: we first look at the algorithm, then understand GANs through the training process.
GAN training algorithm
GAN training visualized
Figure 1.3 illustrates the GAN training algorithm. The letters in the diagram refer to the list of steps in the GAN training algorithm.
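The training loop sketched in figure 1.3 can be run end to end on a toy problem. The example below is an illustrative sketch only, not the book's code: real data is a 1-D Gaussian, the Generator is a linear map of noise, the Discriminator is logistic regression, and the gradients are derived by hand (real GANs replace all of this with deep networks and backpropagation):

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(v):
    return 1 / (1 + np.exp(-v))

# Real data: a 1-D Gaussian around 4.0 (stand-in for the training dataset).
REAL_MEAN, REAL_STD = 4.0, 0.5

# Generator g(z) = a*z + b; Discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0          # Generator parameters
w, c = 0.0, 0.0          # Discriminator parameters
lr, batch = 0.1, 64

for step in range(3000):
    # Sample real examples x and random noise z.
    x_real = rng.normal(REAL_MEAN, REAL_STD, batch)
    z = rng.normal(size=batch)
    x_fake = a * z + b                       # Generator output x*

    # Discriminator update: maximize log D(x) + log(1 - D(x*)).
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    grad_w = -np.mean((1 - d_real) * x_real) + np.mean(d_fake * x_fake)
    grad_c = -np.mean(1 - d_real) + np.mean(d_fake)
    w -= lr * grad_w
    c -= lr * grad_c

    # Generator update (non-saturating form): maximize log D(x*).
    d_fake = sigmoid(w * x_fake + c)
    g_grad_out = -(1 - d_fake) * w           # dLoss/dx* per sample
    a -= lr * np.mean(g_grad_out * z)
    b -= lr * np.mean(g_grad_out)

x_fake = a * rng.normal(size=1000) + b       # samples after training
```

After training, the Generator's samples should cluster near the real mean of 4.0, and the Discriminator's outputs drift toward 0.5 — the 50/50 equilibrium discussed in the next section.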
1.3.2. Reaching equilibrium
You may wonder when the GAN training loop is meant to stop. More precisely, how do we know when a GAN is fully trained so that we can determine the appropriate number of training iterations? With a regular neural network, we usually have a clear objective to achieve and measure. For example, when training a classifier, we measure the classification error on the training and validation sets, and we stop the process when the validation error starts getting worse (to avoid overfitting). In a GAN, the two networks have competing objectives: when one network gets better, the other gets worse. How do we determine when to stop?
Those familiar with game theory may recognize this setup as a zero-sum game—a situation in which one player’s gains equal the other player’s losses. When one player improves by a certain amount, the other player worsens by the same amount. All zero-sum games have a Nash equilibrium, a point at which neither player can improve their situation or payoff by changing their actions.
GAN reaches Nash equilibrium when the following conditions are met:
- The Generator produces fake examples that are indistinguishable from the real data in the training dataset.
- The Discriminator can at best randomly guess whether a particular example is real or fake (that is, make a 50/50 guess whether an example is real).
Notes: training stops when Nash equilibrium is reached, which requires that:
- the fake images the Generator produces are indistinguishable from the real images in the training set
- the Discriminator can do no better than guessing, classifying each image as real or fake with 50% probability
NOTE
Nash equilibrium is named after the American economist and mathematician John Forbes Nash Jr., whose life story and career were captured in the biography titled A Beautiful Mind and inspired the eponymous film.
Let us convince you of why this is the case. When each of the fake examples (x*) is truly indistinguishable from the real examples (x) coming from the training dataset, there is nothing the Discriminator can use to tell them apart from one another. Because half of the examples it receives are real and half are fake, the best the Discriminator can do is to flip a coin and classify each example as real or fake with 50% probability.
The Generator is likewise at a point where it has nothing to gain from further tuning. Because the examples it produces are already indistinguishable from the real ones, even a tiny change to the process it uses to turn the random noise vector (z) into a fake example (x*) may give the Discriminator a cue for how to discern the fake example from the real data, making the Generator worse off.
Notes: at Nash equilibrium, neither the Discriminator nor the Generator can improve any further.
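The 50/50 argument above can also be stated with a formula from Goodfellow's original 2014 paper. For a fixed Generator whose samples follow a distribution p_g, the best possible Discriminator is:

```latex
% Optimal Discriminator for a fixed Generator (Goodfellow et al., 2014):
D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
% At equilibrium the Generator matches the data distribution,
% p_g = p_{\text{data}}, so D^*(x) = \tfrac{1}{2} for every x,
% and the value of the minimax objective is -\log 4.
```

When the two distributions coincide, this optimal Discriminator outputs exactly 1/2 everywhere, which is the coin-flip behavior described below.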
With equilibrium achieved, GAN is said to have converged. Here is when it gets tricky. In practice, it is nearly impossible to find the Nash equilibrium for GANs because of the immense complexities involved in reaching convergence in nonconvex games (more on convergence in later chapters, particularly chapter 5). Indeed, GAN convergence remains one of the most important open questions in GAN research.
Fortunately, this has not impeded GAN research or the many innovative applications of generative adversarial learning. Even in the absence of rigorous mathematical guarantees, GANs have achieved remarkable empirical results. This book covers a selection of the most impactful ones, and the following section previews some of them.
Notes: in practice, Nash equilibrium (GAN convergence) is very hard to reach and is one of the pressing open problems in GAN research, but this has not stopped GANs from achieving extraordinary results in research and applications.
1.4. Why study GANs?
Since their invention, GANs have been hailed by academics and industry experts as one of the most consequential innovations in deep learning. Yann LeCun, the director of AI research at Facebook, went so far as to say that GANs and their variations are “the coolest idea in deep learning in the last 20 years.”[2]
[2]:See “Google’s Dueling Neural Networks Spar to Get Smarter,” by Cade Metz, Wired, 2017, http://mng.bz/KE1X.
Notes: since their invention, GANs have been hailed by academia and industry as one of the most important innovations in deep learning; Facebook's AI research director Yann LeCun even said that GANs and their variants are "the coolest idea in deep learning in the last 20 years."
The excitement is well justified. Unlike other advancements in machine learning that may be household names among researchers but would elicit no more than a quizzical look from anyone else, GANs have captured the imagination of researchers and the wider public alike. They have been covered by the New York Times, the BBC, Scientific American, and many other prominent media outlets. Indeed, it was one of those exciting GAN results that probably drove you to buy this book in the first place. (Right?)
Notes: GANs have captured the imagination of researchers and casual onlookers alike and have been covered by major media outlets.
Perhaps most notable is the capacity of GANs to create hyperrealistic imagery. None of the faces in figure 1.4 belongs to a real human; they are all fake, showcasing GANs’ ability to synthesize images with photorealistic quality. The faces were produced using Progressive GANs, a technique covered in chapter 6.
Notes: GANs can generate hyperrealistic images; remarkably, every face in figure 1.4 is fake. They were generated with Progressive GANs, covered in chapter 6.
Figure 1.4. These photorealistic but fake human faces were synthesized by a Progressive GAN trained on high-resolution portrait photos of celebrities.
(Source: “Progressive Growing of GANs for Improved Quality, Stability, and Variation,” by Tero Karras et al., 2017, https://arxiv.org/abs/1710.10196.)
Another remarkable GAN achievement is image-to-image translation. Similarly to the way a sentence can be translated from, say, Chinese to Spanish, GANs can translate an image from one domain to another. As shown in figure 1.5, GANs can turn an image of a horse into an image of zebra (and back!), and a photo into a Monet-like painting—all with virtually no supervision and no labels whatsoever. The GAN variant that made this possible is called CycleGAN; you’ll learn all about it in chapter 9.
Notes: another GAN application is image-to-image translation: as figure 1.5 shows, GANs can turn a horse into a zebra (and back) or a photo into a Monet-style painting. This is achieved with CycleGAN, covered in chapter 9.
Figure 1.5. By using a GAN variant called CycleGAN, we can turn a Monet painting into a photograph or turn an image of a zebra into a depiction of a horse, and vice versa.
(Source: See “Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks,” by Jun-Yan Zhu et al., 2017, https://arxiv.org/abs/1703.10593.)
The more practically minded GAN use cases are just as fascinating. The online giant Amazon is experimenting with harnessing GANs for fashion recommendations: by analyzing countless outfits, the system learns to produce new items matching any given style.[3] In medical research, GANs are used to augment datasets with synthetic examples to improve diagnostic accuracy.[4] In chapter 11—after you’ve mastered the ins and outs of training GANs and their variants—you’ll explore both of these applications in detail.
[3]: See “Amazon Has Developed an AI Fashion Designer,” by Will Knight, MIT Technology Review, 2017, http://mng.bz/9wOj.
[4]: See “Synthetic Data Augmentation Using GAN for Improved Liver Lesion Classification,” by Maayan Frid-Adar et al., 2018, https://arxiv.org/abs/1801.02385.
GAN被亞馬遜用于服裝設(shè)計(jì),也被用于醫(yī)療研究提高診斷準(zhǔn)確性。在第十一章有相關(guān)內(nèi)容
GANs are also seen as an important stepping stone toward achieving artificial general intelligence,[5] an artificial system capable of matching human cognitive capacity to acquire expertise in virtually any domain—from motor skills involved in walking, to language, to creative skills needed to compose sonnets.
[5]: See “OpenAI Founder: Short-Term AGI Is a Serious Possibility,” by Tony Peng, Synced, 2018, http://mng.bz/j5Oa. See also “A Path to Unsupervised Learning Through Adversarial Networks,” by Soumith Chintala, Facebook Code, 2016, http://mng.bz/WOag.
Notes: GANs are seen as an important stepping stone toward artificial general intelligence.
But with the ability to generate new data and imagery, GANs also have the capacity to be dangerous. Much has been discussed about the spread and dangers of fake news, but the potential of GANs to create credible fake footage is disturbing. At the end of an aptly titled 2018 piece about GANs—“How an A.I. ‘Cat-and-Mouse Game’ Generates Believable Fake Photos”—the New York Times journalists Cade Metz and Keith Collins discuss the worrying prospect of GANs being exploited to create and spread convincing misinformation, including fake video footage of statements by world leaders. Martin Giles, the San Francisco bureau chief of MIT Technology Review, echoes their concern and mentions another potential risk in his 2018 article “The GANfather: The Man Who’s Given Machines the Gift of Imagination”: in the hands of skilled hackers, GANs can be used to intuit and exploit system vulnerabilities at an unprecedented scale. These concerns are what motivated us to discuss the ethical considerations of GANs in chapter 12.
GAN被用于制造虛假圖片、視頻信息以及網(wǎng)絡(luò)攻擊,讓人感到憂慮。關(guān)于這些考慮,會(huì)在第十二章提到
GANs can do much good for the world, but all technological innovations have misuses. Here the philosophy has to be one of awareness: because it is impossible to “uninvent” a technique, it is crucial to make sure people like you are aware of this technique’s rapid emergence and its substantial potential.
Notes: technology is a double-edged sword; since we cannot stop its arrival, we should understand its potential and make it benefit the world.
In this book, we are only able to scratch the surface of what is possible with GANs. However, we hope that this book will provide you with the necessary theoretical knowledge and practical skills to continue exploring any facet of this field that you find most interesting.
So, without further ado, let’s dive in!
Notes: this book explores only the tip of the GAN iceberg; we hope it equips you with the knowledge and skills to keep exploring the areas you find most interesting. Without further ado, let's begin!
Summary
- GANs are a deep learning technique that uses a competitive dynamic between two neural networks to synthesize realistic data samples, such as fake photorealistic imagery. The two networks that constitute a GAN are as follows:
- The Generator, whose goal is to fool the Discriminator by producing data indistinguishable from the training dataset
- The Discriminator, whose goal is to correctly distinguish between real data coming from the training dataset and the fake data produced by the Generator
- GANs have extensive applications across many different sectors, such as fashion, medicine, and cybersecurity.
- Notes: a GAN synthesizes realistic data samples, such as images, through two competing neural networks: a Generator and a Discriminator.
- GANs have wide applications in many fields, such as fashion, medicine, and cybersecurity.