Generative Variational Autoencoder for High Resolution Image Synthesis
This article presents our research on high resolution image generation using a Generative Variational Autoencoder.
Important Points
Introduction
The training of deep neural networks requires hundreds or even thousands of images. A lack of labelled datasets, especially for medical images, often hinders progress, so it becomes imperative to create additional training data. Another actively researched area is image generation with generative adversarial networks (GANs). With this technique, new images can be generated by training on the existing images in a dataset; the new images are realistic yet different from the original data. There are two main approaches to data augmentation with GANs: image-to-image translation and sampling from a random distribution. The main challenge with GANs is the mode collapse problem, i.e. the generated images are quite similar to each other and there is not enough variety in the generated output.
Another approach for image generation uses Variational Autoencoders (VAEs). This architecture contains a decoder, also known as the generative network, which takes a latent encoding as input and outputs the parameters of a conditional distribution over the observation. The encoder, also known as the inference network, takes an observation as input and outputs the parameters of the conditional distribution over the latent representation. During training, VAEs use the reparameterization trick, in which a sample from a Gaussian distribution is expressed as a deterministic function of the distribution parameters plus independent noise, so that gradients can flow through the sampling step. The main challenge with VAEs is that they are not able to generate sharp images.
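As an illustration, here is a minimal sketch of the reparameterization trick in PyTorch (not the exact code used in this work):

```python
import torch

def reparameterize(mu, log_var):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).

    The sample is a deterministic function of (mu, log_var) plus independent
    noise, so gradients can flow back through mu and log_var.
    """
    std = torch.exp(0.5 * log_var)   # sigma = exp(log(sigma^2) / 2)
    eps = torch.randn_like(std)      # eps ~ N(0, I), same shape as std
    return mu + eps * std

# Example: a batch of 4 latent vectors of size 100
mu = torch.zeros(4, 100, requires_grad=True)
log_var = torch.zeros(4, 100, requires_grad=True)
z = reparameterize(mu, log_var)
z.sum().backward()                   # gradients reach mu and log_var
```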
Dataset
The following datasets are used for training and evaluation:

- MNIST
- CELEBA-HQ
- LSUN (bedroom and other categories)
VAE vs. Our Network
We show how, instead of performing inference in the way shown in the original VAE architecture, we can add an error vector to the original data and multiply by the standard deviation. The resulting term is fed to the encoder and mapped to the latent space. In the decoder, the error vector is similarly added to the latent vector and multiplied by the standard deviation. In this manner, we use the encoder much as in the original VAE, while we replace the decoder with a discriminator and change the loss function accordingly. The comparison between the VAE architecture and ours is shown in Fig 1.
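A minimal sketch of this noise injection, assuming e1 and e2 are drawn from a standard normal distribution and sigma denotes the corresponding standard deviation (the function names are our own):

```python
import torch

def noisy_image(x, sigma_x=1.0):
    """Perturb the image vector before the encoder: (x + e1) * sigma_x."""
    e1 = torch.randn_like(x)   # e1 ~ N(0, I)
    return (x + e1) * sigma_x

def noisy_latent(z, sigma_z=1.0):
    """Perturb the latent vector on the decoder side: (z + e2) * sigma_z."""
    e2 = torch.randn_like(z)   # e2 ~ N(0, I)
    return (z + e2) * sigma_z
```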
Figure 1: Comparison between the standard VAE and our network, where e1 and e2 denote samples from a noise distribution, x denotes the image vector, z denotes the latent space vector, f and g denote the encoder and decoder functions respectively, and + and * denote the addition and concatenation operators.

Our architecture can be seen both as an extension of the VAE and of the GAN. The former view is straightforward, since the extension only requires changing the loss function of the decoder; the latter view follows by recalling that a GAN essentially works on the concept of a zero-sum game that maintains a Nash equilibrium between the generator and the discriminator. In our case, the encoder from the VAE and the discriminator from the GAN play a zero-sum game and compete with each other. As training proceeds, the loss decreases in both cases until it stabilizes.
Network Architecture
The network architecture used in this work is explained in the points below:
Our network architecture is shown in Fig 2.
Figure 2: Our network architecture

Architecture Details
The generator and discriminator layerwise architecture details are shown in Table 1 and Table 2 respectively. We denote a ResNet block as consisting of the following layers: a convolutional layer, a max pooling layer, 30 percent dropout between the layers, and a batch normalization layer.
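A minimal PyTorch-style sketch of such a block; the kernel sizes, channel counts, and the placement of the skip connection are our own assumptions, not taken from Table 1 or Table 2:

```python
import torch
import torch.nn as nn

class ResNetBlock(nn.Module):
    """Illustrative block: conv -> batch norm -> dropout -> residual add -> max pool."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.drop = nn.Dropout2d(p=0.3)                       # 30 percent dropout between the layers
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1)   # match channels for the residual path

    def forward(self, x):
        out = self.drop(self.bn(self.conv(x)))
        out = out + self.skip(x)                               # residual connection
        return self.pool(out)

# Example: 64 -> 128 channels, halving spatial resolution
block = ResNetBlock(64, 128)
y = block(torch.randn(1, 64, 32, 32))                          # y has shape (1, 128, 16, 16)
```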
Algorithm
The algorithm used in this work is trained using Stochastic Gradient Descent (SGD) as shown below:
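As a rough illustration only, a training step in the spirit of an adversarial autoencoder, under our own assumptions about the loss terms and what the discriminator sees, might look like the following sketch:

```python
import torch

def train_step(encoder, discriminator, x, opt_enc, opt_disc, sigma=1.0):
    """One SGD step of the zero-sum game between the VAE encoder and the discriminator.

    Assumptions (not from the paper): the discriminator scores latent codes and
    outputs a probability in (0, 1); e1, e2 ~ N(0, I); sigma is a fixed scale.
    """
    e1 = torch.randn_like(x)
    z = encoder((x + e1) * sigma)                 # latent code from the perturbed image

    e2 = torch.randn_like(z)
    z_prior = torch.randn_like(z)                 # sample from the prior

    # Discriminator update: prior samples treated as "real", encoder codes as "fake".
    d_prior = discriminator((z_prior + e2) * sigma)
    d_enc = discriminator((z.detach() + e2) * sigma)
    loss_disc = -(torch.log(d_prior + 1e-8) + torch.log(1.0 - d_enc + 1e-8)).mean()
    opt_disc.zero_grad()
    loss_disc.backward()
    opt_disc.step()

    # Encoder update: try to make its codes indistinguishable from prior samples.
    loss_enc = -torch.log(discriminator((z + e2) * sigma) + 1e-8).mean()
    opt_enc.zero_grad()
    loss_enc.backward()
    opt_enc.step()
    return loss_enc.item(), loss_disc.item()
```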
Experiments
All the generated samples are generator outputs from random latent vectors. We normalize all data into the range [-1, 1] and use two evaluation metrics to measure the performance of our network. The first measures the distribution distance between real and generated samples using maximum mean discrepancy (MMD) scores. The second evaluates generation diversity with the multi-scale structural similarity metric (MS-SSIM). Table 4 compares MMD and MS-SSIM scores with previous state-of-the-art architectures.
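For reference, a simple sketch of an MMD estimate with a Gaussian kernel; the bandwidth and the use of flattened pixels as features are our own choices, not the paper's exact protocol:

```python
import torch

def gaussian_kernel(a, b, bandwidth=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)) for all pairs of rows."""
    sq_dist = torch.cdist(a, b) ** 2
    return torch.exp(-sq_dist / (2 * bandwidth ** 2))

def mmd(real, fake, bandwidth=1.0):
    """Biased MMD^2 estimate between two sample sets of shape (n, d)."""
    k_rr = gaussian_kernel(real, real, bandwidth).mean()
    k_ff = gaussian_kernel(fake, fake, bandwidth).mean()
    k_rf = gaussian_kernel(real, fake, bandwidth).mean()
    return k_rr + k_ff - 2 * k_rf

# Example with flattened images normalized to [-1, 1]
real = torch.rand(64, 784) * 2 - 1
fake = torch.rand(64, 784) * 2 - 1
print(mmd(real, fake).item())   # lower means the two distributions are closer
```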
We noticed that the model with a small latent vector size of 100 suffers from severe mode collapse. The best results are obtained using a moderately large latent vector size. Table 5 compares the effect of different latent variable sizes on the MMD and MS-SSIM scores respectively.
As can be seen, a latent variable size of 1000 produces the best results among those compared. Mode collapse is observed at both low and high latent variable sizes, which is one of the main challenges faced when training GANs.
Four common evaluation metrics have been used in the literature for testing the performance of generative models. These are log-likelihood, reconstruction error, ELBO and KL divergence.
The log-likelihood is calculated by finding the parameters that maximize the log-likelihood of the observed samples. The reconstruction error is the distance between an original data point and its projection onto a lower-dimensional subspace. The KL divergence term in our optimization problem is intractable to minimize directly, so we maximize the ELBO instead. The KL divergence measures how similar the generated probability distribution is to the true probability distribution. The comparison of our model against the original VAE architecture on the MNIST dataset using these evaluation metrics is shown in Table 6.
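For reference, the standard ELBO decomposition in its textbook form, with approximate posterior q_phi(z|x), likelihood p_theta(x|z), and prior p(z):

```latex
\log p_\theta(x) \;\geq\; \mathrm{ELBO}(x)
  \;=\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  \;-\; D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)
```

The gap between the two sides equals D_KL(q_phi(z|x) || p_theta(z|x)), so maximizing the ELBO implicitly minimizes the intractable posterior KL divergence.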
We compare our log probability value with those obtained by previous state-of-the-art methods in Table 7. The log probability is an important evaluation metric in that it reflects the diversity of the generated samples.
Results
We present the generated images on all 3 datasets used for testing. The models were trained for 1000 iterations. The images generated using the CELEBA-HQ dataset are shown in Fig 3.
Figure 3: 1024 × 1024 images generated using the CELEBA-HQ dataset.

The images generated using the LSUN BEDROOM dataset are shown in Fig 4.
Figure 4: 256 × 256 images generated using the LSUN BEDROOM dataset

The images generated from different LSUN categories are shown in Fig 5.
Figure 5: Sample 256 × 256 images generated from different LSUN categories

We compare our results with previous state-of-the-art networks on the MNIST dataset in Fig 6.
Figure 6: Generated MNIST images a) GAN b) WGAN c) VAE d) GVAE

Conclusions
In this blog, we presented a new training procedure for Variational Autoencoders based on generative models. This makes the inference model much more flexible, allowing it to represent almost any posterior distribution over the latent variables. Our network was trained and tested on 3 publicly available datasets. When evaluated using MMD, MS-SSIM, log-likelihood, reconstruction error, ELBO and KL divergence, our network beats the previous state-of-the-art algorithms. Using generative model approaches to generate additional training data, especially in fields like medical imaging, could be revolutionary, as there is a shortage of medical data for training deep convolutional neural network architectures.
Translated from: https://towardsdatascience.com/generative-variational-autoencoder-for-high-resolution-image-synthesis-48dd98d4dcc2