
Image Synthesis and Style Transfer in Practice

Published: 2023/11/28 | Category: Life Experience | 40 | 豆豆

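This article, collected and compiled by 生活随笔, introduces image synthesis and style transfer in practice. The editor found it useful and shares it here for reference.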


Neural Style Transfer

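If you use social sharing apps or happen to be an amateur photographer, you are familiar with filters. Filters can alter the color style of a photo to make the background sharper or people's faces whiter. However, a filter usually changes only one aspect of a photo. To create the ideal photo, you often need to try many different filter combinations. This process is as complex as tuning the hyperparameters of a model.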

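In this article, we discuss how to use convolutional neural networks (CNNs) to automatically apply the style of one image to another image, an operation known as style transfer. It requires two input images: a content image and a style image. We use a neural network to alter the content image so that its style mirrors that of the style image. In Fig. 1, the content image is a landscape photo the author took in Mount Rainier National Park near Seattle, and the style image is an oil painting of autumn oak trees. The output composite image retains the overall shapes of the objects in the content image, but applies the oil-painting brushwork of the style image and makes the overall color more vivid.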

Fig. 1 Content and style input images and composite image produced by style transfer.

1. Technique

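Fig. 2 shows the CNN-based style transfer model. First, we initialize the composite image, for example as the content image. This composite image is the only variable that needs to be updated during style transfer; in other words, it is the model parameter of style transfer. Then, we select a pretrained CNN to extract image features. The parameters of this CNN do not need to be updated during training. A deep CNN uses multiple neural layers to successively extract image features, and we can select the outputs of certain layers to use as content features or style features. With the structure in Fig. 2, the pretrained network contains three convolutional layers: the second layer outputs the image content features, while the outputs of the first and third layers are used as style features. Next, we use forward propagation (in the direction of the solid lines) to compute the style transfer loss function and backward propagation (in the direction of the dotted lines) to update the model parameter, that is, to continuously update the composite image. The loss function used in style transfer generally has three parts:

First, content loss makes the composite image approximate the content image in terms of content features.

Second, style loss makes the composite image approximate the style image in terms of style features.

Third, total variation loss helps reduce the noise in the composite image.

Finally, after training ends, we output the model parameter of style transfer, which is the final composite image.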

Fig. 2 CNN-based style transfer process. Solid lines show the direction of forward propagation and dotted lines show backward propagation.
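Because the composite image is the only thing being optimized, the core loop can be illustrated without any deep-learning framework. The toy NumPy sketch below is an assumption-laden stand-in, not the method of this article: the "pretrained CNN" is a hypothetical fixed random linear map `W`, only a content loss is used, and plain gradient descent replaces Adam.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen "pretrained" feature extractor: a fixed linear map
# from a flattened 16-pixel image to 4 features. It is never updated.
W = rng.normal(size=(4, 16))
W_before = W.copy()

def features(x):
    return W @ x

content_image = rng.normal(size=16)
target = features(content_image)   # content features, computed once

x = np.zeros(16)                   # composite image: the only trainable variable
lr = 0.05
for step in range(2000):
    diff = features(x) - target          # forward propagation
    loss = np.mean(diff ** 2)            # squared-error content loss
    grad = 2 * W.T @ diff / diff.size    # backward propagation w.r.t. x only
    x -= lr * grad                       # update the image; W stays fixed

final_loss = np.mean((features(x) - target) ** 2)
print(final_loss)
```

Because `W` maps 16 pixels to only 4 features, the optimized `x` matches the content features without reproducing `content_image` pixel for pixel: matching features constrains the image only partially.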

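Next, we run an experiment to better understand the technical details of style transfer.

2. Reading the Content and Style Images

First, we read the content and style images. From the printed image axes, we can see that the two images have different dimensions.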

%matplotlib inline
from d2l import mxnet as d2l
from mxnet import autograd, gluon, image, init, np, npx
from mxnet.gluon import nn

npx.set_np()

d2l.set_figsize((3.5, 2.5))
# Read and display the content image
content_img = image.imread('…/img/rainier.jpg')
d2l.plt.imshow(content_img.asnumpy());

# Read and display the style image
style_img = image.imread('…/img/autumn_oak.jpg')
d2l.plt.imshow(style_img.asnumpy());

3. Preprocessing and Postprocessing

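Below, we define the functions for image preprocessing and postprocessing. The preprocess function normalizes each of the three RGB channels of the input image and converts the result into a format that can be fed to the CNN. The postprocess function restores the pixel values of the output image to their original values before normalization. Because the image display function requires each pixel to have a floating-point value from 0 to 1, we use the clip function to replace values smaller than 0 with 0 and values greater than 1 with 1.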

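# Per-channel ImageNet mean and standard deviation used for normalization
rgb_mean = np.array([0.485, 0.456, 0.406])
rgb_std = np.array([0.229, 0.224, 0.225])

def preprocess(img, image_shape):
    # Resize, scale to [0, 1], and normalize each RGB channel
    img = image.imresize(img, *image_shape)
    img = (img.astype('float32') / 255 - rgb_mean) / rgb_std
    # HWC -> CHW, then add a batch axis: (1, 3, h, w)
    return np.expand_dims(img.transpose(2, 0, 1), axis=0)

def postprocess(img):
    # Drop the batch axis and move to the same device as rgb_std
    img = img[0].as_in_ctx(rgb_std.ctx)
    # CHW -> HWC, undo the normalization, and clip into [0, 1] for display
    return (img.transpose(1, 2, 0) * rgb_std + rgb_mean).clip(0, 1)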
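Since postprocess is meant to undo preprocess (up to clipping), a quick round-trip check is useful. The NumPy sketch below mirrors the two functions but skips the resize step (`image.imresize`); the names `preprocess_np` and `postprocess_np` are hypothetical, and a random `uint8` array stands in for a real photo.

```python
import numpy as np

rgb_mean = np.array([0.485, 0.456, 0.406])
rgb_std = np.array([0.229, 0.224, 0.225])

def preprocess_np(img_uint8):
    # HWC uint8 -> per-channel normalized floats -> (1, 3, h, w)
    img = (img_uint8.astype('float32') / 255 - rgb_mean) / rgb_std
    return np.expand_dims(img.transpose(2, 0, 1), axis=0)

def postprocess_np(batch):
    # (1, 3, h, w) -> HWC, undo the normalization, clip to [0, 1]
    img = batch[0].transpose(1, 2, 0)
    return np.clip(img * rgb_std + rgb_mean, 0, 1)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)  # hypothetical 4x6 RGB image
x = preprocess_np(img)
restored = postprocess_np(x)

print(x.shape)                                      # (1, 3, 4, 6)
print(np.allclose(restored, img / 255, atol=1e-6))  # True
```

Values pushed far outside the normalized range (for example `x + 100`) come back as exactly 1 after clipping, which is what the display function needs.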

4. Extracting Features

We use the VGG-19 model pretrained on the ImageNet dataset to extract image features.

pretrained_net = gluon.model_zoo.vision.vgg19(pretrained=True)

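To extract the content and style features of an image, we can select the outputs of certain layers in the VGG network. In general, the closer an output is to the input layer, the easier it is to extract detailed information about the image; the farther away it is, the easier it is to extract global information. To prevent the composite image from retaining too many details of the content image, we choose a VGG layer close to the output layer to output the content features of the image. This layer is called the content layer. We also select the outputs of several different VGG layers to match both local and global styles. These layers are called style layers. The VGG network has five convolutional blocks. In this experiment, we choose the last convolutional layer of the fourth block as the content layer and the first convolutional layer of each block as the style layers. The indices of these layers can be obtained by printing the pretrained_net instance.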

style_layers, content_layers = [0, 5, 10, 19, 28], [25]

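During feature extraction, we only need the VGG layers from the input layer up to whichever content or style layer is closest to the output layer. Below, we construct a new network, net, that retains only the VGG layers we need. We then use net to extract features.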

net = nn.Sequential()
for i in range(max(content_layers + style_layers) + 1):
    net.add(pretrained_net.features[i])

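Given an input X, simply calling the forward computation net(X) only yields the output of the last layer. Because we also need the outputs of intermediate layers, we compute layer by layer and keep the outputs of the content and style layers.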

def extract_features(X, content_layers, style_layers):
    contents = []
    styles = []
    for i in range(len(net)):
        X = net[i](X)
        if i in style_layers:
            styles.append(X)
        if i in content_layers:
            contents.append(X)
    return contents, styles
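The truncation and the layer-by-layer loop can be checked without MXNet. In this sketch, the "network" is a hypothetical list of 40 identical toy layers that each add 1 to their input, so a layer's output reveals how many layers the input has passed through. Unlike the article's version, this `extract_features` takes `net` as an explicit argument instead of reading a global.

```python
def add_one(x):
    # Toy "layer": its output counts how many layers the input has passed through
    return x + 1

style_layers, content_layers = [0, 5, 10, 19, 28], [25]

# Hypothetical full network with 40 layers
full_net = [add_one] * 40

# Keep only the layers up to the deepest content/style layer
net = full_net[:max(content_layers + style_layers) + 1]

def extract_features(X, net, content_layers, style_layers):
    contents, styles = [], []
    for i, layer in enumerate(net):
        X = layer(X)                  # compute layer by layer
        if i in style_layers:
            styles.append(X)          # keep style-layer outputs
        if i in content_layers:
            contents.append(X)        # keep content-layer outputs
    return contents, styles

contents, styles = extract_features(0, net, content_layers, style_layers)
print(len(net), contents, styles)  # 29 [26] [1, 6, 11, 20, 29]
```

Only 29 of the 40 layers are ever evaluated, because nothing after layer 28 contributes to any content or style feature.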

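Next, we define two functions: get_contents extracts content features from the content image, and get_styles extracts style features from the style image. Because the parameters of the pretrained VGG model do not change during training, we can extract the content features of the content image and the style features of the style image before training starts. Because the composite image is the model parameter that must be updated during style transfer, the content and style features of the composite image can only be extracted during training, by calling extract_features.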

def get_contents(image_shape, ctx):
    content_X = preprocess(content_img, image_shape).copyto(ctx)
    contents_Y, _ = extract_features(content_X, content_layers, style_layers)
    return content_X, contents_Y

def get_styles(image_shape, ctx):
    style_X = preprocess(style_img, image_shape).copyto(ctx)
    _, styles_Y = extract_features(style_X, content_layers, style_layers)
    return style_X, styles_Y

5. Defining the Loss Function

Next, we look at the loss function used for style transfer. It consists of the content loss, the style loss, and the total variation loss.

5.1. Content Loss

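Similar to the loss function used in linear regression, the content loss uses a squared error function to measure the difference in content features between the composite image and the content image. Both inputs of the squared error function are content-layer outputs obtained from the extract_features function.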

def content_loss(Y_hat, Y):
    return np.square(Y_hat - Y).mean()
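As a small worked example of the content loss, using made-up 1×1×2×2 feature maps rather than real VGG outputs and plain NumPy instead of `mxnet.np`, the mean squared error averages over every element of the feature map:

```python
import numpy as np

def content_loss(Y_hat, Y):
    return np.square(Y_hat - Y).mean()

# Hypothetical 1x1x2x2 content-layer outputs
Y = np.zeros((1, 1, 2, 2))
Y_hat = np.array([1.0, 2.0, 3.0, 4.0]).reshape(1, 1, 2, 2)

print(content_loss(Y_hat, Y))  # (1 + 4 + 9 + 16) / 4 = 7.5
```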

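5.2. Style Loss

The style loss also uses a squared error function, but it compares the styles of the composite image and the style image rather than their raw features. To represent the style output by a style layer, the gram function below reshapes the layer output (one example with $c$ channels, height $h$, and width $w$) into a matrix $X$ with $c$ rows and $hw$ columns and computes the Gram matrix $XX^\top$, whose entry $(i, j)$ is the inner product of channels $i$ and $j$. The result is divided by $chw$ so that its values do not grow with the size of the layer output.

def gram(X):
    num_channels, n = X.shape[1], X.size // X.shape[1]
    X = X.reshape(num_channels, n)
    return np.dot(X, X.T) / (num_channels * n)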
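A NumPy version of gram, applied to a tiny hypothetical 1×2×2×3 feature map, makes the reshaping and normalization concrete: channel 0 holds the values 0 to 5, channel 1 holds 6 to 11, so $c = 2$, $hw = 6$, and every entry is divided by 12.

```python
import numpy as np

def gram(X):
    # X has shape (1, c, h, w); flatten each channel into one row
    num_channels, n = X.shape[1], X.size // X.shape[1]
    X = X.reshape(num_channels, n)           # c rows, h*w columns
    return X @ X.T / (num_channels * n)      # (c, c), normalized by c*h*w

X = np.arange(12, dtype=float).reshape(1, 2, 2, 3)  # channel 0: 0..5, channel 1: 6..11
G = gram(X)
print(G * 12)  # equals [[55, 145], [145, 451]]
```

The Gram matrix is always square in the number of channels and symmetric, regardless of the spatial size of the feature map.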

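Naturally, the two Gram-matrix inputs of the squared error function for the style loss come from the style-layer outputs of the composite image and the style image. Here, we assume that the Gram matrix of the style image, gram_Y, has been computed in advance.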

def style_loss(Y_hat, gram_Y):
    return np.square(gram(Y_hat) - gram_Y).mean()
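One way to see why a Gram matrix captures style rather than layout: shuffling the spatial positions of a feature map (the same shuffle for every channel) leaves its Gram matrix unchanged, so the style loss stays essentially zero, whereas changing the channel statistics does not. The NumPy sketch below uses random features as a hypothetical stand-in for a style layer's output.

```python
import numpy as np

def gram(X):
    # (1, c, h, w) -> c x hw matrix -> normalized (c, c) Gram matrix
    c, n = X.shape[1], X.size // X.shape[1]
    X = X.reshape(c, n)
    return X @ X.T / (c * n)

def style_loss(Y_hat, gram_Y):
    return np.square(gram(Y_hat) - gram_Y).mean()

rng = np.random.default_rng(0)
Y = rng.normal(size=(1, 4, 5, 5))   # hypothetical style-layer output
gram_Y = gram(Y)                     # precomputed once, as in the text

# Apply the same random permutation of the 25 spatial positions to every channel
perm = rng.permutation(25)
Y_shuffled = Y.reshape(1, 4, 25)[:, :, perm].reshape(1, 4, 5, 5)

loss_shuffled = style_loss(Y_shuffled, gram_Y)  # close to 0: layout is ignored
loss_scaled = style_loss(2 * Y, gram_Y)         # positive: channel statistics changed
print(loss_shuffled, loss_scaled)
```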

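5.3. Total Variation Loss

Sometimes the learned composite image contains a lot of high-frequency noise, in particular bright or dark pixels. One common noise reduction method is total variation denoising. Let $x_{i,j}$ denote the pixel value at coordinate $(i, j)$. The total variation loss is

$$\sum_{i,j} \left|x_{i,j} - x_{i+1,j}\right| + \left|x_{i,j} - x_{i,j+1}\right|$$

Minimizing it makes the values of neighboring pixels as similar as possible. The tv_loss function below replaces the two sums with means over the vertical and horizontal differences and scales the result by 0.5.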

def tv_loss(Y_hat):
    return 0.5 * (np.abs(Y_hat[:, :, 1:, :] - Y_hat[:, :, :-1, :]).mean() +
                  np.abs(Y_hat[:, :, :, 1:] - Y_hat[:, :, :, :-1]).mean())
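A quick NumPy check of tv_loss on two hypothetical 4×4 single-channel images shows the intended behavior: a constant image has zero loss, and a checkerboard, where every pair of neighboring pixels differs by 1, has loss 0.5 × (1 + 1) = 1.

```python
import numpy as np

def tv_loss(Y_hat):
    # Mean absolute difference between vertical and horizontal neighbors, halved
    return 0.5 * (np.abs(Y_hat[:, :, 1:, :] - Y_hat[:, :, :-1, :]).mean() +
                  np.abs(Y_hat[:, :, :, 1:] - Y_hat[:, :, :, :-1]).mean())

flat = np.full((1, 1, 4, 4), 0.5)   # constant image: no variation
checker = (np.indices((4, 4)).sum(axis=0) % 2).astype(float).reshape(1, 1, 4, 4)

print(tv_loss(flat))     # 0.0
print(tv_loss(checker))  # 1.0
```

Isolated bright or dark pixels raise this loss in the same way the checkerboard does, which is why adding it to the objective suppresses that kind of noise.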

5.4. The Loss Function

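The loss function for style transfer is the weighted sum of the content loss, the style loss, and the total variation loss. By adjusting these weight hyperparameters, we can balance the relative importance of retaining content, transferring style, and reducing noise in the composite image.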

content_weight, style_weight, tv_weight = 1, 1e3, 10

def compute_loss(X, contents_Y_hat, styles_Y_hat, contents_Y, styles_Y_gram):
    # Calculate the content, style, and total variation losses respectively
    contents_l = [content_loss(Y_hat, Y) * content_weight for Y_hat, Y in zip(
        contents_Y_hat, contents_Y)]
    styles_l = [style_loss(Y_hat, Y) * style_weight for Y_hat, Y in zip(
        styles_Y_hat, styles_Y_gram)]
    tv_l = tv_loss(X) * tv_weight
    # Add up all the losses
    l = sum(styles_l + contents_l + [tv_l])
    return contents_l, styles_l, tv_l, l

6. Creating and Initializing the Composite Image

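In style transfer, the composite image is the only variable that needs to be updated. Therefore, we can define a simple model, GeneratedImage, and treat the composite image as its model parameter. The forward computation of this model simply returns the model parameter.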

class GeneratedImage(nn.Block):
    def __init__(self, img_shape, **kwargs):
        super(GeneratedImage, self).__init__(**kwargs)
        # The composite image itself is the only learnable parameter
        self.weight = self.params.get('weight', shape=img_shape)

    def forward(self):
        return self.weight.data()

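Next, we define the get_inits function. It creates a composite image model instance, initializes it to the image X, and creates an Adam trainer for it. Before training, it also computes styles_Y_gram, the Gram matrices of the style image's outputs at each style layer.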

def get_inits(X, ctx, lr, styles_Y):
    gen_img = GeneratedImage(X.shape)
    # Initialize the composite image to X
    gen_img.initialize(init.Constant(X), ctx=ctx, force_reinit=True)
    trainer = gluon.Trainer(gen_img.collect_params(), 'adam',
                            {'learning_rate': lr})
    # Precompute the Gram matrices of the style image's style-layer outputs
    styles_Y_gram = [gram(Y) for Y in styles_Y]
    return gen_img(), styles_Y_gram, trainer

7. Training

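During training, we repeatedly extract the content and style features of the composite image and compute the loss function. Recall that synchronization functions force the front end to wait for computation results. Because the loss values are only read back to the front end (by converting them to Python floats) every 10 epochs, the asynchronously queued computation may occupy a great deal of memory. Therefore, we call the npx.waitall synchronization function in every epoch.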

def train(X, contents_Y, styles_Y, ctx, lr, num_epochs, lr_decay_epoch):
    X, styles_Y_gram, trainer = get_inits(X, ctx, lr, styles_Y)
    animator = d2l.Animator(xlabel='epoch', ylabel='loss',
                            xlim=[1, num_epochs],
                            legend=['content', 'style', 'TV'],
                            ncols=2, figsize=(7, 2.5))
    for epoch in range(1, num_epochs + 1):
        with autograd.record():
            contents_Y_hat, styles_Y_hat = extract_features(
                X, content_layers, style_layers)
            contents_l, styles_l, tv_l, l = compute_loss(
                X, contents_Y_hat, styles_Y_hat, contents_Y, styles_Y_gram)
        l.backward()
        trainer.step(1)
        npx.waitall()
        # Decay the learning rate by 10x every lr_decay_epoch epochs
        if epoch % lr_decay_epoch == 0:
            trainer.set_learning_rate(trainer.learning_rate * 0.1)
        # Every 10 epochs, show the current composite image and the losses
        if epoch % 10 == 0:
            animator.axes[1].imshow(postprocess(X).asnumpy())
            animator.add(epoch, [float(sum(contents_l)),
                                 float(sum(styles_l)), float(tv_l)])
    return X
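The learning-rate schedule in train is a step decay: after every lr_decay_epoch epochs, the rate is multiplied by 0.1 via trainer.set_learning_rate. The small pure-Python helper below is a hypothetical restatement of that rule (not part of the article's code) that returns the rate in effect during a given 1-based epoch.

```python
def lr_at_epoch(epoch, base_lr=0.01, lr_decay_epoch=200):
    """Learning rate in effect during a 1-based epoch, assuming the rule in
    train(): after every lr_decay_epoch epochs the rate is multiplied by 0.1."""
    return base_lr * 0.1 ** ((epoch - 1) // lr_decay_epoch)

# First training run in the text: 500 epochs, decay every 200 epochs
for epoch in (1, 200, 201, 400, 401, 500):
    print(epoch, lr_at_epoch(epoch))
```

With the first run's settings (lr 0.01, 500 epochs, decay every 200), epochs 1 to 200 use 0.01, epochs 201 to 400 use 0.001, and epochs 401 to 500 use 0.0001.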

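Next, we start training the model. First, we set the height and width of the content and style images to 150 and 225 pixels (image.imresize takes the width before the height, so image_shape is (225, 150)). The composite image is initialized with the content image.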

ctx, image_shape = d2l.try_gpu(), (225, 150)

net.collect_params().reset_ctx(ctx)

content_X, contents_Y = get_contents(image_shape, ctx)

_, styles_Y = get_styles(image_shape, ctx)

output = train(content_X, contents_Y, styles_Y, ctx, 0.01, 500, 200)

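As you can see, the composite image retains the scenery and objects of the content image while introducing the colors of the style image. Because the image is relatively small, the details are a bit blurry. To obtain a sharper composite image, we train the model with a larger image size, 900 × 600: we enlarge the height and width of the previous image by a factor of four and initialize a larger composite image from the current result.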

image_shape = (900, 600)

_, content_Y = get_contents(image_shape, ctx)

_, style_Y = get_styles(image_shape, ctx)

X = preprocess(postprocess(output) * 255, image_shape)

output = train(X, content_Y, style_Y, ctx, 0.01, 300, 100)

d2l.plt.imsave('…/img/neural-style.png', postprocess(output).asnumpy())
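Going from 225 × 150 to 900 × 600 multiplies each side by 4 and the pixel count by 16, so every epoch at the larger size has roughly 16 times as many pixels to process. The NumPy sketch below shows that arithmetic with nearest-neighbor upsampling (`np.repeat`); this is a swapped-in simplification for illustration, since the article's code resizes through postprocess, preprocess, and image.imresize instead.

```python
import numpy as np

# Hypothetical composite image at the first training size: height 150, width 225
small = np.random.default_rng(0).random((150, 225, 3))

# Nearest-neighbor 4x upsampling: repeat every row and every column 4 times
large = small.repeat(4, axis=0).repeat(4, axis=1)

print(large.shape)               # (600, 900, 3)
print(large.size // small.size)  # 16
```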

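As you can see, each epoch takes longer because of the larger image size. As shown in Fig. 3, the composite image retains more detail thanks to its larger size. It not only has large blocks of color like the style image, but these blocks even show the subtle texture of brush strokes.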

Fig. 3 900×600 composite image.

8. Summary
The loss functions used in style transfer generally have three parts:
Content loss is used to make the composite image approximate the content image as regards content features.
Style loss is used to make the composite image approximate the style image in terms of style features.
Total variation loss helps reduce the noise in the composite image.
We can use a pre-trained CNN to extract image features and minimize the loss function to continuously update the composite image.
We use a Gram matrix to represent the style output by the style layers.
