Neural Style Transfer Implementation (Caffe)
Preface
My previous post covered a Keras implementation of neural style transfer. One advantage of Keras is its simple API, which makes it quick and convenient to build and run a model. For learning purposes, I have now reimplemented the same thing in Caffe. The overall approach is the same as before, so I won't repeat it; for details, see the paper A Neural Algorithm of Artistic Style (translated).
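As a quick refresher on the idea from the paper: the content loss is half the squared error between feature maps, while the style loss compares Gram matrices of feature maps, with the same scaling that `_compute_style_grad` and `_compute_content_grad` use in the script below. A minimal numpy sketch (random arrays stand in for real VGG activations; the function names here are mine, not from the script):

```python
import numpy as np

def gram_matrix(F):
    """Gram matrix of a feature map F with shape (channels, height*width)."""
    return F.dot(F.T)

def style_loss(F, F_style):
    """Squared Gram-matrix distance, scaled by 1/(4 N^2 M^2) as in the paper."""
    N, M = F.shape  # N channels, M spatial positions
    G, G_style = gram_matrix(F), gram_matrix(F_style)
    return ((G - G_style) ** 2).sum() / (4.0 * N**2 * M**2)

def content_loss(F, F_content):
    """Half the squared error between feature maps."""
    return ((F - F_content) ** 2).sum() / 2.0

rng = np.random.RandomState(0)
F = rng.rand(3, 16)          # hypothetical 3-channel, 4x4 feature map
print(style_loss(F, F))      # 0.0 -- identical features give zero style loss
print(content_loss(F, F))    # 0.0
```

The generated image is then optimized so that a weighted sum of these two losses, taken over several layers, is minimized.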
The code
No more preamble; straight to the code.
log.py
```python
# *_*coding:utf-8 *_*
# author: 許鴻斌
# email: 2775751197@qq.com

import logging
import sys

# Get a logger instance; an empty name would return the root logger
logger = logging.getLogger('Test')

# Output format for the logger
LOG_FORMAT = "%(filename)s:%(funcName)s:%(asctime)s.%(msecs)03d -- %(message)s"
# formatter = logging.Formatter('%(asctime)s %(levelname)-8s: %(message)s')
formatter = logging.Formatter(LOG_FORMAT)

# File handler (disabled)
# file_handler = logging.FileHandler("test.log")
# file_handler.setFormatter(formatter)  # the format can be set via setFormatter

# Console handler
console_handler = logging.StreamHandler(sys.stdout)
console_handler.formatter = formatter  # the formatter can also be assigned directly

# Attach the handlers to the logger
# logger.addHandler(file_handler)
logger.addHandler(console_handler)

# Minimum level to emit; the default is WARN
logger.setLevel(logging.INFO)
```

style_transfer.py
```python
# *_*coding:utf-8 *_*
# author: 許鴻斌
# email: 2775751197@qq.com

# logging module
from log import logger

# standard imports
import argparse
import os
import sys
import timeit
import logging

# import caffe
caffe_root = '/home/xhb/caffe/caffe'
pycaffe_root = os.path.join(caffe_root, 'python')
sys.path.append(pycaffe_root)
import caffe

import numpy as np
import progressbar as pb
from scipy.fftpack import ifftn
from scipy.linalg.blas import sgemm
from scipy.misc import imsave
from scipy.optimize import minimize
from skimage import img_as_ubyte
from skimage.transform import rescale

# numeric constants
INF = np.float32(np.inf)
STYLE_SCALE = 1.2

# Supported CNN architectures: VGG19, VGG16, GoogLeNet, CaffeNet.
# Each entry maps the layers whose feature maps serve as the content or
# style representation to their loss weights. VGG16 is used by default.
VGG19_WEIGHTS = {"content": {"conv4_2": 1},
                 "style": {"conv1_1": 0.2,
                           "conv2_1": 0.2,
                           "conv3_1": 0.2,
                           "conv4_1": 0.2,
                           "conv5_1": 0.2}}
VGG16_WEIGHTS = {"content": {"conv4_2": 1},
                 "style": {"conv1_1": 0.2,
                           "conv2_1": 0.2,
                           "conv3_1": 0.2,
                           "conv4_1": 0.2,
                           "conv5_1": 0.2}}
GOOGLENET_WEIGHTS = {"content": {"conv2/3x3": 2e-4,
                                 "inception_3a/output": 1-2e-4},
                     "style": {"conv1/7x7_s2": 0.2,
                               "conv2/3x3": 0.2,
                               "inception_3a/output": 0.2,
                               "inception_4a/output": 0.2,
                               "inception_5a/output": 0.2}}
CAFFENET_WEIGHTS = {"content": {"conv4": 1},
                    "style": {"conv1": 0.2,
                              "conv2": 0.2,
                              "conv3": 0.2,
                              "conv4": 0.2,
                              "conv5": 0.2}}

# argparse
parser = argparse.ArgumentParser(description='Neural Style Transfer',
                                 usage='style_transfer.py -s <style_image> -c <content_image>')
parser.add_argument('-s', '--style_img', type=str, required=True, help='Style (art) image')
parser.add_argument('-c', '--content_img', type=str, required=True, help='Content image')
parser.add_argument('-g', '--gpu_id', default=-1, type=int, required=False, help='GPU device number')
parser.add_argument('-m', '--model', default='vgg16', type=str, required=False, help='Which model to use')
parser.add_argument('-i', '--init', default='content', type=str, required=False, help='Initialization strategy')
parser.add_argument('-r', '--ratio', default='1e4', type=str, required=False, help='Style-to-content ratio')
parser.add_argument('-n', '--num-iters', default=512, type=int, required=False, help='L-BFGS iterations')
parser.add_argument('-l', '--length', default=512, type=float, required=False, help='Maximum image length')
parser.add_argument('-v', '--verbose', action='store_true', required=False, help='Print minimization outputs')
parser.add_argument('-o', '--output', default=None, required=False, help='Output path')


def _compute_style_grad(F, G, G_style, layer):
    """Computes style gradient and loss from activation features."""
    # compute loss and gradient
    (Fl, Gl) = (F[layer], G[layer])
    c = Fl.shape[0]**-2 * Fl.shape[1]**-2
    El = Gl - G_style[layer]
    loss = c/4 * (El**2).sum()
    grad = c * sgemm(1.0, El, Fl) * (Fl>0)
    return loss, grad


def _compute_content_grad(F, F_content, layer):
    """Computes content gradient and loss from activation features."""
    # compute loss and gradient
    Fl = F[layer]
    El = Fl - F_content[layer]
    loss = (El**2).sum() / 2
    grad = El * (Fl>0)
    return loss, grad


def _compute_reprs(net_in, net, layers_style, layers_content, gram_scale=1):
    """Computes representation matrices for an image."""
    # input data and forward pass
    (repr_s, repr_c) = ({}, {})
    net.blobs["data"].data[0] = net_in
    net.forward()
    # loop through combined set of layers
    for layer in set(layers_style)|set(layers_content):
        F = net.blobs[layer].data[0].copy()
        F.shape = (F.shape[0], -1)
        repr_c[layer] = F
        if layer in layers_style:
            repr_s[layer] = sgemm(gram_scale, F, F.T)
    return repr_s, repr_c


def style_optfn(x, net, weights, layers, reprs, ratio):
    """Style transfer optimization callback for scipy.optimize.minimize().

    :param numpy.ndarray x:
        Flattened data array.
    :param caffe.Net net:
        Network to use to generate gradients.
    :param dict weights:
        Weights to use in the network.
    :param list layers:
        Layers to use in the network.
    :param tuple reprs:
        Representation matrices packed in a tuple.
    :param float ratio:
        Style-to-content ratio.
    """
    # unpack the parameters
    layers_style = weights["style"].keys()      # layers used for the style representation
    layers_content = weights["content"].keys()  # layers used for the content representation
    net_in = x.reshape(net.blobs["data"].data.shape[1:])

    # compute style and content representations
    (G_style, F_content) = reprs
    (G, F) = _compute_reprs(net_in, net, layers_style, layers_content)

    # backpropagate, layer by layer
    loss = 0
    net.blobs[layers[-1]].diff[:] = 0
    for i, layer in enumerate(reversed(layers)):
        next_layer = None if i == len(layers)-1 else layers[-i-2]
        grad = net.blobs[layer].diff[0]

        # style contribution
        if layer in layers_style:
            wl = weights["style"][layer]
            (l, g) = _compute_style_grad(F, G, G_style, layer)
            loss += wl * l * ratio
            grad += wl * g.reshape(grad.shape) * ratio

        # content contribution
        if layer in layers_content:
            wl = weights["content"][layer]
            (l, g) = _compute_content_grad(F, F_content, layer)
            loss += wl * l
            grad += wl * g.reshape(grad.shape)

        # compute gradient
        net.backward(start=layer, end=next_layer)
        if next_layer is None:
            grad = net.blobs["data"].diff[0]
        else:
            grad = net.blobs[next_layer].diff[0]

    # format gradient for minimize() function
    grad = grad.flatten().astype(np.float64)
    return loss, grad


class StyleTransfer(object):
    """Style transfer class."""

    def __init__(self, model_name, use_pbar=True):
        """Initialize the model used for style transfer.

        :param str model_name:
            Model to use.
        :param bool use_pbar:
            Use progressbar flag.
        """
        style_path = os.path.abspath(os.path.split(__file__)[0])
        base_path = os.path.join(style_path, "models", model_name)

        # Model definition, pretrained weights and mean file for each network.
        # The mean file holds the per-channel ImageNet mean, which is
        # subtracted during preprocessing, as it was during training.
        # vgg19
        if model_name == 'vgg19':
            model_file = os.path.join(base_path, 'VGG_ILSVRC_19_layers_deploy.prototxt')
            pretrained_file = os.path.join(base_path, 'VGG_ILSVRC_19_layers.caffemodel')
            mean_file = os.path.join(base_path, 'ilsvrc_2012_mean.npy')
            weights = VGG19_WEIGHTS
        # vgg16
        elif model_name == 'vgg16':
            model_file = os.path.join(base_path, 'VGG_ILSVRC_16_layers_deploy.prototxt')
            pretrained_file = os.path.join(base_path, 'VGG_ILSVRC_16_layers.caffemodel')
            mean_file = os.path.join(base_path, 'ilsvrc_2012_mean.npy')
            weights = VGG16_WEIGHTS
        # googlenet
        elif model_name == 'googlenet':
            model_file = os.path.join(base_path, 'deploy.prototxt')
            pretrained_file = os.path.join(base_path, 'bvlc_googlenet.caffemodel')
            mean_file = os.path.join(base_path, 'ilsvrc_2012_mean.npy')
            weights = GOOGLENET_WEIGHTS
        # caffenet
        elif model_name == 'caffenet':
            model_file = os.path.join(base_path, 'deploy.prototxt')
            pretrained_file = os.path.join(base_path, 'bvlc_reference_caffenet.caffemodel')
            mean_file = os.path.join(base_path, 'ilsvrc_2012_mean.npy')
            weights = CAFFENET_WEIGHTS
        else:
            assert False, 'Model not available'

        # load the model and its weights
        self.load_model(model_file, pretrained_file, mean_file)
        self.weights = weights

        # collect the layers used for the style and content representations
        self.layers = []
        for layer in self.net.blobs:
            if layer in self.weights['style'] or layer in self.weights['content']:
                self.layers.append(layer)
        self.use_pbar = use_pbar

        # set up the per-iteration callback
        if self.use_pbar:
            def callback(xk):
                self.grad_iter += 1
                try:
                    self.pbar.update(self.grad_iter)
                except:
                    self.pbar.finished = True
                if self._callback is not None:
                    net_in = xk.reshape(self.net.blobs['data'].data.shape[1:])
                    self._callback(self.transformer.deprocess('data', net_in))
        else:
            def callback(xk):
                if self._callback is not None:
                    net_in = xk.reshape(self.net.blobs['data'].data.shape[1:])
                    self._callback(self.transformer.deprocess('data', net_in))
        self.callback = callback

    def load_model(self, model_file, pretrained_file, mean_file):
        """Loads specified model from caffe install (see caffe docs).

        :param str model_file:
            Path to model protobuf.
        :param str pretrained_file:
            Path to pretrained caffe model.
        :param str mean_file:
            Path to mean file.
        """
        # Load the network, temporarily redirecting stderr to /dev/null
        # to suppress the verbose output caffe prints by default.
        null_fds = os.open(os.devnull, os.O_RDWR)
        out_orig = os.dup(2)
        os.dup2(null_fds, 2)
        net = caffe.Net(str(model_file), str(pretrained_file), caffe.TEST)
        os.dup2(out_orig, 2)
        os.close(null_fds)

        # configure the input preprocessing
        transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
        transformer.set_mean('data', np.load(mean_file).mean(1).mean(1))  # per-channel mean
        transformer.set_channel_swap('data', (2, 1, 0))
        transformer.set_transpose('data', (2, 0, 1))
        transformer.set_raw_scale('data', 255)

        self.net = net
        self.transformer = transformer

    def get_generated(self):
        """Returns the generated image (net input, after optimization)."""
        data = self.net.blobs["data"].data
        img_out = self.transformer.deprocess('data', data)
        return img_out

    def _rescale_net(self, img):
        """Rescales the network to fit a particular image."""
        # get new dimensions and rescale net + transformer
        new_dims = (1, img.shape[2]) + img.shape[:2]
        self.net.blobs["data"].reshape(*new_dims)
        self.transformer.inputs["data"] = new_dims

    def _make_noise_input(self, init):
        """Creates an initial input (generated) image."""
        # specify dimensions and create grid in Fourier domain
        dims = tuple(self.net.blobs["data"].data.shape[2:]) + \
               (self.net.blobs["data"].data.shape[1], )  # (height, width, channels)
        grid = np.mgrid[0:dims[0], 0:dims[1]]

        # create frequency representation for pink noise
        Sf = (grid[0] - (dims[0]-1)/2.0) ** 2 + \
             (grid[1] - (dims[1]-1)/2.0) ** 2
        Sf[np.where(Sf == 0)] = 1
        Sf = np.sqrt(Sf)
        Sf = np.dstack((Sf**int(init),)*dims[2])

        # apply ifft to create pink noise and normalize
        ifft_kernel = np.cos(2*np.pi*np.random.randn(*dims)) + \
                      1j*np.sin(2*np.pi*np.random.randn(*dims))
        img_noise = np.abs(ifftn(Sf * ifft_kernel))
        img_noise -= img_noise.min()
        img_noise /= img_noise.max()

        # preprocess the pink noise image
        x0 = self.transformer.preprocess("data", img_noise)
        return x0

    def _create_pbar(self, max_iter):
        """Creates a progress bar."""
        self.grad_iter = 0
        self.pbar = pb.ProgressBar()
        self.pbar.widgets = ["Optimizing: ", pb.Percentage(),
                             " ", pb.Bar(marker=pb.AnimatedMarker()),
                             " ", pb.ETA()]
        self.pbar.maxval = max_iter

    def transfer_style(self, img_style, img_content, length=512, ratio=1e5,
                       n_iter=512, init="-1", verbose=False, callback=None):
        """Transfers the style of the artwork to the input image.

        :param numpy.ndarray img_style:
            A style image with the desired target style.
        :param numpy.ndarray img_content:
            A content image in floating point, RGB format.
        :param function callback:
            A callback function, which takes images at iterations.
        """
        # the smaller of the 'data' layer's height and width
        orig_dim = min(self.net.blobs["data"].shape[2:])

        # rescale the style and content images
        scale = max(length / float(max(img_style.shape[:2])),
                    orig_dim / float(min(img_style.shape[:2])))
        img_style = rescale(img_style, STYLE_SCALE*scale)
        scale = max(length / float(max(img_content.shape[:2])),
                    orig_dim / float(min(img_content.shape[:2])))
        img_content = rescale(img_content, scale)

        # compute the style representation
        self._rescale_net(img_style)           # resize the net to the style image
        layers = self.weights["style"].keys()  # layers holding the style representation
        net_in = self.transformer.preprocess("data", img_style)  # preprocess for the 'data' layer
        gram_scale = float(img_content.size)/img_style.size      # Gram matrix scale factor
        G_style = _compute_reprs(net_in, self.net, layers, [], gram_scale=1)[0]

        # compute the content representation
        self._rescale_net(img_content)           # resize the net to the content image
        layers = self.weights["content"].keys()  # layers holding the content representation
        net_in = self.transformer.preprocess("data", img_content)
        F_content = _compute_reprs(net_in, self.net, [], layers)[1]

        # Initialize the network input:
        #   - a numpy array is treated as an image and used directly;
        #   - "content" starts from the content image;
        #   - "mixed" starts from a weighted blend of content and style;
        #   - anything else starts from random (pink) noise.
        if isinstance(init, np.ndarray):
            img0 = self.transformer.preprocess("data", init)
        elif init == "content":
            img0 = self.transformer.preprocess("data", img_content)
        elif init == "mixed":
            img0 = 0.95*self.transformer.preprocess("data", img_content) + \
                   0.05*self.transformer.preprocess("data", img_style)
        else:
            img0 = self._make_noise_input(init)

        # compute data bounds
        data_min = -self.transformer.mean["data"][:,0,0]
        data_max = data_min + self.transformer.raw_scale["data"]
        data_bounds = [(data_min[0], data_max[0])] * int(img0.size / 3) + \
                      [(data_min[1], data_max[1])] * int(img0.size / 3) + \
                      [(data_min[2], data_max[2])] * int(img0.size / 3)

        # optimization parameters
        grad_method = "L-BFGS-B"
        reprs = (G_style, F_content)
        minfn_args = {
            "args": (self.net, self.weights, self.layers, reprs, ratio),
            "method": grad_method, "jac": True, "bounds": data_bounds,
            "options": {"maxcor": 8, "maxiter": n_iter, "disp": verbose}
        }

        # solve the optimization problem
        self._callback = callback
        minfn_args["callback"] = self.callback
        if self.use_pbar and not verbose:
            self._create_pbar(n_iter)
            self.pbar.start()
            res = minimize(style_optfn, img0.flatten(), **minfn_args).nit
            self.pbar.finish()
        else:
            res = minimize(style_optfn, img0.flatten(), **minfn_args).nit
        return res


def main(args):
    # set level of logger
    level = logging.INFO if args.verbose else logging.DEBUG
    logger.setLevel(level)
    logger.info('Starting style transfer.')

    # select CPU/GPU mode (CPU by default)
    if args.gpu_id == -1:
        caffe.set_mode_cpu()
        logger.info('Caffe set to CPU mode.')
    else:
        caffe.set_device(args.gpu_id)
        caffe.set_mode_gpu()
        logger.info('Caffe set to GPU {}.'.format(args.gpu_id))

    # load the images
    style_img = caffe.io.load_image(args.style_img)
    content_img = caffe.io.load_image(args.content_img)
    logger.info('Successfully loaded images.')

    # artistic style class
    use_pbar = not args.verbose
    st = StyleTransfer(args.model.lower(), use_pbar=use_pbar)
    logger.info("Successfully loaded model {0}.".format(args.model))

    # run style transfer
    start = timeit.default_timer()
    n_iters = st.transfer_style(style_img, content_img, length=args.length,
                                init=args.init, ratio=np.float(args.ratio),
                                n_iter=args.num_iters, verbose=args.verbose)
    end = timeit.default_timer()
    logger.info("Ran {0} iterations in {1:.0f}s.".format(n_iters, end-start))
    img_out = st.get_generated()

    # build the output path
    if args.output is not None:
        out_path = args.output
    else:
        out_path_fmt = (os.path.splitext(os.path.split(args.content_img)[1])[0],
                        os.path.splitext(os.path.split(args.style_img)[1])[0],
                        args.model, args.init, args.ratio, args.num_iters)
        out_path = "outputs/{0}-{1}-{2}-{3}-{4}-{5}.jpg".format(*out_path_fmt)

    # save the generated artistic image
    imsave(out_path, img_as_ubyte(img_out))
    logger.info("Output saved to {0}.".format(out_path))


if __name__ == '__main__':
    args = parser.parse_args()
    main(args)
```

Additional notes
A few more points worth noting:
Caffe path
Make sure pycaffe has been compiled, and point the path at the root directory of your Caffe installation.
```python
# import caffe
caffe_root = '/home/xhb/caffe/caffe'  # change this to your own Caffe root directory
pycaffe_root = os.path.join(caffe_root, 'python')
sys.path.append(pycaffe_root)
import caffe
```

Model files
The script uses model files pretrained on ImageNet, which are all kept under the models directory.
A Baidu Cloud link is included at the end of this post; download them from there.
Images

Any test images will do, but put them somewhere the script can find them.

Some images found online:

Content image

Style image

The resulting stylized image
Running the script
```shell
python style_transfer.py -s <style image path> -c <content image path>
```

Other parameters can be configured as well, but the defaults are usually good enough.
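Internally, `transfer_style` hands the flattened image to `scipy.optimize.minimize` with `method="L-BFGS-B"`, `jac=True` (the objective `style_optfn` returns both loss and gradient), and `maxiter` set from `-n`/`--num-iters`. A toy sketch of the same pattern on a simple quadratic (the objective here is made up purely for illustration):

```python
import numpy as np
from scipy.optimize import minimize

def loss_and_grad(x):
    """Objective returning (loss, gradient), just as style_optfn does."""
    loss = ((x - 3.0) ** 2).sum()
    grad = 2.0 * (x - 3.0)
    return loss, grad

x0 = np.zeros(4)  # initial guess, analogous to the img0 initialization
res = minimize(loss_and_grad, x0, method="L-BFGS-B", jac=True,
               options={"maxiter": 50})
print(res.x)    # converges to approximately [3. 3. 3. 3.]
print(res.nit)  # iterations actually run, which is what transfer_style returns
```

With `jac=True`, scipy evaluates the objective once per step and gets both values, which is exactly why `style_optfn` computes the loss and the backpropagated gradient together.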
后記
This is for study and exchange purposes only; if you need anything, please send me a private message. If comments are disabled on some posts, please do not leave comments in unrelated places; message me directly instead. Important things are worth saying twice! (o′ω`o)
The complete project:

Link: https://pan.baidu.com/s/1O11yEuAn4vRdBUMXW8djkQ  Password: sto6

Since the caffemodel files are large, they are not included in the archive; download them separately.

Pretrained weight files:
googlenet:http://dl.caffe.berkeleyvision.org/bvlc_googlenet.caffemodel
caffenet:http://dl.caffe.berkeleyvision.org/bvlc_reference_caffenet.caffemodel
vgg16:http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_16_layers.caffemodel
vgg19:http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_19_layers.caffemodel
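Assuming the models/&lt;model_name&gt; layout that style_transfer.py expects (its base_path convention), the weights above can be fetched with wget; the directory names below follow that convention, so adjust them if your layout differs:

```shell
mkdir -p models/googlenet models/caffenet models/vgg16 models/vgg19
wget -P models/googlenet http://dl.caffe.berkeleyvision.org/bvlc_googlenet.caffemodel
wget -P models/caffenet http://dl.caffe.berkeleyvision.org/bvlc_reference_caffenet.caffemodel
wget -P models/vgg16 http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_16_layers.caffemodel
wget -P models/vgg19 http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_19_layers.caffemodel
```

Each directory also needs the corresponding deploy prototxt and the ilsvrc_2012_mean.npy mean file, which are included in the project archive.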