
SimGAN-Captcha: Reading and Reproducing the Code

Published: 2025/3/15


Project introduction

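Project page: see the link in the original post. Briefly, the project started because the author wanted to take part in HackMIT, which required recognizing 10,000 out of 15,000 captchas, or reaching 90% per-character accuracy. He did not want to label any data (as willful as that~). So he decided to generate a batch of captchas himself (the synthesizer), and then use a refiner (a GAN) to adjust these synthetic captchas so that they look like the real training samples. That gives him a set of labeled captchas with which to train a classifier, which then classifies the 15,000 images to be hacked. The approach borrows from a paper Apple published in 2016 (linked in the original post). However, he found that a model trained on this data reached only 55% accuracy on real samples, so he had a classmate label 4,000 of the images to be hacked (the classmate had originally planned to label 10,000), and in the end he happily qualified for the contest without labeling a single image himself.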

If you are not interested in the details of the paper, skip this part and go straight to the project code section.

Overview

The figure below shows the overall structure from the paper. The paper's goal is to synthesize eye images that resemble the training set.

Overview.jpg

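The simulator first synthesizes some images (Synthetic). A Refiner then refines (improves, adjusts) them, and a discriminator tries to tell the refined images apart from real but unlabeled images. The goal is for the discriminator to be unable to distinguish real images from refined ones. We can then use the simulator to generate a batch of labeled data and correct it with the refiner, so that the resulting images are close to the original training set.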

Objective

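Here is a brief summary of the loss functions the model needs. Simulated+Unsupervised learning uses unlabeled real images Y to learn a Refiner, which is then used to refine our synthetic images X. The key point is that the refined image x' should look like a real image while still preserving the annotation information. For example, the texture of the synthetic image should match that of real images (realism), while its content (the digits and letters on the captcha) must not be lost. The Refiner therefore optimizes two losses, l_real and l_reg, weighted by lambda.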

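Here l_real is the loss between the refined synthetic image (x_i') and the real images Y, and l_reg is the loss between the original synthetic image x_i and its refined version x_i'. lambda is a hyperparameter.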

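The Refiner's goal is to fool the discriminator D as much as possible, so that D cannot tell whether an image is real or synthetic. The discriminator's goal is exactly the opposite: to tell them apart as well as it can. Its loss is therefore a two-class cross-entropy over refined and real images.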

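This loss is a binary cross-entropy: D(.) is the probability that the input image is synthetic, and 1-D(.) is the probability that it is real. In other words, for a synthetic input the loss is the first term, and for a real input it is the second term. In the implementation, the label is 1 when the input is a (refined) synthetic image x_i and 0 otherwise, and each mini-batch is made of randomly sampled real images and synthetic images. The model is a ConvNet whose last layer outputs the probability that the sample is synthetic, and the parameters are updated with SGD. (In short, the discriminator is a convolutional network with a binary cross-entropy loss, minimized with SGD.)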
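
To make the two halves of this loss concrete, here is a minimal sketch in plain Python. The function name and the probability values are made up for illustration; it only assumes, as described above, that D outputs the probability of "synthetic".

```python
import math

def discriminator_loss(p_refined, p_real):
    """Binary cross-entropy for the discriminator.

    p_refined: D(x') for refined synthetic images (label 1, "synthetic").
    p_real:    D(y) for real images (label 0, "real").
    """
    loss_refined = -sum(math.log(p) for p in p_refined)
    loss_real = -sum(math.log(1.0 - p) for p in p_real)
    return loss_refined + loss_real

# a discriminator that is confident and correct has a low loss ...
good = discriminator_loss(p_refined=[0.9, 0.8], p_real=[0.1, 0.2])
# ... while one that the refiner has fooled has a high loss
fooled = discriminator_loss(p_refined=[0.2, 0.1], p_real=[0.1, 0.2])
```

Only the first argument differs between the two calls, which is the part the refiner can influence.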

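Opposite to the discriminator's goal, the refiner should force the discriminator to fail to recognize refined synthetic images. Its l_real is therefore the same cross-entropy with the label flipped: the refined image is scored as if it were real, i.e. -log(1 - D(x_i')).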

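Next is l_reg. To preserve the content of the original image, we need a loss that keeps the model from changing the image too much, so a self-regularization loss is introduced. It keeps the difference between the pixels of the refined image and those of the original image from growing too large.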

Putting these together, the refiner's loss is l_real plus lambda times l_reg, summed over the synthetic images.

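During training we minimize the refiner loss and the discriminator loss alternately. When updating the refiner, the discriminator's parameters are held fixed; when updating the discriminator, the refiner's parameters are held fixed.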
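
The formula images from the original post are missing, so the losses described in this section can be written out as follows. This is a reconstruction from the prose, using the plain pixel-wise L1 distance for the self-regularization term; R_\theta is the refiner, D_\phi the discriminator (probability of "synthetic"), and x_i' = R_\theta(x_i).

```latex
\begin{aligned}
\mathcal{L}_D(\phi) &= -\sum_i \log D_\phi(x_i') \;-\; \sum_j \log\bigl(1 - D_\phi(y_j)\bigr), \\
\ell_{\text{real}}(x_i) &= -\log\bigl(1 - D_\phi(x_i')\bigr), \\
\ell_{\text{reg}}(x_i) &= \lVert x_i' - x_i \rVert_1, \\
\mathcal{L}_R(\theta) &= \sum_i \bigl[\, \ell_{\text{real}}(x_i) + \lambda\, \ell_{\text{reg}}(x_i) \,\bigr].
\end{aligned}
```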

There are two tricks here.

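1. Local adversarial loss
When the refiner learns to model real images, it should not introduce artifacts. If we train one strong discriminator, the refiner tends to over-emphasize certain image features in order to fool the current discriminator, and that produces artifacts. How can this be avoided? We can observe that any patch cut out of a refined synthetic image should have statistics similar to those of a patch from a real image. So instead of defining a global discriminator (which classifies the whole image as synthetic or real), we can classify every patch of the image. This not only limits the receptive field but also provides more samples for training the discriminator. The discriminator is a fully convolutional network whose output is the probability that each of the w*h patches is synthetic. When updating the refiner, we sum the cross-entropy losses over these w*h patches.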

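For example, if the output is a 2*3 matrix, each value is the probability that the corresponding patch is synthetic, and the loss is the sum of the cross-entropy over these 6 patches.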
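
The 2*3 case can be sketched in a few lines of plain Python. The function name and the 0.5 probabilities are illustrative assumptions, not taken from the project code.

```python
import math

def local_adversarial_loss(patch_probs, is_synthetic):
    """Sum the per-patch cross-entropy over a grid of patch probabilities.

    patch_probs: rows of discriminator outputs, each the probability
                 that one patch is synthetic.
    is_synthetic: True for a refined synthetic image, False for a real one.
    """
    total = 0.0
    for row in patch_probs:
        for p in row:
            total += -math.log(p) if is_synthetic else -math.log(1.0 - p)
    return total

# a 2*3 grid of patches on which the discriminator is completely unsure
patch_probs = [[0.5, 0.5, 0.5],
               [0.5, 0.5, 0.5]]
loss = local_adversarial_loss(patch_probs, is_synthetic=True)
```

With every patch at 0.5 the loss is 6 * log(2), the value an undecided discriminator would give.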

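2. Updating the discriminator with a history of refined images
One problem with adversarial training is that the discriminator only looks at the most recent refined images. This causes two issues: the adversarial training can diverge, and the refiner can reintroduce artifacts that the discriminator has long forgotten. The fix is to update the discriminator from a buffer of past refined images rather than from the current mini-batch alone. Concretely, in each round of discriminator training we sample b/2 images from the current batch and b/2 images from a buffer of size B, and use them together to update the discriminator's parameters. After the round, b/2 images in the buffer are replaced with b/2 newly generated images.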

Parameter details

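Implementation details:
Refiner: 55*35 input image => 64 filters of size 3*3 => 4 ResNet blocks => one 1*1 filter => output used as the refined image (grayscale, so 1 channel).
Each ResNet block consists of two 3*3 convolutional layers with 64 feature maps each, with the block's input added back to its output.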

Discriminator: 96 filters of size 3*3, stride=2 => 64 filters of size 3*3, stride=2 => max_pool: 3*3, stride=1 => 32 filters of size 3*3, stride=1 => 32 filters of size 1*1, stride=1 => 2 filters of size 1*1, stride=1 => softmax

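Both networks are fully convolutional, and the last layers of the Refiner and the Discriminator are very similar (the refiner's output has the same size as the input image, while the discriminator shrinks the image to something like W/4 * H/4 to represent the probabilities of that many patches). The Refiner is first trained for 1000 steps with the self-regularization loss only, and then the Discriminator is trained for 200 steps. After that, the Refiner is updated twice for every discriminator update.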

The full procedure is given as Algorithm 1 in the paper.
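
As a rough sketch of the update order just described, the schedule can be written down as a list of labels. The function and label names are hypothetical; the defaults are the step counts quoted above, and no actual training happens here.

```python
def training_schedule(T, k_d=1, k_g=2, refiner_pretrain=1000, disc_pretrain=200):
    """Return the ordered list of updates, as labels, for T adversarial steps."""
    updates = ["refiner (l_reg only)"] * refiner_pretrain
    updates += ["discriminator"] * disc_pretrain
    for _ in range(T):
        updates += ["refiner"] * k_g        # discriminator weights held fixed
        updates += ["discriminator"] * k_d  # refiner weights held fixed
    return updates

schedule = training_schedule(T=3)
```

Each adversarial step contributes k_g refiner updates followed by k_d discriminator updates.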

Project code overview

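challenges: folder with the data samples to predict
imgs: folder with the images extracted from challenges
SimGAN-Captcha.ipynb: notebook containing the whole project workflow
arial-extra.otf: font the simulator uses to generate captchas
avg.png: the lines that the organizers generated (with some encryption) from each participant's information; these lines have to be removed for training
image_history_buffer.py: defines the ImageHistoryBuffer class imported in the training section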

Preprocessing

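The author originally downloaded the base64-encoded images from a URL, but since the contest took place last year the URL no longer works. The author therefore put the corresponding files directly into challenges, and we can start from the second step, decoding. Python 2 and Python 3 differ here; the author seems to have used Python 2, so I give a Python 3 version of the code.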

Decoding

Every file under challenges is a JSON file containing 1000 base64-encoded JPEG images. For each file, we decode the base64 strings into JPEGs and put them in the orig folder.

import base64
import json
import os

# original download location of the challenges; it no longer works
URL = "https://captcha.delorean.codes/u/rickyhan/challenge"
DIR = "challenges/"
NUM_CHALLENGES = 20
IMG_DIR = "./orig"

fnames = ["{}/challenge-{}".format(DIR, i) for i in range(NUM_CHALLENGES)]
if not os.path.exists(IMG_DIR):
    os.mkdir(IMG_DIR)

def save_imgs(fname):
    # each challenge file is JSON holding a list of base64-encoded JPEGs
    with open(fname, 'r', encoding="latin-1") as f:
        l = json.load(f)
    for image in l['images']:
        b = base64.b64decode(image['jpg_base64'])
        name = image['name']
        with open(IMG_DIR + "/{}.jpg".format(name), 'wb') as out:
            out.write(b)

for fname in fnames:
    save_imgs(fname)
assert len(os.listdir(IMG_DIR)) == 1000 * NUM_CHALLENGES
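
The decoding step can be tried without the challenge files. The sketch below builds a stand-in JSON document with the same field names (`images`, `name`, `jpg_base64`); the image bytes are made up and only start with the JPEG start-of-image marker.

```python
import base64
import json

# stand-in for one challenge file; the payload is not a real image
fake_jpeg = b"\xff\xd8\xff\xe0" + b"not a real image"
challenge = json.dumps({
    "images": [
        {"name": "example",
         "jpg_base64": base64.b64encode(fake_jpeg).decode("ascii")}
    ]
})

# decode every entry back to raw bytes, keyed by image name
decoded = {
    image["name"]: base64.b64decode(image["jpg_base64"])
    for image in json.loads(challenge)["images"]
}
```

Base64 is an encoding rather than encryption, so the round trip returns exactly the original bytes.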

The decoded images look like this:

from PIL import Image
example_image_path = IMG_DIR + "/" + os.listdir(IMG_DIR)[0]
example_image_path2 = IMG_DIR + "/" + os.listdir(IMG_DIR)[3]
im = Image.open(example_image_path)
im2 = Image.open(example_image_path2)
IMG_FNAMES = [IMG_DIR + '/' + p for p in os.listdir(IMG_DIR)]
im
im2

Converting to black-and-white images

Binary images save a lot of computation, so we set a threshold and convert the images one by one into binary images. (The conversion used is described in the comments below.)

def gray(img_path):
    # convert to grayscale, then binarize
    # L = R * 299/1000 + G * 587/1000 + B * 114/1000
    img = Image.open(img_path).convert("L")  # convert to gray scale, one 8-bit byte per pixel
    img = img.point(lambda x: 255 if x > 200 or x == 0 else x)  # value found through T&E
    img = img.point(lambda x: 0 if x < 255 else 255, "1")  # convert to binary image
    img.save(img_path)

for img_path in IMG_FNAMES:
    gray(img_path)
im = Image.open(example_image_path)
im
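
The effect of the two `point` calls is easier to see on a handful of pixel values. This sketch applies the same rule to plain lists instead of PIL images, so it has no dependencies; the helper name `binarize` is made up.

```python
def binarize(pixels, threshold=200):
    """Map 8-bit gray levels to 0 (black) or 255 (white).

    Mirrors the two `point` calls above: levels above `threshold`, and
    pure 0, become white; every other level becomes black.
    """
    return [[255 if (p > threshold or p == 0) else 0 for p in row]
            for row in pixels]

gray_rows = [[0, 30, 120, 200],
             [201, 255, 1, 199]]
binary_rows = binarize(gray_rows)
```

Note that a level of exactly 200 still counts as ink, and that pure black (0) is sent to white, as in the code above.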

Extracting the mask

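All of these images share the same horizontal lines. As mentioned earlier, because this is a contest, the lines on the captchas are generated from the participant's name. In real-world use, OpenCV's morphological transformation functions could filter out this kind of noise. Here the author obtains the mask by summing all images and taking the average. He also suggests that a bit mask (&=) would work.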

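# height, width and ims (the list of binary images) are assumed to exist
mask = np.ones((height, width), dtype=bool)  # &= needs a boolean or integer array
for im in ims:
    mask &= np.array(im, dtype=bool)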
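
A tiny worked case shows why AND-ing the images isolates the shared lines. The sketch assumes that 1 marks an inked pixel (if ink is stored as 0, as in the binarized files above, the images would have to be inverted first) and uses plain lists so that it runs without NumPy; `common_mask` is a made-up helper name.

```python
def common_mask(images):
    """Bitwise-AND a list of equally sized 0/1 images, pixel by pixel."""
    height, width = len(images[0]), len(images[0][0])
    mask = [[1] * width for _ in range(height)]
    for im in images:
        for r in range(height):
            for c in range(width):
                mask[r][c] &= im[r][c]
    return mask

# 1 marks an inked pixel; the middle row is a line shared by every captcha
captcha_a = [[1, 0, 0], [1, 1, 1], [0, 0, 1]]
captcha_b = [[0, 1, 0], [1, 1, 1], [1, 0, 0]]
mask = common_mask([captcha_a, captcha_b])
```

Pixels that belong to the varying characters drop out, and only the line present in every image survives.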

Here all images are summed and averaged:

import numpy as np
WIDTH, HEIGHT = im.size
MASK_DIR = "avg.png"

def generateMask():
    N = 1000 * NUM_CHALLENGES
    arr = np.zeros((HEIGHT, WIDTH), float)  # np.float was removed from recent NumPy
    for fname in IMG_FNAMES:
        imarr = np.array(Image.open(fname), dtype=float)
        arr = arr + imarr / N
    arr = np.array(np.round(arr), dtype=np.uint8)
    out = Image.fromarray(arr, mode="L")  # save as gray scale
    out.save(MASK_DIR)

generateMask()
im = Image.open(MASK_DIR)  # ok this can be done with binary mask: &=
im

Then clean it up once more:

im = Image.open(MASK_DIR)
im = im.point(lambda x: 255 if x > 230 else x)
# 1-bit bilevel, stored with the leftmost pixel in the most significant bit.
# 0 means black, 1 means white.
im = im.point(lambda x: 0 if x < 255 else 255, "1")
im.save(MASK_DIR)
im

Generator for the real images

Real images also have to be fed in during training, so we use Keras's flow_from_directory to produce image batches automatically and apply some preprocessing.

from keras import models
from keras import layers
from keras import optimizers
from keras import applications
from keras.preprocessing import image
import tensorflow as tf

BATCH_SIZE = 512  # same value as in the compile step below; needed here first

# Real data generator
datagen = image.ImageDataGenerator(
    preprocessing_function=applications.xception.preprocess_input
    # calls the preprocess_input function from imagenet_utils
    # tf: will scale pixels between -1 and 1, sample-wise.
)
flow_from_directory_params = {'target_size': (HEIGHT, WIDTH),
                              'color_mode': 'grayscale',
                              'class_mode': None,
                              'batch_size': BATCH_SIZE}
real_generator = datagen.flow_from_directory(
    directory=".",
    **flow_from_directory_params
)

(Dumb) generator (the simulator)

Next we define a generator that produces (captcha, label) pairs. The generated captchas should resemble the real images as closely as possible.

# Synthetic captcha generator
from PIL import ImageFont, ImageDraw
from random import choice, random
from string import ascii_lowercase, digits
alphanumeric = ascii_lowercase + digits

def fuzzy_loc(locs):
    # drop column positions that lie less than 8 pixels before the next one
    acc = []
    for i, loc in enumerate(locs[:-1]):
        if locs[i+1] - loc < 8:
            continue
        else:
            acc.append(loc)
    return acc

def seg(img):
    # find the (nearly) empty columns that separate the characters
    arr = np.array(img, dtype=float)
    arr = arr.transpose()
    # arr = np.mean(arr, axis=2)
    arr = np.sum(arr, axis=1)
    locs = np.where(arr < arr.min() + 2)[0].tolist()
    locs = fuzzy_loc(locs)
    return locs

def is_well_formed(img_path):
    original_img = Image.open(img_path)
    img = original_img.convert('1')
    return len(seg(img)) == 4

noiseimg = np.array(Image.open("avg.png").convert("1"))
# noiseimg = np.bitwise_not(noiseimg)
fnt = ImageFont.truetype('./arial-extra.otf', 26)

def gen_one():
    og = Image.new("1", (100, 50))
    text = ''.join([choice(alphanumeric) for _ in range(4)])
    draw = ImageDraw.Draw(og)
    for i, t in enumerate(text):
        txt = Image.new('L', (40, 40))
        d = ImageDraw.Draw(txt)
        d.text((0, 0), t, font=fnt, fill=255)
        if random() > 0.5:
            w = txt.rotate(-20*(random()-1), expand=1)
            og.paste(w, (i*20 + int(25*random()), int(25+30*(random()-1))), w)
        else:
            w = txt.rotate(20*(random()-1), expand=1)
            og.paste(w, (i*20 + int(25*random()), int(20*random())), w)
    segments = seg(og)
    if len(segments) != 4:
        # characters overlap: discard and try again
        return gen_one()
    ogarr = np.array(og)
    ogarr = np.bitwise_or(noiseimg, ogarr)
    ogarr = np.expand_dims(ogarr, axis=2).astype(float)
    ogarr = np.random.random(size=(50, 100, 1)) * ogarr
    ogarr = (ogarr > 0.0).astype(float)  # add noise
    return ogarr, text

def synth_generator():
    arrs = []
    while True:
        for _ in range(BATCH_SIZE):
            img, text = gen_one()
            arrs.append(img)
        yield np.array(arrs)
        arrs = []

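The code above randomly picks letters and digits, rotates them, pastes the characters together, adds the noise image avg.png, and discards captchas whose characters overlap. If something goes wrong here, I strongly recommend upgrading Pillow first; I spent a long time debugging this.... sigh~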

def get_image_batch(generator):
    """keras generators may generate an incomplete batch for the last batch"""
    img_batch = next(generator)
    if len(img_batch) != BATCH_SIZE:
        img_batch = next(generator)  # generator.next() only exists in Python 2
    assert len(img_batch) == BATCH_SIZE
    return img_batch

Let's see what a real image looks like

import matplotlib.pyplot as plt
%matplotlib inline
imarr = get_image_batch(real_generator)
imarr = imarr[0, :, :, 0]
plt.imshow(imarr)

And what our generated images look like

imarr = get_image_batch(synth_generator())[0, :, :, 0]
print(imarr.shape)
plt.imshow(imarr)

Note that the images above appear colored only because plt.imshow applies a colormap; they are actually black-and-white binary images.

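As for this generation code, I personally think readers can simply download a captcha generator from GitHub and turn its images into binary images with the steps above. That also lets you choose a font as close as possible to the captchas you need to predict.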

Model definition

The whole network has three parts

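Refiner
The refiner, Rθ, is a ResNet. It modifies our generated images at the pixel level rather than changing the image content as a whole, so that the overall structure and the annotation are preserved. (Otherwise it would be awkward: if the letter a were turned into another letter, the label would no longer be correct.)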

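Discriminator
The discriminator, Dφ, is a simple ConvNet described as having 5 convolutional layers and 2 max-pooling layers (the code below defines 5 convolutional layers and a single max-pooling layer). It is a binary classifier that tells whether a captcha is one we synthesized or one from the real sample set.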

Putting them together
The refined images are fed into the discriminator.

Refiner

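The refiner is mainly 4 resnet_blocks stacked together, followed by a 1*1 filter that produces one feature_map as the generated image. Every border_mode is same, which means the output of each step keeps the height and width of the original image (fully convolutional).
A resnet_block is two 3*3 convolutions whose result is added back to the block's input.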

We first convolve the input image with 64 filters of size 3*3, and pass the result (input_features) through the 4 resnet_blocks.

def refiner_network(input_image_tensor):
    """
    :param input_image_tensor: Input tensor that corresponds to a synthetic image.
    :return: Output tensor that corresponds to a refined synthetic image.
    """
    def resnet_block(input_features, nb_features=64, nb_kernel_rows=3, nb_kernel_cols=3):
        """
        A ResNet block with two `nb_kernel_rows` x `nb_kernel_cols` convolutional layers,
        each with `nb_features` feature maps.
        See Figure 6 in https://arxiv.org/pdf/1612.07828v1.pdf.
        :param input_features: Input tensor to ResNet block.
        :return: Output tensor from ResNet block.
        """
        y = layers.Convolution2D(nb_features, nb_kernel_rows, nb_kernel_cols, border_mode='same')(input_features)
        y = layers.Activation('relu')(y)
        y = layers.Convolution2D(nb_features, nb_kernel_rows, nb_kernel_cols, border_mode='same')(y)
        y = layers.merge([input_features, y], mode='sum')
        return layers.Activation('relu')(y)

    # an input image of size w × h is convolved with 3 × 3 filters that output 64 feature maps
    x = layers.Convolution2D(64, 3, 3, border_mode='same', activation='relu')(input_image_tensor)

    # the output is passed through 4 ResNet blocks
    for _ in range(4):
        x = resnet_block(x)

    # the output of the last ResNet block is passed to a 1 × 1 convolutional layer producing 1 feature map
    # corresponding to the refined synthetic image
    return layers.Convolution2D(1, 1, 1, border_mode='same', activation='tanh')(x)

Discriminator

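Note that subsample is the same thing as strides. subsample=(2,2) halves both the height and the width, and because there are two such layers, the final feature map has roughly 1/16 the area of the original. For example, if the image starts at 100*50, it is 50*25 after the first layer and 25*13 after the second.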
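
The sizes quoted above follow from the rule for 'same' padding, where the output length is the input length divided by the stride, rounded up. A small check, with a made-up helper name:

```python
import math

def same_conv_output(size, stride):
    """Spatial size after a convolution with 'same' padding and the given stride."""
    return math.ceil(size / stride)

width, height = 100, 50
for _ in range(2):  # the two stride-2 convolutions
    width = same_conv_output(width, 2)
    height = same_conv_output(height, 2)

num_patches = width * height
```

The remaining layers use stride 1 with 'same' padding, so under this rule the discriminator ends up scoring 25 * 13 = 325 local patches per image.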

At the end two feature_maps are produced: one scores whether the patch is real, the other whether it is refined.

def discriminator_network(input_image_tensor):
    """
    :param input_image_tensor: Input tensor corresponding to an image, either real or refined.
    :return: Output tensor that corresponds to the probability of whether an image is real or refined.
    """
    x = layers.Convolution2D(96, 3, 3, border_mode='same', subsample=(2, 2), activation='relu')(input_image_tensor)
    x = layers.Convolution2D(64, 3, 3, border_mode='same', subsample=(2, 2), activation='relu')(x)
    x = layers.MaxPooling2D(pool_size=(3, 3), border_mode='same', strides=(1, 1))(x)
    x = layers.Convolution2D(32, 3, 3, border_mode='same', subsample=(1, 1), activation='relu')(x)
    x = layers.Convolution2D(32, 1, 1, border_mode='same', subsample=(1, 1), activation='relu')(x)
    x = layers.Convolution2D(2, 1, 1, border_mode='same', subsample=(1, 1), activation='relu')(x)

    # here one feature map corresponds to `is_real` and the other to `is_refined`,
    # and the custom loss function is then `tf.nn.sparse_softmax_cross_entropy_with_logits`
    return layers.Reshape((-1, 2))(x)  # (batch_size, # of local patches, 2)

Putting them together

The refiner is attached to the discriminator. There are two losses here:

self_regularization_loss
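The paper puts it this way: "The self-regularization term minimizes the image difference between the synthetic and the refined images." It keeps the refined image from drifting too far from the original. The paper does not go into much detail on the formula, but the idea is to keep the distance between the generated pixel values and the original pixel values small. The author of the project uses: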

def self_regularization_loss(y_true, y_pred):
    delta = 0.0001  # FIXME: need to figure out an appropriate value for this
    return tf.multiply(delta, tf.reduce_sum(tf.abs(y_pred - y_true)))

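y_true: the input_image_tensor fed into the refiner
y_pred: the refiner's output
delta controls the weight of this loss; it is lambda in the paper.
The loss takes the absolute difference between each pixel of the refiner's input and output, sums these differences over the whole image (reduce_sum here also sums across the batch), and multiplies the result by delta.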
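
The same computation on two tiny 2*2 "images", in plain Python. The pixel values are made up; delta matches the value in the project code.

```python
def self_regularization_loss(synthetic, refined, delta=0.0001):
    """delta times the summed absolute pixel difference between two images."""
    total = sum(
        abs(r - s)
        for row_s, row_r in zip(synthetic, refined)
        for s, r in zip(row_s, row_r)
    )
    return delta * total

synthetic = [[0.0, 1.0], [1.0, 0.0]]
refined = [[0.1, 0.9], [1.0, 0.5]]
loss = self_regularization_loss(synthetic, refined)
```

The absolute differences add up to 0.7, so the loss is 0.00007; an unchanged image gives a loss of 0.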

local_adversarial_loss
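So that the refiner learns the features of real images rather than artifacts that merely fool the discriminator, we assume that a patch sampled from a refined image should have statistics similar to those of a patch from a real image. We therefore define the discriminator over all local patches instead of learning a global discriminator.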

def local_adversarial_loss(y_true, y_pred):
    # y_true and y_pred have shape (batch_size, # of local patches, 2), but really we just want to average over
    # the local patches and batch size so we can reshape to (batch_size * # of local patches, 2)
    y_true = tf.reshape(y_true, (-1, 2))
    y_pred = tf.reshape(y_pred, (-1, 2))
    loss = tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_pred)
    return tf.reduce_mean(loss)

Combined, this gives:

# Refiner
synthetic_image_tensor = layers.Input(shape=(HEIGHT, WIDTH, 1))  # synthetic image
refined_image_tensor = refiner_network(synthetic_image_tensor)
refiner_model = models.Model(input=synthetic_image_tensor, output=refined_image_tensor, name='refiner')

# Discriminator
refined_or_real_image_tensor = layers.Input(shape=(HEIGHT, WIDTH, 1))  # real (or refined) image
discriminator_output = discriminator_network(refined_or_real_image_tensor)
discriminator_model = models.Model(input=refined_or_real_image_tensor, output=discriminator_output,
                                   name='discriminator')

# Combined
refiner_model_output = refiner_model(synthetic_image_tensor)
combined_output = discriminator_model(refiner_model_output)
combined_model = models.Model(input=synthetic_image_tensor, output=[refiner_model_output, combined_output],
                              name='combined')

def self_regularization_loss(y_true, y_pred):
    delta = 0.0001  # FIXME: need to figure out an appropriate value for this
    return tf.multiply(delta, tf.reduce_sum(tf.abs(y_pred - y_true)))

# define custom local adversarial loss (softmax for each image section) for the discriminator
# the adversarial loss function is the sum of the cross-entropy losses over the local patches
def local_adversarial_loss(y_true, y_pred):
    # y_true and y_pred have shape (batch_size, # of local patches, 2), but really we just want to average over
    # the local patches and batch size so we can reshape to (batch_size * # of local patches, 2)
    y_true = tf.reshape(y_true, (-1, 2))
    y_pred = tf.reshape(y_pred, (-1, 2))
    loss = tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_pred)
    return tf.reduce_mean(loss)

# compile models
BATCH_SIZE = 512
sgd = optimizers.RMSprop()  # note: despite the variable name, this is RMSprop, not plain SGD

refiner_model.compile(optimizer=sgd, loss=self_regularization_loss)
discriminator_model.compile(optimizer=sgd, loss=local_adversarial_loss)
discriminator_model.trainable = False
combined_model.compile(optimizer=sgd, loss=[self_regularization_loss, local_adversarial_loss])

Pre-training

Pre-training is not strictly required for a GAN, but it helps the GAN converge faster. Here we pre-train both models.
Real samples are labeled [1,0], and synthetic images are labeled [0,1].

# the target labels for the cross-entropy loss layer are 0 for every yj (real) and 1 for every xi (refined)
# discriminator_model.output_shape = num of local patches
y_real = np.array([[[1.0, 0.0]] * discriminator_model.output_shape[1]] * BATCH_SIZE)
y_refined = np.array([[[0.0, 1.0]] * discriminator_model.output_shape[1]] * BATCH_SIZE)
assert y_real.shape == (BATCH_SIZE, discriminator_model.output_shape[1], 2)

The refiner is pre-trained with self_regularization_loss only, which means its input and its target output are the same image (similar to an auto-encoder).

LOG_INTERVAL = 10
MODEL_DIR = "./model/"
os.makedirs(MODEL_DIR, exist_ok=True)  # make sure the checkpoint folder exists

print('pre-training the refiner network...')
gen_loss = np.zeros(shape=len(refiner_model.metrics_names))

for i in range(100):
    synthetic_image_batch = get_image_batch(synth_generator())
    gen_loss = np.add(refiner_model.train_on_batch(synthetic_image_batch, synthetic_image_batch), gen_loss)

    # log every `log_interval` steps
    if not i % LOG_INTERVAL:
        print('Refiner model self regularization loss: {}.'.format(gen_loss / LOG_INTERVAL))
        gen_loss = np.zeros(shape=len(refiner_model.metrics_names))

refiner_model.save(os.path.join(MODEL_DIR, 'refiner_model_pre_trained.h5'))

The discriminator is trained alternately on a batch of real images and a batch of refined synthetic images.

from tqdm import tqdm

print('pre-training the discriminator network...')
disc_loss = np.zeros(shape=len(discriminator_model.metrics_names))

for _ in tqdm(range(100)):
    real_image_batch = get_image_batch(real_generator)
    disc_loss = np.add(discriminator_model.train_on_batch(real_image_batch, y_real), disc_loss)

    synthetic_image_batch = get_image_batch(synth_generator())
    refined_image_batch = refiner_model.predict_on_batch(synthetic_image_batch)
    disc_loss = np.add(discriminator_model.train_on_batch(refined_image_batch, y_refined), disc_loss)

discriminator_model.save(os.path.join(MODEL_DIR, 'discriminator_model_pre_trained.h5'))

# hard-coded for now
print('Discriminator model loss: {}.'.format(disc_loss / (100 * 2)))

Training

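There are two points here: 1) updating the discriminator with a history of refined images, and 2) the overall training flow.
1) Updating the discriminator with a history of refined images
One problem with adversarial training is that the discriminator only looks at the most recent refined images. This causes two issues: the adversarial training can diverge, and the refiner can reintroduce artifacts that the discriminator has long forgotten. The fix is to update the discriminator from a buffer of past refined images rather than from the current mini-batch alone. Concretely, in each round of discriminator training we sample b/2 images from the current batch and b/2 images from a buffer of size B, and use them together to update the discriminator's parameters. After the round, b/2 images in the buffer are replaced with b/2 newly generated images.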

The paper does not say how large B should be, so the author uses 100*batch_size as the buffer size.
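
The buffering scheme can be sketched as a small class. This is an illustration of the idea only, not the contents of image_history_buffer.py: the class and method names are hypothetical, images are stood in for by strings, and the sizes are tiny so the behavior is easy to follow.

```python
import random

class HistoryBuffer:
    """Keeps up to `max_size` past refined images for discriminator updates."""

    def __init__(self, max_size, batch_size):
        self.max_size = max_size
        self.half = batch_size // 2
        self.images = []

    def sample_half_batch(self):
        """Return b/2 stored images, or an empty list while the buffer is too small."""
        if len(self.images) < self.half:
            return []
        return random.sample(self.images, self.half)

    def add_half_batch(self, refined_batch):
        """Store b/2 new images, overwriting random old ones once the buffer is full."""
        for image in refined_batch[:self.half]:
            if len(self.images) < self.max_size:
                self.images.append(image)
            else:
                self.images[random.randrange(self.max_size)] = image

buffer = HistoryBuffer(max_size=8, batch_size=4)
first = buffer.sample_half_batch()  # nothing stored yet
for step in range(10):
    batch = ["refined-%d-%d" % (step, k) for k in range(4)]
    buffer.add_half_batch(batch)
mixed = buffer.sample_half_batch() + batch[:2]  # b/2 from history + b/2 current
```

The discriminator would then be trained on `mixed` instead of on the current batch alone.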

2) Training flow
x_i are the synthetic images
y_j are the real images
T is the number of steps
K_d is the number of discriminator updates per step
K_g is the number of generator updates per step (that is, refiner updates)

Note that in every discriminator update, the mini-batch of synthetic images is the one produced by the sampling scheme from 1).

from image_history_buffer import ImageHistoryBuffer

k_d = 1  # number of discriminator updates per step
k_g = 2  # number of generative network updates per step
nb_steps = 1000

# TODO: what is an appropriate size for the image history buffer?
image_history_buffer = ImageHistoryBuffer((0, HEIGHT, WIDTH, 1), BATCH_SIZE * 100, BATCH_SIZE)

combined_loss = np.zeros(shape=len(combined_model.metrics_names))
disc_loss_real = np.zeros(shape=len(discriminator_model.metrics_names))
disc_loss_refined = np.zeros(shape=len(discriminator_model.metrics_names))

# see Algorithm 1 in https://arxiv.org/pdf/1612.07828v1.pdf
for i in range(nb_steps):
    print('Step: {} of {}.'.format(i, nb_steps))

    # train the refiner
    for _ in range(k_g * 2):
        # sample a mini-batch of synthetic images
        synthetic_image_batch = get_image_batch(synth_generator())

        # update θ by taking an SGD step on mini-batch loss LR(θ)
        # the adversarial part of the combined model is trained against y_real, which
        # forces the refiner to change the images so that they look like real ones
        combined_loss = np.add(combined_model.train_on_batch(synthetic_image_batch,
                                                             [synthetic_image_batch, y_real]), combined_loss)

    for _ in range(k_d):
        # sample a mini-batch of synthetic and real images
        synthetic_image_batch = get_image_batch(synth_generator())
        real_image_batch = get_image_batch(real_generator)

        # refine the synthetic images w/ the current refiner
        refined_image_batch = refiner_model.predict_on_batch(synthetic_image_batch)

        # use a history of refined images
        half_batch_from_image_history = image_history_buffer.get_from_image_history_buffer()
        image_history_buffer.add_to_image_history_buffer(refined_image_batch)

        if len(half_batch_from_image_history):
            # the original used an undefined lowercase `batch_size` here
            refined_image_batch[:BATCH_SIZE // 2] = half_batch_from_image_history

        # update φ by taking an SGD step on mini-batch loss LD(φ)
        disc_loss_real = np.add(discriminator_model.train_on_batch(real_image_batch, y_real), disc_loss_real)
        disc_loss_refined = np.add(discriminator_model.train_on_batch(refined_image_batch, y_refined),
                                   disc_loss_refined)

    if not i % LOG_INTERVAL:
        # log loss summary
        print('Refiner model loss: {}.'.format(combined_loss / (LOG_INTERVAL * k_g * 2)))
        print('Discriminator model loss real: {}.'.format(disc_loss_real / (LOG_INTERVAL * k_d * 2)))
        print('Discriminator model loss refined: {}.'.format(disc_loss_refined / (LOG_INTERVAL * k_d * 2)))

        combined_loss = np.zeros(shape=len(combined_model.metrics_names))
        disc_loss_real = np.zeros(shape=len(discriminator_model.metrics_names))
        disc_loss_refined = np.zeros(shape=len(discriminator_model.metrics_names))

        # save model checkpoints
        model_checkpoint_base_name = os.path.join(MODEL_DIR, '{}_model_step_{}.h5')
        refiner_model.save(model_checkpoint_base_name.format('refiner', i))
        discriminator_model.save(model_checkpoint_base_name.format('discriminator', i))

SimGAN results

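We take a batch from the synthetic image generator, run the trained refiner on it, and display one of the images (the images I got have scattered dots and differ from the author's, but they look more like the real images; to be added):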

synthetic_image_batch = get_image_batch(synth_generator())
arr = refiner_model.predict_on_batch(synthetic_image_batch)
plt.imshow(arr[200, :, :, 0])
plt.show()
plt.imshow(get_image_batch(real_generator)[2, :, :, 0])
plt.show()

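Here the author considers that the letter edges in the generated images have become blurry and noisy, no longer as smooth. (To me, the images before refinement already look a lot like the real images; the only difference seems to be the faint dots scattered across them. Readers can try removing the noise when generating the images and then refining them, to see whether the refiner produces noisy letter edges. My refined images simply have scattered dots in them; images to be added.)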

Applying it to actual captcha recognition

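Now that we have a refiner that produces images closely resembling the ones to predict, we can build the captcha classification model. The author uses a multi-output model: given one image, it has a fixed number of outputs (4 here, because 4 characters are to be predicted).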

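We first build a generator on top of the earlier synthetic image generator (gen_one), and then run refiner_model on its images to obtain the images this generator yields. Because the classification model is trained with categorical_crossentropy, the target characters have to be converted to one-hot form.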
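
A minimal sketch of the one-hot step on a single 4-character string, using plain lists. The helper names and the sample string "a7zk" are made up; the alphabet is the same `ascii_lowercase + digits` used by the generator.

```python
from string import ascii_lowercase, digits

alphanumeric = ascii_lowercase + digits  # 26 letters followed by 10 digits

def one_hot(text):
    """One vector of length len(alphanumeric) per character, with a single 1."""
    vectors = []
    for ch in text:
        vec = [0] * len(alphanumeric)
        vec[alphanumeric.index(ch)] = 1
        vectors.append(vec)
    return vectors

def decode(vectors):
    """Invert one_hot by taking the position of the largest entry in each vector."""
    return ''.join(alphanumeric[vec.index(max(vec))] for vec in vectors)

encoded = one_hot("a7zk")
```

Decoding by "position of the largest entry" is the same idea as the argmax used on the model's softmax outputs later on.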

n_class = len(alphanumeric)

def mnist_generator(batch_size=128):
    X = np.zeros((batch_size, HEIGHT, WIDTH, 1), dtype=np.uint8)
    y = [np.zeros((batch_size, n_class), dtype=np.uint8) for _ in range(4)]  # 4 chars
    while True:
        for i in range(batch_size):
            im, random_str = gen_one()
            X[i] = im
            for j, ch in enumerate(random_str):
                y[j][i, :] = 0
                y[j][i, alphanumeric.find(ch)] = 1  # one-hot: set the index of the current character to 1
        yield refiner_model.predict(np.array(X)), y

mg = next(mnist_generator())  # .next() only exists in Python 2

Building the model

from keras.layers import *

input_tensor = Input((HEIGHT, WIDTH, 1))
x = input_tensor
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
# 4 conv + max-pooling blocks
for _ in range(4):
    x = Conv2D(128, (3, 3), activation='relu')(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
x = Dropout(0.25)(x)
x = Flatten()(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.5)(x)
x = [Dense(n_class, activation='softmax', name='c%d' % (i+1))(x) for i in range(4)]  # 4 outputs

model = models.Model(inputs=input_tensor, outputs=x)
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

from keras.callbacks import History
history = History()
# A History callback is now attached to every model during training and returned by fit;
# it records events such as the loss.
model.fit_generator(mnist_generator(), steps_per_epoch=1000, epochs=20, callbacks=[history])

Testing the model

First, a prediction on a synthetic image:

def decode(y):
    y = np.argmax(np.array(y), axis=2)[:, 0]
    return ''.join([alphanumeric[x] for x in y])

X, y = next(mnist_generator(1))
y_pred = model.predict(X)
plt.title('real: %s\npred:%s' % (decode(y), decode(y_pred)))
plt.imshow(X[0, :, :, 0], cmap='gray')
plt.axis('off')

Now a prediction on one of the images we actually need to predict:

X = next(real_generator)
X = refiner_model.predict(X)
# Not sure why the author runs the refiner here; this step looks like it could be skipped.
# As it turns out, it cannot; see the analysis below.
y_pred = model.predict(X)
plt.title('pred:%s' % (decode(y_pred)))
plt.imshow(X[0, :, :, 0], cmap='gray')
plt.axis('off')

Follow-up notes

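To do: replace the images in the prediction section with images generated in an actual run.
During training, the discriminator's loss drops very quickly, and later on it is hard to get both the refined loss and the real loss to rise again. With some luck it occasionally works. I saw two outcomes during training:
Case 1:
Before refinement: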

After refinement:

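As you can see, the refined images also have scattered dots. If images like these are used for training, real images can later be fed straight into the classifier for prediction.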

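Case 2 (the one shown in the author's notebook):
This is the situation described earlier.
It looks like the following, where the refiner seems to have changed almost nothing: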

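This does not look much like the real images at all!!!
The surprising part is that, when predicting real images, the author actually runs the refiner on the real images!
The real image originally looks like this: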

After the refiner it looks like this:

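Speechless! It removed more than half of the noise dots........ This reverse use of the refiner caught me completely off guard. He then feeds the refined real images into the classifier for prediction..... and the results are actually quite good.....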

At this point I am thoroughly confused...

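Still, getting a model to learn the way the human brain performs recognition is an important problem... If you train directly on the synthetic images and then predict real images, the accuracy is very low (I tried it). In other words, the model has still not learned the notion of character outlines, and we have no way of teaching it how to "recognize" objects or which features it should learn. A recently published paper (linked in the original post) is worth a look (I have not read it yet...).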

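To be continued

Evaluate the accuracy

Modify the captcha generator, swapping in any other generator

Apply the model to captchas with more complex backgrounds and evaluate the accuracy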
