Self-supervised Text Erasing with Controllable Image Synthesis
Related reading: Scene Text Erasing survey and personal notes (Zhihu): https://zhuanlan.zhihu.com/p/441303427
From the text-erasing perspective, the typical inputs are the original image, a gt image with the text removed, and a mask image marking the text locations. A mask of the original text positions can serve as extra supervision that helps the network localize the text; this paper also makes use of that auxiliary information. There are two lines of approach. One is an end-to-end network that treats text localization and text erasing as a single task. The other splits the problem into two steps, treating localization and erasing as upstream and downstream tasks: first localize the text, then feed the obtained text-position information into an image-inpainting model together with the image as prior knowledge. This idea is very common; OCR's detect-then-recognize pipeline and watermark removal follow the same pattern. SynthText and SCUT-EnsText provide both gt and mask; the EnsNet dataset has only original images and gt, with no mask. Fundamentally, text erasing is still a generation task. Nowadays the dominant route is image inpainting (alongside things like super-resolution); it does not have to be done with a generative GAN, and diffusion works too. Early approaches modeled the problem well with an AE or encoder-decoder architecture and an L1 loss (see the sketch below); as GANs became more controllable, GAN-based approaches multiplied.
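A minimal sketch of that early encoder-decoder + L1 formulation, assuming paired (input, gt) images; all module and variable names here are illustrative, not from any particular paper's code:

import torch
import torch.nn as nn

# Minimal encoder-decoder eraser trained with an L1 reconstruction loss.
class TinyEraser(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinyEraser()
l1 = nn.L1Loss()
img = torch.rand(2, 3, 256, 256)   # image with text
gt = torch.rand(2, 3, 256, 256)    # same image with the text removed
loss = l1(model(img), gt)          # pixel-wise reconstruction objective
loss.backward()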
1. Introduction
Self-supervision is widely used in text erasing, image inpainting, and watermark removal. Taking synthesis as an example: given an original image, a watermark (or other content) is added on top of it; the corrupted image becomes the input and the original image the gt, so removal is learned through data synthesis. This practice is very common, but the authors argue it is not good enough and offer the comparison below:
The upper part of the figure above is essentially the SynthText approach: styles are sampled from a uniform distribution, typically with something like random.randint. The STE proposed in this paper instead samples from a style distribution that better matches the original data, which works better. Essentially, the generated data and the original data then share the same distribution, and a model fitting that distribution learns more fundamental knowledge.
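To make the contrast concrete, here is a hedged sketch of SynthText-style uniform sampling; the style units and their value ranges are made up for illustration:

import random

# SynthText-style uniform sampling: every style unit is drawn independently
# and uniformly, regardless of what the target images actually look like.
COLORS = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]
ANGLES = list(range(-15, 16))

def sample_style_uniform():
    return {
        "font_size": random.randint(12, 71),  # the random.randint pattern
        "color": random.choice(COLORS),
        "angle": random.choice(ANGLES),
    }

# STE instead *learns* a per-image distribution over the same units with a
# policy network, so the synthesized text matches the target-domain style.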
The paper proposes the self-supervised Text Erasing (STE) framework, consisting of two parts: image synthesis and text erasing. For image synthesis, besides the earlier generation methods, MSER is used to extract regions that approximate real text. To account for the style gap between synthesized and real text, a policy network is built whose reward is computed from the realism and difficulty of the selected style. The erasing module is a coarse-to-fine generative model that erases text and fills the missing pixels with appropriate texture, and a triplet loss for contrastive learning is proposed.
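A hedged sketch of the MSER step with OpenCV, extracting stable regions as approximate real-text areas (the file path is illustrative; this is my approximation of the replication idea, not the paper's exact code):

import cv2
import numpy as np

img = cv2.imread("scene.jpg")  # illustrative input path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# MSER finds maximally stable extremal regions, which often coincide
# with text strokes in natural scenes.
mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(gray)

mask = np.zeros(gray.shape, dtype=np.uint8)
for pts in regions:
    cv2.fillPoly(mask, [pts.reshape(-1, 1, 2)], 255)  # candidate text pixels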
2. self-supervised text erasing
The key to this paper is understanding this figure. STE is a self-supervised framework: its input is a single image, with no gt. The overall architecture is a GAN. Given input I, the generator comprises the synthesis function F and a Text Erasing Module. F produces a synthesized image Isyn along with a text mask, where the text mask marks the text that F added onto I. Isyn is fed into the Text Erasing Module, which outputs Ipred; Ipred should look like I, and in fact Ipred serves as fake_img, which together with real_img forms the discriminator's input. The Text Erasing Module is EraseNet, a pre-existing text-erasing model; given original images paired with gt images (text removed), it could be trained directly. STE's trick is to wrap it inside the GAN framework as the generator: input a synthesized image, output an image without text, via a coarse model followed by a refine model. But what signal supervises this generative architecture? The refine model's output is multiplied with the text mask that F added, yielding fake_img, which still contains the original text. The refine model also outputs a predicted text mask, which later plays a role in the discriminator. In other words, the whole GAN supervises whether the reconstructed original image looks real: F added text onto the image, the text removal happens inside the generator (which in fact removes all text), and when constructing the loss the original text is added back. EraseNet does the actual erasing, and at real-world inference time only this module is used.
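A minimal sketch of this mask-based composition, matching the comp_B line that appears in the code trace further below (here mask == 1 on non-synthesized pixels and 0 on the synthesized text):

import torch

# Generator output inside the synthesized-text region (so the added text is
# erased), original input pixels everywhere else (so the original text
# survives into fake_img).
def compose(fake_B: torch.Tensor, real_A: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    return fake_B * (1 - mask) + real_A * mask

fake_B = torch.rand(2, 3, 768, 512)  # refine-model output, all text erased
real_A = torch.rand(2, 3, 768, 512)  # network input (synthesized image)
mask = torch.ones(2, 3, 768, 512)    # 0 where text was synthesized
fake_img = compose(fake_B, real_A, mask)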
2.1 Overview
In STE, I and Isyn form a data pair, and the synthesized Isyn is used to train the generative model G. To align the synthesized text with the original text, a policy network A selects a suitable style s for F, optimized via feedback comprising a text difficulty reward Rdiff and a style realism reward Rreal. Msyn denotes the binary mask of the synthesized region; Ipred, on which the loss is computed, is composited from Ir and Isyn conditioned on Msyn.
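Written out as an equation (an assumption consistent with the description above, with \odot the element-wise product and M_syn = 1 on the synthesized text region):

I_{pred} = I_r \odot M_{syn} + I_{syn} \odot (1 - M_{syn})

Since Isyn equals I outside the synthesized region, Ipred is directly comparable with the original image I everywhere; this is the same composition as the comp_B line in the code trace, with mask there being the complement of Msyn.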
2.2 style-aware synthesis function
customization mechanism: the text style is decomposed into several independent units, and selecting an operation within each unit determines the style parameters (see the sketch after these definitions).
replication mechanism: it aims to synthesize samples by replicating the original text found in the target distribution.
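A hedged sketch of the customization mechanism; the units and their options below are illustrative placeholders, and the uniform choice stands in for the policy network's learned per-image distribution:

import random

SEARCH_SPACE = {
    "font": ["Arial", "Courier", "Times"],
    "size": [16, 24, 32, 48],
    "color": ["black", "white", "red"],
    "rotation": [-10, 0, 10],
    "opacity": [0.6, 0.8, 1.0],
}

def sample_style():
    # One operation per unit fully specifies a text style.
    return {unit: random.choice(ops) for unit, ops in SEARCH_SPACE.items()}

style = sample_style()  # e.g. {'font': 'Arial', 'size': 32, ...}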
2.3 controllable synthesis module
2.3.1 search space
The synthesis function F offers a wide variety of styles, all of which together constitute the policy network's search space.
2.3.2 style optimization via REINFORCE
The goal of the policy network is to find a suitable synthesis style for each image within the large search space.
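A hedged REINFORCE sketch: the policy scores the operations in the search space, a style is sampled, and the reward-weighted negative log-probability is minimized. Shapes, names, and the single-head simplification are all assumptions:

import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, feat_dim=128, n_ops=15):
        super().__init__()
        self.head = nn.Linear(feat_dim, n_ops)

    def forward(self, img_feat):
        # Per-image categorical distribution over style operations.
        return torch.distributions.Categorical(logits=self.head(img_feat))

policy = PolicyNet()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

img_feat = torch.rand(4, 128)        # per-image features
dist = policy(img_feat)
action = dist.sample()               # chosen style operation per image
reward = torch.rand(4)               # R_real + R_diff, computed elsewhere
loss = -(dist.log_prob(action) * reward).mean()  # policy gradient
opt.zero_grad()
loss.backward()
opt.step()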
2.3.3 reward setting
style realistic reward: to capture the target distribution, a text discriminator Dtext is implemented to guide data synthesis. Concretely, Dtext is built to predict the text regions in the synthesized image Isyn, taking the generator G's feature map g(Isyn) as input.
text difficulty reward: per the realism-and-difficulty reward design mentioned in the introduction, this term reflects how hard the synthesized text is to erase, steering the policy toward more challenging styles (a combined-reward sketch follows).
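A hedged sketch of combining the two rewards; the linear weighting and the exact forms of each term are my assumptions, not the paper's formulas:

import torch

def compute_reward(d_text_score: torch.Tensor,
                   erase_loss: torch.Tensor,
                   alpha: float = 1.0, beta: float = 1.0) -> torch.Tensor:
    r_real = d_text_score         # higher if the synthesized text looks real
    r_diff = erase_loss.detach()  # higher if the sample is hard to erase
    return alpha * r_real + beta * r_diff

reward = compute_reward(torch.rand(4), torch.rand(4))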
2.4 erasing module with triplet erasure loss
The loss is computed only over the synthesized text regions.
Loss function:
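A hedged sketch of a triplet erasure loss in the spirit of this section: the erased output (anchor) is pulled toward the clean original (positive) and pushed away from the synthesized text image (negative). Using feature-space L1 distances and a fixed margin is an assumption here:

import torch
import torch.nn.functional as F

def triplet_erasure_loss(anchor_feat, pos_feat, neg_feat, margin=0.5):
    d_pos = F.l1_loss(anchor_feat, pos_feat, reduction="none").mean(dim=(1, 2, 3))
    d_neg = F.l1_loss(anchor_feat, neg_feat, reduction="none").mean(dim=(1, 2, 3))
    # Hinge: anchor must be closer to the positive than to the negative.
    return F.relu(d_pos - d_neg + margin).mean()

a = torch.rand(2, 64, 96, 64)  # features of the erased output
p = torch.rand(2, 64, 96, 64)  # features of the original image I
n = torch.rand(2, 64, 96, 64)  # features of the synthesized image I_syn
loss = triplet_erasure_loss(a, p, n)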
Base version: no contrastive learning, style control, or reinforcement learning modules; the whole thing is a GAN. The discriminator's real/fake samples are input images, i.e., images carrying the original text, except that the text_mask information is fused in when running the discriminator.
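A hedged sketch of how the mask can be fused into the discriminator, mirroring the D_real = discriminator(gt, masks) call in the trace below: a global branch sees the full image, a local branch sees the masked text region, and the two feature maps are concatenated before a fusion conv. This is a two-layer approximation of Discriminator_STE, not its exact code, and it assumes mask == 1 on non-text pixels:

import torch
import torch.nn as nn

class MaskFusedDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
                nn.Conv2d(64, 256, 4, 2, 1), nn.LeakyReLU(0.2, True),
            )
        self.global_dis, self.local_dis = branch(), branch()
        self.fusion = nn.Conv2d(512, 1, 4)

    def forward(self, img, mask):
        g = self.global_dis(img)
        l = self.local_dis(img * (1 - mask))  # focus on the text region
        return self.fusion(torch.cat([g, l], dim=1)).flatten(1)

d = MaskFusedDiscriminator()
score = d(torch.rand(2, 3, 768, 512), torch.ones(2, 3, 768, 512))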
The full training call trace (with tensor shapes):

data pipeline:
dataset, dataset_size = init_dataset -> CreateDataLoader -> CustomDatasetDataLoader.initialize() -> data_loader = CreateDataset -> dataset = ItemsDataset() -> dataset.initialize() -> imageFiles/infos -> dataloader = torch.utils.data.DataLoader(dataset) -> dataset = data_loader.load() (i.e., the dataloader) -> train/valid/test

model setup:
create_model() (pix2pix/disc/gateconv/erase) -> model = EraseModel -> model.initialize() -> BaseModel.initialize -> set_param -> self.adaptive_feature_norm = AdaptiveFeatureNorm(0.1) -> netG = STRnet2(3) -> netD = Discriminator_STE(3) -> LossWithGAN_STE(vggfeatureextractor, netD) -> dis = torch.nn.L1Loss -> self.optimizer_G = torch.optim.Adam(netG.parameters()) -> netG.train() (switch to train mode) -> model.setup -> Visualizer -> model.update_learning_rate()

data synthesis in __getitem__:
ItemsDataset.__getitem__ -> load_item() -> gt/info -> gen_config = random_gen_config -> space_config is a range of config options -> img = generate_img_with_config(gt, info, gen_config) -> generate_img(image, info, config), which is exactly the synthesis function F -> raw_mask = gen_raw_mask (mask of the original text) -> mask = get_mask() -> transformers_param = get_params() (data-preprocessing parameters) -> trans = get_transform -> img = trans(img): 512,768 / mask = trans(mask) / gt = trans(gt) / raw_mask = trans(raw_mask) -> img/gt/mask/raw_mask = input_transform(img/gt/mask/raw_mask): 3,768,512

forward and composition:
data = next(dataset_iter): gt/img/mask/raw_mask: 2,3,768,512 -> model.set_input(data) -> model.optimize_parameters() -> STRnet2.forward(img: 2,3,768,512) -> x_o0: 2,512,24,16; x_o1: 2,64,192,128; x_o2: 2,32,384,256; x_o3: 2,3,768,512; fake_B: 2,3,768,512; gen_mask: 2,3,768,512; x_mask: 2,256,48,32 -> comp_B = fake_B*(1-mask)+real_A*mask: 2,3,768,512 -> mask_sigmoid -> comp_G = self.fake_B*(1-mask)+real_A*mask -> comp_all = fake_B*(1-mask)+real_A*mask*raw_mask+fake_B*(1-raw_mask)

loss and optimization:
lossWithGAN_STE.forward(real_A: the input image, mask/mask_gt: the mask added during synthesis, fake_B/output: the generated fake image with all text removed, gen_mask/mm: the predicted mask of the added text, gt: identical to the input image, raw_mask: the mask of the original text) -> D_real = discriminator(gt: 2,3,768,512, masks: 2,3,768,512) -> Discriminator_STE.forward() -> concat_feat: 2,512,12,8 -> D_real: 2,45 -> D_fake = discriminator(output, mask) -> gan_mode: vanilla -> D_loss = D_real + D_fake -> D_optimizer.zero_grad() -> D_loss.backward() -> D_optimizer.step() -> G_fake = discriminator(output, mask) -> output_comp = mask*input + (1-mask)*output -> holeLoss = 10*l1((1-mask)*output, (1-mask)*gt) -> mask_loss = dice_loss(mm, 1-mask_gt*raw_mask, mask_sigmoid) -> validAreaLoss -> shapes: mask: 2,3,768,512; masks_a: 2,3,192,128; masks_b: 2,3,384,256; gt: 2,3,768,512; imgs1: 2,3,192,128; imgs2: 2,3,384,256; raw_mask: 2,3,768,512; raw_masks_a: 2,3,192,128; raw_masks_b: 2,3,384,192 -> msr_loss -> extractor: VGG16FeatureExtractor -> feat_output_comp = extractor(output_comp) [2,64,384,256; 2,128,192,128; 2,256,96,64] -> feat_output = extractor(output) -> feat_gt = extractor(gt) -> prcLoss = 0.01*l1(feat_output, feat_gt) -> styleLoss = 120*l1(gram_matrix(feat_output), gram_matrix(feat_gt)) -> GLoss = msrloss + holeLoss + validAreaLoss + prcLoss + styleLoss + G_fake + 1*mask_loss -> optimizer_G.zero_grad() -> G_loss.backward() -> optimizer_G.step()
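The trace uses gram_matrix for the style loss; a minimal sketch of that computation, with the 0.01 and 120 weights taken from the trace and random tensors standing in for the VGG16FeatureExtractor outputs:

import torch

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    b, c, h, w = feat.size()
    f = feat.view(b, c, h * w)
    # Channel-by-channel correlations, normalized by feature size.
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

l1 = torch.nn.L1Loss()
feat_output = torch.rand(2, 64, 384, 256)  # would come from the VGG extractor
feat_gt = torch.rand(2, 64, 384, 256)

prcLoss = 0.01 * l1(feat_output, feat_gt)                             # perceptual
styleLoss = 120 * l1(gram_matrix(feat_output), gram_matrix(feat_gt))  # style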
The generator, EraseModel -> STRnet2, as printed:
STRnet2(
  (conv1): ConvWithActivation((conv2d): Conv2d(3, 32, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (conva): ConvWithActivation((conv2d): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (convb): ConvWithActivation((conv2d): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (res1): Residual((conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (batch_norm2d): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
  (res2): Residual((conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (batch_norm2d): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
  (res3): Residual((conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (conv2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (conv3): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2)) (batch_norm2d): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
  (res4): Residual((conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (batch_norm2d): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
  (res5): Residual((conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (conv2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (conv3): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2)) (batch_norm2d): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
  (res6): Residual((conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (batch_norm2d): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
  (res7): Residual((conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (conv2): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (conv3): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2)) (batch_norm2d): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
  (res8): Residual((conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (batch_norm2d): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
  (conv2): ConvWithActivation((conv2d): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (deconv1): DeConvWithActivation((conv2d): ConvTranspose2d(512, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (deconv2): DeConvWithActivation((conv2d): ConvTranspose2d(512, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (deconv3): DeConvWithActivation((conv2d): ConvTranspose2d(256, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (deconv4): DeConvWithActivation((conv2d): ConvTranspose2d(128, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (deconv5): DeConvWithActivation((conv2d): ConvTranspose2d(64, 3, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (lateral_connection1): Sequential((0): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1)) (1): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1)))
  (lateral_connection2): Sequential((0): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1)) (1): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1)))
  (lateral_connection3): Sequential((0): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1)) (1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1)))
  (lateral_connection4): Sequential((0): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1)) (1): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): Conv2d(64, 32, kernel_size=(1, 1), stride=(1, 1)))
  (conv_o1): Conv2d(64, 3, kernel_size=(1, 1), stride=(1, 1))
  (conv_o2): Conv2d(32, 3, kernel_size=(1, 1), stride=(1, 1))
  (mask_deconv_a): DeConvWithActivation((conv2d): ConvTranspose2d(512, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (mask_conv_a): ConvWithActivation((conv2d): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (mask_deconv_b): DeConvWithActivation((conv2d): ConvTranspose2d(256, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (mask_conv_b): ConvWithActivation((conv2d): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (mask_deconv_c): DeConvWithActivation((conv2d): ConvTranspose2d(128, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (mask_conv_c): ConvWithActivation((conv2d): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (mask_deconv_d): DeConvWithActivation((conv2d): ConvTranspose2d(64, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (mask_conv_d): Conv2d(32, 3, kernel_size=(1, 1), stride=(1, 1))
  (coarse_conva): ConvWithActivation((conv2d): Conv2d(3, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (coarse_convb): ConvWithActivation((conv2d): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (coarse_convc): ConvWithActivation((conv2d): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (coarse_convd): ConvWithActivation((conv2d): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (coarse_conve): ConvWithActivation((conv2d): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (coarse_convf): ConvWithActivation((conv2d): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (astrous_net): Sequential((0): ConvWithActivation((conv2d): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2)) (activation): LeakyReLU(negative_slope=0.2, inplace=True)) (1): ConvWithActivation((conv2d): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(4, 4), dilation=(4, 4)) (activation): LeakyReLU(negative_slope=0.2, inplace=True)) (2): ConvWithActivation((conv2d): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(8, 8), dilation=(8, 8)) (activation): LeakyReLU(negative_slope=0.2, inplace=True)) (3): ConvWithActivation((conv2d): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(16, 16), dilation=(16, 16)) (activation): LeakyReLU(negative_slope=0.2, inplace=True)))
  (coarse_convk): ConvWithActivation((conv2d): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (coarse_convl): ConvWithActivation((conv2d): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (coarse_deconva): DeConvWithActivation((conv2d): ConvTranspose2d(384, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (coarse_convm): ConvWithActivation((conv2d): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (coarse_deconvb): DeConvWithActivation((conv2d): ConvTranspose2d(192, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  (coarse_convn): Sequential((0): ConvWithActivation((conv2d): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True)) (1): ConvWithActivation((conv2d): Conv2d(16, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))))
  (c1): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))
  (c2): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1))
)

The discriminator, EraseModel -> Discriminator_STE, as printed:
Discriminator_STE(
  (globalDis): Sequential(
    (0): ConvWithActivation((conv2d): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
    (1): ConvWithActivation((conv2d): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
    (2): ConvWithActivation((conv2d): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
    (3): ConvWithActivation((conv2d): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
    (4): ConvWithActivation((conv2d): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
    (5): ConvWithActivation((conv2d): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  )
  (localDis): Sequential(
    (0): ConvWithActivation((conv2d): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
    (1): ConvWithActivation((conv2d): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
    (2): ConvWithActivation((conv2d): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
    (3): ConvWithActivation((conv2d): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
    (4): ConvWithActivation((conv2d): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
    (5): ConvWithActivation((conv2d): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (activation): LeakyReLU(negative_slope=0.2, inplace=True))
  )
  (fusion): Sequential((0): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1)))
)