Instability in Training Text GANs
Introduction
In text generation, maximum likelihood estimation (MLE) is conventionally used to train a model to generate text one token at a time. Each generated token is compared against the ground-truth data, and whenever a token differs from the actual one, that information is used to update the model. However, such training can cause the generation to be generic or repetitive.
Generative Adversarial Networks (GANs) tackle this problem by introducing two models: a generator and a discriminator. The goal of the discriminator is to determine whether a sentence x is real or fake (fake meaning generated by the model), whereas the generator attempts to produce a sentence that can fool the discriminator. These two models compete against each other, which improves both networks until the generator can produce human-like sentences.
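To make the setup concrete, here is a minimal PyTorch-style sketch of one adversarial step (the names generator, discriminator, real_batch, and noise are placeholders, not from the original article):

import torch
import torch.nn.functional as F

fake_batch = generator(noise)                    # generated sentences
d_real = discriminator(real_batch)               # probability that real data is real
d_fake = discriminator(fake_batch.detach())      # probability that fake data is real

# discriminator: push real towards 1 and fake towards 0
d_loss = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) \
       + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))

# generator: try to make the discriminator output 1 on fake data
g_loss = F.binary_cross_entropy(discriminator(fake_batch), torch.ones_like(d_fake))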
Although we may see some promising results in the computer vision and text generation communities, getting hands-on with this type of modeling is difficult.
Problems with GANs
Mode Collapse (Lack of Diversity). This is a common problem in GAN training. Mode collapse occurs when the model ignores the input random noise and keeps generating the same sentence regardless of the input. In this sense, the model is only trying to fool the discriminator, and finding a single point is sufficient to do so.
Unstable Training. The most important problem is to ensure that the generator and the discriminator stay on par with each other. If either one outperforms the other, the whole training becomes unstable, and no useful information is learned. For example, when the generator's loss is steadily decreasing, that means the generator has started to find a way to fool the discriminator even though its generations are still immature. On the other hand, when the discriminator is overpowered, there is no new information for the generator to learn: every generation is evaluated as fake, so the generator has to rely on randomly changing words in search of a sentence that might fool the discriminator.
Intuition is NOT Enough. Sometimes your intended modeling is correct, but it may not work the way you want it to; it may require more than that. Frequently, you need to do hyperparameter tuning by tweaking the learning rate, trying different loss functions, using batch norm, or trying different activation functions.
Lots of Training Time. Some works report training for up to 400 epochs. That is tremendous compared with Seq2Seq, which might take only 50 epochs or so to produce well-structured generations. The reason it is so slow is the exploration required during generation. G does not receive any explicit signal about which token is bad; rather, it receives one signal for the whole generation. To be able to produce a natural sentence, G needs to explore various combinations of words to get there. How often do you think G can accidentally produce <eos> out of nowhere? If we use MLE, the signal is very clear: there should be an <eos>, with <pad> tokens right after it.
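To see why the feedback is so sparse, compare the two training signals in this rough sketch (model, generator, discriminator, and the tensors are hypothetical placeholders): MLE provides an explicit target at every position, including exactly where <eos> and <pad> belong, while the adversarial setup returns only one score for the whole sentence.

import torch.nn.functional as F

# MLE: a target for every token position
logits = model(input_tokens)                        # (batch, seq_len, vocab)
mle_loss = F.cross_entropy(logits.transpose(1, 2),  # (batch, vocab, seq_len)
                           target_tokens)           # (batch, seq_len)

# Adversarial: a single score for the whole generated sentence
fake_sentence = generator.sample(noise)
sentence_score = discriminator(fake_sentence)       # one score per sentence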
Potential Solutions
Many approaches have been attempted to handle this type of training.
Use the Adam Optimizer. Some suggest using Adam for the generator and SGD for the discriminator. Most importantly, some papers tweak the betas for Adam, e.g., betas=(0.5, 0.999).
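A minimal sketch of that setup in PyTorch (assuming generator and discriminator are existing nn.Module instances; the learning rates are illustrative, not values from the article):

import torch.optim as optim

# Adam with a lowered beta1 for the generator, plain SGD for the discriminator
g_optimizer = optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
d_optimizer = optim.SGD(discriminator.parameters(), lr=1e-3)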
Wasserstein GAN. Some works report that using WGAN can stabilize training greatly. From our experiments, however, WGAN could not even reach the quality of a regular GAN. Perhaps we are missing something. (See? It's quite difficult.)
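For reference, a bare-bones sketch of the WGAN objective with weight clipping (critic, real_batch, fake_batch, noise, and the clip value 0.01 are placeholders or illustrative choices):

# critic: maximize score(real) - score(fake); no sigmoid, no log
critic_loss = -(critic(real_batch).mean() - critic(fake_batch.detach()).mean())
critic_loss.backward()
critic_optimizer.step()

# clip critic weights to roughly enforce the Lipschitz constraint
for p in critic.parameters():
    p.data.clamp_(-0.01, 0.01)

# generator: maximize the critic's score on fake data
g_loss = -critic(generator(noise)).mean()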
GAN Variations. Some suggest trying KL-GAN or VAE-GAN. These can make the models easier to train.
Input Noise to the Discriminator. To keep the discriminator's learning on par with the generator, which generally has a harder time than the discriminator, we add some noise to the discriminator's input and use dropout to make things easier for the generator.
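One hedged way to implement this (the noise scale and dropout rate below are arbitrary choices, not values from the article):

import torch
import torch.nn as nn

class NoisyDiscriminatorInput(nn.Module):
    # adds Gaussian noise and dropout to whatever is fed into the discriminator
    def __init__(self, noise_std=0.1, drop_p=0.3):
        super().__init__()
        self.noise_std = noise_std
        self.dropout = nn.Dropout(drop_p)

    def forward(self, x):
        if self.training:
            x = x + self.noise_std * torch.randn_like(x)
        return self.dropout(x)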
DCGAN (Deep Convolutional GAN). This is only for computer vision tasks. However, this model is known to avoid unstable training. The keys in this model are to avoid plain ReLU in the discriminator (use LeakyReLU instead), to use BatchNorm, and to use strided convolutions instead of pooling.
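As a sketch of those guidelines, here is an illustrative discriminator block (channel sizes are arbitrary, and this is not the exact DCGAN architecture):

import torch.nn as nn

# strided convolution instead of pooling, BatchNorm, LeakyReLU in the discriminator
disc_block = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # downsample by striding
    nn.BatchNorm2d(128),
    nn.LeakyReLU(0.2, inplace=True),
)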
Ensemble of Discriminators. Instead of having a single discriminator, multiple discriminators are trained on different batches so that each captures different aspects of the data. Thus, the generator cannot just fool a single discriminator; it has to generalize enough to fool all of them. This is also related to Dropout GAN (many discriminators, with some dropped out during training).
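A rough sketch of training the generator against several discriminators at once (discriminators is a hypothetical list of modules; generator and noise are placeholders):

import torch
import torch.nn.functional as F

# the generator must fool every discriminator, so average the loss over the ensemble
fake_batch = generator(noise)
scores = [d(fake_batch) for d in discriminators]
g_loss = torch.stack([
    F.binary_cross_entropy(s, torch.ones_like(s)) for s in scores
]).mean()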
Parameter Tuning. This covers the learning rate, dropout ratio, batch size, and so on. It is difficult to determine how much better one model is than another, so some would test multiple parameter settings and keep whichever works best. One bottleneck is that there is no standard evaluation metric for GANs, which results in a lot of manual checking to determine quality.
Scheduling G and D. Training G for 5 steps followed by D for 1 step is reported to be useless in many works. If you want to try scheduling, do something more meaningful, for example training each model only while it is falling behind:
while generator_needs_training():      # hypothetical condition, e.g. G's loss stopped improving
    train_G()
while discriminator_needs_training():  # e.g. D can no longer tell real from fake
    train_D()
Conclusion
Adversarial text generation opens a new avenue for how a model is trained. Instead of relying on MLE, one or more discriminators are used to signal whether the generation is correct. However, such training has the downside of being quite hard to carry out. Many studies suggest tips on how to avoid the problems described above; still, you need to try a variety of settings (or parameters) to ensure your generative model can learn properly.
Further Reading
翻譯自: https://towardsdatascience.com/instability-in-training-text-gan-20273d6a859a