BERT Paper Notes (2): CG-BERT: Conditional Text Generation with BERT for Generalized Few-shot Intent Detection
Contents
The proposed method
Input Representation
The Encoder
The Decoder
Fine-tuning
Goal: discriminate over a joint label space consisting of both existing intents, which have enough labeled data, and novel intents, which only have a few examples per class.
==> Conditional Text Generation with BERT
The proposed method
CG-BERT adopts the CVAE (Conditional Variational AutoEncoder) framework and incorporates BERT into both the encoder and the decoder.
- the encoder: encodes the utterance x and its intent y together into a latent variable z and models the posterior distribution p(z|x,y), where y is the condition in the CVAE model ==> the encoder models the data distribution of the few-shot intents.
- the decoder: decodes z and the intent y together to reconstruct the input utterance x ==> masked attention restricts which tokens can attend to each other, preserving the left-to-right, autoregressive property required for text generation.
- to generate new utterances for a novel intent y, we sample the latent variable z from a prior distribution p(z|y) and use the decoder to decode z and y into new utterances (see the sketch after this list).
By sampling from the learned distribution, the model can generate more utterances for the novel intent.
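To make the sampling step concrete, here is a minimal sketch of CVAE-style generation; the `decoder(z, intent_ids)` callable and `latent_dim` are my assumptions, not the paper's code:

```python
import torch

latent_dim = 768  # assumption: latent size matches BERT-base's hidden size

def generate_utterances(decoder, intent_ids, num_samples=5):
    """Sample z ~ p(z|y) = N(0, I) and decode (z, y) into new utterances."""
    utterances = []
    for _ in range(num_samples):
        z = torch.randn(1, latent_dim)             # prior: standard Gaussian
        utterances.append(decoder(z, intent_ids))  # hypothetical decoder call
    return utterances
```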
Input Representation
input: intent + utterance text sentences (concatenated)
Sentence S1: [CLS] token + intent y + [SEP] token --> the first sentence (the intent)
Sentence S2: utterance x + [SEP] --> the second sentence (the utterance)
whole input: S1 + S2
[CLS]: serves as the representation for the whole input
latent variable z: the embedding of [CLS] is encoded into the latent variable z
Text is tokenized into subword units by WordPiece.
embeddings: obtained for each token --> token embeddings, position embeddings, segment embeddings
a given token: constructed by summing these three embeddings; the whole input of T tokens is represented as H^0 = [h_1, ..., h_T].
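As a sketch of this input layout, using the HuggingFace `transformers` tokenizer (my choice of tool; the paper does not prescribe it):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

intent = "book flight"                               # intent y  -> sentence S1
utterance = "i need a ticket from boston to denver"  # utterance x -> sentence S2

# Passing a sentence pair builds: [CLS] intent y [SEP] utterance x [SEP]
enc = tokenizer(intent, utterance)
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
print(enc["token_type_ids"])  # segment ids: 0 for S1 tokens, 1 for S2 tokens
```

Token, position, and segment embeddings are then summed inside BERT to give H^0.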
The Encoder
models the distribution of diverse utterances for a given (few-shot) intent
to obtain deep bidirectional context information <-- models the attention between the intent tokens and the utterance tokens
the input representation: H^0
multiple self-attention heads: the output of the previous layer H^{l-1} is projected into a triple of queries, keys and values: Q = H^{l-1}W_Q, K = H^{l-1}W_K, V = H^{l-1}W_V
embeddings for the [CLS] token in the 6-th transformer block --> the sentence-level representation
sentence-level representation --> a latent vector z, where the prior distribution p(z|y) is a multivariate standard Gaussian
μ and σ of the Gaussian posterior q(z|x,y) = N(μ, σ²I) --> used to sample z
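A minimal sketch of this sampling step via the reparameterization trick; the projection heads `to_mu` / `to_logvar` are names I introduce here, not from the paper:

```python
import torch
import torch.nn as nn

hidden_size = latent_dim = 768  # assumed sizes

# Hypothetical heads mapping the [CLS] representation to the Gaussian parameters
to_mu = nn.Linear(hidden_size, latent_dim)
to_logvar = nn.Linear(hidden_size, latent_dim)

def sample_z(cls_hidden):
    """Reparameterization: z = mu + sigma * eps with eps ~ N(0, I),
    so gradients flow through mu and sigma of q(z|x, y)."""
    mu, logvar = to_mu(cls_hidden), to_logvar(cls_hidden)
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps
```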
The Decoder
aims to reconstruct the input utterance x using the latent variable z and the intent y
residual connection from the input representation H^0 to the decoder (residually combining z and H^0)
==> input of the decoder: H^{6'}
left-to-right manner ==> masked attention
the attention mask --> helps the transformer blocks fit the conditional text generation task
not full bidirectional attention over the input ==> instead, a mask matrix determines whether a pair of tokens can attend to each other
updated attention:
Attention(Q, K, V) = softmax(QK^T / sqrt(d_k) + M) V, where M is the mask matrix: M_{ij} = 0 if token i may attend to token j, and -∞ otherwise.
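One plausible construction of the mask matrix M, in the UniLM style this formulation follows: intent tokens (S1) attend bidirectionally among themselves, while utterance tokens (S2) attend to S1 and only to earlier S2 positions. The exact layout is my reading, not code from the paper:

```python
import torch

def build_seq2seq_mask(len_s1, len_s2):
    """M[i, j] = 0 if token i may attend to token j, else -inf."""
    n = len_s1 + len_s2
    allowed = torch.zeros(n, n, dtype=torch.bool)
    allowed[:, :len_s1] = True   # every token may attend to the intent segment S1
    allowed[len_s1:, len_s1:] = torch.tril(
        torch.ones(len_s2, len_s2, dtype=torch.bool)
    )                            # S2 tokens attend only to earlier S2 positions
    mask = torch.full((n, n), float("-inf"))
    mask[allowed] = 0.0          # added to QK^T / sqrt(d_k) before the softmax
    return mask
```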
output of the 12-th transformer block in the decoder: H^{12}; a separate embedding is computed for the latent variable z
To further increase the impact of z and alleviate the vanishing-latent-variable problem, the embedding of z is combined with all the token representations, and two fully-connected layers with a layer normalization produce the final representation H^f.
to predict the next token at position t+1 <-- the embedding in H^f at position t
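A sketch of this fusion head (the module names and the choice of concatenation are my assumptions; the text above only states that z is combined with all tokens):

```python
import torch
import torch.nn as nn

hidden, vocab_size = 768, 30522  # assumed: BERT-base sizes

fc1 = nn.Linear(2 * hidden, hidden)      # two fully-connected layers ...
fc2 = nn.Linear(hidden, hidden)
norm = nn.LayerNorm(hidden)              # ... with a layer normalization
lm_head = nn.Linear(hidden, vocab_size)

def final_representation(h12, z):
    """h12: (batch, T, hidden) decoder output; z: (batch, hidden) latent."""
    z_tok = z.unsqueeze(1).expand_as(h12)   # broadcast z to every position
    h = torch.cat([h12, z_tok], dim=-1)     # fuse z with all the tokens
    hf = norm(fc2(torch.relu(fc1(h))))      # final representation H^f
    return lm_head(hf)                      # logits at position t predict token t+1
```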
Fine-tuning
the model is first learned from existing intents with enough labeled data, then fine-tuned to improve its performance on the few-shot intents.
reference: Cross-Lingual Natural Language Generation via Pre-training
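As I read it, training is two-stage; a rough sketch under that assumption (loop structure and names are mine, not from the paper):

```python
def train_two_stage(model, existing_loader, novel_loader, opt,
                    epochs=3, ft_epochs=10):
    # Stage 1: learn the CVAE on existing intents with enough labeled data
    for _ in range(epochs):
        for batch in existing_loader:
            loss = model(batch)             # ELBO: reconstruction + KL terms
            opt.zero_grad(); loss.backward(); opt.step()
    # Stage 2: fine-tune on the few labeled examples of the novel intents
    for _ in range(ft_epochs):
        for batch in novel_loader:
            loss = model(batch)
            opt.zero_grad(); loss.backward(); opt.step()
```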