
Detailed Annotated Walkthrough of a PyTorch Seq2Seq Translation Model (Part 1)

Published: 2023/11/28


Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original article and this notice.
Article link: https://blog.csdn.net/qysh123/article/details/91245246
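Seq2Seq is currently the mainstream deep learning model for translation, and it achieves good results in natural language translation and even in cross-modal knowledge mapping. In recent years it has also been widely applied in software engineering, for example: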

Jiang, Siyuan, Ameer Armaly, and Collin McMillan. "Automatically generating commit messages from diffs using neural machine translation." In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, pp. 135-146. IEEE Press, 2017.

Hu, Xing, Ge Li, Xin Xia, David Lo, and Zhi Jin. "Deep code comment generation." In Proceedings of the 26th Conference on Program Comprehension, pp. 200-210. ACM, 2018.

Here, using the Seq2Seq example code provided by PyTorch, I briefly summarize the details of implementing this model and the corresponding PyTorch APIs. PyTorch has a tutorial on its website: https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html, and the corresponding GitHub link is https://github.com/pytorch/tutorials/blob/master/intermediate_source/seq2seq_translation_tutorial.py. I use this code as the running example for the summary below:

The official tutorial linked above also provides a download link for the data: https://download.pytorch.org/tutorial/data.zip. In addition, many tutorials online are in fact translations of this official tutorial. I consulted some of them, mainly:

https://www.cnblogs.com/HolyShine/p/9850822.html

https://www.cnblogs.com/www-caiyin-com/p/10123346.html

http://www.pianshen.com/article/5376154542/

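You can use these tutorials as a foundation. I only add some supplements and explanations on top of them, so I won't give a complete walkthrough like they do; I just summarize what I consider important. I'll also skip the initial data encoding steps, since the existing tutorials explain them well enough. Let's start with the Encoder: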

```python
import torch
import torch.nn as nn

# In the tutorial, device is defined once near the top of the script:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(EncoderRNN, self).__init__()  # initialize the attributes inherited from the parent class
        self.hidden_size = hidden_size

        self.embedding = nn.Embedding(input_size, hidden_size)  # embedding layer for the input tokens
        self.gru = nn.GRU(hidden_size, hidden_size)  # Applies a multilayer gated recurrent unit (GRU) RNN to an input sequence.

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)  # view reshapes an existing tensor without copying its data
        output = embedded
        output, hidden = self.gru(output, hidden)
        return output, hidden

    def initHidden(self):
        # all-zero initial hidden state of shape (1, 1, hidden_size), i.e. (1, 1, 256) in the tutorial
        return torch.zeros(1, 1, self.hidden_size, device=device)
```
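The shapes in EncoderRNN follow the default (batch_first=False) conventions of nn.GRU: the input is (seq_len, batch, input_size) and the hidden state is (num_layers, batch, hidden_size), which is where the (1, 1, hidden_size) of initHidden comes from. The sketch below uses an illustrative hidden_size of 8 instead of the tutorial's 256 and random inputs instead of real embeddings. It shows these shapes, and that feeding one token at a time while carrying the hidden state forward (as the tutorial's encoder loop does) should give the same outputs as feeding the whole sequence in one call; both comparisons are expected to report True.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

hidden_size = 8   # illustrative size; the tutorial uses 256
seq_len = 5

gru = nn.GRU(hidden_size, hidden_size)  # 1 layer, batch_first=False by default

# Input layout for nn.GRU: (seq_len, batch, input_size)
x = torch.randn(seq_len, 1, hidden_size)
# Initial hidden state: (num_layers * num_directions, batch, hidden_size)
h0 = torch.zeros(1, 1, hidden_size)

# Whole sequence in a single call
out_full, h_full = gru(x, h0)

# One token at a time, carrying the hidden state forward
h = h0
step_outputs = []
for t in range(seq_len):
    out_t, h = gru(x[t].view(1, 1, -1), h)  # each step input has shape (1, 1, hidden_size)
    step_outputs.append(out_t)
out_steps = torch.cat(step_outputs, dim=0)  # (seq_len, 1, hidden_size)

print(out_full.shape, h_full.shape)
print(torch.allclose(out_full, out_steps, atol=1e-5),
      torch.allclose(h_full, h, atol=1e-5))
```

Processing token by token costs speed, but it is what lets the decoder below look at one step's output before producing the next one.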
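Although it is only a few lines, there are still a few things worth discussing. nn.Embedding creates the initial embedding. Note that this embedding is randomly initialized (from a standard normal distribution) rather than pretrained, so at the start its vectors carry no real meaning; its weights are, however, ordinary model parameters that backpropagation updates together with the rest of the network during training. I think some articles online get even this point wrong (for example, the explanation here is incorrect: https://my.oschina.net/earnp/blog/1113896); for details, see the discussion here: https://blog.csdn.net/qq_36097393/article/details/88567942. As for the parameters: in nn.Embedding(2, 5), 2 is the number of words and 5 is the embedding dimension, so it is essentially a 2x5 matrix. If you have 1000 words and want each word to be 100-dimensional, you can create the word embedding as nn.Embedding(1000, 100). You can also run the following example code I put together: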

```python
import torch
import torch.nn as nn

word_to_ix = {'hello': 0, 'world': 1}
embeds = nn.Embedding(2, 5)  # 2 words, 5 dimensions each
hello_idx = torch.LongTensor([word_to_ix['hello']])
world_idx = torch.LongTensor([word_to_ix['world']])
hello_embed = embeds(hello_idx)
print(hello_embed)
world_embed = embeds(world_idx)
print(world_embed)
```
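The meaning should be clear at a glance, so try running it. The printed values differ on every run and carry no real meaning, because the weights are randomly initialized and nothing has been trained yet.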
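To back up the point that the embedding starts out random but is learned during training, here is a small sketch. A lookup is just row indexing into embeds.weight, and one optimizer step on a toy loss (the loss is made up purely for illustration and is not part of the tutorial) updates only the row that was looked up:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

embeds = nn.Embedding(2, 5)          # 2 words, 5-dimensional vectors
print(embeds.weight.shape)            # the underlying 2x5 matrix
print(embeds.weight.requires_grad)    # it is a trainable parameter

# A lookup is plain row indexing into the weight matrix
idx = torch.tensor([1])
assert torch.equal(embeds(idx), embeds.weight[idx])

# One gradient step on a toy loss changes the looked-up row
optimizer = torch.optim.SGD(embeds.parameters(), lr=0.1)
before = embeds.weight.detach().clone()
loss = embeds(idx).sum()   # toy loss, only for illustration
loss.backward()
optimizer.step()
after = embeds.weight.detach()

print(torch.equal(before[1], after[1]))  # row 1 was updated
print(torch.equal(before[0], after[0]))  # row 0 was not used, so it is unchanged
```

In the Seq2Seq model the same thing happens, with the real translation loss in place of the toy one.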

Next is the meaning of .view(1, 1, -1). To be honest, I had never fully understood it myself, but it has already been discussed on Stack Overflow:

https://stackoverflow.com/questions/42479902/how-does-the-view-method-work-in-pytorch

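Have a look and it becomes clear. In short, view returns a tensor that shares the same data but has a new shape, and -1 tells PyTorch to infer that dimension's size from the others, so .view(1, 1, -1) turns the encoder's (1, hidden_size) embedding output into the (1, 1, hidden_size) tensor that nn.GRU expects. Here is the example given in that discussion: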

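```python
import torch
a = torch.arange(1, 17, dtype=torch.float32)  # 1.0 ... 16.0; the original used torch.range(1, 16), which is deprecated
print(a)
a = a.view(4, 4)
print(a)
```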
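A slightly extended sketch of the same idea, showing the -1 inference, the fact that view shares memory with the original tensor instead of copying it, and the encoder's own use of .view(1, 1, -1) (the random tensor stands in for one embedding row; 256 is the tutorial's hidden size):

```python
import torch

a = torch.arange(1, 17, dtype=torch.float32)  # 16 elements: 1.0 ... 16.0
b = a.view(4, 4)        # same data, seen as 4x4
c = a.view(2, -1)       # -1 lets PyTorch infer the size: 16 / 2 = 8
print(b.shape, c.shape)

# view does not copy: the new tensor shares storage with the original
b[0, 0] = 100.0
print(a[0])             # a changed too

# The encoder's case: one embedding row of shape (1, hidden_size)
# becomes (seq_len=1, batch=1, hidden_size) for nn.GRU
hidden_size = 256
emb = torch.randn(1, hidden_size)
print(emb.view(1, 1, -1).shape)
```

view only works when the new shape is compatible with the tensor's memory layout (for example, a contiguous tensor); torch.reshape is the alternative that copies when it has to.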
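That's all for the Encoder. Next, let's go straight to the decoder with the attention mechanism. To aid understanding, I added some comments below noting the dimensions of the tensors at each step, which I personally find helpful: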

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

MAX_LENGTH = 10  # maximum sentence length in the translation task
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):  # MAX_LENGTH is defined as 10 in the translation task
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size  # output_size here is output_lang.n_words
        self.dropout_p = dropout_p  # dropout probability
        self.max_length = max_length

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)  # linear transformation to the required dimension
        self.dropout = nn.Dropout(self.dropout_p)
        self.gru = nn.GRU(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input, hidden, encoder_outputs):

        print(input)
        print('size of input: ' + str(input.size()))
        print('size of self.embedding(input): ' + str(self.embedding(input).size()))

        embedded = self.embedding(input).view(1, 1, -1)
        print('size of embedded: ' + str(embedded.size()))

        embedded = self.dropout(embedded)
        print('size of embedded[0]: ' + str(embedded[0].size()))
        print('size of torch.cat((embedded[0], hidden[0]), 1): ' + str(torch.cat((embedded[0], hidden[0]), 1).size()))
        print('size of self.attn(torch.cat((embedded[0], hidden[0]), 1)): ' + str(self.attn(torch.cat((embedded[0], hidden[0]), 1)).size()))

        # Size of embedded: [1, 1, 256]
        # Size of embedded[0]: [1, 256]
        # Size of torch.cat((embedded[0], hidden[0]), 1): [1, 512]

        # This is where the attention weights are computed, by a learned linear layer.
        # Note that PyTorch's concatenation function is torch.cat, which joins tensors along an existing
        # dimension; as written here, along the second dimension (dim=1).
        # torch.stack, by contrast, creates a new dimension and joins the tensors along it.
        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)  # F.softmax is torch.nn.functional.softmax

        # Size of attn_weights: [1, 10]
        # Size of attn_weights.unsqueeze(0): [1, 1, 10]
        # Size of encoder_outputs: [10, 256]
        # Size of encoder_outputs.unsqueeze(0): [1, 10, 256]

        # unsqueeze: Returns a new tensor with a dimension of size one inserted at the specified position.
        attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                                 encoder_outputs.unsqueeze(0))  # bmm is essentially a batched matrix multiplication

        # Size of attn_applied: [1, 1, 256]
        output = torch.cat((embedded[0], attn_applied[0]), 1)
        # Size of output here: [1, 512]
        print('size of output (at this location): ' + str(output.size()))
        output = self.attn_combine(output).unsqueeze(0)
        # Size of output here: [1, 1, 256]
        # print(output)
        output = F.relu(output)  # applies the rectified linear unit function element-wise
        # print(output)
        output, hidden = self.gru(output, hidden)
        output = F.log_softmax(self.out(output[0]), dim=1)
        print('')
        print('------------')
        return output, hidden, attn_weights

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)
```
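The shape comments in the decoder can be checked without building the whole model. Below is a minimal sketch of one attention step using random stand-in tensors for the embedded token, the decoder hidden state, and the encoder outputs (hidden_size=256 and max_length=10 as in the tutorial; dropout is left out for simplicity). It also confirms that the bmm call computes a weighted sum of the encoder outputs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

hidden_size, max_length = 256, 10   # values used in the tutorial
attn = nn.Linear(hidden_size * 2, max_length)
attn_combine = nn.Linear(hidden_size * 2, hidden_size)

embedded = torch.randn(1, 1, hidden_size)               # stand-in for the embedded input token
hidden = torch.zeros(1, 1, hidden_size)                 # stand-in decoder hidden state
encoder_outputs = torch.randn(max_length, hidden_size)  # stand-in: one row per source position

# One score per encoder position, normalized with softmax
scores = attn(torch.cat((embedded[0], hidden[0]), 1))   # (1, 512) -> (1, 10)
attn_weights = F.softmax(scores, dim=1)                  # (1, 10); the row sums to 1

# Weighted sum of encoder outputs via batched matrix multiplication
attn_applied = torch.bmm(attn_weights.unsqueeze(0),      # (1, 1, 10)
                         encoder_outputs.unsqueeze(0))   # (1, 10, 256) -> result (1, 1, 256)

# The same weighted sum written out directly
manual = (attn_weights.t() * encoder_outputs).sum(dim=0)  # (10, 1) * (10, 256) -> sum -> (256,)

output = attn_combine(torch.cat((embedded[0], attn_applied[0]), 1)).unsqueeze(0)

print(attn_weights.shape, attn_applied.shape, output.shape)
print(attn_weights.sum().item())
print(torch.allclose(attn_applied.view(-1), manual, atol=1e-5))
```

Note that in this tutorial the attention scores come from the decoder input and hidden state alone, through a fixed-size linear layer, which is why encoder_outputs must be padded to max_length.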
First, dropout. For dropout, start with PyTorch's official documentation:

https://pytorch.org/docs/stable/nn.html?highlight=nn%20dropout#torch.nn.Dropout

In short: during training, it randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution. Someone has written a very detailed discussion and explanation:

https://blog.csdn.net/stdcoutzyx/article/details/49022443
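Two details from the PyTorch documentation are worth seeing in action: during training the surviving elements are scaled by 1/(1-p), and in evaluation mode (after calling .eval() on the module) dropout is an identity. A small sketch, with p=0.5 and an all-ones input chosen purely so the scaling is easy to see:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

p = 0.5
dropout = nn.Dropout(p)
x = torch.ones(2, 8)

dropout.train()            # training mode (the default for a newly created module)
y_train = dropout(x)
print(y_train)             # a mix of 0.0 and 2.0
# Surviving elements are scaled by 1 / (1 - p), i.e. 2.0 here
kept = y_train[y_train != 0]
print(torch.allclose(kept, torch.full_like(kept, 1 / (1 - p))))

dropout.eval()             # evaluation mode: dropout does nothing
y_eval = dropout(x)
print(torch.equal(y_eval, x))
```

This is why a model with dropout (such as the decoder above, with dropout_p=0.1) should be switched to eval mode before it is used to translate.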

Second, note what nn.Linear means and does. Again, the official documentation says: Applies a linear transformation to the incoming data. Similarly, you can refer to the example code below:

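```python
import torch
import torch.nn as nn
m = nn.Linear(2, 3)        # maps 2 input features to 3 output features
input = torch.randn(2, 2)  # a batch of 2 samples, 2 features each
print(input)
output = m(input)          # shape (2, 3)
print(output)
```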
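Concretely, nn.Linear stores a weight of shape (out_features, in_features) plus a bias, and computes y = xW^T + b over the last dimension of the input. That is how self.attn maps the 512-dimensional concatenation to 10 attention scores and self.attn_combine maps it back to 256. A sketch that checks the formula by hand:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

m = nn.Linear(2, 3)           # in_features=2, out_features=3
print(m.weight.shape)          # (out_features, in_features) = (3, 2)
print(m.bias.shape)            # (3,)

x = torch.randn(2, 2)          # a batch of 2 inputs, each with 2 features
y = m(x)
print(y.shape)                 # (2, 3)

# nn.Linear computes y = x W^T + b
manual = x @ m.weight.t() + m.bias
print(torch.allclose(y, manual, atol=1e-6))
```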
Next, torch.bmm. According to the official PyTorch documentation (https://pytorch.org/docs/stable/torch.html?highlight=torch%20bmm#torch.bmm), torch.bmm:

Performs a batch matrix-matrix product of matrices stored in batch1 and batch2. This description is still rather abstract, but an example makes it easy to understand; it is simply batched matrix multiplication:

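```python
import torch
batch1 = torch.randn(2, 3, 4)    # 2 matrices of shape 3x4
print(batch1)
batch2 = torch.randn(2, 4, 5)    # 2 matrices of shape 4x5
print(batch2)
res = torch.bmm(batch1, batch2)  # 2 matrices of shape 3x5
print(res)
```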
The multiplication rule is: If batch1 is a (b×n×m) tensor, batch2 is a (b×m×p) tensor, out will be a (b×n×p) tensor.
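To make the rule concrete: bmm multiplies the i-th matrix of batch1 by the i-th matrix of batch2, so it should match a loop of ordinary matrix products stacked back together. A sketch checking this (the sizes are arbitrary; in the decoder, b=1, n=1, m=10 and p=256):

```python
import torch

torch.manual_seed(0)

b, n, m, p = 2, 3, 4, 5
batch1 = torch.randn(b, n, m)
batch2 = torch.randn(b, m, p)

res = torch.bmm(batch1, batch2)
print(res.shape)   # (b, n, p) = (2, 3, 5)

# bmm multiplies the i-th matrix of batch1 by the i-th matrix of batch2
per_item = torch.stack([batch1[i] @ batch2[i] for i in range(b)])
print(torch.allclose(res, per_item, atol=1e-6))
```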

For torch.cat, here is a brief explanation using the example from the PyTorch documentation:

Concatenates the given sequence of seq tensors in the given dimension. Example:

```python
import torch
x = torch.randn(2, 3)
print(x)
print(torch.cat((x, x, x), 0))  # shape (6, 3)
print(torch.cat((x, x, x), 1))  # shape (2, 9)
```
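The decoder comments contrast torch.cat with torch.stack. A short sketch comparing the resulting shapes, ending with the decoder's own case of joining two (1, 256) tensors (random stand-ins here) into (1, 512):

```python
import torch

x = torch.randn(2, 3)

print(torch.cat((x, x, x), 0).shape)    # joins along existing dim 0: (6, 3)
print(torch.cat((x, x, x), 1).shape)    # joins along existing dim 1: (2, 9)

# stack inserts a new dimension, then joins along it
print(torch.stack((x, x, x), 0).shape)  # (3, 2, 3)
print(torch.stack((x, x, x), 1).shape)  # (2, 3, 3)

# The decoder's case: two (1, 256) tensors concatenated on dim 1
e = torch.randn(1, 256)
h = torch.randn(1, 256)
print(torch.cat((e, h), 1).shape)       # (1, 512)
```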
That's all for now; I will continue in the next post.
Copyright notice: This is an original article by CSDN blogger 「蛐蛐蛐」, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original article and this notice.
Original link: https://blog.csdn.net/qysh123/article/details/91245246