GAT模型介绍
In the GCN post we covered how the Cora dataset is processed; it returns the following (a minimal loading sketch follows the list):
- features: the attribute features of the papers, with dimensions $2708 \times 1433$, row-normalized so that each paper's feature values sum to 1.
- labels: the class index of each paper, in the range 0-6.
- adj: the adjacency matrix, with dimensions $2708 \times 2708$.
- idx_train: 0-139
- idx_val: 200-499
- idx_test: 500-1499
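As a minimal sketch of what that preprocessing step hands back (the `load_data` name, module, and return order here are assumptions borrowed from the GCN write-up, not a guaranteed API):

```python
# Hypothetical loader: name, module, and return order are assumptions.
from utils import load_data

adj, features, labels, idx_train, idx_val, idx_test = load_data()

print(features.shape)              # expected: torch.Size([2708, 1433])
print(adj.shape)                   # expected: torch.Size([2708, 2708])
print(labels.min(), labels.max())  # expected: tensor(0) tensor(6)
print(features.sum(dim=1))         # each row should sum to (approximately) 1
```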
This section walks through the GAT model.
The GAT Model
model:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GAT(nn.Module):
    def __init__(self, nfeat, nhid, nclass, dropout, alpha, nheads):
        """Dense version of GAT."""
        super(GAT, self).__init__()
        self.dropout = dropout
        self.attentions = [GraphAttentionLayer(nfeat, nhid, dropout=dropout, alpha=alpha, concat=True)
                           for _ in range(nheads)]
        for i, attention in enumerate(self.attentions):
            self.add_module('attention_{}'.format(i), attention)
        # the second (final) attention layer
        self.out_att = GraphAttentionLayer(nhid * nheads, nclass, dropout=dropout, alpha=alpha, concat=False)

    def forward(self, x, adj):
        x = F.dropout(x, self.dropout, training=self.training)
        x = torch.cat([att(x, adj) for att in self.attentions], dim=1)  # concatenate the outputs of all attention heads
        x = F.dropout(x, self.dropout, training=self.training)
        x = F.elu(self.out_att(x, adj))  # the second attention layer
        return F.log_softmax(x, dim=1)
```
layers:
```python
class GraphAttentionLayer(nn.Module):
    """Simple GAT layer, similar to https://arxiv.org/abs/1710.10903"""

    def __init__(self, in_features, out_features, dropout, alpha, concat=True):
        super(GraphAttentionLayer, self).__init__()
        self.dropout = dropout
        self.in_features = in_features
        self.out_features = out_features
        self.alpha = alpha
        self.concat = concat

        self.W = nn.Parameter(torch.empty(size=(in_features, out_features)))
        nn.init.xavier_uniform_(self.W.data, gain=1.414)
        self.a = nn.Parameter(torch.empty(size=(2 * out_features, 1)))  # concat(V, NeigV)
        nn.init.xavier_uniform_(self.a.data, gain=1.414)

        self.leakyrelu = nn.LeakyReLU(self.alpha)

    def forward(self, h, adj):
        Wh = torch.mm(h, self.W)  # h.shape: (N, in_features), Wh.shape: (N, out_features)
        # pair every node with every node: shape (N, N, 2 * out_features)
        a_input = self._prepare_attentional_mechanism_input(Wh)
        # a_input.shape = (2708, 2708, 16), self.a.shape = (16, 1);
        # matmul gives (2708, 2708, 1); squeeze drops the last dimension -> (2708, 2708)
        e = self.leakyrelu(torch.matmul(a_input, self.a).squeeze(2))

        # e holds a score between every pair of nodes, but we only need the
        # coefficients of connected nodes
        zero_vec = -9e15 * torch.ones_like(e)
        attention = torch.where(adj > 0, e, zero_vec)  # mask out non-edges with (effectively) -inf
        attention = F.softmax(attention, dim=1)         # row-wise softmax: sum(axis=1) == 1
        attention = F.dropout(attention, self.dropout, training=self.training)
        h_prime = torch.matmul(attention, Wh)           # aggregate the neighbours' features

        if self.concat:
            return F.elu(h_prime)  # ELU activation
        else:
            return h_prime

    def _prepare_attentional_mechanism_input(self, Wh):
        N = Wh.size()[0]  # number of nodes

        # Below, two matrices are created that contain embeddings in their rows
        # in different orders (e stands for embedding).
        # Rows of the first matrix (Wh_repeated_in_chunks):
        # e1, e1, ..., e1, e2, e2, ..., e2, ..., eN, eN, ..., eN
        # '------------'   '------------'        '------------'
        #    N times          N times                N times
        #
        # Rows of the second matrix (Wh_repeated_alternating):
        # e1, e2, ..., eN, e1, e2, ..., eN, ..., e1, e2, ..., eN
        # '----------------------------------------------------' -> N times
        Wh_repeated_in_chunks = Wh.repeat_interleave(N, dim=0)
        Wh_repeated_alternating = Wh.repeat(N, 1)
        # Wh_repeated_in_chunks.shape == Wh_repeated_alternating.shape == (N * N, out_features)

        # The all_combinations_matrix, created below, looks like this
        # (|| denotes concatenation):
        # e1 || e1
        # e1 || e2
        # ...
        # e1 || eN
        # e2 || e1
        # e2 || e2
        # ...
        # e2 || eN
        # ...
        # eN || e1
        # ...
        # eN || eN
        all_combinations_matrix = torch.cat([Wh_repeated_in_chunks, Wh_repeated_alternating], dim=1)
        # all_combinations_matrix.shape == (N * N, 2 * out_features)

        return all_combinations_matrix.view(N, N, 2 * self.out_features)

    def __repr__(self):
        return self.__class__.__name__ + ' (' + str(self.in_features) + ' -> ' + str(self.out_features) + ')'
```
Initializing the model:
```python
model = GAT(nfeat=1433, nhid=8, nclass=7, dropout=0.6, nheads=8, alpha=0.2)
```
Building the attention layers:
```python
self.dropout = 0.6
self.attentions = [GraphAttentionLayer(nfeat, nhid, dropout=dropout, alpha=alpha, concat=True)
                   for _ in range(nheads)]
for i, attention in enumerate(self.attentions):
    self.add_module('attention_{}'.format(i), attention)
# the second (final) attention layer
self.out_att = GraphAttentionLayer(nhid * nheads, nclass, dropout=dropout, alpha=alpha, concat=False)
```
attentions and out_att
First, the attentions module is built; it consists of 8 GraphAttentionLayer instances, each initialized as follows (with the concrete parameter values filled in):
```python
def __init__(self, in_features, out_features, dropout, alpha, concat=True):
    super(GraphAttentionLayer, self).__init__()
    self.dropout = 0.6
    self.in_features = 1433
    self.out_features = 8
    self.alpha = 0.2
    self.concat = True

    self.W = nn.Parameter(torch.empty(size=(1433, 8)))
    nn.init.xavier_uniform_(self.W.data, gain=1.414)  # initialize W
    self.a = nn.Parameter(torch.empty(size=(2 * out_features, 1)))  # concat(V, NeigV)
    nn.init.xavier_uniform_(self.a.data, gain=1.414)  # initialize a

    self.leakyrelu = nn.LeakyReLU(0.2)
```
The parameter W has dimensions $W_{1433 \times 8}$, and the parameter a has dimensions $a_{16 \times 1}$.
out_att
out_att is similar to the attentions above, except that it consists of a single GraphAttentionLayer and its parameters differ:
```python
def __init__(self, in_features, out_features, dropout, alpha, concat=True):
    super(GraphAttentionLayer, self).__init__()
    self.dropout = 0.6
    self.in_features = 64
    self.out_features = 7
    self.alpha = 0.2
    self.concat = False

    self.W = nn.Parameter(torch.empty(size=(64, 7)))
    nn.init.xavier_uniform_(self.W.data, gain=1.414)  # initialize W
    self.a = nn.Parameter(torch.empty(size=(2 * out_features, 1)))  # concat(V, NeigV)
    nn.init.xavier_uniform_(self.a.data, gain=1.414)  # initialize a

    self.leakyrelu = nn.LeakyReLU(0.2)
```
The parameter W has dimensions $W_{64 \times 7}$, and the parameter a has dimensions $a_{14 \times 1}$.
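As a quick sanity check of these shapes, the layer class defined above can be instantiated directly (the printed values are what the description implies):

```python
# One attention head (first layer) and the output layer, as configured above.
head = GraphAttentionLayer(in_features=1433, out_features=8, dropout=0.6, alpha=0.2, concat=True)
out_att = GraphAttentionLayer(in_features=64, out_features=7, dropout=0.6, alpha=0.2, concat=False)

print(head.W.shape, head.a.shape)        # torch.Size([1433, 8]) torch.Size([16, 1])
print(out_att.W.shape, out_att.a.shape)  # torch.Size([64, 7]) torch.Size([14, 1])
```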
forward: running the model
The feature matrix h is multiplied by the first attention head's weight matrix W, giving $W_h = x_{2708 \times 1433} \times W_{1433 \times 8}$, so $W_h$ has dimensions $2708 \times 8$; then self._prepare_attentional_mechanism_input(Wh) is executed.
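A quick shape check of this multiplication, using toy random tensors rather than the real Cora features:

```python
import torch

h = torch.randn(2708, 1433)  # stand-in for the feature matrix
W = torch.randn(1433, 8)     # stand-in for one head's weight matrix
Wh = torch.mm(h, W)
print(Wh.shape)              # torch.Size([2708, 8])
```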
5. Executing self._prepare_attentional_mechanism_input(Wh):
Suppose Wh[0] contains the following:
```python
Wh[0]
tensor([-0.0118, -0.0033, -0.0051,  0.0151, -0.0151,  0.0186, -0.0097,  0.0387],
       grad_fn=<SelectBackward>)
```
After Wh_repeated_in_chunks = Wh.repeat_interleave(N, dim=0), Wh_repeated_in_chunks has shape $[2708 \cdot 2708, 8]$: rows Wh_repeated_in_chunks[0] through Wh_repeated_in_chunks[2707] all equal Wh[0], rows Wh_repeated_in_chunks[2708] through Wh_repeated_in_chunks[2707+2708] all equal Wh[1], and so on:
```python
# These are the rows of the first matrix (Wh_repeated_in_chunks):
# e1, e1, ..., e1, e2, e2, ..., e2, ..., eN, eN, ..., eN
# '------------'   '------------'        '------------'
#    N times          N times                N times
```
After Wh_repeated_alternating = Wh.repeat(N, 1), Wh_repeated_alternating also has shape $[2708 \cdot 2708, 8]$: rows Wh_repeated_alternating[0] through Wh_repeated_alternating[2707] equal Wh[0] through Wh[2707], and the block then repeats, so rows 2708 through $2 \cdot 2708 - 1$ again equal Wh[0] through Wh[2707], and so on:
```python
# These are the rows of the second matrix (Wh_repeated_alternating):
# e1, e2, ..., eN, e1, e2, ..., eN, ..., e1, e2, ..., eN
# '----------------------------------------------------' -> N times
```
all_combinations_matrix = torch.cat([Wh_repeated_in_chunks, Wh_repeated_alternating], dim=1) concatenates Wh_repeated_in_chunks with Wh_repeated_alternating along the feature dimension, giving shape $[2708 \cdot 2708, 16]$, in the following pattern:
```python
# e1 || e1
# e1 || e2
# e1 || e3
# ...
# e1 || eN
# e2 || e1
# e2 || e2
# e2 || e3
# ...
# e2 || eN
# ...
# eN || e1
# eN || e2
# eN || e3
# ...
# eN || eN
```
The returned result a_input is reshaped to $[2708, 2708, 16]$: a_input[0] holds node 0's features concatenated with every node's features, a_input[1] holds node 1's features concatenated with every node's features, and so on.
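The row patterns are easier to see on a toy example with N = 3 nodes and 2 features (the values here are arbitrary):

```python
import torch

Wh = torch.tensor([[1., 1.],
                   [2., 2.],
                   [3., 3.]])
N = Wh.size(0)

chunks = Wh.repeat_interleave(N, dim=0)   # rows: e1, e1, e1, e2, e2, e2, e3, e3, e3
alternating = Wh.repeat(N, 1)             # rows: e1, e2, e3, e1, e2, e3, e1, e2, e3
pairs = torch.cat([chunks, alternating], dim=1).view(N, N, 4)

print(pairs[0, 1])  # tensor([1., 1., 2., 2.])  -> e1 || e2
print(pairs[2, 0])  # tensor([3., 3., 1., 1.])  -> e3 || e1
```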
6. e = self.leakyrelu(torch.matmul(a_input, self.a).squeeze(2)) computes the attention scores: a_input has shape $[2708, 2708, 16]$ and self.a has shape $[16, 1]$, so their product has shape $[2708, 2708, 1]$, representing each node's attention score with every other node; squeeze(2) removes the last dimension, leaving shape $[2708, 2708]$.
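In equation form, this is the raw (unnormalized) attention score from the GAT paper, where $\|$ denotes concatenation:

$$e_{ij} = \mathrm{LeakyReLU}\left(\mathbf{a}^{\top}\left[W h_i \,\|\, W h_j\right]\right)$$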
7. zero_vec = -9e15*torch.ones_like(e): the previous step computed scores between every node and all nodes, but only the coefficients of connected nodes are needed, so a matrix zero_vec with the same shape as e is created, filled with a very large negative value.
8. attention = torch.where(adj > 0, e, zero_vec): positions with an edge (adj > 0) keep the score from e, while positions without an edge are replaced by the (effectively minus-infinity) value from zero_vec. The result is an attention matrix with the same shape as the adjacency matrix, where each entry is the attention score between a node and one of its neighbours.
9. attention = F.softmax(attention, dim=1): a row-wise softmax over the attention matrix, so each node's scores over its neighbours become a probability distribution (each row sums to 1).
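This corresponds to the normalization step in the paper:

$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}$$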
10. attention = F.dropout(attention, self.dropout, training=self.training): applies dropout to the attention matrix.
11. h_prime = torch.matmul(attention, Wh): multiplies the attention matrix by the transformed features Wh, aggregating each node's neighbour features weighted by the attention coefficients.
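In equation form, each node's new representation is the attention-weighted sum of its neighbours' transformed features:

$$h_i' = \sum_{j \in \mathcal{N}_i} \alpha_{ij}\, W h_j$$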
12. The result h_prime is passed through the ELU activation; h_prime has dimensions $2708 \times 8$.
13. This completes one attention head; in total there are 8 attention heads.
14. The 8 resulting h_prime tensors are concatenated, giving x with shape [2708, 64].
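This matches the multi-head formulation in the paper, with $K = 8$ heads, each producing 8 features, for a concatenated width of 64 (here $\sigma$ is the ELU applied inside each head):

$$x_i = \Big\Vert_{k=1}^{K} \sigma\!\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k}\, W^{k} h_j\right)$$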
Summary
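Each of the 8 heads maps the 1433-dimensional input features to 8 hidden features; their outputs are concatenated into a 64-dimensional representation, and the final attention layer maps this to 7 class scores followed by a log-softmax. The sketch below ties the steps together using random stand-ins for the Cora tensors (the shapes follow the description above, the values are made up), just to confirm the output shape:

```python
import torch

# Random stand-ins for the Cora inputs (same shapes as described above).
features = torch.randn(2708, 1433)
adj = (torch.rand(2708, 2708) > 0.99).float()  # hypothetical dense 0/1 adjacency matrix

model = GAT(nfeat=1433, nhid=8, nclass=7, dropout=0.6, nheads=8, alpha=0.2)
model.eval()  # disable dropout for this shape check

with torch.no_grad():
    # Note: each head builds a dense (2708, 2708, 16) a_input tensor,
    # so this needs roughly 0.5 GB of RAM per head.
    output = model(features, adj)

print(output.shape)  # torch.Size([2708, 7]): log-probabilities over the 7 classes
```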