- Paper
- TransE
- Evaluation metrics
- Issues
Paper
Translating Embeddings for Modeling Multi-relational Data
TransE
Algorithm overview
Core idea
Entity vector + relation vector = entity vector (h + l = t)
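For reference, the objective that turns h + l ≈ t into a training signal can be written as below. This follows the margin-based ranking loss of the TransE paper; S is the set of training triples, S' the set of corrupted triples built from each (h, l, t), γ > 0 the margin, d the L1 or L2 dissimilarity, and [x]_+ = max(0, x).

```latex
\mathcal{L} \;=\; \sum_{(h,l,t) \in S} \;\; \sum_{(h',l,t') \in S'_{(h,l,t)}}
  \bigl[\, \gamma + d(h + l,\, t) - d(h' + l,\, t') \,\bigr]_{+},
\qquad
S'_{(h,l,t)} = \{(h',l,t) \mid h' \in E\} \cup \{(h,l,t') \mid t' \in E\}
```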
Tips
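- Entity vectors are renormalized to unit L2 norm at every update; in the paper, relation vectors (l) are normalized only once, at initialization. This keeps training from lowering the loss simply by inflating the entity norms. The reference code below also renormalizes the relation vector it has just updated.
- Positive sample: the original triple, i.e. d(h+l, t) in the formula.
- Negative sample: randomly replace h or t (never both at once); this corresponds to d(h'+l, t') in the formula.
- The distance is the L1 norm or the L2 norm.
- Training uses SGD.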
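The tips above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names (`dissimilarity`, `corrupt`, `margin_loss`), the toy embeddings, and the default margin of 1.0 are made up for this example and are not taken from the paper or the reference code.

```python
import math
import random

def dissimilarity(h, l, t, use_l1=True):
    """d(h + l, t): L1 or L2 norm of the vector h + l - t."""
    diffs = [hi + li - ti for hi, li, ti in zip(h, l, t)]
    if use_l1:
        return sum(abs(x) for x in diffs)
    return math.sqrt(sum(x * x for x in diffs))

def corrupt(triple, entity_ids, rng):
    """Replace either the head or the tail (never both) with a different entity."""
    h, l, t = triple
    if rng.random() < 0.5:
        return (rng.choice([e for e in entity_ids if e != h]), l, t)
    return (h, l, rng.choice([e for e in entity_ids if e != t]))

def margin_loss(pos, neg, entities, relations, margin=1.0, use_l1=True):
    """[margin + d(h + l, t) - d(h' + l, t')]_+ for one positive/negative pair."""
    h, l, t = pos
    h2, _, t2 = neg
    d_pos = dissimilarity(entities[h], relations[l], entities[t], use_l1)
    d_neg = dissimilarity(entities[h2], relations[l], entities[t2], use_l1)
    return max(0.0, margin + d_pos - d_neg)

# Toy embeddings (hypothetical values, chosen so that paris + capital_of = france).
entities = {"paris": [0.0, 0.0], "france": [1.0, 0.0], "tokyo": [0.0, 3.0]}
relations = {"capital_of": [1.0, 0.0]}

positive = ("paris", "capital_of", "france")
negative = corrupt(positive, list(entities), random.Random(0))
print(negative, margin_loss(positive, negative, entities, relations))
```

A pair contributes to the loss only while the negative triple is not yet at least `margin` farther away than the positive one; once it is, the hinge returns 0 and no update is needed for that pair.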
Reference code
https://github.com/wuxiyu/transE/blob/master/tranE.py
Key code snippet
from copy import deepcopy
from numpy import array

# distanceL1, distanceL2 and norm are helper functions from the referenced file
# (not shown here). update() is a method of the model class.
def update(self, Tbatch):
    # Work on copies so that every triple in the batch is scored against
    # the embeddings as they were before the batch started.
    copyEntityList = deepcopy(self.entityList)
    copyRelationList = deepcopy(self.relationList)

    # Each element is a pair: (original triple, corrupted triple),
    # where a triple is indexed as [0] head, [1] tail, [2] relation.
    for tripletWithCorruptedTriplet in Tbatch:
        headEntityVector = copyEntityList[tripletWithCorruptedTriplet[0][0]]
        tailEntityVector = copyEntityList[tripletWithCorruptedTriplet[0][1]]
        relationVector = copyRelationList[tripletWithCorruptedTriplet[0][2]]
        headEntityVectorWithCorruptedTriplet = copyEntityList[tripletWithCorruptedTriplet[1][0]]
        tailEntityVectorWithCorruptedTriplet = copyEntityList[tripletWithCorruptedTriplet[1][1]]

        headEntityVectorBeforeBatch = self.entityList[tripletWithCorruptedTriplet[0][0]]
        tailEntityVectorBeforeBatch = self.entityList[tripletWithCorruptedTriplet[0][1]]
        relationVectorBeforeBatch = self.relationList[tripletWithCorruptedTriplet[0][2]]
        headEntityVectorWithCorruptedTripletBeforeBatch = self.entityList[tripletWithCorruptedTriplet[1][0]]
        tailEntityVectorWithCorruptedTripletBeforeBatch = self.entityList[tripletWithCorruptedTriplet[1][1]]

        if self.L1:
            distTriplet = distanceL1(headEntityVectorBeforeBatch, tailEntityVectorBeforeBatch, relationVectorBeforeBatch)
            distCorruptedTriplet = distanceL1(headEntityVectorWithCorruptedTripletBeforeBatch, tailEntityVectorWithCorruptedTripletBeforeBatch, relationVectorBeforeBatch)
        else:
            distTriplet = distanceL2(headEntityVectorBeforeBatch, tailEntityVectorBeforeBatch, relationVectorBeforeBatch)
            distCorruptedTriplet = distanceL2(headEntityVectorWithCorruptedTripletBeforeBatch, tailEntityVectorWithCorruptedTripletBeforeBatch, relationVectorBeforeBatch)

        # Hinge term [margin + d(positive) - d(negative)]_+
        eg = self.margin + distTriplet - distCorruptedTriplet
        if eg > 0:
            self.loss += eg
            tempPositive = 2 * self.learingRate * (tailEntityVectorBeforeBatch - headEntityVectorBeforeBatch - relationVectorBeforeBatch)
            tempNegtative = 2 * self.learingRate * (tailEntityVectorWithCorruptedTripletBeforeBatch - headEntityVectorWithCorruptedTripletBeforeBatch - relationVectorBeforeBatch)
            if self.L1:
                # L1 case: keep only the sign of each component.
                # Note: this overwrites the scaled vectors, so each component
                # of the step becomes +1 or -1 and learingRate no longer applies.
                tempPositiveL1 = []
                tempNegtativeL1 = []
                for i in range(self.dim):
                    if tempPositive[i] >= 0:
                        tempPositiveL1.append(1)
                    else:
                        tempPositiveL1.append(-1)
                    if tempNegtative[i] >= 0:
                        tempNegtativeL1.append(1)
                    else:
                        tempNegtativeL1.append(-1)
                tempPositive = array(tempPositiveL1)
                tempNegtative = array(tempNegtativeL1)

            # Pull h + l towards t for the positive triple,
            # push h' + l away from t' for the corrupted triple.
            headEntityVector = headEntityVector + tempPositive
            tailEntityVector = tailEntityVector - tempPositive
            relationVector = relationVector + tempPositive - tempNegtative
            headEntityVectorWithCorruptedTriplet = headEntityVectorWithCorruptedTriplet - tempNegtative
            tailEntityVectorWithCorruptedTriplet = tailEntityVectorWithCorruptedTriplet + tempNegtative

            # Renormalize only the vectors that were just updated.
            copyEntityList[tripletWithCorruptedTriplet[0][0]] = norm(headEntityVector)
            copyEntityList[tripletWithCorruptedTriplet[0][1]] = norm(tailEntityVector)
            copyRelationList[tripletWithCorruptedTriplet[0][2]] = norm(relationVector)
            copyEntityList[tripletWithCorruptedTriplet[1][0]] = norm(headEntityVectorWithCorruptedTriplet)
            copyEntityList[tripletWithCorruptedTriplet[1][1]] = norm(tailEntityVectorWithCorruptedTriplet)

    self.entityList = copyEntityList
    self.relationList = copyRelationList
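The snippet calls `distanceL1`, `distanceL2` and `norm` without showing them. The sketch below is one plausible reading, not the repository's actual code: the names `distance_l1`, `distance_l2_squared`, `normalize` and `positive_step` are hypothetical, and plain lists are used instead of NumPy arrays. The step `2 * learingRate * (t - h - r)` in the snippet is the gradient step for a squared L2 distance (since the derivative of ||h + r - t||² with respect to h is 2(h + r - t)), so the sketch assumes the squared form.

```python
import math

def distance_l1(h, t, r):
    """L1 dissimilarity: sum of |h + r - t| over all dimensions."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))

def distance_l2_squared(h, t, r):
    """Squared L2 dissimilarity: sum of (h + r - t)^2 over all dimensions."""
    return sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t))

def normalize(v):
    """Scale v to unit L2 length; a zero vector is returned unchanged."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0.0:
        return list(v)
    return [x / length for x in v]

def positive_step(h, t, r, learning_rate):
    """One gradient step on the positive triple (squared L2 case, no normalization)."""
    step = [2 * learning_rate * (ti - hi - ri) for hi, ri, ti in zip(h, r, t)]
    new_h = [hi + s for hi, s in zip(h, step)]  # h moves towards t - r
    new_t = [ti - s for ti, s in zip(t, step)]  # t moves towards h + r
    new_r = [ri + s for ri, s in zip(r, step)]  # r moves towards t - h
    return new_h, new_t, new_r

h, r, t = [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]
before = distance_l2_squared(h, t, r)
new_h, new_t, new_r = positive_step(h, t, r, learning_rate=0.1)
after = distance_l2_squared(new_h, new_t, new_r)
print(before, after)  # the distance of the positive triple shrinks after the step
```

The corrupted triple is handled the same way with the signs flipped, which is why the snippet subtracts `tempNegtative` from the corrupted head and adds it to the corrupted tail.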
Evaluation metrics
Reposted from https://blog.csdn.net/hello_acm/article/details/95070669?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.channel_param&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.channel_param
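1. Mean rank
First, for each test triple (taking tail-entity prediction as the example), replace t in (h, r, t) with every entity in the knowledge graph in turn and compute a score with the function f_r(h, t). This gives one score per candidate entity; sort these scores in ascending order.
A smaller value of f is better, so the closer a candidate is to the front of the sorted list, the better.
Now the key step: for each test triple, look up the position of the correct answer (the true t) in that sorted list, for example t1 at rank 100, t2 at rank 200, t3 at rank 60, and so on. Averaging these ranks gives the Mean rank.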
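The ranking procedure above can be sketched as follows. This is a minimal illustration, assuming an L1 score for f_r(h, t) and toy embeddings; `l1_score`, `tail_rank` and the entity names are hypothetical. It ranks against every entity without filtering out other known correct tails.

```python
def l1_score(h_vec, r_vec, t_vec):
    """f_r(h, t) as the L1 distance between h + r and t (lower is better)."""
    return sum(abs(a + b - c) for a, b, c in zip(h_vec, r_vec, t_vec))

def tail_rank(h, r, true_t, entities, relations):
    """1-based rank of the true tail among all candidate tails."""
    scores = {cand: l1_score(entities[h], relations[r], vec)
              for cand, vec in entities.items()}
    ranked = sorted(scores, key=scores.get)  # ascending: best candidate first
    return ranked.index(true_t) + 1

# Toy embeddings (made-up values for illustration only).
entities = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [3.0, 0.0], "d": [5.0, 5.0]}
relations = {"r": [1.0, 0.0]}

test_triples = [("a", "r", "b"), ("a", "r", "c")]
ranks = [tail_rank(h, r, t, entities, relations) for h, r, t in test_triples]
mean_rank = sum(ranks) / len(ranks)
print(ranks, mean_rank)  # [1, 3] 2.0
```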
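2. MRR
Mean reciprocal rank (MRR) is a widely used measure for evaluating search algorithms: if the correct answer is ranked first the score is 1, if it is ranked second the score is 0.5, if it is ranked n-th the score is 1/n, and if there is no matching result the score is 0. The final score is the mean of these scores over all queries.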
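3. Hits@10
Sort the candidates by f value as described above, then check for each test triple whether the correct answer falls within the top ten of the sorted list; if it does, add 1 to a counter. Hits@10 is the number of test triples ranked in the top ten divided by the total number of test triples.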
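Given the rank of the correct entity for every test triple, the three metrics reduce to a few lines. The sketch below assumes the ranks have already been computed and are 1-based; the function names and the example ranks are illustrative only (the first three values reuse the 100, 200, 60 example from the Mean rank description).

```python
def mean_rank(ranks):
    """Average position of the correct entity (lower is better)."""
    return sum(ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test triples (higher is better)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k=10):
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [100, 200, 60, 2, 5]  # hypothetical 1-based ranks of the true entities
print(mean_rank(ranks))             # 73.4
print(mean_reciprocal_rank(ranks))
print(hits_at_k(ranks, k=10))       # 0.4
```

Mean rank is sensitive to a few very bad ranks, while MRR and Hits@10 are dominated by how often the correct entity lands near the top, which is why the metrics are usually reported together.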
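Issues
- When negative samples are built for a one-to-many (1-to-N) relation, the randomly substituted entity may form another triple that is actually true, so the supposed negative sample is a false negative.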
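One common way to reduce this problem, which the original notes do not cover, is to reject any corrupted triple that already exists in the known triples. The sketch below is an assumption-labeled illustration: `sample_filtered_negative`, `max_tries` and the toy data are hypothetical, and the check only catches triples present in the data, not true facts missing from it.

```python
import random

def sample_filtered_negative(triple, entity_ids, known_triples, rng, max_tries=100):
    """Corrupt the head or the tail; retry while the result is a known triple."""
    h, r, t = triple
    for _ in range(max_tries):
        e = rng.choice(entity_ids)
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        # The original triple is itself in known_triples, so it is rejected too.
        if candidate not in known_triples:
            return candidate
    return None  # no valid negative found within max_tries

# Toy one-to-many relation: one country, several cities (made-up data).
known_triples = {
    ("france", "has_city", "paris"),
    ("france", "has_city", "lyon"),
    ("japan", "has_city", "tokyo"),
}
entity_ids = ["france", "japan", "paris", "lyon", "tokyo"]

negative = sample_filtered_negative(
    ("france", "has_city", "paris"), entity_ids, known_triples, random.Random(0)
)
print(negative)
```

Without the filter, replacing the tail "paris" with "lyon" would yield ("france", "has_city", "lyon"), a true triple that training would then wrongly push apart.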