Knowledge Graph - TransE Algorithm

Published: 2023/12/2


Paper
TransE
- Algorithm overview
- Core idea
- Tips
- Reference code
Evaluation metrics
Issues

Paper

Translating Embeddings for Modeling Multi-relational Data

TransE

Algorithm overview

Core idea

Head entity vector + relation vector ≈ tail entity vector (h + l ≈ t)
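
As a minimal sketch of this idea, the snippet below scores a triple by the distance between h + l and t. The toy 2-dimensional vectors and the function name `translation_distance` are made up for illustration; they come from neither the paper nor the reference code.

```python
import math

def translation_distance(h, l, t, p=2):
    """Distance d(h + l, t) under the L1 (p=1) or L2 (p=2) norm."""
    diff = [hi + li - ti for hi, li, ti in zip(h, l, t)]
    if p == 1:
        return sum(abs(x) for x in diff)
    return math.sqrt(sum(x * x for x in diff))

# Toy embeddings (illustrative values only).
h = [1.0, 0.0]
l = [0.0, 1.0]
t_good = [1.0, 1.0]   # h + l lands exactly on this tail
t_bad = [-1.0, 0.0]   # a tail that does not fit the translation

print(translation_distance(h, l, t_good))      # 0.0
print(translation_distance(h, l, t_bad, p=1))  # 3.0
```

A triple that holds should get a small distance, and a triple that does not hold should get a larger one.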

Tips

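In the paper, entity vectors are renormalized to unit L2 norm at every iteration, while relation vectors (l) are normalized only once at initialization; this keeps training from trivially lowering the loss by inflating the entity norms. (The reference code below also renormalizes the relation vector after each update.)
Positive sample: an original triple from the training set, d(h+l, t) in the formula.
Negative sample: randomly replace h or t, but not both at the same time; this is d(h'+l, t') in the formula.
The distance is either the L1 norm or the L2 norm.
Training uses (minibatch) SGD.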
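
The tips above can be sketched as follows, assuming the L1 distance and a margin of 1.0. The helper names (`corrupt`, `l1_distance`, `margin_loss`) and the toy embeddings are hypothetical. Triples here are ordered (h, l, t), which differs from the (head, tail, relation) indexing used in the reference code.

```python
import random

def corrupt(triple, entities, rng):
    """Replace either the head or the tail (never both) with a different random entity."""
    h, l, t = triple
    if rng.random() < 0.5:
        return (rng.choice([e for e in entities if e != h]), l, t)
    return (h, l, rng.choice([e for e in entities if e != t]))

def l1_distance(emb_e, emb_r, triple):
    """d(h + l, t) under the L1 norm."""
    h, l, t = triple
    return sum(abs(a + b - c) for a, b, c in zip(emb_e[h], emb_r[l], emb_e[t]))

def margin_loss(emb_e, emb_r, pos, neg, margin=1.0):
    """[margin + d(h + l, t) - d(h' + l, t')]_+ for one positive/negative pair."""
    gap = l1_distance(emb_e, emb_r, pos) - l1_distance(emb_e, emb_r, neg)
    return max(0.0, margin + gap)

# Toy embeddings (illustrative values only).
emb_e = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0]}
emb_r = {"r": [1.0, 0.0]}

pos = ("a", "r", "b")
neg = corrupt(pos, list(emb_e), random.Random(0))
print(neg, margin_loss(emb_e, emb_r, pos, neg))
```

The loss is zero once the negative triple is at least `margin` farther away than the positive one, so only violating pairs contribute gradient updates.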

Reference code

https://github.com/wuxiyu/transE/blob/master/tranE.py
Key code snippet

from copy import deepcopy
from numpy import where

# update() is a method of the TransE class in the file linked above; distanceL1,
# distanceL2 and norm are helper functions defined in that same file.
def update(self, Tbatch):
    copyEntityList = deepcopy(self.entityList)
    copyRelationList = deepcopy(self.relationList)

    for tripletWithCorruptedTriplet in Tbatch:
        # tripletWithCorruptedTriplet is a tuple (original triplet, corrupted triplet);
        # each triplet is indexed as (head, tail, relation).
        headEntityVector = copyEntityList[tripletWithCorruptedTriplet[0][0]]
        tailEntityVector = copyEntityList[tripletWithCorruptedTriplet[0][1]]
        relationVector = copyRelationList[tripletWithCorruptedTriplet[0][2]]
        headEntityVectorWithCorruptedTriplet = copyEntityList[tripletWithCorruptedTriplet[1][0]]
        tailEntityVectorWithCorruptedTriplet = copyEntityList[tripletWithCorruptedTriplet[1][1]]

        # Distances and gradients are computed from the values before this batch.
        headEntityVectorBeforeBatch = self.entityList[tripletWithCorruptedTriplet[0][0]]
        tailEntityVectorBeforeBatch = self.entityList[tripletWithCorruptedTriplet[0][1]]
        relationVectorBeforeBatch = self.relationList[tripletWithCorruptedTriplet[0][2]]
        headEntityVectorWithCorruptedTripletBeforeBatch = self.entityList[tripletWithCorruptedTriplet[1][0]]
        tailEntityVectorWithCorruptedTripletBeforeBatch = self.entityList[tripletWithCorruptedTriplet[1][1]]

        if self.L1:
            distTriplet = distanceL1(headEntityVectorBeforeBatch, tailEntityVectorBeforeBatch, relationVectorBeforeBatch)
            distCorruptedTriplet = distanceL1(headEntityVectorWithCorruptedTripletBeforeBatch, tailEntityVectorWithCorruptedTripletBeforeBatch, relationVectorBeforeBatch)
        else:
            distTriplet = distanceL2(headEntityVectorBeforeBatch, tailEntityVectorBeforeBatch, relationVectorBeforeBatch)
            distCorruptedTriplet = distanceL2(headEntityVectorWithCorruptedTripletBeforeBatch, tailEntityVectorWithCorruptedTripletBeforeBatch, relationVectorBeforeBatch)

        eg = self.margin + distTriplet - distCorruptedTriplet
        if eg > 0:  # [x]+ keeps only the positive part
            self.loss += eg
            tempPositive = tailEntityVectorBeforeBatch - headEntityVectorBeforeBatch - relationVectorBeforeBatch
            tempNegtative = tailEntityVectorWithCorruptedTripletBeforeBatch - headEntityVectorWithCorruptedTripletBeforeBatch - relationVectorBeforeBatch
            if self.L1:
                # L1 subgradient: learning rate times the sign of each component
                # (components equal to 0 are treated as positive).
                tempPositive = self.learingRate * where(tempPositive >= 0, 1, -1)
                tempNegtative = self.learingRate * where(tempNegtative >= 0, 1, -1)
            else:
                tempPositive = 2 * self.learingRate * tempPositive
                tempNegtative = 2 * self.learingRate * tempNegtative

            headEntityVector = headEntityVector + tempPositive
            tailEntityVector = tailEntityVector - tempPositive
            relationVector = relationVector + tempPositive - tempNegtative
            headEntityVectorWithCorruptedTriplet = headEntityVectorWithCorruptedTriplet - tempNegtative
            tailEntityVectorWithCorruptedTriplet = tailEntityVectorWithCorruptedTriplet + tempNegtative

            # Normalize only the vectors that were just updated, rather than
            # normalizing everything at once as in the original paper.
            # Caveat: when the corrupted triplet shares its head (or tail) with the
            # original, the later assignment below overwrites the earlier one.
            copyEntityList[tripletWithCorruptedTriplet[0][0]] = norm(headEntityVector)
            copyEntityList[tripletWithCorruptedTriplet[0][1]] = norm(tailEntityVector)
            copyRelationList[tripletWithCorruptedTriplet[0][2]] = norm(relationVector)
            copyEntityList[tripletWithCorruptedTriplet[1][0]] = norm(headEntityVectorWithCorruptedTriplet)
            copyEntityList[tripletWithCorruptedTriplet[1][1]] = norm(tailEntityVectorWithCorruptedTriplet)

    self.entityList = copyEntityList
    self.relationList = copyRelationList

Evaluation metrics

Reposted from https://blog.csdn.net/hello_acm/article/details/95070669?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.channel_param&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.channel_param
1. Mean rank
For each testing triple, taking tail-entity prediction as the example, replace t in (h, r, t) with every entity in the knowledge graph in turn and compute a score for each candidate with the function f_r(h, t). This gives one score per candidate, and the scores are then sorted in ascending order.
A smaller f value is better, so the earlier a candidate appears in this ordering, the better.
The key step is to look up where the correct answer, the true t, lands in that ordering for each testing triple: say t1 ranks 100th, t2 ranks 200th, t3 ranks 60th, and so on. Averaging these ranks gives the mean rank.
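
A small sketch of this procedure, assuming the scores for one test triple are already available as a dict from candidate entity to f value. The function names and the toy scores are made up for illustration; the three ranks reuse the numbers from the example above.

```python
def rank_of_true_tail(scores, true_tail):
    """1-based rank of the true tail when candidates are sorted by ascending score."""
    ordered = sorted(scores, key=scores.get)
    return ordered.index(true_tail) + 1

def mean_rank(ranks):
    """Average rank of the correct answers over all test triples."""
    return sum(ranks) / len(ranks)

# Toy scores for one test triple (lower is better).
scores = {"Paris": 0.2, "Berlin": 0.9, "Rome": 0.5}
print(rank_of_true_tail(scores, "Rome"))  # 2

# Ranks from the example in the text: t1 -> 100, t2 -> 200, t3 -> 60.
print(mean_rank([100, 200, 60]))  # 120.0
```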

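2. MRR (mean reciprocal rank)
MRR is a widely used measure for evaluating search algorithms: if the first result is the match, the score is 1; if the second is the match, the score is 0.5; if the n-th is the match, the score is 1/n; if nothing matches, the score is 0. The final value is the average of these scores over all queries.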
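
A minimal sketch of MRR computed as the average of reciprocal ranks. Representing "no match" as `None` is an assumption made here for illustration; the function name is hypothetical.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries; a rank of None (no match) scores 0."""
    scores = [0.0 if r is None else 1.0 / r for r in ranks]
    return sum(scores) / len(scores)

# Four queries: matches at rank 1, 2 and 4, plus one query with no match.
print(mean_reciprocal_rank([1, 2, None, 4]))  # 0.4375
```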

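3. Hits@10
Sort the f values as described above, then check for each testing triple whether the correct answer falls within the top ten of the ordering. If it does, increment a counter. Hits@10 is the number of triples whose correct answer is in the top ten divided by the total number of testing triples.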
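
As a sketch, Hits@10 can be computed directly from the list of 1-based ranks of the correct answers; the function name and the example ranks are illustrative only.

```python
def hits_at_k(ranks, k=10):
    """Fraction of test triples whose correct answer is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Five test triples; only the ranks 3 and 10 fall inside the top ten.
print(hits_at_k([100, 200, 60, 3, 10]))  # 0.4
```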

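Issues

When building negative samples for a 1-to-N relation, a randomly chosen replacement may produce a triple that is actually true, so a correct fact ends up being treated as a negative sample.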
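
One possible workaround, not taken from the original post or the reference code, is to reject any corrupted triple that already appears among the known facts. The sketch below assumes triples are ordered (h, l, t) and that the known triples fit in a Python set; the function name, the `max_tries` parameter, and the toy data are hypothetical.

```python
import random

def corrupt_tail_filtered(triple, entities, known_triples, rng, max_tries=100):
    """Replace the tail at random, skipping replacements that form a known true triple."""
    h, l, _ = triple
    for _ in range(max_tries):
        candidate = (h, l, rng.choice(entities))
        if candidate not in known_triples:
            return candidate
    return None  # every sampled replacement was a known fact

# Toy 1-to-N relation: alice has two children.
known = {("alice", "has_child", "bob"), ("alice", "has_child", "carol")}
entities = ["alice", "bob", "carol", "dave"]

neg = corrupt_tail_filtered(("alice", "has_child", "bob"), entities, known, random.Random(0))
print(neg)
```

Without the membership check, ("alice", "has_child", "carol") could be drawn as a negative even though it is a true fact.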
