Information Extraction (II): After a week of countless detours I finally reproduced Su Jianlin's BERT triple relation extraction model in TF. What did I actually learn?
Contents: Preface · Data Format and Task Goal · Overall Model Approach · Reproduction Code · Data Processing · Data Loading · Training Data Processing · Model Construction · Model Parameter Diagram · Conditional_LayerNormalization · SPO Extraction, Evaluation, and Model Saving · Model Training · Pitfalls · Possible Optimization Directions · Summary
Preface
First, the main course, as a tribute to Su Jianlin: 蘇劍林. (2020, Jan 03). 《用bert4keras做三元組抽取》[Blog post]. Retrieved from https://kexue.fm/archives/7161
I recommend reading Su's original article first; if his reasoning and code are already clear to you, this post may not add much. After studying that article I reproduced the baseline model with TF + Transformers and ran a large number of experiments on top of it. I never quite matched his result despite getting close, so this post is a retrospective of the whole process and of what I learned along the way.
Data Format and Task Goal
Data download: https://ai.baidu.com/broad/download?dataset=sked. Data format:
{ "text" : "查爾斯·阿蘭基斯(Charles Aránguiz),1989年4月17日出生于智利圣地亞哥,智利職業(yè)足球運(yùn)動員,司職中場,效力于德國足球甲級聯(lián)賽勒沃庫森足球俱樂部" ,
"spo_list" :
[ { "predicate" : "出生地" , "object_type" : "地點" , "subject_type" : "人物" , "object" : "圣地亞哥" , "subject" : "查爾斯·阿蘭基斯" } ,
{ "predicate" : "出生日期" , "object_type" : "Date" , "subject_type" : "人物" , "object" : "1989年4月17日" , "subject" : "查爾斯·阿蘭基斯" } ] }
Simply put: given a piece of text, we need to extract multiple S (subject), P (predicate), O (object) relation groups from it.
For example, "查爾斯·阿蘭基斯–出生日期–1989年4月17日" is one piece of information we need to extract. The P (the relation to be predicted) comes from a fixed set of 49 relation classes; see all_50_schemas for the full list.
Overall Model Approach
What makes this model design elegant:
The task would normally be split into two modules, 1. extracting the entities (both S and O) and 2. classifying the relation between them, which in principle needs at least two cooperating models. Su instead folds the relation classification implicitly into the O extraction step: when the model predicts O, it simultaneously predicts the relation P between O and S.
Pointer tagging: mark the start and end of each span, turning multi-span extraction into N binary classifications (N is the sequence length). With multiple classes this extends to stacked pointer tagging (C pointer networks, where C is the number of classes). Pointer tagging has in fact become a workhorse for unifying entity, relation, and event extraction.
Because one text can contain several SPO groups, and the S spans (or the O spans) may even overlap, the output layer uses a half-pointer, half-tagging sigmoid scheme (multi-label prediction of span starts and ends, similar to reading comprehension), which lets the model tag several S and O spans at once. A minimal decoding sketch follows after this list.
Conditional Layer Normalization: when predicting P and O we need to tell the model which S we are conditioning on, so that it learns that the PO prediction depends on S rather than, say, tagging any date it sees as a birth date. The internal mechanics are walked through in my code below (this part also took me a long time to get running). On balance the method turned out to have both benefits and drawbacks.
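To make the half-pointer, half-tagging idea concrete, here is a tiny decoding sketch of my own (not code from Su's post): per-token start and end probabilities come from independent sigmoids, so thresholding them can recover several spans from a single sequence.

import numpy as np

# Toy decoding of one sequence's pointer-tagging outputs (illustrative values only).
start_prob = np.array([0.1, 0.9, 0.2, 0.1, 0.8, 0.1])
end_prob   = np.array([0.1, 0.1, 0.7, 0.1, 0.1, 0.9])

starts = np.where(start_prob > 0.5)[0]
ends = np.where(end_prob > 0.5)[0]

spans = []
for s in starts:
    # pair each start with the nearest end position at or after it
    candidates = ends[ends >= s]
    if len(candidates) > 0:
        spans.append((int(s), int(candidates[0])))

print(spans)  # [(1, 2), (4, 5)]: two spans extracted from one sequence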
Reproduction Code
Data Processing
Data Loading
import numpy as np
import pandas as pd

def load_data(path):
    # one JSON object per line; collect the texts and their spo_list annotations
    text_list = []
    spo_list = []
    with open(path) as json_file:
        for i in json_file:
            text_list.append(eval(i)['text'])
            spo_list.append(eval(i)['spo_list'])
    return text_list, spo_list

def load_ps(path):
    # build predicate <-> id mappings from the schema file
    with open(path, 'r') as f:
        data = pd.DataFrame([eval(i) for i in f])['predicate']
    p2id = {}
    id2p = {}
    data = list(set(data))
    for i in range(len(data)):
        p2id[data[i]] = i
        id2p[i] = data[i]
    return p2id, id2p
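A minimal usage sketch, assuming the standard file names of the competition release (adjust the paths to your local copies):

# Hypothetical paths; the exact file names depend on how you unpack the dataset.
text_list, spo_list = load_data('train_data.json')
va_text_list, va_spo_list = load_data('dev_data.json')
p2id, id2p = load_ps('all_50_schemas')
print(len(p2id))  # 49 predicate classes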
Training Data Processing
The processing here follows the same idea as in my previous post, which has detailed code comments: 信息抽取(一)機器閱讀理解——樣本數(shù)據(jù)處理與Baseline模型搭建訓(xùn)練(2020語言與智能技術(shù)競賽)
Below are the details and tricks specific to this task:
Since one text can contain several S spans, iterate over all SPO groups of a sample and mark the head and tail positions of every S in a single 0/1 array.
For samples with several SPO groups, only one S is randomly chosen when labeling PO. The reason is simple: you cannot feed several S spans into the downstream model at once.
When sampling an S, randomly pick one S head position and then pick a matching tail from the set of all S tail positions. If the resulting span is a complete S, label all of its PO pairs; otherwise skip it and keep the sample as a negative example. This teaches the model that not every extracted S has a corresponding PO relation.
Because the token length is capped, samples where no S can be located are dropped at the end, while samples where no P can be located are kept, again as negative examples.
def proceed_data(text_list, spo_list, p2id, id2p, tokenizer, MAX_LEN):
    id_label = {}
    ct = len(text_list)
    input_ids = np.zeros((ct, MAX_LEN), dtype='int32')
    attention_mask = np.zeros((ct, MAX_LEN), dtype='int32')
    start_tokens = np.zeros((ct, MAX_LEN), dtype='int32')    # subject head positions (0/1)
    end_tokens = np.zeros((ct, MAX_LEN), dtype='int32')      # subject tail positions (0/1)
    send_s_po = np.zeros((ct, 2), dtype='int32')             # the sampled subject span fed to the PO stage
    object_start_tokens = np.zeros((ct, MAX_LEN, len(p2id)), dtype='int32')  # object heads, one channel per predicate
    object_end_tokens = np.zeros((ct, MAX_LEN, len(p2id)), dtype='int32')    # object tails, one channel per predicate
    invalid_index = []
    for k in range(ct):
        context_k = text_list[k].lower().replace(' ', '')
        enc_context = tokenizer.encode(context_k, max_length=MAX_LEN, truncation=True)
        if len(spo_list[k]) == 0:
            invalid_index.append(k)
            continue
        start = []
        end = []
        S_index = []
        for j in range(len(spo_list[k])):
            answers_text_k = spo_list[k][j]['subject'].lower().replace(' ', '')
            # char-level 0/1 mask of the subject span in the (space-stripped) text
            chars = np.zeros((len(context_k)))
            index = context_k.find(answers_text_k)
            chars[index:index + len(answers_text_k)] = 1
            # char offsets of every token (skip [CLS]; strip '##'; treat [UNK] as one char)
            offsets = []
            idx = 0
            for t in enc_context[1:]:
                w = tokenizer.decode([t])
                if '#' in w and len(w) > 1:
                    w = w.replace('#', '')
                if w == '[UNK]':
                    w = '。'
                offsets.append((idx, idx + len(w)))
                idx += len(w)
            # tokens overlapping the subject span
            toks = []
            for i, (a, b) in enumerate(offsets):
                sm = np.sum(chars[a:b])
                if sm > 0:
                    toks.append(i)
            input_ids[k, :len(enc_context)] = enc_context
            attention_mask[k, :len(enc_context)] = 1
            if len(toks) > 0:
                # +1 shifts the token index past [CLS]
                start_tokens[k, toks[0] + 1] = 1
                end_tokens[k, toks[-1] + 1] = 1
                start.append(toks[0] + 1)
                end.append(toks[-1] + 1)
                S_index.append(j)
        if len(start) > 0:
            start_np = np.array(start)
            end_np = np.array(end)
            # randomly sample one subject head and one tail at or after it
            start_ = np.random.choice(start_np)
            end_ = np.random.choice(end_np[end_np >= start_])
            send_s_po[k, 0] = start_
            send_s_po[k, 1] = end_
            s_index = start.index(start_)
            # only if the sampled (head, tail) pair is a real subject do we label its PO pairs;
            # otherwise the sample stays as a negative example
            if end_ == end[s_index]:
                for index in range(len(start)):
                    if start[index] == start_ and end[index] == end_:
                        object_text_k = spo_list[k][S_index[index]]['object'].lower().replace(' ', '')
                        predicate = spo_list[k][S_index[index]]['predicate']
                        p_id = p2id[predicate]
                        chars = np.zeros((len(context_k)))
                        index = context_k.find(object_text_k)
                        chars[index:index + len(object_text_k)] = 1
                        offsets = []
                        idx = 0
                        for t in enc_context[1:]:
                            w = tokenizer.decode([t])
                            if '#' in w and len(w) > 1:
                                w = w.replace('#', '')
                            if w == '[UNK]':
                                w = '。'
                            offsets.append((idx, idx + len(w)))
                            idx += len(w)
                        toks = []
                        for i, (a, b) in enumerate(offsets):
                            sm = np.sum(chars[a:b])
                            if sm > 0:
                                toks.append(i)
                        if len(toks) > 0:
                            id_label[p_id] = predicate
                            object_start_tokens[k, toks[0] + 1, p_id] = 1
                            object_end_tokens[k, toks[-1] + 1, p_id] = 1
        else:
            invalid_index.append(k)
    return input_ids, attention_mask, start_tokens, end_tokens, send_s_po, \
           object_start_tokens, object_end_tokens, invalid_index, id_label

def proceed_var_data(text_list, spo_list, tokenizer, MAX_LEN):
    ct = len(text_list)
    input_ids = np.zeros((ct, MAX_LEN), dtype='int32')
    attention_mask = np.zeros((ct, MAX_LEN), dtype='int32')
    for k in range(ct):
        context_k = text_list[k].lower().replace(' ', '')
        enc_context = tokenizer.encode(context_k, max_length=MAX_LEN, truncation=True)
        input_ids[k, :len(enc_context)] = enc_context
        attention_mask[k, :len(enc_context)] = 1
    return input_ids, attention_mask
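A sketch of how these functions might be wired together. The checkpoint name, the MAX_LEN value, and the invalid-sample filtering step are my assumptions about the surrounding script, not code from the post:

from transformers import BertTokenizer

MAX_LEN = 128  # assumed value
tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')  # assumed checkpoint

(input_ids, attention_mask, start_tokens, end_tokens, send_s_po,
 object_start_tokens, object_end_tokens, invalid_index, id_label) = proceed_data(
    text_list, spo_list, p2id, id2p, tokenizer, MAX_LEN)

# drop the samples where no subject span could be aligned (see the notes above)
keep = [i for i in range(len(input_ids)) if i not in set(invalid_index)]
input_ids, attention_mask = input_ids[keep], attention_mask[keep]
start_tokens, end_tokens, send_s_po = start_tokens[keep], end_tokens[keep], send_s_po[keep]
object_start_tokens, object_end_tokens = object_start_tokens[keep], object_end_tokens[keep]

va_input_ids, va_attention_mask = proceed_var_data(va_text_list, va_spo_list, tokenizer, MAX_LEN)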
Model Construction
The model matches the structure of the schematic referenced above and is the most basic baseline for this approach. I later tried different hidden layers and several alternative connection layers on top of it, none of which gave a meaningful improvement; details are at the end of the post.
Trick: to ease the class imbalance (positives are far rarer than negatives), the probabilities coming out of the sigmoid are raised to the n-th power. The idea is similar to focal loss but needs no extra hyperparameter tuning; see Su's original post for his explanation of how to choose n.
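As a side note of my own (not Su's explanation), the power has a simple inference-time reading: keeping the decision threshold at 0.5 on the powered output is the same as demanding a higher raw sigmoid probability.

# p**n > 0.5  is equivalent to  p > 0.5**(1/n), so the powered heads only fire
# on more confident raw probabilities.
for n in (1, 2, 4):
    print(n, round(0.5 ** (1 / n), 3))
# 1 0.5
# 2 0.707
# 4 0.841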
import tensorflow as tf
from transformers import TFBertModel

def extract_subject(inputs):
    """Gather the subject span's vector representation from output according to subject_ids."""
    output, subject_ids = inputs
    start = tf.gather(output, subject_ids[:, 0], axis=1, batch_dims=0)
    end = tf.gather(output, subject_ids[:, 1], axis=1, batch_dims=0)
    subject = tf.keras.layers.Concatenate(axis=2)([start, end])
    return subject[:, 0]

'''
output.shape        = (None, 128, 768)
subject_ids.shape   = (None, 2)
start.shape         = (None, None, 768)
subject.shape       = (None, None, 1536)
subject[:, 0].shape = (None, 1536)
Listing the shape of each variable should make the gather/concat step clear.
'''

def build_model_2(pretrained_path, config, MAX_LEN, p2id):
    ids = tf.keras.layers.Input((MAX_LEN,), dtype=tf.int32)
    att = tf.keras.layers.Input((MAX_LEN,), dtype=tf.int32)
    s_po_index = tf.keras.layers.Input((2,), dtype=tf.int32)

    config.output_hidden_states = True
    bert_model = TFBertModel.from_pretrained(pretrained_path, config=config, from_pt=True)
    # tuple-style output: (sequence_output, pooled_output, all hidden states)
    x, _, hidden_states = bert_model(ids, attention_mask=att)
    layer_1 = hidden_states[-1]

    # subject head/tail pointers (sigmoid, squared to sharpen the outputs)
    start_logits = tf.keras.layers.Dense(1, activation='sigmoid')(layer_1)
    start_logits = tf.keras.layers.Lambda(lambda x: x ** 2)(start_logits)
    end_logits = tf.keras.layers.Dense(1, activation='sigmoid')(layer_1)
    end_logits = tf.keras.layers.Lambda(lambda x: x ** 2)(end_logits)

    # condition the sequence representation on the sampled subject span
    subject_1 = extract_subject([layer_1, s_po_index])
    Normalization_1 = LayerNormalization(conditional=True)([layer_1, subject_1])

    # object head/tail pointers, one channel per predicate class
    op_out_put_start = tf.keras.layers.Dense(len(p2id), activation='sigmoid')(Normalization_1)
    op_out_put_start = tf.keras.layers.Lambda(lambda x: x ** 4)(op_out_put_start)
    op_out_put_end = tf.keras.layers.Dense(len(p2id), activation='sigmoid')(Normalization_1)
    op_out_put_end = tf.keras.layers.Lambda(lambda x: x ** 4)(op_out_put_end)

    model = tf.keras.models.Model(
        inputs=[ids, att, s_po_index],
        outputs=[start_logits, end_logits, op_out_put_start, op_out_put_end])
    model_2 = tf.keras.models.Model(
        inputs=[ids, att],
        outputs=[start_logits, end_logits])
    model_3 = tf.keras.models.Model(
        inputs=[ids, att, s_po_index],
        outputs=[op_out_put_start, op_out_put_end])
    return model, model_2, model_3
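A note on the three returned models (my reading of the functional graph, not something spelled out in the original post): they are different views over the same layer objects, so only model needs to be compiled and fitted; model_2 and model_3 automatically see the trained weights. A quick way to convince yourself, assuming pretrained_path and config already point at a Chinese BERT checkpoint:

# Illustration only: the three Keras models share layer instances, so training
# `model` also trains the sub-models used for two-stage inference.
model, model_2, model_3 = build_model_2(pretrained_path, config, MAX_LEN, p2id)
shared_ids = {id(l) for l in model.layers} & {id(l) for l in model_2.layers}
print(len(shared_ids), len(model_2.layers))  # every layer of model_2 also lives in model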
Model Parameter Diagram
Conditional_LayerNormalization
This part is lifted almost directly from the original source, with keras swapped for tf.keras and little else changed. The basic idea: two Dense layers transform the vector produced by extract_subject to obtain the beta and gamma of the LayerNormalization, and the parameters of that transformation are learned by the model. See the link above for an introduction to Conditional LayerNormalization.
import tensorflow.keras.backend as K
from tensorflow.keras import activations, initializers

class LayerNormalization(tf.keras.layers.Layer):
    """(Conditional) Layer Normalization.
    The hidden_* arguments are only used with conditional inputs (conditional=True).
    """
    def __init__(self,
                 center=True,
                 scale=True,
                 epsilon=None,
                 conditional=False,
                 hidden_units=None,
                 hidden_activation='linear',
                 hidden_initializer='glorot_uniform',
                 **kwargs):
        super(LayerNormalization, self).__init__(**kwargs)
        self.center = center
        self.scale = scale
        self.conditional = conditional
        self.hidden_units = hidden_units
        self.hidden_activation = activations.get(hidden_activation)
        self.hidden_initializer = initializers.get(hidden_initializer)
        self.epsilon = epsilon or 1e-12

    def compute_mask(self, inputs, mask=None):
        if self.conditional:
            masks = mask if mask is not None else []
            masks = [m[None] for m in masks if m is not None]
            if len(masks) == 0:
                return None
            else:
                return K.all(K.concatenate(masks, axis=0), axis=0)
        else:
            return mask

    def build(self, input_shape):
        super(LayerNormalization, self).build(input_shape)
        if self.conditional:
            shape = (input_shape[0][-1],)
        else:
            shape = (input_shape[-1],)
        if self.center:
            self.beta = self.add_weight(shape=shape, initializer='zeros', name='beta')
        if self.scale:
            self.gamma = self.add_weight(shape=shape, initializer='ones', name='gamma')
        if self.conditional:
            if self.hidden_units is not None:
                self.hidden_dense = tf.keras.layers.Dense(
                    units=self.hidden_units,
                    activation=self.hidden_activation,
                    use_bias=False,
                    kernel_initializer=self.hidden_initializer)
            # zero-initialized, so the condition starts out as a no-op offset on beta/gamma
            if self.center:
                self.beta_dense = tf.keras.layers.Dense(
                    units=shape[0], use_bias=False, kernel_initializer='zeros')
            if self.scale:
                self.gamma_dense = tf.keras.layers.Dense(
                    units=shape[0], use_bias=False, kernel_initializer='zeros')

    def call(self, inputs):
        """For conditional Layer Norm, inputs is a list whose second element is the condition."""
        if self.conditional:
            inputs, cond = inputs
            if self.hidden_units is not None:
                cond = self.hidden_dense(cond)
            for _ in range(K.ndim(inputs) - K.ndim(cond)):
                cond = K.expand_dims(cond, 1)
            if self.center:
                beta = self.beta_dense(cond) + self.beta
            if self.scale:
                gamma = self.gamma_dense(cond) + self.gamma
        else:
            if self.center:
                beta = self.beta
            if self.scale:
                gamma = self.gamma
        outputs = inputs
        if self.center:
            mean = K.mean(outputs, axis=-1, keepdims=True)
            outputs = outputs - mean
        if self.scale:
            variance = K.mean(K.square(outputs), axis=-1, keepdims=True)
            std = K.sqrt(variance + self.epsilon)
            outputs = outputs / std
            outputs = outputs * gamma
        if self.center:
            outputs = outputs + beta
        return outputs
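A small shape check for the conditional variant, just to make the data flow concrete. The tensors below are random stand-ins, not real BERT outputs:

# Conditional LayerNormalization: normalize a (batch, seq_len, hidden) tensor while
# letting a (batch, 2 * hidden) subject vector shift beta/gamma per example.
dummy_seq = tf.random.normal((2, 128, 768))    # stand-in for the BERT hidden states
dummy_subject = tf.random.normal((2, 1536))    # stand-in for the extract_subject output
out = LayerNormalization(conditional=True)([dummy_seq, dummy_subject])
print(out.shape)  # (2, 128, 768)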
SPO Extraction, Evaluation, and Model Saving
I wrote this part myself, and it is the most likely culprit for never reaching the reported score: my extraction and evaluation logic differs from the original article in places, so treat it as one possible approach rather than a reference implementation.
For the probability thresholds on S and O I used 0.5 and 0.4 respectively; both values still need tuning over several runs. Keep in mind that because the sigmoid outputs were raised to the 2nd and 4th powers, the predicted values skew small: a higher threshold buys precision but inevitably costs some recall.
class Metrics(tf.keras.callbacks.Callback):
    def __init__(self, model_2, model_3, id2tag, va_spo_list,
                 va_input_ids, va_attention_mask, tokenizer):
        super(Metrics, self).__init__()
        self.model_2 = model_2
        self.model_3 = model_3
        self.id2tag = id2tag
        self.va_input_ids = va_input_ids
        self.va_attention_mask = va_attention_mask
        self.va_spo_list = va_spo_list
        self.tokenizer = tokenizer

    def on_train_begin(self, logs=None):
        self.val_f1s = []
        self.best_val_f1 = 0

    def get_same_element_index(self, ob_list):
        return [i for (i, v) in enumerate(ob_list) if v == 1]

    def evaluate_data(self):
        question = []
        answer = []
        # stage 1: predict subject head/tail probabilities for the whole validation set
        Y1 = self.model_2.predict([self.va_input_ids, self.va_attention_mask])
        for i in range(len(Y1[0])):
            # gold triples, reduced to (S first char, S last char, P, O first char, O last char)
            for z in self.va_spo_list[i]:
                question.append((z['subject'][0], z['subject'][-1], z['predicate'],
                                 z['object'][0], z['object'][-1]))
            x_ = [self.tokenizer.decode([t]) for t in self.va_input_ids[i]]
            x1 = np.array(Y1[0][i] > 0.5, dtype='int32')
            x2 = np.array(Y1[1][i] > 0.5, dtype='int32')
            union = x1 + x2
            index_list = self.get_same_element_index(list(union))
            start = 0
            S_list = []
            while start + 1 < len(index_list):
                S_list.append((index_list[start], index_list[start + 1] + 1))
                start += 2
            # stage 2: for each candidate subject span, predict object spans per predicate
            for os_s, os_e in S_list:
                S = ''.join(x_[os_s:os_e])
                Y2 = self.model_3.predict([[self.va_input_ids[i]],
                                           [self.va_attention_mask[i]],
                                           np.array([[os_s, os_e]])])
                for m in range(len(self.id2tag)):
                    x3 = np.array(Y2[0][0][:, m] > 0.4, dtype='int32')
                    x4 = np.array(Y2[1][0][:, m] > 0.4, dtype='int32')
                    if sum(x3) > 0 and sum(x4) > 0:
                        predict = self.id2tag[m]
                        union = x3 + x4
                        index_list = self.get_same_element_index(list(union))
                        start = 0
                        P_list = []
                        while start + 1 < len(index_list):
                            P_list.append((index_list[start], index_list[start + 1] + 1))
                            start += 2
                        for os_s, os_e in P_list:
                            if os_e >= os_s:
                                P = ''.join(x_[os_s:os_e])
                                answer.append((S[0], S[-1], predict, P[0], P[-1]))
        Q = set(question)
        S = set(answer)
        f1 = 2 * len(Q & S) / (len(Q) + len(S))
        return f1

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        _val_f1 = self.evaluate_data()
        self.val_f1s.append(_val_f1)
        logs['val_f1'] = _val_f1
        if _val_f1 > self.best_val_f1:
            self.model.save_weights('./model_/02_f1={}_model.hdf5'.format(_val_f1))
            self.best_val_f1 = _val_f1
            print("best f1: {}".format(self.best_val_f1))
        else:
            print("val f1: {}, but not the best f1".format(_val_f1))
        return
Model Training
K.clear_session()

model, model_2, model_3 = build_model_2(pretrained_path, config, MAX_LEN, p2id)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)

# the four outputs are Lambda layers, hence the 'lambda*' keys in the loss dict
model.compile(loss={'lambda': new_loss,
                    'lambda_1': new_loss,
                    'lambda_2': new_loss,
                    'lambda_3': new_loss},
              optimizer=optimizer)

model.fit([input_ids, attention_mask, send_s_po],
          [start_tokens, end_tokens, object_start_tokens, object_end_tokens],
          epochs=20, batch_size=32,
          callbacks=[Metrics(model_2, model_3, id2tag, va_spo_list,
                             va_input_ids, va_attention_mask, tokenizer)])
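For completeness, here is a rough single-text inference sketch of my own, mirroring (and simplifying) the decoding logic in Metrics.evaluate_data. It reuses proceed_var_data and the 0.5 / 0.4 thresholds, and pairs each start with the nearest end, which is not exactly what the evaluation code does:

def extract_spo(text, model_2, model_3, tokenizer, id2p, MAX_LEN=128):
    # encode the sentence the same way as the training data
    ids, att = proceed_var_data([text], [[]], tokenizer, MAX_LEN)
    tokens = [tokenizer.decode([t]) for t in ids[0]]
    s_start, s_end = model_2.predict([ids, att])
    spo = []
    starts = np.where(s_start[0, :, 0] > 0.5)[0]
    ends = np.where(s_end[0, :, 0] > 0.5)[0]
    for s in starts:
        cand = ends[ends >= s]
        if len(cand) == 0:
            continue
        e = cand[0]
        subject = ''.join(tokens[s:e + 1])
        # stage 2: condition on this subject span and predict object spans per predicate
        o_start, o_end = model_3.predict([ids, att, np.array([[s, e]], dtype='int32')])
        for m in range(len(id2p)):
            o_starts = np.where(o_start[0][:, m] > 0.4)[0]
            o_ends = np.where(o_end[0][:, m] > 0.4)[0]
            for o_s in o_starts:
                o_cand = o_ends[o_ends >= o_s]
                if len(o_cand) > 0:
                    obj = ''.join(tokens[o_s:o_cand[0] + 1])
                    spo.append((subject, id2p[m], obj))
    return spo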
The final F1 only reached 0.772, still some distance from the 0.822 in the original article. The main suspects:
My SPO extraction on the validation set is flawed: it is incomplete and can produce wrong spans.
When computing F1, the extracted spans are obtained by decoding token ids directly rather than by slicing the original text by index as in the original article, so some characters fail to match.
Pitfalls
For a single S, be sure to label all of its OP information in the C*N matrix; compared with labeling only one SPO group per sample, this gave a clear improvement (F1: 0.65 to 0.77).
The loss of this model, in particular the loss over the OP matrix, must be redefined. Using loss='binary_crossentropy' as-is makes the overall loss tiny, so the gradients are too small, updates are slow, and the model never trains fully. The redefined loss is below.
def new_loss(true, pred):
    true = tf.cast(true, tf.float32)
    # sum over all positions instead of averaging, so the sparse positive labels
    # still contribute a gradient of usable magnitude
    loss = K.sum(K.binary_crossentropy(true, pred))
    return loss
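A toy comparison of why the reduction matters here (my own illustration): with the default mean reduction, the handful of positive positions is averaged away over MAX_LEN x num_predicates cells, so the loss and its gradients become tiny; summing keeps them at a workable scale.

# Compare mean vs. sum reduction on a sparse 0/1 target like the OP matrix.
true = np.zeros((1, 128, 49), dtype='float32')
true[0, 10, 3] = 1.0                      # a single positive label
pred = np.full((1, 128, 49), 0.1, dtype='float32')

bce = K.binary_crossentropy(tf.constant(true), tf.constant(pred))
print(float(K.mean(bce)))  # ~0.105: dominated by the easy negatives
print(float(K.sum(bce)))   # ~663:   the positive's error is not averaged away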
The BERT hidden layer fed into Conditional_LayerNorm needs to be the same layer used to predict S. My guess at the reason: when the model fits S, the outermost layer (the one closest to the S heads) best expresses S's meaning in the sentence, so applying Conditional_LayerNorm on that same layer before inferring PO carries over the most complete information. I tried predicting S from the last hidden layer and PO from the second-to-last, and it worked poorly.
Do not stack a convolution after Conditional_LayerNorm. I tried the reading-comprehension trick of running convolutions over the hidden states to extract entities, but it does not seem to play well with Conditional_LayerNorm.
Possible Optimization Directions
Add a position embedding and a self-attention block after the Conditional_LayerNorm (a rough sketch of my own follows after this list).
Instead of randomly selecting one S (subject), enumerate every labeled subject of a sample to build the training set.
Replace pointer tagging with multi-head tagging; see the corresponding paper for details.
Summary
A promise first: I will redo the three optimization directions above, as well as the F1 computation and SPO extraction discussed earlier, so the real summary will come in one go once that work is finished.
Full code: https://github.com/zhengyanzhao1997/TF-NLP-model/blob/main/model/train/Three_relation_extract.py
References:
蘇劍林. (2020, Jan 03). 《用bert4keras做三元組抽取》[Blog post]. Retrieved from https://kexue.fm/archives/7161
一人之力,刷爆三路榜單!信息抽取競賽奪冠經(jīng)驗分享