Fine-tuning models with sentence_transformers
The base model used here is uer/sbert-base-chinese-nli · Hugging Face.
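As a quick sanity check, the pretrained model can be loaded directly by its Hugging Face name. A minimal sketch, assuming the sentence-transformers package is installed:

```python
from sentence_transformers import SentenceTransformer

# Download and load the pretrained Chinese SBERT model from the Hugging Face hub
model = SentenceTransformer('uer/sbert-base-chinese-nli')

# encode() takes a list of sentences and returns one vector per sentence
embeddings = model.encode(['你好世界'])
print(embeddings.shape)  # expected: (1, 768) for a BERT-base backbone
```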
Fine-tuning a model with SentenceTransformer
Official sentence_transformers documentation: SentenceTransformers Documentation
Fine-tuning the model with your own data
The following code fine-tunes a sentence_transformers model on your own training data (the model computes sentence similarity):
```python
from torch.utils.data import DataLoader
from sentence_transformers import (SentenceTransformer, SentencesDataset,
                                   InputExample, evaluation, losses, util)

class SBert:
    def build_train_data(self, o1, o2, n1, n2, train_size):
        # Positive pairs are labeled 1.0, negative pairs 0.0
        train_data = []
        for i in range(train_size):
            train_data.append(InputExample(texts=[o1[i], o2[i]], label=1.0))
            train_data.append(InputExample(texts=[n1[i], n2[i]], label=0.0))
        return train_data

    def build_evaluation_data(self, o1, o2, n1, n2, train_size, eval_size):
        # Held-out pairs: positives first, then negatives
        s1 = list(o1[train_size:])
        s2 = list(o2[train_size:])
        s1.extend(list(n1[train_size:]))
        s2.extend(list(n2[train_size:]))
        scores = [1.0] * eval_size + [0.0] * eval_size
        return evaluation.EmbeddingSimilarityEvaluator(s1, s2, scores)

    def callback(self, score, epoch, steps):
        print('score:{}, epoch:{}, steps:{}'.format(score, epoch, steps))

    def train(self):
        # 1. Load positive and negative samples; o1 holds the canonical
        #    questions, o2 their paraphrases
        o1, o2 = self.get_act_data()
        n1, n2 = self.get_neg_data()

        # 2. Split into train/eval and build the training examples
        train_size = int(len(o1) * 0.8)
        eval_size = len(o1) - train_size
        train_data = self.build_train_data(o1, o2, n1, n2, train_size)

        # 3. Build the evaluator from the held-out data
        evaluator = self.build_evaluation_data(o1, o2, n1, n2, train_size, eval_size)

        # 4. The model to fine-tune
        model = SentenceTransformer('path/to/base/model')

        # 5. Wrap the examples in a DataLoader and choose a loss
        train_dataset = SentencesDataset(train_data, model)
        train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=8)
        train_loss = losses.CosineSimilarityLoss(model)

        # 6. Fine-tune
        model.fit(train_objectives=[(train_dataloader, train_loss)],
                  epochs=1,
                  warmup_steps=100,
                  evaluator=evaluator,
                  evaluation_steps=100,
                  output_path='path/to/save/fine-tuned/model',
                  save_best_model=True,
                  callback=self.callback)
```
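The `get_act_data` and `get_neg_data` helpers above are user-supplied loaders that the original post does not show. A hypothetical sketch of the format they are expected to return: two parallel lists of sentences, so that entries sharing an index form a pair.

```python
def get_act_data(self):
    # Hypothetical loader: o1[i] is a canonical question,
    # o2[i] is a paraphrase of it (a positive pair)
    o1 = ['怎么重置密码', '如何开具发票']
    o2 = ['密码忘了怎么改', '发票要怎么开']
    return o1, o2

def get_neg_data(self):
    # Hypothetical loader: n1[i] and n2[i] are unrelated
    # questions (a negative pair)
    n1 = ['怎么重置密码', '如何开具发票']
    n2 = ['今天天气怎么样', '附近有什么餐厅']
    return n1, n2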
Using the fine-tuned model with sentence_transformers looks like this:

```python
# 1. Load the fine-tuned model
model = SentenceTransformer('path/to/fine-tuned/model')

# 2. Encode the sentences (encode expects a list of strings)
o1_emb = model.encode(['數據list', '求求一定要好運啊'])
o2_emb = model.encode(['一定要是列表', '我絕對可以好運'])

# 3. util.cos_sim returns a (len(o1), len(o2)) similarity matrix,
#    so the score for each index-aligned pair sits on the diagonal
cosine_score0 = util.cos_sim(o1_emb, o2_emb)
cosine_score = []
for i in range(len(cosine_score0)):
    cosine_score.append(cosine_score0[i][i].numpy().tolist())
```
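Because `util.cos_sim` returns a torch tensor, the per-pair scores can also be pulled off the diagonal in one call rather than in a Python loop; a small equivalent sketch, reusing `cosine_score0` from the block above:

```python
import torch

# Equivalent one-liner: take the diagonal of the similarity matrix
cosine_score = torch.diagonal(cosine_score0).tolist()
```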
Adding layers to the model

A SentenceTransformer can also be assembled from individual modules, for example by stacking a Dense projection on top of the pooled output:

```python
from torch import nn
from sentence_transformers import SentenceTransformer, models

# Transformer backbone -> mean pooling -> extra dense projection layer
word_embedding_model = models.Transformer('bert-base-uncased', max_seq_length=256)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
dense_model = models.Dense(in_features=pooling_model.get_sentence_embedding_dimension(),
                           out_features=256,
                           activation_function=nn.Tanh())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model, dense_model])
```
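The composed model behaves like one loaded by name, so it can be fine-tuned with `model.fit` exactly as shown earlier; the Dense module simply projects the pooled 768-dimensional embedding down to 256 dimensions. A minimal usage sketch:

```python
# The dense layer reduces the sentence embedding from 768 to 256 dimensions
embeddings = model.encode(['This is a test sentence'])
print(embeddings.shape)  # expected: (1, 256)
```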
Summary

That covers fine-tuning models with sentence_transformers; hopefully this article helps you solve the problems you run into.