BERT: A Language Model That Understands Context
BERT stands for Bidirectional Encoder Representations from Transformers. It is a language representation model that Google trained in an unsupervised way on large amounts of unlabeled text, and its architecture is the Encoder from the Transformer.
BERT is the Encoder from the Transformer, just with many layers stacked on top of each other (image source)
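To make "many layers" concrete, here is a minimal sketch using the same `transformers` library the rest of this post relies on; it only downloads the model configuration, and the numbers in the comments are the published BERT-base hyperparameters.

```python
from transformers import BertConfig

# Inspect the architecture of BERT-base without downloading the full weights.
config = BertConfig.from_pretrained("bert-base-uncased")
print(config.num_hidden_layers)    # 12 stacked Encoder layers
print(config.num_attention_heads)  # 12 self-attention heads per layer
print(config.hidden_size)          # 768-dimensional hidden states
```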
In the past, different NLP tasks usually called for different language models (LMs), and designing and benchmarking each of them cost considerable manpower, time, and compute. BERT was born against this background: a single pre-trained model can be applied to many NLP tasks and then fine-tuned for each downstream task. Fine-tuning BERT to solve a downstream task takes five simple steps:

1. Prepare the raw text data
2. Convert the raw text into a BERT-compatible input format
3. Build the downstream-task model by fine-tuning BERT
4. Train the downstream-task model
5. Run inference on new samples
So how do we actually use BERT? Thanks to the open-source spirit, its authors have already released the pre-trained models, so we only need to load one with TensorFlow or PyTorch.
```python
import torch
from transformers import BertTokenizer
from IPython.display import clear_output

PRETRAINED_MODEL_NAME = "bert-base-chinese"  # the Chinese BERT-BASE pre-trained model

# get the tokenizer used by this pre-trained model
tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)

clear_output()
```

The code above selects BERT-base, which has 12 layers. You can of course find more BERT pre-trained models in the Hugging Face repository, for example:
- bert-base-chinese
- bert-base-uncased
- bert-base-cased
- bert-base-german-cased
- bert-base-multilingual-uncased
- bert-base-multilingual-cased
- bert-large-cased
- bert-large-uncased
- bert-large-uncased-whole-word-masking
- bert-large-cased-whole-word-masking
These models differ mainly in:
- the language of the text used during pre-training
- whether the text is cased or uncased (see the sketch after this list)
- the number of layers
- whether pre-training masks individual wordpieces or whole words
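As a small illustration of the cased/uncased difference, here is a minimal sketch (the example sentence is made up): the uncased tokenizer lowercases the text before splitting it into wordpieces, while the cased tokenizer preserves the original casing, so the same sentence produces different tokens.

```python
from transformers import BertTokenizer

uncased = BertTokenizer.from_pretrained("bert-base-uncased")
cased = BertTokenizer.from_pretrained("bert-base-cased")

sentence = "Transformers changed NLP."
print(uncased.tokenize(sentence))  # everything is lowercased before wordpiece splitting
print(cased.tokenize(sentence))    # original casing is preserved, and the vocabulary differs
```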
Next, I'll walk through a simple sentiment classification task so you can practice fine-tuning BERT.
1. Prepare the raw text data
First, import the libraries we will need:
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
import torch
from torch import nn, optim  # needed by the model and training code below
import transformers as tfs
import warnings

warnings.filterwarnings('ignore')
```

Then load the dataset. This post uses SST-2, a sentiment analysis dataset released by Stanford University and built from movie reviews.
```python
train_df = pd.read_csv(
    'https://github.com/clairett/pytorch-sentiment-classification/raw/master/data/SST2/train.tsv',
    delimiter='\t', header=None)
train_set = train_df[:3000]  # take the first 3000 rows as our dataset
print("Train set shape:", train_set.shape)
train_set[1].value_counts()  # check the label distribution
```

The output is:
```
Train set shape: (3000, 2)
1    1565
0    1435
Name: 1, dtype: int64
```

As you can see, positive and negative labels are split roughly in half.
Here is a sample of the dataset:
```
                                                      0  1
0     a stirring , funny and finally transporting re...  1
1     apparently reassembled from the cutting room f...  0
2     they presume their audience wo n't sit still f...  0
3     this is a visually stunning rumination on love...  1
4     jonathan parker 's bartleby should have been t...  1
...                                                 ... ..
6915  painful , horrifying and oppressively tragic ,...  1
6916  take care is nicely performed by a quintet of ...  0
6917  the script covers huge , heavy topics in a bla...  0
6918  a seriously bad film with seriously warped log...  0
6919  a deliciously nonsensical comedy about a city ...  1
```

2. Convert the raw text into a BERT-compatible input format
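Before batching, it helps to see what "BERT-compatible" actually means. Here is a minimal sketch (in this post the real conversion happens later, inside the model's `forward` via `batch_encode_plus`; `pad_to_max_length` is the older API spelling used by the code below): the tokenizer adds the special `[CLS]` and `[SEP]` tokens, maps wordpieces to vocabulary ids, and returns an attention mask that marks padding.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer.encode_plus(
    "a stirring , funny and finally transporting movie",
    add_special_tokens=True,   # prepend [CLS] and append [SEP]
    max_length=16,
    pad_to_max_length=True,    # pad up to max_length, matching the code later in the post
)
print(encoded["input_ids"])       # vocabulary ids: [CLS] is 101, [SEP] is 102, padding is 0
print(encoded["attention_mask"])  # 1 for real tokens, 0 for padding
```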
We now reshape the data into batches of size 64 so the model can be trained with mini-batch gradient descent:
```python
sentences = train_set[0].values
targets = train_set[1].values
train_inputs, test_inputs, train_targets, test_targets = train_test_split(sentences, targets)

batch_size = 64
batch_count = int(len(train_inputs) / batch_size)
batch_train_inputs, batch_train_targets = [], []
for i in range(batch_count):
    batch_train_inputs.append(train_inputs[i * batch_size: (i + 1) * batch_size])
    batch_train_targets.append(train_targets[i * batch_size: (i + 1) * batch_size])
```

3. Build the downstream-task model by fine-tuning BERT
Here we fine-tune: BERT and a linear classification layer are trained together, and backpropagation updates the parameters of both, so the BERT weights adapt to this classification task.
```python
class BertClassificationModel(nn.Module):
    def __init__(self):
        super(BertClassificationModel, self).__init__()
        model_class, tokenizer_class, pretrained_weights = (tfs.BertModel, tfs.BertTokenizer, 'bert-base-uncased')
        self.tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
        self.bert = model_class.from_pretrained(pretrained_weights)
        self.dense = nn.Linear(768, 2)  # BERT's hidden size is 768; 2 output units for binary classification

    def forward(self, batch_sentences):
        batch_tokenized = self.tokenizer.batch_encode_plus(batch_sentences, add_special_tokens=True,
                                                           max_length=66,
                                                           pad_to_max_length=True)  # tokenize, add special tokens, pad
        input_ids = torch.tensor(batch_tokenized['input_ids'])
        attention_mask = torch.tensor(batch_tokenized['attention_mask'])
        bert_output = self.bert(input_ids, attention_mask=attention_mask)
        bert_cls_hidden_state = bert_output[0][:, 0, :]  # hidden state corresponding to the [CLS] token
        linear_output = self.dense(bert_cls_hidden_state)
        return linear_output
```

4. Train the downstream-task model
```python
# train the model
epochs = 3
lr = 0.01
print_every_batch = 5
bert_classifier_model = BertClassificationModel()
optimizer = optim.SGD(bert_classifier_model.parameters(), lr=lr, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(epochs):
    print_avg_loss = 0
    for i in range(batch_count):
        inputs = batch_train_inputs[i]
        labels = torch.tensor(batch_train_targets[i])
        optimizer.zero_grad()
        outputs = bert_classifier_model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        print_avg_loss += loss.item()
        if i % print_every_batch == (print_every_batch - 1):
            print("Batch: %d, Loss: %.4f" % ((i + 1), print_avg_loss / print_every_batch))
            print_avg_loss = 0
```

5. Run inference on new samples
```python
# evaluate the trained model
total = len(test_inputs)
hit = 0
with torch.no_grad():
    for i in range(total):
        outputs = bert_classifier_model([test_inputs[i]])
        _, predicted = torch.max(outputs, 1)
        if predicted == test_targets[i]:
            hit += 1
print("Accuracy: %.2f%%" % (hit / total * 100))
```

The result is:
```
Accuracy: 82.27%
```

As you can see, the fine-tuned model performs quite decently.
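As a quick usage example, here is a sketch that assumes the training cell above has already run: you can feed brand-new sentences straight to the trained classifier. Calling `eval()` first disables dropout so the predictions are deterministic; the example sentences are made up.

```python
bert_classifier_model.eval()  # disable dropout for deterministic inference
new_sentences = ["a thoroughly enjoyable and heartfelt film",
                 "dull , predictable and far too long"]
with torch.no_grad():
    logits = bert_classifier_model(new_sentences)
    predictions = torch.argmax(logits, dim=1)
print(predictions.tolist())  # 1 = positive, 0 = negative
```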
Alright, that's all for this blog post.
I'm Anthony. See you next time.