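PT (PyTorch) BERT: Multi-class news text classification on the UCI news dataset with the Transformer-BERT algorithm in the torch framework (feature encoding + BERT as the text encoder + classifier, with real-time model saving)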
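Contents
Multi-class news text classification on the UCI news dataset with Transformer-BERT in the torch framework (feature encoding + BERT as the text encoder + classifier, with real-time model saving)
# 1. Define the dataset
# 2. Data preprocessing
# 2.1. Select features: keep only the title (TITLE) and category (CATEGORY) columns
# 2.2. Drop null values
# 2.3. [Categorical] feature encoding: convert the category information into numeric labels
# 2.4. Dataset normalization: convert the data into torch tensors the model accepts, for training or inference
# 3. Model training and inference
# 3.1. Split the dataset
# 3.2. Wrap the data as a torch dataset and feed it to a data loader: the batch size and maximum sequence length must be set
# 3.3. Define the model, loss, and optimizer
# Use the BERT model as the encoder and add a fully connected layer for classification
# 3.4. Model training and real-time saving
# 4. Model inference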
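Multi-class news text classification on the UCI news dataset with Transformer-BERT in the torch framework (feature encoding + BERT as the text encoder + classifier, with real-time model saving)
# 1. Define the dataset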
| ID | TITLE | URL | PUBLISHER | CATEGORY | STORY | HOSTNAME | TIMESTAMP |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Fed official says weak data caused by weather, should not slow taper | http://www.latimes.com/business/money/la-fi-mo-federal-reserve-plosser-stimulus-economy-20140310,0,1312750.story?track=rss | Los Angeles Times | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | www.latimes.com | 1.39447E+12 |
| 2 | Fed's Charles Plosser sees high bar for change in pace of tapering | http://www.livemint.com/Politics/H2EvwJSK2VE6OF7iK1g3PP/Feds-Charles-Plosser-sees-high-bar-for-change-in-pace-of-ta.html | Livemint | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | www.livemint.com | 1.39447E+12 |
| 3 | US open: Stocks fall after Fed official hints at accelerated tapering | http://www.ifamagazine.com/news/us-open-stocks-fall-after-fed-official-hints-at-accelerated-tapering-294436 | IFA Magazine | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | www.ifamagazine.com | 1.39447E+12 |
| 4 | Fed risks falling 'behind the curve', Charles Plosser says | http://www.ifamagazine.com/news/fed-risks-falling-behind-the-curve-charles-plosser-says-294430 | IFA Magazine | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | www.ifamagazine.com | 1.39447E+12 |
| 5 | Fed's Plosser: Nasty Weather Has Curbed Job Growth | http://www.moneynews.com/Economy/federal-reserve-charles-plosser-weather-job-growth/2014/03/10/id/557011 | Moneynews | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | www.moneynews.com | 1.39447E+12 |
| 6 | Plosser: Fed May Have to Accelerate Tapering Pace | http://www.nasdaq.com/article/plosser-fed-may-have-to-accelerate-tapering-pace-20140310-00371 | NASDAQ | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | www.nasdaq.com | 1.39447E+12 |
| 7 | Fed's Plosser: Taper pace may be too slow | http://www.marketwatch.com/story/feds-plosser-taper-pace-may-be-too-slow-2014-03-10?reflink=MW_news_stmp | MarketWatch | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | www.marketwatch.com | 1.39447E+12 |
| 8 | Fed's Plosser expects US unemployment to fall to 6.2% by the end of 2014 | http://www.fxstreet.com/news/forex-news/article.aspx?storyid=23285020-b1b5-47ed-a8c4-96124bb91a39 | FXstreet.com | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | www.fxstreet.com | 1.39447E+12 |
| 9 | US jobs growth last month hit by weather:Fed President Charles Plosser | http://economictimes.indiatimes.com/news/international/business/us-jobs-growth-last-month-hit-by-weatherfed-president-charles-plosser/articleshow/31788000.cms | Economic Times | b | ddUyU0VZz0BRneMioxUPQVP6sIxvM | economictimes.indiatimes.com | 1.39447E+12 |
| 10 | ECB unlikely to end sterilisation of SMP purchases - traders | http://www.iii.co.uk/news-opinion/reuters/news/152615 | Interactive Investor | b | dPhGU51DcrolUIMxbRm0InaHGA2XM | www.iii.co.uk | 1.39447E+12 |
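
A table like the one above could be read into a DataFrame with pandas. The sketch below is only an illustration: it parses two abbreviated rows from an in-memory string so that it runs anywhere, and the column subset is chosen for brevity. In a real run, `pd.read_csv` would be pointed at the downloaded dataset file, whose name and location are not given in this article.

```python
import io

import pandas as pd

# Two abbreviated rows taken from the table above (only four of the eight columns).
# A real run would pass the path of the downloaded CSV file to pd.read_csv instead.
sample_csv = io.StringIO(
    "ID,TITLE,PUBLISHER,CATEGORY\n"
    '1,"Fed official says weak data caused by weather, should not slow taper",Los Angeles Times,b\n'
    "6,Plosser: Fed May Have to Accelerate Tapering Pace,NASDAQ,b\n"
)

df = pd.read_csv(sample_csv)
print(df.shape)
print(df.columns.tolist())
```

The quoted first title shows why a CSV parser is preferable to splitting on commas by hand: the comma inside the headline stays part of the TITLE field.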
# 2. Data preprocessing
# 2.1. Select features: keep only the title (TITLE) and category (CATEGORY) columns
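
Column selection of this kind is a one-liner in pandas. The following is a minimal sketch on a two-row frame built by hand; the variable name `df` is an assumption, not taken from the original code.

```python
import pandas as pd

# Small hand-built frame standing in for the full dataset.
df = pd.DataFrame(
    {
        "ID": [1, 2],
        "TITLE": [
            "Fed official says weak data caused by weather, should not slow taper",
            "Fed's Charles Plosser sees high bar for change in pace of tapering",
        ],
        "PUBLISHER": ["Los Angeles Times", "Livemint"],
        "CATEGORY": ["b", "b"],
    }
)

# Keep only the text column and the label column.
df = df[["TITLE", "CATEGORY"]]
print(df)
```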
| | TITLE | CATEGORY |
| --- | --- | --- |
| 0 | Fed official says weak data caused by weather,... | b |
| 1 | Fed's Charles Plosser sees high bar for change... | b |
| 2 | US open: Stocks fall after Fed official hints ... | b |
| 3 | Fed risks falling 'behind the curve', Charles ... | b |
| 4 | Fed's Plosser: Nasty Weather Has Curbed Job Gr... | b |
| ... | ... | ... |
| 422414 | Surgeons to remove 4-year-old's rib to rebuild... | m |
| 422415 | Boy to have surgery on esophagus after battery... | m |
| 422416 | Child who swallowed battery to have reconstruc... | m |
| 422417 | Phoenix boy undergoes surgery to repair throat... | m |
| 422418 | Phoenix boy undergoes surgery to repair throat... | m |

[422419 rows x 2 columns]

# 2.2. Drop null values
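
Rows with a missing title or category can be dropped with `DataFrame.dropna`. This is a small sketch; the two rows containing `None` are made up for illustration and do not come from the dataset.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "TITLE": [
            "Fed's Plosser: Taper pace may be too slow",
            None,  # hypothetical row with a missing title
            "Plosser: Fed May Have to Accelerate Tapering Pace",
        ],
        "CATEGORY": ["b", "b", None],  # hypothetical missing label in the last row
    }
)

# Drop every row where either column is null, then renumber the index.
clean = df.dropna(subset=["TITLE", "CATEGORY"]).reset_index(drop=True)
print(len(df), "->", len(clean))
```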
| | TITLE | CATEGORY |
| --- | --- | --- |
| 0 | Fed official says weak data caused by weather,... | b |
| 1 | Fed's Charles Plosser sees high bar for change... | b |
| 2 | US open: Stocks fall after Fed official hints ... | b |
| 3 | Fed risks falling 'behind the curve', Charles ... | b |
| 4 | Fed's Plosser: Nasty Weather Has Curbed Job Gr... | b |
| ... | ... | ... |
| 422414 | Surgeons to remove 4-year-old's rib to rebuild... | m |
| 422415 | Boy to have surgery on esophagus after battery... | m |
| 422416 | Child who swallowed battery to have reconstruc... | m |
| 422417 | Phoenix boy undergoes surgery to repair throat... | m |
| 422418 | Phoenix boy undergoes surgery to repair throat... | m |

[422419 rows x 2 columns]

# 2.3. [Categorical] feature encoding: convert the category information into numeric labels
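
One way to obtain such numeric labels is to sort the distinct category codes and map each one to its position. The sketch below assumes the four category codes b, e, m and t; under that assumption b becomes 0 and m becomes 2. The original code may have used a library encoder instead, and the three placeholder headlines are invented.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "TITLE": [
            "Fed's Plosser: Taper pace may be too slow",
            "placeholder sci/tech headline",       # hypothetical
            "placeholder entertainment headline",  # hypothetical
            "placeholder health headline",         # hypothetical
        ],
        "CATEGORY": ["b", "t", "e", "m"],
    }
)

# Sort the distinct codes so the mapping is deterministic: b->0, e->1, m->2, t->3.
categories = sorted(df["CATEGORY"].unique())
label_map = {code: index for index, code in enumerate(categories)}

df["CATEGORY"] = df["CATEGORY"].map(label_map)
print(label_map)
print(df["CATEGORY"].tolist())
```

Keeping `label_map` around is useful later, because the predicted class index can be mapped back to its category code.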
| | TITLE | CATEGORY |
| --- | --- | --- |
| 0 | Fed official says weak data caused by weather,... | 0 |
| 1 | Fed's Charles Plosser sees high bar for change... | 0 |
| 2 | US open: Stocks fall after Fed official hints ... | 0 |
| 3 | Fed risks falling 'behind the curve', Charles ... | 0 |
| 4 | Fed's Plosser: Nasty Weather Has Curbed Job Gr... | 0 |
| ... | ... | ... |
| 422414 | Surgeons to remove 4-year-old's rib to rebuild... | 2 |
| 422415 | Boy to have surgery on esophagus after battery... | 2 |
| 422416 | Child who swallowed battery to have reconstruc... | 2 |
| 422417 | Phoenix boy undergoes surgery to repair throat... | 2 |
| 422418 | Phoenix boy undergoes surgery to repair throat... | 2 |

[422419 rows x 2 columns]

# 2.4. Dataset normalization: convert the data into torch tensors the model accepts, for training or inference
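
The conversion to tensors is typically done with a BERT tokenizer, which adds the special tokens, pads or truncates to a fixed length, and returns `input_ids` plus an attention mask. The sketch below is hedged in two ways: it assumes a transformers 4.x-style `BertTokenizer` that accepts `vocab_file`, and it builds a toy vocabulary so that it runs offline. A real run would more likely load a pretrained vocabulary with `BertTokenizer.from_pretrained("bert-base-uncased")`, where `[CLS]` has id 101 and `[PAD]` has id 0.

```python
import os
import tempfile

import torch
from transformers import BertTokenizer

# Toy vocabulary (an assumption for offline use); ids follow the list order.
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]",
         "fed", "official", "says", "weak", "data",
         "plosser", "taper", "pace", "may", "be", "too", "slow"]
vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
with open(vocab_path, "w", encoding="utf-8") as handle:
    handle.write("\n".join(vocab) + "\n")

tokenizer = BertTokenizer(vocab_file=vocab_path, do_lower_case=True)

titles = ["Fed official says weak data", "Plosser taper pace may be too slow"]
labels = torch.tensor([0, 0])

# Pad short titles and truncate long ones to exactly max_length tokens.
encoded = tokenizer(
    titles,
    padding="max_length",
    truncation=True,
    max_length=8,
    return_tensors="pt",
)
input_ids = encoded["input_ids"]
attention_masks = encoded["attention_mask"]

print(input_ids)
print(attention_masks)
print(labels)
```

The attention mask is 1 for real tokens and 0 for padding, so the model can ignore the padded positions.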
input_ids tensor([[ 101, 7349, 2880, ..., 0, 0, 0],
        [ 101, 7349, 1005, ..., 0, 0, 0],
        [ 101, 2149, 2330, ..., 0, 0, 0],
        ...,
        [ 101, 2878, 1011, ..., 0, 0, 0],
        [ 101, 2878, 1011, ..., 0, 0, 0],
        [ 101, 20077, 1996, ..., 0, 0, 0]])

attention_masks tensor([[1, 1, 1, ..., 0, 0, 0],
        [1, 1, 1, ..., 0, 0, 0],
        [1, 1, 1, ..., 0, 0, 0],
        ...,
        [1, 1, 1, ..., 0, 0, 0],
        [1, 1, 1, ..., 0, 0, 0],
        [1, 1, 1, ..., 0, 0, 0]])

labels tensor([0, 0, 0, ..., 2, 2, 2])

# 3. Model training and inference
# 3.1. Split the dataset
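
A train/test split over the encoded tensors can be sketched with a seeded random permutation of the row indices. The 80/20 ratio, the seed and the toy tensors are assumptions; the article does not state which split or helper was actually used.

```python
import torch

# Toy stand-ins for the encoded dataset: 10 samples, sequence length 4.
input_ids = torch.arange(40).reshape(10, 4)
attention_masks = torch.ones_like(input_ids)
labels = torch.tensor([0, 1, 2, 3, 0, 1, 2, 3, 0, 1])

# Shuffle the row indices reproducibly, then cut off 20% for testing.
generator = torch.Generator().manual_seed(42)
permutation = torch.randperm(len(labels), generator=generator)
n_test = int(0.2 * len(labels))
test_idx, train_idx = permutation[:n_test], permutation[n_test:]

train_inputs, test_inputs = input_ids[train_idx], input_ids[test_idx]
train_masks, test_masks = attention_masks[train_idx], attention_masks[test_idx]
train_labels, test_labels = labels[train_idx], labels[test_idx]

print(train_inputs.shape, test_inputs.shape)
```

Indexing all three tensors with the same index tensor keeps each title aligned with its mask and label.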
# 3.2. Wrap the data as a torch dataset and feed it to a data loader: the batch size and maximum sequence length must be set
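
A dataset class of this kind usually just stores the three tensors and returns one `(input_ids, attention_mask, label)` triple per index. The class name `NewsDataset` appears in the printed output of this step; its body below is a guess at a minimal implementation, and the toy tensors and batch size of 2 are assumptions.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class NewsDataset(Dataset):
    """Holds pre-tokenized titles; the body is an assumed minimal implementation."""

    def __init__(self, input_ids, attention_masks, labels):
        self.input_ids = input_ids
        self.attention_masks = attention_masks
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        return self.input_ids[index], self.attention_masks[index], self.labels[index]


# Toy tensors: 4 samples padded to a maximum sequence length of 6.
input_ids = torch.tensor([[101, 7349, 102, 0, 0, 0]] * 4)
attention_masks = torch.tensor([[1, 1, 1, 0, 0, 0]] * 4)
labels = torch.tensor([0, 0, 2, 2])

train_dataset = NewsDataset(input_ids, attention_masks, labels)
train_dataloader = DataLoader(train_dataset, batch_size=2, shuffle=False)

batch_ids, batch_masks, batch_labels = next(iter(train_dataloader))
print(batch_ids.shape, batch_masks.shape, batch_labels)
```

The default collate function stacks the per-sample triples into three batched tensors, which is the shape the training loop expects.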
train_dataset <__main__.NewsDataset object at 0x0000021A1EE9ECA0>
(tensor([ 101, 20228, 15094, 2121, 1024, 7349, 2089, 2031, 2000, 23306, 6823, 4892, 6393, 102, 0, 0, 0, 0, 0]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]), tensor(0))
(tensor([ 101, 2149, 2330, 1024, 15768, 2991, 2044, 7349, 2880, 20385, 2012, 14613, 6823, 4892, 102, 0, 0, 0, 0]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), tensor(0))
(tensor([ 101, 7349, 1005, 1055, 20228, 15094, 2121, 1024, 11808, 4633, 2038, 13730, 2098, 3105, 3930, 102, 0, 0, 0]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]), tensor(0))
(tensor([ 101, 7349, 10831, 4634, 1005, 2369, 1996, 7774, 1005, 1010, 2798, 20228, 15094, 2121, 2758, 102, 0, 0, 0]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]), tensor(0))

test_dataloader <torch.utils.data.dataloader.DataLoader object at 0x0000021A1EF422E0>
[tensor([[ 101, 7349, 2880, 2758, 5410, 2951, 3303, 2011, 4633, 1010, 2323, 2025, 4030, 6823, 2099, 102, 0, 0, 0],
        [ 101, 7349, 1005, 1055, 2798, 20228, 15094, 2121, 5927, 2152, 3347, 2005, 2689, 1999, 6393, 1997, 6823, 4892, 102]]),
 tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]]),
 tensor([0, 0])]

# 3.3. Define the model, loss, and optimizer
# Use the BERT model as the encoder and add a fully connected layer for classification
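
The architecture described here, a BERT encoder followed by one fully connected layer, might look like the sketch below. Several things are assumptions: the class name `BertClassifier`, the use of the pooled `[CLS]` output as the sentence representation, the cross-entropy loss, the AdamW optimizer and the learning rate. To stay runnable without a download, the encoder is a tiny randomly initialized `BertModel` built from a `BertConfig`; a real run would more likely start from `BertModel.from_pretrained("bert-base-uncased")`.

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel


class BertClassifier(nn.Module):
    """BERT encoder plus a single linear classification layer (assumed design)."""

    def __init__(self, bert, num_classes):
        super().__init__()
        self.bert = bert
        self.fc = nn.Linear(bert.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # pooler_output is the transformed hidden state of the [CLS] token.
        return self.fc(outputs.pooler_output)


# Tiny random encoder so the sketch runs offline; not the pretrained weights.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertClassifier(BertModel(config), num_classes=4)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Forward pass on a dummy batch of 2 sequences of length 8.
input_ids = torch.randint(0, 100, (2, 8))
attention_mask = torch.ones_like(input_ids)
logits = model(input_ids, attention_mask)
print(logits.shape)
```

The linear layer outputs one raw score per class, which is what `nn.CrossEntropyLoss` expects, so no softmax is applied inside the model.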
# 3.4. Model training and real-time saving
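
A training loop with checkpointing could be sketched as follows. "Real-time saving" is interpreted here as writing the weights to disk during training whenever the evaluation loss improves, which is an assumption about the original code. To keep the example fast, `TinyClassifier` is a hypothetical stand-in for the BERT classifier with the same `(input_ids, attention_mask)` interface, trained on random data. Accuracy is divided by the number of samples seen, so it always lies between 0 and 1.

```python
import os
import tempfile

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for the BERT classifier, so the loop runs quickly."""

    def __init__(self, vocab_size=50, hidden=16, num_classes=4):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden, padding_idx=0)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (self.emb(input_ids) * mask).sum(1) / mask.sum(1).clamp(min=1)
        return self.fc(pooled)


def run_epoch(model, loader, criterion, optimizer=None):
    """Run one pass; trains when an optimizer is given, otherwise evaluates."""
    training = optimizer is not None
    model.train(training)
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for input_ids, attention_mask, labels in loader:
            logits = model(input_ids, attention_mask)
            loss = criterion(logits, labels)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
    return total_loss / seen, correct / seen


torch.manual_seed(0)
ids = torch.randint(1, 50, (32, 8))      # random token ids, not real titles
masks = torch.ones_like(ids)
labels = torch.randint(0, 4, (32,))
train_loader = DataLoader(TensorDataset(ids[:24], masks[:24], labels[:24]), batch_size=8, shuffle=True)
eval_loader = DataLoader(TensorDataset(ids[24:], masks[24:], labels[24:]), batch_size=8)

model = TinyClassifier()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
ckpt_path = os.path.join(tempfile.mkdtemp(), "best_model.pt")  # hypothetical file name

best_eval_loss = float("inf")
history = []
for epoch in range(1, 4):
    train_loss, train_acc = run_epoch(model, train_loader, criterion, optimizer)
    eval_loss, eval_acc = run_epoch(model, eval_loader, criterion)
    history.append((train_loss, train_acc, eval_loss, eval_acc))
    if eval_loss < best_eval_loss:  # save as soon as the model improves
        best_eval_loss = eval_loss
        torch.save(model.state_dict(), ckpt_path)
    print(f"Epoch: {epoch:02d} Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f} "
          f"Eval Loss: {eval_loss:.4f}, Eval Acc: {eval_acc:.4f}")
```

Saving only the `state_dict` keeps the checkpoint small and independent of the class definition's module path.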
Epoch: 01 Train Loss: 0.7342, Train Acc: 0.7198 Eval Loss: 0.2669, Eval Acc: 46.0000
Epoch: 02 Train Loss: 0.1879, Train Acc: 0.9464 Eval Loss: 0.1194, Eval Acc: 48.2812
Epoch: 03 Train Loss: 0.0991, Train Acc: 0.9731 Eval Loss: 0.1043, Eval Acc: 48.2500
Epoch: 04 Train Loss: 0.0630, Train Acc: 0.9811 Eval Loss: 0.1025, Eval Acc: 48.5312
Epoch: 05 Train Loss: 0.0439, Train Acc: 0.9866 Eval Loss: 0.1078, Eval Acc: 48.5938

Note that the Eval Acc values exceed 1, so they appear to be unnormalized counts rather than fractions; they are reproduced here exactly as logged.

# 4. Model inference
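
Inference on a single headline amounts to tokenizing the text, running the model without gradients, and taking the index of the largest logit. The sketch below is an assumption about how that could be written, not the original code: `predict` and `BertClassifier` are illustrative names, the tokenizer uses a toy vocabulary (transformers 4.x-style `vocab_file` argument), and the model is a tiny randomly initialized BERT, so the class it returns is meaningless. In a real run the trained weights would first be restored, for example with `model.load_state_dict(torch.load(path))`.

```python
import os
import tempfile

import torch
from torch import nn
from transformers import BertConfig, BertModel, BertTokenizer


def predict(text, tokenizer, model, max_length=16):
    """Return the predicted class index for one piece of text."""
    model.eval()
    encoded = tokenizer(
        text,
        padding="max_length",
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(encoded["input_ids"], encoded["attention_mask"])
    return int(logits.argmax(dim=1).item())


class BertClassifier(nn.Module):
    """BERT encoder plus a linear layer (assumed design, untrained here)."""

    def __init__(self, bert, num_classes):
        super().__init__()
        self.bert = bert
        self.fc = nn.Linear(bert.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.fc(outputs.pooler_output)


# Toy vocabulary and tiny random encoder so the sketch runs offline.
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]",
         "this", "is", "a", "breaking", "news", "about", "politics"]
vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
with open(vocab_path, "w", encoding="utf-8") as handle:
    handle.write("\n".join(vocab) + "\n")
tokenizer = BertTokenizer(vocab_file=vocab_path, do_lower_case=True)

config = BertConfig(
    vocab_size=len(vocab),
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertClassifier(BertModel(config), num_classes=4)

predicted = predict("This is a breaking news about politics", tokenizer, model)
print("Predicted class:", predicted)
```

The returned index refers to the numeric labels created during feature encoding, so it can be mapped back to a category code with the inverse of that mapping.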
This is a breaking news about politics
Predicted class: 0