
Sentiment Analysis with ML.NET [Beginner's Guide]

Published: 2023/12/4 · asp.net · 豆豆


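After publishing "Playing with Machine Learning on .NET Core" and "Predicting New York Taxi Fares with ML.NET", I trust that readers, even without fully understanding what was going on, were able to follow along, run the code, and get results, and so gained a first impression of .NET Core, ML.NET, and what machine learning can do. With that experience in hand, it is time to step back and summarize. This article again builds on a sentiment analysis example and, from the perspective of a .NET developer who is new to machine learning, focuses on the basic concepts and steps for getting started with ML.NET.

Sometimes we realize that a real-world problem is beyond what traditional pattern matching can handle. We then need to simulate the facts that have already occurred as closely as possible (usually called fitting), and reuse that stable simulation process (usually called a model) to evaluate new conditions and estimate the probability that the same outcome will or will not occur. That is the best opportunity to apply machine learning. It also explains why machine learning usually depends on large amounts of data: with too little historical data the simulation reproduces the process much less well, and the error of the estimates naturally grows. So, while paying attention to the accuracy and completeness of your data, also learn to build up its volume.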

Solving a problem with machine learning generally involves the following steps:

1. Describe the scenario in which the problem arises
2. Collect data for that specific scenario
3. Preprocess the data
4. Choose a model (algorithm) and train it
5. Validate and tune the trained model
6. Use the model for prediction

I will walk through each of these steps using the example.

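Describe the scenario in which the problem arises

For sentiment analysis, I assume the simplest possible scenario of a single sentence: when we read a sentence, specific words let us judge whether it expresses a positive attitude or a negative one. For example, "My program passed its tests!" is positive, while "The performance of this function is really worrying" is negative. Identifying the words therefore tells us, indirectly, the emotional reaction of the person who wrote the sentence. (To keep the example easy to follow, it ignores factors such as sentence segmentation, stress, and punctuation.)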

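Collect data for that specific scenario

To verify the idea above, we first need to collect some useful data. This is the step where many developers get stuck: apart from crawlers and the historical data in their own systems, they often cannot think of where else to get data at short notice. In fact, many universities, institutions, and even governments publish open datasets on the internet. Two recommended sources of relatively high-quality datasets:

UC Irvine Machine Learning Repository, from the University of California, Irvine

kaggle.com, a well-known data science and machine learning competition site

This time I found a dataset on UCI in which every line is just one sentence plus one label, with the label already marking the sentence as positive or negative. It can be downloaded as the Sentiment Labelled Sentences Data Set. The format looks like this: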

A very, very, very slow-moving, aimless movie about a distressed, drifting young man.	0
Not sure who was more lost - the flat characters or the audience, nearly half of whom walked out.	0
Attempting artiness with black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent.	0
Very little music or anything to speak of.	0
The best scene in the movie was when Gerardo is trying to find a song that keeps running through his head.	1
The rest of the movie lacks art, charm, meaning... If it's about emptiness, it works I guess because it's empty.	0
Wasted two hours.	0
...

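Each line consists of two tab-separated fields. The first field is the sentence, which we generally call the feature. The second field is a number, 0 for negative and 1 for positive, which we generally call the target or label. Target values are usually annotated by hand; without them, this kind of machine learning, which fits historical data, cannot be used. A high-quality dataset therefore places high demands on manual labelling, which must be as accurate as possible.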
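To make the feature/label split concrete, here is a minimal sketch that parses lines in this format by hand. It does not use ML.NET; the class name LabelledSentences and the method ParseLine are hypothetical names introduced only for this illustration, and the sketch assumes every line ends with a tab followed by 0 or 1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LabelledSentences
{
    // Splits one "sentence<TAB>label" line into its feature (text) and label (0 or 1).
    public static (string Text, int Label) ParseLine(string line)
    {
        // Split on the last tab so a stray tab inside the sentence does not break parsing.
        int tab = line.LastIndexOf('\t');
        if (tab < 0)
            throw new FormatException($"No tab separator in line: {line}");

        string text = line.Substring(0, tab).Trim();
        int label = int.Parse(line.Substring(tab + 1).Trim());
        if (label != 0 && label != 1)
            throw new FormatException($"Label must be 0 or 1, got {label}");

        return (text, label);
    }

    public static void Main()
    {
        // Two lines taken from the sample shown above.
        var lines = new[]
        {
            "Wasted two hours.\t0",
            "The best scene in the movie was when Gerardo is trying to find a song that keeps running through his head.\t1",
        };

        List<(string Text, int Label)> rows = lines.Select(l => ParseLine(l)).ToList();
        foreach (var row in rows)
            Console.WriteLine($"{(row.Label == 1 ? "positive" : "negative")}: {row.Text}");
    }
}
```

In the article itself this parsing is handled by the loader class, so the sketch is only meant to show what "feature" and "label" refer to in a raw line.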

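Preprocess the data

For the steps to create the project, see the two articles mentioned at the beginning; I will not repeat them here. Getting straight to the point: ML.NET's data handling and the training flow that follows are generic, a design intended to allow extension to other third-party machine learning packages later. First look at the format of the dataset and create a structure that matches it, which makes the import straightforward. The LearningPipeline class is the object used to define the machine learning process, so we create it next. The code is as follows: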

// Training data (imdb_labelled.txt) and test data (yelp_labelled.txt) from the downloaded dataset.
const string _dataPath = @".\data\sentiment labelled sentences\imdb_labelled.txt";
const string _testDataPath = @".\data\sentiment labelled sentences\yelp_labelled.txt";

// Mirrors one line of the dataset: column 0 is the sentence, column 1 is the label.
public class SentimentData
{
    [Column(ordinal: "0")]
    public string SentimentText;

    [Column(ordinal: "1", name: "Label")]
    public float Sentiment;
}

// Build the pipeline: load the tab-separated file, then turn the text into features.
var pipeline = new LearningPipeline();
pipeline.Add(new TextLoader<SentimentData>(_dataPath, useHeader: false, separator: "tab"));
pipeline.Add(new TextFeaturizer("Features", "SentimentText"));

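SentimentData is the data structure I need for the import. As you can see, the Column attribute gives the position of the corresponding column in the dataset. For the last column, the field that says whether the sentence is positive or negative, it additionally marks the field as the target value and gives it an identifying name. TextLoader is the class for importing text data, and TextFeaturizer is the class that designates the features. Not every field in a row can serve as a feature, so when there are many fields you can name the relevant ones here and keep unrelated fields from affecting the result.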
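To give a feel for what "turning text into features" means, here is a minimal bag-of-words sketch. It is an illustration only and is not how TextFeaturizer is implemented; the class BagOfWords and its methods Tokenize, BuildVocabulary, and Featurize are hypothetical names, and the tokenization rule (lower-casing and splitting on a few punctuation characters) is an assumption made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BagOfWords
{
    static readonly char[] Separators = { ' ', ',', '.', '!', '?', ';', ':', '-' };

    // Lower-cases a sentence and splits it into word tokens.
    public static string[] Tokenize(string text) =>
        text.ToLowerInvariant().Split(Separators, StringSplitOptions.RemoveEmptyEntries);

    // Builds a sorted list of the distinct words seen across all sentences.
    public static List<string> BuildVocabulary(IEnumerable<string> sentences) =>
        sentences.SelectMany(s => Tokenize(s))
                 .Distinct()
                 .OrderBy(w => w, StringComparer.Ordinal)
                 .ToList();

    // Counts how often each vocabulary word occurs in the sentence.
    public static float[] Featurize(string sentence, List<string> vocabulary)
    {
        var vector = new float[vocabulary.Count];
        foreach (string token in Tokenize(sentence))
        {
            int index = vocabulary.IndexOf(token);
            if (index >= 0)
                vector[index] += 1f; // words outside the vocabulary are ignored
        }
        return vector;
    }

    public static void Main()
    {
        var sentences = new[] { "Wasted two hours.", "Very little music or anything to speak of." };
        List<string> vocabulary = BuildVocabulary(sentences);
        float[] features = Featurize("Wasted two hours.", vocabulary);

        Console.WriteLine(string.Join(" ", vocabulary));
        Console.WriteLine(string.Join(" ", features));
    }
}
```

Each sentence becomes a fixed-length numeric vector, which is the kind of input a classifier can train on; the "Features" column produced by the pipeline plays that role in the article's code.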

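Choose a model (algorithm) and train it

The target in this example is a 0/1 value; in other words, this is exactly a binary classification problem, so I chose the FastTreeBinaryClassifier class as the model. Readers with some machine learning background will know the logistic regression algorithm, which serves a broadly similar purpose. Defining the model also requires specifying a structure for predictions, so that the model emits its results in that shape; this output structure should generally contain at least the target field. The code snippet is as follows: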

// Shape of the model output: the predicted label for one sentence.
public class SentimentPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Sentiment;
}

// Add the binary classifier with a small tree configuration, then train on the loaded data.
pipeline.Add(new FastTreeBinaryClassifier() { NumLeaves = 5, NumTrees = 5, MinDocumentsInLeafs = 2 });
PredictionModel<SentimentData, SentimentPrediction> model = pipeline.Train<SentimentData, SentimentPrediction>();

Validate and tune the trained model

Once we have a model, we need to validate it against a test dataset to see whether the fit meets expectations. BinaryClassificationEvaluator is the evaluation class that corresponds to FastTreeBinaryClassifier, and the evaluation results are stored in a BinaryClassificationMetrics object. The code snippet is as follows:

// Load the held-out test file in the same format as the training data.
var testData = new TextLoader<SentimentData>(_testDataPath, useHeader: false, separator: "tab");

// Evaluate the trained model on the test data.
var evaluator = new BinaryClassificationEvaluator();
BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData);

// Print the quality metrics as percentages.
Console.WriteLine();
Console.WriteLine("PredictionModel quality metrics evaluation");
Console.WriteLine("------------------------------------------");
Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
Console.WriteLine($"Auc: {metrics.Auc:P2}");
Console.WriteLine($"F1Score: {metrics.F1Score:P2}");

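Accuracy, Auc, and F1Score are common evaluation metrics that score things such as correctness and error. If the scores are low, adjust the parameter values used when defining the model in the previous step. For detailed explanations, see: Machine learning glossary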
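As a rough guide to what two of these numbers measure, the sketch below computes accuracy and F1 score from lists of actual and predicted labels using the standard textbook definitions. The class BinaryMetrics and the method Compute are hypothetical names for this illustration, and the sketch is not ML.NET's own evaluator. AUC is left out because it needs the model's raw scores rather than only the predicted labels.

```csharp
using System;
using System.Collections.Generic;

public static class BinaryMetrics
{
    // Returns accuracy and the F1 score of the positive class.
    public static (double Accuracy, double F1Score) Compute(
        IReadOnlyList<bool> actual, IReadOnlyList<bool> predicted)
    {
        if (actual.Count == 0 || actual.Count != predicted.Count)
            throw new ArgumentException("Both lists must be non-empty and of equal length.");

        int tp = 0, tn = 0, fp = 0, fn = 0;
        for (int i = 0; i < actual.Count; i++)
        {
            if (actual[i] && predicted[i]) tp++;        // true positive
            else if (!actual[i] && !predicted[i]) tn++; // true negative
            else if (!actual[i] && predicted[i]) fp++;  // false positive
            else fn++;                                  // false negative
        }

        double accuracy = (double)(tp + tn) / actual.Count;
        double precision = tp + fp == 0 ? 0.0 : (double)tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double)tp / (tp + fn);
        double f1 = precision + recall == 0.0
            ? 0.0
            : 2 * precision * recall / (precision + recall);

        return (accuracy, f1);
    }

    public static void Main()
    {
        // Made-up labels purely for illustration.
        bool[] actual = { true, true, false, false, true };
        bool[] predicted = { true, false, false, true, true };

        var (accuracy, f1) = Compute(actual, predicted);
        Console.WriteLine($"Accuracy: {accuracy:P2}");
        Console.WriteLine($"F1Score: {f1:P2}");
    }
}
```

Accuracy counts every correct prediction equally, while F1 balances precision and recall for the positive class, which is why the two can diverge when the classes are unbalanced.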

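Use the model for prediction

Once you have trained a model you are happy with, you can put it to use. In essence, you take some data that has no hand-labelled result and have the model analyze it and return its estimate of the target value; in this example that is the predicted positive or negative label. The code snippet is as follows: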

// New, unlabelled sentences. Sentiment is unknown here, so 0 is only a placeholder.
IEnumerable<SentimentData> sentiments = new[]
{
    new SentimentData
    {
        SentimentText = "Contoso's 11 is a wonderful experience",
        Sentiment = 0
    },
    new SentimentData
    {
        SentimentText = "The acting in this movie is very bad",
        Sentiment = 0
    },
    new SentimentData
    {
        SentimentText = "Joe versus the Volcano Coffee Company is a great film.",
        Sentiment = 0
    }
};

// Run the trained model over all sentences.
IEnumerable<SentimentPrediction> predictions = model.Predict(sentiments);

Console.WriteLine();
Console.WriteLine("Sentiment Predictions");
Console.WriteLine("---------------------");

// Pair each input sentence with its prediction and print the result.
var sentimentsAndPredictions = sentiments.Zip(predictions, (sentiment, prediction) => (sentiment, prediction));
foreach (var item in sentimentsAndPredictions)
{
    Console.WriteLine($"Sentiment: {item.sentiment.SentimentText} | Prediction: {(item.prediction.Sentiment ? "Positive" : "Negative")}");
}

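The output shows that the classifications match what a human would judge. The scores in the validation step are not high, but that is normal: without any tuning, neutral and ambiguous sentences interfere with the predictions.

With this in place, new sentences can be classified automatically by the program. Simple, isn't it? I hope this article makes .NET developers eager to try ML.NET.

As an aside, Microsoft Azure also offers an online machine learning studio at https://studio.azureml.net/, with a gallery of related AI projects at https://gallery.azure.ai/browse. If you cannot set up a local machine learning environment for now, or cannot find a project to practice on, it is worth a try.

Finally, here is the complete code for the project: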

using System;
using Microsoft.ML.Models;
using Microsoft.ML.Runtime;
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

namespace SentimentAnalysis
{
    class Program
    {
        const string _dataPath = @".\data\sentiment labelled sentences\imdb_labelled.txt";
        const string _testDataPath = @".\data\sentiment labelled sentences\yelp_labelled.txt";

        public class SentimentData
        {
            [Column(ordinal: "0")]
            public string SentimentText;

            [Column(ordinal: "1", name: "Label")]
            public float Sentiment;
        }

        public class SentimentPrediction
        {
            [ColumnName("PredictedLabel")]
            public bool Sentiment;
        }

        public static PredictionModel<SentimentData, SentimentPrediction> Train()
        {
            var pipeline = new LearningPipeline();
            pipeline.Add(new TextLoader<SentimentData>(_dataPath, useHeader: false, separator: "tab"));
            pipeline.Add(new TextFeaturizer("Features", "SentimentText"));
            pipeline.Add(new FastTreeBinaryClassifier() { NumLeaves = 5, NumTrees = 5, MinDocumentsInLeafs = 2 });

            PredictionModel<SentimentData, SentimentPrediction> model = pipeline.Train<SentimentData, SentimentPrediction>();
            return model;
        }

        public static void Evaluate(PredictionModel<SentimentData, SentimentPrediction> model)
        {
            var testData = new TextLoader<SentimentData>(_testDataPath, useHeader: false, separator: "tab");
            var evaluator = new BinaryClassificationEvaluator();
            BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData);

            Console.WriteLine();
            Console.WriteLine("PredictionModel quality metrics evaluation");
            Console.WriteLine("------------------------------------------");
            Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
            Console.WriteLine($"Auc: {metrics.Auc:P2}");
            Console.WriteLine($"F1Score: {metrics.F1Score:P2}");
        }

        public static void Predict(PredictionModel<SentimentData, SentimentPrediction> model)
        {
            IEnumerable<SentimentData> sentiments = new[]
            {
                new SentimentData
                {
                    SentimentText = "Contoso's 11 is a wonderful experience",
                    Sentiment = 0
                },
                new SentimentData
                {
                    SentimentText = "The acting in this movie is very bad",
                    Sentiment = 0
                },
                new SentimentData
                {
                    SentimentText = "Joe versus the Volcano Coffee Company is a great film.",
                    Sentiment = 0
                }
            };

            IEnumerable<SentimentPrediction> predictions = model.Predict(sentiments);

            Console.WriteLine();
            Console.WriteLine("Sentiment Predictions");
            Console.WriteLine("---------------------");

            var sentimentsAndPredictions = sentiments.Zip(predictions, (sentiment, prediction) => (sentiment, prediction));
            foreach (var item in sentimentsAndPredictions)
            {
                Console.WriteLine($"Sentiment: {item.sentiment.SentimentText} | Prediction: {(item.prediction.Sentiment ? "Positive" : "Negative")}");
            }
            Console.WriteLine();
        }

        static void Main(string[] args)
        {
            var model = Train();
            Evaluate(model);
            Predict(model);
        }
    }
}

Related articles:

Playing with Machine Learning on .NET Core
Predicting New York Taxi Fares with ML.NET

Original article: http://www.cnblogs.com/BeanHsiang/p/9020919.html
