Python multi-class sentiment: Python text sentiment classification
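A simple text sentiment classification task is really just binary classification. This post shows how to do text sentiment classification with scikit-learn. Classification has two main steps: 1) Training: learn the rules of the classification model from the training set. 2) Classification: first evaluate accuracy and related metrics on a labeled test set; if the results are acceptable, use the model to predict the unlabeled samples.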
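The two steps above can be sketched roughly as follows. This is a minimal illustration, not the post's full program: the toy reviews, the choice of `LogisticRegression`, and the `random_state` value are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical, already-tokenized toy reviews (1 = positive, 0 = negative)
texts = [
    "room clean staff friendly", "great location nice breakfast",
    "comfortable bed quiet room", "friendly staff great service",
    "dirty room rude staff", "noisy street bad breakfast",
    "broken shower dirty bathroom", "rude service bad location",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Step 1: training. Fit the vectorizer and the model on the training split only.
x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels)
vectorizer = TfidfVectorizer()
clf = LogisticRegression()
clf.fit(vectorizer.fit_transform(x_train), y_train)

# Step 2a: evaluate on the held-out labeled test split.
test_pred = clf.predict(vectorizer.transform(x_test))
test_accuracy = accuracy_score(y_test, test_pred)
print("test accuracy:", test_accuracy)

# Step 2b: if the score is acceptable, predict samples that have no label.
unlabeled = ["clean room friendly staff", "dirty bathroom rude service"]
unlabeled_pred = clf.predict(vectorizer.transform(unlabeled))
print("predictions for unlabeled samples:", unlabeled_pred)
```

Note that the vectorizer is fitted on the training split only and then reused with `transform` for the test and unlabeled data, so no information from the evaluation data leaks into training.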
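First, a word about my sample set. The samples are hotel reviews that have already been word-segmented. The first column is the label and the second column is the review. The first half of the file holds positive reviews and the second half holds negative ones. The format is as follows: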
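The original sample lines are not reproduced here. As a rough sketch of how one line in such a layout could be split, assuming the label is a single leading character (which is how the code in this post reads it), consider the following. The two sample lines and the helper name `parse_line` are made up for illustration and are not taken from the actual corpus.

```python
def parse_line(line):
    """Split one line into (label, review), assuming a one-character label."""
    label = int(line[0:1])      # first character is the class label
    review = line[1:].strip()   # the rest is the segmented review text
    return label, review


# Hypothetical lines in the assumed "label, then segmented review" layout
sample_lines = [
    "1 room clean staff friendly\n",
    "0 room dirty service slow\n",
]

parsed = [parse_line(line) for line in sample_lines]
for label, review in parsed:
    print(label, "|", review)
```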
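The code below implements several classifiers: SVM, naive Bayes (NB), logistic regression, decision tree, random forest, and KNN. The main code is as follows: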
# coding: utf-8
import numpy as np
# sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import classification_report
#======== SVM ========#
def SvmClass(x_train, y_train):
    from sklearn.svm import SVC
    # Build the classifier; the default kernel is 'rbf'.
    # probability=True is needed so that predict_proba is available.
    clf = SVC(kernel='linear', probability=True)
    # Train: supervised models use fit(X, y), unsupervised models use fit(X)
    clf.fit(x_train, y_train)
    return clf

#======== NB ========#
def NbClass(x_train, y_train):
    from sklearn.naive_bayes import MultinomialNB
    clf = MultinomialNB(alpha=0.01).fit(x_train, y_train)
    return clf

#======== Logistic Regression ========#
def LogisticClass(x_train, y_train):
    from sklearn.linear_model import LogisticRegression
    clf = LogisticRegression(penalty='l2')
    clf.fit(x_train, y_train)
    return clf

#======== KNN ========#
def KnnClass(x_train, y_train):
    from sklearn.neighbors import KNeighborsClassifier
    clf = KNeighborsClassifier()
    clf.fit(x_train, y_train)
    return clf

#======== Decision Tree ========#
def DccisionClass(x_train, y_train):
    from sklearn import tree
    clf = tree.DecisionTreeClassifier()
    clf.fit(x_train, y_train)
    return clf

#======== Random Forest Classifier ========#
def random_forest_class(x_train, y_train):
    from sklearn.ensemble import RandomForestClassifier
    # n_estimators sets the number of trees (weak learners) in the forest
    clf = RandomForestClassifier(n_estimators=8)
    clf.fit(x_train, y_train)
    return clf

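#======== Accuracy, precision and recall ========#
def Precision(clf):
    # Relies on the module-level x_test and y_test created in the main block
    doc_class_predicted = clf.predict(x_test)
    # Fraction of predictions that match the true labels
    print(np.mean(doc_class_predicted == y_test))
    # Probability of the second class; assumes labels are 0 (neg) and 1 (pos)
    answer = clf.predict_proba(x_test)[:, 1]
    # Precision-recall curve points, computed from scores rather than hard predictions
    precision, recall, thresholds = precision_recall_curve(y_test, answer)
    # Threshold at 0.5 and convert to 0/1 so the type matches y_test
    report = (answer > 0.5).astype(int)
    print(classification_report(y_test, report, target_names=['neg', 'pos']))
    print("--------------------")
    from sklearn.metrics import accuracy_score
    print('Accuracy: %.2f' % accuracy_score(y_test, doc_class_predicted))
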
if __name__ == '__main__':
    data = []
    labels = []
    # The first character of each line is the label; the rest is the segmented review
    with open("train2.txt", "r") as file:
        for line in file:
            labels.append(line[0:1])
            data.append(line[1:])
    x = np.array(data)
    labels = [int(i) for i in labels]
    movie_target = labels
    # Convert the text to a vector space representation (TF-IDF)
    count_vec = TfidfVectorizer(binary=False)
    # Split the data set: 80% for training, 20% for testing
    x_train, x_test, y_train, y_test = train_test_split(x, movie_target, test_size=0.2)
    # Fit the vocabulary on the training set only, then reuse it for the test set
    x_train = count_vec.fit_transform(x_train)
    x_test = count_vec.transform(x_test)
    print('**************Support Vector Machine************ ')
    Precision(SvmClass(x_train, y_train))
    print('**************Naive Bayes************ ')
    Precision(NbClass(x_train, y_train))
    print('**************K-Nearest Neighbors (KNN)************ ')
    Precision(KnnClass(x_train, y_train))
    print('**************Logistic Regression************ ')
    Precision(LogisticClass(x_train, y_train))
    print('**************Decision Tree************ ')
    Precision(DccisionClass(x_train, y_train))
    print('**************Random Forest************ ')
    Precision(random_forest_class(x_train, y_train))
The results are as follows:
The complete code and corpus are available for download.
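To make the numbers printed by `Precision` easier to read, here is a small hand computation of accuracy, precision, and recall for a binary task. It is only a sketch: the helper name `binary_metrics` and the label lists are invented for illustration and are not output from the hotel-review corpus.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Precision: of everything predicted positive, how much really is positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything really positive, how much was found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: share of all predictions that are correct
    accuracy = correct / len(pairs)
    return precision, recall, accuracy


# Made-up labels: 2 true positives, 1 false positive, 1 false negative
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, accuracy = binary_metrics(y_true, y_pred)
print("precision=%.2f recall=%.2f accuracy=%.2f" % (precision, recall, accuracy))
```

`classification_report` reports precision and recall per class in the same sense, once with each class treated as the positive one.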