Modeling Overdue Loans, Part 3: Model Evaluation
Contents
- I. Evaluation Metrics
 - 1. Basic concepts
 - 2. Accuracy
 - 3. Precision
 - 4. Recall
 - 5. F1 score
 - 6. ROC curve and AUC
 
- II. Model Evaluation
 - 1. Logistic Regression
 - 2. SVM
 - 3. Decision tree
 - 4. Random forest
 - 5. GBDT
 - 6. XGBoost
 - 7. LightGBM
 - 8. Plotting
 
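Goal: record a score table of accuracy, precision, recall, F1-score, and AUC for seven models (logistic regression, SVM, decision tree, random forest, GBDT, XGBoost, and LightGBM), and plot their ROC curves.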
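I. Evaluation Metrics
1. Basic concepts
For a binary classification problem, comparing the predictions with the true labels gives four possible outcomes.

| | Predicted positive | Predicted negative |
| --- | --- | --- |
| Actual positive | TP (True Positive) | FN (False Negative) |
| Actual negative | FP (False Positive) | TN (True Negative) |

My way of remembering this: the first letter says whether the prediction was correct (T) or wrong (F); the second letter gives the predicted class, P for positive and N for negative.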
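The four outcomes can be counted directly from a pair of label lists. The sketch below is a minimal illustration in plain Python; the function name `confusion_counts` and the toy labels are invented for this example and are not part of the original post.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP, TN for a binary problem."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            # Actual positive: either caught (TP) or missed (FN).
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            # Actual negative: either a false alarm (FP) or correctly rejected (TN).
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


# Toy labels, invented for illustration (1 = overdue, 0 = not overdue).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print("TP", tp, "FN", fn, "FP", fp, "TN", tn)
```

Reading the result against the table: the first two counts fill the "actual positive" row and the last two fill the "actual negative" row.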
2. Accuracy
Accuracy is the share of all samples that are predicted correctly.
$$accuracy = \dfrac{TP + TN}{TP + TN + FP + FN}$$
3. Precision
Precision: of all samples predicted positive, the share that are actually positive.
$$precision = \dfrac{TP}{TP + FP}$$
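4. Recall
Recall: of all samples that are actually positive, the share that are predicted positive.
For example, recall matters a great deal in medical settings, where missing a true case (a false negative) is costly.
$$recall = \dfrac{TP}{TP + FN}$$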
5. F1 score
The F1 score is the harmonic mean of precision and recall; higher is better.
$$\dfrac{2}{F_1} = \dfrac{1}{precision} + \dfrac{1}{recall}$$
Writing $P$ for precision and $R$ for recall, and substituting their definitions, gives
$$F_1 = \dfrac{2PR}{P + R} = \dfrac{2TP}{2TP + FP + FN}$$
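To make the four formulas concrete, here is a small sketch that evaluates them from confusion-matrix counts and checks that both forms of the F1 expression agree. The helper name `binary_metrics` and the counts are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for a test set of 200 samples.
counts = dict(tp=30, fn=20, fp=10, tn=140)
metrics = binary_metrics(**counts)
for name, value in metrics.items():
    print(name, round(value, 4))

# The count-based form 2TP / (2TP + FP + FN) gives the same F1.
f1_from_counts = 2 * counts["tp"] / (2 * counts["tp"] + counts["fp"] + counts["fn"])
print("F1 from counts", round(f1_from_counts, 4))
```

With these counts precision is 0.75 and recall is 0.6, so F1 lands between them but closer to the smaller value, which is the characteristic behavior of a harmonic mean.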
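6. ROC curve and AUC
ROC curve: the receiver operating characteristic curve is a combined indicator of sensitivity and specificity treated as continuous variables. Each point on the curve is the response to the same signal under a different decision threshold.
[Figure: example ROC curve]
Horizontal axis: 1 - Specificity, the false positive rate (FPR = FP/(FP+TN)), i.e. the share of all negative samples that are predicted positive.
Vertical axis: Sensitivity, the true positive rate (TPR = TP/(TP+FN)), i.e. the share of all positive samples that are predicted positive.
In the ideal case TPR is close to 1 and FPR is close to 0, which is the point (0, 1) in the plot. The closer the ROC curve gets to (0, 1), that is, the farther it bows away from the 45-degree diagonal, the better.
AUC (Area Under Curve) is defined as the area under the ROC curve. Its value lies in [0, 1]; 0.5 corresponds to random guessing, so a useful classifier falls in (0.5, 1]. A larger AUC means the classifier ranks positive samples above negative ones more reliably.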
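The definitions above can be traced by hand on a tiny example. The sketch below sweeps the decision threshold to get (FPR, TPR) points, integrates them with the trapezoid rule, and compares the result with the ranking interpretation of AUC. All function names and the toy scores are assumptions made for illustration; in practice `roc_curve` and `roc_auc_score` from scikit-learn do this work.

```python
def roc_points(y_true, y_score):
    """(FPR, TPR) pairs obtained by lowering the threshold one distinct score at a time."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def rank_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration.
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

auc_curve = trapezoid_auc(roc_points(y_true, y_score))
auc_rank = rank_auc(y_true, y_score)
print("AUC from curve:", round(auc_curve, 4))
print("AUC from ranks:", round(auc_rank, 4))

# Thresholding at 0.5 first and scoring the hard 0/1 labels keeps only one operating point.
y_hard = [1 if s >= 0.5 else 0 for s in y_score]
auc_hard = trapezoid_auc(roc_points(y_true, y_hard))
print("AUC from hard labels:", round(auc_hard, 4))
```

The last value differs from the first two because hard labels discard the ordering among samples on the same side of the threshold. This is why an AUC should normally be computed from probabilities or decision scores rather than from predicted classes.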
II. Model Evaluation
Goal: examine accuracy, precision, recall, F1-score, and AUC for each model, and plot the ROC curves.
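1. Logistic Regression

    # x_train_stand, x_test_stand, y_train and y_test are assumed to be the
    # standardized train/test splits from the earlier parts of this series.
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score, roc_curve)

    ## Logistic Regression
    lr = LogisticRegression()
    lr.fit(x_train_stand, y_train)
    y_pre_lr = lr.predict(x_test_stand)                 # hard 0/1 labels
    y_score_lr = lr.predict_proba(x_test_stand)[:, 1]   # probability of the positive class

    lr_accuracy = accuracy_score(y_test, y_pre_lr)
    print('The accuracy of LR', lr_accuracy)
    lr_precision = precision_score(y_test, y_pre_lr)
    print('The precision of LR', lr_precision)
    lr_recall = recall_score(y_test, y_pre_lr)
    print('The recall of LR', lr_recall)
    lr_f1_score = f1_score(y_test, y_pre_lr)
    print('The F1 score of LR', lr_f1_score)
    # AUC is computed from the scores, not from the hard labels
    lr_roc_auc_score = roc_auc_score(y_test, y_score_lr)
    print('The AUC of LR', lr_roc_auc_score)

    ## ROC curve
    test_fprs, test_tprs, test_thresholds = roc_curve(y_test, y_score_lr)
    plt.plot(test_fprs, test_tprs)
    plt.plot([0, 1], [0, 1], "--")
    plt.title("ROC curve")
    plt.xlabel("FPR")
    plt.ylabel("TPR")
    plt.legend(labels=["Test AUC:" + str(round(lr_roc_auc_score, 5))], loc="lower right")
    plt.show()

Output

    The accuracy of LR 0.7876664330763841
    The precision of LR 0.6609195402298851
    The recall of LR 0.3203342618384401
    The F1 score of LR 0.3203342618384401
    The AUC of LR 0.6325454080727781

These figures come from the original run, in which the F1 line was produced by `recall_score` and the AUC by passing the hard labels `y_pre_lr` to `roc_auc_score`, so those two lines differ from what the code above prints. The precision and recall shown imply F1 ≈ 0.4315.

2. SVM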
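    import matplotlib.pyplot as plt
    from sklearn.svm import SVC
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score, roc_curve)

    ## SVM
    svm = SVC(random_state=2018, probability=True)   # probability=True enables predict_proba
    svm.fit(x_train_stand, y_train)
    y_pre_svm = svm.predict(x_test_stand)
    y_score_svm = svm.predict_proba(x_test_stand)[:, 1]

    svm_accuracy = accuracy_score(y_test, y_pre_svm)
    print('The accuracy of SVM', svm_accuracy)
    svm_precision = precision_score(y_test, y_pre_svm)
    print('The precision of SVM', svm_precision)
    svm_recall = recall_score(y_test, y_pre_svm)
    print('The recall of SVM', svm_recall)
    svm_f1_score = f1_score(y_test, y_pre_svm)
    print('The F1 score of SVM', svm_f1_score)
    # AUC is computed from the scores, not from the hard labels
    svm_roc_auc_score = roc_auc_score(y_test, y_score_svm)
    print('The AUC of SVM', svm_roc_auc_score)

    ## ROC curve
    test_fprs, test_tprs, test_thresholds = roc_curve(y_test, y_score_svm)
    plt.plot(test_fprs, test_tprs)
    plt.plot([0, 1], [0, 1], "--")
    plt.title("ROC curve")
    plt.xlabel("FPR")
    plt.ylabel("TPR")
    plt.legend(labels=["Test AUC:" + str(round(svm_roc_auc_score, 5))], loc="lower right")
    plt.show()

Output

    The accuracy of SVM 0.7806587245970568
    The precision of SVM 0.7017543859649122
    The recall of SVM 0.22284122562674094
    The F1 score of SVM 0.22284122562674094
    The AUC of SVM 0.5955030098171158

These figures come from the original run, in which the F1 line was produced by `recall_score` and the AUC by passing the hard labels `y_pre_svm` to `roc_auc_score`, so those two lines differ from what the code above prints. The precision and recall shown imply F1 ≈ 0.3383.

3. Decision tree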
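    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score)

    ## DecisionTreeClassifier
    dt = DecisionTreeClassifier(random_state=2018)
    dt.fit(x_train_stand, y_train)
    y_pre_dt = dt.predict(x_test_stand)                 # predict with dt, not svm
    y_score_dt = dt.predict_proba(x_test_stand)[:, 1]

    dt_accuracy = accuracy_score(y_test, y_pre_dt)
    print('The accuracy of DecisionTree', dt_accuracy)
    dt_precision = precision_score(y_test, y_pre_dt)
    print('The precision of DecisionTree', dt_precision)
    dt_recall = recall_score(y_test, y_pre_dt)
    print('The recall of DecisionTree', dt_recall)
    dt_f1_score = f1_score(y_test, y_pre_dt)
    print('The F1 score of DecisionTree', dt_f1_score)
    # AUC is computed from the scores, not from the hard labels
    dt_roc_auc_score = roc_auc_score(y_test, y_score_dt)
    print('The AUC of DecisionTree', dt_roc_auc_score)

Output

    The accuracy of DecisionTree 0.7806587245970568
    The precision of DecisionTree 0.7017543859649122
    The recall of DecisionTree 0.22284122562674094
    The F1 score of DecisionTree 0.22284122562674094
    The AUC of DecisionTree 0.5955030098171158

This recorded output is identical to the SVM output because the original run predicted with `svm.predict` instead of `dt.predict`; it does not reflect the decision tree, and the code above will print different values.

4. Random forest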
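    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score)

    ## Random forest
    rfc = RandomForestClassifier()
    rfc.fit(x_train_stand, y_train)
    y_pre_rf = rfc.predict(x_test_stand)
    y_score_rf = rfc.predict_proba(x_test_stand)[:, 1]

    rf_accuracy = accuracy_score(y_test, y_pre_rf)
    print('The accuracy of Random Forest', rf_accuracy)
    rf_precision = precision_score(y_test, y_pre_rf)
    print('The precision of Random Forest', rf_precision)
    rf_recall = recall_score(y_test, y_pre_rf)
    print('The recall of Random Forest', rf_recall)
    rf_f1_score = f1_score(y_test, y_pre_rf)
    print('The F1 score of Random Forest', rf_f1_score)
    # AUC is computed from the scores, not from the hard labels
    rf_roc_auc_score = roc_auc_score(y_test, y_score_rf)
    print('The AUC of Random Forest', rf_roc_auc_score)

Output

    The accuracy of Random Forest 0.7638402242466713
    The precision of Random Forest 0.5846153846153846
    The recall of Random Forest 0.2116991643454039
    The F1 score of Random Forest 0.2116991643454039
    The AUC of Random Forest 0.5805686832962974

These figures come from the original run, in which the F1 line was produced by `recall_score` and the AUC by passing the hard labels `y_pre_rf` to `roc_auc_score`, so those two lines differ from what the code above prints. The precision and recall shown imply F1 ≈ 0.3108. No random seed is set for the forest, so a rerun will not reproduce the numbers exactly.

5. GBDT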
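    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score)

    ## GBDT
    gbdt = GradientBoostingClassifier()
    gbdt.fit(x_train_stand, y_train)
    y_pre_gbdt = gbdt.predict(x_test_stand)
    y_score_gbdt = gbdt.predict_proba(x_test_stand)[:, 1]

    gbdt_accuracy = accuracy_score(y_test, y_pre_gbdt)
    print('The accuracy of GBDT', gbdt_accuracy)
    gbdt_precision = precision_score(y_test, y_pre_gbdt)
    print('The precision of GBDT', gbdt_precision)
    gbdt_recall = recall_score(y_test, y_pre_gbdt)
    print('The recall of GBDT', gbdt_recall)
    gbdt_f1_score = f1_score(y_test, y_pre_gbdt)
    print('The F1 score of GBDT', gbdt_f1_score)
    # AUC is computed from the scores, not from the hard labels
    gbdt_roc_auc_score = roc_auc_score(y_test, y_score_gbdt)
    print('The AUC of GBDT', gbdt_roc_auc_score)

Output

    The accuracy of GBDT 0.7792571829011913
    The precision of GBDT 0.6057692307692307
    The recall of GBDT 0.35097493036211697
    The F1 score of GBDT 0.35097493036211697
    The AUC of GBDT 0.6370979520724442

These figures come from the original run, in which the F1 line was produced by `recall_score` and the AUC by passing the hard labels `y_pre_gbdt` to `roc_auc_score`, so those two lines differ from what the code above prints. The precision and recall shown imply F1 ≈ 0.4444.

6. XGBoost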
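    # Importing the class directly keeps the variable name `xgb` from
    # shadowing an `import xgboost as xgb` module alias on a rerun.
    from xgboost import XGBClassifier
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score)

    ## XGBoost
    xgb = XGBClassifier()
    xgb.fit(x_train_stand, y_train)
    y_pre_xgb = xgb.predict(x_test_stand)
    y_score_xgb = xgb.predict_proba(x_test_stand)[:, 1]

    xgb_accuracy = accuracy_score(y_test, y_pre_xgb)
    print('The accuracy of XGBoost', xgb_accuracy)
    xgb_precision = precision_score(y_test, y_pre_xgb)
    print('The precision of XGBoost', xgb_precision)
    xgb_recall = recall_score(y_test, y_pre_xgb)
    print('The recall of XGBoost', xgb_recall)
    xgb_f1_score = f1_score(y_test, y_pre_xgb)
    print('The F1 score of XGBoost', xgb_f1_score)
    # AUC is computed from the scores, not from the hard labels
    xgb_roc_auc_score = roc_auc_score(y_test, y_score_xgb)
    print('The AUC of XGBoost', xgb_roc_auc_score)

Output

    The accuracy of XGBoost 0.7841625788367204
    The precision of XGBoost 0.624390243902439
    The recall of XGBoost 0.3565459610027855
    The F1 score of XGBoost 0.3565459610027855
    The AUC of XGBoost 0.642224291362816

These figures come from the original run, in which the F1 line was produced by `recall_score` and the AUC by passing the hard labels `y_pre_xgb` to `roc_auc_score`, so those two lines differ from what the code above prints. The precision and recall shown imply F1 ≈ 0.4539.

7. LightGBM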
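    import lightgbm as lgb
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score)

    ## LightGBM
    gbm = lgb.LGBMClassifier()
    gbm.fit(x_train_stand, y_train)
    y_pre_gbm = gbm.predict(x_test_stand)
    y_score_gbm = gbm.predict_proba(x_test_stand)[:, 1]

    gbm_accuracy = accuracy_score(y_test, y_pre_gbm)
    print('The accuracy of lightGBM', gbm_accuracy)
    gbm_precision = precision_score(y_test, y_pre_gbm)
    print('The precision of lightGBM', gbm_precision)
    gbm_recall = recall_score(y_test, y_pre_gbm)
    print('The recall of lightGBM', gbm_recall)
    gbm_f1_score = f1_score(y_test, y_pre_gbm)
    print('The F1 score of lightGBM', gbm_f1_score)
    # AUC is computed from the scores, not from the hard labels
    gbm_roc_auc_score = roc_auc_score(y_test, y_score_gbm)
    print('The AUC of lightGBM', gbm_roc_auc_score)

Output

    The accuracy of lightGBM 0.7701471618780659
    The precision of lightGBM 0.5688888888888889
    The recall of lightGBM 0.3565459610027855
    The F1 score of lightGBM 0.3565459610027855
    The AUC of lightGBM 0.6328609954826662

These figures come from the original run, in which the F1 line was produced by `recall_score` and the AUC by passing the hard labels `y_pre_gbm` to `roc_auc_score`, so those two lines differ from what the code above prints. The precision and recall shown imply F1 ≈ 0.4384.

8. Plotting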
    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_auc_score, roc_curve

    # (legend name, fitted model, line color), in plotting order
    models = [
        ("LR", lr, 'black'),
        ("SVM", svm, 'red'),
        ("RF", rfc, 'green'),
        ("DT", dt, 'blue'),
        ("GBDT", gbdt, 'yellow'),
        ("XGBoost", xgb, 'brown'),
        ("GBM", gbm, 'purple'),
    ]

    ## ROC curves
    plt.figure(figsize=[6, 6])
    for name, model, color in models:
        # Positive-class probabilities drive both the curve and its AUC,
        # so the legend value matches the plotted line.
        y_score = model.predict_proba(x_test_stand)[:, 1]
        fpr, tpr, thresholds = roc_curve(y_test, y_score, pos_label=1)
        auc = roc_auc_score(y_test, y_score)
        plt.plot(fpr, tpr, color=color,
                 label=name + " Test - AUC:" + str(round(auc, 5)))
    plt.title("ROC curve")
    plt.xlabel("FPR")
    plt.ylabel("TPR")
    plt.legend(loc="lower right")
    plt.show()

Output

[Figure: ROC curves of the seven models on the test set]
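The stated goal includes a score table, which the per-model printouts never assemble. The sketch below builds one in plain Python from the accuracy, precision, and recall values recorded in the outputs, rounded to four decimals. F1 is recomputed from precision and recall because the recorded F1 lines coincide with recall. The decision tree is left out because its recorded output is identical to the SVM output, and AUC is left out because it depends on scores that are not part of the printed results. The names `recorded` and `f1_from` are illustrative.

```python
# (accuracy, precision, recall) copied from the recorded outputs, rounded to 4 decimals.
recorded = {
    "LR":       (0.7877, 0.6609, 0.3203),
    "SVM":      (0.7807, 0.7018, 0.2228),
    "RF":       (0.7638, 0.5846, 0.2117),
    "GBDT":     (0.7793, 0.6058, 0.3510),
    "XGBoost":  (0.7842, 0.6244, 0.3565),
    "LightGBM": (0.7701, 0.5689, 0.3565),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


rows = [(name, acc, p, r, f1_from(p, r)) for name, (acc, p, r) in recorded.items()]

print(f"{'model':<10}{'accuracy':>10}{'precision':>11}{'recall':>8}{'F1':>8}")
for name, acc, p, r, f1 in rows:
    print(f"{name:<10}{acc:>10.4f}{p:>11.4f}{r:>8.4f}{f1:>8.4f}")

# Model with the highest recomputed F1.
best = max(rows, key=lambda row: row[4])[0]
print("Highest F1:", best)
```

On these recorded values the boosted models (GBDT, XGBoost, LightGBM) and logistic regression reach noticeably higher F1 than SVM and random forest, mainly because of their higher recall.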
 