Python Data Analysis and Visualization Examples: Bank Credit Card Default Prediction (24)
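1. Project background:
A bank wants to predict which credit card customers will default. The raw dataset looks like this:
2. Analysis steps:
(1) Data cleaning
(2) Exploratory visualization
(3) Feature engineering
(4) Basic modeling and evaluation
3. Source code:
Dataset download: 易一網絡科技 (paid article), www.intumu.com
Load the data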
import pandas as pd
df=pd.read_excel('LRGWFB.xls')
df.head()
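   年齡  教育  工齡  地址   收入   負債率     信用卡負債      其他負債  違約
0  41   3  17  12  176   9.3  11.359392  5.008608   1
1  27   1  10   6   31  17.3   1.362202  4.000798   0
2  40   1  15  14   55   5.5   0.856075  2.168925   0
3  41   1  15  14  120   2.9   2.658720  0.821280   0
4  24   2   2   0   28  17.3   1.787436  3.056564   1
Column names: 年齡 = age, 教育 = education level, 工齡 = years employed, 地址 = address (the values look like years at the current address), 收入 = income, 負債率 = debt ratio, 信用卡負債 = credit card debt, 其他負債 = other debt, 違約 = default (1 = defaulted, 0 = did not).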
Check for missing values
df.isnull().any()
年齡 False
教育 False
工齡 False
地址 False
收入 False
負債率 False
信用卡負債 False
其他負債 False
違約 False
dtype: bool
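`isnull().any()` only says whether a column contains a gap at all. When gaps do exist, a per-column count or share is usually more useful. The following is a minimal sketch on a hypothetical toy frame (the column names and values are made up for illustration and are not the article's data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two gaps; not the article's dataset.
toy = pd.DataFrame({
    "income": [176.0, 31.0, np.nan, 120.0],
    "debt_ratio": [9.3, np.nan, 5.5, 2.9],
    "default": [1, 0, 0, 0],
})

missing_counts = toy.isnull().sum()   # number of missing values per column
missing_share = toy.isnull().mean()   # fraction of missing values per column

print(missing_counts)
print(missing_share)
```

On the article's data every count would be zero, since `any()` returned False for all columns.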
Target classes
df['違約'].unique()
array([1, 0], dtype=int64)
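`unique()` confirms the target is binary but not how balanced it is, and the balance matters when reading an accuracy score later. A small sketch, using hypothetical labels rather than the article's column, of how the class shares and a majority-class baseline could be computed:

```python
import pandas as pd

# Hypothetical labels (1 = default, 0 = no default); not the article's data.
labels = pd.Series([1, 0, 0, 0, 1, 0, 0, 0], name="default")

counts = labels.value_counts()                 # absolute count per class
shares = labels.value_counts(normalize=True)   # fraction per class

# Accuracy obtained by always predicting the most frequent class.
majority_baseline = shares.max()

print(counts)
print(f"majority-class baseline accuracy: {majority_baseline:.2f}")
```

A classifier is only informative if it beats that baseline, so it is worth running the same two calls on `df['違約']`.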
Split features and target (note that iloc[:, 1:-1] skips the first column, 年齡, so seven features remain)
X, y = df.iloc[:,1:-1],df.iloc[:,-1]
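Positional slicing is compact but silently depends on column order. The sketch below contrasts it with selection by name on a hypothetical four-column frame (English column names are stand-ins, not the article's):

```python
import pandas as pd

# Hypothetical frame; column names are illustrative stand-ins.
toy = pd.DataFrame({
    "age": [41, 27],
    "education": [3, 1],
    "income": [176, 31],
    "default": [1, 0],
})

X_pos = toy.iloc[:, 1:-1]                 # positional: drops the first and last columns
X_named = toy.drop(columns=["default"])   # by name: keeps every feature, including age
y = toy["default"]

print(X_pos.columns.tolist())
print(X_named.columns.tolist())
```

Whether age should be a feature is a modeling decision; selecting by name simply makes that decision explicit.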
Feature correlation
classes = X.columns.tolist()
classes
['教育', '工齡', '地址', '收入', '負債率', '信用卡負債', '其他負債']
from yellowbrick.features import Rank2D
visualizer = Rank2D(algorithm='pearson', size=(800, 600), title="Pearson correlation of the 7 features")
visualizer.fit(X, y)
visualizer.transform(X)
visualizer.show()  # poof() was renamed to show() in Yellowbrick 1.0
E:\Anaconda3\lib\site-packages\yellowbrick\features\rankd.py:262: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
X = X.as_matrix()
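The heat map shows pairwise Pearson coefficients. If only the numbers are needed, pandas can produce the same kind of matrix directly. A minimal sketch on made-up columns, chosen so the expected coefficients are obvious:

```python
import pandas as pd

# Hypothetical columns; values chosen so the correlations are exactly +1 and -1.
toy = pd.DataFrame({
    "income": [10.0, 20.0, 30.0, 40.0],
    "card_debt": [1.0, 2.0, 3.0, 4.0],        # rises in lockstep with income
    "years_employed": [4.0, 3.0, 2.0, 1.0],   # falls in lockstep with income
})

corr = toy.corr(method="pearson")   # symmetric matrix with ones on the diagonal
print(corr.round(2))
```

Calling `X.corr(method="pearson")` on the article's feature frame should give the values behind the plot.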
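Feature importance
from sklearn.ensemble import RandomForestClassifier
from yellowbrick.model_selection import FeatureImportances  # newer Yellowbrick releases expose it here; older ones used yellowbrick.features.importances
model = RandomForestClassifier(n_estimators=10)
viz = FeatureImportances(model, size=(800, 600), title="Feature importances from a random forest classifier", xlabel='Importance score')
viz.fit(X, y)
viz.show()  # poof() was renamed to show() in Yellowbrick 1.0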
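The bar chart is built from the fitted forest's `feature_importances_` attribute, which can also be read without plotting. A sketch on synthetic data (the feature names and the data-generating rule are assumptions made for the demo), where only the first feature drives the label:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: the label depends on feature 0 only, the other two are noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_demo, y_demo)

# Impurity-based importances, normalized to sum to 1.
importances = forest.feature_importances_
for name, score in zip(["informative", "noise_a", "noise_b"], importances):
    print(f"{name}: {score:.3f}")
```

With only 10 trees and no `random_state`, as in the article, the ranking of close features can change between runs, so fixing the seed or using more trees makes the chart easier to reproduce.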
Classification report
Train/test split
from sklearn.model_selection import train_test_split as tts
X_train, X_test, y_train, y_test = tts(X, y, test_size =0.2, random_state=10)
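With an imbalanced target, a plain random split can leave the test set with a different default rate than the training set. One option, shown here as a sketch on synthetic labels and not as what the article does, is the `stratify` argument of `train_test_split`:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, 25% positives.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([1] * 25 + [0] * 75)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.2,
    random_state=10,
    stratify=y_demo,   # keep the class proportions equal in both parts
)

print("train positive rate:", y_tr.mean())
print("test positive rate:", y_te.mean())
```

Both parts keep the 25% positive rate of the full set.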
Classification report results
from sklearn.ensemble import RandomForestClassifier
from yellowbrick.classifier import ClassificationReport
model = RandomForestClassifier(n_estimators=10)
visualizer = ClassificationReport(model, support=True, size=(800, 600), title="Random forest classification report")
visualizer.fit(X_train.values, y_train)
print('Score:', visualizer.score(X_test.values, y_test))
visualizer.show()  # poof() was renamed to show() in Yellowbrick 1.0
Score: 0.7714285714285715
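The score of 0.7714 printed above equals 108/140, which suggests it is plain accuracy on the 140-row test set. Accuracy alone can hide poor detection of the minority class, which is what the per-class precision and recall in the report are for. A sketch with hypothetical labels and predictions showing how the three numbers differ:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical ground truth and predictions (1 = default).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]

accuracy = accuracy_score(y_true, y_pred)     # share of all rows predicted correctly
precision = precision_score(y_true, y_pred)   # of predicted defaults, share that really defaulted
recall = recall_score(y_true, y_pred)         # of real defaults, share that were caught

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Here the model is right 60% of the time overall yet catches only one of four defaulters, which is the kind of gap worth checking in the article's report as well.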
Persist the model
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=10)
model.fit(X_train.values, y_train)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
import joblib  # sklearn.externals.joblib was removed from scikit-learn; use the standalone joblib package
joblib.dump(model, 'model.pickle')  # save
['model.pickle']
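A quick way to check that persistence worked is a round trip: dump the model, load it back, and compare predictions. The sketch below does this on synthetic data in a temporary directory; the file name `model.joblib` and the data are assumptions for the demo.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic training data, only for the round-trip check.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.joblib")   # hypothetical file name
    joblib.dump(clf, path)                      # serialize the fitted model
    restored = joblib.load(path)                # read it back into memory

# The restored model should reproduce the original predictions exactly.
same_predictions = np.array_equal(clf.predict(X_demo), restored.predict(X_demo))
print(same_predictions)
```

Pickled models are generally only safe to load with the same scikit-learn version that wrote them, so it helps to record that version next to the file.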
Load the trained model
model = joblib.load('model.pickle')  # load
model.predict(X_test.values)  # predicted label for each test row (.values matches how the model was fitted)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,
1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 1, 1, 0, 0, 0, 0], dtype=int64)
model.predict_proba(X_test.values)  # 2-D array: row i, column j is the probability that test row i belongs to class j
array([[1. , 0. ],
[0.9, 0.1],
[0.8, 0.2],
[1. , 0. ],
[0.9, 0.1],
[1. , 0. ],
[0.5, 0.5],
[0.8, 0.2],
[0.9, 0.1],
[1. , 0. ],
[0.4, 0.6],
[1. , 0. ],
[0.6, 0.4],
[0.3, 0.7],
[1. , 0. ],
[0.6, 0.4],
[0.9, 0.1],
[0.7, 0.3],
[1. , 0. ],
[0.9, 0.1],
[0.4, 0.6],
[0.4, 0.6],
[0.5, 0.5],
[1. , 0. ],
[0.8, 0.2],
[1. , 0. ],
[0.9, 0.1],
[0.5, 0.5],
[0.1, 0.9],
[0.9, 0.1],
[0.8, 0.2],
[0.6, 0.4],
[0.8, 0.2],
[0.9, 0.1],
[0.7, 0.3],
[1. , 0. ],
[0.2, 0.8],
[0.9, 0.1],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[0.9, 0.1],
[0.4, 0.6],
[0.7, 0.3],
[0.4, 0.6],
[0.9, 0.1],
[0.5, 0.5],
[0.1, 0.9],
[1. , 0. ],
[1. , 0. ],
[0.8, 0.2],
[0.7, 0.3],
[1. , 0. ],
[0.5, 0.5],
[0.8, 0.2],
[0.7, 0.3],
[0.9, 0.1],
[0.8, 0.2],
[0.3, 0.7],
[0.9, 0.1],
[1. , 0. ],
[0.9, 0.1],
[0.9, 0.1],
[0.9, 0.1],
[0.8, 0.2],
[0.9, 0.1],
[1. , 0. ],
[0.9, 0.1],
[0.4, 0.6],
[0.5, 0.5],
[0.9, 0.1],
[0.8, 0.2],
[0.6, 0.4],
[0.8, 0.2],
[1. , 0. ],
[1. , 0. ],
[0.8, 0.2],
[1. , 0. ],
[0.9, 0.1],
[0.6, 0.4],
[1. , 0. ],
[1. , 0. ],
[0.7, 0.3],
[1. , 0. ],
[0.8, 0.2],
[1. , 0. ],
[0.3, 0.7],
[0.9, 0.1],
[0.7, 0.3],
[0.5, 0.5],
[0.4, 0.6],
[1. , 0. ],
[0.9, 0.1],
[0.8, 0.2],
[0.8, 0.2],
[0.9, 0.1],
[0.8, 0.2],
[0.2, 0.8],
[0.7, 0.3],
[0.7, 0.3],
[0.4, 0.6],
[0.6, 0.4],
[0.7, 0.3],
[0.8, 0.2],
[1. , 0. ],
[0.5, 0.5],
[0.8, 0.2],
[1. , 0. ],
[0.9, 0.1],
[0.5, 0.5],
[0.8, 0.2],
[0.6, 0.4],
[0.8, 0.2],
[0.9, 0.1],
[0.9, 0.1],
[0.6, 0.4],
[0.8, 0.2],
[0.9, 0.1],
[0.1, 0.9],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[0.9, 0.1],
[0.6, 0.4],
[1. , 0. ],
[0.8, 0.2],
[0.8, 0.2],
[0.7, 0.3],
[0.9, 0.1],
[0.9, 0.1],
[0.5, 0.5],
[1. , 0. ],
[0.2, 0.8],
[0.9, 0.1],
[0.4, 0.6],
[0.2, 0.8],
[0.8, 0.2],
[1. , 0. ],
[0.8, 0.2],
[0.8, 0.2]])
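The probabilities above move in steps of 0.1, most likely because the forest averages 10 trees that each cast an all-or-nothing vote. `predict` effectively picks the class with the higher probability, but for default screening a lower cutoff on the default column may be preferable. A sketch with hypothetical probabilities in the same [P(class 0), P(class 1)] layout; the 0.4 threshold is an arbitrary illustration:

```python
import numpy as np

# Hypothetical probabilities; columns are [P(no default), P(default)].
proba = np.array([
    [1.0, 0.0],
    [0.6, 0.4],
    [0.3, 0.7],
    [0.8, 0.2],
])

# Picking the most probable class, as the default prediction does.
default_labels = proba.argmax(axis=1)

# Custom rule: flag a customer once P(default) reaches the chosen threshold.
threshold = 0.4
flagged = (proba[:, 1] >= threshold).astype(int)

print(default_labels)
print(flagged)
```

Lowering the threshold catches more defaulters at the cost of more false alarms, so the value should be chosen from the relative cost of the two errors.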
Newcomers can browse the series index: yeayee: Python數據分析及可視化實例目錄, zhuanlan.zhihu.com
Finally, don't just bookmark this post, follow along too.