Machine Learning in Action - Logistic Regression - 19
Machine Learning in Action - Logistic Regression - Customer Churn Prediction
import numpy as np

# Read both CSV files as strings so that text and numeric columns fit in one array.
train_data = np.genfromtxt('Churn-Modelling.csv', delimiter=',', dtype=str)
test_data = np.genfromtxt('Churn-Modelling-Test-Data.csv', delimiter=',', dtype=str)
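# Skip the header row; the last column is the label, the rest are features.
x_train = train_data[1:, :-1]
y_train = train_data[1:, -1].astype(int)
x_test = test_data[1:, :-1]
y_test = test_data[1:, -1].astype(int)

# Drop the first three columns, which are not used as features.
x_train = np.delete(x_train, [0, 1, 2], axis=1)
x_test = np.delete(x_test, [0, 1, 2], axis=1)

# Inspect the first five samples and labels.
x_train[:5]
y_train[:5]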
from sklearn.preprocessing import LabelEncoder

# Encode the text values in column 1 as integers.
# Fit on the training set only, then reuse the same mapping for the test set.
labelencoder1 = LabelEncoder()
x_train[:, 1] = labelencoder1.fit_transform(x_train[:, 1])
x_test[:, 1] = labelencoder1.transform(x_test[:, 1])

# Same for column 2.
labelencoder2 = LabelEncoder()
x_train[:, 2] = labelencoder2.fit_transform(x_train[:, 2])
x_test[:, 2] = labelencoder2.transform(x_test[:, 2])
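To make the encoding step concrete, here is a minimal sketch of what `LabelEncoder` does to a text column. The sample values below are invented for illustration and are not read from `Churn-Modelling.csv`; the point is only that classes are indexed in sorted order and that the test column reuses the mapping learned from the training column.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical text columns, invented for illustration.
train_col = ["France", "Spain", "Germany", "France"]
test_col = ["Germany", "France"]

encoder = LabelEncoder()
train_codes = encoder.fit_transform(train_col)  # learn the mapping on training data
test_codes = encoder.transform(test_col)        # reuse the same mapping on test data

print(list(encoder.classes_))  # distinct values in sorted order
print(list(train_codes))
print(list(test_codes))
```

Calling `transform` on a value that never appeared during `fit` raises an error, which is one reason to check that the training column covers every category.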
# Convert everything from strings to floating-point numbers.
x_train = x_train.astype(np.float32)
x_test = x_test.astype(np.float32)
y_train = y_train.astype(np.float32)
y_test = y_test.astype(np.float32)
from sklearn.preprocessing import StandardScaler

# Standardize the features using the mean and variance of the training set.
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
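from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Churn is a binary label, so fit a logistic regression classifier.
# (classification_report needs discrete class predictions, which a
# linear regression model does not produce.)
LR = LogisticRegression()
LR.fit(x_train, y_train)
predictions = LR.predict(x_test)
print(classification_report(y_test, predictions))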
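A logistic regression model estimates a probability for each class, and for a binary problem `predict` amounts to thresholding the positive-class probability at 0.5. The following is a minimal sketch of that relationship on synthetic data generated with `make_classification`; the data and the variable names are assumptions for illustration, not the churn dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

model = LogisticRegression().fit(X, y)

proba = model.predict_proba(X)[:, 1]   # estimated probability of class 1
labels = model.predict(X)              # hard class labels (0 or 1)
manual = (proba > 0.5).astype(int)     # threshold the probabilities by hand

# Compare the hand-thresholded labels with what predict() returns.
print(np.array_equal(labels, manual))
```

If the cost of missing a churning customer is high, the same probabilities can be thresholded at a value other than 0.5 instead of calling `predict`.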
Machine Learning in Action - Logistic Regression - Diabetes Prediction Model
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

diabetes_data = pd.read_csv('diabetes.csv')

# Basic inspection: first rows, column types, summary statistics, shape.
diabetes_data.head()
diabetes_data.info(verbose=True)
diabetes_data.describe()
diabetes_data.shape

# Class balance of the target column.
print(diabetes_data.Outcome.value_counts())
p = diabetes_data.Outcome.value_counts().plot(kind="bar")
plt.show()

# Pairwise plots of all features, colored by Outcome.
p = sns.pairplot(diabetes_data, hue='Outcome')
plt.show()
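The plots here fall into two types: histograms and scatter plots. Panels that show a single feature are histograms (recent seaborn versions may draw density curves instead when `hue` is set), and panels that compare two different features are scatter plots showing the relationship between them. Looking at the distributions reveals some anomalous values: Glucose, BloodPressure, SkinThickness, Insulin, and BMI (body mass index) should never be 0, so a zero in these columns most likely stands for a missing measurement.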
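One way to check this observation numerically is to count the zero entries in each suspect column. The sketch below uses a small hand-made frame as a stand-in for `diabetes.csv`; its values are invented for illustration, and with the real data the same expression would be applied to `diabetes_data`.

```python
import pandas as pd

# Hypothetical stand-in for diabetes_data; values are invented for illustration.
toy = pd.DataFrame({
    "Glucose":       [148, 0, 183, 89],
    "BloodPressure": [72, 66, 0, 0],
    "BMI":           [33.6, 26.6, 23.3, 0.0],
    "Outcome":       [1, 0, 1, 0],
})

suspect_columns = ["Glucose", "BloodPressure", "BMI"]

# (frame == 0) yields a boolean frame; summing counts the True cells per column.
zero_counts = (toy[suspect_columns] == 0).sum()
print(zero_counts)
```

`Outcome` is left out on purpose, since 0 is a legitimate value for the label.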
# Treat zeros in these columns as missing values.
columns = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
diabetes_data[columns] = diabetes_data[columns].replace(0, np.nan)
import missingno as msno

# Bar chart of the number of non-missing values per column.
p = msno.bar(diabetes_data)
plt.show()
# Keep only the columns in which at least 80% of the rows are non-missing.
thresh_count = diabetes_data.shape[0] * 0.8
diabetes_data = diabetes_data.dropna(thresh=thresh_count, axis=1)

p = msno.bar(diabetes_data)
plt.show()
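The `thresh` argument of `dropna` is a minimum count of non-missing values, not a maximum count of missing ones, which is easy to misread. A minimal sketch on an invented five-row frame (the column names `a`, `b`, `c` are hypothetical, and the threshold is cast to an integer here for clarity):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a has 5 non-missing values, b has 4, c has 2.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [1.0, np.nan, 3.0, 4.0, 5.0],
    "c": [np.nan, np.nan, np.nan, 4.0, 5.0],
})

# Require at least 80% of the rows (here 4 of 5) to be non-missing.
min_non_missing = int(toy.shape[0] * 0.8)

# axis=1 drops columns; a column survives only if it meets the threshold.
kept = toy.dropna(thresh=min_non_missing, axis=1)
print(list(kept.columns))
```

Column `b` survives because it still has four non-missing values; its remaining gap would then be handled by imputation.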
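from sklearn.impute import SimpleImputer

# Fill the remaining missing values with the column mean.
# (SimpleImputer replaces the Imputer class that was removed from
# sklearn.preprocessing in recent scikit-learn releases.)
imr = SimpleImputer(missing_values=np.nan, strategy='mean')
columns = ['Glucose', 'BloodPressure', 'BMI']
diabetes_data[columns] = imr.fit_transform(diabetes_data[columns])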
# Confirm that no missing values remain.
p = msno.bar(diabetes_data)
plt.show()

# Correlation heatmap of the remaining columns.
plt.figure(figsize=(12, 10))
p = sns.heatmap(diabetes_data.corr(), annot=True)
plt.show()
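# Separate the features from the label.
x = diabetes_data.drop("Outcome", axis=1)
y = diabetes_data.Outcome

from sklearn.model_selection import train_test_split

# Hold out 30% for testing; stratify keeps the class ratio equal in both splits.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, stratify=y)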
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Fit a logistic regression classifier and report per-class metrics.
LR = LogisticRegression()
LR.fit(x_train, y_train)
predictions = LR.predict(x_test)
print(classification_report(y_test, predictions))