[Python-ML] L1-Regularized Feature Selection with scikit-learn
# -*- coding: utf-8 -*-
'''
Created on January 17, 2018
@author: Jason.F
@summary: L1-regularized logistic regression in scikit-learn, used for feature selection
'''
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt
# Load the data
df_wine = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data',header=None)
df_wine.columns=['Class label','Alcohol','Malic acid','Ash','Alcalinity of ash','Magnesium','Total phenols','Flavanoids','Nonflavanoid phenols','Proanthocyanins','Color intensity','Hue','OD280/OD315 of diluted wines','Proline']
print('class labels:',np.unique(df_wine['Class label']))
#print(df_wine.head(5))
# Split into training and test sets
X,y=df_wine.iloc[:,1:].values,df_wine.iloc[:,0].values
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=0)
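# Feature scaling
# Normalization (min-max scaling)
mms=MinMaxScaler()
X_train_norm=mms.fit_transform(X_train)
X_test_norm=mms.transform(X_test)  # reuse the training-set min/max; do not refit on the test set
# Standardization
stdsc=StandardScaler()
X_train_std=stdsc.fit_transform(X_train)
X_test_std=stdsc.transform(X_test)  # reuse the training-set mean/std; do not refit on the test set
# L1-regularized logistic regression model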
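# solver='liblinear' supports the L1 penalty and was the default solver until scikit-learn 0.22;
# the current default ('lbfgs') rejects penalty='l1'. Some newer releases may warn about or
# reject 'liblinear' for multiclass targets; solver='saga' is an L1-capable alternative there
# (it fits a multinomial model, so the coefficients will differ from the ones shown below).
lr=LogisticRegression(penalty='l1',C=0.1,solver='liblinear')  # alternatively penalty='l2'
lr.fit(X_train_std,y_train)
print('Training accuracy:',lr.score(X_train_std, y_train))
print('Test accuracy:',lr.score(X_test_std, y_test))  # compare training and test accuracy to check for overfitting
print(lr.intercept_)  # intercepts, one per class (three classes)
print(lr.coef_)  # weight coefficients; L1 makes them sparse, which acts as feature selection
# Effect of regularization: decreasing the parameter C strengthens the penalty
# and drives the feature weights toward 0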
fig=plt.figure()
ax=plt.subplot(111)
colors=['blue','green','red','cyan','magenta','yellow','black','pink','lightgreen','lightblue','gray','indigo','orange']
weights,params=[],[]
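for c in np.arange(-4,6,dtype=float):
    # solver='liblinear' supports the L1 penalty (the current default 'lbfgs' does not)
    lr=LogisticRegression(penalty='l1',C=10**c,solver='liblinear',random_state=0)
    lr.fit(X_train_std,y_train)
    weights.append(lr.coef_[0])  # three classes; track the weights of the first class only
    params.append(10**c)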
weights=np.array(weights)
for column,color in zip(range(weights.shape[1]),colors):
    plt.plot(params,weights[:,column],label=df_wine.columns[column+1],color=color)
plt.axhline(0,color='black',linestyle='--',linewidth=3)
plt.xlim([10**(-5),10**5])
plt.ylabel('weight coefficient')
plt.xlabel('C')
plt.xscale('log')
plt.legend(loc='upper left')
ax.legend(loc='upper center',bbox_to_anchor=(1.38,1.03),ncol=1,fancybox=True)
plt.show()
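One detail in the scaling step is worth a closer look: the scaler should learn its statistics from the training set only and then apply them to the test set. The following is a minimal sketch on synthetic data (the random arrays and the names `manual` and `leaky` are assumptions made for illustration, not part of the original script). It shows that `transform` on the test set is the same as scaling by hand with the training mean and standard deviation, and that refitting on the test set gives different values.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a train/test split (illustrative data only)
rng = np.random.RandomState(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_test = rng.normal(loc=5.0, scale=2.0, size=(30, 3))

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # learn mean/std from the training set
X_test_std = scaler.transform(X_test)        # apply the training statistics to the test set

# The same thing by hand: StandardScaler uses the population std (ddof=0)
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print('matches manual scaling:', np.allclose(X_test_std, manual))

# Refitting on the test set uses different statistics and gives different features
leaky = StandardScaler().fit_transform(X_test)
print('max difference when refitting on the test set:', np.abs(leaky - X_test_std).max())
```

With refitting, the model would be evaluated on features scaled differently from the ones it was trained on, so the test score no longer reflects how the model behaves on new data.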
Results (as recorded from the author's original run):
('class labels:', array([1, 2, 3], dtype=int64))
('Training accuracy:', 0.9838709677419355)
('Test accuracy:', 0.98148148148148151)
[-0.38378625 -0.15815556 -0.70033857]
[[ 0.28028457  0.          0.         -0.02806147  0.          0.
   0.71013567  0.          0.          0.          0.          0.
   1.23592372]
 [-0.64368703 -0.06896342 -0.05715611  0.          0.          0.
   0.          0.          0.         -0.92722893  0.05967934  0.
  -0.37098083]
 [ 0.          0.06129709  0.          0.          0.          0.
  -0.63710764  0.          0.          0.49858959 -0.35822494 -0.57004251
   0.        ]]
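Many entries of the coefficient matrix are exactly zero, and that sparsity is what makes L1 regularization usable for feature selection. Below is a rough sketch of how the surviving features could be listed. Several choices here are assumptions rather than the author's method: it loads the copy of the Wine data bundled with scikit-learn (`load_wine`) instead of downloading the CSV, it wraps the model in `OneVsRestClassifier` to make the one-vs-rest scheme explicit, and it keeps a feature if any of the three per-class models gives it a nonzero weight.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

# Bundled copy of the UCI Wine data (assumption: used here to avoid the download)
wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0)

# Standardize with statistics learned from the training set only
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)

# One binary L1-regularized model per class (explicit one-vs-rest)
ovr = OneVsRestClassifier(
    LogisticRegression(penalty='l1', C=0.1, solver='liblinear', random_state=0))
ovr.fit(X_train_std, y_train)

# Stack the per-class weight vectors into a (n_classes, n_features) matrix
coef = np.vstack([est.coef_ for est in ovr.estimators_])

# Selection rule (an assumption): keep a feature if any class uses it
selected = np.any(coef != 0, axis=0)
for name, keep in zip(wine.feature_names, selected):
    print('%-30s %s' % (name, 'kept' if keep else 'dropped'))
print('features kept:', int(selected.sum()), 'of', coef.shape[1])
```

Lowering `C` should drop more features and raising it should keep more, which is the trend the regularization-path plot in the script shows.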