Regression and Related Models
Linear Regression Models
Simple (univariate) linear regression uses a single feature to predict the response; the best-fit line is obtained by minimizing the error between the predicted and true values.
Multiple linear regression uses several independent variables to estimate the dependent variable, both explaining and predicting its value.
Pros: the model is simple and easy to deploy, the regression weights can be used to interpret results, and training is fast.
Cons: accuracy is low, and collinearity among features causes problems.
Usage tips: normalize the features and apply feature selection so that highly correlated features do not enter the model together (a standardization sketch follows the code below).
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression

clf = LinearRegression()
clf.fit(train_data, train_target)
test_pred = clf.predict(test_data)
score = mean_squared_error(test_target, test_pred)
score
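The tip above mentions normalization; here is a minimal sketch of standardizing features before the fit with a scikit-learn Pipeline (train_data, train_target, and test_data are the same assumed, pre-defined variables as in the snippet above):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Standardize each feature to zero mean and unit variance before fitting,
# so features on different scales contribute comparably.
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(train_data, train_target)
test_pred = pipe.predict(test_data)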
K-Nearest Neighbors (KNN) Regression
KNN regression finds the k nearest neighbors of a sample and assigns the average of those neighbors' target values to the sample, yielding its predicted value.
Pros: the model is simple, easy to understand, quick and convenient on small datasets, and easy to visualize.
Cons: prediction is computationally expensive, so it does not suit large datasets, and the number of neighbors must be tuned.
Usage tips: normalize the features; important features can be up-weighted by an appropriate factor (a sketch follows the code below).
from sklearn.neighbors import KNeighborsRegressor

clf = KNeighborsRegressor(n_neighbors=3)
clf.fit(train_data, train_target)
test_pred = clf.predict(test_data)
score = mean_squared_error(test_target, test_pred)
score
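One minimal, hypothetical way to realize the feature-weighting tip above is to scale all features to a common range and then multiply an important feature's column by a weight so it dominates the distance computation (the column index and weight are illustrative, and train_data/test_data are assumed to be NumPy arrays):

from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsRegressor

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train_data)
test_scaled = scaler.transform(test_data)

# Hypothetical: treat column 0 as the important feature and double its
# influence on the Euclidean distance used by KNN.
feature_weight = 2.0
train_scaled[:, 0] *= feature_weight
test_scaled[:, 0] *= feature_weight

clf = KNeighborsRegressor(n_neighbors=3)
clf.fit(train_scaled, train_target)
test_pred = clf.predict(test_scaled)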
Decision Tree Regression
Decision tree regression can be understood as partitioning the feature space into a number of subregions according to learned rules, then summarizing each subregion by the target values of the training points inside it. A test sample is routed by its features to one of the subregions, and that subregion's stored value becomes the prediction.
from sklearn.tree import DecisionTreeRegressor

clf = DecisionTreeRegressor()
clf.fit(train_data, train_target)
test_pred = clf.predict(test_data)
score = mean_squared_error(test_target, test_pred)
score
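How finely the space is partitioned is governed mainly by the tree depth; a minimal sketch comparing a shallow and an unrestricted tree (the max_depth value is illustrative):

from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# A shallow tree creates few, coarse subregions; an unrestricted tree keeps
# splitting until its leaves are (nearly) pure and can overfit.
clf_shallow = DecisionTreeRegressor(max_depth=3)
clf_deep = DecisionTreeRegressor(max_depth=None)
clf_shallow.fit(train_data, train_target)
clf_deep.fit(train_data, train_target)
print(mean_squared_error(test_target, clf_shallow.predict(test_data)))
print(mean_squared_error(test_target, clf_deep.predict(test_data)))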
Random Forest Regression
Random forest is an ensemble algorithm that combines many trees, with the decision tree as its base learner; for regression, the forest outputs the average of all trees' predictions. Its main strengths: excellent accuracy among common algorithms; it runs well on large datasets; it handles high-dimensional inputs without requiring dimensionality reduction; it can evaluate the importance of each feature (a sketch follows the code below); it yields an unbiased internal estimate of the generalization error during training; and it still performs well when values are missing.
Pros: easy to use, features need little transformation, accuracy is good, and the trees can be built in parallel efficiently.
Cons: the resulting model is hard to interpret.
Usage tips: tune the hyperparameters to improve accuracy.
from sklearn.ensemble import RandomForestRegressor

clf = RandomForestRegressor(n_estimators=100)
clf.fit(train_data, train_target)
test_pred = clf.predict(test_data)
score = mean_squared_error(test_target, test_pred)
score
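The description above mentions feature-importance evaluation; after fitting, the feature_importances_ attribute ranks how much each feature contributed to the splits (feature_names is a hypothetical list of column labels you would supply yourself):

importances = clf.feature_importances_
# Print features from most to least important.
for name, imp in sorted(zip(feature_names, importances), key=lambda t: -t[1]):
    print("{}: {:.4f}".format(name, imp))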
LightGBM Regression
LightGBM supports efficient parallel training and offers fast training speed, low memory consumption, good accuracy, and distributed operation, so it can process massive datasets quickly.
Pros: high accuracy.
Cons: long training time and a complex model.
Usage tips: hold out a proper validation set to prevent overfitting (a sketch follows the code below), and search the hyperparameter space.
import lightgbm as lgb

clf = lgb.LGBMRegressor(learning_rate=0.01,
                        max_depth=-1,
                        n_estimators=5000,
                        boosting_type='gbdt',
                        random_state=2019,
                        objective='regression')
# Older LightGBM releases also accept eval_metric= and verbose= arguments here.
clf.fit(train_data, train_target)
test_pred = clf.predict(test_data)
score = mean_squared_error(test_target, test_pred)
score
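The tip above recommends a validation set; a minimal sketch that holds out part of the training data and stops once the validation error stalls (the split size and stopping round count are illustrative; recent LightGBM releases pass early stopping via callbacks, while older ones used an early_stopping_rounds argument in fit):

from sklearn.model_selection import train_test_split

X_tr, X_val, y_tr, y_val = train_test_split(
    train_data, train_target, test_size=0.2, random_state=2019)
clf = lgb.LGBMRegressor(learning_rate=0.01, n_estimators=5000, random_state=2019)
clf.fit(X_tr, y_tr,
        eval_set=[(X_val, y_val)],
        eval_metric='l2',  # 'l2' is LightGBM's name for MSE
        callbacks=[lgb.early_stopping(stopping_rounds=100)])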
Kernel Ridge Regression
Kernel ridge regression combines ridge regression (L2-regularized least squares) with the kernel trick, so it can fit non-linear relationships.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.kernel_ridge import KernelRidge as kr  # 'kr' here is assumed to be sklearn's KernelRidge

# train_shape (number of training rows), X_train_383, X_test_383, y_train, and
# target (the full training target vector) are assumed to be defined elsewhere.
folds = KFold(n_splits=5, shuffle=True, random_state=13)
oof_kr_383 = np.zeros(train_shape)
predictions_kr_383 = np.zeros(len(X_test_383))
for fold_, (trn_idx, val_idx) in enumerate(folds.split(X_train_383, y_train)):
    print("fold n°{}".format(fold_ + 1))
    tr_x = X_train_383[trn_idx]
    tr_y = y_train[trn_idx]
    kr_383 = kr()
    kr_383.fit(tr_x, tr_y)
    # Out-of-fold predictions on the held-out split...
    oof_kr_383[val_idx] = kr_383.predict(X_train_383[val_idx])
    # ...and a test-set prediction averaged across folds.
    predictions_kr_383 += kr_383.predict(X_test_383) / folds.n_splits
print("CV score: {:<8.8f}".format(mean_squared_error(oof_kr_383, target)))
Ordinary Ridge Regression
Ridge regression adds an L2 penalty on the coefficients to ordinary least squares, which stabilizes the fit when features are correlated.
from sklearn.linear_model import Ridge

folds = KFold(n_splits=5, shuffle=True, random_state=13)
oof_ridge_383 = np.zeros(train_shape)
predictions_ridge_383 = np.zeros(len(X_test_383))
for fold_, (trn_idx, val_idx) in enumerate(folds.split(X_train_383, y_train)):
    print("fold n°{}".format(fold_ + 1))
    tr_x = X_train_383[trn_idx]
    tr_y = y_train[trn_idx]
    ridge_383 = Ridge(alpha=1200)
    ridge_383.fit(tr_x, tr_y)
    oof_ridge_383[val_idx] = ridge_383.predict(X_train_383[val_idx])
    predictions_ridge_383 += ridge_383.predict(X_test_383) / folds.n_splits
print("CV score: {:<8.8f}".format(mean_squared_error(oof_ridge_383, target)))
ElasticNet
ElasticNet combines the L1 penalty of LASSO with the L2 penalty of ridge regression; l1_ratio controls the mix between the two.
from sklearn.linear_model import ElasticNet as en  # 'en' here is assumed to be sklearn's ElasticNet

folds = KFold(n_splits=5, shuffle=True, random_state=13)
oof_en_383 = np.zeros(train_shape)
predictions_en_383 = np.zeros(len(X_test_383))
for fold_, (trn_idx, val_idx) in enumerate(folds.split(X_train_383, y_train)):
    print("fold n°{}".format(fold_ + 1))
    tr_x = X_train_383[trn_idx]
    tr_y = y_train[trn_idx]
    en_383 = en(alpha=1.0, l1_ratio=0.06)
    en_383.fit(tr_x, tr_y)
    oof_en_383[val_idx] = en_383.predict(X_train_383[val_idx])
    predictions_en_383 += en_383.predict(X_test_383) / folds.n_splits
print("CV score: {:<8.8f}".format(mean_squared_error(oof_en_383, target)))
BayesianRidge (Bayesian Ridge Regression)
Bayesian ridge regression places priors on the coefficients and estimates the regularization strength from the data itself.
from sklearn.linear_model import BayesianRidge as br  # 'br' here is assumed to be sklearn's BayesianRidge

folds = KFold(n_splits=5, shuffle=True, random_state=13)
oof_br_383 = np.zeros(train_shape)
predictions_br_383 = np.zeros(len(X_test_383))
for fold_, (trn_idx, val_idx) in enumerate(folds.split(X_train_383, y_train)):
    print("fold n°{}".format(fold_ + 1))
    tr_x = X_train_383[trn_idx]
    tr_y = y_train[trn_idx]
    br_383 = br()
    br_383.fit(tr_x, tr_y)
    oof_br_383[val_idx] = br_383.predict(X_train_383[val_idx])
    predictions_br_383 += br_383.predict(X_test_383) / folds.n_splits
print("CV score: {:<8.8f}".format(mean_squared_error(oof_br_383, target)))
LASSO Regression
LASSO adds an L1 penalty to least squares, which drives some coefficients exactly to zero and therefore doubles as feature selection; a cross-validation sketch follows.
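A minimal sketch in the same cross-validation pattern as the blocks above (the alpha value is an assumed starting point; X_train_383, X_test_383, y_train, train_shape, and target are the same assumed variables as before):

from sklearn.linear_model import Lasso

folds = KFold(n_splits=5, shuffle=True, random_state=13)
oof_lasso_383 = np.zeros(train_shape)
predictions_lasso_383 = np.zeros(len(X_test_383))
for fold_, (trn_idx, val_idx) in enumerate(folds.split(X_train_383, y_train)):
    print("fold n°{}".format(fold_ + 1))
    tr_x = X_train_383[trn_idx]
    tr_y = y_train[trn_idx]
    lasso_383 = Lasso(alpha=0.1)  # assumed value; tune per problem
    lasso_383.fit(tr_x, tr_y)
    oof_lasso_383[val_idx] = lasso_383.predict(X_train_383[val_idx])
    predictions_lasso_383 += lasso_383.predict(X_test_383) / folds.n_splits
print("CV score: {:<8.8f}".format(mean_squared_error(oof_lasso_383, target)))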
Gradient Boosting Tree Regression
Gradient boosting builds trees sequentially, with each new tree fitting the residual errors of the ensemble built so far; a cross-validation sketch follows.
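A minimal sketch in the same cross-validation pattern, using scikit-learn's GradientBoostingRegressor (the hyperparameter values are assumed, not taken from the original):

from sklearn.ensemble import GradientBoostingRegressor

folds = KFold(n_splits=5, shuffle=True, random_state=13)
oof_gbr_383 = np.zeros(train_shape)
predictions_gbr_383 = np.zeros(len(X_test_383))
for fold_, (trn_idx, val_idx) in enumerate(folds.split(X_train_383, y_train)):
    print("fold n°{}".format(fold_ + 1))
    tr_x = X_train_383[trn_idx]
    tr_y = y_train[trn_idx]
    gbr_383 = GradientBoostingRegressor(n_estimators=400, learning_rate=0.01)  # assumed values
    gbr_383.fit(tr_x, tr_y)
    oof_gbr_383[val_idx] = gbr_383.predict(X_train_383[val_idx])
    predictions_gbr_383 += gbr_383.predict(X_test_383) / folds.n_splits
print("CV score: {:<8.8f}".format(mean_squared_error(oof_gbr_383, target)))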
Summary
The models above range from simple, interpretable linear regression to higher-accuracy ensembles such as random forests and LightGBM; across all of them, normalization, a sound validation scheme, and hyperparameter tuning matter as much as the choice of model itself.