XGBoost usage and parameter tuning
Original post: https://blog.csdn.net/q383700092/article/details/53763328
The results after tuning were very good:
```python
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_breast_cancer
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

cancer = load_breast_cancer()
X, y = cancer.data, cancer.target
train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=0)

# note: 'eta' is an alias of 'learning_rate' in the sklearn wrapper, so that entry duplicates it
parameters = [{'learning_rate': [0.01, 0.1, 0.3],
               'n_estimators': [1000, 1200, 1500, 2000, 2500],
               'max_depth': range(1, 10, 1),
               'gamma': [0.01, 0.1, 0.3, 0.5],
               'eta': [0.025, 0.1, 0.2, 0.3]}]
clf = GridSearchCV(XGBClassifier(min_child_weight=1,
                                 subsample=0.6,
                                 colsample_bytree=0.6,
                                 objective='binary:logistic',  # logistic loss for binary classification
                                 scale_pos_weight=1,
                                 reg_alpha=0,
                                 reg_lambda=1,
                                 seed=27),
                   param_grid=parameters, scoring='roc_auc')
clf.fit(train_x, train_y)
y_pred = clf.predict(test_x)
print("accuracy on the training subset:{:.3f}".format(clf.score(train_x, train_y)))
print("accuracy on the test subset:{:.3f}".format(clf.score(test_x, test_y)))
print(clf.best_params_)

y_pre = clf.predict(test_x)
y_pro = clf.predict_proba(test_x)[:, 1]
print("AUC Score : %f" % metrics.roc_auc_score(test_y, y_pro))
print("Accuracy : %.4g" % metrics.accuracy_score(test_y, y_pre))
'''
accuracy on the training subset:1.000
accuracy on the test subset:0.998
{'gamma': 0.5, 'learning_rate': 0.01, 'max_depth': 2, 'n_estimators': 1000}
AUC Score : 0.998089
Accuracy : 0.9708
'''

# Refit with the best parameters found above
best_xgb = XGBClassifier(min_child_weight=1,
                         subsample=0.6,
                         colsample_bytree=0.6,
                         objective='binary:logistic',
                         scale_pos_weight=1,
                         reg_alpha=0,
                         reg_lambda=1,
                         seed=27,
                         gamma=0.5,
                         learning_rate=0.01,
                         max_depth=2,
                         n_estimators=1000)
best_xgb.fit(train_x, train_y)
print("accuracy on the training subset:{:.3f}".format(best_xgb.score(train_x, train_y)))
print("accuracy on the test subset:{:.3f}".format(best_xgb.score(test_x, test_y)))
'''
accuracy on the training subset:0.995
accuracy on the test subset:0.979
'''
```
GitHub: https://github.com/dmlc/xgboost
Paper: http://www.kaggle.com/blobs/download/forum-message-attachment-files/4087/xgboost-paper.pdf
Basic idea and advantages
http://blog.csdn.net/q383700092/article/details/60954996
Reference: http://dataunion.org/15787.html
http://blog.csdn.net/china1000/article/details/51106856
In supervised learning we usually construct an objective function and a prediction function: the parameters are learned by minimizing the objective function over the training samples, and the prediction function with those learned parameters is then used to label or predict values for unseen samples.
1. A boosting tree fits the residuals with each new tree, whereas XGBoost solves the optimization using second-order derivatives and adds the number of leaves plus an L2 penalty on the leaf weights to measure model complexity, which defines XGBoost's prediction and objective functions.
2. Split points are also chosen so as to minimize this objective (a sketch of the derivation is given below).
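For reference, the objective these two points describe can be written compactly, following the XGBoost paper linked above:

```latex
% Regularized objective: training loss plus the complexity of every tree f_k
\mathcal{L} = \sum_i l(y_i, \hat{y}_i) + \sum_k \Omega(f_k), \qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2

% Second-order Taylor expansion of the loss at iteration t,
% with g_i and h_i the first and second derivatives of l w.r.t. \hat{y}^{(t-1)}
\mathcal{L}^{(t)} \simeq \sum_i \bigl[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \bigr] + \Omega(f_t)

% Summing over leaves j (G_j = \sum g_i, H_j = \sum h_i over the samples in leaf j)
% gives the optimal leaf weight and the gain used to score a candidate split:
w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad
\mathrm{Gain} = \tfrac{1}{2}\left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda}
              - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma
```

Here T is the number of leaves and w the vector of leaf weights, which is the complexity penalty from point 1, and the gain formula is what point 2 means by choosing split points that minimize the objective.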
Advantages:
1. The complexity of the tree model is explicitly added to the optimization objective as a regularization term.
2. The derivation uses second-order derivatives, i.e. a second-order Taylor expansion. (GBDT fitted with Newton's method apparently also uses second-order information.)
3. An approximate algorithm for finding split points is implemented.
4. Feature sparsity is exploited.
5. The data is pre-sorted and stored in blocks, which helps parallel computation.
6. It is built on the distributed communication framework rabit and can run on MPI and YARN. (The newest versions are reportedly no longer based on rabit.)
7. The implementation contains architecture-aware optimizations tuned for cache and memory performance.
Derivation of the principle and differences from GBDT
http://blog.csdn.net/q383700092/article/details/60954996
Reference: http://dataunion.org/15787.html
https://www.zhihu.com/question/41354392
Parameter description
Reference: http://blog.csdn.net/han_xiaoyang/article/details/52665396
Parameters
booster [default gbtree]: gbtree usually works best (the linear booster is rarely used)
  gbtree: tree-based models
  gblinear: linear models
silent [default 0]: 0 prints messages while running, 1 runs silently
nthread [default: the maximum number of available threads]
eta [default 0.3]: learning rate; typical values 0.01–0.2
min_child_weight [default 1]: minimum sum of instance weight (hessian) needed in a leaf; larger values help avoid overfitting, but too large a value causes underfitting
max_depth [default 6]
gamma [default 0]: minimum loss reduction required to make a split; the larger the value, the more conservative the algorithm
subsample [default 1]: fraction of training instances sampled for each tree; lowering it makes the algorithm more conservative and avoids overfitting, but too small a value causes underfitting; typical values 0.5–1
colsample_bytree [default 1]: fraction of columns sampled for each tree
colsample_bylevel [default 1]: fraction of columns sampled at each split level within a tree
lambda [default 1]: L2 regularization term on the weights
alpha [default 0]: L1 regularization term on the weights
scale_pos_weight [default 1]: for highly imbalanced classes, setting this to a positive value (e.g. the ratio of negative to positive samples) helps the algorithm converge faster
objective [default reg:linear]: the loss function to minimize
  binary:logistic: logistic regression for binary classification, returns predicted probabilities (not classes)
  multi:softmax: multiclass classification with softmax, returns predicted classes (not probabilities); in this case you must also set num_class (the number of classes)
  multi:softprob: same as multi:softmax, but returns the probability of each class for every sample
eval_metric [default depends on objective]: for regression the default is rmse, for classification it is error; typical values are
  rmse (root mean squared error), mae (mean absolute error), logloss (negative log-likelihood), error (binary classification error rate), merror (multiclass error rate), mlogloss (multiclass logloss), auc (area under the ROC curve)
seed [default 0]: random seed; set it to make results reproducible
In the sklearn wrapper, XGBClassifier renames the following parameters:
eta -> learning_rate
lambda -> reg_lambda
alpha -> reg_alpha
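As a quick illustration (the concrete numbers below are only placeholders, not recommendations), the same settings under the two naming conventions look like this; the last lines also show the common heuristic of setting scale_pos_weight to the negative/positive ratio:

```python
import numpy as np
from xgboost import XGBClassifier

# Native API: a parameter dict using the original names
params = {'booster': 'gbtree', 'eta': 0.1, 'lambda': 1, 'alpha': 0,
          'objective': 'binary:logistic', 'eval_metric': 'auc', 'seed': 27}

# sklearn wrapper: the same settings with the renamed keyword arguments
clf = XGBClassifier(learning_rate=0.1, reg_lambda=1, reg_alpha=0,
                    objective='binary:logistic', seed=27)

# Common heuristic for imbalanced data: scale_pos_weight ~ #negatives / #positives
y = np.array([0, 0, 0, 0, 1])                          # illustrative labels only
scale_pos_weight = (y == 0).sum() / max((y == 1).sum(), 1)
```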
Commonly tuned parameters:
Reference:
https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/
Step 1: fix a learning rate and the number of estimators for tuning the tree-based parameters
Maximum tree depth, usually 3–10:
max_depth = 5
Minimum loss reduction required to split a node, typically 0.1–0.2 (started here at 0):
gamma = 0
Row and column sampling:
subsample = 0.8,
colsample_bytree = 0.8
A relatively small value works for highly imbalanced classification problems:
min_child_weight = 1
When the classes are highly imbalanced:
scale_pos_weight = 1
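The code block that originally accompanied this step is missing from the page. A minimal sketch of the usual approach (using xgb.cv with early stopping to pick n_estimators for a fixed learning rate, on the breast-cancer data from the first example) could look like this:

```python
import xgboost as xgb
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
train_x, test_x, train_y, test_y = train_test_split(
    cancer.data, cancer.target, test_size=0.3, random_state=0)

# Start from the common initial values listed above
xgb1 = XGBClassifier(learning_rate=0.1, n_estimators=1000, max_depth=5,
                     min_child_weight=1, gamma=0, subsample=0.8,
                     colsample_bytree=0.8, objective='binary:logistic',
                     scale_pos_weight=1, seed=27)

# Cross-validation with early stopping: the best iteration suggests n_estimators
cv_result = xgb.cv(xgb1.get_xgb_params(), xgb.DMatrix(train_x, label=train_y),
                   num_boost_round=1000, nfold=5, metrics='auc',
                   early_stopping_rounds=50)
print('suggested n_estimators:', cv_result.shape[0])
```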
Step 2: tune max_depth and min_child_weight
grid_search references:
http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html
http://blog.csdn.net/abcjennifer/article/details/23884761
With scoring='roc_auc' the grid search only supports binary classification; for multiclass problems you need a different scoring metric (the default scorer supports multiclass).
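The code block that accompanied this step is also missing. Following the reference above, a hedged sketch of this step (a coarse grid first, then a finer one around the best values) could be:

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Coarse grid over max_depth and min_child_weight
param_test1 = {'max_depth': range(3, 10, 2), 'min_child_weight': range(1, 6, 2)}
gsearch1 = GridSearchCV(
    estimator=XGBClassifier(learning_rate=0.1, n_estimators=140, max_depth=5,
                            min_child_weight=1, gamma=0, subsample=0.8,
                            colsample_bytree=0.8, objective='binary:logistic',
                            nthread=4, scale_pos_weight=1, seed=27),
    param_grid=param_test1, scoring='roc_auc', n_jobs=4, cv=5)
gsearch1.fit(train_x, train_y)      # train_x / train_y from the split above
print(gsearch1.best_params_, gsearch1.best_score_)

# A finer grid of +/-1 around the best coarse values would follow the same pattern.
```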
Step 3: tune gamma
```python
param_test3 = {'gamma': [i / 10.0 for i in range(0, 5)]}
gsearch3 = GridSearchCV(
    estimator=XGBClassifier(learning_rate=0.1, n_estimators=140, max_depth=4,
                            min_child_weight=6, gamma=0, subsample=0.8,
                            colsample_bytree=0.8, objective='binary:logistic',
                            nthread=4, scale_pos_weight=1, seed=27),
    param_grid=param_test3, scoring='roc_auc', n_jobs=4, iid=False, cv=5)
gsearch3.fit(train[predictors], train[target])
# note: in recent scikit-learn the iid= argument was removed and
# grid_scores_ has been replaced by cv_results_
gsearch3.grid_scores_, gsearch3.best_params_, gsearch3.best_score_
```
Step 4: tune subsample and colsample_bytree
```python
# Use 0.6, 0.7, 0.8, 0.9 as starting values
param_test4 = {'subsample': [i / 10.0 for i in range(6, 10)],
               'colsample_bytree': [i / 10.0 for i in range(6, 10)]}
gsearch4 = GridSearchCV(
    estimator=XGBClassifier(learning_rate=0.1, n_estimators=177, max_depth=3,
                            min_child_weight=4, gamma=0.1, subsample=0.8,
                            colsample_bytree=0.8, objective='binary:logistic',
                            nthread=4, scale_pos_weight=1, seed=27),
    param_grid=param_test4, scoring='roc_auc', n_jobs=4, iid=False, cv=5)
gsearch4.fit(train[predictors], train[target])
gsearch4.grid_scores_, gsearch4.best_params_, gsearch4.best_score_
```
Step 5: tune the regularization parameters
```python
param_test6 = {'reg_alpha': [1e-5, 1e-2, 0.1, 1, 100]}
gsearch6 = GridSearchCV(
    estimator=XGBClassifier(learning_rate=0.1, n_estimators=177, max_depth=4,
                            min_child_weight=6, gamma=0.1, subsample=0.8,
                            colsample_bytree=0.8, objective='binary:logistic',
                            nthread=4, scale_pos_weight=1, seed=27),
    param_grid=param_test6, scoring='roc_auc', n_jobs=4, iid=False, cv=5)
gsearch6.fit(train[predictors], train[target])
gsearch6.grid_scores_, gsearch6.best_params_, gsearch6.best_score_
```
Step 6: lower the learning rate
```python
xgb4 = XGBClassifier(learning_rate=0.01, n_estimators=5000, max_depth=4,
                     min_child_weight=6, gamma=0, subsample=0.8,
                     colsample_bytree=0.8, reg_alpha=0.005,
                     objective='binary:logistic', nthread=4,
                     scale_pos_weight=1, seed=27)
# modelfit() is the helper defined in the Analytics Vidhya article referenced above
modelfit(xgb4, train, predictors)
```
Python example
```python
import xgboost as xgb
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split  # sklearn.cross_validation is deprecated

# Load the data
iris = load_iris()

# Split the data set
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=42)

# Set the parameters
# (note: iris has 3 classes; the binary objective below is kept from the original post)
m_class = xgb.XGBClassifier(learning_rate=0.1, n_estimators=1000, max_depth=5,
                            gamma=0, subsample=0.8, colsample_bytree=0.8,
                            objective='binary:logistic', nthread=4, seed=27)

# Train
m_class.fit(X_train, y_train)
test_21 = m_class.predict(X_test)
print("Accuracy : %.2f" % metrics.accuracy_score(y_test, test_21))

# Predicted probabilities
#test_2 = m_class.predict_proba(X_test)

# AUC can only be computed for binary classification
##print("AUC Score (Train): %f" % metrics.roc_auc_score(y_test, test_2))

# Feature importance (in newer xgboost use m_class.get_booster().get_fscore())
feat_imp = pd.Series(m_class.booster().get_fscore()).sort_values(ascending=False)
feat_imp.plot(kind='bar', title='Feature Importances')
plt.show()

# Regression
#m_regress = xgb.XGBRegressor(n_estimators=1000, seed=0)
#m_regress.fit(X_train, y_train)
#test_1 = m_regress.predict(X_test)
```
Putting it together
Native xgb API
```python
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.datasets import make_hastie_10_2
import xgboost as xgb

# Record the running time of the program
import time
start_time = time.time()

X, y = make_hastie_10_2(random_state=0)
y = (y > 0).astype(int)  # make_hastie_10_2 labels are ±1; map them to {0, 1}
# test_size is the fraction of data held out for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Build the xgb matrices
xgb_train = xgb.DMatrix(X_train, label=y_train)
xgb_test = xgb.DMatrix(X_test, label=y_test)

# Parameters
params = {
    'booster': 'gbtree',
    'silent': 1,              # 1 suppresses the running messages; set to 0 to see them
    #'nthread': 7,            # number of CPU threads; defaults to the maximum
    'eta': 0.007,             # acts like a learning rate
    'min_child_weight': 3,    # default 1; minimum sum of hessians (h) in a leaf.
                              # For imbalanced 0-1 classification where h is around 0.01,
                              # min_child_weight=1 means a leaf needs at least 100 samples.
                              # This parameter strongly affects the result; smaller values overfit more easily.
    'max_depth': 6,           # depth of each tree; larger values overfit more easily
    'gamma': 0.1,             # minimum loss reduction to split a leaf further; larger is more conservative, usually 0.1-0.2
    'subsample': 0.7,         # row subsampling of the training data
    'colsample_bytree': 0.7,  # column subsampling when building each tree
    'lambda': 2,              # L2 regularization on the weights; larger values make overfitting less likely
    #'alpha': 0,              # L1 regularization
    #'scale_pos_weight': 1,   # a value > 0 helps fast convergence when classes are imbalanced
    #'objective': 'multi:softmax',  # multiclass objective
    #'num_class': 10,         # number of classes, used together with multi:softmax
    'seed': 1000,             # random seed
    #'eval_metric': 'auc'
}
plst = list(params.items())
num_rounds = 100  # number of boosting iterations
watchlist = [(xgb_train, 'train'), (xgb_test, 'val')]

# Train the model (and optionally save it)
# early_stopping_rounds: with a large number of iterations, training stops early if the
# evaluation metric does not improve for that many rounds
model = xgb.train(plst, xgb_train, num_rounds, watchlist, early_stopping_rounds=100)
#model.save_model('./model/xgb.model')  # save the trained model

print("best best_ntree_limit", model.best_ntree_limit)
y_pred = model.predict(xgb_test, ntree_limit=model.best_ntree_limit)
print('error=%f' % (sum(1 for i in range(len(y_pred)) if int(y_pred[i] > 0.5) != y_test[i])
                    / float(len(y_pred))))

# Print the running time
cost_time = time.time() - start_time
print("xgboost success!", '\n', "cost time:", cost_time, "(s)......")
```
Using the sklearn interface for xgb (recommended)
Official sklearn-style interface.
The parameter names that change are:
eta -> learning_rate
lambda -> reg_lambda
alpha -> reg_alpha
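The code block that originally followed here is missing from the page. A minimal hedged sketch of the sklearn-style interface on the same make_hastie_10_2 data as the native example above could be:

```python
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_hastie_10_2
from sklearn import metrics
from xgboost import XGBClassifier

X, y = make_hastie_10_2(random_state=0)
y = (y > 0).astype(int)                      # map the ±1 labels to {0, 1}
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# The same kind of settings as the native API, with the renamed arguments
clf = XGBClassifier(learning_rate=0.1, n_estimators=100, max_depth=6,
                    min_child_weight=3, gamma=0.1, subsample=0.7,
                    colsample_bytree=0.7, reg_lambda=2,
                    objective='binary:logistic', nthread=4, seed=1000)

clf.fit(X_train, y_train, eval_set=[(X_test, y_test)],
        eval_metric='auc', early_stopping_rounds=100, verbose=False)

y_pro = clf.predict_proba(X_test)[:, 1]
y_pre = clf.predict(X_test)
print("AUC Score : %f" % metrics.roc_auc_score(y_test, y_pro))
print("Accuracy : %.4g" % metrics.accuracy_score(y_test, y_pre))
```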
Grid search
You can fix some parameters first, optimize the others, and then continue tuning (a compact sketch follows this summary).
Step 1: choose a learning rate and give the tree-based parameters common initial values, adjusting for class imbalance where needed:
max_depth, min_child_weight, gamma, subsample, scale_pos_weight
max_depth = 3: starting values between 4 and 6 are all reasonable choices.
min_child_weight: a relatively small value handles highly imbalanced classification problems, e.g. 1.
subsample, colsample_bytree = 0.8: the most common starting value.
scale_pos_weight = 1: used because the classes are highly imbalanced.
Step 2: max_depth and min_child_weight have a large impact on the final result.
'max_depth': range(3,10,2),
'min_child_weight': range(1,6,2)
Tune coarsely over a wide range first, then finely over a narrow one.
Step 3: tune gamma.
'gamma': [i/10.0 for i in range(0,5)]
Step 4: tune subsample and colsample_bytree.
'subsample': [i/100.0 for i in range(75,90,5)],
'colsample_bytree': [i/100.0 for i in range(75,90,5)]
Step 5: tune the regularization parameters.
'reg_alpha': [1e-5, 1e-2, 0.1, 1, 100]
'reg_lambda'
Step 6: lower the learning rate.
learning_rate = 0.01
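The two code blocks that originally accompanied this summary are missing. A compact hedged sketch of the staged search (tuning one group of parameters at a time and carrying the best values forward) could look like this:

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Parameter grids for the successive steps
stages = [
    {'max_depth': range(3, 10, 2), 'min_child_weight': range(1, 6, 2)},   # step 2
    {'gamma': [i / 10.0 for i in range(0, 5)]},                           # step 3
    {'subsample': [i / 100.0 for i in range(75, 90, 5)],
     'colsample_bytree': [i / 100.0 for i in range(75, 90, 5)]},          # step 4
    {'reg_alpha': [1e-5, 1e-2, 0.1, 1, 100]},                             # step 5
]

# Step 1: common initial values
best_params = dict(learning_rate=0.1, n_estimators=140, max_depth=5,
                   min_child_weight=1, gamma=0, subsample=0.8,
                   colsample_bytree=0.8, objective='binary:logistic',
                   scale_pos_weight=1, seed=27)

for grid in stages:
    gs = GridSearchCV(XGBClassifier(**best_params), param_grid=grid,
                      scoring='roc_auc', cv=5)
    gs.fit(train_x, train_y)                 # train_x / train_y from the first example
    best_params.update(gs.best_params_)      # carry the winners into the next stage
    print(list(grid.keys()), '->', gs.best_params_)

# Step 6: lower the learning rate and raise n_estimators for the final model
best_params.update(learning_rate=0.01, n_estimators=5000)
final_model = XGBClassifier(**best_params).fit(train_x, train_y)
```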
Output the feature importances
```python
import pandas as pd
import matplotlib.pylab as plt

feat_imp = pd.Series(clf.booster().get_fscore()).sort_values(ascending=False)
# Newer versions need conversion to a dict (or list) first:
#feat_imp = pd.Series(dict(clf.get_booster().get_fscore())).sort_values(ascending=False)
#plt.bar(feat_imp.index, feat_imp)
feat_imp.plot(kind='bar', title='Feature Importances')
plt.ylabel('Feature Importance Score')
plt.show()
```