ML之XGBoost: Translation of an excellent article on XGBoost parameter tuning, "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 4)
Contents
Step 3: Tune gamma
Step 4: Tune subsample and colsample_bytree
Step 5: Tuning Regularization Parameters
Step 6: Reducing Learning Rate
End Notes
Original title: Complete Guide to Parameter Tuning in XGBoost with codes in Python
Original link: https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/
All rights belong to the original author; this post is only a translation.
Related articles
ML之XGBoost:XGBoost算法模型(相关配图)的简介(XGBoost并行处理)、关键思路、代码实现(目标函数/评价函数)、安装、使用方法、案例应用之详细攻略
ML之XGBoost:Kaggle神器XGBoost算法模型的简介(资源)、安装、使用方法、案例应用之详细攻略
ML之XGBoost:XGBoost参数调优的优秀外文翻译—《XGBoost中的参数调优完整指南(带python中的代码)》(一)
ML之XGBoost:XGBoost参数调优的优秀外文翻译—《XGBoost中的参数调优完整指南(带python中的代码)》(二)
ML之XGBoost:XGBoost参数调优的优秀外文翻译—《XGBoost中的参数调优完整指南(带python中的代码)》(三)
ML之XGBoost:XGBoost参数调优的优秀外文翻译—《XGBoost中的参数调优完整指南(带python中的代码)》(四)
Step 3: Tune gamma
Now let's tune the gamma value using the parameters already tuned above. Gamma can take various values, but I'll check 5 values here. You can go into more precise values later.
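For illustration, here is a minimal sketch of such a search with scikit-learn's GridSearchCV. It assumes the train DataFrame, the predictors list and the target column prepared in the earlier parts of this series; the variable names and the n_estimators value are illustrative, not necessarily the original post's.

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# `train`, `predictors`, `target` are assumed to exist from the earlier parts of this series.
param_test_gamma = {'gamma': [i / 10.0 for i in range(0, 5)]}  # 0.0, 0.1, 0.2, 0.3, 0.4

gsearch_gamma = GridSearchCV(
    estimator=XGBClassifier(
        learning_rate=0.1,            # learning rate used during tuning (lowered in Step 6)
        n_estimators=140,             # illustrative; use the boosting-round count tuned earlier
        max_depth=4,                  # from the previous step
        min_child_weight=6,           # from the previous step
        gamma=0,
        subsample=0.8,
        colsample_bytree=0.8,
        objective='binary:logistic',
        scale_pos_weight=1,
        random_state=27),
    param_grid=param_test_gamma,
    scoring='roc_auc',
    n_jobs=4,
    cv=5)

gsearch_gamma.fit(train[predictors], train[target])
print(gsearch_gamma.best_params_, gsearch_gamma.best_score_)
```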
This shows that our original value of gamma, i.e. 0, is the optimum one. Before proceeding, it would be a good idea to re-calibrate the number of boosting rounds for the updated parameters.
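One way to do the re-calibration is XGBoost's built-in cv with early stopping; a minimal sketch under the same assumptions (illustrative names, parameter values carried over from the steps above):

```python
import xgboost as xgb

# `train`, `predictors`, `target` are assumed from the earlier parts of this series.
dtrain = xgb.DMatrix(train[predictors].values, label=train[target].values)

params = {
    'eta': 0.1,                      # learning rate
    'max_depth': 4,
    'min_child_weight': 6,
    'gamma': 0,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'objective': 'binary:logistic',
    'seed': 27}

# Cross-validated boosting with early stopping; the number of rounds that survives
# becomes the new n_estimators for the updated parameter set.
cvresult = xgb.cv(params, dtrain, num_boost_round=1000, nfold=5,
                  metrics='auc', early_stopping_rounds=50)
print('Re-calibrated number of boosting rounds:', cvresult.shape[0])
```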
Here, we can see the improvement in score. So the final parameters are:
- max_depth: 4
- min_child_weight: 6
- gamma: 0
Step 4: Tune subsample and colsample_bytree
The next step would be to try different subsample and colsample_bytree values. Let's do this in 2 stages as well, and take the values 0.6, 0.7, 0.8 and 0.9 for both to start with.
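A sketch of the first, coarser stage, using the same GridSearchCV pattern and assumptions as above (train, predictors, target from the earlier parts; n_estimators illustrative):

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_test_sample = {
    'subsample': [i / 10.0 for i in range(6, 10)],          # 0.6, 0.7, 0.8, 0.9
    'colsample_bytree': [i / 10.0 for i in range(6, 10)]}   # 0.6, 0.7, 0.8, 0.9

gsearch_sample = GridSearchCV(
    estimator=XGBClassifier(
        learning_rate=0.1, n_estimators=140, max_depth=4, min_child_weight=6,
        gamma=0, objective='binary:logistic', scale_pos_weight=1, random_state=27),
    param_grid=param_test_sample,
    scoring='roc_auc', n_jobs=4, cv=5)

gsearch_sample.fit(train[predictors], train[target])
print(gsearch_sample.best_params_, gsearch_sample.best_score_)
```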
Here, we found 0.8 to be the optimum value for both subsample and colsample_bytree. Now we should try values in steps of 0.05 around it.
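For the finer stage only the grid changes; for example:

```python
# Values in steps of 0.05 around the coarse optimum of 0.8; reuse the GridSearchCV
# setup from the previous snippet with this grid as param_grid.
param_test_sample_fine = {
    'subsample': [i / 100.0 for i in range(75, 90, 5)],          # 0.75, 0.80, 0.85
    'colsample_bytree': [i / 100.0 for i in range(75, 90, 5)]}   # 0.75, 0.80, 0.85
```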
Again we got the same values as before. Thus the optimum values are:
- subsample: 0.8
- colsample_bytree: 0.8
Step 5: Tuning Regularization Parameters
The next step is to apply regularization to reduce overfitting. Many people don't use these parameters much, since gamma already provides a substantial way of controlling complexity, but we should always try them. I'll tune the 'reg_alpha' value here and leave it up to you to try different values of 'reg_lambda'.
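A wide, roughly log-spaced grid is a reasonable first pass; here is a sketch under the same assumptions as the earlier searches (illustrative names and n_estimators):

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_test_alpha = {'reg_alpha': [1e-5, 1e-2, 0.1, 1, 100]}  # wide, log-spaced first pass

gsearch_alpha = GridSearchCV(
    estimator=XGBClassifier(
        learning_rate=0.1, n_estimators=140, max_depth=4, min_child_weight=6,
        gamma=0, subsample=0.8, colsample_bytree=0.8,
        objective='binary:logistic', scale_pos_weight=1, random_state=27),
    param_grid=param_test_alpha,
    scoring='roc_auc', n_jobs=4, cv=5)

gsearch_alpha.fit(train[predictors], train[target])
print(gsearch_alpha.best_params_, gsearch_alpha.best_score_)
```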
We can see that the CV score is lower than in the previous case. But the values tried are very widespread, so we should try values closer to the optimum here (0.01) to see if we get something better.
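A narrower grid around 0.01 could look like this (the specific values are illustrative):

```python
# Values clustered around the coarse optimum of 0.01; everything else as in the previous search.
param_test_alpha_fine = {'reg_alpha': [0, 0.001, 0.005, 0.01, 0.05]}

gsearch_alpha_fine = GridSearchCV(
    estimator=gsearch_alpha.estimator,      # same XGBClassifier settings as above
    param_grid=param_test_alpha_fine,
    scoring='roc_auc', n_jobs=4, cv=5)
gsearch_alpha_fine.fit(train[predictors], train[target])
print(gsearch_alpha_fine.best_params_, gsearch_alpha_fine.best_score_)
```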
You can see that we got a better CV score. Now we can apply this regularization in the model and look at the impact:
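A minimal sketch of plugging the selected reg_alpha back into the classifier and checking the training AUC (variable names are illustrative, and the value is taken from the narrower search above rather than hard-coded):

```python
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

best_alpha = gsearch_alpha_fine.best_params_['reg_alpha']  # from the narrower search above

xgb_reg = XGBClassifier(
    learning_rate=0.1, n_estimators=140, max_depth=4, min_child_weight=6,
    gamma=0, subsample=0.8, colsample_bytree=0.8, reg_alpha=best_alpha,
    objective='binary:logistic', scale_pos_weight=1, random_state=27)

# Fit on the assumed training data and report the AUC on it.
xgb_reg.fit(train[predictors], train[target])
train_prob = xgb_reg.predict_proba(train[predictors])[:, 1]
print('Train AUC:', roc_auc_score(train[target], train_prob))
```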
Again we can see a slight improvement in the score.
Step 6: Reducing Learning Rate
Lastly, we should lower the learning rate and add more trees. Let's use the cv function of XGBoost to do the job again.
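A minimal sketch with xgb.cv, using an illustrative eta of 0.01 and a much larger budget of boosting rounds; early stopping decides how many rounds are actually used:

```python
import xgboost as xgb

dtrain = xgb.DMatrix(train[predictors].values, label=train[target].values)

final_params = {
    'eta': 0.01,                     # much lower learning rate (illustrative value)
    'max_depth': 4,
    'min_child_weight': 6,
    'gamma': 0,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'reg_alpha': best_alpha,         # value selected in Step 5 (previous snippet)
    'objective': 'binary:logistic',
    'seed': 27}

# A lower learning rate needs many more rounds; early stopping caps the useful number.
cvresult = xgb.cv(final_params, dtrain, num_boost_round=5000, nfold=5,
                  metrics='auc', early_stopping_rounds=50)
print('Boosting rounds at eta=0.01:', cvresult.shape[0])
print('Final CV AUC:', cvresult['test-auc-mean'].iloc[-1])
```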
Now we can see a significant boost in performance, and the effect of parameter tuning is clearer.
As we come to the end, I would like to share 2 key thoughts:
- It is difficult to get a big leap in performance just by parameter tuning or a slightly better model. The maximum score for GBM was 0.8487, while XGBoost reached 0.8494: a decent improvement, but not a very substantial one.
- A significant jump can be obtained by other methods such as feature engineering, creating ensembles of models, stacking, etc.
You can also download the IPython notebook with all these model codes from my GitHub account. For codes in R, you can refer to this article.
End Notes
This article was based on developing an XGBoost model end-to-end. We started by discussing why XGBoost has superior performance over GBM, followed by a detailed discussion of the various parameters involved. We also defined a generic function which you can re-use for building models.
Finally, we discussed the general approach towards tackling a problem with XGBoost, and also worked through the AV Data Hackathon 3.x problem using that approach.
I hope you found this useful and now feel more confident about applying XGBoost to solve a data science problem. You can try this out in our upcoming hackathons.
Did you like this article? Would you like to share some other hacks you use while building XGBoost models? Please feel free to drop a note in the comments below and I'll be glad to discuss.
Do you want to apply your analytical skills and test your potential? Then participate in our hackathons and compete with top data scientists from all over the world.