ML | XGBoost: An Excellent Translation on XGBoost Parameter Tuning: "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 2)
Table of Contents
2. XGBoost Parameters
General Parameters
Booster Parameters
Learning Task Parameters
Original title: "Complete Guide to Parameter Tuning in XGBoost with codes in Python"
Original URL: https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/
All rights belong to the original article; this post is only a translation.
Related articles
- ML | XGBoost: A detailed guide to the XGBoost algorithm and model (with illustrations, including XGBoost parallel processing): key ideas, code implementation (objective/evaluation functions), installation, usage, and case studies
- ML | XGBoost: A detailed guide to XGBoost, the Kaggle favorite: introduction (resources), installation, usage, and case studies
- ML | XGBoost: Translation of "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 1)
- ML | XGBoost: Translation of "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 2)
- ML | XGBoost: Translation of "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 3)
- ML | XGBoost: Translation of "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 4)
2. XGBoost Parameters
The overall parameters have been divided into 3 categories by the XGBoost authors:
- General Parameters: guide the overall functioning
- Booster Parameters: guide the individual booster (tree/regression) at each step
- Learning Task Parameters: guide the optimization performed
I will give analogies to GBM here and highly recommend reading this article to learn from the very basics.
General Parameters
These define the overall functionality of XGBoost.
booster [default=gbtree]
- Select the type of model to run at each iteration. It has 2 options:
  - gbtree: tree-based models
  - gblinear: linear models
silent [default=0]
- Silent mode is activated if set to 1, i.e. no running messages will be printed.
- It's generally good to keep it 0, as the messages might help in understanding the model.
nthread [default to maximum number of threads available if not set]
- This is used for parallel processing, and the number of cores in the system should be entered.
- If you wish to run on all cores, no value should be entered and the algorithm will detect it automatically.
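As a quick illustration, here is a minimal sketch of how these general parameters might be collected for the native Python API. The values are illustrative placeholders, and note that recent xgboost releases rename silent to verbosity:

```python
import xgboost as xgb

# General parameters only; every value here is an illustrative starting point.
general_params = {
    'booster': 'gbtree',   # 'gblinear' would select the linear booster instead
    'silent': 0,           # keep 0 so progress messages help you inspect the model
                           # (recent xgboost releases use 'verbosity' instead)
    'nthread': 4,          # drop this key entirely to run on all available cores
}
```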
There are 2 more parameters which are set automatically by XGBoost, and you need not worry about them. Let's move on to the Booster parameters.
Booster Parameters
Though there are 2 types of boosters, I'll consider only the tree booster here because it always outperforms the linear booster, and thus the latter is rarely used.
eta [default=0.3]
- Analogous to the learning rate in GBM.
- Makes the model more robust by shrinking the weights at each step.
- Typical final values to be used: 0.01-0.2.
min_child_weight [default=1]
- Defines the minimum sum of weights of all observations required in a child.
- This is similar to min_child_leaf in GBM, but not exactly: it refers to the minimum "sum of weights" of observations, while GBM uses the minimum "number of observations".
- Used to control over-fitting. Higher values prevent a model from learning relations which might be highly specific to the particular sample selected for a tree.
- Too-high values can lead to under-fitting; hence, it should be tuned using CV, as the sketch below shows.
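A minimal sketch of tuning min_child_weight with xgboost's built-in cross-validation; the data is synthetic (make_classification is a stand-in for your own training set) and the candidate values are arbitrary:

```python
import xgboost as xgb
from sklearn.datasets import make_classification

# Synthetic data stands in for your own training set.
X, y = make_classification(n_samples=1000, n_features=10, random_state=27)
dtrain = xgb.DMatrix(X, label=y)

# Try a few candidate values and compare cross-validated AUC.
for mcw in [1, 3, 5]:
    cv = xgb.cv({'objective': 'binary:logistic',
                 'max_depth': 5,
                 'min_child_weight': mcw},
                dtrain, num_boost_round=100, nfold=5,
                metrics='auc', early_stopping_rounds=20, seed=27)
    print(mcw, cv['test-auc-mean'].max())
```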
max_depth [default=6]
- The maximum depth of a tree, same as in GBM.
- Used to control over-fitting, as a higher depth will allow the model to learn relations very specific to a particular sample.
- Should be tuned using CV.
- Typical values: 3-10.
max_leaf_nodes
- The maximum number of terminal nodes or leaves in a tree.
- Can be defined in place of max_depth. Since binary trees are created, a depth of n would produce a maximum of 2^n leaves.
- If this is defined, GBM will ignore max_depth.
gamma [default=0]
- A node is split only when the resulting split gives a positive reduction in the loss function. Gamma specifies the minimum loss reduction required to make a split.
- Makes the algorithm conservative. The values can vary depending on the loss function and should be tuned.
max_delta_step [default=0]
- The maximum delta step we allow each tree's weight estimation to be. If the value is set to 0, it means there is no constraint. If it is set to a positive value, it can help make the update step more conservative.
- Usually this parameter is not needed, but it might help in logistic regression when the classes are extremely imbalanced.
- This is generally not used, but you can explore it further if you wish.
subsample [default=1]
- Same as the subsample of GBM. Denotes the fraction of observations to be randomly sampled for each tree.
- Lower values make the algorithm more conservative and prevent over-fitting, but too-small values might lead to under-fitting.
- Typical values: 0.5-1.
colsample_bytree [default=1]
- Similar to max_features in GBM. Denotes the fraction of columns to be randomly sampled for each tree.
- Typical values: 0.5-1.
colsample_bylevel [default=1]
- Denotes the subsample ratio of columns for each split, at each level.
- I don't use this often because subsample and colsample_bytree will do the job for you, but you can explore it further if you feel so inclined.
lambda [default=1]
- L2 regularization term on weights (analogous to Ridge regression).
- This is used to handle the regularization part of XGBoost. Though many data scientists don't use it often, it should be explored to reduce over-fitting.
alpha [default=0]
- L1 regularization term on weights (analogous to Lasso regression).
- Can be used in cases of very high dimensionality so that the algorithm runs faster.
scale_pos_weight [default=1]
- A value greater than 0 should be used in case of high class imbalance, as it helps in faster convergence (a common choice is the ratio of negative to positive instances).
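Putting the tree-booster knobs together, a hedged starting configuration might look like the sketch below. Every value is a common starting point taken from the ranges above, not a recommendation; tune them with CV on your own data:

```python
# Common starting points for the tree booster, to be tuned with CV.
booster_params = {
    'eta': 0.1,               # learning rate; final values usually 0.01-0.2
    'min_child_weight': 1,    # raise to make the model more conservative
    'max_depth': 5,           # typical range 3-10
    'gamma': 0,               # minimum loss reduction required to split
    'max_delta_step': 0,      # leave 0 unless classes are extremely imbalanced
    'subsample': 0.8,         # row fraction per tree, typically 0.5-1
    'colsample_bytree': 0.8,  # column fraction per tree, typically 0.5-1
    'lambda': 1,              # L2 regularization (Ridge-like)
    'alpha': 0,               # L1 regularization (Lasso-like)
    'scale_pos_weight': 1,    # for imbalance, often set near n_negative / n_positive
}
```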
Learning Task Parameters
These parameters are used to define the optimization objective and the metric to be calculated at each step.
objective [default=reg:linear]
- This defines the loss function to be minimized. The most commonly used values are:
  - binary:logistic – logistic regression for binary classification; returns the predicted probability (not the class).
  - multi:softmax – multiclass classification using the softmax objective; returns the predicted class (not probabilities). You also need to set an additional num_class (number of classes) parameter defining the number of unique classes.
  - multi:softprob – same as softmax, but returns the predicted probability of each data point belonging to each class.
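For example, a multiclass setup must pair multi:softmax with num_class. The sketch below uses invented three-class data purely for illustration:

```python
import numpy as np
import xgboost as xgb

# Invented three-class data, purely for illustration.
X = np.random.rand(300, 4)
y = np.random.randint(0, 3, size=300)
dtrain = xgb.DMatrix(X, label=y)

# multi:softmax requires num_class and predicts class labels...
model = xgb.train({'objective': 'multi:softmax', 'num_class': 3},
                  dtrain, num_boost_round=10)
print(model.predict(dtrain)[:5])   # class labels, not probabilities

# ...while 'multi:softprob' would return an (n_samples, num_class)
# array of per-class probabilities instead.
```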
eval_metric [default according to objective]
- The metric to be used for validation data.
- The default values are rmse for regression and error for classification.
- Typical values are:
  - rmse – root mean square error
  - mae – mean absolute error
  - logloss – negative log-likelihood
  - error – binary classification error rate (0.5 threshold)
  - merror – multiclass classification error rate
  - mlogloss – multiclass logloss
  - auc – area under the curve
seed [default=0]
- The random number seed.
- Can be used for generating reproducible results and also for parameter tuning.
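Bringing the learning task parameters together, here is a minimal sketch with synthetic data (the 400/100 split and all values are illustrative):

```python
import numpy as np
import xgboost as xgb

# Synthetic binary data; the 400/100 split is arbitrary.
X = np.random.rand(500, 6)
y = np.random.randint(0, 2, size=500)
dtrain = xgb.DMatrix(X[:400], label=y[:400])
dvalid = xgb.DMatrix(X[400:], label=y[400:])

params = {
    'objective': 'binary:logistic',  # loss function to minimize
    'eval_metric': 'auc',            # overrides the default 'error' for classification
    'seed': 27,                      # fixed seed keeps tuning runs reproducible
}
model = xgb.train(params, dtrain, num_boost_round=50,
                  evals=[(dvalid, 'valid')])  # reports auc on the validation set
```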
If you've been using Scikit-Learn till now, these parameter names might not look familiar. The good news is that the xgboost module in Python has an sklearn wrapper called XGBClassifier, which uses the sklearn-style naming convention. The parameter names that change are:
- eta -> learning_rate
- lambda -> reg_lambda
- alpha -> reg_alpha
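A short sketch of the renamed parameters in the sklearn wrapper (the other values shown are the illustrative starting points from earlier):

```python
from xgboost import XGBClassifier

# The same booster settings expressed with sklearn-style names:
#   eta -> learning_rate, lambda -> reg_lambda, alpha -> reg_alpha
clf = XGBClassifier(learning_rate=0.1,   # was 'eta'
                    reg_lambda=1,        # was 'lambda'
                    reg_alpha=0,         # was 'alpha'
                    max_depth=5,
                    n_estimators=100)
```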
You must be wondering why we have defined everything except something similar to the n_estimators parameter in GBM. It does exist as a parameter of XGBClassifier; in the standard xgboost implementation, however, it has to be passed as the num_boost_round argument when calling xgb.train, as contrasted below.
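A sketch contrasting the two APIs on the same synthetic data:

```python
import numpy as np
import xgboost as xgb
from xgboost import XGBClassifier

X = np.random.rand(200, 4)
y = np.random.randint(0, 2, size=200)

# sklearn wrapper: the number of boosting rounds is n_estimators.
XGBClassifier(n_estimators=100).fit(X, y)

# Native API: the same knob is the num_boost_round argument of xgb.train.
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({'objective': 'binary:logistic'}, dtrain,
                    num_boost_round=100)
```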
I recommend you go through the following parts of the xgboost guide to better understand the parameters and codes: the official guide to XGBoost parameters, the XGBoost demo codes, and the Python API reference.