CatBoost Parameters Explained

Published: 2023/12/8

Official documentation
https://tech.yandex.com/catboost/doc/dg/concepts/python-reference_parameters-list-docpage/

Common parameters

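nan_mode (string): How missing values in the input data are handled: Forbidden (missing values are not allowed), Min (missing values are treated as the minimum value of the feature), Max (missing values are treated as the maximum value of the feature). Default Min.
calc_feature_importance (bool): Whether to calculate feature importance. Default True.
fold_permutation_block_size (int): Objects are grouped into blocks of this size before the data is randomly permuted. The smaller the value, the slower the training. Default 1.
ignored_features (list): Features in the dataset to ignore. Default None.
use_best_model (bool): Requires test data when set. The number of trees kept in the model is then determined by the training parameters together with the iteration that gives the best loss function value on the test data. Default False.
loss_function (string / object): Supported values are RMSE, Logloss, MAE, CrossEntropy, Quantile, LogLinQuantile, MultiClass, MultiClassOneVsAll, MAPE, Poisson. Default Logloss.
custom_loss (object): Loss function values to output during training. Default None.
eval_metric (string): The loss function used for overfitting detection (if enabled) and best model selection (if enabled). By default the optimized loss function is used.
iterations (int): Maximum number of trees. Default 500.
border (float): Used for binary classification with the Logloss function: values greater than border are treated as positive samples. Default 0.5.
gradient_iterations (int): Number of gradient descent steps. Default 1.
depth (int): Tree depth. The maximum is 16; values between 1 and 10 are recommended. Default 6.
learning_rate (float): Learning rate. Default 0.03.
rsm (float [0; 1]): Random subspace method, the fraction of features considered at each split selection. Default 1.
partition_random_seed (int): Random seed. Default None (a seed is chosen at random for every training run).
leaf_estimation_method (string): Method used to calculate leaf values, Newton or Gradient. Default Gradient.
l2_leaf_reg (int): L2 regularization coefficient. Default 3.
has_time (bool): Use the objects in the order they appear in the input data when converting categorical features to numerical features and when choosing the tree structure. Default False (random order).
priors (string): Priors to use during training. Default None.
feature_priors (list): Priors to use when converting categorical features to numerical features.
name (string): Experiment name shown in the visualization tools. Default experiment.
fold_len_multiplier (float): Coefficient for the length of folds. Must be greater than 1; the best results are obtained with small values. Default 2.
approx_on_full_history (bool): How approximated values are calculated. False: use only a 1/fold_len_multiplier fraction of the fold. True: use all preceding rows in the fold. Default False.
class_weights (list): Class weights. Default None.
classes_count (int): Upper limit for class label values. Default: maximum class label + 1.
one_hot_max_size (int): If a feature has more distinct values than the specified value, it is converted to float. Default False.
random_strength (float): Score standard deviation multiplier. Default 1.
bagging_temperature (float): Controls the intensity of Bayesian bagging, in the range [0, 1]. Default 1.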
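
As a rough illustration of how the common parameters above are usually passed, the sketch below collects a few of them in a dict and hands them to `CatBoostClassifier`. The values are simply the defaults quoted in the list, and the helper name `build_classifier` is made up for this example. Parameter sets differ between CatBoost versions, so treat it as a starting point and check it against the version you have installed.

```python
# A few of the common parameters listed above; values are the defaults quoted there.
common_params = {
    "loss_function": "Logloss",
    "iterations": 500,
    "learning_rate": 0.03,
    "depth": 6,
    "l2_leaf_reg": 3,
    "rsm": 1,
    "random_strength": 1,
    "bagging_temperature": 1,
    "nan_mode": "Min",
}


def build_classifier(params):
    """Hypothetical helper: return a CatBoostClassifier configured with params.

    The import is done lazily so the dict can be inspected without
    the catboost package installed.
    """
    from catboost import CatBoostClassifier

    return CatBoostClassifier(**params)


# Typical usage (needs catboost and your own data):
#   model = build_classifier(common_params)
#   model.fit(X_train, y_train, eval_set=(X_valid, y_valid), use_best_model=True)
```

Note that use_best_model only makes sense when an eval_set is supplied, which matches the "requires test data" remark in the list.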

Overfitting detection settings
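- od_type (string): Overfitting detector type, IncToDec or Iter. Default IncToDec.
- od_pval (float): Threshold used with IncToDec. The larger the value, the earlier overfitting is detected. Default 0 (overfitting detection is off).
- od_wait (int): Number of iterations to continue after the iteration with the minimum loss function value. With IncToDec, overfitting detection is ignored once the threshold is reached and training continues for this many iterations. With Iter, training stops once this many iterations have passed. Default 20.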
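
The Iter behaviour described above can be sketched in plain Python. This is a simplified illustration of the stopping rule, not CatBoost's actual implementation; the function name `iter_stop_iteration` and the loss values are made up for the example.

```python
def iter_stop_iteration(losses, od_wait=20):
    """Return the iteration index at which an Iter-style detector would stop.

    Training stops once od_wait iterations have passed since the
    iteration with the best (lowest) loss. Returns None if that never happens.
    """
    best_loss = float("inf")
    best_iter = -1
    for i, loss in enumerate(losses):
        if loss < best_loss:
            # New best value: remember where it occurred.
            best_loss = loss
            best_iter = i
        elif i - best_iter >= od_wait:
            # No improvement for od_wait iterations: stop here.
            return i
    return None


# Hypothetical validation losses: the best value is at iteration 2.
validation_losses = [1.0, 0.8, 0.7, 0.75, 0.76, 0.77, 0.78]
stop_at = iter_stop_iteration(validation_losses, od_wait=3)
```

With these made-up values the loss stops improving after iteration 2, so with od_wait=3 the sketch stops at iteration 5.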

CTR settings
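- ctr_description (string): Binarization settings for categorical features. It consists of the CTR type (Borders, Buckets, BinarizedTargetMeanValue, Counter), the number of borders (regression only, range 1 to 255, default 1), and the binarization type (regression only: Median, Uniform, UniformAndQuantiles, MaxSumLog, MinEntropy, GreedyLogSum; default MinEntropy). Default None.
- counter_calc_method (string): Method for calculating the Counter CTR type. PrefixTest takes the test-set objects up to the current one into account, FullTest takes all test-set objects into account, SkipTest ignores test-set objects, Full takes all objects in both the training and test sets into account. Default None (PrefixTest).
- ctr_border_count (int): Number of splits for categorical features, range 1 to 255. Default 16.
- max_ctr_complexity (int): Maximum number of categorical features that can be combined. Default 4.
- ctr_leaf_count_limit (int): Maximum number of leaves with categorical features. If the limit is exceeded, some leaves are discarded: leaves are sorted by the frequency of their values, the top n are kept (n being the specified value), and the rest are dropped. Default None.
- store_all_simple_ctr (bool): Ignore categorical features that are not used. Used together with ctr_leaf_count_limit. Default False.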
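
The frequency-based truncation described for ctr_leaf_count_limit can be pictured with a small sketch. It works on raw category values rather than on CatBoost's internal leaves, so it only illustrates the "keep the n most frequent, drop the rest" idea; `keep_most_frequent` and the sample data are made up for this example.

```python
from collections import Counter


def keep_most_frequent(values, limit):
    """Return the set of the `limit` most frequent values; all others are dropped."""
    counts = Counter(values)
    return {value for value, _ in counts.most_common(limit)}


# Hypothetical categorical column: "a" appears 5 times, "b" 3 times, "c" once.
colors = ["a"] * 5 + ["b"] * 3 + ["c"]
kept = keep_most_frequent(colors, limit=2)
```

Here the rarest value, "c", is the one that gets discarded.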

Binarization settings
- border_count (int): Number of splits for numerical features, range 1 to 255. Default 128.
- feature_border_type (string): Binarization mode for numerical features: Median, Uniform, UniformAndQuantiles, MaxSumLog, MinEntropy, GreedyLogSum. Default MinEntropy.
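
To make border_count and the Uniform mode more concrete, here is a minimal sketch that places evenly spaced borders between the minimum and maximum of a feature and maps a value to its bucket. It assumes the simplest reading of "Uniform"; CatBoost's own border selection may differ in detail, and both function names are made up for the example.

```python
from bisect import bisect_left


def uniform_borders(values, border_count):
    """Return border_count evenly spaced borders strictly between min and max."""
    low, high = min(values), max(values)
    step = (high - low) / (border_count + 1)
    return [low + step * (i + 1) for i in range(border_count)]


def bucket_index(x, borders):
    """Return how many borders are strictly less than x (the bucket of x)."""
    return bisect_left(borders, x)


# Hypothetical feature ranging from 0 to 10, split by 4 borders into 5 buckets.
borders = uniform_borders([0, 10], border_count=4)
```

A larger border_count gives finer buckets at the cost of more candidate splits to evaluate.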

Performance settings
- thread_count (int): Number of threads used for training. Does not affect the results. Default None.

Output settings
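- verbose (bool): Print detailed information. Default False.
- train_dir (string): Directory for files generated during training. Default: the current directory.
- allow_writing_files (bool): Allow analytical and snapshot files to be written during training. If set to False, snapshots and the visualization tools cannot be used. Default True.
- save_snapshot (bool): Enable snapshots so that training progress is saved and can be recovered after an interruption. Default None.
- snapshot_file (string): Name of the snapshot file. Default experiment.cbsnapshot.
- plot (bool): Plot the following during training: loss function value, custom loss values, time elapsed, and time remaining. Can be used in Jupyter Notebook. Default False.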
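
The output settings interact: as noted above, snapshots cannot be used when allow_writing_files is False. The sketch below keeps the settings in a dict and checks that one constraint before they are passed on. The directory name and the helper `check_output_params` are made up for this example.

```python
# Output settings from the list above; "catboost_info" is a hypothetical directory name.
output_params = {
    "verbose": False,
    "train_dir": "catboost_info",
    "allow_writing_files": True,
    "save_snapshot": True,
    "snapshot_file": "experiment.cbsnapshot",
}


def check_output_params(params):
    """Raise ValueError if snapshots are requested while file writing is disabled."""
    if params.get("save_snapshot") and not params.get("allow_writing_files", True):
        raise ValueError("save_snapshot requires allow_writing_files=True")
    return params


checked = check_output_params(output_params)

# The checked dict can then be merged with the training parameters, e.g.
#   CatBoostClassifier(**training_params, **checked)
```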
