5x Faster Scikit-Learn Parameter Tuning in 5 Lines of Code
By Michael Chau, Anthony Yu, Richard Liaw
Everyone knows about Scikit-Learn — it’s a staple for data scientists, offering dozens of easy-to-use machine learning algorithms. It also provides two out-of-the-box techniques to address hyperparameter tuning: Grid Search (GridSearchCV) and Random Search (RandomizedSearchCV).
Though effective, both techniques are brute-force approaches to finding the right hyperparameter configurations, which is an expensive and time-consuming process!
Image by author

What if you wanted to speed up this process?
In this blog post, we introduce tune-sklearn, which makes it easier to leverage these new algorithms while staying in the Scikit-Learn API. Tune-sklearn is a drop-in replacement for Scikit-Learn’s model selection module with cutting edge hyperparameter tuning techniques (bayesian optimization, early stopping, distributed execution) — these techniques provide significant speedups over grid search and random search!
Here’s what tune-sklearn has to offer:
Consistency with Scikit-Learn API: tune-sklearn is a drop-in replacement for GridSearchCV and RandomizedSearchCV, so you only need to change less than 5 lines in a standard Scikit-Learn script to use the API.
Modern hyperparameter tuning techniques: tune-sklearn allows you to easily leverage Bayesian Optimization, HyperBand, and other optimization techniques by simply toggling a few parameters.
Framework support: tune-sklearn is used primarily for tuning Scikit-Learn models, but it also supports and provides examples for many other frameworks with Scikit-Learn wrappers such as Skorch (Pytorch), KerasClassifiers (Keras), and XGBoostClassifiers (XGBoost).
Scale up: Tune-sklearn leverages Ray Tune, a library for distributed hyperparameter tuning, to efficiently and transparently parallelize cross validation on multiple cores and even multiple machines.
Tune-sklearn is also fast. To see this, we benchmark tune-sklearn (with early stopping enabled) against native Scikit-Learn on a standard hyperparameter sweep. In our benchmarks we can see significant performance differences on both an average laptop and a large workstation of 48 CPU cores.
On the larger 48-core benchmark machine, Scikit-Learn took 20 minutes to search over 75 hyperparameter sets on a dataset of 40,000 data points. Tune-sklearn took a mere 3.5 minutes, sacrificing minimal accuracy.*
On the left: a personal dual-core i5 laptop with 8 GB RAM, using a parameter grid of 6 configurations. On the right: a large 48-core machine with 250 GB RAM, using a parameter grid of 75 configurations.

*Note: For smaller datasets (10,000 or fewer data points), there may be a sacrifice in accuracy when attempting to fit with early stopping. We don't anticipate this to make a difference for users, as the library is intended to speed up large training tasks with large datasets.
Simple 60-Second Walkthrough
Let’s take a look at how it all works.
Run pip install tune-sklearn ray[tune] or pip install tune-sklearn "ray[tune]" to get started with our example code below.
Hyperparameter set 2 is a set of unpromising hyperparameters that would be detected by Tune's early stopping mechanisms and stopped early to avoid wasting training time and resources.

TuneGridSearchCV Example
To start out, it’s as easy as changing our import statement to get Tune’s grid search cross validation interface:
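A minimal sketch of that change, assuming the original script imported GridSearchCV from sklearn.model_selection:

# Before:
# from sklearn.model_selection import GridSearchCV
# After: the drop-in replacement from tune-sklearn
from tune_sklearn import TuneGridSearchCV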
And from there, we would proceed just like how we would in Scikit-Learn’s interface! Let’s use a “dummy” custom classification dataset and an SGDClassifier to classify the data.
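A rough sketch of that setup, assuming a synthetic dataset from make_classification (the dataset dimensions and the alpha/epsilon grid below are illustrative assumptions, not the exact values from the original post):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Create a "dummy" classification dataset and hold out a test split.
X, y = make_classification(
    n_samples=11000, n_features=1000, n_informative=50,
    n_redundant=0, n_classes=10, class_sep=2.5,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1000)

# The parameter grid is defined exactly as it would be for GridSearchCV.
parameter_grid = {"alpha": [1e-4, 1e-1, 1], "epsilon": [0.01, 0.1]}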
We choose the SGDClassifier because it has a partial_fit API, which enables it to stop fitting to the data for a certain hyperparameter configuration. If the estimator does not support early stopping, we would fall back to a parallel grid search.
As you can see, the setup here is exactly how you would do it for Scikit-Learn! Now, let’s try fitting a model.
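Continuing from the setup above, here is a rough sketch of what that fit might look like (the early_stopping and max_iters values are illustrative choices, not the exact settings from the original post):

from tune_sklearn import TuneGridSearchCV
from sklearn.linear_model import SGDClassifier

tune_search = TuneGridSearchCV(
    SGDClassifier(),
    parameter_grid,
    early_stopping="MedianStoppingRule",  # stop unpromising configurations early
    max_iters=10,  # cap on partial_fit iterations per configuration
)
tune_search.fit(X_train, y_train)
print(tune_search.best_params_)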
Note the slight differences we introduced above:
a new early_stopping variable, and
a specification of the max_iters parameter
The early_stopping parameter determines when to stop early. MedianStoppingRule is a great default, but see Tune's documentation on schedulers here for a full list to choose from. max_iters is the maximum number of iterations a given hyperparameter set can run for; it may run for fewer iterations if it is stopped early.
Try running this and compare it to the GridSearchCV equivalent.
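For comparison, a sketch of the equivalent search with native Scikit-Learn over the same grid (every configuration runs to completion, since GridSearchCV has no early stopping):

from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import SGDClassifier

# n_jobs=-1 parallelizes across local cores, but each configuration still trains fully.
sklearn_search = GridSearchCV(SGDClassifier(), parameter_grid, n_jobs=-1)
sklearn_search.fit(X_train, y_train)
print(sklearn_search.best_params_)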
TuneSearchCV Bayesian Optimization Example
Other than the grid search interface, tune-sklearn also provides an interface, TuneSearchCV, for sampling from distributions of hyperparameters.
In addition, you can easily enable Bayesian optimization over the distributions in TuneSearchCV in only a few lines of code changes.
Run pip install scikit-optimize to try out this example:
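A minimal sketch of what that might look like (the distributions, n_trials, and other settings below are illustrative assumptions; check the tune-sklearn documentation for the search-space formats your version accepts):

from tune_sklearn import TuneSearchCV
from sklearn.linear_model import SGDClassifier

# Sample from ranges instead of a fixed grid; tuples give (lower, upper) bounds.
param_distributions = {
    "alpha": (1e-4, 1e-1),
    "epsilon": (1e-2, 1e-1),
}

tune_search = TuneSearchCV(
    SGDClassifier(),
    param_distributions,
    n_trials=3,                      # number of configurations to sample
    search_optimization="bayesian",  # use Bayesian optimization via scikit-optimize
    early_stopping=True,
    max_iters=10,
)
tune_search.fit(X_train, y_train)
print(tune_search.best_params_)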
Only a few lines of code change relative to the grid search example to enable Bayesian optimization.

As you can see, it's very simple to integrate tune-sklearn into existing code. Check out more detailed examples and get started with tune-sklearn here, and let us know what you think! Also take a look at Ray's replacement for joblib, which allows users to parallelize training over multiple nodes, not just one node, further speeding up training.
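As a rough sketch of how that backend can be enabled, assuming Ray is installed and a search object like the one above already exists (see the Ray documentation for the current API):

import joblib
from ray.util.joblib import register_ray

# Register Ray as a joblib backend, then run joblib-parallelized
# Scikit-Learn work (for example, a GridSearchCV fit) on a Ray cluster.
register_ray()
with joblib.parallel_backend("ray"):
    sklearn_search.fit(X_train, y_train)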
Documentation and Examples
Documentation*
Example: Skorch with tune-sklearn
Example: Scikit-Learn Pipelines with tune-sklearn
Example: XGBoost with tune-sklearn
Example: KerasClassifier with tune-sklearn
Example: LightGBM with tune-sklearn
*Note: importing from ray.tune as shown in the linked documentation is available only on the nightly Ray wheels and will be available on pip soon
Translated from: https://medium.com/@michaelchau_99485/5x-faster-scikit-learn-parameter-tuning-in-5-lines-of-code-be6bdd21833c