Lazy Predict for ML Models
Hey, hope you are having a wonderful day!
Whenever I work on a new ML project, these lines always pop up in my mind:
“I need to fit the data to every model, then apply metrics to check which model has better accuracy on the available dataset, then choose the best model. This process is time-consuming, and it might not even be that effective.”
For this problem, I found a simple solution while surfing through python.org: a small Python library by the name “lazypredict”, and it does wonders.
Let me tell you how it works:
Install the library
pip install lazypredict

Note
lazypredict covers only supervised learning (classification and regression)
I will be using a Jupyter notebook in this article
Code
# import necessary modules
import warnings
warnings.filterwarnings('ignore')
import time
from sklearn.datasets import load_iris, fetch_california_housing
from sklearn.model_selection import train_test_split
from lazypredict.Supervised import LazyClassifier, LazyRegressor
- warnings: package to handle warnings; 'ignore' is used when we need to filter out all the warnings
- time: package to handle time measurement
- sklearn.datasets: package to load datasets; today we are going to use the classic datasets everyone works with: load_iris() for a classification problem and fetch_california_housing() for a regression problem
- sklearn.model_selection.train_test_split: used to split the dataset into train and test sets
- lazypredict: this is the package we are going to learn today; in lazypredict.Supervised there are two main classes, LazyClassifier for classification and LazyRegressor for regression
LazyClassifier
# load the iris dataset
data = load_iris()
X = data.data
Y = data.target
- data is a dictionary-like object with two keys: data, which contains the independent feature/column values, and target, which contains the dependent feature values
- X has all the independent feature values
- Y has all the dependent feature values
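As a quick sanity check on what we just loaded (iris has 150 samples, 4 features, and 3 classes):

# quick sanity check on the iris data
print(X.shape, Y.shape)   # (150, 4) (150,)
print(data.target_names)  # ['setosa' 'versicolor' 'virginica']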
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.3, random_state=23)
classi = LazyClassifier(verbose=0, predictions=True)
- We will split the data into train and test sets using train_test_split()
- The test set will be 0.3 (30%) of the dataset
- random_state will decide how the data is split into train and test indices; just choose any number you like!
Tip 1: If you want to see the source code behind any function or object in a Jupyter notebook, just add ? or ?? after the object or function you want to check out and execute the cell
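For example, with the import above already run, these lines show what LazyClassifier is made of:

# show the signature and docstring of LazyClassifier
LazyClassifier?
# show the full source code behind it
LazyClassifier??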
- Next, we will call LazyClassifier() and initialize classi with two parameters, verbose and predictions
- verbose: int data type; if non-zero, progress messages are printed. Above 50, the output is sent to stdout, and the frequency of the messages increases with the verbosity level. If it is more than 10, all iterations are reported. I would suggest you try different values based on your depth of analysis
- predictions: boolean data type; if it is set to True, fit() will return all the predicted values from the models
start_time_1 = time.time()
models_c, predictions_c = classi.fit(X_train, X_test, Y_train, Y_test)
end_time_1 = time.time()
- We are going to fit the train and test data to the classi object
- classi.fit() will return two values:
- models_c: will have all the models along with some metrics
- predictions_c: will have all the predicted values from the models
models_c
(image: models_c output)
- To be honest, I didn't know some of these models even existed for classification until I saw this
- I know your mind might be wondering why ROC AUC is None. Is this function not giving proper output? Nope, that's not the case here; ROC AUC is None because we have taken a multi-class dataset
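Since models_c comes back as a pandas DataFrame indexed by model name, you can slice it like any other DataFrame. A small sketch (the "Accuracy" column name matches the output above; treat it as an assumption if your lazypredict version differs):

# top five classifiers ranked by the Accuracy column
top5 = models_c.sort_values("Accuracy", ascending=False).head()
print(top5)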
Tip 2: For the above dataset, or any multi-class problem, we can compute the score ourselves with roc_auc_score rather than relying on the ROC AUC column
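Here is a minimal sketch of that idea, using LogisticRegression as a stand-in for whichever classifier you want to score; the key detail is passing class probabilities and multi_class="ovr":

# multi-class ROC AUC with scikit-learn
# (LogisticRegression is just an illustrative stand-in classifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

clf = LogisticRegression(max_iter=1000).fit(X_train, Y_train)
proba = clf.predict_proba(X_test)  # class probabilities, shape (n_samples, n_classes)
print(roc_auc_score(Y_test, proba, multi_class="ovr"))  # one-vs-rest averaging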
# to check the predictions from the models
predictions_c
(image: predictions_c output)
- These are just a few sample predictions from the models
LazyRegressor
- So we checked out LazyClassifier; it would be sad if we didn't pay some attention to LazyRegressor
- The following code is similar to LazyClassifier, so let's pick up the pace and skip some explanations
data1 = fetch_california_housing()
X1 = data1.data
Y1 = data1.target
- data1 is a dictionary-like object with data and target as keys
X_train1, X_test1, Y_train1, Y_test1 = train_test_split(X1, Y1, test_size=.3, random_state=23)
regr = LazyRegressor(verbose=0, predictions=True)
- After initializing the regressor, next we will fit it on the train and test data
start_time_2 = time.time()
models_r, predictions_r = regr.fit(X_train1, X_test1, Y_train1, Y_test1)
end_time_2 = time.time()
Note
1. Before running the above cell, make sure you close all unnecessary background processes, because it takes a lot of computation power
2. If you have low computation power (RAM, GPU), I would suggest using Google Colab; it's the simplest solution you can get
# to check which model did better on the fetch_california_housing dataset
models_r
(image: models_r output)
- And again, I didn't know there were so many models for regression
predictions_r
(image: predictions_r output)
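The same DataFrame tricks work here. A sketch for pulling out the best regressor (assuming the "R-Squared" column name shown in the output above):

# best regressor ranked by the R-Squared column
best = models_r.sort_values("R-Squared", ascending=False).head(1)
print(best)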
Time Complexity
- We should talk about time complexity, because reducing it as much as possible is the main goal for all of us
print("The time taken by LazyClassifier for {0} samples is {1} ms".format(len(data.data),round(end_time_1-start_time_1,0)))
print("The time taken by LazyRegressor for {0} samples is {1} ms".format(len(data1.data),round(end_time_2-start_time_2,0)))time complexity output時間復雜度輸出
Tip 3: Add %%time at the top of a Jupyter cell to check its execution time
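For example, re-timing the classifier fit from earlier (the magic must be the first line of the cell):

%%time
models_c, predictions_c = classi.fit(X_train, X_test, Y_train, Y_test)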
Note
- Use this library in the first iteration of your ML project, before hyper-tuning models (see the sketch after this list)
- lazypredict only works for Python versions ≥ 3.6
- If you don't have the computational power, just use Google Colab
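To make the first note concrete, here is a minimal sketch of that workflow: screen everything with lazypredict, then hyper-tune only the most promising model. RandomForestClassifier and this parameter grid are illustrative assumptions, not something the library prescribes:

# hyper-tune one promising model after the lazypredict screening pass
# (RandomForestClassifier and the grid below are hypothetical choices)
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=23), param_grid, cv=5)
search.fit(X_train, Y_train)
print(search.best_params_, search.best_score_)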
The GitHub link for the code is here.
If you want to read the official docs, the link is here.
That's all you need to know about the lazypredict library for now.
Hope you learned something new from this article today, and that it will help you make your ML projects a bit easier.
Thank you for dedicating a few minutes of your day.
If you have any doubts, just comment down below and I will be happy to help you out!
Thank you!
-Mani
Translated from: https://medium.com/swlh/lazy-predict-for-ml-models-c513a5daf792