當前位置：首頁 > 编程语言 > python >内容正文

python

python中的auto_ml自动机器学习框架学习实践

發布時間：2024/8/1 python 25 豆豆

生活随笔收集整理的這篇文章主要介紹了 python中的auto_ml自动机器学习框架学习实践小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

? ? ?之前就有接觸過auto_ml這個自動機器學習框架，但是一直沒有時間做一個簡單的記錄總結，以便于后續有時間繼續學習，我相信隨著機器學習的普及推廣和發展，自動機器學習一定會占據越來越大的作用，因為機器學習、深度學習里面很大的一部分時間都要花在特征工程、模型選擇、組合和參數調優上面，auto_ml框架提供了一種很好的解決思路，當前的自動學習框架也有很多，想要完整地進行學習還是需要花費一定的時間的，這里就簡單對之前使用的auto_ml做個簡單的記錄。

? ? ?由于數據集的緣故我不能隨意公開使用，這里索性直接使用官方提供的Demo來簡單學習實踐一下，之后使用自己的數據集的時候只需要做一點數據集規范格式的統一處理就好了。

? ? ? 以波士頓房價數據為例，簡單的一個小例子如下：

def bostonSimpleFunc():'''波士頓房價數據的簡單應用實例'''train_data,test_data = get_boston_dataset()column_descriptions = {'MEDV': 'output','CHAS': 'categorical'}ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)ml_predictor.train(train_data)ml_predictor.score(test_data, test_data.MEDV)

? ? ? ?運行結果如下：

? ? ? ?作者說了，這個auto_ml是為了產品研發的，提供了很完整的應用，這里從訓練測試數據集劃分、模型訓練、模型持久化、模型加載、模型預測幾個部分來拿波士頓房價數據做一個完成的實踐，具體如下：

def bostonWholeFunc():'''波士頓房價數據的一個比較完整的實例包括：訓練測試數據集劃分、模型訓練、模型持久化、模型加載、模型預測'''train_data,test_data = get_boston_dataset()column_descriptions = {'MEDV': 'output', 'CHAS': 'categorical'}ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)ml_predictor.train(train_data)test_score = ml_predictor.score(test_data, test_data.MEDV)file_name = ml_predictor.save()trained_model = load_ml_model(file_name)predictions = trained_model.predict(test_data)print('=====================predictions===========================')print(predictions)predictions = trained_model.predict_proba(test_data)print('=====================predictions===========================')print(predictions)

? ? ? ?結果如下：

Welcome to auto_ml! We're about to go through and make sense of your data using machine learning, and give you a production-ready pipeline to get predictions with.If you have any issues, or new feature ideas, let us know at http://auto.ml You are running on version 2.9.10 Now using the model training_params that you passed in: {} After overwriting our defaults with your values, here are the final params that will be used to initialize the model: {'presort': False, 'warm_start': True, 'learning_rate': 0.1} Running basic data cleaning Fitting DataFrameVectorizer Now using the model training_params that you passed in: {} After overwriting our defaults with your values, here are the final params that will be used to initialize the model: {'presort': False, 'warm_start': True, 'learning_rate': 0.1}******************************************************************************************** About to fit the pipeline for the model GradientBoostingRegressor to predict MEDV Started at: 2019-06-12 09:21:21 [1] random_holdout_set_from_training_data's score is: -9.93 [2] random_holdout_set_from_training_data's score is: -9.281 [3] random_holdout_set_from_training_data's score is: -8.683 [4] random_holdout_set_from_training_data's score is: -8.03 [5] random_holdout_set_from_training_data's score is: -7.494 [6] random_holdout_set_from_training_data's score is: -7.074 [7] random_holdout_set_from_training_data's score is: -6.649 [8] random_holdout_set_from_training_data's score is: -6.374 [9] random_holdout_set_from_training_data's score is: -6.115 [10] random_holdout_set_from_training_data's score is: -5.877 [11] random_holdout_set_from_training_data's score is: -5.566 [12] random_holdout_set_from_training_data's score is: -5.391 [13] random_holdout_set_from_training_data's score is: -5.088 [14] random_holdout_set_from_training_data's score is: -4.911 [15] random_holdout_set_from_training_data's score is: -4.692 [16] random_holdout_set_from_training_data's score is: -4.566 [17] random_holdout_set_from_training_data's score is: -4.379 [18] random_holdout_set_from_training_data's score is: -4.296 [19] random_holdout_set_from_training_data's score is: -4.14 [20] random_holdout_set_from_training_data's score is: -4.009 [21] random_holdout_set_from_training_data's score is: -3.92 [22] random_holdout_set_from_training_data's score is: -3.856 [23] random_holdout_set_from_training_data's score is: -3.81 [24] random_holdout_set_from_training_data's score is: -3.72 [25] random_holdout_set_from_training_data's score is: -3.632 [26] random_holdout_set_from_training_data's score is: -3.601 [27] random_holdout_set_from_training_data's score is: -3.538 [28] random_holdout_set_from_training_data's score is: -3.487 [29] random_holdout_set_from_training_data's score is: -3.459 [30] random_holdout_set_from_training_data's score is: -3.458 [31] random_holdout_set_from_training_data's score is: -3.422 [32] random_holdout_set_from_training_data's score is: -3.408 [33] random_holdout_set_from_training_data's score is: -3.356 [34] random_holdout_set_from_training_data's score is: -3.335 [35] random_holdout_set_from_training_data's score is: -3.323 [36] random_holdout_set_from_training_data's score is: -3.313 [37] random_holdout_set_from_training_data's score is: -3.262 [38] random_holdout_set_from_training_data's score is: -3.236 [39] random_holdout_set_from_training_data's score is: -3.207 [40] random_holdout_set_from_training_data's score is: -3.214 [41] random_holdout_set_from_training_data's score is: -3.198 [42] random_holdout_set_from_training_data's score is: -3.188 [43] random_holdout_set_from_training_data's score is: -3.174 [44] random_holdout_set_from_training_data's score is: -3.164 [45] random_holdout_set_from_training_data's score is: -3.122 [46] random_holdout_set_from_training_data's score is: -3.122 [47] random_holdout_set_from_training_data's score is: -3.109 [48] random_holdout_set_from_training_data's score is: -3.11 [49] random_holdout_set_from_training_data's score is: -3.119 [50] random_holdout_set_from_training_data's score is: -3.113 [52] random_holdout_set_from_training_data's score is: -3.113 [54] random_holdout_set_from_training_data's score is: -3.099 [56] random_holdout_set_from_training_data's score is: -3.102 [58] random_holdout_set_from_training_data's score is: -3.097 [60] random_holdout_set_from_training_data's score is: -3.069 [62] random_holdout_set_from_training_data's score is: -3.061 [64] random_holdout_set_from_training_data's score is: -3.024 [66] random_holdout_set_from_training_data's score is: -2.999 [68] random_holdout_set_from_training_data's score is: -2.999 [70] random_holdout_set_from_training_data's score is: -2.984 [72] random_holdout_set_from_training_data's score is: -2.978 [74] random_holdout_set_from_training_data's score is: -2.96 [76] random_holdout_set_from_training_data's score is: -2.943 [78] random_holdout_set_from_training_data's score is: -2.947 [80] random_holdout_set_from_training_data's score is: -2.938 [82] random_holdout_set_from_training_data's score is: -2.921 [84] random_holdout_set_from_training_data's score is: -2.914 [86] random_holdout_set_from_training_data's score is: -2.91 [88] random_holdout_set_from_training_data's score is: -2.901 [90] random_holdout_set_from_training_data's score is: -2.906 [92] random_holdout_set_from_training_data's score is: -2.892 [94] random_holdout_set_from_training_data's score is: -2.885 [96] random_holdout_set_from_training_data's score is: -2.884 [98] random_holdout_set_from_training_data's score is: -2.894 [100] random_holdout_set_from_training_data's score is: -2.88 [103] random_holdout_set_from_training_data's score is: -2.893 [106] random_holdout_set_from_training_data's score is: -2.889 [109] random_holdout_set_from_training_data's score is: -2.886 [112] random_holdout_set_from_training_data's score is: -2.869 [115] random_holdout_set_from_training_data's score is: -2.875 [118] random_holdout_set_from_training_data's score is: -2.852 [121] random_holdout_set_from_training_data's score is: -2.855 [124] random_holdout_set_from_training_data's score is: -2.848 [127] random_holdout_set_from_training_data's score is: -2.854 [130] random_holdout_set_from_training_data's score is: -2.86 [133] random_holdout_set_from_training_data's score is: -2.857 [136] random_holdout_set_from_training_data's score is: -2.854 [139] random_holdout_set_from_training_data's score is: -2.856 [142] random_holdout_set_from_training_data's score is: -2.854 [145] random_holdout_set_from_training_data's score is: -2.845 [148] random_holdout_set_from_training_data's score is: -2.84 [151] random_holdout_set_from_training_data's score is: -2.838 [154] random_holdout_set_from_training_data's score is: -2.838 [157] random_holdout_set_from_training_data's score is: -2.839 [160] random_holdout_set_from_training_data's score is: -2.837 [163] random_holdout_set_from_training_data's score is: -2.838 [166] random_holdout_set_from_training_data's score is: -2.838 [169] random_holdout_set_from_training_data's score is: -2.84 [172] random_holdout_set_from_training_data's score is: -2.828 [175] random_holdout_set_from_training_data's score is: -2.836 [178] random_holdout_set_from_training_data's score is: -2.834 [181] random_holdout_set_from_training_data's score is: -2.836 [184] random_holdout_set_from_training_data's score is: -2.837 [187] random_holdout_set_from_training_data's score is: -2.86 [190] random_holdout_set_from_training_data's score is: -2.862 [193] random_holdout_set_from_training_data's score is: -2.856 [196] random_holdout_set_from_training_data's score is: -2.855 [199] random_holdout_set_from_training_data's score is: -2.857 [202] random_holdout_set_from_training_data's score is: -2.856 [205] random_holdout_set_from_training_data's score is: -2.86 [208] random_holdout_set_from_training_data's score is: -2.859 [211] random_holdout_set_from_training_data's score is: -2.857 [214] random_holdout_set_from_training_data's score is: -2.855 [217] random_holdout_set_from_training_data's score is: -2.852 [220] random_holdout_set_from_training_data's score is: -2.849 [223] random_holdout_set_from_training_data's score is: -2.853 [226] random_holdout_set_from_training_data's score is: -2.845 [229] random_holdout_set_from_training_data's score is: -2.846 [232] random_holdout_set_from_training_data's score is: -2.849 The number of estimators that were the best for this training dataset: 172 The best score on the holdout set: -2.827876248876794 Finished training the pipeline! Total training time: 0:00:01Here are the results from our GradientBoostingRegressor predicting MEDV Calculating feature responses, for advanced analytics. The printed list will only contain at most the top 100 features. +----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------+ | | Feature Name | Importance | Delta | FR_Decrementing | FR_Incrementing | FRD_abs | FRI_abs | FRD_MAD | FRI_MAD | |----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------| | 12 | CHAS=0.0 | 0.0000 | nan | nan | nan | nan | nan | nan | nan | | 1 | ZN | 0.0004 | 11.5619 | -0.0194 | 0.0204 | 0.0205 | 0.0230 | 0.0000 | 0.0000 | | 13 | CHAS=1.0 | 0.0005 | nan | nan | nan | nan | nan | nan | nan | | 2 | INDUS | 0.0031 | 3.4430 | 0.1103 | 0.0494 | 0.1565 | 0.1543 | 0.0597 | 0.0000 | | 7 | RAD | 0.0059 | 4.2895 | -0.3558 | 0.0537 | 0.3620 | 0.1431 | 0.3727 | 0.0000 | | 5 | AGE | 0.0105 | 13.9801 | 0.2805 | -0.3050 | 0.5735 | 0.4734 | 0.3615 | 0.2435 | | 10 | B | 0.0118 | 45.7266 | -0.1885 | 0.1507 | 0.3139 | 0.2903 | 0.1688 | 0.0582 | | 8 | TAX | 0.0167 | 82.9834 | 1.1477 | -0.4399 | 1.2920 | 0.4563 | 0.2671 | 0.2617 | | 9 | PTRATIO | 0.0247 | 1.1130 | 0.5095 | -0.2323 | 0.5599 | 0.4590 | 0.2984 | 0.3357 | | 0 | CRIM | 0.0284 | 4.4320 | -0.4701 | -0.2061 | 0.7788 | 0.4938 | 0.5027 | 0.2806 | | 3 | NOX | 0.0298 | 0.0588 | 0.3083 | -0.1691 | 0.4285 | 0.3968 | 0.0745 | 0.0745 | | 6 | DIS | 0.0608 | 1.0643 | 3.4966 | -0.3628 | 3.5823 | 0.8045 | 0.9935 | 0.3655 | | 4 | RM | 0.3571 | 0.3543 | -1.2174 | 1.4995 | 1.3628 | 1.7090 | 0.7740 | 1.0375 | | 11 | LSTAT | 0.4504 | 3.5508 | 1.9849 | -1.8635 | 2.0343 | 1.9289 | 1.8354 | 1.5375 | +----+----------------+--------------+----------+-------------------+-------------------+-----------+-----------+-----------+-----------+******* Legend: Importance = Feature ImportanceExplanation: A weighted measure of how much of the variance the model is able to explain is due to this column FR_delta = Feature Response Delta AmountExplanation: Amount this column was incremented or decremented by to calculate the feature reponses FR_Decrementing = Feature Response From Decrementing Values In This Column By One FR_deltaExplanation: Represents how much the predicted output values respond to subtracting one FR_delta amount from every value in this column FR_Incrementing = Feature Response From Incrementing Values In This Column By One FR_deltaExplanation: Represents how much the predicted output values respond to adding one FR_delta amount to every value in this column FRD_MAD = Feature Response From Decrementing- Median Absolute DeltaExplanation: Takes the absolute value of all changes in predictions, then takes the median of those. Useful for seeing if decrementing this feature provokes strong changes that are both positive and negative FRI_MAD = Feature Response From Incrementing- Median Absolute DeltaExplanation: Takes the absolute value of all changes in predictions, then takes the median of those. Useful for seeing if incrementing this feature provokes strong changes that are both positive and negative FRD_abs = Feature Response From Decrementing Avg Absolute ChangeExplanation: What is the average absolute change in predicted output values to subtracting one FR_delta amount to every value in this column. Useful for seeing if output is sensitive to a feature, but not in a uniformly positive or negative way FRI_abs = Feature Response From Incrementing Avg Absolute ChangeExplanation: What is the average absolute change in predicted output values to adding one FR_delta amount to every value in this column. Useful for seeing if output is sensitive to a feature, but not in a uniformly positive or negative way *******None*********************************************** Advanced scoring metrics for the trained regression model on this particular dataset:Here is the overall RMSE for these predictions: 2.4474947386663786Here is the average of the predictions: 21.2925792927Here is the average actual value on this validation set: 21.4882352941Here is the median prediction: 20.457423442279662Here is the median actual value: 20.15Here is the mean absolute error: 1.844793596155306Here is the median absolute error (robust to outliers): 1.3340192567295777Here is the explained variance: 0.9188375538746201Here is the R-squared value: 0.9183155397464807 Count of positive differences (prediction > actual): 51 Count of negative differences: 51 Average positive difference: 1.64913759477 Average negative difference: -2.04044959754***********************************************We have saved the trained pipeline to a filed called "auto_ml_saved_pipeline.dill" It is saved in the directory: C:\Users\18706\Desktop\myBlogs\auto_ml_use To use it to get predictions, please follow the following flow (adjusting for your own uses as necessary:`from auto_ml.utils_models import load_ml_model `trained_ml_pipeline = load_ml_model("auto_ml_saved_pipeline.dill") `trained_ml_pipeline.predict(data)`Note that this pickle/dill file can only be loaded in an environment with the same modules installed, and running the same Python version. This version of Python is: sys.version_info(major=2, minor=7, micro=13, releaselevel='final', serial=0)When passing in new data to get predictions on, columns that were not present (or were not found to be useful) in the training data will be silently ignored. It is worthwhile to make sure that you feed in all the most useful data points though, to make sure you can get the highest quality predictions. =====================predictions=========================== [23.503099796820333, 32.63486484873551, 17.607843570794248, 22.96364141712182, 18.037259790025, 22.154154350077157, 18.157171399351753, 14.490724400217747, 20.91569106207268, 21.371745165599958, 19.978460029298827, 17.617959317911595, 6.657480263073871, 21.259425283809687, 19.30470390603625, 23.54754498054679, 20.616057833202493, 8.569816325663448, 45.01902942229479, 15.319975928505148, 23.84765254861352, 24.49050663723932, 12.344561585629016, 23.24874551694055, 15.137348894013865, 15.067038653704085, 21.674735923166942, 12.88017013620315, 19.43339890697579, 20.933210490656045, 20.235546222120107, 22.99264652948031, 20.45638944287541, 20.50831821637611, 14.026411558432988, 17.14000803427353, 34.322736768893236, 19.82116882409099, 20.757084718131125, 23.523990773770624, 17.92101235838185, 30.745980540024213, 45.09505946725109, 18.76719301853909, 23.69250732281568, 14.627546717865679, 15.404318347865019, 23.856332667077602, 18.597560915078148, 28.295069087679007, 20.335783749261154, 35.49551328178157, 17.049478769941757, 27.36240739278428, 49.168123673644864, 21.919364008618228, 16.431621230418827, 32.50614954154076, 22.60486571683311, 17.190717714534216, 24.86659240393153, 34.726632201151446, 32.56154963374883, 17.991423510542266, 23.19139847589728, 16.3827778391806, 13.763406903575234, 23.041746542718485, 28.897952087920405, 15.16115409656009, 20.54704218671605, 27.630784534960636, 9.265217126500687, 20.218468086624206, 22.678130640115423, 3.978712919679104, 20.458457441683915, 44.47945990229906, 12.603336785642627, 11.482102006681343, 21.066151218556975, 13.559181962607349, 21.19973222974325, 10.447704116792627, 20.110776756244167, 28.928923567731772, 15.527462244687818, 23.24725371877329, 25.743821297087276, 18.04671684265537, 22.950747524482065, 9.088864852661203, 19.075035374223955, 18.42257896844079, 23.564483816162195, 19.647455910849818, 44.12778583727594, 11.427374611849514, 12.040264853009598, 16.998049081305517, 20.25692214075818, 22.80453061159547] =====================predictions=========================== [[1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0]] [Finished in 3.3s]

? ? ? ?在官方提供的原始實例上，我對輸出多加了一層類別的輸出。

? ? ? ?完整的程序如下：

#!usr/bin/env python #encoding:utf-8 from __future__ import division''' __Author__:沂水寒城功能： auto_ml 學習實踐使用GitHub地址：https://github.com/yishuihanhan/auto_ml 官方文檔：https://auto-ml.readthedocs.io/en/latest/formatting_data.html '''from auto_ml import Predictor from auto_ml.utils import get_boston_dataset from auto_ml.utils_models import load_ml_modeldef bostonSimpleFunc():'''波士頓房價數據的簡單應用實例'''train_data,test_data = get_boston_dataset()column_descriptions = {'MEDV': 'output','CHAS': 'categorical'}ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)ml_predictor.train(train_data)ml_predictor.score(test_data, test_data.MEDV)def bostonWholeFunc():'''波士頓房價數據的一個比較完整的實例包括：訓練測試數據集劃分、模型訓練、模型持久化、模型加載、模型預測'''train_data,test_data = get_boston_dataset()column_descriptions = {'MEDV': 'output', 'CHAS': 'categorical'}ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)ml_predictor.train(train_data)test_score = ml_predictor.score(test_data, test_data.MEDV)file_name = ml_predictor.save()trained_model = load_ml_model(file_name)predictions = trained_model.predict(test_data)print('=====================predictions===========================')print(predictions)predictions = trained_model.predict_proba(test_data)print('=====================predictions===========================')print(predictions)if __name__=='__main__':bostonSimpleFunc()bostonWholeFunc()

? ? ? ? 相應地GitHub地址和官方文檔地址在代碼的開頭部分我都給出來了，感興趣的話可以去看看，記錄學習了！

總結

以上是生活随笔為你收集整理的python中的auto_ml自动机器学习框架学习实践的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇：实时音频编解码之二编码学数学知识
下一篇：声卡驱动正常但就是没有声音,驱动人生解决

3atv精品不卡视频,97人人超碰国产精品最新,中文字幕av一区二区三区人妻少妇,久久久精品波多野结衣,日韩一区二区三区精品

python

python中的auto_ml自动机器学习框架学习实践

總結