Regression: Predict fuel efficiency
In a regression problem, we aim to predict the output of a continuous value, such as a price or a probability. Contrast this with a classification problem, where we aim to select a class from a list of classes (for example, identifying which fruit is in a picture when the picture contains an apple or an orange).
This notebook uses the classic [auto-mpg](https://archive.ics.uci.edu/ml/datasets/auto+mpg) dataset and builds a model to predict the fuel efficiency of late-1970s and early-1980s automobiles. To do this, we'll provide the model with descriptions of many automobiles from that period. These descriptions include attributes such as cylinders, displacement, horsepower, and weight.
This example uses the `tf.keras` API; see [this guide](https://www.tensorflow.org/guide/keras) for details.
```python
import pathlib

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

import keras
from keras import layers

%matplotlib inline
```

The Auto MPG dataset
The dataset is available from the UCI Machine Learning Repository.
Get the data
First download the dataset.
```python
dataset_path = keras.utils.get_file("auto-mpg.data",
    "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
dataset_path
```

```
'C:\\Users\\YIUYE\\.keras\\datasets\\auto-mpg.data'
```

Import it using pandas:
```python
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(dataset_path, names=column_names,
                          na_values="?", comment='\t',
                          sep=" ", skipinitialspace=True)

dataset = raw_dataset.copy()
dataset.tail()
```

| MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model Year | Origin |
|------|---|-------|------|--------|------|----|---|
| 27.0 | 4 | 140.0 | 86.0 | 2790.0 | 15.6 | 82 | 1 |
| 44.0 | 4 | 97.0 | 52.0 | 2130.0 | 24.6 | 82 | 2 |
| 32.0 | 4 | 135.0 | 84.0 | 2295.0 | 11.6 | 82 | 1 |
| 28.0 | 4 | 120.0 | 79.0 | 2625.0 | 18.6 | 82 | 1 |
| 31.0 | 4 | 119.0 | 82.0 | 2720.0 | 19.4 | 82 | 1 |
Clean the data
The dataset contains a few unknown values.
```python
dataset.isnull().sum()
```

```
MPG             0
Cylinders       0
Displacement    0
Horsepower      6
Weight          0
Acceleration    0
Model Year      0
Origin          0
dtype: int64
```

To keep this initial tutorial simple, drop those rows.
```python
dataset = dataset.dropna()
```
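Dropping rows is the simplest option. If you would rather keep those six cars, a common alternative is to fill the missing `Horsepower` values with the column median. This is a hedged sketch of that approach, not part of the original tutorial:

```python
# Alternative (not used below): impute missing horsepower with the
# column median instead of dropping the rows entirely.
median_hp = raw_dataset['Horsepower'].median()
imputed = raw_dataset.copy()
imputed['Horsepower'] = imputed['Horsepower'].fillna(median_hp)
```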
The "Origin" column is really categorical, not numeric. So convert it to a one-hot encoding:

```python
origin = dataset.pop('Origin')

dataset['USA'] = (origin == 1) * 1.0
dataset['Europe'] = (origin == 2) * 1.0
dataset['Japan'] = (origin == 3) * 1.0
dataset.tail()
```

| MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model Year | USA | Europe | Japan |
|------|---|-------|------|--------|------|----|-----|-----|-----|
| 27.0 | 4 | 140.0 | 86.0 | 2790.0 | 15.6 | 82 | 1.0 | 0.0 | 0.0 |
| 44.0 | 4 | 97.0 | 52.0 | 2130.0 | 24.6 | 82 | 0.0 | 1.0 | 0.0 |
| 32.0 | 4 | 135.0 | 84.0 | 2295.0 | 11.6 | 82 | 1.0 | 0.0 | 0.0 |
| 28.0 | 4 | 120.0 | 79.0 | 2625.0 | 18.6 | 82 | 1.0 | 0.0 | 0.0 |
| 31.0 | 4 | 119.0 | 82.0 | 2720.0 | 19.4 | 82 | 1.0 | 0.0 | 0.0 |
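The manual columns above make the mapping from `Origin` codes to countries explicit. As an aside (my addition, not from the original tutorial), pandas can produce the same one-hot encoding in one call:

```python
# Equivalent one-hot encoding using pandas built-ins (sketch; the
# tutorial keeps the manual version above).
origin_names = raw_dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})
one_hot = pd.get_dummies(origin_names).astype(float)
```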
Now split the dataset into a training set and a test set.

We will use the test set in the final evaluation of the model.
```python
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)

sns.pairplot(train_dataset[["Cylinders", "Displacement", "Weight"]], diag_kind="kde")
sns.set()
```

Also look at the overall statistics:
```python
train_stats = train_dataset.describe()
train_stats.pop("MPG")
train_stats = train_stats.transpose()
train_stats
```

|              | count | mean        | std        | min    | 25%     | 50%    | 75%     | max    |
|--------------|-------|-------------|------------|--------|---------|--------|---------|--------|
| Cylinders    | 314.0 | 5.477707    | 1.699788   | 3.0    | 4.00    | 4.0    | 8.00    | 8.0    |
| Displacement | 314.0 | 195.318471  | 104.331589 | 68.0   | 105.50  | 151.0  | 265.75  | 455.0  |
| Horsepower   | 314.0 | 104.869427  | 38.096214  | 46.0   | 76.25   | 94.5   | 128.00  | 225.0  |
| Weight       | 314.0 | 2990.251592 | 843.898596 | 1649.0 | 2256.50 | 2822.5 | 3608.00 | 5140.0 |
| Acceleration | 314.0 | 15.559236   | 2.789230   | 8.0    | 13.80   | 15.5   | 17.20   | 24.8   |
| Model Year   | 314.0 | 75.898089   | 3.675642   | 70.0   | 73.00   | 76.0   | 79.00   | 82.0   |
| USA          | 314.0 | 0.624204    | 0.485101   | 0.0    | 0.00    | 1.0    | 1.00    | 1.0    |
| Europe       | 314.0 | 0.178344    | 0.383413   | 0.0    | 0.00    | 0.0    | 0.00    | 1.0    |
| Japan        | 314.0 | 0.197452    | 0.398712   | 0.0    | 0.00    | 0.0    | 0.00    | 1.0    |
Split features from labels
Separate the target value, or “label”, from the features. This label is the value that you will train the model to predict.
```python
train_labels = train_dataset.pop('MPG')
test_labels = test_dataset.pop('MPG')
```

Normalize the data
Look again at the train_stats block above and note how different the ranges of each feature are.
It is good practice to normalize features that use different scales and ranges. Although the model might converge without feature normalization, normalization makes training easier and makes the resulting model less dependent on the choice of units used in the input.
Note: although we intentionally generate these statistics from the training dataset only, they will also be used to normalize the test dataset. We need to do this to project the test dataset into the same distribution that the model was trained on.
```python
def norm(x):
    return (x - train_stats['mean']) / train_stats['std']

normed_train_data = norm(train_dataset)
normed_test_data = norm(test_dataset)

def build_model():
    model = keras.Sequential([
        layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
        layers.Dense(64, activation='relu'),
        layers.Dense(1)
    ])

    optimizer = keras.optimizers.RMSprop(0.001)

    model.compile(loss='mean_squared_error',
                  optimizer=optimizer,
                  metrics=['mean_absolute_error', 'mean_squared_error'])
    return model

model = build_model()
model.summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_10 (Dense)             (None, 64)                640
_________________________________________________________________
dense_11 (Dense)             (None, 64)                4160
_________________________________________________________________
dense_12 (Dense)             (None, 1)                 65
=================================================================
Total params: 4,865
Trainable params: 4,865
Non-trainable params: 0
_________________________________________________________________
```

Now try out the model. Take a batch of 10 examples from the training data and call `model.predict` on it.
```python
example_batch = normed_train_data[:10]
example_result = model.predict(example_batch)
example_result
```

```
array([[-0.03468257],
       [-0.01342154],
       [-0.15384783],
       [-0.18010283],
       [ 0.03922582],
       [-0.12172151],
       [ 0.10603201],
       [ 0.2442987 ],
       [ 0.00099315],
       [ 0.18530795]], dtype=float32)
```

It seems to be working, and it produces a result of the expected shape and type.
Train the model
Train the model for 1000 epochs, and record the training and validation metrics in the `history` object.
```python
# Display training progress by printing a single dot for each completed epoch
class PrintDot(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs):
        if epoch % 100 == 0:
            print('')
        print('.', end='')

EPOCHS = 1000

history = model.fit(normed_train_data, train_labels,
                    epochs=EPOCHS, validation_split=0.2, verbose=0,
                    callbacks=[PrintDot()])
```

```
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
```

```python
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist.tail()
```

| loss | mean_absolute_error | mean_squared_error | val_loss | val_mean_absolute_error | val_mean_squared_error | epoch |
|------|------|------|------|------|------|-----|
| 2.075518 | 0.940943 | 2.075518 | 8.913726 | 2.351839 | 8.913726 | 995 |
| 2.130111 | 0.953561 | 2.130111 | 9.769884 | 2.438282 | 9.769884 | 996 |
| 2.221040 | 0.951258 | 2.221040 | 9.664708 | 2.382888 | 9.664708 | 997 |
| 2.301870 | 0.980407 | 2.301870 | 9.934311 | 2.425505 | 9.934311 | 998 |
| 2.002580 | 0.887644 | 2.002580 | 9.484982 | 2.414742 | 9.484982 | 999 |
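The `plot_history` helper referenced below is never defined in this post. Here is a minimal sketch of such a helper, assuming (as in the original TensorFlow tutorial) that it plots the training and validation MAE and MSE curves from the `history` object:

```python
def plot_history(history):
    # Build a DataFrame from the Keras History object and plot the
    # train/validation error curves against epoch.
    hist = pd.DataFrame(history.history)
    hist['epoch'] = history.epoch

    plt.figure()
    plt.xlabel('Epoch')
    plt.ylabel('Mean Abs Error [MPG]')
    plt.plot(hist['epoch'], hist['mean_absolute_error'], label='Train Error')
    plt.plot(hist['epoch'], hist['val_mean_absolute_error'], label='Val Error')
    plt.legend()

    plt.figure()
    plt.xlabel('Epoch')
    plt.ylabel('Mean Square Error [$MPG^2$]')
    plt.plot(hist['epoch'], hist['mean_squared_error'], label='Train Error')
    plt.plot(hist['epoch'], hist['val_mean_squared_error'], label='Val Error')
    plt.legend()
    plt.show()

plot_history(history)
```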
This graph shows that after about 100 epochs the validation error barely improves, and even degrades. Let's update the `model.fit` call to automatically stop training when the validation score stops improving. We'll use an `EarlyStopping` callback that tests the training condition at the end of every epoch. If a set number of epochs elapses without showing improvement, training stops automatically.
```python
model = build_model()

# The patience parameter is the number of epochs to wait for improvement
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

history = model.fit(normed_train_data, train_labels, epochs=EPOCHS,
                    validation_split=0.2, verbose=0,
                    callbacks=[early_stop, PrintDot()])

plot_history(history)
```

```
.................................................
```

```python
loss, mae, mse = model.evaluate(normed_test_data, test_labels, verbose=0)

print("Testing set Mean Abs Error: {:5.2f} MPG".format(mae))
```

```
Testing set Mean Abs Error:  1.79 MPG
```

Make predictions
Finally, predict MPG values using data in the testing set:
```python
test_predictions = model.predict(normed_test_data).flatten()

plt.scatter(test_labels, test_predictions)
plt.xlabel('True Values [MPG]')
plt.ylabel('Predictions [MPG]')
plt.axis('equal')
plt.axis('square')
plt.xlim([0, plt.xlim()[1]])
plt.ylim([0, plt.ylim()[1]])
_ = plt.plot([-100, 100], [-100, 100])
```

```python
error = test_predictions - test_labels
plt.hist(error, bins=25)
plt.xlabel("Prediction Error [MPG]")
_ = plt.ylabel("Count")
```

The error distribution is not quite Gaussian, but that is to be expected given how small the number of samples is.
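To put numbers on that impression (a sketch I'm adding, not part of the original notebook), you can summarize the error distribution directly:

```python
# Quick numeric summary of the prediction errors (sketch).
print("Mean error: {:5.2f} MPG".format(error.mean()))
print("Std dev:    {:5.2f} MPG".format(error.std()))
```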
Conclusion

This notebook introduced a few techniques for handling a regression problem:

- Mean squared error (MSE) is a common loss function for regression problems (different from the loss functions used for classification).
- Similarly, the evaluation metrics differ from classification; a common regression metric is mean absolute error (MAE).
- When numeric input features have values with different ranges, each feature should be scaled independently to the same range.
- Early stopping is a useful technique to prevent overfitting.