Bayesian Hyper-Parameter Optimization: Neural Networks, TensorFlow, Facies Prediction Example
The purpose of this work is to optimize the hyper-parameters of a neural network model that estimates facies classes from well logs. I will include some code in this article, but for the full Jupyter notebook you can visit my Github.
Note: if you are new to TensorFlow, its installation is explained in detail by Jeff Heaton.
In machine learning, model parameters can be divided into two main categories:

1- Trainable parameters: such as the weights in a neural network, which are learned by the training algorithm without user intervention.

2- Hyper-parameters: settings the user chooses before training, such as the learning rate or the number of dense layers in the model.

Selecting the best hyper-parameters by hand is a tedious task, and it is almost impossible to find the best combination when more than two parameters are involved. One approach is to divide each parameter's valid range into evenly spaced values and have the computer loop over every combination and evaluate the results. This is called grid search. Although the machine does the work, it is time-consuming: with 3 hyper-parameters and 10 possible values each, you would train 10³ = 1,000 neural network models (a huge task even with a reasonably sized training dataset). Another approach is random search, which evaluates random combinations of parameters instead of an organized sweep; as the number of hyper-parameters grows, the chance of stumbling on the best combination shrinks toward zero.
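To make the grid-search explosion concrete, here is a tiny illustrative sketch; the candidate values below are arbitrary placeholders, not the ranges used later in this article.

import itertools

# Hypothetical grids: 10 candidate values for each of 3 hyper-parameters
learning_rates = [10 ** -i for i in range(1, 11)]   # 1e-1 ... 1e-10
dense_layers = list(range(1, 11))                   # 1 ... 10 layers
nodes_per_layer = [8, 16, 32, 64, 96, 128, 192, 256, 384, 512]

grid = list(itertools.product(learning_rates, dense_layers, nodes_per_layer))
print(len(grid))  # 1000 models to train, one per combination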
Scikit-Optimize (skopt), which we will use here for the facies estimation task, is a simple and efficient library for minimizing expensive, noisy black-box functions. Bayesian optimization builds a surrogate model of the hyper-parameter search space; a Gaussian process is one such model. The surrogate provides an estimate of how model performance varies as the hyper-parameters change.
As we see in the picture, the true objective function (red dashed line) is surrounded by noise (red shading). The red points show where scikit-optimize sampled the (one-dimensional) hyper-parameter search space. Scikit-optimize fits a Gaussian process (green line) between the sampled points to estimate the true fitness value. In regions with few or no samples (such as the left side of the picture, between two red samples), the uncertainty is large: the gap between the red and green lines is wide, and the green shaded band (for example, a two-standard-deviation interval) is correspondingly broad. The optimizer then proposes a new set of hyper-parameters to explore more of the search space. In the initial steps the estimate is coarse, but in later iterations sampling concentrates where the surrogate agrees well with the true objective function (the trough area in the graph). For more detail, refer to the Scikit-Optimize documentation.
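As a minimal, self-contained sketch of this idea (a toy one-dimensional noisy objective, not the facies model built later), skopt's gp_minimize can be run as follows:

import numpy as np
from skopt import gp_minimize

def noisy_objective(x):
    # Toy 1-D objective: a smooth curve plus observation noise
    return np.sin(5 * x[0]) * (1 - np.tanh(x[0] ** 2)) + np.random.randn() * 0.1

result = gp_minimize(noisy_objective,        # black-box function to minimize
                     [(-2.0, 2.0)],          # one real-valued search dimension
                     acq_func='EI',          # Expected Improvement acquisition
                     n_calls=20,             # total number of evaluations
                     random_state=42)
print(result.x, result.fun)                  # best point found and its (noisy) objective value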
Data Review

The Council Grove gas reservoir is located in Kansas. Nine wells are available from this carbonate reservoir. Facies were studied from core samples at every half foot and matched with logging data at the well locations. The feature variables include five wireline log measurements and two geologic constraining variables derived from geologic knowledge. For more detail refer here. The dataset can be downloaded from here. The seven variables are:
GR: this wireline logging tool measures gamma-ray emission
ILD_log10: this is a resistivity measurement
PE: photoelectric effect log
DeltaPHI: Phi is a porosity index in petrophysics.
PHIND: average of the neutron and density porosity logs.
NM_M: nonmarine-marine indicator
RELPOS: relative position
The nine discrete facies (classes of rocks) are:
(SS) Nonmarine sandstone
(CSiS) Nonmarine coarse siltstone
(FSiS) Nonmarine fine siltstone
(SiSh) Marine siltstone and shale
(MS) Mudstone (limestone)
(WS) Wackestone (limestone)
(D) Dolomite
(PS) Packstone-grainstone (limestone)
(BS) Phylloid-algal bafflestone (limestone)
After reading the dataset into Python, we keep one well aside as a blind set for a later examination of model performance. We also need to convert the facies numbers into label strings in the dataset; refer to the full notebook (a possible implementation is also sketched after the code below).
import pandas as pd

df = pd.read_csv('training_data.csv')
blind = df[df['Well Name'] == 'SHANKLE']
training_data = df[df['Well Name'] != 'SHANKLE']
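The facies-number-to-string step itself lives in the full notebook; a minimal sketch of what it might look like is below (the label list follows the nine facies classes above, and the 'FaciesLabels' column name is the one used in the feature-engineering step):

# Map facies numbers 1-9 to the short class labels listed above (sketch)
facies_labels = ['SS', 'CSiS', 'FSiS', 'SiSh', 'MS', 'WS', 'D', 'PS', 'BS']
training_data = training_data.copy()
training_data['FaciesLabels'] = training_data['Facies'].apply(lambda f: facies_labels[int(f) - 1])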
Feature Engineering

Facies classes should be converted to dummy variables in order to be used in the neural network:
dummies = pd.get_dummies(training_data['FaciesLabels'])
Facies_cat = dummies.columns
labels = dummies.values  # target matrix

# select predictors
features = training_data.drop(['Facies', 'Formation', 'Well Name', 'Depth', 'FaciesLabels'], axis=1)
Preprocessing (Standardization)
As the features span very different ranges, let's standardize them so the network trains efficiently.
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

scaler = preprocessing.StandardScaler().fit(features)
scaled_features = scaler.transform(features)

# Data split
X_train, X_test, y_train, y_test = train_test_split(
    scaled_features, labels, test_size=0.2, random_state=42)
Hyper-Parameters
In this work, we will predict facies from well logs using deep learning in TensorFlow. There are several hyper-parameters that we may adjust; I will try to find optimized values for:
Learning rate
Number of dense layers
Number of nodes in each layer
Activation function: 'relu' or 'sigmoid'
To define these search dimensions, we will use the scikit-optimize (skopt) library. skopt's Real dimension defines our chosen range for the learning rate (lower bound 1e-6, upper bound 1e-1) on a log-uniform prior. The search dimensions for the number of layers (between 1 and 10) and the number of nodes in each layer (between 5 and 512) are defined with skopt's Integer dimension.
from skopt.space import Real, Integer, Categorical

dim_learning_rate = Real(low=1e-6, high=1e-1, prior='log-uniform',
                         name='learning_rate')
dim_num_dense_layers = Integer(low=1, high=10, name='num_dense_layers')
dim_num_dense_nodes = Integer(low=5, high=512, name='num_dense_nodes')
For the activation function, we use skopt's Categorical dimension.
dim_activation = Categorical(categories=['relu', 'sigmoid'],
                             name='activation')
Bring all search dimensions into a single list:
dimensions = [dim_learning_rate,
              dim_num_dense_layers,
              dim_num_dense_nodes,
              dim_activation]
If you have ever tuned hyper-parameters by hand for a deep learning project, you know how hard it is. You can also supply your own guess (mine is used as the default below) to compare its result with the Bayesian tuning approach.
default_parameters = [1e-5, 1, 16, 'relu']

Hyper-Parameter Optimization

Create Model
As in some examples developed for TensorFlow, we first need to define a model-building function. After choosing the model type (Sequential here), we declare the input data dimension (data shape) in the first layer. The number of dense layers and the activation type are two of the hyper-parameters we want to optimize, while softmax activation is used on the output layer because this is a classification problem. The learning rate, another hyper-parameter, is passed to the Adam optimizer. The model is compiled with 'categorical_crossentropy' as the loss function since we are dealing with a classification problem (facies prediction).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer, Dense
from tensorflow.keras.optimizers import Adam

def create_model(learning_rate, num_dense_layers,
                 num_dense_nodes, activation):
    model = Sequential()
    model.add(InputLayer(input_shape=(scaled_features.shape[1],)))
    for i in range(num_dense_layers):
        name = 'layer_dense_{0}'.format(i+1)
        # add dense layer
        model.add(Dense(num_dense_nodes,
                        activation=activation,
                        name=name))
    # use softmax-activation for classification.
    model.add(Dense(labels.shape[1], activation='softmax'))
    # Use the Adam method for training the network.
    optimizer = Adam(lr=learning_rate)
    # compile the model so it can be trained.
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
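As a quick sanity check (not part of the original notebook), the builder can be called with hand-picked values to inspect the resulting architecture:

# Hypothetical values just to exercise create_model; the tuned values come later
model = create_model(learning_rate=1e-3, num_dense_layers=2,
                     num_dense_nodes=64, activation='relu')
model.summary()   # prints the layer stack and parameter counts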
Train and Evaluate the Model
The next function creates and trains a network with the given hyper-parameters and then evaluates its performance on the validation dataset. It returns the fitness value: the negative classification accuracy on that dataset. It is negated because skopt performs minimization rather than maximization.
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras import backend as K
from skopt.utils import use_named_args

# Setup assumed but not shown in the original excerpt: the validation set used
# inside fitness(), the running best accuracy, and a (hypothetical) file name
# for saving the best model.
validation_data = (X_test, y_test)
best_accuracy = 0.0
path_best_model = 'best_model.h5'

def log_dir_name(learning_rate, num_dense_layers, num_dense_nodes, activation):
    # Hypothetical helper (the original follows the Hvass-Labs tutorial): build a
    # unique TensorBoard log directory for each hyper-parameter combination.
    return './logs/lr_{0:.0e}_layers_{1}_nodes_{2}_{3}/'.format(
        learning_rate, num_dense_layers, num_dense_nodes, activation)

@use_named_args(dimensions=dimensions)
def fitness(learning_rate, num_dense_layers,
            num_dense_nodes, activation):
    """
    Hyper-parameters:
    learning_rate:    Learning-rate for the optimizer.
    num_dense_layers: Number of dense layers.
    num_dense_nodes:  Number of nodes in each dense layer.
    activation:       Activation function for all layers.
    """
    # Print the hyper-parameters.
    print('learning rate: {0:.1e}'.format(learning_rate))
    print('num_dense_layers:', num_dense_layers)
    print('num_dense_nodes:', num_dense_nodes)
    print('activation:', activation)
    print()

    # Create the neural network with these hyper-parameters.
    model = create_model(learning_rate=learning_rate,
                         num_dense_layers=num_dense_layers,
                         num_dense_nodes=num_dense_nodes,
                         activation=activation)

    # Dir-name for the TensorBoard log-files.
    log_dir = log_dir_name(learning_rate, num_dense_layers,
                           num_dense_nodes, activation)

    # Create a callback-function for Keras which will be
    # run after each epoch has ended during training.
    # This saves the log-files for TensorBoard.
    # Note that there are complications when histogram_freq=1.
    # It might give strange errors and it also does not properly
    # support Keras data-generators for the validation-set.
    callback_log = TensorBoard(
        log_dir=log_dir,
        histogram_freq=0,
        write_graph=True,
        write_grads=False,
        write_images=False)

    # Use Keras to train the model.
    history = model.fit(x=X_train,
                        y=y_train,
                        epochs=3,
                        batch_size=128,
                        validation_data=validation_data,
                        callbacks=[callback_log])

    # Get the classification accuracy on the validation-set
    # after the last training-epoch.
    accuracy = history.history['val_accuracy'][-1]

    # Print the classification accuracy.
    print()
    print("Accuracy: {0:.2%}".format(accuracy))
    print()

    # Save the model if it improves on the best-found performance.
    # We use the global keyword so we update the variable outside
    # of this function.
    global best_accuracy

    # If the classification accuracy of the saved model is improved ...
    if accuracy > best_accuracy:
        # Save the new model to harddisk.
        model.save(path_best_model)
        # Update the classification accuracy.
        best_accuracy = accuracy

    # Delete the Keras model with these hyper-parameters from memory.
    del model

    # Clear the Keras session, otherwise it will keep adding new
    # models to the same TensorFlow graph each time we create
    # a model with a different set of hyper-parameters.
    K.clear_session()

    # NOTE: Scikit-optimize does minimization so it tries to
    # find a set of hyper-parameters with the LOWEST fitness-value.
    # Because we are interested in the HIGHEST classification
    # accuracy, we need to negate this number so it can be minimized.
    return -accuracy

# This function comes from: Hvass-Labs, TensorFlow-Tutorials
Run this first with the default parameters:
fitness(x=default_parameters)

Run Hyper-Parameter Optimization
We have now checked the performance of the default hyper-parameters. Next we can run the Bayesian optimization from the scikit-optimize library. Here we allow 40 calls of the fitness function; this is an expensive operation and should be used carefully with larger datasets.
from skopt import gp_minimize

search_result = gp_minimize(func=fitness,
                            dimensions=dimensions,
                            acq_func='EI',  # Expected Improvement.
                            n_calls=40,
                            x0=default_parameters)
Only some of the last runs are shown below:
Progress Visualization
Using skopt's plot_convergence function, we can see the optimization progress, with the best fitness value found so far on the y-axis.
from skopt.plots import plot_convergence

plot_convergence(search_result)
# plt.savefig("Converge.png", dpi=400)

Optimal Hyper-Parameters
Using the search_result object, we can see the best hyper-parameters that the Bayesian optimizer found.
search_result.x

The optimized hyper-parameters are, in order: learning rate, number of dense layers, number of nodes in each layer, and the best activation function.
We can also see the results of all 40 calls with their corresponding hyper-parameters and fitness values.
sorted(zip(search_result.func_vals, search_result.x_iters))

An interesting point is that the 'relu' activation function is dominant in almost all of the better runs.
Plots
First, let's look at a 2D plot of two of the optimized parameters: a landscape plot of the estimated fitness values over the learning rate and the number of nodes in each layer. The Bayesian optimizer builds a surrogate model of the search space and searches inside this surrogate rather than the real search space, which is why it is faster. In the plot, yellow regions are better and blue regions are worse; black dots are the optimizer's sampling locations and the red star is the best parameter set found.
from skopt.plots import plot_objective_2D

fig = plot_objective_2D(result=search_result,
                        dimension_identifier1='learning_rate',
                        dimension_identifier2='num_dense_nodes',
                        levels=50)
# plt.savefig("Lr_numnods.png", dpi=400)
We can also plot all of the search dimensions together:
from skopt.plots import plot_objective
import matplotlib.pyplot as plt

dim_names = ['learning_rate', 'num_dense_layers', 'num_dense_nodes', 'activation']
ax = plot_objective(result=search_result, dimensions=dim_names)
plt.savefig("all_dimen.png", dpi=400)
In these plots, we can see how the optimization unfolded. The Bayesian approach fits the surrogate model using the prior information gathered at the points where sampling is densest. Bringing all four parameters together, the scikit-optimize run gives its best result with a learning rate of about 0.003, 6 dense layers, about 327 nodes in each layer, and the 'relu' activation function.
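As a small convenience (not in the original notebook), the winning combination can also be read back as a named dictionary rather than a bare list:

# Pair each dimension name with the corresponding entry of search_result.x
best_params = dict(zip(dim_names, search_result.x))
print(best_params)
# e.g. {'learning_rate': 0.003, 'num_dense_layers': 6,
#       'num_dense_nodes': 327, 'activation': 'relu'} for the run described above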
Evaluate the Model with Optimized Hyper-Parameters on Blind Data
The same data-preparation steps are required here as well; we will not repeat them, but a minimal sketch is given right below. After that, we can build a model with the optimized parameters and check its predictions.
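For reference, a hedged sketch of that preparation is shown here; it mirrors the training steps above, reuses the already-fitted scaler and the facies_labels list defined earlier, and assumes the same column names (the variable names scaled_features_blind and labels_blind match the evaluation cell further down):

# Prepare the blind well the same way as the training data (sketch)
blind = blind.copy()
blind['FaciesLabels'] = blind['Facies'].apply(lambda f: facies_labels[int(f) - 1])
labels_blind = pd.get_dummies(blind['FaciesLabels']).values   # assumes all nine facies occur in the blind well
features_blind = blind.drop(['Facies', 'Formation', 'Well Name', 'Depth', 'FaciesLabels'], axis=1)
scaled_features_blind = scaler.transform(features_blind)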
opt_par = search_result.x

# use hyper-parameters from the optimization
learning_rate = opt_par[0]
num_layers = opt_par[1]
num_nodes = opt_par[2]
activation = opt_par[3]
Create the model:
import numpy as np
import tensorflow.keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, InputLayer
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

model = Sequential()
model.add(InputLayer(input_shape=(scaled_features.shape[1],)))
model.add(Dense(num_nodes, activation=activation, kernel_initializer='random_normal'))
model.add(Dense(labels.shape[1], activation='softmax', kernel_initializer='random_normal'))

optimizer = Adam(lr=learning_rate)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=20,
                        verbose=1, mode='auto', restore_best_weights=True)
histories = model.fit(X_train, y_train, validation_data=(X_test, y_test),
                      callbacks=[monitor], verbose=2, epochs=100)
Let's look at how the model accuracy develops:
plt.plot(histories.history['accuracy'], 'bo')
plt.plot(histories.history['val_accuracy'], 'b')
plt.title('Training and validation accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.savefig("accu.png", dpi=400)
plt.show()
The training and validation accuracy plot shows that after reaching roughly 80% accuracy (around epoch 10), the model starts to overfit: the validation accuracy no longer improves.
Let's evaluate model performance on data the model has not yet seen (the blind well). We generally expect machine learning models to predict blind data less accurately than the training data, especially when the dataset is small or the features do not cover the full complexity of the problem.
result = model.evaluate(scaled_features_blind, labels_blind)
print("{0}: {1:.2%}".format(model.metrics_names[1], result[1]))
Predict Blind Well Data and Plot
y_pred = model.predict(scaled_features_blind)  # result is a probability array
y_pred_idx = np.argmax(y_pred, axis=1) + 1     # +1 because facies start from 1, not 0 like the index
blind['Pred_Facies'] = y_pred_idx

The function to plot the comparison:
import matplotlib.pyplot as plt
from matplotlib import colors
from mpl_toolkits.axes_grid1 import make_axes_locatable

def compare_facies_plot(logs, compadre, facies_colors):
    # make sure logs are sorted by depth
    logs = logs.sort_values(by='Depth')
    cmap_facies = colors.ListedColormap(
        facies_colors[0:len(facies_colors)], 'indexed')

    ztop = logs.Depth.min(); zbot = logs.Depth.max()

    cluster1 = np.repeat(np.expand_dims(logs['Facies'].values, 1), 100, 1)
    cluster2 = np.repeat(np.expand_dims(logs[compadre].values, 1), 100, 1)

    f, ax = plt.subplots(nrows=1, ncols=7, figsize=(12, 6))
    ax[0].plot(logs.GR, logs.Depth, '-g', alpha=0.8, lw=0.9)
    ax[1].plot(logs.ILD_log10, logs.Depth, '-b', alpha=0.8, lw=0.9)
    ax[2].plot(logs.DeltaPHI, logs.Depth, '-k', alpha=0.8, lw=0.9)
    ax[3].plot(logs.PHIND, logs.Depth, '-r', alpha=0.8, lw=0.9)
    ax[4].plot(logs.PE, logs.Depth, '-c', alpha=0.8, lw=0.9)
    im1 = ax[5].imshow(cluster1, interpolation='none', aspect='auto',
                       cmap=cmap_facies, vmin=1, vmax=9)
    im2 = ax[6].imshow(cluster2, interpolation='none', aspect='auto',
                       cmap=cmap_facies, vmin=1, vmax=9)

    divider = make_axes_locatable(ax[6])
    cax = divider.append_axes("right", size="20%", pad=0.05)
    cbar = plt.colorbar(im2, cax=cax)
    cbar.set_label((5*' ').join([' SS ', 'CSiS', 'FSiS',
                                 'SiSh', ' MS ', ' WS ', ' D ',
                                 ' PS ', ' BS ']))
    cbar.set_ticks(range(0, 1)); cbar.set_ticklabels('')

    for i in range(len(ax)-2):
        ax[i].set_ylim(ztop, zbot)
        ax[i].invert_yaxis()
        ax[i].grid()
        ax[i].locator_params(axis='x', nbins=3)

    ax[0].set_xlabel("GR")
    ax[0].set_xlim(logs.GR.min(), logs.GR.max())
    ax[1].set_xlabel("ILD_log10")
    ax[1].set_xlim(logs.ILD_log10.min(), logs.ILD_log10.max())
    ax[2].set_xlabel("DeltaPHI")
    ax[2].set_xlim(logs.DeltaPHI.min(), logs.DeltaPHI.max())
    ax[3].set_xlabel("PHIND")
    ax[3].set_xlim(logs.PHIND.min(), logs.PHIND.max())
    ax[4].set_xlabel("PE")
    ax[4].set_xlim(logs.PE.min(), logs.PE.max())
    ax[5].set_xlabel('Facies')
    ax[6].set_xlabel(compadre)

    ax[1].set_yticklabels([]); ax[2].set_yticklabels([]); ax[3].set_yticklabels([])
    ax[4].set_yticklabels([]); ax[5].set_yticklabels([]); ax[6].set_yticklabels([])
    ax[5].set_xticklabels([])
    ax[6].set_xticklabels([])
    f.suptitle('Well: %s' % logs.iloc[0]['Well Name'], fontsize=14, y=0.94)
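The plotting function expects a facies_colors list (one color per class) that is defined elsewhere in the notebook. If you do not have the original palette, a placeholder such as the following, with arbitrary matplotlib color names ordered like the facies classes, is enough to run the comparison:

# Placeholder palette, ordered SS, CSiS, FSiS, SiSh, MS, WS, D, PS, BS
facies_colors = ['gold', 'orange', 'chocolate', 'saddlebrown', 'steelblue',
                 'dodgerblue', 'lightskyblue', 'mediumpurple', 'seagreen']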
Run:
compare_facies_plot(blind, 'Pred_Facies', facies_colors)
plt.savefig("Compo.png", dpi=400)
Conclusion
In this work, we optimized the hyper-parameters of a neural network using a Bayesian approach with the Scikit-Optimize (skopt) library. This approach is superior to random search and grid search, especially for complex datasets. With this method we can avoid hand-tuning the hyper-parameters of the neural network, although each run may still arrive at a different set of parameters.
Translated from: https://towardsdatascience.com/bayesian-hyper-parameter-optimization-neural-networks-tensorflow-facies-prediction-example-f9c48d21f795