Implementing Bayesian Neural Networks in PyMC3
Reposted from https://blog.csdn.net/jackxu8/article/details/71308390#commentBox
Original source: https://docs.pymc.io/notebooks/bayesian_neural_network_advi.html
Bayesian Deep Networks in PyMC3
Generating Data
Generate simulated data for a simple binary classification problem that is not linearly separable.
```python
%matplotlib inline
import pymc3 as pm
import theano.tensor as T
import theano
import sklearn
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')
from sklearn import datasets
from sklearn.preprocessing import scale
# sklearn.cross_validation was deprecated in 0.18 and removed in 0.20;
# use model_selection instead
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons

X, Y = make_moons(noise=0.2, random_state=0, n_samples=1000)
X = scale(X)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.5)

fig, ax = plt.subplots()
ax.scatter(X[Y==0, 0], X[Y==0, 1], color='b', edgecolors='k', alpha=0.6, label='Class 0')
ax.scatter(X[Y==1, 0], X[Y==1, 1], color='r', edgecolors='k', alpha=0.6, label='Class 1')
sns.despine()
ax.legend()
ax.set(xlabel='X', ylabel='Y', title='Toy binary classification data set');
```
Model Definition
The basic building block of a neural network is the perceptron, similar to the one used in logistic regression. Perceptrons are arranged in parallel and then stacked to form hidden layers. Here we use 2 hidden layers with 5 neurons each, and place normal priors on the weights to regularize them.
```python
ann_input = theano.shared(X_train)
ann_output = theano.shared(Y_train)

n_hidden = 5

# Initialize random weights between each layer
init_1 = np.random.randn(X.shape[1], n_hidden)
init_2 = np.random.randn(n_hidden, n_hidden)
init_out = np.random.randn(n_hidden)

with pm.Model() as neural_network:
    # Weights from input to hidden layer
    weights_in_1 = pm.Normal('w_in_1', 0, sd=1,
                             shape=(X.shape[1], n_hidden),
                             testval=init_1)

    # Weights from 1st to 2nd layer
    weights_1_2 = pm.Normal('w_1_2', 0, sd=1,
                            shape=(n_hidden, n_hidden),
                            testval=init_2)

    # Weights from hidden layer to output
    weights_2_out = pm.Normal('w_2_out', 0, sd=1,
                              shape=(n_hidden,),
                              testval=init_out)

    # Build neural network using tanh activation function
    act_1 = T.tanh(T.dot(ann_input, weights_in_1))
    act_2 = T.tanh(T.dot(act_1, weights_1_2))
    act_out = T.nnet.sigmoid(T.dot(act_2, weights_2_out))

    # Binary classification -> Bernoulli likelihood
    out = pm.Bernoulli('out', act_out, observed=ann_output)
```
Variational Inference: Scaling Model Complexity
We could now sample from this model with an MCMC sampler such as NUTS and get good results, but that would be very slow. Instead we use the ADVI variational inference algorithm, which is much faster and scales better with model complexity. Note, however, that ADVI is a mean-field approximation, so it ignores correlations in the posterior.
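For reference, the NUTS alternative mentioned above would look something like the following minimal sketch (not run in this post, since it is much slower on this model):

```python
# Sketch only: MCMC with NUTS instead of ADVI (slow on this model).
with neural_network:
    trace_nuts = pm.sample(2000)
```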
```python
%%time

with neural_network:
    # Run ADVI, which returns posterior means, standard deviations,
    # and the evidence lower bound (ELBO)
    v_params = pm.variational.advi(n=50000)
```

```
Average ELBO = -151.63: 100%|██████████| 50000/50000 [00:13<00:00, 3790.27it/s]
Finished [100%]: Average ELBO = -136.89
CPU times: user 15.5 s, sys: 695 ms, total: 16.1 s
Wall time: 18 s
```

Plotting the objective function (ELBO) shows that the fit gradually improves as optimization progresses.
```python
plt.plot(v_params.elbo_vals)
plt.ylabel('ELBO')
plt.xlabel('iteration');
```

For ease of inspection, we draw samples from the posterior with sample_vp() (this function simply samples from the fitted normal approximation, so it is not the same as MCMC).
```python
with neural_network:
    trace = pm.variational.sample_vp(v_params, draws=5000)
```

```
100%|██████████| 5000/5000 [00:00<00:00, 10687.44it/s]
```

Now that the model is trained, we use a posterior predictive check (PPC) to predict the held-out data points, using sample_ppc() to generate new data from the posterior (i.e., sampling from the variational approximation).
```python
# Replace shared variables with the testing set
ann_input.set_value(X_test)
ann_output.set_value(Y_test)

# Create posterior predictive samples
ppc = pm.sample_ppc(trace, model=neural_network, samples=500)

# Use probability > 0.5 to predict class 1
pred = ppc['out'].mean(axis=0) > 0.5
```

```
100%|██████████| 500/500 [00:03<00:00, 139.67it/s]
```

```python
fig, ax = plt.subplots()
ax.scatter(X_test[pred==0, 0], X_test[pred==0, 1], color='b', edgecolors='k', alpha=0.6)
ax.scatter(X_test[pred==1, 0], X_test[pred==1, 1], color='r', edgecolors='k', alpha=0.6)
sns.despine()
ax.set(title='Predicted labels in testing set', xlabel='X', ylabel='Y');

print('Accuracy = {}%'.format((Y_test == pred).mean() * 100))
```

```
Accuracy = 95.6%
```

Analyzing the Classifier
To analyze what this classifier has learned, we evaluate the network output over the entire input space.
```python
grid = np.mgrid[-3:3:100j, -3:3:100j]
grid_2d = grid.reshape(2, -1).T
# One dummy label per grid point (the labels are ignored; we only need predictions)
dummy_out = np.ones(grid_2d.shape[0], dtype=np.int8)

ann_input.set_value(grid_2d)
ann_output.set_value(dummy_out)

# Sample from the posterior predictive distribution
ppc = pm.sample_ppc(trace, model=neural_network, samples=500)
```

```
100%|██████████| 500/500 [00:04<00:00, 101.43it/s]
```

The predicted probability surface is plotted below.
```python
cmap = sns.diverging_palette(250, 12, s=85, l=25, as_cmap=True)
fig, ax = plt.subplots(figsize=(10, 6))
contour = ax.contourf(*grid, ppc['out'].mean(axis=0).reshape(100, 100), cmap=cmap)
ax.scatter(X_test[pred==0, 0], X_test[pred==0, 1], color='b', edgecolors='k', alpha=0.6)
ax.scatter(X_test[pred==1, 0], X_test[pred==1, 1], color='r', edgecolors='k', alpha=0.6)
cbar = plt.colorbar(contour, ax=ax)
_ = ax.set(xlim=(-3, 3), ylim=(-3, 3), xlabel='X', ylabel='Y')
cbar.ax.set_ylabel('Posterior predictive mean probability of class label = 0');
```
Uncertainty in the Predictions
So far, everything shown could also be done with a non-Bayesian neural network: the posterior mean of each class probability could be obtained from a maximum-likelihood estimate. However, a Bayesian neural network additionally gives the standard deviation of the posterior predictive, which quantifies the uncertainty in the predictions.
```python
cmap = sns.cubehelix_palette(light=1, as_cmap=True)
fig, ax = plt.subplots(figsize=(10, 6))
contour = ax.contourf(*grid, ppc['out'].std(axis=0).reshape(100, 100), cmap=cmap)
ax.scatter(X_test[pred==0, 0], X_test[pred==0, 1], color='b', edgecolors='k', alpha=0.6)
ax.scatter(X_test[pred==1, 0], X_test[pred==1, 1], color='r', edgecolors='k', alpha=0.6)
cbar = plt.colorbar(contour, ax=ax)
_ = ax.set(xlim=(-3, 3), ylim=(-3, 3), xlabel='X', ylabel='Y')
cbar.ax.set_ylabel('Uncertainty (posterior predictive standard deviation)');
```

As the plot shows, prediction uncertainty is highest near the decision boundary. Linking predictions to their uncertainty matters in many application areas, such as medical diagnostics. To improve accuracy, we could have the model sample more in regions where the uncertainty is high.
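As an illustration of that last point, here is a hypothetical sketch of ranking grid points by predictive uncertainty; the names `uncertainty` and `query_idx` are ours, not from the original post:

```python
# Hypothetical sketch: find the grid points with the highest predictive uncertainty.
uncertainty = ppc['out'].std(axis=0)        # posterior predictive std per grid point
query_idx = np.argsort(uncertainty)[-100:]  # indices of the 100 most uncertain points
query_points = grid_2d[query_idx]           # candidate locations to sample or label next
```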
Mini-batch ADVI: Scaling to Large Datasets
So far, we trained the model on all of the data at once, which does not scale to large datasets. We can instead train on mini-batches of data (stochastic gradient descent), which avoids local minima and converges faster.
```python
from six.moves import zip

# Set back to the original data to retrain
ann_input.set_value(X_train)
ann_output.set_value(Y_train)

# Tensors and RVs that will use mini-batches
minibatch_tensors = [ann_input, ann_output]
minibatch_RVs = [out]

# Generator that returns mini-batches in each iteration
def create_minibatch(data):
    while True:
        # Return a random sample of 50 data points on each iteration
        ixs = np.random.randint(len(data), size=50)
        yield data[ixs]

minibatches = zip(
    create_minibatch(X_train),
    create_minibatch(Y_train),
)

total_size = len(Y_train)
```

```python
%%time

with neural_network:
    # Run advi_minibatch
    v_params = pm.variational.advi_minibatch(
        n=50000, minibatch_tensors=minibatch_tensors,
        minibatch_RVs=minibatch_RVs, minibatches=minibatches,
        total_size=total_size, learning_rate=1e-2, epsilon=1.0
    )
```

```
Average ELBO = -123.12: 100%|██████████| 50000/50000 [00:16<00:00, 3043.69it/s]
Finished minibatch ADVI: ELBO = -121.76
CPU times: user 20.5 s, sys: 371 ms, total: 20.9 s
Wall time: 23.9 s
```

```python
with neural_network:
    trace = pm.variational.sample_vp(v_params, draws=5000)
```

```
100%|██████████| 5000/5000 [00:00<00:00, 9914.64it/s]
```

The mini-batch run takes longer in wall time, but it converges faster.
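To see the faster convergence, the mini-batch ELBO trace can be plotted the same way as before (assuming, as above, that advi_minibatch returns the ELBO values in `v_params.elbo_vals`):

```python
plt.plot(v_params.elbo_vals)
plt.ylabel('ELBO')
plt.xlabel('iteration');
```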
```python
pm.traceplot(trace);
```
Summary
In this post we defined a Bayesian neural network in PyMC3, fit it quickly with ADVI, used posterior predictive samples to quantify the uncertainty of its predictions, and scaled training to larger datasets with mini-batch ADVI.