How to tune a neural network: hyperparameter optimization methods, with a Python implementation
- 1. What are hyperparameters?
- 2. A hyperparameter optimization experiment
1. What are hyperparameters?
Hyperparameters are the values set by hand rather than learned from data: the number of neurons in each layer, the mini-batch size, the learning rate, and so on.
The dataset is split into training data, test data, and validation data.
If the test data were used to evaluate how good hyperparameter values are, the hyperparameters could end up tuned to fit only the test data; that is why a separate validation set is added.
The training data is used to learn the parameters (weights and biases), and the validation data is used to evaluate the performance of the hyperparameters (see the sketch below).
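A minimal sketch of this split, assuming x_train and t_train are NumPy arrays (here replaced with random stand-ins so the snippet runs on its own; the 20% ratio matches the full script later in the post):

```python
import numpy as np

# Hypothetical stand-ins for the real dataset, just so the sketch runs.
x_train = np.random.randn(500, 784)
t_train = np.random.randint(0, 10, size=500)

# Shuffle, then hold out 20% of the training set as validation data.
perm = np.random.permutation(x_train.shape[0])
x_train, t_train = x_train[perm], t_train[perm]

validation_num = int(x_train.shape[0] * 0.20)
x_val, t_val = x_train[:validation_num], t_train[:validation_num]
x_train, t_train = x_train[validation_num:], t_train[validation_num:]
```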
The key to hyperparameter optimization is to gradually narrow down the range in which good values exist.
Start by setting a rough range, randomly sample hyperparameter values from it, evaluate recognition accuracy with the sampled values, and use the results to narrow the range of good values; then repeat these steps. Random sampling has been reported to work better than a systematic grid search. A sketch of this loop follows.
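The following is a hedged sketch of that coarse-to-fine loop; the `evaluate` function is a toy stand-in (not a real training run), and the round and trial counts are arbitrary choices, not values from the original experiment:

```python
import numpy as np

def evaluate(lr, weight_decay):
    # Toy stand-in: returns a fake "validation accuracy" so the sketch
    # runs end to end. In practice, train briefly and measure accuracy.
    return -abs(np.log10(lr) + 2.5) - 0.2 * abs(np.log10(weight_decay) + 7)

def random_search(lr_range, wd_range, trials=30):
    # One round of random search over log-uniform samples;
    # returns (score, lr, weight_decay) tuples, best first.
    results = []
    for _ in range(trials):
        lr = 10 ** np.random.uniform(*lr_range)
        wd = 10 ** np.random.uniform(*wd_range)
        results.append((evaluate(lr, wd), lr, wd))
    return sorted(results, reverse=True)

# Coarse-to-fine: after each round, shrink the exponent ranges
# around the best few trials and search again.
lr_range, wd_range = (-6.0, -2.0), (-8.0, -4.0)
for _ in range(3):
    best = random_search(lr_range, wd_range)[:5]
    lr_range = (min(np.log10(b[1]) for b in best), max(np.log10(b[1]) for b in best))
    wd_range = (min(np.log10(b[2]) for b in best), max(np.log10(b[2]) for b in best))

print("final lr range (exponents):", lr_range)
print("final weight decay range (exponents):", wd_range)
```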
2. A hyperparameter optimization experiment
Next, hyperparameter optimization is carried out on the MNIST dataset, following an experiment from Stanford University.
The experiment optimizes two hyperparameters: the learning rate, and the coefficient controlling the strength of weight decay.
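For context, weight decay here means L2 regularization (assuming the usual formulation that a `weight_decay_lambda` parameter implements in this kind of script): a penalty proportional to the squared weights is added to the loss, so the quantity minimized is roughly

$$L_{\text{total}} = L + \frac{\lambda}{2}\sum_{W}\lVert W\rVert^{2},$$

where $\lambda$ is the weight-decay coefficient being tuned; larger $\lambda$ pushes the weights toward smaller values.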
In the experiment, the initial range for the weight-decay coefficient is 1e-8 to 1e-4, and the initial range for the learning rate is 1e-6 to 1e-2.
The random sampling shows up in the following code:

```python
weight_decay = 10 ** np.random.uniform(-8, -4)
lr = 10 ** np.random.uniform(-6, -2)
```

Sampling the exponent uniformly and raising 10 to it gives a log-uniform distribution, which suits hyperparameters whose plausible values span several orders of magnitude.

The experimental results are discussed below; the printed top-20 ranking follows the discussion.
From the results, we can see that learning proceeds well when the learning rate is between 0.001 and 0.01 and the weight-decay coefficient is between 1e-8 and 1e-6.
By observing the ranges of hyperparameter values for which learning goes well, we can narrow down the range of candidates.
We can then repeat the same narrowing inside the reduced range, and finally pick a single value from it, as in the sketch below.
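Concretely, a second round with the narrowed ranges just changes the sampled exponents (a sketch mirroring the sampling style of the full script below; the bounds come directly from the observation above):

```python
import numpy as np

# Second-round ranges, narrowed to where learning went well:
# learning rate in [1e-3, 1e-2], weight decay in [1e-8, 1e-6].
lr = 10 ** np.random.uniform(-3, -2)
weight_decay = 10 ** np.random.uniform(-8, -6)
```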
The printed ranking of the top 20 trials:

```
=========== Hyper-Parameter Optimization Result ===========
Best-1(val acc:0.8) | lr:0.008986830875594513, weight decay:3.716187805144909e-07
Best-2(val acc:0.76) | lr:0.007815234765792472, weight decay:8.723036800420108e-08
Best-3(val acc:0.73) | lr:0.004924088836198354, weight decay:5.044414627324654e-07
Best-4(val acc:0.7) | lr:0.006838530258012433, weight decay:7.678322790416307e-06
Best-5(val acc:0.69) | lr:0.0037618568422154793, weight decay:6.384663995933291e-08
Best-6(val acc:0.69) | lr:0.004818463383741305, weight decay:4.875486288914377e-08
Best-7(val acc:0.65) | lr:0.004659925318439445, weight decay:1.4968108648982665e-05
Best-8(val acc:0.64) | lr:0.005664124223619111, weight decay:6.070191899324037e-06
Best-9(val acc:0.56) | lr:0.003954240835144594, weight decay:1.5725686195018805e-06
Best-10(val acc:0.5) | lr:0.002554755378245952, weight decay:4.481334628759244e-08
Best-11(val acc:0.5) | lr:0.002855983685917335, weight decay:1.9598718051356917e-05
Best-12(val acc:0.47) | lr:0.004592998586693871, weight decay:4.888121831499798e-05
Best-13(val acc:0.47) | lr:0.0025326736070483947, weight decay:3.200796060402024e-05
Best-14(val acc:0.44) | lr:0.002645798359877985, weight decay:5.0830237860839325e-06
Best-15(val acc:0.42) | lr:0.001942571686958991, weight decay:3.0673143794194257e-06
Best-16(val acc:0.37) | lr:0.001289748323175032, weight decay:2.3690338828642213e-06
Best-17(val acc:0.36) | lr:0.0017017390582746337, weight decay:9.176068035802207e-05
Best-18(val acc:0.3) | lr:0.0015961247160317246, weight decay:1.3527453417413358e-08
Best-19(val acc:0.28) | lr:0.002261959202515378, weight decay:6.004620370338303e-05
Best-20(val acc:0.26) | lr:0.0008799239275589458, weight decay:4.600825912333848e-07
```

The full script:

```python
# coding: utf-8
import sys, os
sys.path.append(os.pardir)  # setting for importing files from the parent directory
import numpy as np
import matplotlib.pyplot as plt
from dataset.mnist import load_mnist
from common.multi_layer_net import MultiLayerNet
from common.util import shuffle_dataset
from common.trainer import Trainer

(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True)

# Reduce the training data to speed up the experiment
x_train = x_train[:500]
t_train = t_train[:500]

# Split off validation data
validation_rate = 0.20
validation_num = int(x_train.shape[0] * validation_rate)
x_train, t_train = shuffle_dataset(x_train, t_train)
x_val = x_train[:validation_num]
t_val = t_train[:validation_num]
x_train = x_train[validation_num:]
t_train = t_train[validation_num:]


def __train(lr, weight_decay, epochs=50):
    network = MultiLayerNet(input_size=784, hidden_size_list=[100, 100, 100, 100, 100, 100],
                            output_size=10, weight_decay_lambda=weight_decay)
    trainer = Trainer(network, x_train, t_train, x_val, t_val,
                      epochs=epochs, mini_batch_size=100,
                      optimizer='sgd', optimizer_param={'lr': lr}, verbose=False)
    trainer.train()
    return trainer.test_acc_list, trainer.train_acc_list


# Random search over the hyperparameters ======================================
optimization_trial = 100
results_val = {}
results_train = {}
for _ in range(optimization_trial):
    # Search range of the hyperparameters ===============
    weight_decay = 10 ** np.random.uniform(-8, -4)
    lr = 10 ** np.random.uniform(-6, -2)
    # ================================================

    val_acc_list, train_acc_list = __train(lr, weight_decay)
    print("val acc:" + str(val_acc_list[-1]) + " | lr:" + str(lr) + ", weight decay:" + str(weight_decay))
    key = "lr:" + str(lr) + ", weight decay:" + str(weight_decay)
    results_val[key] = val_acc_list
    results_train[key] = train_acc_list

# Draw the graphs =========================================================
print("=========== Hyper-Parameter Optimization Result ===========")
graph_draw_num = 20
col_num = 5
row_num = int(np.ceil(graph_draw_num / col_num))
i = 0

for key, val_acc_list in sorted(results_val.items(), key=lambda x: x[1][-1], reverse=True):
    print("Best-" + str(i+1) + "(val acc:" + str(val_acc_list[-1]) + ") | " + key)

    plt.subplot(row_num, col_num, i+1)
    plt.title("Best-" + str(i+1))
    plt.ylim(0.0, 1.0)
    if i % 5:
        plt.yticks([])
    plt.xticks([])
    x = np.arange(len(val_acc_list))
    plt.plot(x, val_acc_list)
    plt.plot(x, results_train[key], "--")
    i += 1

    if i >= graph_draw_num:
        break
plt.show()
```