HBU-NNDL Lab 5: Feedforward Neural Networks (3) Iris Classification
Contents
A Closer Look at the Iris Dataset
4.5 Practice: Iris Classification with a Feedforward Neural Network
4.5.1 Mini-Batch Gradient Descent
4.5.1.1 Data Batching
4.5.2 Data Processing
4.5.2.2 Wrapping with DataLoader
4.5.3 Model Construction
4.5.4 Improving the Runner Class
4.5.5 Model Training
4.5.6 Model Evaluation
4.5.7 Model Prediction
Discussion Questions
1. Compare Softmax classification with feedforward neural network classification.
2. Vary the number of hidden layers and the number of neurons per hidden layer, and try to find the best hyperparameters for the multi-class task.
3. Compare the classification performance of SVM and FNN, and share your thoughts.
4. Design a suitable feedforward neural network for the MNIST handwritten-digit dataset and reach at least 95% accuracy.
Summary
Mind Map
References
A Closer Look at the Iris Dataset
The Iris dataset contains 3 classes, Iris-setosa, Iris-versicolor, and Iris-virginica, with 150 records in total, 50 per class. Each record has 4 features: sepal length, sepal width, petal length, and petal width.
sepallength: sepal length
sepalwidth: sepal width
petallength: petal length
petalwidth: petal width
All four features are measured in centimeters (cm).
Plot a scatter diagram of the first two features of the 150 samples in the dataset:
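A minimal sketch of that scatter plot (my own code for reference, using scikit-learn's built-in loader; colors and labels are illustrative choices):

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Scatter plot of the first two features (sepal length vs. sepal width)
iris = load_iris()
X, y = iris.data, iris.target
for c, name in enumerate(iris.target_names):
    plt.scatter(X[y == c, 0], X[y == c, 1], label=name)
plt.xlabel('sepal length (cm)')
plt.ylabel('sepal width (cm)')
plt.legend()
plt.show()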
4.5 Practice: Iris Classification with a Feedforward Neural Network
4.5.1 Mini-Batch Gradient Descent
In gradient descent, the objective is the risk function over the entire training set; this variant is called batch gradient descent (BGD). At every iteration, BGD computes and sums the gradient of the loss on every sample. When the number of training samples N is large, this has a high space complexity, and every iteration is expensive.
To reduce the per-iteration cost, we can instead sample a small subset of the data at each iteration, compute the gradient of the loss on that subset only, and update the parameters. This optimization scheme is called
mini-batch gradient descent (Mini-Batch GD).
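A minimal sketch of a single mini-batch update step (the helper below is my own illustration, not from the lab code; it assumes a PyTorch model, loss function, and optimizer):

import torch

# One mini-batch update: only a batch of K samples contributes to the gradient.
def minibatch_step(model, loss_fn, optimizer, X_batch, y_batch):
    logits = model(X_batch)          # forward pass on the mini-batch only
    loss = loss_fn(logits, y_batch)  # average loss over the K batch samples
    loss.backward()                  # gradients w.r.t. this batch
    optimizer.step()                 # parameter update
    optimizer.zero_grad()            # clear gradients for the next batch
    return loss.item()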
4.5.1.1 Data Batching
To apply mini-batch gradient descent, we need to split the data into random groups. The usual practice in machine learning is to build a data iterator that fetches a batch of the specified size from the full dataset at each iteration.
The data iterator works as illustrated in the following figure:
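In code, such an iterator can be sketched as follows (batch_iter is an illustrative name of my own, not from the lab code):

import numpy as np

# A minimal shuffling mini-batch iterator: shuffle the indices once per
# epoch, then yield consecutive slices of batch_size samples.
def batch_iter(X, y, batch_size, shuffle=True):
    idx = np.arange(len(X))
    if shuffle:
        np.random.shuffle(idx)  # random grouping each epoch
    for start in range(0, len(X), batch_size):
        sel = idx[start:start + batch_size]
        yield X[sel], y[sel]    # one mini-batch per step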
4.5.2 Data Processing
We build an IrisDataset class for reading the data, inheriting from torch.utils.data.Dataset. torch.utils.data.Dataset is an abstract class that encapsulates the methods and behavior of a dataset: it retrieves a specific sample through an index and applies any preprocessing to that sample. When subclassing torch.utils.data.Dataset to define a data-reading class, implement the following methods:
- __getitem__: fetch the sample at the given index and preprocess it;
- __len__: return the number of samples in the dataset.
The implementation is as follows:
import numpy as np
import torch
import torch.utils.data as io
from dataset import load_data

class IrisDataset(io.Dataset):
    def __init__(self, mode='train', num_train=120, num_dev=15):
        super(IrisDataset, self).__init__()
        # Reuse the data-loading function from Chapter 3;
        # labels do not need to be converted to one-hot here
        X, y = load_data(shuffle=True)
        if mode == 'train':
            self.X, self.y = X[:num_train], y[:num_train]
        elif mode == 'dev':
            self.X, self.y = X[num_train:num_train + num_dev], y[num_train:num_train + num_dev]
        else:
            self.X, self.y = X[num_train + num_dev:], y[num_train + num_dev:]

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

    def __len__(self):
        return len(self.y)

torch.random.manual_seed(12)
train_dataset = IrisDataset(mode='train')
dev_dataset = IrisDataset(mode='dev')
test_dataset = IrisDataset(mode='test')
# Print the size of the training set
print("length of train set: ", len(train_dataset))

length of train set:  120
4.5.2.2 Wrapping with DataLoader
# Batch size
batch_size = 16

# Load the data
train_loader = io.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
dev_loader = io.DataLoader(dev_dataset, batch_size=batch_size)
test_loader = io.DataLoader(test_dataset, batch_size=batch_size)

4.5.3 Model Construction
Build a simple feedforward neural network for the iris classification experiment, with 4 input neurons, 3 output neurons, and 6 hidden neurons. The implementation is as follows:
import torch.nn as nn
from torch.nn.init import normal_, constant_

# Define the feedforward neural network
class Model_MLP_L2_V3(nn.Module):
    def __init__(self, input_size, output_size, hidden_size):
        super(Model_MLP_L2_V3, self).__init__()
        # First fully connected layer
        self.fc1 = nn.Linear(input_size, hidden_size)
        normal_(self.fc1.weight, mean=0.0, std=0.01)
        constant_(self.fc1.bias, val=1.0)
        # Second fully connected layer
        self.fc2 = nn.Linear(hidden_size, output_size)
        normal_(self.fc2.weight, mean=0.0, std=0.01)
        constant_(self.fc2.bias, val=1.0)
        # Activation function used by the network
        self.act = nn.Sigmoid()

    def forward(self, inputs):
        outputs = self.fc1(inputs)
        outputs = self.act(outputs)
        outputs = self.fc2(outputs)
        return outputs

fnn_model = Model_MLP_L2_V3(input_size=4, output_size=3, hidden_size=6)

4.5.4 Improving the Runner Class
Based on the RunnerV2 class, we implement an improved RunnerV3 class. Training uses automatic gradient computation, loads batches with DataLoader, and optimizes the parameters with stochastic gradient descent. When saving the model, state_dict is used to get the parameters; when loading, load_state_dict restores them.
Since the parameters are optimized with stochastic gradient descent, the data is fed to the model in batches, so the evaluation metric is also computed batch by batch. To obtain the overall result for an epoch, the per-batch statistics must be accumulated. The Accuracy class below implements this.
Since I could not find an equivalent Metric implementation in PyTorch, I ported PaddlePaddle's source code directly:
import six
import abc
import numpy as np

@six.add_metaclass(abc.ABCMeta)
class Metric(object):
    r"""
    Base class for metric, encapsulates metric logic and APIs
    Usage:

        .. code-block:: text

            m = SomeMetric()
            for prediction, label in ...:
                m.update(prediction, label)
            m.accumulate()

    Advanced usage for :code:`compute`:

    Metric calculation can be accelerated by calculating metric states
    from model outputs and labels by build-in operators not by Python/NumPy
    in :code:`compute`, metric states will be fetched as NumPy array and
    call :code:`update` with states in NumPy format.
    Metric calculated as follows (operations in Model and Metric are
    indicated with curly brackets, while data nodes not):

        .. code-block:: text

                 inputs & labels              || ------------------
                       |                      ||
                    {model}                   ||
                       |                      ||
                outputs & labels              ||
                       |                      ||    tensor data
                {Metric.compute}              ||
                       |                      ||
              metric states(tensor)           ||
                       |                      ||
                {fetch as numpy}              || ------------------
                       |                      ||
              metric states(numpy)            ||    numpy data
                       |                      ||
                {Metric.update}               \/ ------------------

    Examples:

        For :code:`Accuracy` metric, which takes :code:`pred` and :code:`label`
        as inputs, we can calculate the correct prediction matrix between
        :code:`pred` and :code:`label` in :code:`compute`.
        For examples, prediction results contains 10 classes, while :code:`pred`
        shape is [N, 10], :code:`label` shape is [N, 1], N is mini-batch size,
        and we only need to calculate accurary of top-1 and top-5, we could
        calculate the correct prediction matrix of the top-5 scores of the
        prediction of each sample like follows, while the correct prediction
        matrix shape is [N, 5].

        .. code-block:: text

            def compute(pred, label):
                # sort prediction and slice the top-5 scores
                pred = torch.argsort(pred, descending=True)[:, :5]
                # calculate whether the predictions are correct
                correct = pred == label
                return correct.to(torch.float32)

        With the :code:`compute`, we split some calculations to OPs (which
        may run on GPU devices, will be faster), and only fetch 1 tensor with
        shape as [N, 5] instead of 2 tensors with shapes as [N, 10] and [N, 1].
        :code:`update` can be define as follows:

        .. code-block:: text

            def update(self, correct):
                accs = []
                for i, k in enumerate(self.topk):
                    num_corrects = correct[:, :k].sum()
                    num_samples = len(correct)
                    accs.append(float(num_corrects) / num_samples)
                    self.total[i] += num_corrects
                    self.count[i] += num_samples
                return accs
    """

    def __init__(self):
        pass

    @abc.abstractmethod
    def reset(self):
        """
        Reset states and result
        """
        raise NotImplementedError("function 'reset' not implemented in {}.".format(self.__class__.__name__))

    @abc.abstractmethod
    def update(self, *args):
        """
        Update states for metric

        Inputs of :code:`update` is the outputs of :code:`Metric.compute`,
        if :code:`compute` is not defined, the inputs of :code:`update`
        will be flatten arguments of **output** of mode and **label** from data:
        :code:`update(output1, output2, ..., label1, label2,...)`
        see :code:`Metric.compute`
        """
        raise NotImplementedError("function 'update' not implemented in {}.".format(self.__class__.__name__))

    @abc.abstractmethod
    def accumulate(self):
        """
        Accumulates statistics, computes and returns the metric value
        """
        raise NotImplementedError("function 'accumulate' not implemented in {}.".format(self.__class__.__name__))

    @abc.abstractmethod
    def name(self):
        """
        Returns metric name
        """
        raise NotImplementedError("function 'name' not implemented in {}.".format(self.__class__.__name__))

    def compute(self, *args):
        """
        This API is advanced usage to accelerate metric calculating, calulations
        from outputs of model to the states which should be updated by Metric can
        be defined here, where torch OPs is also supported. Outputs of this API
        will be the inputs of "Metric.update".

        If :code:`compute` is defined, it will be called with **outputs**
        of model and **labels** from data as arguments, all outputs and labels
        will be concatenated and flatten and each filed as a separate argument
        as follows:
        :code:`compute(output1, output2, ..., label1, label2,...)`

        If :code:`compute` is not defined, default behaviour is to pass
        input to output, so output format will be:
        :code:`return output1, output2, ..., label1, label2,...`

        see :code:`Metric.update`
        """
        return args

class Accuracy(Metric):
    def __init__(self, is_logist=True):
        """
        Args:
            - is_logist: whether outputs are logits or already-activated values
        """
        # Number of correctly predicted samples
        self.num_correct = 0
        # Total number of samples seen
        self.num_count = 0
        self.is_logist = is_logist

    def update(self, outputs, labels):
        """
        Args:
            - outputs: predictions, shape=[N, class_num]
            - labels: ground-truth labels, shape=[N, 1]
        """
        # shape[1] == 1 means binary classification; shape[1] > 1 means multi-class
        if outputs.shape[1] == 1:  # binary classification
            outputs = torch.squeeze(outputs, dim=-1)
            if self.is_logist:
                # For logits, predict class 1 when the logit is >= 0
                preds = (outputs >= 0).to(torch.float32)
            else:
                # For probabilities, predict class 1 when p >= 0.5, otherwise class 0
                preds = (outputs >= 0.5).to(torch.float32)
        else:
            # Multi-class: take the index of the largest score as the class via 'torch.argmax'
            preds = torch.argmax(outputs, dim=1)

        # Number of correct predictions in this batch
        labels = torch.squeeze(labels, dim=-1)
        batch_correct = torch.sum((preds == labels).to(torch.float32)).item()
        batch_count = len(labels)

        # Update num_correct and num_count
        self.num_correct += batch_correct
        self.num_count += batch_count

    def accumulate(self):
        # Compute the overall metric from the accumulated statistics
        if self.num_count == 0:
            return 0
        return self.num_correct / self.num_count

    def reset(self):
        # Reset the correct count and the total count
        self.num_correct = 0
        self.num_count = 0

    def name(self):
        return "Accuracy"
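As a quick illustration of the accumulate-over-batches pattern (a toy example of my own, not part of the lab code; it assumes the Accuracy class above and torch imported):

metric = Accuracy(is_logist=True)
metric.reset()
# Two fake batches of 3-class logits and labels
logits = torch.tensor([[2.0, 0.1, 0.3], [0.2, 1.5, 0.1]])
labels = torch.tensor([[0], [1]])
metric.update(logits, labels)           # batch 1: 2/2 correct
metric.update(logits, labels[[1, 0]])   # batch 2: labels swapped, 0/2 correct
print(metric.accumulate())              # 0.5 over all 4 samples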
The RunnerV3 class is implemented as follows:

import torch.nn.functional as F

class RunnerV3(object):
    def __init__(self, model, optimizer, loss_fn, metric, **kwargs):
        self.model = model
        self.optimizer = optimizer
        self.loss_fn = loss_fn
        self.metric = metric  # used only for computing evaluation metrics

        # Evaluation metric history during training
        self.dev_scores = []

        # Loss history during training
        self.train_epoch_losses = []  # one loss record per epoch
        self.train_step_losses = []   # one loss record per step
        self.dev_losses = []

        # Best metric seen so far
        self.best_score = 0

    def train(self, train_loader, dev_loader=None, **kwargs):
        # Switch the model to training mode
        self.model.train()

        # Number of training epochs; defaults to 0 if not given
        num_epochs = kwargs.get("num_epochs", 0)
        # Logging frequency; defaults to 100 if not given
        log_steps = kwargs.get("log_steps", 100)
        # Evaluation frequency
        eval_steps = kwargs.get("eval_steps", 0)
        # Model save path; defaults to "best_model.pdparams"
        save_path = kwargs.get("save_path", "best_model.pdparams")
        custom_print_log = kwargs.get("custom_print_log", None)

        # Total number of training steps
        num_training_steps = num_epochs * len(train_loader)

        if eval_steps:
            if self.metric is None:
                raise RuntimeError('Error: Metric can not be None!')
            if dev_loader is None:
                raise RuntimeError('Error: dev_loader can not be None!')

        # Number of steps run so far
        global_step = 0

        # Train for num_epochs epochs
        for epoch in range(num_epochs):
            # Accumulated training loss
            total_loss = 0
            for step, data in enumerate(train_loader):
                X, y = data
                # Model prediction
                logits = self.model(X)
                loss = self.loss_fn(logits, y)  # mean reduction by default
                total_loss += loss

                # Record the loss of every step during training
                self.train_step_losses.append((global_step, loss.item()))

                if log_steps and global_step % log_steps == 0:
                    print(f"[Train] epoch: {epoch}/{num_epochs}, step: {global_step}/{num_training_steps}, loss: {loss.item():.5f}")

                # Backpropagate to compute the gradient of every parameter
                loss.backward()

                if custom_print_log:
                    custom_print_log(self)

                # Mini-batch gradient descent parameter update
                self.optimizer.step()
                # Zero the gradients
                self.optimizer.zero_grad()

                # Decide whether to evaluate
                if eval_steps > 0 and global_step > 0 and \
                        (global_step % eval_steps == 0 or global_step == (num_training_steps - 1)):
                    dev_score, dev_loss = self.evaluate(dev_loader, global_step=global_step)
                    print(f"[Evaluate] dev score: {dev_score:.5f}, dev loss: {dev_loss:.5f}")

                    # Switch the model back to training mode
                    self.model.train()

                    # Save the model if the current metric is the best so far
                    if dev_score > self.best_score:
                        self.save_model(save_path)
                        print(f"[Evaluate] best accuracy performance has been updated: {self.best_score:.5f} --> {dev_score:.5f}")
                        self.best_score = dev_score

                global_step += 1

            # Accumulated training loss of the current epoch
            trn_loss = (total_loss / len(train_loader)).item()
            # Record the epoch-level training loss
            self.train_epoch_losses.append(trn_loss)

        print("[Train] Training done!")

    # Evaluation: use 'torch.no_grad()' to avoid computing and storing gradients
    @torch.no_grad()
    def evaluate(self, dev_loader, **kwargs):
        assert self.metric is not None

        # Switch the model to evaluation mode
        self.model.eval()

        global_step = kwargs.get("global_step", -1)

        # Accumulated loss on the dev set
        total_loss = 0

        # Reset the metric
        self.metric.reset()

        # Iterate over every batch of the dev set
        for batch_id, data in enumerate(dev_loader):
            X, y = data
            # Model output
            logits = self.model(X)
            # Loss
            loss = self.loss_fn(logits, y).item()
            # Accumulate the loss
            total_loss += loss
            # Accumulate the metric statistics
            self.metric.update(logits, y)

        dev_loss = (total_loss / len(dev_loader))
        dev_score = self.metric.accumulate()

        # Record the dev loss
        if global_step != -1:
            self.dev_losses.append((global_step, dev_loss))
            self.dev_scores.append(dev_score)

        return dev_score, dev_loss

    # Prediction: use 'torch.no_grad()' to avoid computing and storing gradients
    @torch.no_grad()
    def predict(self, x, **kwargs):
        # Switch the model to evaluation mode
        self.model.eval()
        # Run the forward pass to get predictions
        logits = self.model(x)
        return logits

    def save_model(self, save_path):
        torch.save(self.model.state_dict(), save_path)

    def load_model(self, model_path):
        model_state_dict = torch.load(model_path)
        self.model.load_state_dict(model_state_dict)

4.5.5 Model Training
Instantiate the RunnerV3 class and pass in the training configuration. The implementation is as follows:
import torch.optim as opt
from metric import Accuracy

lr = 0.2

# Network
model = fnn_model

# Optimizer
optimizer = opt.SGD(model.parameters(), lr=lr)

# Loss function: softmax + cross-entropy
loss_fn = F.cross_entropy

# Evaluation metric
metric = Accuracy(is_logist=True)

runner = RunnerV3(model, optimizer, loss_fn, metric)

Train the model with the training and validation sets for 150 epochs. The model with the highest accuracy is saved as the best model. The implementation is as follows:
# Start training
log_steps = 100
eval_steps = 50
runner.train(train_loader, dev_loader,
             num_epochs=150, log_steps=log_steps, eval_steps=eval_steps,
             save_path="best_model.pdparams")

[Train] epoch: 0/150, step: 0/1200, loss: 1.09898
[Evaluate] dev score: 0.33333, dev loss: 1.09582
[Evaluate] best accuracy performance has been updated: 0.00000 --> 0.33333
[Train] epoch: 12/150, step: 100/1200, loss: 1.13891
[Evaluate] dev score: 0.46667, dev loss: 1.10749
[Evaluate] best accuracy performance has been updated: 0.33333 --> 0.46667
[Evaluate] dev score: 0.20000, dev loss: 1.10089
[Train] epoch: 25/150, step: 200/1200, loss: 1.10158
[Evaluate] dev score: 0.20000, dev loss: 1.12477
[Evaluate] dev score: 0.46667, dev loss: 1.09090
[Train] epoch: 37/150, step: 300/1200, loss: 1.09982
[Evaluate] dev score: 0.46667, dev loss: 1.07537
[Evaluate] dev score: 0.53333, dev loss: 1.04453
[Evaluate] best accuracy performance has been updated: 0.46667 --> 0.53333
[Train] epoch: 50/150, step: 400/1200, loss: 1.01054
[Evaluate] dev score: 1.00000, dev loss: 1.00635
[Evaluate] best accuracy performance has been updated: 0.53333 --> 1.00000
[Evaluate] dev score: 0.86667, dev loss: 0.86850
[Train] epoch: 62/150, step: 500/1200, loss: 0.63702
[Evaluate] dev score: 0.80000, dev loss: 0.66986
[Evaluate] dev score: 0.86667, dev loss: 0.57089
[Train] epoch: 75/150, step: 600/1200, loss: 0.56490
[Evaluate] dev score: 0.93333, dev loss: 0.52392
[Evaluate] dev score: 0.86667, dev loss: 0.45410
[Train] epoch: 87/150, step: 700/1200, loss: 0.41929
[Evaluate] dev score: 0.86667, dev loss: 0.46156
[Evaluate] dev score: 0.93333, dev loss: 0.41593
[Train] epoch: 100/150, step: 800/1200, loss: 0.41047
[Evaluate] dev score: 0.93333, dev loss: 0.40600
[Evaluate] dev score: 0.93333, dev loss: 0.37672
[Train] epoch: 112/150, step: 900/1200, loss: 0.42777
[Evaluate] dev score: 0.93333, dev loss: 0.34534
[Evaluate] dev score: 0.93333, dev loss: 0.33552
[Train] epoch: 125/150, step: 1000/1200, loss: 0.30734
[Evaluate] dev score: 0.93333, dev loss: 0.31958
[Evaluate] dev score: 0.93333, dev loss: 0.32091
[Train] epoch: 137/150, step: 1100/1200, loss: 0.28321
[Evaluate] dev score: 0.93333, dev loss: 0.28383
[Evaluate] dev score: 0.93333, dev loss: 0.27171
[Evaluate] dev score: 0.93333, dev loss: 0.25447
[Train] Training done!
Visualize how the training and dev losses and the dev accuracy change during training.
import matplotlib.pyplot as plt

# Plot the training/dev loss curves and the dev accuracy curve
def plot_training_loss_acc(runner, fig_name,
                           fig_size=(16, 6),
                           sample_step=20,
                           loss_legend_loc="upper right",
                           acc_legend_loc="lower right",
                           train_color="#e4007f",
                           dev_color='#f19ec2',
                           fontsize='large',
                           train_linestyle="-",
                           dev_linestyle='--'):
    plt.figure(figsize=fig_size)

    plt.subplot(1, 2, 1)
    train_items = runner.train_step_losses[::sample_step]
    train_steps = [x[0] for x in train_items]
    train_losses = [x[1] for x in train_items]

    plt.plot(train_steps, train_losses, color=train_color, linestyle=train_linestyle, label="Train loss")
    if len(runner.dev_losses) > 0:
        dev_steps = [x[0] for x in runner.dev_losses]
        dev_losses = [x[1] for x in runner.dev_losses]
        plt.plot(dev_steps, dev_losses, color=dev_color, linestyle=dev_linestyle, label="Dev loss")
    # Axes and legend
    plt.ylabel("loss", fontsize=fontsize)
    plt.xlabel("step", fontsize=fontsize)
    plt.legend(loc=loss_legend_loc, fontsize='x-large')

    # Plot the dev accuracy curve
    if len(runner.dev_scores) > 0:
        plt.subplot(1, 2, 2)
        plt.plot(dev_steps, runner.dev_scores,
                 color=dev_color, linestyle=dev_linestyle, label="Dev accuracy")
        # Axes and legend
        plt.ylabel("score", fontsize=fontsize)
        plt.xlabel("step", fontsize=fontsize)
        plt.legend(loc=acc_legend_loc, fontsize='x-large')

    plt.savefig(fig_name)
    plt.show()

plot_training_loss_acc(runner, 'fw-loss.pdf')

The plots show that accuracy gradually rises with the number of iterations while the loss falls.
4.5.6 Model Evaluation
# Load the best model
runner.load_model('best_model.pdparams')
# Evaluate the model
score, loss = runner.evaluate(test_loader)
print("[Test] accuracy/loss: {:.4f}/{:.4f}".format(score, loss))

[Test] accuracy/loss: 1.0000/1.0183
4.5.7 Model Prediction
# Fetch one batch from the test set and take its first sample
X, label = next(iter(test_loader))
logits = runner.predict(X)

pred_class = torch.argmax(logits[0]).numpy()
label = label[0].numpy()

# Print the true and predicted classes
print("The true category is {} and the predicted category is {}".format(label, pred_class))

The true category is 2 and the predicted category is 2
Note: next(test_loader) raises the error 'DataLoader' object is not an iterator.
Changing it to next(iter(test_loader)) fixes this.
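In short, a DataLoader is an iterable, not an iterator, so iter() has to create the iterator first. A minimal illustration:

loader_iter = iter(test_loader)  # create an iterator from the DataLoader
X, y = next(loader_iter)         # fetch one batch
# next(test_loader)              # TypeError: 'DataLoader' object is not an iterator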
Discussion Questions
1. Compare Softmax classification with feedforward neural network classification.
The Softmax classification code for Iris:
from sklearn.datasets import load_iris
import pandas
import numpy as np
import op
import copy
import torch

def softmax(X):
    """
    Args:
        - X: shape=[N, C], N is the number of vectors, C is the vector dimension
    """
    x_max = torch.max(X, dim=1, keepdim=True)[0]  # [N, 1]
    x_exp = torch.exp(X - x_max)
    partition = torch.sum(x_exp, dim=1, keepdim=True)  # [N, 1]
    return x_exp / partition

class model_SR(op.Op):
    def __init__(self, input_dim, output_dim):
        super(model_SR, self).__init__()
        self.params = {}
        # Initialize all weights of the linear layer to zero
        self.params['W'] = torch.zeros(size=[input_dim, output_dim])
        # self.params['W'] = torch.normal(mean=0, std=0.01, shape=[input_dim, output_dim])
        # Initialize the bias of the linear layer to zero
        self.params['b'] = torch.zeros(size=[output_dim])
        # Parameter gradients
        self.grads = {}
        self.X = None
        self.outputs = None
        self.output_dim = output_dim

    def __call__(self, inputs):
        return self.forward(inputs)

    def forward(self, inputs):
        self.X = inputs
        # Linear transform
        score = torch.matmul(self.X, self.params['W']) + self.params['b']
        # Softmax
        self.outputs = softmax(score)
        return self.outputs

    def backward(self, labels):
        """
        Args:
            - labels: ground-truth labels, shape=[N, 1], N is the number of samples
        """
        # Compute the partial derivatives
        N = labels.shape[0]
        labels = torch.nn.functional.one_hot(labels, self.output_dim)
        self.grads['W'] = -1 / N * torch.matmul(self.X.t(), (labels - self.outputs))
        self.grads['b'] = -1 / N * torch.matmul(torch.ones(size=[N]), (labels - self.outputs))

class RunnerV2(object):
    def __init__(self, model, optimizer, metric, loss_fn):
        self.model = model
        self.optimizer = optimizer
        self.loss_fn = loss_fn
        self.metric = metric
        # Evaluation metric history during training
        self.train_scores = []
        self.dev_scores = []
        # Loss history during training
        self.train_loss = []
        self.dev_loss = []

    def train(self, train_set, dev_set, **kwargs):
        # Number of training epochs; defaults to 0 if not given
        num_epochs = kwargs.get("num_epochs", 0)
        # Logging frequency; defaults to 100 if not given
        log_epochs = kwargs.get("log_epochs", 100)
        # Model save path; defaults to "best_model.pdparams"
        save_path = kwargs.get("save_path", "best_model.pdparams")
        # Gradient printing function; defaults to None
        print_grads = kwargs.get("print_grads", None)
        # Best metric seen so far
        best_score = 0
        # Train for num_epochs epochs
        for epoch in range(num_epochs):
            X, y = train_set
            # Model prediction
            logits = self.model(X)
            # Cross-entropy loss
            trn_loss = self.loss_fn(logits, y).item()
            self.train_loss.append(trn_loss)
            # Evaluation metric
            trn_score = self.metric(logits, y).item()
            self.train_scores.append(trn_score)
            # Parameter gradients
            self.model.backward(y)
            if print_grads is not None:
                # Print the gradient of every layer
                print_grads(self.model)
            # Update the model parameters
            self.optimizer.step()
            dev_score, dev_loss = self.evaluate(dev_set)
            # Save the model if the current metric is the best so far
            if dev_score > best_score:
                self.save_model(save_path)
                print(f"best accuracy performance has been updated: {best_score:.5f} --> {dev_score:.5f}")
                best_score = dev_score
            if epoch % log_epochs == 0:
                print(f"[Train] epoch: {epoch}, loss: {trn_loss}, score: {trn_score}")
                print(f"[Dev] epoch: {epoch}, loss: {dev_loss}, score: {dev_score}")

    def evaluate(self, data_set):
        X, y = data_set
        # Model output
        logits = self.model(X)
        # Loss
        loss = self.loss_fn(logits, y).item()
        self.dev_loss.append(loss)
        # Evaluation metric
        score = self.metric(logits, y).item()
        self.dev_scores.append(score)
        return score, loss

    def predict(self, X):
        return self.model(X)

    def save_model(self, save_path):
        torch.save(self.model.params, save_path)

    def load_model(self, model_path):
        self.model.params = torch.load(model_path)

class MultiCrossEntropyLoss(op.Op):
    def __init__(self):
        self.predicts = None
        self.labels = None
        self.num = None

    def __call__(self, predicts, labels):
        return self.forward(predicts, labels)

    def forward(self, predicts, labels):
        """
        Args:
            - predicts: predictions, shape=[N, 1], N is the number of samples
            - labels: ground-truth labels, shape=[N, 1]
        Returns:
            - loss value, shape=[1]
        """
        self.predicts = predicts
        self.labels = labels
        self.num = self.predicts.shape[0]
        loss = 0
        for i in range(0, self.num):
            index = self.labels[i]
            loss -= torch.log(self.predicts[i][index])
        return loss / self.num

from abc import abstractmethod

class Optimizer(object):
    def __init__(self, init_lr, model):
        """
        Optimizer base class
        """
        # Learning rate used in the parameter updates
        self.init_lr = init_lr
        # The model whose parameters the optimizer updates
        self.model = model

    @abstractmethod
    def step(self):
        """
        How the parameters are updated at each iteration
        """
        pass

class SimpleBatchGD(Optimizer):
    def __init__(self, init_lr, model):
        super(SimpleBatchGD, self).__init__(init_lr=init_lr, model=model)

    def step(self):
        # Parameter update: iterate over all parameters and update them
        # according to equations (3.8) and (3.9)
        if isinstance(self.model.params, dict):
            for key in self.model.params.keys():
                self.model.params[key] = self.model.params[key] - self.init_lr * self.model.grads[key]

def accuracy(preds, labels):
    """
    Args:
        - preds: predictions; binary: shape=[N, 1], N is the number of samples;
          multi-class: shape=[N, C], C is the number of classes
        - labels: ground-truth labels, shape=[N, 1]
    Returns:
        - accuracy, shape=[1]
    """
    # preds.shape[1] == 1 means binary classification; > 1 means multi-class
    if preds.shape[1] == 1:
        # Binary: predict class 1 when the probability is >= 0.5, otherwise class 0
        preds = (preds >= 0.5).type(torch.float32)
    else:
        # Multi-class: take the index of the largest score as the class via 'torch.argmax'
        preds = torch.argmax(preds, dim=1)
    return torch.mean(torch.eq(preds, labels).type(torch.float32))

import matplotlib.pyplot as plt

def plot(runner, fig_name):
    plt.figure(figsize=(10, 5))

    plt.subplot(1, 2, 1)
    epochs = [i for i in range(len(runner.train_scores))]
    # Training loss curve
    plt.plot(epochs, runner.train_loss, color='#e4007f', label="Train loss")
    # Dev loss curve
    plt.plot(epochs, runner.dev_loss, color='#f19ec2', linestyle='--', label="Dev loss")
    # Axes and legend
    plt.ylabel("loss", fontsize='large')
    plt.xlabel("epoch", fontsize='large')
    plt.legend(loc='upper right', fontsize='x-large')

    plt.subplot(1, 2, 2)
    # Training accuracy curve
    plt.plot(epochs, runner.train_scores, color='#e4007f', label="Train accuracy")
    # Dev accuracy curve
    plt.plot(epochs, runner.dev_scores, color='#f19ec2', linestyle='--', label="Dev accuracy")
    # Axes and legend
    plt.ylabel("score", fontsize='large')
    plt.xlabel("epoch", fontsize='large')
    plt.legend(loc='lower right', fontsize='x-large')

    plt.tight_layout()
    plt.savefig(fig_name)
    plt.show()

def load_data(shuffle=True):
    """
    Load the iris data
    Args:
        - shuffle: whether to shuffle the data, bool
    Returns:
        - X: features, shape=[150, 4]
        - y: labels, shape=[150]
    """
    # Load the raw data
    X = np.array(load_iris().data, dtype=np.float32)
    y = np.array(load_iris().target, dtype=np.int64)
    X = torch.tensor(X)
    y = torch.tensor(y)

    # Min-max normalization
    X_min = torch.min(X, dim=0)[0]
    X_max = torch.max(X, dim=0)[0]
    X = (X - X_min) / (X_max - X_min)

    # Shuffle the data if shuffle is True
    if shuffle:
        idx = torch.randperm(X.shape[0])
        X = X[idx]
        y = y[idx]
    return X, y

# Fix the random seed
torch.manual_seed(12)

num_train = 120
num_dev = 15
num_test = 15

X, y = load_data(shuffle=True)
X_train, y_train = X[:num_train], y[:num_train]
X_dev, y_dev = X[num_train:num_train + num_dev], y[num_train:num_train + num_dev]
X_test, y_test = X[num_train + num_dev:], y[num_train + num_dev:]

# Input dimension
input_dim = 4
# Number of classes
output_dim = 3
# Instantiate the model
model = model_SR(input_dim=input_dim, output_dim=output_dim)

# Learning rate
lr = 0.2

# Gradient descent optimizer
optimizer = SimpleBatchGD(init_lr=lr, model=model)
# Cross-entropy loss
loss_fn = MultiCrossEntropyLoss()
# Accuracy metric
metric = accuracy

# Instantiate RunnerV2
runner = RunnerV2(model, optimizer, metric, loss_fn)

# Start training
runner.train([X_train, y_train], [X_dev, y_dev], num_epochs=150, log_epochs=10, save_path="best_model.pdparams")

# Load the best model
runner.load_model('best_model.pdparams')
# Evaluate the model
score, loss = runner.evaluate([X_test, y_test])
print("[Test] score/loss: {:.4f}/{:.4f}".format(score, loss))

# Predict on the test set
logits = runner.predict(X_test)
# Look at the prediction for one sample: take the class with the highest probability
pred = torch.argmax(logits[0]).numpy()
label = y_test[0].numpy()
# Print the true and predicted classes
print("The true category is {} and the predicted category is {}".format(label, pred))

[Train] epoch: 130, loss: 0.572494626045227, score: 0.8416666388511658
[Dev] epoch: 130, loss: 0.5110094547271729, score: 0.800000011920929
[Train] epoch: 140, loss: 0.5590549111366272, score: 0.8500000238418579
[Dev] epoch: 140, loss: 0.49699389934539795, score: 0.800000011920929
[Test] score/loss: 0.8000/0.6527
The true category is 1 and the predicted category is 1
The code for plotting the Softmax decision boundary on Iris is as follows:
# -*- coding:utf-8 -*-
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.patches as mpatches
from sklearn.preprocessing import PolynomialFeatures

def soft_max(X, Y, K, alpha, lamda):
    n = len(X[0])
    w = np.zeros((K, n))
    wnew = np.zeros((K, n))
    for times in range(1000):
        for i in range(len(X)):
            x = X[i]
            for k in range(K):
                y = 0
                if Y[i] == k:
                    y = 1
                p = predict(w, x, k)
                g = (y - p) * x
                wnew[k] = w[k] + (alpha * g + lamda * w[k])
            w = wnew.copy()
    return w

def predict(w, x, k):
    numerator = np.exp(np.dot(w[k], x))
    denominator = sum(np.exp(np.dot(w, x)))
    return numerator / denominator

def model(w, x, K):
    cat = []
    p = [0, 0, 0]
    for i in range(len(x[:, 0])):
        for k in range(K):
            p[k] = predict(w, x[i, :], k)
        cat.append(p.index(max(p)))
    return cat

def extend(a, b):
    return 1.05 * a - 0.05 * b, 1.05 * b - 0.05 * a

data = pd.read_csv('iris.data', header=None)
columns = np.array([u'sepal length', u'sepal width', u'petal length', u'petal width', u'type'])
data.rename(columns=dict(zip(np.arange(5), columns)), inplace=True)
data.drop(columns[:2], axis=1, inplace=True)
data[u'type'] = pd.Categorical(data[u'type']).codes
x = data[columns[2:-1]].values
y = data[columns[-1]].values
poly = PolynomialFeatures(2)
x_p = poly.fit_transform(x)

N, M = 200, 200  # number of sampling points along each axis
x1_min, x1_max = extend(x[:, 0].min(), x[:, 0].max())  # range of column 0
x2_min, x2_max = extend(x[:, 1].min(), x[:, 1].max())  # range of column 1
t1 = np.linspace(x1_min, x1_max, N)
t2 = np.linspace(x2_min, x2_max, M)
x1, x2 = np.meshgrid(t1, t2)

x_show = np.stack((x1.flat, x2.flat), axis=1)  # test points
x_show_p = poly.fit_transform(x_show)

K = 3
w = soft_max(x_p, y, K, 0.0005, 0.0000005)

y_hat = np.array(model(w, x_show_p, K))
y_hat = y_hat.reshape(x1.shape)  # reshape to match the input grid

cm_light = mpl.colors.ListedColormap(['#77E0A0', '#FF8080', '#A0A0FF'])
cm_dark = mpl.colors.ListedColormap(['g', 'r', 'b'])
mpl.rcParams['font.sans-serif'] = u'SimHei'
mpl.rcParams['axes.unicode_minus'] = False
plt.figure(facecolor='w')
plt.pcolormesh(x1, x2, y_hat, cmap=cm_light)  # predicted regions
plt.scatter(x[:, 0], x[:, 1], s=30, c=y, edgecolors='k', cmap=cm_dark)  # samples
x1_label, x2_label = columns[2], columns[3]
plt.xlabel(x1_label, fontsize=14)
plt.ylabel(x2_label, fontsize=14)
plt.xlim(x1_min, x1_max)
plt.ylim(x2_min, x2_max)
plt.grid(b=True, ls=':')
# a = mpl.patches.Wedge(((x1_min+x1_max)/2, (x2_min+x2_max)/2), 1.5, 0, 360, width=0.5, alpha=0.5, color='r')
# plt.gca().add_patch(a)
patchs = [mpatches.Patch(color='#77E0A0', label='Iris-setosa'),
          mpatches.Patch(color='#FF8080', label='Iris-versicolor'),
          mpatches.Patch(color='#A0A0FF', label='Iris-virginica')]
plt.legend(handles=patchs, fancybox=True, framealpha=0.8, loc='lower right')
plt.title(u'Iris softmax regression classification', fontsize=17)
plt.show()

The decision boundary produced by the neural network is as follows:
From the decision boundaries we can see that the partition learned by the neural network is better than that of softmax regression.
2. Vary the number of hidden layers and the number of neurons per hidden layer, and try to find the best hyperparameters for the multi-class task.
With one hidden layer:
2 neurons:
[Train] epoch: 137/150, step: 1100/1200, loss: 0.45784
[Evaluate] dev score: 0.93333, dev loss: 0.38736
[Evaluate] dev score: 0.93333, dev loss: 0.38031
[Evaluate] dev score: 0.93333, dev loss: 0.35182
[Train] Training done!
[Test] accuracy/loss: 1.0000/0.9992
6 neurons:
[Train] epoch: 137/150, step: 1100/1200, loss: 0.28321
[Evaluate] dev score: 0.93333, dev loss: 0.28383
[Evaluate] dev score: 0.93333, dev loss: 0.27171
[Evaluate] dev score: 0.93333, dev loss: 0.25447
[Train] Training done!
[Test] accuracy/loss: 1.0000/1.0183
9 neurons:
[Train] epoch: 137/150, step: 1100/1200, loss: 0.30954
[Evaluate] dev score: 0.86667, dev loss: 0.26464
[Evaluate] dev score: 0.86667, dev loss: 0.25517
[Evaluate] dev score: 0.86667, dev loss: 0.27032
[Train] Training done!
[Test] accuracy/loss: 1.0000/0.6159
We can see that 6 neurons gives the best results on the training and dev sets, yet the worst on the test set: all three settings reach 1.0000 test accuracy, but 6 neurons has the highest test loss.
With two hidden layers:
2 neurons:
[Train] epoch: 137/150, step: 1100/1200, loss: 1.07650
[Evaluate] dev score: 0.20000, dev loss: 1.11814
[Evaluate] dev score: 0.20000, dev loss: 1.10823
[Evaluate] dev score: 0.20000, dev loss: 1.10268
[Train] Training done!
[Test] accuracy/loss: 0.3333/1.0948
6 neurons:
[Train] epoch: 137/150, step: 1100/1200, loss: 1.10329
[Evaluate] dev score: 0.46667, dev loss: 1.08948
[Evaluate] dev score: 0.46667, dev loss: 1.10116
[Evaluate] dev score: 0.46667, dev loss: 1.09433
[Train] Training done!
[Test] accuracy/loss: 0.3333/1.0922
The true category is 2 and the predicted category is 2
9 neurons:
[Train] epoch: 137/150, step: 1100/1200, loss: 1.10266
[Evaluate] dev score: 0.33333, dev loss: 1.10681
[Evaluate] dev score: 0.46667, dev loss: 1.09803
[Evaluate] dev score: 0.20000, dev loss: 1.11512
[Train] Training done!
[Test] accuracy/loss: 0.3333/1.0928
With two hidden layers, training goes badly: the loss barely drops and accuracy stays near chance, so the deeper network fails to converge here. This is underfitting rather than overfitting, likely because the stacked sigmoid layers and the tiny initial weights make the gradients vanish.
On the number of hidden layers (a configurable model, sketched after this list, makes these settings easy to try):
- No hidden layer: can only represent linearly separable functions or decisions.
- One hidden layer: can approximate any continuous mapping from one finite space to another.
- Two hidden layers: with suitable activation functions, can represent an arbitrary decision boundary to arbitrary precision and approximate any smooth mapping to any precision.
- More than two hidden layers: the extra layers can learn complex representations (a kind of automatic feature engineering).
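A minimal sketch of such a configurable MLP (make_mlp and hidden_sizes are my own illustrative names, not from the lab code):

import torch.nn as nn

# Build an MLP with a configurable stack of hidden layers.
# hidden_sizes is e.g. (6,) for one hidden layer or (9, 9) for two.
def make_mlp(input_size=4, output_size=3, hidden_sizes=(6,)):
    layers, prev = [], input_size
    for h in hidden_sizes:
        layers += [nn.Linear(prev, h), nn.Sigmoid()]  # same activation as above
        prev = h
    layers.append(nn.Linear(prev, output_size))
    return nn.Sequential(*layers)

# e.g. one hidden layer with 9 neurons: make_mlp(hidden_sizes=(9,))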
3. Compare the classification performance of SVM and FNN, and share your thoughts.
The SVM code is as follows:
import math
import random
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['font.sans-serif'] = u'SimHei'

def zhichi_w(zhichi, xy, a):  # compute the updated w
    w = [0, 0]
    if len(zhichi) == 0:  # initial value of 0
        return w
    for i in zhichi:
        w[0] += a[i] * xy[0][i] * xy[2][i]  # update w
        w[1] += a[i] * xy[1][i] * xy[2][i]
    return w

def zhichi_b(zhichi, xy, a):  # compute the updated b
    b = 0
    if len(zhichi) == 0:  # initial value of 0
        return 0
    for s in zhichi:  # every support vector satisfies ys*f(xs) = 1; average over all of them
        sum = 0
        for i in zhichi:
            sum += a[i] * xy[2][i] * (xy[0][i] * xy[0][s] + xy[1][i] * xy[1][s])
        b += 1 / xy[2][s] - sum
    return b / len(zhichi)

def SMO(xy, m):
    a = [0.0] * len(xy[0])  # Lagrange multipliers
    zhichi = set()          # indices of the support vectors
    loop = 1                # loop flag (cleared once KKT is satisfied)
    w = [0, 0]              # initialize w
    b = 0                   # initialize b
    while loop:
        loop += 1
        if loop == 150:
            print("early stopping criterion reached")
            print("number of loops:", loop)
            loop = 0
            break
        # Initialization =========================================
        fx = []   # all f(x) values
        yfx = []  # all y*f(x) - 1 values
        Ek = []   # Ek, i.e. f(x) - y, used for the heuristic search
        E_ = -1   # largest deviation seen so far, to save computation
        a1 = 0    # SMO a1
        a2 = 0    # SMO a2
        # Find a1 and a2 =========================================
        for i in range(len(xy[0])):  # compute all f(x), y*f(x)-1 and Ek
            fx.append(w[0] * xy[0][i] + w[1] * xy[1][i] + b)  # f(x) = w.x + b
            yfx.append(xy[2][i] * fx[i] - 1)                  # y*f(x) - 1
            Ek.append(fx[i] - xy[2][i])                       # f(x) - y
            if i in zhichi:  # skip multipliers already visited, to avoid repeats
                continue
            if yfx[i] <= yfx[a1]:
                a1 = i  # index of the largest violation (smallest value)
        if yfx[a1] >= 0:  # even the smallest value satisfies KKT
            print("number of loops:", loop)
            loop = 0  # loop flag (KKT satisfied)
            break
        for i in range(len(xy[0])):  # find the a2 with the largest gap
            if i == a1:  # skip a1 itself
                continue
            Ei = abs(Ek[i] - Ek[a1])  # |Ek_i - Ek_a1|
            if Ei < E_:  # look for the deviation
                E_ = Ei  # store its value
                a2 = i   # store its index
        zhichi.add(a1)  # record a1 as a support vector
        zhichi.add(a2)  # record a2 as a support vector
        # Constraint analysis ====================================
        # c = a1*y1 + a2*y2
        c = a[a1] * xy[2][a1] + a[a2] * xy[2][a2]
        # n = K11 + K22 - 2*K12
        if m == "xianxinghe":  # linear kernel
            n = xy[0][a1] ** 2 + xy[1][a1] ** 2 + xy[0][a2] ** 2 + xy[1][a2] ** 2 - 2 * (
                    xy[0][a1] * xy[0][a2] + xy[1][a1] * xy[1][a2])
        elif m == "duoxiangshihe":  # polynomial kernel (quadratic here)
            n = (xy[0][a1] ** 2 + xy[1][a1] ** 2) ** 2 + (xy[0][a2] ** 2 + xy[1][a2] ** 2) ** 2 - 2 * (
                    xy[0][a1] * xy[0][a2] + xy[1][a1] * xy[1][a2]) ** 2
        else:  # Gaussian kernel, taking 2*sigma^2 = 1
            n = 2 * math.exp(-1) - 2 * math.exp(-((xy[0][a1] - xy[0][a2]) ** 2 + (xy[1][a1] - xy[1][a2]) ** 2))
        # Feasible interval of a1 ================================
        if xy[2][a1] == xy[2][a2]:
            L = max(0.0, a[a1] + a[a2] - 0.5)  # lower bound
            H = min(0.5, a[a1] + a[a2])        # upper bound
        else:
            L = max(0.0, a[a1] - a[a2])        # lower bound
            H = min(0.5, 0.5 + a[a1] - a[a2])  # upper bound
        if n > 0:
            a1_New = a[a1] - xy[2][a1] * (Ek[a1] - Ek[a2]) / n  # a1_new = a1_old - y1*(e1 - e2)/n
            # Clip to the feasible interval ======================
            if a1_New >= H:
                a1_New = H
            elif a1_New <= L:
                a1_New = L
        else:
            a1_New = min(H, L)
        # Parameter update =======================================
        a[a2] = a[a2] + xy[2][a1] * xy[2][a2] * (a[a1] - a1_New)  # update a2
        a[a1] = a1_New                                            # update a1
        w = zhichi_w(zhichi, xy, a)  # update w
        b = zhichi_b(zhichi, xy, a)  # update b
    # Mark the support vectors ===================================
    for i in zhichi:
        if a[i] == 0:  # selected at some point, but the multiplier is still 0
            loop = loop + 1
            e = 'silver'
        else:
            if xy[2][i] == 1:
                e = 'b'
            else:
                e = 'r'
        plt.scatter(x1[0][i], x1[1][i], c='none', s=100, linewidths=1, edgecolor=e)
    print("number of support vectors:", len(zhichi), "\nsupport vectors with a == 0:", loop)
    print("useful vectors:", len(zhichi) - loop)
    # Return w and b =============================================
    return [w, b]

def panduan(xyz, w_b1, w_b2):  # accuracy (%) of the two separating lines
    c = 0
    for i in range(len(xyz[0])):
        if (xyz[0][i] * w_b1[0][0] + xyz[1][i] * w_b1[0][1] + w_b1[1]) * xyz[2][i][0] < 0:
            c = c + 1
            continue
        if (xyz[0][i] * w_b2[0][0] + xyz[1][i] * w_b2[0][1] + w_b2[1]) * xyz[2][i][1] < 0:
            c = c + 1
            continue
    return (1 - c / len(xyz[0])) * 100

def huitu(x1, x2, wb1, wb2, name):  # plotting
    x = [x1[0][:], x1[1][:], x1[2][:]]
    for i in range(len(x[2])):  # color the training set
        if x[2][i] == [1, 1]:
            x[2][i] = 'r'   # training set, label (1, 1): red
        elif x[2][i] == [-1, 1]:
            x[2][i] = 'g'   # training set, label (-1, 1): green
        else:
            x[2][i] = 'b'   # training set, label (-1, -1): blue
    plt.scatter(x[0], x[1], c=x[2], alpha=0.8)  # plot the training set
    x = [x2[0][:], x2[1][:], x2[2][:]]
    for i in range(len(x[2])):  # color the test set
        if x[2][i] == [1, 1]:
            x[2][i] = 'orange'  # test set, label (1, 1): orange
        elif x[2][i] == [-1, 1]:
            x[2][i] = 'y'       # test set, label (-1, 1): yellow
        else:
            x[2][i] = 'm'       # test set, label (-1, -1): magenta
    plt.scatter(x[0], x[1], c=x[2], alpha=0.8)  # plot the test set
    plt.xlabel('x')  # x-axis label
    plt.ylabel('y')  # y-axis label
    plt.title(name)  # title
    xl = np.arange(min(x[0]), max(x[0]), 0.1)  # first separating line
    yl = (-wb1[0][0] * xl - wb1[1]) / wb1[0][1]
    plt.plot(xl, yl, 'r')
    xl = np.arange(min(x[0]), max(x[0]), 0.1)  # second separating line
    yl = (-wb2[0][0] * xl - wb2[1]) / wb2[0][1]
    plt.plot(xl, yl, 'b')

# Main program =======================================================
f = open('Iris.txt', 'r')  # read the data file
x = [[], [], [], [], []]   # flower attributes (0, 1, 2, 3) and flower species
while 1:
    yihang = f.readline()  # read one line
    if len(yihang) <= 1:   # stop at the end of the file
        break
    fenkai = yihang.split('\t')  # split on \t
    for i in range(4):  # the four attribute values
        x[i].append(eval(fenkai[i]))  # convert to numbers and append to x
    if (eval(fenkai[4]) == 1):  # encode the label as a vector
        x[4].append([1, 1])
    else:
        if (eval(fenkai[4]) == 2):
            x[4].append([-1, 1])
        else:
            x[4].append([-1, -1])
print('dataset =======================================================')
print(len(x[0]))  # dataset size
# Attribute selection ================================================
shuxing1 = eval(input("select the first attribute: "))
if shuxing1 < 0 or shuxing1 > 4:
    print("invalid choice, defaulting to attribute 1")
    shuxing1 = 1
shuxing2 = eval(input("select the second attribute: "))
if shuxing2 < 0 or shuxing2 > 4 or shuxing1 == shuxing2:
    print("invalid choice, defaulting to attribute 2")
    shuxing2 = 2
# Build the training and test sets ===================================
lt = list(range(150))  # an ordered index sequence
random.shuffle(lt)     # shuffle it
x1 = [[], [], []]      # initialize x1 (training set)
x2 = [[], [], []]      # initialize x2 (test set)
for i in lt[0:100]:    # part of the data as the training set
    x1[0].append(x[shuxing1][i])  # attribute x
    x1[1].append(x[shuxing2][i])  # attribute y
    x1[2].append(x[4][i])         # label c
for i in lt[100:150]:  # the rest as the test set
    x2[0].append(x[shuxing1][i])  # attribute x
    x2[1].append(x[shuxing2][i])  # attribute y
    x2[2].append(x[4][i])         # label c
print('\n\nstart training ==============================================')
print('\nlinear kernel ==============================================')
# Compute w and b ====================================================
plt.figure(1)  # first figure
x = [x1[0][:], x1[1][:], []]  # first binary classifier
for i in x1[2]:
    x[2].append(i[0])  # append the labels
wb1 = SMO(x, "SVM")
x = [x1[0][:], x1[1][:], []]  # second binary classifier
for i in x1[2]:
    x[2].append(i[1])  # append the labels
wb2 = SMO(x, "SVM")
print("w1:", wb1[0], " b1:", wb1[1])
print("w2:", wb2[0], " b2:", wb2[1])
# Accuracy ===========================================================
print("accuracy on the training set:", panduan(x1, wb1, wb2), "%")
print("accuracy on the test set:", panduan(x2, wb1, wb2), "%")
# Plot ===============================================================
# circled points were selected at some point; silver ones were selected but updated back to 0
huitu(x1, x2, wb1, wb2, "SVM")

print('\npolynomial kernel ============================================')
# Compute w and b ====================================================
plt.figure(2)  # second figure
x = [x1[0][:], x1[1][:], []]  # first binary classifier
for i in x1[2]:
    x[2].append(i[0])  # append the labels
wb1 = SMO(x, "SVM")
x = [x1[0][:], x1[1][:], []]  # second binary classifier
for i in x1[2]:
    x[2].append(i[1])  # append the labels
wb2 = SMO(x, "SVM")
print("w1:", wb1[0], " b1:", wb1[1])
print("w2:", wb2[0], " b2:", wb2[1])
# Accuracy ===========================================================
print("accuracy on the training set:", panduan(x1, wb1, wb2), "%")
print("accuracy on the test set:", panduan(x2, wb1, wb2), "%")
# Plot ===============================================================
huitu(x1, x2, wb1, wb2, "SVM")

print('\nGaussian kernel ==============================================')
# Compute w and b ====================================================
plt.figure(3)  # third figure
x = [x1[0][:], x1[1][:], []]  # first binary classifier
for i in x1[2]:
    x[2].append(i[0])  # append the labels
wb1 = SMO(x, "SVM")
x = [x1[0][:], x1[1][:], []]  # second binary classifier
for i in x1[2]:
    x[2].append(i[1])  # append the labels
wb2 = SMO(x, "SVM")
print("w1:", wb1[0], " b1:", wb1[1])
print("w2:", wb2[0], " b2:", wb2[1])
# Accuracy ===========================================================
print("accuracy on the training set:", panduan(x1, wb1, wb2), "%")
print("accuracy on the test set:", panduan(x2, wb1, wb2), "%")
# Plot ===============================================================
huitu(x1, x2, wb1, wb2, "SVM")

# Show all figures
plt.show()

dataset =======================================================
150
select the first attribute: 1
select the second attribute: 2
start training ==============================================
number of loops: 3
number of support vectors: 2
support vectors with a == 0: 0
useful vectors: 2
number of loops: 55
number of support vectors: 54
support vectors with a == 0: 44
useful vectors: 10
w1: [0.6000000000000001, -1.2]  b1: 1.38
w2: [-0.34999999999999987, -1.0]  b2: 5.924444444444445
accuracy on the training set: 92.0 %
accuracy on the test set: 94.0 %
The two are somewhat similar in form but differ greatly in substance.
In short, a neural network is a "black box": it optimizes empirical risk, can get stuck in local optima, its training results are not very stable, and it generally needs large samples.
A support vector machine, by contrast, has a rigorous theoretical and mathematical foundation based on structural risk minimization; its optimization problem has a global optimum, its generalization can be stronger, and its theory explicitly targets small-sample statistics.
Both remain very popular machine learning methods; on small, low-dimensional datasets such as Iris, the SVM is often the stronger choice, while neural networks pull ahead as data and compute grow.
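For a quick, reproducible comparison, a baseline SVM can also be run on the same data with scikit-learn (a sketch I add here for reference; the split and hyperparameters are illustrative, not tuned):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=12)

# RBF-kernel SVM with default regularization
clf = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))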
4. Design a suitable feedforward neural network for the MNIST handwritten-digit dataset and reach at least 95% accuracy.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
import torch.optim as optim
from scipy.io import loadmat
from torch.autograd import Variable
import torch.nn.functional as F
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import accuracy_score, roc_auc_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
import time
import warnings
warnings.filterwarnings("ignore")

def LoadMatFile(dataset='mnist'):
    if dataset == 'usps':
        X = loadmat('usps_train.mat')
        X = X['usps_train']
        y = loadmat('usps_train_labels.mat')
        y = y['usps_train_labels']
    else:
        X = loadmat('mnist_train.mat')
        X = X['mnist_train']
        y = loadmat('mnist_train_labels.mat')
        y = y['mnist_train_labels']
    return X, y

loss = nn.CrossEntropyLoss()

class Recognition(nn.Module):
    def __init__(self, dim_input, dim_output, depth=3):
        super(Recognition, self).__init__()
        self.depth = depth
        self.linear = nn.Linear(dim_input, dim_input)
        self.final = nn.Linear(dim_input, dim_output)
        self.relu = nn.ReLU()

    def net(self):
        # Stack depth-1 (shared) linear+ReLU blocks followed by the output layer
        nets = nn.ModuleList()
        for i in range(self.depth - 1):
            nets.append(self.linear)
            nets.append(self.relu)
        nets.append(self.final)
        return nets

    def forward(self, X):
        nets = self.net()
        for n in nets:
            X = n(X)
        y_pred = torch.sigmoid(X)
        return y_pred

def props_to_onehot(props):
    if isinstance(props, list):
        props = np.array(props)
    a = np.argmax(props, axis=1)
    b = np.zeros((len(a), props.shape[1]))
    b[np.arange(len(a)), a] = 1
    return b

def process_labels(y):
    # Map raw label values to contiguous class indices
    labels = dict()
    for i in y:
        if i[0] in labels:
            continue
        else:
            labels[i[0]] = len(labels)
    Y = []
    for i in y:
        Y.append([labels[i[0]]])
    return np.array(Y), len(labels)

def eval(y_hat, y):
    y_hat = y_hat.detach().numpy()
    encoder = OneHotEncoder(categories='auto')
    y = encoder.fit_transform(y)
    y = y.toarray()
    roc = roc_auc_score(y, y_hat, average='micro')
    y_hat = props_to_onehot(y_hat)
    acc = accuracy_score(y, y_hat)
    precision = precision_score(y, y_hat, average='macro')
    recall = recall_score(y, y_hat, average='macro')
    return acc, precision, roc, recall

if __name__ == "__main__":
    data_name = 'usps'  # usps or mnist
    depth = 2
    epoch = 1000
    lr = 0.01
    batch_size = 32
    test_size = 0.2  # train : test = 8 : 2

    X, y = LoadMatFile(data_name)
    y, num_output = process_labels(y)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size)
    X_train = torch.FloatTensor(X_train)
    X_test = torch.FloatTensor(X_test)
    y_train = torch.LongTensor(y_train)
    y_test = torch.LongTensor(y_test)
    dataset = TensorDataset(X_train, y_train)
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    net = Recognition(X_train.shape[1], num_output, depth=depth)
    optimzer = optim.SGD(net.parameters(), lr=lr)

    loss_history = []
    epoch_his = []
    acc_history = []
    roc_history = []
    recall_history = []
    precision_history = []
    start = time.time()
    for i in range(epoch):
        epoch_his.append(i)
        print('epoch ', i)
        net.train()
        for X, y in dataloader:
            X = Variable(X)
            y_pred = net(X)
            l = loss(y_pred, y.squeeze()).sum()
            l.backward()
            optimzer.step()
            optimzer.zero_grad()
        loss_history.append(l.item())
        net.eval()
        y_hat = net(X_test)
        acc, p, roc, recall = eval(y_hat, y_test)
        acc_history.append(acc)
        recall_history.append(recall)
        roc_history.append(roc)
        precision_history.append(p)
        print('loss:{}, acc:{}, precision:{}, roc:{}, recall:{}'.format(l, acc, p, roc, recall))
    elapsed = (time.time() - start)
    print('total time: {}'.format(elapsed))

    plt.plot(np.array(epoch_his), np.array(loss_history), label='loss')
    plt.legend()
    plt.savefig('loss_{}_depth{}_lr{}_epoch{}_batch{}.png'.format(data_name, depth, lr, epoch, batch_size))
    plt.show()
    plt.plot(np.array(epoch_his), np.array(acc_history), label='acc')
    plt.plot(np.array(epoch_his), np.array(precision_history), label='precision')
    plt.plot(np.array(epoch_his), np.array(roc_history), label='roc_auc')
    plt.plot(np.array(epoch_his), np.array(recall_history), label='recall')
    plt.legend()
    plt.savefig('metrics_{}_depth{}_lr{}_epoch{}_batch{}.png'.format(data_name, depth, lr, epoch, batch_size))
    plt.show()

epoch  997
loss:1.4835679531097412, acc:0.9483870967741935, precision:0.9453482437669699, roc:0.9966766870929201, recall:0.9418365760280014
epoch  998
loss:1.4650973081588745, acc:0.9494623655913978, precision:0.9463293024467889, roc:0.9966754024228877, recall:0.9432651474565729
epoch  999
loss:1.4738668203353882, acc:0.9483870967741935, precision:0.9453482437669699, roc:0.9966756593568943, recall:0.9418365760280014
After 1000 epochs of training, the recognition accuracy reaches about 95%.
Summary
This lab deepened my understanding of neural networks and also served as a review of last semester's SVM material.
The advantages of neural networks only show when data and compute are plentiful; with little data, their performance on many tasks is mediocre.
The SVM is a non-parametric method with a strong theoretical and statistical foundation. Its loss has a global optimum, it converges quickly when the data is small, and even the hyperparameters that do need tuning have concrete interpretations; for instance, the width of a Gaussian kernel can be understood as the median distance between data points. Before neural networks became widespread, SVMs led mainstream machine learning, an era in which theory and experiment mattered equally.
Mind Map
References
How to determine the number of layers and hidden-layer neurons in a neural network - Zhihu
Machine learning: neural networks and support vector machines - WihauShe's blog, CSDN