Hung-yi Lee's Interpretable Models: Three Tasks

Published: 2023/12/20 · 豆豆

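1. Problem

I watched the lecture on explainable machine learning from Professor Hung-yi Lee's advanced machine learning course. The lecture gives a brief introduction to techniques for turning black-box models into white-box ones; for details, see the book "Interpretable Machine Learning". Linear models and tree models are interpretable ML models, whereas deep learning has long been called a "black box": it is an end-to-end model, meaning the internal computation is ignored and only the input and output are considered. Many people therefore want to know what a deep learning model has learned after training, whether it has really learned the object contours and textures that humans imagine, and how it makes its decisions. These questions can be approached in several ways, for example by inspecting intermediate results or by converting the model into an interpretable ML model.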

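Interpretability (non-mathematical definition): interpretability is the degree to which a human can understand the cause of a decision.

Interpretable machine learning covers "the extraction of relevant knowledge from a machine learning model concerning relationships either contained in the data or learned by the model".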

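1.1 Classification of interpretable models

Interpretability methods for machine learning can be classified according to several criteria.

(1) Intrinsic or post hoc?

This criterion distinguishes whether interpretability is achieved by restricting the complexity of the machine learning model (intrinsic) or by applying methods that analyze the model after training (post hoc). Intrinsic refers to models that are considered interpretable because of their simple structure, such as linear models and tree models. Post hoc refers to applying an interpretation method after training, such as permutation feature importance.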
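Permutation feature importance is a convenient illustration of a post hoc method: it only needs predictions, not the model internals. The following is a minimal sketch using the standard library only; `toy_model`, the random data, and the helper names are assumptions made for illustration and do not come from the course.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model predicts the right label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Average accuracy drop when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical black box that only looks at feature 0."""
    return int(row[0] > 0.5)


data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_model(row) for row in X]

importances = permutation_importance(toy_model, X, y)
print(importances)  # feature 0 gets a large drop, feature 1 gets exactly 0.0
```

Because the toy model ignores feature 1, shuffling that column cannot change any prediction, so its importance is exactly zero.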

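(2) Model-specific or model-agnostic?

Model-specific interpretation tools are limited to particular model classes. Interpreting the regression weights of a linear model is a model-specific interpretation, because by definition the interpretation of an intrinsically interpretable model is always model-specific. Tools that only work for, say, neural networks are also model-specific. Model-agnostic tools can be used on any machine learning model and are applied after the model has been trained (post hoc). These methods usually work by analyzing pairs of feature inputs and outputs. By definition, they cannot access model internals such as weights or structural information.

(3) Local or global?

Whether the method is local or global, that is, whether it explains a single prediction or the behavior of the whole model.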

2. Tasks

3. Analysis

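3.1 Task 1: Computing gradients

A neural network computation is divided into a forward part and a backward part. Forward propagation runs from the input to the output, producing the output through the hidden nodes of each layer. Backward propagation defines a loss function and propagates the loss information backward to compute gradients, which are then used to optimize the network parameters.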

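3.1.1 Prerequisites

(1) stop_gradient

Indicates whether a Tensor computes and propagates gradients. If stop_gradient is True, the Tensor does not compute a gradient and blocks Autograd's gradient propagation; otherwise, gradients are computed and propagated.

(2) grad

Returns the gradient of a Tensor. The data type is numpy.ndarray in the Paddle version used here (newer releases may return a paddle.Tensor instead).

(3) backward

Calling backward computes the gradients automatically and stores the result in the grad attribute. backward() accumulates gradients; the clear_grad() function clears the gradient of the current Tensor.

(4) How automatic differentiation works

The core of PaddlePaddle (飛槳) neural networks is automatic differentiation (TensorFlow and PyTorch have it as well). PaddlePaddle implements it by tracing: it records the forward OPs as they execute, automatically creates the corresponding backward variables and backward OPs, and then runs them to compute the gradients. The referenced documentation walks through this using a = wx + b as an example.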
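To make the trace-then-backpropagate idea concrete for a = wx + b, here is a minimal pure-Python sketch. The `Var` class is a hypothetical helper written for this illustration; it mimics the stop_gradient, grad, backward and clear_grad behavior described above but is not Paddle's implementation.

```python
class Var:
    """Scalar that records how it was computed (hypothetical helper)."""

    def __init__(self, value, parents=(), stop_gradient=False):
        self.value = value
        self.grad = 0.0
        # Each parent entry is (parent_var, local_derivative), recorded
        # during the forward pass.
        self.parents = parents
        self.stop_gradient = stop_gradient

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def backward(self, upstream=1.0):
        # A variable with stop_gradient=True neither stores nor passes on
        # a gradient.
        if self.stop_gradient:
            return
        self.grad += upstream  # gradients accumulate across calls
        for parent, local in self.parents:
            parent.backward(upstream * local)

    def clear_grad(self):
        self.grad = 0.0


w = Var(3.0)
x = Var(2.0, stop_gradient=True)
b = Var(1.0)

a = w * x + b           # forward pass: a = 3 * 2 + 1
a.backward()            # da/dw = x = 2, da/db = 1
first_w_grad = w.grad   # 2.0

a2 = w * x + b
a2.backward()           # second call adds to the stored gradients
accumulated_w_grad = w.grad  # 4.0

w.clear_grad()          # reset before the next step
print(a.value, first_w_grad, accumulated_w_grad, w.grad, x.grad)
```

The second backward call doubles the stored gradient of w, which is why a training loop has to clear gradients between steps.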

3.1.2 Implementation

(1) Data processing

    import os

    import cv2
    import numpy as np
    import paddle
    from paddle.io import DataLoader
    from paddlenlp.datasets import MapDataset

    train_path = 'data/food-11/training'
    val_path = 'data/food-11/validation'
    test_path = 'data/food-11/testing'


    # The images come in different sizes, so each one is resized to 512x512
    def data_loader(path):
        filelist = os.listdir(path)
        data = []
        for i in filelist[:1000]:
            img = cv2.imread(path + '/' + i)
            img = cv2.resize(img, (512, 512))
            # The image is read as [H, W, C];
            # transpose it to [C, H, W]
            img = np.transpose(img, (2, 0, 1))
            img = np.array(img, dtype='float32')
            # The label is the number before the first underscore in the file name
            label = int(i.split('_')[0])
            data.append((img, label))
        return data


    train_data = data_loader(train_path)
    train_loader = DataLoader(MapDataset(train_data), return_list=True,
                              shuffle=True, batch_size=5, drop_last=True)

(2) Define the model

    import paddle
    import numpy as np
    import paddle.nn.functional as F
    from paddle.nn import Conv2D, MaxPool2D, Linear


    # Network definition (called LeNet in the original code, although it has six conv stages)
    class LeNet(paddle.nn.Layer):
        def __init__(self, num_classes=1):
            super(LeNet, self).__init__()
            # Six convolution + max-pooling stages; pooling does not change the channel count
            self.conv1 = Conv2D(in_channels=3, out_channels=32, kernel_size=128)
            self.max_pool1 = MaxPool2D(kernel_size=4, stride=2)
            self.conv2 = Conv2D(in_channels=32, out_channels=64, kernel_size=128)
            self.max_pool2 = MaxPool2D(kernel_size=4, stride=2)
            self.conv3 = Conv2D(in_channels=64, out_channels=128, kernel_size=64)
            self.max_pool3 = MaxPool2D(kernel_size=4, stride=2)
            self.conv4 = Conv2D(in_channels=128, out_channels=256, kernel_size=32)
            self.max_pool4 = MaxPool2D(kernel_size=4, stride=2)
            self.conv5 = Conv2D(in_channels=256, out_channels=512, kernel_size=16)
            self.max_pool5 = MaxPool2D(kernel_size=4, stride=2)
            self.conv6 = Conv2D(in_channels=512, out_channels=1024, kernel_size=8)
            self.max_pool6 = MaxPool2D(kernel_size=4, stride=2)
            # The feature map is flattened [B, C, H, W] -> [B, C*H*W] before fc1,
            # so in_features=1024 assumes a 1x1 map with 1024 channels after max_pool6
            self.fc1 = Linear(in_features=1024, out_features=64)
            # fc1 has 64 output units; fc2 outputs one value per class
            self.fc2 = Linear(in_features=64, out_features=num_classes)

        # Forward computation of the network
        def forward(self, x):
            x = self.conv1(x)
            # Sigmoid is applied before and after the first pooling layer only;
            # the later conv stages have no activation
            x = F.sigmoid(x)
            x = self.max_pool1(x)
            x = F.sigmoid(x)
            x = self.conv2(x)
            x = self.max_pool2(x)
            x = self.conv3(x)
            x = self.max_pool3(x)
            x = self.conv4(x)
            x = self.max_pool4(x)
            x = self.conv5(x)
            x = self.max_pool5(x)
            x = self.conv6(x)
            x = self.max_pool6(x)
            # Flatten [B, C, H, W] -> [B, C*H*W]
            x = paddle.reshape(x, [x.shape[0], -1])
            x = self.fc1(x)
            x = F.sigmoid(x)
            x = self.fc2(x)
            return x


    # NOTE: with 512x512 inputs and no padding, the map reaching conv3 is 31x31,
    # which is smaller than its 64x64 kernel, so the kernel sizes (or the input
    # size) need adjusting before the forward pass can run.
    # Assumption: the original never shows the instantiation; food-11 has 11 classes.
    model = LeNet(num_classes=11)
    model_pre = paddle.Model(model)
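A quick way to sanity-check the layer sizes in the model above is to trace the spatial size through each stage with the usual no-padding formula, out = floor((in + 2*padding - kernel) / stride) + 1. The sketch below uses only the kernel sizes and pooling settings listed in the model; the helper names `conv_out` and `trace_sizes` are made up for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a conv or pooling layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_sizes(input_size, conv_kernels, pool_kernel=4, pool_stride=2):
    """Return (conv_kernel, size_after_conv, size_after_pool) per stage.

    Tracing stops as soon as a feature map is smaller than the next kernel.
    """
    stages = []
    size = input_size
    for kernel in conv_kernels:
        if size < kernel:
            break
        after_conv = conv_out(size, kernel)
        if after_conv < pool_kernel:
            break
        size = conv_out(after_conv, pool_kernel, pool_stride)
        stages.append((kernel, after_conv, size))
    return stages


kernels = [128, 128, 64, 32, 16, 8]

# 512x512 input: only two stages fit, because conv3 would see a 31x31 map.
print(trace_sizes(512, kernels))
# → [(128, 385, 191), (128, 64, 31)]

# A 1536x1536 input passes all six stages and ends in a 1x1 map,
# which matches in_features=1024 of the first fully connected layer.
print(trace_sizes(1536, kernels)[-1])
# → (8, 4, 1)
```

Running this kind of check before building the network makes it easy to pick kernel sizes or an input size that match the in_features of the first fully connected layer.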

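(3) Run the backward pass and save the gradient data

    def normalize(image):
        return (image - image.min()) / (image.max() - image.min())


    def compute_saliency_maps(x, y, model):
        # Track the gradient with respect to the input
        x.stop_gradient = False
        # Forward pass
        y_pred = model(x)
        loss_func = paddle.nn.CrossEntropyLoss()
        loss = loss_func(y_pred, y)
        # Backward pass
        loss.backward()
        # Gradient of the loss with respect to the input; depending on the
        # Paddle version this is a numpy.ndarray or a Tensor
        grad = x.grad
        if not isinstance(grad, np.ndarray):
            grad = grad.numpy()
        saliencies = np.abs(grad)
        # saliencies: (batch, channels, height, width)
        # A saliency map is drawn for every image, and the gradient scale can
        # differ hugely between images: the gradients of one image may lie in
        # 100 ~ 1000 while those of another lie in 0.0001 ~ 0.001.
        # With a shared color scale the first map would be entirely bright and
        # the second entirely dark, which shows nothing useful. What matters is
        # the relative magnitude inside a single saliency map, so each map is
        # normalized separately. There are many ways to do this; the simplest
        # one is used here.
        saliencies = np.stack([normalize(item) for item in saliencies])
        return saliencies


    # All batches are traversed, but only the result of the last one is kept
    for images, labels in train_loader:
        saliencies = compute_saliency_maps(images, labels, model)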
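The effect of per-image normalization can be seen on two tiny "gradient maps" with very different scales. This is a standard-library sketch on plain lists rather than arrays; the guard for a constant map is an addition for this illustration and is not part of the code above.

```python
def normalize(values):
    """Min-max scale one map to [0, 1]; a constant map becomes all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Two hypothetical flattened gradient maps with very different scales.
big = [100.0, 550.0, 1000.0]
small = [0.0001, 0.00055, 0.001]

norm_big = normalize(big)
norm_small = normalize(small)
print(norm_big)
print(norm_small)
```

After normalization both maps span the same range, so the same color scale shows the internal structure of each one.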

(4) Plotting

    import matplotlib.pyplot as plt
    import paddle

    # Draw with matplotlib; batch_size=5, so 5 images per row
    # (top row: input images, bottom row: saliency maps)
    fig, axs = plt.subplots(2, 5, figsize=(15, 8))
    saliencies = paddle.to_tensor(saliencies)
    for row, target in enumerate([images, saliencies]):
        for column, img in enumerate(target):
            img = img.numpy()
            # Only the first channel is displayed
            axs[row][column].imshow(img[0])
    plt.show()
    plt.close()

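3.2 Task 2: Displaying intermediate results

During convolution in a CNN, we assume that different filters learn different image textures or edge features. However, the model runs as a single unit and does not expose intermediate results, which is where the hook mechanism comes in.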

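3.2.1 Prerequisites

(1) Execution modes of a model

A model has two execution modes: call train() if it needs to be trained, or eval() if only forward execution is needed.

(2) Using hooks

A hook is a user-defined function that acts on a variable and is called while the model executes. Hooks registered on a layer come in two kinds: pre_hook and post_hook. A pre_hook can process the input variables of the layer, and its return value is used as the new input to the layer's computation. A post_hook can process the output variables of the layer, and its return value is used as the output of the layer. See the referenced documentation for details.

Steps for using a hook:

1. Define a function that processes the feature, for example hook_fun

2. Register the hook: tell the model which layers should use hook_fun to process the feature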
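The two steps can be sketched without any framework. The `TinyLayer` and `HookHandle` classes below are hypothetical stand-ins written for this illustration; they follow the pre_hook / post_hook behavior described above (a returned value replaces the input or output, and the handle removes the hook), but they simplify the real API, for instance by passing a single value instead of a tuple of inputs.

```python
class HookHandle:
    """Returned on registration; remove() unregisters the hook."""

    def __init__(self, hooks, key):
        self._hooks = hooks
        self._key = key

    def remove(self):
        self._hooks.pop(self._key, None)


class TinyLayer:
    """Hypothetical layer wrapping a plain function, with hook support."""

    def __init__(self, fn):
        self.fn = fn
        self._pre_hooks = {}
        self._post_hooks = {}
        self._next_key = 0

    def _register(self, hooks, hook):
        key = self._next_key
        self._next_key += 1
        hooks[key] = hook
        return HookHandle(hooks, key)

    def register_forward_pre_hook(self, hook):
        return self._register(self._pre_hooks, hook)

    def register_forward_post_hook(self, hook):
        return self._register(self._post_hooks, hook)

    def __call__(self, x):
        for hook in list(self._pre_hooks.values()):
            result = hook(self, x)
            if result is not None:      # a returned value replaces the input
                x = result
        out = self.fn(x)
        for hook in list(self._post_hooks.values()):
            result = hook(self, x, out)
            if result is not None:      # a returned value replaces the output
                out = result
        return out


captured = []


def record(layer, layer_input, output):
    # Step 1: a hook function that only records the intermediate output.
    captured.append(output)


layer = TinyLayer(lambda v: v + 1)
# Step 2: register hooks on the layer of interest.
pre_handle = layer.register_forward_pre_hook(lambda lyr, inp: inp * 2)
post_handle = layer.register_forward_post_hook(record)

y_hooked = layer(3)   # input doubled by the pre-hook: (3 * 2) + 1 = 7
pre_handle.remove()
y_plain = layer(3)    # pre-hook removed: 3 + 1 = 4
print(y_hooked, y_plain, captured)  # 7 4 [7, 4]
```

A recording hook that returns nothing leaves the computation untouched, which is the pattern used later to capture the output of a convolution layer.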

register_forward_pre_hook(pre_hook)

    import paddle
    import numpy as np


    # the forward_pre_hook changes the input of the layer: input = input * 2
    def forward_pre_hook(layer, input):
        # user can use layer and input for information statistics tasks
        # change the input
        input_return = (input[0] * 2)
        return input_return


    linear = paddle.nn.Linear(13, 5)
    # register the hook
    forward_pre_hook_handle = linear.register_forward_pre_hook(forward_pre_hook)
    value0 = np.arange(26).reshape(2, 13).astype("float32")
    in0 = paddle.to_tensor(value0)
    out0 = linear(in0)

    # remove the hook
    forward_pre_hook_handle.remove()
    value1 = value0 * 2
    in1 = paddle.to_tensor(value1)
    out1 = linear(in1)

    # the hook changes the linear's input to input * 2, so out0 is equal to out1
    assert (out0.numpy() == out1.numpy()).all()

register_forward_post_hook(post_hook)

    import paddle
    import numpy as np


    # the forward_post_hook changes the output of the layer: output = output * 2
    def forward_post_hook(layer, input, output):
        # user can use layer, input and output for information statistics tasks
        # change the output
        return output * 2


    linear = paddle.nn.Linear(13, 5)
    # register the hook
    forward_post_hook_handle = linear.register_forward_post_hook(forward_post_hook)
    value1 = np.arange(26).reshape(2, 13).astype("float32")
    in1 = paddle.to_tensor(value1)
    out0 = linear(in1)

    # remove the hook
    forward_post_hook_handle.remove()
    out1 = linear(in1)

    # the hook changes the linear's output to output * 2, so out0 is equal to out1 * 2
    assert (out0.numpy() == (out1.numpy()) * 2).all()

3.2.2 Implementation

The model is the one from Task 1. summary() prints the basic structure and parameter information of the network:

model_pre.summary()

The model consists of 6 convolution + pooling stages and 2 fully connected layers. The hook mechanism is used to output the intermediate results; a global variable can be defined to record them.

    # Hook function: keep the layer output in a global variable
    def hook(layer, input, output):
        global ll
        ll = output


    # Register the hook on the third convolution layer
    hook_ll = model.conv3.register_forward_post_hook(hook)

    # Traverse the data and feed it to the model
    for i in train_loader:
        model(i[0])

    # When the loop finishes, ll holds the intermediate result of the last batch
    # as a 4-D array:
    # dim 1 is the batch_size of train_loader, i.e. the number of images
    # dim 2 is the number of filters
    # dims 3 and 4 are the height and width of the conv layer output
    print(ll[:, 1, :, :])

Plotting:

    import matplotlib.pyplot as plt

    fig, axs = plt.subplots(1, 5, figsize=(15, 6))

    # Show the output of filter 1 for each of the 5 images in the batch
    for i, img in enumerate(ll[:, 1, :, :]):
        img = img.numpy()
        axs[i].imshow(img)
    plt.show()
    plt.close()

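3.3 Task 3: Applying LIME & SHAP

3.3.1 Using the LIME & SHAP tools

(1) LIME

LIME (Local Interpretable Model-Agnostic Explanations) is an algorithm for explaining how a model treats an individual sample, which helps in understanding the model. See the referenced material for the underlying principle.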
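The core idea of LIME can be sketched in a few lines: sample points around the instance, weight them by proximity, and fit a weighted linear model whose coefficients serve as the local explanation. The code below is a simplified standard-library sketch under stated assumptions (Gaussian perturbations, an exponential kernel, plain weighted least squares); the real lime package adds discretization, feature selection and other details. `black_box`, `solve` and `lime_explain` are names made up for this illustration.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def lime_explain(predict, instance, num_samples=500, sigma=0.1,
                 kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate around `instance`.

    Returns (intercept, weights); features are centered on the instance,
    so the intercept approximates the prediction at the instance itself.
    """
    rng = random.Random(seed)
    d = len(instance)
    # Accumulate the normal equations (Z^T W Z) beta = Z^T W y.
    A = [[0.0] * (d + 1) for _ in range(d + 1)]
    b = [0.0] * (d + 1)
    for _ in range(num_samples):
        z = [v + rng.gauss(0.0, sigma) for v in instance]
        offset = [zi - vi for zi, vi in zip(z, instance)]
        dist2 = sum(o * o for o in offset)
        weight = math.exp(-dist2 / kernel_width ** 2)  # closer = heavier
        row = [1.0] + offset
        target = predict(z)
        for i in range(d + 1):
            b[i] += weight * row[i] * target
            for j in range(d + 1):
                A[i][j] += weight * row[i] * row[j]
    coef = solve(A, b)
    return coef[0], coef[1:]


def black_box(z):
    """Hypothetical nonlinear model: local slopes at (1, 2) are about 2 and 1."""
    return z[0] ** 2 + z[1]


intercept, weights = lime_explain(black_box, [1.0, 2.0])
print(intercept, weights)
```

The fitted weights approximate the local slopes of the black box around the chosen instance, which is what makes the explanation local rather than global.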

A LIME example using the iris dataset as the data source:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.ensemble import GradientBoostingClassifier
    from lime.lime_tabular import LimeTabularExplainer

    # Load the data
    data = load_iris()

    feature_names = data.feature_names
    class_names = data.target_names

    x = data.data
    y = data.target

    # Fit a GBDT classifier on the first 140 samples
    gbdt = GradientBoostingClassifier()
    gbdt = gbdt.fit(x[:140], y[:140])
    # Accuracy on the remaining 10 samples
    pred = gbdt.score(x[140:], y[140:])

    # Font setting for displaying Chinese text in the plots
    plt.rc('font', family='SimHei', size=13)

    # Build the explainer
    explainer = LimeTabularExplainer(x, feature_names=feature_names, class_names=class_names)
    # Explain the sample at index 81
    exp = explainer.explain_instance(x[81], gbdt.predict_proba)
    # Plot the explanation
    fig = exp.as_pyplot_figure()
    # Show the analysis view (in a notebook)
    exp.show_in_notebook(show_table=True, show_all=False)

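(2) SHAP

SHAP (SHapley Additive exPlanations) is another interpretability method. The implementation details are not covered here; the shap library provides explainers for many kinds of models.

See the referenced material for details.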
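Although the implementation details are left out here, the quantity SHAP approximates is easy to compute exactly for a handful of features: the Shapley value of a feature is its marginal contribution averaged over all orderings in which features are switched from a baseline to their actual values. The sketch below makes simplifying assumptions that differ from the shap library (a single fixed baseline point instead of a background dataset, brute-force enumeration); `toy_predict` is a hypothetical model.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every feature ordering."""
    d = len(instance)
    phi = [0.0] * d
    orders = list(permutations(range(d)))
    for order in orders:
        current = list(baseline)
        prev = predict(current)
        for j in order:
            current[j] = instance[j]   # switch feature j to its real value
            now = predict(current)
            phi[j] += now - prev       # marginal contribution in this order
            prev = now
    return [p / len(orders) for p in phi]


def toy_predict(x):
    """Hypothetical model with an interaction between features 1 and 2."""
    return 2.0 * x[0] + x[1] * x[2]


instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]

phi = shapley_values(toy_predict, instance, baseline)
base_value = toy_predict(baseline)
print(phi)                    # → [2.0, 3.0, 3.0]
print(base_value + sum(phi))  # → 8.0, the prediction for the instance
```

The interaction term x1 * x2 is split evenly between the two features, and the values add up to the difference between the prediction and the baseline, which is the additivity property the SHAP plots rely on.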

(1) Feature importance

Train an XGBoost model on the iris dataset and rank the features by importance to explain how much each feature influences the model.

    import xgboost as xgb
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    # Note: newer matplotlib releases renamed this style to 'seaborn-v0_8'
    plt.style.use('seaborn')
    from sklearn.datasets import load_iris

    # Load the data
    data = load_iris()

    feature_names = data.feature_names
    class_names = data.target_names

    x = data.data
    y = data.target

    # Train an xgboost regression model on the class labels
    model = xgb.XGBRegressor(max_depth=4, learning_rate=0.05, n_estimators=150)
    model.fit(x, y)

    # Plot the feature importance
    plt.figure(figsize=(15, 5))
    plt.bar(range(len(feature_names)), model.feature_importances_)
    plt.xticks(range(len(feature_names)), feature_names, rotation=-45, fontsize=14)
    plt.title('Feature importance', fontsize=14)

(2) Analysis with shap: computing shap_values

    import shap

    # model is the XGBoost model trained in step (1)
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(x)
    print(shap_values.shape)

Get the SHAP values of a single sample:

    j = 30
    f_explainer = pd.DataFrame()
    f_explainer['feature'] = feature_names
    f_explainer['feature_value'] = x[j]
    f_explainer['shap_value'] = shap_values[j]
    print(f_explainer)

Determine the baseline of the model:

    # The baseline is the mean of the fitted values of the target on the training set.
    # For one sample, the sum of the SHAP values of all features plus the baseline
    # should equal the prediction for that sample.
    y_base = explainer.expected_value
    print(y_base)

Plotting

    shap.initjs()
    shap.force_plot(explainer.expected_value, shap_values[j], x[j])

    # Analyze the features as a whole
    shap.summary_plot(shap_values, x)

    # Analyze the features as a whole, drawn as a bar chart
    shap.summary_plot(shap_values, x, plot_type="bar")

    # Effect of a single feature on its SHAP value
    shap.dependence_plot('Feature 2', shap_values, x, interaction_index=None, show=False)

Analysis across multiple variables

    # Interaction analysis across multiple variables
    shap_interaction_values = shap.TreeExplainer(model).shap_interaction_values(x)
    shap.summary_plot(shap_interaction_values, x, max_display=4)

    # Effect of a variable on the target under the interaction of two variables
    shap.dependence_plot('Feature 2', shap_values, x, interaction_index='Feature 1', show=False)

3.3.2 Implementation

(1) LIME

    from lime.lime_image import LimeImageExplainer

    # Collect the images and labels from the loader
    images = []
    labels = []
    for image, label in train_loader:
        for i in image:
            images.append(i.numpy())
        for j in label:
            labels.append(int(j.numpy().reshape(-1)[0]))


    def predict(images):
        # lime passes [batch_size, height, width, channels];
        # the model expects float32 [batch_size, channels, height, width]
        images = np.transpose(np.array(images, dtype='float32'), (0, 3, 1, 2))
        images = paddle.to_tensor(images)
        output = model(images)
        return output.detach().numpy()


    # Mind the axis order: the model's data format is [batch_size, channels, height, width],
    # while lime uses [batch_size, height, width, channels]
    lime_ex = LimeImageExplainer().explain_instance(
        image=np.transpose(np.array(images), (0, 2, 3, 1))[0],
        classifier_fn=predict,
        labels=(labels[0],),
        top_labels=None)  # explain the given label rather than the top predictions

    lime_img, mask = lime_ex.get_image_and_mask(label=labels[0])

    import matplotlib.pyplot as plt
    # Pixel values are in 0-255 here, so rescale them to 0-1 for display
    plt.imshow(lime_img / 255.0)
    plt.show()

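(2) SHAP

SHAP's DeepExplainer only supports the PyTorch and TensorFlow frameworks, so the model would have to be defined in one of them. Given the hassle of downloading the dataset, this part is skipped.

Reference:

李宏毅課程-機器學習進階-作業1-卷積神經網絡的可解釋性 (Hung-yi Lee course, Advanced Machine Learning, Assignment 1: Interpretability of Convolutional Neural Networks), 飛槳 AI Studio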
