Local Interpretable Model-agnostic Explanations – LIME in Python
When working with classification and/or regression techniques, it's always good to be able to 'explain' what your model is doing. With Local Interpretable Model-agnostic Explanations (LIME), you can quickly provide visual explanations of your model(s).
當(dāng)使用分類和/或回歸技術(shù)時,總是能夠“解釋”模型的作用總是很不錯的。 使用本地可解釋模型不可知的解釋(LIME),您現(xiàn)在可以快速提供模型的視覺解釋。
It's quite easy to throw numbers or content into an algorithm and get a result that looks good. We can test for accuracy and feel confident that the classifier and/or model is 'good'…but can we explain to other users what the model is actually doing? A good data scientist spends some of their time making sure they have reasonable explanations for what the model is doing and why the results are what they are.
將數(shù)字或內(nèi)容放入算法中并獲得看起來不錯的結(jié)果非常容易。 我們可以測試準(zhǔn)確性,并對分類器和/或模型“良好”充滿信心……但是我們可以描述該模型對其他用戶的實際作用嗎? 一位優(yōu)秀的數(shù)據(jù)科學(xué)家會花費一些時間來確保他們對模型的工作方式以及結(jié)果為何才是合理的做出合理的解釋。
There's always been a focus on 'trust' in any type of modeling methodology, but with machine learning and deep learning, many people feel the black-box approach taken by these methods isn't as trustworthy as other methods. This topic was addressed in a paper titled "Why Should I Trust You?": Explaining the Predictions of Any Classifier, which proposes the concept of Local Interpretable Model-agnostic Explanations (LIME). According to the paper, LIME is 'an algorithm that can explain the predictions of any classifier or regressor in a faithful way, by approximating it locally with an interpretable model.'
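The core idea is easy to picture: perturb the instance you want to explain, get the black-box model's predictions on those perturbations, weight them by how close each one is to the original instance, and fit a small interpretable model (e.g. a linear one) to that neighbourhood. Here's a rough, hypothetical sketch of that idea (not the library's implementation, just an illustration):

import numpy as np
from sklearn.linear_model import Ridge

def local_linear_explanation(predict_fn, instance, num_samples=5000, scale=0.5):
    # Perturb the instance of interest with Gaussian noise
    perturbed = instance + np.random.normal(0.0, scale, size=(num_samples, instance.shape[0]))
    # Weight each perturbed sample by its proximity to the original instance
    distances = np.linalg.norm(perturbed - instance, axis=1)
    weights = np.exp(-(distances ** 2) / (2 * scale ** 2))
    # Fit a simple, interpretable model to the black-box predictions in that neighbourhood
    simple_model = Ridge(alpha=1.0)
    simple_model.fit(perturbed, predict_fn(perturbed), sample_weight=weights)
    # The coefficients act as a local, per-feature explanation
    return simple_model.coef_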
I've used the LIME approach a few times in recent projects and really like the idea. It breaks down the modeling / classification techniques and output into a form that can be easily described to non-technical people. That said, LIME isn't a replacement for doing your job as a data scientist, but it is another tool to add to your toolbox.
To implement LIME in Python, I use the LIME library written / released by one of the authors of the above paper.
為了在python中實現(xiàn)LIME,我使用了由上述作者之一編寫/發(fā)布的LIME庫 。
I thought it might be good to provide a quick run-through of how to use this library. For this post, I'm going to mimic the "Using lime for regression" notebook the authors provide, but I'll add a little more explanation.
The full notebook is available in my repo here.
Getting started with Local Interpretable Model-agnostic Explanations (LIME)
Before you get started, you’ll need to install Lime.
pip install lime

Next, let's import our required libraries.
from sklearn.datasets import load_boston
import sklearn.ensemble
import numpy as np
from sklearn.model_selection import train_test_split
import lime
import lime.lime_tabular

Let's load the sklearn dataset called 'boston'. This dataset contains house prices and is often used for machine learning regression examples.
讓我們加載名為“波士頓”的sklearn數(shù)據(jù)集。 該數(shù)據(jù)是包含房價的數(shù)據(jù)集,通常用于機器學(xué)習(xí)回歸示例。
boston = load_boston()

Before we do much else, let's take a look at the description of the dataset to get familiar with it. You can do this by running the following command:
print(boston['DESCR'])

The output is:
輸出為:
Boston House Prices dataset
===========================

Notes
------
Data Set Characteristics:

    :Number of Instances: 506
    :Number of Attributes: 13 numeric/categorical predictive
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's

    :Missing Attribute Values: None

    :Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset.
http://archive.ics.uci.edu/ml/datasets/Housing

This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression diagnostics ...', Wiley, 1980. N.B. Various transformations are used in the table on pages 244-261 of the latter.

The Boston house-price data has been used in many machine learning papers that address regression problems.

**References**

    - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
    - Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
    - many more! (see http://archive.ics.uci.edu/ml/datasets/Housing)

Now that we have our data loaded, we want to build a regression model to forecast Boston housing prices. We'll use a random forest to follow the authors' example.
現(xiàn)在,我們已經(jīng)加載了數(shù)據(jù),我們想建立一個回歸模型來預(yù)測波士頓的房價。 我們將使用隨機森林來遵循作者的示例。
First, we'll set up the RF model and then create our training and test data using the train_test_split module from sklearn. Then, we'll fit the data.
rf = sklearn.ensemble.RandomForestRegressor(n_estimators=1000)
train, test, labels_train, labels_test = train_test_split(boston.data, boston.target, train_size=0.80)
rf.fit(train, labels_train)

Now that we have a Random Forest regressor trained, we can check some of the accuracy measures.
現(xiàn)在我們已經(jīng)訓(xùn)練了隨機森林回歸器,我們可以檢查一些準(zhǔn)確性度量。
print('Random Forest MSError', np.mean((rf.predict(test) - labels_test) ** 2))

The MSError is: 10.45. Now, let's look at the MSError when predicting the mean.
print('MSError when predicting the mean', np.mean((labels_train.mean() - labels_test) ** 2))

From this, we get 80.09.
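As a quick aside, you can get the same numbers from sklearn's metrics module if you prefer (the exact values will vary from run to run because of the random train/test split and the random forest itself):

from sklearn.metrics import mean_squared_error

print('Random Forest MSError', mean_squared_error(labels_test, rf.predict(test)))
print('MSError when predicting the mean',
      mean_squared_error(labels_test, np.full_like(labels_test, labels_train.mean())))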
Without really knowing the dataset, it's hard to say whether these errors are good or bad. Since we are most interested in the LIME approach, we'll move along and assume these are decent errors.
To implement LIME, we need to get the categorical features from our data and then build an ‘explainer’. This is done with the following commands:
為了實現(xiàn)LIME,我們需要從數(shù)據(jù)中獲取分類特征,然后構(gòu)建一個“解釋器”。 這可以通過以下命令完成:
categorical_features = np.argwhere(
    np.array([len(set(boston.data[:, x])) for x in range(boston.data.shape[1])]) <= 10).flatten()
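If you're curious which columns that line actually flags, a quick look at the distinct-value counts will tell you; on the Boston data, CHAS and RAD should be the only features with ten or fewer distinct values:

# How many distinct values does each feature have?
for idx, name in enumerate(boston.feature_names):
    n_unique = len(set(boston.data[:, idx]))
    marker = ' <- treated as categorical' if n_unique <= 10 else ''
    print('{}: {}{}'.format(name, n_unique, marker))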
And the explainer:

explainer = lime.lime_tabular.LimeTabularExplainer(train,
    feature_names=boston.feature_names,
    class_names=['price'],
    categorical_features=categorical_features,
    verbose=True,
    mode='regression')

Now we can grab one of our test values and check out our prediction(s). Here, we'll grab the 100th test value, check the prediction, and see what the explainer has to say about it.
i = 100
exp = explainer.explain_instance(test[i], rf.predict, num_features=5)
exp.show_in_notebook(show_table=True)

[Figure: LIME explainer output for regression]

So… what does this tell us?
It tells us that the 100th test value's prediction is 21.16, with the "RAD=24" value providing the largest positive contribution and the other features contributing negatively to the prediction.
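If you're working outside of a notebook, you can also pull the explanation out programmatically; as_list() gives you the (feature rule, weight) pairs behind the chart:

# Inspect the explanation as plain (feature rule, weight) pairs
for feature, weight in exp.as_list():
    print(feature, weight)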
For regression, this isn’t quite as interesting (although it is useful). The LIME approach shows much more benefit (at least to me) when performing classification.
對于回歸,這不是那么有趣(盡管很有用)。 LIME方法在執(zhí)行分類時顯示出更多的好處(至少對我而言)。
As an example, if you are trying to classify plants as edible or poisonous, LIME's explanation is much more useful. A minimal classification sketch follows, and just below it is an example figure from the authors.
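The authors' mushroom data isn't part of this walkthrough, but the same pattern works for any classifier; here's a minimal sketch using sklearn's iris data, reusing the imports from earlier (the dataset, model, and parameters here are just illustrative):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
iris_train, iris_test, iris_labels_train, iris_labels_test = train_test_split(iris.data, iris.target, train_size=0.80)

clf = RandomForestClassifier(n_estimators=500)
clf.fit(iris_train, iris_labels_train)

clf_explainer = lime.lime_tabular.LimeTabularExplainer(iris_train,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    mode='classification')

# For classification, explain_instance needs class probabilities
clf_exp = clf_explainer.explain_instance(iris_test[0], clf.predict_proba, num_features=4)
clf_exp.show_in_notebook(show_table=True)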
[Figure: LIME explanation of edible vs. poisonous (from the LIME authors)]

Take a look at LIME when you have some time. It's a good library to add to your toolkit, especially if you are doing a lot of classification work. It makes it much easier to 'explain' what the model is doing.
Eric D. Brown, D.Sc. has a doctorate in Information Systems with a specialization in Data Sciences, Decision Support and Knowledge Management. He writes about using Python for data analytics at pythondata.com and about the crossroads of technology and strategy at ericbrown.com.
Source: https://www.pybloggers.com/2018/01/local-interpretable-model-agnostic-explanations-lime-in-python/