[AI Course Lab] Handwritten Digit Recognition with a Bayes Classifier
Loading and Preprocessing the Data
The file provided by the instructor could not be read directly, so the data was downloaded from the official site instead:
Link: http://www.cs.nyu.edu/~roweis/data.html
After downloading, read the MATLAB file:
    import scipy.io as sio

    data = sio.loadmat(trainfile)
Check its type:
    type(data)
    Out[118]: dict
Copy the train part of data into tr:
    trstr = []
    for i in range(10):
        trstr.append('train' + str(i))
    for i in range(10):
        print(trstr[i])
    tr = dict.fromkeys(trstr)
    for i in range(10):
        tr[trstr[i]] = data[trstr[i]]
Assign one of the small images to tmp and practice the following operations on it.
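Display the first image of the digit "0":
    import matplotlib.pyplot as plt
    from PIL import Image

    tmp = tr[trstr[0]][0]
    tmp = tmp.reshape(28, 28)
    im = Image.fromarray(tmp)
    plt.imshow(im)
    plt.show()
    '''
    plt.figure("Image")  # name of the figure window
    plt.imshow(tmp)
    plt.axis('on')       # use 'off' to hide the axes
    plt.title('image')   # figure title
    plt.show()
    '''
[Figure: the first "0" image]
Now run the following code:
    tmp = tmp.reshape(14, 2 * 28)
    im = Image.fromarray(tmp)
    plt.imshow(im)
    plt.show()
[Figure: two zeros side by side] (Think about why two zeros appear rather than a single zero stretched horizontally.)
Answer: reshape does not resample the image; it only regroups the 784 pixels in row-major order. Each new row of 56 pixels is made of two consecutive original rows placed side by side, so the left half holds the even-numbered rows (0, 2, 4, ...) and the right half holds the odd-numbered rows (1, 3, 5, ...). Adjacent rows of a digit are almost identical, so the two halves look like the same zero, each squashed to half its height.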
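The answer can be checked on a tiny array. The following is a minimal sketch, not part of the original lab: it uses a made-up 4*4 "image" whose pixel values encode the original row index and applies the same kind of reshape (4*4 to 2*8 instead of 28*28 to 14*56).

```python
import numpy as np

# A 4x4 "image" in which every pixel of row r has the value r.
img = np.repeat(np.arange(4), 4).reshape(4, 4)

# Same kind of reshape as 28x28 -> 14x56, scaled down to 4x4 -> 2x8.
wide = img.reshape(2, 8)

left, right = wide[:, :4], wide[:, 4:]
print(left[:, 0])   # original row indices that ended up in the left half
print(right[:, 0])  # original row indices that ended up in the right half
```

The left half collects rows 0 and 2 and the right half rows 1 and 3, which is why each half of the 14*56 picture is a half-height copy of the digit.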
Binarize the image to 0/1:
    import numpy as np

    tmp = tr[trstr[0]][0].copy()
    tmp = tmp.reshape(28 * 28)
    for i in range(tmp.size):
        if tmp[i] > 10:
            tmp[i] = 1
        else:
            tmp[i] = 0
    tmp = tmp.reshape(28, 28)
    im = Image.fromarray(np.uint8(tmp))
    plt.imshow(im)
(PS: when displaying the image, the array passed in must have an unsigned integer type, here np.uint8; otherwise the image may come out as a single color.)
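The element-by-element loop can also be written as a single comparison. This is a small sketch under the same threshold of 10; the helper name binarize and the sample values are made up for illustration.

```python
import numpy as np

def binarize(pixels, threshold=10):
    """Return a uint8 array holding 1 where pixels > threshold and 0 elsewhere."""
    return (np.asarray(pixels) > threshold).astype(np.uint8)

sample = np.array([[0, 5, 10], [11, 128, 255]], dtype=np.uint8)
print(binarize(sample))
```

Because the result is already np.uint8, it can be passed to Image.fromarray without a further conversion.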

Define an array out for dimensionality reduction (from the 28*28 image down to 7*7). Each 4*4 block is summed, and the block becomes 1 if more than 5 of its 16 pixels are set:
    out = np.zeros((7, 7))
    tmp = tmp.reshape(28, 28)
    for i in range(7):
        for j in range(7):
            out[i][j] = np.sum(tmp[i*4:i*4+4, j*4:j*4+4])
    print(out.size)
    print(out.shape)
    for i in range(7):
        for j in range(7):
            if out[i][j] > 5:
                out[i][j] = 1
            else:
                out[i][j] = 0
    im = Image.fromarray(np.uint8(out))
    plt.imshow(im)
    plt.show()
[Figure: the 7*7 binarized image]
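The same block sum can be expressed with one reshape, which is much faster than the nested loops. A minimal sketch, assuming a 28*28 array of 0/1 values; the function name downsample_7x7 and the demo image are hypothetical.

```python
import numpy as np

def downsample_7x7(binary_img, min_count=5):
    """Sum each 4x4 block of a 28x28 0/1 image; blocks with more than
    min_count ones become 1, all others become 0."""
    blocks = np.asarray(binary_img).reshape(7, 4, 7, 4)
    counts = blocks.sum(axis=(1, 3))          # one count per 4x4 block
    return (counts > min_count).astype(np.uint8)

demo = np.zeros((28, 28), dtype=np.uint8)
demo[0:4, 0:4] = 1      # a full block: 16 ones
demo[4:6, 4:6] = 1      # a sparse block: only 4 ones
small = downsample_7x7(demo)
print(small[0, 0], small[1, 1])
```

The full block survives the threshold and the sparse one does not, matching the loop version with its "> 5" test.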
That concludes the practice on a single image. Next, the whole training set is binarized and reduced.
Binarize and reduce the training-set dictionary:
First build the dictionaries:
    tr = dict.fromkeys(trstr)
    jwtr = dict.fromkeys(trstr)
    for i in range(10):
        # training set
        tr[trstr[i]] = data[trstr[i]].copy()
        jwtr[trstr[i]] = np.zeros((data[trstr[i]].shape[0], 7, 7))
Binarize:
    for i in range(10):  # every digit
        print(i)
        for j in range(tr[trstr[i]].shape[0]):  # every row (one image per row)
            for k in range(28 * 28):
                if tr[trstr[i]][j][k] > 0:
                    tr[trstr[i]][j][k] = 1
Turn each dictionary value (the value, not the key; at this point it is a 2-D array) into a 3-D array by splitting the 784 columns into 28*28:
    for i in range(10):  # every digit
        tr[trstr[i]] = tr[trstr[i]].reshape(tr[trstr[i]].shape[0], 28, 28)
        # Wrong!
        # for j in range(tr[trstr[i]].shape[0]):  # every row
        #     tr[trstr[i]][j] = tr[trstr[i]][j].reshape(28, 28)
Do not write it the way the commented-out lines do. The value is one whole array whose shape must stay consistent, so you cannot change the shape of a single row while the rest of the array keeps the old shape. You can replace the value stored under a dictionary key, but you cannot reshape one slice of an array on its own; to change the shape, reshape the entire array.
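A short experiment shows what goes wrong with the row-by-row version. This sketch uses a made-up array of three blank images rather than the real data.

```python
import numpy as np

images = np.zeros((3, 784), dtype=np.uint8)   # 3 flattened images

try:
    # Row 0 is a slot of shape (784,), so a (28, 28) array does not fit into it.
    images[0] = images[0].reshape(28, 28)
    row_assignment_failed = False
except ValueError:
    row_assignment_failed = True

# Reshaping the whole array at once works.
images = images.reshape(images.shape[0], 28, 28)
print(row_assignment_failed, images.shape)
```

NumPy rejects the per-row assignment because the shapes cannot be broadcast, while the whole-array reshape succeeds.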
At this point every image of every digit in the training set is a 28*28 matrix.
Next compute the corresponding reduced matrices:
    for i in range(10):  # every digit
        for j in range(tr[trstr[i]].shape[0]):  # every image (a 28*28 binary matrix)
            for k in range(7):
                for kk in range(7):
                    jwtr[trstr[i]][j][k][kk] = np.sum(tr[trstr[i]][j][k*4:k*4+4, kk*4:kk*4+4])
                    if jwtr[trstr[i]][j][k][kk] > 5:
                        jwtr[trstr[i]][j][k][kk] = 1
                    else:
                        jwtr[trstr[i]][j][k][kk] = 0
        jwtr[trstr[i]] = jwtr[trstr[i]].reshape((tr[trstr[i]].shape[0], 49))
Compute the prior probabilities and the per-feature conditional probabilities:
    P = np.zeros(10, dtype=float)
    Nsum = 0
    for i in range(10):
        Nsum += tr[trstr[i]].shape[0]
    for i in range(10):
        P[i] = tr[trstr[i]].shape[0] / Nsum
    PP = np.zeros((49, 10), dtype=float)
    for i in range(49):
        for j in range(10):
            # Laplace smoothing: (number of ones + 1) / (number of images + 2)
            PP[i][j] = (np.sum(jwtr[trstr[j]][:, i]) + 1) / (jwtr[trstr[j]].shape[0] + 2)
At this point P[j] is the prior probability of digit j, and PP[i][j] is the smoothed probability that feature i equals 1 given that the digit is j.
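The two estimates can be computed without explicit loops. The sketch below is an illustration on toy data, not the lab's code: fit_bernoulli_nb is a hypothetical helper that takes one 0/1 feature matrix per class and applies the same "+1 / +2" smoothing.

```python
import numpy as np

def fit_bernoulli_nb(features_by_class):
    """features_by_class: list of (n_c, d) 0/1 arrays, one per class.

    Returns (prior, cond) with prior[c] = n_c / N and
    cond[i, c] = (ones in feature i for class c + 1) / (n_c + 2).
    """
    counts = np.array([x.shape[0] for x in features_by_class], dtype=float)
    prior = counts / counts.sum()
    # Column c holds the per-feature count of ones for class c: shape (d, classes).
    ones = np.stack([x.sum(axis=0) for x in features_by_class], axis=1)
    cond = (ones + 1.0) / (counts + 2.0)
    return prior, cond

toy = [np.array([[1, 0], [1, 0], [1, 1]]),   # class 0: 3 samples
       np.array([[0, 1]])]                   # class 1: 1 sample
prior, cond = fit_bernoulli_nb(toy)
print(prior)
print(cond)
```

With the real data, the list would hold the ten jwtr arrays, and the returned arrays would play the roles of P and PP.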
In other words, training has produced a 2-D matrix, which is now used to predict the validation set.
First prepare the validation set:
    test_str = []  # validation set
    for i in range(10):
        test_str.append('test' + str(i))
    test_image = dict.fromkeys(test_str)
    jw_test_image = dict.fromkeys(test_str)
    for i in range(10):
        # validation set
        test_image[test_str[i]] = data[test_str[i]].copy()
        jw_test_image[test_str[i]] = np.zeros((data[test_str[i]].shape[0], 7, 7))
The 0/1 binarization and the dimensionality reduction are the same as for the training set:
    for i in range(10):  # every digit
        print(i)
        for j in range(test_image[test_str[i]].shape[0]):  # every row
            for k in range(28 * 28):
                if test_image[test_str[i]][j][k] > 0:
                    test_image[test_str[i]][j][k] = 1
    for i in range(10):  # every digit
        test_image[test_str[i]] = test_image[test_str[i]].reshape(test_image[test_str[i]].shape[0], 28, 28)
    for i in range(10):  # every digit
        for j in range(test_image[test_str[i]].shape[0]):  # every image (a 28*28 binary matrix)
            for k in range(7):
                for kk in range(7):
                    jw_test_image[test_str[i]][j][k][kk] = np.sum(test_image[test_str[i]][j][k*4:k*4+4, kk*4:kk*4+4])
                    if jw_test_image[test_str[i]][j][k][kk] > 5:
                        jw_test_image[test_str[i]][j][k][kk] = 1
                    else:
                        jw_test_image[test_str[i]][j][k][kk] = 0
        jw_test_image[test_str[i]] = jw_test_image[test_str[i]].reshape((test_image[test_str[i]].shape[0], 49))
At this point the raw validation set has been turned into the dictionary jw_test_image of reduced, binarized matrices.
Make a backup copy to operate on:
    opr_test_image = dict.fromkeys(test_str)
    for i in range(10):
        opr_test_image[test_str[i]] = jw_test_image[test_str[i]].copy()
From here on, the opr_test_image dictionary is the one we operate on.
Use Bayes' formula to compute the posterior probability of each digit and make the prediction: the digit with the largest value is the one predicted from the PP matrix.
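Written out, the rule looks roughly as follows. This is a sketch under the usual naive assumption that the 49 binary features are conditionally independent given the digit; the symbol x_i for the i-th feature of the image is notation introduced here, while P and PP stand for the arrays computed above.

```latex
\hat{j} \;=\; \arg\max_{j \in \{0,\dots,9\}} P(j \mid x)
        \;=\; \arg\max_{j} \frac{P_j \prod_{i=1}^{49} P(x_i \mid j)}{P(x)},
\qquad
P(x_i \mid j) \;=\; \mathrm{PP}_{ij}^{\,x_i}\,\bigl(1 - \mathrm{PP}_{ij}\bigr)^{1 - x_i}
```

The denominator P(x) does not depend on j, so it has no influence on which digit attains the maximum.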
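Two points to note:
① In Bayes' formula the denominator is the same positive value for every digit. Since only the relative order of the values matters here, the denominator need not be computed; comparing the numerators is enough.
② The values are NumPy floats, so multiplying 49 probabilities together directly yields an extremely small number and loses precision (with more features the product can underflow to 0). Again only the relative order matters, and log is monotonically increasing, so we take logs and turn the product into a sum, which preserves the order.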
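Point ② can be seen with a few lines of plain Python. The probabilities below are made-up values chosen only to make the effect visible; they are not taken from the PP matrix.

```python
import math

probs = [1e-4] * 100          # 100 small per-feature probabilities (made-up values)

product = 1.0
for p in probs:
    product *= p              # 1e-400 is below the float64 range, so this underflows to 0.0

log_sum = sum(math.log(p) for p in probs)   # stays an ordinary finite number

print(product)
print(log_sum)
```

Once two products have both collapsed to 0.0 they can no longer be compared, whereas their log sums can.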
    Phou = np.ones(10, dtype=float)   # posterior scores (initial array)
    ans = np.zeros(10, dtype=float)
    for dig in range(10):
        for col in range(opr_test_image[test_str[dig]].shape[0]):
            tmp = opr_test_image[test_str[dig]][col]  # the 49 features
            Phou = np.zeros(10, dtype=float)          # log-posterior scores
            for i in range(49):
                for j in range(10):
                    if tmp[i] != 0:  # feature is 1
                        Phou[j] = Phou[j] + np.log(PP[i][j])
                    else:
                        Phou[j] = Phou[j] + np.log(1 - PP[i][j])
            for j in range(10):  # every digit
                # In log space the prior is added as log(P[j]). The original
                # script used Phou[j] * P[j] here, and the figures reported
                # below come from that version, so a rerun may differ.
                Phou[j] = Phou[j] + np.log(P[j])
            if dig == np.argmax(Phou):
                ans[dig] = ans[dig] + 1
            # print(ans[dig])
        ans[dig] = ans[dig] / opr_test_image[test_str[dig]].shape[0]
        print("Digit %d: " % (dig))
        print(ans[dig])
Output of the author's original run (which used Phou[j] * P[j] for the prior step):
    Digit 0: 0.8306122448979592
    Digit 1: 0.9092511013215859
    Digit 2: 0.685077519379845
    Digit 3: 0.6514851485148515
    Digit 4: 0.6924643584521385
    Digit 5: 0.7365470852017937
    Digit 6: 0.7494780793319415
    Digit 7: 0.7334630350194552
    Digit 8: 0.6098562628336756
    Digit 9: 0.7205153617443013
    np.mean(ans)
    Out[205]: 0.7318750196697547
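The triple loop over images, features and digits can be replaced by two matrix products. The following is a minimal sketch on toy values, assuming the prior is added as a log term; predict_log_space is a hypothetical helper, and the toy prior and cond arrays merely stand in for P and PP.

```python
import numpy as np

def predict_log_space(x, prior, cond):
    """x: (n, d) 0/1 features; prior: (c,); cond: (d, c) with
    cond[i, j] = P(x_i = 1 | class j).

    Scores each class by
    log prior_j + sum_i [x_i * log cond_ij + (1 - x_i) * log(1 - cond_ij)]
    and returns the arg-max class for every row of x.
    """
    x = np.asarray(x, dtype=float)
    scores = x @ np.log(cond) + (1.0 - x) @ np.log(1.0 - cond) + np.log(prior)
    return scores.argmax(axis=1)

prior = np.array([0.5, 0.5])
cond = np.array([[0.9, 0.1],    # feature 0 is usually 1 for class 0
                 [0.2, 0.8]])   # feature 1 is usually 1 for class 1
samples = np.array([[1, 0], [0, 1]])
print(predict_log_space(samples, prior, cond))
```

On the real data, x would be one of the opr_test_image arrays, and the per-digit accuracy is the fraction of its rows whose predicted class equals the digit.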
Complete code:
    import numpy as np
    import scipy.io as sio
    from matplotlib import pyplot as plt
    from PIL import Image

    trainfile = "C:\\Users\\...\\mnist_all"

    data = sio.loadmat(trainfile)
    trstr = []
    test_str = []  # validation set
    for i in range(10):
        trstr.append('train' + str(i))
    for i in range(10):
        test_str.append('test' + str(i))

    tr = dict.fromkeys(trstr)
    jwtr = dict.fromkeys(trstr)
    test_image = dict.fromkeys(test_str)
    jw_test_image = dict.fromkeys(test_str)
    for i in range(10):
        # training set
        tr[trstr[i]] = data[trstr[i]].copy()
        jwtr[trstr[i]] = np.zeros((data[trstr[i]].shape[0], 7, 7))
        # validation set
        test_image[test_str[i]] = data[test_str[i]].copy()
        jw_test_image[test_str[i]] = np.zeros((data[test_str[i]].shape[0], 7, 7))

    # Binarize the training set.
    for i in range(10):  # every digit
        print(i)
        for j in range(tr[trstr[i]].shape[0]):  # every row
            for k in range(28 * 28):
                if tr[trstr[i]][j][k] > 0:
                    tr[trstr[i]][j][k] = 1

    for i in range(10):  # every digit
        tr[trstr[i]] = tr[trstr[i]].reshape(tr[trstr[i]].shape[0], 28, 28)
        # Wrong!
        # for j in range(tr[trstr[i]].shape[0]):  # every row
        #     tr[trstr[i]][j] = tr[trstr[i]][j].reshape(28, 28)

    # Every training image of every digit is now a 28*28 matrix.

    for i in range(10):  # every digit
        for j in range(tr[trstr[i]].shape[0]):  # every image (a 28*28 binary matrix)
            for k in range(7):
                for kk in range(7):
                    jwtr[trstr[i]][j][k][kk] = np.sum(tr[trstr[i]][j][k*4:k*4+4, kk*4:kk*4+4])
                    if jwtr[trstr[i]][j][k][kk] > 5:
                        jwtr[trstr[i]][j][k][kk] = 1
                    else:
                        jwtr[trstr[i]][j][k][kk] = 0
        jwtr[trstr[i]] = jwtr[trstr[i]].reshape((tr[trstr[i]].shape[0], 49))

    P = np.zeros(10, dtype=float)
    Nsum = 0
    for i in range(10):
        Nsum += tr[trstr[i]].shape[0]
    for i in range(10):
        P[i] = tr[trstr[i]].shape[0] / Nsum
    PP = np.zeros((49, 10), dtype=float)
    for i in range(49):
        for j in range(10):
            # Laplace smoothing: (number of ones + 1) / (number of images + 2)
            PP[i][j] = (np.sum(jwtr[trstr[j]][:, i]) + 1) / (jwtr[trstr[j]].shape[0] + 2)

    # PP[i][j] is now the probability that feature i is 1 given digit j.

    # Validation set
    for i in range(10):  # every digit
        print(i)
        for j in range(test_image[test_str[i]].shape[0]):  # every row
            for k in range(28 * 28):
                if test_image[test_str[i]][j][k] > 0:
                    test_image[test_str[i]][j][k] = 1
    for i in range(10):  # every digit
        test_image[test_str[i]] = test_image[test_str[i]].reshape(test_image[test_str[i]].shape[0], 28, 28)
    for i in range(10):  # every digit
        for j in range(test_image[test_str[i]].shape[0]):  # every image (a 28*28 binary matrix)
            for k in range(7):
                for kk in range(7):
                    jw_test_image[test_str[i]][j][k][kk] = np.sum(test_image[test_str[i]][j][k*4:k*4+4, kk*4:kk*4+4])
                    if jw_test_image[test_str[i]][j][k][kk] > 5:
                        jw_test_image[test_str[i]][j][k][kk] = 1
                    else:
                        jw_test_image[test_str[i]][j][k][kk] = 0
        jw_test_image[test_str[i]] = jw_test_image[test_str[i]].reshape((test_image[test_str[i]].shape[0], 49))
    # jw_test_image now holds the binarized, reduced matrices.

    # Copy the reduced matrices into the working dictionary.
    opr_test_image = dict.fromkeys(test_str)
    for i in range(10):
        opr_test_image[test_str[i]] = jw_test_image[test_str[i]].copy()

    Phou = np.ones(10, dtype=float)   # posterior scores (initial array)
    ans = np.zeros(10, dtype=float)
    for dig in range(10):
        for col in range(opr_test_image[test_str[dig]].shape[0]):
            tmp = opr_test_image[test_str[dig]][col]  # the 49 features
            Phou = np.zeros(10, dtype=float)          # log-posterior scores
            for i in range(49):
                for j in range(10):
                    if tmp[i] != 0:  # feature is 1
                        Phou[j] = Phou[j] + np.log(PP[i][j])
                    else:
                        Phou[j] = Phou[j] + np.log(1 - PP[i][j])
            for j in range(10):  # every digit
                # Prior added in log space; the original script used
                # Phou[j] * P[j], which produced the reported accuracies.
                Phou[j] = Phou[j] + np.log(P[j])
            if dig == np.argmax(Phou):
                ans[dig] = ans[dig] + 1
            # print(ans[dig])
        ans[dig] = ans[dig] / opr_test_image[test_str[dig]].shape[0]
        print("Digit %d: " % (dig))
        print(ans[dig])
Update (2019-12-01):
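Reducing to a 7*7 matrix gives an average accuracy of about 73%. A 14*14 matrix keeps more features, so would the accuracy be higher? To find out, I wrote the code below. (It extends the code above without changing anything already there, so the two scripts share a large common part; this is also marked in the comments.)
Results (placed here so that they are not buried below the code):
The average accuracy is about 75%, so although there are more parameters, the improvement is not significant.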
    # -*- coding: utf-8 -*-
    """
    Spyder Editor

    This is a temporary script file.
    """
    trainfile = "D:\\mystudy\\大三上学期作业\\人工智能\\数字识别相关\\mnist_all"
    import numpy as np
    import scipy.io as sio
    from matplotlib import pyplot as plt
    from PIL import Image

    # The first part of this script (loading the data, binarization, the 7*7
    # model and its evaluation, followed by print(np.mean(ans))) is identical
    # to the complete code above and is not repeated here.

    '''
    ↓↓↓ reduction to 14*14 ↓↓↓
    '''
    for i in range(10):
        # training set
        jwtr[trstr[i]] = np.zeros((data[trstr[i]].shape[0], 14, 14))
        # validation set
        jw_test_image[test_str[i]] = np.zeros((data[test_str[i]].shape[0], 14, 14))

    for i in range(10):  # every digit
        print(i)
        for j in range(tr[trstr[i]].shape[0]):  # every image (a 28*28 binary matrix)
            for k in range(14):
                for kk in range(14):
                    jwtr[trstr[i]][j][k][kk] = np.sum(tr[trstr[i]][j][k*2:k*2+2, kk*2:kk*2+2])
                    if jwtr[trstr[i]][j][k][kk] > 2:
                        jwtr[trstr[i]][j][k][kk] = 1
                    else:
                        jwtr[trstr[i]][j][k][kk] = 0
        jwtr[trstr[i]] = jwtr[trstr[i]].reshape((tr[trstr[i]].shape[0], 14 * 14))

    P = np.zeros(10, dtype=float)
    Nsum = 0
    for i in range(10):
        Nsum += tr[trstr[i]].shape[0]
    for i in range(10):
        P[i] = tr[trstr[i]].shape[0] / Nsum
    PP = np.zeros((14 * 14, 10), dtype=float)
    for i in range(14 * 14):
        for j in range(10):
            PP[i][j] = (np.sum(jwtr[trstr[j]][:, i]) + 1) / (jwtr[trstr[j]].shape[0] + 2)

    # PP[i][j] is now the probability that feature i is 1 given digit j.

    # Validation set. test_image was already binarized and reshaped to
    # (n, 28, 28) in the first part, so that step must not be run again
    # (indexing the 3-D array with k up to 783 would fail).
    for i in range(10):  # every digit
        for j in range(test_image[test_str[i]].shape[0]):  # every image (a 28*28 binary matrix)
            for k in range(14):
                for kk in range(14):
                    jw_test_image[test_str[i]][j][k][kk] = np.sum(test_image[test_str[i]][j][k*2:k*2+2, kk*2:kk*2+2])
                    if jw_test_image[test_str[i]][j][k][kk] > 2:  # note: this threshold has to change as well!
                        jw_test_image[test_str[i]][j][k][kk] = 1
                    else:
                        jw_test_image[test_str[i]][j][k][kk] = 0
    for i in range(10):
        jw_test_image[test_str[i]] = jw_test_image[test_str[i]].reshape((test_image[test_str[i]].shape[0], 14 * 14))
    # jw_test_image now holds the binarized, reduced matrices.

    # Copy the reduced matrices into the working dictionary.
    opr_test_image = dict.fromkeys(test_str)
    for i in range(10):
        opr_test_image[test_str[i]] = jw_test_image[test_str[i]].copy()

    Phou = np.zeros(10, dtype=float)  # posterior scores (initial array)
    ans = np.zeros(10, dtype=float)
    for dig in range(10):
        for col in range(opr_test_image[test_str[dig]].shape[0]):
            tmp = opr_test_image[test_str[dig]][col]  # the 14*14 features
            Phou = np.zeros(10, dtype=float)          # log-posterior scores
            for i in range(14 * 14):
                for j in range(10):
                    if tmp[i] != 0:  # feature is 1
                        Phou[j] = Phou[j] + np.log(PP[i][j])
                    else:
                        Phou[j] = Phou[j] + np.log(1 - PP[i][j])
            for j in range(10):  # every digit
                # Prior added in log space; the original script used
                # Phou[j] * P[j], which produced the reported accuracies.
                Phou[j] = Phou[j] + np.log(P[j])
            if dig == np.argmax(Phou):
                ans[dig] = ans[dig] + 1
            # print(ans[dig])
        ans[dig] = ans[dig] / opr_test_image[test_str[dig]].shape[0]
        print("Digit %d: " % (dig))
        print(ans[dig])
    print(np.mean(ans))

    for dig in range(10):
        im = Image.fromarray(np.uint8(opr_test_image[test_str[dig]][0].reshape(14, 14)))
        plt.imshow(im)
        plt.show()
    '''
    ↑↑↑ reduction to 14*14 ↑↑↑
    '''

    '''
    Plot the accuracy chart:
    '''
    def autolabel(rects):
        for rect in rects:
            height = rect.get_height()
            plt.text(rect.get_x() + rect.get_width() / 2. - 0.2, 1.03 * height, '%.2f' % (height))

    name_list = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
    num_list = [0.8306122448979592, 0.9092511013215859, 0.685077519379845,
                0.6514851485148515, 0.6924643584521385, 0.7365470852017937,
                0.7494780793319415, 0.7334630350194552, 0.6098562628336756,
                0.7205153617443013]
    # One color per bar, cycling through red, green and blue.
    bar_colors = ['rgb'[i % 3] for i in range(len(num_list))]
    autolabel(plt.bar(range(len(num_list)), num_list, color=bar_colors, tick_label=name_list))
    plt.show()
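Since the 7*7 and 14*14 variants differ only in the block size and the threshold, both can be served by one function that works on a whole batch of images. This is a sketch with hypothetical names (block_downsample, block, min_count), run on random stand-in images rather than the MNIST data.

```python
import numpy as np

def block_downsample(binary_images, block, min_count):
    """binary_images: (n, h, w) 0/1 array with h and w divisible by block.

    Sums every block x block tile and outputs 1 where the sum is > min_count.
    Returns an (n, (h // block) * (w // block)) uint8 feature matrix.
    """
    x = np.asarray(binary_images)
    n, h, w = x.shape
    side_h, side_w = h // block, w // block
    tiles = x.reshape(n, side_h, block, side_w, block)
    counts = tiles.sum(axis=(2, 4))           # one count per tile
    return (counts > min_count).astype(np.uint8).reshape(n, side_h * side_w)

rng = np.random.default_rng(0)
batch = (rng.random((5, 28, 28)) > 0.5).astype(np.uint8)   # random stand-in images
feat_7 = block_downsample(batch, block=4, min_count=5)     # the 7*7 setting
feat_14 = block_downsample(batch, block=2, min_count=2)    # the 14*14 setting
print(feat_7.shape, feat_14.shape)
```

Keeping the threshold as a parameter next to the block size makes it harder to forget that the two must change together, which is the pitfall flagged in the 14*14 code.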