Classifying a fruit dataset with kNN
Dataset: https://download.csdn.net/download/fanzonghao/10940440?
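The kNN algorithm:
A test sample is assigned the class that is most common among its k nearest training samples. If k is made arbitrarily large (in practice, as large as the whole training set), the prediction no longer depends on the test sample itself, only on the class proportions: every test sample is assigned to the class with the largest share of the training data.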
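The effect of a very large k can be checked with a minimal sketch of kNN voting. The function `knn_predict` and the toy points below are made up for illustration and are not part of the original code; the sketch assumes Euclidean distance and plain majority voting.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Euclidean distance from x to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Most common label among those neighbours
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data (made up): three points of class 1, two points of class 2
X_train = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([1, 1, 1, 2, 2])
x_new = np.array([5.1, 5.0])  # lies right next to the class-2 points

pred_k1 = knn_predict(X_train, y_train, x_new, k=1)
pred_all = knn_predict(X_train, y_train, x_new, k=len(X_train))
print(pred_k1)   # the single nearest neighbour belongs to class 2
print(pred_all)  # all 5 points vote: 3 votes for class 1 against 2 for class 2
```

With k = 1 the point follows its nearest neighbour, while with k equal to the training-set size it is assigned to the majority class even though it sits among the class-2 points.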
First, look at the dataset. The four feature columns mass, height, width, and color_score are used to classify the fruit.
g=sns.pairplot(data=fruits_df,hue='fruit_name',vars=['mass','width','height','color_score'])
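sns.pairplot shows the pairwise relationships between the features. The plots on the diagonal are the per-class histograms of each feature, and mass and width turn out to be almost linearly related.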
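A near-linear relation like this can also be quantified with the Pearson correlation coefficient. The sketch below uses a few made-up mass and width values rather than the real dataset, so the number it prints says nothing about the actual fruit data.

```python
import numpy as np

# Hypothetical mass (g) and width (cm) values, made up for illustration
mass = np.array([80.0, 120.0, 150.0, 180.0, 200.0, 350.0])
width = np.array([5.8, 6.3, 7.0, 7.3, 7.5, 9.5])

# Pearson correlation coefficient: values close to 1 indicate
# a nearly linear, increasing relationship
r = np.corrcoef(mass, width)[0, 1]
print(round(r, 3))
```

On the real table, the same check should be possible with `fruits_df[['mass', 'width']].corr()`.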
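Next, a 3D scatter plot of width, height, and color_score shows that the green points (label 2) are easy to separate from the rest. For higher-dimensional data, PCA can be used to reduce the dimensionality before plotting.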
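As a rough sketch of the PCA idea, the projection onto the first two principal components can be computed with a singular value decomposition. The helper `pca_project` and the random stand-in data are assumptions for illustration, not part of the original post.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto their first n_components principal components."""
    # Centre each feature so the components describe variance around the mean
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


# Random stand-in for a 4-feature table (not the real fruit data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 4))
X_2d = pca_project(X_demo, n_components=2)
print(X_2d.shape)  # (20, 2)
```

The two resulting columns can be passed to an ordinary 2D scatter plot. Because mass is on a much larger scale than color_score, standardizing the features first is usually advisable; sklearn.decomposition.PCA offers the same projection as a library class.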
kNN code:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib
import pandas as pd
import seaborn as sns
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import ml_visualization

# Load the data
fruits_df = pd.read_table('fruit_data_with_colors.txt')
print(fruits_df)
print('Number of samples:', len(fruits_df))

# Build a dict that maps each target label to its fruit name
fruits_name_dict = dict(zip(fruits_df['fruit_label'], fruits_df['fruit_name']))
print(fruits_name_dict)

# Split into training and test sets
X = fruits_df[['mass', 'width', 'height', 'color_score']]
y = fruits_df['fruit_label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/4, random_state=0)
print('X_train=\n', X_train)
print('y_train=\n', y_train)
print('Total samples: {}, training samples: {}, test samples: {}'.format(len(X), len(X_train), len(X_test)))

# Visualize the features: histograms on the diagonal,
# pairwise relationships everywhere else
g = sns.pairplot(data=fruits_df, hue='fruit_name', vars=['mass', 'width', 'height', 'color_score'])
plt.savefig('1.jpg')

# 3D view of three features, colored by label
label_color_dict = {1: 'red', 2: 'green', 3: 'blue', 4: 'yellow'}
colors = list(map(lambda label: label_color_dict[label], y_train))
print('colors=\n', colors)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X_train['width'], X_train['height'], X_train['color_score'], c=colors, marker='o', s=100)
ax.set_xlabel('width')
ax.set_ylabel('height')
ax.set_zlabel('color_score')
plt.show()

# Train a kNN model for each k from 1 to 19 and record the test accuracy
acc_scores = []
k_values = range(1, 20)
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    # Train the model
    knn.fit(X_train, y_train)
    # Predict on the test set
    y_pred = knn.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    acc_scores.append(acc)

# Plot accuracy against the actual k values
plt.figure()
plt.xlabel('k')
plt.ylabel('accuracy')
plt.plot(k_values, acc_scores, marker='o')
plt.show()
# acc and k hold the values from the last loop iteration
print('Accuracy for k = {}: {}'.format(k, acc))

ml_visualization.plot_fruit_knn(X_train, y_train, 5)

Visualization code: ml_visualization.py
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import neighbors
import matplotlib.patches as mpatches


def plot_fruit_knn(X, y, n_neighbors):
    """Train kNN on the two features height and width of the fruit dataset and plot the result."""
    # as_matrix() was removed from pandas; to_numpy() is its replacement
    X_mat = X[['height', 'width']].to_numpy()
    y_mat = y.to_numpy()

    # Create color maps
    cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF', '#AFAFAF'])
    cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF', '#AFAFAF'])

    clf = neighbors.KNeighborsClassifier(n_neighbors)
    clf.fit(X_mat, y_mat)

    # Plot the decision boundary by assigning a color in the color map
    # to each mesh point.
    mesh_step_size = .01  # step size in the mesh
    plot_symbol_size = 50

    x_min, x_max = X_mat[:, 0].min() - 1, X_mat[:, 0].max() + 1
    y_min, y_max = X_mat[:, 1].min() - 1, X_mat[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, mesh_step_size),
                         np.arange(y_min, y_max, mesh_step_size))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light)

    # Plot training points
    plt.scatter(X_mat[:, 0], X_mat[:, 1], s=plot_symbol_size, c=y, cmap=cmap_bold, edgecolor='black')
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())

    patch0 = mpatches.Patch(color='#FF0000', label='apple')
    patch1 = mpatches.Patch(color='#00FF00', label='mandarin')
    patch2 = mpatches.Patch(color='#0000FF', label='orange')
    patch3 = mpatches.Patch(color='#AFAFAF', label='lemon')
    plt.legend(handles=[patch0, patch1, patch2, patch3])

    plt.xlabel('height (cm)')
    plt.ylabel('width (cm)')
    plt.show()

Result: the accuracy plot shows that accuracy is highest when k is 5.
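kNN can also be used for regression: with k = 3, for example, the prediction is the average of the target values of the three nearest neighbors, and this average can also be weighted by distance.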
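Both variants of kNN regression can be sketched in a few lines. The function `knn_regress`, the inverse-distance weighting scheme, and the toy data are assumptions chosen for illustration; the original post does not specify how the weights are computed.

```python
import numpy as np


def knn_regress(X_train, y_train, x, k=3, weighted=False):
    """Predict a numeric target for x from its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    if not weighted:
        # Plain average of the neighbours' target values
        return float(y_train[nearest].mean())
    # Inverse-distance weights: closer neighbours count more.
    # The small epsilon avoids division by zero for exact matches.
    weights = 1.0 / (dists[nearest] + 1e-9)
    return float(np.sum(weights * y_train[nearest]) / np.sum(weights))


# Toy 1-D data (made up for illustration)
X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([0.0, 10.0, 20.0, 100.0])
x_new = np.array([0.5])

mean_pred = knn_regress(X_train, y_train, x_new, k=3)
weighted_pred = knn_regress(X_train, y_train, x_new, k=3, weighted=True)
print(mean_pred)      # average of 0, 10 and 20
print(weighted_pred)  # pulled towards the two closest targets, 0 and 10
```

In scikit-learn, the same two behaviours are available through KNeighborsRegressor with weights='uniform' and weights='distance'.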