[Machine Learning Algorithms in Python] K-means: Unsupervised Learning for Classification

Published: 2025/4/5


1. Background

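I won't go into the definition of unsupervised learning here; if it is unfamiliar, a quick Google search will cover it. I needed unsupervised classification for a project.

The K in K-means is the number of groups the data is split into, and the method is essentially distance-based.

The rough idea: given a matrix and, say, K = 2 (so two groups), we first pick two centroids. To start, find the maximum max and the minimum min of each column of the matrix, compute range = max - min, and set the centroid to min + range * random. The algorithm then iterates: each point is assigned to its nearest centroid, each centroid is recomputed as the mean of its points, and this repeats until the assignments stop changing. To really understand it, trace through the code yourself and print the result of every step to check whether it matches what you expected. (A side rant: many people writing articles online just paste code from the book without checking whether it runs. It took me a whole afternoon to fix, with all kinds of bugs.)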
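The initialization rule described above (min + range * random for each column) can be sketched roughly as follows. This is only a minimal illustration assuming NumPy; the function name `random_centroids` and its `rng` parameter are made up for this example and are not part of the original code.

```python
import numpy as np


def random_centroids(data, k, rng=None):
    """Draw k centroids uniformly inside the per-column bounds of data.

    `random_centroids` and `rng` are hypothetical names used for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)                # min of each column
    col_range = data.max(axis=0) - col_min    # range = max - min
    # centroid = min + range * random, with random drawn from [0, 1)
    return col_min + col_range * rng.random((k, data.shape[1]))


points = np.array([[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]])
centroids = random_centroids(points, k=2, rng=np.random.default_rng(0))
print(centroids.shape)  # (2, 2)
```

Because every coordinate is drawn between the column minimum and maximum, each starting centroid lies inside the bounding box of the data, although nothing guarantees it lands near any actual point.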

2. Code

'''
@author: hakuri
'''
import numpy as np
import matplotlib.pyplot as plt


def loadDataSet(fileName):
    # Parse a tab-delimited file of floats into a list of rows.
    dataMat = []
    with open(fileName) as fr:
        for line in fr:
            curLine = line.strip().split('\t')
            dataMat.append(list(map(float, curLine)))  # map() is lazy in Python 3
    return dataMat


def distEclud(vecA, vecB):
    # Euclidean distance, equivalent to np.linalg.norm(vecA - vecB)
    return np.sqrt(np.sum(np.power(vecA - vecB, 2)))


def randCent(dataSet, k):
    # Create k random centroids within the bounds of each dimension.
    dataArr = np.array(dataSet)
    n = dataArr.shape[1]
    centroids = np.zeros((k, n))
    for j in range(n):
        minJ = dataArr[:, j].min()
        rangeJ = float(dataArr[:, j].max() - minJ)
        centroids[:, j] = minJ + rangeJ * np.random.rand(k)
    return centroids


def kMeans(dataSet, k, distMeas=distEclud, createCent=randCent):
    dataArr = np.array(dataSet)
    m = dataArr.shape[0]
    # Column 0: index of the assigned centroid; column 1: squared distance to it.
    clusterAssment = np.zeros((m, 2))
    centroids = createCent(dataSet, k)
    clusterChanged = True
    while clusterChanged:
        clusterChanged = False
        for i in range(m):  # assign each data point to the closest centroid
            minDist = np.inf
            minIndex = -1
            for j in range(k):
                distJI = distMeas(centroids[j, :], dataArr[i, :])
                if distJI < minDist:
                    minDist = distJI
                    minIndex = j
            if clusterAssment[i, 0] != minIndex:
                clusterChanged = True
            clusterAssment[i, :] = minIndex, minDist ** 2
        print(centroids)
        for cent in range(k):  # recalculate centroids
            # Take every point in this cluster, not only the first one.
            ptsInClust = dataArr[clusterAssment[:, 0] == cent]
            if len(ptsInClust) > 0:  # an empty cluster keeps its old centroid
                centroids[cent, :] = ptsInClust.mean(axis=0)
    # Row indices of the points in the last cluster (index k-1), used for plotting.
    id = np.nonzero(clusterAssment[:, 0] == k - 1)[0]
    return centroids, clusterAssment, id


def plotBestFit(dataSet, id, centroids):
    # Scatter plot for 2-D data with K=2: points listed in id are red, the rest green.
    dataArr = np.array(dataSet)
    cent = np.array(centroids)
    inLast = np.zeros(dataArr.shape[0], dtype=bool)
    inLast[id] = True
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(dataArr[inLast, 0], dataArr[inLast, 1], s=30, c='red', marker='s')
    ax.scatter(dataArr[~inLast, 0], dataArr[~inLast, 1], s=30, c='green')
    ax.scatter(cent[:, 0], cent[:, 1], s=50, c='black')
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.show()


if __name__ == '__main__':
    dataSet = loadDataSet('/Users/hakuri/Desktop/testSet.txt')
    # print(randCent(dataSet, 2))
    # print(dataSet)
    # print(kMeans(dataSet, 2))
    a, b, id = kMeans(dataSet, 2)
    plotBestFit(dataSet, id, a)

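To use it, go straight to main at the end: dataSet is the input data set, which I provide at the download address below. The first argument of kMeans is the input matrix and the second is K, the number of groups. plotBestFit is the plotting function; it requires matplotlib and currently supports only two-dimensional data with K = 2.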
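For trying the inputs and outputs without a data file, here is a compact, vectorized sketch of the same assign-then-update loop on an in-memory array. It is not the author's implementation: the name `kmeans_sketch`, the `rng` and `max_iter` parameters, and the toy data are assumptions for illustration, and it swaps in a different initialization (k randomly chosen data points instead of min + range * random) so that no cluster starts out empty.

```python
import numpy as np


def kmeans_sketch(data, k, rng=None, max_iter=100):
    """Plain k-means on an (m, n) array; returns (centroids, labels).

    Hypothetical helper for illustration, not part of the original code.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    # Assumption: start from k distinct data points rather than random bounds.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    labels = np.full(len(data), -1)
    for _ in range(max_iter):
        # Distance from every point to every centroid, shape (m, k).
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = data[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return centroids, labels


data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans_sketch(data, k=2, rng=np.random.default_rng(0))
print(labels)
```

The first argument plays the role of the input matrix and the second is K, as with kMeans. Which group ends up labeled 0 or 1 depends on the random start, so only the grouping itself is meaningful.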

3. Results

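The large black dots are the two centroids. Not bad, right? Be sure to test with a reasonably large amount of data, otherwise the effect will not be obvious.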
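If the original data file is not at hand, a larger synthetic test set can be generated along these lines. This is a sketch under assumptions: the actual layout of testSet.txt is not shown in the article, so two tab-separated float columns per line is inferred from loadDataSet and the 2-D plot, and the function name `write_test_set`, the output file name, and the blob parameters are invented for this example.

```python
import os
import tempfile

import numpy as np


def write_test_set(path, n_per_cluster=100, rng=None):
    """Write two 2-D Gaussian blobs as tab-delimited rows, one point per line.

    Hypothetical helper; the blob centers and spread are arbitrary choices.
    """
    rng = np.random.default_rng() if rng is None else rng
    blob_a = rng.normal(loc=(-2.0, -2.0), scale=0.8, size=(n_per_cluster, 2))
    blob_b = rng.normal(loc=(2.0, 2.0), scale=0.8, size=(n_per_cluster, 2))
    points = np.vstack([blob_a, blob_b])
    rng.shuffle(points)  # shuffle rows so the file order carries no labels
    np.savetxt(path, points, delimiter='\t', fmt='%.6f')
    return points


# Assumed output location: a file in the system temp directory.
path = os.path.join(tempfile.gettempdir(), 'testSet_synthetic.txt')
points = write_test_set(path, n_per_cluster=100, rng=np.random.default_rng(42))
print(points.shape)  # (200, 2)
```

The resulting file can be passed to loadDataSet in place of the original path; raising n_per_cluster makes the two groups easier to see in the plot.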

4. Download

My GitHub is https://github.com/jimenbian. If you like it, give it a star O(∩_∩)O

/********************************

* This article comes from the blog "李博Garvin"

* Please credit the source when reposting: http://blog.csdn.net/buptgshengod

******************************************/
