python-DBSCAN density clustering

Published: 2024/7/19 python 25 豆豆

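1. DBSCAN is a density-based clustering algorithm:

The number of clusters does not need to be specified in advance.
The final number of clusters is not fixed; it depends on the data and on the parameters.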
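
As a small illustration of this point, the sketch below runs scikit-learn's DBSCAN on toy data without passing a cluster count anywhere. The data points and the values eps=0.5 and min_samples=2 are made up for the example and are not tuned recommendations.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy 1-D data (made up for illustration): two dense groups and one outlier.
points = np.array([[1.0], [1.1], [1.2], [5.0], [5.1], [5.2], [20.0]])

# Illustrative parameter values; no number of clusters is given.
labels = DBSCAN(eps=0.5, min_samples=2).fit(points).labels_

# The label -1 marks noise; every other distinct label is one cluster.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("labels:", labels)
print("number of clusters found:", n_clusters)
```

The number of clusters is read off the labels after fitting, which is also how the script later in this article counts them.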

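2. DBSCAN divides the data points into three types:

Core point: a point with at least MinPts points within radius Eps
Border point: a point with fewer than MinPts points within radius Eps, but which lies in the neighborhood of a core point
Noise point: a point that is neither a core point nor a border point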
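
The three definitions can be sketched directly with NumPy. This is only a minimal illustration, not the scikit-learn implementation. It assumes Euclidean distance, counts each point as its own neighbor, and treats a point as core when it has at least min_pts neighbors (the convention scikit-learn uses for min_samples). The function name classify_points and the sample coordinates are hypothetical.

```python
import numpy as np

def classify_points(X, eps, min_pts):
    """Return one of 'core' / 'border' / 'noise' for each row of X."""
    # Pairwise Euclidean distances, shape (n, n).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = dists <= eps                   # each point is its own neighbor
    is_core = neighbors.sum(axis=1) >= min_pts
    # Border: not core, but at least one core point lies within eps.
    is_border = ~is_core & (neighbors & is_core[None, :]).any(axis=1)
    kinds = np.full(len(X), "noise", dtype=object)
    kinds[is_border] = "border"
    kinds[is_core] = "core"
    return kinds

# Hypothetical points: a unit square, one point hanging off it, one far away.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [1.0, 1.0], [2.0, 1.0], [10.0, 10.0]])
kinds = classify_points(X, eps=1.0, min_pts=3)
print(list(kinds))
```

With these values the four corners of the square are core points, the point at (2, 1) is a border point because its only other neighbor is the core point (1, 1), and the far point is noise.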

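3. Algorithm flow

Label every point as a core point, a border point or a noise point;
Remove the noise points;
Add an edge between every pair of core points that are within Eps of each other;
Each group of connected core points forms one cluster;
Assign each border point to the cluster of one of its associated core points.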
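
The steps above can be sketched in plain NumPy as follows. This is a teaching sketch under stated assumptions, not the algorithm as scikit-learn implements it: it uses Euclidean distance, builds the full pairwise distance matrix (so it only suits small data), leaves noise points labeled -1 instead of deleting them, and attaches a border point that touches several clusters to the one of its lowest-indexed core neighbor, which is an arbitrary tie-break. The name dbscan_sketch and the sample data are hypothetical.

```python
import numpy as np

def dbscan_sketch(X, eps, min_pts):
    """Cluster the rows of X; returns integer labels, with -1 for noise."""
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = dists <= eps                     # edges between points within eps
    is_core = neighbors.sum(axis=1) >= min_pts   # find the core points
    labels = np.full(n, -1)                      # everything starts as noise

    # Each connected group of core points becomes one cluster (depth-first walk).
    cluster = 0
    for start in range(n):
        if not is_core[start] or labels[start] != -1:
            continue
        labels[start] = cluster
        stack = [start]
        while stack:
            p = stack.pop()
            for q in np.flatnonzero(neighbors[p] & is_core):
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1

    # Attach each border point to the cluster of one neighboring core point.
    for p in range(n):
        if labels[p] == -1:
            core_nbrs = np.flatnonzero(neighbors[p] & is_core)
            if core_nbrs.size:
                labels[p] = labels[core_nbrs[0]]
    return labels

# Made-up 1-D data: two groups of three points and one isolated point.
X = np.array([[1.0], [1.1], [1.2], [5.0], [5.1], [5.2], [20.0]])
sketch_labels = dbscan_sketch(X, eps=0.15, min_pts=3)
print(sketch_labels)
```

In this example only the middle point of each group is a core point; the two outer points of each group are border points that get attached to it, and the isolated point stays noise.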

Example: analysing students' online time:

'''
Main DBSCAN parameters:
eps: the maximum distance between two samples for them to be treated as neighbors
min_samples: the number of samples in a point's neighborhood (including the point itself) for it to count as a core point
metric: how distances are computed
'''
import numpy as np
import sklearn.cluster as skc
from sklearn import metrics
import matplotlib.pyplot as plt

mac2id = dict()
onlinetimes = []
with open("E:\\python\\online.txt") as f:
    for line in f:
        # Read the MAC address, start hour and online duration from each record
        print(line)  # the raw record
        fields = line.split(',')
        mac = fields[2]
        print(mac)
        onlinetime = int(fields[6])
        print("Online duration:", onlinetime)
        starttime = int(fields[4].split(' ')[1].split(':')[0])
        print("Start hour:", starttime)
        # mac2id is a dict: the key is the MAC address, the value is the index
        # in onlinetimes of that address's (start hour, online duration) pair
        if mac not in mac2id:
            mac2id[mac] = len(onlinetimes)
            onlinetimes.append((starttime, onlinetime))
        else:
            # A MAC address seen before: keep only its latest record
            onlinetimes[mac2id[mac]] = (starttime, onlinetime)
real_x = np.array(onlinetimes).reshape((-1, 2))

# Part 1: cluster on the start hour; labels holds the cluster label of each sample
x = real_x[:, 0:1]
db = skc.DBSCAN(eps=0.01, min_samples=20).fit(x)
labels = db.labels_

# Print the labels and the proportion of noise samples (label -1)
print("Labels:")
print(labels)
ratio = len(labels[labels[:] == -1]) / len(labels)
print("Noise ratio:", format(ratio, ".2%"))

# Count and print the clusters, then evaluate the clustering
n_cluster_ = len(set(labels)) - (1 if -1 in labels else 0)
print("Estimated number of cluster: %d" % n_cluster_)
print("Silhouette Coefficient : %0.3f" % metrics.silhouette_score(x, labels))

# Print each cluster's label and the data it contains
for i in range(n_cluster_):
    print("cluster ", i, ":")
    print(list(x[labels == i].flatten()))

# Show a histogram of the start hours
plt.hist(x, 24)
plt.show()

# Part 2: cluster on the log-transformed online duration
x = np.log(1 + real_x[:, 1:])
db = skc.DBSCAN(eps=0.14, min_samples=10).fit(x)
labels = db.labels_

print("Labels:")
print(labels)
ratio = len(labels[labels[:] == -1]) / len(labels)
print("Noise ratio:", format(ratio, ".2%"))

n_cluster_ = len(set(labels)) - (1 if -1 in labels else 0)
print("Estimated number of cluster: %d" % n_cluster_)
print("Silhouette Coefficient : %0.3f" % metrics.silhouette_score(x, labels))

# Sample count, mean and standard deviation of the raw durations in each cluster
for i in range(n_cluster_):
    print("cluster", i, ':')
    count = len(x[labels == i])
    mean = np.mean(real_x[labels == i][:, 1])
    std = np.std(real_x[labels == i][:, 1])
    print("\t number of sample:", count)
    print("\t mean of sample:", format(mean, '.1f'))
    print("\t std of sample:", format(std, '.1f'))
plt.hist(x, 24)
plt.show()
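
The second half of the script clusters log(1 + duration) rather than the raw duration. The article does not say why, so the reading below is an assumption: the logarithm compresses very long sessions, which lets a single eps act roughly like a relative tolerance. The duration values in this sketch are made up and their units are unspecified.

```python
import numpy as np

# Hypothetical online durations, made up for illustration.
durations = np.array([10, 100, 1000, 10000], dtype=float)

log_durations = np.log(1 + durations)    # the transform used in the script
# np.log1p computes the same quantity in one call.
same = np.allclose(log_durations, np.log1p(durations))

# Gaps between consecutive values: they differ by a factor of 100 on the raw
# scale, but are nearly equal after the transform, since each step is about 10x.
raw_gaps = np.diff(durations)
log_gaps = np.diff(log_durations)
print(same, raw_gaps, np.round(log_gaps, 2))
```

On the raw scale a fixed eps would either merge all short sessions or isolate every long one; on the log scale the spacing between these values is comparable.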

One day of records split out from the data file:

Visualization results:

The processed results are as follows:
