
What to do when the ucinet clustering coefficient exceeds 1 _ Clustering performance evaluation: ARI (Adjusted Rand Index)

Published: 2024/7/19  Programming Q&A  32  豆豆

This article, collected and compiled by 生活随笔, covers clustering performance evaluation with ARI (Adjusted Rand Index). The editor found it useful and shares it here for reference.

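Note: ARI takes values in [-1, 1], and larger is better. It measures how much two partitions of the same data overlap, and a value of 1 means the clustering matches the true classes exactly (up to renaming the clusters). Using this metric requires the data to carry ground-truth class labels.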

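Let C denote the true class partition and K the clustering result, over n instances. Define a as the number of instance pairs that are in the same class in C and in the same cluster in K, and b as the number of instance pairs that are in different classes in C and in different clusters in K. The Rand Index (RI) is the fraction of all pairs on which the two partitions agree:

$$
\mathrm{RI} = \frac{a + b}{\binom{n}{2}}
$$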
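The definition can be checked by brute force. The sketch below enumerates every unordered pair of instances, counts a and b, and divides by the C(n, 2) total pairs. It assumes the labels arrive as plain Python lists. The function name rand_index is illustrative only.

```python
from itertools import combinations
from math import comb


def rand_index(labels_true, labels_pred):
    """Rand Index by direct pair counting (examines all n*(n-1)/2 pairs)."""
    n = len(labels_true)
    a = 0  # pairs in the same class in C and the same cluster in K
    b = 0  # pairs in different classes in C and different clusters in K
    for i, j in combinations(range(n), 2):
        same_c = labels_true[i] == labels_true[j]
        same_k = labels_pred[i] == labels_pred[j]
        if same_c and same_k:
            a += 1
        elif not same_c and not same_k:
            b += 1
    return (a + b) / comb(n, 2)


print(rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # a=2, b=8: 10/15
```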

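The Rand Index does not guarantee that a random clustering gets an RI close to 0. The Adjusted Rand Index (ARI) was therefore proposed, which corrects RI for chance:

$$
\mathrm{ARI} = \frac{\mathrm{RI} - E[\mathrm{RI}]}{\max(\mathrm{RI}) - E[\mathrm{RI}]}
$$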
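The need for the adjustment is easy to see numerically. The sketch below draws pairs of independent random labelings and averages their Rand Index. The sample size, number of clusters, trial count and seed are arbitrary assumptions.

With three equally likely labels, two instances agree (same/same or different/different) with probability 1/9 + 4/9 = 5/9. The average therefore lands near 0.56 rather than 0.

```python
import random
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of instance pairs on which the two partitions agree."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def mean_random_ri(n_samples=30, n_clusters=3, trials=200, seed=0):
    """Average RI of two independent, uniformly random labelings."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        u = [rng.randrange(n_clusters) for _ in range(n_samples)]
        v = [rng.randrange(n_clusters) for _ in range(n_samples)]
        total += rand_index(u, v)
    return total / trials


print(mean_random_ri())  # close to 5/9, far from 0
```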

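To compute ARI, we introduce a contingency table that records how the true class partition and the clustering overlap. Rows correspond to the true classes C_1, ..., C_r, and columns to the clusters K_1, ..., K_s. Each cell n_ij counts the instances that belong to both class C_i and cluster K_j. Row sums are written n_i· and column sums n_·j:

            K_1    K_2    ...   K_s   | row sum
  C_1       n_11   n_12   ...   n_1s  | n_1·
  C_2       n_21   n_22   ...   n_2s  | n_2·
  ...       ...    ...    ...   ...   | ...
  C_r       n_r1   n_r2   ...   n_rs  | n_r·
  col sum   n_·1   n_·2   ...   n_·s  | n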
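Such a table can be built from two label lists with a counter over (class, cluster) pairs. This minimal sketch assumes hashable, sortable labels, and the function name contingency_table is illustrative. scikit-learn ships its own sklearn.metrics.cluster.contingency_matrix, which returns a NumPy or sparse matrix instead of nested lists.

```python
from collections import Counter


def contingency_table(labels_true, labels_pred):
    """Rows: true classes, columns: predicted clusters, cells: n_ij."""
    classes = sorted(set(labels_true))
    clusters = sorted(set(labels_pred))
    counts = Counter(zip(labels_true, labels_pred))  # (class, cluster) -> n_ij
    # Counter returns 0 for (class, cluster) combinations that never occur
    return [[counts[(c, k)] for k in clusters] for c in classes]


table = contingency_table([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
print(table)                              # [[2, 1, 0], [0, 1, 2]]
print([sum(row) for row in table])        # row sums n_i.: [3, 3]
print([sum(col) for col in zip(*table)])  # column sums n_.j: [2, 2, 2]
```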

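With the contingency table, ARI can be computed as:

$$
\mathrm{ARI} = \frac{\sum_{i,j}\binom{n_{ij}}{2} - \left[\sum_i\binom{n_{i\cdot}}{2}\sum_j\binom{n_{\cdot j}}{2}\right]\Big/\binom{n}{2}}{\frac{1}{2}\left[\sum_i\binom{n_{i\cdot}}{2}+\sum_j\binom{n_{\cdot j}}{2}\right] - \left[\sum_i\binom{n_{i\cdot}}{2}\sum_j\binom{n_{\cdot j}}{2}\right]\Big/\binom{n}{2}}
$$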
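The contingency-table formula translates almost term by term into plain Python, assuming nothing beyond the standard library. The pieces are:

- the sum of C(n_ij, 2) over all cells;
- the "expected" term, which is the product of the row and column sums of C(·, 2) divided by C(n, 2);
- the "max" term, which is the mean of those two sums.

The function name ari_from_contingency is hypothetical.

```python
from collections import Counter
from math import comb


def ari_from_contingency(labels_true, labels_pred):
    """ARI via the contingency-table formula (no special-case handling)."""
    n = len(labels_true)
    n_ij = Counter(zip(labels_true, labels_pred))  # cell counts
    row_sums = Counter(labels_true)                # n_i. : size of each class
    col_sums = Counter(labels_pred)                # n_.j : size of each cluster

    index = sum(comb(v, 2) for v in n_ij.values())
    sum_rows = sum(comb(v, 2) for v in row_sums.values())
    sum_cols = sum(comb(v, 2) for v in col_sums.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    return (index - expected) / (max_index - expected)


print(ari_from_contingency([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # about 0.2424 (8/33)
```

The sketch does not handle inputs where the denominator vanishes, for example when both labelings put every instance in a single cluster. The scikit-learn code quoted in this article special-cases such inputs and returns 1.0.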

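Here the maximum max(RI) has been replaced by a mean: the average of the row term and the column term, $\frac{1}{2}\left[\sum_i\binom{n_{i\cdot}}{2}+\sum_j\binom{n_{\cdot j}}{2}\right]$, plays the role of the maximum index.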
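The contingency formula can be matched to the general form (Index − Expected)/(Max − Expected) as follows. The notation is n_ij for cell counts, n_i· for row sums and n_·j for column sums of the contingency table.

Using the raw pair count Σ C(n_ij, 2) instead of RI does not change the ratio. With the marginals fixed, b = C(n,2) − Σ_i C(n_i·,2) − Σ_j C(n_·j,2) + a. RI is therefore an increasing affine function of a = Σ C(n_ij, 2), and the ratio of differences is unchanged.

The "Max Index" is an upper bound rather than an attainable maximum. Every pair that shares a cell also shares a row and a column, since C(x,2) + C(y,2) ≤ C(x+y,2).

$$
\begin{aligned}
\text{Index} &= \sum_{i,j}\binom{n_{ij}}{2},\\
\text{Expected Index} &= \frac{\sum_i\binom{n_{i\cdot}}{2}\,\sum_j\binom{n_{\cdot j}}{2}}{\binom{n}{2}},\\
\text{Max Index} &= \frac{1}{2}\left[\sum_i\binom{n_{i\cdot}}{2}+\sum_j\binom{n_{\cdot j}}{2}\right]
\;\ge\; \min\!\left(\sum_i\binom{n_{i\cdot}}{2},\,\sum_j\binom{n_{\cdot j}}{2}\right)
\;\ge\; \text{Index},\\
\text{ARI} &= \frac{\text{Index}-\text{Expected Index}}{\text{Max Index}-\text{Expected Index}}.
\end{aligned}
$$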

Let's work through an example.

Example: the true class partition is labels_true = [0, 0, 0, 1, 1, 1] and the clustering is labels_pred = [0, 0, 1, 1, 2, 2]. Compute the ARI.

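The two partitions, numbering the six instances x1 to x6 in order:

  True classes C:  {x1, x2, x3}, {x4, x5, x6}
  Clusters K:      {x1, x2}, {x3, x4}, {x5, x6}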

The contingency table:

              K=0   K=1   K=2  | row sum
  C=0          2     1     0   |   3
  C=1          0     1     2   |   3
  col sum      2     2     2   |   6
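Plugging this table (row sums 3, 3; column sums 2, 2, 2; n = 6) into the formula gives the result by hand. Cells holding 0 or 1 instance contribute C(0,2) = C(1,2) = 0.

$$
\begin{aligned}
\sum_{i,j}\binom{n_{ij}}{2} &= \binom{2}{2}+\binom{2}{2} = 2,\\
\sum_i\binom{n_{i\cdot}}{2} &= \binom{3}{2}+\binom{3}{2} = 6,\qquad
\sum_j\binom{n_{\cdot j}}{2} = 3\binom{2}{2} = 3,\qquad
\binom{6}{2} = 15,\\
\text{Expected Index} &= \frac{6\times 3}{15} = 1.2,\qquad
\text{Max Index} = \frac{6+3}{2} = 4.5,\\
\text{ARI} &= \frac{2-1.2}{4.5-1.2} = \frac{0.8}{3.3} = \frac{8}{33} \approx 0.2424.
\end{aligned}
$$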

Now let's see how scikit-learn computes it: the adjusted_rand_score function in the file https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/cluster/_supervised.py.

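```python
# Body of adjusted_rand_score from the file above (an older scikit-learn release),
# wrapped in a function with the imports it needs so it runs on its own.
import numpy as np
from scipy.special import comb
from sklearn.metrics.cluster import contingency_matrix
from sklearn.metrics.cluster._supervised import check_clusterings  # private helper


def _comb2(n):
    # Number of unordered pairs among n items, C(n, 2)
    return comb(n, 2, exact=True)


def adjusted_rand_score(labels_true, labels_pred):
    labels_true, labels_pred = check_clusterings(labels_true, labels_pred)
    n_samples = labels_true.shape[0]
    n_classes = np.unique(labels_true).shape[0]
    n_clusters = np.unique(labels_pred).shape[0]

    # Special limit cases: no clustering since the data is not split;
    # or trivial clustering where each document is assigned a unique cluster.
    # These are perfect matches hence return 1.0.
    if (n_classes == n_clusters == 1 or
            n_classes == n_clusters == 0 or
            n_classes == n_clusters == n_samples):
        return 1.0

    # Compute the ARI using the contingency data
    contingency = contingency_matrix(labels_true, labels_pred, sparse=True)
    sum_comb_c = sum(_comb2(n_c) for n_c in np.ravel(contingency.sum(axis=1)))
    sum_comb_k = sum(_comb2(n_k) for n_k in np.ravel(contingency.sum(axis=0)))
    sum_comb = sum(_comb2(n_ij) for n_ij in contingency.data)

    prod_comb = (sum_comb_c * sum_comb_k) / _comb2(n_samples)
    mean_comb = (sum_comb_k + sum_comb_c) / 2.
    return (sum_comb - prod_comb) / (mean_comb - prod_comb)
```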
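The same number can also be obtained from the four pair counts behind the Rand Index. These are a (same class, same cluster), b (different class, different cluster), c (same class, different cluster) and d (different class, same cluster).

The closed form ARI = 2(ab − cd) / ((a + c)(c + b) + (a + d)(d + b)) is algebraically equivalent to the contingency formula. As far as I know, recent scikit-learn releases compute adjusted_rand_score this way via pair_confusion_matrix. The code below is a standalone illustration, not the library's implementation, and the function name is hypothetical.

```python
from itertools import combinations


def ari_from_pair_counts(labels_true, labels_pred):
    """ARI from the four pair counts a, b, c, d (examines all pairs)."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_c = labels_true[i] == labels_true[j]
        same_k = labels_pred[i] == labels_pred[j]
        if same_c and same_k:
            a += 1  # same class, same cluster
        elif not same_c and not same_k:
            b += 1  # different class, different cluster
        elif same_c:
            c += 1  # same class, split into different clusters
        else:
            d += 1  # different classes merged into one cluster
    return 2 * (a * b - c * d) / ((a + c) * (c + b) + (a + d) * (d + b))


# a=2, b=8, c=4, d=1 for the example: 24/99 = 8/33, about 0.2424
print(ari_from_pair_counts([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))
```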

Let's run it and look at the result:

```python
# coding:utf-8
"""
Test the ARI clustering evaluation metric
"""
from sklearn import metrics

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]
print(metrics.adjusted_rand_score(labels_true, labels_pred))
```

Output:

0.24242424242424246
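A few further properties can be checked with a small standalone ARI function. It uses the contingency formula again, written without scikit-learn so it runs on its own; the name ari is hypothetical. The checks show that:

- renaming the clusters does not change the score;
- swapping the two arguments does not change it either;
- a strongly discordant pair of partitions scores below 0.

```python
from collections import Counter
from math import comb


def ari(labels_true, labels_pred):
    """Contingency-table ARI; assumes the denominator is non-zero."""
    n = len(labels_true)
    index = sum(comb(v, 2) for v in Counter(zip(labels_true, labels_pred)).values())
    rows = sum(comb(v, 2) for v in Counter(labels_true).values())
    cols = sum(comb(v, 2) for v in Counter(labels_pred).values())
    expected = rows * cols / comb(n, 2)
    return (index - expected) / ((rows + cols) / 2 - expected)


y_true, y_pred = [0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]
relabeled = [{0: 2, 1: 0, 2: 1}[k] for k in y_pred]  # same clusters, new names

print(ari(y_true, y_pred) == ari(y_true, relabeled))  # True: permutation invariant
print(ari(y_true, y_pred) == ari(y_pred, y_true))     # True: symmetric
print(round(ari([0, 0, 1, 1], [0, 1, 0, 1]), 12))     # -0.5: worse than chance
```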

