
Python clustering: using Python + sklearn to show how random label assignment affects clustering metrics in clustering performance evaluation

Published: 2023/11/27


Note: the full example code can be downloaded from https://urlify.cn/3iAzUr, or the example can be run in the browser via Binder.

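The figures produced by this example show how the number of clusters and the number of samples affect several clustering performance evaluation metrics. Non-adjusted measures such as the V-measure depend on the relationship between the number of clusters and the number of samples: the mean V-measure of random labelings increases significantly as the number of clusters approaches the total number of samples used to compute the measure. Measures that are adjusted for chance, such as ARI, show only random variations centered around a mean score of 0.0, for any number of samples and clusters. Consequently, only adjusted measures can safely be used as a consensus index to evaluate the average stability of a clustering algorithm, for a given value of k, on various overlapping sub-samples of the dataset.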
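The claim above can be checked numerically without any plotting. The following is a minimal sketch, not part of the original example: the helper `mean_random_score`, the `results` dictionary, and the choice of 20 runs and k = 2 versus k = 50 are illustrative assumptions. It scores pairs of independent random labelings of 100 samples with a non-adjusted measure (V-measure) and an adjusted one (ARI).

```python
import numpy as np
from sklearn import metrics

rng = np.random.RandomState(0)
n_samples = 100


def mean_random_score(score_func, n_clusters, n_runs=20):
    """Average score over pairs of independent random labelings (hypothetical helper)."""
    scores = []
    for _ in range(n_runs):
        labels_a = rng.randint(0, n_clusters, size=n_samples)
        labels_b = rng.randint(0, n_clusters, size=n_samples)
        scores.append(score_func(labels_a, labels_b))
    return float(np.mean(scores))


# results[k] = (mean V-measure, mean ARI) for random labelings with k clusters
results = {}
for k in (2, 50):
    v = mean_random_score(metrics.v_measure_score, k)
    ari = mean_random_score(metrics.adjusted_rand_score, k)
    results[k] = (v, ari)
    print(f"k={k:3d}  V-measure={v:.3f}  ARI={ari:.3f}")
```

Since the two labelings are independent, any score far from zero reflects only chance agreement. The V-measure should rise sharply when k is a large fraction of the sample count, while ARI should stay close to 0.0 in both cases.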

Output:

Computing adjusted_rand_score for 10 values of n_clusters and n_samples=100
done in 0.050s
Computing v_measure_score for 10 values of n_clusters and n_samples=100
done in 0.068s
Computing ami_score for 10 values of n_clusters and n_samples=100
done in 0.356s
Computing mutual_info_score for 10 values of n_clusters and n_samples=100
done in 0.044s
Computing adjusted_rand_score for 10 values of n_clusters and n_samples=1000
done in 0.051s
Computing v_measure_score for 10 values of n_clusters and n_samples=1000
done in 0.064s
Computing ami_score for 10 values of n_clusters and n_samples=1000
done in 0.208s
Computing mutual_info_score for 10 values of n_clusters and n_samples=1000
done in 0.048s

print(__doc__)

# Author: Olivier Grisel
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt
from time import time
from sklearn import metrics


def uniform_labelings_scores(score_func, n_samples, n_clusters_range,
                             fixed_n_classes=None, n_runs=5, seed=42):
    """Compute the score for 2 random uniform cluster labelings.

    Both random labelings have the same number of clusters for each
    possible value in ``n_clusters_range``.

    When fixed_n_classes is not None, the first labeling is considered a
    ground truth class assignment with a fixed number of classes.
    """
    random_labels = np.random.RandomState(seed).randint
    scores = np.zeros((len(n_clusters_range), n_runs))

    if fixed_n_classes is not None:
        # Draw the reference labeling once and reuse it for every run
        labels_a = random_labels(low=0, high=fixed_n_classes, size=n_samples)

    for i, k in enumerate(n_clusters_range):
        for j in range(n_runs):
            if fixed_n_classes is None:
                labels_a = random_labels(low=0, high=k, size=n_samples)
            labels_b = random_labels(low=0, high=k, size=n_samples)
            scores[i, j] = score_func(labels_a, labels_b)
    return scores


def ami_score(U, V):
    return metrics.adjusted_mutual_info_score(U, V)


score_funcs = [
    metrics.adjusted_rand_score,
    metrics.v_measure_score,
    ami_score,
    metrics.mutual_info_score,
]

# 2 independent random clusterings with an equal number of clusters

n_samples = 100
# np.int was removed in NumPy 1.24; the builtin int is equivalent here
n_clusters_range = np.linspace(2, n_samples, 10).astype(int)

plt.figure(1)

plots = []
names = []
for score_func in score_funcs:
    print("Computing %s for %d values of n_clusters and n_samples=%d"
          % (score_func.__name__, len(n_clusters_range), n_samples))

    t0 = time()
    scores = uniform_labelings_scores(score_func, n_samples, n_clusters_range)
    print("done in %0.3fs" % (time() - t0))
    plots.append(plt.errorbar(
        n_clusters_range, np.median(scores, axis=1), scores.std(axis=1))[0])
    names.append(score_func.__name__)

plt.title("Clustering measures for 2 random uniform labelings\n"
          "with equal number of clusters")
plt.xlabel('Number of clusters (Number of samples is fixed to %d)' % n_samples)
plt.ylabel('Score value')
plt.legend(plots, names)
plt.ylim(bottom=-0.05, top=1.05)

# Random labelings with a varying n_clusters, scored against ground truth
# class labels with a fixed number of classes

n_samples = 1000
n_clusters_range = np.linspace(2, 100, 10).astype(int)
n_classes = 10

plt.figure(2)

plots = []
names = []
for score_func in score_funcs:
    print("Computing %s for %d values of n_clusters and n_samples=%d"
          % (score_func.__name__, len(n_clusters_range), n_samples))

    t0 = time()
    scores = uniform_labelings_scores(score_func, n_samples, n_clusters_range,
                                      fixed_n_classes=n_classes)
    print("done in %0.3fs" % (time() - t0))
    plots.append(plt.errorbar(
        n_clusters_range, scores.mean(axis=1), scores.std(axis=1))[0])
    names.append(score_func.__name__)

plt.title("Clustering measures for random uniform labeling\n"
          "against reference assignment with %d classes" % n_classes)
plt.xlabel('Number of clusters (Number of samples is fixed to %d)' % n_samples)
plt.ylabel('Score value')
plt.ylim(bottom=-0.05, top=1.05)
plt.legend(plots, names)
plt.show()

Total running time of the script: (0 minutes 1.225 seconds)
Estimated memory usage: 8 MB
Source files: plot_adjusted_for_chance_measures.py (Python source) and plot_adjusted_for_chance_measures.ipynb (Jupyter notebook); gallery generated by Sphinx-Gallery.
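The introduction mentions using an adjusted measure as a consensus index over overlapping sub-samples, but the example script does not show that use. The sketch below is one possible way to do it, under stated assumptions: `consensus_index`, its parameters `n_pairs` and `subsample`, the use of `KMeans`, and the synthetic `make_blobs` data with hand-picked centers are all illustrative choices that do not come from the original example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score


def consensus_index(X, k, n_pairs=5, subsample=0.8, seed=0):
    """Mean ARI between clusterings of overlapping sub-samples (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    n = X.shape[0]
    size = int(subsample * n)
    scores = []
    for _ in range(n_pairs):
        # Two sub-samples drawn without replacement; they overlap by chance
        idx_a = rng.choice(n, size=size, replace=False)
        idx_b = rng.choice(n, size=size, replace=False)
        labels_a = np.full(n, -1)
        labels_b = np.full(n, -1)
        labels_a[idx_a] = KMeans(
            n_clusters=k, n_init=10, random_state=seed).fit_predict(X[idx_a])
        labels_b[idx_b] = KMeans(
            n_clusters=k, n_init=10, random_state=seed).fit_predict(X[idx_b])
        # Compare the two clusterings only on points present in both sub-samples
        common = np.intersect1d(idx_a, idx_b)
        scores.append(adjusted_rand_score(labels_a[common], labels_b[common]))
    return float(np.mean(scores))


# Synthetic data with 3 well-separated groups (assumed for illustration)
X, _ = make_blobs(n_samples=300, centers=[[-5, -5], [0, 5], [5, -5]],
                  cluster_std=0.6, random_state=0)

stability = {k: consensus_index(X, k) for k in (2, 3, 4, 5)}
for k, score in stability.items():
    print(f"k={k}  mean ARI across sub-sample pairs={score:.3f}")
```

Because ARI is centered on 0.0 for random labelings whatever the value of k, the scores for different k are comparable. A non-adjusted measure would tend to favor larger k simply because chance agreement grows with the number of clusters.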
