Nonlinear Dimensionality Reduction with Isomap Embedding

Published: 2023/12/8

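Isomap (isometric feature mapping) is an efficient nonlinear dimensionality reduction technique. One notable advantage is that only two parameters need to be set: the neighborhood size and the embedding dimension.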

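In this article we discuss the following questions:

Which category of machine learning techniques does Isomap belong to?

How does Isomap work? I explain it through an intuitive example rather than complicated math.

How can Isomap be used to reduce the dimensionality of data?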

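Isomap within the family of machine learning algorithms

There are so many machine learning algorithms that it may never be possible to collect and categorize them all. Still, I have tried to do so for some of the most commonly used ones, which you can find in the sunburst chart below.

The chart is large and may be hard to read, so to emphasize the key point: Isomap is an unsupervised machine learning technique aimed at dimensionality reduction.

It differs from some other techniques in the same category in that it uses a nonlinear approach to dimensionality reduction rather than the linear mapping used by algorithms such as PCA. In the next section we will see how linear and nonlinear methods differ.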

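How does isometric mapping (Isomap) work?

Isomap is a technique that combines several different algorithms, which lets it reduce dimensionality in a nonlinear way while preserving the local neighborhood structure of the data.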

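Before we look at an Isomap example and compare it with the linear approach of principal component analysis (PCA), let us list the steps Isomap performs:

Use the KNN approach to find the k nearest neighbors of every data point. Here, "k" is the number of neighbors, which can be specified as a model hyperparameter.

Once the neighbors are found, build the neighborhood graph: points that are neighbors of each other are connected by an edge, and points that are not neighbors stay unconnected.

Compute the shortest path between every pair of data points (nodes). The Floyd-Warshall or Dijkstra algorithm is typically used for this task. Note that this step is also commonly described as finding the geodesic distance between points.

Use multidimensional scaling (MDS) to compute the lower-dimensional embedding. Given the distance between each pair of points, MDS places each object in an N-dimensional space (N is specified as a hyperparameter) so that the distances between points are preserved as well as possible.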
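The four steps above can be sketched from scratch with NumPy and SciPy. This is only a minimal illustration under a few assumptions: the function name `isomap_sketch` and the toy arc data are made up for this example, distances are computed by brute force (so it only suits small datasets), duplicate points are not handled, and classical MDS is used for the last step. It is not how scikit-learn implements Isomap internally.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path


def isomap_sketch(X, n_neighbors=5, n_components=2):
    """Toy Isomap (hypothetical helper): kNN graph -> shortest paths -> classical MDS."""
    n = X.shape[0]

    # Step 1: pairwise Euclidean distances, then the k nearest neighbors of each point.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Column 0 of the argsort is the point itself (distance 0), so skip it.
    knn_idx = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]

    # Step 2: neighborhood graph as a dense matrix; a zero entry means "no edge".
    graph = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = knn_idx.ravel()
    graph[rows, cols] = dist[rows, cols]

    # Step 3: geodesic distances = shortest paths through the graph (Dijkstra).
    geodesic = shortest_path(graph, method="D", directed=False)
    if np.isinf(geodesic).any():
        raise ValueError("Neighborhood graph is disconnected; try a larger n_neighbors.")

    # Step 4: classical MDS on the geodesic distances.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (geodesic ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


# Toy data: 50 points on three quarters of a unit circle, embedded into 1D.
theta = np.linspace(0.0, 1.5 * np.pi, 50)
arc = np.column_stack([np.cos(theta), np.sin(theta)])
embedding = isomap_sketch(arc, n_neighbors=3, n_components=1)
print(embedding.shape)  # (50, 1)
```

On this toy arc, the single embedding coordinate should follow the position along the curve, which is the "unrolling" behavior described in the rest of this section.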

For our example, let us create a 3D object known as a Swiss roll. The object is made up of 2,000 individual data points.
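One way to generate such an object is scikit-learn's `make_swiss_roll`. The article does not show how its Swiss roll was created, so the noise level and `random_state` below, as well as the variable names, are assumptions chosen for illustration.

```python
from sklearn.datasets import make_swiss_roll

# X_roll holds the 3D coordinates; t_roll is each point's position along the roll.
# noise and random_state are illustrative choices, not values taken from the article.
X_roll, t_roll = make_swiss_roll(n_samples=2000, noise=0.0, random_state=42)

print(X_roll.shape)  # (2000, 3)
print(t_roll.shape)  # (2000,)
```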

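Next, we want to use Isomap to map this 3-dimensional Swiss roll to 2 dimensions. To track what happens during the transformation, let us pick two points: A and B.

We can see that these two points are relatively close to each other in 3D space. If we used a linear dimensionality reduction method such as PCA, the Euclidean distance between the two points would stay roughly similar in the lower dimension. See the PCA transformation chart below:

Note that the shape of the 2D object produced by PCA looks like a picture of the same 3D object taken from a particular angle. This is a characteristic of linear transformations.

Meanwhile, a nonlinear method such as Isomap gives us a very different result. We can describe this transformation as unrolling the Swiss roll and laying it flat on a 2D surface:

We can see that the distance between points A and B in 2D space is based on the geodesic distance computed through the neighborhood connections.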
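The contrast between the two methods can be checked numerically. The sketch below is an illustration under assumptions: the way A and B are chosen (same height, one full turn of the roll apart), `n_neighbors=10`, and `random_state=0` are choices made for this example and are not taken from the article.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

X_roll, t = make_swiss_roll(n_samples=2000, random_state=0)

# Pick A and B at a similar height but roughly one full turn apart along the roll,
# so they sit on adjacent layers: close in 3D, far apart along the surface.
height = X_roll[:, 1]
a = int(np.argmin((t - 2 * np.pi) ** 2 + (height - 10.5) ** 2))
b = int(np.argmin((t - 4 * np.pi) ** 2 + (height - 10.5) ** 2))

X_pca = PCA(n_components=2).fit_transform(X_roll)
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X_roll)


def distance(points, i, j):
    """Euclidean distance between rows i and j of a coordinate array."""
    return float(np.linalg.norm(points[i] - points[j]))


d_3d = distance(X_roll, a, b)   # straight-line distance in the original space
d_pca = distance(X_pca, a, b)   # distance after the linear projection
d_iso = distance(X_iso, a, b)   # distance after unrolling with Isomap

print(f"3D: {d_3d:.2f}  PCA 2D: {d_pca:.2f}  Isomap 2D: {d_iso:.2f}")
```

Because PCA is an orthogonal projection, it can never make the two points farther apart than they were in 3D, whereas the Isomap distance should be several times larger since it reflects the path along the surface of the roll.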

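This is what lets Isomap perform nonlinear dimensionality reduction: it measures distances along chains of local neighborhood connections (geodesic distances) rather than along straight lines through the original space.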

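How do we use Isomap?

Now let us use Isomap to reduce the high dimensionality of the images in scikit-learn's handwritten digits dataset (a small MNIST-style collection of 8x8 images). This will let us see how the different digits cluster together in 3D space.

Setup
We will use the following data and libraries:

Scikit-learn
Plotly and Matplotlib
Pandas

Let us import the libraries.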

import pandas as pd  # for data manipulation

# Visualization
import plotly.express as px  # for data visualization
import matplotlib.pyplot as plt  # for showing handwritten digits

# Scikit-learn
from sklearn.datasets import load_digits  # for the handwritten digits data
from sklearn.manifold import Isomap  # for Isomap dimensionality reduction

Next, we load the digits data.

# Load digits data
digits = load_digits()

# Load arrays containing digit data (64 pixels per image) and their true labels
X, y = load_digits(return_X_y=True)

# Some stats
print('Shape of digit images: ', digits.images.shape)
print('Shape of X (training data): ', X.shape)
print('Shape of y (true labels): ', y.shape)

Let us display the first 10 handwritten digits to get a better idea of what we are working with.

# Display images of the first 10 digits
fig, axs = plt.subplots(2, 5, sharey=False, tight_layout=True, figsize=(12, 6), facecolor='white')
n = 0
plt.gray()
for i in range(0, 2):
    for j in range(0, 5):
        axs[i, j].matshow(digits.images[n])
        axs[i, j].set(title=y[n])
        n = n + 1
plt.show()

We will now apply Isomap to reduce the dimensionality of each record in the X array from 64 to 3.

### Step 1 - Configure the Isomap function (most hyperparameters are left at their default values)
embed3 = Isomap(
    n_neighbors=5,  # default=5, algorithm finds local structures based on the nearest neighbors
    n_components=3,  # number of dimensions (default=2)
    eigen_solver='auto',  # {'auto', 'arpack', 'dense'}, default='auto'
    tol=0,  # default=0, convergence tolerance passed to arpack or lobpcg; not used if eigen_solver == 'dense'
    max_iter=None,  # default=None, maximum number of iterations for the arpack solver; not used if eigen_solver == 'dense'
    path_method='auto',  # {'auto', 'FW', 'D'}, default='auto', method to use in finding the shortest path
    neighbors_algorithm='auto',  # {'auto', 'brute', 'kd_tree', 'ball_tree'}, default='auto'
    n_jobs=-1,  # int or None, default=None, number of parallel jobs to run; -1 means using all processors
    metric='minkowski',  # string or callable, default='minkowski'
    p=2,  # default=2, parameter for the Minkowski metric; p=1 is manhattan_distance (l1), p=2 is euclidean_distance (l2)
    metric_params=None,  # default=None, additional keyword arguments for the metric function
)

### Step 2 - Fit the data and transform it, so we have 3 dimensions instead of 64
X_trans3 = embed3.fit_transform(X)

### Step 3 - Print shape to test
print('The new shape of X: ', X_trans3.shape)

Finally, let us draw a 3D scatter plot to see what the data looks like after reducing it to 3 dimensions.

# Create a 3D scatter plot
fig = px.scatter_3d(
    None,
    x=X_trans3[:, 0], y=X_trans3[:, 1], z=X_trans3[:, 2],
    color=y.astype(str),
    height=900, width=900,
)

# Update chart looks
fig.update_layout(
    # title_text="Scatter 3D Plot",
    showlegend=True,
    legend=dict(orientation="h", yanchor="top", y=0, xanchor="center", x=0.5),
    scene_camera=dict(
        up=dict(x=0, y=0, z=1),
        center=dict(x=0, y=0, z=-0.2),
        eye=dict(x=-1.5, y=1.5, z=0.5),
    ),
    margin=dict(l=0, r=0, b=0, t=0),
    scene=dict(
        xaxis=dict(backgroundcolor='white', color='black', gridcolor='#f0f0f0',
                   title_font=dict(size=10), tickfont=dict(size=10)),
        yaxis=dict(backgroundcolor='white', color='black', gridcolor='#f0f0f0',
                   title_font=dict(size=10), tickfont=dict(size=10)),
        zaxis=dict(backgroundcolor='lightgrey', color='black', gridcolor='#f0f0f0',
                   title_font=dict(size=10), tickfont=dict(size=10)),
    ),
)

# Update marker size
fig.update_traces(marker=dict(size=2))

fig.show()

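Isomap did a good job of reducing the dimensionality from 64 to 3 while preserving nonlinear relationships. This lets us visualize the clusters of handwritten digits in 3-dimensional space.

As a next machine learning step, we could now easily use a classification model such as a decision tree, SVM, or KNN to predict the label of each handwritten digit.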
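As a rough sketch of that next step, the embedding and a classifier can be chained in a scikit-learn pipeline. The train/test split, the choice of a KNN classifier, and the hyperparameter values here are assumptions made for illustration; the article does not train a classifier itself. Fitting Isomap only on the training portion and then transforming the held-out digits avoids leaking test data into the embedding.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)

# Hold out a quarter of the digits to evaluate on unseen data (illustrative split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Isomap reduces 64 pixels to 3 features; KNN then classifies in the 3D embedding.
model = make_pipeline(
    Isomap(n_neighbors=10, n_components=3),
    KNeighborsClassifier(n_neighbors=5),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy on the 3D Isomap embedding: {accuracy:.3f}")
```

Three components are convenient for plotting but discard a lot of information, so for classification it may be worth treating `n_components` and `n_neighbors` as hyperparameters to tune.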

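Summary

Isomap is a strong choice for dimensionality reduction when we want to preserve nonlinear relationships between data points.

We have seen how the Isomap algorithm can be used in practice on handwritten digit data. Similarly, you could use Isomap as part of an NLP (natural language processing) analysis to reduce the high dimensionality of text data before training a classification model.

I hope this article has given you an easy introduction to how Isomap works and the advantages it offers in data science projects.

If you have any questions or suggestions, please feel free to get in touch.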

Author: Saul Dobilas
