[Repost] [Source code] Uniform Manifold Approximation and Projection (UMAP) algorithm simulation - Liu Chunjing's ScienceNet blog post...
The UMAP algorithm is the invention of Leland McInnes, John Healy, and James Melville.
See their original paper for a long-form description (https://arxiv.org/pdf/1802.03426.pdf).
Also see the documentation for the original Python implementation (https://umap-learn.readthedocs.io/en/latest/index.html).
Given a set of high-dimensional data, run_umap.m produces a lower-dimensional representation of the data for purposes of data visualization and exploration.
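As orientation for what happens after the nearest-neighbour search, the UMAP paper turns each point's k nearest-neighbour distances into fuzzy membership strengths: rho is the distance to the nearest neighbour, and sigma is found by binary search so that the memberships sum to log2(k). The Python sketch below illustrates that step and the symmetrization a + b - a·b. The function names are hypothetical and are not taken from run_umap.m or umap-learn; the sketch assumes the distances are sorted in ascending order, exclude the point itself, and that k ≥ 2.

```python
import math

def fit_rho_sigma(dists, n_iter=64, tol=1e-5):
    """Return (rho, sigma) for one point's sorted k-NN distances (k >= 2).

    rho is the distance to the nearest neighbour; sigma is found by binary
    search so that sum(exp(-(d - rho) / sigma)) equals log2(k).
    """
    target = math.log2(len(dists))
    rho = dists[0]
    lo, hi, sigma = 0.0, math.inf, 1.0
    for _ in range(n_iter):
        total = sum(math.exp(-max(d - rho, 0.0) / sigma) for d in dists)
        if abs(total - target) < tol:
            break
        if total > target:
            # Memberships too large: sigma must shrink.
            hi = sigma
            sigma = (lo + hi) / 2
        else:
            # Memberships too small: grow sigma (double until bracketed).
            lo = sigma
            sigma = sigma * 2 if hi == math.inf else (lo + hi) / 2
    return rho, sigma

def membership_strengths(dists, rho, sigma):
    """Directed fuzzy membership of each neighbour, in [0, 1]."""
    return [math.exp(-max(d - rho, 0.0) / sigma) for d in dists]

def fuzzy_union(w_ij, w_ji):
    """Symmetrize two directed weights with the probabilistic t-conorm."""
    return w_ij + w_ji - w_ij * w_ji
```

For example, with neighbour distances [1.0, 2.0, 3.0, 4.0] (k = 4), the fitted sigma makes the four memberships sum to approximately log2(4) = 2.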
This MATLAB implementation follows a very similar structure to the Python implementation, and many of the function descriptions are nearly identical.
Here are some major differences in this MATLAB implementation:
1) All nearest-neighbour searches are performed through the built-in MATLAB function knnsearch.m. The original Python implementation uses random projection trees and nearest-neighbour descent to approximate nearest neighbours of data points. The function knnsearch.m either uses an exhaustive approach or k-d trees, both of which are slow for high-dimensional data. As such, this implementation is slower in the case of large, high-dimensional data sets.
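For comparison with the approaches above, the following Python sketch shows the exhaustive strategy in its simplest form: every point is compared against every other point, which costs O(n²·d) distance work for n points in d dimensions. It illustrates the technique only and is not how knnsearch.m is implemented internally; Euclidean distance is assumed.

```python
import heapq
import math

def knn_exhaustive(points, k):
    """For each point, return (neighbour indices, distances) of its k nearest
    neighbours, excluding itself, by comparing against every other point."""
    result = []
    for i, p in enumerate(points):
        # Distance from point i to every other point: O(n * d) per query.
        candidates = (
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        # Keep the k smallest (distance, index) pairs, sorted ascending.
        nearest = heapq.nsmallest(k, candidates)
        result.append(([j for _, j in nearest], [d for d, _ in nearest]))
    return result
```

For instance, on the points (0, 0), (1, 0), (0, 2) and (5, 5) with k = 2, the first point's neighbours are indices [1, 2] at distances [1.0, 2.0]. Tree-based or approximate methods exist precisely to avoid this all-pairs comparison.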
2) The MATLAB function eigs.m does not appear to be as fast as the function "eigsh" in the Python package SciPy. For large data sets, we initialize a low-dimensional transform by binning the data using an algorithm known as probability binning. If the user downloads and installs the function lobpcg.m, made available here (https://www.mathworks.com/matlabcentral/fileexchange/48-locally-optimal-block-preconditioned-conjugate-gradient) by Andrew Knyazev, it can be used to find exact eigenvectors for medium-sized data sets.
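Probability binning, in its commonly described form, recursively splits the data at the median of the dimension with the largest variance until every bin holds at most a chosen number of points, so that bins contain roughly equal counts. The Python sketch below shows only this splitting idea. The threshold name `max_bin_size` is hypothetical, and how run_umap.m turns the bins into an initial low-dimensional layout is not described in this post, so the sketch stops at producing the bins.

```python
import statistics

def probability_bins(points, max_bin_size):
    """Recursively split points into bins of at most max_bin_size points.

    Each split uses the dimension with the largest variance and divides the
    points at the median along that dimension, so the two halves have
    (nearly) equal counts. Returns a list of bins (lists of point indices).
    """
    bins = []

    def split(indices):
        if len(indices) <= max_bin_size:
            bins.append(indices)
            return
        dims = len(points[indices[0]])
        # Pick the dimension along which the points in this bin vary most.
        axis = max(
            range(dims),
            key=lambda a: statistics.pvariance([points[i][a] for i in indices]),
        )
        # Sort by that coordinate and cut at the median position.
        ordered = sorted(indices, key=lambda i: points[i][axis])
        mid = len(ordered) // 2
        split(ordered[:mid])
        split(ordered[mid:])

    split(list(range(len(points))))
    return bins
```

Splitting at the median rather than at the midpoint of the range is what keeps bin counts balanced even when the data are highly skewed.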
3) In most cases, we call Java code to perform stochastic gradient descent. However, if the data is being reduced to a dimension other than 2, then stochastic gradient descent is performed natively in MATLAB, which tends to be much slower.
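The optimization step pulls together points connected in the fuzzy graph and pushes each point away from a few randomly sampled points, using the low-dimensional similarity 1 / (1 + a·d^(2b)). The Python sketch below shows one such epoch; nothing in the update depends on the output dimension, which is why the same routine can serve 2 or more dimensions. It is not the Java or MATLAB code mentioned above. Assumptions: edges are taken as already sampled in proportion to their weights, the parameters a and b are supplied by the caller (UMAP fits them from min_dist and spread), and gradients are clipped to [-4, 4] as in the reference Python implementation.

```python
import random

def clip(v, limit=4.0):
    """Clamp a gradient component to [-limit, limit]."""
    return max(-limit, min(limit, v))

def sgd_epoch(embedding, edges, a, b, lr=1.0, n_negative=5, rng=random):
    """One epoch of UMAP-style SGD on coordinate lists of any dimension.

    embedding: list of mutable coordinate lists, modified in place.
    edges: (i, j) pairs from the fuzzy graph, assumed pre-sampled by weight.
    """
    n = len(embedding)
    for i, j in edges:
        yi, yj = embedding[i], embedding[j]
        d2 = sum((p - q) ** 2 for p, q in zip(yi, yj))
        # Attractive step: move i and j towards each other.
        if d2 > 0.0:
            coeff = -2.0 * a * b * d2 ** (b - 1.0) / (1.0 + a * d2 ** b)
            for dim in range(len(yi)):
                g = clip(coeff * (yi[dim] - yj[dim]))
                yi[dim] += lr * g
                yj[dim] -= lr * g
        # Repulsive steps: push i away from randomly chosen points only.
        for _ in range(n_negative):
            k = rng.randrange(n)
            if k == i:
                continue
            yk = embedding[k]
            d2 = sum((p - q) ** 2 for p, q in zip(yi, yk))
            # Coincident points are skipped here for simplicity.
            if d2 > 0.0:
                coeff = 2.0 * b / ((0.001 + d2) * (1.0 + a * d2 ** b))
                for dim in range(len(yi)):
                    yi[dim] += lr * clip(coeff * (yi[dim] - yk[dim]))
```

A typical call would be `sgd_epoch(coords, edges, a=1.577, b=0.895, lr=0.1, rng=random.Random(0))`, where the a and b values are illustrative; repeating the epoch while decreasing `lr` mirrors the annealed learning rate UMAP uses.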
Overall, this MATLAB UMAP implementation is slower than the original Python implementation, which uses Numba to accelerate the calculations.
However, the speed seems acceptable based on tests done so far.
Supervised dimension reduction should be possible with this version of the implementation, though we have not yet included it in the examples.
This implementation is considered a first draft and has not yet been reviewed by the original authors of UMAP.
We hope to make improvements to it in the future, starting with examples of using supervised dimension reduction.
Provided by the Herzenberg Lab at Stanford University.
To repost this article, please contact the original author for permission and credit Liu Chunjing's ScienceNet blog as the source.