Trajectory Clustering Spectral Analysis: Spectral Clustering
In multivariate statistics and data clustering, spectral clustering techniques make use of the spectrum of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions.
The similarity matrix is provided as an input and consists of a quantitative assessment of the relative similarity of each pair of points; these points can be connected by edges. The edge weight between two points that are far apart is low, and the edge weight between two points that are close together is high.
By cutting the graph composed of all the data points so that the sum of the edge weights between subgraphs is as low as possible, while the sum of the edge weights within each subgraph is as high as possible, we achieve the goal of clustering.
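For instance, scikit-learn exposes this graph-cut approach directly through SpectralClustering; below is a minimal sketch (the dataset and parameter values are illustrative choices, not from the original article):

```python
# A minimal sketch: spectral clustering on two concentric circles,
# a case where plain k-means fails but a graph cut succeeds.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_circles

X, _ = make_circles(n_samples=300, factor=0.5, noise=0.05, random_state=0)
model = SpectralClustering(n_clusters=2, affinity="rbf", gamma=10.0, random_state=0)
labels = model.fit_predict(X)
print(np.bincount(labels))  # roughly 150 points per ring
```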
Undirected Weighted Graph
An undirected graph G(V,E) is a set of vertices V that are connected together, where all the edges E are bidirectional.
For any vertex v_i in the graph, its degree d_i is defined as the sum of the weights of all edges connected to it, namely

d_i = Σ_{j=1}^{n} w_{ij}
Using the definition of the degree of each point, we can get an n x n degree matrix D. It is a diagonal matrix: only the main diagonal has values, with the i-th diagonal entry being the degree of the i-th point. It is defined as follows:

D = diag(d_1, d_2, …, d_n)
Using the weight values between all points, we can get the adjacency matrix W of the graph. It is also an n x n matrix, and the j-th value in the i-th row corresponds to the weight w_{ij}.
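As a small NumPy sketch of these two definitions (the 3-point weight matrix is a made-up toy example):

```python
import numpy as np

# Toy symmetric adjacency/weight matrix W for 3 points.
W = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.5],
              [0.1, 0.5, 0.0]])

d = W.sum(axis=1)  # degrees: d_i = sum_j w_ij
D = np.diag(d)     # n x n diagonal degree matrix
print(D)
```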
Usually we are only given the data points themselves; the adjacency matrix is not provided directly.
To construct the adjacency matrix W, there are three types of methods:
- the ε-neighborhood method
- the k-nearest-neighbor method
- the fully connected method
For the ε-neighborhood method, we set a distance threshold ε and then use the squared Euclidean distance between any two points:

s_{ij} = ||x_i - x_j||_2^2
Then, according to s_{ij} and ε, we define the adjacency matrix W as follows:

w_{ij} = ε if s_{ij} ≤ ε, and w_{ij} = 0 otherwise.
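A sketch of this construction (the helper name eps_neighborhood_adjacency is mine, not a library function):

```python
import numpy as np
from scipy.spatial.distance import cdist

def eps_neighborhood_adjacency(X, eps):
    """w_ij = eps if s_ij <= eps, else 0, with no self-loops."""
    S = cdist(X, X, metric="sqeuclidean")  # s_ij = ||x_i - x_j||_2^2
    W = np.where(S <= eps, eps, 0.0)
    np.fill_diagonal(W, 0.0)
    return W
```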
Because every connected pair of points gets the same weight ε, this measure of similarity is very imprecise, so the ε-neighborhood method is rarely used in practice.
The k-nearest-neighbor method uses the KNN algorithm to traverse all sample points, taking the k nearest points of each sample as its neighbors; only the k points closest to a sample receive a positive weight.
However, this method makes the resulting adjacency matrix W asymmetric, while our subsequent algorithm needs a symmetric adjacency matrix. To solve this problem, one of the following two methods is generally adopted (a code sketch follows the list):
- The first method keeps s_{ij} (setting w_{ij} = w_{ji} = s_{ij}) as long as one of the two points is among the k nearest neighbors of the other; otherwise w_{ij} = w_{ji} = 0.
- The second method requires the two points to be among each other's k nearest neighbors in order to keep s_{ij}; otherwise w_{ij} = w_{ji} = 0.
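Both rules reduce to an element-wise max/min once the one-sided k-NN graph is built; a sketch using scikit-learn's kneighbors_graph (the distance-valued weights are an illustrative choice; in practice a kernel is usually applied to them):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.random.RandomState(0).rand(100, 2)
# One-sided k-NN graph: row i has nonzero entries only for the
# k nearest neighbors of x_i, so W_knn is generally asymmetric.
W_knn = kneighbors_graph(X, n_neighbors=5, mode="distance").toarray()

# Method 1 (union): keep the edge if either point is a neighbor of the other.
W_union = np.maximum(W_knn, W_knn.T)

# Method 2 (mutual): keep the edge only if each is a neighbor of the other.
W_mutual = np.minimum(W_knn, W_knn.T)
```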
For the last (fully connected) method, we can choose different kernel functions to define the edge weights. Commonly used are the polynomial kernel, the Gaussian kernel, and the sigmoid kernel. The most commonly used is the Gaussian kernel (RBF), for which the similarity matrix and the adjacency matrix coincide:

w_{ij} = s_{ij} = exp(-||x_i - x_j||_2^2 / (2σ^2))
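A sketch of the fully connected construction; scikit-learn's rbf_kernel uses the parameterization gamma = 1/(2σ²), and the gamma value here is an arbitrary choice:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

X = np.random.RandomState(0).rand(100, 2)
W = rbf_kernel(X, gamma=1.0)  # w_ij = exp(-gamma * ||x_i - x_j||^2)
np.fill_diagonal(W, 0.0)      # conventionally, no self-loops
```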
Laplacian Matrix
The Laplacian matrix is defined as

L = D - W

where D is a diagonal matrix whose diagonal elements are the degrees of the corresponding nodes, and W is the adjacency matrix.
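In code, SciPy provides this directly (a sketch reusing the toy W from above):

```python
import numpy as np
from scipy.sparse.csgraph import laplacian

W = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.5],
              [0.1, 0.5, 0.0]])
L = laplacian(W)  # computes L = D - W

# The same thing by hand:
L_manual = np.diag(W.sum(axis=1)) - W
assert np.allclose(L, L_manual)
```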
The Laplacian matrix has some nice properties, as follows:
1. The Laplacian matrix is a symmetric matrix, and all its eigenvalues are real numbers.
2. For any vector f, we have

f^T L f = (1/2) Σ_{i,j=1}^{n} w_{ij} (f_i - f_j)^2
3. The Laplacian matrix is positive semi-definite, and the corresponding n real eigenvalues are all greater than or equal to 0; that is, 0 ≤ λ_1 ≤ λ_2 ≤ … ≤ λ_n.
Moreover, the smallest eigenvalue is 0, which is easily derived from property 2 by taking f to be the all-ones vector.
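A quick numerical check of properties 2 and 3 on the toy matrix (f is just a random vector):

```python
import numpy as np

W = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.5],
              [0.1, 0.5, 0.0]])
L = np.diag(W.sum(axis=1)) - W

f = np.random.RandomState(0).randn(3)
lhs = f @ L @ f
rhs = 0.5 * sum(W[i, j] * (f[i] - f[j]) ** 2
                for i in range(3) for j in range(3))
assert np.isclose(lhs, rhs)        # property 2

evals = np.linalg.eigvalsh(L)
assert np.all(evals >= -1e-12)     # property 3: positive semi-definite
assert np.isclose(evals[0], 0.0)   # smallest eigenvalue is 0
```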
A very well-known related spectral clustering technique is the normalized cuts algorithm, widely used in image segmentation. It partitions points using the eigenvector corresponding to the second-smallest eigenvalue of the symmetric normalized Laplacian, which is defined as

L^{norm} := D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}
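The same SciPy helper computes the normalized version (a sketch):

```python
import numpy as np
from scipy.sparse.csgraph import laplacian

W = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.5],
              [0.1, 0.5, 0.0]])
L_sym = laplacian(W, normed=True)  # I - D^{-1/2} W D^{-1/2}

# The same thing by hand:
d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
L_manual = np.eye(3) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
assert np.allclose(L_sym, L_manual)
```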
To perform spectral clustering we need 3 main steps:

1. Create a similarity graph between the N objects to cluster and compute its Laplacian matrix L.
2. Compute the first k eigenvectors of L (those corresponding to the k smallest eigenvalues) to define a k-dimensional feature vector for each object.
3. Run k-means on these features to separate the objects into k classes.
```python
from sklearn.cluster import KMeans
```
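Putting the three steps together with the import above, a minimal from-scratch sketch (my own illustration using the unnormalized Laplacian for simplicity; library implementations such as scikit-learn's SpectralClustering differ in details such as normalization):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel

# Step 1: build a similarity graph and its Laplacian.
X, _ = make_circles(n_samples=300, factor=0.5, noise=0.05, random_state=0)
W = rbf_kernel(X, gamma=10.0)
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W

# Step 2: eigenvectors for the k smallest eigenvalues give each
# point a k-dimensional feature vector (one row per point).
k = 2
_, features = eigh(L, subset_by_index=[0, k - 1])

# Step 3: run k-means on these features.
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
print(np.bincount(labels))
```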
提價 (Refrences)
J. Demmel, [1], CS267: Notes for Lecture 23, April 9, 1999, Graph Partitioning, Part 2
J. Demmel, [1] ,CS267:關于第23講的說明,1999年4月9日,圖分區,第2部分
Jump up to:a b c Jianbo Shi and Jitendra Malik, “Normalized Cuts and Image Segmentation”, IEEE Transactions on PAMI, Vol. 22, №8, Aug 2000.
跳至: a b c Jianbo Shi和Jitendra Malik, “歸一化的剪切和圖像分割” ,PAMI上的IEEE Transactions 22,№8,2000年8月。
Marina Meil? & Jianbo Shi, “Learning Segmentation by Random Walks”, Neural Information Processing Systems 13 (NIPS 2000), 2001, pp. 873–879.
MarinaMeil?和Shijianbo,“ 通過隨機游走進行學習細分 ”,神經信息處理系統13(NIPS 2000),2001年,第873–879頁。
Zare, Habil; P. Shooshtari; A. Gupta; R. Brinkman (2010). “Data reduction for spectral clustering to analyze high throughput flow cytometry data”. BMC Bioinformatics. 11: 403. doi:10.1186/1471–2105–11–403. PMC 2923634. PMID 20667133.
扎爾·哈比勒; P. Shooshtari; A.古普塔; R.布林克曼(2010)。 “用于光譜聚類的數據縮減,以分析高通量流式細胞儀數據” 。 BMC生物信息學 。 11:403。 DOI : 10.1186 / 1471-2105-11-403 。 PMC 2923634 。 PMID 20667133 。
Arias-Castro, E. and Chen, G. and Lerman, G. (2011), “Spectral clustering based on local linear approximations.”, Electronic Journal of Statistics, 5: 1537–1587, arXiv:1001.1323, doi:10.1214/11-ejs651
Arias-Castro,E.和Chen,G.和Lerman,G.(2011),“基于局部線性近似的譜聚類。”,《 電子統計》 , 5 :1537–1587, arXiv : 1001.1323 , doi : 10.1214 / 11-ejs651
http://scikit-learn.org/stable/modules/clustering.html#spectral-clustering
http://scikit-learn.org/stable/modules/clustering.html#spectral-clustering
Knyazev, Andrew V. (2006). Multiscale Spectral Graph Partitioning and Image Segmentation. Workshop on Algorithms for Modern Massive Data Sets Stanford University and Yahoo! Research.
克尼亞澤夫,安德魯五世(2006)。 多尺度譜圖劃分與圖像分割 。 斯坦福大學和Yahoo!現代海量數據集算法研討會 研究。
http://spark.apache.org/docs/latest/mllib-clustering.html#power-iteration-clustering-pic
http://spark.apache.org/docs/latest/mllib-clustering.html#power-iteration-clustering-pic
Translated from: https://medium.com/ai-in-plain-english/spectral-clustering-60f61f79002d