【ML】Principal Component Analysis
PCA (Principal Component Analysis) is a common data analysis technique. It is frequently used to reduce the dimensionality of high-dimensional data and to extract the principal feature components of a dataset.
We first introduce the basic ideas of linear dimensionality reduction, which underlie the implementation of linear PCA. The second half of the article summarizes a nonlinear PCA method, namely Kernel PCA.
Linear PCA
Linear dimensionality reduction
Dimensionality reduction
Map a dataset of dimension d to a dataset of dimension k, with k ≤ d (typically k < d).
x = \begin{pmatrix} x_{1}\\ \vdots \\ x_{d} \end{pmatrix}_{d \times 1} \rightarrow \begin{pmatrix} \hat{y}_{1}\\ \vdots \\ \hat{y}_{k} \end{pmatrix}_{k \times 1} = \hat{y}, \quad k \leqslant d
Linear Encoding / Embedding
\hat{y} = b + W^{T}x
b = \begin{pmatrix} b_{1} \\ \vdots \\ b_{k} \end{pmatrix}_{k \times 1} \quad W = \begin{pmatrix} | & \cdots & | \\ w_{1} & \cdots & w_{k} \\ | & \cdots & | \end{pmatrix}_{d \times k}
Map the dataset X onto Y through a linear (affine) relation, reducing the dimensionality.
Linear Decoding / Reconstruction
\hat{y} = \begin{pmatrix} \hat{y}_{1} \\ \vdots \\ \hat{y}_{k} \end{pmatrix}_{k \times 1} \rightarrow \begin{pmatrix} \hat{x}_{1} \\ \vdots \\ \hat{x}_{d} \end{pmatrix}_{d \times 1} = \hat{x}, \quad d \geqslant k
\hat{x} = a + V\hat{y}
a = \begin{pmatrix} a_{1} \\ \vdots \\ a_{d} \end{pmatrix}_{d \times 1} \quad V = \begin{pmatrix} | & \cdots & | \\ v_{1} & \cdots & v_{k} \\ | & \cdots & | \end{pmatrix}_{d \times k}
Map Y back to X through a linear relation, reconstructing the dataset.
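To make these encode/decode maps concrete, here is a minimal NumPy sketch. The parameter values are arbitrary placeholders (not yet the PCA solution), and all names are illustrative:

```python
import numpy as np

d, k = 5, 2
rng = np.random.default_rng(0)

# arbitrary (not yet optimal) encoder/decoder parameters
W = rng.standard_normal((d, k))   # d x k
b = rng.standard_normal(k)        # k-dimensional offset
V = rng.standard_normal((d, k))   # d x k
a = rng.standard_normal(d)        # d-dimensional offset

def encode(x):
    """Linear encoding: y_hat = b + W^T x  (d -> k)."""
    return b + W.T @ x

def decode(y_hat):
    """Linear decoding: x_hat = a + V y_hat  (k -> d)."""
    return a + V @ y_hat

x = rng.standard_normal(d)
x_hat = decode(encode(x))         # generally != x until (b, W, a, V) are chosen well
```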
PCA Objective
- Reconstruction MSE:
g(b, W, a, V) = \frac{1}{n}\sum_{j=1}^{n}\parallel x_{j} - \hat{x}_{j}\parallel^{2} = \frac{1}{n}\sum_{j=1}^{n}\parallel x_{j} - a - V\hat{y}_{j}\parallel^{2}
= \frac{1}{n}\sum_{j=1}^{n}\parallel x_{j} - a - V(b + W^{T}x_{j})\parallel^{2}
- Learning problem: given training feature vectors x_{1}, \dots, x_{n}, find b, W, a, V that minimize the reconstruction MSE (a NumPy sketch of this objective follows below):
(b_{PCA}, W_{PCA}, a_{PCA}, V_{PCA}) = \underset{b, W, a, V}{\arg\min}\ g(b, W, a, V)
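A small NumPy sketch of this objective, evaluating g(b, W, a, V) on a data matrix X of shape n x d (the function name is illustrative):

```python
import numpy as np

def reconstruction_mse(X, b, W, a, V):
    """(1/n) * sum_j || x_j - a - V (b + W^T x_j) ||^2 over the rows of X."""
    Y_hat = X @ W + b            # n x k: encode every row x_j
    X_hat = Y_hat @ V.T + a      # n x d: decode back to feature space
    return np.mean(np.sum((X - X_hat) ** 2, axis=1))
```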
PCA terminology
- Principal directions: w_{1, PCA}, \cdots, w_{k, PCA}
- Principal components:
  - Coordinates: \hat{y}_{1, PCA}, \cdots, \hat{y}_{k, PCA}
  - Principal directions scaled by the coordinates: (\hat{y}_{1, PCA} \cdot w_{1, PCA}), \cdots, (\hat{y}_{k, PCA} \cdot w_{k, PCA})
- Principal subspace: \mathrm{Span}(w_{1, PCA}, \cdots, w_{k, PCA})
PCA classical solution
W_{PCA} = V_{PCA} = \begin{pmatrix} | & \cdots & | \\ u_{1} & \cdots & u_{k} \\ | & \cdots & | \end{pmatrix}_{d \times k}, \quad a_{PCA} = \hat{\mu}_{x}, \quad b_{PCA} = -V_{PCA}^{T}\hat{\mu}_{x}
where
\hat{\mu}_{x} = \frac{1}{n}\sum_{j=1}^{n}x_{j}
is the d x 1 empirical mean feature vector,
\hat{S}_{x} = \frac{1}{n}\sum_{j=1}^{n}(x_{j} - \hat{\mu}_{x})(x_{j} - \hat{\mu}_{x})^{T}
is the d x d empirical feature covariance matrix, and
u_{1}, \cdots, u_{k}
are the orthonormal eigenvectors of \hat{S}_{x} corresponding to its top k eigenvalues
\lambda_{1}, \cdots, \lambda_{k}.
- The eigenvectors and corresponding eigenvalues of \hat{S}_{x} must be re-arranged so that the eigenvalues appear in non-increasing order (as in the sketch below).
- If \hat{S}_{x} has rank r, then
\lambda_{r} > 0, \ \text{and} \ \lambda_{r+1} = \cdots = \lambda_{d} = 0
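A minimal NumPy sketch of this classical solution, assuming the data is stacked in a matrix X of shape n x d (function and variable names are illustrative):

```python
import numpy as np

def pca_classical_solution(X, k):
    """Return W_PCA (= V_PCA), a_PCA, b_PCA via the eigendecomposition of S_x."""
    n, d = X.shape
    mu = X.mean(axis=0)                      # empirical mean, shape (d,)
    Xc = X - mu                              # centered data, shape (n, d)
    S = (Xc.T @ Xc) / n                      # empirical covariance, shape (d, d)
    eigvals, eigvecs = np.linalg.eigh(S)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # re-arrange into non-increasing order
    U = eigvecs[:, order[:k]]                # top-k orthonormal eigenvectors, shape (d, k)
    W_pca = U                                # = V_PCA
    a_pca = mu
    b_pca = -W_pca.T @ mu
    return W_pca, a_pca, b_pca
```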
Geometric visualization
Choose direction vectors so that the projection error of the dataset onto them is as small as possible. Selecting the top k direction vectors, i.e. the k principal vectors, minimizes the total projection error and therefore best captures the structure of the dataset.
Here each eigenvalue corresponds to the variance of the data along its direction (see the numerical check below).
**Projection error:** the perpendicular distance from a data point to the direction vector.
Take a three-dimensional dataset as an example:
The dataset is projected onto three direction vectors; the projection error increases from u1 to u2 to u3.
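A quick numerical check of this interpretation on synthetic data (a hypothetical example; the anisotropic scaling is arbitrary): the sample variance of the projections onto each eigenvector equals the corresponding eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3)) * np.array([3.0, 1.0, 0.3])  # anisotropic 3-D data
Xc = X - X.mean(axis=0)
S = (Xc.T @ Xc) / len(X)

eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# variance of the data projected onto each principal direction == eigenvalue
proj_var = np.var(Xc @ eigvecs, axis=0)
print(np.allclose(proj_var, eigvals))   # True
```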
PCA Algorithm
Step 1: Compute d x 1 empirical mean feature vector:
\hat{\mu}_{x} = \frac{1}{n}\sum_{j=1}^{n}x_{j}
Step 2: Compute d x d empirical feature covariance matrix:
\hat{S}_{x} = \frac{1}{n}\sum_{j=1}^{n}(x_{j} - \hat{\mu}_{x})(x_{j} - \hat{\mu}_{x})^{T}
Step 3: Compute the eigen-decomposition of \hat{S}_{x}:
- Sort the eigenvalues of \hat{S}_{x} in non-increasing order and re-arrange the eigenvectors accordingly.
- u_{1}, \cdots, u_{k} are the orthonormal eigenvectors corresponding to the top k eigenvalues.
- W_{PCA} = \begin{pmatrix} | & \cdots & | \\ u_{1} & \cdots & u_{k} \\ | & \cdots & | \end{pmatrix}_{d \times k}
Step 4: Encode/embed any feature vector x (training or new, unseen test point) from d dimensions to k dimensions as follows:
\hat{y}_{PCA} = W_{PCA}^{T}(x - \hat{\mu}_{x})
Step 5: Decode/reconstruct any embedded vector y from k dimensions back to d dimensions as follows (a short code sketch of Steps 4 and 5 follows):
\hat{x}_{PCA} = \hat{\mu}_{x} + W_{PCA}\, y
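Steps 4 and 5 are just the affine encode/decode pair built from W_PCA and the empirical mean. A minimal sketch, reusing the hypothetical pca_classical_solution helper from the classical-solution section:

```python
def pca_encode(x, W_pca, mu):
    """Step 4: embed a d-dimensional feature vector into k dimensions."""
    return W_pca.T @ (x - mu)

def pca_decode(y, W_pca, mu):
    """Step 5: reconstruct a d-dimensional vector from a k-dimensional embedding."""
    return mu + W_pca @ y

# Usage (X is an n x d training matrix):
# W_pca, a_pca, b_pca = pca_classical_solution(X, k=2)
# x_hat = pca_decode(pca_encode(X[0], W_pca, a_pca), W_pca, a_pca)  # round-trip one point
```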
Dual PCA
Eigen-decomposition of the d x d empirical covariance matrix \hat{S}_{x} is computationally challenging when d >> n.
When the data dimension is very large (for example images, where every pixel is a dimension: a 256*256 image already has 65536 pixels), the empirical covariance matrix is d x d and its eigendecomposition is very slow, so dual PCA becomes necessary.
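The dual construction is not spelled out here; the following is a hedged NumPy sketch of the standard dual-PCA idea: eigendecompose the n x n matrix (1/n) X_c X_c^T, which shares the nonzero eigenvalues of \hat{S}_{x}, and map its eigenvectors back to d-dimensional principal directions via u_i = X_c^T a_i / sqrt(n * lambda_i). The names and the exact scaling convention are assumptions, not taken from the text above.

```python
import numpy as np

def dual_pca(X, k):
    """Dual PCA sketch: work with the n x n matrix instead of the d x d covariance.
    Assumes k <= rank of the centered data, so the top-k eigenvalues are positive."""
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu                                # centered data, shape (n, d)
    G = (Xc @ Xc.T) / n                        # n x n matrix, same nonzero eigenvalues as S_x
    eigvals, A = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:k]      # top-k eigenvalues, non-increasing
    lam, A = eigvals[order], A[:, order]
    W_pca = Xc.T @ A / np.sqrt(n * lam)        # d x k principal directions (orthonormal columns)
    return W_pca, mu
```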