【ML】Principal Component Analysis
PCA (Principal Component Analysis) is a common data analysis technique. It is frequently used to reduce the dimensionality of high-dimensional data and to extract the principal feature components of a dataset.
We first introduce the basic ideas of linear dimensionality reduction, which underlie the implementation of linear PCA. The second half of the article summarizes a nonlinear PCA method, namely Kernel PCA.
Linear PCA
Linear dimensionality reduction
Dimensionality reduction
Map a dataset of dimension d to a dataset of dimension k, with k ≤ d (typically k < d).
x = \begin{pmatrix} x_{1}\\ \vdots \\ x_{d} \end{pmatrix}_{d \times 1} \rightarrow \begin{pmatrix} \hat{y}_{1}\\ \vdots \\ \hat{y}_{k} \end{pmatrix}_{k \times 1} = \hat{y}, \quad k \leqslant d
Linear Encoding / Embedding
\hat{y} = b + W^{T}x
b = \begin{pmatrix} b_{1} \\ \vdots \\ b_{k} \end{pmatrix}_{k \times 1} \quad W = \begin{pmatrix} | & \cdots & | \\ w_{1} & \cdots & w_{k} \\ | & \cdots & | \end{pmatrix}_{d \times k}
Map the dataset X onto Y through a linear (affine) relation, reducing the dimensionality.
Linear Decoding / Reconstruction
\hat{y} = \begin{pmatrix} \hat{y}_{1} \\ \vdots \\ \hat{y}_{k} \end{pmatrix}_{k \times 1} \rightarrow \begin{pmatrix} \hat{x}_{1} \\ \vdots \\ \hat{x}_{d} \end{pmatrix}_{d \times 1} = \hat{x}, \quad d \geqslant k
\hat{x} = a + V\hat{y}
a = \begin{pmatrix} a_{1} \\ \vdots \\ a_{d} \end{pmatrix}_{d \times 1} \quad V = \begin{pmatrix} | & \cdots & | \\ v_{1} & \cdots & v_{k} \\ | & \cdots & | \end{pmatrix}_{d \times k}
Map Y back to X through a linear relation, reconstructing the dataset.
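To make these encode/decode maps concrete, here is a minimal NumPy sketch. The parameter values are arbitrary placeholders (not yet the PCA solution), and all names are illustrative:

```python
import numpy as np

d, k = 5, 2
rng = np.random.default_rng(0)

# arbitrary (not yet optimal) encoder/decoder parameters
W = rng.standard_normal((d, k))   # d x k
b = rng.standard_normal(k)        # k-dimensional offset
V = rng.standard_normal((d, k))   # d x k
a = rng.standard_normal(d)        # d-dimensional offset

def encode(x):
    """Linear encoding: y_hat = b + W^T x  (d -> k)."""
    return b + W.T @ x

def decode(y_hat):
    """Linear decoding: x_hat = a + V y_hat  (k -> d)."""
    return a + V @ y_hat

x = rng.standard_normal(d)
x_hat = decode(encode(x))         # generally != x until (b, W, a, V) are chosen well
```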
PCA Objective
- Reconstruction MSE:
g(b, W, a, V) = \frac{1}{n}\sum_{j=1}^{n}\parallel x_{j} - \hat{x}_{j}\parallel^{2} = \frac{1}{n}\sum_{j=1}^{n}\parallel x_{j} - a - V\hat{y}_{j}\parallel^{2}
= \frac{1}{n}\sum_{j=1}^{n}\parallel x_{j} - a - V(b + W^{T}x_{j})\parallel^{2}
- Learning problem: given training feature vectors x_{1}, \dots, x_{n}, find b, W, a, V that minimize the reconstruction MSE (a NumPy sketch of this objective follows below):
(b_{PCA}, W_{PCA}, a_{PCA}, V_{PCA}) = \underset{b, W, a, V}{\arg\min}\ g(b, W, a, V)
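A small NumPy sketch of this objective, evaluating g(b, W, a, V) on a data matrix X of shape n x d (the function name is illustrative):

```python
import numpy as np

def reconstruction_mse(X, b, W, a, V):
    """(1/n) * sum_j || x_j - a - V (b + W^T x_j) ||^2 over the rows of X."""
    Y_hat = X @ W + b            # n x k: encode every row x_j
    X_hat = Y_hat @ V.T + a      # n x d: decode back to feature space
    return np.mean(np.sum((X - X_hat) ** 2, axis=1))
```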
PCA terminology
- Principal directions: w_{1, PCA}, \cdots, w_{k, PCA}
- Principal components:
  - Coordinates: \hat{y}_{1, PCA}, \cdots, \hat{y}_{k, PCA}
  - Principal directions scaled by the coordinates: (\hat{y}_{1, PCA} \cdot w_{1, PCA}), \cdots, (\hat{y}_{k, PCA} \cdot w_{k, PCA})
- Principal subspace: \mathrm{Span}(w_{1, PCA}, \cdots, w_{k, PCA})
PCA classical solution
W_{PCA} = V_{PCA} = \begin{pmatrix} | & \cdots & | \\ u_{1} & \cdots & u_{k} \\ | & \cdots & | \end{pmatrix}_{d \times k}, \quad a_{PCA} = \hat{\mu}_{x}, \quad b_{PCA} = -V_{PCA}^{T}\hat{\mu}_{x}
where
\hat{\mu}_{x} = \frac{1}{n}\sum_{j=1}^{n}x_{j}
is the d x 1 empirical mean feature vector,
\hat{S}_{x} = \frac{1}{n}\sum_{j=1}^{n}(x_{j} - \hat{\mu}_{x})(x_{j} - \hat{\mu}_{x})^{T}
is the d x d empirical feature covariance matrix, and
u_{1}, \cdots, u_{k}
are the orthonormal eigenvectors of \hat{S}_{x} corresponding to its top k eigenvalues
\lambda_{1}, \cdots, \lambda_{k}.
- The eigenvectors and corresponding eigenvalues of \hat{S}_{x} must be re-arranged so that the eigenvalues appear in non-increasing order (as in the sketch below).
- If \hat{S}_{x} has rank r, then
\lambda_{r} > 0, \ \text{and} \ \lambda_{r+1} = \cdots = \lambda_{d} = 0
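A minimal NumPy sketch of this classical solution, assuming the data is stacked in a matrix X of shape n x d (function and variable names are illustrative):

```python
import numpy as np

def pca_classical_solution(X, k):
    """Return W_PCA (= V_PCA), a_PCA, b_PCA via the eigendecomposition of S_x."""
    n, d = X.shape
    mu = X.mean(axis=0)                      # empirical mean, shape (d,)
    Xc = X - mu                              # centered data, shape (n, d)
    S = (Xc.T @ Xc) / n                      # empirical covariance, shape (d, d)
    eigvals, eigvecs = np.linalg.eigh(S)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # re-arrange into non-increasing order
    U = eigvecs[:, order[:k]]                # top-k orthonormal eigenvectors, shape (d, k)
    W_pca = U                                # = V_PCA
    a_pca = mu
    b_pca = -W_pca.T @ mu
    return W_pca, a_pca, b_pca
```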
Geometric visualization
Choose direction vectors so that the projection error of the dataset onto them is as small as possible. Selecting the top k direction vectors, i.e. the k principal vectors, minimizes the total projection error and therefore best captures the structure of the dataset.
Here each eigenvalue corresponds to the variance of the data along its direction (see the numerical check below).
**Projection error:** the perpendicular distance from a data point to the direction vector.
Take a three-dimensional dataset as an example:
The dataset is projected onto three direction vectors; the projection error increases from u1 to u2 to u3.
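A quick numerical check of this interpretation on synthetic data (a hypothetical example; the anisotropic scaling is arbitrary): the sample variance of the projections onto each eigenvector equals the corresponding eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3)) * np.array([3.0, 1.0, 0.3])  # anisotropic 3-D data
Xc = X - X.mean(axis=0)
S = (Xc.T @ Xc) / len(X)

eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# variance of the data projected onto each principal direction == eigenvalue
proj_var = np.var(Xc @ eigvecs, axis=0)
print(np.allclose(proj_var, eigvals))   # True
```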
PCA Algorithm
Step 1: Compute d x 1 empirical mean feature vector:
\hat{\mu}_{x} = \frac{1}{n}\sum_{j=1}^{n}x_{j}
Step 2: Compute d x d empirical feature covariance matrix:
\hat{S}_{x} = \frac{1}{n}\sum_{j=1}^{n}(x_{j} - \hat{\mu}_{x})(x_{j} - \hat{\mu}_{x})^{T}
Step 3: Compute the eigen-decomposition of \hat{S}_{x}:
- Sort the eigenvalues of \hat{S}_{x} in non-increasing order and re-arrange the eigenvectors accordingly.
- u_{1}, \cdots, u_{k} are the orthonormal eigenvectors corresponding to the top k eigenvalues.
- W_{PCA} = \begin{pmatrix} | & \cdots & | \\ u_{1} & \cdots & u_{k} \\ | & \cdots & | \end{pmatrix}_{d \times k}
Step 4: Encode/embed any feature vector x (training or new, unseen test point) from d dimensions to k dimensions as follows:
\hat{y}_{PCA} = W_{PCA}^{T}(x - \hat{\mu}_{x})
Step 5: Decode/reconstruct any embedded vector y from k dimensions back to d dimensions as follows (a short code sketch of Steps 4 and 5 follows):
\hat{x}_{PCA} = \hat{\mu}_{x} + W_{PCA}\, y
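Steps 4 and 5 are just the affine encode/decode pair built from W_PCA and the empirical mean. A minimal sketch, reusing the hypothetical pca_classical_solution helper from the classical-solution section:

```python
def pca_encode(x, W_pca, mu):
    """Step 4: embed a d-dimensional feature vector into k dimensions."""
    return W_pca.T @ (x - mu)

def pca_decode(y, W_pca, mu):
    """Step 5: reconstruct a d-dimensional vector from a k-dimensional embedding."""
    return mu + W_pca @ y

# Usage (X is an n x d training matrix):
# W_pca, a_pca, b_pca = pca_classical_solution(X, k=2)
# x_hat = pca_decode(pca_encode(X[0], W_pca, a_pca), W_pca, a_pca)  # round-trip one point
```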
Dual PCA
Eigen-decomposition of the d x d empirical covariance matrix \hat{S}_{x} is computationally challenging when d >> n.
When the data dimension is very large (for example images, where every pixel is a dimension: a 256*256 image already has 65536 pixels), the empirical covariance matrix is d x d and its eigendecomposition is very slow, so dual PCA becomes necessary.
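The dual construction is not spelled out here; the following is a hedged NumPy sketch of the standard dual-PCA idea: eigendecompose the n x n matrix (1/n) X_c X_c^T, which shares the nonzero eigenvalues of \hat{S}_{x}, and map its eigenvectors back to d-dimensional principal directions via u_i = X_c^T a_i / sqrt(n * lambda_i). The names and the exact scaling convention are assumptions, not taken from the text above.

```python
import numpy as np

def dual_pca(X, k):
    """Dual PCA sketch: work with the n x n matrix instead of the d x d covariance.
    Assumes k <= rank of the centered data, so the top-k eigenvalues are positive."""
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu                                # centered data, shape (n, d)
    G = (Xc @ Xc.T) / n                        # n x n matrix, same nonzero eigenvalues as S_x
    eigvals, A = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:k]      # top-k eigenvalues, non-increasing
    lam, A = eigvals[order], A[:, order]
    W_pca = Xc.T @ A / np.sqrt(n * lam)        # d x k principal directions (orthonormal columns)
    return W_pca, mu
```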