Comparison of eigenvalue decomposition and SVD (singular value decomposition) in principal component analysis, with implementations in R
> pca <- read.csv("D:/pca.csv")
> pca
    x1  x2 x3 x4
1   40 2.0  5 20
2   10 1.5  5 30
3  120 3.0 13 50
4  250 4.5 18  0
5  120 3.5  9 50
6   10 1.5 12 50
7   40 1.0 19 40
8  270 4.0 13 60
9  280 3.5 11 60
10 170 3.0  9 60
11 180 3.5 14 40
12 130 2.0 30 50
13 220 1.5 17 20
14 160 1.5 35 60
15 220 2.5 14 30
16 140 2.0 20 20
17 220 2.0 14 10
18  40 1.0 10  0
19  20 1.0 12 60
20 120 2.0 20  0
> P = scale(pca)  # standardize the raw data and store the result as matrix P
> P
             [,1]       [,2]       [,3]       [,4]
 [1,] -1.10251269 -0.3081296 -1.3477550 -0.7084466
 [2,] -1.44001658 -0.7821750 -1.3477550 -0.2513843
 [3,] -0.20250233  0.6399614 -0.2695510  0.6627404
 [4,]  1.26001451  2.0620978  0.4043265 -1.6225713
 [5,] -0.20250233  1.1140068 -0.8086530  0.6627404
 [6,] -1.44001658 -0.7821750 -0.4043265  0.6627404
 [7,] -1.10251269 -1.2562205  0.5391020  0.2056781
 [8,]  1.48501710  1.5880523 -0.2695510  1.1198028
 [9,]  1.59751839  1.1140068 -0.5391020  1.1198028
[10,]  0.36000414  0.6399614 -0.8086530  1.1198028
[11,]  0.47250544  1.1140068 -0.1347755  0.2056781
[12,] -0.09000104 -0.3081296  2.0216325  0.6627404
[13,]  0.92251062 -0.7821750  0.2695510 -0.7084466
[14,]  0.24750285 -0.7821750  2.6955100  1.1198028
[15,]  0.92251062  0.1659159 -0.1347755 -0.2513843
[16,]  0.02250026 -0.3081296  0.6738775 -0.7084466
[17,]  0.92251062 -0.3081296 -0.1347755 -1.1655090
[18,] -1.10251269 -1.2562205 -0.6738775 -1.6225713
[19,] -1.32751528 -1.2562205 -0.4043265  1.1198028
[20,] -0.20250233 -0.3081296  0.6738775 -1.6225713
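For reference, scale() here is nothing more than column-wise centering followed by division by the sample standard deviation. A minimal sketch of the equivalent manual computation, assuming the pca data frame and P from above (the names X and P_manual are used only for illustration):

X <- as.matrix(pca)
P_manual <- sweep(X, 2, colMeans(X), "-")              # subtract each column mean
P_manual <- sweep(P_manual, 2, apply(X, 2, sd), "/")   # divide by each column's sample sd
all.equal(P_manual, P, check.attributes = FALSE)       # should report TRUE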
> eigen(cov(P))  # eigenvalues and eigenvectors of the covariance matrix of P; the first column of $vectors (0.69996363, ...) is the eigenvector of the first eigenvalue (1.7182516), and so on
$values
[1] 1.7182516 1.0935358 0.9813470 0.2068656

$vectors
           [,1]        [,2]        [,3]       [,4]
[1,] 0.69996363  0.09501037 -0.24004879  0.6658833
[2,] 0.68979810 -0.28364662  0.05846333 -0.6635550
[3,] 0.08793923  0.90415870 -0.27031356 -0.3188955
[4,] 0.16277651  0.30498307  0.93053167  0.1208302
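As a cross-check (not part of the original post), R's built-in prcomp() performs exactly this standardize-then-decompose computation, so it should reproduce the numbers above; a minimal sketch, assuming the pca data frame loaded earlier:

pc <- prcomp(pca, scale. = TRUE)   # PCA on the standardized variables
pc$sdev^2                          # component variances: should match the eigenvalues above
pc$rotation                        # loadings: should match the eigenvectors above, up to sign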
Eigenvalue decomposition yields eigenvalues and eigenvectors: an eigenvalue measures how important a component is, while the corresponding eigenvector describes what that component is. Singular values σ play an analogous role; in the matrix Σ they are also arranged from largest to smallest, and they typically decay very quickly: in many cases the largest 10% or even 1% of the singular values account for more than 99% of their total sum.
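To make "how important each component is" concrete, the eigenvalues above can be converted into proportions of explained variance; a minimal sketch using the eigen(cov(P)) result from above:

ev <- eigen(cov(P))$values   # 1.7183 1.0935 0.9813 0.2069
ev / sum(ev)                 # proportion of total variance per component
cumsum(ev / sum(ev))         # cumulative proportion; the first two components cover about 70% here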
> svd(cov(P))  # SVD applied to the same (square) covariance matrix of the standardized data
$d
[1] 1.7182516 1.0935358 0.9813470 0.2068656

$u
            [,1]        [,2]        [,3]       [,4]
[1,] -0.69996363  0.09501037 -0.24004879 -0.6658833
[2,] -0.68979810 -0.28364662  0.05846333  0.6635550
[3,] -0.08793923  0.90415870 -0.27031356  0.3188955
[4,] -0.16277651  0.30498307  0.93053167 -0.1208302

$v
            [,1]        [,2]        [,3]       [,4]
[1,] -0.69996363  0.09501037 -0.24004879 -0.6658833
[2,] -0.68979810 -0.28364662  0.05846333  0.6635550
[3,] -0.08793923  0.90415870 -0.27031356  0.3188955
[4,] -0.16277651  0.30498307  0.93053167 -0.1208302

The result matches the eigenvalue decomposition: the singular values equal the eigenvalues, and, because the covariance matrix is symmetric and positive semi-definite, the left and right singular vectors are equal and coincide with the eigenvectors (up to sign), which is consistent with the theory described at:
http://blog.csdn.net/wangzhiqing3/article/details/7446444
2. Singular value decomposition
The discussion above covered the decomposition of square matrices, but in LSA we need to decompose the term-document matrix, which is clearly not square. Singular value decomposition handles this case, and its derivation builds on the square-matrix decomposition discussed above.
Suppose C is an M x N matrix, U is an M x M matrix whose columns are the orthonormal eigenvectors of CC^T, and V is an N x N matrix whose columns are the orthonormal eigenvectors of C^T C; let r be the rank of C. Then the singular value decomposition exists:

C = U Σ V^T,

where Σ is an M x N matrix whose first r diagonal entries are the singular values σ_i = sqrt(λ_i) (the λ_i being the nonzero eigenvalues of C^T C) and whose other entries are zero.
Singular value decomposition is a factorization that applies to arbitrary matrices: SVD handles a general m x n matrix, not just square ones. (To be continued...)
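The definition quoted above is easy to check numerically in R on a small non-square matrix; a minimal sketch (the 4 x 3 matrix C below is made up purely for illustration):

set.seed(1)
C <- matrix(rnorm(12), nrow = 4, ncol = 3)      # an arbitrary 4 x 3 matrix
s <- svd(C)
# singular values = square roots of the nonzero eigenvalues of t(C) %*% C
all.equal(s$d, sqrt(eigen(t(C) %*% C)$values))
# columns of U are eigenvectors of C %*% t(C), columns of V of t(C) %*% C;
# eigenvectors are only determined up to sign, hence the abs()
all.equal(abs(s$u), abs(eigen(C %*% t(C))$vectors[, 1:3]))
all.equal(abs(s$v), abs(eigen(t(C) %*% C)$vectors))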
> svd(P)  # SVD applied to the standardized data matrix itself (20 x 4)
$d
[1] 5.713736 4.558199 4.318054 1.982535

$u
              [,1]        [,2]        [,3]        [,4]
 [1,]  0.213188874 -0.31854661  0.01117934 -0.09356334
 [2,]  0.298743593 -0.26550125 -0.09966088 -0.02040249
 [3,] -0.067184471 -0.05316902 -0.17961537 -0.19846043
 [4,] -0.363306085 -0.13041863  0.41709916 -0.43090590
 [5,] -0.116116981 -0.18960342 -0.21978181 -0.27040771
 [6,]  0.258181271 -0.01720108 -0.23759348 -0.11644174
 [7,]  0.272565880  0.17588846 -0.05485743 -0.02402906
 [8,] -0.401395312 -0.04641079 -0.19713543  0.07886498
 [9,] -0.353798967 -0.06803484 -0.20133714  0.31867233
[10,] -0.140818406 -0.11779839 -0.28058876  0.10504376
[11,] -0.196159553 -0.07244560 -0.04157562 -0.17994125
[12,] -0.001770250  0.46264984 -0.01709488 -0.21189013
[13,] -0.002549413  0.07396825  0.23141706  0.48510618
[14,] -0.009279184  0.66343445 -0.04822489 -0.02040610
[15,] -0.123807056 -0.03464961  0.09477346  0.26067355
[16,]  0.044254060  0.10591148  0.20027671 -0.04088441
[17,] -0.040535151 -0.06631368  0.29818366  0.36362322
[18,]  0.343318979 -0.18704239  0.26319302  0.05965465
[19,]  0.288607918  0.04522412 -0.32341707  0.10786442
[20,]  0.097860256  0.04005869  0.38476035 -0.17217051

$v
            [,1]        [,2]        [,3]       [,4]
[1,] -0.69996363  0.09501037  0.24004879  0.6658833
[2,] -0.68979810 -0.28364662 -0.05846333 -0.6635550
[3,] -0.08793923  0.90415870  0.27031356 -0.3188955
[4,] -0.16277651  0.30498307 -0.93053167  0.1208302

The output shows that the right singular vector matrix $v is the eigenvector matrix of the covariance matrix of the standardized data obtained earlier (up to the sign of individual columns).

Singular values and latent semantic indexing (LSI): the same decomposition can be applied to a term-document style table. The original code did not define kk and v; presumably the left and right singular vector matrices were intended, as noted in the comments below.

Book <- read.csv("D:/Book.csv")
Book
K = as.matrix(data.frame(Book))   # if the first column X holds the book titles, it may need to be dropped (e.g. Book[, -1]) so that K is numeric
svd(K)
kk = svd(K)$u                     # left singular vectors, one row per book (not defined in the original)
v  = svd(K)$v                     # right singular vectors, one row per term (not defined in the original)
rownames(kk) = Book$X
kk
rownames(v) = paste('T', 1:9, sep = '')
plot(rnorm, xlim = c(-0.8, 0), ylim = c(-0.8, 0.6), lty = 0)   # blank canvas for the 2-D projection
points(v[, 3],  v[, 2],  col = 'red')    # terms T1..T9 in the plane of components 3 and 2
points(kk[, 3], kk[, 2], col = 'blue')   # books in the same plane
text(kk[, 3], kk[, 2], Book$X)
text(v[, 3],  v[, 2],  paste('T', 1:9, sep = ''))
So SVD can compress the columns of a data matrix (the variables) as well as its rows (the cases).
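The two routes can be tied together explicitly: for the standardized 20 x 4 matrix P, the eigenvalues of cov(P) are d^2/(n-1), the principal component scores are P %*% v (equivalently u %*% diag(d)), and truncating to the top k singular values compresses the variables and the cases at the same time. A minimal sketch continuing from the objects defined above (scores, k and P2 are illustrative names):

s <- svd(P)
n <- nrow(P)                                     # 20 cases
all.equal(s$d^2 / (n - 1), eigen(cov(P))$values) # eigenvalues of cov(P) = d^2 / (n - 1)
scores <- P %*% s$v                              # principal component scores
all.equal(scores, s$u %*% diag(s$d), check.attributes = FALSE)
k <- 2
P2 <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])  # rank-2 approximation of P:
                                                       # 4 variables and 20 cases expressed through 2 components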
轉(zhuǎn)載于:https://my.oschina.net/u/1272414/blog/190032
總結(jié)
以上是生活随笔為你收集整理的主成分分析中特征值分解与SVD(奇异值分解)的比较及其相关R语言的实现的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 将Windows文件挂在到Linux上
- 下一篇: 技术检验