r语言degseq2_第二次RNA-seq实战总结(3)-用DESeq2进行基因表达差异分析
DESeq2是一個用于分析基因表達差異的R包,具體操作要在R語言中運行
1.R語言安裝DESeq2>source("https://bioconductor.org/biocLite.R")
>biocLite("DESeq2")
2.載入基因表達量文件,添加列名>?setwd("C:\\Users\\18019\\Desktop\\counts")
>?options(stringsAsFactors=FALSE)
>?control1
>?head(control1)
gene_id?control11?ENSG00000000003.14_2?????15762??ENSG00000000005.5_2????????03?ENSG00000000419.12_2??????7564?ENSG00000000457.13_3??????3015?ENSG00000000460.16_5??????7646?ENSG00000000938.12_2????????0>?control2
>?treat1
>treat2
3.數據整合>?raw_count?
>?head(raw_count)
gene_id?control1?control2??treat1??treat21?__alignment_not_unique??7440131??2973831?7861484?86768842????????????__ambiguous???976485???412543?1014239?11790513???????????__no_feature??1860117???768637?1289737?18120564??????????__not_aligned??1198545???572588?1256232?13480685????????__too_low_aQual????????0????????0???????0???????06???ENSG00000000003.14_2?????1576??????713????1589????1969#刪除前五行>raw_count_filt?aENSEMBL?row.names(raw_count_filt)?
>?raw_count_filt?
>?colnames(raw_count_filt)[1]?
>head(raraw_count_filt?)
ensembl_gene_id??????????????gene_id?control1?control2?treat1?treat2
ENSG00000000003?ENSG00000000003?ENSG00000000003.14_2?????1576??????713???1589???1969ENSG00000000005?ENSG00000000005??ENSG00000000005.5_2????????0????????0??????0??????1ENSG00000000419?ENSG00000000419?ENSG00000000419.12_2??????756??????384????806????984ENSG00000000457?ENSG00000000457?ENSG00000000457.13_3??????301??????151????217????324ENSG00000000460?ENSG00000000460?ENSG00000000460.16_5??????764??????312????564????784ENSG00000000938?ENSG00000000938?ENSG00000000938.12_2????????0????????0??????0??????0
4.對基因進行注釋-獲取gene_symbol
用bioMart對ensembl_id轉換成gene_symbol>?library("biomaRt")
>?library("curl")
>?mart?
>?my_ensembl_gene_id?
>??options(timeout?=?4000000)
>?hg_symbols
>?head(readcount)
ensembl_gene_id??????????????gene_id?control1?control2?treat1?treat2?hgnc_symbol?chromosome_name?start_position1?ENSG00000000003?ENSG00000000003.14_2?????1576??????713???1589???1969??????TSPAN6???????????????X??????1006271092?ENSG00000000005??ENSG00000000005.5_2????????0????????0??????0??????1????????TNMD???????????????X??????1005848023?ENSG00000000419?ENSG00000000419.12_2??????756??????384????806????984????????DPM1??????????????20???????509348674?ENSG00000000457?ENSG00000000457.13_3??????301??????151????217????324???????SCYL3???????????????1??????1698496315?ENSG00000000460?ENSG00000000460.16_5??????764??????312????564????784????C1orf112???????????????1??????1696620076?ENSG00000000938?ENSG00000000938.12_2????????0????????0??????0??????0?????????FGR???????????????1???????27612064
end_position???band1????100639991??q22.12????100599885??q22.13?????50958555?q13.134????169894267??q24.25????169854080??q24.26?????27635277??p35.3#輸出count表達矩陣>?write.csv(readcount,?file='readcount_all.csv')
>?readcount
>?write.csv(readcount,?file='readcount.csv')
>?head(readcount)
control1?control2?treat1?treat2
ENSG00000000003?????1576??????713???1589???1969ENSG00000000005????????0????????0??????0??????1ENSG00000000419??????756??????384????806????984ENSG00000000457??????301??????151????217????324ENSG00000000460??????764??????312????564????784ENSG00000000938????????0????????0??????0??????0
5.DEseq2篩選差異表達基因并注釋(bioMart)#載入數據(countData和colData)>?mycounts
>?head(mycounts)
control1?control2?treat1?treat2
ENSG00000000003?????1576??????713???1589???1969
ENSG00000000005????????0????????0??????0??????1
ENSG00000000419??????756??????384????806????984
ENSG00000000457??????301??????151????217????324
ENSG00000000460??????764??????312????564????784
ENSG00000000938????????0????????0??????0??????0
>?condition?
>?condition
[1]?control?control?treat???treat
Levels:?control?treat
>?colData?
>?colData
condition
control1???control
control2???control
treat1???????treat
treat2???????treat
構建dds對象,開始DESeq流程>library("DESeq2")
>?dds?
>?dds?
estimating?size?factors
estimating?dispersions
gene-wise?dispersion?estimates
mean-dispersion?relationship
final?dispersion?estimates
fitting?model?and?testing
>?ddsclass:?DESeqDataSet?dim:?60880?4?metadata(1):?version
assays(4):?counts?mu?H?cooks
rownames(60880):?ENSG00000000003?ENSG00000000005?...?ENSG00000285993?ENSG00000285994
rowData?names(22):?baseMean?baseVar?...?deviance?maxCooks
colnames(4):?control1?control2?treat1?treat2
colData?names(2):?condition?sizeFactor#查看總體結果>?res?=?results(dds,?contrast=c("condition",?"control",?"treat"))
>?res?=?res[order(res$pvalue),]
>?head(res)
log2?fold?change?(MLE):?condition?control?vs?treat
Wald?test?p-value:?condition?control?vs?treat
DataFrame?with?6?rows?and?6?columns
baseMean???log2FoldChange?????????????lfcSE?????????????stat???????????????pvalue
ENSG00000178691?1025.66218695436?2.83012875791025?0.225513526042636?12.5497073615672?3.98981786210676e-36ENSG00000135535?2415.77359618136?1.22406336488047?0.183431131037356?6.67314952460929??2.5037106203736e-11ENSG00000164172?531.425786834548?1.30449018960413?0.207785830749451?6.27805170785243?3.42841906697453e-10ENSG00000172239?483.998634607265?1.31701332235233?0.215453141699223?6.11275988813803?9.79226522597759e-10ENSG00000237296?53.0114998109978?2.70139282483841?0.480033904207378?5.62750422660019?1.82835684560772e-08
ENSG00000196504?3592.67315807893?1.09372324353448?0.200308218929736?5.46020153031335?4.75594407815571e-08
padj
ENSG00000178691?3.90682965057494e-32ENSG00000135535?1.22581671973492e-07ENSG00000164172?1.11903598346049e-06ENSG00000172239?2.39714652731931e-06ENSG00000237296???????????????????NA
ENSG00000196504?9.31404088266014e-05>?summary(res)
out?of?33100?with?nonzero?total?read?count
adjusted?p-value??0?(up)???????:?78,?0.24%
LFC?
outliers?[1]???????:?0,?0%
low?counts?[2]?????:?23308,?70%
(mean?count?
[1]?see?'cooksCutoff'?argument?of??results
[2]?see?'independentFiltering'?argument?of??results#這里可以看到有78個基因上調,15個基因下調#將分析結果輸出>?write.csv(res,file="All_results.csv")
提取差異表達基因
這里我用的方法是倍差法
獲取padj(p值經過多重校驗校正后的值)小于0.05,表達倍數取以2為對數后大于1或者小于-1的差異表達基因>?diff_gene_deseq2??1)
>?dim(diff_gene_deseq2)
[1]?21??6
>?head(diff_gene_deseq2)
log2?fold?change?(MLE):?condition?control?vs?treat
Wald?test?p-value:?condition?control?vs?treat
DataFrame?with?6?rows?and?6?columns
baseMean???log2FoldChange?????????????lfcSE?????????????stat???????????????pvalue
ENSG00000178691?1025.66218695436?2.83012875791025?0.225513526042636?12.5497073615672?3.98981786210676e-36
ENSG00000135535?2415.77359618136?1.22406336488047?0.183431131037356?6.67314952460929??2.5037106203736e-11
ENSG00000164172?531.425786834548?1.30449018960413?0.207785830749451?6.27805170785243?3.42841906697453e-10
ENSG00000172239?483.998634607265?1.31701332235233?0.215453141699223?6.11275988813803?9.79226522597759e-10
ENSG00000196504?3592.67315807893?1.09372324353448?0.200308218929736?5.46020153031335?4.75594407815571e-08
ENSG00000163848?633.066990185649?1.15489622775117?0.219655131372136?5.25777030810433?1.45812478575117e-07
padj
ENSG00000178691?3.90682965057494e-32
ENSG00000135535?1.22581671973492e-07
ENSG00000164172?1.11903598346049e-06
ENSG00000172239?2.39714652731931e-06
ENSG00000196504?9.31404088266014e-05
ENSG00000163848?0.000230253090268928#輸出差異基因>?write.csv(diff_gene_deseq2,file=?"DEG_treat_vs_control.csv")#用bioMart對差異表達基因進行注釋>?library("biomaRt")
>?library("curl")
>?hg_symbols
>?head(hg_symbols)
ensembl_gene_id?external_gene_name
1?ENSG00000011405????????????PIK3C2A
2?ENSG00000100731??????????????PCNX1
3?ENSG00000128512??????????????DOCK4
4?ENSG00000135535??????????????CD164
5?ENSG00000140526??????????????ABHD2
6?ENSG00000144228??????????????SPOPL
description
1?phosphatidylinositol-4-phosphate?3-kinase?catalytic?subunit?type?2?alpha?[Source:HGNC?Symbol;Acc:HGNC:8971]
2???????????????????????????????????????????????????????????????pecanex?1?[Source:HGNC?Symbol;Acc:HGNC:19740]
3??????????????????????????????????????????????dedicator?of?cytokinesis?4?[Source:HGNC?Symbol;Acc:HGNC:19192]
4???????????????????????????????????????????????????????????CD164?molecule?[Source:HGNC?Symbol;Acc:HGNC:1632]
5?????????????????????????????????????????abhydrolase?domain?containing?2?[Source:HGNC?Symbol;Acc:HGNC:18717]
6???????????????????????????????????????speckle?type?BTB/POZ?protein?like?[Source:HGNC?Symbol;Acc:HGNC:27934]#合并數據:res結果hg_symbols合并成一個文件>?ensembl_gene_id
>?diff_gene_deseq2
>?colnames(diff_gene_deseq2)[1]
>?diff_name
>?head(diff_name)
DataFrame?with?6?rows?and?9?columns
ensembl_gene_id?????????baseMean???log2FoldChange?????????????lfcSE?????????????stat???????????????pvalue
1?ENSG00000011405?1600.01408863821?1.07722909393382??0.24714564887963?4.35868120202462?1.30848557424083e-05
2?ENSG00000100731?1162.93822827396??1.0006257630015?0.214393389946423?4.66724166846545?3.05270197242525e-06
3?ENSG00000128512?368.442571635954?1.19657846347522?0.262780839813213??4.5535224878867?5.27550292947225e-06
4?ENSG00000135535?2415.77359618136?1.22406336488047?0.183431131037356?6.67314952460929??2.5037106203736e-11
5?ENSG00000140526?796.447227235737?1.05296203760187??0.23492350092969?4.48214858639031?7.38952622958053e-06
6?ENSG00000144228?293.746859588111?1.10903132755747?0.283181091639851?3.91633255290906??8.9906210067011e-05
padj?external_gene_name
1??0.00533862114290257????????????PIK3C2A
2????0.002491004809499??????????????PCNX1
3??0.00319558231320097??????????????DOCK4
4?1.22581671973492e-07??????????????CD164
5??0.00364483675888146??????????????ABHD2
6???0.0214722343652725??????????????SPOPL
description
1?phosphatidylinositol-4-phosphate?3-kinase?catalytic?subunit?type?2?alpha?[Source:HGNC?Symbol;Acc:HGNC:8971]
2???????????????????????????????????????????????????????????????pecanex?1?[Source:HGNC?Symbol;Acc:HGNC:19740]
3??????????????????????????????????????????????dedicator?of?cytokinesis?4?[Source:HGNC?Symbol;Acc:HGNC:19192]
4???????????????????????????????????????????????????????????CD164?molecule?[Source:HGNC?Symbol;Acc:HGNC:1632]
5?????????????????????????????????????????abhydrolase?domain?containing?2?[Source:HGNC?Symbol;Acc:HGNC:18717]
6???????????????????????????????????????speckle?type?BTB/POZ?protein?like?[Source:HGNC?Symbol;Acc:HGNC:27934]#輸出含注釋的差異基因文件write.csv(diff_name,file=?"diff_gene.csv")
作者:孤獨巡禮_435a
鏈接:https://www.jianshu.com/p/4d0812195b65
總結
以上是生活随笔為你收集整理的r语言degseq2_第二次RNA-seq实战总结(3)-用DESeq2进行基因表达差异分析的全部內容,希望文章能夠幫你解決所遇到的問題。
                            
                        - 上一篇: 2018 Multi-Universit
 - 下一篇: 自适应Web主页