python基因差异分析_差异基因
以前是沒有想過用這個軟件的,直到有一個我的htseq無法對比對的bam文件進行基因計數(后來我才發現htseq無法計數的原因是gtf版本不同導致坐標不同,而且gtf對染色體編號沒有加上chr),我簡單搜索了一下,發現bedtools?multicov也有類似的功能,所以我選擇它來試試看!
首先注意它需要sort的bam文件及bam的index
bedtools?multicov?depends?upon?index?BAM?files?in?order?to?count?the?number?of?overlaps?in?each?BAM?file.?As?such,?each?BAM?file?should?be?position?sorted?(samtool?sort?aln.bam?aln.sort)?and?indexed?(samtools?index?aln.sort.bam)?with?either?samtools?or?bamtools.
首先安裝它:
解壓開后
Make?clean
Make?all
然后就可以看到它的bin目錄下全部是程序啦
命令很簡單的
bedtools?multicov[OPTIONS]?-bams?BAM1?BAM2?BAM3?...?BAMn?-bed??
By?default,?multicov?will?report?the?count?of?alignments?in?each?input?BAM?file?that?overlap.
例子:
bedtools?multicov?-bams?aln1.bam?aln2.bam?aln3.bam?-bed?ivls-of-interest.bed
ivls-of-interest.bed這個文件是必須的,可能需要自己制作,其實用gtf文件也可以的
chr1?0???10000???ivl1
chr1?10000???20000???ivl2
chr1?20000???30000???ivl3
chr1?30000???40000???ivl4
輸出結果前三列是坐標,第四列是基因名,跟我們的bed文件一樣,只是最后三列是三個樣本的計數,是添加上來的!
chr1?0???????10000???ivl1????100?2234????0
chr1?10000???20000???ivl2????123?3245????1000
chr1?20000???30000???ivl3????213?2332????2034
chr1?30000???40000???ivl4????335?7654????0
同樣是對gene的reads計數,bedtools的multicov工具與htseq-count的區別是
i'd?guess?it's?due?to?the?fact?that?htseq-count?only?reports?one?hit?per?aligned?read?assuming?that?read?is?aligned?uniquely?and?does?not?overlap?multiple?features?listed?in?your?GTF.?if?an?aligned?read?hits?more?than?one?feature?in?your?GTF?then?it?doesn't?report?that?hit.?bedtools?gives?you?raw?hits?which?includes?every?1?hit?for?every?intersection?of?every?alignment?with?any?features?in?the?GTF?no?matter?how?many?times?it?aligned?or?how?many?features?it?hit.?you?might?think,?"wow,?htseq-count?is?dropping?a?lot?of?information".?yes,?it?is!?i've?moved?to?using?other?tools?to?count?hits?to?genes?(RSEM/eXpress)?since?they?disambiguate?those?ambiguous?alignments?and?as?a?result?you?get?counts?for?all?of?your?aligned?reads.?in?a?genome?with?alternative?splicing?you?lose?too?much?data?using?htseq-count,?in?my?opinion.
而且專門有個文獻在討論這個問題
Differential?analysis?of?gene?regulation?at?transcript?resolution?with?RNA-seq
下面我講一個實際的例子
我的bam文件如下
bedtools?multicov?-bams?740WT1.bam?741WT2.bam?742KO1.bam?743KO2.bam?-bed?mm9.bed
得到的這個矩陣就可以去用DESeq包來進行差異分析啦!
總結
以上是生活随笔為你收集整理的python基因差异分析_差异基因的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 服务总线yali测试_满足吉利要求的车载
- 下一篇: 启动未初始化小应用程序_SpringBo