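Gene-level motif enrichment analysis for alternative splicing regulators | motif enrichment | FIMO | MEME
Related post: transcription factor motif enrichment in TSS regions | motif enrichment | HOMER | FIMO | MEME
This is a new area for me. I am now looking at alternative splicing regulators such as PTBP1, which, like TFs, have specific RNA-binding motifs.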
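Similarities:
Both are proteins that bind specific sequence regions
Both have a specific sequence motif
Differences:
TF motifs are mainly bound in promoters and enhancers and control gene transcription
ASF (alternative splicing factor) motifs are mainly bound in the intron regions of genes and control alternative splicing
PTBP1 is used as the example here.
Source of inspiration: 2018 - Cancer Cell - PTBP1-Mediated Alternative Splicing Regulates the Inflammatory Secretome and the Pro-tumorigenic Effects of Senescent Cells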
RNA-Binding Motif Analysis
FIMO (Grant et al., 2011) was used to scan the human gene sequences for the PTBP1 RNA-binding motifs inferred by (Ray et al., 2013). The thereby predicted occurrences were mapped to the analyzed splicing events. To generate the RNA-maps (Figures 7B and S7D), for each comparison alternative exons were divided into those with PSIs significantly increasing upon PTBP1 knockdown (putatively repressed), those with PSIs significantly decreasing upon PTBP1 knockdown (putatively enhanced), and those with PSIs not altered upon PTBP1 knockdown (putatively not regulated). Statistical significance for local motif enrichment is associated with Fisher’s exact tests for differences in motif occurrences between groups of exons within 31 bp moving windows.
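The windowed test described in the quoted methods can be sketched roughly as below. This is only an illustration under stated assumptions, not the paper's code: the function names, the input layout (one set of motif hit positions per exon, in coordinates relative to the exon), and the counting rule (an exon counts once if it has at least one hit inside the window) are all hypothetical choices.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def has_hit(hits, left, right):
    """True if any motif position falls inside the inclusive window."""
    return any(left <= pos <= right for pos in hits)


def window_enrichment(regulated, control, start, end, width=31, step=1):
    """Slide a window over [start, end] and compare two exon groups.

    regulated / control: lists of sets of motif hit positions, one set per
    exon (hypothetical layout). Returns (left, right, n_regulated_with_hit,
    n_control_with_hit, p_value) for each window.
    """
    results = []
    for left in range(start, end - width + 2, step):
        right = left + width - 1
        a = sum(has_hit(h, left, right) for h in regulated)
        c = sum(has_hit(h, left, right) for h in control)
        p = fisher_exact_two_sided(a, len(regulated) - a, c, len(control) - c)
        results.append((left, right, a, c, p))
    return results
```

In practice the "regulated" group would be the repressed or enhanced exons and the "control" group the exons not altered by the knockdown; the resulting p-values across windows would normally also need multiple-testing correction.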
Finding the RNA motif
Look up Ray et al., 2013, A compendium of RNA-binding motifs for decoding gene regulation
Following that trail leads to a database: CISBP-RNA Database: Catalog of Inferred Sequence Binding Preferences of RNA binding proteins
Procedure: export the hg38 gene sequences (including exons and introns)
http://www.genome.ucsc.edu/cgi-bin/hgTables
Predict with FIMO: https://meme-suite.org/meme/tools/fimo
Get the short-sequence motif in MEME format. The web version generates it, so just download it.
MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies (from unknown source):
A 0.250 C 0.250 G 0.250 T 0.250

MOTIF 1 HYTTTYT

letter-probability matrix: alength= 4 w= 7 nsites= 1 E= 0e+0
0.333333 0.333333 0.000000 0.333333
0.000000 0.500000 0.000000 0.500000
0.000000 0.000000 0.000000 1.000000
0.000000 0.000000 0.000000 1.000000
0.000000 0.000000 0.000000 1.000000
0.000000 0.500000 0.000000 0.500000
0.000000 0.000000 0.000000 1.000000
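For reference, a matrix like the one above can also be built locally from the IUPAC consensus instead of being downloaded. The following is a minimal sketch, assuming each degenerate letter spreads its probability evenly over the bases it allows (which is what the matrix above shows); `iupac_to_rows` and `to_meme` are hypothetical helper names, not part of the MEME Suite.

```python
# IUPAC nucleotide codes mapped to the bases they allow (U is treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_rows(consensus):
    """Return one [A, C, G, T] probability row per consensus letter."""
    rows = []
    for letter in consensus.upper():
        allowed = IUPAC[letter]
        rows.append([1.0 / len(allowed) if base in allowed else 0.0
                     for base in "ACGT"])
    return rows


def to_meme(consensus, name="1"):
    """Render a consensus string as minimal MEME-format text (DNA alphabet)."""
    consensus = consensus.upper().replace("U", "T")
    rows = iupac_to_rows(consensus)
    lines = [
        "MEME version 4", "",
        "ALPHABET= ACGT", "",
        "strands: + -", "",
        "Background letter frequencies (from unknown source):",
        "A 0.250 C 0.250 G 0.250 T 0.250", "",
        f"MOTIF {name} {consensus}", "",
        f"letter-probability matrix: alength= 4 w= {len(rows)} nsites= 1 E= 0e+0",
    ]
    lines += [" ".join(f"{p:.6f}" for p in row) for row in rows]
    return "\n".join(lines) + "\n"
```

Calling `to_meme("HYTTTYT")` and writing the result to `PTBP1.motif.meme` should give a file equivalent to the downloaded one. A position weight matrix taken directly from CISBP-RNA would carry more information than a consensus string, so this shortcut only suits simple consensus motifs.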
fimo --alpha 1 --max-strand -oc target PTBP1.motif.meme hg38_gene.fasta
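A small DNA/RNA/protein conversion tool: http://biomodel.uah.es/en/lab/cybertory/analysis/trans.htm
Notes:
The motif and the sequences must use the same alphabet: T for DNA, U for RNA. Otherwise nothing will match.
If the motif is an RNA motif, build a reverse-complement DNA motif from it.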
MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies (from unknown source):
A 0.250 C 0.250 G 0.250 T 0.250

MOTIF 1 ARAAARD

letter-probability matrix: alength= 4 w= 7 nsites= 1 E= 0e+0
1.000000 0.000000 0.000000 0.000000
0.500000 0.000000 0.500000 0.000000
1.000000 0.000000 0.000000 0.000000
1.000000 0.000000 0.000000 0.000000
1.000000 0.000000 0.000000 0.000000
0.500000 0.000000 0.500000 0.000000
0.333333 0.000000 0.333333 0.333333
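The conversion from HYTTTYT to ARAAARD can be checked with a few lines of code. This is a small sketch with hypothetical function names; it only shows the reverse-complement operation itself. Whether the reverse complement is actually required depends on the orientation of the exported sequences and on whether FIMO scans both strands.

```python
# Complement table for IUPAC nucleotide codes (DNA alphabet).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement_iupac(consensus):
    """Reverse-complement an IUPAC consensus string; U is read as T."""
    return consensus.upper().replace("U", "T").translate(COMPLEMENT)[::-1]


def reverse_complement_rows(rows):
    """Reverse-complement a motif given as [A, C, G, T] probability rows.

    Reversing the row order reverses the motif; reversing each row swaps
    A<->T and C<->G, which is the complement.
    """
    return [row[::-1] for row in rows[::-1]]
```

`reverse_complement_iupac("HYTTTYT")` returns `"ARAAARD"`, and applying `reverse_complement_rows` to the seven rows of the first matrix yields the rows of the second one.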
fimo --max-stored-scores 1000000 --thresh 1e-4 --alpha 1 --max-strand -oc target PTBP1.DNA.motif.meme hg38_gene.fasta
Next time, test on a small dataset first. Otherwise a whole overnight run is wasted.
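One way to get such a small test set is to copy only the first few records of the FASTA file before launching the full scan. The sketch below assumes a plain multi-line FASTA file; the helper names and the output filename `hg38_gene.test.fasta` are hypothetical.

```python
def head_fasta(lines, n_records):
    """Yield the lines belonging to the first n_records FASTA records."""
    seen = 0
    for line in lines:
        if line.startswith(">"):
            seen += 1
            if seen > n_records:
                break
        if seen:  # skip anything before the first header
            yield line


def write_fasta_subset(src_path, dst_path, n_records=100):
    """Copy the first n_records records of src_path into dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(head_fasta(src, n_records))
```

For example, `write_fasta_subset("hg38_gene.fasta", "hg38_gene.test.fasta", 100)` produces a file that FIMO should get through in seconds, which is enough to confirm that the motif file, the options, and the output format behave as expected.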
--max-strand
If matches on both strands at a given position satisfy the output threshold, only report the match for the strand with the higher score. If the scores are tied, the matching strand is chosen at random.
Resource usage
--max-stored-scores 1000000 used 1.48 GB of memory and 1 CPU
--max-stored-scores 10000000 used ___ of memory and ___ CPU
Latest commands:
fimo --max-stored-scores 10000000 --thresh 1e-4 --alpha 1 -oc target2 --text --max-strand PTBP1.DNA.motif.meme hg38_gene.fasta > output.tsv
fimo --max-stored-scores 10000000 --thresh 1e-4 --alpha 1 -oc target2 --skip-matched-sequence --max-strand PTBP1.DNA.motif.meme hg38_gene.fasta > output2.tsv
--skip-matched-sequence [much faster output: run time cut from an hour and a half to 10 minutes]
Like the --text option, this limits output to tab-separated values (TSV) sent to standard out, but in addition, turns off output of the sequence of motif matches. This speeds up processing considerably.
--text [results go to standard output]
Limits output to TSV (tab-separated values) formatted results sent to standard output. The results are unsorted and no q-values are output, allowing very large files to be searched.
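Once the TSV is on disk, the matches can be summarized per gene sequence before mapping them to splicing events. The following is a rough sketch; it assumes the output has a header row containing `sequence_name` and `p-value` columns, as in recent MEME Suite releases, so the column names should be checked against the actual file. The function name is hypothetical.

```python
import csv
from collections import Counter


def count_hits_per_sequence(lines, max_pvalue=1e-4):
    """Count motif matches per sequence from FIMO TSV lines.

    Assumes tab-separated input with a header row that includes
    'sequence_name' and 'p-value' (an assumption about the FIMO version).
    Blank lines and lines starting with '#' are skipped.
    """
    rows = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    counts = Counter()
    for rec in reader:
        if float(rec["p-value"]) <= max_pvalue:
            counts[rec["sequence_name"]] += 1
    return counts
```

Passing an open handle on `output.tsv` to `count_hits_per_sequence` returns a `Counter` keyed by sequence name. Because the file is read line by line, this also works for the very large outputs that `--text` is meant for.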
Reference:
~/project/scPipeline/motifEnrichment/ASF_motif/