KEGG annotation software on Linux: installing and using KofamKOALA, a KEGG functional annotation tool
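KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database for the systematic analysis of gene function and genomic information.
KofamKOALA is a convenient KEGG functional annotation tool, published in Bioinformatics in November 2019 by researchers at the Bioinformatics Center, Institute for Chemical Research, Kyoto University, the group that created KEGG.
It runs homology searches of protein sequences against KOfam, a database built from profile hidden Markov models (HMMs), with accuracy comparable to the best-performing tools. It is available as a web service and as a Linux version; this article focuses on installing and using the Linux version.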
Web version
URL: https://www.genome.jp/tools/kofamkoala/
Paste the protein sequences into the web form, set the E-value threshold, enter an email address, and click Compute. The results are then sent by email.
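Linux version
The Linux version of KofamKOALA requires two downloads: KOfam (the database) and KofamScan (the software). The software depends on Ruby, HMMER, and GNU Parallel; if they are not installed yet, follow the steps below.
Installation
As an example, we install kofamscan under the home directory, $HOME (also written ~):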
step1
Download and extract KOfam and KofamScan
mkdir -p ~/kofamscan/db
cd ~/kofamscan/db
wget ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz
wget ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz
gunzip ko_list.gz
tar xvzf profiles.tar.gz
mkdir -p ~/kofamscan/bin
cd ~/kofamscan/bin
wget ftp://ftp.genome.jp/pub/tools/kofamscan/kofamscan-1.2.0.tar.gz # mind the KofamScan version
tar xvzf kofamscan-1.2.0.tar.gz
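Before moving on, it can help to confirm that both downloads were extracted where the later steps expect them. The following is a minimal sketch, not part of KofamScan: the function name `check_kofam_db` is hypothetical, and it assumes that profiles.tar.gz unpacks into a `profiles` directory holding `.hmm` files.

```shell
# Hypothetical helper (not part of KofamScan): sanity-check a KOfam database directory.
# Assumes ko_list has been gunzipped and profiles/ holds *.hmm files.
check_kofam_db() {
    db_dir=$1
    if [ ! -s "$db_dir/ko_list" ]; then
        echo "missing or empty: $db_dir/ko_list" >&2
        return 1
    fi
    # Count HMM profile files directly under profiles/.
    n_profiles=$(find "$db_dir/profiles" -maxdepth 1 -name '*.hmm' 2>/dev/null | wc -l)
    n_profiles=$((n_profiles))   # strip padding that some wc implementations add
    if [ "$n_profiles" -eq 0 ]; then
        echo "no .hmm profiles found in $db_dir/profiles" >&2
        return 1
    fi
    echo "ko_list present, $n_profiles profiles"
}
```

With the layout used in this article, the call would be `check_kofam_db ~/kofamscan/db`.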
step2
Download Ruby, HMMER, and GNU Parallel
cd ~/kofamscan
mkdir ruby hmmer parallel src
cd src
# Ruby should be version 2.4 or later (2.7 is shown here); HMMER should be 3.1 or later (3.3 here); Parallel is the latest release
wget https://cache.ruby-lang.org/pub/ruby/2.7/ruby-2.7.0.tar.gz
wget http://eddylab.org/software/hmmer/hmmer-3.3.tar.gz
wget ftp://ftp.gnu.org/gnu/parallel/parallel-latest.tar.bz2
Install Ruby
cd ~/kofamscan/src
tar xvzf ruby-2.7.0.tar.gz
cd ruby-2.7.0
./configure --prefix=$HOME/kofamscan/ruby
make
make install
Install HMMER
cd ~/kofamscan/src
tar xvzf hmmer-3.3.tar.gz
cd hmmer-3.3
./configure --prefix=$HOME/kofamscan/hmmer
make
make install
Install GNU Parallel
cd ~/kofamscan/src
tar xvjf parallel-latest.tar.bz2
cd parallel-20190322 # the directory name depends on the release you downloaded
./configure --prefix=$HOME/kofamscan/parallel
make
make install
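Because parallel-latest.tar.bz2 unpacks into a dated directory, the name in the `cd` step changes from release to release. One way to avoid typing it by hand is to ask tar for the archive's top-level directory. This is only a sketch: `tarball_top_dir` is a hypothetical helper name, and it assumes the archive has a single top-level directory and that your tar detects the compression automatically when reading (recent GNU tar and bsdtar do).

```shell
# Hypothetical helper: print the top-level directory stored in a tarball,
# so the dated directory name does not have to be typed by hand.
tarball_top_dir() {
    # `tar -t` lists the archive members; the first path component of the
    # first entry is taken as the top-level directory.
    tar -tf "$1" | awk -F/ 'NR == 1 { print $1 }'
}
```

After extracting, `cd "$(tarball_top_dir parallel-latest.tar.bz2)"` would then enter whichever directory that release created.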
Add Ruby to the PATH environment variable. If later commands fail, Ruby may be the cause; installing Ruby as described in this article is recommended.
export PATH=$HOME/kofamscan/ruby/bin:$PATH
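To confirm that the Ruby now found first on PATH meets the 2.4 minimum, a small check along these lines may be useful. It is a sketch under stated assumptions: the helper names `version_ge` and `check_ruby` are hypothetical, and the comparison relies on `sort -V` (version sort) from GNU coreutils.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the Ruby found on PATH meets the 2.4 minimum.
check_ruby() {
    if ! command -v ruby >/dev/null 2>&1; then
        echo "ruby not found on PATH" >&2
        return 1
    fi
    # RUBY_VERSION is a constant built into Ruby, e.g. "2.7.0".
    ruby_version=$(ruby -e 'print RUBY_VERSION')
    if version_ge "$ruby_version" 2.4; then
        echo "ruby $ruby_version OK ($(command -v ruby))"
    else
        echo "ruby $ruby_version is older than 2.4" >&2
        return 1
    fi
}
```

Running `check_ruby` after the `export` line also prints which ruby executable was picked up, which makes it easy to spot a system Ruby shadowing the one built here.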
step3
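Copy config-template.yml and rename the copy to config.yml
cd ~/kofamscan/bin/kofamscan-1.2.0 # the extracted directory, which contains config-template.yml and exec_annotation
cp config-template.yml config.yml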
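Use cat config.yml to view the file.
We need to add key-value pairs to config.yml so that the Ruby script can locate the database and the tools.
Edit config.yml (for example with vim) and add the following entries, making sure to use absolute paths.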
profile: /path/to/home/kofamscan/db/profiles
ko_list: /path/to/home/kofamscan/db/ko_list
hmmsearch: /path/to/home/kofamscan/hmmer/bin/hmmsearch
parallel: /path/to/home/kofamscan/parallel/bin/parallel
If hmmsearch and parallel are installed somewhere else, change these entries to the corresponding paths.
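As an alternative to editing the file by hand, the four entries can be generated from the installation root so that the absolute paths are filled in automatically. This is a sketch only: `write_kofam_config` is a hypothetical helper, it assumes the directory layout used in this article, and it overwrites the target file, so the comments from config-template.yml are lost.

```shell
# Hypothetical helper: write a KofamScan config file ($2) that points at an
# installation rooted at $1, laid out as in this article.
# Note: this overwrites $2 rather than editing it.
write_kofam_config() {
    root=$1
    out=$2
    cat > "$out" <<EOF
profile: $root/db/profiles
ko_list: $root/db/ko_list
hmmsearch: $root/hmmer/bin/hmmsearch
parallel: $root/parallel/bin/parallel
EOF
}
```

For the installation described here, the call would be `write_kofam_config "$HOME/kofamscan" ~/kofamscan/bin/kofamscan-1.2.0/config.yml`; the shell expands `$HOME` to an absolute path before the file is written.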
Usage
KofamScan is now ready to use. Prepare a FASTA file of protein sequences (the input must be protein; nucleotide sequences are not supported):
./exec_annotation -o result.txt query.fasta
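Since only protein input is supported, a rough check of the FASTA file before a long run can save time. The sketch below is a heuristic, not something KofamScan provides: `guess_fasta_type` is a hypothetical helper, and the 0.9 cutoff on the fraction of A/C/G/T/U/N characters is an arbitrary assumption that short or unusual sequences could fool.

```shell
# Hypothetical heuristic: guess whether a FASTA file holds nucleotide or
# protein sequences from the fraction of A, C, G, T, U and N characters.
# Prints "nucleotide", "protein" or "unknown"; the 0.9 cutoff is an assumption.
guess_fasta_type() {
    awk '
        /^>/ { next }                          # skip header lines
        {
            seq = toupper($0)
            total += length(seq)
            nuc += gsub(/[ACGTUN]/, "", seq)   # gsub returns the number of matches
        }
        END {
            if (total == 0) print "unknown"
            else if (nuc / total >= 0.9) print "nucleotide"
            else print "protein"
        }
    ' "$1"
}
```

If `guess_fasta_type query.fasta` reports `nucleotide`, the sequences need to be translated to protein before running exec_annotation.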
For example, with samples.fasta in ~/kofamscan/test/ and res.txt as the output file:
cd ~/kofamscan/bin/kofamscan-1.2.0 # this directory contains exec_annotation
./exec_annotation -o ~/kofamscan/test/res.txt ~/kofamscan/test/samples.fasta
When the run finishes, the results are in the output file.
If the run fails with an error, Ruby's bin directory may not come first on PATH; in that case run:
export PATH=$HOME/kofamscan/ruby/bin:$PATH
Run ./exec_annotation -h to see all options:
## Options
- `-o FILE`
  - The result are output to `FILE`. It defaults to `stdout`.
- `-p`, `--profile=PROFILE`
  - Use `PROFILE` as a profile database. See [Profiles](#profiles)
- `-k`, `--ko-list=FILE`
  - Use `FILE` as a KO list.
- `--cpu=N`
  - Set the number of `hmmsearch` processes started simultaneously to `N`. It defaults to 1 unless it is set in `config.yml`.
- `-c FILE`
  - Use `FILE` as a config file instead of `config.yml` in the same directory as `exec_annotation`.
- `--tmp-dir=DIR`
  - Use `DIR` as a temporary directory where hmmsearch results are. It will be created if not exist. It defaults to `./tmp`.
- `-E`, `--e-value=VALUE`
  - Require E-value to be smaller than or equal to `VALUE`. If not, an asterisk will not be added in `detail` format or the hit will not be reported in other formats.
- `-T`, `--threshold-scale=VALUE`
  - The score thresholds are multiplied by `VALUE`. For example, with `-T2` option, the thresholds become twice as strict.
- `-f`, `--format=FORMAT`
  - Set the format of the output to `FORMAT`. Three formats below are available.
    - `detail`
      - Default format. Gene name, assigned K number, threshold of the KO, hmmsearch score and E-value, and the definition of KO are shown. In addition, an asterisk '*' is added to the head of the line if the score is higher than the threshold.
    - `mapper`
      - Format which can be used for [KEGG Mapper](https://www.genome.jp/kegg/mapper.html) input. It includes a gene name and an assigned K number separated by a tab. Here, an assigned K number represents a hit with score above the predefined threshold. Note that for some KOs, predefined score thresholds are not available when they are represented by a very few number of sequences in KEGG GENES.
    - `mapper-oneline`
      - Similar to `mapper`, but when more than one KO are assigned to a gene, all assigned KO are shown in one line separated by tabs.
- `--[no-]report-unannotated`
  - With `--report-unannotated` option, gene names are shown even when no KO is assigned (default when `--format=mapper(-oneline)`). With `--no-report-unannotated` such genes are not shown at all (default when `--format=detail`).
- `--create-alignment`
  - `hmmsearch`'s normal outputs per profile are stored in the temporary directory. In addition, domain information and alignments in the outputs will be rearranged per query.
  - Not compatible with `--reannotation`
- `-r`, `--reannotation`
  - Skip `hmmsearch` and assume that `hmmsearch` outputs are already in the temporary directory. This will help you to make an output in a different format or redo annotation changing thresholds.
  - Not compatible with `--create-alignment`
- `-h`, `--help`
  - Show brief help message.
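In the default `detail` format, only lines that start with an asterisk are hits whose score exceeds the KO threshold. A short filter can pull out just those gene-to-KO assignments. This is a sketch under assumptions: `significant_hits` is a hypothetical helper, and it assumes whitespace-separated columns in the order marker, gene name, K number, with gene names that contain no spaces.

```shell
# Hypothetical helper: from a `detail`-format result file ($1), print
# "gene<TAB>KO" for hits marked with a leading asterisk (score above threshold).
# Assumes whitespace-separated columns: marker, gene name, K number, ...
significant_hits() {
    awk '$1 == "*" { print $2 "\t" $3 }' "$1"
}
```

For the example run above, `significant_hits ~/kofamscan/test/res.txt` would list the assigned K numbers. Rerunning with `-r -f mapper` is the tool's own route to a gene and K number table without repeating the search, provided the hmmsearch outputs are still in the temporary directory.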
References:
https://www.genome.jp/tools/kofamkoala/
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz859/5631907