A Must-Have for Beginners: The Most Complete Biomedical Named Entity Recognition (BioNER) Paper List, with a Summary of SOTA Results

Published: 2024/10/8

Author | Luo Ling (羅凌)

Affiliation | Dalian University of Technology (PhD)

Research interests | Deep learning, text classification

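I have compiled the biomedical named entity recognition papers I had collected into a BioNER Progress repository on GitHub. It contains a list of representative papers tracing the progress of BioNER, together with some state-of-the-art results on the main datasets and the papers behind them. I hope it helps anyone getting started with BioNER.

GitHub: https://github.com/lingluodlut/BioNER-Progress

BioNER Progress tracks progress in biomedical named entity recognition (BioNER) and has two main parts: 1) a list of representative papers in the development of BioNER (Papers); 2) some state-of-the-art results on the main datasets, with the related papers (SOTA).
The paper list starts with survey papers and then follows the development of BioNER research, giving representative work on dictionary-based, rule-based, and machine learning-based methods in turn. The machine learning methods are further divided into those based on traditional machine learning models (SVM, HMM, MEMM, and CRF) and the neural network methods that are now mainstream.
The SOTA part gives some state-of-the-art results on the main datasets. By entity type, it covers the recognition of chemical and drug (Chemical), disease (Disease), gene and protein (Gene/Protein), gene mutation (Mutation), and species (Species) entities.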

Must-Read Papers

Survey Papers

Overview of BioCreative II gene mention recognition. Smith L, Tanabe L K, nee Ando R J, et al. Genome biology, 2008, 9(2): S2.

https://genomebiology.biomedcentral.com/articles/10.1186/gb-2008-9-s2-s2

Biomedical named entity recognition: a survey of machine-learning tools. Campos D, Matos S, Oliveira J L. Theory and Applications for Advanced Text Mining, 2012: 175-195.

https://books.google.com.hk/books?hl=zh-CN&lr=&id=EfqdDwAAQBAJ&oi=fnd&pg=PA175&ots=WEKIblRekC&sig=FWoufJtWVSDHD3gbWaZXruEOiEs&redir_esc=y#v=onepage&q&f=false

Chemical named entities recognition: a review on approaches and applications. Eltyeb S, Salim N. Journal of cheminformatics, 2014, 6(1): 17.

https://jcheminf.biomedcentral.com/articles/10.1186/1758-2946-6-17

CHEMDNER: The drugs and chemical names extraction challenge. Krallinger M, Leitner F, Rabal O, et al. Journal of cheminformatics, 2015, 7(1): S1.

https://jcheminf.biomedcentral.com/articles/10.1186/1758-2946-7-S1-S1

A comparative study for biomedical named entity recognition. Wang X, Yang C, Guan R. International Journal of Machine Learning and Cybernetics, 2015, 9(3): 373-382.

https://link.springer.com/article/10.1007/s13042-015-0426-6

Dictionary-Based Methods

Using BLAST for identifying gene and protein names in journal articles. Krauthammer M, Rzhetsky A, Morozov P, et al. Gene, 2000, 259(1-2): 245-252.

https://www.sciencedirect.com/science/article/pii/S0378111900004315

Boosting precision and recall of dictionary-based protein name recognition. Tsuruoka Y, Tsujii J. Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine-Volume 13, 2003: 41-48.

https://aclanthology.info/pdf/W/W03/W03-1306.pdf

Exploiting the performance of dictionary-based bio-entity name recognition in biomedical literature. Yang Z, Lin H, Li Y. Computational Biology and Chemistry, 2008, 32(4): 287-291.

https://www.sciencedirect.com/science/article/pii/S1476927108000340

A dictionary to identify small molecules and drugs in free text. Hettne K M, Stierum R H, Schuemie M J, et al. Bioinformatics, 2009, 25(22): 2983-2991.

https://academic.oup.com/bioinformatics/article-abstract/25/22/2983/180399

https://biosemantics.org/index.php/resources/jochem

LINNAEUS: a species name identification system for biomedical literature. Gerner M, Nenadic G, Bergman C M. BMC bioinformatics, 2010, 11(1): 85.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-85

Rule-Based Methods

Toward information extraction: identifying protein names from biological papers. Fukuda K, Tsunoda T, Tamura A, et al. Pac symp biocomput. 1998, 707(18): 707-718.

https://pdfs.semanticscholar.org/335e/8b19ea50d3af6fcefe6f8421e2c9c8936f3f.pdf

A biological named entity recognizer. Narayanaswamy M, Ravikumar K E, Vijay-Shanker K. Biocomputing 2003. 2002: 427-438.

https://www.worldscientific.com/doi/abs/10.1142/9789812776303_0040

ProMiner: rule-based protein and gene entity recognition. Hanisch D, Fundel K, Mevissen H T, et al. BMC bioinformatics, 2005, 6(1): S14.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-6-S1-S14

MutationFinder: a high-performance system for extracting point mutation mentions from text. Caporaso J G, Baumgartner Jr W A, Randolph D A, et al. Bioinformatics, 2007, 23(14): 1862-1865.

https://academic.oup.com/bioinformatics/article/23/14/1862/188647

http://mutationfinder.sourceforge.net/

Drug name recognition and classification in biomedical texts: a case study outlining approaches underpinning automated systems. Segura-Bedmar I, Martínez P, Segura-Bedmar M. Drug discovery today, 2008, 13(17-18): 816-823.

https://www.sciencedirect.com/science/article/pii/S1359644608002171

Investigation of unsupervised pattern learning techniques for bootstrap construction of a medical treatment lexicon. Xu R, Morgan A, Das A K, et al. Proceedings of the workshop on current trends in biomedical natural language processing, 2009: 63-70.

http://www.aclweb.org/anthology/W09-1308

Linguistic approach for identification of medication names and related information in clinical narratives. Hamon T, Grabar N. Journal of the American Medical Informatics Association, 2010, 17(5): 549-554.

https://academic.oup.com/jamia/article/17/5/549/831598

SETH detects and normalizes genetic variants in text. Thomas P, Rocktäschel T, Hakenberg J, et al. Bioinformatics, 2016, 32(18): 2883-2885.

https://academic.oup.com/bioinformatics/article/32/18/2883/1743171

http://rockt.github.io/SETH/

PENNER: Pattern-enhanced Nested Named Entity Recognition in Biomedical Literature. Wang X, Zhang Y, Li Q, et al. 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2018: 540-547.

https://ieeexplore.ieee.org/abstract/document/8621485/

Machine Learning-Based Methods

SVM-based Methods

Tuning support vector machines for biomedical named entity recognition. Kazama J, Makino T, Ohta Y, et al. Proceedings of the ACL-02 workshop on Natural language processing in the biomedical domain-Volume 3, 2002: 1-8.

https://aclanthology.info/pdf/W/W02/W02-0301.pdf

Biomedical named entity recognition using two-phase model based on SVMs. Lee K J, Hwang Y S, Kim S, et al. Journal of Biomedical Informatics, 2004, 37(6): 436-447.

https://www.sciencedirect.com/science/article/pii/S1532046404000863

Exploring deep knowledge resources in biomedical name recognition. GuoDong Z, Jian S. Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, 2004: 96-99.

https://aclanthology.info/pdf/W/W04/W04-1219.pdf

HMM-based Methods

Named entity recognition in biomedical texts using an HMM model. Zhao S. Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, 2004: 84-87.

https://aclanthology.info/pdf/W/W04/W04-1216.pdf

Annotation of chemical named entities. Corbett P, Batchelor C, Teufel S. Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing, 2007: 57-64.

https://aclanthology.info/pdf/W/W07/W07-1008.pdf

Conditional random fields vs. hidden markov models in a biomedical named entity recognition task. Ponomareva N, Rosso P, Pla F, et al. Proc. of Int. Conf. Recent Advances in Natural Language Processing, RANLP. 2007: 479-483.

http://clg.wlv.ac.uk/papers/Ponomareva-RANLP-07.pdf

MEMM-based Methods

Cascaded classifiers for confidence-based chemical named entity recognition. Corbett P, Copestake A. BMC bioinformatics, 2008, 9(11): S4.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-S11-S4

OSCAR4: a flexible architecture for chemical text-mining. Jessop D M, Adams S E, Willighagen E L, et al. Journal of cheminformatics, 2011, 3(1): 41.

https://jcheminf.biomedcentral.com/articles/10.1186/1758-2946-3-41

CRF-based Methods

ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text. Settles B. Bioinformatics, 2005, 21(14): 3191-3192.

https://academic.oup.com/bioinformatics/article/21/14/3191/266815

BANNER: an executable survey of advances in biomedical named entity recognition. Leaman R, Gonzalez G. Biocomputing 2008. 2008: 652-663.

https://psb.stanford.edu/psb-online/proceedings/psb08/leaman.pdf

Detection of IUPAC and IUPAC-like chemical names. Klinger R, Kolářik C, Fluck J, et al. Bioinformatics, 2008, 24(13): i268-i276.

https://academic.oup.com/bioinformatics/article-abstract/24/13/i268/235854

Incorporating rich background knowledge for gene named entity classification and recognition. Li Y, Lin H, Yang Z. BMC bioinformatics, 2009, 10(1): 223.

https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/1471-2105-10-223

A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. Jiang M, Chen Y, Liu M, et al. Journal of the American Medical Informatics Association, 2011, 18(5): 601-606.

https://academic.oup.com/jamia/article/18/5/601/834186

ChemSpot: a hybrid system for chemical named entity recognition. Rocktäschel T, Weidlich M, Leser U. Bioinformatics, 2012, 28(12): 1633-1640.

https://academic.oup.com/bioinformatics/article/28/12/1633/266861

Gimli: open source and high-performance biomedical name recognition. Campos D, Matos S, Oliveira J L. BMC bioinformatics, 2013, 14(1): 54.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-54

tmVar: a text mining approach for extracting sequence variants in biomedical literature. Wei C H, Harris B R, Kao H Y, et al. Bioinformatics, 2013, 29(11): 1433-1439.

https://academic.oup.com/bioinformatics/article-abstract/29/11/1433/220291

https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/tmvar/

Evaluating word representation features in biomedical named entity recognition tasks. Tang B, Cao H, Wang X, et al. BioMed research international, 2014, 2014.

http://downloads.hindawi.com/journals/bmri/2014/240403.pdf

Drug name recognition in biomedical texts: a machine-learning-based method. He L, Yang Z, Lin H, et al. Drug discovery today, 2014, 19(5): 610-617.

https://www.sciencedirect.com/science/article/pii/S1359644613003322

tmChem: a high performance approach for chemical named entity recognition and normalization. Leaman R, Wei C H, Lu Z. Journal of cheminformatics, 2015, 7(1): S3.

https://jcheminf.biomedcentral.com/articles/10.1186/1758-2946-7-S1-S3

GNormPlus: an integrative approach for tagging genes, gene families, and protein domains. Wei C H, Kao H Y, Lu Z. BioMed research international, 2015, 2015.

http://downloads.hindawi.com/journals/bmri/2015/918710.pdf

Mining chemical patents with an ensemble of open systems. Leaman R, Wei C H, Zou C, et al. Database, 2016, 2016.

https://academic.oup.com/database/article-abstract/doi/10.1093/database/baw065/2630406

nala: text mining natural language mutation mentions. Cejuela J M, Bojchevski A, Uhlig C, et al. Bioinformatics, 2017, 33(12): 1852-1858.

https://academic.oup.com/bioinformatics/article-abstract/33/12/1852/2991428

Neural Network-based Methods

Recurrent neural network models for disease name recognition using domain invariant features. Sahu S, Anand A. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2016: 2216-2225.

https://www.aclweb.org/anthology/P16-1209

Deep learning with word embeddings improves biomedical named entity recognition. Habibi M, Weber L, Neves M, et al. Bioinformatics, 2017, 33(14): i37-i48.

https://academic.oup.com/bioinformatics/article/33/14/i37/3953940

A neural joint model for entity and relation extraction from biomedical text. Li F, Zhang M, Fu G, et al. BMC bioinformatics, 2017, 18(1): 198.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1609-9

A neural network multi-task learning approach to biomedical named entity recognition. Crichton G, Pyysalo S, Chiu B, et al. BMC bioinformatics, 2017, 18(1): 368.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1776-8

https://github.com/cambridgeltl/MTL-Bioinformatics-2016

Disease named entity recognition from biomedical literature using a novel convolutional neural network. Zhao Z, Yang Z, Luo L, et al. BMC medical genomics, 2017, 10(5): 73.

https://bmcmedgenomics.biomedcentral.com/articles/10.1186/s12920-017-0316-8

An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition. Luo L, Yang Z, Yang P, et al. Bioinformatics, 2018, 34(8): 1381-1388.

https://academic.oup.com/bioinformatics/article-abstract/34/8/1381/4657076

https://github.com/lingluodlut/Att-ChemdNER

GRAM-CNN: a deep learning approach with local context for named entity recognition in biomedical text. Zhu Q, Li X, Conesa A, et al. Bioinformatics, 2018, 34(9): 1547-1554.

https://academic.oup.com/bioinformatics/article-abstract/34/9/1547/4764002

https://github.com/valdersoul/GRAM-CNN

D3NER: biomedical named entity recognition using CRF-biLSTM improved with fine-tuned embeddings of various linguistic information. Dang T H, Le H Q, Nguyen T M, et al. Bioinformatics, 2018, 34(20): 3539-3546.

https://academic.oup.com/bioinformatics/article/34/20/3539/4990492

https://github.com/aidantee/D3NER

Transfer learning for biomedical named entity recognition with neural networks. Giorgi J M, Bader G D. Bioinformatics, 2018, 34(23): 4087-4094.

https://academic.oup.com/bioinformatics/article/34/23/4087/5026661

Label-Aware Double Transfer Learning for Cross-Specialty Medical Named Entity Recognition. Wang Z, Qu Y, Chen L, et al. NAACL. 2018: 1-15.

https://www.aclweb.org/anthology/N18-1001

Recognizing irregular entities in biomedical text via deep neural networks. Li F, Zhang M, Tian B, et al. Pattern Recognition Letters, 2018, 105: 105-113.

https://www.sciencedirect.com/science/article/pii/S0167865517302155

Cross-type biomedical named entity recognition with deep multi-task learning. Wang X, Zhang Y, Ren X, et al. Bioinformatics, 2019, 35(10): 1745-1752.

https://academic.oup.com/bioinformatics/article/35/10/1745/5126922

https://github.com/yuzhimanhua/lm-lstm-crf

Improving Chemical Named Entity Recognition in Patents with Contextualized Word Embeddings. Zhai Z, Nguyen D Q, Akhondi S, et al. Proceedings of the 18th BioNLP Workshop and Shared Task. 2019: 328-338.

https://www.aclweb.org/anthology/W19-5035

https://github.com/zenanz/ChemPatentEmbeddings

Chinese Clinical Named Entity Recognition Using Residual Dilated Convolutional Neural Network with Conditional Random Field. Qiu J, Zhou Y, Wang Q, et al. IEEE Transactions on NanoBioscience, 2019, 18(3): 306-315.

https://ieeexplore.ieee.org/abstract/document/8678833/

A Neural Multi-Task Learning Framework to Jointly Model Medical Named Entity Recognition and Normalization. Zhao S, Liu T, Zhao S, et al. Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33: 817-824.

https://wvvw.aaai.org/ojs/index.php/AAAI/article/download/3861/3739

CollaboNet: collaboration of deep neural networks for biomedical named entity recognition. Yoon W, So C H, Lee J, et al. BMC bioinformatics, 2019, 20(10): 249.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2813-6

BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Lee J, Yoon W, Kim S, et al. Bioinformatics, Advance article, 2019.

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz682/5566506

https://github.com/dmis-lab/biobert

HUNER: Improving Biomedical NER with Pretraining. Weber L, Münchmeyer J, Rocktäschel T, et al. Bioinformatics, Advance article, 2019.

https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz528/5523847?redirectedFrom=fulltext

https://hu-ner.github.io/

Others

TaggerOne: joint named entity recognition and normalization with semi-Markov Models. Leaman R, Lu Z. Bioinformatics, 2016, 32(18): 2839-2846.

https://academic.oup.com/bioinformatics/article/32/18/2839/1744190

https://www.ncbi.nlm.nih.gov/research/bionlp/tools/taggerone/

A transition-based joint model for disease named entity recognition and normalization. Lou Y, Zhang Y, Qian T, et al. Bioinformatics, 2017, 33(15): 2363-2371.

https://academic.oup.com/bioinformatics/article-abstract/33/15/2363/3089942

https://github.com/louyinxia/jointRN

State-of-the-Art Results

Chemical NER

CHEMDNER

The CHEMDNER (chemical compound and drug name recognition) task, part of the BioCreative IV challenge, aims to promote the development of systems for the automatic recognition of chemical entities in text. It was divided into two tasks: one covered the indexing of documents with chemicals (chemical document indexing, the CDI task), and the other was concerned with finding the exact mentions of chemicals in text (chemical entity mention recognition, the CEM task). Here, we focus only on the CEM task.

The CHEMDNER corpus consists of 10,000 PubMed abstracts containing a total of 84,355 chemical entity mentions. The original corpus is divided into a training set (3,500 abstracts), a development set (3,500 abstracts), and a test set (3,000 abstracts).
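As a quick sanity check on the figures above, the following minimal Python sketch confirms that the three splits add up to the stated corpus size and derives the split proportions and the average number of mentions per abstract. The variable and function names are illustrative only and do not come from the CHEMDNER distribution.

```python
# Abstract counts per split, as stated in the corpus description above.
CHEMDNER_SPLITS = {"train": 3500, "dev": 3500, "test": 3000}
TOTAL_ABSTRACTS = 10_000
TOTAL_MENTIONS = 84_355


def split_fractions(splits):
    """Return each split's share of the whole corpus."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


# The splits should account for every abstract in the corpus.
assert sum(CHEMDNER_SPLITS.values()) == TOTAL_ABSTRACTS

fractions = split_fractions(CHEMDNER_SPLITS)
mentions_per_abstract = TOTAL_MENTIONS / TOTAL_ABSTRACTS

print(fractions)
print(f"{mentions_per_abstract:.2f} chemical mentions per abstract on average")
```

The same check can be repeated for the other corpora below by swapping in their split sizes.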

CDR-Chemical

The CDR (chemical disease relation) task, part of the BioCreative V challenge, aims to automatically extract CDRs from the literature. The CDR corpus consists of 1,500 PubMed abstracts annotated with chemicals, diseases, and chemical-disease interactions, and contains a total of 15,933 chemical entity mentions. The original corpus is separated into a training set (500 abstracts), a development set (500 abstracts), and a test set (500 abstracts).

Disease NER

NCBI-Disease
The NCBI Disease corpus consists of 793 PubMed abstracts separated into training (593), development (100) and test (100) subsets. It contains a total of 6,892 disease entity mentions.

CDR-Disease

The CDR (chemical disease relation) task, part of the BioCreative V challenge, aims to automatically extract CDRs from the literature. The CDR corpus consists of 1,500 PubMed abstracts annotated with chemicals, diseases, and chemical-disease interactions, and contains a total of 12,864 disease entity mentions. The original corpus is separated into a training set (500 abstracts), a development set (500 abstracts), and a test set (500 abstracts).

Gene/Protein NER

BC2GM

The Gene Mention Tagging task, part of the BioCreative II challenge, is concerned with the named entity extraction of gene and gene product mentions in text. The BC2GM corpus contains a total of 24,583 gene entity mentions.

JNLPBA

The JNLPBA corpus contains 2,404 abstracts extracted from MEDLINE using the MeSH terms “human”, “blood cell” and “transcription factor”. The manual annotation of these abstracts was based on five classes of the GENIA ontology, namely protein, DNA, RNA, cell line, and cell type. This corpus was used in the Bio-Entity Recognition Task in BioNLP/NLPBA 2004, providing 2,000 abstracts for training and the remaining 404 abstracts for testing. The overall results are shown in the following table.

Mutation NER

MutationFinder corpus and tmVar corpus

The MutationFinder corpus was established to guide the construction of the patterns. The development data set is made up of 605 point mutation mentions in 305 abstracts selected randomly from primary citations in PDB. The evaluation data set is made up of 910 point mutation mentions in 508 abstracts annotated by two of the authors, who were not involved in the development of the system.
The tmVar corpus comprises 500 manually annotated abstracts, of which 334 were used for training tmVar and the remaining 166 for testing it.
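
To give a feel for what pattern-based extraction of point mutation mentions looks like, here is a minimal Python sketch. It is an assumption-laden illustration, not the actual MutationFinder or tmVar patterns: it only matches the compact one-letter form (wild-type residue, position, mutant residue, e.g. A123T), and the names `POINT_MUTATION` and `find_point_mutations` are hypothetical.

```python
import re
from typing import List, Tuple

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative pattern only: wild-type residue, 1-based position, mutant residue.
# Real systems use many more patterns (three-letter codes, "Ala to Thr at 123", ...).
POINT_MUTATION = re.compile(
    rf"\b([{AMINO_ACIDS}])([1-9][0-9]*)([{AMINO_ACIDS}])\b"
)


def find_point_mutations(text: str) -> List[Tuple[str, int, str]]:
    """Return (wild_type, position, mutant) triples found in the text."""
    return [
        (m.group(1), int(m.group(2)), m.group(3))
        for m in POINT_MUTATION.finditer(text)
    ]


sentence = "The A123T variant and the E6V substitution were studied."
print(find_point_mutations(sentence))
```

A single regular expression like this will miss many surface forms and can produce false positives (for example, on identifiers that happen to look like mutations), which is one reason annotated corpora such as the ones above are needed for development and evaluation.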

Species NER

LINNAEUS corpus

The LINNAEUS corpus is a set of open access documents in text format, manually annotated with species mention tags. It consists of 100 full-text documents from the PMC OA document set and contains a total of 4,259 species entity mentions.
