CheNER: a tool for the identification of chemical entities and their classes in biomedical literature
Background: Small chemical molecules regulate biological processes at the molecular level. Those molecules are often involved in causing or treating pathological states. Automatically identifying such molecules in biomedical text is difficult because of both the diverse morphology of chemical names and the alternative types of nomenclature that are used simultaneously to describe them. To address these issues, the last BioCreAtIvE challenge proposed the CHEMDNER task, a Named Entity Recognition (NER) challenge that aims at labelling different types of chemical names in biomedical text.
Methods: To address this challenge we tested various approaches to recognizing chemical entities in biomedical documents. These approaches range from linear Conditional Random Fields (CRFs) to a combination of CRFs with regular expression and dictionary matching, followed by a post-processing step, to tag chemical names in a corpus of Medline abstracts. We named our best-performing systems CheNER.
Results: We evaluate the performance of the various approaches using the F-score statistic. Higher F-scores indicate better performance. The highest F-score we obtain in identifying unique chemical entities is 72.88%. The highest F-score we obtain in identifying all chemical entities is 73.07%. We also evaluate the F-score obtained by combining our system with ChemSpot, and find an increase from 72.88% to 73.83%.
Conclusions: CheNER presents a valid alternative for automated annotation of chemical entities in biomedical documents. In addition, CheNER may be used to derive new features to train newer methods for tagging chemical entities. CheNER can be downloaded from http://metres.udl.cat and included in text annotation pipelines.
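The F-score used in the Results can be illustrated with a minimal sketch. It assumes the common NER convention that a predicted entity counts as correct only when its document and character offsets exactly match a gold annotation; the abstract does not state the matching criterion, and the function name `f_score` and the toy annotations below are hypothetical, not part of CheNER.

```python
def f_score(predicted, gold, beta=1.0):
    """Return the F-beta score of predicted entities against gold entities.

    Entities are hashable tuples, here (doc_id, start, end); exact matching
    is an assumption of this sketch, not a documented CheNER setting.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall (beta=1 gives F1).
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy annotations (invented for illustration only).
gold = {("doc1", 0, 7), ("doc1", 20, 27), ("doc2", 5, 12)}
predicted = {("doc1", 0, 7), ("doc2", 5, 12), ("doc2", 30, 34)}

score = f_score(predicted, gold)
print(f"F-score: {score:.2%}")
```

In this toy case two of three predictions are correct and two of three gold entities are found, so precision and recall are both 2/3 and so is the F-score.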
Is part of: Journal of Cheminformatics, 2015, vol. 7 (Suppl 1): S15, p. 1-8.
Except where otherwise noted, this item's license is described as cc-by (c) Usié Chimenos, Anabel et al., 2015
Showing items related by title, author, creator and subject.
Usié Chimenos, Anabel; Alves, Rui; Solsona Tehàs, Francesc; Vázquez, Miguel; Valéncia, Alfonso (Oxford University Press, 2013) Motivation: Chemical named entity recognition is used to automatically identify mentions to chemical compounds in text, and is the basis for more elaborate information extraction. However, only a small number of ...
Usié Chimenos, Anabel; Karathia, Hiren; Teixidó Torrelles, Ivan; Alves, Rui; Solsona Tehàs, Francesc (PeerJ, 2014) One way to initiate the reconstruction of molecular circuits is by using automated text-mining techniques. Developing more efficient methods for such reconstruction is a topic of active research, and those methods are ...
Usié Chimenos, Anabel; Karathia, Hiren; Teixidó Torrelles, Ivan; Valls Marsal, Joan; Faus i Torà, Xavier; Alves, Rui; Solsona Tehàs, Francesc (BioMed Central, 2011) Background: Reconstruction of genes and/or protein networks from automated analysis of the literature is one of the current targets of text mining in biomedical research. Some user-friendly tools already perform this ...