The CHEMDNER corpus of chemicals and drugs and its annotation principles

dc.contributor.author: Krallinger, Martin
dc.contributor.author: Rabal, Obdulia
dc.contributor.author: Leitner, Florian
dc.contributor.author: Vázquez, Miguel
dc.contributor.author: Salgado, David
dc.contributor.author: Lu, Zhiyong
dc.contributor.author: Leaman, Robert
dc.contributor.author: Lu, Yanan
dc.contributor.author: Ji, Donghong
dc.contributor.author: Lowe, Daniel M.
dc.contributor.author: Sayle, Roger A.
dc.contributor.author: Batista Navarro, Riza Theresa
dc.contributor.author: Rak, Rafal
dc.contributor.author: Huber, Torsten
dc.contributor.author: Rocktäschel, Tim
dc.contributor.author: Matos, Sérgio
dc.contributor.author: Campos, David
dc.contributor.author: Tang, Buzhou
dc.contributor.author: Xu, Hua
dc.contributor.author: Munkhdalai, Tsendsuren
dc.contributor.author: Ho Ryu, Keun
dc.contributor.author: Ramanan, SV
dc.contributor.author: Nathan, Senthil
dc.contributor.author: Žitnik, Slavko
dc.contributor.author: Bajec, Marko
dc.contributor.author: Weber, Lutz
dc.contributor.author: Irmer, Matthias
dc.contributor.author: Akhondi, Saber A.
dc.contributor.author: Kors, Jan A.
dc.contributor.author: Xu, Shuo
dc.contributor.author: An, Xin
dc.contributor.author: Kumar Sikdar, Utpal
dc.contributor.author: Ekbal, Asif
dc.contributor.author: Yoshioka, Masaharu
dc.contributor.author: Dieb, Thaer M.
dc.contributor.author: Choi, Miji
dc.contributor.author: Verspoor, Karin
dc.contributor.author: Khabsa, Madian
dc.contributor.author: Lee Giles, C.
dc.contributor.author: Liu, Hongfang
dc.contributor.author: Elayavilli Ravikumar, Komandur
dc.contributor.author: Lamurias, Andre
dc.contributor.author: Couto, Francisco M.
dc.contributor.author: Dai, Hong Jie
dc.contributor.author: Tzong Han Tsai, Richard
dc.contributor.author: Ata, Caglar
dc.contributor.author: Can, Tolga
dc.contributor.author: Usié Chimenos, Anabel
dc.contributor.author: Alves, Rui
dc.contributor.author: Segura Bedmar, Isabel
dc.contributor.author: Martínez, Paloma
dc.contributor.author: Oyarzabal, Julen
dc.contributor.author: Valencia, Alfonso
dc.date.accessioned: 2016-06-21T08:11:43Z
dc.date.available: 2016-06-21T08:11:43Z
dc.date.issued: 2015
dc.description.abstract: The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text were measured using an agreement study between annotators, obtaining a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/ [ca_ES]
dc.description.sponsorship: This work is supported by the Innovative Medicines Initiative Joint Undertaking (IMI-eTOX) and the MICROME grant 222886-2. [ca_ES]
dc.identifier.doi: https://doi.org/10.1186/1758-2946-7-S1-S2
dc.identifier.idgrec: 023761
dc.identifier.issn: 1758-2946
dc.identifier.uri: http://hdl.handle.net/10459.1/57239
dc.language.iso: eng [ca_ES]
dc.publisher: BioMed Central [ca_ES]
dc.relation.isformatof: Reproduction of the document published at https://doi.org/10.1186/1758-2946-7-S1-S2 [ca_ES]
dc.relation.ispartof: Journal of Cheminformatics, 2015, vol. 7, suppl. 1 [ca_ES]
dc.rights: cc-by (c) Krallinger et al., 2015 [ca_ES]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [ca_ES]
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es/ [*]
dc.title: The CHEMDNER corpus of chemicals and drugs and its annotation principles [ca_ES]
dc.type: article [ca_ES]
dc.type.version: publishedVersion [ca_ES]
Files
Original bundle
Name: 023761.pdf
Size: 2.33 MB
Format: Adobe Portable Document Format