Adapting the OCMiner text processing system to the CTD controlled vocabulary
Abstract
We adapted OCMiner, a modular text processing pipeline designed for high-speed processing of large document collections, to the controlled vocabulary of the Comparative Toxicogenomics Database (CTD). We provide a RESTful web service that processes documents in the BioCreative XML format and annotates them with domain-specific terms from four CTD domains: genes, chemicals, diseases and action terms.
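To make the idea of annotating text against a controlled vocabulary concrete, here is a minimal sketch of dictionary-based term matching. It is not OCMiner's implementation and does not use the web service or the BioCreative XML format. The `annotate` function, the `Annotation` type, the toy vocabulary and its `EX:` identifiers are all hypothetical placeholders chosen for illustration, not CTD identifiers.

```python
import re
from typing import NamedTuple


class Annotation(NamedTuple):
    start: int        # character offset where the match begins
    end: int          # character offset just past the match
    text: str         # matched surface form as it appears in the document
    domain: str       # one of: gene, chemical, disease, action
    concept_id: str   # placeholder identifier (hypothetical, not a CTD ID)


# Toy vocabulary keyed by lowercase term. All identifiers are made up.
VOCABULARY = {
    "aspirin": ("chemical", "EX:C1"),
    "asthma": ("disease", "EX:D1"),
    "tp53": ("gene", "EX:G1"),
    "increases": ("action", "EX:A1"),
}


def annotate(text, vocabulary=VOCABULARY):
    """Return annotations for every vocabulary term found in text."""
    if not vocabulary:
        return []
    # Try longer terms first so a long term wins over a shorter prefix.
    terms = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    annotations = []
    for match in pattern.finditer(text):
        domain, concept_id = vocabulary[match.group(1).lower()]
        annotations.append(
            Annotation(match.start(), match.end(), match.group(1),
                       domain, concept_id)
        )
    return annotations


if __name__ == "__main__":
    for ann in annotate("Aspirin increases TP53 expression in asthma."):
        print(ann)
```

A production system would add tokenization, synonym handling and disambiguation on top of this kind of lookup. The sketch shows only the core mapping from text spans to vocabulary entries.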
Similar resources
An Overview of the BioCreative Workshop 2012 Track III: Interactive
The BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology) challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems for the biological domain. The BioCreative Workshop 2012 subcommittee identified three areas, or tracks, that comprised independent, but complementary aspects of data curation in which they sought communi...
OCMiner: Text Processing, Annotation and Relation Extraction for the Life Sciences
We present OCMiner, a high-performance text processing system for large document collections of scientific publications. Several linguistic options allow adjusting the quality of annotation results which can be specialized and fine-tuned for the recognition of Life Science terms. Recognized terms are mapped to semantic concepts which are ontologically located within their respective domain taxo...
The curation paradigm and application tool used for manual curation of the scientific literature at the Comparative Toxicogenomics Database
The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the effects of environmental chemicals on human health. CTD biocurators read the scientific literature and convert free-text information into a structured format using official nomenclature, integrating third party controlled vocabularies for chemicals, genes, diseases and organisms, and a novel...
Vodcast: A Breakthrough in Developing Incidental Vocabulary Learning
Incidental vocabulary learning is often seen as superior to direct instruction on many occasions. Meanwhile, upon the emergence of the World Wide Web, second language (SL) learners have been introduced to 'podcasts' (recorded audio and video online broadcasts) which could be authentic sources of vocabulary learning. The relatively recent phenomenon of video podcast (vodcast) might be considered...
MEDIC: a practical disease vocabulary used at the Comparative Toxicogenomics Database
The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the effects of environmental chemicals on human health. CTD biocurators manually curate a triad of chemical-gene, chemical-disease and gene-disease relationships from the scientific literature. The CTD curation paradigm uses controlled vocabularies for chemicals, genes and diseases. To curate di...