Linked annotations: a middle ground for manual curation of biomedical databases and text corpora

Authors

  • Tatyana Goldberg
  • Shrikant Vinchurkar
  • Juan Miguel Cejuela
  • Lars Juhl Jensen
  • Burkhard Rost
Abstract

Annotators of text corpora and biomedical databases carry out the same labor-intensive task to manually extract structured data from unstructured text. Tasks are needlessly repeated because text corpora are widely scattered. We envision that a linked annotation resource unifying many corpora could be a game changer. Such an open forum will help focus on novel annotations and on optimally benefiting from the energy of many experts. As proof-of-concept, we annotated protein subcellular localization in 100 abstracts cited by UniProtKB. The detailed comparison between our new corpus and the original UniProtKB annotations revealed sustained novel annotations for 42% of the entries (proteins). In a unified linked annotation resource these could immediately extend the utility of text corpora beyond the text-mining community. Our example motivates the central idea that linked annotations from text corpora can complement database annotations.


Similar resources

The Meta-knowledge of Causality in Biomedical Scientific Discourse

Causality lies at the heart of biomedical knowledge, being involved in diagnosis, pathology or systems biology. Thus, automatic causality recognition can greatly reduce the human workload by suggesting possible causal connections and aiding in the curation of pathway models. For this, we rely on corpora that are annotated with classified, structured representations of important facts and findin...


Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II

Manual curation of data from the biomedical literature is a rate-limiting factor for many expert curated databases. Despite the continuing advances in biomedical text mining and the pressing needs of biocurators for better tools, few existing text-mining tools have been successfully integrated into production literature curation systems such as those used by the expert curated databases. To clo...


Automatic functional annotation of predicted active sites: combining PDB and literature mining

texts was drawn from the Uniprot corpus, where every abstract text must contain the tri-occurrences of organism, protein and residue. Notice that the detection of the entities was based on the entity recognition (ER) systems described in the previous section. It is not expected that the ER systems are performing at top level, and therefore a certain proportion of the filtered abstract texts con...


CALBC: Releasing the Final Corpora

A number of gold standard corpora for named entity recognition are available to the public. However, the existing gold standard corpora are limited in size and semantic entity types. These usually lead to implementation of trained solutions (1) for a limited number of semantic entity types and (2) lacking in generalization capability. In order to overcome these problems, the CALBC project has a...


ProFAL: PROtein Functional Annotation through Literature

We introduce ProFAL (PROtein Functional Annotation through Literature), a new information system for automatic annotation of biological databases using bioinformatics methods. The annotations are (gene-product, functional property) pairs, associating the attributes of a gene-product, stored in the database, to functional properties. The system retrieves documents related to each gene-product fro...



Volume 9

Publication date: 2015