Integrating text-mining approaches to identify entities and extract events from the biomedical literature

Author

  • Lars Martin Anders Gerner
Abstract

...s indicate that species names are more likely to be mentioned in the main body. As with the MeSH and Entrez Gene document sets, precision values are of low relevance because the evaluation set does not include all species mentions.

2.4.3.5 PubMed Central linkouts

Performance of LINNAEUS compared to PMC linkouts reveals recall levels similar to those obtained on the EMBL document set, but lower than those for MeSH or Entrez Gene, even though this evaluation set was constructed with an aim similar to that of LINNAEUS, namely species tagging (although at the document level). Inspecting a number of false positives and negatives reveals that virtually all were incorrectly tagged in the PMC linkout evaluation set, often for no apparent reason. For some false negative cases, articles have been tagged with species whose names can only be found in the titles of the references. This suggests that species names in the PMC linkouts are also detected in referenced article titles (while in some cases linkouts are missed even when species are mentioned in the main article title). Lower performance for MEDLINE and PMC OA abstracts is due to comparing species names found by LINNAEUS only in abstracts to those found in the full documents in PMC linkouts, and as such are not...
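The document-level comparison described above can be illustrated with a minimal sketch. It assumes that both the tagger output and the evaluation set are available as mappings from a document identifier to a set of species identifiers; the function name `document_level_scores`, the document identifiers, and the example data are hypothetical and are not taken from the thesis or from LINNAEUS itself.

```python
from typing import Dict, Set, Tuple


def document_level_scores(
    predicted: Dict[str, Set[str]],
    reference: Dict[str, Set[str]],
) -> Tuple[float, float]:
    """Return (precision, recall) over (document, species) pairs.

    Only documents present in the reference set are scored. Both
    arguments are hypothetical stand-ins for tagger output and an
    evaluation set such as document-level species tags.
    """
    tp = fp = fn = 0
    for doc_id, ref_species in reference.items():
        pred_species = predicted.get(doc_id, set())
        tp += len(pred_species & ref_species)  # found by both
        fp += len(pred_species - ref_species)  # tagger only
        fn += len(ref_species - pred_species)  # evaluation set only
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative data only: species are written as NCBI Taxonomy IDs
# (9606 human, 10090 mouse, 7227 fruit fly, 562 E. coli).
predicted = {"doc1": {"9606", "10090"}, "doc2": {"7227"}}
reference = {"doc1": {"9606"}, "doc2": {"7227", "562"}}

precision, recall = document_level_scores(predicted, reference)
```

When the evaluation set omits species that are in fact mentioned, correct predictions are counted as false positives, which is why precision against such a set says little about the tagger, while recall remains informative.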

Similar resources

Text-mining approaches in molecular biology and biomedicine.

Biomedical articles provide functional descriptions of bioentities such as chemical compounds and proteins. To extract relevant information using automatic techniques, text-mining and information-extraction approaches have been developed. These technologies have a key role in integrating biomedical information through analysis of scientific literature. In this article, important applications su...

Biomedical Ontologies and Text Mining for Biomedicine and Healthcare: A Survey

In this survey paper, we discuss biomedical ontologies and major text mining techniques applied to biomedicine and healthcare. Biomedical ontologies such as UMLS are currently being adopted in text mining approaches because they provide domain knowledge for text mining approaches. In addition, biomedical ontologies enable us to resolve many linguistic problems when text mining approaches handle...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

A review of text mining approaches and their function in discovering and extracting a topic

Background and aim: Four text mining methods are examined and focused on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature of text mining and topic modeling.  Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...

Integrating Large-Scale Text Mining and Co-Expression Networks: Targeting NADP(H) Metabolism in E. coli with Event Extraction

We present an application of EVEX, a literature-scale event extraction resource, in the concrete biological use case of NADP(H) metabolism regulation in Escherichia coli. We make extensive use of the EVEX event generalization based on gene family definitions in Ensembl Genomes, to extract cross-species candidate regulators. We manually evaluate the resulting network so as to only preserve corre...

Temporal Trajectories of bio-entities and bio-events extracted from PubMed

(to be revised; see the abstract on the website) Motivation: The rapid expansion of biomedical literature has created an increased demand for advanced text mining tools to rapidly extract relevant events from the continuous flood of periodically published knowledge. However, biologists are still reluctant to use text mining tools because of the considerable number of events they extract for their queries. M...

Publication date: 2012