Automatic extraction of gene and protein synonyms from MEDLINE and journal articles

نویسندگان

  • Hong Yu
  • Vasileios Hatzivassiloglou
  • Carol Friedman
  • Andrey Rzhetsky
  • W. John Wilbur
چکیده

Genes and proteins are often associated with multiple names, and more names are added as new functional or structural information is discovered. Because authors often alternate between these synonyms, information retrieval and extraction benefits from identifying these synonymous names. We have developed a method to extract automatically synonymous gene and protein names from MEDLINE and journal articles. We first identified patterns authors use to list synonymous gene and protein names. We developed SGPE (for synonym extraction of gene and protein names), a software program that recognizes the patterns and extracts from MEDLINE abstracts and full-text journal articles candidate synonymous terms. SGPE then applies a sequence of filters that automatically screen out those terms that are not gene and protein names. We evaluated our method to have an overall precision of 71% on both MEDLINE and journal articles, and 90% precision on the more suitable full-text articles alone

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic Lane Extraction in Hemoglobin and Serum Protein Electrophoresis Using Image Processing

Image analysis is an image processing technique that aims to extract features or information from images. Image analysis in medicine has a special place because is a basis for disease diagnosis for physicians. Electrophoresis is a laboratory separating technique. Electrophoresis images are created during the electrophoresis process. Serum protein and hemoglobin electrophoresis test are the ...

متن کامل

Automatic Lane Extraction in Hemoglobin and Serum Protein Electrophoresis Using Image Processing

Image analysis is an image processing technique that aims to extract features or information from images. Image analysis in medicine has a special place because is a basis for disease diagnosis for physicians. Electrophoresis is a laboratory separating technique. Electrophoresis images are created during the electrophoresis process. Serum protein and hemoglobin electrophoresis test are the ...

متن کامل

Tagging gene and protein names in full text articles

Current information extraction efforts in the biomedical domain tend to focus on finding entities and facts in structured databases or MEDLINE abstracts. We apply a gene and protein name tagger trained on Medline abstracts (ABGene) to a randomly selected set of full text journal articles in the biomedical domain. We show the effect of adaptations made in response to the greater heterogeneity o...

متن کامل

Extracting synonymous gene and protein terms from biological literature

MOTIVATION Genes and proteins are often associated with multiple names. More names are added as new functional or structural information is discovered. Because authors can use any one of the known names for a gene or protein, information retrieval and extraction would benefit from identifying the gene and protein terms that are synonyms of the same substance. RESULTS We have explored four com...

متن کامل

Playing Biology's Name Game: Identifying Protein Names in Scientific Text

A growing body of work is devoted to the extraction of protein or gene interaction information from the scientific literature. Yet, the basis for most extraction algorithms, i.e. the specific and sensitive recognition of protein and gene names and their numerous synonyms, has not been adequately addressed. Here we describe the construction of a comprehensive general purpose name dictionary and ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Proceedings. AMIA Symposium

دوره   شماره 

صفحات  -

تاریخ انتشار 2002