Parsing Biomedical Literature

Authors

  • Matthew Lease
  • Eugene Charniak
Abstract

We present a preliminary study of several parser adaptation techniques evaluated on the GENIA corpus of MEDLINE abstracts [1, 2]. We begin by observing that the Penn Treebank (PTB) is lexically impoverished when measured on various genres of scientific and technical writing, and that this significantly impacts parse accuracy. To resolve this without requiring in-domain treebank data, we show how existing domain-specific lexical resources may be leveraged to augment PTB training: part-of-speech tags, dictionary collocations, and named entities. Using a state-of-the-art statistical parser [3] as our baseline, our lexically adapted parser achieves a 14.2% reduction in error. With oracle knowledge of named entities, this error reduction improves to 21.2%.
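The error-reduction figures quoted in the abstract are relative measures. As a minimal sketch of how such a figure is typically computed, assuming error is defined as one minus an accuracy-style score (the abstract does not state which metric is used), the helper `relative_error_reduction` and the scores below are hypothetical and are not taken from the paper:

```python
def relative_error_reduction(baseline_score: float, adapted_score: float) -> float:
    """Return the fraction of the baseline's error removed by the adapted system.

    Both scores are accuracy-style values in [0, 1]; error is taken as 1 - score.
    """
    baseline_error = 1.0 - baseline_score
    adapted_error = 1.0 - adapted_score
    if baseline_error <= 0.0:
        raise ValueError("baseline has no error to reduce")
    return (baseline_error - adapted_error) / baseline_error


# Hypothetical scores for illustration only (NOT results from the paper).
baseline = 0.80
adapted = 0.83
reduction = relative_error_reduction(baseline, adapted)
print(f"relative error reduction: {reduction:.1%}")
```

Under this definition, a modest absolute gain in the score can correspond to a much larger relative reduction in error, which is why the two kinds of figure should not be compared directly.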

Similar resources

Extracting Higher Order Relations From Biomedical Text

Argumentation in a scientific article is composed of unexpressed and explicit statements of old and new knowledge combined into a logically coherent textual argument. Discourse relations, linguistic coherence relations that connect discourse segments, help to communicate an argument’s logical steps. A biomedical relation exhibits a relationship between biomedical entities. In this paper, we are...

Towards Cross-Domain PDTB-Style Discourse Parsing

Discourse relation parsing is an important task with the goal of understanding text beyond the sentence boundaries. With the availability of annotated corpora (Penn Discourse Treebank) statistical discourse parsers were developed. In the literature it was shown that the discourse parsing subtasks of discourse connective detection and relation sense classification do not generalize well across d...

BioPPIExtractor: A protein-protein interaction extraction system for biomedical literature

Automatic extracting protein–protein interaction information from biomedical literature can help to build protein relation network, predict protein function and design new drugs. This paper presents a protein–protein interaction extraction system BioPPIExtractor for biomedical literature. This system applies Conditional Random Fields model to tag protein names in biomedical text, then uses a li...

Extraction of Gene/Protein Interaction from Text Documents with Relation Kernel

Even though there are many databases for gene/protein interactions, most such data still exist only in the biomedical literature. They are spread in biomedical literature written in natural languages and they require much effort such as data mining for constructing well-structured data forms. As genomic research advances, knowledge discovery from a large collection of scientific papers is becom...

Mining Protein Interaction from Biomedical Literature with Relation Kernel Method

Many interaction data still exist only in the biomedical literature and they require much effort to construct well-structured data. Discovering useful knowledge from large collections of papers is becoming more important for efficient biological and biomedical researches as genomic research advances. In this paper, we present a relation kernel-based interaction extraction method to extract know...

An Unsupervised Text Mining Method for Relation Extraction from Biomedical Literature

The wealth of interaction information provided in biomedical articles motivated the implementation of text mining approaches to automatically extract biomedical relations. This paper presents an unsupervised method based on pattern clustering and sentence parsing to deal with biomedical relation extraction. Pattern clustering algorithm is based on Polynomial Kernel method, which identifies inte...

Publication date: 2005