GENIA-GR: a Grammatical Relation Corpus for Parser Evaluation in the Biomedical Domain

نویسندگان

  • Yuka Tateisi
  • Yusuke Miyao
  • Kenji Sagae
  • Jun'ichi Tsujii
چکیده

We report the construction of a corpus for parser evaluation in the biomedical domain. A 50-abstract subset (492 sentences) of the GENIA corpus (Kim et al., 2003) is annotated with labeled head-dependent relations using the grammatical relations (GR) evaluation scheme (Carroll et al., 1998) ,which has been used for parser evaluation in the newswire domain.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using Distributional Similarity to Organise BioMedical Terminology

We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked-up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are defined for our purposes...

متن کامل

Steps towards a GENIA Dependency Treebank

In this paper we describe on-going work aimed at creating a dependency-based annotated treebank for the BioMedical domain. Our starting point is the GENIA corpus [14], which is a corpus of 2000 MEDLINE abstracts, which has been manually annotated for various biological entities, according to the GENIA Ontology.1 There is an exponential growth of published research in this sector, which makes it...

متن کامل

Self-Training for Biomedical Parsing

Parser self-training is the technique of taking an existing parser, parsing extra data and then creating a second parser by treating the extra data as further training data. Here we apply this technique to parser adaptation. In particular, we self-train the standard Charniak/Johnson Penn-Treebank parser using unlabeled biomedical abstracts. This achieves an f -score of 84.3% on a standard test ...

متن کامل

Using a Grammatical Relation Annotation Scheme

We describe a recently developed corpus annotation scheme for evaluating parsers that avoids some of the shortcomings of current methods. The scheme encodes grammatical relations between heads and dependents, and has been used to mark up a new public-domain corpus of naturally occurring English text. We show how the corpus can be used to evaluate the accuracy of a robust parser, and relate the ...

متن کامل

Detecting Protein-Protein Interactions in Biomedical Texts Using a Parser and Linguistic Resources

We describe the task of automatically detecting interactions between proteins in biomedical literature. We use a syntactic parser, a corpus annotated for proteins, and manual decisions as training material. After automatically parsing the GENIA corpus, which is manually annotated for proteins, all syntactic paths between proteins are extracted. These syntactic paths are manually disambiguated b...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008