Learning Verb Argument Structure from Minimally Annotated Corpora
Authors
Abstract
In this paper we investigate the task of automatically identifying the correct argument structure for a set of verbs. We exploit the distributions of selected features from the local context of each verb. These distributions were extracted from a 23M-word WSJ corpus using part-of-speech tags and phrasal chunks alone. This annotation is minimal compared with previous work on this task, which used full parse trees. We construct a decision tree classifier that achieves an error rate of 33.4%. Our result compares very favorably with previous work despite using considerably less data and requiring only minimal annotation of the data.
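The general idea in the abstract (per-verb feature distributions fed to a tree-based classifier) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the feature names `np_after` and `pp_after`, the labels `transitive` and `intransitive`, and the toy data are hypothetical, and the learner is a single-split decision stump in pure Python rather than the decision tree classifier used in the paper.

```python
from collections import Counter

def feature_distribution(occurrences):
    """Relative frequency of each feature over one verb's occurrences."""
    counts = Counter()
    for feats in occurrences:
        counts.update(set(feats))  # count each feature once per occurrence
    total = len(occurrences)
    return {f: c / total for f, c in counts.items()}

def train_stump(examples, feature_names):
    """Pick the single (feature, threshold) split with the fewest training errors.

    examples: list of (distribution dict, label) pairs.
    Returns (feature, threshold, label_if_below_or_equal, label_if_above).
    """
    best = None
    for f in feature_names:
        values = sorted({d.get(f, 0.0) for d, _ in examples})
        # Candidate thresholds are midpoints between adjacent observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            below = [y for d, y in examples if d.get(f, 0.0) <= t]
            above = [y for d, y in examples if d.get(f, 0.0) > t]
            label_below = Counter(below).most_common(1)[0][0]
            label_above = Counter(above).most_common(1)[0][0]
            errors = (sum(y != label_below for y in below)
                      + sum(y != label_above for y in above))
            if best is None or errors < best[0]:
                best = (errors, f, t, label_below, label_above)
    if best is None:
        raise ValueError("no feature separates the examples")
    return best[1:]

def predict(stump, dist):
    """Classify one verb from its feature distribution."""
    f, t, label_below, label_above = stump
    return label_below if dist.get(f, 0.0) <= t else label_above

# Hypothetical toy data: each verb maps to (features per occurrence, label).
toy = {
    "devour": ([["np_after"], ["np_after"], ["np_after", "pp_after"]], "transitive"),
    "build": ([["np_after"], ["np_after"], ["pp_after"], ["np_after"]], "transitive"),
    "sleep": ([["pp_after"], [], ["pp_after"], []], "intransitive"),
    "arrive": ([["pp_after"], ["pp_after"], [], ["np_after"]], "intransitive"),
}
examples = [(feature_distribution(occ), label) for occ, label in toy.values()]
stump = train_stump(examples, ["np_after", "pp_after"])

# An unseen verb followed by a noun phrase in two of its three occurrences.
new_dist = feature_distribution([["np_after"], ["np_after"], []])
prediction = predict(stump, new_dist)
```

A full decision tree would apply the same split search recursively to each side of the split; the stump is kept to one level only to keep the sketch short.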
Similar resources
A large scale annotated child language construction database
Large scale annotated corpora of child language can be of great value in assessing theoretical proposals regarding language acquisition models. For example, they can help determine whether the type and amount of data required by a proposed language acquisition model can actually be found in a naturalistic data sample. To this end, several recent efforts have augmented the CHILDES child language...
Automatic Verb Classification Based on Statistical Distributions of Argument Structure
Automatic acquisition of lexical knowledge is critical to a wide range of natural language processing tasks. Especially important is knowledge about verbs, which are the primary source of relational information in a sentence--the predicate-argument structure that relates an action or state to its participants (i.e., who did what to whom). In this work, we report on supervised learning experimen...
Identifying Verb Arguments and their Syntactic Function in the Penn Treebank
In this paper, we present a tool that allows one to automatically extract verb argument-structure from the Penn Treebank as well as from other corpora annotated with the Penn Treebank release 2 conventions. More specifically, we examine each possible sequence of tags, both functional and categorial and determine whether such a sequence indicates an obligatory argument, an optional argument or a...
Annotation of Predicate-argument Structure on Molecular Biology Text
Annotated corpora are essential resources for natural language processing. This paper describes our approach for building a corpus annotated with predicate-argument structure on research abstracts in the molecular biology domain. Observation of the records in a database of cell signaling events and corresponding research abstracts showed that extracting predicate-argument structure is a useful interm...