Automatic Processing of Large Corpora for the Resolution of Anaphora References

نویسندگان

  • Ido Dagan
  • Alon Itai
چکیده

Manual acquisition of semantic constraints in broad domains is very expensive. This paper presents an automatic scheme for collecting statistics on cooccurrence patterns in a large corpus. To a large extent, these statistics reflect, semantic constraints and thus are used to disambiguate anaphora references and syntactic ambiguities. The scherne was implemented by gathering statistics on the output of other linguistic tools. An experiment was performed to resolve references of the pronoun "it" in sentences that were randomly selected from the corpus. Ttle results of the experiment show that in most of the cases the cooccurrence statistics indeed reflect the semantic constraints and thus provide a basis {'or a useful disambiguat.ion tool.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The importance of annotated corpora for NLP: the cases of anaphora resolution and clause splitting

In this paper we present two applications that depend on annotated corpora for their implementation, evaluation and improvement. The first is an automatic anaphora resolution system. After describing the algorithm we discuss the importance of corpora for the tasks of evaluation and automatic scoring and the development of a coreferentially annotated corpus. We go on to look ahead at the role of...

متن کامل

Corpus based coreference resolution for Farsi text

"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

متن کامل

Object-Oriented Method for Automatic Extraction of Road from High Resolution Satellite Images

As the information carried in a high spatial resolution image is not represented by single pixels but by meaningful image objects, which include the association of multiple pixels and their mutual relations, the object based method has become one of the most commonly used strategies for the processing of high resolution imagery. This processing comprises two fundamental and critical steps towar...

متن کامل

Corpora for computational linguistics 1 Running head : CORPORA FOR COMPUTATIONAL LINGUISTICS Corpora for computational linguistics

Since the mid 90s corpora has become very important for computational linguistics. This paper offers a survey of how they are currently used in different fields of the discipline, with particular emphasis on anaphora and coreference resolution, automatic summarisation and term extraction. Their influence on other fields is also briefly discussed. Corpora for computational linguistics 3 Corpora ...

متن کامل

Introduction to the Special Issue on Computational Anaphora Resolution

Anaphora accounts for cohesion in texts and is a phenomenon under active study in formal and computational linguistics alike. The correct interpretation of anaphora is vital for natural language processing (NLP). For example, anaphora resolution is a key task in natural language interfaces, machine translation, text summarization, information extraction, question answering, and a number of othe...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1990