Unsupervised Paraphrase Acquisition via Relation Discovery

Authors

  • Takaaki Hasegawa
  • Satoshi Sekine
  • Ralph Grishman
Abstract

One of the difficulties in Natural Language Processing is that there are many ways to express the same thing or event. These expressions are called “paraphrases”. Paraphrases are important in applications such as information retrieval (IR), question answering (QA) and information extraction (IE), and one of the difficulties in paraphrase research is acquiring the requisite paraphrase knowledge. In this paper, we describe an unsupervised method to discover paraphrases containing two named entities from a large untagged corpus. The proposed method consists of two stages. First, it finds relations between named entities using similarity of context and clustering. Then, the phrases which express the relation are selected from each cluster to acquire paraphrases. Our experiments with one year of newspaper articles reveal that we can discover a variety of paraphrases with high precision and high recall.
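The two stages named in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the bag-of-words context vectors, the cosine measure, the tiny stop-word list, the single-link grouping with a fixed threshold, and the names `discover_paraphrases`, `STOP_WORDS`, and `threshold`. The input is assumed to be already reduced to (entity, entity, connecting phrase) triples.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"of", "the", "was", "a", "an", "to"}


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def discover_paraphrases(occurrences, threshold=0.3):
    """occurrences: iterable of (entity1, entity2, connecting_phrase).

    Stage 1: cluster entity pairs whose accumulated contexts are similar.
    Stage 2: collect the phrases seen with the pairs of each cluster.
    Returns one sorted list of phrases per cluster with two or more pairs.
    """
    contexts = defaultdict(Counter)  # entity pair -> context word counts
    phrases = defaultdict(set)       # entity pair -> phrases observed
    for ne1, ne2, phrase in occurrences:
        pair = (ne1, ne2)
        text = phrase.lower()
        words = [w for w in text.split() if w not in STOP_WORDS]
        contexts[pair].update(words)
        phrases[pair].add(text)

    # Stage 1 (assumed variant): single-link clustering via union-find.
    pairs = list(contexts)
    parent = {p: p for p in pairs}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for i, p in enumerate(pairs):
        for q in pairs[i + 1:]:
            if cosine(contexts[p], contexts[q]) >= threshold:
                parent[find(p)] = find(q)

    clusters = defaultdict(list)
    for p in pairs:
        clusters[find(p)].append(p)

    # Stage 2: the phrases attached to each cluster are paraphrase candidates.
    result = []
    for members in clusters.values():
        if len(members) < 2:
            continue  # a lone entity pair gives no evidence of a shared relation
        candidates = set()
        for p in members:
            candidates |= phrases[p]
        result.append(sorted(candidates))
    return result
```

In this sketch, two entity pairs that share enough context words end up in one cluster, so distinct phrases such as "acquired" and "bought" observed with those pairs are grouped as paraphrase candidates. The threshold and the linkage strategy strongly affect the precision and recall of the grouping and would need tuning on real data.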

Similar resources

Investigating a Generic Paraphrase-Based Approach for Relation Extraction

Unsupervised paraphrase acquisition has been an active research field in recent years, but its effective coverage and performance have rarely been evaluated. We propose a generic paraphrase-based approach for Relation Extraction (RE), aiming at a dual goal: obtaining an applicative evaluation scheme for paraphrase acquisition and obtaining a generic and largely unsupervised configuration for RE...

Paraphrase Alignment for Synonym Evidence Discovery

We describe a new unsupervised approach for synonymy discovery by aligning paraphrases in monolingual domain corpora. For that purpose, we identify phrasal terms that convey most of the concepts within domains and adapt a methodology for the automatic extraction and alignment of paraphrases to identify paraphrase casts from which valid synonyms are discovered. Results performed on two different...

Automatic Paraphrase Discovery based on Context and Keywords between NE Pairs

Automatic paraphrase discovery is an important but challenging task. We propose an unsupervised method to discover paraphrases from a large untagged corpus, without requiring any seed phrase or other cue. We focus on phrases which connect two Named Entities (NEs), and proceed in two stages. The first stage identifies a keyword in each phrase and joins phrases with the same keyword into sets. Th...

Scaling Web-based Acquisition of Entailment Relations

Paraphrase recognition is a critical step for natural language interpretation. Accordingly, many NLP applications would benefit from high coverage knowledge bases of paraphrases. However, the scalability of state-of-the-art paraphrase acquisition approaches is still limited. We present a fully unsupervised learning algorithm for Web-based extraction of entailment relations, an extended model of...

Aligning Needles in a Haystack: Paraphrase Acquisition Across the Web

This paper presents a lightweight method for unsupervised extraction of paraphrases from arbitrary textual Web documents. The method differs from previous approaches to paraphrase acquisition in that 1) it removes the assumptions on the quality of the input data, by using inherently noisy, unreliable Web documents rather than clean, trustworthy, properly formatted documents; and 2) it does not ...

Publication date: 2005