Creating Disjunctive Logical Forms from Aligned Sentences for Grammar-Based Paraphrase Generation

Authors

  • Scott Martin
  • Michael White
Abstract

We present a method of creating disjunctive logical forms (DLFs) from aligned sentences for grammar-based paraphrase generation using the OpenCCG broad coverage surface realizer. The method takes as input word-level alignments of two sentences that are paraphrases and projects these alignments onto the logical forms that result from automatically parsing these sentences. The projected alignments are then converted into phrasal edits for producing DLFs in both directions, where the disjunctions represent alternative choices at the level of semantic dependencies. The resulting DLFs are fed into the OpenCCG realizer for n-best realization, using a pruning strategy that encourages lexical diversity. After merging, the approach yields an n-best list of paraphrases that contain grammatical alternatives to each original sentence, as well as paraphrases that mix and match content from the pair. A preliminary error analysis suggests that the approach could benefit from taking the word order in the original sentences into account. We conclude with a discussion of plans for future work, highlighting the method’s potential use in enhancing automatic MT evaluation.
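The core idea of turning an aligned sentence pair into a set of alternatives can be illustrated with a toy sketch. It is not the authors' implementation and does not use the OpenCCG API. It works on surface tokens rather than on parsed logical forms, assumes a monotone one-to-one word alignment, and does no realizer ranking or pruning. The function names `phrasal_edits` and `expand` are hypothetical. Identically aligned words act as shared anchors, the differing spans between them become two-way disjunctions, and expanding the disjunctions yields both original sentences plus versions that mix content from the pair.

```python
from itertools import product


def phrasal_edits(src, tgt, alignment):
    """Split an aligned sentence pair into shared and alternative segments.

    Assumption: `alignment` is a monotone list of (i, j) index pairs that
    link src[i] to tgt[j]. Returns a list of segments, where each segment
    is a list of alternatives (tuples of tokens). A segment with one
    alternative is shared; a segment with two is a disjunction.
    """
    # Only identical aligned words serve as anchors; aligned words that
    # differ fall into the gaps between anchors and become alternatives.
    anchors = [(i, j) for i, j in sorted(alignment) if src[i] == tgt[j]]
    segments = []
    prev_i, prev_j = -1, -1
    # The sentinel pair closes the gap after the last anchor.
    for i, j in anchors + [(len(src), len(tgt))]:
        gap_src = tuple(src[prev_i + 1:i])
        gap_tgt = tuple(tgt[prev_j + 1:j])
        if gap_src or gap_tgt:
            if gap_src == gap_tgt:
                segments.append([gap_src])
            else:
                segments.append([gap_src, gap_tgt])
        if i < len(src):
            segments.append([(src[i],)])
        prev_i, prev_j = i, j
    return segments


def expand(segments):
    """Enumerate every sentence licensed by the disjunctive segments."""
    for choice in product(*segments):
        yield " ".join(tok for alt in choice for tok in alt)


src = "he bought a car yesterday".split()
tgt = "he purchased a vehicle yesterday".split()
alignment = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]

for sentence in expand(phrasal_edits(src, tgt, alignment)):
    print(sentence)
```

In the method the abstract describes, the disjunctions sit at the level of semantic dependencies and a grammar-based realizer decides which combinations are grammatical. This string-level sketch has no such check, so it can produce ill-formed mixes on less parallel pairs.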

Similar resources

Unsupervised Learning of Paraphrases

Paraphrasing constitutes a cornerstone in many Natural Language Processing fields like monolingual text-to-text generation and automatic text summarization. Indeed, aligned monolingual corpora are likely to boost the learning process of text-to-text generation models. A paraphrase learning strategy can be defined as a two-step process: (1) identifying and extracting related sentence pairs from...

Learning to Map Chinese Sentences to Logical Forms

This paper addresses the problem of learning to map Chinese sentences to logical forms. The training data consist of Chinese natural language sentences paired with logical representations of their meaning. Although many approaches have been developed for learning to map from some Western natural languages to two different meaning representations, there is no such approach for Chinese language...

A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions

This paper describes a novel probabilistic approach for generating natural language sentences from their underlying semantics in the form of typed lambda calculus. The approach is built on top of a novel reduction-based weighted synchronous context free grammar formalism, which facilitates the transformation process from typed lambda calculus into natural language sentences. Sentences can then ...

MUTT: Metric Unit TesTing for Language Generation Tasks

METEOR: a metric that computes soft similarities between sentences by computing synonym and paraphrase scores between sentence alignments. SICK+: Since SICK is for compositional semantics, all sentences have proper grammar. We automatically generated ungrammatical sentences (without human-estimated scores) to supplement the existing sentence pairs. Dataset case study (SICK): We examine how well hu...

CCG Chart Realization from Disjunctive Inputs

This paper presents a novel algorithm for efficiently generating paraphrases from disjunctive logical forms. The algorithm is couched in the framework of Combinatory Categorial Grammar (CCG) and has been implemented as an extension to the OpenCCG surface realizer. The algorithm makes use of packed representations similar to those initially proposed by Shemtov (1997), generalizing the approach i...

Publication date: 2011