Diverse Words, Shared Meanings: Statistical Machine Translation for Paraphrase, Grounding, and Intent
Author: Chris Brockett
Abstract
Can two different descriptions refer to the same event or action? Recognising that dissimilar strings are equivalent in meaning for some purpose is something that humans do rather well, but it is a task at which machines often fail. In the Natural Language Processing Group at Microsoft Research, we are attempting to address this challenge at sentence scale by generating semantically equivalent rewrites that can be used in applications ranging from authoring assistance to intent mapping for search or command and control. The Microsoft Translator paraphrase engine, developed in the NLP group, is a large-scale phrasal machine translation system that generates short sentential and phrasal paraphrases in English and has a public API that is available to researchers and developers. I will present the data extraction process, architecture, issues in generating diverse outputs, applications and possible future directions, and discuss the strengths and limitations of the statistical machine translation model as it relates to paraphrasing, how paraphrase is like machine translation, and how it differs in important respects. The statistical machine translation approach also has broad applications in capturing user intent in search, conversational understanding, and the grounding of language in objects and actions, all active areas of investigation in Microsoft Research.
Citation: Chris Brockett. 2012. Diverse Words, Shared Meanings: Statistical Machine Translation for Paraphrase, Grounding, and Intent. In Proceedings of Australasian Language Technology Association Workshop, pages 3−3.
Similar resources
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in the English language are hyphenated: a hyphen separates the different parts. The Persian language has multi-part words as well. Under Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use a space character instead. This common incorrect use of the space character leads to some s...
Using Multiple Metrics in Automatically Building Turkish Paraphrase Corpus
Paraphrasing is expressing similar meanings with different words in different order. In this sense it is viewed as translation in the same language. It is an important issue in natural language processing for automatic machine translation, question answering, text summarization and language generation. Studies in paraphrasing can be classified as paraphrase extraction, paraphrase generation, pa...
Neural Paraphrase Generation using Transfer Learning
Progress in statistical paraphrase generation has been hindered for a long time by the lack of large monolingual parallel corpora. In this paper, we adapt the neural machine translation approach to paraphrase generation and perform transfer learning from the closely related task of entailment generation. We evaluate the model on the Microsoft Research Paraphrase (MSRP) corpus and show that the ...
SHEF-Multimodal: Grounding Machine Translation on Images
This paper describes the University of Sheffield’s submission for the WMT16 Multimodal Machine Translation shared task, where we participated in Task 1 to develop German-to-English and English-to-German statistical machine translation (SMT) systems in the domain of image descriptions. Our proposed systems are standard phrase-based SMT systems based on the Moses decoder, trained only on the provi...
Paraphrase Lattice for Statistical Machine Translation
Lattice decoding in statistical machine translation (SMT) is useful in speech translation and in the translation of German because it can handle input ambiguities such as speech recognition ambiguities and German word segmentation ambiguities. We show that lattice decoding is also useful for handling input variations. Given an input sentence, we build a lattice which represents paraphrases of t...