Automated Generalization of Phrasal Paraphrases from the Web
نویسندگان
چکیده
Rather than creating and storing thousands of paraphrase examples, paraphrase templates have strong representation capacity and can be used to generate many paraphrase examples. This paper describes a new template representation and generalization method. Combing a semantic dictionary, it uses multiple semantic codes to represent a paraphrase template. Using an existing search engine to extend the word clusters and generalize the examples. We also design three metrics to measure our generalized templates. The experimental results show that the representation method is reasonable and the generalized templates have a higher precision and coverage.
منابع مشابه
Extracting Paraphrases from a Parallel Corpus
While paraphrasing is critical both for interpretation and generation of natural language, current systems use manual or semi-automatic methods to collect paraphrases. We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text. Our approach yields phrasal and single word lexical paraphrases as well as sy...
متن کاملA Probabilistic Model for Measuring Grammaticality and Similarity of Automatically Generated Paraphrases of Predicate Phrases
The most critical issue in generating and recognizing paraphrases is development of wide-coverage paraphrase knowledge. Previous work on paraphrase acquisition has collected lexicalized pairs of expressions; however, the results do not ensure full coverage of the various paraphrase phenomena. This paper focuses on productive paraphrases realized by general transformation patterns, and addresses...
متن کاملImage flip CAPTCHA
The massive and automated access to Web resources through robots has made it essential for Web service providers to make some conclusion about whether the "user" is a human or a robot. A Human Interaction Proof (HIP) like Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) offers a way to make such a distinction. CAPTCHA is a reverse Turing test used by Web serv...
متن کاملMonolingual Machine Translation for Paraphrase Generation
We apply statistical machine translation (SMT) tools to generate novel paraphrases of input sentences in the same language. The system is trained on large volumes of sentence pairs automatically extracted from clustered news articles available on the World Wide Web. Alignment Error Rate (AER) is measured to gauge the quality of the resulting corpus. A monotone phrasal decoder generates contextu...
متن کاملLearning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
Previous work has shown that high quality phrasal paraphrases can be extracted from bilingual parallel corpora. However, it is not clear whether bitexts are an appropriate resource for extracting more sophisticated sentential paraphrases, which are more obviously learnable from monolingual parallel corpora. We extend bilingual paraphrase extraction to syntactic paraphrases and demonstrate its a...
متن کامل