A Preliminary Study of Finding Entailing Texts in a Domain-specific Monolingual Parallel Corpora

Authors

  • Kenston Choi
  • Nathalie Rose Lim-Cheng
Abstract

This paper introduces the possible uses, benefits, and challenges of using domain-specific monolingual parallel corpora to determine textual entailment (TE). A system that finds entailing text for a given statement is to be developed, using monolingual parallel translations of the Bible as the corpus because they are among the most accessible monolingual parallel corpora. Existing methods for textual entailment are reviewed and related to the use of domain-specific and monolingual parallel corpora.

Similar resources

A Particle Swarm Optimizer to Cluster Parallel Spanish-English Short-text Corpora / Un Optimizador basado en Cúmulo de Partículas para el Agrupamiento de Textos Cortos de Colecciones Paralelas en Español-Inglés

Short-texts clustering is currently an important research area because of its applicability to web information retrieval, text summarization and text mining. These texts are often available in different languages and parallel multilingual corpora. Some previous works have demonstrated the effectiveness of a discrete Particle Swarm Optimizer algorithm, named CLUDIPSO, for clustering monolingual ...

TLAXCALA: a multilingual corpus of independent news

We acquire corpora from the domain of independent news from the Tlaxcala website. We build monolingual corpora for 15 languages and parallel corpora for all the combinations of those 15 languages. These corpora include languages for which only very limited such resources exist (e.g. Tamazight). We present the acquisition process in detail and we also present detailed statistics of the produced ...

An IR Approach for Translating New Words from Nonparallel, Comparable Texts

In recent years, there has been phenomenal growth in the amount of online text material available from the greatest information repository, known as the World Wide Web. Various traditional information retrieval (IR) techniques combined with natural language processing (NLP) techniques have been re-targeted to enable efficient access to the WWW: search engines, indexing, relevance feedback, query term ...

Extracting a Parallel Corpus from Comparable Documents to Improve Translation Quality in Machine Translation Systems

Data used for training statistical machine translation methods are usually prepared from three resources: parallel, non-parallel, and comparable text corpora. Parallel corpora are an ideal resource for translation, but because such texts are scarce, non-parallel and comparable corpora are also used for parallel text extraction. Most existing methods for exploiting comparable corpora loo...

Reference Lists for the Evaluation of Term Extraction Tools

In this paper, we discuss practical and methodological issues of the creation of reference term lists (RTLs) for the evaluation of monolingual and bilingual term candidate extraction from comparable corpora in the domains of wind energy and mobile technology. These reference term lists are intended to serve as a “gold standard” for the qualitative and quantitative evaluation of automatic term e...

Publication date: 2011