Reciprocal Enrichment Between Basque Wikipedia and Machine Translation

Authors

  • Iñaki Alegria
  • Unai Cabezón
  • Unai Fernandez de Betoño
  • Gorka Labaka
  • Aingeru Mayor
  • Kepa Sarasola
  • Arkaitz Zubiaga
Abstract

In this chapter, we define a collaboration framework that enables Wikipedia editors to generate new articles while helping the development of Machine Translation (MT) systems by providing post-editing logs. This collaboration framework was tested with editors of Basque Wikipedia. Their post-editing of Computer Science articles has been used to improve the output of a Spanish-to-Basque MT system called Matxin. For the collaboration between editors and researchers, we selected a set of 100 articles from the Spanish Wikipedia. These articles were then used as the source texts to be translated into Basque using the MT engine. A group of volunteers from Basque Wikipedia reviewed and corrected the raw MT translations. This collaboration ultimately produced two main benefits: (i) change logs that can potentially help improve the MT engine through an automated statistical post-editing system, and (ii) the growth of Basque Wikipedia. The results show that this process can improve the accuracy of a Rule-Based MT (RBMT) system by nearly 10%, benefiting from the post-editing of 50,000 words in the Computer ...

Author affiliations

  • Iñaki Alegria: Ixa Group, University of the Basque Country UPV/EHU
  • Unai Cabezón: Ixa Group, University of the Basque Country
  • Unai Fernandez de Betoño: Basque Wikipedia and University of the Basque Country
  • Gorka Labaka: Ixa Group, University of the Basque Country
  • Aingeru Mayor: Ixa Group, University of the Basque Country
  • Kepa Sarasola: Ixa Group, University of the Basque Country
  • Arkaitz Zubiaga: Basque Wikipedia and Queens College, CUNY, CS Department, Blender Lab, New York

Similar resources

Wikipedia and Machine Translation: killing two birds with one stone

In this paper we present the free/open-source language resources for machine translation created in the OpenMT-2 wikiproject, a collaboration framework that was tested with editors of Basque Wikipedia. Post-editing of Computer Science articles has been used to improve the output of a Spanish-to-Basque MT system called Matxin. For the collaboration between editors and researchers, we selected a set ...

Domain Adaptation in MT using Wikipedia as a Parallel Corpus: Resources and Evaluation

This paper presents how a state-of-the-art Statistical Machine Translation system is enriched by using extra in-domain parallel corpora extracted from Wikipedia. We collect corpora from parallel titles and from parallel fragments in comparable articles from Wikipedia editions for English, Spanish and Basque. We carried out an evaluation with a double objective: to evaluate the quality of the ex...

The Sheffield and Basque Country Universities Entry to CHiC: Using Random Walks and Similarity to Access Cultural Heritage

The Cultural Heritage in CLEF 2012 (CHiC) pilot evaluation included these tasks: ad-hoc retrieval, semantic enrichment and variability tasks. At CHiC 2012, the University of Sheffield and the University of the Basque Country submitted a joint entry, attempting the three English monolingual tasks. For the ad-hoc task, the baseline approach used the Indri Search engine. Query expansion approaches...

Translation Memories Enrichment by Statistical Bilingual Segmentation

A majority of Machine Aided Translation systems are based on comparisons between a source sentence and reference sentences stored in Translation Memories (TMs). The translation search is done by looking for sentences in a database which are similar to the source sentence. TMs have two basic limitations: the dependency on the repetition of complete sentences and the high cost of building a TM. A...

Example-Based Machine Translation of the Basque Language

Basque is both a minority and a highly inflected language with free order of sentence constituents. Machine Translation of Basque is thus both a real need and a test bed for MT techniques. In this paper, we present a modular Data-Driven MT system which includes different chunkers as well as chunk aligners which can deal with the free order of sentence constituents of Basque. We conducted Basque ...

Publication date: 2013