apertium-cy - a collaboratively-developed free RBMT system for Welsh to English

Authors

  • Francis M. Tyers
  • Kevin Donnelly
Abstract

apertium-cy (http://www.cymraeg.org.uk) is a rule-based “gisting” machine translation system for Welsh to English, with both engine and data released under the GPL. We summarise the development of apertium-cy, evaluate its output, and discuss the advantages of a collaborative development model combined with rule-based MT for marginalised languages.

1. The Apertium platform

apertium-cy is a “gisting” machine translation system for Welsh to English, based on the Apertium machine translation platform.[1] The platform was originally aimed at the Romance languages of the Iberian peninsula, but is now being adapted for other languages (such as Basque, and languages from the Celtic group – Welsh, Irish, Breton), with much of the work on new languages being pursued by volunteers, following the increasingly common collaborative development model used for free[2] and open-source software. Apertium is licensed under the Free Software Foundation’s GNU General Public License,[3] and all the software and data for the 17 supported language pairs (and the other pairs in development) are available for download from the project website.

Apertium follows a shallow-transfer approach, and is very fast. Finite-state transducers (Garrido-Alenda and Forcada, 2002; Roche and Schabes, 1997) processing up to 40,000 words per second are used for lexical processing, first-order hidden Markov models (HMM) are used for part-of-speech tagging, and multi-stage finite-state based chunking is used for structural transfer.

The software behind the platform is implemented as a standard UNIX pipeline, with each stage in the translation having a separate C++ program. Communication between stages uses piped text streams. XML-based formats are used to encode the linguistic data, which are then compiled into the high-speed formats used by the engine. Further details are given in Armentano-Oller et al. (2006) and on the project website.

[1] http://www.apertium.org
[2] We follow the definition of “free” used by the Free Software Foundation, http://www.fsf.org.
[3] http://www.fsf.org/licensing/licenses/gpl.html, accessed 12/12/2008.

© 2009 PBML. All rights reserved. Please cite this article as: Francis Tyers, Kevin Donnelly, apertium-cy – a collaboratively-developed free RBMT system for Welsh to English. The Prague Bulletin of Mathematical Linguistics No. 91, 2009, 57–66.
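The pipeline design described above (separate programs exchanging piped text streams) can be illustrated with a minimal sketch. This is not Apertium's code: the three stage scripts, the three-word toy lexicon, the `*` marker for unknown words, and the helper `run_pipeline` are all hypothetical, and each stage is a tiny Python script standing in for one of the C++ programs.

```python
import subprocess
import sys

# Hypothetical stand-ins for pipeline stages. Each is a separate program
# that reads text on stdin and writes text on stdout.
ANALYSE = """
import sys
# Lowercase the input and emit one token per line.
for word in sys.stdin.read().lower().split():
    print(word)
"""

LOOKUP = """
import sys
# Toy Welsh-to-English lexicon (illustrative only, not Apertium data).
lexicon = {"llyfr": "book", "ac": "and", "afal": "apple"}
for line in sys.stdin:
    word = line.strip()
    # Unknown words are passed through with a "*" prefix (toy convention).
    print(lexicon.get(word, "*" + word))
"""

GENERATE = """
import sys
# Join the translated tokens back into a single line.
print(" ".join(line.strip() for line in sys.stdin))
"""


def run_pipeline(text, stages):
    """Run each stage as its own process, connected by OS pipes."""
    procs = []
    prev = None
    for code in stages:
        proc = subprocess.Popen(
            [sys.executable, "-c", code],
            stdin=subprocess.PIPE if prev is None else prev.stdout,
            stdout=subprocess.PIPE,
            text=True,
        )
        if prev is not None:
            # The child now owns this end of the pipe; drop the parent's copy.
            prev.stdout.close()
        procs.append(proc)
        prev = proc
    # Feed the first stage, then signal end of input.
    procs[0].stdin.write(text)
    procs[0].stdin.close()
    output = procs[-1].stdout.read()
    for proc in procs:
        proc.wait()
    return output.strip()


if __name__ == "__main__":
    print(run_pipeline("Llyfr ac afal", [ANALYSE, LOOKUP, GENERATE]))
```

Because every stage only reads and writes plain text, any one of them could be replaced or inspected in isolation, which is the property the pipeline design relies on.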


Similar resources

Automatic acquisition of Named Entities for Rule-Based Machine Translation

This paper proposes to enrich RBMT dictionaries with Named Entities (NEs) automatically acquired from Wikipedia. The method is applied to the Apertium English–Spanish system and its performance compared to that of Apertium with and without handtagged NEs. The system with automatic NEs outperforms the one without NEs, while results vary when compared to a system with handtagged NEs (results are ...


Sharing resources between free/open-source rule-based machine translation systems: Grammatical Framework and Apertium

In this paper, we describe two methods developed for sharing linguistic data between two free and open source rule based machine translation systems: Apertium, a shallow-transfer system; and Grammatical Framework (GF), which performs a deeper syntactic transfer. In the first method, we describe the conversion of lexical data from Apertium to GF, while in the second one we automatically extract ...


A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...


Rapid development of RBMT systems for related languages

The article describes a new way of constructing rule-based machine translation systems (RBMT). RBMT systems are currently among the best performing machine translation systems. Most of the "big named" machine translation systems (Systran, 2007; Promt, 2007) belong to this category, but these systems have a big drawback; construction of such systems demands a great amount of time and resources, ...


Developing Prototypes for Machine Translation between Two Sámi Languages

This paper describes the development of two prototype systems for machine translation between North Sámi and Lule Sámi. Experiments were conducted in rule-based machine translation (RBMT), using the Apertium platform, and statistical machine translation (SMT) using the Mosesdecoder. The experiments show that both approaches have their advantages and disadvantages, and that they can both make us...



Journal title:
  • Prague Bull. Math. Linguistics

Volume 91

Pages 57–66

Publication date 2009