SisHiTra: A Hybrid Machine Translation System from Spanish to Catalan

Authors

  • José R. Navarro
  • Jorge González
  • David Picó
  • Francisco Casacuberta
  • Joan M. de Val
  • Ferran Fabregat
  • Ferran Plà
  • Jesús Tomás
Abstract

In the current European scenario, characterized by the coexistence of communities writing and speaking a great variety of languages, machine translation has become a technology of capital importance. In areas of Spain and of other countries, the co-officiality of several languages means that public information must be produced in several versions. Machine translation between all the languages of the Iberian Peninsula, and from them into English, will allow for better integration of the Iberian linguistic communities with one another and within Europe. The purpose of this paper is to present a machine translation system from Spanish to Catalan that handles text input. In our approach, deductive (linguistic) and inductive (corpus-based) methodologies are combined in a single homogeneous and efficient framework: finite-state transducers. Preliminary results show the promise of the proposed architecture.
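To illustrate why finite-state transducers suit this language pair, here is a minimal sketch of a deterministic (subsequential) word-level transducer. It assumes a tiny hand-written, hypothetical Spanish-to-Catalan lexicon and one hand-written contextual rule (Spanish "por el" becomes Catalan "pel"). These are illustrative only and are not SisHiTra's actual resources or implementation. The corpus-based (inductive) side, which the abstract also folds into the transducer framework, is not modeled here. The names `SubsequentialTransducer`, `build_transducer`, and `LEXICON` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SubsequentialTransducer:
    """Word-level transducer: deterministic on input, output may be delayed."""

    start: str = "q0"
    # (state, input word) -> (next state, output words emitted on that transition)
    transitions: dict = field(default_factory=dict)
    # state -> output words flushed if the input ends in that state
    final_output: dict = field(default_factory=dict)

    def add(self, src: str, word: str, dst: str, out: tuple = ()) -> None:
        self.transitions[(src, word)] = (dst, tuple(out))

    def translate(self, sentence: str) -> str:
        state, output = self.start, []
        for word in sentence.lower().split():
            try:
                state, out = self.transitions[(state, word)]
            except KeyError:
                raise KeyError(f"no transition from {state!r} on {word!r}") from None
            output.extend(out)
        if state not in self.final_output:
            raise KeyError(f"{state!r} is not a final state")
        output.extend(self.final_output[state])
        return " ".join(output)


# Hypothetical toy lexicon (Spanish -> Catalan); not SisHiTra's actual resources.
LEXICON = {
    "el": "el", "la": "la", "gato": "gat", "pasa": "passa",
    "casa": "casa", "jardín": "jardí",
}


def build_transducer(lexicon: dict) -> SubsequentialTransducer:
    """Word-for-word rules plus one contextual rule: Spanish 'por el' -> Catalan 'pel'."""
    t = SubsequentialTransducer(final_output={"q0": (), "q_por": ("per",)})
    t.add("q0", "por", "q_por")               # delay output: 'por' may contract
    t.add("q_por", "por", "q_por", ("per",))  # flush the pending 'per'
    for es, ca in lexicon.items():
        t.add("q0", es, "q0", (ca,))
        if es == "el":
            t.add("q_por", "el", "q0", ("pel",))     # por + el -> pel
        else:
            t.add("q_por", es, "q0", ("per", ca))    # no contraction
    return t


fst = build_transducer(LEXICON)
print(fst.translate("el gato pasa por el jardín"))  # el gat passa pel jardí
print(fst.translate("el gato pasa por la casa"))    # el gat passa per la casa
```

The key design choice is delayed output. After reading "por", the transducer cannot yet decide between "per" and the contraction "pel". It therefore moves to a waiting state and emits nothing. It resolves the choice on the next word, or through the final output if the sentence ends there. This keeps the machine deterministic on its input while still handling local context.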

Publication date: 2004