Introducing the Per-Fide Project: Parallelizing Portuguese with six different Languages
Abstract
In this paper we present the Per-Fide project, which aims to build parallel corpora pairing Portuguese with six other languages (English, Russian, French, Italian, German and Spanish) across several domains, including literary, journalistic and religious texts. First, we focus on the corpus design criteria and main features, particularly those that distinguish this corpus from existing parallel corpora. Second, we discuss the challenges of elaborating a typology of text types for the religious domain and the problems associated with encoding the texts in this category. Finally, we demonstrate how the Per-Fide Corpus can be used in contrastive and translation studies through a case study of pronominal causative constructions from a French-Portuguese contrastive perspective.
Similar resources
The Per-Fide Corpus : A new Resource for Corpus-Based Terminology, Contrastive Linguistics and Translation Studies
The Per-Fide project is a joint collaboration between researchers at the Department of Informatics and the Institute of Arts and Humanities at the University of Minho, Portugal. The acronym Per-Fide stands for Portuguese (P) in parallel with 6 languages: English (E), Russian (R), French (F), Italian (I), German/Deutsch (D) and Spanish/Español (E). First, we expound on the role of the Per-Fide ...
Language independent and language adaptive large vocabulary speech recognition
This paper describes the design of a multilingual speech recognizer using an LVCSR dictation database which has been collected under the project GlobalPhone. This project at the University of Karlsruhe investigates LVCSR systems in 15 languages of the world, namely Arabic, Chinese, Croatian, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Swedish, Tamil, and Tu...
The Presence and Influence of English in the Portuguese Financial Media
As the lingua franca of the 21st century, English has become the main language for intercultural communication for those wanting to embrace globalization. In Portugal, it is the second language of most public and private domains influencing its culture and discourses. Language contact situations transform languages by the incorporations they make from other languages and Portugal has...
Multilingual and Crosslingual Speech Recognition
This paper describes the design of a multilingual speech recognizer using an LVCSR dictation database which has been collected under the project GlobalPhone. This project at the University of Karlsruhe investigates LVCSR systems in 15 languages of the world, namely Arabic, Chinese, Croatian, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Swedish, Tamil, and Tu...
SMT and Hybrid systems of the QTLeap project in the WMT16 IT-task
This paper describes 12 systems submitted to the WMT16 IT-task, covering six different languages, namely Basque, Bulgarian, Dutch, Czech, Portuguese and Spanish. All these systems were developed under the scope of the QTLeap project and share a common strategy. For each language two different systems were submitted, namely a phrase-based MT system built using Moses, and a sys...