DGT-TM: A freely available Translation Memory in 22 languages
نویسندگان
چکیده
The European Commission’s (EC) Directorate General for Translation, together with the EC’s Joint Research Centre, is making available a large translation memory (TM; i.e. sentences and their professionally produced translations) covering twenty-two official European Union (EU) languages and their 231 language pairs. Such a resource is typically used by translation professionals in combination with TM software to improve speed and consistency of their translations. However, this resource has also many uses for translation studies and for language technology applications, including Statistical Machine Translation (SMT), terminology extraction, Named Entity Recognition (NER), multilingual classification and clustering, and many more. In this reference paper for DGT-TM, we introduce this new resource, provide statistics regarding its size, and explain how it was produced and how to use it.
منابع مشابه
Translation Technology Tools and Professional Translators’ Attitudes toward Them
Today technology is an integral part of professional translation; and it is generally assumed that translators’ attitudes toward translation technology tools influence their interaction with technology (Bundgaard, 2017). Therefore, the present two-phase study seeks to shed some light on what translation technology tools are and how professional translators feel toward them. The research method ...
متن کاملBilingual Terminology Extraction in Sketch Engine
We present a method of bilingual terminology extraction from parallel corpora and a few heuristics and experiments with improving the performance of the basic variant of the method. An evaluation is given using a small gold standard manually prepared for EnglishCzech language pair from DGT translation memory [1]. The bilingual terminology extraction (ABTE3) is available for several languages in...
متن کاملIncreasing Coverage of Translation Memories with Linguistically Motivated Segment Combination Methods
Translation memories (TMs) used in computer-aided translation (CAT) systems are the highest-quality source of parallel texts since they consist of segment translation pairs approved by professional human translators. The obvious problem is their size and coverage of new document segments when compared with other parallel data. In this paper, we describe several methods for expanding translation...
متن کاملA Simple Multilingual Machine Translation System
The multilingual machine translation system described in the first part of this paper demonstrates that the translation memory (TM) can be used in a creative way for making the translation process more automatic (in a way which in fact does not depend on the languages used). The MT system is based upon exploitation of syntactic similarities between more or less related natural languages. It cur...
متن کاملMetaMorpho TM
This paper describes the new linguistic approaches of the MetaMorphoTM system, a linguistically enriched translation memory. Our aim was to develop an improved TM system that uses linguistic analysis in both source and destination languages to yield more exact matches to the source sentence. The MetaMorphoTM stores and retrieve subsentential segments and uses a linguistically based measure to d...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012