Machine Translation for Multilingual Troubleshooting in the IT Domain: A Comparison of Different Strategies
نویسندگان
چکیده
In this paper, we address the problem of machine translation (MT) of domain-specific texts for which large amounts of parallel data for training are not available. We focus on the IT domain and on English to Portuguese machine translation, and compare different strategies for improving system performance over two baselines, the first using only large dataset of out-of-domain data, and the second using only a small dataset of in-domain data. Our results indicate that adding a domain-specific bilingual lexicon to the training dataset significantly improves the performance of both a hybrid MT system and a PBSMT system, while adding out-of-domain sentence pairs to the training dataset only improves the performance of a hybrid MT system. Furthermore, we perform a human evaluation of the sentences generated by the hybrid MT system and the standard PBSMT system built using the same training datasets. The results indicate some significant differences between those two MT approaches in this specific task.
منابع مشابه
Enriching Statistical Translation Models Using a Domain-Independent Multilingual Lexical Knowledge Base
This paper presents a method for improving phrase-based Statistical Machine Translation systems by enriching the original translation model with information derived from a multilingual lexical knowledge base. The method proposed exploits the Multilingual Central Repository (a group of linked WordNets from different languages), as a domain-independent knowledge database, to provide translation m...
متن کاملFacilitating cross-language retrieval and machine translation by multilingual domain ontologies
This paper presents a method for facilitating cross-language retrieval and machine translation in domain specific collections. The method is based on a semi-automatic adaption of a multilingual domain ontology and it is particularly suitable for the eLearning domain. The presented approach has been integrated into a real-world system supporting cross-language retrieval and machine translation o...
متن کاملTransculturation and Multilingual Lives: Writing between Languages and Cultures
This paper looks at the issues of transculturation as explored in auto and semi-autobiographical accounts of linguistic and cultural transitions. The paper also addresses a number of questions about the structure of these texts, the authors’ linguistic competences, as well as questions about the theoretical and conceptual tool which may help us to discuss the issues the writers are reflecting o...
متن کاملCollaboration and Crowdsourcing: The Cases of Multilingual Digital Libraries
Purpose – This study aims to understand key features of existing multilingual digital libraries and to suggest strategies for building and/or sustaining multilingual information access for digital libraries. Design/methodology/approach – A case study approach was applied to examine four American multilingual digital libraries: Project Gutenberg, Meeting of Frontiers, The International Children’...
متن کاملA Comparative Study of English-Persian Translation of Neural Google Translation
Many studies abroad have focused on neural machine translation and almost all concluded that this method was much closer to humanistic translation than machine translation. Therefore, this paper aimed at investigating whether neural machine translation was more acceptable in English-Persian translation in comparison with machine translation. Hence, two types of text were chosen to be translated...
متن کامل