DOMCAT: A Bilingual Concordancer for Domain-Specific Computer Assisted Translation
Authors
Abstract
In this paper, we propose DOMCAT, a web-based bilingual concordancer for domain-specific computer assisted translation. Given a multi-word expression as a query, the system retrieves sentence pairs from a bilingual corpus, identifies the translation equivalents of the query in those sentence pairs (translation spotting), and ranks the retrieved sentence pairs by the relevance between the query and its translation equivalents. To provide high-precision translation spotting for domain-specific translation tasks, we exploit a normalized correlation method to spot the translation equivalents. To rank the retrieved sentence pairs, we propose a correlation function, modified from the Dice coefficient, that assesses the correlation between the query and the translation equivalents. The performance of the translation spotting module and of the ranking module is evaluated in terms of precision-recall and coverage rate, respectively.
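The abstract does not say how DOMCAT modifies the Dice coefficient, so the following is only a minimal sketch of the ranking step using the standard Dice coefficient, 2·f(q,t) / (f(q) + f(t)), where f counts the sentence pairs containing the query q, the spotted translation t, or both. It assumes translation spotting has already attached a candidate translation to each retrieved pair; the record layout, the plain substring matching, and the names `dice` and `rank_pairs` are hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical record: (source sentence, target sentence, spotted translation).
# Translation spotting is assumed to have already filled in the third field.
Pair = Tuple[str, str, str]


def dice(f_xy: int, f_x: int, f_y: int) -> float:
    """Standard Dice coefficient: 2 * f(x, y) / (f(x) + f(y))."""
    if f_x + f_y == 0:
        return 0.0
    return 2.0 * f_xy / (f_x + f_y)


def rank_pairs(query: str, pairs: List[Pair]) -> List[Tuple[float, Pair]]:
    """Rank sentence pairs by Dice(query, spotted translation).

    Frequencies are counts of sentence pairs in `pairs`, found by plain
    (case-sensitive) substring matching, a simplifying assumption.
    """
    f_q = sum(1 for src, _, _ in pairs if query in src)
    scored = []
    for pair in pairs:
        spotted = pair[2]
        f_t = sum(1 for _, tgt, _ in pairs if spotted in tgt)
        f_qt = sum(1 for src, tgt, _ in pairs if query in src and spotted in tgt)
        scored.append((dice(f_qt, f_q, f_t), pair))
    # Highest score first; sorted() is stable, so ties keep corpus order.
    return sorted(scored, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    # Toy English-German pairs, invented for illustration only.
    pairs = [
        ("clean the heat exchanger", "den Wärmetauscher reinigen", "Wärmetauscher"),
        ("the heat exchanger leaks", "der Wärmetauscher ist undicht", "Wärmetauscher"),
        ("inspect the heat exchanger", "den Austauscher prüfen", "Austauscher"),
    ]
    ranked = rank_pairs("heat exchanger", pairs)
    print([round(score, 2) for score, _ in ranked])  # [0.8, 0.8, 0.5]
```

In this toy run, the translation that recurs with the query ("Wärmetauscher") outranks the one-off rendering ("Austauscher"), which is the kind of preference a Dice-style correlation is meant to encode; DOMCAT's modified function may weight these counts differently.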
Similar resources
TS3: an Improved Version of the Bilingual Concordancer TransSearch
Computer Assisted Translation tools remain the preferred solution of human translators when publication quality is of concern. In this paper, we present our ongoing efforts conducted within TS3, a project which aims at improving the commercial bilingual concordancer TransSearch. The core technology of this Web-based service mainly relies on sentence-level alignment. In this study, we discuss an...
Enhancing the Bilingual Concordancer TransSearch with Word-Level Alignment
Despite the impressive number of recent studies devoted to improving the state of the art of Machine Translation (MT), Computer Assisted Translation (CAT) tools remain the preferred solution of human translators when publication quality is of concern. In this paper, we present our perspectives on improving the commercial bilingual concordancer TransSearch, a Web-based service whose core technol...
Subsentential Translation Memory for Computer Assisted Writing and Translation
This paper describes a translation memory database, TotalRecall, developed to encourage authentic and idiomatic use in second language writing. TotalRecall is a bilingual concordancer that supports search queries in English or Chinese for relevant sentences and translations. Although initially intended for learners of English as a Foreign Language (EFL) in Taiwan, it is a gold mine of texts in En...
Using sign language corpora as bilingual corpora for data mining: Contrastive linguistics and computer-assisted..
More and more sign languages are now documented by large-scale digital corpora. However, exploiting sign language (SL) corpus data remains subject to the time-consuming and expensive manual task of annotation. In this paper, we present ongoing research that aims at testing a new approach to better mine SL data. It relies on the methodology of corpus-based contrastive linguistics, exploit...
TANGO: Bilingual Collocational Concordancer
In this paper, we describe TANGO, a collocational concordancer for looking up collocations. The system was designed to answer users' queries about bilingual collocational usage for nouns, verbs and adjectives. We first obtained collocations from the large monolingual British National Corpus (BNC). Subsequently, we identified collocation instances and translation counterparts in the bilingual corpu...