Stream-based Randomised Language Models for SMT
Abstract
Randomised techniques allow very big language models to be represented succinctly. However, being batch-based they are unsuitable for modelling an unbounded stream of language whilst maintaining a constant error rate. We present a novel randomised language model which uses an online perfect hash function to efficiently deal with unbounded text streams. Translation experiments over a text stream show that our online randomised model matches the performance of batch-based LMs without incurring the computational overhead associated with full retraining. This opens up the possibility of randomised language models which continuously adapt to the massive volumes of texts published on the Web each day.
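The idea of a compact model that can be updated as text arrives can be illustrated with a minimal sketch. This is not the online perfect hash function the abstract refers to. It is a simplified stand-in that stores short fingerprints of n-grams in a fixed number of buckets, so the n-gram strings themselves are never kept, and entries can be added or dropped as the stream moves on. The class name FingerprintNgramStore, its parameters, and the choice of BLAKE2b as the hash are all assumptions made for illustration.

```python
import hashlib


class FingerprintNgramStore:
    """Toy online n-gram count store that keeps fingerprints, not n-grams.

    Hypothetical illustration only: two different n-grams can share a
    bucket and a fingerprint, in which case their counts are merged
    (a false positive), which is the usual trade-off of randomised models.
    """

    def __init__(self, num_buckets=1024, fingerprint_bits=16):
        self.num_buckets = num_buckets
        self.fingerprint_mask = (1 << fingerprint_bits) - 1
        # Each bucket maps a short fingerprint to a count.
        self.buckets = [dict() for _ in range(num_buckets)]

    def _locate(self, ngram):
        # Hash the n-gram once, then split the hash into a bucket index
        # (low bits) and a fingerprint (bits 32 and up).
        text = " ".join(ngram).encode("utf-8")
        digest = hashlib.blake2b(text, digest_size=8).digest()
        value = int.from_bytes(digest, "big")
        bucket = value % self.num_buckets
        fingerprint = (value >> 32) & self.fingerprint_mask
        return bucket, fingerprint

    def add(self, ngram, count=1):
        # Online update: no retraining, only one bucket is touched.
        bucket, fingerprint = self._locate(ngram)
        current = self.buckets[bucket].get(fingerprint, 0)
        self.buckets[bucket][fingerprint] = current + count

    def remove(self, ngram):
        # Drop an n-gram that is no longer needed, freeing its slot.
        bucket, fingerprint = self._locate(ngram)
        self.buckets[bucket].pop(fingerprint, None)

    def count(self, ngram):
        # Returns 0 for unseen n-grams, barring fingerprint collisions.
        bucket, fingerprint = self._locate(ngram)
        return self.buckets[bucket].get(fingerprint, 0)
```

In such a sketch, an incoming stream would call add for each observed n-gram and remove for n-grams that have gone stale, so the model adapts without a full batch rebuild. Widening the fingerprint lowers the collision rate at the cost of more bits per entry.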
Related resources
Stream-based statistical machine translation
We investigate a new approach for SMT system training within the streaming model of computation. We develop and test incrementally retrainable models which, given an incoming stream of new data, can efficiently incorporate the stream data online. A naive approach using a stream would use an unbounded amount of space. Instead, our online SMT system can incorporate information from unbounded inco...
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
Taste of Two Different Flavours: Which Manipuri Script works better for English-Manipuri Language pair SMT Systems?
A statistical machine translation (SMT) system depends heavily on the sentence-aligned parallel corpus and the target language model. This paper points out some of the core issues in switching a language script and its repercussions for phrase-based statistical machine translation system development. The present task reports on the outcome of English-Manipuri language pair phrase-based SMT t...
NTT-NAIST SMT Systems for IWSLT 2013
This paper presents NTT-NAIST SMT systems for English-German and German-English MT tasks of the IWSLT 2013 evaluation campaign. The systems are based on generalized minimum Bayes risk system combination of three SMT systems: forest-to-string, hierarchical phrase-based, and phrase-based with pre-ordering. Individual SMT systems include data selection for domain adaptation, rescoring using recurrent ne...
Syntax-based Statistical Machine Translation
In its early development, machine translation adopted rule-based approaches, which can include the use of language syntax. The late 1980s and early 1990s saw the inception of the statistical machine translation (SMT) approach, where translation models can be learned automatically from a parallel corpus rather than created manually by humans. Initial SMT models were word-based and phrase-based, ...