Lexicon Stratification for Translating Out-of-Vocabulary Words
Abstract
A language's lexicon can be divided into four main strata according to the origin of its words: core vocabulary words, fully and partially assimilated foreign words, and unassimilated foreign words (or transliterations). This paper focuses on the translation of fully and partially assimilated foreign words, called “borrowed words”. Borrowed words (or loanwords) are content words found in nearly all languages, occupying up to 70% of the vocabulary. We use models of lexical borrowing in machine translation as a pivoting mechanism to obtain translations of out-of-vocabulary loanwords in a low-resource language. Our framework obtains substantial improvements (up to 1.6 BLEU) over standard baselines.
Related resources
A hybrid language model for open-vocabulary Thai LVCSR
This paper investigates the use of a hybrid language model for open-vocabulary Thai LVCSR. Thai text is written without word boundary markers and the definition of word unit is often ambiguous due to the presence of compound words. Hence, to build open-vocabulary LVCSR, a very large lexicon is required to also handle word unit ambiguity. Pseudomorpheme (PM), a syllable-like sub-word unit specif...
Generation of Adaptive Vocabulary Lexicon for Japanese LVCSR
One of the thorniest problems of large vocabulary continuous speech recognition systems is the large number of out-of-vocabulary (OOV) words. This is especially the case for languages like Japanese, which have many inflections, compound words and loanwords. The OOV words vary with the application domain. It is not realistic to have a big general-purpose lexicon including any possible OOV wor...
Local Methods for On-Demand Out-of-Vocabulary Word Retrieval
Most of the Web-based methods for lexicon augmenting consist in capturing global semantic features of the targeted domain in order to collect relevant documents from the Web. We suggest that the local context of the out-of-vocabulary (OOV) words contains relevant information on the OOV words. With this information, we propose to use the Web to build locally-augmented lexicons which are used in ...
The Effect of Lexicon-based Debates on the Felicity of Lexical Equivalents in Translating Literary Texts by Iranian EFL Learners
This study was an attempt to investigate the effect of lexicon-based debates on the felicity of lexical equivalents in translating literary texts by Iranian EFL learners. To fulfill the purpose of this study, 59 university students, majoring in English Translation, were randomly assigned to the experimental and control groups from a total of 73 students based on their performance on a mock TOE...
Capturing Out-of-Vocabulary Words in Arabic Text
The increasing flow of information between languages has led to a rise in the frequency of non-native or loan words, where terms of one language appear transliterated in another. Dealing with such out-of-vocabulary words is essential for successful cross-lingual information retrieval. For example, techniques such as stemming should not be applied indiscriminately to all words in a collection, a...