Morphology to the Rescue Redux: Resolving Borrowings and Code-Mixing in Machine Translation
Authors
Abstract
In the IBM LMT machine translation system, derivational morphological rules recognize and analyze words that are not found in its source lexicons and generate default transfers for these unlisted words. Unfound words with no inflectional or derivational affixes are treated as nouns by default. These rules are now expanded to provide lexical coverage of a particular set of words created on the fly in emails by bilingual Spanish-English speakers. The approach is characterized by the generation of additional default parts of speech and by the use of morphological, semantic, and syntactic features from both source and target lexicons for analysis and transfer. A built-in rule-based strategy for handling language borrowing and code-mixing allows the system to recognize words whose frequency of occurrence is variable and unpredictable. Such words would otherwise remain unfound, degrading both parsing accuracy and the quality of the translation output.
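The idea of analyzing an unlisted word by its affixes, consulting the target lexicon for the stem, and falling back to a default noun can be illustrated with a minimal sketch. It is not the LMT implementation: the toy lexicon, the suffix rules, the example words, and every name in the code are hypothetical, and it assumes Spanish as the source language and English as the target.

```python
from dataclasses import dataclass

# Hypothetical toy target (English) lexicon: lemma -> part of speech.
TARGET_LEXICON = {"click": "verb", "format": "verb", "email": "noun"}

# Hypothetical Spanish suffix rules: (suffix, part of speech for the analysis).
SUFFIX_RULES = [("ear", "verb"), ("eando", "verb"), ("s", "noun")]


@dataclass
class Analysis:
    word: str      # surface form as it appeared in the text
    pos: str       # part of speech assigned to the unlisted word
    transfer: str  # default target-language transfer
    rule: str      # which rule produced the analysis


def analyze_unlisted(word: str) -> Analysis:
    """Analyze a word that was not found in the source lexicon."""
    w = word.lower()
    for suffix, pos in SUFFIX_RULES:
        if w.endswith(suffix) and len(w) > len(suffix):
            stem = w[: -len(suffix)]
            # A stem listed in the target lexicon suggests a borrowing.
            if stem in TARGET_LEXICON:
                return Analysis(word, pos, stem, f"-{suffix}")
    # No rule applied: treat the word as a noun and pass it through unchanged.
    return Analysis(word, "noun", word, "default")


if __name__ == "__main__":
    for w in ["clickear", "clickeando", "emails", "xyzzy"]:
        print(analyze_unlisted(w))
```

In this sketch, a borrowed stem with a source-language ending receives a part of speech other than noun, while a word that matches no rule keeps the noun default and is transferred unchanged.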
Similar resources
The Effects of Oral Code-mixing and Glossing on Iranian EFL Learners' Vocabulary Knowledge
The current study investigated the effects of oral code-mixing and glossing on L2 vocabulary learning. To this end, 60 EFL learners studying at pre-university school were given a pre-test to make sure that they did not have any prior knowledge of the target words. Based on their scores in the pre-test, 36 pre-university students were selected and divided into three groups, including two experim...
A new model for persian multi-part words edition based on statistical machine translation
In English, multi-part words are hyphenated, with the hyphen separating the parts. Persian also has multi-part words. Under Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...
Using Rich Morphology in Resolving Certain Hindi-english Machine Translation Divergence
Identification and resolution of translation divergence (TD) is crucial for any automated machine translation (MT) system. Although this problem has received the attention of a number of MT developers, general strategies are hard to devise. Solutions for specific language pairs appear to be comparatively tractable. In this paper, we present a technique that exploits the rich morpho...
Goal programming-based post-disaster decision making for allocation and scheduling the rescue units in natural disaster with time-window
Natural disasters, such as earthquakes, tsunamis, and hurricanes, cause enormous harm each year. To reduce casualties and economic losses in the response phase, rescue units must be allocated and scheduled efficiently, which is a key issue in emergency response. In this paper, a multi-objective mixed integer nonlinear programming model (MOMINLP) is proposed to minimize sum of weight...
Mainland Chinese Students’ Shifting Perceptions of Chinese-English Code-Mixing in Macao
As a former Portuguese colony, Macao is the only region in China where Cantonese, a variety of Chinese, and English, an international language, are enjoying de facto official statuses, with Putonghua being a quasi-official language and Portuguese being another official language. Recently, with an increasing number of Mainland Chinese students crossing the border to pursue their tertiar...