Regularized Interlingual Projections: Evaluation on Multilingual Transliteration
Authors
Abstract
In this paper, we address the problem of building a multilingual transliteration system using an interlingual representation. Our approach uses the International Phonetic Alphabet (IPA) to learn the interlingual representation, which allows us to use any word and its IPA representation as a training example. The approach therefore requires only monolingual resources: a phoneme dictionary that lists words and their IPA representations. By adding a phoneme dictionary for a new language, we can readily build a transliteration system into any of the previously existing languages, without the expense of all-pairs data or computation. We also propose a regularization framework for learning the interlingual representation; it accounts for language-specific phonemic variability and can therefore find better mappings between languages. Experimental results on the name transliteration task in five diverse languages show a maximum improvement of 29% accuracy and an average improvement of 17% accuracy compared to a state-of-the-art baseline system.
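The idea of pivoting through IPA using only monolingual phoneme dictionaries can be illustrated with a minimal sketch. It is not the paper's method: the learned, regularized interlingual projection is replaced here by a plain string similarity between IPA strings, and the dictionaries, their entries, and the names `PHONEME_DICTS`, `HINDI_DICT`, `ipa_similarity`, and `transliterate` are all hypothetical. The IPA strings are rough approximations chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy phoneme dictionaries: language -> {word: IPA string}.
# Entries are illustrative approximations, not data from the paper.
PHONEME_DICTS = {
    "en": {"london": "lʌndən", "paris": "pærɪs"},
    "ru": {"лондон": "londən", "париж": "pɐrʲiʂ"},
}

# A new language is added with its own monolingual dictionary only;
# no English-Hindi or Russian-Hindi word pairs are needed.
HINDI_DICT = {"लंदन": "ləndən", "पेरिस": "peːrɪs"}


def ipa_similarity(a, b):
    """Similarity in [0, 1] between two IPA strings.

    Assumption: a plain character-level ratio stands in for the
    learned interlingual projection described in the abstract.
    """
    return SequenceMatcher(None, a, b).ratio()


def transliterate(word, src, tgt, dicts=PHONEME_DICTS):
    """Return the target-language word whose IPA is closest to the source word's IPA."""
    src_ipa = dicts[src][word]
    tgt_dict = dicts[tgt]
    return max(tgt_dict, key=lambda w: ipa_similarity(src_ipa, tgt_dict[w]))


if __name__ == "__main__":
    print(transliterate("london", "en", "ru"))
    # Adding Hindi makes every existing language a possible source or target.
    extended = {**PHONEME_DICTS, "hi": HINDI_DICT}
    print(transliterate("paris", "en", "hi", extended))
    print(transliterate("лондон", "ru", "hi", extended))
```

Because each language is linked only to the shared IPA space, adding the Hindi dictionary requires no change to the English or Russian entries. A real system would generate unseen target strings rather than look up the nearest dictionary entry.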
Similar resources
Dissertation title: Discriminative Interlingual Representations
Jagadeesh Jagarlamudi, Doctor of Philosophy, 2013. Dissertation advised by: Hal Daumé III, Department of Computer Science. The language barrier in many multilingual natural language processing (NLP) tasks can be overcome by mapping objects from different languages (“views”) into a common low-dimensional subspace. For example, the name...
An Interlingual Lexical Organisation Based on Acceptions: From the PARAX mock-up to the NADIA system
Many projects are conducted to develop multilingual lexical databases. Some of these projects use an interlingual approach (KBMT-89, EDR, ...), where others choose a bilingual approach (Multilex, ...). This paper presents an interlingual approach based on acceptions (word-senses) aiming at the development of a multilingual lexical database management system: NADIA. With this approach, the inter...
Interlingual Lexical Organisation For Multilingual Lexical Databases In NADIA
We propose a lexical organisation for multilingual lexical databases (MLDB). This organisation is based on acceptions (word-senses). We detail this lexical organisation and show a mock-up built to experiment with it. We also present our current work in defining and prototyping a specialised system for the management of acception-based MLDB.
Quillpad Multilingual Predictive Transliteration System
Transliteration has been one of the common methods for multilingual text input. Many earlier methods employed transliteration schemes that define a one-to-one mapping from input character combinations to output character combinations. Though such well-defined mappings made it easier to write a transliteration program, the end user was burdened with learning the mappings. Further, though transliter...
Interlingual Aspects Of Wikipedia's Quality
This paper presents interim results of an ongoing project on quality issues concerning Wikipedia. One focus of research is the relation of language and quality measurement. The other one is the use of interlingual relations for quality assessment and improvement. The study is based on mono- and multilingual samples of featured and non-featured Wikipedia articles in English, French, German, and It...