Compositional Morphology for Word Representations and Language Modelling

نویسندگان

  • Jan A. Botha
  • Phil Blunsom
چکیده

This paper presents a scalable method for integrating compositional morphological representations into a vector-based probabilistic language model. Our approach is evaluated in the context of log-bilinear language models, rendered suitably efficient for implementation inside a machine translation decoder by factoring the vocabulary. We perform both intrinsic and extrinsic evaluations, presenting results on a range of languages which demonstrate that our model learns morphological representations that both perform well on word similarity tasks and lead to substantial reductions in perplexity. When used for translation into morphologically rich languages with large vocabularies, our models obtain improvements of up to 1.2 BLEU points relative to a baseline system using back-off n-gram models.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics

Speakers of a language can construct an unlimited number of new words through morphological derivation. This is a major cause of data sparseness for corpus-based approaches to lexical semantics, such as distributional semantic models of word meaning. We adapt compositional methods originally developed for phrases to the task of deriving the distributional meaning of morphologically complex word...

متن کامل

Using functional magnetic resonance imaging (fMRI) to explore brain function: cortical representations of language critical areas

Pre-operative determination of the dominant hemisphere for speech and speech associated sensory and motor regions has been of great interest for the neurological surgeons. This dilemma has been of at most importance, but difficult to achieve, requiring either invasive (Wada test) or non-invasive methods (Brain Mapping). In the present study we have employed functional Magnetic Resonance Imaging...

متن کامل

Using functional magnetic resonance imaging (fMRI) to explore brain function: cortical representations of language critical areas

Pre-operative determination of the dominant hemisphere for speech and speech associated sensory and motor regions has been of great interest for the neurological surgeons. This dilemma has been of at most importance, but difficult to achieve, requiring either invasive (Wada test) or non-invasive methods (Brain Mapping). In the present study we have employed functional Magnetic Resonance Imaging...

متن کامل

Two New Models of Target Language Morphology in Translation

This proposal addresses the problem of translation into morphologically rich languages with two new models. In the first, we generate new word types that that are compositional translations of multiple source words (e.g., compounds in German) and augment existing translation models with these. In the second, we propose using automatically learned distributed representations of morphemes and the...

متن کامل

Probabilistic modelling of morphologically rich languages

This thesis investigates how the sub-structure of words can be accounted for in probabilistic models of language. Such models play an important role in natural language processing tasks such as translation or speech recognition, but often rely on the simplistic assumption that words are opaque symbols. This assumption does not fit morphologically complex language well, where words can have rich...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014