Compositional Morphology for Word Representations and Language Modelling
Authors
Abstract
This paper presents a scalable method for integrating compositional morphological representations into a vector-based probabilistic language model. Our approach is evaluated in the context of log-bilinear language models, rendered suitably efficient for implementation inside a machine translation decoder by factoring the vocabulary. We perform both intrinsic and extrinsic evaluations, presenting results on a range of languages which demonstrate that our model learns morphological representations that both perform well on word similarity tasks and lead to substantial reductions in perplexity. When used for translation into morphologically rich languages with large vocabularies, our models obtain improvements of up to 1.2 BLEU points relative to a baseline system using back-off n-gram models.
Related resources
Compositionally Derived Representations of Morphologically Complex Words in Distributional Semantics
Speakers of a language can construct an unlimited number of new words through morphological derivation. This is a major cause of data sparseness for corpus-based approaches to lexical semantics, such as distributional semantic models of word meaning. We adapt compositional methods originally developed for phrases to the task of deriving the distributional meaning of morphologically complex word...
Using functional magnetic resonance imaging (fMRI) to explore brain function: cortical representations of language critical areas
Pre-operative determination of the dominant hemisphere for speech and of speech-associated sensory and motor regions has been of great interest to neurological surgeons. This determination has been of utmost importance, but difficult to achieve, requiring either invasive (Wada test) or non-invasive methods (Brain Mapping). In the present study we have employed functional Magnetic Resonance Imaging...
Two New Models of Target Language Morphology in Translation
This proposal addresses the problem of translation into morphologically rich languages with two new models. In the first, we generate new word types that are compositional translations of multiple source words (e.g., compounds in German) and augment existing translation models with these. In the second, we propose using automatically learned distributed representations of morphemes and the...
Probabilistic modelling of morphologically rich languages
This thesis investigates how the sub-structure of words can be accounted for in probabilistic models of language. Such models play an important role in natural language processing tasks such as translation or speech recognition, but often rely on the simplistic assumption that words are opaque symbols. This assumption does not fit morphologically complex language well, where words can have rich...