Transition-Based Morphological Disambiguation
Abstract
In Morphologically Rich Languages (MRLs), sentences are composed of ambiguous space-delimited tokens that must be disambiguated with respect to their constituent morphemes. Previous work on Morphological Disambiguation (MD) of MRLs has had variable success, and results for Semitic languages remain sub-par for downstream applications. Here we propose novel transition-based MD systems, both word-based and morpheme-based, and tackle the challenge introduced by the variable length of hypothesized morpheme sequences. Our experiments show that transition-based, morpheme-based MD consistently outperforms the word-based variant, while providing new state-of-the-art results on Hebrew MD.
Similar resources
Data-Driven Morphological Analysis and Disambiguation for Morphologically Rich Languages and Universal Dependencies
Parsing texts into universal dependencies (UD) in realistic scenarios requires infrastructure for morphological analysis and disambiguation (MA&D) of typologically different languages as a first tier. MA&D is particularly challenging in morphologically rich languages (MRLs), where the ambiguous space-delimited tokens ought to be disambiguated with respect to their constituent morphemes. Here we...
An Unsupervised Morpheme-Based HMM for Hebrew Morphological Disambiguation
Morphological disambiguation is the process of assigning one set of morphological features to each individual word in a text. When the word is ambiguous (there are several possible analyses for the word), a disambiguation procedure based on the word context must be applied. This paper deals with morphological disambiguation of the Hebrew language, which combines morphemes into a word in both ag...
SHAKKIL: An Automatic Diacritization System for Modern Standard Arabic Texts
This paper sheds light on a system that would be able to diacritize Arabic texts automatically (SHAKKIL). In this system, the diacritization problem will be handled through two levels; morphological and syntactic processing levels. The adopted morphological disambiguation algorithm depends on four layers; Uni-morphological form layer, rule-based morphological disambiguation layer, statistical-b...
Morphological Disambiguation of Turkish Text with Perceptron Algorithm
This paper describes the application of the perceptron algorithm to the morphological disambiguation of Turkish text. Turkish has a productive derivational morphology. Due to the ambiguity caused by complex morphology, a word may have multiple morphological parses, each with a different stem or sequence of morphemes. The methodology employed is based on ranking with perceptron algorithm which h...
Character-Aware Neural Morphological Disambiguation
We develop a language-independent, deep learning-based approach to the task of morphological disambiguation. Guided by the intuition that the correct analysis should be “most similar” to the context, we propose dense representations for morphological analyses and surface context and a simple yet effective way of combining the two to perform disambiguation. Our approach improves on the languaged...