Morphological Analysis of the Dravidian Language Family
Abstract
The Dravidian family is one of the most widely spoken language families in the world, yet very few annotated resources are available to NLP researchers. To remedy this, we create DravMorph, a corpus annotated for morphological segmentation and part-of-speech tags. We also exploit novel features and higher-order models to achieve promising results on these corpora on both tasks, beating techniques proposed in the literature by as much as 4 points in segmentation F1.
Similar resources
Unity in Diversity: A Unified Parsing Strategy for Major Indian Languages
This paper presents our work on applying a non-linear neural network to parse five resource-poor Indian languages belonging to two major language families, Indo-Aryan and Dravidian. Bengali and Marathi are Indo-Aryan languages, whereas Kannada, Telugu and Malayalam belong to the Dravidian family. While little work has been done previously on Bengali and Telugu linear transition-based parsing, w...
Joint Bayesian Morphology learning for Dravidian languages
In this paper a methodology for learning the complex agglutinative morphology of some Indian languages using Adaptor Grammars and morphology rules is presented. Adaptor grammars are a compositional Bayesian framework for grammatical inference, where we define a morphological grammar for agglutinative languages and morphological boundaries are inferred from a plain text corpus. Once morphologica...
Significance of an Accurate Sandhi-Splitter in Shallow Parsing of Dravidian Languages
This paper evaluates the challenges involved in shallow parsing of Dravidian languages, which are highly agglutinative and morphologically rich. Text processing tasks in these languages are not trivial because multiple words concatenate to form a single string, with morpho-phonemic changes at the point of concatenation. This phenomenon, known as Sandhi, in turn complicates the individual word iden...
Augmenting Pivot based SMT with word segmentation
This paper is an attempt to tackle two well-known performance degraders in SMT, viz., (i) difference in morphological characteristics of the two languages, and (ii) scarcity of parallel corpora. We address these two problems using “word segmentation” and through “pivots” on the morphologically complex language. Our case study is Malayalam to Hindi SMT. Malayalam belongs to the Dravidian family ...
Morphological Analyzer for Affix Stacking Languages: A Case Study of Marathi
In this paper we describe and evaluate a Finite State Machine (FSM) based Morphological Analyzer (MA) for Marathi, a highly inflectional language with agglutinative suffixes. Marathi belongs to the Indo-European family and is considerably influenced by Dravidian languages. Adroit handling of participial constructions and other derived forms (Krudantas and Taddhitas) in addition to inflected for...