A Sequence Labeling Approach to Morphological Analyzer for Tamil Language
Abstract
Morphological analysis is a basic process for any Natural Language Processing task. Morphology is the study of the internal structure of words. Morphological analysis retrieves the grammatical features and properties of a morphologically inflected word. Capturing the agglutinative structure of Tamil words with an automatic system is challenging. Morphological analyzers are generally built with rule-based approaches. In this paper we propose a novel approach that solves the morphological analysis problem with machine learning. The problem is redefined as a classification problem. The approach is based on sequence labeling and training with kernel methods, which capture the nonlinear relationships among morphological features from training samples in a better and simpler way.
Keywords: morphology; morphological analyzer; machine learning; sequence labeling.
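The reformulation in the abstract, morphological analysis as per-character classification with a kernel method, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the romanized toy words and their segmentations, the `B`/`I` boundary encoding (+1/-1), the context window of two characters, and the kernel perceptron with a Gaussian kernel over character windows, which stands in for the kernel method that the abstract does not name.

```python
import math

# Toy romanized Tamil verb forms with hand-made segmentations.
# Hypothetical illustration data, not taken from the paper.
TRAIN = [
    ("padiththaan", ["padi", "thth", "aan"]),
    ("padiththaal", ["padi", "thth", "aal"]),
    ("vanthaan", ["va", "nth", "aan"]),
    ("vanthaal", ["va", "nth", "aal"]),
]
WINDOW = 2   # context characters on each side (assumed value)
GAMMA = 1.0  # kernel width (assumed value)


def windows(word):
    """Return one padded character window per character of the word."""
    padded = "_" * WINDOW + word + "_" * WINDOW
    return [padded[i:i + 2 * WINDOW + 1] for i in range(len(word))]


def gold_labels(morphemes):
    """+1 marks the first character of a morpheme (B), -1 the rest (I)."""
    labels = []
    for morpheme in morphemes:
        labels.extend([1] + [-1] * (len(morpheme) - 1))
    return labels


def kernel(a, b):
    """Gaussian kernel on the number of mismatching window positions."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return math.exp(-GAMMA * mismatches)


def predict_one(x, model):
    """Classify one window with the dual-form (kernel) perceptron."""
    xs, ys, alphas = model
    score = sum(a * y * kernel(sx, x)
                for sx, y, a in zip(xs, ys, alphas) if a)
    return 1 if score > 0 else -1


def train(data, max_epochs=100):
    """Kernel perceptron: bump alpha for every misclassified window."""
    xs, ys = [], []
    for word, morphemes in data:
        xs.extend(windows(word))
        ys.extend(gold_labels(morphemes))
    alphas = [0] * len(xs)
    model = (xs, ys, alphas)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(xs, ys)):
            if predict_one(x, model) != y:
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training window is classified correctly
            break
    return model


def segment(word, model):
    """Split a word into morphemes at the predicted B positions."""
    morphemes = []
    for i, window in enumerate(windows(word)):
        if i == 0 or predict_one(window, model) == 1:
            morphemes.append(word[i])
        else:
            morphemes[-1] += word[i]
    return morphemes


model = train(TRAIN)
print(segment("padiththaan", model))
```

A full analyzer would also assign a grammatical tag to each predicted morpheme; this sketch covers only the boundary-labeling step, and its output on unseen words depends entirely on the toy training data.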
Similar resources
A Novel Approach to Morphological Analysis for Tamil Language
This paper presents morphological analysis for the complex, agglutinative Tamil language using a machine learning approach. Morphological analysis is concerned with retrieving the structure, syntactic rules, morphological properties and meaning of a morphologically complex word. The morphological structure of an agglutinative language is unique and capturing its complexity in a machine analyza...
A Novel Data Driven Algorithm for Tamil Morphological Generator
Tamil is a morphologically rich, agglutinative language. Because it is agglutinative, most word features are affixed postpositionally to the root word. The morphological generator takes a lemma, a POS category and a morpho-lexical description as input and gives a word form as output. It is the reverse process of a morphological analyzer. In any natural language generation system, morpho...
Grammar Checker Features in Modern Tamil Natural Language Processing
Generally, Tamil NLP applications are programmed to handle different kinds of input data. Inputs are classified into text, image, sound waves, etc. Tamil text-based applications are built on word formation techniques. Word analysis and generation are carried out in these ways: i) untagging and tagging, and ii) word-level and character-level accuracies. This method is processing based...
Stemmers for Tamil Language: Performance Analysis
Stemming is the process of extracting the root word from a given inflected word, and it plays a significant role in numerous Natural Language Processing (NLP) applications. The Tamil language raises several challenges for NLP, since it has richer morphological patterns than other languages. A rule-based light stemmer is proposed in this paper to find the stem word for a given inflectio...
Automated Paradigm Selection for FSA based Konkani Verb Morphological Analyzer
A Morphological Analyzer is a crucial tool for any language. In popular tools used to build morphological analyzers like XFST, HFST and Apertium’s lttoolbox, the finite state approach is used to sequence input characters. We have used the finite state approach to sequence morphemes instead of characters. In this paper we present the architecture and implementation details of a Corpus assisted F...