Supervised Methods for Domain Classification of Tamil Documents
Abstract
The era of digitization creates a need for domain classification in both online and offline applications, and automatic text classification is useful across diverse fields. Various methodologies, including machine learning algorithms, have therefore been proposed for the task. This work proposes automatic classification of Tamil documents, motivated by the exponential growth of unstructured Tamil text available as news, encyclopedias, e-books, e-governance material, social media and more. Max-Ent, CRF and SVM algorithms are used to achieve an average accuracy of more than 90 percent in both sentence-level and document-level classification of Tamil text documents. The Dinakaran newspaper dataset from the EMILLE/CIIL Corpus is used to evaluate the ability of machine learning algorithms to perform Tamil domain classification.
Similar resources
Fisher Discriminant Analysis (FDA), a supervised feature reduction method in seismic object detection
Automatic processes on seismic data using pattern recognition is one of the interesting fields in geophysical data interpretation. One part is the seismic object detection using different supervised classification methods that finally has an output as a probability cube. Object detection process starts with generating a pickset of two classes labeled as object and non-object and then selecting ...
Semi-Supervised Matrix Completion for Cross-Lingual Text Classification
Cross-lingual text classification is the task of assigning labels to observed documents in a label-scarce target language domain by using a prediction model trained with labeled documents from a label-rich source language domain. Cross-lingual text classification is popularly studied in natural language processing area to reduce the expensive manual annotation effort required in the target lang...
Web based classification of Tamil documents using ABPA
Automatic text classification based on vector space model (VSM), artificial neural networks (ANN), K-nearest neighbor (KNN), Naive Bayes (NB) and support vector machine (SVM) have been applied on English language documents, and gained popularity among text mining and information retrieval (IR) researchers. This paper proposes the application of ANN for the classification of Tamil language docu...
Classification of text documents supervised by domain ontologies
The research objective is to establish an approach for supporting the classification of text documents referring to a specified domain. The focus is on the preliminary topic assignment to the documents used for training the model. The method implements domain ontology as background knowledge. The idea consists in extracting the preliminary topics for training the classifier by means of unsuperv...
Survey of robust artificial intelligence classifier proper for various digital data
Artificial intelligence, or machine intelligence, should be considered a vast domain at the junction of many fields of knowledge, sciences, and old and new techniques. Today, classification of documents is adopted extensively in information recovery for organizing documents. In the method of document supervised classification some correct information about documents that previously have been classified are a...