Text Classification Using Association Rules, Dependency Pruning and Hyperonymization
Abstract
We present new methods for pruning and enhancing itemsets for text classification via association rule mining. The pruning methods are based on dependency syntax, and the enhancing methods are based on replacing words with their hyperonyms of various orders. We discuss the impact of these methods compared with pruning based on the tf-idf rank of words.
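The three itemset transformations named in the abstract can be illustrated with a minimal Python sketch. Everything in it is hypothetical and is not the authors' implementation: the toy corpus, the hand-written dependency triples (standing in for a parser's output), the choice of `nsubj` and `obj` as the relations worth keeping, and the small is-a map used in place of a lexical resource such as WordNet. The function names `tfidf_prune`, `dependency_prune` and `hyperonymize` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of lowercase tokens.
DOCS = [
    ["the", "dog", "chased", "the", "cat"],
    ["the", "cat", "ate", "the", "fish"],
    ["a", "poodle", "barked"],
]

# Hypothetical (dependent, relation, head) triples for DOCS[0], standing in for
# parser output; relation labels follow Universal Dependencies naming.
PARSE0 = [("the", "det", "dog"), ("dog", "nsubj", "chased"),
          ("the", "det", "cat"), ("cat", "obj", "chased")]

# Hypothetical choice of relations whose words are kept as items.
KEEP_RELATIONS = {"nsubj", "obj"}

# Hypothetical is-a map; a real system might query WordNet instead.
HYPERONYM = {"poodle": "dog", "dog": "canine", "canine": "carnivore",
             "cat": "feline", "feline": "carnivore"}


def tfidf_prune(docs, k):
    """Baseline: keep the k highest tf-idf words of each document as its itemset."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency per word
    itemsets = []
    for d in docs:
        tf = Counter(d)
        score = {w: (c / len(d)) * math.log(n / df[w]) for w, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(score, key=lambda w: (-score[w], w))
        itemsets.append(set(ranked[:k]))
    return itemsets


def dependency_prune(triples, keep=KEEP_RELATIONS):
    """Keep only words that take part in a kept relation, as dependent or head."""
    items = set()
    for dep, rel, head in triples:
        if rel in keep:
            items.update((dep, head))
    return items


def hyperonymize(items, order):
    """Replace each word by its hyperonym `order` levels up, stopping at the top of the map."""
    out = set()
    for w in items:
        for _ in range(order):
            if w not in HYPERONYM:
                break
            w = HYPERONYM[w]
        out.add(w)
    return out


if __name__ == "__main__":
    print(sorted(tfidf_prune(DOCS, 2)[0]))   # ['chased', 'dog']
    items = dependency_prune(PARSE0)
    print(sorted(items))                     # ['cat', 'chased', 'dog']
    print(sorted(hyperonymize(items, 1)))    # ['canine', 'chased', 'feline']
    print(sorted(hyperonymize(items, 2)))    # ['carnivore', 'chased']
```

In this toy setting, raising `order` merges distinct words into shared items (`dog` and `cat` both become `carnivore` at order 2), so higher orders yield fewer, more general items that are more likely to recur across documents.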
Similar resources
Exploiting statistically significant dependent rules for associative classification
Established associative classification algorithms have been shown to be very effective in handling categorical data such as text data. The learned model is a set of rules that are easy to understand and can be edited. However, they still suffer from the following limitations: first, they mostly use the support-confidence framework to mine classification association rules which require the setting of...
Arabic Language Text Classification Using Dependency Syntax-Based Feature Selection
We study the performance of Arabic text classification combining various techniques: (a) tf-idf vs. dependency syntax, for feature selection and weighting; (b) class association rules vs. support vector machines, for classification. The Arabic text is used in two forms: rootified and lightly stemmed. The results we obtain show that lightly stemmed text leads to better performance than rootified ...
Text Document Categorization by Term Association
A good text classifier is one that efficiently categorizes large sets of text documents in a reasonable time frame and with acceptable accuracy, and that provides human-readable classification rules for possible fine-tuning. If training the classifier is also quick, this could be an additional asset in some application domains. Many techniques and alg...
Text classification with the support of pruned dependency patterns
We propose a novel text classification approach based on two main concepts, lexical dependency and pruning. We extend the standard bag-of-words method by including dependency patterns in the feature vector. We perform experiments with 37 lexical dependencies and the effect of each dependency type is analyzed separately in order to identify the most discriminative dependencies. We analyze the ef...
A Method for Classification based on Association Rules using Ontology in Web Data
This paper presents a new method based on association rule mining and ontology for the classification of web pages. The work prunes the association rules generated by the mining process. The main complexity arises because a varying number of text documents is considered when generating the association rules with the Apriori algorithm. But these rules that were generate...
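The main abstract and several of the related abstracts build on class association rules mined within the support-confidence framework, in some cases with the Apriori algorithm. The Python sketch below is a generic illustration under stated assumptions, not the method of any paper listed here: `apriori` finds itemsets whose count reaches a threshold, `class_rules` turns each frequent itemset X into rules X → c that meet minimum support and confidence, and `classify` returns the class of the highest-ranked matching rule. The labeled itemsets in `DATA`, the thresholds, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def apriori(transactions, min_count):
    """Level-wise search for every itemset contained in at least min_count transactions."""
    counts = Counter(frozenset([i]) for t in transactions for i in t)
    level = {s for s, c in counts.items() if c >= min_count}
    frequent = {s: counts[s] for s in level}
    k = 2
    while level:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        level = {c for c in candidates if counts[c] >= min_count}
        frequent.update((c, counts[c]) for c in level)
        k += 1
    return frequent


def class_rules(labeled, min_sup, min_conf):
    """Mine rules X -> c with support(X and c) >= min_sup and confidence >= min_conf."""
    n = len(labeled)
    transactions = [frozenset(items) for items, _ in labeled]
    # support(X and c) <= support(X), so only frequent bodies X can yield qualifying rules.
    frequent = apriori(transactions, math.ceil(min_sup * n))
    rules = []
    for body, body_count in frequent.items():
        by_class = Counter(lbl for t, (_, lbl) in zip(transactions, labeled) if body <= t)
        for lbl, hits in by_class.items():
            support, confidence = hits / n, hits / body_count
            if support >= min_sup and confidence >= min_conf:
                rules.append((body, lbl, support, confidence))
    # Rank by confidence, then support; ties broken deterministically by body and label.
    rules.sort(key=lambda r: (-r[3], -r[2], sorted(r[0]), r[1]))
    return rules


def classify(rules, items, default):
    """Return the class of the first (highest-ranked) rule whose body the document contains."""
    for body, lbl, _, _ in rules:
        if body <= items:
            return lbl
    return default


# Hypothetical labeled itemsets, e.g. documents after pruning and hyperonymization.
DATA = [
    ({"canine", "chased"}, "pets"),
    ({"canine", "barked"}, "pets"),
    ({"stock", "market"}, "finance"),
    ({"stock", "fell"}, "finance"),
]

if __name__ == "__main__":
    rules = class_rules(DATA, min_sup=0.5, min_conf=0.8)
    for body, lbl, sup, conf in rules:
        print(sorted(body), "->", lbl, sup, conf)
    print(classify(rules, {"canine", "slept"}, default="unknown"))  # pets
```

The ranked rule list doubles as the model: each rule can be read, and deleted or edited, by hand.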