The naive Bayes text classification algorithm based on rough set in the cloud platform
Abstract
This paper improves the naive Bayesian classification algorithm by combining it with rough set theory, yielding a naive Bayesian classifier based on rough sets. We implement this algorithm on a cloud platform using the MapReduce programming model and obtain excellent results: a recall rate of 76.4 was achieved when classifying Tibetan Web pages.
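As a rough illustration of how naive Bayes training fits the map-reduce pattern mentioned in the abstract, the sketch below counts documents and words per class in a map step and sums them in a reduce step, all in plain Python. It is not the paper's implementation: the function names, the toy English data, whitespace tokenization, and Laplace smoothing are assumptions made for illustration, and the rough-set step is omitted because the abstract does not describe how it is applied.

```python
import math
from collections import defaultdict
from itertools import chain


def map_doc(label, text):
    """Map step: emit one document-count pair and one pair per token."""
    yield (("doc", label), 1)
    for word in text.lower().split():
        yield (("word", label, word), 1)


def reduce_counts(pairs):
    """Reduce step: sum the values of all pairs that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def train(docs):
    """Run map then reduce over (label, text) pairs and split the counts."""
    counts = reduce_counts(chain.from_iterable(map_doc(l, t) for l, t in docs))
    doc_counts = {k[1]: v for k, v in counts.items() if k[0] == "doc"}
    word_counts = {(k[1], k[2]): v for k, v in counts.items() if k[0] == "word"}
    vocab = {word for (_, word) in word_counts}
    return doc_counts, word_counts, vocab


def classify(text, doc_counts, word_counts, vocab):
    """Pick the class with the highest log posterior (Laplace smoothing)."""
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in doc_counts.items():
        label_total = sum(c for (l, _), c in word_counts.items() if l == label)
        score = math.log(n_docs / total_docs)
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            count = word_counts.get((label, word), 0)
            score += math.log((count + 1) / (label_total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (hypothetical, for illustration only).
docs = [
    ("sports", "ball game team"),
    ("sports", "team win game"),
    ("tech", "cloud platform code"),
    ("tech", "code cloud data"),
]
model = train(docs)
print(classify("team game", *model))
```

On a real cluster the map and reduce functions would run in parallel over document shards; here they run sequentially in one process so the sketch stays self-contained.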
Similar resources
A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier
With the rapid increase in the number of documents, Text Document Classification (TDC) methods have become crucial. This paper presents a hybrid model of Invasive Weed Optimization (IWO) and the Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS), in order to reduce the large feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...
Naive Bayesian Rough Sets
A naive Bayesian classifier is a probabilistic classifier based on Bayesian decision theory with naive independence assumptions, which is often used for ranking or constructing a binary classifier. The theory of rough sets provides a ternary classification method by approximating a set into positive, negative and boundary regions based on an equivalence relation on the universe. In this paper, ...
Two-step Classification Algorithm Based on Decision-Theoretic Rough Set Theory
This paper introduces rough set theory and decision-theoretic rough set theory (DTRST). Based on the latter, a two-step classification algorithm is proposed. Compared with primitive DTRST algorithms, our method decreases the range of the negative domain and employs a two-step strategy in classification. For new samples and unknown samples, it can be estimated whether they belong to the negative domain when th...
A Validation Test Naive Bayesian Classification Algorithm and Probit Regression as Prediction Models for Managerial Overconfidence in Iran's Capital Market
Corporate directors are influenced by overconfidence, one of the personality traits of individuals; they may make irrational decisions that have a significant impact on the company's performance in the long run. The purpose of this paper is to validate and compare the Naive Bayesian Classification algorithm and probit regression in the prediction of management's overconfidence at pre...
A Dimension Reduction Approach to Classification Based on Particle Swarm Optimisation and Rough Set Theory
Dimension reduction aims to remove unnecessary attributes from datasets to overcome the problem of “the curse of dimensionality”, which is an obstacle in classification. Based on the analysis of the limitations of the standard rough set theory, we propose a new dimension reduction approach based on binary particle swarm optimisation (BPSO) and probabilistic rough set theory. The new approach in...