An Efficient Multi-relational Naïve Bayesian Classifier Based on Semantic Relationship Graph
Abstract
Classification is one of the most popular data mining tasks, with a wide range of applications, and many algorithms have been proposed to build accurate and scalable classifiers. Most of these algorithms take only a single table as input, whereas in the real world most data are stored in multiple tables and managed by relational database systems. Because transferring data from multiple tables into a single one usually causes many problems, the development of multi-relational classification algorithms has become important and has attracted considerable research interest. Existing work on extending Naïve Bayes to multi-relational data either has to transform the data stored in tables into main-memory Prolog facts or limits the search space to a small subset of real-world applications. In this work, we aim to solve these problems and build an efficient, accurate Naïve Bayesian classifier that handles data in multiple tables directly. We propose an algorithm named Graph-NB, which upgrades the Naïve Bayesian classifier for this purpose. To take advantage of the linkage relationships among tables, and to treat the tables linked to the target table differently from one another, a semantic relationship graph is developed to describe these relationships and to avoid unnecessary joins. Furthermore, to improve accuracy, a pruning strategy is given that simplifies the graph so that too many weakly linked tables are not examined. An experimental study on both real-world and synthetic databases shows its high efficiency and good accuracy.
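The general idea of applying Naïve Bayes across linked tables without first flattening them can be illustrated with a small sketch. This is not Graph-NB itself: it has no semantic relationship graph and no pruning, and it only shows class labels being propagated along a single foreign key so that per-class counts can be gathered from a linked table without materializing a join. The schema (customer, purchase), the column names, the data, and the functions train and predict are all hypothetical, and Laplace smoothing is an added assumption.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy data: target table customer(id, region, label)
# and linked table purchase(customer_id, category).
customers = [
    {"id": 1, "region": "north", "label": "yes"},
    {"id": 2, "region": "north", "label": "yes"},
    {"id": 3, "region": "south", "label": "no"},
    {"id": 4, "region": "south", "label": "no"},
]
purchases = [
    {"customer_id": 1, "category": "books"},
    {"customer_id": 1, "category": "books"},
    {"customer_id": 2, "category": "books"},
    {"customer_id": 3, "category": "games"},
    {"customer_id": 4, "category": "games"},
    {"customer_id": 4, "category": "books"},
]


def train(customers, purchases):
    """Collect per-class counts from both tables without joining them."""
    label_of = {c["id"]: c["label"] for c in customers}
    class_counts = Counter(label_of.values())
    region_counts = defaultdict(Counter)    # label -> region -> count
    category_counts = defaultdict(Counter)  # label -> category -> count
    for c in customers:
        region_counts[c["label"]][c["region"]] += 1
    for p in purchases:
        # Propagate the class label along the foreign key.
        category_counts[label_of[p["customer_id"]]][p["category"]] += 1
    return {
        "class_counts": class_counts,
        "region_counts": region_counts,
        "category_counts": category_counts,
        "n_regions": len({c["region"] for c in customers}),
        "n_categories": len({p["category"] for p in purchases}),
    }


def log_cond(counts, value, n_values):
    """Laplace-smoothed log P(value | class)."""
    return math.log((counts[value] + 1) / (sum(counts.values()) + n_values))


def predict(model, region, categories):
    """Return the class with the highest Naive Bayes log score."""
    total = sum(model["class_counts"].values())
    best_label, best_score = None, -math.inf
    for label, count in model["class_counts"].items():
        score = math.log(count / total)
        score += log_cond(model["region_counts"][label], region,
                          model["n_regions"])
        # Each linked tuple contributes one conditionally independent factor.
        for cat in categories:
            score += log_cond(model["category_counts"][label], cat,
                              model["n_categories"])
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(customers, purchases)
print(predict(model, "north", ["books"]))
print(predict(model, "south", ["games"]))
```

In this sketch every linked table would be treated the same way. The abstract describes using a semantic relationship graph to treat linked tables differently and to prune weakly linked ones, which is not modeled here.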
Related resources
Entropy Based Feature Selection For Multi-Relational Naïve Bayesian Classifier
Current industrial data are stored in relational structures. The usual approach to mining these data is to join several relations into a single relation using foreign key links, which is known as flattening. Flattening may cause problems such as long running times, data redundancy, and statistical skew in the data. Hence, the critical issue arises of how to mine data directly on numerous relation...
Augmented Naïve Bayesian Model of Classification Learning
The Naïve Bayesian Classifier and an Augmented Naïve Bayesian Classifier are applied to human classification tasks. The Naïve Bayesian Classifier is augmented with feature construction using a Galois lattice. The best features, measured on their within- and between-category overlap, are added to the category’s concept description. The results show that space efficient concept descriptions can pre...
Classification Using Naïve Bayes- a Survey
Classification, particularly text classification, is a supervised learning approach that assigns items to categories using a training set of correctly identified observations, each analyzed into a set of features. There are many phases involved in classification. The main classification phase involves the use of classification algorithms or classifiers. Among the various classifiers, the N...
Effective Discretization and Hybrid feature selection using Naïve Bayesian classifier for Medical datamining
As a probability-based statistical classification method, the Naïve Bayesian classifier has gained wide popularity despite its assumption that attributes are conditionally mutually independent given the class label. Improving predictive accuracy and achieving dimensionality reduction for statistical classifiers have been active research areas in data mining. Our experimental results suggest...
A Heterogeneous Naive-Bayesian Classifier for Relational Databases
Geetha Manjunath, M Narasimha Murty, Dinkar Sitaram. HP Laboratories, HPL-2009-225. Keywords: Relational databases, Classification, Data Mining, RDF. Most enterprise data is distributed in multiple relational databases with expert-designed schema. Application of single-table data mining techniques to distributed relational data not only inc...