Selecting One Dependency Estimators in Bayesian Network Using Different MDL Scores and Overfitting Criterion

Authors

  • Meng Han
  • Zhihai Wang
  • Yashu Liu
Abstract

The Averaged One Dependency Estimator (AODE) integrates all possible Super-Parent-One-Dependency Estimators (SPODEs) and estimates class-conditional probabilities by averaging them. In an AODE network, some redundant SPODEs may bias the classifier and, as a consequence, could reduce classification accuracy substantially. In this paper, MDL metrics are used to select SPODEs either as a whole or partially, which yields three different classifiers. Performance comparisons between these classifiers and AODE show that the theoretical analysis is reasonable and that the classifiers are efficient and effective. Mean Square Error (MSE) is used to test for overfitting. Experimental results indicate that the classifier using MDL score metrics performs better than the original AODE and, at the same time, overfits less. At the end of the paper, further discussion and verification of some properties of overfitting are also presented through the experiments.
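The ideas in the abstract (averaging SPODEs, ranking them by an MDL score, and checking MSE) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the abstract does not give the paper's MDL metrics or selection rules, so this uses a generic two-part score (parameter cost minus training log-likelihood), Laplace smoothing, and an MSE computed against one-hot labels. All function names are hypothetical.

```python
import math
from collections import Counter

def fit_counts(X, y):
    """Collect the joint counts that every SPODE needs from categorical data."""
    n_attr = len(X[0])
    pair = Counter()    # (i, x_i, c) -> count
    triple = Counter()  # (i, x_i, j, x_j, c) -> count
    for row, c in zip(X, y):
        for i in range(n_attr):
            pair[(i, row[i], c)] += 1
            for j in range(n_attr):
                if j != i:
                    triple[(i, row[i], j, row[j], c)] += 1
    values = [sorted({row[i] for row in X}) for i in range(n_attr)]
    return {"pair": pair, "triple": triple, "values": values,
            "classes": sorted(set(y)), "n": len(X)}

def spode_joint(model, parent, row, c):
    """Laplace-smoothed P(c, row) under the SPODE whose super-parent is `parent`."""
    vals, classes = model["values"], model["classes"]
    n_pc = model["pair"][(parent, row[parent], c)]
    p = (n_pc + 1) / (model["n"] + len(vals[parent]) * len(classes))
    for j in range(len(row)):
        if j != parent:
            n_jpc = model["triple"][(parent, row[parent], j, row[j], c)]
            p *= (n_jpc + 1) / (n_pc + len(vals[j]))
    return p

def predict_proba(model, row, parents):
    """Average the chosen SPODEs, then normalise over the classes."""
    joint = {c: sum(spode_joint(model, p, row, c) for p in parents) / len(parents)
             for c in model["classes"]}
    total = sum(joint.values())
    return {c: v / total for c, v in joint.items()}

def mdl_score(model, parent, X, y):
    """Assumed two-part MDL score (lower is better), not the paper's metric."""
    vals, k = model["values"], len(model["classes"])
    n_params = k * len(vals[parent]) - 1
    n_params += sum(k * len(vals[parent]) * (len(vals[j]) - 1)
                    for j in range(len(vals)) if j != parent)
    log_lik = sum(math.log(spode_joint(model, parent, row, c))
                  for row, c in zip(X, y))
    return 0.5 * math.log(model["n"]) * n_params - log_lik

def select_parents(model, X, y, keep):
    """Keep the `keep` super-parents with the lowest assumed MDL score."""
    ranked = sorted(range(len(model["values"])),
                    key=lambda p: mdl_score(model, p, X, y))
    return ranked[:keep]

def mean_square_error(model, X, y, parents):
    """MSE between predicted class probabilities and one-hot true labels."""
    total = 0.0
    for row, c in zip(X, y):
        proba = predict_proba(model, row, parents)
        total += sum((proba[k] - (1.0 if k == c else 0.0)) ** 2
                     for k in model["classes"])
    return total / len(X)

# Toy data: the class is determined by the first attribute.
X = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("a", "x"), ("b", "y")]
y = [0, 0, 1, 1, 0, 1]
model = fit_counts(X, y)
all_parents = list(range(len(X[0])))          # plain AODE: every attribute is a super-parent
chosen = select_parents(model, X, y, keep=1)  # MDL-selected subset
```

Comparing `mean_square_error` on training data and on held-out data, for `all_parents` versus `chosen`, is one simple way to look for the kind of overfitting gap the abstract mentions.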

Similar resources

How Good Is Crude MDL for Solving the Bias-Variance Dilemma? An Empirical Investigation Based on Bayesian Networks

The bias-variance dilemma is a well-known and important problem in Machine Learning. It basically relates the generalization capability (goodness of fit) of a learning method to its corresponding complexity. When we have enough data at hand, it is possible to use these data in such a way so as to minimize overfitting (the risk of selecting a complex model that generalizes poorly). Unfortunately...

Calculating the Nml Distribution for Tree-structured Bayesian Networks

We are interested in model class selection. We want to compute a criterion which, given two competing model classes, chooses the better one. When learning Bayesian network structures from sample data, an important issue is how to evaluate the goodness of alternative network structures. Perhaps the most commonly used model (class) selection criterion is the marginal likelihood, which is obtained...

Scoring functions for learning Bayesian networks

The aim of this work is to benchmark scoring functions used by Bayesian network learning algorithms in the context of classification. We considered both information-theoretic scores, such as LL, AIC, BIC/MDL, NML and MIT, and Bayesian scores, such as K2, BD, BDe and BDeu. We tested the scores in a classification task by learning the optimal TAN classifier with benchmark datasets. We conclude th...

Model selection based on Bayesian predictive densities and multiple data records

Bayesian predictive densities are used to derive model selection rules. The resulting rules hold for sets of data records where each record is composed of an unknown number of deterministic signals common to all the records and a stationary white Gaussian noise. To determine the correct model, the set of data records is partitioned into two disjoint subsets. One of the subsets is used for estim...

On the importance of using treewidth as a model-selection criterion for learning Bayesian networks

This paper is motivated by the desire to learn Bayesian networks that allow efficient inference. Traditionally, model selection criteria such as BIC/MDL focus on learning Bayesian networks that fit the data and have low representation complexity (i.e. the number of parameters needed to specify the network). However, these criteria do not take into account the complexity of inference in the resu...

Journal:
  • J. Inf. Sci. Eng.

Volume 30  Issue 

Pages  -

Publication date: 2014