Graph Model Selection using the Minimum Description Length Principle
Abstract
In recent years, there has been a proliferation of theoretical graph models, e.g., preferential attachment, motivated by real-world graphs such as the Web or Internet topology. Typically these models are designed to mimic particular properties observed in the graphs, such as power-law degree distribution or the small-world phenomenon. The mainstream approach to comparing models for these graphs has been somewhat subjective and very application dependent — comparisons are often based on ad hoc graph properties. We use the Minimum Description Length principle to compare graph models: models are scored based on the degree of compression that they achieve on real data. This principle is popular across fields for various types of model selection because it is objective and not application specific. Unfortunately, computing this metric is usually a daunting algorithmic task, especially for existing models that were not designed with this metric in mind. To illustrate the feasibility of our approach, we design and implement sophisticated algorithms for computing the description length for four natural models: a power-law random graph model, a preferential attachment model, a small-world model, and a uniform random graph model. Based on experiments on three snapshots of the Internet topology graph, we find that the preferential attachment model ranks highest, while the uniform random graph model performs the worst. We hope that this metric will enable a more objective model comparison and the development of improved models.
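As a rough illustration of scoring a model by the compression it achieves, the sketch below computes a description length for the simplest of the four models, the uniform random graph. It is not the authors' algorithm: the two-part code (a fixed-length code for the edge count, followed by an index into all edge sets of that size) and the function names `log2_binomial` and `uniform_graph_description_length` are assumptions made for this example.

```python
import math


def log2_binomial(n: int, k: int) -> float:
    """Return log2 of C(n, k), using lgamma to avoid very large integers."""
    if k < 0 or k > n:
        raise ValueError("k must satisfy 0 <= k <= n")
    log_e = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_e / math.log(2)


def uniform_graph_description_length(num_nodes: int, num_edges: int) -> float:
    """Bits needed to describe a simple undirected graph under a uniform code.

    Assumed two-part code (illustrative only, not taken from the paper):
      1. the edge count m, sent with a fixed-length code over 0..N;
      2. which of the C(N, m) possible edge sets occurred.
    Here N = n(n-1)/2 is the number of possible edges on n nodes.
    """
    possible_edges = num_nodes * (num_nodes - 1) // 2
    model_bits = math.log2(possible_edges + 1)
    data_bits = log2_binomial(possible_edges, num_edges)
    return model_bits + data_bits


if __name__ == "__main__":
    # Hypothetical graph sizes, chosen only to exercise the function.
    for n, m in [(100, 200), (100, 2475)]:
        bits = uniform_graph_description_length(n, m)
        print(f"n={n}, m={m}: {bits:.1f} bits")
```

Under this assumed code, a sparse graph costs fewer bits than one with half of all possible edges present, because the number of candidate edge sets peaks at that midpoint. A model that captures more structure in the data would be expected to yield a shorter total length than this baseline.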
Similar resources
Uncertainty Measures of Rough Set Prediction
The main statistic used in rough set data analysis, the approximation quality, is of limited value when there is a choice of competing models for predicting a decision variable. In keeping with the rough set philosophy of non-invasive data analysis, we present three model selection criteria, using information theoretic entropy in the spirit of the minimum description length principle. Our ma...
Model Selection using Information Theory and the MDL Principle
Information theory offers a coherent, intuitive view of model selection. This perspective arises from thinking of a statistical model as a code, an algorithm for compressing data into a sequence of bits. The description length is the length of this code for the data plus the length of a description of the model itself. The length of the code for the data measures the fit of the model to the dat...
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles minimum description length (MDL) and minimum message length (MML), abstracted as the ideal MDL principle and defined from Bayes’s rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be app...
Algorithmic Complexity and Structural Models of Social Networks
This article explores how the algorithmic complexity approach can be used to address the problem of identifying group structures in social networks. A specific implementation of the algorithmic complexity approach based on the principle of minimum description length (MDL) is compared to other model selection criteria, and compared and contrasted with a Bayesian approach to model selection. The ...
The Minimum Description Length Principle in Coding and Modeling
We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon’s basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to ...