Data complexity measured by principal graphs
Authors
Abstract
How can one measure the complexity of a finite set of vectors embedded in a multidimensional space? This is a non-trivial question that can be approached in many different ways. Here we suggest a set of data complexity measures based on universal approximators, namely principal cubic complexes. Principal cubic complexes generalise the notion of principal manifolds to datasets with nontrivial topologies. The type of a principal cubic complex is determined by its dimension and by a grammar of elementary graph transformations; the simplest grammar produces principal trees. We introduce three natural types of data complexity: 1) geometric (deviation of the data’s approximator from some “idealized” configuration, such as deviation from harmonicity); 2) structural (how many elements of a principal graph are needed to approximate the data); and 3) construction complexity (how many applications of elementary graph transformations are needed to construct the principal object starting from the simplest one). We compute these measures for several simulated and real-life data distributions and display them in “accuracy-complexity” plots, which help to optimise the accuracy/complexity ratio. We discuss various issues connected with measuring data complexity. Software for computing data complexity measures from principal cubic complexes is also provided.
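The abstract names three complexity measures but gives no formulas. The Python sketch below illustrates one possible reading for principal trees, under assumptions that are not taken from the paper: the simplest object is a single edge with two nodes; the tree grammar consists of two operations, "add a node to a node" and "bisect an edge"; geometric complexity is the mean squared deviation of each star from a harmonic configuration (centre at the mean of its neighbours); and accuracy is the fraction of variance explained, using distances to the nearest node only. All class and function names (`GrammarTree`, `structural_complexity`, `geometric_complexity`, `fraction_variance_explained`) are hypothetical, and node positions are set by hand rather than fitted to the data.

```python
import numpy as np


class GrammarTree:
    """Hypothetical minimal principal-tree container (illustration only).

    Node positions are placed by hand; a real principal tree would fit
    them to the data by minimising an elastic energy, omitted here.
    """

    def __init__(self, y0, y1):
        # Assumed simplest object: two nodes joined by a single edge.
        self.Y = [np.asarray(y0, dtype=float), np.asarray(y1, dtype=float)]
        self.edges = [(0, 1)]
        self.applications = 0  # construction-complexity counter

    def add_node_to_node(self, i, y):
        # Assumed grammar operation 1: attach a new leaf node to node i.
        self.Y.append(np.asarray(y, dtype=float))
        self.edges.append((i, len(self.Y) - 1))
        self.applications += 1

    def bisect_edge(self, k):
        # Assumed grammar operation 2: insert a node at the midpoint of edge k.
        i, j = self.edges.pop(k)
        self.Y.append((self.Y[i] + self.Y[j]) / 2.0)
        m = len(self.Y) - 1
        self.edges += [(i, m), (m, j)]
        self.applications += 1

    def nodes(self):
        return np.vstack(self.Y)


def neighbours(n, edges):
    # Adjacency lists for an undirected graph with n nodes.
    nb = {v: [] for v in range(n)}
    for i, j in edges:
        nb[i].append(j)
        nb[j].append(i)
    return nb


def structural_complexity(tree):
    # Element counts; "branching" nodes are those of degree >= 3.
    nb = neighbours(len(tree.Y), tree.edges)
    branching = sum(1 for v in nb.values() if len(v) >= 3)
    return {"nodes": len(tree.Y), "edges": len(tree.edges),
            "branching_nodes": branching}


def geometric_complexity(tree):
    # Mean squared distance from each star centre (degree >= 2) to the
    # mean of its neighbours; 0 means every star is harmonic.
    Y = tree.nodes()
    nb = neighbours(len(Y), tree.edges)
    devs = [np.sum((Y[c] - Y[v].mean(axis=0)) ** 2)
            for c, v in nb.items() if len(v) >= 2]
    return float(np.mean(devs)) if devs else 0.0


def fraction_variance_explained(X, tree):
    # Accuracy: 1 - (sum of squared distances to the nearest node) / total
    # variance. Projection onto edges is ignored to keep the sketch short.
    Y = tree.nodes()
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    total = ((X - X.mean(axis=0)) ** 2).sum()
    return 1.0 - d2.sum() / total


# Toy "Y"-shaped data: three noisy branches leaving the origin.
rng = np.random.default_rng(0)
dirs = np.array([[1.0, 0.0], [-0.5, 0.87], [-0.5, -0.87]])
t = rng.uniform(0.0, 1.0, size=(300, 1))
X = np.vstack([t * d for d in dirs]) + rng.normal(0.0, 0.05, size=(900, 2))

# Build a three-branch tree with three grammar applications.
tree = GrammarTree([0, 0], [1, 0])
tree.add_node_to_node(0, [-0.5, 0.87])
tree.add_node_to_node(0, [-0.5, -0.87])
tree.bisect_edge(0)

print(structural_complexity(tree))  # {'nodes': 5, 'edges': 4, 'branching_nodes': 1}
print(tree.applications)            # 3
print(geometric_complexity(tree), fraction_variance_explained(X, tree))
```

Because each of the two assumed operations adds exactly one node, the construction count here always equals the number of nodes minus two; a grammar that also deleted or merged nodes would break that identity, which is one reason to track construction complexity separately from structural complexity.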
Similar resources
Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members
Graphs have many applications in real-world problems. When we deal with huge volumes of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for graph clustering, but it gives no choice over the number of clusters and the number of members ...
Robust principal graphs for data approximation
Revealing hidden geometry and topology in noisy data sets is a challenging task. Elastic principal graphs is a computationally efficient and flexible data approximator based on embedding a graph into the data space and minimizing the energy functional penalizing the deviation of graph nodes both from data points and from pluri-harmonic configuration (generalization of linearity). The structure ...
The effect of knowledge based economic indicators on the countries' economic complexity
Countries’ economic growth and development depend significantly on their productive capacity. In this research, we aimed to investigate which components of a knowledge-based economy play a more meaningful role in production capacity. To measure production capacity, we used one of the most up-to-date indexes, the economic complexity index. The research used a data panel consist...
Complexity and approximation ratio of semitotal domination in graphs
A set $S \subseteq V(G)$ is a semitotal dominating set of a graph $G$ if it is a dominating set of $G$ and every vertex in $S$ is within distance 2 of another vertex of $S$. The semitotal domination number $\gamma_{t2}(G)$ is the minimum cardinality of a semitotal dominating set of $G$. We show that the semitotal domination problem is APX-complete for bounded-degree graphs, and the semitotal dominatio...
Topographical complexity of multidimensional energy landscapes.
A scheme for visualizing and quantifying the complexity of multidimensional energy landscapes and multiple pathways is presented employing principal component-based disconnectivity graphs and the Shannon entropy of relative "sizes" of superbasins. The principal component-based disconnectivity graphs incorporate a metric relationship between the stationary points of the system, which enable us t...
Journal title:
- Computers & Mathematics with Applications
Volume 65
Pages: -
Publication date: 2013