Discovering Attribute Dependence in Databases by Integrating Symbolic Learning and Statistical Techniques
نویسندگان
چکیده
The paper presents a method for integrating two different types of data analysis: symbolic inductive learning and statistical methods. The method concerns the problem of discovering rules characterizing the dependence between a group of dependent attributes and a group of independent groups, e.g., decision attributes and measurable attributes. The method proceeds along two paths. One paths generates a set of rules characterizing this dependency using a symbolic inductive learning program (AQ15). In the obtained rules, each attribute is assigned "importance score" which represents the ratio of the total number of examples that can be classified by this attribute to the maximum number of classified examples. Second path calculates the correlation coefficient between the decision and measurable attributes using the Chi-square test. The independent attributes with low "importance score" and low correlation are considered irrelevant, and removed from the data. The AQ15 is applied again to the data set spanned over the reduced set of attributes, in order to determine the simplest expression characterizing the dependency. The method has been experimentally applied to two real world problems: "Wind bracing" Mto determine the dependence of the type of wind-bracing for tall buildings on the parameters of the building, and "Accident factors"Mto determine the dependence between the age of construction workers and the types of accidents. The results have shown that the proposed approach of combining two different methods for relevant attribute determination gives better results than those obtained by applying either method separately.
منابع مشابه
Inductive learning from numerical and symbolic data: An integrated framework
Numerical induction from data is one of the statistical data analysis tasks, which uses a tabular model, with almost exclusively numerical features, as data representation formalism. The output representations are different: from functions to probability distributions, from surface equations to tables of indexes. One approach to extend the classical data analysis techniques to symbolic objects ...
متن کاملLearning Fuzzy Rules from Fuzzy Decision Trees
Classification rules are an important tool for discovering knowledge from databases. Integrating fuzzy logic algorithms into databases allows us to reduce uncertainty which is connected with data in databases and to increase discovered knowledge’s accuracy. In this paper, we analyze some possible variants of making classification rules from a given fuzzy decision based on cumulative information...
متن کاملData Mining using Nonmonotonic Connectionist Expert Systems
An application of Nonmonotonic Connectionist Expert Systems (NCESs) in mining classi cation rules from large relational databases is presented. NCESs are hybrid learning systems that can acquire symbolic knowledge of a nonmonotonic domain, represented using nonmonotonic inheritance networks. This initial knowledge can be re ned using connectionist learning techniques and a set of classi ed exam...
متن کاملDiscovering Maximal Generalized Decision Rules Through Horixontal and Vertical Data Reduction
We present a method to learn maximal generalized decision rules from databases by integrating discretization, generalization and rough set feature selection. Our method reduces the data horizontally and vertically. In the first phase, discretization and generalization are integrated and the numeric attributes are discretized into a few intervals. The primitive values of symbolic attributes are ...
متن کاملFrom unsupervised learning to data mining: linking cognition and data analysis
Recently, Knowledge Discovery on Databases (KDD) has emerged as a promising research area encompassing methods from several disciplines. Particularly, the data mining step of KDD shares most of its goals with unsupervised learning. But data mining methods are biased towards statistical techniques arguing that Machine Learning (ML) methods are not suitable to deal with real-world databases. We c...
متن کامل