Using Hierarchies, Aggregates and Statistical Models to Discover Knowledge from Distributed Databases
Abstract
Data warehouses and statistical databases (Shoshani 1997) contain both numerical attributes (measures) and categorical attributes (dimensions). Such data are often stored in a relational database with an associated hierarchical structure, yet few algorithms to date explicitly exploit this hierarchical structure when carrying out knowledge discovery. We look at a number of aspects of knowledge discovery from a set of databases distributed over the Internet, including the following:
• Discovery of statistical relationships, rules and exceptions from hierarchically structured data, which may contain heterogeneous and non-independent instances;
• Use of aggregates as a set of sufficient statistics in place of base data for efficient model computation;
• Leveraging the power of a relational database system for efficient computation of sufficient statistics;
• Use of statistical metadata to aid distributed data integration and knowledge discovery.
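As a minimal sketch of how aggregates can stand in for base data, the Python code below assumes a single numeric measure and a two-level city-to-region hierarchy, both invented for illustration. SQLite computes per-city COUNT, SUM and sum of squares with GROUP BY; aggregates from two hypothetical sites are merged by addition, rolled up to the region level, and turned into means and variances without revisiting the base rows. The table `sales`, the helpers `make_site`, `leaf_stats`, `merge`, `roll_up` and `mean_var`, and the data are all hypothetical; this is not the authors' method.

```python
import sqlite3
from collections import defaultdict

# Hypothetical two-level hierarchy (city -> region), invented for illustration.
PARENT = {"Leeds": "North", "York": "North", "Bath": "South", "Exeter": "South"}


def make_site(rows):
    """Create an in-memory database standing in for one remote site (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (city TEXT, value REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def leaf_stats(conn):
    """Let the relational engine compute per-city sufficient statistics (n, sum, sum of squares)."""
    rows = conn.execute(
        "SELECT city, COUNT(*), SUM(value), SUM(value * value) FROM sales GROUP BY city"
    )
    return {city: (n, s, ss) for city, n, s, ss in rows}


def merge(*stat_dicts):
    """Combine aggregates from several sources; (n, sum, sum of squares) are simply additive."""
    total = defaultdict(lambda: (0, 0.0, 0.0))
    for stats in stat_dicts:
        for key, (n, s, ss) in stats.items():
            tn, ts, tss = total[key]
            total[key] = (tn + n, ts + s, tss + ss)
    return dict(total)


def roll_up(stats, parent):
    """Move one level up the hierarchy by relabelling each cell with its parent and merging."""
    return merge(*({parent[key]: value} for key, value in stats.items()))


def mean_var(n, s, ss):
    """Mean and population variance computed from the sufficient statistics alone."""
    mean = s / n
    return mean, ss / n - mean * mean


site_a = make_site([("Leeds", 10.0), ("Leeds", 14.0), ("Bath", 7.0)])
site_b = make_site([("York", 6.0), ("Exeter", 9.0), ("Bath", 11.0)])

city_stats = merge(leaf_stats(site_a), leaf_stats(site_b))
region_stats = roll_up(city_stats, PARENT)

for region, (n, s, ss) in sorted(region_stats.items()):
    mean, var = mean_var(n, s, ss)
    print(f"{region}: n={n} mean={mean:.2f} var={var:.2f}")
# North: n=3 mean=10.00 var=10.67
# South: n=3 mean=9.00 var=2.67
```

The sketch works because the triple (n, Σx, Σx²) is additive across both sites and hierarchy levels, so only these small summaries need to travel between databases. The one-pass variance formula used here can lose precision for large values; a numerically stabler way of combining means and squared deviations may be preferable in practice.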