Database Clustering and Data Warehousing
Abstract
Due to the complexity of real-world applications, the number of databases and the volume of data have increased tremendously. Discovering qualitative and quantitative patterns from databases in such a distributed information-providing environment has been recognized as a challenging task. In response to this demand, data mining and data warehousing techniques are emerging to extract previously unknown and potentially useful knowledge and so provide better decision support. This paper presents a mechanism called Markov Model Mediators (MMMs) that facilitates the understanding of data warehouse schemas/views and improves query processing performance by analyzing and discovering summarized knowledge at the database level. Simulation results show that the data mining process leads to a better federation of data warehouses and reduces the cost of query processing. To illustrate these benefits, our approach has been implemented, and a simple example and several experiments on real databases are presented.
Similar resources
Transbase: a Leading-edge ROLAP Engine Supporting Multidimensional Indexing and Hierarchy Clustering
Analysis-oriented database applications, such as data warehousing or customer relationship management, play a crucial role in the database area. In general, the multidimensional data model is used in these applications, realized as star or snow-flake schemata in the relational world. The so-called star queries are the prevalent type of queries on such schemata. All database vendors have extende...
Rough Set Theory and Fuzzy Logic Based Warehousing of Heterogeneous Clinical Databases
Large amounts of data about the patients with their medical conditions are presented in the Medical databases. Analyzing all these databases is one of the difficult tasks in the medical environment. In order to warehouse all these databases and to analyze the patient's condition, we need an efficient data mining technique. In this paper, an efficient data mining technique for warehousing clinic...
Efficient Bulk Deletes for Multi Dimensionally Clustered Tables in DB2
In data warehousing applications, the ability to efficiently delete large chunks of data from a table is very important. This feature is also known as Rollout or Bulk Deletes. Rollout is generally carried out periodically and is often done on more than one dimension or attribute. The ability to efficiently handle the updates of RID indexes while doing Rollouts is a well known problem for databa...
Conceptual Clustering of Heterogeneous Distributed Databases
With increasingly more databases becoming available on the Internet, there is a growing opportunity to globalise knowledge discovery and learn general patterns, rather than restricting learning to specific databases from which the rules may not be generalisable. Clustering of distributed databases facilitates learning of new concepts that characterise common features of, and differences between...
Incremental Clustering for Mining in a Data Warehousing Environment
Data warehouses provide a great deal of opportunities for performing data mining tasks such as classification and clustering. Typically, updates are collected and applied to the data warehouse periodically in a batch mode, e.g., during the night. Then, all patterns derived from the warehouse by some data mining algorithm have to be updated as well. Due to the very large size of the databases, i...