Cross Table Cubing: Mining Iceberg Cubes from Data Warehouses

Authors

  • Jian Pei
  • Moonjung Cho
  • David Wai-Lok Cheung
Abstract

All existing (iceberg) cube computation algorithms assume that the data is stored in a single base table. In practice, however, a data warehouse is often organized as a schema of multiple tables, such as a star schema or a snowflake schema. Materializing a universal base table by joining multiple tables is often very expensive, or even unaffordable, in both computation time and space in real data warehouses. In this paper, we investigate the problem of computing iceberg cubes from data warehouses. Surprisingly, our study shows that computing iceberg cubes directly from multiple tables can be even more efficient in both space and runtime than computing them from a materialized universal base table. We develop an efficient algorithm, CTC (for Cross Table Cubing), to tackle the problem. An extensive performance study on synthetic data sets demonstrates that our new approach is efficient and scalable for large data warehouses.
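To make the problem setting concrete, here is a minimal brute-force sketch of an iceberg cube over a toy star schema. It is not the CTC algorithm, whose details the abstract does not give. The tables, attribute names, and the `iceberg_cube` function are hypothetical, and the measure is assumed to be a simple row count with a minimum-count threshold. The sketch only shows that dimension attributes can be looked up per fact row, so no joined universal base table is ever stored.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy star schema: one fact table and two dimension tables.
FACT = [(1, 10), (1, 11), (2, 10), (2, 10), (3, 11)]  # (store_id, product_id)
STORE = {
    1: {"city": "Vancouver"},
    2: {"city": "Vancouver"},
    3: {"city": "Buffalo"},
}
PRODUCT = {10: {"category": "book"}, 11: {"category": "music"}}


def iceberg_cube(fact, store, product, min_count):
    """Count fact rows in every group-by cell over (city, category) and
    keep only the cells whose count is at least min_count."""
    dims = ("city", "category")
    counts = Counter()
    for store_id, product_id in fact:
        # Resolve dimension attributes on the fly instead of storing a join.
        row = {
            "city": store[store_id]["city"],
            "category": product[product_id]["category"],
        }
        # Enumerate every subset of dimensions; "*" marks an aggregated one.
        for k in range(len(dims) + 1):
            for subset in combinations(dims, k):
                cell = tuple(row[d] if d in subset else "*" for d in dims)
                counts[cell] += 1
    # The iceberg condition: discard cells below the threshold.
    return {cell: n for cell, n in counts.items() if n >= min_count}


result = iceberg_cube(FACT, STORE, PRODUCT, min_count=3)
```

This enumeration visits all 2^d group-bys for every fact row and prunes nothing, so it is only practical for illustration. Iceberg cubing algorithms exist precisely to avoid that exhaustive work.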

Similar resources

Computing Complex Iceberg Cubes by Multiway Aggregation and Bounding

Iceberg cubing is a valuable technique in data warehouses. The efficiency of iceberg cube computation comes from efficient aggregation and effective pruning for constraints. In advanced applications, iceberg constraints are often non-monotone and complex, for example, “Average cost in the range [δ1, δ2] and standard deviation of cost less than β”. The current cubing algorithms either are effici...

MDAG-Cubing: A Reduced Star-Cubing Approach

In this paper, we extend the Star-Cubing approach by introducing a new hybrid dimension-based approach to efficiently compute full or iceberg cubes with simple or complex measures. This new approach, named Multidimensional Direct Acyclic Graph Cubing (MDAG-Cubing), introduces the notion of external and internal nodes to reduce the cube representation without loss of generality. The reduced repr...

High-dimensional Hierarchical OLAP: a Prefix-Index Hierarchical Cubing Approach

The pre-computation of data cubes is critical for improving the response time of OLAP (online analytical processing) systems and accelerating data mining tasks in large data warehouses. However, as the sizes of data warehouses grow, the time it takes to perform this pre-computation becomes a significant performance bottleneck. In a high dimensional OLAP, it might not be practical to build all th...

OLAP Formulations for Supporting Complex Spatial Objects in Data Warehouses

In recent years, there has been a large increase in the amount of spatial data obtained from remote sensing, GPS receivers, communication terminals and other domains. Data warehouses help in modeling and mining large amounts of data from heterogeneous sources over an extended period of time. However, incorporating spatial data into data warehouses leads to several challenges in data modeling, ma...

Star-Cubing: Computing Iceberg Cubes by Top-Down and Bottom-Up Integration

Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down vs. bottom-up. The former, represented by the MultiWay Array Cube (called MultiWay) algorithm [25], aggregates simultaneously on multiple dimensions; however, it cannot take advantage of Apriori pruning [2] when computing iceberg cubes (c...

Publication date: 2005