Star-Cubing: Computing Iceberg Cubes by Top-Down and Bottom-Up Integration
Authors
Abstract
Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches: top-down and bottom-up. The former, represented by the MultiWay Array Cube (MultiWay) algorithm [25], aggregates simultaneously on multiple dimensions; however, it cannot take advantage of Apriori pruning [2] when computing iceberg cubes (cubes that contain only the aggregate cells whose measure value satisfies a threshold, called the iceberg condition). The latter, represented by two algorithms, BUC [6] and H-Cubing [11], computes the iceberg cube bottom-up and facilitates Apriori pruning. BUC explores fast sorting and partitioning techniques, whereas H-Cubing explores a data structure, the H-Tree, for shared computation. However, neither of them fully explores multi-dimensional simultaneous aggregation. In this paper, we present a new method, Star-Cubing, that integrates the strengths of the previous three algorithms and performs aggregations on multiple dimensions simultaneously. It utilizes a star-tree structure, extends the simultaneous aggregation methods, and enables the pruning of the group-bys that do not satisfy the iceberg condition. Our performance study shows that Star-Cubing is highly efficient and outperforms all the previous methods in almost all kinds of data distributions.
Work supported in part by U.S. National Science Foundation NSF IIS-02-09199, the University of Illinois, and Microsoft Research. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
Proceedings of the 29th VLDB Conference, Berlin, Germany, 2003
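To make the terms in the abstract concrete, the following is a minimal Python sketch of an iceberg cube computed with Apriori pruning. It is not the Star-Cubing algorithm and not the authors' code: it is a simplified BUC-style recursion, assuming the measure is COUNT and the iceberg condition is `count >= min_support`. The function name `iceberg_cube`, its parameters, and the sample rows are all hypothetical, chosen only for illustration.

```python
def iceberg_cube(rows, num_dims, min_support):
    """Return {cell: count} for every cell whose count >= min_support.

    A cell is a tuple of length num_dims in which '*' marks an
    aggregated (rolled-up) dimension. Illustrative sketch only.
    """
    result = {}

    def recurse(partition, cell, start):
        count = len(partition)
        if count < min_support:
            # Apriori pruning: COUNT is anti-monotone, so no more
            # specific cell drawn from this partition can qualify.
            return
        result[tuple(cell)] = count
        # Specialize the cell on each remaining dimension in turn.
        for d in range(start, num_dims):
            groups = {}
            for row in partition:
                groups.setdefault(row[d], []).append(row)
            for value, group in groups.items():
                cell[d] = value
                recurse(group, cell, d + 1)
            cell[d] = "*"  # restore before moving to the next dimension

    recurse(rows, ["*"] * num_dims, 0)
    return result


# Hypothetical three-dimensional base table (A, B, C).
rows = [
    ("a1", "b1", "c1"),
    ("a1", "b1", "c2"),
    ("a1", "b2", "c1"),
    ("a2", "b1", "c1"),
]
cube = iceberg_cube(rows, num_dims=3, min_support=2)
for cell, count in sorted(cube.items()):
    print(cell, count)
```

In this sketch the partition for `a2` holds a single row, so the recursion stops there and never examines `(a2, b1, *)` or `(a2, b1, c1)`. That early exit is the pruning that, according to the abstract, a purely top-down method such as MultiWay cannot exploit. The sketch shares no work between group-bys, which is the gap the abstract says Star-Cubing addresses.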
Similar resources
Computing Complex Iceberg Cubes by Multiway Aggregation and Bounding
Iceberg cubing is a valuable technique in data warehouses. The efficiency of iceberg cube computation comes from efficient aggregation and effective pruning for constraints. In advanced applications, iceberg constraints are often non-monotone and complex, for example, “Average cost in the range [δ1, δ2] and standard deviation of cost less than β”. The current cubing algorithms either are effici...
Cross Table Cubing: Mining Iceberg Cubes from Data Warehouses
All of the existing (iceberg) cube computation algorithms assume that the data is stored in a single base table; in practice, however, a data warehouse is often organized in a schema of multiple tables, such as a star schema or a snowflake schema. In terms of both computation time and space, materializing a universal base table by joining multiple tables is often very expensive or even unaffordabl...
MDAG-Cubing: A Reduced Star-Cubing Approach
In this paper, we extend the Star-Cubing approach by introducing a new hybrid dimension-based approach to efficiently compute full or iceberg cubes with simple or complex measures. This new approach, named Multidimensional Direct Acyclic Graph Cubing (MDAG-Cubing), introduces the notion of external and internal nodes to reduce the cube representation without loss of generality. The reduced repr...
Multiway Iceberg Cubing on Trees
The Star-cubing algorithm performs multiway aggregation on trees but incurs huge memory consumption. We propose a new algorithm, MG-cubing, that achieves maximal multiway aggregation. Our experiments show that MG-cubing achieves time and memory efficiency similar to, and very often better than, that of Star-cubing.
An Empirical Comparison of Methods for Iceberg-CUBE Construction
The Iceberg-Cube problem is to apply an aggregate function over a set of attributes to determine which combinations of attribute values are above a specified aggregate threshold. We implemented bottom-up and top-down methods for this problem. The bottom-up method we used already employed pruning. Results show that even when the top-down method employed pruning, it was slower than the bottom-up me...