Efficacious Data Cube Exploration by Semantic Summarization and Compression
Abstract
The data cube is the core operator in data warehousing and OLAP. Its efficient computation, maintenance, and utilization for query answering and advanced analysis have been the subjects of numerous studies. However, for many applications, the huge size of the data cube limits its applicability as a means for semantic exploration by the user. Recently, we have developed a systematic approach to achieve efficacious data cube construction and exploration by semantic summarization and compression. Our approach is pivoted on the notion of a quotient cube, which groups together structurally related data cube cells with common (aggregate) measure values into equivalence classes. The equivalence relation used to partition the cube lattice preserves the roll-up/drill-down semantics of the data cube, in that the same kind of explorations can be conducted in the quotient cube as in the original cube, between classes instead of between cells. We have also developed compact data structures for representing a quotient cube, and efficient algorithms for answering queries using a quotient cube and for its incremental maintenance against updates. We have implemented SOCQET, a prototype data warehousing system making use of our results on quotient cubes. In this demo, we will demonstrate (1) the critical techniques of building a quotient cube; (2) use of a quotient cube to answer various queries and to support advanced OLAP; (3) an empirical study on the effectiveness and efficiency of quotient cube-based data warehouses and OLAP; (4) a user interface for visual and interactive OLAP; and (5) SOCQET, a research prototype data warehousing system integrating all the techniques. The demo reflects our latest research results and may stimulate some interesting future studies.
Proceedings of the 29th VLDB Conference, Berlin, Germany, 2003
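To make the quotient cube idea in the abstract concrete, the following is a minimal sketch rather than SOCQET's actual algorithm. It assumes that cells are grouped by the set of base tuples they cover, which guarantees that all cells in a class share the same aggregate value. The toy table and the names `cube_cells` and `quotient_classes` are hypothetical, introduced only for illustration.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # marker for a dimension that has been rolled up


def cube_cells(rows):
    """Map every non-empty cube cell to the indices of the base rows it covers."""
    cover = defaultdict(set)
    for idx, (dims, _measure) in enumerate(rows):
        # Each base row contributes to 2^d cells: every dimension is
        # either kept at its value or rolled up to ALL.
        for mask in product((False, True), repeat=len(dims)):
            cell = tuple(ALL if rolled else v for v, rolled in zip(dims, mask))
            cover[cell].add(idx)
    return cover


def quotient_classes(rows, agg=sum):
    """Group cells that cover the same base rows (an illustrative assumption).

    Returns a list of (cells_in_class, aggregate_value) pairs.
    """
    groups = defaultdict(list)
    for cell, idxs in cube_cells(rows).items():
        groups[frozenset(idxs)].append(cell)
    return [
        (sorted(cells), agg(rows[i][1] for i in idxs))
        for idxs, cells in groups.items()
    ]


# Hypothetical base table: (store, product, season) -> sales
rows = [
    (("S1", "P1", "spring"), 6),
    (("S1", "P2", "spring"), 12),
    (("S2", "P1", "fall"), 9),
]

classes = quotient_classes(rows)
for cells, value in classes:
    print(value, cells)
```

On this toy table the 18 non-empty cells collapse into 6 classes. Roll-up and drill-down can then be followed between classes rather than between individual cells, which is the kind of navigation the abstract describes.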
Related resources
On the Effectiveness of using Sentence Compression Models for Query-Focused Multi-Document Summarization
This paper applies sentence compression models for the task of query-focused multi-document summarization in order to investigate if sentence compression improves the overall summarization performance. Both compression and summarization are considered as global optimization problems and solved using integer linear programming (ILP). Three different models are built depending on the order in whi...
An Approach to Image Compression Using Three-Dimensional DCT
In this paper we propose a novel approach to image compression based on three-dimensional Discrete Cosine Transformation (DCT). The basic idea is to de-correlate similar pixel blocks through three-dimensional DCT transformation. A number of adjacent pixel blocks are grouped together to form a three-dimensional data cube. Each data cube is 3D DCT transformed, quantized, and Huffman encoded. Expe...
Text summarization using a trainable summarizer and latent semantic analysis
This paper proposes two approaches to address text summarization: modified corpus-based approach (MCBA) and LSA-based T.R.M. approach (LSA+T.R.M.). The first is a trainable summarizer, which takes into account several features, including position, positive keyword, negative keyword, centrality, and the resemblance to the title, to generate summaries. Two new ideas are exploited: (1) sentence po...
Data Cube Compression with QuantiCubes
Data warehouses typically store a multidimensional fact representation of the data that can be used in any type of analysis. Many applications materialize data cubes as multidimensional arrays for fast, direct and random access to values. Those data cubes are used for exploration, with operations such as roll-up, drill-down, slice and dice. The data cubes can become very large, increasing the a...
Chinese Text Summarization Using a Trainable Summarizer and Latent Semantic Analysis
In this paper, two novel approaches are proposed to extract important sentences from a document to create its summary. The first is a corpus-based approach using feature analysis. It brings up three new ideas: 1) to employ ranked position to emphasize the significance of sentence position, 2) to reshape word unit to achieve higher accuracy of keyword importance, and 3) to train a score function...