Multi-Dimensional Analysis of Data Streams Using Stream Cubes

Authors

  • Jiawei Han
  • Yandong Cai
  • Yixin Chen
  • Guozhu Dong
  • Jian Pei
  • Benjamin W. Wah
  • Jianyong Wang

Published in: Data Streams: Models and Algorithms

Abstract

Large volumes of dynamic stream data pose great challenges to its analysis. Besides its dynamic and transient behavior, stream data has another important characteristic: multi-dimensionality. Much stream data resides in a multi-dimensional space and at a rather low level of abstraction, whereas most analysts are interested in relatively high-level dynamic changes in some combination of dimensions. To discover high-level dynamic and evolving characteristics, one may need to perform multi-level, multi-dimensional on-line analytical processing (OLAP) of stream data. This calls for new architectures that facilitate on-line analytical processing of multi-dimensional stream data. In this chapter, we introduce a stream-cube architecture that effectively performs on-line partial aggregation of multi-dimensional stream data, captures the essential dynamic and evolving characteristics of data streams, and facilitates fast OLAP on stream data. Three important techniques are proposed for the design and implementation of stream cubes. First, a tilted time frame model registers time-related data at multiple resolutions: more recent data are registered at finer resolution, whereas more distant data are registered at coarser resolution. This design reduces the overall storage requirements of time-related data and adapts well to the data analysis tasks commonly encountered in practice. Second, instead of materializing cuboids at all levels, two critical layers, the observation layer and the minimal interesting layer, are maintained to support routine as well as flexible analysis with minimal computation cost. Third, an efficient stream data cubing algorithm is developed that computes only the layers (cuboids) along a popular path and leaves the other cuboids for on-line, query-driven computation.
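The second and third techniques can be illustrated with a small sketch. This is a hypothetical illustration, not the chapter's algorithm: it assumes a SUM measure, represents a cuboid as a dict from dimension-value tuples to aggregates, and encodes each dimension's concept hierarchy as an assumed `parent` function that drops the last `/`-separated level. Only the cuboids along the given popular path, from the minimal interesting layer (m-layer) up to the observation layer (o-layer), are materialized; any off-path cuboid is rolled up on demand. The names `roll_up`, `materialize_popular_path`, and `parent`, and the sample data, are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Key = Tuple[str, ...]                     # one value per dimension
Cuboid = Dict[Key, float]                 # cell -> aggregated measure (SUM assumed)
Step = Tuple[int, Callable[[str], str]]   # (dimension index, roll-up function)


def roll_up(cuboid: Cuboid, dim: int, parent: Callable[[str], str]) -> Cuboid:
    """Generalize dimension `dim` by one level and re-aggregate the measure."""
    out: Dict[Key, float] = defaultdict(float)
    for key, measure in cuboid.items():
        new_key = key[:dim] + (parent(key[dim]),) + key[dim + 1:]
        out[new_key] += measure
    return dict(out)


def materialize_popular_path(m_layer: Cuboid, path: Sequence[Step]) -> List[Cuboid]:
    """Compute only the cuboids along `path`, starting from the m-layer.

    The last cuboid returned plays the role of the o-layer.
    """
    cuboids = [m_layer]
    for dim, parent_fn in path:
        cuboids.append(roll_up(cuboids[-1], dim, parent_fn))
    return cuboids


def parent(value: str) -> str:
    """Hypothetical hierarchy encoding: drop the last '/'-separated level."""
    return value.rsplit("/", 1)[0]


if __name__ == "__main__":
    # Hypothetical m-layer cells: (location, service) -> call count
    m_layer = {
        ("Chicago/Loop", "voice/local"): 3.0,
        ("Chicago/Hyde Park", "voice/intl"): 2.0,
        ("Urbana/Campus", "voice/local"): 4.0,
    }
    path = [(0, parent), (1, parent)]  # roll up location first, then service
    cuboids = materialize_popular_path(m_layer, path)
    print(cuboids[-1])  # {('Chicago', 'voice'): 5.0, ('Urbana', 'voice'): 4.0}
    # An off-path cuboid (fine location, coarse service), computed on demand:
    print(roll_up(m_layer, 1, parent))
```

Materializing one path keeps cubing cost linear in the path length rather than exponential in the number of dimensions; the trade-off is that off-path queries pay a roll-up at query time.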
Based on this design methodology, a stream data cube can be constructed and maintained incrementally with reasonable memory space, computation cost, and query response time, as verified by our extensive performance study. The stream cube architecture facilitates on-line analytical processing of stream data and also forms a preliminary structure for on-line stream mining. The chapter also discusses the impact of the design and implementation of the stream cube in the context of stream mining.
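Incremental maintenance with bounded memory rests largely on the tilted time frame. Below is a minimal sketch of one logarithmic variant under stated assumptions: the measure is a SUM, each level keeps at most two slots, and an overflowing level merges its two oldest slots into one slot of the next coarser level. That binary-counter-style merge rule is chosen here for illustration; the chapter's own tilted time frame models may partition time differently. The class `LogTiltedTimeFrame` and its methods are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    span: int     # number of base time units covered by this slot
    total: float  # aggregated measure over those units (SUM assumed)


class LogTiltedTimeFrame:
    """Hypothetical logarithmic tilted time frame.

    Slots at level k each cover 2**k base time units. Each level holds at most
    `per_level` slots; on overflow, its two oldest slots are merged and pushed
    to level k + 1. Recent data stays fine-grained, older data coarse, and
    n base units occupy only O(log n) slots.
    """

    def __init__(self, per_level: int = 2) -> None:
        self.per_level = per_level
        self.levels: List[List[Slot]] = []  # levels[k][0] is the newest slot at level k

    def add(self, value: float) -> None:
        """Register the aggregate of one new base time unit (incremental update)."""
        carry: Optional[Slot] = Slot(span=1, total=value)
        k = 0
        while carry is not None:
            if k == len(self.levels):
                self.levels.append([])
            level = self.levels[k]
            level.insert(0, carry)  # keep each level ordered newest first
            carry = None
            if len(level) > self.per_level:
                oldest, second_oldest = level.pop(), level.pop()
                carry = Slot(oldest.span + second_oldest.span,
                             oldest.total + second_oldest.total)
            k += 1

    def slots(self) -> List[Slot]:
        """All slots from newest (finest) to oldest (coarsest)."""
        return [slot for level in self.levels for slot in level]


if __name__ == "__main__":
    frame = LogTiltedTimeFrame()
    for t in range(1, 9):
        frame.add(float(t))
    print([(s.span, s.total) for s in frame.slots()])
    # [(1, 8.0), (1, 7.0), (2, 11.0), (4, 10.0)]
```

After eight base units only four slots remain, and the two most recent units are still individually queryable.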

Similar resources

Influence of Stream channel morphology and in-stream habitats on fish community in Golestan province Streams

Four streams with different sizes were selected for studying the effects of environmental factors on fish assemblages using indirect (Detrended Correspondence Analysis, DCA) and direct (Redundancy Analysis, RDA) gradient analysis in Golestan province. DCA of presence-absence and relative abundance data showed well gradient and linear model of species variability. In the within-site RDA, environ...

Mining Unusual Patterns by Multi-Dimensional Analysis of Data Streams

It has been popularly recognized that stream data represents an important form of data, with broad applications. There have been a lot of studies on effective stream data management and query processing, as well as some recent studies on stream data mining. Although this is a promising direction, most existing studies have not paid enough attention to one critical fact: most data streams reside...

An Adaptive Grid-based Method for Clustering Multi-Dimensional Online Data Streams

Clustering is an important task in mining the evolving data streams. A lot of data streams are high dimensional in nature. Clustering in the high dimensional data space is a complex problem, which is inherently more complex for data streams. Most data stream clustering methods are not capable of dealing with high dimensional data streams; therefore they sacrifice the accuracy of clusters. In or...

Application of Markov-Chain Analysis and Stirred Tanks in Series Model in Mathematical Modeling of Impinging Streams Dryers

In spite of the fact that the principles of impinging stream reactors have been developed for more than half a century, the performance analysis of such devices, from the viewpoint of the mathematical modeling, has not been investigated extensively. In this study two mathematical models were proposed to describe particulate matter drying in tangential impinging stream dryers. The models were de...

Improved K-mean clustering with Mobile Agent

The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and click streams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. D-Stream algorithm is an extended grid-based clustering algorithm for different dimen...

Publication date: 2007