Data Stream Warehousing In Tidalrace
Authors
Abstract
Big data is a ubiquitous feature of large modern enterprises. Many organizations generate huge amounts of on-line streaming data – examples include network monitoring, Twitter feeds, financial data, and industrial application monitoring. Making effective use of these data streams can be challenging. While Data Stream Management Systems can provide support for real-time alerting and data reduction, many applications require complex analytics on a data history to make the best use of the streams. We have been developing technologies for data stream warehousing, starting with the DataDepot [13] system. A data stream warehouse continually ingests data streams, computes complex derived data products, and stores long (perhaps years-long) histories. To take advantage of new technologies, we have developed a next-generation data stream warehousing system. In this paper we describe the Tidalrace system, our motivations for developing it, and architectural features of Tidalrace that support data stream warehousing.
Similar resources
An Efficient Stream-based Join to Process End User Transactions in Real-Time Data Warehousing
In the field of real-time data warehousing, semi-stream processing has become a potential area of research over the last decade. One important operation in semi-stream processing is to join stream data with slowly changing disk-based master data. A join operator is usually required to implement this operation. This join operator typically works under limited main memory and this memory is gener...
Lahar Demonstration: Warehousing Markovian Streams
Lahar is a warehousing system for Markovian streams—a common class of uncertain data streams produced via inference on probabilistic models. Example Markovian streams include text inferred from speech, location streams inferred from GPS or RFID readings, and human activity streams inferred from sensor data. Lahar supports OLAP-style queries on Markovian stream archives by leveraging novel appro...
Optimizing Queue-Based Semi-Stream Joins with Indexed Master Data
In Data Stream Management Systems (DSMS) semi-stream processing has become a popular area of research due to the high demand of applications for up-to-date information (e.g. in real-time data warehousing). A common operation in stream processing is joining an incoming stream with disk-based master data, also known as semi-stream join. This join typically works under the constraint of limited ma...
Histograms for OLAP and Data-Stream Queries
Histograms are an important tool for data reduction both in the field of data-stream querying and in OLAP, since they allow us to represent large amounts of data in a very compact structure, on which both efficient mining techniques and OLAP queries can be executed. Significant time- and memory-cost advantages may derive from data reduction, but the trade-off with accuracy has to be managed in...
Database Research at UT Arlington (ITLab@CSE.UTA)
The Information Technology Laboratory (or ITLab) at the Computer Science and Engineering Department at The University of Texas at Arlington was established by Sharma Chakravarthy in Spring 2000. The mission of the ITLab is to conduct research and development on all aspects of information technology. Some of the topics currently being investigated are: Data Warehousing/Information Integration, D...