Block-Oriented Compression Techniques for Large Statistical Databases
Authors
Abstract
Disk I/O has long been a performance bottleneck for very large databases. Database compression can be used to reduce disk I/O bandwidth requirements for large data transfers. In this paper, we explore the compression of large statistical databases and propose techniques for organizing the compressed data such that standard database operations such as retrievals, inserts, deletes and modifications are supported. We examine the applicability and performance of three methods. Two of these are adaptations of existing methods, but the third, called Tuple Differential Coding (TDC) [16], is a new method that allows conventional access mechanisms to be used with the compressed data to provide efficient access. We demonstrate how the performance of queries that involve large data transfers can be improved with these database compression techniques.
Similar resources
Accepted for Publication in the IEEE Transactions for Knowledge and Data Engineering: Block-Oriented Compression Techniques for Large Statistical Databases
Disk I/O has long been a performance bottleneck for very large databases. Database compression can be used to reduce disk I/O bandwidth requirements for large data transfers. In this paper, we explore the compression of large statistical databases and propose techniques for organizing the compressed data such that standard database operations such as retrievals, inserts, deletes and modification...
Optimizations and Heuristics to improve Compression in Columnar Database Systems
In-memory columnar databases have become mainstream over the last decade and have vastly sped up the processing of large volumes of data through multi-core parallelism and in-memory compression, thereby eliminating the usual bottlenecks associated with disk-based databases. For scenarios where the data volume grows into terabytes and petabytes, keeping all the data in memory is exorbitant...
Accessing Data in Block-Compressed Data Warehouses
The large size of most data warehouses (typically hundreds of gigabytes to terabytes), which results in non-trivial storage costs, makes compression techniques attractive for warehousing environments. In particular, block-level compression (as opposed to attribute or record level schemes) has been shown to achieve the greatest reductions in storage size for databases. A key issue is how to quic...
Access and Retrieval from Image Databases Using Image Thumbnails
The emerging role of thumbnail images in the selection of images for display from large image databases and via network access requires that the characteristics of thumbnails be seriously studied. We introduce a measure of effective compression Ceff(p) that is a function of the probability p that thumbnail access will be followed by the display of a full-scale image. For credible values of p, w...
MILC: Inverted List Compression in Memory
Inverted list compression is a topic that has been studied for 50 years due to its fundamental importance in numerous applications including information retrieval, databases, and graph analytics. Typically, an inverted list compression algorithm is evaluated on its space overhead and query processing time. Earlier list compression designs mainly focused on minimizing the space overhead to reduc...
Journal title:
- IEEE Trans. Knowl. Data Eng.
Volume 9
Publication date: 1997