Block-Oriented Compression Techniques for Large Statistical Databases

Authors

  • Wee Keong Ng
  • Chinya V. Ravishankar
Abstract

Disk I/O has long been a performance bottleneck for very large databases. Database compression can be used to reduce disk I/O bandwidth requirements for large data transfers. In this paper, we explore the compression of large statistical databases and propose techniques for organizing the compressed data such that standard database operations such as retrievals, inserts, deletes and modifications are supported. We examine the applicability and performance of three methods. Two of these are adaptations of existing methods, but the third, called Tuple Differential Coding (TDC) [16], is a new method that allows conventional access mechanisms to be used with the compressed data to provide efficient access. We demonstrate how the performance of queries that involve large data transfers can be improved with these database compression techniques.

Similar resources

Accepted for Publication in the IEEE Transactions for Knowledge and Data Engineering: Block-Oriented Compression Techniques for Large Statistical Databases

Disk I/O has long been a performance bottleneck for very large databases. Database compression can be used to reduce disk I/O bandwidth requirements for large data transfers. In this paper, we explore the compression of large statistical databases and propose techniques for organizing the compressed data such that standard database operations such as retrievals, inserts, deletes and modification...

Optimizations and Heuristics to improve Compression in Columnar Database Systems

In-memory columnar databases have become mainstream over the last decade and have vastly improved the fast processing of large volumes of data through multi-core parallelism and in-memory compression thereby eliminating the usual bottlenecks associated with disk-based databases. For scenarios, where the data volume grows into terabytes and petabytes, keeping all the data in memory is exorbitant...

Accessing Data in Block-Compressed Data Warehouses

The large size of most data warehouses (typically hundreds of gigabytes to terabytes), which results in non-trivial storage costs, makes compression techniques attractive for warehousing environments. In particular, block-level compression (as opposed to attribute or record level schemes) has been shown to achieve the greatest reductions in storage size for databases. A key issue is how to quic...

Access and Retrieval from Image Databases Using Image Thumbnails

The emerging role of thumbnail images in the selection of images for display from large image databases and via network access requires that the characteristics of thumbnails be seriously studied. We introduce a measure of effective compression Ceff(p) that is a function of the probability p that thumbnail access will be followed by the display of a full-scale image. For credible values of p, w...

MILC: Inverted List Compression in Memory

Inverted list compression is a topic that has been studied for 50 years due to its fundamental importance in numerous applications including information retrieval, databases, and graph analytics. Typically, an inverted list compression algorithm is evaluated on its space overhead and query processing time. Earlier list compression designs mainly focused on minimizing the space overhead to reduc...

Journal title:
  • IEEE Trans. Knowl. Data Eng.

Volume 9  Issue 

Pages  -

Publication date 1997