Sorted Sliding Window Compression
Author
Abstract
Sorted Sliding Window Compression (SSWC) uses a new model, the Sorted Sliding Window Model (SSWM), to efficiently encode strings that recur within a symbol sequence. The SSWM holds statistics of all strings up to a certain length k in a sliding window of size n (the sliding window is defined as in LZ77). The compression program can use the SSWM to determine whether the string formed by the next symbols is already contained in the sliding window; the model returns the length of the match. The SSWM directly provides statistics (the borders of a subinterval within an interval) for use in entropy coding methods such as Arithmetic Coding or Dense Coding [Gra97]. Given a number in an interval and a string length, the SSWM returns the corresponding string, which is used in decompression. After each encoding (decoding) step, the model is updated with the characters just encoded (decoded). The model sorts all string starting points in the sliding window lexicographically. A simple way to implement the SSWM is by exhaustive search in the sliding window; the implementation used here is based on a B-tree together with special binary searches. SSWC is a simple compression scheme that uses this new model in order to evaluate its properties. It looks at the next characters to be encoded and determines the longest match using the SSWM. If the match is shorter than 2, the character is encoded. Otherwise, the length and the subinterval of the string are encoded. The length values are encoded together with the single characters using the same adaptive frequency model. Additionally, some rules are used to reduce the match length if the code length would otherwise get worse. Frequencies and intervals are encoded with Dense Coding. On the Calgary corpus, SSWC is on average better than gzip [Gai93]: 0.2 to 0.5 bits per byte better on most files and at most 0.03 bits per byte worse (progc and progl). This demonstrates the quality of the SSWM and gives confidence in its usability as a new building block in models for compression.
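The model operations described above can be illustrated with a minimal Python sketch of the exhaustive-search variant, not the B-tree implementation the abstract refers to. All function names and the token layout are hypothetical, matches are restricted to strings lying entirely inside the window, the entropy coding stage (Dense Coding) is left out, and the rules for shortening matches are not modeled.

```python
def sorted_starts(window: str, k: int) -> list[int]:
    """Start positions in the window, sorted by the string of at most k symbols beginning there."""
    return sorted(range(len(window)), key=lambda i: (window[i:i + k], i))


def match_interval(window: str, k: int, s: str) -> tuple[int, int]:
    """Half-open rank interval [lo, hi) of sorted start positions whose string begins with s."""
    order = sorted_starts(window, k)
    # Strings sharing a prefix are adjacent in lexicographic order, so the ranks are contiguous.
    ranks = [r for r, i in enumerate(order) if window[i:i + k].startswith(s)]
    return (ranks[0], ranks[-1] + 1) if ranks else (0, 0)


def longest_match(window: str, k: int, lookahead: str) -> tuple[int, int, int]:
    """Longest prefix of lookahead (at most k symbols) found in the window: (length, lo, hi)."""
    for length in range(min(k, len(lookahead)), 0, -1):
        lo, hi = match_interval(window, k, lookahead[:length])
        if hi > lo:
            return (length, lo, hi)
    return (0, 0, 0)


def string_at(window: str, k: int, rank: int, length: int) -> str:
    """Inverse lookup for decoding: the string of the given length at a rank in sorted order."""
    start = sorted_starts(window, k)[rank]
    return window[start:start + length]


def sswc_parse(text: str, n: int, k: int) -> list[tuple]:
    """Greedy parse: a literal if the match is shorter than 2, else (length, subinterval)."""
    tokens, pos = [], 0
    while pos < len(text):
        window = text[max(0, pos - n):pos]
        length, lo, hi = longest_match(window, k, text[pos:pos + k])
        if length < 2:
            tokens.append(("lit", text[pos]))
            pos += 1
        else:
            tokens.append(("match", length, lo, hi))
            pos += length
    return tokens


def sswc_unparse(tokens: list[tuple], n: int, k: int) -> str:
    """Rebuild the text; any rank inside [lo, hi) identifies the same string."""
    out = ""
    for tok in tokens:
        if tok[0] == "lit":
            out += tok[1]
        else:
            _, length, lo, _hi = tok
            out += string_at(out[max(0, len(out) - n):], k, lo, length)
    return out
```

In this sketch the window is re-sorted on every query, which costs far more than the incremental update the abstract describes; it is only meant to make the interval and inverse-lookup semantics concrete. An entropy coder would receive `lo`, `hi`, and the window length as the subinterval borders.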
The SSWM has O(log k) computational complexity for all operations and needs O(n) space. The SSWM can be used to implement PPM or Markov models in limited-space environments because it holds all the necessary information.
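To make the PPM remark concrete, the following hedged Python sketch shows the kind of statistic a PPM or Markov model needs from the window: how often each symbol follows a given context. The function name is hypothetical and the code uses a plain linear scan, not the sorted structure or its stated complexity; in a sorted model these counts would correspond to the sizes of the subintervals for the context extended by each symbol.

```python
from collections import Counter


def next_symbol_counts(window: str, context: str) -> Counter:
    """Count the symbols that follow each occurrence of context inside the window."""
    counts = Counter()
    start = window.find(context)
    while start != -1:
        nxt = start + len(context)
        if nxt < len(window):  # an occurrence at the very end has no successor yet
            counts[window[nxt]] += 1
        start = window.find(context, start + 1)  # allow overlapping occurrences
    return counts
```

For example, with the window `"abracadabra"` and the context `"a"`, the symbol `b` follows twice and `c` and `d` once each, which a PPM-style coder could turn into probabilities for the next symbol.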
Similar resources
LZW Data Compression on Large Scale and Extreme Distributed Systems
Results on the parallel complexity of Lempel-Ziv data compression suggest that the sliding window method is more suitable than the LZW technique on shared memory parallel machines. When instead we address the practical goal of designing distributed algorithms with low communication cost, sliding window compression does not seem to guarantee robustness if we scale up the system. The possibility ...
The Imaginary Sliding Window As a New Data Structure for Adaptive Algorithms
Abstract. The sliding window scheme is known in information theory, computer science, prediction problems, and statistics. Let a source with unknown statistics generate some word $\ldots x_{-1} x_0 x_1 x_2 \ldots$ in some alphabet $A$. For every moment $t$, $t = \ldots, -1, 0, 1, \ldots$, one stores the word ("window") $x_{t-w} x_{t-w+1} \ldots x_{t-1}$, where $w$, $w \ge 1$, is called the "window length". In the theory of ...
Dictionary Compression on the PRAM
Parallel algorithms for lossless data compression via dictionary compression using optimal, longest fragment first (LFF), and greedy parsing strategies are described. Dictionary compression removes redundancy by replacing substrings of the input by references to strings stored in a dictionary. Given a static dictionary stored as a suffix tree, we present a CREW PRAM algorithm for optimal compressio...
FDiBC: A Novel Fraud Detection Method in Bank Club based on Sliding Time and Scores Window
One of the recent strategies for increasing customer loyalty in the banking industry is the use of a customers' club system. In this system, customers receive scores based on the financial and club activities they perform and, according to the points achieved, they receive credits from the bank. In addition, with the advent of new technologies, fraud is growing in the banking domain as well. Therefor...
Multipath Communication with Finite Sliding Window Network Coding for Ultra-Reliability and Low Latency
We use a random linear network coding (RLNC) based scheme for multipath communication in the presence of lossy links with different delay characteristics to obtain ultra-reliability and low latency. A sliding window version of RLNC is proposed in which the coded packets are generated from the packets within a window and are inserted among systematic packets on different paths. The packets are schedule...