In-memory URL Compression using AVL Tree
Authors
Abstract
A common problem for large-scale search engines and web spiders is how to handle the huge number of URLs they encounter. Traditional search engines and web spiders store URLs on hard disk without any compression, which results in slow performance and higher space requirements. This paper describes a simple URL compression algorithm that allows efficient compression and decompression. The compression algorithm is based on a delta encoding scheme, which exploits the common prefixes shared by URLs, and on an AVL tree, which provides efficient search. Our results show that a size reduction of 50% is achieved.
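The abstract does not give the details of the algorithm, so the following is only a minimal sketch of prefix-based delta encoding under stated assumptions. Each URL is stored as a reference to an earlier URL, the length of the prefix shared with it, and the remaining suffix. The class name `DeltaURLStore` and its methods are hypothetical, and a sorted Python list with binary search stands in for the AVL tree, so insertion here is not logarithmic as it would be in a real balanced tree.

```python
from os.path import commonprefix  # character-wise common prefix of strings


class DeltaURLStore:
    """Hypothetical sketch: URLs stored as (reference, prefix length, suffix)."""

    def __init__(self):
        self._entries = []  # id -> (ref_id or None, prefix_len, suffix)
        self._order = []    # ids sorted by decoded URL (stand-in for an AVL tree)

    def decode(self, uid):
        """Rebuild a URL by following its chain of references."""
        chain = []
        while uid is not None:
            chain.append(uid)
            uid = self._entries[uid][0]
        url = ""
        for node in reversed(chain):  # start from the uncompressed root
            _, prefix_len, suffix = self._entries[node]
            url = url[:prefix_len] + suffix
        return url

    def _position(self, url):
        """Binary search for the insertion point of url in sorted order."""
        lo, hi = 0, len(self._order)
        while lo < hi:
            mid = (lo + hi) // 2
            if self.decode(self._order[mid]) < url:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def add(self, url):
        """Insert url (if new) and return its id."""
        pos = self._position(url)
        if pos < len(self._order) and self.decode(self._order[pos]) == url:
            return self._order[pos]  # already stored
        # Pick the sorted neighbour that shares the longest prefix with url.
        best_ref, best_len = None, 0
        for npos in (pos - 1, pos):
            if 0 <= npos < len(self._order):
                nid = self._order[npos]
                shared = len(commonprefix([url, self.decode(nid)]))
                if shared > best_len:
                    best_ref, best_len = nid, shared
        uid = len(self._entries)
        self._entries.append((best_ref, best_len, url[best_len:]))
        self._order.insert(pos, uid)
        return uid

    def stored_chars(self):
        """Total number of suffix characters actually kept in memory."""
        return sum(len(suffix) for _, _, suffix in self._entries)


store = DeltaURLStore()
urls = [
    "http://example.com/a/index.html",
    "http://example.com/a/about.html",
    "http://example.com/b/",
]
ids = [store.add(u) for u in urls]
```

Entries are never rewritten after insertion, so a reference always points to an older entry and decoding terminates. The saving depends entirely on how much prefix the URLs share; the sketch makes no claim about reproducing the figure reported in the abstract.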
Similar resources
In-memory URL Compression
A common problem for large-scale search engines and web spiders is how to handle the huge number of URLs they encounter. Traditional search engines and web spiders store URLs on hard disk without any compression, which results in slow performance and higher space requirements. This paper describes a simple URL compression algorithm that allows efficient compression and decompression. The compression alg...
New Combinatorial Properties and Algorithms for AVL Trees
In this thesis, new properties of AVL trees and a new partitioning of binary search trees, named the core partitioning scheme, are discussed. This scheme is applied to three binary search trees, namely AVL trees, weight-balanced trees, and plain binary search trees. We introduce the core partitioning scheme, which maintains a balanced search tree as a dynamic collection of complete balanced binary tre...
Combining HTM and RCU to Implement Highly Efficient Balanced Binary Search Trees
In this paper we combine Hardware Transactional Memory (HTM) with Read-Copy-Update (RCU) to implement highly scalable concurrent balanced Binary Search Trees (BSTs). The two key features of our approach are: a) read-only operations require no synchronization or restarts and b) tree modifications are first performed in private copies of subtrees, then HTM is used to validate their consistency, a...
Cache-sensitive Memory Layout for Binary Trees
We improve the performance of main-memory binary search trees (including AVL and red-black trees) by applying cache-sensitive and cache-oblivious memory layouts. We relocate tree nodes in memory according to a multi-level cache hierarchy, also considering the conflict misses produced by set-associative caches. Moreover, we present a method to improve one-level cache-sensitivity without increasing...
AVL Trees with Relaxed Balance
The idea of relaxed balance is to uncouple the rebalancing in search trees from the updating in order to speed up request processing in main-memory databases. In this paper, we describe a relaxed version of AVL trees. We prove that each update gives rise to at most a logarithmic number of rebalancing operations and that the number of rebalancing operations in the semidynamic case is amortized c...