Virtual Cache Line: A New Technique to Improve Cache Exploitation for Recursive Data Structures
Authors
Abstract
Recursive data structures (lists, trees, graphs, etc.) are used throughout scientific and commercial software. The common approach is to allocate storage for the individual nodes of such structures dynamically, maintaining the logical connection between them via pointers. Once such a data structure goes through a sequence of updates (inserts and deletes), it may become scattered all over memory, yielding poor spatial locality, which in turn introduces many cache misses. In this paper we present the new concept of Virtual Cache Lines (VCLs). The mechanism keeps groups of consecutive nodes in close proximity, forming virtual cache lines, while allowing the groups to be stored arbitrarily far away from each other. Virtual cache lines increase the spatial locality of the given data structure, resulting in better locality of reference. Furthermore, since spatial locality is improved, software prefetching becomes much more attractive. Indeed, we also present a software prefetching algorithm that can be used with VCLs, resulting in even higher data cache performance. Our results show that the average performance of linked list operations (such as scan, insert, and delete) can be improved by more than 200%, even on architectures that do not support prefetching. Moreover, with prefetching, one can gain an additional 100% improvement. We believe that, given a program that manipulates certain recursive data structures, compilers will be able to generate VCL-based code. Until this vision becomes reality, VCLs can be used to build more efficient user libraries, operating systems, and application programs.
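The grouping idea in the abstract can be illustrated with a minimal C sketch. This is not the authors' implementation: the abstract does not give the VCL layout or the prefetching algorithm, so the code below approximates the idea with a list whose elements are packed into fixed-size groups. All names (`vcl_group`, `vcl_list`, `vcl_append`, `vcl_sum`, `vcl_group_count`, `vcl_free`) and the slot count are hypothetical, and the 64-byte line size is an assumption. The prefetch hint uses the GCC/Clang builtin `__builtin_prefetch` and is compiled out on other compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines, 8-byte pointers, 4-byte ints, so that
   8 (next) + 4 (count) + 13 * 4 (data) = 64 bytes per group. */
#define VCL_SLOTS 13

typedef struct vcl_group {
    struct vcl_group *next;  /* next group; may live anywhere in memory */
    int count;               /* slots in use, 0..VCL_SLOTS */
    int data[VCL_SLOTS];     /* consecutive list elements, stored contiguously */
} vcl_group;

typedef struct {
    vcl_group *head;
    vcl_group *tail;
} vcl_list;

/* Append a value; a new group is opened only when the tail group is full.
   Returns 0 on success, -1 if allocation fails. */
int vcl_append(vcl_list *list, int value)
{
    assert(list != NULL);
    if (list->tail == NULL || list->tail->count == VCL_SLOTS) {
        vcl_group *g = calloc(1, sizeof *g);
        if (g == NULL)
            return -1;
        if (list->tail != NULL)
            list->tail->next = g;
        else
            list->head = g;
        list->tail = g;
    }
    list->tail->data[list->tail->count++] = value;
    return 0;
}

/* Scan the list, hinting the next group into cache while the current
   group is being summed. */
long vcl_sum(const vcl_list *list)
{
    long total = 0;
    assert(list != NULL);
    for (const vcl_group *g = list->head; g != NULL; g = g->next) {
#if defined(__GNUC__) || defined(__clang__)
        if (g->next != NULL)
            __builtin_prefetch(g->next);  /* compiler-specific hint only */
#endif
        for (int i = 0; i < g->count; i++)
            total += g->data[i];
    }
    return total;
}

/* Count the groups currently allocated for the list. */
size_t vcl_group_count(const vcl_list *list)
{
    size_t n = 0;
    assert(list != NULL);
    for (const vcl_group *g = list->head; g != NULL; g = g->next)
        n++;
    return n;
}

/* Release every group and reset the list to empty. */
void vcl_free(vcl_list *list)
{
    assert(list != NULL);
    vcl_group *g = list->head;
    while (g != NULL) {
        vcl_group *next = g->next;
        free(g);
        g = next;
    }
    list->head = NULL;
    list->tail = NULL;
}
```

Appending 30 values with this sketch allocates three groups instead of 30 separate nodes, so a scan follows two inter-group pointers rather than 29 node pointers. The sketch does not force groups onto cache-line boundaries and omits insertion and deletion in the middle of the list, both of which a fuller treatment would need to address.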
Similar resources
Using Virtual Lines to Enhance Locality Exploitation
Because the spatial locality of numerical codes is significant, the potential for performance improvements is important. However, large cache lines cannot be used in current on-chip caches because of the significant pollution they cause. In this paper, we propose a hardware design, called the Virtual Line Scheme, that allows the utilization of large virtual cache lines when fetching data from ...
Performance Evaluation of Global Sequence Alignment Algorithm on Multicore Architectures With Reference to Cache
Several experimental studies have been conducted over the last decade on block data array in conjunction with tiling as a data transformation technique to improve cache performance. Based on the tile size and cache performance analysis, we propose a new data block size selection method, here called buffer size selection, which tightly fits into cache to get optimal solution for wave front...
Compression in Data Caches with Compressible Field Isolation for Recursive Data Structures
We introduce a software/hardware scheme called the Field Array Compression Technique (FACT) which reduces cache misses due to recursive data structures. Using a data layout transformation, data with temporal affinity is gathered in contiguous memory, where the recursive pointers and integer fields are compressed. As a result, one cache block can capture a greater amount of data with temporal aff...
Improve Replica Placement in Content Distribution Networks with Hybrid Technique
The increased use of the Internet and its accelerated growth lead to reduced network bandwidth and server capacity; as a result, the quality of Internet services is unacceptable to users, while efficient and effective delivery of web content plays an important role in improving performance. Content distribution networks were introduced to address this issue. Replicatin...
Compression in Data Caches with Data Layout Transformation for Recursive Data Structures
We introduce a software/hardware scheme called the Field Array Compression Technique (FACT) which reduces cache misses due to recursive data structures. Using a data layout transformation, data with temporal affinity is gathered in contiguous memory, where the recursive pointers and integer fields are compressed. As a result, one cache-block can capture a greater amount of data with temporal af...