Maps: A Compiler-Managed Memory System for Software-Exposed Architectures
Authors
Abstract
Microprocessors must exploit both instruction-level parallelism (ILP) and memory parallelism for high performance. Sophisticated techniques for ILP have boosted the ability of modern-day microprocessors to exploit ILP when it is available. Unfortunately, improvements in memory parallelism in microprocessors have lagged behind. This thesis explains why memory parallelism is hard to exploit in microprocessors and advocates bank-exposed architectures as an effective way to exploit more memory parallelism. Bank-exposed architectures are a kind of software-exposed architecture: one in which the low-level details of the hardware are visible to the software. In a bank-exposed architecture, the memory banks are visible to the software, enabling the compiler to exploit a high degree of memory parallelism in addition to ILP. Bank-exposed architectures can be employed both by general-purpose processors and by embedded chips, such as those used for digital-signal processing.

This thesis presents Maps, an enabling compiler technology for bank-exposed architectures. Maps solves the problem of bank disambiguation, i.e., how to distribute the data in sequential programs among several banks to best exploit memory parallelism, while retaining the ability to disambiguate each data reference to a particular bank. Two methods for bank disambiguation are presented: equivalence-class unification and modulo unrolling. Taking a sequential program as input, a bank-disambiguation method produces two outputs: first, a distribution of each program object among the memory banks; and second, a bank number for every reference that can be proven to access a single, known bank for that data distribution. Finally, the thesis shows why non-disambiguated accesses are sometimes desirable. Dependences between disambiguated and non-disambiguated accesses are enforced through explicit synchronization and software serial ordering. The MIT Raw machine is an example of a software-exposed architecture.
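The two bank-disambiguation methods named above can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not the Maps implementation: references are given directly as hypothetical points-to sets, equivalence classes are placed on banks round-robin, and arrays are assumed to be low-order interleaved (element i on bank i mod N). The function names `equivalence_class_unification`, `modulo_unroll` and `bank_of` are invented for illustration.

```python
from typing import Dict, List, Set, Tuple


class UnionFind:
    """Disjoint-set forest used to merge objects into equivalence classes."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)  # first sighting: x is its own class
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)


def equivalence_class_unification(
    refs: Dict[str, Set[str]], num_banks: int
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (object -> bank, reference -> bank).

    All objects that one reference may touch (its hypothetical points-to
    set) are unified into one class, and each class is placed whole on a
    single bank, so every reference gets a known bank. Round-robin class
    placement is a simplification that ignores load balance. Each points-to
    set is assumed to be non-empty.
    """
    uf = UnionFind()
    for targets in refs.values():
        objs = sorted(targets)
        for obj in objs[1:]:
            uf.union(objs[0], obj)
        uf.find(objs[0])  # registers single-target references too
    roots = sorted({uf.find(obj) for obj in list(uf.parent)})
    class_bank = {root: i % num_banks for i, root in enumerate(roots)}
    object_bank = {obj: class_bank[uf.find(obj)] for obj in list(uf.parent)}
    ref_bank = {ref: object_bank[min(t)] for ref, t in refs.items()}
    return object_bank, ref_bank


def bank_of(index: int, num_banks: int) -> int:
    """Assumed low-order interleaving: element i lives on bank i mod N."""
    return index % num_banks


def modulo_unroll(trip_count: int, num_banks: int) -> List[List[int]]:
    """Banks touched by each copy of A[i] once `for i in range(n)` is unrolled by N.

    Before unrolling, A[i] cycles through every bank, so it has no single
    bank number. After unrolling by N, the k-th copy, A[i + k], always hits
    bank k. This sketch assumes the loop starts at 0 and n is a multiple of
    N; other cases need extra handling that is omitted here.
    """
    if trip_count % num_banks != 0:
        raise ValueError("sketch assumes trip_count is a multiple of num_banks")
    touched: List[Set[int]] = [set() for _ in range(num_banks)]
    for i in range(0, trip_count, num_banks):  # unrolled loop: step by N
        for k in range(num_banks):  # the N copies of the loop body
            touched[k].add(bank_of(i + k, num_banks))
    return [sorted(banks) for banks in touched]


if __name__ == "__main__":
    refs = {"p": {"a", "b"}, "q": {"b", "c"}, "r": {"d"}}
    print(equivalence_class_unification(refs, num_banks=2))
    print(modulo_unroll(trip_count=8, num_banks=4))  # [[0], [1], [2], [3]]
```

The two return values of `equivalence_class_unification` mirror the two outputs described above: a distribution of program objects among the banks, and a bank number for each reference. In `modulo_unroll`, replicating the loop body N times makes each replica's array reference provably single-bank, which is exactly the property bank disambiguation needs.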
Raw exposes its ILP, memory, and communication mechanisms. The Maps system has been implemented in the Raw compiler. Results on Raw using sequential codes demonstrate that using bank disambiguation in addition to ILP improves performance by a factor of 3 to 5 over using ILP alone.

Dedication

To my wife, Alpana. Thank you for your love, support and patience!

Acknowledgments

There are several people I would like to thank; foremost among them are my two advisors, Saman Amarasinghe and Anant Agarwal, and my fellow student, Walter Lee. Saman became my advisor three years ago, but his impact on my thesis has been great. My research benefited tremendously from his enthusiasm and his willingness to put in many hours on discussions, brainstorming and giving comments. His deep knowledge …
Similar resources
Compiler-managed memory system for software-exposed architectures
Compiler Support for Scalable and Efficient Memory Systems
Technological trends require that future scalable microprocessors be decentralized. Applying these trends to memory systems shows that the size of the cache accessible in a single cycle will decrease in a future generation of chips. Thus, a bank-exposed memory system composed of small, decentralized cache banks must eventually replace that of a monolithic cache. This paper considers how t...
Power Efficient Instruction Caches for Embedded Systems
Instruction caches typically consume 27% of the total power in modern high-end embedded systems. We propose a compiler-managed instruction store architecture (K-store) that places the computation-intensive loops in a scratchpad-like SRAM memory and allocates the remaining instructions to a regular instruction cache. At runtime, execution is switched dynamically between the instructions in the t...
Maximizing the Filter Rate of L0 Compiler-Managed Instruction Stores by Pinning
We present an allocation algorithm for small L0 compiler-managed instruction stores (cmiss) that significantly reduces the energy consumed by the instruction storage hierarchy. With our algorithm, cmiss simultaneously achieve low access energy, low performance overhead, and high filter rate. Despite the lack of associativity in cmiss, our algorithm achieves filter rates similar to those of filt...
Trace-Driven Simulation of Data-Alignment and Other Factors Affecting Update and Invalidate Based Coherent Memory
The exploitation of locality of reference in shared-memory multiprocessors is one of the most important problems in parallel processing today. Locality can be managed at several levels: hardware, operating system, runtime environment of the compiler, and user level. In this paper we investigate the problem of exploiting locality at the operating system level and its interactions with the compiler ...