EM2: A Scalable Shared-Memory Multicore Architecture
Abstract
We introduce the Execution Migration Machine (EM2), a novel, scalable shared-memory architecture for large-scale multicores constrained by off-chip memory bandwidth. EM2 reduces cache miss rates, and consequently off-chip memory usage, by permitting only one copy of data to be stored anywhere in the system: when a thread wishes to access an address not locally cached on the core it is executing on, it migrates to the home core for that data and continues execution there. Using detailed simulations of a 256-core chip multiprocessor on the SPLASH-2 benchmarks, we show that EM2 outperforms directory-based cache coherence by 1.13× on average using a high-bandwidth electrical network and by 2× with an optical network. In addition, because of the dramatic reduction in off-chip memory accesses, EM2 reduces energy consumption by 1.3× on average and improves the energy-delay product by up to 5.4× over cache coherence.
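The single-copy, migrate-on-access idea described in the abstract can be illustrated with a toy model. This is only a minimal sketch, not the paper's implementation: the names `home_core`, `Thread`, and `Machine`, the four-core size, the 64-byte line size, and the interleaved address-to-core mapping are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

NUM_CORES = 4     # assumption for the toy model (the paper simulates 256 cores)
LINE_BYTES = 64   # assumed cache-line size


def home_core(addr: int) -> int:
    """Assumed static placement: interleave cache lines across cores."""
    return (addr // LINE_BYTES) % NUM_CORES


@dataclass
class Thread:
    core: int = 0        # core the thread is currently executing on
    migrations: int = 0  # number of times the thread has moved


@dataclass
class Machine:
    # One store per core; an address is only ever held at its home core,
    # so there is never a second copy to keep coherent.
    stores: list = field(
        default_factory=lambda: [dict() for _ in range(NUM_CORES)]
    )

    def _migrate_if_needed(self, thread: Thread, addr: int) -> None:
        home = home_core(addr)
        if thread.core != home:
            # Move the computation to the data instead of copying the data.
            thread.core = home
            thread.migrations += 1

    def write(self, thread: Thread, addr: int, value: int) -> None:
        self._migrate_if_needed(thread, addr)
        self.stores[thread.core][addr] = value

    def read(self, thread: Thread, addr: int) -> int:
        self._migrate_if_needed(thread, addr)
        return self.stores[thread.core].get(addr, 0)


if __name__ == "__main__":
    machine = Machine()
    thread = Thread(core=0)
    machine.write(thread, 0, 7)    # address 0 is homed on core 0: no migration
    machine.write(thread, 64, 9)   # address 64 is homed on core 1: migrate
    print(thread.core, thread.migrations)
```

In this model every access to a non-local address costs a migration, which is why the quality of the data placement matters; the sketch does not model network latency, caches, or off-chip memory.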
Similar Resources
Design of a novel congestion-aware communication mechanism for wireless NoC architecture in multicore systems
Hybrid Wireless Network-on-Chip (WNoC) architecture has emerged as a scalable communication structure to mitigate the deficits of traditional NoC architecture for future multicore systems. The hybrid WNoC architecture provides energy-efficient, high-data-rate, and flexible communications for NoC architectures. In these architectures, each wireless router is shared by a set of processing core...
System-level Optimizations for Memory Access in the Execution Migration Machine (EM2)
In this paper, we describe system-level optimizations for the Execution Migration Machine (EM2), a novel shared-memory architecture to address the memory wall and scalability issues for large-scale multicores. In EM2, data is never replicated and threads always migrate to the core where data is statically stored. This enables EM2 not only to provide cache coherence without any complex protocols...
Simple, Fast and Scalable Parallel Algorithms for Shared Memory (Thesis Proposal)
To ease the transition into the multicore/manycore era, shared-memory programming must be made more natural and accessible to the community. Furthermore, shared-memory algorithms need to be fast and scalable in order to quickly process large data. In this proposed thesis we will study techniques for simplifying parallel programming and allowing users to easily write efficient and scalable algor...
A Case for Fine-Grain Adaptive Cache Coherence
As transistor density continues to grow geometrically, processor manufacturers are already able to place a hundred cores on a chip (e.g., Tilera TILE-Gx 100), with massive multicore chips on the horizon. Programmers now need to invest more effort in designing software capable of exploiting multicore parallelism. The shared memory paradigm provides a convenient layer of abstraction to the progra...
Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures
To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist in scheduling a Directed Acyclic Graph (DAG) of tasks of fine granularity where nodes represent tasks, either panel factorization or update of a block-column, and edges represent dependencies among them. Although past approaches already achieve high performance on ...