EM2: A Scalable Shared-Memory Multicore Architecture

Authors

Omer Khan

Mieszko Lis

Srinivas Devadas

Abstract

We introduce the Execution Migration Machine (EM2), a novel, scalable shared-memory architecture for large-scale multicores constrained by off-chip memory bandwidth. EM2 reduces cache miss rates, and consequently off-chip memory traffic, by permitting only one copy of any piece of data to be stored anywhere in the system: when a thread wishes to access an address that is not cached locally on the core it is executing on, it migrates to the home core for that data and continues execution there. Using detailed simulations of a 256-core chip multiprocessor running the SPLASH-2 benchmarks, we show that EM2 outperforms directory-based cache coherence by 1.13× on average with a high-bandwidth electrical network and by 2× with an optical network. In addition, because of the dramatic reduction in off-chip memory accesses, EM2 improves energy consumption by 1.3× on average and the energy-delay product by up to 5.4× relative to cache coherence.
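The migration rule described above can be illustrated with a small software model. This is a minimal sketch, not EM2's hardware design: the block-striped `home_core` mapping, the 64-byte block size, and the `Thread`/`Machine` names are assumptions introduced here for illustration. Only the 256-core count and the policy "on a non-local access, migrate to the data's home core and continue there" come from the abstract.

```python
from dataclasses import dataclass, field

NUM_CORES = 256    # core count of the simulated chip described in the abstract
BLOCK_BYTES = 64   # hypothetical cache-block size (assumption, not from the paper)


def home_core(addr: int) -> int:
    """Hypothetical static address-to-core mapping: stripe blocks round-robin."""
    return (addr // BLOCK_BYTES) % NUM_CORES


@dataclass
class Thread:
    tid: int
    core: int            # core the thread is currently executing on
    migrations: int = 0  # number of times the thread's context has moved


@dataclass
class Machine:
    # Each core caches only blocks it is home for, so every block has one location.
    caches: dict = field(default_factory=lambda: {c: set() for c in range(NUM_CORES)})

    def access(self, thread: Thread, addr: int) -> int:
        """Serve a load/store under the migrate-to-home policy; return the serving core."""
        home = home_core(addr)
        if home != thread.core:
            # Non-local address: move the thread's execution context to the home core.
            thread.core = home
            thread.migrations += 1
        # The access is now local; the block is cached only at its home core.
        self.caches[home].add(addr // BLOCK_BYTES)
        return home


if __name__ == "__main__":
    m = Machine()
    t = Thread(tid=0, core=0)
    for a in (0x0000, 0x0040, 0x0048, 0x0000):
        m.access(t, a)
    print(t.migrations, t.core)  # prints "2 0"
```

Because the home mapping is static and each block is cached only at its home core, no coherence protocol is needed in this model; the cost of a remote access is paid as a thread migration instead of a cache-line transfer.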

Related papers

Design of a novel congestion-aware communication mechanism for wireless NoC architecture in multicore systems

Hybrid Wireless Network-on-Chip (WNoC) architecture has emerged as a scalable communication structure to mitigate the deficits of traditional NoC architectures for future multicore systems. The hybrid WNoC architecture provides energy-efficient, high-data-rate and flexible communication for NoC architectures. In these architectures, each wireless router is shared by a set of processing core...

System-level Optimizations for Memory Access in the Execution Migration Machine (EM2)

In this paper, we describe system-level optimizations for the Execution Migration Machine (EM2), a novel shared-memory architecture to address the memory wall and scalability issues for large-scale multicores. In EM2, data is never replicated and threads always migrate to the core where data is statically stored. This enables EM2 not only to provide cache coherence without any complex protocols...

Simple, Fast and Scalable Parallel Algorithms for Shared Memory (Thesis Proposal)

To ease the transition into the multicore/manycore era, shared-memory programming must be made more natural and accessible to the community. Furthermore, shared-memory algorithms need to be fast and scalable in order to quickly process large data. In this proposed thesis we will study techniques for simplifying parallel programming and allowing users to easily write efficient and scalable algor...

A Case for Fine-Grain Adaptive Cache Coherence

As transistor density continues to grow geometrically, processor manufacturers are already able to place a hundred cores on a chip (e.g., Tilera TILE-Gx 100), with massive multicore chips on the horizon. Programmers now need to invest more effort in designing software capable of exploiting multicore parallelism. The shared memory paradigm provides a convenient layer of abstraction to the progra...

Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures

To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist in scheduling a Directed Acyclic Graph (DAG) of tasks of fine granularity where nodes represent tasks, either panel factorization or update of a block-column, and edges represent dependencies among them. Although past approaches already achieve high performance on ...

Publication date: 2010