suitable locality of processing unit

Search results for: suitable locality of processing unit

Number of results: 21,213,500

A Register File Architecture and Compilation Scheme for Clustered ILP Processors

2002

Krishnan Kailas, Manoj Franklin, Kemal Ebcioglu

In Clustered Instruction-level Parallel (ILP) processors, the function units are partitioned and resources such as register file and cache are either partitioned or replicated and then grouped together into on-chip clusters. We present a novel partitioned register file architecture for clustered ILP processors which exploits the temporal locality of references to remote registers in a cluster an...

Model-Driven Tile Size Selection for DOACROSS Loops on GPUs

2011

Peng Di, Jingling Xue

DOALL loops are tiled to exploit DOALL parallelism and data locality on GPUs. In contrast, due to loop-carried dependences, DOACROSS loops must be skewed first in order to make tiling legal and exploit wavefront parallelism across the tiles and within a tile. Thus, tile size selection, which is performance-critical, becomes more complex for DOACROSS loops than DOALL loops on GPUs. This paper pr...
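
The skewing step can be made concrete with a minimal Python sketch. It assumes a hypothetical 2-D recurrence in which `a[i][j]` depends on `a[i-1][j]` and `a[i][j-1]`, so both loops of the nest carry a dependence. Skewing with the wavefront index `t = i + j` makes the inner loop a DOALL loop. Tiling and the GPU mapping, which are the paper's actual subject, are not modeled, and the function names are illustrative only.

```python
from typing import List

Grid = List[List[int]]


def sequential(n: int, m: int) -> Grid:
    """Original loop nest: both loops carry a dependence."""
    a = [[1] * m for _ in range(n)]  # row 0 and column 0 are boundary values
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def wavefront(n: int, m: int) -> Grid:
    """Skewed loop nest: the outer loop walks wavefronts t = i + j.

    Both dependences point from wavefront t - 1 to wavefront t, so the
    iterations of the inner loop are independent of each other.
    """
    a = [[1] * m for _ in range(n)]
    for t in range(2, n + m - 1):        # sequential over wavefronts
        lo = max(1, t - (m - 1))         # keeps j = t - i <= m - 1
        hi = min(n - 1, t - 1)           # keeps j = t - i >= 1
        for i in range(lo, hi + 1):      # DOALL within one wavefront
            j = t - i
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


if __name__ == "__main__":
    assert sequential(5, 7) == wavefront(5, 7)
    print(wavefront(5, 7)[4][4])  # binomial coefficient C(8, 4) = 70
```

In a tiled version, the same legality argument applies at the tile level: after skewing, tiles on one anti-diagonal can run concurrently.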

Designing a Coprocessor for Recurrent Computations

1993

Kumar N. Ganapathy, Benjamin W. Wah

In this paper, we present the design of an application-specific coprocessor for algorithms that can be modeled as uniform recurrences or "uniformized" affine recurrences. The coprocessor has a regular array of processors connected to an access-unit for intermediate storage of data. The distinguishing feature of our approach is that we assume the coprocessor to be interfaced to a standard, slow (si...
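
Matrix multiplication is a standard example of a uniform recurrence. It illustrates the model, not the coprocessor described in the paper. Every variable at index point (i, j, k) reads only from neighbors at constant offsets: `a` is passed along j, `b` along i, and `c` accumulates along k. This is the classic systolic-array formulation. The Python below is a hypothetical rendering of it.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]
Point = Tuple[int, int, int]


def matmul_uniform(A: Matrix, B: Matrix) -> Matrix:
    """C = A * B written as a uniform recurrence over points (i, j, k).

    Dependence vectors are constant: a(i,j,k) <- a(i,j-1,k),
    b(i,j,k) <- b(i-1,j,k), c(i,j,k) <- c(i,j,k-1).
    """
    n, m, p = len(A), len(B), len(B[0])  # A is n x m, B is m x p
    a: Dict[Point, int] = {}
    b: Dict[Point, int] = {}
    c: Dict[Point, int] = {}
    for i in range(n):
        for j in range(p):
            for k in range(m):
                # A[i][k] enters at the j == 0 boundary and is passed along j
                a[i, j, k] = A[i][k] if j == 0 else a[i, j - 1, k]
                # B[k][j] enters at the i == 0 boundary and is passed along i
                b[i, j, k] = B[k][j] if i == 0 else b[i - 1, j, k]
                # partial sums accumulate along k
                prev = 0 if k == 0 else c[i, j, k - 1]
                c[i, j, k] = prev + a[i, j, k] * b[i, j, k]
    return [[c[i, j, m - 1] for j in range(p)] for i in range(n)]


if __name__ == "__main__":
    print(matmul_uniform([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Because every dependence is a fixed vector, each index point can be mapped onto a regular processor array with nearest-neighbor links.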

On the Complexity of the Generalized Block Distribution

1996

Michelangelo Grigni, Fredrik Manne

We consider the problem of mapping an array onto a mesh of processors in such a way that locality is preserved. When the computational work associated with the array is distributed in an unstructured way the generalized block distribution has been recognized as an efficient way of achieving an even load balance while at the same time imposing a simple communication pattern. In this paper we consi...
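
A generalized block distribution cuts the rows into p contiguous intervals and the columns into q contiguous intervals. Processor (a, b) of the p × q mesh then receives the block where row interval a meets column interval b. The Python sketch below only evaluates the maximum processor load for a given set of cuts, using 2-D prefix sums. Choosing good cuts is the problem whose complexity the paper studies. Function names are illustrative.

```python
from typing import List, Sequence

Weights = Sequence[Sequence[int]]


def prefix_sums(w: Weights) -> List[List[int]]:
    """s[i][j] = sum of w[r][c] over r < i and c < j."""
    n, m = len(w), len(w[0])
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            s[i + 1][j + 1] = w[i][j] + s[i][j + 1] + s[i + 1][j] - s[i][j]
    return s


def gbd_max_load(w: Weights, row_cuts: List[int], col_cuts: List[int]) -> int:
    """Maximum block weight for row cuts [0, ..., n] and column cuts [0, ..., m]."""
    s = prefix_sums(w)
    worst = 0
    for r0, r1 in zip(row_cuts, row_cuts[1:]):
        for c0, c1 in zip(col_cuts, col_cuts[1:]):
            load = s[r1][c1] - s[r0][c1] - s[r1][c0] + s[r0][c0]
            worst = max(worst, load)
    return worst


if __name__ == "__main__":
    w = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(gbd_max_load(w, [0, 2, 4], [0, 2, 4]))  # 54
    print(gbd_max_load(w, [0, 3, 4], [0, 2, 4]))  # 45
```

On a 2 × 2 mesh the ideal load for this array is 136 / 4 = 34. Both distributions in the example are therefore unbalanced, and moving the row cut lowers the bottleneck from 54 to 45.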

Utilizing Concurrency: A New Theory for Memory Wall

2016

Xian-He Sun, Yu-Hang Liu

In addition to locality, data access concurrency has emerged as a key factor in memory performance. In this research, we introduce a concurrency-aware solution, the memory Sluice Gate Theory, for solving the outstanding memory wall problem. Sluice gates are designed to control data transfer at each memory layer dynamically, and a global control algorithm, named layered performance matching, ...

Seamless Integration of Parallelism and Memory Hierarchy

2002

Carlo Fantozzi, Andrea Pietracaprina, Geppino Pucci

We prove an analogue of Brent's lemma for BSP-like parallel machines featuring a hierarchical structure for both the interconnection and the memory. Specifically, for these machines we present a uniform scheme to simulate any computation designed for $v$ processors on a $v'$-processor configuration with $v' \le v$ and the same overall memory size. For a wide class of computations the simulation exhibits ...
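
For orientation, here is the classical flat-machine form of this kind of self-simulation. Each of the $v'$ physical processors simulates $\lceil v/v' \rceil$ of the original processors step by step. A computation that takes time $T_v$ on $v$ processors then takes time

```latex
T_{v'} = O\!\left(\left\lceil \frac{v}{v'} \right\rceil T_v\right), \qquad 1 \le v' \le v .
```

This is the textbook bound for machines without a memory or interconnection hierarchy. It is not the hierarchical result proved in the paper.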

The E-BSP Model: Incorporating General Locality and Unbalanced Communication into the BSP Model

1996

Ben H. H. Juurlink, Harry A. G. Wijshoff

The BSP model was proposed as a step towards general purpose parallel computing. This paper introduces the E-BSP model that extends the BSP model in two ways. First, it provides a way to deal with unbalanced communication patterns, i.e., communication patterns in which the amount of data sent or received by each processor is different. Second, it adds a notion of general locality to the BSP mod...
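
For context, the standard BSP model charges each superstep $w + g\,h + l$. Here $w$ is the maximum local work, $h$ the maximum number of words any processor sends or receives, $g$ the cost per word and $l$ the barrier latency. The hypothetical Python sketch below computes that standard cost. It does not implement E-BSP, which the abstract describes only in outline. The example shows why unbalanced patterns are awkward for plain BSP: a single processor sending 8 words is charged as a full 8-relation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Superstep:
    work: List[int]      # local computation per processor
    sent: List[int]      # words sent by each processor
    received: List[int]  # words received by each processor


def bsp_cost(steps: List[Superstep], g: float, l: float) -> float:
    """Standard BSP cost: sum over supersteps of w + g*h + l."""
    total = 0.0
    for s in steps:
        w = max(s.work)
        h = max(max(s.sent), max(s.received))  # the superstep is an h-relation
        total += w + g * h + l
    return total


if __name__ == "__main__":
    step = Superstep(work=[10, 20, 15, 5], sent=[0, 8, 0, 0], received=[2, 2, 2, 2])
    print(bsp_cost([step], g=2, l=50))  # 20 + 2*8 + 50 = 86.0
```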

Enhancing Data Locality by Using Terminal Propagation

1996

Bruce Hendrickson, Robert W. Leland, Rafael Van Driessche

Terminal propagation is a method developed in the circuit placement community for adding constraints to graph partitioning problems. This paper adapts and expands this idea, and applies it to the problem of partitioning data structures among the processors of a parallel computer. We show how the constraints in terminal propagation can be used to encourage partitions in which messages are communi...
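
The idea can be sketched as a modified cut objective. During recursive bisection, vertices outside the current subset that have already been placed act as fixed terminals on whichever half they are closer to. Edges to them are charged when an inside vertex lands on the far half. The Python below is a minimal sketch under that assumption. It takes each terminal's preferred side as given rather than computing it from processor positions, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]


def _terminal_penalty(vertex_side: int, pref: Optional[int]) -> int:
    """1 if a propagated terminal prefers the other side, else 0."""
    return 1 if pref is not None and vertex_side != pref else 0


def cut_with_terminals(
    edges: List[Edge],
    side: Dict[int, int],
    terminal_pref: Dict[int, Optional[int]],
) -> int:
    """Edge cut of a bisection of the current subset, counting propagated terminals.

    side          -- vertex in the subset being bisected -> 0 or 1
    terminal_pref -- already-placed outside vertex -> side it is closer to,
                     or None if it is equally close to both halves
    """
    cost = 0
    for u, v in edges:
        if u in side and v in side:
            cost += 1 if side[u] != side[v] else 0
        elif u in side and v in terminal_pref:
            cost += _terminal_penalty(side[u], terminal_pref[v])
        elif v in side and u in terminal_pref:
            cost += _terminal_penalty(side[v], terminal_pref[u])
    return cost


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (0, 10), (3, 11), (1, 12)]
    prefs = {10: 0, 11: 1, 12: None}
    a = {0: 0, 1: 0, 2: 1, 3: 1}
    b = {0: 1, 1: 1, 2: 0, 3: 0}
    print(cut_with_terminals(edges, a, prefs), cut_with_terminals(edges, b, prefs))  # 1 3
```

Both bisections cut one internal edge. The propagated terminals break the tie in favor of `a`, which keeps vertices near the processors their outside neighbors already live on.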

Modeling Compiled Communication Costs in Multiplexed Optical Networks

1997

Charles A. Salisbury, Rami G. Melhem

Improvements in optical technology will enable the construction of high bandwidth, low latency switching networks. These networks have many applications in massively parallel processing. However, current circuit switching and packet switching techniques are not quite suitable for controlling such networks. Time division multiplexing (TDM) schemes can improve the performance of circuit switched o...
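
With compiled communication, the set of connections is known at compile time, so TDM time slots can be assigned offline. The sketch below is a hedged illustration using greedy first-fit, not necessarily the scheme the paper models. It assumes each connection holds the same slot on every link of its path and that two connections sharing a link need different slots. The number of slots used is then the multiplexing degree the network must support. Function and type names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]


def assign_slots(connections: List[List[Link]]) -> List[int]:
    """First-fit TDM slot assignment for a statically known set of connections.

    Each connection is the list of links on its path. A connection keeps the
    same slot on every link, and connections sharing a link get different slots.
    """
    used: Dict[Link, Set[int]] = {}
    slots: List[int] = []
    for path in connections:
        busy: Set[int] = set()
        for link in path:
            busy |= used.get(link, set())
        slot = 0
        while slot in busy:  # smallest slot free on the whole path
            slot += 1
        for link in path:
            used.setdefault(link, set()).add(slot)
        slots.append(slot)
    return slots


if __name__ == "__main__":
    conns = [
        [("A", "B"), ("B", "C")],
        [("B", "C"), ("C", "D")],
        [("A", "B")],
        [("C", "D")],
    ]
    slots = assign_slots(conns)
    print(slots, "multiplexing degree:", max(slots) + 1)  # [0, 1, 1, 0] multiplexing degree: 2
```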
