Search results for: suitable locality of processing unit
Number of results: 21213500
In Clustered Instruction-level Parallel (ILP) processors, the function units are partitioned, and resources such as the register file and cache are either partitioned or replicated and then grouped together into on-chip clusters. We present a novel partitioned register file architecture for clustered ILP processors which exploits the temporal locality of references to remote registers in a cluster an...
DOALL loops are tiled to exploit DOALL parallelism and data locality on GPUs. In contrast, due to loop-carried dependences, DOACROSS loops must be skewed first in order to make tiling legal and exploit wavefront parallelism across the tiles and within a tile. Thus, tile size selection, which is performance-critical, becomes more complex for DOACROSS loops than DOALL loops on GPUs. This paper pr...
In this paper, we present the design of an application-specific coprocessor for algorithms that can be modeled as uniform recurrences or "uniformized" affine recurrences. The coprocessor has a regular array of processors connected to an access-unit for intermediate storage of data. The distinguishing feature of our approach is that we assume the coprocessor to be interfaced to a standard, slow (si...
We consider the problem of mapping an array onto a mesh of processors in such a way that locality is preserved. When the computational work associated with the array is distributed in an unstructured way, the generalized block distribution has been recognized as an efficient way of achieving an even load balance while at the same time imposing a simple communication pattern. In this paper we consi...
In addition to locality, data access concurrency has emerged as a pillar factor of memory performance. In this research, we introduce a concurrency-aware solution, the memory Sluice Gate Theory, for solving the outstanding memory wall problem. Sluice gates are designed to control data transfer at each memory layer dynamically, and a global control algorithm, named layered performance matching, ...
We prove an analogue of Brent’s lemma for BSP-like parallel machines featuring a hierarchical structure for both the interconnection and the memory. Specifically, for these machines we present a uniform scheme to simulate any computation designed for v processors on a v'-processor configuration with v' ≤ v and the same overall memory size. For a wide class of computations the simulation exhibits ...
The BSP model was proposed as a step towards general purpose parallel computing. This paper introduces the E-BSP model that extends the BSP model in two ways. First, it provides a way to deal with unbalanced communication patterns, i.e., communication patterns in which the amount of data sent or received by each processor is different. Second, it adds a notion of general locality to the BSP mod...
Terminal propagation is a method developed in the circuit placement community for adding constraints to graph partitioning problems. This paper adapts and expands this idea, and applies it to the problem of partitioning data structures among the processors of a parallel computer. We show how the constraints in terminal propagation can be used to encourage partitions in which messages are communi...
Improvements in optical technology will enable the construction of high-bandwidth, low-latency switching networks. These networks have many applications in massively parallel processing. However, current circuit switching and packet switching techniques are not quite suitable for controlling such networks. Time division multiplexing (TDM) schemes can improve the performance of circuit switched o...