suitable locality of processing unit

Search results for: suitable locality of processing unit

Number of results: 21213500

Locality Optimization on a NUMA Architecture for Hybrid LU Factorization

2013

Adrien Rémy, Marc Baboulin, Masha Sosonkina, Brigitte Rozoy

We study the impact of non-uniform memory accesses (NUMA) on the solution of dense general linear systems using an LU factorization algorithm. In particular we illustrate how an appropriate placement of the threads and memory on a NUMA architecture can improve the performance of the panel factorization and consequently accelerate the global LU factorization. We apply these placement strategies ...

Level-Hybrid Optoelectronic TESH Interconnection Network

2003

Vijay K. Jain, Glenn H. Chapman

This paper discusses a hybrid optoelectronic scheme for a new interconnection network, "Tori connected mESHes (TESH)". The major features of TESH are the following: it is hierarchical, thus allowing exploitation of computation locality as well as easy expansion up to a million processors or devices, it permits efficient VLSI/ULSI realization, it is designed to make use of redundancy for defect ...

Memory Hierarchy Management for Iterative Graph Structures

1998

Ibraheem Al-Furaih, Sanjay Ranka

The increasing gap in processor and memory speeds has forced microprocessors to rely on deep cache hierarchies to keep the processors from starving for data. For many applications, this results in a wide disparity between sustained and peak achievable speed. Applications need to be tuned to processor and memory system architectures for cache locality, memory layout and data prefetch and reuse. ...

Evaluating Advanced Routing Algorithms for Content-Based Publish/Subscribe Systems

2002

Gero Mühl, Ludger Fiege, Felix C. Freiling, Alejandro P. Buchmann

We present an evaluation of advanced routing algorithms for content-based publish/subscribe systems that focuses on the inherent characteristics of routing algorithms (routing table sizes and filter forwarding overhead) instead of system-specific parameters (CPU load etc.). The evaluation is based on a working prototype instead of simulations and compares several routing algorithms to each othe...

Parallel Multilevel Tetrahedral Grid Refinement

Journal: SIAM J. Scientific Computing, 2005

Sven Groß, Arnold Reusken

In this paper we analyze a parallel version of a multilevel red/green local refinement algorithm for tetrahedral meshes. The refinement method is similar to the approaches used in the UG-package [33] and by Bey [11, 12]. We introduce a new data distribution format that is very suitable for the parallel multilevel refinement algorithm. This format is called an admissible hierarchical decompositi...

Efficient Implementation of Nearest Neighbor Classification

2005

José R. Herrero, Juan J. Navarro

An efficient approach to Nearest Neighbor classification is presented, which improves performance by exploiting the ability of superscalar processors to issue multiple instructions per cycle and by using the memory hierarchy adequately. This is accomplished by the use of floating-point arithmetic which outperforms integer arithmetic, and block (tiled) algorithms which exploit the data locality ...

FV-MSB: A Scheme for Reducing Transition Activity on Data Buses

2003

Dinesh C. Suresh, Jun Yang, Chuanjun Zhang, Banit Agrawal, Walid A. Najjar

Power consumption becomes an important issue for modern processors. The off-chip buses consume considerable amount of total power [9,7]. One effective way to reduce power is to reduce the overall bus switching activities since they are proportional to the power. Up till now, the most effective technique in reducing the switching activities on the data buses is Frequent Value Encoding (FVE) that...

On the Complexity of the Generalized

1996

Michelangelo Grigni, Fredrik Manne

We consider the problem of mapping an array onto a mesh of processors in such a way that locality is preserved. When the computational work associated with the array is distributed in an unstructured way the generalized block distribution has been recognized as an efficient way of achieving an even load balance while at the same time imposing a simple communication pattern. In this paper we consi...

Optimal broadcast on parallel locality models

Journal: J. Discrete Algorithms, 2000

Ben H. H. Juurlink, Petr Kolman, Friedhelm Meyer auf der Heide, Ingo Rieping

In this paper matching upper and lower bounds for broadcast on general purpose parallel computation models that exploit network locality are proven. These models try to capture both the general purpose properties of models like the PRAM or BSP on the one hand, and to exploit network locality of special purpose models like meshes, hypercubes, etc., on the other hand. They do so by charging a cost l...

CLOCK-Pro: An Effective Improvement of the CLOCK Replacement

2005

Song Jiang, Feng Chen, Xiaodong Zhang

With the ever-growing performance gap between memory systems and disks, and rapidly improving CPU performance, virtual memory (VM) management becomes increasingly important for overall system performance. However, one of its critical components, the page replacement policy, is still dominated by CLOCK, a replacement policy developed almost 40 years ago. While pure LRU has an unaffordable cost i...
