Search results for: suitable locality of processing unit

Number of results: 21213500

2013
Adrien Rémy Marc Baboulin Masha Sosonkina Brigitte Rozoy

We study the impact of non-uniform memory accesses (NUMA) on the solution of dense general linear systems using an LU factorization algorithm. In particular we illustrate how an appropriate placement of the threads and memory on a NUMA architecture can improve the performance of the panel factorization and consequently accelerate the global LU factorization. We apply these placement strategies ...

2003
Vijay K. Jain Glenn H. Chapman

This paper discusses a hybrid optoelectronic scheme for a new interconnection network, "Tori connected mESHes (TESH)". The major features of TESH are the following: it is hierarchical, thus allowing exploitation of computation locality as well as easy expansion up to a million processors or devices, it permits efficient VLSI/ULSI realization, it is designed to make use of redundancy for defect ...

1998
Ibraheem Al-Furaih Sanjay Ranka

The increasing gap in processor and memory speeds has forced microprocessors to rely on deep cache hierarchies to keep the processors from starving for data. For many applications, this results in a wide disparity between sustained and peak achievable speed. Applications need to be tuned to processor and memory system architectures for cache locality, memory layout and data prefetch and reuse. ...

2002
Gero Mühl Ludger Fiege Felix C. Freiling Alejandro P. Buchmann

We present an evaluation of advanced routing algorithms for content-based publish/subscribe systems that focuses on the inherent characteristics of routing algorithms (routing table sizes and filter forwarding overhead) instead of system-specific parameters (CPU load etc.). The evaluation is based on a working prototype instead of simulations and compares several routing algorithms to each othe...

Journal: SIAM J. Scientific Computing 2005
Sven Groß Arnold Reusken

In this paper we analyze a parallel version of a multilevel red/green local refinement algorithm for tetrahedral meshes. The refinement method is similar to the approaches used in the UG-package [33] and by Bey [11, 12]. We introduce a new data distribution format that is very suitable for the parallel multilevel refinement algorithm. This format is called an admissible hierarchical decompositi...

2005
José R. Herrero Juan J. Navarro

An efficient approach to Nearest Neighbor classification is presented, which improves performance by exploiting the ability of superscalar processors to issue multiple instructions per cycle and by using the memory hierarchy adequately. This is accomplished by the use of floating-point arithmetic which outperforms integer arithmetic, and block (tiled) algorithms which exploit the data locality ...

2003
Dinesh C. Suresh Jun Yang Chuanjun Zhang Banit Agrawal Walid A. Najjar

Power consumption becomes an important issue for modern processors. The off-chip buses consume a considerable amount of total power [9,7]. One effective way to reduce power is to reduce the overall bus switching activities since they are proportional to the power. Up till now, the most effective technique in reducing the switching activities on the data buses is Frequent Value Encoding (FVE) that...

1996
Michelangelo Grigni Fredrik Manne

We consider the problem of mapping an array onto a mesh of processors in such a way that locality is preserved. When the computational work associated with the array is distributed in an unstructured way the generalized block distribution has been recognized as an efficient way of achieving an even load balance while at the same time imposing a simple communication pattern. In this paper we consi...

Journal: J. Discrete Algorithms 2000
Ben H. H. Juurlink Petr Kolman Friedhelm Meyer auf der Heide Ingo Rieping

In this paper matching upper and lower bounds for broadcast on general purpose parallel computation models that exploit network locality are proven. These models try to capture both the general purpose properties of models like the PRAM or BSP on the one hand, and to exploit network locality of special purpose models like meshes, hypercubes, etc., on the other hand. They do so by charging a cost l...

2005
Song Jiang Feng Chen Xiaodong Zhang

With the ever-growing performance gap between memory systems and disks, and rapidly improving CPU performance, virtual memory (VM) management becomes increasingly important for overall system performance. However, one of its critical components, the page replacement policy, is still dominated by CLOCK, a replacement policy developed almost 40 years ago. While pure LRU has an unaffordable cost i...
