Search results for: linear speedup

Number of results: 490347

2009
Depeng Yang, Gregory D. Peterson, Husheng Li

This paper proposes a hardware accelerator for Cholesky decomposition on FPGAs by designing a single triangular linear equation solver. Good performance is achieved by reordering the computation of Cholesky factorization algorithms and thus alleviating the data dependency. The dedicated hardware architecture for solving triangular linear equations is designed and implemented for different accur...

Journal: Journal of Computational and Applied Mathematics 2014
Zhisong Fu, T. James Lewis, Robert Michael Kirby, Ross T. Whitaker

The finite element method (FEM) is a widely employed numerical technique for approximating the solution of partial differential equations (PDEs) in various science and engineering applications. Many of these applications benefit from fast execution of the FEM pipeline. One way to accelerate the FEM pipeline is by exploiting advances in modern computational hardware, such as the many-core stream...

2016
Xingguo Li, Tuo Zhao, Raman Arora, Han Liu, Jarvis D. Haupt

We propose a stochastic variance reduced optimization algorithm for solving a class of large-scale nonconvex optimization problems with cardinality constraints. Theoretically, we provide sufficient conditions under which the proposed algorithm enjoys strong linear convergence guarantees and optimal estimation accuracy in high dimensions. We further extend the analysis to its asynchronous varian...

Journal: Quantum Science and Technology 2018

2017
Gábor Braun, Sebastian Pokutta, Daniel Zink

Conditional gradient algorithms (also often called Frank-Wolfe algorithms) are popular due to their simplicity of only requiring a linear optimization oracle and more recently they also gained significant traction for online learning. While simple in principle, in many cases the actual implementation of the linear optimization oracle is costly. We show a general method to lazify various conditi...

2002
M. El-Shenawee, C. Rappaport, D. Jiang, W. Meleis, D. Kaeli

The computational solution of large-scale linear systems of equations necessitates the use of fast algorithms but is also greatly enhanced by employing parallelization techniques. The objective of this work is to demonstrate the speedup achieved by the MPI (Message Passing Interface) parallel implementation of the Steepest Descent Fast Multipole Method (SDFMM). Although this algorithm has alrea...

1998
Bruce E. Tucker

The Origin 2000 is a high performance computing platform produced jointly by Silicon Graphics / Cray. This scalable shared memory processor (SSMP) may be configured with up to 128 processors in a single system image. The Origin is a scalable, cache coherent, non-uniform memory access (CC-NUMA), distributed shared memory (DSM) architecture based on a hypercube interconnection topology. Effective...

2014
Newsha Ardalani, Karthikeyan Sankaralingam, Xiaojin Zhu

Heterogeneous processing using GPUs is here to stay and today spans mobile devices, laptops, and supercomputers. Although modern software development frameworks like OpenCL and CUDA serve as a high productivity environment, software development for GPUs is time consuming. First, much work needs to be done to restructure software and data organization to match the GPU’s many-threaded programming...

2013
Alexander Dallmann, Philip-Daniel Beck, Jürgen Wolff von Gudenberg

In this paper we present CUDA kernels that compute an interval matrix product. Starting from a naive implementation we investigate possible speedups using commonly known techniques from standard matrix multiplication. We also evaluate the achieved speedup when our kernels are used to accelerate a variant of an existing algorithm that finds an enclosure for the solution of a linear system. Moreo...
