linear speedup

Search results for: linear speedup

Number of results: 490347

High Performance Reconfigurable Computing for Cholesky Decomposition

2009

Depeng Yang Gregory D. Peterson Husheng Li

This paper proposes a hardware accelerator for Cholesky decomposition on FPGAs by designing a single triangular linear equation solver. Good performance is achieved by reordering the computation of Cholesky factorization algorithms and thus alleviating the data dependency. The dedicated hardware architecture for solving triangular linear equations is designed and implemented for different accur...

Architecting the finite element method pipeline for the GPU

Journal: Journal of Computational and Applied Mathematics 2014

Zhisong Fu T. James Lewis Robert Michael Kirby Ross T. Whitaker

The finite element method (FEM) is a widely employed numerical technique for approximating the solution of partial differential equations (PDEs) in various science and engineering applications. Many of these applications benefit from fast execution of the FEM pipeline. One way to accelerate the FEM pipeline is by exploiting advances in modern computational hardware, such as the many-core stream...

Stochastic Variance Reduced Optimization for Nonconvex Sparse Learning

2016

Xingguo Li Tuo Zhao Raman Arora Han Liu Jarvis D. Haupt

We propose a stochastic variance reduced optimization algorithm for solving a class of large-scale nonconvex optimization problems with cardinality constraints. Theoretically, we provide sufficient conditions under which the proposed algorithm enjoys strong linear convergence guarantees and optimal estimation accuracy in high dimensions. We further extend the analysis to its asynchronous varian...

Pairing-induced speedup of nuclear spontaneous fission

Journal: Physical Review C 2014

A deceptive step towards quantum speedup detection

Journal: Quantum Science and Technology 2018

Lazifying Conditional Gradient Algorithms

2017

Gábor Braun Sebastian Pokutta Daniel Zink

Conditional gradient algorithms (also often called Frank-Wolfe algorithms) are popular due to their simplicity of only requiring a linear optimization oracle and more recently they also gained significant traction for online learning. While simple in principle, in many cases the actual implementation of the linear optimization oracle is costly. We show a general method to lazify various conditi...

Electromagnetics Computations Using the MPI Parallel Implementation of the Steepest Descent Fast Multipole Method (SDFMM)

2002

M. El-Shenawee C. Rappaport D. Jiang W. Meleis D. Kaeli

The computational solution of large-scale linear systems of equations necessitates the use of fast algorithms but is also greatly enhanced by employing parallelization techniques. The objective of this work is to demonstrate the speedup achieved by the MPI (Message Passing Interface) parallel implementation of the Steepest Descent Fast Multipole Method (SDFMM). Although this algorithm has alrea...

Parallel Processing Using the Silicon Graphics / Cray Origin 2000

1998

Bruce E. Tucker

The Origin 2000 is a high performance computing platform produced jointly by Silicon Graphics / Cray. This scalable shared memory processor (SSMP) may be configured with up to 128 processors in a single system image. The Origin is a scalable, cache coherent, non-uniform memory access (CC-NUMA), distributed shared memory (DSM) architecture based on a hypercube interconnection topology. Effective...

Estimating GPU Speedups for Programs Without Writing a Single Line of GPU Code

2014

Newsha Ardalani Karthikeyan Sankaralingam Xiaojin Zhu

Heterogeneous processing using GPUs is here to stay and today spans mobile devices, laptops, and supercomputers. Although modern software development frameworks like OpenCL and CUDA serve as a high productivity environment, software development for GPUs is time consuming. First, much work needs to be done to restructure software and data organization to match the GPU’s many-threaded programming...

Finding Enclosures for Linear Systems Using Interval Matrix Multiplication in CUDA

2013

Alexander Dallmann Philip-Daniel Beck Jürgen Wolff von Gudenberg

In this paper we present CUDA kernels that compute an interval matrix product. Starting from a naive implementation we investigate possible speedups using commonly known techniques from standard matrix multiplication. We also evaluate the achieved speedup when our kernels are used to accelerate a variant of an existing algorithm that finds an enclosure for the solution of a linear system. Moreo...
