نتایج جستجو برای: linear speedup
تعداد نتایج: 490347 فیلتر نتایج به سال:
This paper proposes a hardware accelerator for Cholesky decomposition on FPGAs by designing a single triangular linear equation solver. Good performance is achieved by reordering the computation of Cholesky factorization algorithms and thus alleviating the data dependency. The dedicated hardware architecture for solving triangular linear equations is designed and implemented for different accur...
The finite element method (FEM) is a widely employed numerical technique for approximating the solution of partial differential equations (PDEs) in various science and engineering applications. Many of these applications benefit from fast execution of the FEM pipeline. One way to accelerate the FEM pipeline is by exploiting advances in modern computational hardware, such as the many-core stream...
We propose a stochastic variance reduced optimization algorithm for solving a class of large-scale nonconvex optimization problems with cardinality constraints. Theoretically, we provide sufficient conditions under which the proposed algorithm enjoys strong linear convergence guarantees and optimal estimation accuracy in high dimensions. We further extend the analysis to its asynchronous varian...
Conditional gradient algorithms (also often called Frank-Wolfe algorithms) are popular due to their simplicity of only requiring a linear optimization oracle and more recently they also gained significant traction for online learning. While simple in principle, in many cases the actual implementation of the linear optimization oracle is costly. We show a general method to lazify various conditi...
The computational solution of large-scale linear systems of equations necessitates the use of fast algorithms but is also greatly enhanced by employing parallelization techniques. The objective of this work is to demonstrate the speedup achieved by the MPI (Message Passing Interface) parallel implementation of the Steepest Descent Fast Multipole Method (SDFMM). Although this algorithm has alrea...
The Origin 2000 is a high performance computing platform produced jointly by Silicon Graphics / Cray. This scalable shared memory processor (SSMP) may be configured with up to 128 processors in a single system image. The Origin is a scalable, cache coherent, non-uniform memory access (CC-NUMA), distributed shared memory (DSM) architecture based on a hypercube interconnection topology. Effective...
Heterogeneous processing using GPUs is here to stay and today spans mobile devices, laptops, and supercomputers. Although modern software development frameworks like OpenCL and CUDA serve as a high productivity environment, software development for GPUs is time consuming. First, much work needs to be done to restructure software and data organization to match the GPU’s many-threaded programming...
In this paper we present CUDA kernels that compute an interval matrix product. Starting from a naive implementation we investigate possible speedups using commonly known techniques from standard matrix multiplication. We also evaluate the achieved speedup when our kernels are used to accelerate a variant of an existing algorithm that finds an enclosure for the solution of a linear system. Moreo...
نمودار تعداد نتایج جستجو در هر سال
با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید