Implementation of GP-GPU with SIMT Architecture in the Embedded Environment
Authors
Abstract
Embedded processors have recently moved to multi-core designs because raising the operating frequency increases power consumption. Multi-core processors in turn encourage applications to be parallelized. A general-purpose CPU, however, has only a small number of cores, each optimized for serial processing, which limits how much parallel work it can handle. GPUs are used for parallel processing to overcome this limitation. In this paper, we implement a GP-GPU with a SIMT architecture for parallel processing in the embedded environment. The performance of the implemented GP-GPU is compared with that of an existing multi-core CPU for the embedded environment. The results show that the implemented GP-GPU significantly improves parallel processing performance.
Similar resources
Implementation of the direction of arrival estimation algorithms by means of GPU-parallel processing in the CUDA environment (Research Article)
Direction-of-arrival (DOA) estimation of audio signals is critical in different areas, including electronic warfare, sonar, etc. Beamforming methods such as Minimum Variance Distortionless Response (MVDR) and Delay-and-Sum (DAS), and the subspace-based Multiple Signal Classification (MUSIC), are the best-known DOA estimation techniques. These methods have high computational complexity. Hence using...
Multi-tier Dynamic Vectorization for Translating GPU Optimizations into CPU Performance
Developing high performance GPU code is labor intensive. Ideally, developers could recoup high GPU development costs by generating high-performance programs for CPUs and other architectures from the same source code. However, current OpenCL compilers for non-GPUs do not fully exploit optimizations in well-tuned GPU codes. To address this problem, we develop an OpenCL implementation that efficie...
Warp-Level Parallelism: Enabling Multiple Replications In Parallel on GPU
Stochastic simulations need multiple replications in order to build confidence intervals for their results. Even when a large number of replications is not needed, it is good practice to reduce the overall simulation time using the Multiple Replications In Parallel (MRIP) approach. This approach usually assumes access to a parallel computer such as a symmetric multiprocessing machine ...
Parallelizing RSA Algorithm on Multicore CPU and GPU
Public key algorithms are widely known to be slower than symmetric key alternatives in the area of cryptographic algorithms because they are based on modular arithmetic. The most widely used public key algorithm is RSA. Therefore, how to speed up the RSA algorithm has been a significant research topic in computer security as well as in computing fields. With rem...
An approach to Improve Particle Swarm Optimization Algorithm Using CUDA
The time consumption in solving computationally heavy problems has always been a concern for computer programmers. Due to simplicity of its implementation, the PSO (Particle Swarm Optimization) is a suitable meta-heuristic algorithm for solving computationally heavy problems. However, despite the simplicity, the algorithm is inefficient for solving real computationally heavy problems but the pr...