CUDA based Parallel Derivation of Parametric L-system
Abstract
This paper proposes an approach to deriving a parametric L-system in parallel using the Compute Unified Device Architecture (CUDA). The approach consists of a host program running on the CPU and a device program running on a CUDA-enabled GPU. The host program pre-allocates host and device memory, transfers data between the CPU and the GPU, and launches the device program. The device program derives the L-system by computing module strings with kernel functions, which represent the productions of the L-system and are executed in parallel by CUDA threads. Unlike most traditional L-system derivation algorithms, which are based on string operations, our algorithms use newly defined data structures to represent an L-system and can therefore be implemented without string operations, which current CUDA compute capabilities do not support. Experiments show that this method achieves better computing performance than serial derivation. The experiments also investigate the parameters that affect performance, such as the number of module strings computed by each CUDA thread and the size of each thread block.
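The idea of deriving an L-system without string operations can be illustrated with a small CPU-side sketch in Python. It is not the authors' implementation: the abstract does not describe their data structures, so the module representation (an integer symbol id paired with one numeric parameter), the example productions, and the count, prefix-sum, and scatter layout are all assumptions chosen for illustration. Each loop iteration over `modules` touches only its own input module and its own output slots, which is the kind of independent work a CUDA thread could take on.

```python
from itertools import accumulate

# Hypothetical encoding: integer symbol ids replace module names, so no
# string handling is needed. A module is a (symbol, parameter) tuple.
A, F = 0, 1


def successor_count(module):
    """Number of modules the matching production emits (assumed rules)."""
    symbol, _ = module
    return 2 if symbol == A else 1


def apply_production(module):
    """Assumed productions: A(x) -> F(x) A(x/2); F(x) -> F(x)."""
    symbol, x = module
    if symbol == A:
        return [(F, x), (A, x / 2)]
    return [module]


def derive_step(modules):
    """One derivation step written as count, exclusive scan, scatter."""
    # Pass 1: every module reports how many successors it will produce.
    counts = [successor_count(m) for m in modules]
    # Exclusive prefix sum: offsets[i] is where module i writes its output;
    # the final entry is the total length of the next module sequence.
    offsets = list(accumulate(counts, initial=0))
    out = [None] * offsets[-1]
    # Pass 2: iterations are independent, so each could map to one thread.
    for i, module in enumerate(modules):
        for j, successor in enumerate(apply_production(module)):
            out[offsets[i] + j] = successor
    return out


def derive(axiom, steps):
    """Apply derive_step repeatedly, starting from the axiom."""
    modules = list(axiom)
    for _ in range(steps):
        modules = derive_step(modules)
    return modules


if __name__ == "__main__":
    print(derive([(A, 1.0)], 3))
```

Computing the offsets before writing lets every module know its output position in advance, so no two iterations write to the same slot. On a GPU the scan itself would also be done in parallel, which this sketch does not attempt.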
Similar resources
Parallelization of Rich Models for Steganalysis of Digital Images using a CUDA-based Approach
There are several different methods for building an efficient strategy for steganalysis of digital images. A very powerful method in this area is the rich model, which consists of a large number of diverse sub-models in both the spatial and transform domains, all of which should be utilized. However, extracting the various types of features from an image is very time-consuming in some steps, especially for training pha...
Parametric GPU Code Generation for Affine Loop Programs
Partitioning a parallel computation into finitely sized chunks for effective mapping onto a parallel machine is a critical concern for source-to-source compilation. In the context of OpenCL and CUDA, this translates to the definition of a uniform hyper-rectangular partitioning of the parallel execution space where each partition is subject to a fine-grained distribution of resources that has a ...
Isolated Persian/Arabic handwriting characters: Derivative projection profile features, implemented on GPUs
For many years, researchers have studied high-accuracy methods for recognizing handwriting and have achieved many significant improvements. However, an issue that has rarely been studied is the speed of these methods. Given computer hardware limitations, these methods need to run at high speed. One way to increase the processing speed is to use the computer pa...
Comparison between four dissimilar solar panel configurations
Several studies on photovoltaic systems have focused on how they operate and the energy required to operate them. Little attention has been paid to their configurations, modeling of mean time to system failure, availability, cost benefit, and comparisons of parallel and series–parallel designs. In this research work, four system configurations were studied. Configuration I consists of two sub-components arranged ...
Asynchronous Parallel Computing Model of Global Motion Estimation with CUDA
For video coding, weighing the balance between coding rate and image quality, we apply a global motion search algorithm to avoid loss of image quality and use the parallel computing capacity of graphics processors to accelerate the encoding process. Based on the heterogeneous CPU+GPU system and the multi-threaded parallel structure and thread synchronization features of the CUDA platform, we build a pro...