Latency Considerations of Depth-first GPU Ray Tracing
Abstract
Despite its potential for divergence, depth-first ray tracing [AL09] is nevertheless the most efficient approach on massively parallel graphics processors. Because it uses specialized caching strategies originally developed for texture access, it has been shown to be compute-limited rather than bandwidth-limited. With recent developments, however, not only raw bandwidth but also latency, both for memory access and for read-after-write register dependencies, can become a limiting factor. In this paper we analyze the memory and instruction dependency latencies of depth-first ray tracing. We show that ray tracing is in fact latency-limited on current GPUs and propose three simple strategies to better hide these latencies. With them, we come significantly closer to the maximum performance of the GPU.
Similar resources
Efficient GPU Screen-Space Ray Tracing
We present an efficient GPU solution for screen-space 3D ray tracing against a depth buffer by adapting the perspective-correct DDA line rasterization algorithm. Compared to linear ray marching, this ensures sampling at a contiguous set of pixels and no oversampling. This paper provides for the first time full implementation details of a method that has been proven in production of recent major...
GPU Rendering of Secondary Effects
In this paper we present an efficient data structure and algorithms for GPU ray tracing of secondary effects like reflections, refractions and shadows. Our method extends previous work on layered depth cubes in that it uses layered depth cubes as an adaptive space partitioning scheme for ray tracing. We propose a new method to efficiently build LDCs on the GPU using geometry shaders available i...
Fast Ray Sorting and Breadth-First Packet Traversal for GPU Ray Tracing
We present a novel approach to ray tracing execution on commodity graphics hardware using CUDA. We decompose a standard ray tracing algorithm into several data-parallel stages that are mapped efficiently to the massively parallel architecture of modern GPUs. These stages include: ray sorting into coherent packets, creation of frustums for packets, breadth-first frustum traversal through a bound...
Algorithm optimizations and mapping scheme for interactive ray tracing on a reconfigurable architecture
This paper presents a mapping scheme of an optimized octree-based ray tracing algorithm and its implementation on a SIMD reconfigurable architecture, MorphoSys, with appropriate hardware incorporated. A two-level SIMD mapping scheme for ray tracing is chosen to get a better trade-off between coherence exploitation efficiency and bandwidth requirements. We apply an SIMD octree traversal algorithm ...
GPU-Based Ray-Casting of Quadratic Surfaces
Quadratic surfaces are frequently used primitives in geometric modeling and scientific visualization, such as rendering of tensor fields, particles, and molecular structures. While high visual quality can be achieved using sophisticated ray tracing techniques, interactive applications typically use either coarsely tessellated polygonal approximations or pre-rendered depth sprites, thereby tradi...