Tuesday, December 1, 2015

GPU path tracing tutorial 2: interactive triangle mesh path tracing

While the tutorial from the previous post was about path tracing simple scenes made of spheres, this tutorial will focus on how to build a very simple path tracer with support for loading and rendering triangle meshes. Instead of rendering the entire image in the background and saving it to a file as was done in the last tutorial, this path tracer displays an interactive viewport which shows progressively rendered updates. This way we can see the rendered image from the first pass and watch it converge to a noise free result (which can take some time in the case of path tracing triangle meshes without using acceleration structures).

For this tutorial I decided to modify the code of a real-time CUDA ray tracer developed by Peter Trier from the Alexandra Institute in 2009 (described in this blog post), because it's very compact, does not use any external libraries (except for CUDA-OpenGL interoperability) and provides a simple obj loader for triangle meshes. I modified the ray tracing kernel to handle path tracing (without recursion) using the path tracing code from the previous tutorial, added support for perfectly reflective and refractive materials (like glass) based on the code of smallpt. The random number generator from the previous post has been replaced with CUDA's own random number generation library provided by curand(), which is less prone to patterns at low sample rates and has more uniform distribution properties. The seed calculation is based on a trick described in a post on RichieSam's blog.

Features of this path tracer

- primitive types: supports spheres, boxes and triangles/triangle meshes
- material types: support for perfectly diffuse, perfectly reflective and perfectly refractive materials
- progressive rendering
- interactive viewport displaying intermediate rendering results

Scratch-a-Pixel has some excellent lessons on ray tracing triangles and triangle meshes, which discuss barycentric coordinates, backface culling and the fast Muller-Trumbore ray/triangle intersection algorithm that is also used in the code for this tutorial:

- ray tracing triangles: http://www.scratchapixel.com/lessons/3d-basic-rendering/ray-tracing-rendering-a-triangle

- ray tracing polygon meshes: http://www.scratchapixel.com/lessons/3d-basic-rendering/ray-tracing-polygon-mesh

The code is one big CUDA file with lots of comments and can be found on my Github repository.

Github repository link: https://github.com/straaljager/GPU-path-tracing-tutorial-2 

Some screenshots

Performance optimisations

- triangle edges are precomputed to speed up ray intersection computation and triangles are stored as (first vertex, edge1, edge2)
- ray/triangle intersection uses the fast Muller-Trumbore technique
- triangle data is stored in the GPU's texture memory which is cached and is a bit faster than global memory because fetching data from textures is accelerated in hardware. The texture cache is also optimized for 2D spatial locality, so threads that access addresses in texture memory that are close together in 2D will achieve best performance. 
- triangle data is aligned in float4s (128 bits) for coalesced memory access, maximising memory throughput,  (see https://docs.nvidia.com/cuda/cuda-c-programming-guide/#device-memory-accesses and http://blog.spectralstudios.net/raytracing/realtime-raytracing-part-3/#more-573)
- for expensive functions (such as sin() and sqrt()), compute fast approximations using single precision intrinsic math functions such as __sinf(), __powf(), __fdividef(): these functions are performed in hardware by the special function units (SFU) on the GPU and are much faster than the standard divide and sin/cos functions at the cost of precision and robustness in corner cases (see https://docs.nvidia.com/cuda/cuda-c-programming-guide/#intrinsic-functions
- to speed up the ray tracing an axis aligned bounding box is created around the triangle mesh. Only rays hitting this box are intersected with the mesh. Without this box,  all rays would have to be tested against every triangle for intersection, which is unbearably slow.

In the next tutorial, we'll have a look at implementing an acceleration structure, which speeds up the rendering by several orders of magnitude. This blog post provides  a good overview of the most recent research in ray tracing acceleration structures for the GPU. There will also be an interactive camera to allow real-time navigation through the scene with depth of field and supersampled anti-aliasing (and there are still lots of optimisations).