Regarding my microcosm game idea where each viewpoint is rendered on the fly, in semi-real time, here's a brain dump on the subject of ray tracing. I have been reading a lot about "interactive ray tracing" during my Thanksgiving vacation, mostly papers by Ingo Wald, who apparently has done a lot of research into the subject. His 2004 thesis, Realtime Ray Tracing and Interactive Global Illumination, suggests that multiple frames per second can be achieved by:
- Tracing multiple "coherent" ray "packets" at once. (similar directions)
- SIMD instructions (SSE) to process multiple ray intersections at once
- Awareness of processor cache and predictable memory access (keep related data close together and minimize random memory access)
- Axis-aligned BSP trees (with SIMD traversal and early termination)
- Optimal BSP construction using the surface area heuristic (SAH)
- "Instant Global Illumination" (IGI) using "virtual point lights" (VPLs) placed in a preprocessing step.
- Optimal placement of the VPLs using quasi-Monte Carlo methods (importance sampling, I think).
- Shadow rays are traced to these VPLs from the first bounce (screen rays) and averaged.
- Instead of sampling all VPLs at each hit, sample a small subset but change the subset for every pixel (interleaving)
- Blur the VPL samples together over regions of the image which are continuous (flat surfaces) to remove artifacts (he calls it discontinuity buffering)
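The interleaved-sampling idea (each pixel samples only a small, rotating subset of the VPLs) can be sketched in a few lines. This is my own toy version, not Wald's code; the 3x3 tile size and the index striding are assumptions:

```python
def vpl_subset(x, y, num_vpls, tile=3):
    """Return the VPL indices that pixel (x, y) should sample.

    Pixels are grouped into tile x tile patterns; each of the
    tile*tile pattern positions owns every (tile*tile)-th VPL,
    so a tile x tile neighbourhood covers all VPLs between them.
    """
    pattern_id = (y % tile) * tile + (x % tile)   # 0 .. tile*tile - 1
    return list(range(pattern_id, num_vpls, tile * tile))

# With 18 VPLs and a 3x3 pattern, each pixel samples only 2 VPLs.
print(vpl_subset(0, 0, 18))  # [0, 9]
print(vpl_subset(1, 0, 18))  # [1, 10]
```

A subsequent blur over flat regions (the discontinuity buffering above) then effectively averages all the VPLs into each pixel.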
My thoughts:
- His engine gets maybe 10 FPS spread across a small cluster of computers (back in 2004?).
- I have only one CPU and perhaps GPU to work with.
- I need only, say, 4 FPS (250ms from the time the user clicks to the time they see the next viewpoint).
- For lower-end machines we can turn off features like anti-aliasing, reduce the number of VPLs sampled, etc.
Perhaps faster tracing can be achieved by:
Instead of handling millions of triangles (his tracer does only triangle meshes), support only mathematical geometric primitives: spheres, boxes, cylinders, cones, pyramids, etc. And of course, support CSG operations between them. Yes, this would limit the amount of detail that could be modeled, but perhaps that's acceptable. It would create a sort of visual style for the worlds - designers would focus less on highly detailed modeling and more on artistic simplicity. Perhaps, to support more organic, free-form shapes, subdivision surfaces could be added; they could be subdivided on the fly as Ingo Wald outlined in Packet-based Ray Tracing of Catmull-Clark Subdivision Surfaces.
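As a concrete example of why analytic primitives are attractive, here is a minimal ray/sphere intersection (a toy sketch, not engine code): the exact hit distance falls out of a quadratic, with no mesh in memory at all.

```python
import math

def ray_sphere(origin, direction, center, radius):
    """Nearest positive hit distance t along a normalized ray,
    or None if the ray misses the sphere. Solves the quadratic
    |o + t*d - c|^2 = r^2 analytically -- no triangles involved."""
    oc = [o - c for o, c in zip(origin, center)]
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * c          # a == 1 for a normalized direction
    if disc < 0.0:
        return None                 # ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 0.0 else None

# A ray down +z from the origin hits a unit sphere centred at z=5 at t=4.
print(ray_sphere((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0))  # 4.0
```

CSG would then operate on the entry/exit intervals these analytic tests produce, rather than on tessellated geometry.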
The theoretical advantage I see in this approach is:
- Less RAM needed to hold the scene, which means fewer accesses to RAM, and more of the scene would fit in the CPU cache.
- Smoother curves (e.g., a perfect sphere instead of triangles approximating a sphere)
- Fewer primitives in the acceleration structure (BSP or BVH) means faster traversal?
- More opportunities for bulk processing (keep reading)
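A rough back-of-envelope for the RAM point, using my own made-up tessellation numbers (not figures from Wald's work):

```python
FLOAT = 4                      # bytes per 32-bit float
sphere_bytes = 4 * FLOAT       # analytic sphere: center (x, y, z) + radius

# A modest 32x16 lat/long tessellation of the same sphere:
verts = 32 * 16                # 512 vertices, 3 floats each
tris = 2 * 32 * (16 - 1)       # 960 triangles, 3 x 32-bit indices each
mesh_bytes = verts * 3 * FLOAT + tris * 3 * 4

print(sphere_bytes, mesh_bytes)  # 16 17664
```

Three orders of magnitude per primitive, before acceleration-structure overhead, which is what makes the cache argument plausible.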
A bulk processing algorithm:
Instead of executing the entire rendering equation for each pixel, what if we executed its steps piece by piece, in bulk? This should increase cache performance and SIMD opportunities, wouldn't it? An algorithm like this:
- First shoot the primary rays from the camera (in packets) into the scene. Stop traversal as soon as the bounding box for a complex primitive (like a subdivision surface) is hit. Perhaps even stop on the bounding box for any primitive.
- For each struck bounding box (sorted by type of primitive) calculate the exact hit point of all the rays which hit the bounding box. Repeat steps 1 & 2 to handle rays which miss the actual object.
- For each struck primitive (sorted by material type), compute reflection or refraction ray. (Are reflection rays only necessary for certain material types?)
- At the same time compute rays to VPLs and other light sources.
- Supersampling for anti-aliasing fits in here somewhere
- Trace all these rays in bulk.
- Compute material colors where rays hit (in bulk, sorted by material)
- Multiply material colors with lighting values in bulk and save pixel values
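The staged pipeline above could be structured roughly like this toy sketch - hits bucketed by material, then each bucket shaded in a single pass. (The names and the per-material shader interface are my invention; the real cache/SIMD win obviously would not come from Python.)

```python
from collections import defaultdict

def render_bulk(primary_hits, shade_fns):
    """primary_hits: list of (pixel, material, hit_info) tuples.
    shade_fns: material -> function that shades a whole batch at once."""
    # Stage: bucket hits by material so each shader runs over a
    # contiguous batch (a predictable, streaming access pattern).
    buckets = defaultdict(list)
    for pixel, material, hit in primary_hits:
        buckets[material].append((pixel, hit))

    framebuffer = {}
    for material, batch in buckets.items():
        # Stage: shade the entire batch in bulk, then scatter to pixels.
        for pixel, color in shade_fns[material](batch):
            framebuffer[pixel] = color
    return framebuffer

# Tiny usage example with two fake materials and scalar "colors".
hits = [((0, 0), "matte", 0.5), ((1, 0), "shiny", 0.9), ((2, 0), "matte", 0.2)]
shaders = {
    "matte": lambda batch: [(px, h * 0.8) for px, h in batch],
    "shiny": lambda batch: [(px, h * 1.0) for px, h in batch],
}
print(render_bulk(hits, shaders))
```

Each stage reads one flat list and writes another, which is what should make the memory traffic streaming rather than random.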
Such an algorithm will have a high memory footprint, but I believe access would largely be a matter of streaming to and from memory (predictable) instead of random access. Certain portions would also lend themselves well to bulk processing on the GPU. Perhaps many parts could be written in OpenCL?
The big unknowns to me right now: first, how is occlusion culling achieved, given that the acceleration structure stores not fine triangle meshes but entire implicit surfaces? In other words, how do you find the first object a ray hits when many bounding boxes overlap?
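One standard answer (my assumption, not something taken from Wald's papers) is to visit candidate bounding boxes in order of ray-entry distance and keep the nearest confirmed hit so far; overlapping boxes then cost only extra intersection tests, never wrong results.

```python
def nearest_hit(candidates):
    """candidates: list of (box_entry_t, intersect_fn) pairs, where
    intersect_fn() returns the exact hit distance or None on a miss.
    Returns the nearest hit distance, or None if everything misses."""
    best_t, found = float("inf"), False
    for entry_t, isect in sorted(candidates, key=lambda c: c[0]):
        if entry_t >= best_t:        # every later box starts even farther
            break                    # -> early termination is safe
        t = isect()
        if t is not None and t < best_t:
            best_t, found = t, True
    return best_t if found else None

# Two overlapping boxes: the box entered first holds the *farther* surface,
# so the second box must still be tested before terminating.
print(nearest_hit([(1.0, lambda: 6.0), (2.0, lambda: 3.0)]))  # 3.0
```

The early-out condition is the occlusion culling: once the next box's entry distance exceeds the best confirmed hit, nothing behind it can win.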
Secondly, how many reflection bounces are needed? My reading says that IGI handles color bleeding, so it seems you'd only need reflection bounces on shiny or glossy surfaces.
Changing subject back to primitive types: I really couldn't live without rock. Perhaps some sort of grid-based rock as described by the Arches terrain framework?