Final Project - Writeup

CS 851 Interactive Ray Tracing

Rui Wang (rw2p@cs.virginia.edu)

Introduction

The goal of this project is to implement the "render cache" (see the original render cache paper) and to design better sampling and reconstruction strategies. The basic idea of the render cache is to reuse rendered pixels from previous frames so that the user gets a quicker response. The render cache keeps a cache of rendered 3D points. During motion, cached points are transformed and projected to screen space to form new pixels. Some screen pixels may have no cached point mapping to them, and cached points also need to be refreshed over time, so a sampling stage determines where on screen new samples are most needed and asks the renderer to render those points. Finally, a frame is reconstructed from the possibly sparse samples on screen.

In the original paper, the process looks like this:

There are a few problems with it:

1) Although the sampling strategy is priority-based, it is a simple prioritized error-diffusion dithering algorithm. It does not handle well the requirement that samples be both chosen by priority and distributed evenly in space.

2) Smooth interpolation does not work very well for reconstructing an image from sparse samples. In fact, their simple filtering method blurs the image and gives undesirable results.

3) Newly requested samples are added to the cache in the next frame instead of the current one. When the user moves the viewpoint continuously, this delays the display of new samples.

The new design tries to solve the above problems, and the process now looks like this:

 

Implementation

1) Renderer:
For simplicity and ease of control, the renderer is just an OpenGL application. I added all the code needed to parse .aff scene files (the BART test scenes), and I used the kitchen scene as a test scene. For each frame, the color buffer and depth buffer are read back using glReadPixels, then the render cache is called; during the sampling stage, pixels that need samples read their values from the color and depth buffers. This simulates a real ray tracer, where the renderer computes the needed pixel values. Since OpenGL programs are substantially faster than a ray tracer, a number is set as the maximum number of points produced per second, which determines how many samples can be rendered each frame given the desired frame rate. By varying this number, I can also see how the system behaves under very low or very high pixel output.
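The per-frame sample budget described above is simple arithmetic. As a minimal sketch (the function name and the 20 frames-per-second target are illustrative assumptions, while 20,000 points per second is the rate quoted for the sampling map in the Results section):

```python
def samples_per_frame(max_points_per_second, target_fps):
    """Number of new samples the simulated renderer may deliver in one frame."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    # Round down: a partially rendered sample is of no use to the cache.
    return int(max_points_per_second // target_fps)


# Hypothetical setting: 20,000 points/sec at a desired 20 frames/sec.
budget = samples_per_frame(20000, 20)
```

Lowering max_points_per_second in this sketch mimics a slower ray tracer without changing anything else in the pipeline.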

2) Transformation and Projection:
Since the depth values read from the OpenGL depth buffer are in projection space (x, y, z values have all been projected through the projection matrix and clamped to [-1, 1]), the render cache also keeps its 3D points in projection space to avoid extra conversion computation. When there is no motion, there is no need to transform the points, so when the user stops moving, the process skips transformation and projection.

To do the transformation, the projection matrix P, the old modelview matrix Mo and the new modelview matrix Mn are read from OpenGL. For a point s, its transformed point (again in projection space) is s' = P*Mn*Mo.inv*P.inv*s. To simplify this, I concatenate the four matrices into a single matrix M = P*Mn*Mo.inv*P.inv and compute s' = M*s for every point in the cache.
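The concatenation can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the SSE code used in the project: matrices are 4x4 nested lists in row-major order applied to column vectors (matrices read back from OpenGL are column-major and would need transposing first), the helper names are hypothetical, and the divide by w after applying M is my assumption about how the homogeneous result is brought back into [-1, 1].

```python
def mat_mul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def mat_inv(m):
    """Invert a 4x4 matrix by Gauss-Jordan elimination with partial pivoting."""
    n = 4
    aug = [[float(v) for v in m[i]] + [1.0 if i == j else 0.0 for j in range(n)]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def reprojection_matrix(P, Mo, Mn):
    """M = P * Mn * Mo^-1 * P^-1, computed once per frame."""
    return mat_mul(mat_mul(P, Mn), mat_mul(mat_inv(Mo), mat_inv(P)))


def reproject(M, s):
    """Apply M to a cached point s = (x, y, z), then divide by w (assumed)."""
    x, y, z = s
    h = [M[i][0] * x + M[i][1] * y + M[i][2] * z + M[i][3] for i in range(4)]
    return (h[0] / h[3], h[1] / h[3], h[2] / h[3])
```

Building M once per frame means each cached point costs a single matrix-vector product, which is the part that benefits from vector instructions.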

As mentioned in the paper, transforming NxN points is expensive (N is the screen resolution; since the cache keeps slightly more points than there are screen pixels, it is actually more than NxN points). The authors propose using Intel's SSE2 instructions to improve performance through vector computation. Since my office computer has an AMD processor that only supports SSE instructions, I ended up doing the transformation with SSE. It is still very fast: for 512x512 points, a transformation takes only about 17-20 ms.

After transformation, visible points have x, y, z values within [-1, 1]. Points falling outside this range are simply marked as UNSEEN and kept in the cache. If new samples arrive and there is not enough space, a cache_clean routine is called to clear all UNSEEN points.
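A minimal sketch of this bookkeeping is shown below. Only the names UNSEEN and cache_clean come from the text above; the dictionary layout of a cache entry, the VISIBLE state, the add_samples helper, and the choice to drop new samples that still do not fit after cleaning are assumptions made for illustration.

```python
UNSEEN = "unseen"
VISIBLE = "visible"  # hypothetical name for the complementary state


def classify(point):
    """A point is visible only if x, y and z all lie within [-1, 1]."""
    return VISIBLE if all(-1.0 <= c <= 1.0 for c in point) else UNSEEN


def cache_clean(cache):
    """Remove every UNSEEN entry in place; return how many slots were freed."""
    before = len(cache)
    cache[:] = [entry for entry in cache if entry["state"] != UNSEEN]
    return before - len(cache)


def add_samples(cache, new_points, capacity):
    """Append new samples, cleaning the cache first if it would overflow."""
    if len(cache) + len(new_points) > capacity:
        cache_clean(cache)
    room = max(capacity - len(cache), 0)
    # Assumption: samples that still do not fit are dropped.
    for p in new_points[:room]:
        cache.append({"pos": p, "state": classify(p)})
    return cache
```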

3) Compute Priorities:
In their paper, they have a 'depth cull' stage after projection, which removes points that become occluded after projection. I added this stage and experimented with it, but it makes little difference compared with not having it at all. There are two reasons. First, the depth cull is mostly needed when an object is suddenly occluded by another object, for example when the user moves the viewpoint backward through a door, but this does not happen often. Second, even when it happens, the simple depth cull test in the paper does not solve the problem very well, since it only checks the 3x3 neighbors and depends on some threshold values. So the depth cull might work for a synthetic scene such as the sphere demo, but it isn't very useful in real scenes.

Priorities are computed from the age of a point. Age is a property of a cache point, while priority is a property of an image pixel; after projection the process leaves cache space and enters image space, so the age-based priority is assigned during the projection stage, when a 3D point is mapped to a 2D image pixel. Priority also depends on a density estimate: the number of non-empty pixels among the 3x3 neighbors. The current formula is: for empty pixels, the priority is linearly scaled within [0, 0.5] based on the density estimate; for non-empty pixels, it is linearly scaled within [0.5, 1] based on age. Lower values mean higher priority to sample. This means empty pixels (pixels that have no 3D point mapped to them) always get higher priority than non-empty pixels. This guarantees that whenever black (empty) pixels exist, they are sampled first; then pixels that are dirty (old) are updated. So the process always tries to fill holes first and then refine the image.
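The two-range formula could be written roughly as follows. The text fixes the ranges and the rule that lower values are sampled first, but not the direction of the density mapping or the age at which a pixel counts as fully stale, so this sketch assumes that emptier neighborhoods get lower values and uses a hypothetical max_age parameter.

```python
def pixel_priority(is_empty, filled_neighbors=0, age=0, max_age=16):
    """Return a priority value in [0, 1]; lower values are sampled first.

    filled_neighbors: non-empty pixels among the 8 surrounding pixels.
    max_age: hypothetical age (in frames) at which a pixel is fully stale.
    """
    if is_empty:
        # Assumption: the sparser the neighborhood, the more urgent the hole.
        density = min(max(filled_neighbors, 0), 8) / 8.0
        return 0.5 * density
    # A fresh pixel (age 0) maps to 1.0; a fully stale one maps to 0.5.
    staleness = min(max(age, 0), max_age) / float(max_age)
    return 1.0 - 0.5 * staleness
```

With this mapping the largest value an empty pixel can receive (0.5) equals the smallest value a non-empty pixel can receive, so holes are never ranked behind old pixels.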

4) Sampling Priority Map:
Sampling the priority map has two goals that somewhat contradict each other: pixels with higher priority should be sampled first, and samples should be evenly distributed across the image. Pixels with high priority usually tend to cluster together, so sampling purely by priority gives poor spatial distribution, while sampling purely evenly disregards priority altogether.

I tried a new strategy based on the hit-and-test approach in Tole's 2002 SIGGRAPH paper. First, pixels with priority less than 1 are gathered into a sample pool (priority = 1 means the pixel is newly rendered and the user hasn't moved since then, so there is no need to sample it). Second, a random point is selected from the pool. Third, a random reference number is generated and compared with the point's priority. If the priority is less than the reference number, the point is selected; it is then swapped with the last element in the sample pool, and the pool size decreases by one (this guarantees that no point is selected twice). If the point is not selected, step two is repeated. The loop continues until enough sample points have been collected. In practice, when the sample pool is small, this selection process performs poorly; in that case, priorities are simply discarded and samples are taken spatially uniformly.
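The selection loop above can be sketched as below. The min_pool threshold and the cap on attempts are hypothetical safeguards that the text does not specify, and the fallback draws uniformly at random from the pool as a stand-in for spatially uniform sampling.

```python
import random


def select_samples(priorities, budget, rng=None, min_pool=64, max_tries_per_sample=32):
    """Pick up to `budget` distinct pixel indices; lower priority values win more often."""
    rng = rng or random.Random()
    # Step 1: pixels with priority 1 are fresh and never enter the pool.
    pool = [i for i, p in enumerate(priorities) if p < 1.0]
    budget = min(budget, len(pool))
    if len(pool) < min_pool:
        # Small pool: ignore priorities and sample uniformly (assumed threshold).
        return rng.sample(pool, budget)

    chosen = []
    size = len(pool)                      # live part of the pool is pool[:size]
    tries_left = budget * max_tries_per_sample
    while len(chosen) < budget and tries_left > 0:
        tries_left -= 1
        k = rng.randrange(size)           # Step 2: random candidate
        if priorities[pool[k]] < rng.random():   # Step 3: hit test
            chosen.append(pool[k])
            # Swap with the last live element and shrink the pool.
            pool[k], pool[size - 1] = pool[size - 1], pool[k]
            size -= 1
    if len(chosen) < budget:
        # Attempt cap reached: top up uniformly from what is left.
        chosen.extend(rng.sample(pool[:size], budget - len(chosen)))
    return chosen
```

A pixel with priority value p passes the hit test with probability 1 - p, so empty pixels (values at or below 0.5) are accepted at least half the time, while nearly fresh pixels are rarely picked.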

Next, values for the selected pixels are requested from the renderer, and these samples are added to the cache immediately, so there is no one-frame delay as in the render cache paper.

5) Image Reconstruction:
One problem with the render cache paper is that their simple reconstruction (a box filter) tends to blur the image everywhere, even at non-empty pixels. Instead, I use a reconstruction kernel weighted by the reciprocal of the distance between two pixels. As a result, non-empty pixels are not blurred at all, and ideally empty pixels appear to be bilinearly interpolated from neighboring non-empty pixels. For very sparse samples, of course, it won't work very well. Ideally I could construct a Delaunay triangulation of the sparse samples and render them as smooth-shaded polygons, but triangulating unorganized points is very expensive and hence not practical in an interactive environment.
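A minimal sketch of a reciprocal-distance kernel follows, assuming a single-channel image stored as nested lists with None marking empty pixels, and assuming the weights are normalized by their sum. The default radius of 2 corresponds to the 5x5 kernel mentioned in the Results section.

```python
import math


def reconstruct(image, radius=2):
    """Fill empty pixels (None) from non-empty neighbors, weighted by 1/distance."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            if image[y][x] is not None:
                continue  # non-empty pixels are copied through unblurred
            total = 0.0
            weight_sum = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and image[ny][nx] is not None:
                        # The center pixel is empty, so the distance is never zero here.
                        wgt = 1.0 / math.hypot(dx, dy)
                        total += wgt * image[ny][nx]
                        weight_sum += wgt
            if weight_sum > 0.0:
                out[y][x] = total / weight_sum
            # Otherwise no sample lies within the kernel and the pixel stays empty.
    return out
```

For a color image the same weights would be applied to each channel separately.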

Results

1) Renderer:

2) Render cache without image reconstruction:

3) Render cache with full image reconstruction: (5x5 kernel)

4) Priority Map:

5) Sampling Map: (assuming a renderer with 20,000 pixel output / sec)