851 Interactive Ray Tracing
The goal of this project is to implement the idea of the "render
cache" (see the paper here) and to design better sampling and
reconstruction strategies. The basic idea of the render cache is
to re-use rendered pixels from previous frames to give a quicker
response from the renderer. The render cache keeps a cache of
rendered 3D points; during motion, cached points are transformed
and projected to screen space to form new rendered pixels. Some
screen pixels may have no cached point mapping to them, and cached
points need to be updated to get new values, so a sampling stage
computes the places on screen where new samples are most needed
and asks the renderer to render those points. Finally, a frame is
reconstructed from the possibly sparse samples on screen.
In their original paper, the process looks like this:
There are a few problems with it:
Although the sampling strategy is priority-based, it is a simple
priority error-diffusion dithering algorithm. It does not solve
well the problem that sampling should be both priority-driven
and evenly distributed in space.
Smooth interpolation does not work very well for reconstructing
an image from sparse samples. In fact, their simple filtering
method blurs the image and gives undesirable results.
Newly requested samples are added to the cache in the next frame
instead of the current one. When the user moves the viewpoint
continuously, this delays the display of new samples.
My new design tries to solve the above problems; the process now works as follows:
For simplicity and ease of control,
the renderer is just an OpenGL application. I added all necessary
code to parse .aff scene files (the BART testing scenes), and I
used the kitchen scene as the test scene. For each frame, the
color buffer and depth buffer are read back using glReadPixels,
and then the render cache is invoked; during the sampling stage,
pixels that need samples read their information from the color
and depth buffers. This simulates a real ray tracer, where the
renderer computes the needed pixel values. Since OpenGL programs
are substantially faster than a ray tracer, a cap is set on the
maximum number of points produced per second, which determines
how many samples can be rendered each frame given the desired
frame rate. This also lets me vary the number to see how the
system behaves under very low or very high pixel output.
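The per-frame sample budget described above is simple arithmetic; a minimal sketch, with illustrative names of my own choosing:

```python
# Hypothetical helper: convert a renderer's pixel-throughput cap into a
# per-frame sample quota. The function name and integer division are
# assumptions; the source only describes the idea.
def samples_per_frame(max_points_per_sec, target_fps):
    """Number of new samples the renderer may produce in one frame."""
    return max_points_per_sec // target_fps
```

For instance, a renderer capped at 20,000 points/sec running at 10 fps would be allowed 2,000 new samples per frame.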
Transformation and Projection:
Since the depth values read from the OpenGL
depth buffer are in projection space (the x, y, z values are all
projected through the projection matrix and clamped to [-1, 1]),
to avoid extra conversion computation, the render cache also
keeps 3D points in projection space. When there is no motion
there is no need to transform the points, so when the user stops
moving, the process skips transformation and projection. To do
the transformation, the projection matrix P, the old modelview
matrix Mo, and the new modelview matrix Mn are read from OpenGL;
for a point s, its transformed point (again in projection space)
is s' = P*Mn*Mo.inv*P.inv*s. To simplify this, I concatenate the
four matrices into a single matrix M and compute s' = M*s for
every point in the cache.
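The update above can be sketched as follows (using NumPy in place of the SSE code; the function name and the (N,4) point layout are assumptions):

```python
import numpy as np

# Sketch of the cached-point update: build M = P * Mn * Mo^-1 * P^-1 once,
# then apply the single concatenated matrix to every homogeneous
# projection-space point in the cache.
def transform_cache(points, P, Mo, Mn):
    """points: (N, 4) homogeneous projection-space points; returns (N, 4)."""
    M = P @ Mn @ np.linalg.inv(Mo) @ np.linalg.inv(P)
    # One matrix product per point instead of four.
    return points @ M.T
```

When Mo equals Mn (no motion), M reduces to the identity and the points are unchanged, which is why the transformation step can be skipped entirely in that case.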
As mentioned in the paper, transforming NxN points (N is the
screen resolution; since the cache keeps slightly more than the
screen pixels, it is actually more than NxN points) is expensive.
The paper proposed using Intel's SSE2 instructions to improve
performance through vector calculation. Since my office computer
is an AMD machine that only supports SSE instructions, I ended up
doing the transformation with SSE. It is still very fast: for
512x512 points, a transformation takes only about 17-20 ms.
After transformation, the x, y, z values are clamped to [-1, 1].
Points falling outside this range are simply marked as UNSEEN and
kept in the cache. If new samples arrive and there is not enough
space, a cache_clean routine is called to clear all UNSEEN points.
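A minimal sketch of the UNSEEN marking and the cache_clean pass; the per-point dictionary layout is an assumption for illustration:

```python
# Flag cached points whose projected coordinates leave the [-1, 1] cube;
# they are kept in the cache until space is needed.
def mark_unseen(points):
    for p in points:
        if any(abs(c) > 1.0 for c in (p["x"], p["y"], p["z"])):
            p["unseen"] = True

# cache_clean: drop all UNSEEN points to make room for new samples.
def cache_clean(points):
    return [p for p in points if not p.get("unseen")]
```

Keeping UNSEEN points around until space runs out means a point that briefly leaves the view frustum can be reused if the user moves back, at the cost of a slightly larger cache.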
In their paper, they have a 'depth cull'
stage after projection, which removes points that are occluded
after projection. I added this part and experimented with it,
but it makes little difference compared to not having it at all.
This happens for two reasons: the depth cull is mostly needed
when an object is suddenly occluded by another object, for
example because the user moves the viewpoint backward through a
door, but this does not happen frequently; and even when it does,
the simple depth cull test in the paper does not solve the
problem very well, since it only checks the 3x3 neighbors and
depends on threshold values. So the depth cull might work for a
synthetic scene such as the sphere demo, but it is not very
useful in real scenes.
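One plausible reading of the paper's simple test is sketched below; the exact comparison and the threshold value are assumptions, which is part of why the test is fragile:

```python
# Sketch of a 3x3-neighbor depth cull: a projected point is culled when
# its depth exceeds the closest depth among its screen neighbors by more
# than a threshold. None marks empty depth-buffer entries.
def depth_cull(z_buffer, x, y, z, threshold=0.05):
    h, w = len(z_buffer), len(z_buffer[0])
    neighbours = [z_buffer[j][i]
                  for j in range(max(0, y - 1), min(h, y + 2))
                  for i in range(max(0, x - 1), min(w, x + 2))
                  if z_buffer[j][i] is not None]
    if not neighbours:
        return False  # nothing to compare against; keep the point
    return z > min(neighbours) + threshold
```

The dependence on a hand-picked threshold is exactly the weakness noted above: too small and valid points on slanted surfaces get culled, too large and genuinely occluded points survive.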
Priorities are computed based on the age of a point (age is a
property of a cache point, while priority is a property of an
image point; note that after projection the process leaves cache
space and enters image space, so the age-based priority is
assigned during the projection stage, when a 3D point is mapped
to a 2D image pixel). Priority also depends on a density estimate:
the number of non-empty pixels among the 3x3 neighbors. The
current formula is: for empty pixels, priority is linearly scaled
in [0, 0.5] based on the density estimate; for non-empty pixels,
it is linearly scaled in [0.5, 1] based on age. Here lower
priority values mean higher priority to sample. This means empty
pixels (pixels with no 3D point mapping to them) always get
higher priority than non-empty pixels, which guarantees that when
black (empty) pixels exist, they are sampled first; then pixels
that are dirty (old) are updated. So the process always tries to
fill holes first and then refines the image.
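The priority rule can be sketched as below. The direction of the density scaling (denser neighborhoods being less urgent, since interpolation can fill them) and the max_age cap are my assumptions; the source only fixes the two ranges:

```python
# Priority of an image pixel; LOWER values are sampled SOONER.
# Empty pixels map density (non-empty 3x3 neighbors, 0..9) into [0, 0.5];
# non-empty pixels map age into [0.5, 1], fresh pixels getting 1.0.
def priority(empty, density, age, max_age=60):
    if empty:
        return 0.5 * density / 9.0
    return 1.0 - 0.5 * min(age, max_age) / max_age
```

Because the empty-pixel range [0, 0.5] sits entirely below the non-empty range [0.5, 1], every hole outranks every dirty pixel, which is exactly the fill-holes-first behavior described above.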
Sampling Priority Map:
The problem of sampling the priority map
has two goals that are somewhat contradictory: pixels with
higher priorities should be sampled first, and sampling should
be evenly distributed across the image. Pixels with high
priorities tend to cluster together, so sampling purely by
priority gives poor spatial distribution, while sampling purely
evenly disregards priority altogether. I tried a new strategy
based on the hit-and-test approach in Tole's 2002 Siggraph paper:
first, pixels with priority less than 1 are gathered into a
sample pool (priority = 1 means the pixel was newly rendered and
the user has not moved since, so there is no need to sample it);
second, a random point is selected from the pool; third, a random
reference number is generated and compared with the point's
priority; if the priority is less than the reference number, the
point is successfully selected, and it is swapped with the last
element in the sample pool while the pool size decreases by one
(this guarantees that no point is selected twice); if it is not
selected, step two is repeated, until we have enough sample
points. In practice, when the sample pool is small this selection
process performs poorly, and in that case priorities are simply
discarded and samples are taken spatially uniformly.
Color values for the selected pixels are requested from the
renderer, and these samples are added immediately, so there is
no one-frame delay as in the render cache paper.
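The hit-and-test selection can be sketched as follows. The attempt cap standing in for the small-pool fallback, and the function signature, are assumptions:

```python
import random

# Hit-and-test selection: draw a random candidate from the live prefix of
# the pool, accept it when its priority beats a random reference value,
# and swap accepted pixels behind the live prefix so no pixel is drawn
# twice. Lower priority values are accepted more often.
def select_samples(pool, priorities, n, rng=random.random):
    end = len(pool)          # live prefix: pool[0:end] are still candidates
    chosen = []
    attempts = 0
    while len(chosen) < n and end > 0 and attempts < 10 * n:
        attempts += 1
        i = random.randrange(end)
        if priorities[pool[i]] < rng():
            chosen.append(pool[i])
            end -= 1
            pool[i], pool[end] = pool[end], pool[i]  # retire the pixel
    # In practice, if the loop stalls (small pool, high priorities), the
    # remaining quota would be filled spatially uniformly instead.
    return chosen
```

Passing `rng` as a parameter is only for testability; the real reference numbers are uniform random draws.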
One problem with the render cache paper
is that their simple reconstruction (a box filter) tends to blur
the image everywhere, even at non-empty pixels. So instead, I use
a reconstruction kernel weighted by the reciprocal of the
distance between two pixels. As a result, non-empty pixels are
not blurred at all, and empty pixels appear to be bilinearly
interpolated from neighboring non-empty pixels. Certainly for
very sparse samples it does not work very well. Ideally I could
construct a Delaunay triangulation of the sparse samples and
render them as smooth polygons, but triangulation of unorganized
points is very expensive and hence not practical for interactive
rendering.
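A minimal sketch of the inverse-distance-weighted reconstruction; the support radius and the single-channel color are assumptions for brevity:

```python
# Reconstruct one pixel from sparse samples with weights 1/distance.
# A pixel that already holds a sample keeps its exact color (the weight
# diverges at distance 0), so non-empty pixels are never blurred; empty
# pixels blend nearby samples.
def reconstruct_pixel(samples, x, y, radius=1.5):
    """samples: dict (x, y) -> color (float); returns color or None."""
    if (x, y) in samples:
        return samples[(x, y)]
    wsum = csum = 0.0
    for (sx, sy), c in samples.items():
        d = ((sx - x) ** 2 + (sy - y) ** 2) ** 0.5
        if d <= radius:
            w = 1.0 / d
            wsum += w
            csum += w * c
    return csum / wsum if wsum else None  # None: hole too far from samples
```

An empty pixel midway between two equal-distance samples comes out as their average, which is the bilinear-looking behavior described above.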
Render cache without image reconstruction:
Render cache with full image reconstruction:
Sampling Map: (assuming a renderer with an output of
20,000 pixels / sec)