Steve Baker, Kevin Dale, Michael Holroyd
Helmholtz Stereopsis
Computer Vision Final Project
1. Introduction
An important area of focus in computer vision is the process of recovering the 3-D structure of a scene from photographs. The traditional approach to solving this problem is to take two (or more) calibrated images of a scene and identify 2-dimensional points in each image that correspond to the same 3-dimensional point in the scene. Given these correspondences, the 3-dimensional point can be placed in the scene by simple triangulation.
Unfortunately, the problem of finding corresponding points is very difficult in general. A common technique for simplifying this matching process is to assume that all surfaces in the scene are perfect Lambertian reflectors. In this case, the radiance due to any point in the scene is the same regardless of the imaging position, and so the problem reduces to finding pixels of identical intensity in each image. This assumption is made frequently even in more complex techniques such as structure from motion [2] and photometric stereo [3]. Some techniques allow a broader range of materials, but require that the bidirectional reflectance distribution function (BRDF) be known in advance.
Although this assumption is valid for many scenes of interest, there are many cases in which it fails. In particular, points on an object that exhibit strong specular highlights can vary greatly in intensity even for very similar camera positions. Helmholtz stereopsis, a technique developed in [1], is a variant on multinocular stereopsis that relies on the reciprocity of light to find corresponding image points independently of the BRDF of objects in the scene.
Although the matching procedure in Helmholtz stereopsis does not depend explicitly on the BRDF, in this project we investigate the impact of different reflection models on the overall performance of Helmholtz stereopsis using synthetic data with known ground truth. We also acquire a series of images using very roughly calibrated light and camera positions (positioned and swapped by hand) and examine how sensitive Helmholtz stereopsis is to precise calibration.
2. The reciprocity constraint
Consider a camera positioned at $o_l$ and a point light source at $o_r$. Let $v_l$ and $v_r$ be unit vectors pointing toward $o_l$ and $o_r$ respectively. Then, for a given point $p$ in the scene, the irradiance measured at the camera is computed simply as

$$ i_l = f_r(v_r, v_l)\,\frac{n \cdot v_r}{|o_r - p|^2} $$
where $f_r$ is the BRDF and $n$ is the surface normal. The numerator corresponds to the familiar cosine falloff due to the angle made between the surface and the incoming light, and the denominator to the $1/r^2$ falloff due to the distance from the light source. Now, if we switch the positions of the camera and light, the same point in the scene is imaged as

$$ i_r = f_r(v_l, v_r)\,\frac{n \cdot v_l}{|o_l - p|^2}. $$
Helmholtz reciprocity requires that light be reflected equally when the directions of the light and camera are switched (this is what allows tracing rays from the eye instead of the light in modern ray tracing), and so $f_r(v_l, v_r) = f_r(v_r, v_l)$.
The key observation of Helmholtz stereopsis is that we can now combine the above equations:
$$ w(p) \cdot n = \left( i_l\,\frac{v_l}{|o_l - p|^2} - i_r\,\frac{v_r}{|o_r - p|^2} \right) \cdot n = 0, $$
where we write $w(p)$ to emphasize that the point $p$ and the normal $n$ at that point are the only unknowns in this equation. This Helmholtz constraint must hold at any point actually on the surface of an object in the scene. Most importantly, notice that the BRDF has been completely removed from this equation. As a result, specular highlights appear fixed to the surface of objects, and areas occluded in one view are not illuminated in the other. Reciprocal image pairs of a very shiny plastic material show these effects clearly.
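This BRDF independence is easy to check numerically for a single reciprocal pair. The sketch below is our own illustration rather than code from [1]; the `unit` and toy `brdf` helpers are invented for the example. It simulates one reciprocal pair with an arbitrary symmetric BRDF and confirms that $w(p) \cdot n$ vanishes:

```python
import numpy as np

# Toy check of the Helmholtz constraint for one reciprocal pair.
def unit(v):
    return v / np.linalg.norm(v)

p = np.array([0.0, 0.0, 0.0])           # surface point
n = unit(np.array([0.1, 0.2, 1.0]))     # surface normal at p
o_l = np.array([-1.0, 0.0, 2.0])        # position 1 (camera, then light)
o_r = np.array([1.5, 0.5, 2.5])         # position 2 (light, then camera)

v_l, v_r = unit(o_l - p), unit(o_r - p)
d_l2, d_r2 = np.sum((o_l - p) ** 2), np.sum((o_r - p) ** 2)

def brdf(v_in, v_out):
    # Arbitrary toy BRDF; all that matters here is reciprocity (symmetry in its arguments).
    h = unit(v_in + v_out)
    return 0.3 + 2.0 * max(np.dot(n, h), 0.0) ** 50

# Reciprocal pair: camera at o_l with light at o_r, then the two positions are swapped.
i_l = brdf(v_r, v_l) * np.dot(n, v_r) / d_r2
i_r = brdf(v_l, v_r) * np.dot(n, v_l) / d_l2

w = i_l * v_l / d_l2 - i_r * v_r / d_r2
print(np.dot(w, n))   # ~0 (up to floating point), regardless of the BRDF chosen above
```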
3. Implementation
The Helmholtz constraint gives us a way to check whether a proposed point in the scene is plausible. Given a point $p$ and $N$ reciprocal image pairs, we construct the following equation:

$$ W(p)\,n = \begin{bmatrix} w_1(p)^T \\ \vdots \\ w_N(p)^T \end{bmatrix} n = 0, $$

where each $w_i(p)$ is the Helmholtz constraint for a reciprocal image pair. Since $n$ has three unknowns, we require at least 3 reciprocal pairs of images to exploit this equation. In practice we take multiple reciprocal pairs and find a least-squared-error approximation of $n$ using the SVD.
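A minimal sketch of this step, assuming the constraint vectors for each pair have already been stacked into the rows of $W$ (the helper name `estimate_normal` is ours, not from [1]):

```python
import numpy as np

def estimate_normal(W):
    """W: (N, 3) matrix whose rows are the constraint vectors w_i(p)."""
    _, s, Vt = np.linalg.svd(W)
    n_hat = Vt[-1]           # right singular vector for the smallest singular value
    if n_hat[2] < 0:         # the sign is arbitrary; orient toward the reference camera
        n_hat = -n_hat
    return n_hat, s          # singular values are reused below as a matching measure
```

Because the system is homogeneous, the recovered normal is only defined up to sign; here we arbitrarily flip it toward the viewer.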
Due to noise in the measurements, the solution is usually not exact. We estimate how well a point matches the constraint by computing the singular value decomposition of the entire $W$ matrix and taking the ratio of singular values

$$ \frac{\sigma_3}{\sigma_2} $$

as the matching measure. Notice that in the perfect case the third singular value is 0, because $n$ defines the 1-dimensional null space of $W$. This is a better alternative to simply measuring the residual norm $\|W\hat{n}\|$, since that norm will be biased toward areas of low intensity.

The remaining step is to determine how we select candidate points in 3-dimensional space. Since our desired output is a depth map, we define a virtual orthographic reference camera at the mean position of all reciprocal pairs, and test points at discrete depths along a ray passing through each pixel. The result of this computation is a large 3-dimensional matrix that is m pixels wide, n pixels tall, and D pixels deep, where D is the number of discrete depths we sample. Each entry in the matrix is the matching measure determined for that point.
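In code, the depth sweep looks roughly like the following sketch. It is not our exact implementation; `sample_w` is a hypothetical routine that projects the candidate point for reference pixel (x, y) at depth index d into every reciprocal pair and returns the stacked (N, 3) constraint matrix:

```python
import numpy as np

def build_cost_volume(m, n, num_depths, sample_w):
    """Return the (n, m, D) matching-measure volume and the candidate normals."""
    D = np.full((n, m, num_depths), np.inf)
    normals = np.zeros((n, m, num_depths, 3))
    for y in range(n):
        for x in range(m):
            for d in range(num_depths):
                W = sample_w(x, y, d)                 # (N, 3) constraint vectors
                _, s, Vt = np.linalg.svd(W)
                D[y, x, d] = s[2] / (s[1] + 1e-12)    # ~0 for points on the surface
                normals[y, x, d] = Vt[-1]             # candidate normal at this depth
    return D, normals
```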
Although the matching constraint is a necessary condition for a point on the surface, there is no guarantee that points off the surface will not also satisfy it. Although a more advanced algorithm could clearly be built here, we choose to follow [1] and blur this matrix along the x and y dimensions, making the assumption that surfaces change slowly across the image. Then, for each pixel in the virtual reference camera we select the best match and assign the corresponding depth and normal to the final result, as sketched below.
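A sketch of this selection step, assuming the $\sigma_3/\sigma_2$ measure above (so smaller is better); the smoothing width is an arbitrary choice for illustration:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def select_depths(D, normals, sigma=2.0):
    """Smooth each depth slice across the image plane, then pick the best depth per pixel."""
    D_blur = gaussian_filter(D, sigma=(sigma, sigma, 0))  # blur over x and y only
    best = np.argmin(D_blur, axis=2)                      # index of the best-matching depth
    yy, xx = np.indices(best.shape)                       # row/column grids for indexing
    return best, normals[yy, xx, best]                    # per-pixel depth index and normal
```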
4. Results
We tested our implementation on two synthetic datasets rendered with pbrt [4]. Results for both test scenes are shown below, with a depth map (right) and a quiver plot showing the recovered normals (center, down-sampled for visualization):
| Original scene | Recovered normals | Recovered depths |
Here's ground truth for the Buddha data set:
Although the recovered depths are relatively coarse due to blurring the D matrix, precise normals are acquired at every individual pixel.
By constructing these synthetic scenes we are able to precisely measure the error due to Helmholtz stereopsis for a variety of materials. The results from this experiment are shown below for several reflectance models built into pbrt.
We also evaluated the algorithm's performance for different degrees of specularity and anisotropy. We first generated multiple input data sets of both scenes, varying the degree of glossiness in the plastic material. Results are shown below.
We performed two similar experiments to evaluate this technique for varying degrees of anisotropy. Here, we generated data sets for predominantly-glossy and predominantly-diffuse substrate materials, varying the degree of anisotropy in each. We found the algorithm to be quite robust to anisotropic materials. The standard deviation of both normal and depth estimation error across both test scenes was less than 1% of the mean. This is especially interesting since these effects frequently confuse traditional techniques that expect materials to appear similar from all angles.
5. Reconstruction
As in the original paper [1], we use the method of Frankot and Chellappa [5] to reconstruct an improved depth estimate by integrating the normal field. Results for the Buddha model are shown below.
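A minimal frequency-domain sketch of this integration, as we understand it from [5] (the function name and conventions are ours; boundary handling and the reference camera's pixel scale are glossed over):

```python
import numpy as np

def integrate_normals(normals):
    """normals: (H, W, 3) unit normals; returns a depth map up to an unknown offset."""
    nx, ny = normals[..., 0], normals[..., 1]
    nz = np.clip(normals[..., 2], 1e-6, None)        # avoid division by zero
    p, q = -nx / nz, -ny / nz                        # surface gradients dz/dx, dz/dy
    H, W = p.shape
    wx = 2 * np.pi * np.fft.fftfreq(W)
    wy = 2 * np.pi * np.fft.fftfreq(H)
    WX, WY = np.meshgrid(wx, wy)
    denom = WX**2 + WY**2
    denom[0, 0] = 1.0                                # avoid 0/0 at the DC term
    Z = (-1j * WX * np.fft.fft2(p) - 1j * WY * np.fft.fft2(q)) / denom
    Z[0, 0] = 0.0                                    # absolute depth is unrecoverable
    return np.real(np.fft.ifft2(Z))                  # integrable surface closest to (p, q)
```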
6. Discussion
Examining the performance of Helmholtz stereopsis on different materials has produced some seemingly contradictory results. These discrepancies are likely due to sampling error. Here we are essentially band-limited by the resolution of both the input images and the depth tests, so high-frequency components in the BRDF and the scene geometry beyond the Nyquist limit lead to aliasing when (re)sampling the irradiance in the input images.
The large error in both depths and normals for the metal and plastic materials is most likely due to this sampling error. Smoothing the D matrix, however, acts as a low-pass filter on the input to the depth approximation. It is possible that using a larger window size during this step would improve the results for both of these shiny materials. Smoothing of the D matrix only affects the depth approximation, however; aliasing in the image irradiance is still carried through to the normal estimation. This would explain the specularity test results;
for the Buddha data set, depth estimation error decreases
substantially with increasing specularity (there is no clear trend for
the killeroo data set). However, the opposite trend can be seen in
normal approximation error for both data sets. That is, the window
size was sufficient to avoid aliasing artifacts that would otherwise
corrupt the depth estimate, but the aliasing did negatively
impact the normal estimation. Intelligent filtering of the irradiance
to account for this phenomenon is an item for future work.
The most promising aspect of this technique is the relatively low error regardless of the specific BRDF. There are several obvious improvements to the algorithm. Again, careful filtering of the input images when sampling at projected 3-D candidate positions could improve normal estimation for complex geometry and materials. Additionally, a smarter choice of candidate depths along each ray could improve the overall efficiency. Finally, an iterative procedure that refines the depth estimate based on the recovered normals could improve on the initial depth estimate.
7. Future work
A dataset was also acquired using a real camera, with the light and camera positioned and swapped by hand as described in the introduction. Unfortunately, we ran out of time before successfully reconstructing the scene with the algorithm. We would certainly like to finish this experiment and examine how sensitive Helmholtz stereopsis is to the calibration procedure.
Bibliography
[1] Zickler, Todd; Belhumeur, Peter; Kriegman, David (2002), Helmholtz Stereopsis: Exploiting Reciprocity for Surface Reconstruction.
[2] Webb, J.A. (1981), Shape and Structure from Motion of Objects. Ph.D. Thesis, University of Texas at Austin.
[3] Horn, Berthold K.P. (1975), Obtaining Shape from Shading Information. Shape from Shading, pp. 123-173.
[4] Pharr, M.; Humphreys, G. (2004), Physically Based Rendering: From Theory to Implementation.
[5] Frankot, R.T.; Chellappa, R. (1987), A Method for Enforcing Integrability in Shape from Shading Algorithms. International Conference on Computer Vision, pp. 118-128.