Steve Baker
Kevin Dale
Michael Holroyd

Helmholtz Stereopsis
Computer Vision Final Project

1. Introduction
An important area of focus in computer vision is the process of recovering the 3-D structure of a scene from photographs. The traditional approach in solving this problem is to take two (or more) calibrated images of a scene and identify 2-dimensional points in each image that correspond to the same 3-dimensional point in the scene. Given this information, the 3-dimensional point can be placed in the scene using simple triangulation.
Unfortunately, the problem of finding corresponding points is very difficult in general. A common technique for simplifying this matching process is to assume that all surfaces in the scene are perfect Lambertian reflectors. In this case, the radiance due to any point in the scene is the same regardless of the imaging position, and so the problem reduces to finding pixels of identical intensity in each image. This assumption is made frequently even in more complex such as structure from motion [2] and photometric stereo [3]. Some techniques allow a broader range of materials, but require that the bidirectional reflectance distribution function (BRDF) be known in advance.
Although this assumption is valid for many scenes of interest, there are many cases in which it fails. In particular, points on an object that exhibit strong specular highlights can vary greatly in intensity even for very similar camera positions. A new technique named Helmholtz stereopsis developed in [1] is a variant on multinocular stereopsis that relies on the reciprocity of light to find corresponding image points independently from the BRDF of objects in the scene.
Although the matching procedure in Helmholtz stereopsis does not depend explicitly on the BRDF, in this project we investigate the impact of different reflection models on the overall performance of Helmholtz stereopsis using synthetic data with known ground truth. We also acquire a series of images using very roughly calibrated light and camera positions (positioned and swapped by hand) and examine how sensitive Helmholtz stereopsis is to precise calibration

2. The reciprocity constraint
Consider a camera positioned at ol and a point light source at or. Let vl and vr be unit vectors pointing toward ol and or respectively. Then, for a given point pin the scene the irradiance is computed simply as

i_l = f_r(v_r,v_l) \frac{n \cdot v_r}{|o_r - p|^2}

where fr is the BRDF and n is the surface normal. The numerator corresponds to the familiar cosine falloff due to the angle made between surface and the incoming light, and the denominator to the r2 falloff due to the distance from the light source. Now, if we switch the positions of the camera and light, the same point in the scene is imaged as
i_r = f_r(v_l,v_r) \frac{n \cdot v_l}{|o_l - p|^2}

Helmholtz reciprocity requires that light be reflected equally when the direction of the light and camera are switched (this is what allows tracing rays from the eye instead of the light in modern ray-tracing), and so fr(vl,vr) = fr(vr,vl).
The key observation of Helmholtz stereopsis is that we can now combine the above equations:
i_l \frac{|o_r - p|^2}{n \cdot v_r} = i_r \frac{|o_l - p|^2}{n \cdot v_l}
i_l \frac{n \cdot v_l}{|o_l - p|^2} = i_r \frac{n \cdot v_r}{|o_r - p|^2}
(i_l \frac{v_l}{|o_l - p|^2} - i_r \frac{v_r}{|o_r - p|^2}) \cdot n = 0
w(d) \cdot n= 0

where we write w(p) to emphasize that only the point p and the normal at that point n are the only unknowns in this equation. At any point actually on the surface of an object in the scene, this Helmholtz constraint must hold. Most importantly, notice that the BRDF has been completely removed from this equation. As a result, specular highlights appear fixed to the surface of objects, and areas occluded in one view are not illuminated in the other. Here is an example showing these effects for a very shiny plastic material:

3. Implementation
The Helmholtz constraint gives us a way to check if a proposed point in the scene is plausible or not. Given a point p we construct the following equation:

\begin{bmatrix} w_1(p) \\ w_2(p) \\ \vdots \\ w_N(p) \end{bmatrix} \cdot n = 0

Where each wi(p) is the Helmholtz constraint for a reciprocal image pair. Since n has three unknowns, we require at least 3 reciprocal pairs of images to exploit this equation. In practice we take multiple reciprocal pairs and find a least-squared error approximation of n using SVD.
In practice due to noise in the measurements the solution is not usually perfect. We estimate how well a point matched the constraint by computing the singular value decomposition of the entire W matrix and taking \frac{\sigma_2}{\sigma_3}as the matching measure. Notice that in the perfect case, the third singular value is 0, because n defines the 1-dimensional null-space of W. This is a better alternative to simply measuring |W \cdot n|, since that norm will be biased toward areas of low intensity.

The remaining step is to determine how we select candidate points in 3-dimensional space. Since our desired output is a depth map, we define a virtual orthographic reference camera at the mean position of all reciprocal pairs, and test points at discrete depths along a ray passing through each pixel. The result of this computation is a large 3-dimensional matrix that is m pixels wide, n pixels tall, and D pixels deep, where D is the number of discrete depths we sample. Each entry in the matrix is the matching measure determined for that point.
Although the matching constraint is a necessary condition for a point on the surface, there is no guarantee that points off the surface will not also match the constraint. Although the opportunity for a more advanced algorithm is clearly available, we choose to follow [1] and blur this matrix along the x and y dimensions, making the assumption that surfaces change slowly across the image. Then, for each pixel in the virtual reference camera we select the best match, and assign the corresponding depth and normal to the final result.

4. Results
We tested our implementation on two synthetic datasets rendered with pbrt [4]. Results for both test scenes are shown below, with a depth map (right) and quiver plot showing the recovered normals (center, down-sampled for visualization):

Original scene Recovered normals Recovered depths

Here's ground truth for the Buddha data set:

Although the recovered depths are relatively coarse due to blurring the D matrix, precise normals are acquired at every individual pixel.
By constructing these synthetic scenes we are able to precisely measure the error due to Helmholtz stereopsis for a variety of materials. The results from this experiment are shown below for several reflectance models built into pbrt.

These results were somewhat surprising, since we initially hypothesized that the algorithm's performance would improve with increasing complexity in the BRDF. Here, however, we see poor performance on the shiny metal and plastic materials, both of which have a strong glossy component. The metal material contains an additional mirror component. The substrate material is anisotropic, and the skin, based on measured data. We attribute the relatively poor performance of the algorithm on the plastic and metal examples here to sampling error, which we discuss in Sec. 6.
We also evaluated the algorithm's performance for different degrees of specularity and anisotropy. We first generated multiple input data sets of both scenes, varying the degree of glossiness in the plastic material. Results are shown below.

The error in depth estimation for the Buddha data set agrees with our original hypothesis (there's no noticable trend for the 'Roo data). However, the opposite trend is apparent in the normal error across both test scenes. We address this distinction in Sec. 6.
We performed two similar experiments to evaluate this technique for varying degrees of anisotropy. Here, we generated data sets for predominantly-glossy and predominantly-diffuse substrate materials, varying the degree of anisotropy in each. We found the algorithm to be quite robust to anisotropic materials. The standard deviation of both normal and depth estimation error across both test scenes was less than 1% of the mean. This is especially interesting since these effects frequently confuse traditional techniques that expect materials to appear similar from all angles.

5. Reconstruction:
As in the original paper [1], we use the method of Chellappa and Frankot [5] to reconstruct an improved depth estimate by integrating the normal field. Results for the Buddha model are shown below.

6. Discussion:
Examining the performance of Helmholtz stereopsis on different materials has resulted in some seemingly contradicting results. However, these discrepancies are likely due to sampling error. Here we're essentially band-limited by both the resolution of the input images and depth tests, so high-frequency components in both the BRDF and the scene geometry beyond the Nyquist limit lead to aliasing when (re)sampling the irradiance in the input images.
The large error in both depths and normals for the metal and plastic materials are most likely due to this sampling error. Smoothing out the D matrix, however, acts as a low-pass filter for input to the depth approximation. It is possible that using a larger window size during this step would improve the results for both of these shiny materials.
Smoothing of the D matrix only effects the depth approximation, however. Aliasing in the image irradiance is still carried through to normal estimation. This would explain the specularity test results; for the Buddha data set, depth estimation error decreases substantially with increasing specularity (there is no clear trend for the killeroo data set). However, the opposite trend can be seen in normal approximation error for both data sets. That is, the window size was sufficient to avoid aliasing artifacts that would otherwise corrupt the depth estimate, but the aliasing did negatively impact the normal estimation. Intelligent filtering of the irradiance to account for this phenomenon is an item for future work.
The most promising aspect of this technique is the relatively low error regardless of the specific BRDF. There are several obvious improvements to the algorithm. Again, careful filtering of the input images when sampling at projected 3D candidate positions could improve normal estimation for complex geometry and materials. Additionally, more efficient guessing of candidate depths along each ray could improve the overall efficiency. Finally, an iterative procedure which refines the depth estimation based on the recovered normals could result in an improved inital depth estimate.

6. Future work:
A dataset was acquired using a real camera (seen in the diagram earlier). Unfortunately we ran out of time before successfully reconstructing the scene with the algorithm. Certainly we would like to finish this and examine how sensitive Helmholtz stereopsis is to the calibration procedure.

[1] Zickler, Todd; Belhumeur, Peter; Kreigman, David (2002), Helmholtz Stereopsis: Exploiting Reciprocity for Surface Reconstruction
[2] Webb, J.A. (1981), Shape and structure from motion of objects. Ph.D. Thesis, University of Texas at Austin.
[3] Horn, Berthold K.P. (1975), Obtaining Shape from Shading Information. Shape from Shading pp. 123-173.
[4] Pharr, M; Humphreys, G. (2004), Physically Based Rendering: From Theory to Implementation.
[5] Chellapppa, R; Frankot, R.T (1987), A Method for Enforcing Integrability in Shape from Shading Algorithms. International Conference on Computer Vision, pp. 118-128.