Computer Vision
• Stereo
Due: November 20
Code
- To understand the notion and structure of disparity space, two test sets will be used, with 4 line segments in each. The 1D disparity will be computed along these line segments for each test image. This will result in a 2D function, where the 'x' axis corresponds to the distance along the line from the start of the line, and the 'y' axis corresponds to the disparity. The values encoded in this 2D plot represent the difference between a right image and a left image, given by the function:

- The resulting 2D function will be a measure of the closeness of a pixel in the right image to a pixel in the left image, for all disparities. In theory, this function is easy to understand, but implementing it has some side-effects. First, in order to have a true 2D function over some domain, all values of the function must be defined in that domain. But since the pictures have finite extent, they must be padded to allow for all of the disparity space to be considered. This introduces the second issue, which is: "How do we pad the image?" If we wish to preserve the notion of the equation above, values of infinity work the best.
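- The padding idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-pixel measure is taken to be a squared intensity difference, a feature at position x in the right scanline is assumed to appear at x + d in the left scanline, and the names `disparity_space`, `left`, `right` and `max_disp` are hypothetical. Positions that fall outside the image are treated as infinite cost, so a match against the padding can never be a minimum.

```python
import math

def disparity_space(left, right, max_disp):
    """Return dsi[d][x]: squared difference between right[x] and left[x + d].

    Samples that fall outside the left scanline are treated as padding and
    given infinite cost (an assumption that mirrors the infinity padding).
    """
    width = len(right)
    dsi = []
    for d in range(max_disp + 1):
        row = []
        for x in range(width):
            xl = x + d  # assumed matching position in the left scanline
            if 0 <= xl < len(left):
                row.append(float(right[x] - left[xl]) ** 2)
            else:
                row.append(math.inf)  # padded region
        dsi.append(row)
    return dsi

# Toy scanlines: the right line is the left line shifted by 2 pixels.
left = [0, 0, 5, 9, 5, 0, 0, 0]
right = [5, 9, 5, 0, 0, 0, 0, 0]
dsi = disparity_space(left, right, max_disp=3)
```

In this toy case the row `dsi[2]` is zero wherever both scanlines overlap and infinite where the lookup runs off the end of the left scanline.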
- Here is a series of images showing the line segments in the test image pairs, and the corresponding disparity maps:
- These images are generated from the function above and false-colored to make their structure easier to see.
- There are some features of these images that are important to realize. The first property is that for the first (200-offset) pixels, the 2D map has the same value. This is because the images are padded on the left and right and the repeated value is the difference between the pixel and the pad. In this case, since the pad has been set to infinity, all pixel values become black in these regions.
- The second thing to note is that the disparity maps do not have unambiguous minima if there is little texture along the segment. Take the 3rd image above. Notice that the corresponding region in the image is relatively untextured and the disparity map is almost all the same value. The small band present is due to a blue streak across that beige patch, though it is difficult to see unless zoomed in on. Nevertheless, if one were to attempt to find the minimum in this map, there is a strong possibility that it would be incorrect.
- Along the same lines, the 5th image has this same uniformity in the disparity map, yet the line segment in the image does have texture. The issue here is the scale of the texture and its structure. The texture pattern is very small compared to the overall scale of the scene and therefore results in very small spikes in the disparity map. If a spike is too narrow, the minimum might be lost to noise as well. More importantly, since the texture is repeated and small, it results in many spikes of approximately the same value, which leads to the same problem as noted in the previous bullet.
- In areas that have strong texture that is not repeated, the disparity maps have a stronger notion of a minimum and therefore will result in better approximations to the actual disparity.
Stereo
- Following the same process as above, we can compute the SSD-type disparity map for a range of window sizes. Since we are dealing with the 1D case, these windows will only aggregate information from neighborhoods along the x-direction. Window sizes of {1,3,5,10,20} were used to test the differences therein. The following images show the results where the window size increases from left to right, top to bottom:
- As expected with most windowed SSD algorithms, the disparity maps become more blurred as the window size is increased. This is because windowing makes each pixel in the disparity map an aggregation of its neighbors, thereby blurring the boundaries in the map.
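- A rough sketch of 1D window aggregation followed by a winner-take-all pick is given here. It assumes the disparity map is stored as `dsi[d][x]` (one row per disparity), that the window is truncated at the ends of the segment, and that ties go to the smaller disparity. The function names are hypothetical, not taken from the original code.

```python
def aggregate_along_x(dsi, window):
    """Sum each disparity row over a 1D window of `window` pixels along x."""
    before = (window - 1) // 2  # pixels taken to the left of x
    after = window // 2         # pixels taken to the right of x
    out = []
    for row in dsi:
        n = len(row)
        agg = []
        for x in range(n):
            lo = max(0, x - before)    # truncate at the segment start
            hi = min(n, x + after + 1)  # truncate at the segment end
            agg.append(sum(row[lo:hi]))
        out.append(agg)
    return out

def winner_take_all(dsi):
    """For each x, return the disparity index with the lowest cost."""
    n = len(dsi[0])
    return [min(range(len(dsi)), key=lambda d: dsi[d][x]) for x in range(n)]

# Two disparities, four pixels; pixel 1 carries a noisy low cost at d = 0.
costs = [[9.0, 0.0, 9.0, 9.0],
         [1.0, 4.0, 1.0, 1.0]]
raw = winner_take_all(costs)
smoothed = winner_take_all(aggregate_along_x(costs, 3))
```

Without aggregation the noisy pixel picks a different disparity from its neighbors; with a 3-pixel window it agrees with them, which is the same averaging that blurs genuine boundaries.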
- [See below for disparity plot analysis]
- Apart from the windowed SSD-type approach, we can also attempt to perform some type of smoothing that respects the qualities of the disparity map we are interested in. For this, we consider the membrane-diffusion method proposed by Scharstein and Szeliski. In this method, we attempt to smooth the disparity map around a given pixel according to a Gaussian weight. The Gaussian weight implies that we trust pixels that are closer to the pixel in question by some factor dubbed λ (which also controls the rate of convergence of the diffusion process). Further, we assert that the diffused pixel value should not differ greatly from the initial pixel value. We can control how much affinity the resulting pixel has to its initial value by the parameter β. The diffusion update routine is therefore:

- Where Ni is the 1D neighborhood of pixels around the ith pixel. For the results that follow, λ was set to 0.15 and β to 0.5. To show the diffusion process, one line segment from each dataset was chosen and its diffusion process recorded. The following images show 10 diffusion steps, from left to right, top to bottom:
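- One way to sketch a membrane-style update in code is shown here. The exact update form is an assumption, since the equation is not reproduced in this text: each value moves toward its two 1D neighbors by a factor λ and is pulled back toward its initial value by a factor β. The sketch also assumes all costs are finite, so infinite padding would have to be replaced by a large finite value first. The function name `membrane_step` is hypothetical.

```python
def membrane_step(E, E0, lam=0.15, beta=0.5):
    """One assumed membrane update per disparity row:

    E[i] <- E[i] + lam * sum_{k in N_i} (E[k] - E[i]) + beta * (E0[i] - E[i])

    where N_i holds the left and right neighbors of pixel i (1D case).
    """
    out = []
    for row, row0 in zip(E, E0):
        n = len(row)
        new_row = []
        for i in range(n):
            # Neighbors that exist inside the segment (fewer at the ends).
            neighbours = [row[k] for k in (i - 1, i + 1) if 0 <= k < n]
            smooth = sum(v - row[i] for v in neighbours)
            new_row.append(row[i] + lam * smooth + beta * (row0[i] - row[i]))
        out.append(new_row)
    return out

# A single disparity row with one spike, diffused for two steps.
E0 = [[0.0, 10.0, 0.0]]
E1 = membrane_step(E0, E0)
E2 = membrane_step(E1, E0)
```

Because of the β term the spike is softened but keeps being pulled back toward its initial height, which is consistent with the overall structure being preserved.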
- From these images, hopefully it is apparent that unlike the normal windowed SSD approach, the diffusion approach actually preserves the overall structure of the disparity map, but still performs a blurring in more localized regions. Unfortunately boundaries still become somewhat convoluted by this process. A newer approach would want to perform the blurring within boundaries rather than across them.
- A slightly different approach involves using knowledge about how certain we are that a computed disparity is in fact a good global minimum of the disparity map. Using another method proposed by Scharstein and Szeliski, we can attempt to use this knowledge to our advantage. The measure of how accurately we can compute a minimum is related to the difference between the minimum we compute and the second-lowest minimum. This tells us how close we were to getting a strong minimum rather than one that is possibly just the product of noise. If this difference is large, we expect that our estimate is good, i.e. has large certainty. The certainty function given by Scharstein and Szeliski is:

- This gives us the normalized certainty of a disparity value for each x-offset in a line. This value can then be used to stop the diffusion process in areas where certainty is unchanged by a diffusion step. Since diffusion should only affect disparities that are noisy (i.e., have low certainty), we should only perform diffusion where certainty is changed after a diffusion step. The underlying assumption is that the certainty of low-certainty disparities is highly affected by a diffusion step, whereas disparities that we are fairly certain about should not change across a step.
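- As an illustration only, a winner-margin style certainty can be computed per column as follows. The exact normalization is an assumption (here, the gap between the two lowest costs divided by the sum of all finite costs in the column), and the second minimum is simplified to the second-lowest value rather than a true second local minimum. Infinite padding values are skipped. The name `winner_margin` is hypothetical.

```python
import math

def winner_margin(dsi):
    """Per-column certainty for a map stored as dsi[d][x].

    Assumed form: (second-lowest cost - lowest cost) / sum of finite costs.
    Columns with fewer than two finite costs, or a zero sum, get 0.0.
    """
    n = len(dsi[0])
    certainty = []
    for x in range(n):
        column = sorted(row[x] for row in dsi if math.isfinite(row[x]))
        total = sum(column)
        if len(column) < 2 or total == 0.0:
            certainty.append(0.0)
        else:
            certainty.append((column[1] - column[0]) / total)
    return certainty

# Column 0 has a clear minimum; column 1 is completely flat.
example = [[1.0, 5.0],
           [9.0, 5.0],
           [10.0, 5.0]]
margins = winner_margin(example)
```

The flat column scores zero, matching the observation that untextured regions give no reliable minimum.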
- To implement this, the normal diffusion routine is run (not the membrane model) and after each step, the certainty is compared with the certainty just before the diffusion step. If the certainty has not increased, we set the disparities in that column back to their original values. The normal diffusion step is given by:
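- The stopping rule can be sketched as follows. Several details are assumptions: the plain diffusion step is taken to be the membrane update without the β term, certainty is the gap between the two lowest costs divided by the column sum, "original values" is read as the values held just before the step, and all costs are finite. All function names are hypothetical.

```python
def column_certainty(E, x):
    """Assumed winner margin for column x of a map stored as E[d][x]."""
    costs = sorted(row[x] for row in E)
    total = sum(costs)
    return (costs[1] - costs[0]) / total if total > 0 else 0.0

def diffusion_step(E, lam=0.15):
    """Plain diffusion: E[i] <- E[i] + lam * sum of (neighbor - E[i])."""
    out = []
    for row in E:
        n = len(row)
        out.append([
            row[i] + lam * sum(row[k] - row[i]
                               for k in (i - 1, i + 1) if 0 <= k < n)
            for i in range(n)
        ])
    return out

def diffuse_with_stopping(E, steps=10, lam=0.15):
    """Accept a diffused column only when its certainty has increased."""
    E = [list(row) for row in E]  # work on a copy of the input
    for _ in range(steps):
        candidate = diffusion_step(E, lam)
        for x in range(len(E[0])):
            if column_certainty(candidate, x) > column_certainty(E, x):
                for d in range(len(E)):
                    E[d][x] = candidate[d][x]
            # Otherwise the column keeps its pre-step values.
    return E

# Columns 0 and 2 are confident; column 1 is ambiguous.
start = [[0.0, 5.0, 0.0],
         [10.0, 5.0, 10.0]]
result = diffuse_with_stopping(start, steps=3)
```

In this toy input the confident outer columns are never modified, while the ambiguous middle column picks up the preference of its neighbors.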

- Below is an example of one line in each test dataset as it is diffused by this process:
- These images show that the diffusion stopping process is a more global solution to the problem in that it diffuses things down to almost the same scale, but still provides the blurring necessary to reduce the noise in the image.
- Finally, here are the recovered disparities from each of the aforementioned techniques along with the ground truth disparity plots. These were generated by performing the windowed SSD approach, and then using that data to drive the diffusion methods. In doing so, a more hybrid approach was explored that had a denoised image as its input.
- The initial assumption that strong, non-repetitive textures are a source of good disparity estimates seems to be upheld. Notice that the last line segment passes through regions of little texture. The corresponding plots are not nearly as close to ground truth as the plots for the 2nd and 3rd segments. However, the winner-margin approach (which was the second diffusion technique covered) actually did perform rather well at identifying disparity. This is exactly what we would expect from this type of algorithm. Because it only diffuses in regions of low certainty, it preserves any good disparity guesses (of which there are a few across that line segment) while attempting to diffuse out the more uncertain regions. The uncertain regions are the large blocks of uniform texture. Essentially this method allows us to use the meaningful texture data across the segment, and blur the less-meaningful texture data, to achieve the best case scenario.
- In general, the pre-blurring of the input data to the diffusion methods helped to increase accuracy in some places. However, as expected, a lot of accurate data was lost to this process. For example, the second line segment's first plot with a 0px window exhibits very accurate results from the diffusion methods. The same line segment with a pre-blurred disparity map with a 20px window is still somewhat accurate, but much of the finer resolution of disparities is lost.
- Here are the results from the cones dataset:
- The same trends are noted in this dataset. The one thing that is rather interesting is how well non-textured regions are estimated with a large pre-blur. Notice that in the third line segment, there is almost no texture present in the image, yet the plots for disparity for the 20px window are actually quite good, all things considered. It must be the case that there are some small patches of texture in those "non-textured" segments that the diffusion algorithms can highlight relatively well.
Space-Time Stereo
- We would assume that if we could somehow take into account images across the frames of a movie, we could improve our estimate of the depth at every pixel. In the next set of experiments exactly this is done. First, we should have a baseline comparison. The first pair of rectified images was loaded, and their disparities were calculated for window sizes of {1,5,10}. Next the space-time approach was applied to the entire image sequence (31 images). In calculating the disparity map for each pixel, the modified disparity function was used:

- To initiate this experiment, two segments were chosen from the first pair of rectified images in the set. The disparity map was then calculated for these two line segments just as in the experiments above. There was no windowing in this experiment. Here are the results:
- These disparity maps exhibit the same behavior as before. The right segment was chosen to show what happens across the mouth of the vase. Since these pixels are all black, there is a constant value in the disparity map for that region.
- Next, the SSD-approach was applied to these maps, and the resulting disparities were calculated. The window sizes used were {1,5,10}. Here are the resulting disparity maps:
- The disparity estimates are rather poor. The larger windows do recover some of the structure of the vase (especially for the second segment), but in general, the results are inadequate. By incorporating the temporal information from the dataset, we should be able to capture a better estimate of the disparity. Using the modified disparity function, the individual disparity maps of all the images in the sequence were summed together. The disparity was then calculated as the minimum sum in this new, temporally aggregated, disparity map. Here are the two maps resulting from this process:
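- The temporal aggregation described above can be sketched as a sum of per-frame disparity maps followed by a per-pixel minimum. The per-frame measure (squared difference), the matching convention (right position x matches left position x + d), the tie-breaking toward the smaller disparity, and the function names are all assumptions made for illustration.

```python
import math

def pair_cost(left, right, max_disp):
    """Per-frame map cost[d][x]; out-of-range samples cost infinity."""
    return [
        [float(right[x] - left[x + d]) ** 2 if x + d < len(left) else math.inf
         for x in range(len(right))]
        for d in range(max_disp + 1)
    ]

def spacetime_disparity(left_frames, right_frames, max_disp):
    """Sum the per-frame maps over time, then take the per-pixel minimum."""
    total = None
    for left, right in zip(left_frames, right_frames):
        cost = pair_cost(left, right, max_disp)
        if total is None:
            total = cost
        else:
            total = [[a + b for a, b in zip(ra, rb)]
                     for ra, rb in zip(total, cost)]
    width = len(total[0])
    disparity = [min(range(max_disp + 1), key=lambda d: total[d][x])
                 for x in range(width)]
    return total, disparity

# Frame 0 is textureless; frame 1 has texture shifted by one pixel.
lefts = [[3, 3, 3, 3, 3], [0, 7, 2, 9, 4]]
rights = [[3, 3, 3, 3, 3], [7, 2, 9, 4, 0]]
_, single = spacetime_disparity(lefts[:1], rights[:1], max_disp=2)
_, combined = spacetime_disparity(lefts, rights, max_disp=2)
```

The textureless frame alone gives no usable minimum, but once the textured frame is added the summed map has a clear minimum at a disparity of 1 for the pixels where both scanlines overlap.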
- You can actually already see the resulting disparity map from these images! Notice the dark lines that are apparent in the two images. These lines represent the disparity in the images. Here are the actual recovered disparity plots:
- There are some artifacts from this process. The sharp spikes represent a misinterpretation of the data somehow, most likely due to noise. However, the rest of the disparity estimates are almost dead-on. In this first segment, notice that the vase slowly bends away from us. If this is hard to see, notice that we are below the actual center of the vase, and therefore closer to the underside of it. The disparity map shows this exactly. The disparity increases slowly across the segment. This essentially means that the distance increases across the segment.
- For the second segment, the disparity map is also very good. Notice that the vase bends away from us, sharply comes back to us, goes to blackness, and then is moderately far away from us. This again is exactly what the disparity plot shows us.
- Although it is clear to the human eye how certain we can be of our measurements of disparity, a computer does not have the same ability. Therefore to illustrate the point that we have indeed increased the certainty in our estimate of disparity, the certainty is computed across each of the segments for the two methods. This certainty comes from the same equation as outlined before. Here are the resulting plots:
- [Certainty plots for the two segments; the surviving panel labels are 1px Window, 5px Window, and 10px Window]
- Clearly we get more certainty with the space-time approach as opposed to the single image pair results. The only exception here is in the regions of blackness. In the first segment, the very end is black and in the second segment, about halfway through it turns black. In both certainty plots this gives certainty of basically 0. That isn't surprising since there is absolutely no texture and therefore no features to attempt to match. Adding the temporal component to this should not increase this certainty because we still will have blackness in all of the frames.
© 2007 Sean M. Arietta
University of Virginia