CS651: Computer Vision
Fall 2007
Assignment 2
Due Tuesday, October 16
1. Questions (30%)
- Detecting and computing symmetries of 3D objects has many
applications in shape analysis, shape matching, shape representation, object detection,
object recognition, etc. Assume you have been given an oriented point cloud: a
dense set of 3D points (x,y,z) and corresponding surface normals
(n_x,n_y,n_z), evenly distributed over the surface of some object. Using the Hough transform, develop an algorithm for detecting
the planes of symmetry for this object. Hint: use simple examples and
consider this problem in 2D before generalizing your approach to arbitrary shapes in 3D.
- Imagine you are given two vectors, a "signal" S and a
"template" T. Assume T is shorter than S. Now,
you want to find the position within S at which T is the
best match according to the sum of squared differences (SSD)
criterion. That is, given an offset k at which you are looking
for T within S, you want to find the k that
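minimizes:
$$\mathrm{SSD}(k) = \sum_{i} \left( S_{k+i} - T_i \right)^2$$
where i ranges over the elements of T.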
Show how you can do this without an explicit loop over k
by computing two convolutions:
- S convolved with some variant of T
Hint: this won't necessarily be T itself, but
some simple transformation. Look at the equation above, and compare
with the definition of discrete convolution.
- The vector consisting of the square of each element of S,
convolved with a vector the same length as T but consisting
of all ones.
You will need this result for the face detection portion of the
assignment below.
- Suppose you only need to determine the most significant
eigenvector (i.e., that whose corresponding eigenvalue has the
largest magnitude) of an arbitrary NxN matrix A. Although computing
a full eigendecomposition of A will certainly do the job, this approach is rather
inefficient since it requires computing a complete decomposition
of the matrix. This question asks you to devise a more efficient
strategy.
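Assume A is not singular and has eigenvalues $\lambda_1, \ldots, \lambda_N$ and corresponding
eigenvectors $e_1, \ldots, e_N$. First,
convince yourself that we can write any arbitrary N-element vector v
as a linear combination of these eigenvectors:
$$v = c_1 e_1 + c_2 e_2 + \cdots + c_N e_N$$
Next, note what happens when we multiply v by A:
$$A v = c_1 \lambda_1 e_1 + c_2 \lambda_2 e_2 + \cdots + c_N \lambda_N e_N$$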
Use this result to devise a simple iterative algorithm for
approximating the most significant eigenvector of A. Discuss the
expected running time of your algorithm as it relates to the
eigenvalues of A. There is a situation (property of A) in which you
will not be able to identify the eigenvector with the largest
eigenvalue using this trick. What is it? What is a real-world
application of this algorithm? Lastly, describe how to use this
same property to find the second most significant eigenvector, the
third, etc.
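One possible reading of the iterative scheme hinted at above is repeated multiplication by A followed by renormalization, commonly called power iteration. The sketch below is in Python rather than MATLAB purely for illustration. The function names, the starting vector, and the fixed iteration count are assumptions, and the sketch does not address the discussion questions about running time, failure cases, or further eigenvectors.

```python
import math


def mat_vec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def normalize(v):
    """Scale v to unit Euclidean length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def power_iteration(A, num_iters=200):
    """Approximate the eigenvector of A whose eigenvalue has largest magnitude.

    The starting vector is an arbitrary (assumed) choice; it only needs a
    nonzero component along the sought eigenvector.
    """
    v = normalize([1.0 + 0.1 * i for i in range(len(A))])
    for _ in range(num_iters):
        # Each multiplication by A scales the component along e_i by lambda_i,
        # so the component with the largest |lambda_i| gradually dominates.
        v = normalize(mat_vec(A, v))
    # For a unit-length v, the Rayleigh quotient v . (A v) estimates the eigenvalue.
    eigenvalue = sum(x * y for x, y in zip(v, mat_vec(A, v)))
    return eigenvalue, v
```

For example, `power_iteration([[2.0, 1.0], [1.0, 2.0]])` can be used to check the routine against a matrix whose eigenvalues (3 and 1) are easy to compute by hand.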
2. Aligned database of faces (20%)
You are going to build a face detection system for this assignment.
Our first task is to assemble a database of aligned and cropped images
of the faces of people who have taken this class over the years. You should have received an
e-mail that contains directions for downloading a collection of face
images (if this is not the case then please e-mail the instructor).
There should be two sets of images (neutral_XX.jpg and smile_XX.jpg).
Write a simple program in MATLAB that:
- Loads and displays each image in turn
- Lets the user click on the centers of the eyes in each image and
stores these coordinates (use the getpts function).
- Warps the images so that the eye points are mapped to fixed
locations, 100 pixels apart horizontally (use
the imtransform and cp2tform functions).
- Crops out an appropriate section of the image (e.g., 300 pixels
tall by 200 pixels wide). Note that the easiest way to do this might be
to use the XData and YData options
of imtransform.
- Saves the results so you don't have to repeat the clicking
multiple times.
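The geometry behind the warping step can be sketched independently of MATLAB. Two clicked eye centers determine a rotation, uniform scale, and translation that sends them to two fixed target locations. The Python sketch below computes only that mapping, using complex numbers as 2-D points; it does not resample pixels, which is the job of imtransform in the assignment. The target eye locations (50, 120) and (150, 120) and all function names are assumptions chosen to fit a 200-pixel-wide crop with the eyes 100 pixels apart.

```python
def eye_alignment_transform(left_eye, right_eye,
                            dst_left=(50.0, 120.0), dst_right=(150.0, 120.0)):
    """Return (a, b) such that z -> a*z + b maps the clicked eyes to the targets.

    Points are (x, y) tuples treated as complex numbers x + iy. The complex
    factor a encodes rotation and uniform scale; b is the translation.
    The default target locations are hypothetical.
    """
    p1, p2 = complex(*left_eye), complex(*right_eye)
    q1, q2 = complex(*dst_left), complex(*dst_right)
    if p1 == p2:
        raise ValueError("the two eye points must be distinct")
    a = (q2 - q1) / (p2 - p1)
    b = q1 - a * p1
    return a, b


def apply_transform(a, b, point):
    """Map a single (x, y) point through z -> a*z + b."""
    z = a * complex(*point) + b
    return (z.real, z.imag)
```

Applying `apply_transform` to the two clicked points should return the two target locations, which is a quick way to verify the stored coordinates before warping the images.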
At this point you should have a collection of equal-sized images with
the person's eyes in roughly the same position.
Next compute an intensity image of the "average face" in which each
pixel is set to the average intensity of that pixel across your set of
aligned and cropped images. Note the quality of this average face
(i.e., how well it resembles an actual face with distinguishable eyes,
mouth, etc.) will depend on the quality of your alignment and is a
great way to test that you got the first part correct. Submit this
image as part of your write-up. Although not required for this
assignment, you may also want to consider using more than just the
eyes to perform the alignment (e.g., have the user click on the tip
of the nose, corners of the mouth, etc.).
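As a minimal illustration of the per-pixel averaging described above, the following Python sketch averages a list of equal-sized grayscale images stored as nested lists. The representation and the function name are assumptions made for illustration; in MATLAB the same operation would act on image arrays.

```python
def average_image(images):
    """Return the per-pixel mean of equal-sized grayscale images.

    Each image is a list of rows, and each row is a list of intensities.
    """
    if not images:
        raise ValueError("need at least one image")
    rows, cols = len(images[0]), len(images[0][0])
    for im in images:
        if len(im) != rows or any(len(row) != cols for row in im):
            raise ValueError("all images must have the same size")
    n = len(images)
    return [[sum(im[r][c] for im in images) / n for c in range(cols)]
            for r in range(rows)]
```

The size check matters in practice: a single image cropped to different dimensions would otherwise shift every pixel after it and blur the average.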
3. Face detection (40%)
Treat the average face you have computed as a template. To detect
faces in a target image, compare the image's intensities to the template at each
pixel location and record the SSD between the two. You should have received
an e-mail describing how to download a set of test images taken in class
(if not, contact the instructor). Also, browse the web and/or the CS
photo collection for additional test images.
Because these images are large it is important to use the FFT to
efficiently perform the convolutions required to compute the SSD
between the template and target image (recall that you derived the
necessary relationship between SSD and convolution in part 1 of this
assignment).
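The relationship between SSD and convolution can be checked numerically in 1-D before moving to images. Expanding the square gives SSD(k) = sum_i S[k+i]^2 - 2 sum_i S[k+i] T[i] + sum_i T[i]^2, where the first two sums are convolutions and the last is a constant. The Python sketch below uses a direct convolution for clarity; an FFT-based convolution would take its place for large images. All function names are assumptions.

```python
def convolve(a, b):
    """Full discrete convolution: out[n] = sum_m a[m] * b[n - m]."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def ssd_by_convolution(S, T):
    """SSD between T and every length-len(T) window of S, via two convolutions."""
    L = len(T)
    # Convolving with the reversed template yields the cross term sum_i S[k+i]*T[i].
    cross = convolve(S, T[::-1])[L - 1:len(S)]
    # Convolving the squared signal with ones yields the window energy sum_i S[k+i]^2.
    energy = convolve([s * s for s in S], [1.0] * L)[L - 1:len(S)]
    t_energy = sum(t * t for t in T)
    return [e - 2.0 * c + t_energy for e, c in zip(energy, cross)]


def ssd_direct(S, T):
    """Reference implementation with an explicit loop over the offset k."""
    L = len(T)
    return [sum((S[k + i] - T[i]) ** 2 for i in range(L))
            for k in range(len(S) - L + 1)]
```

Comparing `ssd_by_convolution` against `ssd_direct` on a short signal is a cheap way to catch indexing mistakes, which are easy to make when choosing which slice of the full convolution corresponds to valid offsets.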
Build a simple face detector that reports the presence of a face when
the SSD is below a threshold. Play around with the value of this
threshold and show images that indicate its relationship to the
sensitivity of your detector. Extend this basic face detector in the
following two ways:
- Perform non-maximum suppression whereby you return only the
strongest match within local image regions (similar to what you did
for your edge and corner detectors).
- Compute the SSD image for the target image at different scales
(consult help imresize in MATLAB). Look for faces at 100%,
75%, 50% and 25% of the original size.
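Because a good match corresponds to a low SSD, suppressing weaker matches here means keeping only local minima of the SSD image that also fall below the detection threshold. The Python sketch below shows one simple way to do this on a small 2-D grid. The square neighborhood, its default radius, and the function name are assumptions rather than a prescribed method.

```python
def detect_local_minima(ssd, threshold, radius=1):
    """Return (row, col, value) for below-threshold local minima of an SSD map.

    ssd is a list of rows. A location is kept when no value inside its
    (2*radius + 1)-wide square neighborhood is smaller. Exact ties are all kept.
    """
    rows, cols = len(ssd), len(ssd[0])
    detections = []
    for r in range(rows):
        for c in range(cols):
            value = ssd[r][c]
            if value >= threshold:
                continue  # not a strong enough match to report at all
            neighborhood = [ssd[rr][cc]
                            for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                            for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            if value <= min(neighborhood):
                detections.append((r, c, value))
    return detections
```

The radius controls how close two reported faces may be; for a face template it would plausibly be tied to the template size, so that overlapping detections of the same face collapse to one.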
A good "sanity check" for your detector is to apply it to the images
used to form the template in the first place. Use the unaligned and
uncropped versions and see if you can find the face. Your write-up
should show the SSD images and detection results for several different
target images and answer the following questions:
- How well does your detector work? How would you evaluate its
performance in a principled way (i.e., how would you scientifically
compare multiple face detectors)?
- Is there a significant difference in its accuracy when applied to
images of people not in our class and thus not included in the
construction of the template?
- What types of false positives does it return (i.e., what image
regions tend to resemble your face template)?
- How might you automate the alignment/cropping stage from before
for a huge collection of face images using this detector? How would
you create your template?
Finally, incorporate color information into your template, then discuss
and evaluate whether (and how) this improves the performance of your detector.
4. Eigenfaces (10%)
First construct a matrix by unrolling each aligned and cropped face
image into a separate column (consult help reshape in
MATLAB). The final matrix should have #rows = #pixels in each image
and #cols = #subjects. Center the data (loosely, "whitening") by subtracting the
average face from each image before it is placed into the matrix.
Apply SVD to this matrix of face images (consult help svd in
MATLAB, especially the svd(X,0) syntax).
Include in your write-up a plot of the singular values and show the
five most significant eigenfaces as images. Use a visualization of
your choice (simple mapping to grayscale is fine).
Project each face image onto the FIRST TWO PRINCIPAL COMPONENTS.
Hint: these are stored in V^T, but can also be computed as the dot
product of a face image organized into a vector and each principal
component. Now you have two coordinates associated with each face
image in your database. Create a scatter plot of these 2-D points
(consult help scatter in MATLAB).
Answer the following questions:
- The eigenfaces encode the principal sources of variation in your
dataset (e.g., absence/presence of facial hair, skin tone, glasses,
etc.). What do the eigenfaces you computed correspond to?
- Using the plot of singular values as a guide, choose a number of
principal components that you would use to form a compact and
accurate representation of your face database. Justify/explain your
decision. What is the total approximation error introduced by using
this number of components in place of the original images? What is
the total amount of space required to store this compressed
representation? What is the compression ratio (i.e., the ratio of
compressed size to original size)? Show your work for these
calculations.
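The bookkeeping for these calculations can be sketched as follows. The squared Frobenius-norm error of a rank-k SVD approximation equals the sum of the squared discarded singular values. The storage count below is one possible accounting and is an assumption: k eigenfaces, k coefficients per image, and the average face, all stored at the same precision as the original pixels. The function and key names are hypothetical.

```python
def compression_stats(singular_values, num_pixels, num_images, k):
    """Summarize error and storage for keeping the first k principal components.

    singular_values must be sorted in decreasing order, as returned by an SVD.
    """
    squared_error = sum(s * s for s in singular_values[k:])
    total_energy = sum(s * s for s in singular_values)
    # Assumed accounting: k eigenfaces + k coefficients per image + the mean face.
    compressed_size = k * num_pixels + k * num_images + num_pixels
    original_size = num_pixels * num_images
    return {
        "squared_error": squared_error,
        "relative_error": squared_error / total_energy,
        "compressed_size": compressed_size,
        "original_size": original_size,
        "compression_ratio": compressed_size / original_size,
    }
```

With very few images the ratio stays close to 1 because the eigenfaces themselves dominate the storage; the representation only pays off once the number of images is large compared with k.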
Submitting
This assignment is due Tuesday, October 16, 2007 at 11:59 PM. Please
see the general notes on submitting your assignments, as well as the
late policy and the collaboration policy.
Please submit:
- Your write-up as an HTML file, with links to your code and
input/output images.
Note that programming in MATLAB is not an excuse to write unreadable code.
You are expected to use good programming style, including meaningful variable
names, a comment or three describing what the code is doing, etc. Also, all
images you submit should have been saved with the "imwrite" function; do
not submit screen captures of the image window.