CS651: Computer Vision
Spring 2007
Assignment 2
Due Tuesday, Feb. 27
1. Questions (30%)
- Imagine you are given two vectors, a "signal" S and a
"template" T. Assume T is shorter than S. Now,
you want to find the position within S at which T is the
best match according to the sum of squared differences (SSD)
criterion. That is, given an offset k at which you are looking
for T within S, you want to find the k that
minimizes:
SSD(k) = Σ_i ( S[k+i] - T[i] )^2
where the sum runs over the indices i of the template T.
Show how you can do this without an explicit loop over k
by computing two convolutions:
- S convolved with some variant of T
Hint: this won't necessarily be T itself, but
some simple transformation. Look at the equation above, and compare
with the definition of discrete convolution.
- The vector consisting of the square of each element of S,
convolved with a vector the same length as T but consisting
of all ones.
You will need this result for the face detection portion of the
assignment below.
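Since the face detection part relies on this result, here is a minimal brute-force
reference for the 1-D case that you can use to check whatever convolution-based
formulation you derive; the random S and T are placeholders, not part of the assignment.

    % Brute-force SSD of template T against signal S at every valid offset k.
    S = randn(1, 100);                    % example signal (placeholder)
    T = randn(1, 11);                     % example template, shorter than S
    n = length(S) - length(T) + 1;        % number of valid offsets
    ssd = zeros(1, n);
    for k = 1:n
        d = S(k:k+length(T)-1) - T;
        ssd(k) = sum(d.^2);               % SSD at offset k
    end
    [best, kbest] = min(ssd);             % offset of the best match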
- Suppose you only need to determine the most significant
eigenvector (i.e., that whose corresponding eigenvalue has the
largest magnitude) of an arbitrary NxN matrix A. Although computing
the full SVD of A will certainly do the job, this approach is rather
inefficient since it spends time performing a complete decomposition
of the matrix. This question asks you to devise a more efficient
strategy.
Assume A is not singular and has eigenvalues λ_1, λ_2, ..., λ_N and
corresponding eigenvectors e_1, e_2, ..., e_N. First,
convince yourself that we can write any arbitrary N-element vector v
as a linear combination of these eigenvectors:
v = c_1 e_1 + c_2 e_2 + ... + c_N e_N
Next, note what happens when we multiply v by A:
A v = c_1 λ_1 e_1 + c_2 λ_2 e_2 + ... + c_N λ_N e_N
Use this result to devise a simple iterative algorithm for
approximating the most significant eigenvector of A. Discuss the
expected running time of your algorithm as it relates to the
eigenvalues of A. There is a situation (property of A) in which you
will not be able to identify the eigenvector with the largest
eigenvalue using this trick. What is it? What is a real-world
application of this algorithm? Extra credit: how might you use this
same property to find the second most significant eigenvector, the
third, etc. (hint: this is not simple).
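If you want a quick sanity check for whatever iterative scheme you devise, you can
compare its output against the dominant eigenvector reported by MATLAB's eig on a
small random matrix; the matrix A below is just a placeholder.

    % Reference dominant eigenvector from eig, for checking your iteration.
    A = randn(5);                          % small example matrix (placeholder)
    [V, D] = eig(A);
    [maxmag, idx] = max(abs(diag(D)));     % eigenvalue of largest magnitude
    v_true = V(:, idx);                    % compare this (up to sign and scale)
                                           % with the vector your iteration finds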
2. Aligned database of faces (20%)
You are going to build a face detection system for this assignment.
Our first task is to assemble a database of aligned and cropped images
of the faces of people in this class. You should have received an
email that contains directions for downloading a collection of face
images (if this is not the case then please e-mail the instructor).
There should be two sets of images (neutral_XX.jpg and smile_XX.jpg).
Write a simple program in MATLAB (a rough sketch appears after this list) that:
- Loads and displays each image in turn
- Lets the user click on the centers of the eyes in each image and
stores these coordinates (use the getpts function).
- Warps the images so that the eye points are mapped to fixed
locations, 100 pixels apart horizontally (use
the imtransform and cp2tform functions).
- Crops out an appropriate section of the image (e.g., 300 pixels
tall by 200 pixels wide). Note that the easiest way to do this might be
to use the XData and YData options to imtransform.
- Saves the results so you don't have to repeat the clicking
multiple times.
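A rough sketch of this loop is given below. The file pattern, output folder, and
target eye coordinates are assumptions you are free to change; only the functions
named above (getpts, cp2tform, imtransform) are required.

    % Align and crop each face image based on two clicked eye locations.
    files = dir('neutral_*.jpg');            % process smile_*.jpg the same way
    eyeL = [50 100];  eyeR = [150 100];      % fixed eye locations, 100 px apart (assumed)
    for i = 1:numel(files)
        img = im2double(imread(files(i).name));
        imshow(img);
        [x, y] = getpts;                     % click left eye, then right eye, then press Enter
        tform = cp2tform([x y], [eyeL; eyeR], 'nonreflective similarity');
        warped = imtransform(img, tform, 'XData', [1 200], 'YData', [1 300]);
        imwrite(warped, fullfile('aligned', files(i).name));  % assumes an 'aligned' folder exists
    end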
At this point you should have a collection of equal-sized images with
the person's eyes in roughly the same position.
Next, compute an intensity image of the "average face" in which each
pixel is set to the average intensity of that pixel across your set of
aligned and cropped images. Note that the quality of this average face
(i.e., how well it resembles an actual face with distinguishable eyes,
mouth, etc.) depends on the quality of your alignment, so it is a
great way to check that you got the first part correct. Submit this
image as part of your write-up.
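A minimal sketch of the averaging step, assuming the aligned images were saved to a
folder named aligned and are 300x200 pixels:

    % Average the aligned, cropped intensity images.
    files = dir(fullfile('aligned', '*.jpg'));
    avg = zeros(300, 200);
    for i = 1:numel(files)
        img = im2double(imread(fullfile('aligned', files(i).name)));
        if size(img, 3) == 3, img = rgb2gray(img); end   % keep intensity only
        avg = avg + img;
    end
    avg = avg / numel(files);
    imshow(avg);
    imwrite(avg, 'average_face.png');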
For extra credit, use more than just the eyes to improve the alignment
(e.g., have the user click on tip of nose, corners of the mouth,
etc.).
For extra credit, automatically detect the eyes, mouth, etc. to avoid
manual clicking using a template-based method.
3. Face detection (40%)
Treat the average face you have computed as a template. To detect
faces in a target image, compare its intensity to the template at each
pixel and record the SSD between the two. You should have received
e-mail describing how to download a set of test images taken in class
(if not, contact the instructor). Also, browse the web and/or
the CS
photo collection for additional test images.
Because these images are large it is important to use the FFT to
efficiently perform the convolutions required to compute the SSD
between the template and target image (recall that you derived the
relationship between SSD and convolution in part 1 of this
assignment).
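For the FFT step, a small zero-padded 2-D convolution helper along the following
lines is enough; which convolutions you feed it is exactly what you worked out in
part 1, and the function name fftconv2 is just a placeholder.

    function C = fftconv2(A, B)
    % Full 2-D linear convolution of A and B via zero-padded FFTs
    % (equivalent to conv2(A, B, 'full'), but much faster for large images).
    sz = size(A) + size(B) - 1;
    C = real(ifft2(fft2(A, sz(1), sz(2)) .* fft2(B, sz(1), sz(2))));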
Build a simple face detector that reports the presence of a face when
the SSD is below a threshold. Play around with the value of this
threshold and show images that indicate its relationship to the
sensitivity of your detector. Extend this basic face detector in the
following two ways:
- Perform non-maximum suppression whereby you return only the
strongest match within local image regions (similar to what you did
for the Canny edge detector).
- Compute the SSD image for the target image at different scales
(consult help imresize in MATLAB); a rough sketch follows this list.
Look for faces at 100%, 75%, 50%, and 25% of the original size.
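A sketch of the multi-scale scan, where ssd_map stands in for whatever SSD routine
you build from the pieces above:

    % Scan for faces at several scales of the target image.
    scales = [1.0 0.75 0.5 0.25];
    for s = scales
        small = imresize(target, s);       % shrink the target image
        D = ssd_map(small, template);      % assumed helper, not a built-in
        % threshold D and apply non-maximum suppression here, then map any
        % detections back to the original image by dividing coordinates by s
    end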
A good "sanity check" for your detector is to apply it to the images
used to form the template in the first place. Use the unaligned and
uncropped versions and see if you can find the face. Your write-up
should show the SSD images and detection results for several different
target images and answer the following questions:
- How well does your detector work? How might you evaluate its
performance in a principled way (i.e., how might you scientifically
compare multiple face detectors)?
- Is there a significant difference in its accuracy when applied to
images of people not in our class and thus not included in the
construction of the template?
- What types of false positives does it return (i.e., what image
regions tend to resemble your face template)?
- How might you automate the alignment/cropping stage from before
for a huge collection of face images using this detector? How would
you create your template?
For extra credit, incorporate color information into your template and
discuss and evaluate whether (and how) this improves the performance of
your detector.
4. Eigenfaces (10%)
First construct a matrix by unrolling each aligned and cropped face
image into a separate column (consult help reshape in
MATLAB). The final matrix should have #rows = #pixels in each image
and #cols = #subjects. Perform "whitening" by first subtracting the
average face from each image before it is placed into the matrix.
Apply SVD to this matrix of face images (consult help svd in
MATLAB, especially the svd(X,0) syntax).
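A sketch of this construction, reusing the 300x200 aligned images from before
(the folder name and image size are assumptions carried over from the earlier sketches):

    % Build the pixels-by-subjects matrix, subtract the average face, take the SVD.
    files = dir(fullfile('aligned', '*.jpg'));
    n = numel(files);
    X = zeros(300*200, n);                  % #rows = #pixels, #cols = #subjects
    for i = 1:n
        img = im2double(imread(fullfile('aligned', files(i).name)));
        if size(img, 3) == 3, img = rgb2gray(img); end
        X(:, i) = reshape(img, [], 1);      % unroll image into a column
    end
    avg = mean(X, 2);
    X = X - repmat(avg, 1, n);              % subtract the average face
    [U, S, V] = svd(X, 0);                  % economy-size SVD
    plot(diag(S));                          % singular values for your write-up
    eigface1 = reshape(U(:, 1), 300, 200);  % most significant eigenface, as an image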
Include in your write-up a plot of the singular values and show the
five most significant eigenfaces as images. Use a visualization of
your choice (simple mapping to grayscale is fine).
Project each face image onto the FIRST TWO PRINCIPAL COMPONENTS.
Hint: these are stored in V^T, but can also be computed as the dot
product of a vector formed from a face image and each principal
component. Now you have two coordinates associated with each face
image in your database. Create a scatter plot of these 2-D points
(consult help scatter in MATLAB).
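Continuing the sketch above, the projection and scatter plot come down to a couple
of lines (U and X are the matrices from the SVD sketch):

    % Coordinates of every face in the basis of the first two principal components.
    coords = U(:, 1:2)' * X;                % 2 x #subjects matrix of coefficients
    scatter(coords(1, :), coords(2, :));
    xlabel('PC 1'); ylabel('PC 2');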
Answer the following questions:
- The eigenfaces encode the principal sources of variation in your
dataset (e.g., absence/presence of facial hair, skin tone, glasses,
etc.). What do the eigenfaces you computed seem to correspond to?
- Using the plot of singular values as a guide, choose a number of
principal components that you would use to form a compact and
accurate representation of your face database. Justify/explain your
decision. What is the total approximation error introduced by using
this number of components in place of the original images? What is
the total amount of space required to store this compressed
representation? What is the compression ratio (i.e., the ratio of
compressed size to original size)? Show your work for these
calculations.
You can stop here if you wish. The rest is extra credit and a maximum
of 10 additional points (out of 100) may be added to your grade.
Create a face recognition system using the eigenfaces computed above.
For a target image, project the image region centered at each pixel
onto your reduced eigenface basis (again, this is just a simple dot
product between the image region and each eigenface). Compare the
coordinates at each pixel in this basis to those computed for each
entry in the original database. Devise a simple scheme (e.g., based
on Euclidean distances and thresholds) for answering the following questions:
- Does this image region correspond to a face?
- Does it correspond to a particular person in my face database?
(if the answer is yes, return their image from the database).
- Is the person smiling?
Your write-up should explain how your recognition algorithm works and
show its output for several test images. Also, answer the following
questions:
- How well does it work? How would you scientifically compare it
with other face recognition systems?
- What are some engineering challenges in using a very large number
of subjects or high-resolution images when constructing a good set of
eigenfaces?
- What would be required for designing a recognition system that is
insensitive to the orientation of the face? What other common
sources of variation might be interesting to consider?
Submitting
This assignment is due Tuesday, February 27, 2007 at 11:59 PM. Please
see the general notes on
submitting your assignments, as well as the
late policy and the
collaboration policy.
Please submit:
- Your write-up as an HTML file, with links to your code and
input/output images.
Note that programming in Matlab is not an excuse to write unreadable code.
You are expected to use good programming style, including meaningful variable
names, a comment or three describing what the code is doing, etc. Also, all
images you submit should have been saved with the "imwrite" function - do
not submit screen captures of the image window.