CS 4501: Computer Vision
Spring 2011
Assignment 2
Due Wednesday, March 2
1. Questions (30%)
- Detecting and computing symmetries of 3D objects has many
applications in shape analysis, shape matching, shape representation,
object detection, object recognition, etc. Assume you have been given
an oriented point cloud: a dense set of 3D points (x,y,z) and
corresponding surface normals (n_x,n_y,n_z), evenly distributed over
the surface of some object. Using the Hough transform, develop an
algorithm for detecting the planes of symmetry for this object. Hint: use simple examples and consider this problem in
2D before generalizing your approach to arbitrary shapes in
3D.
- Imagine you are given two vectors, a
"signal" S and a "template" T. Assume T is
shorter than S. Now, you want to find the position within
S at which T is the best match according to the sum of
squared differences (SSD) criterion. That is, given an offset
k at which you are looking for T within S, you
want to find the k that minimizes:
$$\mathrm{SSD}(k) = \sum_{i} \left( S[i+k] - T[i] \right)^2$$
where the sum runs over the elements of T. Show how you can do this
without an explicit loop over k by computing two convolutions:
  - S convolved with some variant of T. Hint: this won't necessarily be
    T itself, but some simple transformation. Look at the equation
    above, and compare with the definition of discrete convolution.
  - The vector consisting of the square of each element of S, convolved
    with a vector the same length as T but consisting of all ones.
You will need this result for the face detection portion of the
assignment below.
- Suppose you only need to determine
the most significant eigenvector (i.e., that whose corresponding
eigenvalue has the largest magnitude) of an arbitrary NxN matrix A.
Although computing the full SVD of A will certainly do the job, this
approach is rather inefficient since it requires computing a complete
decomposition of the matrix. This question asks you to devise a more
efficient strategy.
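Assume A is not singular and has eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_N$ and corresponding
eigenvectors $x_1, x_2, \ldots, x_N$. First,
convince yourself that we can write any arbitrary N-element vector v as
a linear combination of these eigenvectors:
$$v = c_1 x_1 + c_2 x_2 + \cdots + c_N x_N$$
Next, note what happens when we multiply v by A:
$$A v = c_1 \lambda_1 x_1 + c_2 \lambda_2 x_2 + \cdots + c_N \lambda_N x_N$$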
Use this result to
devise a simple iterative algorithm for approximating the most
significant eigenvector of A. Discuss the expected running time of
your algorithm as it relates to the eigenvalues of A. There is a
situation (property of A) in which you will not be able to identify
the eigenvector with the largest eigenvalue using this trick. What is
it? What is a real-world application of this algorithm? Lastly,
describe how to use this same property to find the second most
significant eigenvector, the third, etc.
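To help with the "convince yourself" step, here is a small numerical check, sketched in plain Python (standard library only) rather than MATLAB. The 2x2 matrix, its eigenpairs, and the coefficients are illustrative assumptions chosen so the arithmetic can be verified by hand. The sketch only confirms how A acts on a vector written in the eigenvector basis; it does not implement the iterative algorithm the question asks for.

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Assumed toy example: A is symmetric with eigenvalue 3 for (1, 1)
# and eigenvalue 1 for (1, -1).
A = [[2.0, 1.0],
     [1.0, 2.0]]
eigenpairs = [(3.0, [1.0, 1.0]), (1.0, [1.0, -1.0])]
coeffs = [2.0, 1.0]  # hypothetical coefficients of v in the eigenvector basis

# v = sum_i c_i * x_i
v = [sum(c * x[i] for c, (_, x) in zip(coeffs, eigenpairs)) for i in range(2)]

# Left side: multiply v by A directly.
lhs = matvec(A, v)

# Right side: scale each eigenvector term by its eigenvalue.
rhs = [sum(c * lam * x[i] for c, (lam, x) in zip(coeffs, eigenpairs))
       for i in range(2)]

print(v, lhs, rhs)  # lhs and rhs agree
```

Each multiplication by A scales every term by its own eigenvalue, which is the property the rest of the question builds on.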
2. Aligned database of faces (20%)
You are going to build a face detection system for this assignment.
Our first task is to assemble a database of aligned and cropped images
of a set of faces. You should have received e-mail with instructions
for how to download our database of face images (facedb.zip). Inside
that ZIP file you will find 68 color JPEG images (400x400 pixels) of your
fellow classmates smiling and/or showing a neutral
expression. (Warning: You are prohibited from distributing these
images or making them in any way accessible to anyone outside this
class under any circumstances!!)
Write a simple program in MATLAB that:
- Loads and displays each image in turn
- Lets the user click on the centers of the eyes in each image and
stores these coordinates (use the getpts function).
- Warps the images so that the eye points are mapped to fixed
locations, 100 pixels apart horizontally (use
the imtransform and cp2tform functions).
- Crops out an appropriate section of the image (e.g., 300 pixels
tall by 200 pixels wide). Note that the easiest way to do this might be
to use the XData and YData options
to imtransform.
- Saves the results so you don't have to repeat the clicking
multiple times.
At this point you should have a collection of equal-sized images with
the person's eyes in roughly the same position.
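The warp in the steps above is determined by the two clicked eye centers. The following is a minimal sketch, in plain Python rather than MATLAB, of the underlying geometry: a similarity transform (rotation, uniform scale, translation) that maps two clicked points exactly onto two fixed targets. It is not a substitute for cp2tform and imtransform. The function names, the clicked coordinates, and the target positions (50,100) and (150,100) are all hypothetical; only the 100-pixel horizontal spacing comes from the assignment.

```python
def similarity_from_eyes(left_eye, right_eye, left_target, right_target):
    """Return (a, b) so that z -> a*z + b maps the eyes onto the targets.

    Points are (x, y) tuples treated as complex numbers x + iy, so a
    single complex multiply encodes rotation and uniform scale.
    """
    p1, p2 = complex(*left_eye), complex(*right_eye)
    q1, q2 = complex(*left_target), complex(*right_target)
    if p1 == p2:
        raise ValueError("the two eye points must be distinct")
    a = (q2 - q1) / (p2 - p1)  # rotation and scale
    b = q1 - a * p1            # translation
    return a, b

def apply_similarity(a, b, point):
    """Map an (x, y) point through z -> a*z + b."""
    w = a * complex(*point) + b
    return (w.real, w.imag)

# Hypothetical clicks on a slightly tilted face.
a, b = similarity_from_eyes((140, 210), (260, 190), (50, 100), (150, 100))
left = apply_similarity(a, b, (140, 210))
right = apply_similarity(a, b, (260, 190))
print(left, right)
```

Two point pairs fix a similarity transform completely, which is why eye clicks alone are enough for this alignment.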
Next compute an intensity image of the "average face" in which each
pixel is set to the average intensity of that pixel across your set of
aligned and cropped images. Note the quality of this average face
(i.e., how well it resembles an actual face with distinguishable eyes,
mouth, etc.) will depend on the quality of your alignment and is a
great way to test that you got the first part correct. Submit this
image as part of your write-up. Although not required for this
assignment, you may also want to consider using more than just the
eyes to perform the alignment (e.g., have the user click on the tip of
the nose, the corners of the mouth, etc.).
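The per-pixel averaging described above can be sketched as follows, in plain Python on tiny made-up "images" stored as nested lists. The function name and the 2x2 data are illustrative assumptions; real images would be the aligned and cropped intensity images.

```python
def average_image(images):
    """Per-pixel mean of equally sized grayscale images (lists of rows)."""
    if not images:
        raise ValueError("need at least one image")
    rows, cols = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != rows or any(len(r) != cols for r in img):
            raise ValueError("all images must have the same size")
    n = len(images)
    return [[sum(img[r][c] for img in images) / n for c in range(cols)]
            for r in range(rows)]

# Two hypothetical 2x2 intensity images.
faces = [
    [[0, 100], [50, 200]],
    [[100, 100], [150, 0]],
]
mean_face = average_image(faces)
print(mean_face)  # [[50.0, 100.0], [100.0, 100.0]]
```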
3. Face detection (40%)
Treat the average face you have computed as a template. To detect
faces in a target image, compare its intensity to the template at each
pixel and record the SSD between the two. You should have received
e-mail describing how to download the file class.zip which contains a
small set of test images taken during lecture. (See above disclaimer
about privacy and distributing images.) Feel free to browse the web
for additional test images.
Because these images are large, it is important to use the FFT to
efficiently perform the convolutions required to compute the SSD
between the template and target image (recall that you derived the
necessary relationship between SSD and convolution in part 1 of this
assignment).
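When building the FFT-based version, it can help to have a slow reference to compare against on small inputs. The following is a brute-force sketch in plain Python, not the efficient method the assignment requires. The function name and the toy image and template are hypothetical.

```python
def ssd_map(image, template):
    """Brute-force SSD between the template and every fully overlapping window.

    image and template are lists of rows. Entry [r][c] of the result is
    the SSD with the template's top-left corner placed at (r, c).
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    out = []
    for r in range(ih - th + 1):
        row = []
        for c in range(iw - tw + 1):
            total = 0
            for i in range(th):
                for j in range(tw):
                    d = image[r + i][c + j] - template[i][j]
                    total += d * d
            row.append(total)
        out.append(row)
    return out

# Toy data: the template is an exact copy of the window at row 1, column 1.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 1, 2, 3]]
template = [[6, 7],
            [1, 2]]
ssd = ssd_map(image, template)
print(ssd[1][1])  # 0, the exact match
```

The cost grows with the product of image and template sizes, which is why this form is only practical as a correctness check.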
Build a simple face detector that reports the presence of a face when
the SSD is below a threshold. Play around with the value of this
threshold and show images that indicate its relationship to the
sensitivity of your detector. Extend this basic face detector in the
following two ways:
- Perform non-maximum suppression whereby you return only the
strongest match within local image regions (similar to what you did
for your edge and corner detectors).
- Compute the SSD image for the target image at different scales
(consult help imresize in MATLAB). Look for faces at 100%,
75%, 50% and 25% of the original size.
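Because the best match has the lowest SSD, suppression here amounts to keeping only local minima of the SSD image (equivalently, local maxima of the negated SSD). The following is a minimal sketch in plain Python under that assumption. The function name, the square neighborhood, the radius, and the toy SSD values are all hypothetical choices, and exact ties are kept rather than resolved.

```python
def detect_minima(ssd, threshold, radius=1):
    """Return (row, col) positions whose SSD is below the threshold and
    no larger than any value in the surrounding square neighborhood."""
    h, w = len(ssd), len(ssd[0])
    hits = []
    for r in range(h):
        for c in range(w):
            val = ssd[r][c]
            if val >= threshold:
                continue  # not a confident enough match
            neighborhood = [
                ssd[i][j]
                for i in range(max(0, r - radius), min(h, r + radius + 1))
                for j in range(max(0, c - radius), min(w, c + radius + 1))
            ]
            if val <= min(neighborhood):
                hits.append((r, c))
    return hits

# Hypothetical SSD image with two separate low regions.
toy_ssd = [[9, 8, 9, 9, 9],
           [8, 1, 7, 9, 9],
           [9, 7, 9, 3, 2],
           [9, 9, 9, 4, 9]]
detections = detect_minima(toy_ssd, threshold=5, radius=1)
print(detections)  # [(1, 1), (2, 4)]
```

In the toy data, four positions fall below the threshold, but the values 3 and 4 sit next to the stronger match 2 and are suppressed.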
A good "sanity check" for your detector is to apply it to the images
used to form the template in the first place. Use the unaligned and
uncropped versions and see if you can find the face. Your write-up
should show the SSD images and detection results for several different
target images and answer the following questions:
- How well does your detector work? How would you evaluate its
performance in a principled way (i.e., how would you scientifically
compare multiple face detectors)?
- Is there a significant difference in its accuracy when applied to
images of people not in our class and thus not included in the
construction of the template?
- What types of false positives does it return (i.e., what image
regions tend to resemble your face template)?
- How might you automate the alignment/cropping stage from before
for a huge collection of face images using this detector? How would
you create your template?
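For the evaluation question, one common starting point (not the only valid answer) is to count matched detections against hand-labeled face positions and report precision and recall. The sketch below, in plain Python, assumes detections and ground truth are given as (x, y) centers and that a detection counts as correct when it lies within a chosen distance of an unmatched labeled face. The function name, the greedy matching rule, the distance tolerance, and the data are all hypothetical.

```python
def precision_recall(detections, truths, max_dist):
    """Greedily match each detection to the nearest unmatched true face."""
    unmatched = list(truths)
    true_positives = 0
    for d in detections:
        best = None
        for t in unmatched:
            dist = ((d[0] - t[0]) ** 2 + (d[1] - t[1]) ** 2) ** 0.5
            if dist <= max_dist and (best is None or dist < best[0]):
                best = (dist, t)
        if best is not None:
            unmatched.remove(best[1])  # each true face is matched at most once
            true_positives += 1
    # Convention assumed here: an empty list yields a score of 1.0.
    precision = true_positives / len(detections) if detections else 1.0
    recall = true_positives / len(truths) if truths else 1.0
    return precision, recall

# Hypothetical detections and labeled faces.
found = [(10, 10), (50, 52), (90, 90)]
labeled = [(11, 9), (50, 50), (200, 200)]
p, r = precision_recall(found, labeled, max_dist=5)
print(p, r)
```

Sweeping the SSD threshold and recording one (precision, recall) pair per setting gives a curve on which different detectors can be compared.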
4. Eigenfaces (10%)
To begin, construct a matrix by unrolling each aligned and cropped
face image into a separate column (consult help reshape in
MATLAB). The final matrix should have #rows = #pixels in each image
and #cols = #subjects. Mean-center the data (sometimes loosely called
"whitening") by subtracting the average face from each image before it
is placed into the matrix.
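The matrix layout described above can be sketched on toy data as follows, in plain Python with nested lists. The function names and the 2x2 "faces" are hypothetical. The sketch unrolls row by row, whereas MATLAB's reshape works column by column; either order works as long as it is used consistently.

```python
def unroll(image):
    """Flatten an image (list of rows) into one vector, row by row."""
    return [p for row in image for p in row]

def build_centered_matrix(images):
    """Return (matrix, mean) with rows = pixels and columns = subjects."""
    columns = [unroll(img) for img in images]
    n = len(columns)
    num_pixels = len(columns[0])
    mean = [sum(col[i] for col in columns) / n for i in range(num_pixels)]
    centered = [[col[i] - mean[i] for i in range(num_pixels)]
                for col in columns]
    # Transpose so each subject occupies one column.
    matrix = [[centered[j][i] for j in range(n)] for i in range(num_pixels)]
    return matrix, mean

# Three hypothetical 2x2 faces.
toy_faces = [
    [[1, 2], [3, 4]],
    [[3, 2], [1, 0]],
    [[2, 2], [2, 2]],
]
X, mean_vec = build_centered_matrix(toy_faces)
print(len(X), len(X[0]))  # 4 rows (pixels), 3 columns (subjects)
```

After centering, every row of the matrix sums to zero, which is a quick check that the average face was subtracted correctly.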
Apply SVD to this matrix of face images (consult help svd in
MATLAB, especially the svd(X,0) syntax).
Include in your write-up a plot of the singular values and show the
five most significant eigenfaces as images. Use a visualization of
your choice (simple mapping to grayscale is fine).
Project each face image onto the first two principal
components. Hint: the projection coefficients are stored in V^T (scaled
by the corresponding singular values), but can also be computed as the
dot product of a mean-subtracted face image organized into a vector
and each principal component. Now you have two coordinates associated
with each face image in your database. Create a scatter plot of these
2-D points (consult help scatter in MATLAB).
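The dot-product route to the two coordinates can be sketched as follows in plain Python. The two "principal components" here are made-up orthonormal vectors rather than the output of an SVD, and the face vector is a hypothetical mean-subtracted image with four pixels.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))

def project(face, components):
    """Coordinates of a mean-subtracted face along each component."""
    return tuple(dot(face, pc) for pc in components)

# Hypothetical orthonormal components and a mean-subtracted face vector.
pc1 = [0.5, 0.5, 0.5, 0.5]
pc2 = [0.5, -0.5, 0.5, -0.5]
face = [-1.0, 0.0, 1.0, 2.0]

coords = project(face, [pc1, pc2])
print(coords)  # (1.0, -1.0)
```

Repeating this for every column of the face matrix yields one 2-D point per subject for the scatter plot.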
Answer the following questions:
- The eigenfaces encode the dominant sources of variation in your
dataset (e.g., absence/presence of facial hair, skin tone, glasses,
etc.). What do the eigenfaces you computed correspond to?
- Using the plot of singular values as a guide, choose a number of
principal components that you would use to form a compact and
accurate representation of your face database. Justify/explain your
decision. What is the total approximation error introduced by using
this number of components in place of the original images? What is
the total amount of space required to store this compressed
representation? What is the compression ratio (i.e., the ratio of
compressed size to original size)? Show your work for these
calculations.
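The bookkeeping for the last question can be sketched as follows, in plain Python. The storage model is an assumption: it counts k eigenfaces, k coefficients per subject, and the average face, all stored at the same numeric precision as the original pixels. The value k = 10 and the singular values are made up; the 300x200 crop and 68 subjects are the example figures from this assignment. For the error, the sketch uses the standard fact that a rank-k truncated SVD has squared Frobenius-norm error equal to the sum of the squared discarded singular values.

```python
def compressed_size(num_pixels, num_subjects, k):
    """Numbers stored: k eigenfaces, k coefficients per face, one mean face."""
    return k * num_pixels + k * num_subjects + num_pixels

def truncation_error_sq(singular_values, k):
    """Squared Frobenius error of keeping only the first k components."""
    return sum(s * s for s in singular_values[k:])

num_pixels = 300 * 200  # example crop size from the assignment
num_subjects = 68       # number of images in facedb.zip
k = 10                  # hypothetical number of components kept

original = num_pixels * num_subjects
compressed = compressed_size(num_pixels, num_subjects, k)
ratio = compressed / original
print(original, compressed, round(ratio, 4))  # 4080000 660680 0.1619

# Made-up singular values, only to show the error formula.
print(truncation_error_sq([10.0, 5.0, 2.0, 1.0], 2))  # 5.0
```

With few subjects, the eigenfaces themselves dominate the storage, so the ratio improves mainly by lowering k rather than by trimming coefficients.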
Submitting
This assignment is due Wednesday, March 2, 2011 at 11:55 PM. Please
see the general notes on
submitting your assignments, as well as the late policy and the collaboration policy.
Please submit as one single ZIP file:
- Your face detection code (as one or more .m files).
- The various visualizations and plots described above.
- A WRITEUP.html file that contains all of the images and
visualizations you produce, along with a description of your
experiments with different parameters, and any relevant implementation
notes. This file should also include your answers to the various
questions posed above. This is extremely important. Submissions that
do not include a WRITEUP.html file will receive a grade of ZERO.
Note that programming in MATLAB is not an excuse to write unreadable code.
You are expected to use good programming style, including meaningful variable
names, a comment or three describing what the code is doing, etc. Also, all
images you submit should have been saved with the "imwrite" function; do
not submit screen captures of the image window.