Computer Vision
• Final Project: Headtracked Mouse Interface
Due: December 18
Code
Presentation

- We implemented a system that allows a person to control his/her mouse without any input other than head movements. This would be useful in situations where a user may not be able to use his/her hands to control a mouse, or where a user may not need to use a mouse very often. Our first goal was to successfully track a user's head in a passive way. We implemented the algorithm from [1] to perform the headtracking portion of the system. As an initialization step, the user is asked to place four points on a picture of his/her head. This allows us to extract the ellipse that bounds the user's head. We then perform an update at each frame in which we estimate the new position of the ellipse given a new input image.
- At each frame we search a neighborhood of ellipse parameters to determine the best fit to the next frame in the sequence. Instead of using the current estimate as the neighborhood's center, we compute a velocity estimate based on the previous two frames and center the neighborhood on the position that this velocity predicts. This is analogous to the approach in [2]. For each set of ellipse parameters in the neighborhood we compute two terms and sum them; the parameters with the highest sum become the new estimate. The two terms are the gradient along the boundary of the ellipse and the color histogram of the interior, as in [1].
- Once we have recovered the new ellipse parameters for the new frame, we use a template-based approach to determine the wink state of the user. At initialization the user is prompted to perform a wink and a non-wink to be captured as templates. These are then used to determine the wink state.
- Finally, the mouse is moved to the appropriate location and clicked based on the wink state.
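The velocity-centered neighborhood search described above can be sketched roughly as follows. This is a minimal illustration rather than our actual MATLAB code: the ellipse state is assumed to be a tuple of integers such as (center x, center y, size), the search radii are placeholder values, and `score` stands in for the summed gradient and color terms.

```python
from itertools import product

def predict_center(prev, curr):
    """Constant-velocity prediction: curr + (curr - prev), per parameter."""
    return tuple(2 * c - p for p, c in zip(prev, curr))

def search_neighborhood(prev, curr, score, radius=(4, 4, 1)):
    """Return the candidate state with the highest score.

    Candidates form a box of half-width `radius` (assumed values) around
    the state predicted from the previous two frames.
    """
    center = predict_center(prev, curr)
    offsets = [range(-r, r + 1) for r in radius]
    candidates = (
        tuple(c + d for c, d in zip(center, delta))
        for delta in product(*offsets)
    )
    return max(candidates, key=score)
```

Centering on the predicted state rather than the current one lets a small neighborhood keep up with a head that is moving steadily between frames.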
Gradient Module
- For each frame, we compute the gradient of the entire frame in both x and y. We perform this operation by convolving the image with the derivative of a Gaussian. In general the parameters of the Gaussian can be set to whatever is appropriate, but previous work found that a σ of 1.5 and a width of 10 are appropriate in general settings. The ellipse's boundary is then discretized into a predetermined number of samples. We found 100 samples to be appropriate in most cases, although it sometimes helped to increase this number to 150. For each sample, the magnitude of the dot product between the gradient and the ellipse's normal is computed. [1] suggests that the dot product approach yields better results, although we did not confirm this ourselves. Once all of these terms have been computed, they are summed and normalized. As a last step, the gradient module's score is converted to a percentage so that it can be added to the color histogram module's score.
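A minimal sketch of the gradient term, under stated assumptions: the frame is a list of rows of grayscale values, the ellipse is axis-aligned with hypothetical semi-axes `a` and `b`, gradients are looked up at the nearest pixel, and the "width of 10" is interpreted as a kernel half-width of 5 taps. None of these names come from our MATLAB code.

```python
import math

def dog_kernel(sigma=1.5, half_width=5):
    """1-D derivative-of-Gaussian taps on [-half_width, half_width]."""
    xs = range(-half_width, half_width + 1)
    g = [math.exp(-x * x / (2 * sigma ** 2)) for x in xs]
    norm = sum(g)
    return [-x / sigma ** 2 * gi / norm for x, gi in zip(xs, g)]

def convolve_1d(signal, kernel):
    """Same-size 1-D convolution, treating samples outside the signal as zero."""
    h = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i - (j - h)
            if 0 <= idx < len(signal):
                acc += k * signal[idx]
        out.append(acc)
    return out

def ellipse_samples(cx, cy, a, b, n=100):
    """Yield (x, y, nx, ny): boundary points with unit outward normals."""
    for k in range(n):
        t = 2 * math.pi * k / n
        # The normal is parallel to the gradient of x^2/a^2 + y^2/b^2.
        nx, ny = math.cos(t) / a, math.sin(t) / b
        length = math.hypot(nx, ny)
        yield cx + a * math.cos(t), cy + b * math.sin(t), nx / length, ny / length

def gradient_score(gx, gy, cx, cy, a, b, n=100):
    """Sum of |normal . gradient| over the samples, divided by n."""
    total = 0.0
    for x, y, nx, ny in ellipse_samples(cx, cy, a, b, n):
        r, c = int(round(y)), int(round(x))
        if 0 <= r < len(gx) and 0 <= c < len(gx[0]):  # skip samples off the frame
            total += abs(nx * gx[r][c] + ny * gy[r][c])
    return total / n
```

Here `gx` and `gy` would come from applying `convolve_1d` with `dog_kernel()` along each row and each column of the frame, respectively. That is a simplification: a full 2-D derivative of a Gaussian would also smooth along the other axis.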
Color Histogram Module
- We again follow the strategy of [1] to compute the color histogram on the interior of every candidate set of ellipse parameters. For each ellipse interior, we compute the 3D histogram in B-G, G-R, R+G+B color space. Using this color space helps to alleviate the need for strongly colored background textures. At the initialization step, we compute the color histogram of the user-defined ellipse and store it as a reference. Then, for every candidate set of ellipse parameters, we compute the color histogram and take its intersection with the reference. We use the intersection method described by [1].
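Written out, the intersection measure of [1] takes the following form as we understand it. The notation is ours for this write-up: $I_s(i)$ is the number of pixels in bin $i$ of the histogram for candidate ellipse $s$, $M(i)$ is the count in the same bin of the reference histogram, and $N$ is the number of bins.

```latex
\phi_c(s) = \frac{\sum_{i=1}^{N} \min\bigl(I_s(i),\, M(i)\bigr)}{\sum_{i=1}^{N} I_s(i)}
```

The measure is 1 when every candidate pixel falls in a bin that the reference can absorb, and 0 when the two histograms share no occupied bins.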
- As in the gradient module, we convert this term to a percentage so that it can be combined with the gradient module. We then simply find the ellipse parameters that maximize the sum of the gradient module percentage and the color histogram percentage.
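The histogram comparison and the final selection can be sketched as below. This is an illustration under assumptions, not our MATLAB code: pixels are (r, g, b) tuples in 0..255, each axis uses 8 bins, and "converting to a percentage" is taken to mean min-max rescaling over the candidates in the current search neighborhood.

```python
def color_histogram(pixels, bins=8):
    """3-D histogram over (B-G, G-R, R+G+B), stored as a sparse dict."""
    hist = {}
    for r, g, b in pixels:
        # Differences are shifted from [-255, 255] to [0, 510];
        # the sum R+G+B lies in [0, 765].
        key = (
            (b - g + 255) * bins // 511,
            (g - r + 255) * bins // 511,
            (r + g + b) * bins // 766,
        )
        hist[key] = hist.get(key, 0) + 1
    return hist

def intersection(candidate, reference):
    """Bin-wise minimum summed and divided by the candidate's pixel count."""
    total = sum(candidate.values())
    if total == 0:
        return 0.0
    shared = sum(min(n, reference.get(k, 0)) for k, n in candidate.items())
    return shared / total

def to_percent(scores):
    """Min-max rescale a list of scores to the range 0..100 (assumed scheme)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [100.0 * (s - lo) / (hi - lo) for s in scores]

def best_candidate(candidates, gradient_scores, color_scores):
    """Pick the candidate that maximizes the sum of the two percentages."""
    g = to_percent(gradient_scores)
    c = to_percent(color_scores)
    best = max(range(len(candidates)), key=lambda i: g[i] + c[i])
    return candidates[best]
```

Rescaling both terms to the same range before adding them keeps either module from dominating simply because its raw values are larger.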
Wink Detection
- At initialization, the user is prompted to produce a wink and a non-wink face. These are stored for use as references. Once the face's ellipse has been found via the previously explained method, the sum of squared differences (SSD) is computed between the image contained within that ellipse and each of the two reference images.
- We also used a histogram intersection approach to compare the references with the current estimate.
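The SSD comparison against the two templates can be sketched as follows. The sketch assumes the face region has already been cropped and resampled to the same size as the templates, and that patches are lists of rows of grayscale values; the function names and the nearest-template decision rule are illustrative.

```python
def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("patches must have the same shape")
    return sum(
        (pa - pb) ** 2
        for ra, rb in zip(a, b)
        for pa, pb in zip(ra, rb)
    )

def classify_wink(face, wink_ref, open_ref):
    """Label the face by whichever template it is closer to in SSD."""
    return "wink" if ssd(face, wink_ref) < ssd(face, open_ref) else "no wink"
```

For example, `classify_wink(face, wink_ref, open_ref)` returns `"wink"` only when the current face patch differs less from the wink template than from the non-wink template.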
Webcam Interoperability
- Although conceptually simple, it was a bit of a challenge to interoperate with the webcam in real time. We ended up using MATLAB's webcam interface to constantly feed frames to our algorithm from a webcam attached to a local USB port. We also attempted to use OpenCV to read the images from the webcam, but this ended up being less fruitful than the straightforward MATLAB code already available.
Results
- Here is a video of a head being tracked:
- Sorry for all of the flashing, but it seems to be a byproduct of recording movies in MATLAB.
- The mouse movements work well, but unfortunately it is difficult to make a movie that visualizes this. During the presentation we had a live demo. If you have the Image Acquisition Toolbox for MATLAB and a webcam, you can run the included code to reproduce our results.
Conclusions
- Our headtracking algorithm worked very well. The implementation of [1] resulted in very accurate tracking of the head as long as movement was not too abrupt. If movement was too abrupt, the headtracker's search neighborhood would not coincide with a true face, which usually resulted in a runaway ellipse. In this case, we simply re-initialize the ellipse to the center. The wink detection was another story: it did not work well. However, we found that comparing only a small subregion of the reference images and the current face image, especially the area around the eyes, was more accurate. In addition, we attempted to compare histograms rather than just SSD, but this provided no gains because the colors in the faces were not changing drastically enough.
- The end result of the project is a very strong headtracker with a not-so-great wink detector. We believe that with a bit more time and consideration we could improve the wink detector by first determining eye positions and then using the color histogram method on only those pixels. This should work because a closed eye will be a flesh tone, whereas an open eye will have white and black in it.
References
[1] S. Birchfield. Elliptical Head Tracking Using Intensity Gradients and Color Histograms. In IEEE Conference on Computer Vision and Pattern Recognition, June 1998.
[2] S. Birchfield. An Elliptical Head Tracker. In Proc. of the 31st Asilomar Conf. on Signals, Systems and Computers, 1997.
© 2007 Sean M. Arietta
University of Virginia