Markers are typically thought to describe the what and where of task-dependent objects. So, for the note-taking task, the PA system will have a marker whose what is writing implement and a marker whose what is writing surface. Selecting an object that can fulfill the marker's role is the job of the marker's identity component. Identity is divided into two not necessarily distinct routines, Locate and Track. Locate is used to initially instantiate the marker, while Track is used to update the marker's position thereafter.
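The split between Locate and Track can be pictured with a small sketch. Everything named here (Candidate, Marker, locate_pencil, track_pencil, the label strings) is hypothetical and chosen only for illustration; the real identity routines operate on image data rather than on labeled candidates.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Ego-centric polar position: (range, bearing in radians). Units are assumed.
Polar = Tuple[float, float]


@dataclass
class Candidate:
    """Hypothetical stand-in for a detected object."""
    label: str
    position: Polar


@dataclass
class Marker:
    what: str  # role the object plays in the task
    locate: Callable[[Sequence[Candidate]], Optional[Candidate]]
    track: Callable[[Sequence[Candidate], Polar], Optional[Candidate]]
    position: Optional[Polar] = None  # None while the marker is uninstantiated

    @property
    def instantiated(self) -> bool:
        return self.position is not None

    def update(self, candidates: Sequence[Candidate]) -> None:
        """Run Locate until the marker is instantiated, then run Track."""
        if self.position is None:
            found = self.locate(candidates)
        else:
            found = self.track(candidates, self.position)
        if found is not None:
            self.position = found.position


def polar_distance(a: Polar, b: Polar) -> float:
    """Distance between two polar points (law of cosines)."""
    squared = a[0] ** 2 + b[0] ** 2 - 2.0 * a[0] * b[0] * math.cos(a[1] - b[1])
    return math.sqrt(max(0.0, squared))


def locate_pencil(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Locate: accept the first candidate labeled as a pencil."""
    return next((c for c in candidates if c.label == "pencil"), None)


def track_pencil(candidates: Sequence[Candidate], last: Polar) -> Optional[Candidate]:
    """Track: follow the pencil closest to the last stored position."""
    pencils = [c for c in candidates if c.label == "pencil"]
    return min(pencils, key=lambda c: polar_distance(c.position, last), default=None)
```

Keeping the two routines as separate callables mirrors the "not necessarily distinct" wording: a designer could pass the same function for both when no cheaper tracking routine exists.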
When trying to instantiate the writing implement marker, there may be a great many objects which can fulfill the marker's role. Pens, pencils, crayons, etc. can all be used to write. However, many writing tasks are more structured than taking a phone message. One would probably not write a business letter with a crayon. In fact, one might only use one's favorite pen for the task. Our system does not currently do any functional recognition, so the identity routines are used to instantiate markers. However, if the writing implement marker's identity were set to pencil, any pencil fitting the marker's description could be selected, as any pencil will fulfill the marker's role in the task.
Another important marker component to be considered is the where component. This component specifies the location of the object associated with the marker. Our system stores positions in ego-centric polar coordinates. A marker's where guides the perception of the PA system. If one wants to select a specific pencil from a group of indistinguishable pencils, the writing implement marker needs to be given an appropriate where (such as "the one on the end") as well as an identity of pencil. A marker's where also tells the PA system when it can expect to instantiate a marker. If the writing surface marker's where is "out in the kitchen in the top drawer of the desk", the PA system need not run the marker's Locate routine until the agent has gone to the kitchen and opened the top drawer of the desk.
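One way to picture how a where can defer Locate is the sketch below. It assumes, purely for illustration, that the agent's situation is summarized as a set of string facts and that a where can be expressed as a predicate over those facts; GatedMarker, the fact names, and the "notepad" result are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional

# Hypothetical summary of the agent's situation as a set of facts.
Context = FrozenSet[str]


@dataclass
class GatedMarker:
    what: str
    where_satisfied: Callable[[Context], bool]  # the marker's where, as a predicate
    locate: Callable[[], Optional[str]]         # stand-in for the Locate routine
    target: Optional[str] = None                # None while uninstantiated
    locate_calls: int = 0

    def step(self, context: Context) -> None:
        """Run Locate only when the where says instantiation is possible."""
        if self.target is not None or not self.where_satisfied(context):
            return
        self.locate_calls += 1
        self.target = self.locate()


surface = GatedMarker(
    what="writing surface",
    where_satisfied=lambda ctx: {"in_kitchen", "top_drawer_open"} <= ctx,
    locate=lambda: "notepad",
)
surface.step(frozenset({"in_hallway"}))                     # Locate is skipped
surface.step(frozenset({"in_kitchen"}))                     # still skipped
surface.step(frozenset({"in_kitchen", "top_drawer_open"}))  # Locate runs once
```

The point of the gate is cost: a Locate routine that cannot succeed yet is never run.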
Once a marker has been instantiated, the PA system tracks its position using both sensory and proprioceptive data. A marker's estimated position, in ego-centric polar coordinates, is calculated through a "hypothesis and confirm (if possible)" process. The hypothesis is created from the previously stored position and proprioceptive data, e.g. the most recent values read from the agent's drive shaft encoders. If the hypothesized position should be visible in the current absolute field of view, the agent looks for a correspondent in the current image. If a correspondent can be found, the object's image position (based on its azimuth and elevation in the image with the assumption that all objects lie on the ground) is stored instead of its expected location.
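The hypothesize-and-confirm step can be sketched as a dead-reckoning update followed by an optional visual check. The motion model below is an assumption made for illustration: the agent drives forward along its heading and then turns in place, bearings are measured counterclockwise from the heading, and the field of view is a symmetric cone. The function names, the look callback, and the 30-degree half angle are hypothetical.

```python
import math
from typing import Callable, Optional, Tuple

# Ego-centric polar position: (range, bearing in radians, counterclockwise
# from the agent's heading). The convention is assumed for this sketch.
Polar = Tuple[float, float]


def predict(prev: Polar, forward: float, turn: float) -> Polar:
    """Hypothesize the new position from the stored one and odometry.

    Assumed motion model: the agent drives `forward` along its heading,
    then turns in place by `turn` radians (counterclockwise positive).
    """
    r, theta = prev
    # Object in the old agent frame, shifted by the forward translation.
    x = r * math.cos(theta) - forward
    y = r * math.sin(theta)
    # Express the same point in the frame rotated by the agent's turn.
    xr = x * math.cos(turn) + y * math.sin(turn)
    yr = -x * math.sin(turn) + y * math.cos(turn)
    return (math.hypot(xr, yr), math.atan2(yr, xr))


def update_position(
    prev: Polar,
    forward: float,
    turn: float,
    look: Callable[[Polar], Optional[Polar]],
    half_fov: float = math.radians(30.0),
) -> Polar:
    """Return the confirmed position if one is found, else the hypothesis."""
    hypothesis = predict(prev, forward, turn)
    if abs(hypothesis[1]) <= half_fov:   # should be visible: try to confirm
        observed = look(hypothesis)
        if observed is not None:
            return observed              # image position replaces the estimate
    return hypothesis                    # not visible, or no correspondent
```

Passing the visual search as a callback keeps the image from being searched at all when the hypothesis falls outside the field of view.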
If the hypothesized position should not be visible, or no correspondent is found, then the hypothesis becomes the current position. Visual correspondence is done by finding the ground/non-ground boundary, or groundline (Horswill 1993). Vertical discontinuities in this line represent the edges of objects. Each segmented image region is analyzed by each marker's Track routine. Matching of markers to objects is based on the result of the match determined by Track and on the distance between the object's position and the position stored in the marker.
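A rough sketch of the two steps described above follows. It assumes the groundline is available as one boundary row per image column, that a fixed jump threshold marks a vertical discontinuity, and that Track returns a score in [0, 1]. How the score and the distance are actually combined is not stated here, so the rule used below (discard weak scores, then take the nearest region) is only one plausible choice. All names and thresholds are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Polar = Tuple[float, float]  # (range, bearing in radians)


def segment_groundline(groundline: Sequence[int], jump: int = 5) -> List[Tuple[int, int]]:
    """Split image columns into regions at vertical discontinuities.

    `groundline[c]` is the image row of the ground/non-ground boundary in
    column c. Returns inclusive (first_column, last_column) pairs.
    """
    if not groundline:
        return []
    regions = []
    start = 0
    for col in range(1, len(groundline)):
        if abs(groundline[col] - groundline[col - 1]) > jump:
            regions.append((start, col - 1))  # close the region at the edge
            start = col
    regions.append((start, len(groundline) - 1))
    return regions


def polar_distance(a: Polar, b: Polar) -> float:
    """Distance between two polar points (law of cosines)."""
    squared = a[0] ** 2 + b[0] ** 2 - 2.0 * a[0] * b[0] * math.cos(a[1] - b[1])
    return math.sqrt(max(0.0, squared))


def best_match(
    stored: Polar,
    scored_regions: Sequence[Tuple[float, Polar]],
    min_score: float = 0.5,
) -> Optional[int]:
    """Pick the region index that best corresponds to a marker.

    Each entry pairs a Track score with the region's estimated position.
    Assumed rule: ignore scores below `min_score`, then choose the region
    closest to the position stored in the marker.
    """
    accepted = [i for i, (score, _) in enumerate(scored_regions) if score >= min_score]
    if not accepted:
        return None
    return min(accepted, key=lambda i: polar_distance(stored, scored_regions[i][1]))
```

For example, a groundline of [10, 10, 10, 40, 40, 40, 10, 10] splits into three regions, the middle one being the columns where the boundary jumps upward around an object.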
Visual correspondence can be performed by any routine which the designer places in the marker's Locate or Track routine. However, routines of arbitrary complexity may degrade PA system performance to unacceptable levels. Currently, we use several different identity routines, including vertical discontinuity detection and histogram intersection (Swain and Ballard 1991) between a stored model and the segmented image regions produced by the groundline.
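Histogram intersection, as defined by Swain and Ballard, sums the bin-wise minimum of the image and model histograms and normalizes by the model's total count. The sketch below shows that measure on plain Python lists. The color_histogram helper, its uniform RGB binning, and the bin count are assumptions made for illustration and are not taken from our system.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]  # 8-bit channels, assumed for this sketch


def color_histogram(pixels: Sequence[RGB], bins: int = 4) -> List[int]:
    """Count pixels in a uniform bins x bins x bins RGB grid (hypothetical binning)."""
    hist = [0] * (bins ** 3)
    for r, g, b in pixels:
        index = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[index] += 1
    return hist


def histogram_intersection(image_hist: Sequence[float], model_hist: Sequence[float]) -> float:
    """Sum of bin-wise minima, normalized by the model histogram's total."""
    if len(image_hist) != len(model_hist):
        raise ValueError("histograms must have the same number of bins")
    total = sum(model_hist)
    if total == 0:
        return 0.0
    return sum(min(i, m) for i, m in zip(image_hist, model_hist)) / total
```

A score of 1.0 means every model pixel has a counterpart of the same color bin in the region, so an identity routine could compare the score for each segmented region against a threshold.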
A final aspect of markers is their action component. This specifies the action which the agent wishes to take with the marked object. For the note-taking task, the action on the writing implement marker would be write. Many actions, such as write, require more than one object. The writing surface marker represents the actual surface on which the note taking will be done. Clearly, the write action requires the agent not only to have selected the writing implement and writing surface, but also to be aware of their positions as the action takes place. Once the note-taking task is complete, or if it is interrupted, the close connection between the writing surface and the writing implement (as identified by their respective markers) is no longer needed. Both markers can be dropped from the PA system's representation. We call the markers for the collection of objects required to execute a PA system level action dependent markers. Each marker has a component which identifies which other markers in the PA system are in its dependent set.
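Dependent sets can be sketched as a small registry keyed by marker name. The class and method names are hypothetical, and two behaviors are assumptions made for illustration: an action is treated as executable only when a marker and all of its dependents are instantiated, and dropping one marker drops its whole dependent set.

```python
from typing import Dict, Iterable, Set


class MarkerRegistry:
    """Hypothetical bookkeeping for dependent markers."""

    def __init__(self) -> None:
        self._dependents: Dict[str, Set[str]] = {}

    def __contains__(self, name: str) -> bool:
        return name in self._dependents

    def link(self, names: Iterable[str]) -> None:
        """Make every marker in the group a dependent of the others."""
        group = set(names)
        for name in group:
            self._dependents.setdefault(name, set()).update(group - {name})

    def dependents(self, name: str) -> Set[str]:
        return set(self._dependents.get(name, set()))

    def can_act(self, name: str, instantiated: Iterable[str]) -> bool:
        """True when the marker and all of its dependents are instantiated."""
        if name not in self._dependents:
            return False
        return ({name} | self._dependents[name]) <= set(instantiated)

    def drop(self, name: str) -> Set[str]:
        """Remove a marker together with its dependent set."""
        dropped = {name} | self._dependents.get(name, set())
        for marker in dropped:
            self._dependents.pop(marker, None)
        for deps in self._dependents.values():
            deps -= dropped  # forget dropped markers in any remaining sets
        return dropped


registry = MarkerRegistry()
registry.link(["writing implement", "writing surface"])
```

With this registry, the write action would wait until both markers are instantiated, and finishing or interrupting the task would remove both with a single drop call.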
This discussion has been primarily about what a marker is and what it contains in our work. Others have described markers in the psychology literature (Pylyshyn and Storm 1988; Attneave and Farrar 1977). Ullman uses the term marking in his discussion of visual routines (Ullman 1984). Agre and Chapman (1987) use markers to identify objects relevant to the Penguin's current task. These markers are not instantiated or maintained by vision as in our system. The positions of the marked objects come from an "overhead" view of the scene. Brill (1996) uses markers in dynamic 3D domains, from a first-person viewpoint, with occlusion. My work extends this by further developing the data contained in a marker, as well as the concept of uninstantiated markers. These ideas allow markers to be used as the communication mechanism between the PA system and the remainder of the agent architecture.
Read about our agent architecture.
Attneave, F. and Farrar, P. 1977. The Visual World Behind the Head. American Journal of Psychology 90(4): 549-563.
Brill, F.Z. 1996. Representation of Local-Space in Perception/Action Systems: Behaving Appropriately in Difficult Situations. Ph.D. Dissertation, Department of Computer Science, University of Virginia.
Horswill, I. 1993. Polly: A Vision-Based Artificial Agent. AAAI-93: 824-829.
Pylyshyn, Z.W. and Storm, R.W. 1988. Tracking Multiple Independent Targets: Evidence for a Parallel Tracking Mechanism. Spatial Vision 3(3): 179-197.
Swain, M.J. and Ballard, D.H. 1991. Color Indexing. International Journal of Computer Vision 7(1): 11-32.
Ullman, S. 1984. Visual Routines. Cognition 18: 97-159.