Markers Elucidated and Applied
in Local 3-Space

Frank Z. Brill (brill@virginia.edu)
Worthy N. Martin (martin@virginia.edu)
Thomas J. Olson (olson@csc.ti.com)

Computer Science Department
University of Virginia, Thornton Hall
Charlottesville, VA 22901

Phone: (804) 982-2225
FAX (804) 982-2214

Abstract

The term ``marker'' as used in the active vision and reactive planning literature is poorly defined. The underspecification of the marker concept limits the usefulness of this important tool for building perception and action systems for autonomous robots. This paper refines the concept of markers, both at the level of their role in perception-action system, and the the level of the representations used in their implementation. Our definition encompasses markers in dynamic 3D environments, which are retained even when not in the current perceptual input, either due to occlusion, a limited field of view, or sensing errors. We develop methods for maintaining the representation in the face of these difficulties. The marker concept is further elaborated by a categorization of marker types we have found useful in our research. Relationships among markers are also discussed: relationships which potentially form a link between classical and reactive planning. Finally, we describe a marker-based implementation of an autonomous agent performing search and avoidance tasks in a three-dimensional virtual environment.


Table of Contents

1. Introduction
2. Related Work
3. Markers
3.1. Definition
4. Categorization of Markers
4.1. Instantiation source (top-down vs. bottom-up)
4.2. Primary goal markers vs. dependent markers
4.3. Coordinate system (image vs. egocentric 3D)
4.4. Tentative markers
4.5. Hypothesized object markers
5. Marker maintenance
5.1. Compensating for ego-motion
5.2. Correspondence for visible markers
5.3. Perception overrides memory
6. Implemented system
7. Conclusions
8. References
Summary Page

1. Introduction

For agents using marker-based representations, the environment is represented via markers on task-relevant objects. An agent is endowed with a limited set of these markers, each of which is a small storage structure that contains information about a single object in the environment. The information stored in the set of markers is the primary input to a reactive planner responsible for immediate action determination. This approach has been successful in 2D environments [1,5].

This paper elaborates the concept of markers, and extends the marker-based approach to representation of local space in dynamic 3D environments. Occlusion and a limited field of view in 3-space prevent straightforward application of a 2D system. Important problems addressed in this paper include allocation of markers based on images of a 3D environment, determination of the coordinate system in which the markers are located, maintenance of the markers in a dynamic environment, maintenance of markers on objects that are occluded or otherwise outside of the field of view, and mediation among multiple goals to make decisions about what action to take based on the state of markers in three-space.

After discussing related work in section 2, section 3 defines and elaborates the marker concept. Section 4 is a categorization of markers, with special attention to types that arise in three dimensional contexts. Although all markers have the function of retaining what and where information for an object, markers can be further classified into categories based on their more specific functions. Section 5 deals with the problem of maintaining the markers in the face of complications such as movement of the marked object, movement of the agent, markers passing outside the field of view, occlusion, etc. Section 6 describes an implemented autonomous agent that operates in a three-dimensional virtual environment based purely on visual input, and uses many of the concepts described in the previous sections of the paper. This agent is modelled after a small creature such as a rabbit that searches for food and must also avoid predators. Section 7 finishes with some conclusions drawn from our work.


2. Related work

Agre and Chapman introduced deictic, or ``pointing'' representations to the reactive planning literature [1, 2]. They implemented an agent that used deictic representations to play a video game called Pengi. Rather than identify and label all the objects in the game, as would be the case in the classical planning paradigm, markers were placed on only the nearest (or otherwise most task-relevant) items. The markers served as input to reactive-style planning circuitry. Agre and Chapman addressed the problem at the level of intermediate vision, and did not attempt to address problems such as occlusion and the underconstrained nature of early vision. Also, markers were placed only on currently visible objects; no state was retained concerning off screen items [5].

Maja Mataric developed the use of maps in a reactive planning context [10]. Sonar sensors and a low-resolution digital compass were used to identify landmarks and construct a topological map of the environment. The map enables the robot to navigate to locations in the environment as directed by a human. These maps are useful in navigating large-scale space, but are fundamentally different from the local-space representations of our research. The local-space markers are metric, in that they identify the locations of objects relative to the agent in a low-resolution coordinate system, whereas the Mataric maps are primarily topological, e.g., they record relationships like ``to the left of.'' Topological maps are useful for navigating the large-scale space, while local representations are used for coping with the immediate surroundings. In this way, the two representations are complimentary.

Andrew McCallum conducted experiments with deictic representations in a virtual environment [11]. He constructed a ``go-cart'' simulator, for virtual driving to investigate the role of deictic visual behaviors. His work is primarily concerned with the use of the fovea as a marker, and the learning of the visual behaviors. Spatial memory is limited to simple left-right distinctions, whereas we are concerned with higher resolution maintenance of multiple markers.


3. Markers

We begin the discussion of our use of markers in a three dimensional context by defining our use of the term ``marker.'' Once the term is defined, we briefly discuss our rationale for the use of these markers via some illustrative analogies. Our treatment of markers is similar to that in the original work of Agre and Chapman, but stresses the three dimensional aspects that are new to our research.

3.1. Definition

A marker is a data structure that contains two primary pieces of information: what an object is in terms of its role in the current task, and where the object is in an egocentric coordinate system. By what, we do not mean the marker contains a complete, objective, geometrical description of the object. Rather, the marker identifies an object relative to the current task being performed by the agent. One of the most important aspects of markers is their task dependence. Accomplishing a task in the world requires certain items be used: markers provide the place holders for these items. For example, the task of pouring a bowl of cereal requires two items: a bowl and a box of cereal. There may be many bowls and boxes of cereal in the cupboard, but we need only one of each. By placing a marker on one bowl and one box, the of pouring task need only refer to the bowl and the box. The marker need not contain descriptive information such as the color or the bowl or the brand of the cereal, since they are not relevant to the current task.

Similarly, by where, we do not mean the object is located in some objective external coordinate system. The purpose of marker-based representations is to enable to agent to interact with its immediate surroundings, so the appropriate coordinate system is egocentric. We will now flesh out our definition with analogies we have found to be useful in our understanding of markers and their function. These analogies will also provide motivation for our use of markers.

Marker as pronoun

As the above definition makes clear, the primary analogy is that markers are reference pronouns augmented with hand gestures. For example, if a person in a classroom is asked ``who is the teacher?'' the answer might be ``Professor Jones,'' or it might be ``her,'' (with an associated finger point). The latter response is in a marker mode.

Marker as register

Markers can be thought of loosely as registers that contain information regarding important objects. To extend this analogy further, we think of the physical world as the ``main memory'' of an agent, and the determination of the objects to mark as a register allocation problem. A multi-agent environment is analogous to a multiprocessor system, with each of the other processors capable of modifying main memory, i.e., the state of the world. We therefore have a ``cache coherence'' problem with which to contend. Just as the data in a cache can fall out-of-sync with main memory, data in the markers can fall out-of-sync with the state of the world.

In multiprocessor systems, cache coherence is handled by having writes to main memory be accompanied by broadcast messages advising all processors to invalidate their cache. No analog for such a broadcast exists in the physical world, so an agent acting on the basis of internal state always runs the risk of acting on stale data, which is part of the original motivation for the reactive approach (i.e., no internal state reduces the risk). The Agre and Chapman environment updated the state of the markers automatically once they were placed; thus their environment had the equivalent of the broadcast message that not only invalidated the old data, but updated them, thereby solving the cache coherence problem.

The real world does not have such a broadcast mechanism, so agents must rely on other strategies to synchronize their internal state with the environment. A realistic treatment of the marker maintenance problem is a significant contribution of our work. Fortunately, stabilities and regularities in the physical world mitigate the problem somewhat, so by limiting the amount of internal state and monitoring the environment periodically to update that state, we sharply decrease the risk of acting on stale information.

The locations of some markers can be monitored visually, making marker maintenance for the visible markers amount to visual tracking with multiple targets. Maintenance of markers outside the field of view is rather more difficult. Use of an egocentric coordinate system requires that the marker locations be updated as the agent moves. One of the unique features of our system, as compared to other reactive agents, is use of spatial memory for objects not currently visible, either due to occlusion or limited field of view.

Marker as parameter

Another helpful analogy is to think of the relationship of markers to tasks as like the relationship of parameters to subroutines. We can think of the cereal pouring task mentioned previously as a subroutine that takes two parameters: one of type BOWL, and one of type CEREAL_BOX. We use the term task to refer to the action to take in the world, and task-agency to refer to the hardware and software (or subroutine) that is to accomplish the task. There are important differences between subroutines and task-agencies. Unlike subroutines, task-agencies are active and opportunistic. By ``active,'' we mean that they can ``look for'' their markers/parameters, e.g., by pointing directional sensors appropriately, by launching the appropriate visual routines in the intermediate vision system, or by stimulating other agencies to perform actions likely to result in the appropriate world objects being marked. By ``opportunistic,'' we mean that if a task-agency has its inputs available, whether or not they became available due to the direct actions of the task-agency, it can use those inputs to accomplish its task. A complete autonomous agent may have several tasks to accomplish, each with its respective agency for accomplishing it. Any number of these task-agencies may be active in parallel at any given time.


4. Categorization of markers

Although all markers have the ``what and where'' properties given in our definition, markers can be logically divided into several categories based on their general role in a task, the way they are instantiated, their coordinate system, and other relevant attributes. This section gives a categorization of markers based on distinctions we have found useful in our work. One might think of the concept of a marker as a class, and this section as describing common subclasses.

4.1. Instantiation source (top-down vs. bottom-up)

One distinction that can be made among markers is based on how they are instantiated. Some markers may be created because the perceptual system has detected something in the sensor input that requires a marker. Other markers are placed due to the direct actions of an active task-agency. This is the bottom-up/top-down distinction familiar to the perception community. The significance of this distinction among markers in our work is how these markers relate to tasks.

Recall that our agent architecture is a collection of task-agencies, some of which may be active at any given time. When relevant percepts are detected, a bottom-up marker is instantiated. This will cause one or more task-agencies, which may not have previously been active, to become active, with the new bottom-up marker as one of the agency's inputs. Once an agency is active, it may require other inputs in order to carry out its task. The agency directs the perceptual system to carry out activities that will instantiate markers, this time top-down markers, that serve as the agency's other inputs.

Consider the sample domain of a small creature that must avoid predators. This creature has a run-away-and-hide agency to coordinate the predator avoidance task. Whether this agency is active or not, a ``predator-detector'' is always scanning for predators over the entire field of view. If one is found, a bottom-up marker is instantiated and placed on the predator's location. The instantiation of the predator marker activates the run-way-and-hide agency, which directs the creature to start looking for a place to hide. This might be a rock or a tree, but the object's role in the current task is as a ``hiding-place,'' so when a suitable object is found, a top-down marker is instantiated to serve as the ``hiding-place'' input to the ``run-away-and-hide'' agency.

Note that both the top-down and bottom-up markers are task-oriented, even though the bottom-up marker wasn't instantiated directly by a task-agency. The creature would not need a predator detector or the resulting marker if there were not a run-away-and-hide task-agency that needed a predator location on which to base its actions. Another thing to note is that the hiding-place marker is put on a rock or tree, objects that would not normally be marked if it were not for the task-agency. Put another way, exactly the same stimulus (say, a rock at some relative location) may or may not be marked, depending on the internal state of the agent, i.e., the activation of its task-agencies. To summarize the distinction: the placement of bottom-up markers is stimulus driven, whereas the placement of top-down markers is task driven.

4.2. Primary goal markers vs. dependent markers

The markers discussed thus far are placed on explicit primary goal objects in the environment. For example, a rabbit's goals are to find food and avoid predators, so berries and predators are marked. A navigation task-agency is activated by instantiating an appropriate destination marker. However, it may not be possible to proceed directly to the primary destination, due the presence of an obstacle. In general, complex domains necessitate establishing goals subsidiary to the primary goals.

The navigation task-agency, rather than blindly moving towards the destination, launches perceptual machinery to find any obstacles to its given destination. If an obstacle is found, it is marked with an obstacle marker. Obstacle markers are always top-down markers, since an object is only an obstacle to the extent that it is in the path to a destination. We say that obstacle markers are dependent on the associated primary destination, so that if the destination is ever dropped or changed, the obstacle marker is dropped as well.

If an obstacle marker is instantiated, the navigation agency looks for a path around the obstacle. When an appropriate location is found, the navigation agency instantiates an intermediate-destination marker which is dependent on the obstacle marker. The navigation agency guides the agent towards the intermediate destination just as it would any other destination. Reaching the intermediate destination triggers a reevaluation of the ``obstacleness'' of the associated object with respect to the primary destination.

As the domain becomes increasingly complex, this pattern can be extended to hierarchies of markers. These hierarchies bear resemblance to the classic concept of a partially ordered plan [7], with dependent markers analogous to steps in the plan. Further research is needed to determine the full relationship of marker hierarchies to partially ordered plans. Marker hierarchies may potentially bridge the gap between reactive and classical planning, at least for navigation tasks.

The entire process--searching for an obstacle, creating intermediate destination markers, and moving to the intermediate destination--repeats until there is no longer an obstacle in the path to the original destination marker, and the destination location is reached. The combination of destination, obstacle, and intermediate destination markers is sufficient for most simple navigation tasks.

4.3. Coordinate system (image vs. egocentric 3D)

Another distinction among markers can be made based on the coordinate system in which they are located. In previous work with markers this was not an issue, since there was only a single coordinate system: image coordinates. However, in a realistic visual environment, the natural coordinate system for visual perception (image coordinates) is not the natural coordinate system for acting in the environment (egocentric 3D coordinates). In a 2D video game environment, the perception and action coordinates can be confounded, whereas in a 3D environment, they cannot. This gives rise to the two coordinate systems, and hence, two types of markers.

Image coordinate markers are placed on objects in the current image, and are used to facilitate visual routines. They mark objects such as edges, lines, terminations, etc. They may either be bottom-up or top-down. Bottom-up image-coordinate markers make up a set of items similar to Marr's raw primal sketch [9]. Top-down image-coordinate markers correspond to the markers in Ullman's visual routines [12].

Egocentric 3D markers are the focus of our research; all markers discussed in the remainder of this paper will be in egocentric 3D coordinates. When the visual system has processed the input enough to determine what and where a task-critical object is in the environment, a marker is instantiated. The natural coordinate system for acting in the world is egocentric, and furthermore, is centered on the effector taking the action. This idea is well known in the active vision community [3]; e.g., if an effector is a manipulator grasping an object, the correct direction to move the effector is given by the signs of the coordinates of the object in the manipulator coordinate system. We are using markers for navigation in a robot that accepts velocity commands in the form of a forward speed and a turn speed. Clearly, the natural coordinate system for such a robot is polar, centered on the robot, with the zero angle directly ahead. When this is the case, the direction to turn to get to a destination at coordinate (r, q) is given by the sign of q, the amount to turn is given by the magnitude of q, and the distance to move is given by r.

4.4. Tentative markers

A single image may contain evidence for the existence of an object; however, that ``evidence'' may simply be an artifact of the noise in the imaging process. If we immediately mark an ``object'' without retaining some measure of our confidence of the object's existence, the control system will act upon the marker as if there is no doubt as to the existence of the object, even if the evidence is scant. We attach a confidence measure to markers indicating estimated probability of the object's existence, based on the evidence thus far. While there may be a continuum of ``probabilities,'' we adopt the simplification of categorizing markers into two classes based on their confidence measure: those we intend to act upon, and those we do not. Tentative markers are placed on objects for which there is some evidence, but which we do not intend to act upon until more evidence for their existence is obtained.

Our intent with the tentative markers is to filter out noise in the perceptual input, so one useful measure of confidence is whether the percept is stable over a sequence of images. One implementation of a confidence measure can therefore be a count of the number of times the object has been found in the image stream at the expected location. When first seen, an object is given a tentative marker, but when the number of frames in which it has been seen crosses some threshold, the tentative marker is replaced with a regular marker. This implementation is especially effective coupled with an ego motion induced change in viewpoint. If the viewpoint changes between consecutive frames, the location of the potential object's projection in the image will change. If the ego motion is known, the new location of the projection in the next image can be predicted. It is highly unlikely that noise will move in the image in a way that is consistent with the ego motion, so the tentative marker on noise will be dropped.

4.5. Hypothesized object markers

In some cases, especially in a 3D environment with occlusion, it is necessary to reason about objects that might exist, even though they aren't currently visible, or possibly haven't even been seen yet. For example, consider a rabbit in the presence of rocks and predators. Since a predator might be behind any given rock, the rabbit could consider the possibility that such a predator exists. In fact, the rabbit might behave as if there were a predator behind every rock. We can induce this behavior by hypothesizing the existence and location of a predator, and placing a hypothesized object marker labelled ``predator'' at that location. In some cases, the desired action to take depends on whether the object is actual or hypothesized, whereas in others it isn't necessary to make a distinction. Our rabbit, once it hypothesizes the existence of a predator behind a rock, may behave exactly as if there were an actual predator. The marker update procedure will treat real and hypothesized object markers identically.


5. Marker maintenance

In our control system, at every instant of time an action is selected for execution based on the information about the state of world currently available. In a purely reactive system, the available information about the world is limited to the current sensor values only. We expand on the reactive model by including the state of the markers as additional information. The advantage of this approach is that the agent is better informed when determining an action, since it can now remember a portion of the world around it. A disadvantage of this approach is that it admits the potential for the representation to be incorrect with respect to the actual state of the world. We have found that an incorrect representation is worse than no representation at all, since it encourages the agent to confidently take an action that is not at all appropriate in the current circumstances. Brooks reports that adding representation to an agent usually decreased the competence of the robots [4]; we believe his observation is a result of this phenomenon. Representation is not intrinsically bad--it's only bad if it's wrong. It is therefore critical that the representation be maintained accurately, or discarded.

5.1. Compensating for ego-motion

When the agent moves, updating the markers requires estimating the coordinate transformation between the agent's previous and current locations. It is then a simple matter of applying the transformation to all the marker coordinates. We distinguish short term and long term approaches to estimating the transformation. Short term methods estimate the transformation continuously (or as near to continuous as possible), but have a tendency to drift from the actual transformation over time due to accumulated error. Long term methods are used periodically to correct the drift.

There are three main short-term methods: dead-reckoning, acceleration measurement, and optical-flow based methods. Dead-reckoning assumes that all motion of the autonomous agent is due purely to the motor commands given by the agent itself. Since the agent knows the motor commands issued, the expected motion can be calculated directly. However, motor commands are often not executed perfectly, and forces external to the agent can influence the motion. If the motor commands are accurate and well characterized, this is the easiest option to implement. Another method is to measure the acceleration continuously, and then integrate twice to determine change in position. Measuring the acceleration enables the marker locations to be updated even if the motion of the agent is not purely the result of the agent's motor commands. Optical flow can also estimate the transformation under these conditions. An ideal system will use a combination of all three methods.

The primary long term method is to locate several points in the environment, and then solve a structure from motion problem. However, this option requires a correspondence problem be solved. (Another possibility might be to use a ``global positioning system,'' if it's an option in the domain.)

5.2. Correspondence for visible markers

Marked objects are not featureless points; they have perceptual properties that allow them to be identified. Those same properties can be used to re-identify them later. Clearly, to do this we must retain some of the perceptual information associated with the marker from frame to frame, i.e., primary visual cues, e.g., the color and size of a berry. The obvious place to keep this information is in the marker.

Our definition of ``marker'' required only two elements: a role in a task (what), and a location (where). This does not mean that these are the only two pieces of information allowed in the marker. In order to maintain them, markers must also retain some information about the perceptual aspects of the objects they mark. We might say that markers must have the ability to ``find themselves'' in the perceptual input, and will therefore retain any additional information necessary to do so.

Given markers with this additional information, the general marker maintenance procedure at each time frame has two steps. First, all markers ``find themselves'' in the input. Second, any objects in the new input that aren't already marked can potentially instantiate a new bottom-up marker.

5.3. Perception overrides memory

As we have discussed, memory for the location of objects runs the risk of being incorrect, and acting on incorrect information is potentially hazardous. Information derived from current perceptual input is more reliable than that stored in memory, and therefore overrides the information in memory. In a first cut at an implementation of this policy, whenever a marker cannot find itself in the current input, the marker is dropped. But this policy would drop any marker outside of the current field of view, so we amend it to dropping any marker we expect to find in the input but don't. For visual input, the two main cases in which we don't expect to find the marked object are when the object is outside the field of view, and when the object is occluded. The field of view case can be detected by knowing the field of view of the sensor and comparing it with the location stored in the marker. The second case requires additional visual processing; one must determine that there is some object along on the azimuth towards the marker, but is nearer than the marked object. The visual routine that performs this occlusion computation is only executed when a marker indicates that an object should be in the field of view, but no visual cue for the object is found. Note that this policy enables the maintenance and deletion of hypothesized object markers without any special machinery.


6. Implemented system

We have implemented a marker based control system that illustrates many of the concepts developed above. Using virtual reality rapid prototyping software developed at UVa [6], we created an environment with trees, rocks, berries, and walls. An agent survives in this environment by eating berries, and avoiding obstacles. Furthermore, the environment contains a predator which actively seeks out the agent. The agent has the concurrent goals of finding food, avoiding obstacles, and avoiding the predator. This section describes some details of this agent. The food finding behaviors and marker-based obstacle avoidance behaviors are fully implemented, and the predator avoidance behavior is near completion.

Figure 1: Views of the agent's test environment

The simulation system is a separate process, running on separate hardware from the agent control software. The only information that passes from the simulation system to the agent is a sequence of images of the environment taken from the viewpoint of the agent. The image is 160x120 pixels, and the color is quantized to 64 colors, resulting in a series of images such as those shown in Figure 1.

The agent's control process decides upon an action based on the images, and sends velocity commands to the simulation system, which updates the display based on the new location of the viewpoint. The agent and simulation are completely asynchronous, and run in real time at 10-15 frames per second.

Object locations are determined based on azimuth and elevation in the visual field. The system attempts to keep the four closest berries and any nearby predator marked at all times. The markers are maintained by using dead-reckoning based on the most recent velocity commands to estimate the expected new position of the object. If the expected position is in the image, then the correspondent is found in the image if possible.

The strategy for the agent is to head for the nearest berry, which may not be currently visible, due to occlusion or the limited field of view. We have to carefully follow the ``perception overrides memory'' doctrine with this agent. Due to noise in the image caused by severe color quantization some erroneous markers are created. Markers also occasionally drifted due to imperfections in the dead-reckoning procedure.

The problems with erroneous markers were overcome by having the agent drop any markers that were supposed to be in the field of view, but didn't have any perception corresponding to them. So for example, if a berry marker is maintained to the left of the agent, the agent turns to the left to look at it. If no berry is seen to the left, the marker is dropped. This behavior serves to allow the agent to use the markers that are accurate, without being led too far astray by incorrect markers.

The strategy described above must be augmented to notice the occlusion of marked objects and behave appropriately. The technique we used is similar to Horswill's obstacle avoidance routine [8], in that it relies only on being able to segment the ground from non-ground (i.e., obstacles). To determine if a marked object is occluded, points on the ground/non-ground boundary are converted to egocentric 3D coordinates and compared with coordinates of the marker. If the ground boundary points in the direction of the marked object is strictly nearer than the marked object, then the object is assumed to be occluded. Occluded markers are not dropped, but now we need a strategy for navigating around the obstacle. The original behavior, which was to go directly towards the nearest marker, obviously will not work for occluded markers. We have implemented the navigation algorithm described in section 4.2 using the target berry's location as the destination marker.

To determine the location of the obstacle marker, the ground/non-ground boundary in the region of the goal marker is analyzed for sharp jumps that indicate the edges of the obstacle. Simple geometric reasoning is used to determine the shortest distance around the obstacle, and the obstacle's edge is marked. An intermediate destination marker is then instantiated, and the agent is directed to move towards the intermediate destination, thereby circumventing the obstacle.


7. Conclusions

The representations of local space presented here have been successfully applied in a navigation system, measurably improving the agent's behavior. By extracting and representing only those elements of the environment necessary to accomplish given tasks, the perception and reasoning computations become manageable. We have shown that useful metric representations of the local environment can be acquired in real-time, maintained adequately, and used to improve performance in navigation tasks. Techniques for accomplishing this level of competence were outlined, and an implementation was described. A general lesson learned from the experience of implementing the system is that modest memory structures derived from task oriented goals qualitatively improved the performance of otherwise purely reactive systems, and that marker based mechanisms are effective and efficient task-oriented memory structures.


8. References

  1. Agre, P. E., and D. Chapman, 1987. ``Pengi: an implementation of a theory of activity,'' AAAI-87, pp. 268-272.

  2. Agre, P. E., and D. Chapman, 1990. ``What are plans for?'' Robotics and Autonomous Systems, 6, pp. 17-34.

  3. Ballard, D. H., 1989. ``Reference frames for animate vision,'' Proc. IJCAI-89, pp. 1635-1641.

  4. Brooks, R. A., 1990. ``Elephants don't play chess,'' Robotics and Autonomous Systems, 6, pp. 3-15.

  5. Chapman, D., 1991. Vision, Instruction, and Action, The MIT Press, Cambridge, Massachusetts.

  6. Conway, M., R. Pausch, R. Gossweiler, and T. Burnette, "Alice: a rapid prototyping system for building virtual environments", Proc. ACM CHI'94, pp. 295.

  7. Georgeff, M. P., 1987. ``Planning,'' Ann. Rev, Comput. Sci., 2, 359-400.

  8. Horswill, I., 1993. ``Polly: a vision-based artificial agent,'' Proc. AAAI-93, pp. 824-829.

  9. Marr, D., 1982. Vision, W. H. Freeman and Company, New York

  10. Mataric, M. J., 1992. ``Integration of representation into goal-driven behavior-based robots,'' IEEE Trans. on Robotics and Automation, 8(3), pp. 304-312.

  11. McCallum, R. A., 1993. ``Overcoming incomplete perception with utile distinction memory,'' Proc. 10th Intl. Mach. Learning Conf.

  12. Ullman, S., 1984. ``Visual routines,'' Cognition, 18, pp. 97-159.


Markers Elucidated and Applied
in Local 3-Space

Summary Page

1. What is the original contribution of this work?

This paper represents the first work to demonstrate that marker-based representations can improve performance in 3D domains. The 3D virtual environment used has several real-world properties, each of which are interesting research problems:

  1. Unarticulated visual input (pixels, not predicates)
  2. Occlusion
  3. Limited field of view
  4. Dynamic environment with real-time considerations
  5. Arbitration among multiple goals

All of these issues are important topics of research, and all are present in the simulation that forms the platform for this research. A fully implemented system is described that details algorithms for instantiating, maintaining, and deleting markers in pursuit of a given goal. The marker-based system measurably improves the performance of an autonomous agent in an environment with these complicating properties.

The paper develops and elaborates a vocabulary for marker-based systems. Several techniques using various ``types'' and hierarchies of markers are described. A classification of markers is constructed to facilitate this description.

2. Why should this contribution be considered important?

Classical and reactive planning have complementary strengths and weaknesses. Autonomous agents operating in real physical environments using visual perception must combine the strengths of both approaches to successfully perform useful tasks. Marker-based representations have the potential for bridging the gap between classical and reactive planning. The work described in this paper demonstrates the viability of the marker-based approach in three-dimensional environments.

3. What is the most closely related work by others and how does this work differ?

This work is related to the original work in markers conducted by Agre and Chapman in 2D environments. While Agre and Chapman only address points d and e above, this paper represents the first marker-based system that addresses points a, b, and c above.

4. How can other researchers make use of the results of this work?

This system architecture and the techniques it uses are suitable for incorporation into mobile robotics systems. Certainly, the low-level perception and manipulation components must be replaced by appropriate real-world analogs, but the core architecture is fully transferrable to physical robots.


Maintained by Frank Brill (brill@virginia.edu)
Last modified: Fri Aug 18 13:48:55 1995