Index

Title: Planning under time constraints in stochastic domains

Author(s): Dean, T. and Kaelbling, L. P. and Kirman, J. and Nicholson, A.

Source: Artificial Intelligence

Year: 1995

Critique:

This paper addresses the issue of integrating information about uncertainty into the planning process. The authors propose using a Markov decision process model to represent the possible states of the world, and the probabilistic transitions between states. Once this model is represented it becomes possible to develop a policy which specifies the optimal action to take when the world is in any specific state. This is effectively a universal plan for the current goal.

The Markov decision process requires that every possible configuration of the world be represented as a unique state. For example, (using the domain from the paper) if a robot can be in any one of 166 discrete locations, then there must be 166 states in the model. Further, if the robot can be in any of four orientations, then there need to be 664 states, one for each location and orientation. If the domain contains other mutable conditions, the Markov model must represent the entire cross product of all conditions to be effective. Clearly, the size of the state space becomes intractable very quickly.

In order to make this process feasible, certain assumptions must be made to reduce the size of the state space. The initial limiting assumptions are:

Only navigation will be considered
There is a high solution density (lots of feasible paths exist)
Low dispersion rate (there are only a few adjacent locations)
Continuity (values of nearby states are similar)

These assumptions provide a basis for making the planning in stochastic domains tractable. The use of the high solution density is needed to support the planning under time constraints portion. The authors propose an any-time algorithm in which the quality of the plan improves as more time is devoted to planning, provided that a feasible solution can be found quickly.

The basic planning algorithm is as follows:

  1. Find a feasible solution from the current state to the goal.
  2. Build a sub-set of the complete state space, containing this feasible path - an envelope.
  3. Develop an optimal policy for this envelope.
  4. Execute the policy.

This algorithm depends on the empirical evidence that step 3, can be accomplished fairly quickly (it is not necessarily polynomial, but for the papers domain runs effectively). The actual algorithm uses the solution to a set of |S| linear equations, where |S| is the size of the envelope, to determine a new candidate policy, and keeping the best policy so far. This meets the requirements of the anytime algorithm.

While this build a policy for the envelope, there is no guarantee that the policy will not fall out of the subset. If this occurs, the authors propose building a new envelope which has been expanded to include both the new state, and paths back to either the goal, or the existing envelope. Since this would result in monotonically increasing the envelope (envelope extension), a mechanism for removing states which are unlikely to be encountered is also proposed (envelope pruning).

The authors go on to investigate the scheduling of the planning process. Given the anytime nature of the algorithm, there is a meta-level of deliberation during which the amount of resources (time) to devote to planning is determined, and several different mechanisms of interleaving planning and execution are discussed.

In the experimental results section, the planning system is used to compare the envelope based planning against complete state space optimization, and determined that equivalent quality of plans were obtained with approximately 25% of the time invested. They also compared various traditional planning algorithms with the algorithm they propose and found significant benefits in all tested domains (Note: a domain is a specific assignment of values to the problem's size, volatility, Operator reliability, and number of 'fatal' locations in the world.

The final section of the paper addresses addition areas of research for the algorithm, including representations of partial observability (Sensor uncertainty), multiple levels of abstraction in the domain representation, and increasing the set of actions that can be taken by the robot.

This paper addresses the issues of integrating uncertainty into the planning process, and also looks at the need to decide how much to plan. The underlying planning system, while functional for restricted state spaces, may prove ineffective in more complex planning domains. With a simple navigation model the number of states becomes significant, with the addition of robot effectors, parts to be moved around, varying states of equipment, doors, supplies, etc., the need to represent the entire cross product of all possible mutable conditions in the world will rapidly become intractable. In addition, it is also necessary to assign realistic probabilities to each possible action in each possible state of the world, and this may also be intractable, however, this is a problem associated with and probabilistic planning model.


Index
jpg3u@virginia.edu
Last modified: Wed Mar 10 11:22:18 1999