Planning Models

Given the subset model of uncertainty we can now address what planning model to use. Following the same general idea as before, a list of candidates will be presented, and each will be evaluated with respect to the subset representation.
Again, as this document evolves, both the features that are used as evaluation criteria, and the choice of planning models will be subject to change.

Finding high probability plans

There seem to be two different approaches to finding the 'best'. Either search the entire space and select the optimum, or re-arrange the search order, so that the first solution found is the optimum. When searching for the highest probability plan, both of these can be used. The first is trivial, since if we search the entire space, we will find the best. Unfortunately, this is intractable for all but the most trivial planning problems. So it is necessary to limit the search space, by controlling the search. The second approach is to rearrange the search order, so that the first plan found also meets the optimality criteria. In the case of classical planning, optimality is defined as plan length, so any search that pushes a frontier outward, will find the shortest plan first, thus eliminating the need to search the entire space.  In the shortest path model, each operator adds a fixed amount to the plan cost, so all N step plans are less valuable that all (N-K) length plans.

While this works well for shortest plan first, it is a little trickier for highest probability. The probability of plan success is the product of the probability of the steps (assuming independent operators). This means that there is no direct coupling between the plan length and the plan probability. So the same type of greedy algorithm that works for SFP planners, won't apply directly to probability aware planning. It is possible, however, to transform the planning space to allow SFP planners to produce MaxProb plans, under certain constraints.

Features

Candidate Planning Models


Backward Chaining Planners

This is the classic planning model. The planner begins from the desired goal state, and works toward the initial state. This approach can limit the search space by only looking at the operators that could modify the world to be closer to the goal.

Probability

Unfortunately, backward chaining planners are primarily designed to find the shortest plan that spans from the initial state to the goal state.

Elaboration Time


Candidate List

Forward Chaining Planners

Forward chaining planners work from the initial state to the goal, and control the search space by the use of explicit search control rules. These frequently work as breadth first search systems, and/or rely on the developer to encode rules that can be used to prune in-effective branches of the search tree.

Probability

Since the developer is responsible for encoding the search control rules, it is fairly straight forward to encode rules that limit search based on expected probability of success. In the case of my existing forward chaining planner, two classes of search control rules are used. First I use a function (called Stupid()) to determine whether it is worth pursuing a branch. Once the goal is reached, a second function is used to evaluate the probability, and expected value of the plan, and if it exceeds the best plan so far, the new plan is kept.

Elaboration Time


Candidate List

Planning as satisfaction

Probability

Elaboration Time


Candidate List

Constraint Propagation planners

Probability

Elaboration Time


Candidate List

Graph Theoretic Planners

Probability

Elaboration Time


Candidate List

OR/MDP/DT Planners

Probability

Elaboration Time


Candidate List

Plan Coalescence

Probability

Elaboration Time


Candidate List

Monte Carlo Planners

Probability

Elaboration Time


Candidate List

Sub-plan Exploration

The basic concept of sub-plan exploration is that it is computationally easier to find two plans of length (N/2) than it is to find one plan of length (N). This is, of course due to the exponential growth of the plan space as a function of plan length. For example, In the Fixit problem, given an operator set of size 14, and a plan of length 19, the naive plan space for the full plan is on the order of 6 * 10^21, whereas the plan space for the two half plans is only 6 * 10^11. This alone reduces the search space, in this case, by a factor of 10^10.

So, what we would like to do is find a state, which is reachable from the initial state in about (n/2) steps, and from which the goal state is reachable in about (n/2) steps. Then do two planning episodes. The only headache is how to find this mid-state?

Suppose that our representation of operators is in the form of an initial world description (IWD) and a terminal world description (TWD). Let's assume an open world model, with the absence of specification indicating a 'don't care' condition. Then the operator can be applied if its IWD is a subset of the world state at any time, since its preconditions would be met by that world state. After application, the world state is modified to reflect the effects of the operator' TWD. This would act as the new world state for the next operator, and so on until the goal state was reached.

With this representation, it is possible to treat the world descriptions as attribute sets, where each one consists of those attribute which are salient to the planning problem (the initial and final world descriptions), and the available operators.

Given this set representation, we can take the difference between the two sets, and use this to guide the process of selecting possible partitioning points for the planner.

Since this set difference captures only the salient differences between the initial and terminal world descriptions, it represents the 'net effect' of any plan which achieves the goals. If such a plan exists, then at least one point on the trajectory created by executing this plan must be a subset of the set difference. (!! Insert a link to a brief sketch of a betweeness proof, based on plan space as an asymetric pseudo-metric space !!)

So the long and the short of it is this: use the set difference to guide the search of possible way-points in the plan trajectory, for each possibility, search for connectedness to one end (a much smaller problem), and if the path is found, then search for the other half of the plan. If that fails, move onto the next possible way-point.

The headache with this is goal clobbering. Goal clobbering occurs when achieving one goal must be undone to achieve a second goal. The classic example of this is Sussman's Anomaly, set in the Blocks World Domain. Imagine that we have three blocks on a table, and Block C is on top of Block A which is on the table. Block B is also on the table. Our goal state is a tower with A on top of B on top of C.

Given two operators Pick-up and Stack, Pick-up will grasp a block as long as there is no other block ontop of it, and Stack will place a grasped block ontop of a second, as long as the second block is clear.

Sussman's anomoly arises when the construction of the tower is attempted as multiple sub-goals. We need to achieve three changes to the world: 1) place Block C on the table, 2) place block B on top of Block C, and 3) place Block A on top of Block B. If the three sub-goals are done in this order, everything is cool. But if the order is altered, then Block B is picked-up from the table, and placed on top of Block C (sub-goal!), now however, the pick-up operator cannot be applied to Block A, since it is not clear. Thus the achieved sub-goal must be undone (hopefully by putting Block B on the table, then Block C onto the table) then re-done after the other sub-goals are completed.

This is the risk associated with the sub-plan evaluation approach. However, the benefits can be significant. Using the Fixit world, experiments indicated that the cost of elaborating all 6 possible mid-points in the plan space explored only 0.000364 of the total plan space.

Probability

This extends to probability fairly well, since we are finding ways to investigate a set of possible plans at lower computational cost. However, it does not really exploit the exisitence of possible alternate outcomes. So it is more suited to POS or BUG models that to reasoning about the expected value of the plan.

Elaboration Time

The elaboration time is tricky. In the asymptotic case, we are simply doing exhaustive search of the space. However, we are using the heuristic to guide the search to cause viable plans to be located much sooner in the search process.
Candidate List 

References