Adaptive Modelling and Planning for Learning Intelligent Behaviour

Mykel J. Kochenderfer, Ph.D. Thesis, University of Edinburgh

Abstract: An intelligent agent must be capable of using its past experience to develop an understanding of how its actions affect the world in which it is situated. Given some objective, the agent must be able to effectively use its understanding of the world to produce a plan that is robust to the uncertainty present in the world. This thesis presents a novel computational framework called the Adaptive Modelling and Planning System (AMPS) that aims to meet these requirements for intelligence.

The challenge of the agent is to use its experience in the world to generate a model. In problems with large state and action spaces, the agent can generalise from limited experience by grouping together similar states and actions, effectively partitioning the state and action spaces into finite sets of regions. This process is called abstraction. Several different abstraction approaches have been proposed in the literature, but the existing algorithms have many limitations. They generally only increase resolution, require a large amount of data before changing the abstraction, do not generalise over actions, and are computationally expensive. AMPS aims to solve these problems using a new kind of approach.

AMPS splits and merges existing regions in its abstraction according to a set of heuristics. The system introduces splits using a mechanism related to supervised learning and is defined generally, allowing AMPS to leverage a wide variety of representations. The system merges existing regions when an analysis of the current plan indicates that doing so could be useful. Because several different regions may require revision at any given time, AMPS prioritises revision to best utilise whatever computational resources are available. Changes in the abstraction lead to changes in the model, requiring changes to the plan. AMPS prioritises the planning process, and when the agent has time, it replans in high-priority regions. This thesis demonstrates the flexibility and strength of this approach in learning intelligent behaviour from limited experience.

Download PDF (for free)

Order from Amazon

Order from Barnes and Noble