Integrating Learning into a BDI Agent for Environments with Changing Dynamics

Dhirendra Singh¹, Sebastian Sardina¹, Lin Padgham¹, Geoff James²

¹ RMIT University, Melbourne, Australia
² CSIRO Energy Technology, Sydney, Australia

International Joint Conference on Artificial Intelligence (IJCAI), July 2011, Barcelona, Spain
Belief-Desire-Intention (BDI) Agent Architecture

[Figure: BDI agent architecture. Sensors feed events from the environment into a queue of pending events; the BDI engine consults the beliefs (dynamic) and the plan library (static) to build intention stacks, whose actions are carried out through actuators.]

A plan is a programmed recipe for achieving a goal in some situation. We wish to improve plan selection in a given situation, based on actual experience.
Plan Selection in a BDI Goal-Plan Hierarchy

[Figure: a goal-plan tree rooted at plan P1, whose subgoals (G2, G3) branch into further plans (P2–P6) and subgoals (G4–G6); a few leaves succeed (✓) while many others fail (×).]

For plan P1 to succeed, several correct choices must be made. A minimal sketch of this nested execution follows.
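To make the choice structure concrete, here is a minimal Python sketch under stated assumptions: the `Goal` and `Plan` classes and the tiny example tree are hypothetical, not the authors' implementation. A goal succeeds if its selected plan succeeds, and a plan succeeds only if every subgoal its body posts succeeds, so a single wrong selection anywhere in the tree fails the whole hierarchy.

```python
import random

class Plan:
    """A recipe: a plan either posts subgoals or is a leaf with a fixed outcome."""
    def __init__(self, name, subgoals=(), succeeds=True):
        self.name = name
        self.subgoals = list(subgoals)
        self.succeeds = succeeds  # leaf outcome, used only when there are no subgoals

    def execute(self):
        if not self.subgoals:
            return self.succeeds
        # The plan body succeeds only if every posted subgoal succeeds.
        return all(goal.resolve() for goal in self.subgoals)

class Goal:
    def __init__(self, name, plans):
        self.name = name
        self.plans = plans

    def resolve(self):
        # Uninformed selection: pick any plan at random. Learned selection
        # replaces this choice with one weighted by predicted success.
        return random.choice(self.plans).execute()

# A tiny tree in the spirit of the slide: P1 succeeds only if the right
# plan is chosen for each subgoal (labels are illustrative).
g3 = Goal("G3", [Plan("P5", succeeds=True), Plan("P6", succeeds=False)])
g2 = Goal("G2", [Plan("P2", subgoals=[g3]), Plan("P4", succeeds=False)])
p1 = Plan("P1", subgoals=[g2])
print("P1 succeeded:", p1.execute())
```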
The Learning Framework

Augment each plan with a decision tree. The state representation includes world features, event parameters, and context variables. The agent then repeats a select–execute–update loop (sketched below):

1. For every plan whose programmed applicability condition holds, calculate a selection weight based on the perceived likelihood of success.
2. Select a plan probabilistically using the selection weights.
3. Execute the plan (hierarchy) and record the outcome(s).
4. Update the selected plan's decision tree.
5. Repeat.
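Below is a minimal Python sketch of this loop, assuming a scikit-learn decision tree per plan; the names (`LearningPlan`, `select_and_run`), the exploration floor, and the batch retraining are illustrative simplifications, not the authors' incremental implementation.

```python
import random
from sklearn.tree import DecisionTreeClassifier

class LearningPlan:
    def __init__(self, name, context):
        self.name = name
        self.context = context            # programmed applicability condition
        self.data = []                    # (world_state, outcome) training records
        self.tree = DecisionTreeClassifier()

    def likelihood(self, world):
        # Uninformed prior until both success and failure have been seen.
        if len({outcome for _, outcome in self.data}) < 2:
            return 0.5
        return self.tree.predict_proba([world])[0][1]

    def record(self, world, success):
        self.data.append((world, int(success)))
        X, y = zip(*self.data)
        self.tree.fit(list(X), list(y))   # step 4: update the plan's decision tree

def select_and_run(plans_for_goal, world, execute):
    applicable = [p for p in plans_for_goal if p.context(world)]    # step 1
    weights = [max(0.05, p.likelihood(world)) for p in applicable]  # floor keeps exploring
    plan = random.choices(applicable, weights=weights)[0]           # step 2
    success = execute(plan, world)                                  # step 3
    plan.record(world, success)                                     # step 4
    return success                                                  # step 5: caller repeats
```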
Each plan's decision tree tests world features (wa, wb, ...) and predicts success (✓) or failure (×) in the current world. Learning these trees online must cope with:

• Incomplete data
• Inconsistent data
• Non-deterministic actions
• Non-deterministic hierarchies
• Dealing with failure recovery
• Changing environment dynamics

We therefore need a quantitative measure of confidence in the current learning.
A Dynamic Confidence Measure

[Figure: a goal-plan tree in which a goal is resolved by plans Pa–Pe; some succeed (✓) and others fail (×).]

We build confidence from the observed performance of a plan using:

• how well-informed the recent decisions were (a stability-based measure);
• how well we know the worlds we are witnessing (a world-based measure);
• an averaging window n and a preference bias α.

The plan selection weight, which dictates exploration, is then calculated from the predicted likelihood of success together with the confidence measure; a sketch of one plausible combination follows.
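The Python sketch below is written under loudly-labelled assumptions: the slide names the two measures, the window n, and the bias α, but these exact formulas (simple window averages and a linear blend) are illustrative, not lifted from the paper.

```python
from collections import deque

class Confidence:
    """Illustrative confidence: alpha blends a stability-based measure
    (were recent decisions well-informed?) with a world-based measure
    (had we seen the recent worlds before?), each averaged over a window n."""
    def __init__(self, n=5, alpha=0.5):
        self.alpha = alpha
        self.stability = deque(maxlen=n)   # 1.0 if a decision was well-informed
        self.familiar = deque(maxlen=n)    # 1.0 if the world had been seen before
        self.seen_worlds = set()

    def update(self, world, well_informed):
        self.stability.append(1.0 if well_informed else 0.0)
        self.familiar.append(1.0 if world in self.seen_worlds else 0.0)
        self.seen_worlds.add(world)

    def value(self):
        if not self.stability:
            return 0.0                     # no evidence yet: fully exploratory
        stab = sum(self.stability) / len(self.stability)
        known = sum(self.familiar) / len(self.familiar)
        return self.alpha * stab + (1.0 - self.alpha) * known

def selection_weight(likelihood, confidence):
    # Low confidence pulls the weight toward 0.5, flattening the selection
    # distribution across applicable plans and thus promoting exploration.
    return 0.5 + confidence * (likelihood - 0.5)
```

With full confidence the weight equals the predicted likelihood of success; with zero confidence every applicable plan gets weight 0.5, which is exactly the re-exploration behaviour shown in the next example.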
Example: Dynamic Confidence Measure

[Figure: goal-plan tree in which goal G2 can be resolved by plans Pb–Pe; initially Pb and Pc succeed. Plot: confidence vs. executions, α = 0.5, n = 5.]

A solution is found at E = 10, and full confidence is reached at E = 15.

What if the environment changes after we have learnt the solution? Say that after execution 15, plan Pc no longer works for resolving goal G2, but plan Pe does.

[Plot: confidence vs. executions over 25 executions, α = 0.5, n = 5.]

After E = 15, Pc starts to fail. The confidence drops, promoting new exploration and re-learning.
A Battery Storage Application

[Figure: grid consumption over time. The battery charges and discharges against the building's demand, shown over an interval t1 to t2, so that grid supply stays between 0 and a threshold ph.]

Given the net building demand, calculate an appropriate battery response in order to maintain grid power consumption within the range [0, ph]. A hedged sketch of such a response rule follows.
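As a concrete illustration, here is a Python sketch of one such rule; the state-of-charge arguments and the unit-time-step simplification are assumptions for the example, not the paper's module-level battery model.

```python
def battery_response(demand, p_h, soc, capacity, max_rate):
    """Choose a battery rate (charge > 0, discharge < 0) so that
    grid draw = demand + rate stays within [0, p_h].
    Assumes unit time steps, so power and energy limits are interchangeable."""
    if demand > p_h:
        rate = p_h - demand          # discharge to shave the peak
    elif demand < 0:
        rate = -demand               # charge to absorb any export
    else:
        rate = 0.0
    # Respect the battery's physical limits.
    rate = max(-max_rate, min(max_rate, rate))     # power limit
    rate = max(-soc, min(capacity - soc, rate))    # stored-energy limit
    return rate

# Example: 8 kW demand against a 5 kW threshold -> discharge 3 kW.
print(battery_response(demand=8.0, p_h=5.0, soc=4.0, capacity=10.0, max_rate=4.0))
```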
Design: A Battery Storage Application

[Figure: recursive goal-plan hierarchy. Goal G(k) is handled by plans that set module k to charge, discharge, or disconnect, each of which then posts subgoal G(k−1); at the bottom, the battery is operated and the outcome evaluated.]

Aim: learn appropriate plan selection to achieve a desired battery response rate, given the current battery state. The state space for a battery with five modules is ≈13 million. A sketch of the recursive design follows.
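A minimal Python sketch of this recursion, with a hypothetical `operate_and_evaluate` stub and a fixed policy standing in for the real battery interface and the learned selection:

```python
MODES = ("charge", "discharge", "disconnect")

def operate_and_evaluate(config):
    # Placeholder for the real bottom-level plan, which would operate the
    # battery for a step and report whether grid draw stayed within [0, ph].
    return config

def G(k, choose, config=()):
    """Goal G(k): a plan sets module k's mode, then posts subgoal G(k-1)."""
    if k == 0:
        return operate_and_evaluate(config)
    mode = choose(k)  # each such choice is a learned plan-selection point
    return G(k - 1, choose, config + (mode,))

# Five modules, three plans (modes) per goal; a fixed policy for illustration.
print(G(5, choose=lambda k: MODES[k % 3]))
```

Three plans per goal over five modules already gives 3^5 = 243 mode configurations per decision, which, combined with the module charge levels, is how the state space grows into the millions.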
Experiment: Capacity Deterioration

[Plot: success rate (0.75–1.0) vs. episodes (0–35k).]

Recovery from deterioration in module capacities at 5k episodes.
Experiment: Partial Failure with Restoration

[Plot: success rate (0.6–1.0) vs. episodes (0–40k).]

Recovery from temporary module failures during the [0, 20k] and [20k, 40k] episode intervals.
Experiment: Complete Failure with Restoration

[Plot: success rate (0–1.0) vs. episodes (0–20k).]

Recovery from complete system failure during the [0, 5k] episode interval.
Limitations and Future Work

• We cannot account for inter-dependence between the subgoals of a higher-level plan.
• Maintaining the full training data is not practical, and it becomes obsolete when the environment changes. We tried an arbitrary scheme to filter "old" data in the battery agent and reduced the data set by 75% with no loss in performance.
• The onus is on the designer to select an appropriate state representation and learning parameters. Some of this could be automatically extracted from the BDI program.
Summary

• Learning BDI plan selection is desirable when plan applicability cannot be easily programmed or when the environment changes over time.
• We present a learning framework for additionally determining a plan's applicability conditions using decision trees. Plans are selected probabilistically based on their predicted likelihood of success and our perceived confidence in the learning.
• We evaluate the framework empirically in a storage domain where a battery controller must continually adapt to changes that cause previous solutions to become ineffective.