Bayesian Toolbox for Dynamic Distributed Decision Making

Václav Šmídl
ÚTIA, Academy of Sciences of the Czech Republic, Prague 8, Czech Republic
[email protected]
Abstract: The paper outlines a new and challenging area of control, namely distributed decision making under uncertainty. The presence of many decision-making units, called participants, is the main distinction from classical control theory. The problem is analysed from the Bayesian point of view. Each participant has the abilities of a classical adaptive controller, i.e. learning about the environment and design of the control strategy. What is new is the ability to communicate with its neighbours. The effect of this ability on learning and on control strategy design is studied in the paper. A methodology for dealing with the problem is introduced.
I. Introduction

Decision making (DM) [1, 2], i.e. an active and purposeful selection among several alternative options, is at the heart of any human-related activity. Mostly, a whole sequence of such actions has to be selected, with dynamically delayed consequences: dynamic DM is faced. Obviously, control [3, 4] can be viewed as a specific instance of dynamic DM, cf. the IEEE series of Conferences on Decision and Control. Dynamic DM, and thus control, is always made under uncertainty caused by the decision maker's incomplete knowledge of the mechanism relating actions to their consequences. In fact, the ever-present uncertainty is the real reason for feedback, which modifies actions according to the observations made. Stochastic control theory [5] and, more generally, the theory of statistical DM [2] model this situation. They guide the design of an optimal sequence of rules converting available knowledge, DM aims and observations into actions, i.e. the design of a decision strategy (or control law). Often, the knowledge is accumulated from observations made either before applying the DM strategy or even during the course of actions [6, 7]. Thus, a sort of learning becomes a part of the DM strategy.

The following assumptions made in the design of the standard (single-participant) optimal DM strategy are relevant for this text.
1. The optimized strategy (controller) is the only system that intentionally influences the optimized responses.
2. The DM aim is given a priori and often it is a single one.

The actions can be multivariate, so seemingly the first of the listed assumptions causes no problems. However, the computational and communication complexity inherent to DM under uncertainty makes this assumption very restrictive. The DM strategies designed under it are practically feasible only in relatively low-dimensional problems; the solution is far from being scalable. In practice, the problem is solved by decomposing the whole decision problem, i.e. by providing a sort of (necessarily approximate) distributed DM. In this approach, the complex multivariate decision maker is replaced by a group of simpler decision makers, each influencing only a part of the controlled system. This methodology definitely shifts the complexity boundary of solvable cases much further, e.g. [8, 9]. At the same time, many problems arise and there seems to be no commonly accepted methodology for approaching their solution. Some problems even seem to be conceptual ones [10]. The second assumption either introduces a scalability boundary, or it is violated in distributed settings and raises the complexity of the solution via the need to solve coordination and negotiation problems.

In this paper, we address the problem using Bayesian decision-making theory. In Section II., we review selected parts of the Bayesian theory. These topics are extended in Section III. to distributed decision making (also known as multiple-participant decision making).

II. Review of Bayesian Decision Making for the Classical Scenario

A single controller, as a prototype of a single decision maker, influences the part of the real world of its interest, traditionally called the system. The behaviour of the system is known via the observed data y(t) = [y_1, y_2, ..., y_t], where y_t denotes the observation at time t and y(t) denotes the whole trajectory of observations (the same notational distinction is made for all other quantities). The controller can directly influence certain quantities of the system, which will be denoted u_t.
Its aim is to choose such values of u_t that the future data observed on the system, y_{t+1}, are as close to the desired behaviour as possible. In other words, the system can be viewed as a transformation of its inputs u(t) to its outputs y(t). The problem is graphically illustrated in Figure 1 (left).
However, in many practical problems, the system responds differently to the same input values. The difference can depend on the time of the observation, the history of the system output, etc. Formally, we model these differences as consequences of unobserved system quantities x. In order to deal with these uncertainties, we employ the Bayesian methodology, in which all quantities are treated as random variables. The model of the closed loop is described by a probability density function (pdf):

f(y(\tau), u(\tau), x(\tau)) = \prod_{t=1}^{\tau} f(y_t \mid y(t-1), u(t), x(t)) \, f(x_t \mid y(t-1), u(t), x(t-1)) \, f(u_t \mid y(t-1), u(t-1), x(t-1)). \quad (1)
The above decomposition follows from the chain rule of probability calculus. It naturally splits the model into the following parts:

Observation model: f(y_t | y(t-1), u(t), x(t)) describes the pdf of the system output given the complete knowledge of its past outputs, past and current inputs, and the uncertain quantities (state). For computational reasons, the model is not conditioned on the possibly infinite history, but only on a finite number of historical data. The observation model then reduces to f(y_t | u_t, x_t).

State evolution model: f(x_t | y(t-1), u(t), x(t-1)) describes the pdf of the inner system quantities. For the same reasons as above, the model is simplified by an additional assumption and reduces to f(x_t | u_t, x_{t-1}).

Controller law: f(u_t | y(t-1), u(t-1), x(t-1)) describes the decision-making strategy of the controller. As the controller has no information about the state x_t, we restrict the conditioning part to the observed data only: f(u_t | y(t-1), u(t-1)). Note that, in general, the controller does not have to be deterministic. A probabilistic controller can choose its actions as samples from this distribution. Classical deterministic control is recovered if this pdf is chosen as a Dirac delta function.

The system model is then described as:

f(y(\tau), u(\tau), x(\tau)) = \prod_{t=1}^{\tau} f(y_t \mid u_t, x_t) \, f(x_t \mid u_t, x_{t-1}) \, f(u_t \mid y(t-1), u(t-1)). \quad (2)
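As a simple illustrative instance of (2) (not taken from the paper; the symbols a, b, c, q, r and the law \mu are introduced only to make the decomposition concrete), consider a scalar linear-Gaussian model:

f(y_t \mid u_t, x_t) = \mathcal{N}(y_t;\, c\,x_t,\, r),
f(x_t \mid u_t, x_{t-1}) = \mathcal{N}(x_t;\, a\,x_{t-1} + b\,u_t,\, q),
f(u_t \mid y(t-1), u(t-1)) = \delta\bigl(u_t - \mu(y(t-1), u(t-1))\bigr),

where \mathcal{N}(\cdot;\, m, v) denotes the Gaussian pdf with mean m and variance v, \delta(\cdot) is the Dirac delta function, and \mu(\cdot) is a deterministic control law. The same model is used in the Kalman-filter sketch later in the text.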
The final aim of the modelling is, of course, the decision-making process. In classical Bayesian decision-making theory [2], a decision is considered optimal if it minimizes the expected value of a loss function. The loss function is chosen by the designer as a tool for describing the ideal behaviour of the system. This subjective choice plays an important role in the tractability of the design of the control strategy: the choice of an inappropriate function may lead to difficulties in the design. The fully probabilistic design (FPD) [11] was developed to overcome this problem.

1. Fully Probabilistic Design of Control Law

The aim and the constraints are jointly described by the ideal pdf, which represents the desired behaviour of the closed control loop. It is constructed in a way analogous to (2):

{}^{I}f(y(\tau), u(\tau), x(\tau)) = \prod_{t=1}^{\tau} {}^{I}f(y_t \mid u_t, x_t) \, {}^{I}f(x_t \mid u_t, x_{t-1}) \, {}^{I}f(u_t \mid y(t-1), u(t-1)). \quad (3)
Here, the superscript in {}^{I}f(\cdot), written on the left side of the pdf, denotes the user-selected ideal distribution of the system behaviour. The loss function of the classical Bayesian approach is chosen as the Kullback-Leibler divergence [12] of the model pdf (2) from the ideal (3):

KL\bigl(f_\tau \,\|\, {}^{I}f_\tau\bigr) = \int f_\tau \ln \frac{f_\tau}{{}^{I}f_\tau} \, \mathrm{d}y(\tau) \, \mathrm{d}u(\tau) \, \mathrm{d}x(\tau). \quad (4)

Here, the shortened notation f_\tau = f(y(\tau), u(\tau), x(\tau)) and {}^{I}f_\tau = {}^{I}f(y(\tau), u(\tau), x(\tau)) is used. The task of the FPD is to find the control strategy, i.e. f(u_t | y(t-1), u(t-1)), t = 1, ..., \tau, minimizing (4). It was proven [11, 13] that the problem has the unique solution:

{}^{o}f(u_t \mid d(t-1)) = \frac{{}^{I}f(u_t \mid y(t-1), u(t-1)) \, \exp\bigl[-\omega(u_t, y(t-1), u(t-1))\bigr]}{\gamma(y(t-1), u(t-1))}, \quad t = 1, \ldots, \tau, \quad (5)

\gamma(y(t-1), u(t-1)) \equiv \int {}^{I}f(u_t \mid y(t-1), u(t-1)) \, \exp\bigl[-\omega(u_t, y(t-1), u(t-1))\bigr] \, \mathrm{d}u_t,

where {}^{o}f(u_t | \cdot) denotes the optimized decision-making strategy and d(t-1) \equiv \{y(t-1), u(t-1)\} denotes the observed data. The auxiliary function \gamma(\cdot) is generated recursively as follows:

\omega(u_t, y(t-1), u(t-1)) \equiv \int \Omega(u_t, y(t-1), u(t-1), x_{t-1}) \, f(x_{t-1} \mid y(t-1), u(t-1)) \, \mathrm{d}x_{t-1}, \quad (6)

\Omega(u_t, y(t-1), u(t-1), x_{t-1}) \equiv \int f(y_t \mid u_t, x_t) \, f(x_t \mid u_t, x_{t-1}) \ln \frac{f(y_t \mid u_t, x_t) \, f(x_t \mid u_t, x_{t-1})}{{}^{I}f(y_t \mid u_t, x_t) \, {}^{I}f(x_t \mid u_t, x_{t-1}) \, \gamma(y(t), u(t))} \, \mathrm{d}y_t \, \mathrm{d}x_t.

Note that the evaluation is performed backwards, i.e. for t = \tau, \ldots, 1. The auxiliary pdfs f(x_t | y(t), u(t)) and f(x_t | y(t-1), u(t)) have a special role and will be explained in the following Section. For a detailed derivation and proofs of the formulas, see [13].
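To give the loss (4) a concrete feel, recall the standard identity for the Kullback-Leibler divergence of two univariate Gaussian pdfs (a known textbook fact, not derived in the paper):

KL\bigl(\mathcal{N}(\mu_1, \sigma_1^2) \,\|\, \mathcal{N}(\mu_2, \sigma_2^2)\bigr) = \ln\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}.

With a Gaussian ideal pdf, the FPD loss thus penalizes deviations of the achieved mean from the ideal mean quadratically, which hints at the connection to classical quadratic criteria.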
2. Bayesian Filtration (Learning)

The main objective of the paper, i.e. decision making, was conceptually solved in the previous Section. However, some pdfs that arise in (5) as intermediate steps of the decision making are important for learning (parameter estimation, state estimation). These are:

Time updating: prediction of the pdf of the state variable using data up to time t - 1,

f(x_t \mid u(t), y(t-1)) = \int f(x_t \mid u_t, x_{t-1}) \, f(x_{t-1} \mid u(t-1), y(t-1)) \, \mathrm{d}x_{t-1}.

Data updating: correction of the previous distribution by the new observation,

f(x_t \mid u(t), y(t)) = \frac{f(y_t \mid u_t, x_t) \, f(x_t \mid u(t), y(t-1))}{\underbrace{\int f(y_t \mid u_t, x_t) \, f(x_t \mid u(t), y(t-1)) \, \mathrm{d}x_t}_{f(y_t \mid u(t), y(t-1))}}.
Note that neither the time nor the data update depends on the pdfs describing the control strategy (i.e. {f(u_t | d(t-1))}_{t=1,...,\tau}), but only on their realizations {u_t}_{t=1,...,\tau}. This is important, as it makes the FPD optimization possible.

3. State-of-the-art

In the previous Sections, we have described the two key steps of Bayesian decision making. Notably, all the formulae depend heavily on integration over the support (the probabilistic operations of marginalization and normalization). These integrations are analytically tractable only for a limited class of models, namely AR models [11] (learning and FPD) and mixtures of AR models [14]. Bayesian learning for state-space models yields formulae identical to the well-known Kalman filter [15].
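As a minimal numerical sketch of the time and data updates for a scalar linear-Gaussian state-space model (the Kalman filter mentioned above), the following C program propagates the Gaussian pdf f(x_t | u(t), y(t)) through both updates. All names, constants and data are invented for this illustration; this is not the toolbox API.

```c
#include <stdio.h>

/* Scalar linear-Gaussian model (illustrative only):
 *   x_t = a*x_{t-1} + b*u_t + w_t,   w_t ~ N(0, q)
 *   y_t = c*x_t + e_t,               e_t ~ N(0, r)
 * The pdf f(x_t | ...) stays Gaussian, so mean and variance suffice. */
typedef struct { double mean, var; } Gauss;

/* Time update: f(x_t | u(t), y(t-1)) from f(x_{t-1} | u(t-1), y(t-1)) */
Gauss time_update(Gauss prior, double a, double b, double u, double q) {
    Gauss pred = { a * prior.mean + b * u, a * a * prior.var + q };
    return pred;
}

/* Data update: multiply by f(y_t | u_t, x_t) and normalize */
Gauss data_update(Gauss pred, double c, double y, double r) {
    double k = pred.var * c / (c * c * pred.var + r);   /* Kalman gain */
    Gauss post = { pred.mean + k * (y - c * pred.mean),
                   (1.0 - k * c) * pred.var };
    return post;
}

int main(void) {
    Gauss x = { 0.0, 1.0 };               /* prior f(x_0): N(0, 1) */
    double u[3] = { 1.0, 0.0, 0.5 };      /* made-up inputs        */
    double y[3] = { 0.9, 1.1, 1.4 };      /* made-up observations  */
    for (int t = 0; t < 3; ++t) {
        x = time_update(x, 0.8, 0.5, u[t], 0.01);
        x = data_update(x, 1.0, y[t], 0.1);
        printf("t=%d  mean=%.3f  var=%.4f\n", t + 1, x.mean, x.var);
    }
    return 0;
}
```

Because the model is linear and Gaussian, both integrals of the filtration step have closed forms; for other model classes the same two updates have to be approximated.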
III. Multiple Participant Scenario
In the considered multiple-participant scenario, parts of the system can be influenced by several controllers, the participants in the DM process. The traditional understanding of the system loses its clarity and it is reasonable to adopt the term environment. This is again the part of the world which is to be influenced by the participant, but it is embedded into an abstract medium, the environment, to which the participant sends its actions (including communication-oriented ones) and from which it receives data informing it about the outer world (including communication data from other participants). This is illustrated in Figure 1.

Figure 1: Relation of classical decision making and multiple-participant DM terminology. Left: classical set-up of a control (decision-making) problem. Right: multiple-participant scenario of a distributed decision-making problem.

Communication among participants is the only way to allow them to cooperate. If the participants are not aware of each other's presence, or if they do not care about the others, they act independently, following their different and possibly contradictory aims. In such a situation, their mutual effect is adverse and yields poor performance of the participants. Thus, communication between participants, serving for the exchange of knowledge, aims, restrictions and DM strategies, is vital for good performance. This communication goes via the environment. Because of its importance, it is sometimes reasonable to distinguish this path, as illustrated in Figure ??. Therefore, we distinguish three scenarios of cooperation:

Selfish: participants are not aware of each other. Each of them follows its own aims. In effect, their mutual actions can lead to a conflict.

Hierarchical: participants have assigned different roles which are ordered hierarchically. The superior participant has the right to enforce its models and aims on the subordinate one. A typical example of this type is the selection of set-points in classical control theory.

Democratic: each participant is allowed to modify its own models and aims based on the information from its peers. Ideally, the participants influencing one input of the environment should be able to agree on the optimal value of this input before they perform the actual action. This agreement should be reached by negotiation.

We believe that these three scenarios cover all the possible types of multiple-participant decision making. The mathematical formalism introduced in Section II. describes the behaviour of an isolated decision maker (i.e. the selfish scenario). For the hierarchical scenario, a new mechanism for communication between the participants is needed. This mechanism may be application-specific and therefore it is not covered by this paper. The superior participant sends its models or aims in the form of pdfs. The subordinate participant receives them and incorporates them into its own models and aims. However, the full models of the communicating participants do not have to be defined on the same parameter space. Therefore, they communicate via marginal distributions on the common support. The pdfs must be merged into one consistent model. This is known as information fusion and will be described shortly. The democratic scenario extends the hierarchical scenario: the participants mutually exchange knowledge (via pdfs); however, they are allowed to use, or to refuse, the knowledge coming from their neighbours. In certain applications, it may be required to reach an agreement on an output of common interest. Then, the information must be exchanged and fused repeatedly until a consensus is reached; this will be known as negotiation.

1. Information fusion

Information fusion is a broad area dealing with the merging of information from many sources. In our approach, the information is concentrated within pdfs. A full treatment of the problem can be found in [16]. Here, we illustrate it on a small example. Consider two pdfs, f_1(\theta_1, \theta_2) and f_2(\theta_2), of the arguments \theta_1, \theta_2. These pdfs may represent either models (1) or ideals (3), where conditioning on the observed data is omitted for simplicity. Each of the sources has assigned its own weight, \alpha \in [0, 1] and (1 - \alpha), respectively. This weight models the degree of belief in (or reliability of) the source. The task of the fusion operation is to find a new pdf \tilde{f}(\theta_1, \theta_2) containing the appropriate ratio of information from each of the original sources. The task can be interpreted as an optimization problem in terms of the KL divergence [16]. The optimal solution is then:

\tilde{f}(\theta_1, \theta_2) = \bigl(\alpha f_1(\theta_2) + (1 - \alpha) f_2(\theta_2)\bigr) \, f(\theta_1 \mid \theta_2). \quad (7)
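A minimal sketch of the fusion operation (7) for pdfs represented on a small discrete grid. All values are made up; f(\theta_1 | \theta_2) is taken here as the conditional derived from f_1, since f_1 is the only source that carries information about \theta_1.

```c
#include <stdio.h>

#define N1 2   /* grid size for theta1 */
#define N2 3   /* grid size for theta2 */

int main(void) {
    /* Illustrative discrete pdfs on a small grid (made-up values). */
    double f1[N1][N2] = { {0.10, 0.20, 0.10},
                          {0.15, 0.25, 0.20} };   /* joint f1(th1, th2) */
    double f2[N2]     = { 0.50, 0.30, 0.20 };     /* marginal f2(th2)   */
    double alpha      = 0.7;                      /* weight of source 1 */

    double m1[N2] = { 0.0 };                      /* marginal f1(th2)   */
    for (int j = 0; j < N2; ++j)
        for (int i = 0; i < N1; ++i) m1[j] += f1[i][j];

    /* Fusion (7): ftilde(th1, th2) =
       (alpha*f1(th2) + (1-alpha)*f2(th2)) * f(th1 | th2)              */
    double ftilde[N1][N2];
    for (int j = 0; j < N2; ++j) {
        double mix = alpha * m1[j] + (1.0 - alpha) * f2[j];
        for (int i = 0; i < N1; ++i)
            ftilde[i][j] = mix * (f1[i][j] / m1[j]);  /* conditional of f1 */
    }

    for (int i = 0; i < N1; ++i)
        for (int j = 0; j < N2; ++j)
            printf("ftilde[%d][%d] = %.4f\n", i, j, ftilde[i][j]);
    return 0;
}
```

The merged table again sums to one, and the ratio of information taken from the two sources is controlled purely by the weight \alpha.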
Operation (7) is used in the hierarchical scenario as follows: f_1(\theta_1, \theta_2) is the original model of the subordinate participant and f_2(\theta_2) is the "order" from its superior. The weight \alpha models the hierarchical structure and is considered fixed. For \alpha = 1, the participant f_1(\cdot) is independent of f_2(\cdot); for \alpha = 0, the participant f_1(\cdot) obeys f_2(\cdot) absolutely.

2. Negotiation

In the case of the democratic scenario, the hierarchy (i.e. the weights \alpha) is not fixed. Each participant is free to adjust its belief in its neighbours via the weights \alpha. Then, we need to distinguish between the weights of the models and those of the ideals. We continue to use the notation \alpha for the weights of the model (1), while we introduce new weights \beta \in [0, 1] for the ideal (3). This distinction allows \alpha to be considered a random variable (e.g. as a part of x_t), which may be estimated using the Bayesian paradigm. In other words, if the model f_2(\cdot) obtained from the neighbour describes the observations better than the current model f_1(\cdot), the posterior estimate of the weight (1 - \alpha) will increase. Note that the quality of the model is judged with respect to the final decision making, i.e. with respect to the ideal pdf {}^{I}f_1(\cdot). Note also that \beta is the weight of the ideal pdf; thus, there is no criterion by which we can judge whether some value of \beta is better than another. Therefore, the hierarchical approach will be used in this case. In principle, deterministic strategies for the selection of \beta may also be used.
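A simplified sketch of how the weight \alpha could be adjusted from data in the democratic scenario: \alpha is treated as the probability that the participant's own model f_1 explains the next observation, and it is updated by the predictive likelihoods of f_1 and f_2. This Bayesian model-weighting scheme, as well as the Gaussian predictive models and all numbers, are assumptions made only for this illustration.

```c
#include <stdio.h>
#include <math.h>

/* Value of a univariate Gaussian pdf. */
static double gauss_pdf(double y, double mean, double var) {
    double pi = 3.14159265358979323846;
    return exp(-(y - mean) * (y - mean) / (2.0 * var)) / sqrt(2.0 * pi * var);
}

int main(void) {
    /* Two candidate predictive models of y_t (made-up parameters):
       own model f1: y ~ N(0.0, 1.0); neighbour's model f2: y ~ N(1.0, 0.5). */
    double alpha = 0.5;                     /* prior weight of the own model */
    double y[4]  = { 0.9, 1.2, 0.8, 1.1 };  /* made-up observations          */

    for (int t = 0; t < 4; ++t) {
        double l1 = gauss_pdf(y[t], 0.0, 1.0);   /* predictive likelihoods   */
        double l2 = gauss_pdf(y[t], 1.0, 0.5);
        /* Bayes rule: posterior probability that f1 explains the data */
        alpha = alpha * l1 / (alpha * l1 + (1.0 - alpha) * l2);
        printf("t=%d  alpha=%.3f\n", t + 1, alpha);
    }
    return 0;
}
```

Since the made-up observations are closer to the neighbour's model, \alpha shrinks and (1 - \alpha) grows, as described above.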
3. Probabilistic Models

Up to this moment, we have used pdfs f(\cdot) as a wildcard for any possible distribution. As noted in Section 2., exact Bayesian inference is possible only for a limited class of models. However, none of them is wide enough to describe operations such as (7). We seek a class of models that allows (7), and for which an approximate Bayesian inference is available. Bayesian Networks (BN) [17] seem to be a suitable candidate at the moment. However, the nature of our problem implies some requirements that are not very common in mainstream Bayesian network models. These are: (i) dynamic networks that model temporal dependencies of variables, see [18]; (ii) undirected relations between variables, i.e. we do not assume any causality relations between variables. Also, for communication between participants, we need to be able to work with the marginal distribution of any subset of variables. The principle of conditional independence seems to be a reasonable approach [19]. Inference (i.e. learning, Section 2.) in Bayesian Networks is a well-established area under active research at the moment. However, no generally accepted method for the design of control strategies for general BNs is known to us.

IV. Implementation: Bayesian Toolbox

We intend to apply the above described theory to real-world industrial problems. This requires creating a software image of the theory. In this Section, we summarize the aims of the software and the requirements imposed on it. These issues will play a major role in the actual design of the software specifications.

1. Purpose of the Software

The foundations of the DM theory have been fixed for a long time, but there are (and will be) many details that must be addressed. We intend to design a software framework for the basic DM task. However, we do not intend to implement all possible details of the task. Practically, the software should consider even tasks (operations, functions) for which an algorithmic solution is not yet known. If this can be achieved, it will lay the basis for long-term research, where attention can be focused on a particular problem and not on re-implementation of the overall framework. This leaves an open space for demonstration of novel approaches to particular subproblems. Thus, some researchers involved will actively deal with minor parts of the system and will act as passive users of the rest. We distinguish two principal parts of the software package:

Framework: a general description of the distributed dynamic DM. It specifies (i) data structures, and (ii) algorithms.

Implementation: of the framework in a programming language.

The framework specification will be done to reflect the theory. Therefore, it is intended to remain fixed as long as possible. Various implementations of it may arise. These implementations may be application-specific, with different intellectual property rights. However, all implementations should follow the framework specifications in order to be mutually compatible.

2. Software Framework

The software must be logically structured to be accessible both to an expert in estimation theory and to a practically-oriented user. We follow a Lego-like concept.
Namely, the software framework should provide basic building blocks that can be seamlessly composed into complicated structures:
• data structures correspond to basic structural elements of the theory;
• composition tools are operators of the theory;
• addition of new types of any element (within the framework) does not require changes in the original methods.

These requirements are common to many software projects. An approach known as Object-Oriented (OO) programming was developed to address these issues [20].

3. Software Implementation

The software is to be used in full-scale applications, which induces high requirements on the quality and maturity of the code. The following points seem to be indispensable:
• availability on a wide range of platforms;
• flexibility: testing of new algorithms and settings should be easy and straightforward;
• reuse of the code that is already available; most of the available code is written in Matlab and C.

However, these requirements (mostly the requirement of Matlab and C) are not well compatible with the available object-oriented (OO) languages. Therefore, we try to implement the advantageous OO properties as follows: each "generic function" uses the type-field of its argument to choose an appropriate method.

Encapsulation:
1. attributes of an object can be seen or modified only by calling the object's methods; without proper support from the compiler, this is a matter of coding discipline;
2. the implementation of the object can change (within reasonable bounds) without changing the interface visible to callers; this is certainly possible if the interface is well designed from the beginning.

Inheritance: the definition of one object extends or changes data fields or methods of another. It is a very convenient feature for code reuse. This can be achieved using a rather simple external mechanism, such as a table of inheritance: if a generic function cannot find a method appropriate for the given argument, it looks up the table of inheritance and seeks a method appropriate for the "parental" structure.

Polymorphism: the ability to work with several different kinds of objects as if they were the same. This can be achieved via the "generic functions".
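The following C sketch illustrates the mechanism just described: a structure carries a type-field, a "generic function" dispatches on it, and a table of inheritance provides the fall-back to the parental type. The type names and the single method are invented for this illustration and do not describe the actual toolbox.

```c
#include <stdio.h>

/* Type-field dispatch with a table of inheritance (illustrative only). */
enum obj_type { OBJ_PDF, OBJ_GAUSS_PDF, OBJ_NONE };

typedef struct { enum obj_type type; /* data fields would follow */ } obj;

/* Table of inheritance: child -> parent. */
static const enum obj_type parent_of[] = {
    [OBJ_PDF]       = OBJ_NONE,
    [OBJ_GAUSS_PDF] = OBJ_PDF,
};

/* Methods registered per type (NULL = not implemented for that type). */
static void print_pdf(obj *o) { (void)o; printf("generic pdf\n"); }
static void (* const print_method[])(obj *) = {
    [OBJ_PDF]       = print_pdf,
    [OBJ_GAUSS_PDF] = NULL,          /* inherits print from OBJ_PDF */
};

/* "Generic function": dispatch on the type-field, walk up the table. */
void print_obj(obj *o) {
    enum obj_type t = o->type;
    while (t != OBJ_NONE && print_method[t] == NULL)
        t = parent_of[t];            /* fall back to the parental type */
    if (t != OBJ_NONE) print_method[t](o);
}

int main(void) {
    obj g = { OBJ_GAUSS_PDF };
    print_obj(&g);                   /* prints "generic pdf" via inheritance */
    return 0;
}
```

Adding a new type then means registering its parent and any overriding methods; existing generic functions need not change, which matches the Lego-like requirement stated above.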
The introduced mechanism represents an OO approach that is more labour-intensive than the use of a native OO language. However, we expect that the number of different objects will be relatively low (tens of objects), which is still maintainable. Also, external consistency-checking utilities can be developed to support this approach. The chosen languages (i.e. Matlab and C) do not support the distinction between public and private methods; therefore, access to the various fields in data structures is left to the discipline of the developers.

V. Conclusion

In this paper, we have presented the Bayesian theory of decision making and its extensions for multiple decision makers. We have outlined requirements and design principles for a new software toolbox for this theory. We intend to develop the toolbox as an open-source project. We invite anybody interested in the subject to discuss the outlined approach to the problem, to download, use and test the software, and finally to contribute new ideas and code. Check out http://www.utia.cas.cz/AS for the status of the project.

Acknowledgement

This work is supported by the following grants: AV ČR S1075351, GAČR 102/03/0049, AV ČR 1E100750401.

References
[1] R. Keeney and H. Raiffa, Decisions with Multiple Objectives: Preferences and Value Tradeoffs, J. Wiley and Sons, New York, 1978.
[2] J. Berger, Statistical Decision Theory and Bayesian Analysis, Springer-Verlag, New York, 1985.
[3] K. Åström and B. Wittenmark, Adaptive Control, Addison-Wesley, Reading, Massachusetts, 1989.
[4] D. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, Nashua, 2nd edition, 2001.
[5] K. Åström, Introduction to Stochastic Control, Academic Press, New York, 1970.
[6] A. Feldbaum, "Theory of dual control," Autom. Remote Control, 21(9), 1960.
[7] P. Wellstead and M. Zarrop, Self-tuning Systems, John Wiley & Sons, Chichester, 1991.
[8] Y. Haimes and D. Li, "Hierarchical multiobjective analysis for large scale systems: Review and current status," Automatica, 24(1):53–69, 1988.
[9] T. Sandholm, "Distributed rational decision making," in Multiagent Systems – A Modern Approach to Distributed Artificial Intelligence, G. Weiss, Ed., pp. 201–258, 1999.
[10] H. Nurmi, "Resolving group choice paradoxes using probabilistic and fuzzy concepts," Group Decision and Negotiation, 10:177–198, 2001.
[11] M. Kárný, "Towards fully probabilistic control design," Automatica, 32(12):1719–1722, 1996.
[12] S. Kullback and R. Leibler, "On information and sufficiency," Annals of Mathematical Statistics, 22:79–87, 1951.
[13] M. Kárný and T. Guy, "Fully probabilistic control design," Systems & Control Letters, 2004, in review.
[14] M. Kárný, J. Andrýsek, L. Tesař, and P. Nedoma, "Mixture-based adaptive probabilistic control," International Journal of Adaptive Control and Signal Processing, 2003, draft of the paper.
[15] L. Ljung, System Identification: Theory for the User, Prentice-Hall, London, 1987.
[16] J. Kracík, "On composition of probability density functions," in Multiple Participant Decision Making, J. Andrýsek, M. Kárný, and J. Kracík, Eds., vol. 9 of International Series on Advanced Intelligence, pp. 113–121, Advanced Knowledge International, Adelaide, Australia, 2004.
[17] F. Jensen, Bayesian Networks and Decision Graphs, Springer-Verlag, New York, 2001.
[18] K. Murphy, Dynamic Bayesian Networks: Representation, Inference and Learning, Ph.D. thesis, University of California, Berkeley, 2002.
[19] H. Attias, "A variational Bayesian framework for graphical models," in Advances in Neural Information Processing Systems, T. Leen, Ed., vol. 12, MIT Press, 2000.
[20] P. Coad and J. Nicola, Object-Oriented Programming, Prentice Hall/Yourdon Press, 1993.