Agent Based Decision Support System Using Reinforcement Learning under Emergency Circumstances

Devinder Thapa, In-Sung Jung, and Gi-Nam Wang

Department of Industrial and Information Engineering, Ajou University, South Korea
{debu, gabriel7}@ajou.ac.kr, [email protected]
Abstract. This paper deals with an agent-based decision support system for correct diagnosis and treatment of patients under emergency circumstances. The well-known reinforcement learning framework is utilized for modeling an emergency healthcare system. Also designed is a novel interpretation of the Markov decision process, providing a clear mathematical formulation to connect reinforcement learning and to express the integrated agent system. Computational issues are also discussed with the corresponding solution procedure.
1   Introduction
The objective of this paper is to combine an agent-based decision support system with ubiquitous artifacts and make it more intelligent, so that it can help doctors acquire a correct diagnosis on time and select appropriate treatment choices. An attempt is made to supervise the dynamic situation using agent-based ubiquitous artifacts and to find appropriate solutions for emergency circumstances, providing correct diagnosis and appropriate treatment in time. As shown in the work of M. Hauskrecht and H. Fraser [7], the reason for using an RL (Reinforcement Learning) agent based on an MDP (Markov Decision Process) model is that it needs fewer parameters and provides approximation methods to trade off accuracy against speed, in turn solving a large number of complex cases in less time than the existing system. The idea of the interface agent is derived from the concept of [4]; although the functional architecture is different, the conceptual idea is similar to our work. A reinforcement learning agent approach was implemented in previous work [7] using the model of a partially observable Markov decision process (POMDP). The concept of a ubiquitous healthcare system using agent technology has been studied in [2]. All of the existing works have focused on the exploitation of ubiquitous systems for the betterment of healthcare; our idea is to develop an integrated emergency system using an agent-based approach.

L. Wang, K. Chen, and Y.S. Ong (Eds.): ICNC 2005, LNCS 3610, pp. 888–892, 2005. © Springer-Verlag Berlin Heidelberg 2005
2   Reinforcement Learning Agents
Reinforcement learning (RL) is learning from interaction with an environment, from the consequences of actions, rather than from explicit teaching [5]. RL can be characterized by the mathematical framework of Markov decision processes (MDPs). The main elements of reinforcement learning are states s, actions a, and rewards r. The reinforcement learning agent (RL-agent) is connected to its environment via sensors. In every step of interaction the agent receives feedback about the state of the environment (st+1) and the reward (rt+1) of its latest action at. The agent then chooses an action (at+1), representing the output function, which changes the state of the environment (st+2), and the agent receives new feedback through the reinforcement signal (rt+2).
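The interaction loop just described can be sketched in a few lines of code. The two-state toy environment below is purely illustrative (our own placeholder, not the emergency-care model of this paper):

```python
# Minimal sketch of the RL interaction loop: the agent observes a state,
# chooses an action, and receives a reward and the next state.
# The environment here is a hypothetical two-state toy, not the paper's model.

def step(state, action):
    """Toy environment: action 1 switches the state, action 0 keeps it.
    State 0 is the 'good' state and yields reward +1; state 1 yields -1."""
    next_state = 1 - state if action == 1 else state
    reward = 1.0 if next_state == 0 else -1.0
    return next_state, reward

def run_episode(policy, start_state=0, horizon=10):
    """Run the agent-environment loop for a fixed number of steps."""
    state, total = start_state, 0.0
    for _ in range(horizon):
        action = policy(state)               # agent's output function
        state, reward = step(state, action)  # environment feedback
        total += reward
    return total

# A policy that stays in state 0 and switches back whenever it leaves it.
greedy = lambda s: 0 if s == 0 else 1
print(run_episode(greedy))  # 10.0: ten steps, each rewarded +1
```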
3   Scenario of a Reinforcement Learning Agent in Emergency Circumstances
When a high-risk patient far from medical facilities suffers a perilous occurrence in the body, the ubiquitous devices attached to the body send signals to the hospital knowledge base server. This signal sends the patient profile to the HIS (Hospital Information Server). Knowledge about the patient is accumulated by the RL-agent (called the decision maker agent) from the HIS database [1]. The RL-agent compares the patient's current status with the existing diagnosis history, searches for the related physician and his schedule, and sends the patient's profile to the related departments. On the basis of these crucial data, the decision maker agent, based on the reinforcement learning approach, makes inferences from the data and provides the entire data history of the patient, together with the best alternative action (diagnosis and treatment), to the related department at minimal time cost. In this scenario the decision maker agent uses a model based on previous patients' profiles to collect the patient data; however, this paper only deals with the processing of the decision maker agent based on the RL approach.
4   Markov Decision Process
An MDP is defined by a set of states S, a set of actions A, a reward function R, and transition probabilities T.

V*(s) = max_a ( R(s, a) + Σ_{s′∈S} T(s, a, s′) V*(s′) ),  ∀s ∈ S        (1)

V*(s) = max_π E( Σ_{t=0}^{∞} r_t )        (2)

R(s, a) = Σ_{s′∈S} P(s′|s, a) R(s, a, s′)        (3)
The objective of this model is to find the optimal actions that maximize the reward (or minimize the cost) over a finite horizon, Eq. (2). Due to the computational complexity of the pure MDP model, we use Bellman's value function recursively: Eq. (1) calculates the total value of a state by combining the expected one-step reward, Eq. (3), with the values of the successor states.
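As an illustration, Eq. (1) can be solved by simple value iteration. The sketch below uses a two-state model in the spirit of Sec. 5; the rewards, transition probabilities, and the discount factor added for convergence are all hypothetical placeholders, not values from the paper:

```python
# Value iteration for Eq. (1):
#   V(s) = max_a [ R(s,a) + gamma * sum_s' T(s,a,s') * V(s') ]
# All numbers below are hypothetical; gamma is our own addition.
GAMMA = 0.9

# T[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
T = {
    "S1": {"a11": [("S2", 1.0)],                 # medication: serious -> normal
           "a12": [("S1", 0.6), ("S2", 0.4)]},   # no action
    "S2": {"a21": [("S2", 0.9), ("S1", 0.1)]},   # routine monitoring
}
R = {"S1": {"a11": -1.0, "a12": -5.0}, "S2": {"a21": 0.0}}

def value_iteration(T, R, gamma=GAMMA, tol=1e-8):
    """Iterate the Bellman optimality backup until convergence."""
    V = {s: 0.0 for s in T}
    while True:
        V_new = {s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a])
                        for a in T[s])
                 for s in T}
        if max(abs(V_new[s] - V[s]) for s in T) < tol:
            return V_new
        V = V_new

V = value_iteration(T, R)
# The 'normal' state S2 ends up more valuable than the 'serious' state S1.
```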
5   Formulations to a Reinforcement Learning Problem

Fig. 1. Symbolic representation of the Markov decision process: states S1 and S2, actions a1,1, a1,2, and a2,1, with each transition labeled by its reward r and transition probability p
r = reward, p = transition probability, a = action, S = state

Decision epochs [finite time horizon]: T = {1, 2, ..., N}, N ≤ ∞
States [patient condition: serious, normal]: S = {S1, S2}
Actions [medication, no action]: A_S1 = {a1,1, a1,2}, A_S2 = {a2,1}
Rewards [cost]: rt(S1, a1,1) = r1,1; rt(S1, a1,2) = r1,2; rt(S2, a2,1) = r2,1; rN(S1) = 0; rN(S2) = 0 if N < ∞
Transition probabilities [effect of diagnosis and treatment]: pt(S1|S1, a1,2) = p1,2,1; pt(S1|S2, a2,1) = p2,1,2; pt(S2|S1, a1,1) = p1,1,3; pt(S2|S1, a1,2) = p1,2,4; pt(S2|S2, a2,1) = p2,1,5
Expected reward/cost: rt(S1, a1,1) = rt(S1, a1,1, S1) pt(S1|S1, a1,1) + rt(S1, a1,1, S2) pt(S2|S1, a1,1)
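The expected reward/cost expression above is a direct instance of Eq. (3). A minimal numeric sketch with hypothetical probabilities and outcome rewards (none of these numbers come from the paper):

```python
# Expected reward per Eq. (3): r_t(s,a) = sum_s' p_t(s'|s,a) * r_t(s,a,s').
# Hypothetical values for state S1 under action a1,1:
p = {"S1": 0.2, "S2": 0.8}    # p_t(s' | S1, a11): transition probabilities
r = {"S1": -5.0, "S2": 10.0}  # r_t(S1, a11, s'): outcome-dependent rewards

expected = sum(p[s2] * r[s2] for s2 in p)
print(expected)  # 0.2 * (-5.0) + 0.8 * 10.0 = 7.0
```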
5.1   Finding the Best Policy or the Minimum Cost Function Using a DP (Dynamic Programming) Approach
Choose an arbitrary policy Π′
loop
   Π := Π′
   compute the value function of policy Π by solving the linear equations:

   V^Π(s) = R(s, Π(s)) + Σ_{s′∈S} T(s, Π(s), s′) V^Π(s′)        (4)

   improve the policy at each state:

   Π′(s) := arg min_a ( R(s, a) + Σ_{s′∈S} T(s, a, s′) V^Π(s′) )        (5)

until Π = Π′

Here Π denotes a policy, i.e., the action selected in each state, and V^Π(s) and Π′(s) are the value function and the improved control policy. We take the initial Π′ as any random policy; V^Π(s) is the value obtained by starting from the current state and following policy Π. We then define the greedy policy Π′(s) and iterate the evaluation of V^Π(s) until Π = Π′.
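The loop above can be sketched as follows. This is our own illustration: policy evaluation, Eq. (4), is done by fixed-point iteration rather than an exact linear solve, a discount factor is added for convergence, and all costs and transition probabilities are hypothetical placeholders:

```python
# Policy iteration per Sec. 5.1: alternate policy evaluation (Eq. 4)
# and greedy improvement with arg min over costs (Eq. 5) until stable.
GAMMA = 0.9  # discount factor (our addition; the paper's Eqs. 4-5 omit it)

T = {  # T[s][a] = list of (next_state, probability); hypothetical values
    "S1": {"a11": [("S2", 1.0)],
           "a12": [("S1", 0.6), ("S2", 0.4)]},
    "S2": {"a21": [("S2", 0.9), ("S1", 0.1)]},
}
C = {"S1": {"a11": 1.0, "a12": 5.0}, "S2": {"a21": 0.0}}  # costs C[s][a]

def evaluate(policy, tol=1e-9):
    """Eq. (4): value of a fixed policy, by fixed-point iteration."""
    V = {s: 0.0 for s in T}
    while True:
        V_new = {s: C[s][policy[s]]
                    + GAMMA * sum(p * V[s2] for s2, p in T[s][policy[s]])
                 for s in T}
        if max(abs(V_new[s] - V[s]) for s in T) < tol:
            return V_new
        V = V_new

def policy_iteration():
    policy = {s: next(iter(T[s])) for s in T}  # arbitrary initial policy
    while True:
        V = evaluate(policy)
        # Eq. (5): greedy (minimum-cost) improvement at each state.
        improved = {s: min(T[s], key=lambda a: C[s][a]
                           + GAMMA * sum(p * V[s2] for s2, p in T[s][a]))
                    for s in T}
        if improved == policy:  # until Pi = Pi'
            return policy, V
        policy = improved

policy, V = policy_iteration()
print(policy)  # medication ("a11") is the cheaper choice in the serious state
```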
6   Conclusions and Future Work
This paper presents and describes a reinforcement learning agent-based model used for information acquisition and real-time decision support in emergency circumstances. The Markov decision process is employed to provide a clear mathematical formulation that connects reinforcement learning and expresses the integrated agent system. This method can be highly effective for the real-time diagnosis and treatment of high-risk patients during emergencies, when they are away from the hospital premises. Future work will develop a prototype, simulate the testing data and planning modules, and evaluate the actual outcome of this approach.
References

1. Rodriguez, M., Favela, J., Gonzalez, V., Muñoz, M.: Agent-Based Mobile Collaboration and Information Access in a Healthcare Environment. In: Proceedings of the Workshop on E-Health: Applications of Computing Science in Medicine and Health Care, Cuernavaca, México, ISBN 970-36-0118-9 (December 2003)
2. Bardram, J.E.: The Personal Medical Unit – A Ubiquitous Computing Infrastructure for Personal Pervasive Healthcare. In: UbiHealth 2004: The 3rd International Workshop on Ubiquitous Computing for Pervasive Healthcare Applications (2004)
3. Watrous, R.L., Towell, G.: A Patient-Adaptive Neural Network ECG Patient Monitoring Algorithm. In: Proceedings of Computers in Cardiology, Vienna, Austria (1995) 229–232
4. Wendelken, S.M., McGrath, S.P., Blike, G.T.: Medical Assessment Algorithm for Automated Remote Triage. In: International Conference of the IEEE EMBS, Mexico (September 2003)
5. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, A Bradford Book, Cambridge, MA (1998)
6. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research 4 (1996) 237–285
7. Hauskrecht, M., Fraser, H.: Planning Treatment of Ischemic Heart Disease with Partially Observable Markov Decision Processes. Artificial Intelligence in Medicine 18 (2000) 221–244