"Markov Decision Processes", by Lodewijk Kallenberg

reward at decision time point $t$ for an action $a$ in state $i$ will be denoted by $r_i^t(a)$; if the reward is independent of $t$