Business Process Optimization via Reinforcement Learning
Mauricio Arango, Mark Foster, Ralf Mueller, David Vengerov
February 2, 2017

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Agenda

1. Project Description
2. Reinforcement Learning (RL) Overview
3. Business Process Optimization and RL
4. RL in other Oracle Products
5. Future Work and Conclusions


Problem Statement: From structured to unstructured Business Processes

• Business Processes are typically sequential decision-making systems
– Well-defined sequence of activities
– Often consider only the "happy path"; exceptions or going back and forth in the process are not factored in
– Increasing process variance results in cluttered process diagrams

• The Process Engine is in the "driver seat" of the process; the user comes into the picture only at certain steps


Example: Loan Request Process
What people think it is…


Example: Loan Request Process
Oh well, maybe I need to go back at certain steps in the process…


Example: Loan Request Process
Heck, organizational roles should be considered too…


Example: Loan Request Process
Oops, folks might enter the system from the Internet or the Bank Shop…


Business Process Optimization: analogous to minimizing a robot's traversal time through a dynamically changing maze


Example: Loan Request Process
Next Iteration


Unstructured Processes, aka (Dynamic / Adaptive) Case Management

• Forrester: "A semi-structured but also collaborative, dynamic, human and information intensive process that is driven by outside events and requires incremental and progressive responses from the business domain handling the case."

• Peter Drucker coined the term "Knowledge Worker" in 1959: "Employees such as data analysts, product developers, planners, programmers and researchers who are engaged primarily in acquisition, analysis and manipulation of information as opposed to in production of goods or services."

• Paradigm shift*: Case Management moves the gathering of process knowledge from the template analysis/modeling phase of the life cycle into process execution. The Case Management System collects actionable knowledge based on process patterns created by business users. Business Process Optimization supports the Knowledge Worker in the decision-making process and strives to automate tasks for repetitive work or work with human involvement.

* http://www.xpdl.org/nugen/p/adaptive-case-management/public.htm


Unstructured Processes

• Typical Use Cases:
– Investigative Case Management
– Service Request Handling
– Incident Handling

• The user is in the "driver seat" of the process, not the process engine
• Knowledge support and process optimization are key differentiators for any product in this area


Business Process Optimization areas

• Optimizing a business process involves selecting actions at each intermediate state that optimize metrics such as:
– Maximizing the rate of reaching successful end states
– Minimizing the cost of reaching successful end states
– Minimizing the time required to complete the process
– Minimizing human involvement
– Maximizing the degree of automation in the process

• The optimal action at a given state is produced by a policy function
• The goal of business process optimization techniques is to find policy functions for business process instances
• This project uses Reinforcement Learning to generate policy functions for target business processes



Machine Learning Frameworks

• Supervised learning problems: Regression, Classification
• Unsupervised learning problems: Clustering, self-organizing maps for dimensionality reduction
• Reinforcement learning (RL) problems: Policy evaluation, policy optimization


Reinforcement Learning problem
Learning through interactive experience

• The Agent interacts with the Environment and learns an optimal policy function
• Optimal policy function: produces the optimal action given a state
• Interaction: the Agent sends an Action to the Environment; the Environment returns a State and a Reward, producing the trajectory s0, a0, r0, s1, a1, r1, s2, a2, r2, …
– The Agent observes the environment state and uses the policy function to calculate the optimal next action
– The Environment executes the next action
– The Environment advances to a new state and produces a reward value (a metric of action quality)
– The Agent updates the policy function (RL algorithm)
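A minimal sketch of this interaction loop in Java (the language of the project's agent application); the Environment and Agent interfaces and their method names are illustrative assumptions, not the project's API:

public class RlLoop {
    // Illustrative environment contract: observe state, execute action, emit reward.
    interface Environment {
        double[] observeState();
        double executeAction(int action);  // advances to a new state, returns the reward
        boolean done();
    }

    // Illustrative agent contract: a policy function plus a learning update.
    interface Agent {
        int selectAction(double[] state);                          // policy: state -> action
        void update(double[] s, int a, double r, double[] sNext);  // RL algorithm step
    }

    static void run(Agent agent, Environment env) {
        double[] s = env.observeState();
        while (!env.done()) {
            int a = agent.selectAction(s);        // agent picks a_t from s_t via its policy
            double r = env.executeAction(a);      // environment executes a_t, yields r_t
            double[] sNext = env.observeState();  // ...and advances to s_{t+1}
            agent.update(s, a, r, sNext);         // agent improves its policy function
            s = sNext;
        }
    }
}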

Policy Optimization Using RL

• A policy specifies the action that should be taken in each state; IF-THEN rules are a common representation
• Optimal policy: maximizes the discounted sum of future rewards
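As a concrete illustration of the IF-THEN representation, here is a sketch in Java built around the fast-track rule that appears in the insurance claims example later in this deck; the threshold value and action names are assumptions:

public class RulePolicy {
    enum Action { INITIATE_FAST_TRACK, ASSIGN_FIELD_ADJUSTER }

    static final double FAST_TRACK_THRESHOLD = 5_000.0;  // assumed threshold, for illustration only

    // IF claim-value < threshold THEN fast-track, ELSE follow the full handling path.
    static Action nextAction(double claimValue) {
        if (claimValue < FAST_TRACK_THRESHOLD) {
            return Action.INITIATE_FAST_TRACK;
        }
        return Action.ASSIGN_FIELD_ADJUSTER;
    }
}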

Reinforcement Learning Approaches

• Direct RL approach: represent a policy as a parameterized function mapping a state description to an action (useful for continuous actions)

• Indirect RL approach: represent the long-term value obtained by the system as a parameterized function Q(s,a)
– At each state s, take the action a that maximizes Q(s,a)

• Multi-Agent RL approach: represent the long-term value obtained by each agent i as a parameterized function $V_i(s)$
– Agents reconfigure their states (say, trade resources) so as to maximize $\sum_i V_i(s_i)$

Indirect RL Approach: Q-learning

• Q(s,a) gives the expected discounted sum of future rewards if action a is taken in state s and the optimal policy is followed thereafter:

$$Q(s_t, a) = E[r(s_t, a)] + E\Big[\sum_{k=1}^{\infty} \gamma^k\, r(s_{t+k}, a_{t+k})\Big] = E\Big[r(s_t, a) + \gamma \max_a Q(s_{t+1}, a)\Big]$$

• Given an approximation $\hat{Q}(s, a)$, its error can be computed as

$$\delta_t = r(s_t, a_t) + \gamma \max_a \hat{Q}(s_{t+1}, a) - \hat{Q}(s_t, a_t)$$

• Natural updating (learning) rule: keep adjusting $\hat{Q}$ in the direction that decreases the error at each time t. Commonly known as Q-learning:

$$\hat{Q}(s_t, a_t) \leftarrow \hat{Q}(s_t, a_t) + \alpha_t \delta_t$$

where $\alpha_t$ is a learning rate at time t. If it is decreased over time (say, $\alpha_t = \alpha / t$), the rule converges: $\hat{Q}(s, a) \to Q(s, a)$ as $t \to \infty$.
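A minimal tabular Q-learning sketch in Java implementing the update rule above; the discrete state/action indexing and the parameter values are assumptions (the project's prototype approximates Q with regression rather than a table):

import java.util.Random;

public class QLearning {
    final double[][] q;         // Q-value table, indexed by [state][action]
    final double gamma = 0.95;  // discount factor (assumed value)
    long t = 1;                 // time step, used to decay the learning rate
    final Random rng = new Random();

    QLearning(int nStates, int nActions) { q = new double[nStates][nActions]; }

    // Greedy action: argmax_a Q(s, a).
    int greedyAction(int s) {
        int best = 0;
        for (int a = 1; a < q[s].length; a++) if (q[s][a] > q[s][best]) best = a;
        return best;
    }

    // Epsilon-greedy exploration, a common way to gather experience while learning.
    int selectAction(int s, double epsilon) {
        return rng.nextDouble() < epsilon ? rng.nextInt(q[s].length) : greedyAction(s);
    }

    // One Q-learning step: delta_t = r + gamma * max_a Q(s', a) - Q(s, a),
    // then Q(s, a) <- Q(s, a) + alpha_t * delta_t with alpha_t = alpha / t.
    void update(int s, int a, double r, int sNext) {
        double alpha = 1.0 / t++;  // decaying learning rate, as on the slide (alpha = 1)
        double delta = r + gamma * q[sNext][greedyAction(sNext)] - q[s][a];
        q[s][a] += alpha * delta;
    }
}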

Fitted Q-Iteration (FQI)
Batch-oriented, uses regression for Q-function approximation

• Use a sliding window of the N latest decisions, choosing actions with $a = \arg\max_a Q^{(i)}(s_t, a)$

• Fit a function $Q^{(i+1)}(s, a)$ to predict outputs based on inputs:

Inputs: $(s_{t-N}, a_{t-N}, s_{t-N+1}),\ (s_{t-N+1}, a_{t-N+1}, s_{t-N+2}),\ \ldots,\ (s_{t-1}, a_{t-1}, s_t),\ (s_t, a_t, s_{t+1})$

Outputs: $\hat{Q}^{(i)}(s_{t-N}, a_{t-N}) + \alpha_i \delta_{t-N},\ \hat{Q}^{(i)}(s_{t-N+1}, a_{t-N+1}) + \alpha_i \delta_{t-N+1},\ \ldots,\ \hat{Q}^{(i)}(s_{t-1}, a_{t-1}) + \alpha_i \delta_{t-1},\ \hat{Q}^{(i)}(s_t, a_t) + \alpha_i \delta_t$

• Just a few fitting iterations are sufficient to learn a good approximation to the true Q-function, which is then used to choose actions over the next N state observations
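To make the batch fit concrete, here is a compact sketch in Java. The project's actual FQI is implemented in R with nnet, SVM, or linear regression as the regressor; the hand-rolled linear model, class names, and parameter choices below are illustrative assumptions:

import java.util.List;

public class FqiSketch {
    // One observed decision (s_t, a_t, r_t, s_{t+1}); Java 16+ record syntax.
    record Transition(double[] s, int a, double r, double[] sNext) {}

    private final int nActions;
    private final double gamma = 0.95;  // discount factor (assumed value)
    private final double[][] w;         // linear Q-model: Q(s, a) = w[a] . s

    FqiSketch(int nActions, int stateDim) {
        this.nActions = nActions;
        this.w = new double[nActions][stateDim];
    }

    double q(double[] s, int a) {
        double v = 0;
        for (int i = 0; i < s.length; i++) v += w[a][i] * s[i];
        return v;
    }

    double maxQ(double[] s) {
        double best = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < nActions; a++) best = Math.max(best, q(s, a));
        return best;
    }

    // One FQI iteration over the sliding-window batch: freeze Q^(i) to compute the
    // regression targets Q^(i)(s,a) + alpha_i * delta from the slide, then fit the
    // per-action linear models to those targets by plain gradient descent.
    void fitIteration(List<Transition> batch, double alphaI, double fitRate) {
        double[] targets = new double[batch.size()];
        for (int i = 0; i < batch.size(); i++) {
            Transition tr = batch.get(i);
            double qsa = q(tr.s(), tr.a());
            double delta = tr.r() + gamma * maxQ(tr.sNext()) - qsa;  // TD error under Q^(i)
            targets[i] = qsa + alphaI * delta;
        }
        for (int pass = 0; pass < 50; pass++) {  // supervised regression step
            for (int i = 0; i < batch.size(); i++) {
                Transition tr = batch.get(i);
                double err = targets[i] - q(tr.s(), tr.a());
                for (int j = 0; j < tr.s().length; j++) {
                    w[tr.a()][j] += fitRate * err * tr.s()[j];
                }
            }
        }
    }
}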

Agenda 1

Project Description

2

Reinforcement Learning (RL) Overview

3

Business Process Optimization and RL

4

5

RL in other Oracle Products Future Work and Conclusions

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

20

Business process use cases optimized via Reinforcement Learning
Proof-of-concept implementations

• Use cases
– Insurance Claims Management
– Mortgage Application

• RL platform


Business process optimization as a Reinforcement Learning problem
Mapping to the RL architecture

• Agent: the interaction application; uses Fitted Q-Iteration and produces the policy function
• Environment: the target business process
• Agent and Environment exchange State, Reward, and Action, as in the standard RL loop

Applying RL-based optimization to a best-next-action scenario

• Training phase: the RL agent exchanges State, Action, and Reward with the target business process to learn a policy
• Service phase: a best-action-recommending feature answers policy queries from the service agent working on the target business process

RL framework architecture
Application-independent RL framework

• The RL agent application (Java) calls the RL Agent API: fqi(), growBatch(), getQValues()
• An FQI Java-R adapter (Java/JRI) bridges the API to the Fitted Q-Iteration implementation in R
• The target business process or simulator (Java) exchanges State, Reward, and Action with the RL agent application through an API

Fitted Q-Iteration
Batch RL prototyping platform

• Fitted Q-Iteration (FQI) implementation in R: batch-based Q-learning
– Incremental: growing batch with a sliding window
– Can also be used in interactive mode, combining batch and iterative learning

• Q-value function approximation (as opposed to table-based) using supervised learning (regression); tested with:
– Neural networks (nnet)
– SVM
– Linear regression

• Java API
– Allows a Java application (the RL agent) to use the R-based FQI implementation
– Uses the Java R Interface (JRI) library
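A hypothetical sketch of how such a Java API could wrap the R-based FQI through JRI. The method names fqi(), growBatch(), and getQValues() come from the architecture slide; their signatures, the R-side function names, and the script-loading step are assumptions:

import org.rosuda.JRI.Rengine;

public class FqiJriAdapter {
    private final Rengine engine;

    public FqiJriAdapter(String fqiScriptPath) {
        // Start an embedded R session (JRI) and load the FQI R script.
        engine = new Rengine(new String[] {"--no-save"}, false, null);
        engine.eval("source('" + fqiScriptPath + "')");
    }

    // Append one (state, action, reward, next state) transition to the growing batch.
    public void growBatch(double[] s, int a, double r, double[] sNext) {
        engine.eval(String.format("growBatch(c(%s), %d, %f, c(%s))",  // assumed R-side function
                vec(s), a, r, vec(sNext)));
    }

    // Run Fitted Q-Iteration over the current sliding-window batch.
    public void fqi() {
        engine.eval("fqi()");  // assumed R-side function
    }

    // Q-values for every action in the given state, used for greedy action selection.
    public double[] getQValues(double[] s) {
        return engine.eval("getQValues(c(" + vec(s) + "))").asDoubleArray();  // assumed R-side function
    }

    public void shutdown() { engine.end(); }

    private static String vec(double[] v) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < v.length; i++) sb.append(i > 0 ? "," : "").append(v[i]);
        return sb.toString();
    }
}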

Insurance Claims Management


• Simple business process with only one alternate path
• Challenges:
– State representation
– Validate that the agent learns to complete the business process without knowing anything about the internal behavior of the process
– Validate that the agent learns to choose the shortest path, given the necessary conditions
• Proof of concept with a simulator

[Process diagram: states S1 through S19. Tasks include Update First Notice of Loss, Verify Coverage, Create cost estimate, Assign field adjuster, and Initiate Fast-Track. Fast-track rule: if claim-value < threshold then fast-track.]
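Since state representation is called out as a challenge above, here is one illustrative encoding in Java; the one-hot-plus-attribute layout and all names are assumptions, not the project's actual representation:

public class ClaimState {
    static final int N_STEPS = 19;  // process steps S1..S19 from the diagram above

    // Encodes (current step, claim value) as a numeric feature vector for the RL agent:
    // a one-hot indicator over the 19 steps plus the normalized claim value, the
    // attribute that the fast-track rule depends on.
    static double[] encode(int step, double claimValue, double maxClaimValue) {
        double[] features = new double[N_STEPS + 1];
        features[step - 1] = 1.0;                        // one-hot of the current step
        features[N_STEPS] = claimValue / maxClaimValue;  // scaled to [0, 1]
        return features;
    }
}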

What patterns in business processes provide room for optimization?

• Tasks with the capability to bypass sections of the process (fast-track in Insurance Claims Management)
• Tasks (activities) with the capability to execute in parallel or in series, depending on process state attributes
– Under some conditions parallel execution is optimal
– Under other conditions serial execution is optimal
– The challenge is to learn an adaptive policy function that chooses the best route, given current conditions


Example of a business process with tasks enabled for parallel or serial execution
Mortgage application process, two versions: parallel execution and serial execution

Reference: "Intelligent Business Process Optimization for the Service Industry", by Markus Kress, Chapter 9.

Mortgage application simulator
Combined version, capable of choosing between parallel or serial execution

• p1: probability of success for the check collaterals action
• p2: probability of success for the check construction documents action

[State diagram: from Start, the process can run check collaterals first, check construction documents first, or check in parallel. Each check succeeds with probability p1 or p2 and fails with probability 1-p1 or 1-p2; paths pass through intermediate states S2 through S8 and end in Accept when both checks succeed or Reject when a check fails.]
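A minimal sketch of this combined simulator in Java; p1 and p2 come from the slide, while the per-check costs and the acceptance reward are assumed values added to make the cost trade-off visible:

import java.util.Random;

public class MortgageSimulator {
    enum Action { COLLATERALS_FIRST, DOCUMENTS_FIRST, PARALLEL }

    private final double p1;  // probability that "check collaterals" succeeds (from the slide)
    private final double p2;  // probability that "check construction documents" succeeds (from the slide)
    private final double c1 = 1.0, c2 = 1.0;   // assumed cost per check
    private final double acceptReward = 10.0;  // assumed reward for an accepted application
    private final Random rng = new Random();

    public MortgageSimulator(double p1, double p2) { this.p1 = p1; this.p2 = p2; }

    // Runs one application episode under the given routing action and returns the reward.
    public double episode(Action action) {
        boolean okCollaterals = false, okDocuments = false;
        double cost;
        if (action == Action.PARALLEL) {
            // Both checks run at once, so both costs are always incurred.
            okCollaterals = rng.nextDouble() < p1;
            okDocuments = rng.nextDouble() < p2;
            cost = c1 + c2;
        } else if (action == Action.COLLATERALS_FIRST) {
            okCollaterals = rng.nextDouble() < p1;
            cost = c1;
            if (okCollaterals) {  // the second check is skipped when the first fails,
                cost += c2;       // which is why serial wins when 1 - p1 is large
                okDocuments = rng.nextDouble() < p2;
            }
        } else {                  // DOCUMENTS_FIRST, symmetric to the case above
            okDocuments = rng.nextDouble() < p2;
            cost = c2;
            if (okDocuments) {
                cost += c1;
                okCollaterals = rng.nextDouble() < p1;
            }
        }
        // Accept only if both checks succeed; otherwise the application is rejected.
        return (okCollaterals && okDocuments ? acceptReward : 0.0) - cost;
    }
}

Under these assumptions, serial routing pays for the second check only when the first succeeds, which reproduces the trade-off analyzed on the next slide.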

Mortgage application: Results analysis

• Actions have a succeed-or-fail outcome, depending on state attributes and rules that may not be visible to the RL agent; action behavior is modeled as a probability distribution in the simulator
• Intuitively, the optimal policy would be to always choose check in parallel
• But as the probability of failure for check collaterals (1-p1) increases, choosing check collaterals first (serially) becomes the better option
– Reason: it avoids the cost of check construction documents when check collaterals fails

• Validated the capability of the RL agent to dynamically adjust and obtain the optimal policy
– The optimal policy is not always feasible for human agents to derive

Mortgage application
[Chart: dynamic RL-based policy (green) vs. hardwired/fixed policy (red)]


General-purpose process simulator (ongoing work)

• A process simulator that enables simple setup of simulations of complex business process transition patterns, with optimal policies that are non-intuitive to human agents
• Validate that RL agents can obtain the optimal policies


Initial Results

[Chart: average reward for three policies (always Action 1, always Action 2, RL) as the Action 2 cost multiple varies from 0 to 1; average reward ranges from about +50 down to -300.]

• RL learns to always take action 2 when its cost multiple is 0
• As the cost multiple for action 2 increases, RL uses it less frequently and only when it is more profitable than action 1 (in high-quality states)



Other Oracle products that can benefit from RL

• Oracle Advanced Analytics (OAA)
– RL would enrich OAA's strong portfolio of machine learning algorithms
– It would open OAA to building solutions for sequential decision problems in many business areas
– The R-based FQI implementation is aligned with OAA/ORE's strong coverage of R

• Oracle Stream Analytics (OSA) & IoT CS
– Industrial process optimization
– Intelligent buildings optimization
– Predictive maintenance


Other Oracle products that can benefit from RL (continued)

• Oracle Healthcare Cloud
– Personalized medicine

• Oracle Cloud ERP
– Resource allocation optimization
– Process optimization



Conclusions

• Initial validation of RL's capability to optimize business processes
• Developed an application-independent RL platform based on Fitted Q-Iteration that can be used in multiple other projects
• Identified key pending challenges involved in end-to-end automation of process optimization


Future work

• Integration with Process Cloud Service
– Automated extraction of state and action sets

• General-purpose process simulator
– Capability to simulate increasingly complex process scenarios

• Performance evaluation and visualization


Questions & Answers
