Preprints of the 18th IFAC World Congress Milano (Italy) August 28 - September 2, 2011
Optimal and MPC Switching Strategies for Mitigating Viral Mutation and Escape Esteban A. Hernandez-Vargas ∗ Richard H. Middleton ∗ Patrizio Colaneri ∗∗ ∗
Hamilton Institute, National University of Ireland, Maynooth, Co. Kildare, Ireland (e-mail:
[email protected], abelardo
[email protected] ) ∗∗ DEI, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano, Italy (e-mail:
[email protected])
Abstract: This paper is motivated by the study of mutation in HIV infection. Combination antiretroviral therapy slows the clinical progression of HIV infection, however drug resistance due to viral mutation is a challenging problem. Some studies have speculated that alternating between drug regimens on a fixed schedule might forestall therapeutic failure. To further analyze this speculation, we consider a model of 64 viral strains with 3 drug combinations to analyze drug regimens to maximise the delay till viral escape. A model predictive control scheme is proposed for determining near optimal switching drug schedules. This technique is compared with an optimal control approach and with the strategy commonly used in clinical practice. Keywords: Biological Systems, HIV Mutation, MPC 1. INTRODUCTION
mature retrovirusparticle
CCR5antagonist Fusioninhibitors
budding Cytoplasm
HIV can only replicate inside cells. The process typically begins when a virus particle contacts an immune system cell that carries on its surface a special protein called CD4. Different drugs have been proposed to affect specific parts of the HIV life cycle, see Fig.1. Highly active antiretroviral therapy (HAART) is a combination therapy used to reduced viral replication and to delay the progression of the infection. However, HAART is not always successful, many patients experience virologic failure, defined as the inability to sustain suppression of HIV RNA levels to less than 50 copies/ml. Temporary suppression followed by increased viral load is referred to as viral rebound (Department of Health [2009]). This had lead to the conclusion that switching therapeutic options will be required lifelong in order to prevent HIV disease progression. Antiretroviral drug sequencing provides a strategy to deal with virologic failure and anticipates that therapy will fail in a proportion of patients due to resistant mutations. The primary objectives of therapy sequencing are the avoidance of accumulation of mutations and control of multi-drugresistant viruses Martinez et al. [2008]. Using mathematical modeling, in Amato et al. [1998] it was hypothesized that alternating HAART regimens, even while plasma HIV RNA levels were lower than 50 copies/ml, would further reduce the likelihood of the emergence of resistance. This concept has preliminary support from a clinical trial Mar This work was supported by Science Foundation of Ireland 07/PI/I1838.
Copyright by the International Federation of Automatic Control (IFAC)
penetration assembly bl
uncoating
structuralproteins
reverse transcriptase
NRTI/NNRTI
Protease Inhibitors
Nucleus
unspliced RNA unsplicedRNA
splicedRNA
unintegrate viral DNA viralDNA transcription Integrase Inhibitors
integrated g proviralDNA
Fig. 1. HIV cycle tinez et al. [2003], the SWATCH (SWitching Antiviral Therapy Combination against HIV) study. The design of drug sequencing has been a point of discussion. Ryan et al. [2007] proposed a pattern of structured treatment interruptions preceding the introduction of the new regimen can significantly decrease the risk of resistance emerging to new regimens. However, structured treatment interruption were prohibited by the Department of Health [2009]. Motivated by switching regimens, Middleton et al. [2010] proposed a 4 variant, 2 drug combination model, which can be seen as a switched system. Using optimal control
14857
Preprints of the 18th IFAC World Congress Milano (Italy) August 28 - September 2, 2011
strategies was shown the importance of alternating drug regimens. However the solution to optimal control problems frequently results in a two point boundary condition problem, which can not be solved using regular integration techniques. Numerical algorithms have been proposed to determine optimal trajectories. Iterative solutions based on Pontryagin’s maximum principle have been proposed, for example Kirk [1998], but without any guarantee of convergence. On the other hand, dynamic programming may be useful for problems of reasonable dimension Kirk [1998] and Middleton et al. [2010a]. Model Predictive Control (MPC) appears to be suitable for a suboptimal application to the biomedical area, due to its robustness to disturbances, model uncertainties and the capability of handling constraints. Using MPC techniques to plan treatment applications for HIV is not a new idea Zurakowski et al. [2006], Landi et al. [2010]. However, the models used in these previous approaches do not accurately reflect the interaction between different genotypes and drug treatments, and consequently do not predict the possibility of the appearance of highly resistant genotypes. The paper is organized as follows. The notation used throughout the paper is introduced in Section 2. A virus mutation treatment model and the cost function to be minimized are presented in Section 3. The optimal control problem and the algorithm to compute optimal trajectories are reviewed in Section 4. A MPC algorithm for the computation of antiretroviral drug sequencing is proposed in Section 5. Simulations results to the mitigation of viral escape is introduced in Section 6. The paper is finalized in Section 7.
i TH Vi + a1 Li − d1 Ti∗ + T˙i∗ = ψk1,σ
Throughout this paper, R denotes the field of real numbers, Rn stands for the vector space of all n-tuples of real numbers, Rn×n is the space of n × n matrices with real entries, and N denotes the set of natural numbers. Matrices or vectors are said to be positive (non-negative) if all their entries are positive (non-negative). We write A for the transpose of A, and exp(A) for the matrix exponential of A. 3. VIRUS MUTATION TREATMENT MODEL We consider a model with n different viral genotypes, with viral populations, Vi : i = 1, ...n; and N different possible drug therapies that can be administered, represented by σ(t) ∈ {1, ..., N }, where σ is permitted to change with time, t. In the model equations below, TH represents the uninfected CD4+T cell population, Ti∗ are the infected cells and Li represents the latently infected cells. The equations for the HIV treatment model are given below:
μmi,j Vj TH (1)
1 i L˙ i = (1 − ψ)k1,σ TH Vi − a1 Li − d2 Li (2) i ˙ Vi = k2,σ Ti∗ − k3 T Vi − d3 Vi (3)
HIV can infect a number of different cells; activated CD4+T cell, resting CD4+T cell, quiescent CD4+T cell, macrophages and dentritic cells. For simplicity, we consider only CD4+T cells as viral hosts. The infectivity rate i can be expressed as k1,σ , this parameter depends of the genotype and the therapy that is being used. Once CD4+T cells are infected, a proportion of cells, ψ passes into the infected cells population, whereas a proportion (1 − ψ) passes into the latently-infected cell population. These latently-infected cells might be activated later and start reproducing virus, this activation is represented through the a1 term. Viral proliferation is achieved in infected activated CD4+T cells, this is represent by k2,σ , which depends on the fitness of the genotype and the therapy. The mutation rate is represented by μ , the death or decay rates are represented by d1 , d2 , d3 for T ∗ , L, V respectively. mi,j ∈ {0, 1} represents the genetic connections between genotypes. Under normal treatment circumstances both simulations and clinical data suggest that healthy CD4+T counts are approximately constant Middleton et al. [2010]. This assumption allows us to simplify the dynamics to being essentially linear. Then the system (1-3) can be rewritten as follows; ⎡
Λ1,σ 0 ⎢ 0 Λ2,σ x˙ = ⎢ ⎣ ... 0
2. NOTATION
n
0
... ... .. .
⎤ 0 0 ⎥ x + μTH M x .. ⎥ . ⎦
(4)
. . . Λn,σ
where M = mi,j , Λi,σ is given by ⎤ ⎡ i a1 ψk1,σ TH −d1 i Λi,σ = ⎣ 0 −(a1 + d2 ) (1 − ψ)k1,σ TH ⎦ i k2,σ 0 −(k3 TH + d3 ) and x = [T1∗ , L1 , V1 , . . . , Tn∗ , Ln , Vn ]. 3.1 A 64 variant, 3 drug combination model We propose a model with 64 genetic variants, that is n = 64, and 3 possible drug therapies, N = 3. The viral variants (also called ‘genotypes’) are organized in a threedimensional lattice as is shown in Fig. 2. This lattice is based on the simplifying assumption that multiple independent mutations are required to achieve resistance to all therapies. The wild type genotype (g1 ) would be the most prolific variant in the absence of any drugs, however, it is also the variant that all drug combinations have been designed to combat, and therefore is susceptible to all therapies. On the other hand, after several mutations the highly resistant genotype (g64 ) is a strain with low proliferation rate,
14858
Preprints of the 18th IFAC World Congress Milano (Italy) August 28 - September 2, 2011
g49
g33
g50
g34
g51
g35
and penalises the total viral load. For biological reasons, if the total viral load is small enough during a finite time of treatment, then there is a significant probability that the total viral load becomes zero. Munoz et al. [1997] noted that the stronger the suppression of viral replication the less likely a significant resistance will emerge.
g52
g36 g56
g18
g19
g20 g40
g1
g2
g3
g4
g60
Therapy y3
g17
g24
4. OPTIMAL CONTROL PROBLEM REVIEW
g44 g5
g6
g7
g8
g64 g28
The model for the treatment of viral mutation given in (4) is described in continuous time. In practice, measurements can only reasonably be made at certain periodic intervals. For simplicity, we consider a regular treatment interval τ , during which treatment is fixed. If we use k ∈ N to denote the number of intervals since t = 0, then
g48 g9
g10
g11
g12 g32
g13
g14
g15
g16
x(k + 1) = Aσ(k) x(k)
Therapy 1
Fig. 2. Mutation tree for a 64 genotype and 3 drug combination model but that is resistant to all drug therapies. For simulation purposes, we shall use parameters shown in Table 1. Table 1. Parameters Values Parameter k1 k2 k3 d1 d2 d3 a ψ
Value 3.8 × 10−3 6.45 × 10−1 7.79 × 10−6 0.4 0.005 2.4 3 × 10−4 0.97
where x(k) = x(kτ ) is the sampled state, Aσ := exp((diag(Λn,σ ) + μM )τ ), σ(k) is the switching sequence σ(k) ∈ {1, 2, ..., N }, and x(0) = x0 is the initial condition. Clearly, σ(k) constrains Aσ(k) to jump among the N vertices of the matrix polytope A1 , ..., AN . The cost function (7) is to be minimized over all admissible switching sequences. The optimal switching signal, the corresponding trajectory and the optimal cost function will be denoted as σ o (k), xo (k) and J(x0 , x0 , σ 0 ) respectively. Letting u = σ(k), and using the Hamilton-Jacobi-Bellman equation for the discrete case, we have;
Value taken from: Conejeros et al. [2007] Fitted Conejeros et al. [2007] Conejeros et al. [2007] Conejeros et al. [2007] Conejeros et al. [2007] Conejeros et al. [2007] Conejeros et al. [2007]
V (x, k) = min{V (k + 1, x(k + 1))} u∈U
In this model therapies are composed of reverse transcriptase inhibitors and protease inhibitors, which can affect two specific parts of the HIV cycle. This is represented by i k1,σ = k1 fi ησ,i
(8)
where denoting the costate vector by p(k), the general solution for this system is (10) V (x(k), k) = p(k) x(k) Using equations (7), (8), (9) and (10), we obtain the following system
(5)
xo (k + 1) = Aσo (k) xo (k)
i k2,σ
= k2 fi θσ,i (6) where ησ,i represents the infection efficiency for genotype i under treatment σ, and θσ,i expresses the production efficiency for the genotype i under treatment. We assume that in the absence of treatment mutation reduces the fitness of the genotype, thus we use linear decreasing factors for fi , which represents the fitness of the genotype i. The drug efficiencies can be seen in Fig. 2, where the arrows indicate the efficiency of the drug. For instance, the genotypes g1 , g5 , ..., g61 are all on one face of the lattice, and are fully susceptible to therapy 1. The opposite face, g4 , g8 , ..., g64 describes all genotypes highly resistant to therapy 1.
(9)
po (k) = Aσo (k) po (k + 1)
(11)
σ (k) = argmin{p (k + 1) As x (k)} o
o
o
s
with boundary conditions x(0) = x0 and p(T ) = c. Remark 1. Notice that equations (11) are inherently nonlinear. The system must be integrated forward whereas the co-state equation must be integrated backward, both according to the coupling condition given by the switching rule. (11) is a two point boundary value problem, and can not be solved using regular integration techniques.
3.2 Cost Function Motivation
4.1 Algorithm for Computing Optimal Trajectories
We consider a cost function of the form J := c x(T ) (7) where c = [0, 0, 1, ..., 0, 0, 1], and T is an appropriate final time. This cost was proposed by Middleton et al. [2010],
Because of the discrete nature of the input, a possible numerical solution for the optimal control problem stated in (11) is to make an exhaustive examination of every possible switching rule. Given the cost (7) and the initial
14859
Preprints of the 18th IFAC World Congress Milano (Italy) August 28 - September 2, 2011
condition x(0) the optimal control problem can be written as c AiT AiT −1 . . . Ai1 x(0) (12) min iT ,iT −1 ,...,i1 ∈{1,...,N }
Let us recursively define the sequence of matrices
predicted one. In order to incorporate a feedback mechanism, the first step of the optimal control sequence is implemented. When the next measurement becomes available, at time t + τ , the whole procedure -prediction and optimization- is repeated to find a new input function with the control and prediction horizons moving forward.
ΩT = c Ωk−1 = [A1 Ωk A2 Ωk . . . AN Ωk ]
Then we have that V (x, 0) = mini Ω0,i x, where Ω0,i is the ith column of Ω0 and in general (13) V (x, k) = min ΩT −k,i x(k) i
At each step of the evolution the feedback strategy can be computed as (14) u(x(k)) = argmin ΩT −k,i x(k) i
namely selecting the smallest component of the vector ΩT −k x(k). The implementation of this strategy that we call “brute force” requires storing the columns of ΩT −k x(k) whose number would be 1, N, N 2 , N 3 , . . . , N T . In order to reduce computational and storage demands we can combine the algorithm above with a dual version Middleton et al. [2010a].
5.1 Mathematical Formulation of MPC From the biological nature of HIV infection, the system (8) is unstable and in fact not stabilizable. This is because of the existence of a highly resistant genotype that is not affected by any treatment. Therefore, once the highly resistant mutant has “emerged” the population will “blow up” after a period of time. The objective of MPC is to suppress the total viral load, (7) for as long as possible. In order to distinguish the real system and the system model used to predict the future for the controller, we denote the internal variables in the controller by a bar (¯ x, σ ¯ ). We can formulate the following model predictive control problem; Problem 1 Find min J(x(t), σ¯ ; Tc , Tp ),
5. MODEL PREDICTIVE CONTROL Model predictive control problem can be formulated as solving on-line a finite horizon open-loop optimal control problem subject to system dynamics and constraints involving states and controls Allgower et al. [2002]. The basic idea of MPC is based on measurements obtained at time t, the controller predicts the future dynamic behavior of the system over a prediction horizon Tp and computes an open-loop optimal control problem with control horizon Tc , to predict the future input for the system, see Fig.3.
future prediction
past
set-point i predicted state x
closed-loop state x
σ ¯
with J(x(t), σ¯ ; Tp , Tc ) := c x(t + Tp ) subject to: x ¯(k + h + 1/k) = Aσ¯ (k+h/k) x ¯(k + h/k) σ ¯ (k + h/k) ∈ D x¯(k + h/k) ∈ X For numerical solution, we use the next algorithm; MPC Algorithm (1) Compute the open-loop optimal control for a receding horizon time Tp (2) Apply only the first input of the optimal command sequence to the system (3) The remaining optimal inputs are disregarded (4) Collect the new measurement from the system (5) Continue with point (1) until the final time is reached
open loop input u
closed-loop closed loop state u
t
t+
t + Tc
t + Tp
control horizon Tc prediction horizon Tp
Fig. 3. Model Predictive Control Due to disturbances, measurement noise and model-plant mismatch, the true system behavior is different from the
In this algorithm, Tp has to be chosen in advance. It is important to mention that the shorter the horizon, the less costly the solution of the on-line optimization problem. The method to solve the open-loop optimal problem using (14) has an exponential growth. Then, it is desirable use short horizons MPC schemes for computational reasons. In general it is not true that a repeated minimization over a finite horizon objective in a receding horizon manner leads to an optimal solution for the the infinite horizon problem Bitmead et al. [1990]. In fact, both solutions differ significantly if a short horizon is chosen.
14860
Preprints of the 18th IFAC World Congress Milano (Italy) August 28 - September 2, 2011
Total viral load
5.2 MPC Treatment Scheduling MPC algorithms typically achieve superior performance with respect to other control strategies when manipulated and controlled variables have constraints to meet. A thorough overview of the MPC history can be found in Morari et al. [1999]. However, for this particular application the system is not stabilizable. The MPC objective is to delay as far as possible the escape of the system. In mathematical terms, the problem could be thought as a minimization of the worst eigenvalue of Aσ under the switching σ. However, this would only be expected to be optimal over an infinite time horizon. In general the finite set of possible control values causes problems for many control design techniques, but for this problem helps MPC by making the optimization easier to solve. 6. SIMULATION STUDIES We constrain the control to take the values of 1, 2 or 3, this represent the possible treatments. It is important to mention that patient can not be under two or more treatments at the same moment. To be consistent with application in a clinical setting, we simulate measurements and new clinical decisions being made every 3 months. Thus the decision time τ is fixed to 90 days.
5
10
0
10
Infected cells
0
0.5
1
1.5
2
2.5
3
3.5
4
5
10
Ti
0
10
L
0
0.5
1
1.5
2
2.5
3
3.5
4
0.5
1
1.5
2 Time (years)
2.5
3
3.5
4
4
σ(k)
3 2 1 0 0
Fig. 4. Simulation of switch virologic failure treatment strategy 6.2 Implementation of MPC Algorithms Using 10 months as prediction horizon for the MPC, we present in Fig. 5 the closed-loop response of T ∗ , L, V and the switched drug therapy over a period of 14 years. The MPC algorithm accomplishes the main goal of containing the viral load as long as possible.
Total viral load
6.1 Switch on Virologic Failure For comparison purposes, we introduce a “switched on virologic failure” strategy. This is based on the guidelines for the use of antiretrovirals agents in HIV-1 infected adults presented by the Department of Health [2009]. Virologic failure means virologic rebound after complete suppression. There is no clear consensus on the optimal time to change therapy for virologic failure. The most accepted approach allows detectable viremia up to an arbitrary level (e.g., 1000-5000 copies/ml). For simulation purpose, we will consider as a virologic failure when viral load is > 1000 copies/ml. This is because, the ongoing viral replication in the presence of antiretroviral drugs promotes the selection of drug resistance mutations and may limit future treatment options. Using the virologic failure Treatment, see in Fig.4, there are three changes in therapy in a period of 4 years. The first therapy keeps the viral concentration below 1000 copies/ml for 1 year, however resistant genotypes appear. Then the second treatment is introduced, the viral population is decreased for a while, but because there is a high concentration of infected cells already, virologic failure occurs in a shorter period of time. It is then necessary to introduce the third treatment. However, we can observe how the viral load starts to escape after 4 years in this model. An important fact is that latently infected cells remain almost constant, this fits with clinical and theoretical studies which agree that these cells played a very important role for late stage HIV infection.
0
10
−5
10
0
2
4
6
8
10
12
14
5
Infected cells
10
0
10
Ti L
−5
10
0
2
4
6
2
4
6
8
10
12
14
8
10
12
14
4
σ
3 2 1 0 0
Time (years)
Fig. 5. Closed loop treatment using MPC In fact, for this example the virologic failure would occur after 12 years, this means that we can extend the virologic failure for 8 years more compared to the current clinical assessment. In addition the total population of infected CD4+T cells is decreased for a period of 6 years. These results are consistent with the preliminary clinical SWATCH trial Martinez et al. [2003]. This trial concluded that proactive alternation of antiretroviral regimens might extend the long-term effectiveness of treatment options without adversely affecting patients’ adherence or quality of life.
14861
Preprints of the 18th IFAC World Congress Milano (Italy) August 28 - September 2, 2011
We may notice in Fig. 5 how the therapy switching is irregular for the first 4 years, but it remains quite regular for the remaining years. This means that perhaps using an oscillating treatment could be a good strategy to mitigate the viral replication when it is not possible to measure the complete HIV state. In fact, oscillating strategy is very similar to the trial SWATCH proposed by Martinez et al. [2003]. We notice in Table 2 that the oscillating drug regimen gives very close results to the MPC and optimal control. Based on simulation results and clinical trials we could infer that this approach would minimize emergence of drug-resistant virus better than frequent monitoring for viral rebound. For comparison purpose, we test one step ahead MPC strategy, which gives very low computational time and has good results with respect to the other strategies. Table 2. Simulation results for a 5 years period treatment Strategy Virologic Failure MPC Optimal Control Oscillating One step ahead
Viral Load 5.14e28 2.31e-9 2.20e-9 6.22e-9 6.74e-8
Computational Time 1.2 sec 4.4 sec 32 min 1.5 sec 1.9 sec
6.3 Computational Resources From these results, we conclude that alternating between drugs regimens is important to suppress HIV RNA levels maximally and prevent further selection of resistant mutations. However, to find the optimal trajectories can require high computational resources. For example, using a “brute force” approach which analyzes all possible combinations for therapy 1, 2 and 3 with decision time τ = td for a period T of T , there are 3 td possible combinations. Considering a period treatment of 5 years and decision time of 3 months, we can compare in Table 2 the computational time for different strategies. For instance, to compute the open-loop optimal switching trajectory requires 32 min simulation time, while on the other hand the MPC just require 4.4 sec; the difference on the viral load between these techniques is very small. Then we can infer that relaxing the condition of optimality, MPC may reduce dramatically simulation resources. 7. CONCLUSIONS In this paper, a 64 variant with 3 drug combination model is presented to analyze different drug regimens to mitigate HIV escape. Simulation results show the potential of proactive switching and alternation of antiretroviral regimens with drugs to extend the overall long-term effectiveness of treatment options. These results are consistent with preliminary clinical trials. At the expense of some conservatism in the cost function, MPC is a very appropriate framework for this problem, because it provides close results to the open-loop optimal control problem, extends the virologic failure time and reduces computational resources.
REFERENCES Panel of Antiretroviral Guidelines for Adults and Adolescents. Guidelines for the use of Antiretroviral agents in HIV-1 infected adults and adolescents. Department of Health and Human Services, December 1, 2009 J. Martinez-Cajas and M.A. Wainberg. Antiretroviral Therapy: Optimal Sequencing of Therapy to Avoid Resistance, Drugs, 139:81-89, 2003 R.M. D’Amato, R.T. D’Aquila and L.M. Wein. Management of antiretroviral therapy for HIV infection: modelling when to change therapy. Antivir Ther. 3:147-158, 1998 J. Martinez-Picado, E. Negredo and et al. Alternation of Antiretroviral Drug Regimens for HIV infection. Annals of Internal Medicine, 68(1):43-72, 2003 R. Zurakowski and D. Wodarz. Treatment interruptions to decrease risk of resistance emerging during therapy switching in HIV treatment. CDC, New Orleans, USA, 2007 E.A. Hernandez-Vargas, R.H. Middleton, P. Colaneri, and F. Blanchini. Discrete-time control for switched positive systems with application to mitigating viral escape. International Journal of Robust and Nonlinear Control, 2010 D.E. Kirk. Optimal Control theory, Dover Publications, New York, 1998. E.A. Hernandez-Vargas, R.H. Middleton, P. Colaneri, and F. Blanchini, Dyanmic Optimization Algorithm to Mitigate HIV escape. CDC, Atlanta, USA, 2010 R. Zurakowski and A.R. Teel. A Model Predictive Control based scheduling method for HIV therapy. Journal of Theoretical Biology, 238:368-382, 2006 G. Pannocchia, M. Laurino and A. Landi. A Model Predictive Control Strategy Toward Optimal Structured Treatment Interruptions in Anti-HIV Therapy. IEEE Transactions on Biomedical Engineering, 57(5):10401050, 2010 J. Mellors, A. Munoz, J. Giorgi, et al. Plasma viral load and CD4+ lymphocytes as prognostic markers of HIV-1 infection. Ann Intern Med, 126(12):946-954, 1997 R. Findeisen and F. Allgower. An introduction to Nonlinear Model Predictive Control. 21st Benelux Meeting on Systems and Control, Veldhoven, 2002 R. R. Bitmead, M. Gevers and V. Wertz. Adaptive Optimal Control. The thinking Man’s GPC, Prentice Hall, New york, 1990 M. Morari and J.H. Lee. Model Predictive Control: past, present and future. Computers & Chemical Engineering, 23:667-682, 1999 M. Hadjiandreou, R. Conejeros and V. Vassiliadis. Towards a Long-Term Model Construction for the Dynamic Simulation of HIV Infection. Mathematical Bioscience and Engineering, 4:489-504, 2007.
14862