Journal of Pharmacokinetics and Biopharmaceutics, Vol. 24, No. 3, 1996
A Comparison of Six Deconvolution Techniques

Francis N. Madden,1 Keith R. Godfrey,1,3 Michael J. Chappell,1 Roman Hovorka,2 and Ronald A. Bates1

Received November 9, 1995---Final April 26, 1996

We present results for the comparison of six deconvolution techniques. The methods we consider are based on Fourier transforms, system identification, constrained optimization, the use of cubic spline basis functions, maximum entropy, and a genetic algorithm. We compare the performance of these techniques by applying them to simulated noisy data, in order to extract an input function when the unit impulse response is known. The simulated data are generated by convolving the known impulse response with each of five different input functions, and then adding noise of constant coefficient of variation. Each algorithm was tested on 500 data sets, and we define error measures in order to compare the performance of the different methods.
KEY WORDS: constrained optimization; cubic splines; deconvolution; Fourier transforms; genetic algorithms; maximum entropy; system identification.
INTRODUCTION

In pharmacokinetics, the convolution integral is used to model the response of the body to an input of a particular drug. The integral is defined as
    C(t) = ∫₀ᵗ U(t − τ)I(τ) dτ    (1)
where C is the measured concentration, U is the unit impulse response, and I is the input.

The work described in this paper was carried out as part of Grant GR/J67130 "Identifiability and Indistinguishability of Nonlinear Dynamic Systems" from the U.K. Engineering and Physical Sciences Research Council.
1 Department of Engineering, University of Warwick, Coventry CV4 7AL, United Kingdom.
2 Centre for Measurement and Information in Medicine, Department of Systems Science, City University, London EC1V 0HB, United Kingdom.
3 To whom correspondence should be addressed.

In this context, deconvolution is the process of calculating the input from measurements of concentration and unit impulse response,
or calculating the unit impulse response from concentration profiles and the input. For example, deconvolution is used in pharmacokinetics to determine drug release or absorption, which can be useful in designing drug delivery systems or as a tool for diagnosis. A number of appropriate techniques have been developed for numerical deconvolution, including the point area method, the use of polyexponential functions, basis function fitting, maximum entropy, etc. (1-14). In this paper we compare the results produced by some of these techniques when they are applied to calculate input functions from simulated concentration profiles and an assumed unit impulse response. The techniques we consider are Fourier transform based deconvolution (15); system identification (11); a constrained deconvolution technique (14); a method using spline basis functions (12); maximum entropy (1,13,16); and, finally, a stochastic mutation and selection algorithm that was written for this study and which we describe below. The descriptions of some of these techniques contain limited comparisons with other methods that we have not considered. An indication of the relative performance of these other methods can be gained by considering our results in conjunction with those of previous authors. For example, Vajda et al. (11) produced some results comparing the system identification method with the methods of Cutler (3,4) and Veng Pedersen (5-8), while Hovorka et al. (14) contains similar comparisons with the methods implemented by two computer programs, SIPHAR (a point area method) (9) and PCDCON (10), a commercially available package. In both of these cases the authors conclude that the techniques which we consider perform better than the ones we omit. The nature of the pharmacokinetic problem imposes constraints on the measured concentration profiles. These profiles are typically a small number of noisy data points (~20), unequally spaced in time.
In our simulations we use an extension of the sampling schedule used by Cutler (3,4), and we also define the unit impulse response and five input functions which we aim to recover using the deconvolution. The input functions are those suggested by Cutler together with three others which represent different forms of input of practical interest: a simple unimodal input, a more complex bimodal input, and a simulated sustained release. The aim of this study is to evaluate the performance of the different algorithms by using simulated noisy data which are generated from a variety of input functions. It is also intended that the results should be "blind" in the sense that neither user interference nor additional a priori information is applied to locally optimize the calculation of results. The results should therefore be an accurate reflection of the performance that should be achieved by a careful operator.
METHODS
This section provides a discussion of the methods that were compared. As most of the methods are described elsewhere, we confine ourselves to describing pertinent features of the application of each technique to our particular problem, e.g., interpolation of data, the treatment of end effects, selection of fitting parameters, etc. In the results we present below, all user-defined parameters have been fixed so that there is no local optimization of the deconvolution which might be performed with a priori information (apart from the assumed unit impulse response) or other user intervention. The results therefore represent what may be achieved in a real situation by a careful operator applying these techniques.
Method 1. Fourier Transform
The Fourier transform of the convolution integral is

    F(C) = F(I)F(U)    (2)

where F denotes the Fourier transform. After rearrangement, the deconvolution problem is simply

    I = F⁻¹(F(C)/F(U))    (3)
We implemented a version of this analytic result using standard Fast Fourier Transform routines as supplied in MATLAB (17). Examples and discussion of convolution and deconvolution using Fast Fourier Transforms (FFT) can be found in Press et al. (15). To apply the MATLAB routines, a cubic spline interpolation was used to map the concentration data to a fine grid which is linearly spaced in time. The grid size affects the frequency resolution of the FFT. The size was chosen by starting with 101 points, and then approximately doubling to, successively, 201, 401, and 801 points. Grid size 201 improved on 101, and 401 improved on 201, but 801 provided no noticeable improvement over 401, so a grid size of 401 points was chosen. The data for the known unit impulse response were calculated on the same grid and the transformations and deconvolution were then calculated. The end effects were treated by extending the data with zeros (15), and the error measures were calculated after discarding the first data value, i.e., t = 0, which is arbitrarily zero, from the MATLAB deconvolution routine.
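The procedure can be sketched in Python/NumPy rather than MATLAB (the grid size and zero-padding follow the text; the function name and the use of `rfft` are our choices, not the original implementation):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fft_deconvolve(t, c, impulse, n_grid=401):
    """Sketch of Method 1: spline-interpolate the data onto a fine
    uniform grid, zero-pad, and divide the spectra (Eqs. 2 and 3)."""
    tg = np.linspace(t[0], t[-1], n_grid)
    cg = CubicSpline(t, c)(tg)            # cubic spline interpolation
    ug = impulse(tg)                      # known unit impulse response
    dt = tg[1] - tg[0]
    n = 2 * n_grid                        # zero-padding for end effects
    C = np.fft.rfft(cg, n)
    U = np.fft.rfft(ug, n)
    # divide the spectra; /dt undoes the discrete-convolution scaling
    i_est = np.fft.irfft(C / U, n)[:n_grid] / dt
    return tg, i_est
```

With noise-free data this recovers a smooth input almost exactly away from the end points; in practice the division by F(U) amplifies high-frequency noise, which is why the later methods impose smoothness instead.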
Method 2. System Identification

This is the method of Vajda et al. (11), which uses Discrete Integral Least Squares (DILS) parameter estimation to model the convolution process with a linear differential equation. The method described by Vajda et al. (11) is limited to third-order differential equations, but the original code only provided estimates of the input function up to second order. Since third-order models were calculated, the authors extended the original code to determine estimates of the input function for these third-order models. The original deconvolution code was not modified. The appropriate model order for a given data set can be determined by considering the residuals of the parameter estimation. The concentration data are not interpolated, and special treatment of the data for end effects, as used in Method 1, is not required.
Method 3. CODE

This method, described in Hovorka et al. (14), produces an estimated input function which minimizes a given objective function. The objective function chosen is a combination of the χ² measure on the concentration data and a second measure which is defined on the grid of the estimated input function. The χ² measure is defined in the usual way

    χ² = Σᵢ (C(tᵢ) − Ĉ(tᵢ))²/σ²(tᵢ)    (4)

where Ĉ is the estimated concentration profile. The second measure provides a constraint whose effect is to reduce differences between values of the input function at adjacent grid points. For a more detailed description see ref. 14. The algorithm uses the predefined concentration data together with the known unit impulse response and produces a result based on a user-defined grid of time points. In the results provided here, the grid was defined as 81 points spaced linearly in time, which produced convergence of the algorithm in all the present calculations.
Method 4. Constrained Deconvolution Using Spline Functions

Verotta (12) suggested a method where the input function is parameterized as a polynomial spline. As a result, deconvolution becomes a process of minimization, where the sum of the squared differences between estimated and observed concentrations is minimized. The necessary computer code for performing this deconvolution, kindly supplied by D. Verotta, was for the SPLUS (18) analysis system. In our particular case, the essential user-determined parameter for the deconvolution process is the number of break points
for the basis splines. The number of break points is used to determine the parameterization of the input function. The appropriate number of break points depends on the number of data points in the concentration profile and the noise level; where these are fixed, it is reasonable to set empirically a fixed number of break points which produces the best results (on average) in a small data trial. For a particular environment, this is equivalent to calibrating a measuring device. This procedure has been adopted here, and the number of break points was set equal to 7. Verotta (12) suggested some possibilities for a criterion to determine the number of break points on an individual basis. These were also explored and the results are presented in the Results section.

Method 5. Maximum Entropy

The maximum entropy technique (1,13,16) is an iterative maximization process. Here the estimated input function with maximal entropy is sought, such that the χ² measure of the estimated and observed concentration profiles is equal to some value which is set by considering the noise statistics of the observed data. The entropy is taken to be

    S = −Σⱼ Îⱼ ln(Îⱼ/Σₖ Îₖ) / Σₖ Îₖ    (5)

where Î is the estimated input function. Equation (5) has a maximal value when each element of the estimated input is equal to the mean of the estimated input. This implies that solutions without peaks and troughs maximize the entropy, and so the method flattens the estimate unless there is strong observational evidence to the contrary. Our algorithm is written for use with MATLAB and it is based on the description given in Skilling and Bryan (16). The concentration data are not interpolated for this technique, and the functional form of the unit impulse response is assumed known. This assumption of the functional form is not necessary, but it does simplify the calculations. The input function is estimated at the sampling times of the concentration data, and it is assumed to be linear between sampling times in order to estimate the corresponding concentration profile via the convolution integral. There is a special treatment of end effects for calculations of some intermediate results, e.g., ∇χ², which is taken with respect to the values of the estimated input function. The treatment extends the data by appending a point at t > t_final whose value is equal to the last (actual) data point. The calculation of ∇χ² for the data point at t = t_final can then be verified using Eq. (8) from Skilling and Bryan (16).
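The entropy of Eq. (5) is cheap to evaluate; a small Python sketch (the function name is ours, and we compute the normalized form directly):

```python
import numpy as np

def entropy(i_est):
    """Normalised entropy of Eq. (5): maximal when all elements of
    the estimated input are equal (a flat, featureless estimate)."""
    p = i_est / i_est.sum()
    return -np.sum(p * np.log(p))
```

A flat estimate of length n attains the maximum ln(n); any peak or trough lowers S, which is exactly the flattening behaviour described above.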
In addition, the algorithm relies upon calculations involving the standard deviation of the concentration data. These values lie between 0 and 0.1 for the data we consider, and, in particular, values that are either 0 or very small are possible from our definition of the noisy simulated data. These very small values can adversely affect the calculations since the reciprocal of the squared value is used, and in an extreme case one or two data points may completely dominate the fitting process. Therefore, to remove this effect a minimum value of standard deviation was defined. If the defined minimum is chosen to be too large, the fit to the concentration profile is poor, and if the value is too small, the intermediate calculations can become dominated by a few data points and the algorithm will perform many iterations before convergence. As a compromise between speed of convergence and quality of fit to the concentration profile, the minimum standard deviation was fixed at 0.01 for all the processing, i.e., standard deviations smaller than 0.01 are taken to be 0.01. With this minimum permissible value, the standard deviations for the first and last data points of each concentration profile are limited, and an additional one or two points at the beginning or end of the concentration profile may be capped, depending on the input function.
Method 6. Genetic Algorithm

This method was specially developed for this study, so its description is more detailed than the previous discussions of techniques. Similar genetic and evolutionary techniques have been used in, for example, constrained optimization (19) and parameter estimation (20). Our method uses an algorithm to successively generate an estimate of the input function so that the corresponding concentration profile matches a given condition: in this case, the difference between the estimated and observed concentration values at each point is less than 1.76 times the standard deviation for each point. This condition was chosen to ensure that the observed data are fitted with equal weight at each point. It is distinct from using χ², which is summed over the whole data set and which allows small differences in one place to compensate for large differences in others. Since we used a normal distribution for simulating the measurement error, the choice of 1.76 gives greater than 90% confidence that the observed value is a noisy measurement of the estimated value. The basic algorithm generates a solution in the following way. An initial guess is defined as a straight line between two data values, one at t = 0 and the other at t = t_final. A number of mutations are generated from this initial guess by randomly perturbing the data values. Each mutation is then convolved with the unit impulse response to generate an estimated concentration
profile which in turn is compared with the observed data. The mutation with the smallest misfit is selected and then forms the initial guess for the next stage of mutations. The misfit measure for a particular estimate, Î, and its corresponding concentration profile, Ĉ, is

    M = max(T², maxᵢ[(Ĉ(tᵢ) − C(tᵢ))²/σ²(tᵢ)]) − T²    (6)
where T = 1.76 and σ(tᵢ) is the standard deviation of the datum at tᵢ. The criterion for a fit is M = 0. The processes of mutation and selection proceed until successive iterations do not reduce the misfit measure. When this happens, the number of lines defining the next guess is increased by splitting each line segment into two equal parts. The iterations are then continued. The whole process is repeated until either the number of iterations exceeds a user-defined maximum, or the fit criterion is satisfied. At each stage, the input function is estimated as a sequence of connected lines. The mutation process randomly changes the data values at the end points of these lines. The values are changed by adding a Gaussian random variable of given variance, R. The size of this variance affects the speed of convergence of the algorithm. If R is very small then the mutations differ from each other only very slightly and hence the rate of convergence is small. Conversely, if R is large the initial rate of convergence is rapid, but near the point of convergence the variations are correspondingly large and the algorithm may never converge. An effective scheme was developed which adjusts the value of R by considering the variance of the misfit parameter for each set of mutations. We call this variance V_M and consider its size compared with the current best value of the misfit parameter. If V_M is small we increase R, and if V_M is large we decrease R. In our particular problem, this procedure has proved to be an effective strategy which greatly improves the speed of convergence of the algorithm. Finally, a number of additional iterations are executed after the algorithm has converged in order to produce an ensemble average as the final result. Small perturbations of this average solution have little effect on the misfit measure of Eq. (6).
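A compact sketch of the mutation/selection loop follows. The initial guess, mutation count, fine grid, and the fixed halving of R on each segment split are our own simplifications; the adaptive V_M scheme and the final ensemble averaging described above are omitted.

```python
import numpy as np

def genetic_deconvolve(t_obs, c_obs, sigma, impulse, T=1.76,
                       n_mut=20, max_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    tf = np.linspace(t_obs[0], t_obs[-1], 401)
    dtf = tf[1] - tf[0]
    u = impulse(tf)

    def misfit(nodes_t, nodes_v):
        # piecewise-linear input, forward convolution, Eq. (6) misfit
        i_est = np.interp(tf, nodes_t, nodes_v)
        c_est = np.convolve(i_est, u)[:len(tf)] * dtf
        worst = np.max((np.interp(t_obs, tf, c_est) - c_obs) ** 2
                       / sigma ** 2)
        return max(0.0, worst - T ** 2)

    # initial guess: a straight line between the two end-point values
    nodes_t = np.array([t_obs[0], t_obs[-1]])
    nodes_v = np.array([c_obs[0], c_obs[-1]])
    best, R = misfit(nodes_t, nodes_v), 0.5
    for _ in range(max_iter):
        if best == 0.0:
            break                                  # fit criterion satisfied
        trials = nodes_v + rng.normal(0.0, R, (n_mut, len(nodes_v)))
        scores = [misfit(nodes_t, tv) for tv in trials]
        k = int(np.argmin(scores))
        if scores[k] < best:
            best, nodes_v = scores[k], trials[k]   # select best mutation
        else:
            # stalled: split each line segment into two equal parts
            new_t = np.linspace(t_obs[0], t_obs[-1], 2 * len(nodes_t) - 1)
            nodes_v = np.interp(new_t, nodes_t, nodes_v)
            nodes_t, R = new_t, R * 0.5
    return nodes_t, nodes_v, best
```

Because only improving mutations are accepted, the misfit is monotone nonincreasing; segment splitting adds degrees of freedom only when the current piecewise-linear shape has been exhausted.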
DATA SIMULATION

The results we present were calculated by applying each deconvolution method to 500 noisy data sets. These data sets were generated to simulate the sampling schedule and measurement noise of ordinary measured concentration profiles, i.e., unequal time spacing, constant coefficient of variation, and a limited number of data points. The unit impulse response was chosen to be the same as that used originally by Cutler (3), and then used in several
further deconvolution studies (4-7,11), namely

    U(t) = e⁻ᵗ + e⁻⁵ᵗ    (7)
Noise-free concentration profiles were generated by convolving U(t) with each of five defined input functions. These noise-free data sets were calculated at 20 unequally spaced time points between 0 and 4: t = 0.0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1.0, 1.2, 1.5, 1.8, 2.2, 2.7, 3.3, 4.0. This sampling schedule is an extension of that suggested by Cutler (3) and has also been used by previous authors (5,11,14). A further 100 noisy data sets were then calculated for each of the 5 noise-free profiles by adding a random variable to each datum. The random variable has a normal distribution with mean 0 and standard deviation directly proportional to the datum's value. This constant coefficient of variation is used as a simulation of the noise that can be found in concentration measurements. Two proportionality constants were used: 0.01, chosen to produce near noise-free data, and 0.15, chosen as an estimate of the noise levels which may be found in physical measurements. The value of 0.15 also strikes a balance between the maximum value of 0.10 used by Cutler (3) and the values of 0.20 used by Lacey et al. (21) and 0.25 used by Verotta (12). We present results for the value of 0.15 only, because all the algorithms produced similar and reliable results for the low noise data. It is only at the higher noise value that large discrepancies between the results are found. The five input functions that are used are defined in Table I and plotted in Fig. 1. The first two are those suggested in Cutler (3,4) and the remaining three were chosen to represent different types of input: extravascular administration of solid dosage, discontinuous oral absorption, and controlled release. We have also used circles in the graphs of Fig. 1 to indicate the sampling schedule of the data. It is the input functions represented by these points that we are trying to calculate.
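The simulation scheme above can be sketched as follows for input function 1 (the function name, seed handling, and the fine quadrature grid are our choices; the other four inputs would be handled identically):

```python
import numpy as np

def simulate_datasets(cv=0.15, n_sets=100, seed=0):
    """Noisy concentration profiles for input function 1 of Table I."""
    rng = np.random.default_rng(seed)
    # the 20-point sampling schedule of the text
    t = np.array([0.0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6,
                  0.7, 0.8, 1.0, 1.2, 1.5, 1.8, 2.2, 2.7, 3.3, 4.0])
    # fine grid for the noise-free convolution of Eq. (1)
    tf = np.linspace(0.0, 4.0, 4001)
    dtf = tf[1] - tf[0]
    u = np.exp(-tf) + np.exp(-5 * tf)     # Eq. (7)
    i1 = 1.2 * np.exp(-2 * tf)            # input function 1
    c = np.interp(t, tf, np.convolve(i1, u)[:len(tf)] * dtf)
    # constant coefficient of variation: sd proportional to the datum
    noisy = c + rng.normal(0.0, cv * np.abs(c), (n_sets, len(t)))
    return t, c, noisy
```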
Some algorithms make explicit use of the standard deviation at each concentration data point. This deviation may be very small or 0, since we have defined a constant coefficient of variation. Therefore, a minimum value of standard deviation is defined where necessary in the appropriate algorithms.

ERROR MEASURES
The primary error measure we use is the sum of squared differences between the defined and estimated input functions. Most of the deconvolution techniques provide a solution defined on some user specified grid of time points. This grid is not common to each technique, typically for reasons of computational expense and grid dependency of a particular algorithm.
Fig. 1. Graphs of the five input functions with points of the sampling schedule indicated by circles.
Therefore, a discretization and interpolation scheme between these different grids may introduce artificial errors into the error measures themselves. However, each technique can provide estimates of the input function at the time points defined in the original concentration data, and so the most straightforward scheme is to use these points as the basis for the first error measure, μ₁, given by

    μ₁ = ||I − Î||²    (8)

This is calculated for each estimated input and then averaged for each type of input function to produce μ̄₁. Therefore each algorithm generates 500 values of μ₁ and 5 instances of μ̄₁. We also use the error at each time point, which is defined as

    μ₂(tᵢ) = I(tᵢ) − Î(tᵢ),   i = 1, 2, . . . , 20    (9)
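Both measures are immediate to compute from the estimate at the sampling times (the function name is ours):

```python
import numpy as np

def error_measures(i_true, i_est):
    """mu1 of Eq. (8) and the signed pointwise errors mu2 of Eq. (9)."""
    mu2 = i_true - i_est                 # error at each time point
    mu1 = float(np.sum(mu2 ** 2))        # squared 2-norm of the error
    return mu1, mu2
```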
Table I. Defined Input Functions

Input function    Definition
1                 1.2 exp(−2t),  t ≥ 0
2                 1.8 ---(-t/-~ (1 1.15 \ 1.15],  0 ≤ t
3                 15t exp(-9t LT) + t3 exp(---~-~),  t > 0
4                 1 1- 1- 2,  0 ≤ t