MODEL PARAMETER ESTIMATION USING SIMULATED ANNEALING

Olaf Hellwich
Chair for Photogrammetry and Remote Sensing, Technische Universität München, D-80290 Munich, Germany
E-mail: [email protected]
URL: http://www.photo.verm.tu-muenchen.de

Commission III, Working Group III/3
KEY WORDS: Model Parameter Estimation, Image Analysis, Markov Random Fields
ABSTRACT

In image analysis and image processing most methods require the selection of model parameters such as thresholds. As this task can be rather complex, it is often conducted by a human operator. Here a generally applicable automatic estimation scheme for model parameters is proposed. The parameter space is explored to find a global optimum by simulated annealing, controlled by a divergence measure resulting from a comparison of processing results and a training data set, i.e. "ground truth". In comparison with an interactive selection of parameter values, the automatic estimation has advantages when parameters are to be determined from many data sets, or in a parameter space which is not intuitively accessible. Furthermore, interdependencies between parameters can be investigated, e.g. to avoid overparameterization. The parameter estimation method can be interpreted as a means of learning in order to adapt the analysis method to different data sets. The estimation method is applied to two Markov random field models for line extraction. Using examples, its convergence properties are demonstrated.
1 INTRODUCTION

Given is the following scenario in image analysis/processing: an image is to be processed with a particular method, for instance for line extraction. Certainly the processing results should be as good as possible. The result of the method depends on a model parameter vector x, e.g. thresholds. Therefore, x has to be set to optimum parameter values. Frequently, the optimum values are found by trial and error, which may be acceptable as long as the dimension of x is low and the influence of the parameters is intuitively estimable.
Yet, it is not difficult to imagine a case where an image analysis method depends on several model parameters, or where there is a large amount of data, such that the selection of optimum parameter values might be a rather complex procedure. In those cases a more systematic approach than trial and error would be very helpful. Here a method using simulated annealing is proposed. It was inspired by methods for continuous minimization using simulated annealing (Vanderbilt and Louie, 1984, Lakshmanan and Derin, 1989, Press et al., 1992). It requires a training data set which should represent an optimum output the image analysis method is supposed to give, i.e. ground truth. At the beginning of the estimation the model parameters are set to initial values. These values should be reasonably correct, but the selection can be rather arbitrary and does not require special care. Then the method is applied to the given data. The output is compared with the training data. The comparison results in a divergence measure E. It should be low when output and training data agree and high when they do not. Such a divergence measure could for instance be the number of misclassified pixels, e.g. in our example line pixels labeled as no-line pixels and no-line pixels labeled as line pixels. Also evaluation measures like "correctness" or "quality" as defined e.g. in (Heipke et al., 1997) for road extraction can be adapted to this task. Then the model parameters are optimized by simulated annealing in an iterative procedure. The algorithm theoretically leads to the selection of those model parameter values for which the divergence measure is at its global minimum. In practice there is a trade-off with efficiency, and thus this goal is only approximately achieved. In each iteration the model parameter vector is changed slightly. The method is applied to the data again, and its output is compared with the training data, resulting in a new divergence measure. Then the Metropolis algorithm (Metropolis et al., 1953, Kirkpatrick et al., 1983) is employed: if the new divergence measure is lower than the preceding one, the changed parameter vector is accepted. Otherwise, it is only accepted with a probability lower than 1.

(This research was partially funded by Deutsches Zentrum für Luft- und Raumfahrt DLR e.V. under contract 50EE9423.)
According to simulated annealing, at the beginning the probability of accepting parameters which result in higher divergence measures is high, such that local divergence minima reached during the estimation can be left again. At the end of the procedure the probability of accepting higher divergence values is low, such that a global minimum is reached and not left any more.
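The iterative scheme just described can be sketched in a few lines of Python. Here `run_method` stands in for the image analysis method, and the divergence measure is the misclassified-pixel count mentioned above; the function names, the simple uniform perturbation, and the default settings are illustrative assumptions, not the paper's implementation:

```python
import math
import random

def misclassified(output, truth):
    """Divergence measure E: number of labels that differ between
    the method's output and the training data ("ground truth")."""
    return sum(o != t for o, t in zip(output, truth))

def estimate_parameters(run_method, truth, x0, T0=1.0, n_iter=200,
                        step=0.1, seed=0):
    """Perturb the parameter vector, re-run the method, and accept or
    reject the change with the Metropolis rule under a cooling schedule."""
    rng = random.Random(seed)
    x = list(x0)
    E = misclassified(run_method(x), truth)
    for i in range(2, n_iter + 2):
        T = T0 / math.log(i)                      # temperature decreases
        x_new = [v + rng.uniform(-step, step) for v in x]
        E_new = misclassified(run_method(x_new), truth)
        # better parameters are always accepted, worse ones only
        # with a probability lower than 1
        if E_new < E or rng.random() < math.exp(-(E_new - E) / T):
            x, E = x_new, E_new
    return x, E
```

With a representative training data set this loop approaches, though in practice only approximately, the parameter values that minimize E.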
2 SIMULATED ANNEALING

The term "simulated annealing" stems from the analogy to annealing, the controlled heating and cooling of metals. The latter has the goal of transforming a metal to a state where the atoms are arranged in the regular grid of a crystal, equivalent to the state of minimum energy. First, the metal is heated, eventually reaching the liquid state where the atoms move freely. Then, it is cooled very slowly until the state of minimum energy is reached. The metal then has certain mechanical properties, like an enhanced stiffness.

Simulated annealing is a stochastic optimization method. Its purpose is the computation of the value x of a variable X where a function E(X) has a global minimum. Generally, X is a vector, i.e. E is multidimensional. An initial value x_0 of X is randomly changed over several, often many, iterations. The change of the variable from x_{i-1} to x_i in iteration i can result in an increase or decrease of the function value E(X). It is essential for simulated annealing that E is transformed using a temperature parameter T such that the transformed function is flat at the beginning and steeply modulated at the end of the iterations. Such a transformation is frequently conducted according to the Boltzmann probability distribution (Press et al., 1992)

    p_i(x_i) ∝ exp( -E(x_i) / (kT) )    (1)

where E is called energy and its transform p probability; k is Boltzmann's constant. The temperature parameter T is decreased every n iterations. Using (1), the inverse proportionality has to be taken into account, transforming minima to maxima and increases to decreases. Changes of X are conducted according to the principle that they are the more probable the larger the increase of p_i(x_i) is in comparison to p_{i-1}(x_{i-1}). The transformation causes largely arbitrary changes of X at the beginning of the iterations, as the function values p_i(X) have hardly any differences for any values x_i. In the extreme case, with T → ∞, p_i(X) is constant. At this point in time local maxima of p_i, corresponding to local minima of E, can be easily left and minima can be overcome. At the end of the iterations, changes of X are predominant where p_i(x_i) increases. Owing to the larger differences between probabilities, changes of X are assessed more rigorously by strongly preferring decreasing function values E(x_i). In the extreme case, with T → 0, p_i(X) is an equal distribution on the global minima of E, i.e. positively constant on global minima of E and 0 otherwise.

Simulated annealing is conducted according to the following algorithm:

1. Selection of initial values x_0 and T
2. Computation of p_0(x_0)
3. Iteration i:
   (a) Proposal of a randomly changed value x̃_i
   (b) Computation of p_i(x̃_i)
   (c) Acceptance or rejection of x̃_i based on p_i(x̃_i) and p_{i-1}(x_{i-1})
   (d) If a given criterion is met, stop.
   (e) Every n iterations, decrease T according to a cooling schedule.
   (f) Continue with the next iteration.

In the acceptance step, for example, the Metropolis algorithm (Metropolis et al., 1953) can be applied. The stopping criterion could be the computation of a certain number of iterations, or the lack of significant changes in p_i during a certain number of iterations. Whether and after how many iterations the global minimum of E is found depends on the cooling schedule. It has been proven, e.g. (Winkler, 1995, pp. 90 f.), that a cooling schedule exists with which the global minimum is reached. It requires a very slow decrease of T and a very large number of iterations, which is why simulated annealing can take an agonizingly long time. Therefore, so-called fast cooling schedules are applied in practice which only approximate the global minimum. Such a cooling schedule is for instance the logarithmic cooling schedule

    T = C / ln(i)    (2)

where C is a cooling constant.

2.1 Continuous Minimization

Simulated annealing is applied to various optimization tasks such as the travelling salesman problem (Press et al., 1992) or the maximum a posteriori (MAP) estimation in Markov random field models (Geman and Geman, 1984, Winkler, 1995, Hellwich, 1997). These tasks are combinatorial problems where functions of discrete variables are involved. The application of simulated annealing proposed here belongs to the alternative class of minimization of functions of continuous variables (Press et al., 1992). Its main problem is the proposal of "meaningful" random changes of X between two iterations. Several approaches have been proposed, e.g. (Vanderbilt and Louie, 1984, Lakshmanan and Derin, 1989, Press et al., 1992). In this work two of them, (Press et al., 1992) and (Lakshmanan and Derin, 1989), are used. (Press et al., 1992) is based on a simplex method related to (Nelder and Mead, 1965). The simplex, i.e. a geometrical figure in an n-dimensional space consisting of n+1 vertices and their connections and faces (a triangle in 2 dimensions, a tetrahedron in 3 dimensions), is used to determine the change of X. The method gains its efficiency from changing all elements of a vectorial variable at the same time. This is different in the approach of (Lakshmanan and Derin, 1989), where only one element of X is changed between two epochs. As the simpler, latter approach clarifies the principle sufficiently, it is described in greater detail.

According to (Lakshmanan and Derin, 1989), each iteration consists of n epochs treating the n elements of the model parameter vector X in random order. In an epoch a new value x̃_i^j is proposed for an element X^j of X, where i is the epoch index continuously increasing over the iterations and j is the element index. It is equally distributed in a small interval around the previous value x_{i-1}^j. This means that x̃_i^j is selected from the interval

    [ x_{i-1}^j - δ_j/2 , x_{i-1}^j + δ_j/2 ]

where δ_j is the size of the interval for element j. With the model parameter vector X = x̃_i changed in one element, the method is executed on the data. Then, the divergence E(x̃_i) between its results and the training data is computed.
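The logarithmic cooling schedule (2) and the element-wise proposal step can be written out as follows; this is a sketch, and the interval sizes delta[j] are values the user has to supply:

```python
import math
import random

def log_cooling(C, i):
    """Logarithmic cooling schedule (2): T = C / ln(i), valid for i >= 2."""
    return C / math.log(i)

def propose_element(x, j, delta):
    """Propose a parameter vector differing from x only in element j,
    drawn uniformly from an interval of size delta[j] centred on the
    previous value (Lakshmanan and Derin, 1989)."""
    x_new = list(x)
    x_new[j] = x[j] + random.uniform(-delta[j] / 2.0, delta[j] / 2.0)
    return x_new
```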
The Metropolis algorithm is used to decide about acceptance or rejection of the parameter change: a value α is computed from

    α = exp{ -E(x̃_i)/T } / exp{ -E(x_{i-1})/T } = exp{ -ΔE/T }    (3)

where ΔE is the difference of the divergence measures of the present and the previous epoch, and T is computed according to the cooling schedule. If the divergence E(x̃_i) of the changed model parameter vector is smaller than the divergence E(x_{i-1}) of the previous epoch, the parameter change is accepted. If the change resulted in an increase of the divergence, it is only accepted with a probability α. The decision results in the model parameter vector of epoch i:

    x_i = x̃_i                          if α ≥ 1
    x_i = x̃_i  with probability α      if α < 1
    x_i = x_{i-1}                       otherwise    (4)

The decision in case α < 1 is conducted with the help of a random number r equally distributed in the interval [0, 1]. If r ≤ α, the changed model parameter vector is accepted; otherwise the previous one is used. In the following epoch the next element of the model parameter vector is treated.
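The decision rule of equations (3) and (4) translates directly into code; in this sketch the injectable `rng` argument is only there to make the decision testable:

```python
import math
import random

def metropolis_accept(E_new, E_old, T, rng=random.random):
    """Metropolis decision, eqs. (3) and (4): a decrease of the
    divergence is always accepted; an increase is accepted only with
    probability alpha = exp(-(E_new - E_old) / T) < 1."""
    alpha = math.exp(-(E_new - E_old) / T)
    if alpha >= 1.0:           # divergence decreased or stayed equal
        return True
    return rng() <= alpha      # r <= alpha: accept the worse value
```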
2.2 Implementation

For the implementation of the proposed model parameter estimation scheme the following algorithm was used:

1. Selection of initial model parameters
2. Execution of the method on the data
3. Computation of divergence
4. Optimization of the model parameters by simulated annealing
   (a) Proposal of changed model parameters
   (b) Execution of the method on the data
   (c) Computation of divergence
   (d) Decision about acceptance or rejection of the model parameter change
   (e) If the stopping criterion is not fulfilled, go to 4a.

3 RESULTS

The model parameter estimation scheme has been applied to two Markov random field (MRF) models for line extraction (Hellwich, 1997). In both cases the MRF expresses prior knowledge about lines. The first model aims at the extraction of one pixel wide horizontal lines, demonstrating the characteristics of MRF models. To optimize its parameters, the simulated annealing method of (Press et al., 1992) has been used. For the second, more sophisticated model for the extraction of thin lines with arbitrary directions, the simulated annealing method of (Lakshmanan and Derin, 1989) has been applied. As object parameters, both methods determine line and no-line labels for the image pixels.

3.1 MRF Model for Horizontal Line Extraction

In the first MRF model, prior knowledge for more or less artificial cases is used. It states that there are primarily continuous, horizontal, one pixel wide lines. Consequently, it penalizes end points of horizontal lines and vertically neighbouring line pixels using energy parameters α and β, respectively. For the grey value data, a stationary normal distribution is assumed for both line pixels and no-line pixels. They are characterized by their mean values and standard deviations. The data evaluation is a pixel-wise classification using the normal distributions as likelihood functions.

The line extraction method has been applied to a simulated data set. Figure 1 a) shows the true object parameters: line pixels in black and no-line pixels in white. Figure 1 b) displays the simulated image data. The normal distributions used have parameters μ_L = 90 and σ_L = 10 for line pixels, and μ_N = 110 and σ_N = 10 for no-line pixels. Figure 1 c) contains the results of a maximum likelihood (ML) estimation based on the known normal distributions. In this case the prior knowledge in the form of the MRF model has not been applied. The number of misclassified pixels is 45. Figure 1 d) contains the results of a maximum a posteriori (MAP) estimation using the MRF model as a priori probability density function. This reduces the number of misclassified pixels to 15. The MAP estimation is based on a Gibbs sampler in combination with simulated annealing (Geman and Geman, 1984, Koch and Schmidt, 1994, Winkler, 1995). The Gibbs sampling has been conducted over 10000 iterations. In the MRF model the parameters α = 1 and β = 1 have been used; simulated annealing has been conducted with a cooling constant C = 0.7 of the logarithmic cooling schedule according to (2). These three parameters have been estimated using the model parameter estimation scheme.

Figure 1: MRF model for horizontal line extraction: a) "true" object parameters, b) observed histogram-equalized grey values, c) ML estimation, d) MAP estimation. The results are generated without model parameter estimation.
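The prior model of Section 3.1 can be sketched as an energy over a binary label image. The exact clique potentials of the original model are not given here, so the endpoint and neighbour counts below are an illustrative assumption:

```python
def prior_energy(labels, alpha, beta):
    """Sketch of a prior energy penalizing end points of horizontal
    lines (weight alpha) and vertically neighbouring line pixels
    (weight beta). labels is a 2-D list with 1 = line, 0 = no-line."""
    rows, cols = len(labels), len(labels[0])
    ends = vertical = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] == 1:
                left = c > 0 and labels[r][c - 1] == 1
                right = c < cols - 1 and labels[r][c + 1] == 1
                if not (left and right):      # horizontal end point
                    ends += 1
                if r < rows - 1 and labels[r + 1][c] == 1:
                    vertical += 1             # vertical line neighbours
    return alpha * ends + beta * vertical
```

Continuous, horizontal, one pixel wide lines minimize such an energy, which is the stated intent of the prior.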
For the implementation of the simulated annealing method (Press et al., 1992), the example given in (Vetterling et al., 1992) has been adapted. No algorithmic changes were required; the temperature parameter T_0 was set to 20 and the iteration parameter to 4. The number of misclassified pixels was used as divergence measure E. For the model parameter estimation the number of Gibbs sampling iterations was reduced to 100. Using the initial values α_0 = 1, β_0 = 1 and C_0 = 0.7, the optimized values α = 2.54, β = 1.00 and C = 0.74 were found. They give a result with 0 misclassified pixels, which means that they describe a global minimum. With the initial values α_0 = 10, β_0 = 10 and C_0 = 10, another global minimum is found at α = 12, β = 25, and C = 0.3. Applying spontaneous small changes to the optimized parameters, it was found that they are not particularly robust. This behaviour will be commented on in Section 4.
3.2 General MRF Model for Line Extraction

To a line extraction method for Synthetic Aperture Radar (SAR) data (Hellwich, 1997, Hellwich et al., 1996), the simulated annealing method proposed by (Lakshmanan and Derin, 1989) was applied. Here, an MRF model is used to express continuity and narrowness of lines. The SAR intensity data is evaluated using rotating templates which consist of a line zone and two side zones. The SAR intensity ratio is used to determine the line strength for the center pixel of a template. Three model parameters have to be set by the user: a factor β_L enforcing line continuity, a factor β_N enforcing line narrowness, and a threshold t_r balancing intensity ratios of line and no-line pixels.

For a simulated data set, the three model parameters have been estimated. Figure 2 a) shows the simulated data, and Figure 2 b) the "true" line and no-line pixels on which the simulation is based. Two test runs have been conducted: the first one started with a model parameter vector which favours no-line pixels (Fig. 2 c)), whereas the second one's initial parameters strongly support line pixels (Fig. 2 e)). In both cases the final results after simulated annealing are very similar (Figs. 2 d) and 2 f)). Table 1 shows the parameter values before and after simulated annealing. The differences between the final parameter values of both test runs are acceptable, as the percentages of misclassified pixels are approximately equal. This means that both results' divergence values are equivalently close to the global minimum of the divergence function. A careful interactive selection of parameter values has shown that the values found by simulated annealing are good representations of the parameter vector at the global minimum. The convergence of the parameter vectors during the first and the second test run is shown in Figure 3.

                              test run 1                test run 2
    parameter             initial      final        initial      final
    β_L                   0.00013      0.000205     0.0005       0.000276
    β_N                   0.015        0.00555      0.004        0.00390
    t_r                   0.2          0.299        0.5          0.288
    iterations            0            250          0            250
    misclassified
    pixels [%]            7.88         3.94         9.70         3.95

Table 1: Estimation of model parameters β_L, β_N, and t_r for a simulated SAR scene by simulated annealing
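The template evaluation described at the start of this section can be illustrated as follows. The exact ratio definition and the sense of the comparison with t_r are not specified above, so both are assumptions of this sketch:

```python
def line_strength(line_zone, side_zone_1, side_zone_2):
    """Illustrative sketch: line strength of the template's centre
    pixel as the ratio of the mean line-zone intensity to the mean
    intensity of the darker side zone (assumed definition)."""
    mean = lambda zone: sum(zone) / len(zone)
    m_line = mean(line_zone)
    m_side = min(mean(side_zone_1), mean(side_zone_2))
    return m_line / max(m_side, 1e-9)
```

A centre pixel would then be labelled a line pixel when its strength lies on the line side of the threshold t_r; the direction of that comparison depends on whether lines are darker or brighter than their surroundings.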
The model parameter estimation scheme has also been applied to real SAR data. Figures 4 and 5 show the magnitude (square root of the intensity) and coherence, respectively, of the evaluated TOPSAR data set. For the image section displayed in Figure 6 a), a training data set was generated by interactive digitization (Figure 6 b)). The training data was used in two model parameter estimations by simulated annealing. In the first case, the model parameters β_L, β_N, and t_r were estimated based on the intensity data. In the second case, intensity and coherence were combined using a Bayesian approach. In addition to the previous model parameters, two further model parameters were estimated for the evaluation of coherence: t_d balancing line versus no-line evidence from coherence, and γ_d weighting coherence with respect to intensity. Table 2 gives the estimated parameter values. A comparison of the values reveals that the parameters β_L, β_N and t_r remain essentially unchanged after the introduction of coherence to the evaluation process, which eases the handling of the line extraction method. Further investigations showed that the model parameters are sufficiently robust, as the results do not change when the model parameters are varied slightly. Figures 7 and 8 display the extracted lines for the complete data set using the estimated model parameters.

Figure 2: Model parameter estimation by simulated annealing for an MRF model for line extraction from SAR data: a) simulated SAR data, b) "true" line and no-line pixels, c) initial result of test run 1, d) final result of test run 1, e) initial result of test run 2, f) final result of test run 2. Estimated parameters: β_L, β_N, t_r

                          evaluation of      evaluation of
    parameter             intensity          intensity and coherence
    β_L                   0.000547           0.000411
    β_N                   0.000145           0.000139
    t_r                   0.6071             0.6264
    t_d                   -                  0.0153
    γ_d                   -                  0.0318
    iterations            185                82
    misclassified
    pixels [%]            15.63              15.13

Table 2: Estimation of model parameters β_L, β_N, t_r, t_d, and γ_d for a TOPSAR scene by simulated annealing
Figure 3: Model parameter estimation by simulated annealing: sequence of parameters β_L, β_N, and t_r during the first (continuous line) and the second test run (dashed line)
Figure 4: TOPSAR scene: histogram-equalized magnitude

Prior to the investigations described here, during the design phase of the MRF-based line extraction method, the model parameter estimation scheme was used to discover interdependencies between model parameters, which led to a reduction of the number of necessary parameters, thus avoiding overparameterization.
4 CONCLUSIONS

It has been shown that model parameters can be successfully estimated using simulated annealing when a training
Figure 5: TOPSAR scene: histogram-equalized coherence
data set and a divergence measure are available. The use of the optimized parameters leads to improved results if (1) the training data is representative for the image data to be evaluated, (2) the divergence measure describes the problem properly and robustly, and (3) the results depend sufficiently robustly on the parameters. The third condition is only fulfilled when the global minima of the divergence function are not located in narrow valleys or troughs. Therefore, it is recommended that the user of the proposed model parameter estimation scheme is sufficiently aware of the properties of the divergence function. It may also be possible to include curvature properties of an intermediate divergence function in a final divergence function such that global minima of the intermediate function in flat areas are preferred in comparison with global minima in narrow valleys.

Figure 6: TOPSAR scene: interactively digitized training data: a) subsection from Figure 4, b) training data

Figure 7: Line pixels extracted from TOPSAR magnitude

Figure 8: Line pixels extracted from TOPSAR magnitude and coherence

REFERENCES

Geman, D. and Geman, S., 1984. Stochastic Relaxation, Gibbs Distribution, and the Bayesian Restoration of Images. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-6(6), pp. 721–741.

Heipke, C., Mayer, H., Wiedemann, C. and Jamet, O., 1997. Evaluation of Automatic Road Extraction. In: International Archives of Photogrammetry and Remote Sensing, Vol. (32) 3-2W3, pp. 47–56.

Hellwich, O., 1997. Linienextraktion aus SAR-Daten mit einem Markoff-Zufallsfeld-Modell. Reihe C, Vol. 487, Deutsche Geodätische Kommission, München.

Hellwich, O., Mayer, H. and Winkler, G., 1996. Detection of Lines in Synthetic Aperture Radar (SAR) Scenes. In: International Archives of Photogrammetry and Remote Sensing, Vol. (31) B3, pp. 312–320.

Kirkpatrick, S., Gelatt, C. D. and Vecchi, M. P., 1983. Optimization by Simulated Annealing. Science 220, pp. 671–680.

Koch, K.-R. and Schmidt, M., 1994. Deterministische und stochastische Signale. Dümmler, Bonn.

Lakshmanan, S. and Derin, H., 1989. Simultaneous Parameter Estimation and Segmentation of Gibbs Random Fields Using Simulated Annealing. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(8), pp. 799–813.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N. and Teller, A. H., 1953. Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics 21(6), pp. 1087–1092.

Nelder, J. A. and Mead, R., 1965. A Simplex Method for Function Minimization. The Computer Journal 7, pp. 308–313.

Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P., 1992. Numerical Recipes in C: The Art of Scientific Computing. 2nd edn, Cambridge University Press.

Vanderbilt, D. and Louie, S. G., 1984. A Monte Carlo Simulated Annealing Approach to Optimization over Continuous Variables. Journal of Computational Physics 56, pp. 259–271.

Vetterling, W. T., Teukolsky, S. A., Press, W. H. and Flannery, B. P., 1992. Numerical Recipes Example Book (C). 2nd edn, Cambridge University Press.

Winkler, G., 1995. Image Analysis, Random Fields and Dynamic Monte Carlo Methods. Applications of Mathematics, Vol. 27, Springer-Verlag, Berlin.