A neural network for the Steiner minimal tree problem
Biol. Cybern. 70, 485-494 (1994)

© Springer-Verlag 1994

Jayadeva, Basabi Bhaumik
Department of Electrical Engineering, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110016, India
Received: 23 April 1993 / Accepted in revised form: 8 September 1993

Abstract. The problem of finding the shortest tree connecting a set of points is called the Steiner minimal tree problem and is nearly three centuries old. It has applications in transportation, computer networks, agriculture, telephony, building layout and very large scale integrated circuit (VLSI) design, among others, and is known to be NP-complete. We propose a neural network which self-organizes to find a minimal tree. Solutions found by the network compare favourably with the best known or optimal results on test problems from the literature. To the best of our knowledge, the proposed network is the first neural-based solution to the problem. We show that the neural network has a built-in mechanism to escape local minima.

Correspondence to: B. Bhaumik

1 Introduction

Given a set of points in the plane, a minimal spanning tree (MST) is a tree of the shortest possible length with the given points as vertices; e.g. for the vertices of a scalene triangle, the MST is composed of the two shorter sides. Such trees can be constructed in polynomial time (Kruskal 1956; Prim 1957). Figure 1a shows the MST for three points 'A', 'B' and 'C'. However, it is usually possible to construct trees with a shorter total length by incorporating additional points, for example, in triangle ABC, by connecting the vertices to an interior point 'D', as shown in Fig. 1b. The euclidean Steiner minimal tree problem (ESMTP), as it is called, is to find the shortest tree connecting N given points called site points {p_1, ..., p_N}, where the tree may contain vertices other than the site points. These extra vertices are called Steiner points and are denoted by {s_1, ..., s_K}, where K ≤ N - 2. The ESMTP has a long history of nearly three centuries, with the special case of N = 3 having caught the attention of many luminaries including Fermat, Cavalieri, Torricelli, and J. Steiner, after whom it is named. It has been shown (Garey et al. 1977) to be NP-complete, i.e. no algorithm is known that solves the problem optimally with a computing effort bounded by a polynomial in N.

The ESMTP has applications in transportation, computer networks, agriculture, telephony, building layout and the design of very large scale integrated circuits (VLSI) and printed circuit boards. Various extensions of the ESMTP, such as the weighted ESMTP, are also of considerable importance. Steiner minimal trees (SMTs) are hard to find because (a) we do not know the number of Steiner points a priori, and (b) for a given number of Steiner points, we have to determine the correct topology with respect to a general set of site points. Of course, one may try to solve the problem by enumerating minimal trees for each topology, starting with the number of Steiner points K = 1, and incrementing K by unity till K = (N - 2). Unfortunately, even for N = 6, the number of trees exceeds 5600 and increases extremely rapidly, being about 3 million for N = 8 (Hwang and Richards 1992).

An SMT with K = (N - 2) is called a full Steiner tree (FST). FSTs for small problems can be obtained relatively simply. The SMT of a set of points consists of the edge-disjoint union of FSTs on subsets of the point set. For exact methods, an efficient technique to find a pruned list of FSTs which includes (among others) the FSTs comprising the SMT was developed by Winter (1985) and later improved by Cockayne and Hewgill (1986) using decomposition methods. Problems with N ≤ 30 could be solved using this method, as long as the largest decomposed subset had 17 or fewer points. Recently, using techniques to prune the list further, Cockayne and Hewgill (1992) were able to solve about 80% of attempted 100-point problems in a reasonable time. Hwang et al. (1988) reported a new decomposition theorem which could enable larger problems to be solved.

Most heuristic methods for the ESMTP are based on the spanning tree (e.g. Chang 1972; Beasley 1992) or Delaunay triangulation (Smith 1981). Heuristics based on simulated annealing (Lundy 1985) and genetic algorithms (Hesser et al. 1989) have also been proposed. Surveys of the problem can be found in Gilbert and Pollak (1968), Winter (1987) and Hwang and Richards (1992); they indicate that exact methods (i.e. techniques to find the optimal SMT) and well-known heuristics are sequential.
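As a small illustration of the MST baseline mentioned above, the following Python sketch computes the length of a Euclidean minimal spanning tree with Prim's algorithm. The function name `mst_length` and the simple O(N^2) array-based formulation are illustrative choices and are not taken from the paper.

```python
import math

def mst_length(points):
    """Total edge length of a Euclidean minimal spanning tree (Prim's algorithm)."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known link from the growing tree to each point
    best[0] = 0.0          # start the tree at the first point
    total = 0.0
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        # Update the attachment cost of every remaining point.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total

# Right-angled triangle with unit legs: the MST uses the two shorter sides.
print(mst_length([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]))
```

Any Steiner tree for the same points can be compared against this value to see how much length the extra vertices save.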

Fig. 1. a Minimal spanning tree (MST) for three points 'A', 'B' and 'C'. b Using an extra vertex 'D', one can obtain a tree of shorter length in this case

In this paper, we present a neural-based solution to the SMT problem. Neural network approaches are attractive, because for many difficult problems, they may enable good solutions to be found quickly (Hopfield and Tank 1985, 1986; Durbin and Willshaw 1987) and can be implemented in parallel hardware or simulated on parallel computers. An attractive feature of our formulation is that no assumptions about the number of Steiner points or the topology of the tree need to be made a priori. The neural network self-organizes to find a minimal tree. The number of Steiner points and the topology are determined by the self-organization process. The network requires O(N) neurons with a fixed number of local interconnections per neuron. The memory requirements for simulating our method are very modest since only one tree configuration needs to be stored. To the best of our knowledge, this paper presents the first neural-based solution to the ESMTP.

The paper is organized as follows: Sect. 2 discusses the neural network for solving the ESMTP, Sect. 3 is devoted to a discussion of the experimental results, Sect. 4 discusses how the network is able to escape local minima, and Sect. 5 gives the conclusions.

2 A neural network for the Steiner tree problem

The ESMTP is specified by a set of site points $p_i$ (i = 1, 2, ..., N). Let the co-ordinates of site point $p_i$ be denoted by $(p_{i,1}, p_{i,2}, \ldots, p_{i,D})$, i.e. each site point is specified by D co-ordinates. In this paper, we discuss the case (unless otherwise specified) in which the site points lie in a plane, i.e. D = 2. Figure 2a-c shows Steiner trees for three examples. Figure 2d-f suggests that in each case, one can consider the Steiner tree as consisting of a piecewise linear, open curve C, with some vertices of C at the locations of the site points, and other site points connected by a straight line to the closest vertex of C. The difficulty is that we cannot determine a priori the number of Steiner points and the locations of the vertices of C (curve points) in the optimal solution. We use the ESMTP network to find a Steiner tree by essentially solving an optimization problem to determine the locations of the curve points. These correspond to the activities of the neurons in the network.

Consider such a curve with M vertices $(x_{i,1}, x_{i,2})$ (i = 1, 2, ..., M) and let each site point be connected to the nearest vertex of C (curve point). The network for solving the ESMTP thus consists of M neurons. The state of each neuron is an activity vector $x_i = (x_{i,1}, x_{i,2})^T$ (i = 1, 2, ..., M). The neural network for the ESMTP is associated with the following energy function:

$$ E = \sum_{i=1}^{M-1} \mathrm{dist}(x_{i+1}, x_i) + \sum_{i=1}^{N} \sum_{j=1}^{M} d_{ij}\, h(d_{ij}) \tag{1} $$

where

$$ \mathrm{dist}(z_i, z_j) = \left[ (z_{i,1} - z_{j,1})^2 + (z_{i,2} - z_{j,2})^2 \right]^{1/2} \tag{2} $$

$$ d_{ij} = \mathrm{dist}(p_i, x_j) \tag{3} $$

and

$$ h(d_{ij}) = \begin{cases} 1, & d_{ij} = \min_k d_{ik} \\ 0, & \text{otherwise} \end{cases} \tag{4} $$

The dynamics of the ESMTP network corresponds to the steepest descent of the energy function given by (1). The equations of motion of the neurons are given by:

$$ \dot{x}_{i,j} = -\frac{\partial E}{\partial x_{i,j}} = \frac{x_{i+1,j} - x_{i,j}}{\mathrm{dist}(x_{i+1}, x_i)} + \frac{x_{i-1,j} - x_{i,j}}{\mathrm{dist}(x_i, x_{i-1})} + \sum_{k=1}^{N} h(d_{ki}) \frac{p_{k,j} - x_{i,j}}{d_{ki}}, \quad i = 1, 2, \ldots, M; \; j = 1, 2 \tag{5} $$
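The energy (1) and one finite-difference step of the dynamics (5) can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the names `energy` and `descent_step`, the constant step size of 0.01, the starting point and the iteration count are all assumptions, and the hard assignment h of (4) is used directly. The usage lines take three site points at (0, 0), (1, 0) and (0, 1) (the configuration of Fig. 3) with a single curve point.

```python
import math

def dist(a, b):
    """Euclidean distance (2) between two planar points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def energy(curve, sites):
    """Energy (1): curve length plus each site's link to its nearest curve point."""
    length = sum(dist(curve[i + 1], curve[i]) for i in range(len(curve) - 1))
    links = sum(min(dist(p, x) for x in curve) for p in sites)
    return length + links

def descent_step(curve, sites, step):
    """One explicit step of (5), using the hard assignment h of (4)."""
    eps = 1e-12  # assumption: skip a term when two points coincide
    force = [[0.0, 0.0] for _ in curve]

    def pull(i, target):
        # Add the unit vector pointing from curve point i towards target.
        d = dist(curve[i], target)
        if d > eps:
            for j in range(2):
                force[i][j] += (target[j] - curve[i][j]) / d

    for i in range(len(curve)):
        if i + 1 < len(curve):
            pull(i, curve[i + 1])   # first term of (5)
        if i > 0:
            pull(i, curve[i - 1])   # second term of (5)
    for p in sites:
        nearest = min(range(len(curve)), key=lambda i: dist(p, curve[i]))
        pull(nearest, p)            # third term of (5), h = 1 for the nearest curve point
    return [(x[0] + step * f[0], x[1] + step * f[1]) for x, f in zip(curve, force)]

sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
curve = [(0.6, 0.3)]                # assumed starting position of the single curve point
for _ in range(3000):
    curve = descent_step(curve, sites, 0.01)
print(curve[0], energy(curve, sites))  # the point approaches (0.2113, 0.2113)
```

With a single curve point the first two terms of (5) vanish, so the sketch reduces to descending the sum of the three link lengths.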

In (5), the derivative of $h(d_{ij})$ is taken in the sense of the generalized gradient. Equation (1) represents the sum of the length of C and the lengths of the links from the site points to the curve points nearest to them, as illustrated in Fig. 2d-f. Since the network finds a minimum of the energy function (1), we obtain a corresponding Steiner tree from the activities $x_i$ of the neurons.

Take an example with three site points, 'A', 'B', and 'C', located at (0, 0), (1, 0), and (0, 1), respectively, i.e. N = 3. The number of Steiner points can be at most (N - 2), i.e. one. We assume that the curve C in this example has only one vertex named 'D', i.e. M = 1. The energy function in this case is therefore a function only of the co-ordinates $(x_{D,1}, x_{D,2})$ of 'D' and can therefore be easily visualized. Note that the first term of E in (1) is zero because the curve has zero length, and the tree length is solely the sum of the lengths of the links from the site points to 'D'. Further, $h(d_{ij}) = 1$ (i = 1, 2, 3; j = 1). Figure 3 shows a plot of (-E). The point at which E is minimum (and, correspondingly, -E is maximum) is the location of the Steiner point. The maximum of -E is shown marked in the figure as M1.

Fig. 2a-c. Steiner trees for some point sets. Each of these trees can be thought of (d-f) as being composed of a piecewise linear curve (solid) with links from site points (dashed lines). Some vertices of the curve lie at the locations of the site points. All other site points are connected by straight lines (dashed) to the closest vertex of the curve. The energy function (1) corresponds to the sum of the length of the curve and the lengths of the links from the site points to it

Fig. 3. A plot of -E for an example with three site points located at (0, 0), (1, 0) and (0, 1). The point at which the maximum is attained (marked as M1) corresponds to the location of the Steiner point. In this case, the Steiner point has the co-ordinates ≈ (0.2113, 0.2113)

When the number of site points exceeds three, there may be more than one Steiner point, and it is difficult to visualize the energy and tree-length landscapes. We now consider an example with N = 4, where the site points are located at (0, 0), (2, 0), (2, 1) and (0, 1), i.e. they are at the corners of a rectangle with sides of length 1 and 2. The number of Steiner points can be at most (4 - 2) = 2. Assume that there are two curve points 'E' and 'F', i.e. N = 4 and M = 2. Figure 4 shows the curve and the site points. In reality, both curve points are free to move. However, we fix 'E' at the location of the Steiner point in the optimal solution, i.e. at ≈ (0.2887, 0.5). The energy is therefore a function only of curve point 'F'. Figure 5a shows a plot of the negative of the energy (i.e. -E) as a function of the co-ordinates of the curve point 'F'. Figure 5b, c shows the curve points and the links from the site points to them for different locations of 'F'. When 'F' is closer to 'A' and 'D' than is 'E', the links are as shown in Fig. 5b. The surface of E has two local minima in this region, which arise when 'F' coincides with 'A' and with 'D', respectively. The corresponding maxima in the surface of (-E) have been marked in Fig. 5a. The global maximum of (-E) occurs at the location of the second Steiner point. This case corresponds to Fig. 5c.

Fig. 4. An example with four site points 'A', 'B', 'C' and 'D' located at (0, 0), (2, 0), (2, 1) and (0, 1). The curve consists of 2 points, 'E' and 'F'. The former has been kept fixed at the location ($\sqrt{3}/6$, 0.5), while point 'F' is free to move

In practice, we approximate $h(d_{ij})$ by a sequence of functions $\hat{h}(d_{ij})$ given by

$$ \hat{h}(d_{ij}) = \frac{\phi(d_{ij}, \beta_t)}{\sum_{k=1}^{M} \phi(d_{ik}, \beta_t)} \tag{6} $$

where $\beta_t$ decreases to zero as $t \to \infty$. $\phi$ should be a decreasing function of $d_{ij}$. One possible choice is

$$ \phi(d_{ij}, \beta_t) = \exp(-d_{ij}^2 / \beta_t^2) \tag{7} $$

Note that $\hat{h}(d_{ij})$ approaches the discontinuous "Min" function in the limit ($t \to \infty$). This behaviour is made clearer in Fig. 6, which shows a plot of the function

$$ g(i) = \frac{\phi(x[i], \beta)}{\sum_{j=1}^{50} \phi(x[j], \beta)} $$

against the index i for a set of 50 randomly generated values x[i] (i = 1, 2, ..., 50) and three different choices of $\beta$. When $\beta$ is large, the values of g(i) show little variation. However, as $\beta$ decreases, g(i) peaks at the smaller values of x[i], and the heights of these peaks increase with decreasing $\beta$. In the limit, as $\beta \to 0$, g(i) tends to a delta function at the minimum of x[i] (i = 1, 2, ..., 50). The function $\hat{h}(d_{ij})$ essentially approximates the function $h(d_{ij})$ in a smooth manner. In effect, this means that those curve points which are closer to the site points than others move faster towards them [refer to (5)]. Other choices of $\phi$ are possible, and in simulations, we have found the following choice to be acceptable:

$$ \phi(d_{ij}, \beta_t) \sim 1/d_{ij}^2 \tag{8} $$
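The way the smooth weights of (6) with the choice (7) sharpen towards the hard "Min" selection as $\beta$ decreases can be sketched in Python as follows. The function name `soft_assignment`, the sample distances and the listed $\beta$ values are illustrative assumptions; subtracting the smallest squared distance inside the exponent is a numerical safeguard added here, which leaves the ratio in (6) unchanged.

```python
import math

def soft_assignment(distances, beta):
    """Weights (6) with phi of (7), for one site point and all curve points."""
    d_min = min(distances)
    # Shifting by d_min**2 cancels in the ratio but avoids 0/0 for very small beta.
    phi = [math.exp(-(d * d - d_min * d_min) / (beta * beta)) for d in distances]
    total = sum(phi)
    return [v / total for v in phi]

distances = [0.9, 0.4, 0.5]  # assumed distances from one site point to three curve points
for beta in (10.0, 1.0, 0.5, 0.1, 0.05):
    weights = soft_assignment(distances, beta)
    print(beta, [round(w, 4) for w in weights])
```

For large $\beta$ the printed weights are nearly equal, while for small $\beta$ almost all of the weight moves to the curve point at the smallest distance, which mirrors the behaviour of g(i) described for Fig. 6.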

Connecting a site point to the nearest curve point may not always yield a shorter tree. In practice, some site points may be nearer to other site points rather than to curve points. In such situations, these site points are connected to other site points in a nearest-connected-neighbour fashion. The length of the tree thus obtained is referred to as the tree length. Figure 7a shows the negative of the tree length, (-T), determined in this fashion, for the four-point example of Fig. 4. As before, the curve (shown in bold) consists of two points 'E' and 'F'. Point 'E' has been kept fixed. The links from the site points to the curve are shown by dotted lines. The axes show the co-ordinates of curve point 'F'. Figure 7b-f shows the tree as determined for different locations of point 'F'. In the case of Fig. 7b, the site point 'B' is closer to site point 'C' than is any curve point ('E' or 'F'). Thus, the tree is obtained by connecting 'C' to 'B' and 'B' to 'F'. The tree has the same topology as long as 'F' is at a distance of more than 1.0 from site point 'C'. However, if 'F' is closer, then 'C' will be connected to 'F' instead of 'B'. This gives rise to the "ridge" in the landscape of -T, shown in Fig. 7a. The global minimum of T (the global maximum of -T) arises when the topology in Fig. 7f is obtained and is shown marked in Fig. 7a. Another region of interest in Fig. 7a arises when 'F' is closer to 'A' and 'D' than is 'E'. The tree shown in Fig. 7c is then obtained. In this region, two local maxima of -T, M1 and M2, arise and occur when the location of 'F' coincides, respectively, with that of 'A' (Fig. 7d) and that of 'D' (Fig. 7e). The corresponding maxima of (-T) have been marked in Fig. 7a.

Fig. 5. a A plot of -E for the four-point example shown in Fig. 4. The axes show the co-ordinates of the curve point 'F'. The function has three local maxima (marked as M1, M2 and M3). The Steiner point corresponds to the global maximum (M3) and lies at ≈ (1.7113, 0.5). b, c The curve 'E'-'F' and the links from the site points to the nearest curve point for two different locations of the point 'F'. In b, 'F' is closer to 'A' and 'D' than is 'E', and therefore site points 'A' and 'D' are connected to it. The surface -E has two local maxima, M1 and M2, corresponding to the cases when 'F' coincides with 'A', and with 'D', respectively. In c, 'F' is closer to 'C' and 'B' than is 'E', and therefore 'F' is connected to 'C' and 'B'. The global maximum of -E is marked in a as M3 and corresponds to the location of 'F' ≈ (1.7113, 0.5). Refer to the text for details

Fig. 6. a The distribution of an array of 50 randomly generated numbers x[i] (i = 1, 2, ..., 50). b-d Plots of the function $g(i) = \phi(x[i], \beta) / \sum_{j=1}^{50} \phi(x[j], \beta)$ for $\beta$ = 1, 0.5 and 0.1, respectively. As $\beta$ decreases, the function g(i) peaks at smaller values of x[i]. The peaks of g(i) become sharper and higher as $\beta$ decreases. In other words, g(i) selects the minimum of the elements x[i]. $\hat{h}(d_{ij}, \beta_t)$ also behaves in a similar manner. Its value is larger for that curve point (j) which is closest to a given site point (i)

Fig. 7. a A plot of the negative of the tree length, i.e. -T, for the four-point example considered in Fig. 4. T is a function of the co-ordinates of the curve points 'E' and 'F'. 'E' has been fixed at the location ≈ (0.2887, 0.5), and the axes represent the co-ordinates of the curve point 'F'. In contrast to the energy function, the landscape of -T can be very non-smooth. The locations of three maxima of -T, M1, M2 and M3, are shown. These correspond to the maxima of -E (Fig. 5a). b-f The tree determined for different locations of the curve point 'F'. The curve is shown with solid lines, and links from the site points are shown with dashed lines. The example in the figure is the one considered in Fig. 4. In b, 'E' is closer to 'A' and 'D' than is 'F'. 'F' is further from 'C' than 'B', i.e. |FC| > |CB|. 'B' is connected to 'F', and 'C' is connected to 'B'. In c, the curve point closest to 'A' and 'D' is 'F', rather than 'E'. Therefore, 'A' and 'D' are connected to 'F', 'B' is connected to 'E', and 'C' is connected to 'B'. Two local maxima of (-T) (M1 and M2) arise when 'F' lies at the locations of 'A' and 'D', respectively (d and e). The global maximum of (-T) corresponds to the tree shown in f

3 Experimental results

The network was simulated using a finite-difference approximation of (5). The code was written in C and run on an IBM PC-AT compatible and a SUN workstation. The step size was typically chosen to be 0.001 and was reduced by a factor of 0.999 whenever the tree length increased from one iteration to the next. $\beta_t$ was typically chosen to have an initial value of 0.2 and was decreased by a factor of 0.999 after every 20 iterations. Figure 8 shows the solutions obtained on a set of test examples from the literature (Soukup and Chow 1978). To the best of our knowledge, this is the only test set available in the literature for which the co-ordinates of the site points as well as the length of the minimal tree are available. Each example shows the starting configuration of the curve C, as well as the shortest tree found. The initial state has been shown to indicate the robustness of the network. Problem 18 shows only the final tree obtained. In this case, the initial configuration of the curve was chosen differently and is discussed in the sequel. Figure 9 shows snapshots of the network in the process of self-organization on an example (not from the test set). Note that the number of Steiner points and the topology change with time.

Table 1 compares the performance of the network with the optimal solutions (where known) and those obtained by a well-known heuristic for the SMT (Beasley 1992). The table lists the tree lengths before and after a processing step. This step is illustrated in Fig. 10. The tree in Fig. 10a contains many curve points, and though they have self-organized to lie on straight lines joining two Steiner points, or a Steiner point and a site point, a minor error does exist in the alignment of the points. Eliminating these extra curve points gives us a small reduction in the tree length. This involves removing all curve points which have a degree of two or less and can be done in O(M) time sequentially. Figure 10b shows the tree of Fig. 10a after processing. Figure 11 shows a plot of the number of iterations required to find the best solution against the problem size (N). The plot indicates that the number of iterations depends more on the topology of the points than on their number, N. The number of iterations also depends on the step size used to simulate the differential equations (5) and the initial state of the network. On an average, about 840 iterations were required.

Table 1. Steiner minimal tree examples

           Tree lengths
Example  N   ESMTP network  Post-processed  Heuristic (a)  Optimal    Iterations
1        5   1.667456       1.66505         1.66440        1.66440    513
2        6   1.500514       1.50051         1.50050        1.50050    19
3        7   2.077812       2.07781         2.07767        2.07767    18
4        8   2.140298       2.13938         2.13879        2.13879    679
5        6   2.046971       2.04601         2.04405        2.04405    431
6        12  2.297982       2.29797         2.22239        2.1842     205
7        12  2.232703       2.23270         2.20529        not known  936
8        12  2.180697       2.17805         2.17779        2.17779    1753
9        7   1.708875       1.70195         1.58783        not known  967
10       6   1.655548       1.65536         1.64728        1.5988     1061
11       6   1.356931       1.35338         1.27411        1.27411    1090
12       9   1.661394       1.66135         1.64853        not known  577
13       9   1.305765       1.30243         1.27338        not known  543
14       12  2.368226       2.36588         2.20492        2.20492    3378
15       14  1.238845       1.23672         1.23041        1.23041    643
16       3   1.172698       1.17262         1.16678        1.16678    534
17       10  1.654538       1.65242         1.64279        not known  1068
18       62  3.970786       3.970786 (b)    3.85130        not known  393
19       14  1.801841       1.79898         1.72225        not known  2400
20       3   1.039616       1.03962         1.03962        1.03962    1500
21       5   1.830053       1.83005         1.81818        not known  646
22       4   0.503990       0.50394         0.50329        0.50329    469
23       4   0.523771       0.52365         0.51303        0.51303    246
24       4   0.254775       0.25467         0.25282        0.25282    177
25       3   0.198972       0.19897         0.19897        0.19897    500
26       3   0.124392       0.12439         0.12435        0.12435    500
27       4   1.180761       1.18023         1.17817        1.17817    2320
28       4   0.206070       0.20599         0.20442        0.20442    420
29       3   1.465982       1.46598         1.46598        1.46598    1200
30       12  1.079146       1.07825         1.03323        not known  330
31       14  2.372054       2.36563         2.34009        not known  280
32       19  2.976536       2.96856         2.85677        not known  161
33       18  2.330034       2.32489         2.22953        not known  1120
34       19  2.186444       2.18423         2.13813        not known  1152
35       18  1.384568       1.38262         1.35545        not known  499
36       4   0.878913       0.87891         0.87891        0.87891    455
37       8   0.768445       0.76828         0.76603        not known  549
38       14  1.438812       1.43797         1.43501        1.4248     922
39       14  1.433279       1.43288         1.43125        1.43125    1038
40       10  1.487860       1.48773         1.41803        not known  945
41       20  1.980777       1.97855         1.97672        not known  221
42       15  1.319809       1.31940         1.31535        not known  1314
43       16  2.415638       2.40481         2.36719        not known  1282
44       17  2.212799       2.20763         2.19744        not known  1941
45       19  1.988554       1.98520         1.93584        not known  1088
46       16  1.434317       1.43432         1.42209        not known  172

(a) The heuristic is from Beasley (1992)
(b) A processing step was not carried out on this problem
The solutions found by the network are within about 1.56% of the heuristic solution, and for the 24 problems for which the optimal solution is known, they are within about 1.18% of the optimal solutions, on an average

In many neural networks, the final solution is critically dependent on the initial state of the network. This is because most such networks perform the steepest descent of their energy landscape and converge to the "closest" basin of attraction from an initial state. Techniques like simulated annealing enable the network to escape from local minima. In Sect. 4, we show that the ESMTP network has a built-in mechanism that enables it to avoid being trapped in a local minimum. This means that the network is less sensitive to the initial state chosen. In the examples shown in Fig. 8, the initial configuration of the curve C was chosen to be a large open ellipse surrounding the site points. Throughout our formulation, we have nowhere enforced any of the properties of the SMT (Gilbert and Pollak 1968), e.g. that the degree of any node can be at most 3, or that the angles between segments meeting at a Steiner point are 120°. In general, the network self-organizes to satisfy these properties, as

can be seen in the examples of Fig. 8. However, as is apparent from problem 46, it may find a solution which violates these constraints (although the solution found is only 0.837% off the best known one). This is because a simple, piecewise-linear curve cannot self-organize to complex topologies with multiple branches. In such a situation, one can isolate smaller subsets of site points, solve the SMT for them and concatenate the trees obtained. One way would be to use decomposition methods to split the original point set into subsets whose SMTs can be independently determined by our network and concatenated. Alternatively, we have isolated smaller subsets of site points in which multiple branching of the curve is necessary, solved for them separately and concatenated the trees obtained. Figure 12a shows the SMT obtained for problem 46 in this manner. We have also tried an initial configuration for the curve C, where the curve points lie on segments of the MST joining vertices whose degrees are ≥ 2. Such a configuration is more suitable for large N. The SMT for problem 18 of the test set (shown in Fig. 8) was obtained in this manner. The SMT for a 100-point example (shown in Fig. 12b) was similarly obtained in about 2800 iterations. It is about 2.2% shorter than the MST for this example. This improvement is of the order expected for large, randomly generated point sets (Chang 1972; Beasley 1992). In this example, we determined two subsets of size 4 and 5 points, respectively, where the degree of a tree vertex exceeded 3, and solved for them separately. This procedure can be automated.

Fig. 8. SMTs determined by the network for a set of test examples from the literature. Each pair of figures shows the initial state of the network and the best tree found. The larger circles denote site points, while the smaller ones represent curve points. In the case of problem 18, only the final tree obtained has been shown. In each case, the figures have been scaled to reveal maximum detail. The initial distribution of curve points is an ellipse surrounding the site points (except for problem 18), while the final tree will always lie within the convex hull of the point set. Therefore, the scales are different for each figure in a pair

Fig. 9a-d. Snapshots of the network in the process of self-organization on an example. Observe that the topology of the network changes with time

Fig. 10a, b. A processing step is carried out on the trees. This involves removing all curve points which have a degree of 2 or less. The figure shows a tree (a) before and (b) after the processing step. The cumulative error due to misalignment of several curve points is not always negligible, as can be seen from Table 1

Fig. 11. A plot of the number of iterations required versus the problem size N indicates that the number of iterations depends more on the topology of the point set than on their number. On an average, about 840 iterations are required

Fig. 12. a The SMT determined for problem 46 of Fig. 8 by solving for smaller subsets. The subsets are determined by looking at curve points with a degree greater than 3, or for tree edges which make angles substantially different from 120°. b SMT for a 100-point example. c An example with the site points located in three dimensions solved by a modified ESMTP network

The network can be easily extended to solve the D-dimensional ESMTP, where the site points are located in a space of dimension D. Let the co-ordinates of site point $p_i$ (i = 1, 2, ..., N) be denoted by $(p_{i,1}, p_{i,2}, \ldots, p_{i,D})$. The only difference in the ESMTP network is that the activity vector describing the state of each neuron is now a vector with D elements. The energy function is now given by

$$ E = \sum_{i=1}^{M-1} \left[ \sum_{j=1}^{D} (x_{i+1,j} - x_{i,j})^2 \right]^{1/2} + \sum_{i=1}^{N} \sum_{j=1}^{M} d_{ij}\, h(d_{ij}) \tag{9} $$

with $d_{ij}$ being given by

$$ d_{ij} = \left[ \sum_{k=1}^{D} (p_{i,k} - x_{j,k})^2 \right]^{1/2} \tag{10} $$

As an illustration, Figure 12c shows the ESMT for an example in three dimensions found by our network. On similar lines, we have obtained neural networks for solving the rectilinear Steiner tree problem (Hwang 1978; Beasley 1992; Hwang and Richards 1992), the Steiner circuit problem (Smith and Gross 1979) and the travelling salesman problem (TSP; Hopfield and Tank 1985; Durbin and Willshaw 1987). Details of this work will be reported in the near future. The network for the TSP was obtained by using sequential unconstrained minimization techniques (SUMTs). SUMTs have been used earlier (Jayadeva and Bhaumik 1992) to solve the TSP on Hopfield nets, where improved convergence could be demonstrated.
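For the D-dimensional extension given by (9) and (10), only the distance computation changes. A minimal Python sketch follows, assuming the hard assignment of (4); the names `dist_d` and `energy_d` and the three-dimensional sample points are hypothetical and serve only as an illustration.

```python
import math

def dist_d(p, x):
    """Distance (10) between two points given as equal-length coordinate tuples."""
    return math.sqrt(sum((pk - xk) ** 2 for pk, xk in zip(p, x)))

def energy_d(curve, sites):
    """Energy (9): curve length plus each site's link to its nearest curve point."""
    length = sum(dist_d(curve[i + 1], curve[i]) for i in range(len(curve) - 1))
    links = sum(min(dist_d(p, x) for x in curve) for p in sites)
    return length + links

# Assumed example in three dimensions: two curve points and three site points.
curve = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
sites = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 2.0)]
print(energy_d(curve, sites))
```

Because the coordinates are handled as tuples of arbitrary length, the same two functions cover the planar case D = 2 as well.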

4 Escaping local minima

Figure 13a, b shows the time variation of energy and tree length, respectively, for one of the examples (problem 14) from the benchmark set. An interesting fact that comes to light is that the network energy is not a monotone decreasing function of time. We now show that the ESMTP network has a built-in mechanism that enables it to escape from local minima. It is thus able to change both the topology and the number of Steiner points as it self-organizes. It should be pointed out that the function E in (1) has not been called a Lyapunov function for the network. The graph in Fig. 13a clearly shows that the energy does not decrease monotonically with time (iteration number). Most neural networks proposed for neural optimization (e.g. Hopfield and Tank 1986; Brandt et al. 1988; Szu 1988) do not use a time-varying energy function. That is, though the energy varies with time due to the changing activities of the neurons, the relation between the activities and the energy is a pre-specified function. In contrast, because $\beta_t$ is a time-varying parameter, the energy of the SMT network varies not only with the activities of the neurons, but also with time.

Fig. 13. The variation of (a) energy and (b) tree length for problem 14 of the test set (Fig. 8). Note that the energy does not monotonically decrease with time. This is because the network can escape local minima, enabling it to change the topology and the number of Steiner points as it self-organizes

Consider, for the moment, that $\beta_t$ is constant over some time interval. From (5), we have

$$ -\frac{\partial E}{\partial x_{i,j}} = \dot{x}_{i,j} = \frac{x_{i+1,j} - x_{i,j}}{\mathrm{dist}(x_{i+1}, x_i)} + \frac{x_{i-1,j} - x_{i,j}}{\mathrm{dist}(x_i, x_{i-1})} + \sum_{k=1}^{N} \hat{h}(d_{ki}) \frac{p_{k,j} - x_{i,j}}{d_{ki}}, \quad i = 1, 2, \ldots, M; \; j = 1, 2 \tag{11} $$

i.e. the network performs the steepest descent of the energy landscape (E) and converges to an equilibrium point, such that

$$ \dot{x}_{i,j}(t_1) = 0, \quad i = 1, 2, \ldots, M; \; j = 1, 2 \tag{12} $$

at some time $t_1$. The equilibrium point can be a stable equilibrium point (a local minimum), an unstable equilibrium point (a local maximum) or an inflection point of E; its nature can be determined from the local curvature of E, i.e. from $\partial^2 E / \partial x_{i,j}^2$, which is positive at a local minimum and negative at a point of local maximum. We now study the stability of an equilibrium point. We assume that the network is updated asynchronously, i.e. we can always find a time instant at which the activity of only one neuron is being updated. Consider, for instance, neuron i.

$$ \frac{\partial^2 E}{\partial x_{i,j}^2} = \frac{\mathrm{dist}(x_i, x_{i-1}) - (x_{i,j} - x_{i-1,j})^2 / \mathrm{dist}(x_i, x_{i-1})}{\mathrm{dist}^2(x_i, x_{i-1})} + \frac{\mathrm{dist}(x_i, x_{i+1}) - (x_{i,j} - x_{i+1,j})^2 / \mathrm{dist}(x_i, x_{i+1})}{\mathrm{dist}^2(x_i, x_{i+1})} + \sum_{k=1}^{N} \hat{h}(d_{ki}) \frac{d_{ki} - (x_{i,j} - p_{k,j})^2 / d_{ki}}{d_{ki}^2} + \sum_{k=1}^{N} \frac{x_{i,j} - p_{k,j}}{d_{ki}} \frac{\partial}{\partial x_{i,j}} \left[ \frac{\exp(-d_{ki}^2 / \beta_t^2)}{\sum_{l=1}^{M} \exp(-d_{kl}^2 / \beta_t^2)} \right] \tag{13} $$

But,

$$ \frac{\partial}{\partial x_{i,j}} \left[ \frac{\exp(-d_{ki}^2 / \beta_t^2)}{\sum_{l=1}^{M} \exp(-d_{kl}^2 / \beta_t^2)} \right] = \frac{(-2/\beta_t^2) \exp(-d_{ki}^2 / \beta_t^2) (x_{i,j} - p_{k,j})}{\sum_{l=1}^{M} \exp(-d_{kl}^2 / \beta_t^2)} - \exp(-d_{ki}^2 / \beta_t^2) \frac{(-2/\beta_t^2) \exp(-d_{ki}^2 / \beta_t^2) (x_{i,j} - p_{k,j})}{\left[ \sum_{l=1}^{M} \exp(-d_{kl}^2 / \beta_t^2) \right]^2} = (x_{i,j} - p_{k,j}) \frac{2}{\beta_t^2} \hat{h}(d_{ki}) \left( \hat{h}(d_{ki}) - 1 \right) \tag{14} $$

Therefore, we have

~2E -

(xi, f(j) - x i - l , f t j ) ) (xi, f(j) -- xi+l,f(j)) dist(xl, xi_l)3 + dist(xl, xi+l)3 N

+ E ~(ak,)(x,,m~ k=l

k =x

d~

-

-

Pk, f