On Interval Weighted Three-layer Neural Networks

C. Hu, M. Beheshti, A. Berrached, A. de Korvin, and O. Sirisaengtaksin
Department of Computer and Mathematical Sciences
University of Houston-Downtown
Houston, TX 77002

Abstract: In solving application problems, the data sets used to train a neural network may not be one hundred percent precise, but may lie within certain ranges. Representing data sets with intervals, we have interval neural networks. By analyzing the mathematical model, we categorize general three-layer neural network training problems into two types. One of them can be solved by finding numerical solutions of nonlinear systems of equations. The other can be transformed into nonlinear optimization problems. Reliable interval algorithms, such as the interval Newton/generalized bisection method and the interval branch-and-bound algorithm, are applied to obtain optimal weights for interval neural networks. The applicable state-of-the-art interval software packages are reviewed in this paper as well.

I. Introduction

A. Three-layer Neural Network

A three-layer neural network includes I (input), H (hidden), and O (output) layers. Each of these layers contains some nodes, called neurons. There are no direct links between any two nodes within the same layer. However, each node in layer H has a direct connection to every node of the I and O layers. There are no direct links between the I and O layers. Each such direct connection is associated with a weight. If we assume that the I layer has $i$ nodes, the H layer has $h$ nodes, and the O layer has $o$ nodes, then there are $i \times h$ weighted connections between the I and H layers, and $h \times o$ weighted connections between the H and O layers. It has been proved that any neural net problem can be mapped to and solved by a three-layer neural net [6, 23]. Therefore, it is of general importance to study three-layer neural networks. Let $X^1, X^2, \cdots, X^l$ be the input vectors of the training data set, with corresponding target output vectors $T^1, T^2, \cdots, T^l$.
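This architecture maps directly to code. Below is a minimal sketch of an interval forward pass for such a network, assuming intervals are represented as (lo, hi) pairs, biases are zero, and the activation is the sigmoid used in the equations later in the paper; all function names here are illustrative, not from the paper.

    import math

    def iadd(a, b):
        # Interval sum of a = (lo, hi) and b = (lo, hi).
        return (a[0] + b[0], a[1] + b[1])

    def imul(a, b):
        # Interval product: min/max over the four endpoint products.
        ps = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
        return (min(ps), max(ps))

    def isigmoid(a):
        # Sigmoid is increasing, so it maps interval endpoints to endpoints.
        s = lambda t: 1.0 / (1.0 + math.exp(-t))
        return (s(a[0]), s(a[1]))

    def forward(x, W, V):
        """x: i input intervals; W[m][n]: weight from input m to hidden n;
        V[n][k]: weight from hidden n to output k. Returns o output intervals."""
        z = []
        for n in range(len(W[0])):            # hidden-layer outputs
            acc = (0.0, 0.0)
            for m in range(len(x)):
                acc = iadd(acc, imul(W[m][n], x[m]))
            z.append(isigmoid(acc))
        y = []
        for k in range(len(V[0])):            # output-layer outputs
            acc = (0.0, 0.0)
            for n in range(len(z)):
                acc = iadd(acc, imul(V[n][k], z[n]))
            y.append(isigmoid(acc))
        return y

    # Example with i = 2, h = 1, o = 2, the shape used in the applications below.
    x = [(0.1, 0.2), (0.3, 0.4)]
    W = [[(0.5, 0.6)], [(0.7, 0.8)]]          # 2 x 1 input-to-hidden weights
    V = [[(0.9, 1.0), (1.1, 1.2)]]            # 1 x 2 hidden-to-output weights
    print(forward(x, W, V))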
III. Finding Weights for Type 1 Interval Neural Networks

A type 1 problem reduces to finding the roots of a nonlinear system of equations, for which the interval Newton/generalized bisection algorithm can be used. Each iteration $k$ of the algorithm (steps 1-4(a)) produces a new box $X^{(k+1)} = (x_0^{(k+1)}, \ldots, x_{N-1}^{(k+1)})$ from the current box $X^{(k)}$; then:

(b) If, for any $i \in \{0, 1, \ldots, N-1\}$, $x_i^{(k+1)}$ is empty, then there is no root in the current box; go to step 6. Otherwise, compute the diameter of $X^{(k+1)}$:

(c) Compute $\mathrm{diam}(X^{(k+1)}) = \left( \sum_{j=0}^{N-1} \omega_j \right)^{1/2}$;

5. Check convergence:

(a) If $\mathrm{diam}(X^{(k+1)}) \le \epsilon$, where $\epsilon$ is a given tolerance, and the residual of the nonlinear system is also less than a given tolerance, then a root has been found. Output the root and go to step 6.

(b) If $\mathrm{diam}(X^{(k)}) - \mathrm{diam}(X^{(k+1)})$ is less than a given tolerance $\delta$, bisect the current box: put one half on the box stack and keep the other half as the new box for the next iteration. Go to step 1.

(c) If $\mathrm{diam}(X^{(k)}) - \mathrm{diam}(X^{(k+1)}) > \delta$, go to step 1.

6. If the box stack is not empty, pop a box from the stack and go to step 1. Otherwise, end the algorithm.
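A minimal sketch of this stack-based control flow follows, under stated assumptions: newton_step stands in for the contraction of steps 1-4(a) (returning None when a component of the intersection is empty), residual_ok is the residual test of step 5(a), and $\omega_j$ is taken to be the squared width of the $j$-th component. None of these names come from the paper.

    def diam(box):
        # Step 4(c), assuming omega_j is the squared width of component j.
        return sum((hi - lo) ** 2 for lo, hi in box) ** 0.5

    def bisect(box):
        # Step 5(b): split the box along its widest component.
        k = max(range(len(box)), key=lambda i: box[i][1] - box[i][0])
        lo, hi = box[k]
        mid = 0.5 * (lo + hi)
        return (box[:k] + [(lo, mid)] + box[k + 1:],
                box[:k] + [(mid, hi)] + box[k + 1:])

    def find_roots(box0, newton_step, residual_ok, eps, delta):
        roots, stack = [], [box0]
        while stack:                        # step 6: run until the stack is empty
            box = stack.pop()
            while True:
                new = newton_step(box)      # steps 1-4(a), not shown here
                if new is None:             # step 4(b): empty component means
                    break                   #   no root in this box
                if diam(new) <= eps and residual_ok(new):
                    roots.append(new)       # step 5(a): a root is found
                    break
                if diam(box) - diam(new) <= delta:
                    half, box = bisect(new) # step 5(b): stash one half,
                    stack.append(half)      #   keep iterating on the other
                else:
                    box = new               # step 5(c): enough progress; iterate
        return roots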
C. An Application

We use an example below to explain how to apply the interval Newton/generalized bisection method to find weights for a given three-layer neural network.

Example: Let NN be a three-layer interval neural network. There are two nodes in the I-layer, one node in the H-layer, and two nodes in the O-layer. Here is the training data set:

Input                        Output
$X^1 = (x_0^1, x_1^1)$       $T^1 = (t_0^1, t_1^1)$
$X^2 = (x_0^2, x_1^2)$       $T^2 = (t_0^2, t_1^2)$

We need to find the weights.

Analysis: In this example, $i = 2$, $h = 1$, $o = 2$, and $l = 2$. Since $l \times o = 4 = h(i + o)$, it is a type 1 interval neural network problem. Assume the bias vectors are zero, and let $z^1$ and $z^2$ denote the hidden-node outputs for the two training samples. The weights must then satisfy the system of equations

$$
\begin{cases}
z^1 = \dfrac{1}{1 + e^{-w_{0,0} x_0^1 - w_{1,0} x_1^1}}\\[4pt]
z^2 = \dfrac{1}{1 + e^{-w_{0,0} x_0^2 - w_{1,0} x_1^2}}\\[4pt]
t_0^1 = \dfrac{1}{1 + e^{-v_{0,0} z^1}}\\[4pt]
t_1^1 = \dfrac{1}{1 + e^{-v_{0,1} z^1}}\\[4pt]
t_0^2 = \dfrac{1}{1 + e^{-v_{0,0} z^2}}\\[4pt]
t_1^2 = \dfrac{1}{1 + e^{-v_{0,1} z^2}}
\end{cases}
\tag{10}
$$

There are 6 unknowns, $z^1$, $z^2$, and the weights $w_{0,0}$, $w_{1,0}$, $v_{0,0}$, and $v_{0,1}$, in the above 6 equations. Every step of the interval Newton/generalized bisection algorithm can then be performed to find the weights.
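In root-finding form, system (10) becomes $F(u) = 0$ for $u = (z^1, z^2, w_{0,0}, w_{1,0}, v_{0,0}, v_{0,1})$. A point-valued sketch of the residual is below; the interval algorithm would evaluate the same expressions in interval arithmetic, and the argument names are assumptions.

    import math

    def sigmoid(t):
        return 1.0 / (1.0 + math.exp(-t))

    def residual(u, X1, X2, T1, T2):
        # u = (z1, z2, w00, w10, v00, v01); X1, X2, T1, T2 are the
        # training pairs, here taken as point values for clarity.
        z1, z2, w00, w10, v00, v01 = u
        return [
            z1 - sigmoid(w00 * X1[0] + w10 * X1[1]),   # hidden output, sample 1
            z2 - sigmoid(w00 * X2[0] + w10 * X2[1]),   # hidden output, sample 2
            sigmoid(v00 * z1) - T1[0],                 # t^1_0 equation
            sigmoid(v01 * z1) - T1[1],                 # t^1_1 equation
            sigmoid(v00 * z2) - T2[0],                 # t^2_0 equation
            sigmoid(v01 * z2) - T2[1],                 # t^2_1 equation
        ]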
IV. Finding Weights for Type 2 Interval Neural Networks

A. The problem to be solved

In section II, we showed that a type 2 interval neural network problem may be modeled as two optimization problems. One of them,

$$\min_{W,V} \sum_{j=1}^{l} \left( F(X^j; W, V) - T^j \right)^2,$$

is unconstrained. The other one is the constrained optimization problem

$$\min_{W,V}\ \alpha \sum_{i=1}^{m} \left( \overline{F(X; W, V)}_i - \underline{F(X; W, V)}_i \right) + (1 - \alpha) \sum_{i=1}^{m} \left( \operatorname{mid} F(X; W, V)_i - \operatorname{mid} T_i \right)^2$$

subject to $F(X^j; W, V) \supseteq T^j$ for $j = 1, 2, \cdots, l$. For simplicity, we only study the unconstrained model in this paper.

B. Interval branch-and-bound algorithms

An unconstrained general global optimization problem over a given domain $\mathbf{X}$ can be described as:

minimize $\phi(X)$ for $X \in \mathbf{X}$.

It is very difficult to solve a general nonlinear optimization problem; in fact, it is NP-hard. With traditional methods, it is also hard to verify whether an approximate numerical solution is the global optimum. Automatically verified global optimization techniques (with or without constraints) have been developed in recent years with interval computation
[5, 27]. They can generally be called branch-and-bound algorithms. In [15], Kearfott reviewed these algorithms. With minimal notation and without the details of interval arithmetic, we introduce the basic ideas of these algorithms.

Let an interval box $\mathbf{X} = (x_0, x_1, \ldots, x_{N-1})$ represent a search region. The principle of branch-and-bound algorithms is to maintain one or more of these lists: L, a list of boxes $\mathbf{X}$ to be processed; U, a list of boxes which have been reduced to small diameter by the algorithm; and C, a list of boxes that have been verified to contain critical points by the algorithm. Boxes in L are considered one by one for processing. When L is exhausted, we evaluate $\phi$ on every box of list U and at the critical points in list C, and then find the possible global minimizer.

Algorithm 4.1 (Branch-and-bound):

1. Initialize L by placing the search domain $\mathbf{X}^0$ in it.
2. Initialize the current optimum $\bar\phi$ by picking an $X \in \mathbf{X}^0$ and evaluating $\phi(X)$ with interval arithmetic.
3. While L ≠ ∅ do
(a) Remove a box $\mathbf{X}$ from L for processing.
(b) Compute a lower bound $\underline{\phi}(\mathbf{X})$ on the range of $\phi$ over $\mathbf{X}$. If $\underline{\phi}(\mathbf{X}) > \bar\phi$, then discard $\mathbf{X}$ and go back to step 3.
(c) Use interval arithmetic to compute an upper bound $\overline{\phi}(\mathbf{X})$ on the range of $\phi$ over $\mathbf{X}$, then update $\bar\phi$ by $\bar\phi \leftarrow \min\{\bar\phi, \overline{\phi}(\mathbf{X})\}$.
(d) Use interval arithmetic to compute $\nabla\phi(\mathbf{X})$. If $0 \notin \nabla\phi(\mathbf{X})$, then discard $\mathbf{X}$ and go back to step 3.
(e) If $\nabla^2\phi(X)$ cannot be positive definite for any $X \in \mathbf{X}$, then discard $\mathbf{X}$ and go back to step 3.
(f) Apply the interval Newton method to possibly do one or more of the following:
   • reject $\mathbf{X}$;
   • reduce the size of $\mathbf{X}$;
   • verify the existence of a unique critical point in $\mathbf{X}$.
(g) If step (f) did not result in a sufficient change in $\mathbf{X}$, subdivide $\mathbf{X}$ and return all resulting boxes to L for subsequent processing.
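The following is a minimal sketch of the list-management core of the algorithm (steps 1-3(c) and (g)); the acceleration tests (d)-(f) are omitted for brevity, and phi_interval stands for an assumed, user-supplied interval extension of $\phi$ that returns (lo, hi) bounds over a box.

    def branch_and_bound(phi_interval, box0, tol=1e-4):
        L = [box0]                                 # step 1
        mid0 = [((a + b) / 2, (a + b) / 2) for a, b in box0]
        phi_bar = phi_interval(mid0)[1]            # step 2: evaluate at a point box
        U = []                                     # small boxes kept as candidates
        while L:                                   # step 3
            box = L.pop()                          # (a)
            lo, hi = phi_interval(box)
            if lo > phi_bar:                       # (b): cannot contain the minimum
                continue
            phi_bar = min(phi_bar, hi)             # (c): update current optimum
            if max(b - a for a, b in box) < tol:
                U.append((lo, box))                # small enough: move to list U
            else:                                  # (g): subdivide widest component
                k = max(range(len(box)), key=lambda i: box[i][1] - box[i][0])
                a, b = box[k]
                m = (a + b) / 2
                L.append(box[:k] + [(a, m)] + box[k + 1:])
                L.append(box[:k] + [(m, b)] + box[k + 1:])
        return min(U) if U else None               # best candidate enclosure found

Discarding a box in step (b) is safe because the lower bound encloses the true range of $\phi$ over the box; this is what makes the final result verified rather than heuristic.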
C. An Application
Here is the training data set:

Input                        Output
$X^1 = (x_0^1, x_1^1)$       $T^1 = (t_0^1, t_1^1)$
$X^2 = (x_0^2, x_1^2)$       $T^2 = (t_0^2, t_1^2)$
$X^3 = (x_0^3, x_1^3)$       $T^3 = (t_0^3, t_1^3)$
$X^4 = (x_0^4, x_1^4)$       $T^4 = (t_0^4, t_1^4)$
We need to find the optimal weights.

Analysis: In this example, $i = 2$, $h = 1$, $o = 2$, and $l = 4$. Since $l \times o = 8 > 4 = h(i + o)$, it is a type 2 interval neural network problem. Assume the bias vectors are zero. Then, the weights $W = \{w_{0,0}, w_{1,0}\}$ and $V = \{v_{0,0}, v_{0,1}\}$ should minimize

$$\sum_{j=1}^{4} \sum_{k=0}^{1} \left( \frac{1}{1 + e^{-v_{0,k} z^j}} - t_k^j \right)^2. \tag{11}$$

Replacing $z^j$ with $\dfrac{1}{1 + e^{-w_{0,0} x_0^j - w_{1,0} x_1^j}}$, we may apply the branch-and-bound algorithms to find the optimizers $W$ and $V$ on a given domain.
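For instance, objective (11) can be sketched as follows (point-valued, with assumed names; Algorithm 4.1 would evaluate the same expression in interval arithmetic over boxes of $(w_{0,0}, w_{1,0}, v_{0,0}, v_{0,1})$):

    import math

    def sigmoid(t):
        return 1.0 / (1.0 + math.exp(-t))

    def objective(weights, data):
        # weights = (w00, w10, v00, v01); data is a list of four (X, T)
        # pairs with X = (x0, x1) and T = (t0, t1), as in the table above.
        w00, w10, v00, v01 = weights
        total = 0.0
        for (x0, x1), (t0, t1) in data:
            z = sigmoid(w00 * x0 + w10 * x1)       # hidden-layer output z^j
            for vk, tk in ((v00, t0), (v01, t1)):
                total += (sigmoid(vk * z) - tk) ** 2
        return total

An interval extension of this objective, built from interval helpers like those sketched in the introduction, could then be passed directly to a branch-and-bound code such as the sketch after Algorithm 4.1.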