On the Computational Power of Limited Precision Weights Neural Networks in Classification Problems: How to Calculate the Weight Range so that a Solution Will Exist

Sorin Draghici
Wayne State University, Detroit, MI 48202, USA
Abstract. This paper analyzes some aspects of the computational power of neural networks using integer weights in a very restricted range. Using limited range integer values opens the road for efficient VLSI implementations because i) a limited range for the weights can be translated into reduced storage requirements and ii) integer computation can be implemented in a more efficient way than the floating-point one. The paper concentrates on classification problems and shows that, if the weights are restricted in a drastic way (both range and precision), the existence of a solution is not to be taken for granted anymore. We show that, if the weight range is not chosen carefully, the network will not be able to implement a solution independently of the number of units available on the first hidden layer. The paper presents an existence result which relates the difficulty of the problem, as characterized by the minimum distance between patterns of different classes, to the weight range necessary to ensure that a solution exists. This result allows us to calculate a weight range for a given category of problems and be confident that the network has the capability to solve the given problems with integer weights in that range.
1 Introduction
The field of neural networks has enjoyed more than a decade of full swing development after its rebirth in the late '80s. This decade has been fruitfully spent investigating the possibilities of various types of neural networks and discovering literally hundreds of training algorithms and architectures. At the same time, various ideas have been tested in real-world applications and have been shown to be extremely successful. After this period of growth, the field has reached a certain maturity and now there exists a rather large body of tested techniques which are potentially ready for widespread use. Such a widespread use will be achieved only if neural network techniques become available as integrated hardware devices. There are several arguments in favor of this idea. Firstly, the market for consumer electronics is several orders of magnitude larger than the one for dedicated systems implemented as software simulations running on full-fledged computers. The potential of a neural network based adaptive device is huge.
Secondly, the price of a dedicated chip is much lower than that of a full system (even a low-cost system) running a software simulation. Finally, there are some applications (e.g. aerospace) where weight and dimension restrictions are very stringent and again, a dedicated chip or board is much preferred to a full system. All these arguments have long been known and they have stimulated research in the direction of hardware implementations. There are indeed several such implementations [20, 11, 28, 27, 19, 22, 10, 2, 24, 9, 25, 8]. A good review of various existing hardware implementations can be found in [13].

A particularly interesting category of theoretical algorithms consists of the algorithms using limited precision integer weights (LPIW). This type of algorithm is particularly interesting because integer weights and integer computation circuitry are more efficient to implement in hardware, both in terms of chip space (VLSI area) and cost. If the integer weights are powers of two, multiplications and divisions reduce to shifts, thus becoming even more attractive for hardware implementation (see the sketch below). The idea of using integer weights encountered a strong initial resistance. Early experiments showed that the convergence process is not a simple matter if the weights are truncated. However, more recent techniques have shown that it is possible to train such neural networks using integer [26, 3, 21, 12, 23] or even power-of-two weights [7, 14-18, 21, 6, 5, 1].

More recently, a number of papers have started to address the theoretical study of the capabilities of neural networks using only integer weights. While the existing algorithms offer a way of training these weights in order to achieve a desired goal weight state, there are relatively few results that address the problem of whether a goal state exists for a given problem and a given architecture in the context of limited range integer weights.

The scope of this paper is the class of VLSI-friendly neural networks (i.e. networks using LPIW) and their capabilities in solving classification problems. This paper will show that: i) if the range of the weights is not chosen appropriately, a network using hyperplanes and limited precision integer weights will not be able to find a solution independently of the number of hyperplanes (units) used and ii) one can calculate a weight range able to guarantee the existence of a solution as a function of the minimum distance between patterns of opposite classes. These results are important because they allow the user of an LPIW neural network (e.g. the designer of a neural network VLSI chip) to choose the range of the weights in such a way that the resulting chip is guaranteed to be able to solve the problems envisaged.

One important problem that this paper does not address is how the solution weight state is found, i.e. how the neural network is trained. Our results simply guarantee that for a given set of classification problems there exists a solution with weights within a certain range which can be calculated from the problem itself.

This paper updates and details some of the results that were published originally in [4].
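To make the power-of-two argument above concrete, here is a minimal sketch, not taken from the paper, of why such weights are hardware-friendly; the function name and the toy values are purely illustrative:

```c
#include <stdio.h>

/* Illustrative sketch (not from the paper): if a weight has the form
 * sign * 2^k, multiplying an integer input by it reduces to a shift,
 * which is far cheaper in VLSI than a general multiplier circuit. */
int mul_by_pow2_weight(int x, int k, int sign) {
    int shifted = x << k;                  /* x * 2^k */
    return sign >= 0 ? shifted : -shifted;
}

int main(void) {
    printf("%d\n", mul_by_pow2_weight(5, 3, +1)); /* 5 *  8 =  40 */
    printf("%d\n", mul_by_pow2_weight(5, 2, -1)); /* 5 * -4 = -20 */
    return 0;
}
```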
The results regard the capabilities of LPIW neural networks in classification problems and relate the weight range used by the network to the difficulty of the classification problem, as characterized by the smallest distance between two patterns of different classes. For low values of n, we present an alternative and more rigorous proof for essentially the same result. However, for the high-dimensional case, this paper describes a completely different approach which leads to a different formula for calculating the minimal weight range.
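Because the smallest distance between patterns of different classes is the quantity used throughout to characterize problem difficulty, the following sketch shows how it can be computed; the toy data set (the XOR problem) and all names are hypothetical and serve only to fix ideas:

```c
#include <math.h>
#include <stdio.h>

#define DIM 2   /* input dimension */
#define N   4   /* number of patterns */

/* A hypothetical toy problem (XOR): four 2-D patterns with class labels. */
static const double pattern[N][DIM] = {
    {0.0, 0.0}, {1.0, 0.0}, {0.0, 1.0}, {1.0, 1.0}
};
static const int label[N] = {0, 1, 1, 0};

/* Smallest Euclidean distance between two patterns of different classes. */
double min_interclass_distance(void) {
    double best = INFINITY;
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++) {
            if (label[i] == label[j]) continue;  /* same class: skip */
            double d2 = 0.0;
            for (int k = 0; k < DIM; k++) {
                double diff = pattern[i][k] - pattern[j][k];
                d2 += diff * diff;
            }
            if (sqrt(d2) < best) best = sqrt(d2);
        }
    return best;
}

int main(void) {
    printf("minimum inter-class distance: %g\n", min_interclass_distance());
    return 0;
}
```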
1.1 Definitions and general considerations
A network is an acyclic graph which has several input nodes (also called inputs) and some (at least one) output nodes (also called outputs). A neural network is a network in which each connection is associated with a weight (also called synaptic weight) and in which each node (also called neuron or unit) calculates a function $a$ of the weighted sum of its $m$ inputs as follows:

$$y = a\left(\sum_{i=0}^{m} w_i \cdot x_i + \theta\right)$$
The function $a$ (usually non-linear) is called the activation function of the neuron and the value $\theta$ is the threshold. A neuron which uses a step activation function can be said to implement the hyperplane $\sum_{i=0}^{m} w_i \cdot x_i + \theta = 0$ in the $m$-dimensional space of its inputs because its output will be 1 for $\sum_{i=0}^{m} w_i \cdot x_i + \theta > 0$ and 0 for $\sum_{i=0}^{m} w_i \cdot x_i + \theta \le 0$.
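As a concrete reading of this definition, the following sketch computes the output of a single unit with a step activation under the conventions above (integer weights and threshold); the particular weight and input values are hypothetical:

```c
#include <stdio.h>

/* A unit with integer weights w[0..m-1] and threshold theta outputs 1
 * when the weighted sum w.x + theta is positive and 0 otherwise, i.e.
 * it implements the hyperplane w.x + theta = 0 in its input space. */
int step_neuron(const int *w, const int *x, int m, int theta) {
    long sum = theta;
    for (int i = 0; i < m; i++)
        sum += (long)w[i] * x[i];   /* integer multiply-accumulate */
    return sum > 0 ? 1 : 0;         /* step activation */
}

int main(void) {
    int w[2] = {1, -2};             /* hypothetical weights in [-3, 3] */
    int x[2] = {3, 1};
    printf("output: %d\n", step_neuron(w, x, 2, 1)); /* 3 - 2 + 1 > 0 -> 1 */
    return 0;
}
```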