IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 11, NO. 4, JULY 2000
A Feedforward Bidirectional Associative Memory

Yingquan Wu and Dimitris A. Pados, Member, IEEE

Abstract—In contrast to conventional feedback bidirectional associative memory (BAM) network models, a feedforward BAM network is developed based on a one-shot design algorithm of low computational complexity, where p is the number of prototype pairs and n, m are the dimensions of the input/output bipolar vectors. The feedforward BAM is an n-p-m three-layer network of McCulloch–Pitts neurons with storage capacity 2^min(n,m) and guaranteed perfect bidirectional recall. The overall network design procedure is fully scalable in the sense that any number p <= 2^min(n,m) of bidirectional associations can be implemented. The prototype patterns may be arbitrarily correlated. With respect to inference performance, it is shown that the Hamming attractive radius of each prototype reaches the maximum possible value. Simulation studies and comparisons illustrate and support these theoretical developments.

Index Terms—Associative memories, bidirectional associative memories, feedforward neural networks, Hopfield networks, neural networks.

I. INTRODUCTION

Over the past several years the term associative memory has come to refer collectively to a large class of nonlinear artificial neural networks with feedback, with primary applications in the fields of pattern recognition and content-addressable memory devices. Arguably, associative memories trace their origin to the correlation matrix memory studies by Steinbuch [1] and Kohonen [2], [3]. A few years later, Hopfield's model [4], [5] propelled associative memories to the state of a main-stream neural-network research area.

The bidirectional associative memory (BAM), introduced by Kosko [6], [7], is essentially a generalization of the Hopfield network model. As opposed to autoassociative or unidirectional memories, BAM's are heteroassociative devices that are supposed to be able to store bipolar library pairs (X_i, Y_i), i = 1, ..., p, with two-way retrieval capabilities X_i -> Y_i and Y_i -> X_i, where p is the number of prototype pairs. Improvements on the original BAM network were sought in [8] and [9] in the form of extra dummy neurons or multiple training procedures. The possibility of multilayer BAM's was considered in [10]. Other algorithmic improvements were pursued in the context of Ho-Kashyap learning [11], Householder encoding [12], iterative relaxation learning [13], and higher-order interconnected BAM's [14].

While all the above BAM network models assume symmetric interconnections for the two directions, in search of superior storage and inference capability the work in [15] relaxes this assumption and leads to what is known as the asymmetrical BAM (ABAM). The general bidirectional associative memory (GBAM) in [16] is another example of an asymmetrical BAM. In the present work we follow a different approach and we develop a feedforward bidirectional associative memory network. To the best of the authors' knowledge, this is the first nonfeedback BAM design attempt that appears in the pertinent literature. The procedure is conceptually simple and implementation-wise straightforward. Given a set of bidirectional bipolar prototype pairs (X_i, Y_i), i = 1, ..., p, we design two distinct two-layer feedforward BAM's with hard-limiter McCulloch–Pitts neurons that implement the bidirectional associations X_i <-> e_i and e_i <-> Y_i, i = 1, ..., p, where {e_1, ..., e_p} is the standard unit vector basis of R^p. Then, we simply merge the two previous BAM designs to implement the desired X_i <-> Y_i association in the form of a single three-layer feedforward network. The X <-> e (or e <-> Y) design procedure relies on the one-shot algorithm of low computational complexity in [17], where n and m are the dimensions of the X and Y patterns, respectively. Our analysis shows that the proposed feedforward BAM guarantees perfect bidirectional recall. The BAM is fully scalable and the maximum number of prototype pairs that can be memorized (storage capacity) is 2^min(n,m). The prototype patterns may be arbitrarily correlated, as opposed to the linear independence required for the structure in [15]. For each bidirectionally stored prototype pair, the network requires one hard-limiter (McCulloch–Pitts) hidden neuron, n + m real weight parameters, and one threshold parameter in the open interval (0, 1). Finally, in terms of inference capability we show that the attractive radius of each prototype pair achieves its maximum possible value with respect to the Hamming distance metric.

The rest of this paper is organized as follows. In Section II we present the network model. Design preliminaries in the form of definitions, notation, and theoretical foundation are offered in Section III. Section IV presents the main body of our design and analysis results. Section V is devoted to simulation studies and comparisons with prevailing feedback BAM schemes. Finally, a few conclusions are drawn in Section VI.

Manuscript received November 5, 1998; revised January 11, 2000. The authors are with the Department of Electrical Engineering, State University of New York at Buffalo, Buffalo, NY 14260-2050 USA (e-mail: [email protected]).
II. NETWORK MODEL

In this paper we implement the BAM by a three-layer feedforward network. We assume that we are given a set of bidirectional prototype pairs (X_i, Y_i), i = 1, ..., p, where the X_i are n-dimensional bipolar vectors, the Y_i are the corresponding m-dimensional bipolar vectors, and X_i != X_j, Y_i != Y_j for i != j. First, we wish to design two distinct two-layer BAM's that implement the bidirectional associations X_i <-> e_i and e_i <-> Y_i, i = 1, ..., p, respectively, where e_i is the i-th standard p-dimensional unit vector. Then, the objective is to merge these two BAM's in the form of a single three-layer BAM that implements the desired X_i <-> Y_i association. The design of the e <-> Y BAM will be analogous to that of the X <-> e BAM, so in the sequel we focus only on the design of the X <-> e BAM.

For the input layer of the X <-> e BAM (Fig. 1), we employ signum nonlinear activation functions

    sgn(x) = 1 if x >= 0, -1 otherwise.                                (1)

For the output layer we use unit-step nonlinear activation functions of the form

    u(x) = 1 if x >= 0, 0 otherwise.                                   (2)

Then, the overall bidirectional mapping that we propose to enforce is as follows:

    U(W X_i - T) = e_i, i = 1, ..., p                                  (3)

and

    sgn(W^T e_i) = X_i, i = 1, ..., p                                  (4)

where sgn(.) and U(.) are vector generalizations of the scalar signum and step function in (1) and (2), respectively, (.)^T denotes the transpose, W is a real p x n weight matrix to be determined, and T is a constant threshold vector in (0, 1)^p. To avoid any ambiguity in our presentation we fix the values sgn(0) = 1 and u(0) = 1, as in (1) and (2). Upon merging of the X <-> e BAM (Fig. 1) and the analogous e <-> Y BAM in the form of a three-layer feedforward network, the input X_i will associate to e_i while e_i will associate to Y_i, and vice versa, as seen in Fig. 2.
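For concreteness, the two-layer stage of (1)-(4) can be realized in a few lines. The following sketch is an illustration of ours in NumPy (function names are not from the paper); the weight matrix W and threshold vector T are assumed to have been designed as described in Sections III and IV:

import numpy as np

def sgn(x):
    # Signum of (1), applied componentwise: +1 for x >= 0, -1 otherwise.
    return np.where(np.asarray(x) >= 0, 1, -1)

def u(x):
    # Unit step of (2), applied componentwise: 1 for x >= 0, 0 otherwise.
    return np.where(np.asarray(x) >= 0, 1, 0)

def forward_X_to_e(W, T, x):
    # Forward mapping (3): e = U(W x - T); the signum input layer
    # hard-limits the input before the linear stage.
    return u(W @ sgn(x) - T)

def backward_e_to_X(W, e):
    # Backward mapping (4): x = sgn(W^T e).
    return sgn(W.T @ e)

With a properly designed pair (W, T), forward_X_to_e(W, T, X_i) returns the unit vector e_i and backward_e_to_X(W, e_i) returns X_i.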
III. DESIGN PRELIMINARIES

As a preliminary step toward the overall BAM design, in this section we identify a weight matrix W and a p-dimensional threshold vector T that satisfy the forward half of the desired mapping, i.e.,

    U(W X_i - T) = e_i, i = 1, ..., p                                  (5)

for a given set of prototype vectors X_1, ..., X_p.

Fig. 1. Two-layer feedforward network for the implementation of the bidirectional associations X_i <-> e_i, i = 1, ..., p.

We begin with a few important definitions and pertinent notation. The minimum Hamming distance between prototype X_i and any other prototype X_j, j != i, is denoted by

    d_i = min_{j != i} d_H(X_i, X_j), i = 1, ..., p                    (6)

where d_H(X, Z) denotes the Hamming distance between X and Z. In the sequel we use N_r(X_i) to denote a neighborhood of radius r of the prototype X_i, that is

    N_r(X_i) = {X in {-1, 1}^n : d_H(X, X_i) <= r}.                    (7)

We use D(X_i, X_j) to denote the set of bit positions (indices) where X_j differs from X_i, i.e.,

    D(X_i, X_j) = {k in K : X_{jk} != X_{ik}}                          (8)

where X_{ik} and X_{jk} are the k-th coordinates of X_i and X_j, respectively, and K = {1, 2, ..., n} is the index space.

To find W and T that satisfy (5) for the given prototype set, we follow the method in [17] and we set

    w_i = sum_{j=1}^{p} c_{ij} X_j, i = 1, ..., p                      (9)

where w_i^T denotes the i-th row of W (10), such that the corresponding threshold entries t_i of T are matched to the rows of W as in [17] (11). The coefficients c_{i1}, ..., c_{ip} are to be chosen in a way that W satisfies the following criterion.

Criterion 1: For every prototype X_j such that either X_i is not the nearest prototype to X_j, or the superscript i is not the largest among all nearest prototypes to X_j, the set of coefficients c_{i1}, ..., c_{ip} should satisfy w_i^T X_j - t_i < 0.
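The quantities in (6)-(8) are simple combinatorial computations. The sketch below is a minimal NumPy illustration of ours (function names are not from the paper):

import numpy as np

def hamming(x, z):
    # d_H(x, z): number of coordinates where the bipolar vectors differ.
    return int(np.sum(np.asarray(x) != np.asarray(z)))

def min_prototype_distance(X, i):
    # d_i of (6): minimum Hamming distance from prototype X[i] to any
    # other prototype in the p x n array X.
    return min(hamming(X[i], X[j]) for j in range(len(X)) if j != i)

def in_neighborhood(x, x_i, r):
    # Membership test for the radius-r neighborhood N_r(X_i) of (7).
    return hamming(x, x_i) <= r

def difference_set(x_i, x_j):
    # D(X_i, X_j) of (8): bit positions where X_j differs from X_i.
    return [k for k in range(len(x_i)) if x_i[k] != x_j[k]]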
Fig. 2. The three-layer feedforward BAM X_i <-> Y_i, i = 1, ..., p.

The radius weighted average of the coefficients in Criterion 1 is selected as in [17] (12). Therefore, the coefficient vector [c_{i1}, ..., c_{ip}] is of the general form (13), where the entries are weighting coefficients that satisfy the sign, normalization, and support conditions 1)-3) inherited from the radius weighted average of [17]. Let d~_H(X, Z) denote the modified Hamming distance measure defined in (14). The following proposition facilitates the algorithmic enforcement of Criterion 1.

Proposition 1: Let c be a vector in R^p that satisfies conditions 1)-3). If the Criterion 1 inequality fails for some prototype, then there exists an adjustment of an individual coefficient of c that restores the inequality for that prototype while preserving conditions 1)-3).

We are now ready to present an algorithmic procedure for the design of a set of coefficients c_{i1}, ..., c_{ip} that satisfy Criterion 1. Multiple copies of the algorithm can be run in parallel to obtain the coefficients for all i = 1, ..., p simultaneously. The algorithm makes use of the following basic definitions. For the given set of prototype patterns X_1, ..., X_p, the global identical index set G is defined as the set of indexes for which the corresponding coordinate value is identical across all prototypes

    G = {k in K : X_{1k} = X_{2k} = ... = X_{pk}}.                     (15)

The nonidentical index set as well as the local identical index sets are defined for each individual prototype pattern X_i. More specifically, let X_{(1)}, X_{(2)}, ..., X_{(p-1)} be a re-arrangement of the prototypes X_j, j != i, such that their modified Hamming distances from X_i appear in nondecreasing order (16), and let q be such that exactly X_{(1)}, ..., X_{(q)} attain the minimum. Thus, X_{(1)}, ..., X_{(q)} are the prototype patterns nearest to X_i with respect to the modified Hamming distance measure in (14). Then, the nonidentical index set and the local identical index sets, all pertinent to prototype X_i, are defined through the difference sets of (8): the nonidentical index set collects the positions where X_i differs from its nearest prototypes (17), while the local identical index sets collect, for each nearest prototype, the positions where the two patterns agree (18).
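As an illustration of the index-set machinery just defined, the following sketch of ours computes the global identical index set of (15) and the nearest-prototype set for a given prototype. For simplicity it substitutes the plain Hamming distance of (6) for the modified measure of (14), and the reading of (17) encoded in the last routine is an assumption on our part:

import numpy as np

def global_identical_set(X):
    # G of (15): indexes whose coordinate value is identical across all
    # prototypes in the p x n array X.
    X = np.asarray(X)
    return [k for k in range(X.shape[1]) if np.all(X[:, k] == X[0, k])]

def nearest_prototypes(X, i):
    # Indexes of the prototypes nearest to X[i], ties included; plain
    # Hamming distance is used here in place of the modified measure (14).
    X = np.asarray(X)
    d = [np.sum(X[i] != X[j]) if j != i else np.inf for j in range(len(X))]
    dmin = min(d)
    return [j for j in range(len(X)) if d[j] == dmin]

def nonidentical_set(X, i):
    # One plausible reading of (17): the union of the difference sets
    # D(X_i, X_j) over the prototypes X_j nearest to X_i (an assumption).
    X = np.asarray(X)
    idx = set()
    for j in nearest_prototypes(X, i):
        idx |= {k for k in range(X.shape[1]) if X[i, k] != X[j, k]}
    return sorted(idx)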
The following algorithm generates the required coefficients c_{i1}, ..., c_{ip} in (9) for some i in {1, 2, ..., p}.

Step 1) Set the initial coefficient values.
Step 2) Set the prototype counter. Set the coefficient index.
Step 3) Determine the prototypes nearest to the current pattern, then set the corresponding index sets. Let the candidate adjustment be the smallest value of the coefficients such that (a) the Criterion 1 inequality holds, (b) an adjustable coefficient and its index set are selected, and (c) conditions 1)-3) are preserved. If Criterion 1 is satisfied, then go to Step 5; else continue with Step 4.
Step 4) Assign the adjusted coefficient value. Solve the Criterion 1 inequality with respect to the selected coefficient. If a solution exists, then set the coefficient accordingly and continue with Step 5; else select the next coefficient and repeat Step 4.
Step 5) If all prototypes have been examined, then stop; else go to Step 3.

The above algorithm leads to coefficient designs that exhibit two important properties for every i: namely, the Criterion 1 inequality w_i^T X_j - t_i < 0 holds for every prototype X_j identified in Criterion 1, and conditions 1)-3) on the weighting coefficients are preserved.

In the next section we modify the weight matrix W and the p-dimensional threshold vector T to meet the special needs of a two-layer BAM that implements the bidirectional association X <-> e. As explained in Section II, the merging of the X <-> e BAM and the e <-> Y BAM will lead to the desired X <-> Y BAM.
IV. THE FEEDFORWARD BAM

We begin this section with an observation presented in the form of Theorem 1. The proof is straightforward and thus omitted.

Theorem 1: Let the weight matrix W, with rows w_1^T, ..., w_p^T, and the threshold vector T = [t_1, ..., t_p]^T be such that

    w_i^T X_i - t_i >= 0 > w_j^T X_i - t_j, j != i, with 0 < t_i < 1, i = 1, ..., p     (19)

and

    sgn(w_i) = X_i, i = 1, ..., p.                                     (20)

Then, (4) is satisfied, i.e.,

    sgn(W^T e_i) = X_i, i = 1, ..., p.                                 (21)
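Since Theorem 1 reduces perfect bidirectional recall to conditions that can be checked mechanically, a designed pair (W, T) can be validated by direct enumeration over the prototype set. The sketch below is our own illustrative checker, not part of the design procedure:

import numpy as np

def verify_recall(W, T, X):
    # Check the forward condition U(W X_i - T) = e_i and the backward
    # condition sgn(W^T e_i) = sgn(w_i) = X_i for every stored prototype.
    p = W.shape[0]
    for i in range(p):
        e = np.where(W @ X[i] - T >= 0, 1, 0)            # forward pass, cf. (3)
        if not (e[i] == 1 and e.sum() == 1):             # output must equal e_i
            return False
        if not np.array_equal(np.where(W[i] >= 0, 1, -1), X[i]):  # cf. (4), (20)
            return False
    return True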
Next, we modify the weight matrix and the threshold vector obtained in the previous section to meet the conditions of the above theorem. We see that (9) implies the coordinatewise expression (22) for the entries of W. Since (20) does not hold automatically in general, we can assume that some entries of W must be sign-corrected. We can modify the weight matrix and the threshold vector to satisfy (20) as follows: every entry w_{ik} whose sign disagrees with the corresponding prototype bit X_{ik} is replaced according to (23), while the thresholds receive the matching correction (24). We can show that the classification property (5) is preserved under this modification; therefore, (19) continues to hold up to threshold scaling. To satisfy the "less than one" threshold condition in (19), we normalize the modified weight matrix and threshold vector and we obtain the final BAM design (25), (26) that implements the bidirectional association X <-> e described by (3) and (4).

The following theorem demonstrates the high built-in tolerance of our BAM design in the X -> e direction. The proof is included in the Appendix.

Theorem 2: For a weight matrix W and a threshold vector T designed by (25) and (26), respectively:
1) if d_H(X, X_i) <= floor((d_i - 1)/2), where d_i is the minimum Hamming distance of (6), then U(W X - T) = e_i;
2) there exists a pattern X with d_H(X, X_i) = floor((d_i - 1)/2) + 1 such that U(W X - T) != e_i.

We note that the threshold in the e-layer is a constant vector independent of the prototype patterns. Thus, we may design the e <-> Y BAM in the same way as the X <-> e BAM and with the exact same threshold vector. If we merge the two two-layer BAM's into one three-layer feedforward network, we obtain the BAM of Fig. 2. This synthesized network exhibits the robust tolerance characteristics described in the following theorem. The proof follows directly from Theorems 1 and 2.

Theorem 3: Let the weight matrix W for the X <-> e layer and the weight matrix V for the e <-> Y layer be designed in the same manner according to (25). Then, we have the following desirable properties for the X -> Y association, and vice versa:
1) if d_H(X, X_i) <= floor((d_i - 1)/2), then the network associates X to Y_i;
2) there exists a pattern X with d_H(X, X_i) = floor((d_i - 1)/2) + 1 such that the network does not associate X to Y_i.

In comparison with other conventional BAM's, this newly proposed BAM system exhibits a series of desirable properties. The attractive radius of each prototype pattern attains the maximum possible value, as shown in Part 1 of Theorem 3. Moreover, there is no system-induced cap on the number of prototype pairs that the network can memorize: the storage capacity is 2^min(n,m). It is also important to note that no data constraints are imposed on the prototype pairs and the prototype patterns may be arbitrarily correlated. Last but not least, the design process is a noniterative procedure of low computational complexity. For each bidirectionally stored pair, we need one hidden layer (e-layer) neuron, n + m weights, and one threshold parameter. At the operational stage the proposed BAM acts as a rapid one-shot parallel feedforward system.
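At the operational stage, recall in the merged network of Fig. 2 is a single feedforward pass in each direction. The following sketch of ours makes the computation explicit, with W, Tx for the X <-> e stage and V, Ty for the e <-> Y stage (the names are assumptions, not the paper's notation):

import numpy as np

def recall_forward(W, Tx, V, x):
    # X -> e -> Y recall through the hidden e-layer of Fig. 2.
    e = np.where(W @ x - Tx >= 0, 1, 0)      # X -> e stage, cf. (3)
    return np.where(V.T @ e >= 0, 1, -1)     # e -> Y stage, cf. (4)

def recall_backward(V, Ty, W, y):
    # Y -> e -> X recall: the same computation in the reverse direction.
    e = np.where(V @ y - Ty >= 0, 1, 0)
    return np.where(W.T @ e >= 0, 1, -1)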
V. SIMULATIONS AND COMPARISONS

The performance of an associative memory is usually measured in terms of storage capacity and noise tolerance. Since our proposed network design can accommodate/memorize any given number of pattern associations, our performance evaluation studies focus on noise tolerance. We use as a case-study the familiar set of 26 IBM PC CGA character pairs of pixel size 7 x 7 as in [13] and [16] (Fig. 3). For network processing, each one of the 52 prototypes is vectorized row by row and black pixels are coded by +1, while white pixels are coded by -1. Thus, each pattern is represented by a 49-dimensional bipolar vector (n = m = 49 in Fig. 2).

Fig. 3. A prototype set with the 26 letter pairs (IBM PC CGA character set).

Experiment 1: We use the 26 upper-to-lower-case letter associations to compare the noise tolerance of the proposed BAM model with that of the general BAM model (GBAM) in [16] and the ABAM model in [15]. A maximum allowable number of 1000 epochs is used for both the GBAM and the ABAM network. No comparison with the BAM in [6] is made since the storage capacity of the latter is too low to memorize all 26 pairs of associations. In this experiment the required storage ratio is 26/49, approximately 0.53. The minimum Hamming distances d_i, i = 1, ..., 26, among all X-layer prototypes, defined by (6), and the minimum Hamming distances among all Y-layer prototypes are listed in Table I.

TABLE I
MINIMUM HAMMING DISTANCES FOR X-LAYER AND Y-LAYER PROTOTYPES

To study the effects of bit-inverting disturbances, for each prototype pattern in either direction and for each radius r we generate exhaustively all patterns in N_r that are at a Hamming distance r with respect to the prototype pattern of interest. Then, these patterns are fed to each BAM and the successful association ratio is calculated. The results are shown in Fig. 4. Here, the weighting coefficients in (13) are selected for the proposed BAM as described in Section III. We see that all three algorithms memorize all prototype associations successfully. However, only one prototype pair is an asymptotic equilibrium for the ABAM network and this explains its failure for r >= 1. We also note that the Hamming distance between two of the prototype patterns is two. Thus, the GBAM network is unable to converge (2000 training epochs were used here) since it requires a minimum of one-bit tolerance among prototype patterns.

Fig. 4. Recognition accuracy rate versus Hamming distance for Experiment 1.
Experiment 2: To illustrate the theoretical claim that in the proposed algorithm the attractive radius of each prototype reaches the maximum possible value, we examine a prototype pair whose minimum Hamming distance, as we observe in Table I, is 17; therefore, its maximum attractive radius is eight. We reverse pixels of this prototype and we test all 49-choose-r possible patterns at each bit-inverting noise level r = 1, ..., 8. As previously claimed, the proposed network is shown to exhibit a 100% noise recovery rate (Fig. 5). The ABAM network cannot associate correctly any noisy pattern, while the GBAM recognition accuracy is less than 20% at the maximum radius r = 8.

Fig. 5. Recognition accuracy for noisy versions of the prototype with maximum attractive radius 8.

Experiment 3: Next, we expand the prototype set in Experiment 1 to create a set of 52 prototype pairs of the form (X_i, Y_i) and (Y_i, X_i), i = 1, ..., 26. As we see, in this case the required storage ratio is 52/49 and the number of prototypes is larger than the pattern dimension. The minimum Hamming distances among all prototypes in the X-layer are of course the same as for the Y-layer prototypes and are listed in Table II.

TABLE II
MINIMUM HAMMING DISTANCES FOR THE 52 LETTER PATTERNS IN FIG. 3

As in Experiment 1, for each prototype pattern in either direction and for each radius r we generate exhaustively all patterns in N_r that are at a Hamming distance r with respect to the prototype of interest. Then, these patterns are fed to each BAM and the successful association ratio is calculated for each r (Fig. 6). Here, the weighting coefficients in (13) are again selected for the proposed BAM as described in Section III. The ABAM network can memorize successfully only two out of the 52 prototype pairs and, in fact, only one of the two is an asymptotic equilibrium. The GBAM network memorizes successfully 8 out of the 52 prototype pairs, but the GBAM accuracy ratio drops well below 1% for r >= 1.

Fig. 6. Recognition accuracy rate versus Hamming distance for Experiment 3.
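The bit-inverting disturbance tests above can be reproduced along the following lines; the sketch is our own harness around the recall routine of Section IV, and exhaustive enumeration is practical only for small r:

import itertools
import numpy as np

def association_ratio(recall, X, Y, i, r):
    # Feed every pattern at Hamming distance exactly r from prototype X[i]
    # to the memory and report the fraction of successful associations.
    x_i = np.asarray(X[i])
    n = len(x_i)
    hits = total = 0
    for flip in itertools.combinations(range(n), r):
        x = x_i.copy()
        x[list(flip)] *= -1                  # invert r bits (+1 <-> -1)
        hits += int(np.array_equal(recall(x), Y[i]))
        total += 1
    return hits / total

For example, association_ratio(lambda x: recall_forward(W, Tx, V, x), X, Y, i, r) produces one point of a curve such as those in Figs. 4 and 6.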
VI. CONCLUSIONS

In this work we developed a feedforward BAM network for recording bidirectional associations of the form X_i <-> Y_i, i = 1, ..., p, where X_i and Y_i are arbitrary bipolar vectors in {-1, 1}^n and {-1, 1}^m, respectively. The overall design procedure is based on a one-shot algorithm for the implementation of the bidirectional associations X_i <-> e_i (and e_i <-> Y_i), i = 1, ..., p, where {e_1, ..., e_p} is the standard unit vector basis of R^p. Direct merging of the two BAM's creates the desired bidirectional associations in the form of an n-p-m feedforward network. The overall design computational complexity is low and the design procedure is fully scalable to accommodate any given number p <= 2^min(n,m) of bidirectional associations.

Our theoretical analysis showed that the newly proposed feedforward BAM guarantees perfect bidirectional recall for arbitrarily correlated patterns. Every prototype pair attains the maximum possible attractive radius in the Hamming distance sense. The latter statement quantifies the inference potential of the network for operation on noisy data. The simulation studies and comparisons of Section V on the familiar 7 x 7 pixel character set serve as a simple illustration.

APPENDIX
PROOF OF THEOREM 2

We note that for every X in {-1, 1}^n, the quantity w_i^T X - t_i can be expressed through the difference set D(X_i, X) of (8) as in (27).
If X lies in the neighborhood N_r(X_i) with r = floor((d_i - 1)/2), then the bound in (28) follows. In addition, if X lies in this neighborhood, then X_i is the nearest prototype to X with the largest superscript (i) among all prototypes nearest to X. We conclude from Criterion 1 that (29) holds and, applying (23) and (24), we see that (30) holds as well, with the complementary inequality following from (28). Therefore, U(W X - T) = e_i. We conclude that every X in N_r(X_i) is associated to e_i, and this completes the proof of Part 1.

For Part 2, we apply (30) to a pattern X at Hamming distance floor((d_i - 1)/2) + 1 from X_i and we obtain (31). Hence, due to Criterion 1, some other prototype is the nearest prototype to such an X with the largest superscript among all nearest prototypes, and the corresponding inequalities follow from (31). We conclude that U(W X - T) != e_i for this pattern, and this completes the proof of Part 2.
REFERENCES

[1] K. Steinbuch, "Die Lernmatrix," Kybernetik, vol. 1, pp. 36–45, 1961.
[2] T. Kohonen, "Correlation matrix memories," IEEE Trans. Comput., vol. 21, pp. 353–359, 1972.
[3] T. Kohonen, Associative Memory: A System-Theoretical Approach. New York: Springer-Verlag, 1977.
[4] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Nat. Acad. Sci., vol. 79, no. 5, pp. 2554–2558, 1982.
[5] J. J. Hopfield, "Neurons with graded response have collective computational properties like those of two-state neurons," Proc. Nat. Acad. Sci., vol. 81, pp. 3088–3092, 1984.
[6] B. Kosko, "Bidirectional associative memories," IEEE Trans. Syst., Man, Cybern., vol. 18, pp. 49–60, Jan. 1988.
[7] B. Kosko, "Adaptive bidirectional associative memories," Appl. Opt., vol. 26, no. 23, pp. 4947–4960, 1987.
[8] Y. F. Wang, J. B. Cruz, Jr., and J. H. Mulligan, Jr., "Two coding strategies for bidirectional associative memory," IEEE Trans. Neural Networks, vol. 1, pp. 81–92, Mar. 1990.
[9] Y. F. Wang, J. B. Cruz, Jr., and J. H. Mulligan, Jr., "On multiple training for bidirectional associative memory," IEEE Trans. Neural Networks, vol. 1, pp. 275–276, Sept. 1990.
[10] H. Kang, "Multilayer associative neural network (MANN): Storage capacity versus perfect recall," IEEE Trans. Neural Networks, vol. 5, pp. 812–822, Sept. 1994.
[11] M. H. Hassoun, "Dynamic hetero-associative neural memories," Neural Networks, vol. 2, pp. 275–287, 1989.
[12] X. Zhang, Y. Huang, and S.-S. Chen, "Better learning for bidirectional associative memory," Neural Networks, vol. 6, no. 8, pp. 1131–1146, 1993.
[13] H. Oh and S. C. Kothari, "Adaptation of the relaxation method for learning in bidirectional associative memory," IEEE Trans. Neural Networks, vol. 5, pp. 576–583, July 1994.
[14] P. K. Simpson, "Higher-ordered and interconnected bidirectional associative memories," IEEE Trans. Syst., Man, Cybern., vol. 20, pp. 637–653, May/June 1990.
[15] Z.-B. Xu, Y. Leung, and X.-W. He, "Asymmetric bidirectional associative memories," IEEE Trans. Syst., Man, Cybern., vol. 24, pp. 1558–1564, Oct. 1994.
[16] H. Shi, Y. Zhao, and X. Zhuang, "A general model for bidirectional associative memories," IEEE Trans. Syst., Man, Cybern. B, vol. 28, pp. 511–519, Aug. 1998.
[17] Y. Wu and S. N. Batalama, "Improved one-shot learning for feedforward associative memories with application to composite pattern association," IEEE Trans. Syst., Man, Cybern. B, to be published.
Yingquan Wu was born in Jiangshang, China, on August 13, 1974. He received the B.S. and M.S. degrees in mathematics from the Harbin Institute of Technology, Harbin, China, in 1995 and 1997, respectively. From 1997 to 1998 he was a Ph.D. student in the Department of Mathematics, State University of New York at Buffalo. Since 1998, he has been pursuing graduate studies in electrical engineering at the State University of New York, Buffalo. His research interests are in the areas of information theory, pattern recognition, and neural networks.

Dimitris A. Pados (M'95) was born in Athens, Greece, on October 22, 1966. He received the diploma degree in computer engineering and science from the University of Patras, Patras, Greece, in 1989 and the Ph.D. degree in electrical engineering from the University of Virginia, Charlottesville, in 1994. He was an Applications Manager at the Digital Systems and Telecommunications Laboratory, Computer Technology Institute, Patras, Greece, from 1989 to 1990. From 1990 to 1994 he was a Research Assistant in the Communications Systems Laboratory, Department of Electrical Engineering, University of Virginia, Charlottesville. From 1994 to 1997 he was an Assistant Professor in the Department of Electrical and Computer Engineering and the Center for Telecommunications Studies, University of Louisiana, Lafayette. Since August 1997, he has been an Assistant Professor with the Department of Electrical Engineering, State University of New York, Buffalo. His research interests are in the areas of communications theory, adaptive signal processing, and neural networks.