IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 12, NO. 3, MAY 2001


Distributed Fuzzy Learning Using the MULTISOFT Machine

Marco Russo, Member, IEEE

Abstract—This paper describes PARGEFREX, a distributed approach to genetic-neuro-fuzzy learning which has been implemented using the MULTISOFT machine, a low-cost farm of personal computers built at the University of Messina. The performance of the serial version is hugely enhanced with the simple parallelization scheme described in the paper. Once a learning dataset is fixed, there is a very high super-linear speedup in the average time needed to reach a prefixed learning error, i.e., if the number of personal computers increases by a factor of $n$, the mean learning time becomes less than $1/n$ of the original.

Index Terms—Distributed computing, evolutionary computing, fuzzy logic, neural networks (NNs), supervised learning.

I. INTRODUCTION

TRADITIONAL techniques of system modeling can have difficulties in dealing with multiinput–multioutput (MIMO) systems using mathematical models, above all when the structure of the system is unknown. Sometimes the structure is known, but the available data are imprecise or uncertain. This often implies irrelevant or computationally inefficient numerical representations. In these cases soft computing (SC) approaches are generally more suitable. Neural networks (NNs), evolutionary computing (EC), and fuzzy logic (FL) are consolidated techniques for MIMO modeling starting from significant data sets. Among them, the FL model representation has two properties that are often very interesting from the system modeler's point of view: a human-like model representation and a very low computational requirement. Hybrid approaches, i.e., the aggregation of two or more of the above-mentioned techniques, have demonstrated a greater learning capability [1]–[9].

System modeling in SC requires a learning phase during which the input data set is used to extract the parameters of the model. This phase is typically very intensive. To reduce the learning time there are two different approaches: digital or analog hardware dedicated to learning [10], [11], or distributed learning [12]–[14]. Both supervised and unsupervised distributed learning are widely studied in the literature. Mainly there are two categories of distributed learning: customized parallel hardware solutions [15] and approaches based on general-purpose processors interconnected in different ways. In this latter category, we find

multiprocessor solutions, mainly based on shared-memory data exchange, and multicomputer solutions, generally based on message-passing communication. Multiprocessors have more bandwidth in the interconnection network than multicomputers communicating via message passing. However, the second case is generally cheaper. As regards distributed learning, it is typical to have a situation where the speedup saturates when only a few processing elements (PEs) are used, even if a high-bandwidth interconnection network is available [16]–[19].

This paper deals with PARGEFREX, the distributed version of GEFREX [7], a hybrid genetic-neuro tool for MIMO supervised learning that extracts fuzzy models. The main characteristic of GEFREX is that it is able to find very compact fuzzy systems in comparison with other techniques. It can be used for approximation, classification, and time-prediction tasks. The learning procedure is mainly based on a genetic algorithm (GA). A neuro-algorithm is used to accelerate the overall method.

In PARGEFREX a coarse-grain parallelism was implemented: the whole population of solutions is divided into autonomous subpopulations. In genetic programming (GP), and in general in EC, this approach implies a decrease in the amount of processing [20]: a single population of size $n \cdot m$ requires more iterations to reach the same performance as $n$ subpopulations each with $m$ individuals, if the subpopulations sometimes exchange solutions with each other. This behavior is consistent with the analysis of Wright [21] regarding demes of biological populations. This effect is amplified if each subpopulation has a sufficient number of individuals. It is shown in [6] that larger populations obtain better results than smaller ones.

In [22] some very preliminary results regarding the parallelization of GEFREX were presented. In that period seven computers with one Pentium II processor with a clock speed of 233 MHz and one server [a computer with two Pentium II processors (300 MHz)] were available. The aim of that work was only to understand whether GEFREX could be parallelized with significant results without spending too much time on the implementation. So, the first parallel version was very simple: data exchange took place through shared files saved on the hard disk of the server. The operating system (OS) was Windows NT. Although the implementation was very simple and the simulations were not very accurate, the improvement obtainable with the parallelization of GEFREX was evident.

The results presented here refer to the final implementation of PARGEFREX. A message-passing paradigm has been used for data exchange among computers. The hardware platform is the MULTISOFT machine and the OS is LINUX. The paper is organized as follows:

• Section II briefly describes the MULTISOFT machine;



• Section III describes PARGEFREX;
• Section IV deals with the performance evaluation of PARGEFREX;
• Section V contains a comparative discussion regarding related works;
• Section VI gives the author's conclusions and future work.

II. THE COMMODITY SUPERCOMPUTER

Fig. 1 shows the commodity supercomputer implemented at the University of Messina. We named it the MULTISOFT machine. At the moment there are 32 PEs. There are both PEs based on one Celeron (366 MHz), indicated in the following with c366, and PEs with a single Pentium II (233 MHz), indicated with p233. In particular, there are 25 c366s and 7 p233s. Each PE has one local hard disk, a fast-Ethernet card, and 32 MByte of RAM. For various reasons one p233 and three c366s have not been available in this experiment. There are also two servers: one with two Pentium II processors (300 MHz) and one with only one (350 MHz). The servers have two Ethernet cards: one to connect to the university LAN and another to the local fast-Ethernet LAN. There is one 12-port switch and four 12-port hubs. The OS is LINUX. The message-passing communication paradigm has been used; the parallel virtual machine (PVM)1 served this purpose.

1 Available at http://www.epm.ornl.gov/pvm/pvm_home.html

III. PARGEFREX DESCRIPTION

This section is organized into three different subsections. First, the representation of the fuzzy systems trained by PARGEFREX is described. Then, there is a short description of GEFREX, the serial version of PARGEFREX. Last, there is a detailed explanation of the PARGEFREX implementation.

A. Fuzzy Representation

GEFREX deals with MIMO fuzzy systems with $R$ rules, $I$ inputs, and $O$ outputs. The generic $r$th rule has the following form:

IF $x_1$ is $A_{r1}$ AND $\cdots$ AND $x_I$ is $A_{rI}$ THEN $y_1$ is $s_{r1}$ AND $\cdots$ AND $y_O$ is $s_{rO}$.

Fig. 1. Schematic of the MULTISOFT machine.

Each premise contains a maximum of $I$ antecedents and each conclusion at most $O$ consequents. $x_i$ is the $i$th input and $y_o$ is the $o$th output. In the generic antecedent there is the crisp input $x_i$ and the fuzzy set (FS) $A_{ri}$, which is generally different from all the others. The membership function (MF) of $A_{ri}$ has a Gaussian shape, so

$\mu_{ri}(x_i) = \exp\left[-\left(\dfrac{x_i - c_{ri}}{\gamma_{ri}}\right)^2\right]$ (1)

where $c_{ri}$ and $\gamma_{ri}$ are the center and the width of the Gaussian.

All the connectors in the premise are ANDs. The algebraic minimum is used for this operator. Therefore, given the inputs, the expression of the degree of truth $\theta_r$ of the $r$th rule is the minimum of the activation degrees of the antecedents:

$\theta_r = \min_{i} \mu_{ri}(x_i).$ (2)

In the conclusion there are no FSs but only singletons $s_{ro}$, i.e., they represent FSs which always have a zero degree of membership except in one point, where the degree of membership is one. Two different versions of the defuzzification method have been implemented: the weighted mean (WM) [6]

$y_o = \dfrac{\sum_{r=1}^{R} \theta_r s_{ro}}{\sum_{r=1}^{R} \theta_r}$ (3)

and the weighted sum (WS) [24]

$y_o = \sum_{r=1}^{R} \theta_r s_{ro}.$ (4)
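Purely as an illustration of the representation (1)–(4) (this sketch is not from the paper; the function and variable names are hypothetical), a fuzzy system of this kind can be evaluated in a few lines of Python:

    import numpy as np

    def infer(x, centers, widths, singletons, method="WM"):
        """Evaluate one input vector x with an R-rule, I-input, O-output
        fuzzy system of the form described above.

        centers, widths : (R, I) arrays with the Gaussian parameters of eq. (1)
        singletons      : (R, O) array with the output singletons s_ro
        """
        # Eq. (1): Gaussian membership degree of every antecedent.
        mu = np.exp(-((x - centers) / widths) ** 2)   # shape (R, I)
        # Eq. (2): degree of truth of each rule = min over its antecedents.
        theta = mu.min(axis=1)                        # shape (R,)
        # Eq. (4): weighted sum of the singletons.
        ws = theta @ singletons                       # shape (O,)
        if method == "WS":
            return ws
        # Eq. (3): weighted mean.
        return ws / theta.sum()

    # Toy usage: 2 rules, 1 input, 1 output (values are arbitrary).
    centers = np.array([[0.2], [0.8]])
    widths = np.array([[0.3], [0.3]])
    singletons = np.array([[0.0], [1.0]])
    print(infer(np.array([0.5]), centers, widths, singletons, "WM"))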

B. A Brief Review Regarding GEFREX

GEFREX is a hybrid genetic–neuro tool for MIMO supervised learning. It can be used for approximation, classification, and time-prediction tasks. Moreover, it is able to identify significant features. The learning procedure is mainly based on a GA. A neuro-algorithm is used to accelerate the overall method. The genetic coding involves only the premises of the fuzzy rules. No data have to be coded for the conclusions of the fuzzy rules.

The learning phase is divided into two parts: initialization and iteration. In the initialization phase the genetic population is


randomly generated and evaluated. The learning dataset is examined during this phase. For each input the range of possible values is determined. The MFs are randomly arranged within these ranges. The iteration part consists of a number of iterations where each time a new fuzzy system is generated using customized genetic operators, evaluated, and placed in the place of the worst individual. These iterations end when the termination criterion is met. Both in the initialization and iteration phases, when the best individual found so far improves, a specialized neuro-operator is called. This operator is very time-consuming and consists in the transformation of the fuzzy individual into a neuro-fuzzy system. After, this system is trained. Finally, it is retransformed into a genetic individual and replaces the original fuzzy individual. As the population is randomly generated at the beginning, the neuro-operator is invoked often at first; afterwards, it is used much less. In the following we will distinguish between the two parts described above, calling them, respectively, neuro phase and genetic phase.

In the following some details regarding the genetic and neural components of GEFREX are given. For more details on GEFREX and its ability to learn refer to [7].

1) The Genetic Part of GEFREX: Each genetic iteration consists in the choice of a subpopulation. From the individuals belonging to the subpopulation two parents are selected. After, only one offspring is generated. The offspring replaces the individual in the subpopulation with the lowest fitness. Whenever a new individual is generated with a fitness value (the fitness must be maximized) higher than the best one obtained so far, a hill-descending operator is used, too. It is the neural part of GEFREX.

The genetic coding of GEFREX has a mixed nature. Each individual contains both real genes and binary ones. The real genes regard the Gaussians of the antecedents of the fuzzy rules. For each Gaussian the two parameters $c_{ri}$ and $\gamma_{ri}$ are needed. So, if the fuzzy system to be trained has $R$ rules and $I$ inputs, its real coding requires $2RI$ real numbers. The binary coding requires far fewer bits; these bits are used when the feature-selection option is enabled during the learning phase. No coding is needed for the output singletons. In fact, given the antecedents (obtained from an offspring), we can calculate the degrees of truth of the rules. These values, the desired output values, and the output singletons are sufficient to build an over-determined linear system (see Section III-G in [7] for details).

Regarding the binary coding, standard multicut crossover and uniform mutation are used. The real coding is dealt with in a different way. The crossover consists of a random, weighted mean of the two parents, i.e., all the values of the real genes of the offspring lie between the values of the corresponding genes of the two parents in a random way. The mutation is of a multiplicative nature: each real gene is multiplied by a value around 1.0.
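As a sketch of the real-coded operators just described (the constants and names below are illustrative choices, not values taken from GEFREX):

    import numpy as np

    rng = np.random.default_rng(0)

    def crossover(parent_a, parent_b):
        """Random weighted mean: every real gene of the offspring lies
        between the corresponding genes of the two parents."""
        w = rng.random(parent_a.shape)          # independent weight per gene
        return w * parent_a + (1.0 - w) * parent_b

    def mutate(genes, rate=0.05, spread=0.1):
        """Multiplicative mutation: each selected real gene is multiplied
        by a random value close to 1.0."""
        mask = rng.random(genes.shape) < rate
        factors = 1.0 + spread * rng.standard_normal(genes.shape)
        return np.where(mask, genes * factors, genes)

    # Toy usage: two parents holding the (center, width) genes of 5 rules x 1 input.
    a = rng.random((5, 2))
    b = rng.random((5, 2))
    child = mutate(crossover(a, b))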


2) The Neural Part of GEFREX: To improve the performance of the genetic algorithm, a hill-climbing operator was introduced (in [6] the improvement reachable with a similar approach is shown). Initially, the individual selected is transformed into a neuro-fuzzy system. That is, the equivalent neural network is built and all the trainable weights are initialized. Then, the system is trained. In this phase it is possible that some neurons are deleted, too. Finally, the trained system is retransformed into a genetic individual and introduced into the genetic grid in the place of the starting individual selected.

The neural network is trained with the gradient-descent method. GEFREX automatically adjusts the learning rate and proceeds until the error reaches a local minimum. Also in the neuro-fuzzy system the best output singletons are found by solving an over-determined linear system.

Fig. 2 illustrates the neural architecture. For simplicity, there is a neuro-fuzzy system with only two inputs, one output, and two rules. When the WS is assumed (as in Fig. 2) the neuro-fuzzy system has four layers:

• The first is the input layer and it has a maximum of $I$ neurons.
• The second layer is the fuzzification one. Here, the activation degrees $\mu_{ri}$ of the antecedents are calculated. The weights that can be learned are the centers $c_{ri}$ and the widths $\gamma_{ri}$. The total number of neurons is at maximum $R \cdot I$.
• After, there is the inference layer. It calculates the degrees of truth $\theta_r$ of the rules. It has $R$ neurons and weights fixed to one.
• Lastly, there is the linear output layer in which the WS defuzzification is calculated. The weights that can be learned are the $s_{ro}$ values. The output neurons are $O$.

a) Gradient-descent formulas: The partial derivatives of the error with respect to the two Gaussian coefficients $c_{ri}$ and $\gamma_{ri}$ take different forms depending on whether the WS or the WM defuzzification is used [(5)–(7)].

3) A Training Example: Fig. 3 shows the 40 learning patterns of a single-input–single-output (SISO) system that were already used in [22]. It consists of equally spaced samples of two triangles, one next to the other. As Gaussian MFs were used, the function to be learned is complex enough if low learning errors must be reached. For example, roughly 10 s are required on a computer based on a Pentium II with a clock speed of 350 MHz to reach a root mean squared normalized error (RMNSE) [6], [7] equal to 3.4% with five rules. The RMNSE is defined as follows:

RMNSE $= \sqrt{\dfrac{1}{P\,O}\sum_{p=1}^{P}\sum_{o=1}^{O}\left(\dfrac{y_{po} - t_{po}}{\Delta_o}\right)^2}$ (8)

Fig. 2. The neuro-fuzzy system. This system consists of two inputs and one output; WS defuzzification is used.

where
$P$ is the number of learning patterns;
$y_{po}$ and $t_{po}$ are the calculated and desired $o$th outputs of the $p$th learning pattern, respectively;
$\Delta_o$ indicates the difference between the maximum and minimum values that the $o$th output can assume.

The RMNSE indicates how much each output differs on average from the desired value with regard to its range. In the following it will simply be called error and indicated with $\epsilon$.

The same figure shows the learned curve. The corresponding RMNSE is 3.22%. GEFREX was used to learn the function fixing a population of 400 individuals, Gaussian fuzzy sets, five rules, and the WM as the defuzzification method. This example fits well in a PC with 32 MByte of RAM (like those belonging to the MULTISOFT machine), i.e., less than this amount of memory is needed to contain the whole population. Fig. 4 shows the fuzzy system learned that was used to plot the curve in Fig. 3. There are five rules, one for each row. In the first column there is the input MF related to each rule and in the second column there is the output singleton. A linguistic representation of the fuzzy rules could be the one shown in Table I. In the following, in all simulations the same type of MFs, number of individuals, defuzzification method, and number of rules were fixed.
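The fitness evaluation path just described, i.e., computing the output singletons from an over-determined linear system and then measuring the RMNSE (8), could be sketched as follows (an illustrative reading with hypothetical names and toy data, not the paper's code):

    import numpy as np

    def rmnse(y_pred, y_true):
        """Root mean squared normalized error, eq. (8): the deviation of each
        output is normalized by the range of that output over the dataset.
        Multiply by 100 for the per-cent value used in the paper."""
        delta = y_true.max(axis=0) - y_true.min(axis=0)   # range of each output
        return np.sqrt(np.mean(((y_pred - y_true) / delta) ** 2))

    def best_singletons(theta, y_true):
        """Given the degrees of truth theta (P patterns x R rules) produced by
        the rule premises, the singletons minimizing the squared error of the
        WS output (4) solve the over-determined system theta @ S = y_true."""
        S, *_ = np.linalg.lstsq(theta, y_true, rcond=None)
        return S                                          # shape (R, O)

    # Toy usage: 40 patterns, 5 rules, 1 output, random activations.
    rng = np.random.default_rng(1)
    theta = rng.random((40, 5))
    y_true = rng.random((40, 1))
    S = best_singletons(theta, y_true)
    print(rmnse(theta @ S, y_true))

For the WM defuzzifier the rows of theta would first be divided by their sums; the neuro-operator of Section III-B.2 would additionally refine the Gaussian centers and widths by gradient descent, re-solving the same linear system for the singletons.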

C. PARGEFREX

Starting from the encouraging preliminary results obtained in the previous work [22], we realized a parallel version of GEFREX very suitable for the MULTISOFT machine. The policy adopted in the PARGEFREX realization is simple. There is a master process that spawns a number of slaves. Then, every slave starts to work and exchanges data with the other slaves. At the end the master collects the results from the slaves and kills them.

Fig. 3. The 40 learning patterns and the learned function with RMNSE = 3.22%. The fuzzy system obtained is shown in Fig. 4.

In fact, there are not two different programs, one for the slaves and another for the master. There is only one program that detects whether it has been spawned by another copy of itself or not. If not, it acts as the master; otherwise it is a slave. When the master starts it reads a configuration file which indicates how many PEs must be used. Moreover, for each PE this file specifies how many slaves must be run. After reading the configuration file the master spawns all the slaves, sends them initialization information, and waits for results from the slaves. After the slaves run and their initialization phase ends, they start to exchange information among themselves, excluding the master process. The data exchanged is the best individual found so far by a slave, which is sent to another slave. Slaves send their best individual at random times, but the average sending time is user definable. The receiving slave is randomly chosen, too. The individual received replaces the worst one in the local population.
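The migration policy can be pictured with the following sketch of a slave's main loop. The real implementation uses PVM and the GEFREX engine; here mpi4py and a toy population stand in for them, and all names and constants are illustrative:

    # Hypothetical sketch of a slave's main loop (run e.g. with mpiexec -n 4).
    # Rank 0 plays the master; the real system uses PVM and the GEFREX engine.
    import random, time
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    TAG_MIGRATE = 7
    AVG_SEND_PERIOD = 1.0          # user-definable average sending time (s)

    if rank != 0:                  # slaves: ranks 1 .. size-1
        pop = [np.random.rand(10) for _ in range(400)]     # stand-in individuals
        fitness = lambda ind: -np.sum((ind - 0.5) ** 2)    # toy fitness (maximized)
        next_send = time.time() + random.expovariate(1.0 / AVG_SEND_PERIOD)
        t_end = time.time() + 3.0                          # CT-like stopping time
        while time.time() < t_end:
            # ... one genetic iteration on 'pop' would go here ...
            # Non-blocking check for an immigrant from any other slave.
            if comm.iprobe(source=MPI.ANY_SOURCE, tag=TAG_MIGRATE):
                immigrant = comm.recv(source=MPI.ANY_SOURCE, tag=TAG_MIGRATE)
                worst = min(range(len(pop)), key=lambda i: fitness(pop[i]))
                pop[worst] = immigrant                     # replace the worst one
            # At random times send the best individual to a random slave.
            if time.time() >= next_send and size > 2:
                best = max(pop, key=fitness)
                dest = random.choice([r for r in range(1, size) if r != rank])
                comm.send(best, dest=dest, tag=TAG_MIGRATE)
                next_send = time.time() + random.expovariate(1.0 / AVG_SEND_PERIOD)
        comm.send(max(pop, key=fitness), dest=0, tag=0)    # final result to master
    else:                          # master: collect the best individual of each slave
        results = [comm.recv(source=s, tag=0) for s in range(1, size)]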

Fig. 4. The five fuzzy rules obtained to draw the curve in Fig. 3.

TABLE I. LINGUISTIC REPRESENTATION OF THE RULES SHOWN IN FIG. 4

Several stopping criteria have been implemented, as follows.
CI: Number of iterations. When all the slaves reach a prefixed number of iterations they send their best individual to the master. The master collects these individuals, retains the best one, kills all the slaves, and ends.
CT: Simulation time. When all the slaves have run for a prefixed amount of time they send their best result to the master. Then the master behaves as described above.
CE: Final error. The first slave that reaches a prefixed error sends its best individual to the master. The master kills the other slaves and ends.

Fig. 5 shows the real behavior of PARGEFREX. The graph was obtained with the X-Windows-based graphical console and monitor for PVM (XPVM).3 The master is launched on the seventh p233 (p233h07 in the figure), and three slaves are launched, respectively, on the first, second, and third p233 of the MULTISOFT machine (p233h01, p233h02, and p233h03 in the figure). The run lasts for about 4 s. The stopping criterion was CT with the simulation time fixed to 3 s. First, there is the initialization phase where the slaves are spawned and initial information is sent to them. After, there is a synchronization phase. This phase is optional and has been implemented only for speedup evaluation. In this phase the slaves and the master synchronize their clocks, so they can start simultaneously. The beginning time is sent from the master to the slaves in the previous phase. This solution has been adopted because, when many PEs are involved in the simulation, it is practically impossible to use the network to send a message that tells all PEs to start at the same instant. In fact, in PVM there is no real broadcast function, but only a simulated one: when a broadcast message is sent, as many identical messages are sent as there are receivers. So, when there are several PEs the start message reaches the first slave much sooner than it reaches the last one. Obviously the implemented synchronizing method does not guarantee that all slaves start at exactly the same moment, but the maximum difference in starting times is much smaller than when using the PVM broadcast. Then the slaves start to work. At each iteration they check whether a message has been delivered from another slave. Furthermore, at random times they send their best individual to other random slaves. In this simulation the average sending time was fixed to 1 s. In the graph two particular phases, the neuro and genetic ones, are highlighted. The slave starts to work in the neuro phase. In this phase it generates its population of 400 individuals. This phase lasts about 0.4 s and it is forbidden to send anything to other slaves. After, the genetic phase starts. It lasts 2.6 s. Finally, there is the collecting phase where the master receives all the best individuals found by the slaves and then kills them. This simulation clarifies what the simulation time specified in the termination condition is: the neuro-phase time plus the genetic one.
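The synchronization phase can be sketched as follows, again with mpi4py standing in for PVM and assuming the PEs' clocks have already been aligned: the master broadcasts a start instant lying slightly in the future, and every process simply sleeps until that instant instead of relying on the arrival time of a "go" message.

    # Hypothetical sketch of the optional synchronization phase.
    import time
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    START_MARGIN = 2.0                            # seconds, large enough for delivery

    if comm.Get_rank() == 0:
        start_at = time.time() + START_MARGIN     # instant at which everybody starts
    else:
        start_at = None
    start_at = comm.bcast(start_at, root=0)       # every process learns the instant
    time.sleep(max(0.0, start_at - time.time()))  # wait locally, then start the run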

IV. PERFORMANCE EVALUATION

This section is devoted to the evaluation of the PARGEFREX performance using the MULTISOFT machine. In every simulation the 32 MByte of RAM of each PE were enough to avoid hard-disk usage during the simulation itself.

First of all, the relative computational power of the PEs was evaluated. With this aim 100 learning phases were executed using one p233 and successively one c366. The stopping criterion was CI, with the number of iterations fixed at 100 000. On average the final errors were almost the same. Furthermore, still on average, the learning time using one p233 was a fixed factor larger than the time needed by a c366; this factor appears as the relative speed in (9) below.

To evaluate the performance obtainable with PARGEFREX we need to introduce two new quantities.

3 Available at http://www.epm.ornl.gov/pvm/pvm_home.html

Fig. 5. PARGEFREX simulation when three PEs are used.

Fig. 6. RMNSE error distribution when one p233 is used and the run lasts for 75 s.

Let us suppose we are using $n_p$ p233s and $n_c$ c366s. A useful quantity is the equivalent computational power $P_{eq}$ of these PEs, that is, the sum of their relative performances. So, we have

$P_{eq} = n_p + k\,n_c$ (9)

where $k$ is the relative speed of one c366 with respect to one p233, so that one p233 counts as one unit.
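As a quick consistency check of (9) (an inference from the figures quoted in this section, not a value stated explicitly in the text), the reference configuration of 6 p233s and 22 c366s used for Fig. 7 corresponds to the maximum $P_{eq}$ of 40.54 quoted below, which gives

$40.54 = 6 + 22\,k \;\Rightarrow\; k \approx 1.57$

i.e., one c366 is worth roughly 1.57 p233s, consistent with the relative learning times measured at the beginning of this section.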

One possible way of evaluating the speedup would be to fix a learning error and to see how much time is required to reach this error when $P_{eq}$ varies. Unfortunately, this is impossible. In fact, PARGEFREX is mainly a GA. Fig. 6 shows the distribution of the final error when only one p233 is used and the termination criterion is 75 s of learning time. The figure shows the great spread in the final error obtainable. If we used CE as the termination criterion, we would have many problems, above all when we look for fuzzy systems with a low learning error. In fact, the lower the error, the greater the number of simulations that do not converge in a reasonable time. In the case of Fig. 6, if the termination condition CE were set to an error of 3.0%, only two runs out of 50 would reach it within 75 s. If the simulations lasted longer it is possible that other runs would reach the prefixed error, but the author's experience is that in these cases only very few simulations reach the error even if very long simulation times are allowed. This phenomenon, in conjunction with the great spread, would require an enormous number of simulations and too much time to obtain average values with an acceptable level of confidence.

To obtain significant results in a reasonable time the speedup was evaluated in an indirect way. Even so, one month of intensive simulations was necessary to collect all the results reported in this paper. Rather than using CE as the termination criterion, and consequently having no control over the simulation time, we chose to adopt CT. In this manner we can fix the learning time. The main idea for evaluating the speedup is to compare the average error obtained when a fixed number of PEs is used for a fixed time with the time that would be required to reach the same error when a reference number of PEs is used. To do this it is necessary to find, for the reference configuration, the function that ties the learning time to the average error. This function can be obtained by a regression technique starting from a sufficient number of significant points. For this reason all available PEs have been used as the reference configuration. As will be shown later, lots of runs are required for each point, so, to minimize the time necessary to obtain the curve, it is preferable to take all PEs as the reference configuration: any other reference configuration would require much longer to evaluate the speedup. Fig. 7 shows this function. The average migration time of individuals among slaves was fixed to 0.1 s. When not otherwise specified this time has been used in the other simulations

TABLE III. AVERAGE ERROR AND NUMBER OF RUNS REQUIRED VERSUS SIMULATION TIME WHEN THE NUMBER OF PES VARIES

Fig. 7. Time (in seconds) versus per-cent RMNSE when 6 p233s and 22 c366s are used.

TABLE II. AVERAGE ERROR AND NUMBER OF RUNS REQUIRED VERSUS SIMULATION TIME WHEN ALL PES ARE USED

reported in this paper. This function was obtained as a cubic interpolation of eight significant points. These points were obtained by ranging the simulation time over eight different values (in seconds). Then, for each of these times, a set of simulations was executed. Each set of simulations was ordered according to the errors and the best 20% and the worst 20% were discarded. We must remember we are dealing with a GA: in this way we eliminate lucky simulations and simulations where premature convergence occurred. Moreover, we reduce the spread in the learning errors. With the remaining simulations the average error was calculated. As many simulations were executed as were needed to bring the confidence level of the error under 0.01%; however, at least 50 runs were always executed. Table II reports the simulation times, the average errors, and the numbers of runs. This table shows the great number of runs needed to reach the prefixed level of confidence.

Once the curve in Fig. 7 has been obtained, it is possible to evaluate the speedup when the number of PEs ranges from one to all, i.e., when $P_{eq}$ ranges from 1 to 40.54.

Let us run PARGEFREX using $n_p$ p233s and $n_c$ c366s, and let us suppose the use of the CT termination criterion with the simulation time fixed to $t$. Let us indicate with $\epsilon$ the average per-cent error (RMNSE) obtained in the same way as above: at least 50 runs, and as many runs as are needed to reach an average error with a confidence level of 0.01%, once the best 20% and the worst 20% of the runs are discarded. Let us introduce the quantity $\sigma$ as follows:

$\sigma = \dfrac{t_{all}(\epsilon)\,/\,t}{t_{all}(\epsilon_1)\,/\,t_1}$ (10)

where $t_{all}(\cdot)$ is the time read from the curve of Fig. 7, and $\epsilon_1$ and $t_1$ refer to the corresponding run executed with only one p233. $\sigma$ indirectly indicates the speedup. It represents the ratio between the time that would be needed to reach the error $\epsilon$ when all PEs are used and the learning time $t$ when $n_p$ p233s and $n_c$ c366s are used; this ratio is normalized with respect to the case when only one p233 is used.

To evaluate $\sigma$ many runs were executed. For these runs it was decided to use, in order: 1 p233; 6 p233s; 6 p233s and 4 c366s; 6 p233s and 8 c366s; 6 p233s and 12 c366s; 6 p233s and 17 c366s. These choices imply increasing values of $P_{eq}$. In all simulations the CT finishing criterion was used. In particular, for one p233 three sets of runs were executed: in the first set all simulations lasted 300 s, in the second 150 s, and in the last 75 s. For the other choices of PEs three sets of simulations were executed, too.
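Putting the pieces together, the indirect speedup evaluation can be sketched as follows. The data points, the cubic interpolation of the reference curve of Fig. 7, and the reading of (10) given above are illustrative stand-ins, not the paper's actual figures:

    import numpy as np
    from scipy.interpolate import CubicSpline

    def trimmed_mean_error(errors):
        """Average error after discarding the best 20% and the worst 20% of runs."""
        e = np.sort(np.asarray(errors))
        k = int(0.2 * len(e))
        return e[k:len(e) - k].mean()

    # Hypothetical reference points (per-cent RMNSE, seconds) measured with all
    # PEs, standing in for the eight points behind the curve of Fig. 7.
    ref_err = np.array([2.3, 2.4, 2.5, 2.6, 2.8, 3.0, 3.3, 3.6])
    ref_time = np.array([32.0, 16.0, 8.0, 4.0, 2.0, 1.0, 0.5, 0.25])
    t_all = CubicSpline(ref_err, ref_time)        # cubic interpolation of the curve

    def speedup(err_config, t_config, err_p233, t_p233):
        """Eq. (10) as read here: the ratio t_all(err)/t for a configuration,
        normalized by the same ratio for a single p233."""
        return (float(t_all(err_config)) / t_config) / (float(t_all(err_p233)) / t_p233)

    # Toy usage: a full-machine run of 300/40.54 s against a single-p233 run of 300 s.
    err_full = trimmed_mean_error(np.random.default_rng(2).normal(2.45, 0.05, 60))
    print(speedup(err_full, 300.0 / 40.54, 3.2, 300.0))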


Simulation times were fixed inversely proportional to $P_{eq}$: the durations were fixed, respectively, to about $300/P_{eq}$ s, $150/P_{eq}$ s, and $75/P_{eq}$ s. So three different curves were obtained. They are shown in Fig. 8. The fourth curve represents the linear speedup. The three sets of simulations were chosen, after lots of runs, to understand two distinct phenomena.

• The first one is related to the total power of our machine and to the high $P_{eq}$ values obtained. The shortest simulation using only one p233 must correspond to a significant point in the curve shown in Fig. 7. Table III shows that with one p233 and 75 s of learning time an error of about 3.41% is reached on average. This value corresponds (see Fig. 7) to about 0.2–0.4 s using all PEs. This value is at the limit. In fact, as explained when describing Fig. 5, when the learning time is less than about 0.3–0.4 s the learning process is still in the neuro phase, where the initial population is generated and no communication among PEs is allowed. So, to evaluate well the speedup achievable with the proposed implementation of PARGEFREX, simulations on a certain number of PEs must correspond to runs where the time required by all PEs to reach the same error is greater than 0.3–0.4 s.

• The second consideration is related to genetic convergence. Simulations cannot last too long because after a certain time the genetic phase converges and no significant improvements are reachable in a reasonable time. Too long simulations with a low $P_{eq}$ would imply artificially very high speedups.

For these two reasons the simulations using one p233 were chosen to last 300, 150, and 75 s. Fig. 8 clearly shows that the implementation of PARGEFREX gains significantly in performance when many PEs are used. Although only 10% of the network's maximum bandwidth is used on average, the results are surprising: PARGEFREX is particularly suited to commodity supercomputers rather than to very expensive servers with several processors.

Fig. 8 confirms our discussion regarding too long simulations. If one p233 is used and the simulations last for 300 s the speedup is particularly high: almost 450! This phenomenon is due to the fact that after 100–200 s all simulations converge and do not succeed in improving significantly. If we evaluated the average error with a greater simulation time we would obtain a huge speedup, because the error would be almost identical to that found in 300 s. Simulations using one p233 that last for 75 s and 150 s are more suitable for evaluating the speedup.

The reader can see that the speedup is always greater than one. This fact implies a clear concept: the use of several PEs rather than only one permits very low learning errors to be found on average. It is very improbable to find the same error with only one PE even if many very long simulations are launched. In this sense we can say that PARGEFREX is more effective than GEFREX.

Fig. 8. Speedup versus $P_{eq}$: simulation time 300 s/$P_{eq}$ (+), 150 s/$P_{eq}$, 75 s/$P_{eq}$, and the linear speedup.

TABLE IV. RESULTS OBTAINED WHEN ALL PES ARE USED AND NO EXCHANGE OF DATA AMONG SLAVES IS PERMITTED

TABLE V. IMPROVEMENT AS A FUNCTION OF THE NUMBER OF PROCESSES PER NODE AND NUMBER OF PES

To understand how important inter-slave communication is, PARGEFREX was also run avoiding any exchange of individuals. Table IV shows the results obtained when all PEs are used and the CT criterion is chosen with the simulation time fixed, respectively, at 0.5, 1, 2, and 4 s. The last column reports the time that would be needed in the case of a migration time of 0.1 s. There is a clear advantage in permitting migration. With the fastest simulation the advantage is less evident in comparison with the other cases, but it needs to be underlined that in this case the simulation time consists almost entirely of the neuro phase, and the genetic phase, where migration is allowed, is very short. Further studies will investigate the optimal migration rate.

Finally, Table V shows the behavior of PARGEFREX when more slaves are launched on the same PE. In the first two rows the results are reported when PARGEFREX is launched using only one p233 but, respectively, 5 and 10 slaves are spawned. The last two rows refer to all PEs. The CE termination criterion was used. The simulation time is reported in column 2. The last column reports the percentage improvement obtained with respect to the same simulation when only one slave is spawned per PE (see Table III for mono-slave results).


TABLE VI. SPEEDUP OF DIFFERENT PARALLELIZED SC ALGORITHMS

of the same simulation when only one slave is spawned per PE (see Table III for mono-slave results). The results obtained show that more processes per PE implies greater performance. Moreover, this increase is more evident when the number of PEs used increases, too. Further studies will deal with this phenomenon. V. DISCUSSION REGARDING PREVIOUS WORKS To the best of the author’s knowledge no other work exists in literature dealing with distributed fuzzy learning. The nearest approaches regard the parallelization of SC learning algorithms in the NN and EC fields. In the following three different parallelization schemes are reported and discussed as possible samples in three different SC areas. • Danese et al. [18] tested the performance of the Multimedia Video Processor TMS320C80 by implementing on it two different NN applications. The multilayer perceptron was considered. The hardware consists of five powerful processors, a master processor, and four parallel processors. Communications among processors are feasible through a high speed, crossbar switch, allowing multi-access to the small, local memory. Through the transfer controller the chip is connected to the external memory. The particular NN parallelization scheme required a big communication overhead. Using all four parallel processors, the speedup reached for the first application (a simple classification problem) was 1.9, whereas in the second case (hand-written characters recognition problem) it was 3.4. In contrast, the approach presented in this paper reaches very significant speedups with a negligible network bandwidth available (in comparison to the 2.4 GB/s of the TMS320C80). • In [19] Lam used the Cray T3D MPP system in Edinburgh. It consists of 256 nodes each with two processing elements (150 MHz DEC Alpha 21 604 processor with 64 MByte of RAM), arranged in a torus-3D network with a communication bandwidth of 300 MByte/s per link. He implemented the unified hierarchical classifiers (UHC) on this distributed platform. UHC modified the general regression neural networks. A study regarding the effectiveness of the UHC architecture implementation was performed varying the number of training patterns and the number of processors ranging from 4 to 32 PEs. The results reported show a large range in performances. The speedup reached varied from about two to about six when the number of PEs is in-

creased eight times. Also in this case our speedup is decidedly more consistent. Furthermore, the hardware we used is much cheaper. • In [25] Koza and Andre described a genetic programming parallel implementation using a PC 486 acting as a host computer of a network of 64 Transputers acting as PEs with 4 MByte of RAM and a clock of 30 MHz. The Transputers were the T-805 with a bidirectional communication capacity of 20 Mbit/s simultaneously in all four available directions. The problem of symbolic regression of the Boolean even-5-parity function was used to illustrate the parallization strategy. The results reported in the paper are summarized with the quantity named computational effort (CE). It is a measure of the computational effort necessary to yield a solution to the problem with a probability of 99%. Several attempts were performed using either only one processor or using all 64 PEs. Furthermore, the migration rate ranged from 0% to 12%. The best result regards M whereas the migration rate of 8%. In this case in the single processor case it was about 6.9 M. This implies a super-linear speedup of approximately 116. This example confirms the great benefit obtained by EC techniques when they are implemented on distributed hardware. The total network bandwidth available was greater than our approach and the speedup was inferior, although it was strongly super-linear. In conclusion, observing the results summarized in Table VI we can affirm that it is clear that parallelization shows its best benefits in the field of EC without being significantly affected by the allowable network capacity. With EC it is often possible to obtain very high super-linear speedups, whereas in the supervised NN learning and in the classification domains the speedup at the maximum is linear, but very often saturates quickly when the number of PEs increases [16], [17]. VI. CONCLUSION This paper has illustrated a parallel, supervised, fuzzy learner called PARGEFREX that has been implemented using the MULTISOFT machine, a farm of 34 inexpensive, personal computers. The performance obtained demonstrates that on the contrary of other neuro or fuzzy distributed learning methods, the speedup achievable does not saturate. Although PARGEFREX is not a pure GA, the speedup is super linear. This means that thanks to cooperation PARGEFREX is more effective than


GEFREX. The study reported in the paper regards, above all, a realization in which one process per PE is executed, but some simulations demonstrated interesting developments when more processes run on the same PE. The influence of this on the performance, together with the size of the population and the migration rate, will be the starting point of future studies.

ACKNOWLEDGMENT

The author wishes to thank the anonymous referees, Prof. P. Ballone, and G. Patanè for their helpful suggestions.

REFERENCES

[1] R. Katayama, K. Kuwata, and L. C. Jain, Fusion Technology of Neuro, Fuzzy, Genetic and Chaos Theory and its Applications (Hybrid Intelligent Engineering Systems). Singapore: World Scientific, 1996, pp. 167–186.
[2] M. Funabashi, A. Maeda, Y. Morooka, and K. Mori, “Fuzzy and Neural Hybrid Expert Systems: Synergic AI,” IEEE Expert, vol. 10, pp. 32–40, Aug. 1995.
[3] T. Hashiyama, T. Furuhashi, and Y. Uchikawa, “A Creative Design of Fuzzy Logic Controller Using a Genetic Algorithm,” Advances Fuzzy Syst., vol. 7, pp. 37–48, 1997.
[4] K. C. C. Chan, V. Lee, and H. Leung, “Generating Fuzzy Rules for Target Tracking Using a Steady-State Genetic Algorithm,” IEEE Trans. Evolutionary Comput., vol. 1, pp. 189–200, Sept. 1997.
[5] S. H. Park, Y. H. Kim, Y. K. Choi, H. C. Cho, and H. T. Jeon, “Self-Organization of Fuzzy Rule Base Using Genetic Algorithms,” in Proc. 5th IFSA Congr., Seoul, Korea, July 1993, pp. 881–886.
[6] M. Russo, “FuGeNeSys: A Fuzzy Genetic Neural System for Fuzzy Modeling,” IEEE Trans. Fuzzy Syst., vol. 6, pp. 373–388, Aug. 1998.
[7] M. Russo, “Genetic Fuzzy Learning,” IEEE Trans. Evolutionary Comput., vol. 4, pp. 259–273, Sept. 2000.
[8] A. Homaifar and E. McCormick, “Simultaneous Design of Membership Functions and Rule Sets for Fuzzy Controllers Using Genetic Algorithms,” IEEE Trans. Fuzzy Syst., vol. 3, pp. 129–139, May 1995.
[9] D. Kim and C. Kim, “Forecasting Time Series with Genetic Fuzzy Predictor Ensemble,” IEEE Trans. Fuzzy Syst., vol. 5, pp. 523–535, Nov. 1997.
[10] G. Ascia, V. Catania, and M. Russo, “A VLSI Hardware Architecture for Complex Fuzzy System,” IEEE Trans. Fuzzy Syst., vol. 7, pp. 553–570, Oct. 1999.
[11] D. L. Hung, “Dedicated Digital Fuzzy Hardware,” IEEE Micro, vol. 15, pp. 31–39, Aug. 1995.
[12] H. Chen, N. S. Flann, and D. W. Watson, “Parallel Genetic Simulated Annealing: A Massively Parallel SIMD Algorithm,” IEEE Trans. Parallel Distributed Syst., vol. 9, pp. 126–136, Feb. 1998.
[13] E. Cantú-Paz, “A Survey of Parallel Genetic Algorithms,” Illinois Genetic Algorithms Laboratory, Urbana, IL, Tech. Rep. 97003, 1997.
[14] M. Misra, “Parallel Environments for Implementing Neural Networks,” Neural Comput. Surveys, vol. 1, pp. 48–60, 1997.

[15] M. Russo, “Designing a Simple System to Greatly Accelerate the Learning Speed of a Large Class of Fuzzy Learning Methods,” in Fuzzy Hardware: Architectures and Applications, A. Kandel and G. Langholz, Eds. Boston, MA: Kluwer, 1998, vol. 10, pp. 207–229.
[16] N. K. Ratha, A. K. Jain, and M. J. Chung, “Clustering Using a Coarse-Grained Parallel Genetic Algorithm: A Preliminary Study,” in Proc. IEEE Comput. Architectures Machine Perception, 1995.
[17] F. Ancona, S. Rovetta, and R. Zunino, “Parallel Architectures for Vector Quantization,” in Proc. IEEE Int. Conf. Neural Networks ’97, vol. 2, Houston, TX, June 1997, pp. 899–903.
[18] G. Danese, I. De Lotto, and F. Leporati, “A Parallel Processor for Neural Networks,” in Proc. 7th Euromicro Workshop Parallel and Distributed Processing, 1999.
[19] K. P. Lam, “UHC—A Massively Parallel and Distributed Realization of Hierarchical Classifier Networks,” in Proc. 1997 Int. Symp. Parallel Architectures, Algorithms, and Networks, 1997, pp. 186–189.
[20] W. F. Punch, “How Effective Are Multiple Populations in Genetic Programming,” in Proc. 3rd Int. Conf. Genetic Programming, July 1998, pp. 308–313.
[21] S. Wright, Genetics, vol. 28, p. 114, 1943.
[22] M. Russo, “Parallel Fuzzy Learning,” in Proc. IWANN’99, vol. 1606, J. Mira and J. V. Sánchez-Andrés, Eds., Barcelona, Spain, June 1999, pp. 641–650.
[23] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD: Johns Hopkins Univ. Press, 1996.
[24] Y. Lin and G. A. Cunningham III, “A New Approach to Fuzzy-Neural System Modeling,” IEEE Trans. Fuzzy Syst., vol. 3, pp. 190–198, May 1995.
[25] J. R. Koza and D. Andre, “Parallel Genetic Programming on a Network of Transputers,” Stanford Univ., Dept. Comput. Sci., Tech. Rep. CS-TR-95-1542, 1995.

Marco Russo (M’98) was born in Brindisi, Italy, in 1967. He received the Master’s degree in 1992 and the Doctor of Philosophy degree in electronic engineering in 1996, both from the University of Catania, Italy.

In 1996, he joined the Institute of Computer Science and Telecommunications as an Assistant Professor of Computer Science. Since 1998, he has been an Associate Professor of Computer Science at the Department of Physics of the University of Messina, and he is in charge of research at the National Institute for Nuclear Physics (INFN). He has more than 90 technical publications in international journals, books, and conference proceedings. He is a coeditor of the book Fuzzy Learning and Applications (Boca Raton, FL: CRC). His primary interests include soft computing, VLSI design, optimization techniques, and distributed computing.

Dr. Russo is a member of the IEEE Computer Society, the IEEE Circuits and Systems Society, and the IEEE Systems, Man, and Cybernetics Society.
