Journal of Intelligent Manufacturing (2000) 11, 591–604

Large machine-part family formation utilizing a parallel ART1 neural network

DAVID ENKE, KANCHITPOL RATANAPAN and CIHAN DAGLI

Smart Engineering Systems Lab, Department of Engineering Management, University of Missouri-Rolla, Rolla, MO 65409-0370, USA
E-mail: dagli,[email protected]

The binary adaptive resonance (ART1) neural network algorithm has been successfully implemented in the past for classifying and grouping similar vectors from a machine-part matrix. A modified ART1 paradigm which reorders the input vectors, along with a modified procedure for storing a group's representation vectors, has proven successful in both speed and functionality in comparison to former techniques. This paradigm has been adapted and implemented on a neuro-computer utilizing 256 processors, which allows the computer to take advantage of the inherent parallelism of the ART1 algorithm. The parallel implementation results in tremendous improvements in the speed of the machine-part matrix optimization. The machine-part matrix was initially limited to 65,536 elements (256 × 256), a consequence of the maximum number of processors within the parallel computer. The restructuring and modification of the parallel implementation has allowed the number of matrix elements to increase well beyond these previous limits. Comparisons of the modified structure with both the serial algorithm and the initial parallel implementation are made, and the advantages of using a neural network approach in this case are discussed.

Keywords: Group technology, manufacturing, neural networks, ART1, machine-part matrix, parallel computer

1. Introduction

Group Technology (GT) involves a number of methods that seek to identify similarities between the design of a product and the manufacturing processes that are involved in its production (Hyer, 1984; Hyer and Wemmerlov, 1989; Groover, 1987). The advantage of employing GT is that parts which undergo similar manufacturing operations can be grouped together, thereby reducing machine setup and down time. To implement GT, it becomes necessary to create a method for the recognition of part attributes. This allows for the correct classification of parts which require identical operations or that are processed in a similar sequence. If visual inspection is utilized, the manufacturing attributes of the parts must either be physically inspected for similarities by an individual (Banerjee and Redford, 1982), or inspected through the use of an automated technique such as machine vision. Inaccuracies become more frequent as the complexity of the parts increases or the required number of machines and parts becomes excessive. Classification and coding techniques seek to eliminate the human error and judgment involved in manual inspection by first coding the individual attributes of each of the parts and then employing a computational method for the part classification and grouping. The difficulty with this technique is finding an optimal computational paradigm or mathematical model that operates in a reasonable amount of time for larger machine-part matrices. Often a tradeoff must be made between the computation time and the need or ability to find an optimal solution. Production flow analysis methods have attempted to decrease the size of the machine-part matrix by only considering manufacturing attributes taken from route sheets, but the increased matrix size will once again result in tradeoffs.

The discussion that follows begins with a brief description of the computational problem and the tradeoffs that must often be made. A more detailed analysis can be found elsewhere (Enke et al., 1998; Dagli and Huggahalli, 1995a,b). The use and advantages of the ART1 neural network for optimizing the machine-part matrix are described, along with the modified ART1 paradigm with improved performance for generating a solution to the machine-part matrix. The performance of the network is briefly demonstrated here and compared against its implementation on a parallel neuro-computer utilizing 256 processors. The serial model is explained and evaluated in more detail in Dagli and Huggahalli (1995a,b). A modification to the parallel implementation, which allows for the generation of solutions for machine-part matrices that contain up to two million elements, is also presented.

1.1. Machine-part matrix optimization

During optimization of a machine-part matrix, the matrix is usually represented as an n × m binary matrix, with n representing the number of machines and m the number of parts. The elements of the matrix are given by a term such as a_ij, where i is the machine number and j is the part number. A value of a_ij = 1 indicates that part j is processed at machine i, or conversely that machine i processes part j (Dagli and Huggahalli, 1995a). A value of a_ij = 0 indicates no interaction between the machine and the part. The objective of the optimization is to rearrange the rows and columns such that similar groups of parts are assigned to cells of machines which have common processing characteristics. This results in all of the "1" elements of the matrix being clustered into groups which have been arranged into a block diagonal sequence. Each cluster in the new matrix then indicates a unique machine-part group. During the clustering process, there may be machine-part associations that do not appear to fall into any particular group, or that fall into more than one group. The parts within these associations are called exceptional parts, because of the inability to place them within a single, or unique, group. Adding extra machines to a cell to accommodate exceptional parts is expensive, so tradeoffs between supplying extra machines and incurring additional routing costs must be evaluated (Huggahalli, 1991; Dagli and Huggahalli, 1991). Since determining the best tradeoff can be a costly, time consuming, and risky endeavor, there is a clear need for faster optimization routines that reduce the decisions placed upon manufacturing personnel. Neural networks and parallel computing offer possible solutions.
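For illustration, the following minimal NumPy sketch shows this representation; the matrix, group labels, and sizes are hypothetical, not taken from the paper's experiments. Sorting rows and columns by group label gathers the "1" elements into the block diagonal form described above.

```python
import numpy as np

# Hypothetical 4-machine x 6-part incidence matrix: a[i, j] = 1 means
# part j is processed at machine i (the a_ij of the text).
a = np.array([
    [1, 0, 1, 0, 0, 1],   # machine 0
    [0, 1, 0, 1, 0, 0],   # machine 1
    [1, 0, 1, 0, 0, 1],   # machine 2
    [0, 1, 0, 1, 1, 0],   # machine 3
])

# Suppose a clustering step has already assigned group labels.
machine_group = np.array([0, 1, 0, 1])
part_group = np.array([0, 1, 0, 1, 1, 0])

# Reordering rows and columns by group label exposes the block diagonal
# structure; each block is one machine-part family.
optimized = a[np.argsort(machine_group, kind="stable")]
optimized = optimized[:, np.argsort(part_group, kind="stable")]
print(optimized)
```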

1.2. The neural network approach

Many techniques have been proposed to solve the problem of optimizing the machine-part matrix; a review of these approaches can be found in the following sources (Dagli and Huggahalli, 1991, 1995b; King and Nakornchai, 1982; Huggahalli, 1991). The use of neural networks, and competitive neural networks in particular, has garnered interest in the GT community for classifying and optimizing the machine-part matrix (Moon, 1990). For example, neural network approaches have been implemented for the design and formation of cellular manufacturing systems (Kaparthi and Suresh, 1992; Malave and Ramachandran, 1991). The main advantage of using neural networks for GT problems is that they allow for faster levels of optimization while retaining the flexibility to add machine-part relationships after initial model building and training. Their parallel nature also allows for efficient coding onto a parallel machine, which increases the processing speed.

2. Neural networks

Neural networks are massively parallel computer algorithms (Simpson, 1990; Wasserman, 1989) that have the ability to learn from experience. They have the capability to generalize, adapt, approximate given new information, and provide reliable classifications of data. These algorithms involve numerous computational nodes that have a high connectivity. Each of the nodes operates in a similar manner, which makes them ideal for a parallel implementation. During execution of the algorithm, each node receives an input, processes this information, and produces an output which is provided as an input to other nodes in the network. The connections between the nodes, and in particular the learning rules that modify the strength of the connections, are what give neural networks their power and flexibility. Numerous applications involving neural networks have been introduced, many of which have been applied to manufacturing problems (Dagli, 1993).

During the optimization of the machine-part matrix the only inputs provided to the network are the vectors representing the relationships between the machines and parts. Ideal or expected input is not available. As a result, the neural network must be self-organizing and perform in an unsupervised manner, in other words, without the aid of input/output training pairs. The adaptive resonance theory (ART) paradigms (Carpenter and Grossberg, 1987) can be applied directly to the problem. In the past, the ART1 (binary ART) paradigm has been successfully applied to machine-part matrix optimization and has generated interest as a result of its resemblance to the similarity coefficient methods (King and Nakornchai, 1982; Sahay and Seifoddini, 1987; Seifoddini, 1989a,b; Seifoddini and Wolfe, 1986).

2.1. The ART1 paradigm

The information that is sent to a neural network is often represented as a pattern. Every node in the network contains a representation of previously stored patterns that fit the category associated with that node. When a new pattern is presented to the ART1 network, each node competes to make a match with the new pattern. The node with the strongest match wins the competition. If the match is strong enough, the input pattern is placed into that node's grouping; if the match is not strong, the pattern is considered unique and a new node (or category) is created for it. Different thresholds can be used to specify the classification between groupings. Since the threshold determines whether a new category is created, a different degree of clustering is obtained for each threshold. If the similarity exceeds the defined threshold, a heuristic is used to change the existing representative pattern that defines the category or classification of the node. The performance of the ART1 is very sensitive to the values given to the threshold and heuristic. During the optimization of the machine-part matrix, the column vectors representing the part patterns are first classified by the ART1 to obtain a series of part groups. Similar columns are grouped into adjacent areas within an intermediate matrix.

This begins the clustering of the "1" elements of the matrix next to each other. The machine row vectors are then classified and clustered in a similar manner to obtain the machine groups. Figure 1 illustrates the sequence of events. The grouping of the rows and columns can occur simultaneously. Once the grouping is completed, the resulting matrix can be inspected for bottleneck machines and exceptional machine-part cells. An additional advantage of the ART1 paradigm is that it supports on-line learning, which allows new parts and machines to be immediately classified and scheduled on the shop floor and results in an intelligent manufacturing system.

Fig. 1. Machine-part matrix formation with the ART1 Paradigm.

2.2. The ART1 network architecture

The architecture of the ART1 neural network consists of two layers of neurons, labeled the comparison and recognition layers (Wasserman, 1989). Each layer and their interactions are illustrated in Fig. 2. In the comparison layer, each neuron has three inputs: (1) a feedback signal from the recognition layer, identified as an element p of the vector P; (2) the gain signal G; and (3) an element x of the input vector X. The gain G is zero if any element of the vector P is 1. The output of a neuron in the comparison layer is 1 if any two of its three inputs are 1. This creates a "two-thirds rule" and is what aids the ART1 in its ability to make decisions. By implementing the two-thirds rule, if G is 0 the output of the comparison layer is simply the logical AND of each component of X and P, resulting in the vector C.

Fig. 2. The ART1 neural network model.

Initially, the binary vector X is applied at the comparison layer and passes through unchanged, simply becoming the binary vector C. No processing occurs because G is set to 1 and all elements of the vector P are zero. The vector C then becomes the input to the recognition layer. The weights corresponding to these inputs form the vector B_j at the jth neuron in the recognition layer. The net value of each neuron in the recognition layer is calculated by the dot product

NET_j = B_j · C    (1)

where the neuron with the highest net value becomes the winning neuron of the competition. All other neurons in the recognition layer are set to zero.

Each neuron in the recognition layer is associated with a representation of stored patterns, with the weight vectors B_j storing the analog values that determine the strength of the associations. The binary vectors T_j give the resulting output. The elements of B_j are usually initialized to a small value, while the T_j values are initially set to 1. When a neuron wins a similarity check, and if its values are large enough, the weights B_j are adjusted to the normalized values of the elements of the vector C using the following:

b_ij = (L c_i) / (L - 1 + Σ_k c_k)    (2)

t_ij = 1 if b_ij > 0; t_ij = 0 otherwise    (3)

where c_i is the ith component of the comparison layer output vector C, j is the index of the winning recognition layer neuron, b_ij is the weight corresponding to the ith component of vector C, and L is a constant usually set to 3. This increases the weights corresponding to the 1's in C while forcing the other weights to zero, which ultimately increases the chances of similar vectors being detected at a particular neuron.

With each neuron in the recognition layer there also exists a vector T_j, the elements of which are one for every non-zero element of the analog vector B_j. If node j is the winner, the vector P fed back to the comparison layer will equal T_j. The value of G is forced to 0 if any element of P is 1; the application of the two-thirds rule therefore gives the logical AND of the elements of the vectors P and X, resulting in a new value for the vector C. The number of ones in the resultant vector (N) divided by the number of ones in the input vector X (D) gives the similarity S = N/D between the input vector and the vector stored at neuron j, which is contained in the forms of both B_j and T_j. The value of S must exceed a predetermined threshold value, called the vigilance parameter, for the input vector to be classified into that category. If the condition is satisfied, Equations (2) and (3) are applied; otherwise the recognition layer neuron with the next highest net value is tested in the same manner. When an input vector is stored in conjunction with a recognition layer neuron for the first time, an exemplar is created, implying that a new category has been formed. Later, when similar vectors are applied they are recognized by comparison with this and the other exemplars. If the similarity value signifies a sufficient match, the two vectors are combined using the logical AND operation and the resulting vector X′ (the new C) is stored in the form of the vector B_j and its binary version T_j. The specific details of the ART1 processing can be found elsewhere (Carpenter and Grossberg, 1987).
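The processing just described can be condensed into a short sketch. The Python code below is a simplified, illustrative rendering of one ART1 presentation following Equations (1)-(3) and the two-thirds rule; the variable names, vigilance value, and tie-breaking details are illustrative choices, not the paper's implementation.

```python
import numpy as np

L = 3            # constant from Equation (2)
VIGILANCE = 0.6  # threshold the similarity S must exceed

def present(x, B, T):
    """Classify binary vector x against stored exemplars (B analog, T binary);
    returns the accepted category index, creating a new category if needed."""
    candidates = list(range(len(T)))
    while candidates:
        # Equation (1): the winner has the highest net value B_j . C.
        j = max(candidates, key=lambda k: float(np.dot(B[k], x)))
        c = np.logical_and(x, T[j]).astype(int)  # two-thirds rule: C = X AND P
        s = c.sum() / x.sum()                    # similarity S = N / D
        if s > VIGILANCE:
            B[j] = (L * c) / (L - 1 + c.sum())   # Equation (2)
            T[j] = (B[j] > 0).astype(int)        # Equation (3)
            return j
        candidates.remove(j)  # reset the winner and test the next-best neuron
    B.append((L * x) / (L - 1 + x.sum()))        # no match: create an exemplar
    T.append(x.copy())
    return len(T) - 1

# Two overlapping part vectors share a category; a disjoint one gets its own.
B, T = [], []
for v in ([1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]):
    print(present(np.array(v), B, T))   # prints 0, 0, 1
```

Raising the vigilance above 2/3 in this example would split the first two vectors into separate categories, which is exactly the threshold sensitivity discussed in Section 2.1.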

2.3. Drawbacks and improvements of the standard ART1 neural network

The ART1 paradigm in its basic form has several drawbacks that keep it from being an effective technique for optimizing the machine-part matrix (Dagli and Huggahalli, 1995a). As more input vectors are applied to the network, the stored patterns grow sparser. This effect can be minimized by changing the vigilance parameter of the network, but optimization still becomes difficult as the number of input patterns increases. The classification process is also dependent on the order in which the input vectors are applied. If the representation grows sparse as the number of input vectors increases, vectors with large numbers of 1's will not be classified into existing groups and will instead create new categories, so categories may proliferate as the number of inputs increases. As with many neural networks, it is often difficult to choose the proper parameters; with the ART1 network, determining the proper vigilance value can be problematic. For the machine-part matrix problem, too high a vigilance value will result in groups that are more similar, at the expense of creating too many groups; too low a vigilance value will result in everything being placed into just a few groups, essentially performing no true classification. Finally, it should be mentioned that bottleneck machines and exceptional elements are not addressed by the network. Tradeoffs between machine and material handling cost, along with restrictions on cell size and scheduling constraints, are also not addressed here. Each of these areas has been discussed in detail elsewhere (Flynn, 1987; Gupta and Tomkins, 1982; Kumar and Vannelli, 1986; Sahay and Seifoddini, 1987; Wei and Gaither, 1990).

In an attempt to improve performance, Dagli and Huggahalli (1995a,b) proposed two changes to the standard ART1 algorithm, sketched in the listing that follows. The first change addressed the problem of the representation vectors becoming too sparse. Instead of storing the result of the logical AND operation between the vector components of X and P, the vector having the higher number of 1's between X′ and T_j was stored. A heuristic was also included so that if both X′ and T_j had the same number of 1's, either vector could be stored based on a predefined convention. This reduces the chance of passing up a good group because the representation vector was too sparse and uncharacteristic. The second change involved pre-processing of the machine-part matrix to help reduce the dependence of the ART1 on the sequence of the input vectors. For the network to function properly with the first proposed change, the vectors need to be presented to the network such that the smaller vectors, defined in terms of the number of 1's contained within the vector, are presented last. This allows group representations to be defined early with respect to the total number of 1's they contain and gives sparse vectors the opportunity to be classified correctly. These changes result in substantial improvements in the optimization of the machine-part matrix, reducing the number of machine-part groups formed and increasing the speed of finding acceptable solutions.
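A compact sketch of the two modifications, under the same illustrative representation as the earlier listing (the function names and the tie convention are assumptions, not the published code):

```python
import numpy as np

def store_representation(x_new, t_j):
    """Modification 1: rather than keeping the AND result, keep whichever of
    X' and T_j has more 1's; on a tie, prefer T_j (a predefined convention).
    This keeps group representations from growing too sparse."""
    return x_new if x_new.sum() > t_j.sum() else t_j

def presort(vectors):
    """Modification 2: present dense vectors first and sparse vectors last,
    so group representations are defined early and sparse vectors can still
    be classified correctly."""
    return sorted(vectors, key=lambda v: int(v.sum()), reverse=True)

parts = [np.array(v) for v in ([1, 0, 0, 0], [1, 1, 1, 0], [1, 1, 0, 0])]
print([v.tolist() for v in presort(parts)])   # densest first, sparsest last
```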

3. Parallel computers and neural networks

Although the speed of conventional computers continues to rise, hardware limitations eventually force users to seek alternative approaches for improving system performance. Parallel computing offers one such alternative. The most apparent difference between serial and parallel computers is the number of processors involved. A parallel computer operates in a manner similar to its serial counterpart, except that multiple processors can be programmed to perform identical or unique commands at the same time. This allows each step of the algorithm to be allocated to an individual processor if the number of processors is larger than the number of steps. For instance, if 25 processors are available to perform 100 steps of an algorithm, then under the proper circumstances it may be possible to reduce the total processing time to just 4% of its serial counterpart. This type of drastic reduction is not always possible if one step in the algorithm relies on previous steps for input. However, parallelism does become efficient when a particular step is executed multiple times, as is the case in neural networks when individual nodes calculate their net activation levels.

Neural networks are ideally suited for parallel machines that utilize Single Instruction-Multiple Data (SIMD) operations because of their inherent parallelism and redundancy. This type of machine executes a single set of instructions at each of the computer's processors. For neural network processing, each node on the input layer of the network will initially be performing the same operation. Therefore, a single instruction can be given to each of the processors so that activations at each of the network's input nodes can be calculated at the same time. The size of the input layer is limited only by the number of processors available. Since this characteristic of neural networks can be exploited, many researchers have implemented neural network algorithms on a number of parallel machines. Networks such as backpropagation, the Hopfield model, and simulated annealing have been candidates for such an implementation (Jeong and Kim, 1990; Kerr and Barlett, 1992; McCartor, 1991; Papadourakis et al., 1989). The implementation of the ART1 neural network on a CNAPS neuro-computer for the recognition of handwritten digits is of particular interest (Ratanapan and Dagli, 1995).

3.1. Architecture of the CNAPS system

The CNAPS system is a parallel computer that utilizes SIMD operations. In its basic form the system has 256 sixteen-bit processors arranged as a linear array of homogeneous processors. All processors operate simultaneously by using a broadcast interconnection. Each processor receives the same instruction from a control unit, or sequencer, and executes it on a different portion of a data set stored within its memory. The CNAPS system communicates through a host machine, usually a UNIX station. The system receives commands directly from the host computer, stores them into a memory file, executes the commands, and sends any results back to the host. A drawback of this procedure is that during operation most of the computational time results from file transfer between the host and the CNAPS computer. Because the system uses the SIMD structure, and is configured for neural network applications, it is sometimes referred to as a neuro-computer. The language that the neuro-computer uses is called CNAPS-C, a modified version of ANSI C. A different form of the C language is necessary because programs are executed utilizing only 8- and 16-bit binary and integer data files; floating-point data manipulation is not implemented. Fixed-point arithmetic is used because it provides features of data movement that match the characteristics of the processors better than traditional floating-point arithmetic, ultimately allowing the hardware to operate much faster (McCartor, 1991). Many data conversion programs are also available to aid the neuro-computer.
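Because CNAPS-C has no floating-point type, fractional weights must be carried as scaled integers. The toy calculation below, which anticipates the precision discussion in Section 3.3, suggests why 16 fractional bits become too coarse once group sizes reach the thousands; the helper function is hypothetical, not part of CNAPS-C.

```python
FRAC_BITS_16 = 16   # native 16-bit fixed point: step size 1/2**16
FRAC_BITS_24 = 24   # 32-bit additions carrying 24 fractional bits: 1/2**24

def to_fixed(x, frac_bits):
    """Quantize a fraction to a scaled integer, as fixed-point hardware does."""
    return round(x * (1 << frac_bits))

# Weights of the form in Equation (2), for groups with thousands of inputs,
# collide at 16 fractional bits but remain distinguishable at 24 bits.
for frac_bits in (FRAC_BITS_16, FRAC_BITS_24):
    a = to_fixed(1 / 3000, frac_bits)
    b = to_fixed(1 / 3001, frac_bits)
    print(frac_bits, a, b, a == b)   # 16 bits: equal; 24 bits: distinct
```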

3.2. The parallel ART1 architecture

A brief outline of the parallel processing of the ART1 architecture is given here to highlight the advantages and disadvantages of implementing a parallel algorithm. The 10 processing steps are as follows:

Step 1: Load the input vector X.
Step 2: Calculate the activations of the nodes in the comparison layer.
Step 3: Calculate the output C of the comparison layer.
Step 4: Propagate C to the recognition layer.
Step 5: Find the winning node P.
Step 6: Propagate P to the comparison layer.
Step 7: Calculate the new activations of the comparison layer nodes.
Step 8: Calculate the new outputs of the comparison layer nodes.
Step 9: Match the input and the new comparison layer output.
  - If there is no match, reset all the recognition layer nodes, prevent the winning node from competing again, and loop back to Step 2.
  - If there is a match, go to Step 10.
Step 10: Update the weights.

Each step in the sequence is performed after its predecessor. It is interesting to note how Steps 2, 3, 7, and 8 take full advantage of the parallel processing, while the other steps are affected less by the use of multiple processors, as the sketch below illustrates. Nonetheless, the advantages of parallel implementation can be tremendous, as presented in Section 4. Ratanapan and Dagli (1995) discuss the details and limitations of the parallel processing of the ART1 architecture in more depth.
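As a rough illustration of why those steps parallelize well, the following sketch uses NumPy vectorization as a stand-in for the 256 processing elements; the array sizes and the number of recognition nodes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# 256 comparison layer nodes, one per processing element.
x = rng.integers(0, 2, 256)   # input vector X
p = rng.integers(0, 2, 256)   # feedback vector P from the winning node

# Steps 7-8: with G = 0, the two-thirds rule reduces to C = X AND P.
# A single broadcast instruction updates all 256 nodes at the same time.
c = x & p

# Step 5 is different in character: picking the winner is a reduction
# across processors, so it benefits far less from SIMD execution.
B = rng.random((8, 256))      # 8 recognition layer weight vectors
winner = int(np.argmax(B @ c))
print(winner)
```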

3.3. Modified ART1 parallel implementation

Numerous changes were necessary to incorporate the modified serial algorithm onto the parallel CNAPS neuro-computer. The algorithm itself was left unmodified, but it was necessary to change certain data and variable allocations within the program. As mentioned in Section 3.1, the CNAPS system has its own form of the C programming language which does not provide floating-point manipulation. Anytime a floating-point calculation was made, the program code had to be changed to allow for the proper representation of that particular information. Since a large amount of the information within the ART1 network involves binary representations, the floating-point calculations were performed using a separately designed subroutine.

The modified algorithm proves very effective in reducing the time necessary to execute the machine-part matrix optimization, while performing equal to or better than its serial counterpart. These results are illustrated in Section 4. Nonetheless, a difficulty was encountered with the original modified algorithm with regard to the size of the input matrix that could be processed. The CNAPS computer that was utilized contained only 256 processors. As a result, the largest individual machine or part dimension that could be optimized at that time was 256 elements. Once the size of the machine-part matrix extends beyond 65,536 elements (256 × 256), it becomes necessary either to add more processors to the computer, or to modify the processing and data structures to allow for more efficient processing. Since the first option involves an added expense that multiplies as the size of the matrix increases, the second alternative was pursued.

Two main data structures, for the inputs and the outputs, were initially embedded into each node of the network. Under the original data structure, when there are more than enough processors, the limit of the parallel implementation equals the number of inputs plus the number of outputs, which comes out to 4082 bytes (2041 nodes); this is 14 bytes less than the maximum and results from internal memory allocations. In an attempt to redesign this structure based on using 256 processors, a data wrap-around technique was employed. The input data and certain programming variables both participated in the data wrap-around. This allowed the method for storing data in each node to be redesigned, which provided more free memory for each individual node. As a result, the data limit of each node was increased to 4 kbytes, which permitted the maximum number of inputs to increase to slightly over 3500.

Once the allowable number of inputs increased beyond 256, it was also necessary to modify the precision of the calculations. The CNAPS provides 16-bit data manipulation, giving a finest precision of 1/(2^16). This precision is simply not good enough to handle the weights of a system that has more than 256 inputs. Therefore, a 32-bit implementation for addition was utilized, which carries 24 bits after the decimal point. This refines the precision to 1/(2^24), for up to a total of 4000 inputs.

Further modifications were made to the program to take advantage of the characteristics of the ART1 processing. For instance, a simple division calculation is made during updating and initialization; it was possible to eliminate a major calculation within the program by creating a simple fraction lookup table. It was also noticed that since every row vector of the ART1 weight matrix contains only zeros and one particular value, it was possible to reduce the size of the matrix by half by marking the locations of that value with a binary "1" in the row and remembering the single value associated with the "1's" in that row. Likewise, the input and output data structure matrices within the ART1 paradigm are identical, which allows one to be discarded. These two factors together decrease the required memory to only 1/4 of its previous allocation. All of these changes allow for a more efficient use of memory, an increased number of matrix elements, and the elimination of unnecessary or redundant calculations. The sketch below illustrates the wrap-around and row-storage ideas.
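A minimal rendering of both ideas under stated assumptions (the array names and shapes are illustrative, not the CNAPS data layout):

```python
import numpy as np

N_PE = 256   # physical processing elements on the CNAPS board

def wrap_around(x):
    """Fold an input vector longer than 256 onto the PEs: element k is
    handled by PE k % 256, so the array sweeps the vector in blocks."""
    pad = (-len(x)) % N_PE
    return np.pad(x, (0, pad)).reshape(-1, N_PE)   # one row per pass

print(wrap_around(np.arange(600)).shape)   # (3, 256): 600 inputs, 3 passes

# Every ART1 weight row holds zeros plus one repeated value (Equation (2)),
# so a row can be stored as a binary mask and a single shared scalar.
row = np.array([0.0, 0.006, 0.006, 0.0, 0.006])
mask, value = (row > 0).astype(np.uint8), row.max()
print(np.allclose(mask * value, row))      # True: the row is recoverable
```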

4. Results and comparison between the serial and parallel implementations

To compare the performance of the modified ART1 architecture in both a serial and parallel environment, a number of machine-part matrices were randomly generated. Figure 3 illustrates the initial and final 100 × 100 machine-part matrix (100 parts and 100 machines) processed using both the serial and parallel computer. In each of the figures the white area represents a "1" from the machine-part matrix, while the black area indicates an empty space, or non-correlation between a machine and a part. The first observation apparent from the illustrations is that the final matrices are identical for the serial and parallel implementations, indicating that each implementation of the modified ART1 algorithm performs the same operations. Bottleneck machines and exceptional parts were not included in these simulations so that a clearer demonstration of the machine-part matrix optimization could be observed.

Tables 1 and 2 give the grouping characteristics of the serial and parallel implementations, respectively, for the machine-part matrix optimization. Matrices of size 50 × 50, 100 × 100, and 256 × 256 are included in the tables. The consistency of the results from the two implementations is once again apparent from the tables, since the number of part groups and machine groups generated is the same. At this stage of the analysis all things appear equal with regard to the serial and parallel implementations. As mentioned earlier, the real advantage of utilizing the parallel neuro-computer involves the decrease in processing time, yet this decrease is not directly apparent from the total processing times given in Tables 1 and 2. The percent decrease in processing time between the serial and parallel implementations is listed in the last column of Table 2. For the machine-part matrix size of 50 × 50 the processing time almost doubles for the parallel implementation. Likewise, the processing time of the parallel implementation for the 100 × 100 matrix is almost half again as slow as its serial counterpart. It is not until the matrix size reaches 256 × 256 that the benefits of the parallel implementation begin to pay off, decreasing total processing time by 23.1%.

Although the processing times eventually improve for the parallel implementation as the matrix size increases, these numbers do not really illustrate the advantage of a parallel implementation of a neural network algorithm. As alluded to earlier, much of the processing time in the CNAPS neuro-computer involves file transfers, 32-bit operations, and data wrap-around procedures. In fact, the movement of data between the UNIX host and the neuro-computer uses the majority of the total processing time. Table 3 breaks down the execution times for the ART1 network alone for both the serial and parallel implementations. This execution time includes only the time in which the characteristics of the neural network are being processed; all time involving file transfer or other extraneous operations has been eliminated. Table 3 illustrates that although the ART1 processing times are nearly identical for the small machine-part matrix size of 50 × 50, the decrease in ART1 execution time reaches 50.6% for the 100 × 100 matrix and approaches 78.2% for the largest matrix size of 256 × 256. Clearly the file transfer between the host and the neuro-computer slows down the processing. It is also worth mentioning that when the optimization was performed utilizing 16-bit operations, with noise added in the form of bottleneck machines and exceptional parts, the decrease in total processing times ranged from 41.1% to 69.6% for the three matrix sizes, with the decrease in ART1 execution time approaching 95.1% for the larger 256 × 256 matrix size (Enke et al., 1998).

Fig. 3. 100 × 100 Machine-part matrix (serial and parallel implementation).

Table 1. Processing characteristics for the serial implementation

Machine-part matrix size    Number of part groups    Number of machine groups    Total processing time
50 × 50                     4                        4                           6.73
100 × 100                   5                        5                           15.51
256 × 256                   7                        7                           75.08


Table 2. Processing characteristics for the parallel implementation

Machine-part matrix size    Number of part groups    Number of machine groups    Total processing time    Decrease in processing time (%)
50 × 50                     4                        4                           13.26                    -97.0
100 × 100                   5                        5                           21.89                    -41.1
256 × 256                   7                        7                           57.77                    23.1
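For reference, the percentages in Tables 2 and 3 are consistent with computing the decrease relative to the serial time, decrease (%) = 100 × (t_serial − t_parallel) / t_serial. For the 50 × 50 matrix, for example, 100 × (6.73 − 13.26) / 6.73 ≈ −97.0%, i.e., the parallel run took nearly twice as long.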

Table 3. ART1 execution times for the serial and parallel implementations

Machine-part matrix size    Serial ART1 execution time    Parallel ART1 execution time    Difference in execution time    Decrease in execution time (%)
50 × 50                     2.31                          2.30                            0.01                            0.4
100 × 100                   9.38                          4.63                            4.75                            50.6
256 × 256                   59.45                         12.94                           46.51                           78.2

4.1. Optimization of machine-part matrices beyond 256 × 256 elements

It was previously illustrated how the parallel algorithm gave optimization results identical to those of the serial algorithm with regard to the final machine-part matrix, while increasing the speed of processing as the machine-part matrix reached 256 × 256 elements. This section provides the results produced by incorporating the modified data structures and processing described in Section 3.3, which allowed for matrix sizes beyond a single dimension of 256 elements. Figures 4–7 give the 500 × 500, 1000 × 1000, 2000 × 1000, and 3500 × 500 machine-part matrix optimizations, respectively. Each of these figures illustrates the initial random matrix along with the final optimized matrix. As with the serial and parallel comparisons, bottleneck machines and exceptional parts were not included in the simulations. From these figures it is once again apparent that the modified ART1 algorithm is able to optimize the machine-part matrix into clear and well-defined machine-part groups. It is also worth mentioning that the columns and rows of the figures are proportional to their matrix dimensions, but when compared together the individual matrices in each of the illustrations are not to scale.

Fig. 4. 500 × 500 Machine-part matrix (parallel implementation).

Fig. 5. 1000 × 1000 Machine-part matrix (parallel implementation).

Fig. 6. 2000 × 1000 Machine-part matrix (parallel implementation).

Fig. 7. 3500 × 500 Machine-part matrix (parallel implementation).

Table 4 gives the number of part and machine groups for each of the matrix sizes tested. The total processing times, along with the ART1 execution times, are also listed. From the table it becomes apparent that the majority of the total processing time involves the file transfer process. In addition, the percentage of time spent on ART1 algorithm execution is relatively consistent across the matrix sizes. This differs from the decreases in execution time shown in Table 3, and the inconsistency results from the fact that beyond the matrix size of 256 × 256 the parallel implementation must begin to process blocks of information serially. In other words, the information within each block is executed in parallel, but the individual blocks must be processed sequentially. Therefore, the advantage of parallelism for each block will be similar to the processing times for the 256 × 256 matrix size, with the processing time of the larger matrix sizes increasing in a linear manner. Nonetheless, optimization of large matrix sizes can be executed in a relatively short amount of processing time compared to previous serial methods.

Table 4. Total processing and ART1 execution times for the parallel implementation

Machine-part matrix size    Number of part groups    Number of machine groups    Total processing time    ART1 execution time
500 × 500                   7                        7                           127.10                   29.33
1000 × 1000                 8                        8                           226.44                   61.04
2000 × 1000                 8                        8                           400.99                   108.53
3500 × 500                  8                        8                           627.53                   141.18

5. Conclusions

The advantages of using an ART1 neural network for machine-part optimization are tremendous. First, the ART1 algorithm supports on-line learning, allowing new parts and machines to be immediately classified and scheduled on the shop floor. The model does not have to be modified or "retrained" to develop a new level of optimization; new machine-part relationships can be instantly classified. In addition, during initial training of the network only the vectors representing the relationships between the machines and parts are provided to the network. Ideal or expected input is not necessary because the network self-organizes the data in an unsupervised manner. This frees the modeler from having to provide constraints, ultimately allowing for a more efficient machine-part grouping, since it is only the data that drives the process. The architecture of the ART1 neural network is also structured such that numerous highly connected nodes each operate in a similar manner, making these nodes ideal for a parallel implementation. As illustrated, incorporation of the ART1 neural network into a parallel environment can be done with relative ease, due to the inherent parallel nature of the algorithm. In addition to the vast reduction in processing time, slight modifications in the way that data is structured and processed have also made it possible to increase the size of the machine-part matrices beyond the limits of the parallel machine without necessarily purchasing extra processors. Although this approach forces the algorithm to process blocks of data sequentially, the parallel processing of the information within the blocks can result in tremendous gains as the machine-part matrix size increases.

Although the advantages of utilizing a parallel algorithm to optimize the machine-part matrix are encouraging, there are limitations. First, and most obvious, is the extra time involved in performing the file transfers, 32-bit operations, and data wrap-around procedures. Although file transfers consume most of the processing time, the current configuration of the neuro-computer makes it difficult to incorporate any changes. A possible solution would involve utilizing a system that eliminates the host machine and makes all of the file transfers internal. PC-based neuro-computers currently exist that allow for the direct implementation of parallel algorithms onto the manufacturing floor. Not only will these systems make it possible to optimize matrices with thousands of machines and parts in a fraction of the currently required time, but they will also go a long way towards eliminating some of the current processing time problems encountered with the transferring of data files.

References

Banerjee, K. G. and Redford, A. H. (1982) Visual inspection of components for mechanized assembly. International Journal of Production Research, 20, 545–553.
Carpenter, G. A. and Grossberg, S. (1987) A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37, 54–115.
Dagli, C. and Huggahalli, R. (1991) Neural network approach to group technology. Knowledge Based Systems and Neural Networks – Techniques and Applications, Sharda, R., Cheung, J. Y. and Cochran, N. J. (Eds.), Elsevier, pp. 213–228.
Dagli, C. (Ed.) (1993) Artificial Neural Networks for Intelligent Manufacturing, Chapman and Hall.
Dagli, C. and Huggahalli, R. (1995a) Machine-part family formation with the adaptive resonance theory paradigm. International Journal of Production Research, 33(4), 893–913.
Dagli, C. and Huggahalli, R. (1995b) A neural network approach to group technology. Neural Networks in Design and Manufacturing, Wang, J. and Takefuji, Y. (Eds.), World Scientific, pp. 1–55.
Enke, D., Ratanapan, K. and Dagli, C. (1998) Machine-part family formation utilizing an ART1 neural network implemented on a parallel neuro-computer. International Journal of Computers and Industrial Engineering, 34(1), 189–205.
Flynn, B. B. (1987) The effects of setup time on output capacity in cellular manufacturing. International Journal of Production Research, 25(12), 1761–1772.
Groover, M. P. (1987) Automation, Production Systems and Computer Integrated Manufacturing, 2nd Edn., Prentice Hall.
Gupta, R. and Tomkins, J. A. (1982) An examination of the dynamic behavior of part families in group technology. International Journal of Production Research, 20(1), 73–86.
Huggahalli, R. (1991) A neural network approach to group technology, MS Thesis, University of Missouri-Rolla Library.
Hyer, N. L. (1984) Group Technology at Work, Society of Manufacturing Engineers.
Hyer, N. L. and Wemmerlov, U. (1989) Group technology in the US manufacturing industry: a survey of current practices. International Journal of Production Research, 27(8), 1287–1304.
Jeong, C. S. and Kim, M. H. (1990) Fast parallel simulated annealing for traveling salesman problem. Proceedings of the International Joint Conference on Neural Networks, 3, 947–953.
Kaparthi, S. and Suresh, N. C. (1992) Machine-component cell formation in group technology: a neural network approach. International Journal of Production Research, 30(6), 1353–1368.
Kerr, J. P. and Barlett, E. B. (1992) SPECT reconstruction using backpropagation neural network implemented on a massively parallel SIMD computer. Proceedings of the Fifth Annual IEEE Symposium on Computer-Based Medical Systems, pp. 616–623.
King, J. R. and Nakornchai, V. (1982) Machine-component group formation in group technology: review and extension. International Journal of Production Research, 20(2), 117–133.
Kumar, K. R. and Vannelli, A. (1986) A method of finding minimal bottleneck cells for grouping part-machine families. International Journal of Production Research, 24(2), 387–400.
Malave, C. O. and Ramachandran, S. (1991) Neural network-based design of cellular manufacturing systems. Journal of Intelligent Manufacturing, 2(5), 305–314.
McCartor, H. (1991) Back propagation implementation on the adaptive solutions CNAPS neurocomputer chip. Advances in Neural Information Processing Systems 3, Lippman, R. P. (Ed.), 1028–1031.
Moon, Y. (1990) Interactive activation and competition model for machine-part family formation. Proceedings of the International Joint Conference on Neural Networks, Washington D.C., 2, 667–670.
Papadourakis, G. M., Heileman, G. L. and Georgiopoulos, M. (1989) A parallel implementation of the Hopfield network on GAPP processors. Proceedings of the International Joint Conference on Neural Networks, 2, 582.
Ratanapan, K. and Dagli, C. H. (1995) Implementation of ART1 architecture on a CNAPS neuro-computer. Proceedings of SPIE, Applications and Science of Artificial Neural Networks, 1, 104–110.
Sahay, A. K. and Seifoddini, H. (1987) An algorithm for forming machine-component cells in group technology. Proceedings of the 9th International Conference on Production Research, 1, 1314–1321.
Seifoddini, H. and Wolfe, P. M. (1986) Application of the similarity coefficient method in group technology. IIE Transactions, pp. 271–277.
Seifoddini, H. (1989a) A note on the similarity coefficient method and the problem of improper machine assignment in group technology applications. International Journal of Production Research, 27(2), 1161–1165.
Seifoddini, H. (1989b) Duplication process in machine cells formation in group technology. IIE Transactions, pp. 382–388.
Simpson, P. K. (1990) Artificial Neural Systems: Foundations, Paradigms, Applications and Implementations, 1st Edn., Pergamon Press.
Wasserman, P. D. (1989) Neural Computing: Theory and Practice, Van Nostrand Reinhold.
Wei, J. C. and Gaither, N. (1990) An optimal model for cell formation decisions. Decision Sciences, 21(2), 243–257.