Journal of Electronics Manufacturing, Vol. 10, No. 1 (2000) 19–48 © World Scientific Publishing Company
DESIGN ISSUES ASSOCIATED WITH NEURAL NETWORK SYSTEMS APPLIED WITHIN THE ELECTRONICS MANUFACTURING DOMAIN

A. A. WEST, C. J. HINDE,∗ C. H. MESSOM,∗ R. HARRISON and D. J. WILLIAMS
Department of Manufacturing Engineering, ∗Department of Computer Science, Loughborough University, Loughborough, Leicestershire, LE11 3TU, UK

Received 1 September 2000
Accepted 2 October 2000

Neural networks have been applied within manufacturing domains, in particular electronics industries, to address the inherent complexity, the large number of interacting process features and the lack of robust analytical models of real industrial processes. The ability of neural systems to provide nonlinear mappings between process features and desired outputs has been the major driving force behind implementations. One of the major issues limiting the widespread industrial uptake of neural systems is the lack of detailed understanding of their design, implementation and operation. In many cases, network topologies and training parameters are systematically varied until satisfactory convergence is achieved. There is little discussion of the rationale behind the adopted training methods. A review of research into the functions that can be readily represented by neural networks is presented in this paper. The application focus is the control and monitoring of a discrete manufacturing process that is part of the manufacturing cycle of mixed technology surface mount printed circuit boards. Detailed knowledge of the process operation and the functionality that can be represented by simple network topologies have been combined to develop a structured, partially interconnected neural network that provides optimised convergence performance. A comparison of the designed solution with standard approaches to neural network implementation is given. It has been demonstrated that if there is sufficient confidence in the operation of the process, input feature interaction within the network can be constrained to produce a robust control and monitoring system.

Keywords: Surface mount printed circuit boards, neural networks, control and monitoring.
1. Introduction
The reduction in circuit size enabled by the adoption of surface mount devices has been one of the major influences on the electronics industry over the past decade.1 In addition production volumes have dramatically increased and competition within the global markets ensures that reliable and fault free production processes are required.2 Considering the complexity of the processes involved and the increased amount of data that is captured, techniques are required to determine useful information from the recorded data. Neural networks have been proposed as a possible technology to support an identification of parameters crucial to the quality of the product and enable optimisation of the process in the presence of manufacturing faults.3,4
1.1. Neural network applications within the manufacturing domain
There have been numerous applications of neural networks within manufacturing in general and within electronics manufacturing in particular. Example target applications, with links to the literature, are given in Fig. 1. All aspects of the system lifecycle (requirements analysis, specifications, design, implementation, test and maintenance) have
Fig. 1. Applications of neural networks in general manufacturing domains and electronics manufacture in particular.
been addressed. In each manufacturing domain neural networks provide support for the reduction of complexity by generating a mapping from measured inputs to desired outputs. A brief summary of typical applications within the different manufacturing areas is given below:

• Group Technology: Replacing traditional visual inspection and subjective human judgement to determine part classification, common parts-common processing, machine-part-cell formation.5–8
• Engineering Design: Supporting both the design of products based on design rules (e.g. association for design retrieval, constraint based engineering drawing, spatial reasoning in computer aided design, mappings from form to function)9,10 and configuring functionally complex systems from standard systems or components.11
• Monitoring and Diagnosis: Support for pattern recognition within signal collection,12 extraction of characteristics in signal analysis13 and to enable efficient fault identification.14,15
• Process Control: Support for processes that require human intelligence within the control loop and are not adequately supported by mathematical models16,17 (e.g. blanking,18 gas tungsten arc welding,19 adaptive control for continuous processes20), association of the mappings undertaken by skilled operators to mimic human control.21
• Process Modelling: Incorporating new knowledge into existing models by allowing self improvement and adaption, representing models of complex processes, capturing operator knowledge and generating a single coherent model from partial knowledge.22–25
• Quality Assurance: Enhancing operator performance for “reactive” classification analysis for scrap and rework, sampling plans, lot acceptance determination, inspection, product finishes26,27 and “proactive” analysis for risk analysis, full factorial experimental design and cause and effect mappings.28
• Scheduling: Providing mappings to support complex resource allocation under sequencing constraints, mapping resources and tasks to machines within job shop scheduling, flexible manufacturing system scheduling.29–32
• Process Planning: Support for the knowledge associated with mappings from product and machining features to machine sequences and operations, representation of the mapping from product design specifications to process specifications.33–35

In addition to the above general application areas there have been a number of applications within the electronics manufacturing area. The inherent complexities of the processes and the requirement to identify and control parameters crucial to the quality of the products have been the major driving forces behind the applications. Typical applications involve:

• Wafer Fabrication: Prediction of wafer yield,36 determination of optimal batch size37 and proactive (rather than reactive) prediction of performance under a variety of operating conditions,38
• Chemical Vapour Deposition: Fine pitch stencil printing applications,39 metal organic chemical vapour deposition process mapping,40
• Plasma Processing: Used to determine input-output models of analytically intractable processes,41,42 equivalent performance to operator based control43 and recommendations consistent with engineering understanding of process adjustment,44
• Thin Film Deposition: Determination of deposition rate from electron diffraction pattern classification,45
• Reactive Ion Etching: Mapping from D.C. bias currents to etch rates for a complex process,46
• Component Placement Errors: Corrective maintenance and dynamic diagnostics based upon mappings of process parameters and frequency of nozzle and feeder errors,47
• Multichip Module Manufacturing: Mapping between input process characteristics (e.g. spin coat, soft bake, expose, cure, plasma descum) and output quality in via formation to improve process yield,48
• Spot Welding: Mapping from physical features of the nugget and electrical parameters during the welding process to the quality of the weld,49 and
• X-ray Image Inspection: Image processing and classification for the semiconductor inspection process.50

Major advantages have been claimed as a result of the application of neural networks within the above manufacturing areas. These have been summarised in Fig. 2. The pattern matching capability of the networks is complementary to traditional artificial
intelligence techniques (e.g. rule based systems)51 and the claimed learning and adaption capabilities are desirable within the highly complex application domains. In addition the modular, interconnected structure of the networks can be utilised to provide solutions comprising efficient distributed computational elements that operate in parallel for increased speed. The capability of associating patterns from incomplete information, the inherent robustness and the recognition and classification capabilities are reported as the cornerstone of many of the successful applications.

In spite of the above advantages there are nevertheless a number of issues that must be addressed before neural networks can be applied with confidence. For example:

• There are no clear guidelines as to how an optimal training set can be determined.
• It is not easy to interpret the representation of the knowledge within the network.68
• There is no clear understanding of the complex nonlinear mappings that are represented within the trained networks.
• It is difficult to explain why a decision was undertaken.
• More efficient learning algorithms are needed to speed up training and guarantee convergence.
• There are no clear guidelines on the type of network that should be chosen to solve particular problems.
• The associated learning parameters for network training are generally chosen by trial and error.
• There are no recommendations as to the selection and parameterisation of appropriate nonlinearities to be included within the network (Sec. 3).
• There are no clear guidelines as to the required topology of the network for a particular application.

In the application of neural networks to the adhesive dispensing process reported in this paper, an attempt has been made to design the structure of the network and optimise convergence performance based on an understanding of both the process requirements (see Sec. 2) and the basic operation of the network components (Sec. 3).
The operation of the adhesive dispensing testbed is detailed in Sec. 2 to enable both the material and actuator/sensor control and monitoring requirements to be appreciated. The fundamental behaviour of an interconnected network of simple neurons is detailed in Sec. 3 to enable the structuring method to be understood. Depending on
Fig. 2. Summary of the advantages and issues associated with neural network applications.
(i) the “type” of control, monitoring and diagnosis required (i.e. thresholding, banding to a tolerance around a set-point) and (ii) the amount of interaction expected between process variables, it is shown that the topology of the network and the values of the interconnection strengths can be designed to achieve optimal convergence (Secs. 3.3, 3.4 and 3.6). The behaviour of a minimally interconnected network has been compared with that of a more traditional fully interconnected network and a partially interconnected network (i.e. a network where expected interactions are interconnected whereas independent variables are segregated). The network experiments reported in Sec. 4 were undertaken using both simulated and “real” data in an attempt to determine the effect of training set selection on network convergence. The simulated data contained 50 examples of “ideal” process operation and process fault characteristics. The real process data, on the other hand, was a summary of operation and fault signatures determined from a large number of experimental runs. The results of applying the network to the control
of the adhesive dispensing testbed are discussed in Sec. 4.3. Conclusions and lessons learned are discussed in Sec. 5.

2. The Adhesive Dispensing Testbed for the Manufacture of Mixed Technology Surface Mount Boards

The manufacture of mixed technology boards is an important component process within surface mount printed circuit board (PCB) production. In mixed technology PCBs, surface mount devices are present together with “through the hole” leaded devices. The surface mount devices are glued to the board with small quantities (0.0002–0.005 cm³) of adhesive prior to the insertion of the leaded devices and subsequent wave soldering of the complete assembly. There are several techniques available for adhesive dispensing, e.g. pin transfer, screen printing, positive displacement and pressurised syringe dispense.52 Each technique has its own advantages and drawbacks but by far the most versatile in terms of the
variety of adhesive blob sizes that can be dispensed is pressurised syringe application. Although simple in concept, the process is complicated by the operational behaviour of the process actuators, sensors and material (e.g. Loctite Chipbonder 360), which has been shown to exhibit highly thixotropic behaviour (i.e. a viscosity dependent on shear history and temperature).53 In current manufacturing systems, the quality of adhesive dispense is only checked at the beginning and end of each board application. Between these fiducial points, the process is essentially operated in “open loop” and it is up to the operator to ensure that faults are detected prior to post processing of the PCB. As illustrated below, monitoring of “open loop” system operation has indicated that with all “measurable” parameters constant, variations in the amount of adhesive dispensed of the order of ±35% can be obtained,3 which is typical of what is achieved in industrial environments.

2.1. The dispensing system testbed at Loughborough University
An instrumented adhesive dispensing testbed has been developed at Loughborough University to enable the operation of the process to be studied in detail. A photograph of the dispensing head components is given in Fig. 3(a) and represented schematically in Fig. 3(b). Adhesive is dispensed by applying a pulse of air to a pre-packed syringe, the amount of adhesive dispensed being related to the area under the pressure-time pulse curve (Fig. 6(a)). The pressure pulse is controlled by a Watson and Smith Type 101X voltage controlled regulator and a Martonair 12V, 10W solenoid valve which determine the pulse height and width respectively. The primary control parameter is the height of the pressure pulse since altering the pulse width affects the cycle time of the process and hence the manufacturing throughput. A Sensym LX06030G pressure transducer provides information on the variation of pressure within the syringe and the ambient temperature is monitored using a RS590 semiconductor temperature sensor. Vision feedback on the plan area of each dispensed blob is obtained using a Hitachi (1024 × 1024) CCD camera mounted on the dispensing head. The plan area was adopted as an appropriate measure of the dispensed volume after extensive attempts to determine the volume using other techniques (e.g. structured lighting). The plan area provides the primary feedback variable for the neural network controller
discussed in Sec. 4.3. The complete system is illustrated in Figs. 4(a) and (b). Figure 4(a) is a photograph of the system components that are detailed in the schematic of Fig. 4(b). There are four major components:

• a Sun 3/160 (later upgraded to a Sparc10) computer that provided a platform for the execution of the high level vision analysis, cell controller and system database (e.g. for dispense parameters and analysis results storage) software,
• an Imaging Technology ITI 151 Pipelined Image Processor that was used for low level processing (e.g. image acquisition, thresholding, storage)54 of dispensed adhesive “blob” images,
• a Seiko RT3000 Robot Manipulator and associated Z80 Darl processor that provided a transport mechanism for the dispensing head and included cell controller interface software and robot subroutines and
• an “in-house built” target controller comprising two 680x0 VME based processors, associated random access and shared memory, IEEE and RS232 hardware interfaces, AD/DA interface hardware and included the low level pulse analysis (see below), sensor interface and cell controller interface software.

Two VT220 visual display units (not shown in the photograph of Fig. 4(a)) provided interfaces to the target hardware.
2.2. Control of pressurised syringe adhesive dispensing
Three major components of the adhesive dispensing control system are required to (i) maintain set points, (ii) control the sequence of operations and (iii) monitor and diagnose system faults.55 “Bang-bang” control has been adopted within the neural network controller since this paradigm (i) is simple to apply and (ii) has been shown to provide control to within ±5% under rule based control.51 Under this paradigm the pulse height of the pressure pulse within the syringe is changed by a fixed amount whenever the dispensed blob falls outside pre-determined (i.e. ±5% of target blob size) tolerance levels. The basic sequential operation of the software control loop is illustrated in Table 1 and does not involve any complicated sequential or combinational logic. The x-y co-ordinate of the current dispense is downloaded to the Seiko RT3000 robot manipulator along with parameters such as robot
Fig. 3. (a) Photograph of the adhesive dispensing head. (b) Schematic representation of the adhesive dispensing head.
Fig. 4(a) labels: Dispensing Head; Target Controller; Image Processor; Sun 3/160 (Sparc 10); Robot Controller.
Fig. 4. (a) Photograph of the major hardware components of the adhesive dispensing system. (b) Schematic representation of the major hardware and associated software functionality of the adhesive dispensing system.
Table 1. The glue dispensing software cycle.
translation speed and delays. The actuator voltages are set on the target controller DAC outputs and the current blob is dispensed. Following retrieval of the 680x0 pressure pulse analysis the vision analysis is started. Upon retrieving the vision analysis results, fault detection and control actions using the neural network controller are determined, the parameters for the next dispense are assigned and the results stored.

The monitoring and diagnostics capabilities of the adhesive dispensing system are based upon measurements (and associated parameters derived from the measurements) of (i) a “plan” image of the dispensed blob area, (ii) the pressure variation within the syringe of adhesive and (iii) the ambient temperature. Figure 5 illustrates the image processing parameters adopted. Basic image processing transformations (e.g. “snap”, “histogram” and “threshold”) are used to transform the 1024 × 1024 “raw” image into a binary representation. A chain coding algorithm54 is used to derive measures of the plan area, perimeter and centre of gravity of the binary image. A measure of the circularity of the image is given by deriving the ratio of the area of the “blob” to the area of a box around the extremes of the image. If the dispensed blob is circular (radius r) then this measure, termed the “Box Area Ratio” (BAR), is given by:

\[ \mathrm{BAR} = \frac{\pi \times r^{2}}{2r \times 2r} = \frac{\pi}{4} \]
The BAR parameter is particularly useful in enabling the detection of problem dispenses in which trails of adhesive, which could lead to solder pad contamination, are present (Fig. 5 and Sec. 2.3).
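As an illustration of the BAR measure and of the bang-bang rule described earlier in this section, the following Python sketch may be helpful. It is not the testbed software. The function names, the synthetic disc image, the trail geometry and the step size `step` are assumptions made for illustration, as is the direction of the correction (a larger blob lowers the pulse height). Only the bounding-box definition of BAR and the ±5% tolerance band are taken from the text.

```python
import math


def box_area_ratio(binary_image):
    """Blob area divided by the area of its axis-aligned bounding box."""
    pixels = [(r, c)
              for r, row in enumerate(binary_image)
              for c, value in enumerate(row) if value]
    if not pixels:
        return 0.0  # missed dispense: no blob pixels found
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    box_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return len(pixels) / box_area


def bang_bang_update(pulse_height, blob_area, target_area, step, tolerance=0.05):
    """Change the pulse height by a fixed step only when out of tolerance."""
    if blob_area > target_area * (1.0 + tolerance):
        return pulse_height - step  # assumed direction: too much adhesive
    if blob_area < target_area * (1.0 - tolerance):
        return pulse_height + step  # assumed direction: too little adhesive
    return pulse_height  # inside the tolerance band: no change


def disc_image(size, radius):
    """Synthetic binary image of a centred circular blob (hypothetical data)."""
    centre = size // 2
    return [[(r - centre) ** 2 + (c - centre) ** 2 <= radius ** 2
             for c in range(size)] for r in range(size)]


ideal_bar = math.pi / 4  # BAR of a perfect circle
disc = disc_image(256, 100)
bar_disc = box_area_ratio(disc)

# A thin trail attached to the blob enlarges the bounding box and lowers the BAR.
trailed = [row[:] for row in disc]
for r in range(126, 131):
    for c in range(228, 254):
        trailed[r][c] = True
bar_trailed = box_area_ratio(trailed)

# Blob 10% above target: the pulse height is stepped down by the fixed amount.
next_height = bang_bang_update(pulse_height=10.0, blob_area=1.10,
                               target_area=1.0, step=0.5)
```

On a pixel grid the disc gives a BAR close to, but not exactly, π/4, while the trailed blob gives a clearly lower value, which is the behaviour exploited for trail detection.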
The parameters derived from the variation of pressure within the syringe of adhesive are illustrated in Fig. 6(a). The pressure variation is represented by pulse height, pulse width and the 10%–90% (of peak pulse height) risetime and falltime parameters. These parameters are not independent since the risetime, pulse width and falltime are dependent on the pulse height. Some measure of noise immunity is provided by the use of the 10%–90% limits in deriving the risetime and falltime parameters.

2.3. Variability within the adhesive dispensing process
The variability of the process can be categorised as being due to material variation and/or process actuator variation. In both cases faults can lead to catastrophic unpredictable failure. The process faults are illustrated below (see Table 2) by taking examples from real data recorded during the “open loop” operation of the system. In all cases the variation in the temperature throughout the run was < 0.5 °C. Three categories of material variability, (i) high frequency, (ii) low frequency and (iii) adhesive trails, have been observed. High frequency variability is observed as a large increase in blob area, in the majority of cases over the period of a single dispense, followed by either a tiny dispense or a completely missed dispense. Figures 7(a) and (b) are photographs illustrating a typical increased dispense and tiny dispense respectively. Note: the spatial positions of the images in the photographs are not relevant since the (x-y) dispense sequence has been randomised in order to represent more accurately the industrial environment where each dispense is not
Fig. 5. Image processing parameters derived for the adhesive dispensing system and illustration of fault detection using box area ratio parameter.
Fig. 6. (a) Parameters derived from the variation of pressure within the syringe of adhesive. (b) Illustration of fault signatures associated with the variation of pressure within the syringe of adhesive.
Table 2. Sources of variability within the adhesive dispensing testbed.
located on a simple sequential x-y grid. The variation in the measured blob area of a number of these “bubbles in the flow” is given in Fig. 8. In some cases the bubbles are preceded by a gradual increase in the blob area followed by a missed dispense (see cases (b) and (c)), which could allow possible detection prior to failure. However, these precursors occur relatively infrequently compared to the other examples (cases (a), (d) and (e)) in which there are no precursors and the only solution is for the operator to intervene and remove the problem dispense. These voids or bubbles in the flow are believed to result from regions of low viscosity material that develop at the adhesive-air interface within the syringe. Consistent operation is usually recovered after purging the syringe off-board for typically ten dispenses.
An example of aperiodic (low frequency) variation in dispensed volume throughout a run is given in Fig. 9(a). A variation of ∼35% can be observed throughout the run. There are no apparent precursors to this variation: the measured pulse height (Fig. 9(b)), risetime, pulse width and falltime (Fig. 9(c)) were constant throughout the run, indicating a constant pressure pulse within the syringe. Due to the lack of correlated process and environmental parameter variation it is postulated that the low frequency variations are due to local variation in material properties that cannot be monitored with the current process sensors. Examples of adhesive trails are given in the photograph in Fig. 7(c). The trails can lead to possible solder pad contamination and are detected via a
Fig. 7. Photographs illustrating the effects of material variability: (a) increase in blob area associated with the onset of a bubble in the flow, (b) decrease in dispensed blob immediately following the increased dispense associated with the onset of a bubble in the flow and (c) illustrations of adhesive trails.
reduction in the BAR parameter (Fig. 5). BAR variation for the dispensing run illustrated in the photograph of Fig. 7(c) is given in Fig. 10. Adhesive trails are thought to be the result of viscosity, process and/or environment dependent behaviour of the material and dispensing mechanism, since they tend to be observed when the syringe of adhesive is nearly empty and/or at high operating temperatures. Other causes of system variability are due to the process actuators and sensors for the control and monitoring of the variation of pressure within the syringe of adhesive. Idealised fault signatures are illustrated in Fig. 6(b) with the main reasons behind the variability listed in Table 2. Variability due to (i) a sticking solenoid valve, (ii) system air leak, (iii) load on the air line, (iv) random pulse width variation and (v) measurement noise have been observed and documented.56 As a typical example of the effect of actuator variability on dispensed blob area, Figs. 11(a), (b) and (c) detail the presence of a sticking solenoid valve and measurement noise. The
variation of blob area throughout the run is given in Fig. 11(a). Four definite regions (e.g. approximate dispense regions: 200–250, 300–330, 370–400 and 425–475) of increased area variation can be
Fig. 8. Illustrations of the variation in measured blob area for a number of bubbles in the flow (i.e. high frequency material properties).
Fig. 9. (a) Illustration of aperiodic (i.e. low frequency material properties) variation in dispensed volume. (b) The measured and programmed pulse height variation for the dispense run illustrated in Fig. 9(a). (c) The risetime, falltime and pulse width variation for the dispense run illustrated in Fig. 9(a).
Fig. 10. Box area variation for the adhesive trails illustrated in Fig. 7(c).
observed superimposed on an otherwise stable run. The measured pulse height, detailed in Fig. 11(b), was stable apart from three noisy measurements around dispense numbers 20, 130 and 150. The risetime, pulse width and falltime data presented in Fig. 11(c) provide the reason for the increased variation in blob area. Four regions are apparent where the pulse falltime (and pulse width) show extreme variation, randomly increasing from the average values. Comparison with Fig. 11(a) shows a direct correlation with the increased variation in blob area. Random increases in the falltime indicate a failure of the solenoid valve to close properly, which would be expected to produce random increases in the dispensed volume since this is proportional to the area under the pressure pulse curve. The falltime problem was subsequently traced to a build up of carbon around the solenoid valve piston. This “coking” was remedied by stripping and cleaning the assembly. It should also be noted that: (i) inspection of the falltime data for dispense numbers 1–30 indicates a less obvious increase in variation that is correlated with variation in the blob area data, which suggests that the solenoid valve was sticking right at the beginning of the run, and (ii) the three noisy pulse height measurements (Fig. 11(b)) have resulted in spurious values for the risetime, pulse width and falltime (Fig. 11(c)) since the parameters are not independent (Fig. 6), and furthermore the variations are not correlated with any observed effect within the blob area data (Fig. 11(a)).
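The link between a lengthened falltime and an increased dispensed volume can be illustrated numerically. The sketch below is not the testbed's pulse analysis software. It builds an idealised trapezoidal pressure pulse with linear edges, and the function names, sample interval and pulse dimensions are all hypothetical. It extracts the 10%–90% risetime and falltime and the area under the curve, then compares a nominal pulse with one whose fall edge is stretched, as might occur with a sticking solenoid valve.

```python
def trapezoid_pulse(height, rise, flat, fall, dt=0.001):
    """Sampled idealised pressure pulse with linear rise and fall edges."""
    n_rise, n_flat, n_fall = round(rise / dt), round(flat / dt), round(fall / dt)
    samples = [height * i / n_rise for i in range(n_rise)]
    samples += [height] * n_flat
    samples += [height * (n_fall - i) / n_fall for i in range(n_fall + 1)]
    return samples


def pulse_parameters(samples, dt=0.001):
    """Pulse height, 10%-90% risetime and falltime, and area under the curve."""
    peak = max(samples)
    low, high = 0.1 * peak, 0.9 * peak
    first_low = next(i for i, p in enumerate(samples) if p >= low)
    first_high = next(i for i, p in enumerate(samples) if p >= high)
    last_high = max(i for i, p in enumerate(samples) if p >= high)
    last_low = max(i for i, p in enumerate(samples) if p >= low)
    # Trapezoidal integration of the sampled pressure-time curve.
    area = sum(0.5 * (a + b) * dt for a, b in zip(samples, samples[1:]))
    return {
        "height": peak,
        "risetime": (first_high - first_low) * dt,
        "falltime": (last_low - last_high) * dt,
        "area": area,
    }


# Nominal pulse versus a pulse with a stretched fall edge (assumed values).
nominal = pulse_parameters(trapezoid_pulse(2.0, 0.010, 0.080, 0.010))
sticking = pulse_parameters(trapezoid_pulse(2.0, 0.010, 0.080, 0.040))

area_increase = sticking["area"] / nominal["area"] - 1.0
```

In this idealised case the pulse height and risetime are unchanged, yet both the falltime and the area under the curve increase, which mirrors the observation that a sticking valve raises the dispensed volume without any change in the programmed pulse height.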
3. The Design of Neural Networks for Manufacturing Control

The development of neural networks has been driven by the interest in the creation and emulation of human intelligence (i.e. the process, manner and performance of thought) within the artificial intelligence community.57–59 There are two main directions of research: (i) using neural theories to explain human intelligence60 and (ii) incorporating neural theories to develop intelligent artifacts. The latter is the focus of the research presented in this paper.
Fig. 11. (a) Variation in dispensed volume for a dispense run when a sticking solenoid valve problem was observed. (b) The measured and programmed pulse height variation for the dispense run when a sticking solenoid valve problem was observed. (Note the noise spikes on the measured pulse height for dispenses ∼ 20, 130, 150). (c) The risetime, falltime and pulse width variation for the dispense run when a sticking solenoid valve problem was observed.
3.1. Basic operation of neural systems
The basic theories behind the operation of the brain and the mathematical idealisations which are the foundations of neural network implementations are illustrated in Figs. 12(a) and (b). It is well established that the brain is made up of a large number (e.g. 10¹¹) of interconnected neurons.62 Each neuron consists of a nucleus, surrounded by a cell body (i.e. a semi-permeable membrane) that receives inputs on connections termed dendrites. The output from each neuron is via a connection termed the axon that is attached to the inputs of other neurons at points termed synapses. There is evidence that the connectivity is essentially pre-determined with learning proceeding via strengthening or weakening of pre-configured connections.60,61 Thermodynamics provides evidence to explain the operation of each neuron.62 A potential of ∼ −70 mV exists within the cell body. If the input connections reduce this potential to < 60 mV threshold then the cell body membrane becomes permeable and produces an output. The concepts of pre-determined interconnection, summation of inputs, and comparison via a threshold mechanism to produce an output are fundamental to the mathematical idealisations of neural networks63–65 (Fig. 12(b)). The features (Fi) of a pattern are connected to the input nodes of the network. Each subsequent node in the network operates in the same way. A weighted sum over all of the relevant input features (i.e. Σ (input features: Fi) × (interconnection strengths: Wi)) is performed and compared using a threshold function to produce an output. This output is either used as input to other neurons or, if the neuron is an “output neuron”, represents a pattern class (Pi). Two extremes of threshold function are illustrated for the “perceptron” network in Fig. 13. The “hard limit threshold”, adopted within the early perceptron applications,63 produces either
Fig. 12. (a), (b) Illustration of (a) the basic theory of the operation of neurons in the brain and (b) associated mathematical idealisations.
an output P1 (e.g. +1) if the weighted sum of the inputs > 0 or an output P2 (e.g. −1) if the weighted sum of the inputs < 0. The “sigmoid function” is an alternative threshold function derived from a thermodynamic approach to the training of networks using the “Feed Forward Back Propagation” (FFBP) algorithm.66 The sigmoid function used to implement the (1, −1) logic threshold is:

\[ \mathrm{sigmoid}_{(1,-1)}(X, T) = \frac{2}{1 + \exp\left(-\frac{X}{T}\right)} - 1 \qquad (1) \]

where X is the input to the respective neuron:

\[ X = \sum_{i=0}^{n} W_i \times F_i \qquad (2) \]

and T is a “temperature” measure that determines the “steepness” of the sigmoid function (i.e. a lower temperature indicates a steeper function, see dashed line in Fig. 13) and is one of the learning parameters associated with network training (Sec. 3.5). The temperature parameter is associated with a thermodynamic model of training to avoid solutions that represent local minima and not the best overall minimum solution. For example a high temperature (flat sigmoid) implies that there is a lot of energy within the system to ensure that the system “bounces” around the solution space and hence does not become stuck in local minima.66 The trade off is that training can take a large amount of time to settle down to the optimum solution. The sigmoid function does not have a hard logical decision structure like the hard limit threshold function. Since the output of the sigmoid function varies within the range −1.0 to 1.0, decision thresholds must be specified to define a tolerance band
Fig. 13. Illustration of typical threshold functions (e.g. hard limit and sigmoid) adopted within neural networks.
around each decision. A decision threshold of 0.95 is indicated in Fig. 13 for reference. It is noted that the range of values around which a decision is considered valid is dependent on the temperature constant of the sigmoid function and the magnitude of the weights in the neural network. 3.2.
Representation of simple logic functions
Figure 14 illustrates representations of simple logic functions using a simple two-layer perceptron network implementing a hard limit threshold for simplicity. There are three input features (F0, F1 and F2) and three associated interconnection weights (W0, W1 and W2). The input to the output node is given by:

$$X = W_0 F_0 + W_1 F_1 + W_2 F_2 \qquad (3)$$

If F0 = 1.0 (i.e. used as a bias node) then:

$$X = W_0 + W_1 F_1 + W_2 F_2 \qquad (4)$$

which is analogous to the straight-line representation “y = mx + c” where W0 is a constant allowing lines to be offset from the origin.
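As a minimal sketch of Eq. (4) followed by a hard limit threshold, the Python fragment below evaluates a single output node. The weights are those used for the AND example in this section (W0 = −1.4, W1 = W2 = 1.0); the function names are illustrative, and mapping a weighted sum of exactly zero to −1 is an assumption, since the text only defines the cases > 0 and < 0.

```python
def hard_limit(x):
    """Hard limit threshold: +1 if the weighted sum is positive, else -1."""
    # Assumption: a weighted sum of exactly zero is mapped to -1.
    return 1.0 if x > 0 else -1.0


def node_output(weights, features):
    """Weighted sum of Eq. (3) followed by the hard limit threshold."""
    x = sum(w * f for w, f in zip(weights, features))
    return hard_limit(x)


# AND weights quoted in the text, in the order W0 (bias), W1, W2.
AND_WEIGHTS = (-1.4, 1.0, 1.0)


def and_gate(f1, f2):
    """Evaluate the AND node in (1, -1) logic."""
    # F0 is fixed at 1.0 so that W0 acts as the bias term of Eq. (4).
    return node_output(AND_WEIGHTS, (1.0, f1, f2))


if __name__ == "__main__":
    for f1 in (1.0, -1.0):
        for f2 in (1.0, -1.0):
            print(f1, f2, and_gate(f1, f2))
```

The decision surface is the straight line −1.4 + F1 + F2 = 0, so only the input pair (1.0, 1.0) lies on its positive side.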
Taking the AND representation as an example, given the (1, −1) truth table for AND logic as:

   F1      F2      O/P
   1.0     1.0     1.0
  −1.0     1.0    −1.0
   1.0    −1.0    −1.0
  −1.0    −1.0    −1.0
Then if the connection weights are selected as W0 = −1.4, W1 = 1.0 and W2 = 1.0, inspection of Eq. (4) using the values for F1 and F2 in the truth table above results in the AND logic, i.e. O/P = 1 only when F1 = 1.0 and F2 = 1.0 and is −1 otherwise. Similar analysis yields the NAND, OR and NOR logic in Fig. 14. The above analysis has been shown to be valid only for linearly separable functions (i.e. patterns that can be discriminated by a straight-line decision surface as illustrated in Fig. 14). Non-linearly separable (NLS) logic (i.e. the output patterns cannot be separated by a single straight-line decision surface, for example the exclusive OR problem)67 can be represented if there is a priori knowledge of the required decision surface/surfaces. For example the NLS logic may be separable by either (i) using an algebraic function of the set of input features (e.g. the Polynomial ADaptivE LInear NEuron (PADELINE))67 to discriminate between the output patterns or (ii) using a multi-layered network. Multi-layered networks can be used to represent a much wider class of functions than a single layer.67 For example, networks with an intermediate layer between the input and output layers can represent any Boolean function of propositions, which includes all rule based and expert systems that use simple statements. The advantage of the neural network solution is that the network can be trained.

Fig. 14. Representation of simple AND, NAND, OR and NOR logic functions with a simple two layer network.

3.3. Representation of threshold discrimination with respect to the adhesive dispensing system parameters
An illustration of the required network topology to implement threshold discrimination on a system parameter is given in Fig. 15. The example given represents the segregation of good and bad dispenses based on the derived Box Area Ratio (BAR) parameter. Good dispenses are indicated by a BAR > 0.7 whereas values less than this indicate possible solder pad contamination and hence require operator intervention and rework (Sec. 2.3). Assuming 1, −1 logic (refer to Fig. 15 for details of the following analysis): (Note: Bias Node B00 = 1.0, and hard threshold logic discrimination at each node for simplicity. In Sec. 4.1, where the designed neural network is trained on simulated and real process data, this threshold function is replaced by the sigmoid function.) If the dispensed blob is okay with respect to the BAR parameter (i.e. BAR > 0.7) then, for node N12:

$$B_{00} \times W_{001} + \mathrm{BAR} \times W_{011} = 1.0 \times 0.7 + \mathrm{BAR} \times (-1.0) < 0$$

therefore the node does not “fire”, i.e. N12 output = −1.0.

Referring to the interpretation of the network outputs as listed in Table 4 (Sec. 4.1), an output of value −1.0 on node N12 (i.e. the Box Area Ratio Flag) indicates that “no decrease in the measured BAR has been observed” and the dispense can be considered to be okay. If however a BAR less than the threshold had been determined then application of the above logic would set the BAR flag (= 1.0) indicating a problem.

Fig. 15. Illustration of threshold discrimination (using the BAR system parameter) and associated network structure.

Other parameters within the adhesive dispensing system that can be categorised into good and bad operation by simple threshold discrimination include the pulse risetime, pulse width and pulse falltime (Table 2). By adopting the approach outlined above it is possible to monitor the system for (Sec. 2.3):

• air leaks: increase in risetime of the pressure pulse within the syringe due to wear of pneumatic seals or catastrophic failure of the air connections,
• pulse width variation due to either solenoid valve mechanism failure or electronic noise on either the actuator or feedback sensor circuits,
• sticking solenoid valve: pseudo random variation in the falltime of the pressure pulse within the syringe due to carbon build up within the solenoid valve mechanism.
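The BAR threshold node analysed above can be sketched in a few lines of Python. The bias weight (0.7) and input weight (−1.0) are taken from the worked example; the function and constant names are illustrative, and treating a node activity of exactly zero as “not fired” is an assumption.

```python
BAR_BIAS_WEIGHT = 0.7     # W001 in the worked example (sets the threshold)
BAR_INPUT_WEIGHT = -1.0   # W011 in the worked example
BIAS_NODE = 1.0           # B00


def hard_limit(x):
    """Hard limit threshold in (1, -1) logic."""
    # Assumption: an activity of exactly zero does not fire the node.
    return 1.0 if x > 0 else -1.0


def bar_flag(bar):
    """Return 1.0 if the Box Area Ratio flag is set, -1.0 if the dispense is okay."""
    activity = BIAS_NODE * BAR_BIAS_WEIGHT + bar * BAR_INPUT_WEIGHT
    return hard_limit(activity)


if __name__ == "__main__":
    for bar in (0.9, 0.75, 0.5):
        print(bar, bar_flag(bar))
```

The same structure would apply to the risetime, pulse width and falltime flags, with the bias weight and the sign of the input weight chosen to match the direction of each fault.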
3.4. Representation of set-point control to within a tolerance band with respect to the adhesive dispensing system parameters
The primary aim of the adhesive dispensing system is to control the amount of adhesive dispensed to within a tolerance around a pre-determined set point. For practical purposes the tolerance has been set to ±5% of the target dispensed blob size. The situation is illustrated in Fig. 16 where upper limit and lower limit bands are indicated around a target blob area. The “graph” is essentially divided into three regions:

1. a region within the upper and lower limits where the dispensed blob is within tolerance and no control action is required,
2. a region above the upper limit where it is necessary to decrease the pressure to bring the next dispense within the tolerance bands and
3. a region below the lower limit where it is necessary to increase the pressure to bring the next dispense within the tolerance bands.

Table 3. Derived bias and weight values for a decision threshold DT = 0.95.

The logic behind the topology of the network illustrated in Fig. 16 is to use two middle layer nodes (N11 and N12) to implement the upper and lower limit discrimination surfaces (i.e. quantisation of the dispense area) and then use the connections between the middle layer nodes and the output nodes (W101, W111 and W121) to implement the AND logic to determine whether a pressure action is required (see Table 3 for details). (Note: The topology illustrated in Fig. 16 only indicates that a change in the pressure is required. In the adhesive dispensing system it is necessary to determine whether the change is to either increase or decrease the pressure. In practice this is achieved by including another output node N22 with connections W102 to B10, W112 to N11 and W122 to N12. This Pressure Change node has been omitted here to simplify the discussion. The analysis is identical to that outlined below.)

For 1, −1 logic (refer to Fig. 16 for details of the following analysis): (Note: Bias Nodes B00 and B10 are both = 1.0, with hard threshold logic discrimination at each node for simplicity. In Sec. 4.1, where the designed neural network is trained on simulated and real process data, this threshold function is replaced by the sigmoid function.) Let the Target Area = TA and the tolerance = (Upper Limit − TA) = (TA − Lower Limit) = ε. Assuming that a dispensed blob area is outside of the upper limit tolerance band then, at node N01, Area > (TA + ε). Following the analysis through the nodes in the network,
Node N11:

$$B_{00} \times W_{001} + \mathrm{Area} \times W_{011} = 1.0 \times (-TA + \varepsilon) + \mathrm{Area} \times 1.0 > 0$$

therefore the node “fires”, i.e. N11 output = 1.0.

Node N12:

$$B_{00} \times W_{002} + \mathrm{Area} \times W_{012} = 1.0 \times (-(TA + \varepsilon)) + \mathrm{Area} \times 1.0 > 0$$

therefore the node “fires”, i.e. N12 output = 1.0.

Node N21:

$$B_{10} \times W_{101} + 1.0 \times W_{111} + 1.0 \times W_{121} = 1.0 \times (-2 + \varepsilon) + 1.0 \times 1.0 + 1.0 \times 1.0 > 0$$

therefore the node “fires”, i.e. N21 output = 1.0.

Fig. 16. Illustration of banded discrimination and associated network structure.

Referring to the interpretation of the network outputs as listed in Table 4, an output of value 1.0 on node N21 (i.e. the Pressure Action node) indicates that a change in the pressure is required due to the previous dispense being outside of the control limits (in the implementation ε corresponds to ±5%). When the area of the dispensed blob is within the Upper and Lower limit band around the target area it is easy to show that the Pressure Action node output = −1.0, indicating that no change in the pressure applied to the syringe is required. Other parameters within the adhesive dispensing system that are categorised by upper and lower tolerance limits around an expected target value, and can hence be represented by the above network topology, are the pulse height (i.e. an unexpected increase indicates noise pickup whereas an unexpected decrease indicates additional load on the air supply) and the change in dispensed area (i.e. the characteristics of a bubble in the flow, see Sec. 2.3).

3.5. Neural network training via the backpropagation algorithm
The ability to train a neural network to associate input features with desired output patterns was the main reason for the resurgence in neural research in the 1980s. The most widely adopted approach is the backpropagation (BP) algorithm applied to feedforward (FF) networks initially proposed by McClelland and Rumelhart.66 The application of the FFBP algorithm coupled with the selection of an appropriate network topology (Sec. 3.3) offers the possibility to greatly enhance the convergence and operational performance of the chosen network. Training using the BP algorithm is illustrated in Fig. 17. An initial network topology is selected and the training data of input features (Fi) and desired output patterns (Pc) determined. Small random values are selected for the initial interconnection strengths (Wi) and the features are fed forward through all the layers of the network resulting in an actual output Yc, i.e.

(i) sum the inputs to the node: $X_c = \sum_{i=0}^{n} W_i \times F_i$,
(ii) determine the output via the sigmoid threshold function: $Y_c = \mathrm{sig}_{(1,-1)}(X_c, T)$.

The initial part of the BP algorithm is to calculate the change in the interconnection strengths (∆Wi) that minimises a cost function (E, see Fig. 17) based upon the difference between the actual output Yc and the desired output Pc. The change in the connection weights is given by:

$$\Delta W_i = -\varepsilon \frac{\partial E}{\partial W_i} \qquad (5)$$

where ε (∼ 0.01) is termed a learning parameter and controls the rate at which convergence is achieved. Since E is not an explicit function of Wi, the partial derivative is obtained via the chain rule (“function of a function” differentiation), i.e.:

$$\frac{\partial E}{\partial W_i} = \frac{\partial E}{\partial Y} \times \frac{dY}{dX} \times \frac{\partial X}{\partial W_i} \qquad (6)$$

resulting in the equation for ∆Wi:

$$\Delta W_i = -\varepsilon \, (Y_c - P_c) \times \frac{d(\mathrm{sig}(X, T))}{dX} \times F_i \qquad (7)$$

In order to backpropagate the error to previous layers, the change in the features ∆Fi for the current input layer must be determined (see Fig. 17). Identical analysis to the above (see the diagram for details) results in the following relationship for ∆Fi:

$$\Delta F_i = -\varepsilon \, (Y_c - P_c) \times \frac{d(\mathrm{sig}(X, T))}{dX} \times W_i \qquad (8)$$
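The weight update of Eqs. (5) to (7) can be illustrated for a single output node. The Python sketch below is an assumption-laden illustration rather than the training code used in this work: it assumes the cost function E = ½(Yc − Pc)², which is consistent with Eq. (7), trains on the (1, −1) AND truth table of Sec. 3.2, and uses hypothetical names, a fixed random seed and an arbitrary number of passes through the data.

```python
import math
import random


def sigmoid(x, t):
    """(1, -1) sigmoid of Eq. (1)."""
    return 2.0 / (1.0 + math.exp(-x / t)) - 1.0


def sigmoid_slope(y, t):
    """d(sig)/dX expressed through the node output y = sig(X, T)."""
    return (1.0 - y * y) / (2.0 * t)


def train_node(patterns, eps=0.01, t=0.1, epochs=500, seed=0):
    """Train one sigmoid node with the update of Eq. (7).

    patterns: list of (features, desired_output); features[0] is the bias input 1.0.
    """
    rng = random.Random(seed)
    # Small random initial interconnection strengths.
    weights = [rng.uniform(-0.05, 0.05) for _ in patterns[0][0]]
    for _ in range(epochs):
        for features, target in patterns:
            x = sum(w * f for w, f in zip(weights, features))  # Eq. (2)
            y = sigmoid(x, t)
            # Common factor of Eq. (7): -eps * (Yc - Pc) * d(sig)/dX
            delta = -eps * (y - target) * sigmoid_slope(y, t)
            weights = [w + delta * f for w, f in zip(weights, features)]
    return weights


# (1, -1) AND truth table with the bias feature F0 = 1.0 prepended.
AND_PATTERNS = [
    ((1.0, 1.0, 1.0), 1.0),
    ((1.0, -1.0, 1.0), -1.0),
    ((1.0, 1.0, -1.0), -1.0),
    ((1.0, -1.0, -1.0), -1.0),
]

if __name__ == "__main__":
    trained = train_node(AND_PATTERNS)
    for features, target in AND_PATTERNS:
        x = sum(w * f for w, f in zip(trained, features))
        print(features[1:], target, round(sigmoid(x, 0.1), 3))
```

For a multi-layer network the same factor, multiplied by Wi instead of Fi as in Eq. (8), would be passed back as the error term of the preceding layer.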
Fig. 17. Schematic representation of the back propagation algorithm.
Knowledge of ∆Fi allows the backpropagation algorithm to be repeated to obtain ∆Wi and ∆Fi for the previous layers.

3.6. The derivation of initial weight values for neural networks
The decision threshold of the sigmoid function has already been discussed in Sec. 3.1. For limited interconnected networks, assuming that the temperature of the sigmoid function is fixed (e.g. T = 0.1), the magnitudes of the interconnection weights can be manipulated to produce nodes which have set-points specified to given decision thresholds (e.g. DT = 0.95). The aim is to initialise the weights to values that are close to those expected for the optimised solution and allow the FFBP to fine tune the parameters. For the node to provide a definite decision:

$$|\mathrm{sigmoid}(\text{node activity})| \ge DT \qquad (9)$$

The sigmoid function (Eq. (1)) has an inverse given by:

$$X = -T \times \ln\left(\frac{2}{Y + 1} - 1\right) \qquad (10)$$

and so for each decision threshold the range of X over which there is no valid decision is:

$$-T \times \ln\left(\frac{2}{-DT + 1} - 1\right) \le X \le -T \times \ln\left(\frac{2}{DT + 1} - 1\right)$$

(Note: For a decision threshold = 0.95 and T = 0.1 these limits are XDT = ±0.3.) If the lowest reliable input value is XL and the highest reliable input value XH is, for example, 20% higher, then the node activity satisfies the following relationships:

$$\text{node activity} = \text{bias} + \text{weight} \times \text{input} \ge X_{DT} \quad \text{when input} \le X_L$$

and

$$\text{node activity} = \text{bias} + \text{weight} \times \text{input} \le -X_{DT} \quad \text{when input} \ge X_H$$

so at the crossover points:

$$\text{node activity} = \text{bias} + \text{weight} \times X_L = X_{DT} \qquad (11)$$

$$\text{node activity} = \text{bias} + \text{weight} \times X_H = -X_{DT} \qquad (12)$$

Equations (11) and (12) can be solved to obtain values for the weight and bias:

$$\text{weight} = \frac{-2 \times X_{DT}}{(X_H - X_L)} \qquad (13)$$

$$\text{bias} = \frac{X_{DT} \times (X_H + X_L)}{(X_H - X_L)} \qquad (14)$$
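Equations (10), (13) and (14) are simple enough to check numerically. The following Python sketch uses illustrative function names and takes XDT as an explicit argument, so that the value XDT = 0.3 quoted in the text can be used directly.

```python
import math


def sigmoid(x, t):
    """(1, -1) sigmoid of Eq. (1)."""
    return 2.0 / (1.0 + math.exp(-x / t)) - 1.0


def inverse_sigmoid(y, t):
    """Inverse of the sigmoid, Eq. (10); valid for -1 < y < 1."""
    return -t * math.log(2.0 / (y + 1.0) - 1.0)


def weight_and_bias(x_dt, x_low, x_high):
    """Solve Eqs. (11) and (12) for the weight and bias, i.e. Eqs. (13) and (14)."""
    weight = -2.0 * x_dt / (x_high - x_low)
    bias = x_dt * (x_high + x_low) / (x_high - x_low)
    return weight, bias


if __name__ == "__main__":
    # Node activity at which the sigmoid output reaches DT = 0.95 for T = 0.1.
    print(inverse_sigmoid(0.95, 0.1))
    # Values used in the text: XDT = 0.3, XL = 1.0, XH = 1.2.
    weight, bias = weight_and_bias(0.3, 1.0, 1.2)
    print(weight, bias)
    # Crossover checks of Eqs. (11) and (12).
    print(bias + weight * 1.0, bias + weight * 1.2)
```

With DT = 0.95 and T = 0.1, Eq. (10) evaluates to roughly 0.37; the sketch follows the text in using XDT = 0.3 for the weight and bias calculation.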
For the case where DT = 0.95 and XDT = 0.3, if a lower reliable input limit of XL = 1.0 is chosen and an upper reliable input limit 20% higher, i.e. XH = 1.2, then from Eqs. (13) and (14) weight = −0.6/(1.2 − 1.0) = −3 and bias = 0.3 × (1.0 + 1.2)/(1.2 − 1.0) = 3.3. Values for the weight and bias have been calculated using Eqs. (13) and (14) for different ranges of XH − XL (expressed as percentages) and are listed in Table 3.

4. Application of Neural Network Control and Monitoring to the Adhesive Dispensing Testbed

The research outlined in Sec. 3 can be applied to the design and implementation of an optimised neural network based upon knowledge of both the process to be controlled and the operation of the network. If confidence in the desired operation (of both the process and the network) is high then it is also possible to select appropriate weight and bias values (Sec. 3.6) to aid the convergence of the network to an optimal solution. In order to compare the performance of a designed neural network with current methods of implementing neural systems, a number of training experiments were undertaken corresponding to different training data, neural network structures and initial weights and biases assigned to the nodes (Sec. 4.1). The network that provided the best convergence behaviour was then applied to the control and monitoring of the adhesive dispensing system and the results are discussed in Sec. 4.2.

4.1. Training the neural network controller
Eight training experiments were implemented based upon different (i) training data (i.e. real and simulated data), (ii) neural network structure (i.e. fully connected, structured fully connected and structured partially connected) and (iii) neural network weights and biases (i.e. small random weights, small hand crafted weights and large hand crafted weights).

4.1.1. Training data sets
Difficulties arise in training neural networks when the magnitudes of the input data range over large unknown values. In these cases the connection weights of the networks can be swamped and the convergence to a stable solution during the learning process can be extremely difficult. To overcome this problem the input values to the networks were normalised where appropriate (e.g. no normalisation was undertaken for the BAR data since the range of the data is already [0–1]) such that an average behaviour corresponds to an input value = 1.0. The following training data were used:

• Real Control Data: The “real” training data were obtained by (i) monitoring the adhesive dispensing system under control using a rule based “bang-bang” controller and (ii) selecting process fault data from the open loop experimental runs already recorded (see Sec. 2). In this way the network can fine-tune both its control action and fault diagnosis nodes.
• Hand Crafted Data: Experiential knowledge of the process characteristics and process faults was used to construct a training data set. These data were useful in that (i) good and bad examples could be given fixed values, which allows the network representation to be examined in detail, and (ii) the data set was relatively small (50 examples), which enabled relatively fast convergence when possible. However the major disadvantage with this (and simulated data sets in general) is that any hidden properties of the real system, such as unexpected correlations between the system variables, were lost.

A sample of the “real” network training data is given in Fig. 18. The input parameters to the network were the blob area, pulse height, risetime, falltime, pulse width, box area ratio and the area change. The outputs from the network were pressure action, pressure change, pulse height flag, pulse height decision, risetime flag, falltime flag, pulse width flag, box area ratio flag, bubble flag and bubble decision. The training pattern (i.e. the desired network outputs) is given in (1, −1) logic format. The logic is such that a value of 1 indicates that there is a problem with the associated output or a change is required and −1 indicates fault free operation. Details of the interpretation of the logic behind the desired network output training pattern are given in Table 4.

Fig. 18. Sample of the training data applied to the neural network trials for the adhesive dispensing system.

Table 4. Logic behind the desired network output training pattern.

4.1.2. Different neural network structures
To compare the training and convergence properties of the designed network with “standard” neural networks three neural network structures were studied:

• Fully Connected (FC) Network: This neural network structure consisted of the seven input nodes and ten output nodes as specified by the system (see Sec. 4.1.1). The fully interconnected structure had ten hidden nodes (see Fig. 19). These hidden nodes corresponded to the three banded regions of the Area, Area Change and Pulse Height system variables (six hidden nodes) and the four threshold boundaries of the Risetime, Falltime, Pulse Width and Box Area Ratio system variables (four hidden nodes).
• Structured Fully Connected (SFC) Network: This neural network structure consisted of the specified input and output nodes, corresponding to the system specification, and six hidden nodes (see Fig. 20). The threshold units of the four threshold boundaries of the Risetime, Falltime, Pulse Width and Box Area Ratio system variables were implemented without hidden nodes based upon the reasoning outlined in Sec. 3.3 concerning network structures suitable for implementing threshold discrimination. The three banded regions of the Area, Area Change and Pulse Height system variables were implemented using six hidden nodes. The connection pattern of this system was that (i) all of the input nodes were fully interconnected to all of the hidden nodes, (ii) all the input nodes were fully interconnected to all of the thresholding flag outputs, namely the Risetime Flag, Falltime Flag, Pulse Width Flag and Box Area Ratio Flag nodes, and (iii) all the hidden nodes were fully interconnected to all the banded flag and decision outputs, namely the Pressure Action, Pressure Change, Pulse Height Flag, Pulse Height Decision, Bubble Flag and Bubble Decision nodes.
• Structured Partially Connected (SPC) Network: This neural network structure consisted of the specified input and output nodes corresponding to the system variables and six hidden nodes. As detailed in Secs. 3.3 and 3.4 the threshold boundaries of the four threshold units (i.e. Risetime, Falltime, Pulse Width and Box Area Ratio) can be implemented without hidden nodes whilst the three banded regions (i.e. for the Area, Pulse Height and Area Change input nodes) can be implemented with two hidden nodes per input node. The connection pattern for this system was (see Fig. 21): (i) each thresholded input node was connected to its corresponding output flag and no other, (ii) the banded input nodes were connected to a pair of hidden nodes and no others and (iii) each hidden node pair was connected to its corresponding output flag and decision node and no others.
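The three connection patterns can be compared by enumerating their connections. The Python sketch below is an illustration under stated assumptions: bias connections are not counted, “fully connected” is taken to mean layer-to-layer connections only, and the node names (including the hypothetical `_upper` and `_lower` hidden node labels) are introduced here for readability.

```python
INPUTS = ["area", "pulse_height", "risetime", "falltime",
          "pulse_width", "box_area_ratio", "area_change"]

# Thresholded inputs and their output flags (Sec. 3.3).
THRESHOLDED = {
    "risetime": "risetime_flag",
    "falltime": "falltime_flag",
    "pulse_width": "pulse_width_flag",
    "box_area_ratio": "box_area_ratio_flag",
}

# Banded inputs and their flag/decision output pairs (Sec. 3.4).
BANDED = {
    "area": ("pressure_action", "pressure_change"),
    "pulse_height": ("pulse_height_flag", "pulse_height_decision"),
    "area_change": ("bubble_flag", "bubble_decision"),
}

BANDED_OUTPUTS = [out for pair in BANDED.values() for out in pair]
OUTPUTS = list(THRESHOLDED.values()) + BANDED_OUTPUTS


def edges(sources, targets):
    """All (source, target) connections between two groups of nodes."""
    return {(s, t) for s in sources for t in targets}


def fully_connected():
    hidden = [f"h{i}" for i in range(10)]
    return edges(INPUTS, hidden) | edges(hidden, OUTPUTS)


def structured_fully_connected():
    hidden = [f"h{i}" for i in range(6)]
    return (edges(INPUTS, hidden)
            | edges(INPUTS, THRESHOLDED.values())
            | edges(hidden, BANDED_OUTPUTS))


def structured_partially_connected():
    net = set(THRESHOLDED.items())  # each thresholded input to its own flag
    for inp, outs in BANDED.items():
        pair = [f"{inp}_upper", f"{inp}_lower"]  # hypothetical hidden node names
        net |= edges([inp], pair) | edges(pair, outs)
    return net


if __name__ == "__main__":
    print(len(fully_connected()),
          len(structured_fully_connected()),
          len(structured_partially_connected()))
```

Under these assumptions the SPC structure has far fewer adjustable connections than the FC and SFC structures, which is the sense in which it is described as minimally designed.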
4.1.3. Neural network weights and biases
The following three weight specifications were used in the training experiments:

• Small random weight values and biases were applied to the initial neural network structure. This configuration is typical of current research practices and provides no initial information to the network.
• Small handcrafted weights and biases: Knowledge available about the control problem is used to give the neural network structure an approximate measure of where the set points of the banded regions and thresholds are in the input space (see Sec. 3.6 and Table 3).
• Large handcrafted weights: Greater knowledge about the control problem is used to specify the set points of the banded regions and the thresholds in the input space by adopting much tighter tolerances for the input values around the decision threshold (e.g. XH − XL = 5%, see Sec. 3.6 and Table 3).
4.2. Experiments on the convergence of the different neural networks
Neural network convergence experiments were undertaken with a learning rate ε = 0.01 (Sec. 3.5) and a temperature T = 0.1 (Sec. 3.1). The maximum number of iterations that were allowed was 5000. This set an upper limit on the training procedure to enable comparisons to be made. If the training algorithm did not converge within this limit the sum squared error values after 5000 iterations were used to examine the performance of the networks. The results of the experiments are shown in Table 5.
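The iteration cap and the fall-back to the sum squared error can be expressed as a small training harness. The Python sketch below is hypothetical: the convergence test (every output has the correct sign and a magnitude of at least DT = 0.95) is an assumption, since the criterion is not stated explicitly here, and `train_step` and `evaluate` stand in for whatever network implementation is used.

```python
def sum_squared_error(outputs, targets):
    """Sum of squared differences between actual and desired outputs."""
    return sum((y - p) ** 2 for y, p in zip(outputs, targets))


def all_decisions_valid(outputs, targets, dt=0.95):
    """Assumed convergence test: correct sign and magnitude >= dt for every output."""
    # Targets are +1 or -1, so y * p equals |y| when the sign is correct.
    return all(y * p >= dt for y, p in zip(outputs, targets))


def run_training(train_step, evaluate, targets, max_iterations=5000, dt=0.95):
    """Train until all decisions are valid or the iteration limit is reached.

    train_step(): performs one training iteration on the network.
    evaluate(): returns the current network outputs for the training set.
    Returns (converged, iterations_used, sum_squared_error).
    """
    outputs = evaluate()
    for iteration in range(1, max_iterations + 1):
        train_step()
        outputs = evaluate()
        if all_decisions_valid(outputs, targets, dt):
            return True, iteration, sum_squared_error(outputs, targets)
    return False, max_iterations, sum_squared_error(outputs, targets)
```

A run that reaches the limit returns the error after the final iteration, which corresponds to the value used to compare networks that did not converge.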
Fig. 19. Fully interconnected network structure.
Fig. 20. Structured fully interconnected network structure.
The behaviour of the network structures when trained using the simulated data was completely different from their behaviour when presented with real process data. The simulated data contained only completely good or completely bad examples that were numerically well separated. The spread in real data close to the threshold limits leads to the variation in convergence performance. The only structure to converge when presented with real process data was the SPC (i.e. minimally designed structure) network with initial large derived weights chosen to closely represent the required system response. This is not to say that the other network structures would not converge eventually, given enough time, but the rapid convergence (44 iterations) of the network with the designed structure and the selected initial weights is remarkable.
Fig. 21. Structured partially interconnected network structure.
In general improved convergence performance was obtained by increasing the design effort in implementing the neural network controller. The significant factors that affected the performance of the automated training were the structure of the initial neural network and the initial weight specification. Pruning the neural network to the minimum that was required to model the problem aided optimal convergence. Small random weights provide no information about what the network is expected to learn. For the fully interconnected and the minimal designed structures the sum squared error convergence was not good. However for the reduced interconnection structure using simulated data the network converged to a solution after ∼ 2000 iterations. In any optimisation problem convergence is dependent upon the location of the initial starting point of the solution within the global solution space. It is generally a matter of chance as to whether one is lucky enough to find this solution quickly. Including small weights that were designed to place the neural network in the approximate region of the control problem set points improved the convergence performance on the simulated data but had a relatively small effect on the convergence performance with real data. This could be explained by the fact that the real training data contained examples that required decisions to be made over small tolerances which can only be achieved by a neural network with a low temperature or large weight values. The iteration limit of 5000 could easily have curtailed the weight modification before large enough values were obtained.
Table 5. Convergence properties of the different initial neural network structures, weights and biases and training data.
Deriving large weight values that modelled the control set points to finer tolerances ensured that the automated training would converge within fewer iterations. This method effectively starts the neural network in a configuration that is very close to the optimum control solution. The automated training fine-tunes the neural network to produce the required controller. The fine tuning is necessary for the real data since the data will present properties and correlations that may not be modelled by the designed neural network.

4.3. Implementation of the structured partially connected neural network on the adhesive dispensing system
The designed SPC neural network has been applied to the adhesive dispensing system. There are important implementation issues:

• Data presented to the network must be normalised (Sec. 4.1.1). Considering the variation in system parameters with time (Sec. 2), this necessitates a calibration sequence prior to operation to determine the normalisation constants. This calibration procedure must be supervised by the operator to ensure correct calibration.
• A small set of rules is necessary to interpret the output from the network (see Table 6). The rules effectively check which flags are set and what actions must be applied to the system. The network interpreter is similar to the simple rule set used in the rule based controller51 applied to the system in previous research but there are subtle differences. The network interpreter deals with the output flags from the network that has, via its structure and interconnection strengths, interpreted the system characteristics. The rule-based controller determines the system characteristics from the raw data itself.

Figures 22(a) and (b) show the blob area and the measured and programmed pressure for a dispensing run under the SPC neural network control. The dashed lines indicate limits of ±5% around a target area of 40,000 pixels. Control to within ±5% is achieved; however, it must be noted that the amount of actuated pressure variation needed to keep the blob area within these limits is small (Fig. 22(b)) and it is likely that the material was particularly stable when the SPC neural network control trials were undertaken. In addition there were no process errors observed in the recorded data during the run and so the monitoring performance of the network was not exercised under operational conditions. Note: The calibration dispenses can be observed at the start of the run in both the blob area and the measured and programmed pressure data.
Table 6. Listing of the rules used to interpret the output of the SPC neural network.
Fig. 22. (a) Variation in dispensed volume for neural network control of the adhesive dispensing system. (b) Variation in measured and programmed pulse height for neural network control of the adhesive dispensing system.
Calibration consists of the application of a single step response to determine the current operating conditions.

5. Conclusion

The design and implementation of a structured neural network for the control and monitoring of a discrete manufacturing process has been reported in this paper. Knowledge of the process in terms of control performance and failure modes of the material, sensors and actuators provides the requirements for the network design. A basic understanding of the required network topology to produce appropriate discrimination (e.g. threshold and banding) of the process parameters has been shown to result in an optimised network from both a training and operational viewpoint. It has also been shown that if there is sufficient confidence in the functionality of the designed network then weights and biases can be determined that can greatly enhance the convergence performance (i.e. a structured network based on known process features can rapidly converge to an appropriate control solution). Nevertheless the analysis, implementation and results discussed in this paper are considered to be an improvement over the applications that “use the learning capabilities of neural networks” by “varying” the structure and learning parameters of the network “until convergence is achieved” (see references in Sec. 1). It is important that attempts are made to understand and explain the mechanisms of convergence and operation of neural networks. Only through structured design methods and increased confidence in the network operation will widespread industrial uptake be facilitated.

The conclusions resulting from the work outlined in this paper are represented in Fig. 23. The generalised network design process and relevant design and implementation issues are illustrated. The design method can be implemented in four stages: structure, train, test and modify. From a structural perspective, correct representation allows features to be understood that would otherwise be hidden. For example the structure of a three-layer network (note: a multi-layered network can cover a much wider class of functions than a single layer, see Sec. 3.2) can be divided into four distinct regions of functionality:

(i) the input pattern region in which either Boolean or real valued inputs are assigned to individual input nodes,
(ii) a quantisation region comprising the connection weights and biases from the input to the middle layers and the thresholding function at the middle nodes. Quantisation involves the determination of decision surfaces to segregate the input values into a number of discrete regions,
(iii) a Boolean region comprising the connection weights and biases from the middle layer to the output layer and the thresholding function at the output nodes. Logical combinations of the quantised parameters are determined,
(iv) the Boolean output pattern region that, after successful training, represents the desired output behaviour.

Fig. 23. Representation of conclusions concerning application of a designed neural network to the adhesive dispensing system.

Training and testing involve the segregation of input parameters and output patterns into separate independent data sets. As illustrated in Sec. 4.2, simulated data can lead to faster convergence behaviour although the network must be trained on real data if robust performance is required. It is essential that both data sets are complete (i.e. cover the entire range of process features that are to be represented within the network) and are consistent (i.e. there are no contradictory patterns in the different sets). Modification of the structure (i.e. deletion or insertion of nodes) is only possible after training and testing have highlighted inconsistencies in network operation.

The general situation is more complicated than that presented by the adhesive dispensing application. It is unlikely that industrially based processes are as well understood as the adhesive dispensing process. When the emphasis is on production throughput, operators have little opportunity to vary process parameters to determine appropriate transfer functions. In addition the observation of process faults requires appropriate sensors to be introduced and the data must be recorded, analysed for appropriate features and interpreted to benefit from the neural network implementation as outlined above. It is difficult to determine the relevant process features without access to an instrumented real process. Nevertheless the application of neural network technology is useful in these applications where there are no agreed rules or algorithms for getting from the process input features to the output patterns. In addition, recent research based upon the analysis outlined in this paper has enabled the determination of the rules that are represented within the Boolean layer of the network. This research is providing the background knowledge that will allow a multi-layered network to be used in the future as a convenient tool to enable operators to easily learn process interdependencies rather than via more specialised data reduction and analysis as is currently the norm.

References

1. P. Waddell, “Emerging technologies”, Printed Circuit Design 13(2) (1996), 1047–1067.
2. S. Asp and P. Wilde, “Multivariate process modeling of a high volume manufacturing of consumer electronics”, SPIE Proceedings Series 3518 (1998), 164–171.
3. G. J. Udo, “Neural networks applications in manufacturing processes”, Computers and Industrial Engineering 23(1–4) (1992), 94–100.
4. H. C. Zhang and S. H. Huang, “Applications of neural networks in manufacturing: A state of the art survey”, International Journal of Production Research 33(3) (1995), 705–728.
Design Issues Associated with Neural Network Systems 47
5. H. Lee, C. O. Malave and S. Ramachandran, “A selforganizing neural network approach for the design of cellular manufacturing systems”, Journal of Intelligent Manufacturing 3(5) (1992), 325–332. 6. S. Kaparthi and N. Suresh, “A neural network system for shape-based classification and coding of rotational parts”, International Journal of Production Research 29(9) (1991), 1771–1784. 7. Y. Kao, and Y. Moon, “A unified group technology implementation using the backpropagation learning rule of neural networks”, Computers and Industrial Engineering 20(4) (1991), 425–437. 8. C. O. Malave and S. Ramachandran, “Neural network-based design of cellular manufacturing systems”, Journal of intelligent Manufacturing 2(5) (1991), 305–314. 9. S. V. Kamarthi, S. Kumara, F. T. S. Yu and I. Ham, “Neural networks and their applications in component design data retrieval”, Journal of Intelligent Manufacturing 1(2) (1990), 125–140. 10. V. Venugopal and T. Narendran, “Neural network model for design retrieval in manufacturing systems”, Computers in Industry 20 (1992), 11–23. 11. G. Chryssolouris, M. Lee and J. Pierce, “Use of neural networks for the design of manufacturing systems”, Manufacturing Review 3(3) (1990), 187–194. 12. Burke and S. Rangwala, “Tool condition monitoring in metal cutting: A neural network approach”, Journal of Intelligent Manufacturing 2(5) (1991), 269–280. 13. K. Ray, “Equipment fault diagnosis: A neural network approach” Computers in Industry 16 (1991), 169–178. 14. T. Miyanishi and H. Shimada, “Using neural networks to diagnose web breaks on a newsprint paper machine”, Tappi Journal B(9) (1998), 163–170. 15. T. H. Hou and L. Lin, “Manufacturing process monitoring using neural networks”, Computers and Electrical Engineering 19(2) (1993), 129–141. 16. G. Cohen, “Neural network implementations to control real-time manufacturing systems”, Computer Integrated Manufacturing Systems 11(4) (1998), 243–251. 17. G. Hao, J. S. Shang and L. G. 
Vargas, “A neural network model for on-line control of flexible manufacturing systems”, International Journal of Production Research 3(10) (1995), 2835–2854. 18. I. Wadi and R. Balendra, “Intelligent approach to monitor and control the blanking process”, Advances in Engineering Software 30(2) (1999), 85–92. 19. K. Anderson, G. E. Cook and K. Gabor, “Artificial neural networks applied to arc welding process modeling and control”, IEEE Transactions Oil Industry Applications 26 (1990), 824–830. 20. A. Guha, “Continuous process control using neural networks”, Journal of Intelligent Manufacturing 3(4) (1992), 217–228. 21. G. Gingrich, D. R. Kuespert and T. J. McAvoy, “Modeling human operators using neural networks”, ISA Transactions 31(3) (1992), 81–90.
22. N. Rewal, D. Toncich and C. Friedl, “Predicting part quality in injection molding using artificial neural networks”, Journal of Injection Molding Technology 2(3) (1998), 109–119. 23. N. Rail and R. Pitchumani, “Rapid cure simulation using artificial neural networks”, Composites Part A: Applied Science and Manufacturing 28(9–10) (1997), 847–859. 24. F. HuaTzu and S. M. Wu, “Case studies on modeling manufacturing processes using artificial neural networks”, Journal of Engineering for Industry 117(3) (1995), 412–417. 25. G. Chryssolouris, M. Domoroese and P. Beaullieu, “Sensor synthesis for control of manufacturing processess”, Journal of Engineering for Industry 114 (1992), 158–174. 26. J. Lazaro and I. Serrano, “Ultrasonic recognition technique for quality control in foundry pieces”, Measurement Science and Technology 10(9) (1999), 957–974. 27. D. Barschdorff and L. Monostori, “Neural networks: Their applications and perspectives in intelligent machining”, Computers in Industry 17 (1991), 101–119. 28. R. B. Chinnan and W. J. Kolarik, “Neural network based quality controllers for manufacturing systems”, International Journal of Production Research 35(9) (1997), 2601–2620. 29. G. A. Rovithakis, V. I. Gaganis, S. E. Perrakis and M. A. Christodoulou, “Neuro schedulers for manufacturing systems”, Computers in Industry 39(3) (1999), 209–217. 30. S. Cavalieri and O. Mirabella, “Neural networks for process scheduling in real time communication systems”, IEEE Transactions on Neural Networks 7(5) (1996), 1272–1285. 31. S. Vaithyanathan and J. Ignizio, “A stochastic neural network for resource constrained scheduling”, Computers and Operations Research 19(3–4) (1992), 241–254. 32. L. C. Rabelo and S. Alptekin, “Integrating scheduling and control functions in computer-integrated manufacturing using artificial intelligence”, Computers and Industrial Engineering 14 (1989), 101–106. 33. T. C. Chang, Expert Process Planning for Manufacturing, Addison-Wesley, MA. 1990. 34. Y. K. 
Ham and Y. C. S. Lu, “Computer-aided process planning: the present and the future”, Annals of the CIRP 37(2), (1988), 1–11. 35. K. Osakada and G. B. Yang, “Neural networks for process planning of cold forging”, Annals of the CIRP 40(1) (1991), 243–246. 36. S. Wang, Y. Chen, X. Wang, Z. Li and L. Shi, “Modeling and optimization of semiconductor manufacturing process with neural networks”, Chinese Journal of Electronics 9(1) (2000), 1–5. 37. S. Sung and Y. I. Choung, “Neural network approach for batching decisions in wafer fabrication”, International Journal of Production Research 37(13) (1999), 3101–3114. 38. L. Huang, Y. H. Huang, T. Y. Chang, S. H Chang,
48 A. A. West et al.
39.
40.
41.
42.
43.
44.
45.
46.
47.
48.
49.
50.
51.
C. H. Chung, D. T. Huang and R. K. Li, “Construction of production performance prediction system for semiconductor manufacturing with artificial intelligence”, International Journal of Production Research 37(6) (1999), 1387–1402. R. L. Mahajan, “Neural nets for modeling, optimisation and control in semiconductor manufacturing”, SPIE Proceedings Series 3812 (1999), 176–187. Z. Nami, O. Misman, A. Erbil and G. S. May, “Semiempirical neural network modeling of metal-organic chemical vapor deposition”, IEEE Transactions on Semiconductor Manufacturing 10(2) 1997, 288–294. H. Meng, P. C. Russell, P. J. G. Lisboa and G. R. Jones, “Modeling and control of plasma etching processes in the semiconductor industry”, Computers and Industrial Engineering 37(1–2) (1999), 367–370. M. Salam, C. Piwek, G. Erten and T. Grotjohn, J. Asmussen, “Modeling of a plasma processing machine for semiconductor wafer etching using energy-functions-based neural networks”, IEEE Transactions on Control Systems Technology 5(6) (1997), 598–613. E. A. Rietman, R. C. Frye, E. R. Lory and T. R. Harry, “Active neural network control of wafer attributes in a plasma etch process”, Journal of Vacuum Science and Technology, Part B. 11(4) (1993), 1314–1316. J. P. Card, M. Naimo and W. Ziminsky, “Run-to-run process control of a plasma etch process with neural network modeling”, Quality and Reliability Engineering International 14(4) (1998), 247–260. A. Bensaoula, H. A. Malki and A. M. Kwari, “The use of multi-layer neural networks in material synthesis”, IEEE Transactions on Semiconductor Manufacturing 11(3) (1998), 421–431. M. D. Baker, C. D. Himmel and G. S. May, “In-situ prediction of reactive ion etch endpoint using neural networks”, IEEE Transactions on Components, Packaging and Manufacturing Technology, Part A 18(3) (1995), 478–483. G. S. May, H. C. Forbes, N. Hussain and A. L. 
Charles, “Modeling component placement errors in surface mount technology using neural networks”, IEEE Transactions on Components, Packaging and Manufacturing Technology, Part C: Manufacturing 21(1) (1998), 66–70 K. Tae Seon and G. S. May, “Intelligent control of via formation by photosensitive BCB for MCM-L/D applications”, IEEE Transactions on Semiconductor Manufacturing 12(4) (1999), 503–515. Y. Tan, P. Fang, Y. Zhang and S. Yang, “Evaluating nugget sizes of spot welds by using artificial intelligence”, Lecture Notes in Computer Science 1625 (1999), 53–58. A. Koenig, A. Herenz and K. Wolter, “Application of neural networks for automated X-ray image inspection in electronics manufacturing”, Lecture Notes in Computer Science 1607 (1999), 588–595. R. Chandraker, A. A. West, and D. J. Williams,
52.
53.
54. 55.
56.
57.
58. 59. 60. 61.
62.
63. 64.
65. 66.
67.
68.
“Intelligent control of adhesive dispensing”, IJCIM Special Issue on Intelligent Control 3(1) 24–34. A. VanDenBosch and T. Debarros, “Step-by-step SMT: Back to basics, Part 5 — adhesives/epoxies and dispensing”, SMT Surface Mount Technology Magazine 11(5) (1997). A. A.West, R. Chandraker, D. J. Williams and D. J. Mulvaney, “Hybrid representations of real time control rules for manufacturing process control in electronics manufacture”, Proceedings of the IEEE Symposium on Intelligent Control Arlington, Virginia, 1988, 745–750. R. C. Gonzales and R. E. Woods, Digital Image Processing, Addison-Wesley, Wokingham, 1992. D. J. Williams, “Towards the intelligent control of fast manufacturing processes”, Manufacturing Processes Group Technical Report 90/5, Loughborough University, 1990. A. A. West, D. J. Williams and C. J. Hinde, “The application of alternative control paradigms to the dispensing of adhesive in the manufacture of mixed technology printed circuit boards”, Manufacturing Processes Group Technical Report 92/9.1, Loughborough University, 1992. W. S. McCulloch and W. H. Pitts, “A logical calculus of the ideas immanent in nervous activity”, Bull. Math. Biophys. 5 (1943), 115–133. W. R. Ashby, Design for a Brain, Chapman and Hall, London, 1952. W. R. Ashby, An Introduction to Cybernetics, Chapman and Hall, London, 1958. D. O. Hebb, The Organisation of Behaviour, Wiley, London, 1949. G. D. Wasserman, Molecular Control of Cell Differentiation and Morphogenesis: A Systematic Theory, Marcel Dekker, New York, 1972. I. Aleksander and H. Morton, An Introduction to Neural Computation, Chapman and Hall, London, 1990. F. Rosenblatt, Principles of Neurodynamics, Spartan, New York, 1962. J. J. Hopfield and D. W. Tank, “Computing with neural circuits: A model”, Science 233 (1986), 625–633. G. E. Hinton, “Connectionist learning procedures”, Artificial Intelligence 40 (1989), 185–234. J. L. McClelland and D. E. 
Rumelhart, “A simulation based tutorial system for exploring parallel distributed processing”, Behavior Research Methods, Instruments and Computers 20(2) (1988), 263–275. G. P. Fletcher and C. J. Hinde, “Producing evidence for the hypotheses of large neural networks”, Neurocomputing 10(4) (1996), 359–373. G. P. Fletcher and C. J. Hinde, “Interpretation of neural networks as Boolean transfer functions”, Knowledge Based Systems 7(3) (1994), 207–214.
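The requirement that the training and test sets be complete and consistent can, in principle, be checked mechanically before any training is attempted. The Python sketch below is illustrative only and is not the implementation used in this work. The function names, the two normalised process features, the example patterns and the binning rule used to judge coverage are all assumptions introduced for the illustration. Consistency is taken to mean that no input vector is paired with two different Boolean output patterns, and completeness is approximated by requiring every equal-width interval of each feature range to contain at least one pattern.

```python
from typing import Dict, List, Sequence, Tuple

# A pattern pairs a vector of process features with a Boolean output pattern.
Features = Tuple[float, ...]
Outputs = Tuple[int, ...]
Pattern = Tuple[Features, Outputs]


def find_contradictions(*data_sets: Sequence[Pattern]) -> List[Features]:
    """Return input vectors that map to more than one output pattern.

    Inputs are compared by exact equality, which is a simplification;
    measured data would normally be quantised first.
    """
    seen: Dict[Features, Outputs] = {}
    clashes: List[Features] = []
    for data_set in data_sets:
        for features, outputs in data_set:
            if features in seen and seen[features] != outputs:
                if features not in clashes:
                    clashes.append(features)
            else:
                seen.setdefault(features, outputs)
    return clashes


def uncovered_features(
    data_set: Sequence[Pattern],
    required_ranges: Sequence[Tuple[float, float]],
    bins: int = 4,
) -> List[int]:
    """Return indices of features whose required range is not fully sampled.

    Each range is split into `bins` equal intervals (an assumed rule); a
    feature is covered only if every interval holds at least one pattern.
    """
    missing: List[int] = []
    for i, (low, high) in enumerate(required_ranges):
        width = (high - low) / bins
        hit = set()
        for features, _ in data_set:
            x = features[i]
            if low <= x <= high:
                hit.add(min(int((x - low) / width), bins - 1))
        if len(hit) < bins:
            missing.append(i)
    return missing


# Hypothetical patterns: two features normalised to [0, 1], two Boolean outputs.
train = [((0.1, 0.2), (1, 0)), ((0.4, 0.9), (0, 1)),
         ((0.6, 0.5), (0, 1)), ((0.9, 0.3), (1, 0))]
test = [((0.1, 0.2), (0, 1)), ((0.3, 0.6), (1, 0)), ((0.8, 0.1), (1, 0))]
ranges = [(0.0, 1.0), (0.0, 1.0)]

print(find_contradictions(train, test))   # inputs with conflicting outputs
print(uncovered_features(train, ranges))  # features under-sampled in training
print(uncovered_features(test, ranges))   # features under-sampled in testing
```

In this example the input (0.1, 0.2) appears in both sets with different outputs and would have to be resolved before training, while the small test set leaves part of the range of both features unsampled. A finer binning, or a joint check over combinations of features, would give a stricter test of completeness at the cost of needing more data.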