
Food Science and Technology International http://fst.sagepub.com

Optimization of Computational Neural Network for Its Application in the Prediction of Microbial Growth in Foods
C. Hervás, G. Zurera, R.M. García and J.A. Martínez
Food Science and Technology International 2001; 7; 159
The online version of this article can be found at: http://fst.sagepub.com/cgi/content/abstract/7/2/159

Published by: http://www.sagepublications.com

On behalf of:

Consejo Superior de Investigaciones Científicas (Spanish Council for Scientific Research)


Downloaded from http://fst.sagepub.com at PENNSYLVANIA STATE UNIV on April 10, 2008 © 2001 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution.

Optimization of Computational Neural Network for Its Application in the Prediction of Microbial Growth in Foods

C. Hervás¹, G. Zurera²,*, R.M. García² and J.A. Martínez¹

¹Department of Computer Science and Numerical Analysis, University of Córdoba, Campus de Rabanales, Edif C-2, 14014 Córdoba, Spain
²Department of Food Science and Technology, University of Córdoba, Campus de Rabanales, Edif C-1, 14014 Córdoba, Spain

The power of computational neural networks (CNN) for predicting microbial growth was evaluated. The training set consisted of growth response data for a cocktail of three strains of Salmonella in a laboratory medium, as affected by pH, sodium chloride concentration and storage temperature. The CNN input layer contained these three environmental factors together with the microbial counts recorded over the first 48 h. The output layer contained one output parameter, the maximum growth rate. To optimize the network, algorithms were developed to prune its connections. Pruning improved generalization and decreased the number of patterns needed for training. The standard error of prediction (%SEP) was under 5% using twenty inputs to the net, significantly smaller than that obtained using regression equations. CNN are therefore appealing for modeling microbial growth, and their further improvement promises results better than those obtained to date by other estimation methods.

Key Words: computational neural networks, genetic algorithms, microbial growth, modeling

*To whom correspondence should be sent (e-mail: [email protected]). Received 14 March 2000; revised 27 June 2000. Food Sci Tech Int 2001;7(2):159–163 © 2001 Technomic Publishing Co., Inc. ISSN: 1532-1738 DOI: 10.1106/6Q2A-8D7R-JHJU-T7F6

INTRODUCTION
Food microbiologists continually conduct experimental investigations to determine the extent to which microbial growth depends on factors related to ecology or processing modes. A model that relates the growth of different types of microorganisms to several environmental factors is of significant importance in predictive microbiology (Hajmeer et al., 1997). Most studies have modeled the time-dependent growth of certain microorganisms in specific foods using different types of kinetic models (Zwietering et al., 1990, 1994). Sigmoidal functions are commonly employed as kinetic models to express the time-dependent growth of microorganisms. The modified Gompertz equation was found to be the easiest to implement and statistically adequate to describe the growth of different types of microorganisms (Zwietering et al., 1990; Baranyi and Roberts, 1994). For a given set of operating conditions, models are usually determined from expressions developed via multiple linear regression. Computational neural networks (CNN) have been developed as a substitute for these regression-based equations. CNN predictions of the growth of Shigella flexneri agreed better with experimental data than those obtained by regression equations (Hajmeer et al., 1997).

A detailed description of computational neural networks is given by Najjar et al. (1997). CNN are highly interconnected network structures. They consist of many simple processing elements capable of performing massively parallel computations for data processing. Unlike regression modeling, CNN impose no restrictions on the type of relationship governing the dependence of growth parameters on the various running conditions (Hajmeer et al., 1997). According to these authors, regression-based response surface models require the order of the model to be stated (i.e., second, third or fourth order). CNN, by contrast, implicitly map the input vector (i.e., running conditions) to the output vector (i.e., Gompertz parameters). Furthermore, once CNN have been trained on appropriate data, they can predict growth curves for new microbial growth cases without further experimental investigation. According to Geeraerd et al. (1998), CNN are low-complexity, nonlinear modeling techniques capable of accurately describing experimental data in the field of secondary models in predictive microbiology. These authors note that the complexity of a CNN model can be adapted to a specific application, taking into account the general trend and the number of data points. When experimental results on the influence of other environmental factors become available, CNN models can be extended simply by adding more neurons and/or layers. A key aspect of this modeling methodology is the ability to reduce the complexity of the net during training.
Several approaches serve this purpose. Unnecessary connections can be eliminated with pruning techniques according to their effect on network accuracy (Le Cun et al., 1990). Network size can be reduced and generalization improved by simple weight decay (Krogh and Hertz, 1992; Williams, 1995; Bebis et al., 1997). Alternatively, CNN can be redesigned using genetic algorithms (Miller et al., 1989; Hervás et al., 2000). The objective of the present study was to analyze the possibilities and improvements afforded by CNN optimized by means of genetic algorithms. These were compared with the regression equations usually used in predictive microbiology to study the growth or inhibition of microorganisms in food.

MATERIALS AND METHODS

Experimental Data
The growth response data were taken from an extensive experimental study by Gibson et al. (1988). They describe a cocktail of three strains of salmonellae (S. stanley, S. thompson and S. infantis) in a laboratory medium. Growth was studied as affected by pH (5.6 to 6.8), sodium chloride concentration (0.5 to 4.5%, w/v) and storage temperature (10 to 30 °C). For each selected growth curve, the microbial counts obtained every two hours during the first 48 hours were taken as input characteristics. The pH, temperature (°C) and % NaCl were added as further inputs. The maximum growth rate ($\mu_{max}$) estimated by the Gompertz function was taken as the output characteristic.

CNN Design
A total of 66 growth curves were used (53 for training and 13 for generalization). The modeled variable was the growth rate parameter obtained from the Gompertz equation. The CNN architecture was designed with twenty-eight input parameters for estimating the growth rate. Three were associated with the factors that affect it (pH, temperature and NaCl). The other twenty-five were associated with the microbial counts after 0, 2, 4, ..., 48 h. The output layer contained one parameter, the growth rate (Figure 1). A sigmoid transfer function was used for the hidden layer and a linear transfer function for the output layer.

Figure 1. Initial architecture of the neural network used for the prediction of the maximum rate of bacterial growth.
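The sketch below shows how one training pattern might be assembled: 3 environmental factors plus 25 counts as inputs, and $\mu_{max}$ as the output. It is a minimal illustration under stated assumptions.

The paper does not state which Gompertz parameterization was used. This sketch assumes the four-parameter form log N(t) = A + C·exp(−exp(−B(t − M))), whose maximum slope, at t = M, is B·C/e. The inputs are later said to be normalized to [0.1, 0.9], and linear min–max scaling of each input characteristic is assumed for that step. The parameter values and the functions `gompertz`, `mu_max`, `minmax_scale` and `build_input_vector` are hypothetical.

```python
import math

def gompertz(t, A, C, B, M):
    """Log count at time t (h) under an assumed four-parameter Gompertz form."""
    return A + C * math.exp(-math.exp(-B * (t - M)))

def mu_max(C, B):
    """Maximum growth rate: slope of the Gompertz curve at its inflection point t = M."""
    return B * C / math.e

def minmax_scale(values, lo=0.1, hi=0.9):
    """Map one input characteristic (one column across patterns) linearly onto [lo, hi]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [(lo + hi) / 2.0 for _ in values]
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]

def build_input_vector(ph, temp_c, nacl_pct, counts):
    """Three environmental factors followed by the 25 counts at 0, 2, ..., 48 h."""
    if len(counts) != 25:
        raise ValueError("expected 25 counts sampled every 2 h from 0 to 48 h")
    return [ph, temp_c, nacl_pct] + list(counts)

# Hypothetical curve parameters (not from the paper): A, C in log cfu/ml, B in 1/h, M in h.
A, C, B, M = 3.0, 5.0, 0.15, 20.0
counts = [gompertz(t, A, C, B, M) for t in range(0, 49, 2)]   # 25 sampling times
x = build_input_vector(6.2, 20.0, 2.5, counts)
target = mu_max(C, B)                                          # output characteristic
print(len(x), round(target, 4))                                # prints: 28 0.2759
print(minmax_scale([5.6, 6.2, 6.8]))                           # pH column, about [0.1, 0.5, 0.9]
```

In practice each of the 28 input columns would be scaled separately over all training patterns before being fed to the net.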


CNN Optimization

The initial neural network architecture was designed with two layers. A standard backpropagation training algorithm using the extended delta-bar-delta (EDBD) rule was applied to it (Minai and Williams, 1990). A genetic algorithm then searches for the ideal network architecture among those generated in the previous stage. Initially, a population of 50 neural networks was used. All had an identical number of neurons and layers but different connectivities: different weights, initially random following a Laplace distribution, and different activation states. Each weight was “frozen” with probability Pfro or “nonfrozen” with probability (1 − Pfro). A connection elimination (pruning) algorithm following the Bayesian regularization method (Williams, 1995) may or may not be executed once the genetic algorithm converges. The pruning can be done in one or two steps. After each execution of the pruning algorithm, the training of the networks, pruned or not, is reinforced with an additional number of executions (epochs) of the training algorithm. Each network is evaluated after training it for only a small number of epochs, so partly trained networks are obtained. The evaluation uses weight elimination (pruning) and the following evaluation function:

$$E_W = \frac{1}{2}\, n_T \log \sum_{p=1}^{n_T} (y_p - o_p)^2 + \lambda_{EA}\, n_W \log \sum_{k=1}^{n_W} w_k^2 \qquad (1)$$

where $y_p$ is the observed output of pattern $p$ in the validation group, $o_p$ is the output provided by the network, $n_T$ is the number of training patterns and $w_k$ are the $n_W$ network weights. The fitness function is based on the hypothesis that the prior distribution of the network weights follows a Laplace probability distribution. The weights are therefore initialized as $w_j = \pm t \log(h)$, where $h$ is a uniform random variable in the interval [0, 1], $t > 0$ determines the scale and the sign is selected at random. The value of $t$ is selected as $1/(2l)$, $l$ being the number of input connections to each neuron. The adaptive parameter $\lambda_{EA}$ is a weighting factor that sets the significance of network complexity against network performance on the training set. It thereby favors the reproduction of easily trainable networks with a small number of connections. Several authors have developed procedures for determining $\lambda_{EA}$ adaptively during training (Weigend et al., 1991). The connections associated with the “nonfrozen” weights, i.e., those that had not previously been pruned with probability $P_{fro}$, were eliminated when reaching a specific

Figure 2. Flow sheet of the developed algorithm.



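number of training epochs, once the algorithm had converged, provided that the following two conditions were satisfied:

$$|w_{ij}| < ghi_w \, \frac{\bar{w}_a}{1 + S^2_{w_a}} \qquad (2)$$

and

$$\left| \frac{\partial E_W}{\partial w_{ij}} \right| < ghi_{der} \, \frac{\bar{d}_w}{1 + S^2_{d_w}} \qquad (3)$$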

where $\bar{w}_a$ and $S^2_{w_a}$ are the mean and the variance of the absolute values of the weights in a network. $\bar{d}_w$ and $S^2_{d_w}$ are the mean and the variance of the absolute values of the derivatives. $ghi_w$ and $ghi_{der}$ are heuristic parameters to be determined experimentally. Convergence of the genetic algorithm was determined in two ways: by a maximum number of generations (maxgen = 200) and by the average improvement of each generation over the previous one. The algorithms were implemented in the C programming language, and training was carried out on a Silicon Graphics Origin 2000. All input variable values were normalized to the interval [0.1, 0.9], where $E_i$ is the value of the ith characteristic of the normalized input (Figure 2). The error criterion chosen for this study was the standard error of prediction (SEP) expressed as a percentage. It has the advantage of being dimensionless and is calculated as follows:

$$\%SEP = \frac{100}{\bar{p}} \sqrt{\frac{\sum_{i=1}^{n} (p_i - \hat{p}_i)^2}{n}} \qquad (4)$$

where $p_i$ and $\hat{p}_i$ are the observed and predicted values of the growth rate, respectively, $\bar{p}$ is the mean of the observed values, and $n$ is the number of patterns in the training or generalization set. Other authors, such as Geeraerd et al. (1998), preferred the residual sum of squares (SSE) or mean square error (MSE) criteria. We consider SEP more appropriate because, being dimensionless, it allows results to be compared without regard to the unit of measurement.
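The sketch below illustrates the initialization and pruning rules described in this section. It is a minimal illustration under stated assumptions, not the authors' C implementation. The flat connection list and the treatment of the last weight of each neuron as its bias are hypothetical choices. So is the use of the population variance for $S^2$, and so are the function names. $ghi_w = ghi_{der} = 1.1$ and $P_{fro} = 0.5$ are used only as example values. The evaluation function of Eq. (1) and the genetic operators are omitted.

```python
import math
import random
from statistics import mean, pvariance

def laplace_weight(fan_in, rng):
    """w = +/- t*log(h), h ~ U(0, 1], t = 1/(2l) with l the neuron's number of input connections."""
    t = 1.0 / (2 * fan_in)
    h = 1.0 - rng.random()                      # in (0, 1], so log(h) is finite
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * t * -math.log(h)              # |w| ~ Exponential(scale t), so w ~ Laplace(0, t)

def init_network(n_in, n_hidden, p_fro, rng):
    """Flat list of connections of an n_in:n_hidden:1 net (last weight of each neuron = bias)."""
    conns = []
    for fan_in, n_neurons in ((n_in, n_hidden), (n_hidden, 1)):
        for _ in range(n_neurons):
            for _ in range(fan_in + 1):
                conns.append({"w": laplace_weight(fan_in, rng),
                              "frozen": rng.random() < p_fro})
    return conns

def prune_mask(weights, grads, frozen, ghi_w=1.1, ghi_der=1.1):
    """True where a non-frozen connection satisfies both conditions (2) and (3).

    Means and variances are taken over the absolute values of all weights and
    derivatives in the network.
    """
    abs_w = [abs(w) for w in weights]
    abs_d = [abs(d) for d in grads]
    thr_w = ghi_w * mean(abs_w) / (1.0 + pvariance(abs_w))
    thr_d = ghi_der * mean(abs_d) / (1.0 + pvariance(abs_d))
    return [(not fz) and aw < thr_w and ad < thr_d
            for aw, ad, fz in zip(abs_w, abs_d, frozen)]

rng = random.Random(0)
population = [init_network(28, 2, p_fro=0.5, rng=rng) for _ in range(50)]
print(len(population), len(population[0]))      # prints: 50 61

mask = prune_mask([0.01, 1.0, 2.0, -0.02], [0.001, 0.5, -0.5, 0.002], [False] * 4)
print(mask)                                     # prints: [True, False, False, True]
```

In the usage example, the two small weights with small derivatives fall below both thresholds and are marked for elimination. The two large weights are kept.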

RESULTS AND DISCUSSION
The experimental results were obtained by training each net design, using a different set of initial weights for each design. The initial weights were drawn randomly from a uniform distribution in the interval [−0.1, 0.1]. The size Tp of the genetic algorithm population was set to 50, and the architecture of the net considered was 28:2s:1l, where:
• 28 refers to the three factors (pH, T and NaCl) and to the twenty-five growth values at twenty-five time points;
• 2 is the number of nodes in the hidden layer;
• s indicates that the transfer function of the hidden-layer neurons is sigmoid;
• 1 is the single output node of the net, which in our case was the maximum growth rate obtained by the Gompertz equation;
• l indicates that the transfer function of the output-layer unit is linear.

The initial population was obtained by first fixing the net architecture (number of layers and number of nodes in each layer). Next, fifty copies of this architecture were made, with different weights and different activation states (frozen or nonfrozen). The parameter values used by the proposed algorithm are shown in Table 1. Table 2 shows the combined results of ten runs of the procedure. It also shows the best result, associated with the best optimal net obtained with these parameters. The results indicate that the algorithm is stable. The number of generations at which convergence is reached after the two pruning processes is very homogeneous (between 54 and 72 generations). The %SEP, besides being low, is also homogeneous (Table 2). After pruning, the net retained on average 33.1 of the 61 connections of a fully connected 28:2:1 net. Notably, the net with the best result pruned all the connections of eight inputs, those corresponding to microbial growth at t = 0, 8, 12, 18, 28, 32, 34 and 44 h. The algorithm therefore prunes almost half of the connections. This indicates a priori that not all twenty-eight inputs are necessary; twenty are enough.

In a second step, the inputs associated with microbial growth at t = 0, 8, 12, 18, 28, 32, 34 and 44 h were eliminated. The result was a fully connected net with a (3 + 17):2:1 architecture. For this net, the average %SEP over ten runs was 4.36%; only 45 weights had to be estimated, and no pruning algorithm was applied. This %SEP can be considered very satisfactory compared with the SEP of 39.63% obtained by Gibson et al. (1988). In addition, the maximum growth rate can be predicted within the first 48 hours of microbial growth with very small error, under the conditions described previously.

Table 1. Parameter values used by the algorithm.
Pruning parameters: $ghi_w$ = 1.1; $ghi_{der}$ = 1.1
Genetic operators: crossover probability of the input weights of each hidden-layer node between two randomly chosen nets = 0.8; mutation probability of one of two randomly chosen nets = 0.01; frozen probability $P_{fro}$ = 0.5
Fitness function: $\lambda_{EA}$ = 0.75; tolerance = 0.05

Table 2. Design results of the net for an initial architecture of (3 + 25):2:1.
Columns: number of generations (a); %SEP (b); total connections; fitness average; fitness standard deviation.
Range over ten runs: 54–72; 8.09–9.70; 22–40; 1.19–1.69; 0.27–0.58
Best net: 54; 8.09; 29; 1.37; 0.49
Mean over ten runs: 61.9; 9.05; 33.1; 1.40; 0.47
(a) Number of iterations at which the algorithm converges.
(b) Standard error of generalization.

According to Hajmeer et al. (1997), CNN impose no restrictions on the type of relationship governing the dependence of growth parameters on the various running conditions. In our study, fully connected nets with twenty inputs gave a %SEP of 4.36%. The pruned nets with twenty-eight inputs gave about 9%. Both values are much lower than those reported by Gibson et al. (1988) using nonlinear regression methods. The usefulness of CNN in modeling microbial growth is therefore encouraging. According to Geeraerd et al. (1998), the CNN model describing the growth parameters as a function of temperature, pH and %NaCl is far more accurate than the polynomial relationships available in the literature. CNN are capable of modeling the complex growth of microorganisms. They agree better with experimental data than predictions obtained via the corresponding regression equations. In our study, CNN models pruned with genetic algorithms provided a good architecture design and needed only a small number of connections. This decrease in the number of connections implies a lower hardware implementation cost and a more efficient estimation of the parameters. Estimating fewer parameters from the same number of training patterns yields a lower-complexity neural network.
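The sketch below offers a rough check of the figures discussed above. It counts the connections of fully connected single-hidden-layer nets and evaluates an n:2s:1l net. Counting the bias terms as connections is an assumption of this sketch, but it reproduces both numbers in the text: 61 for 28:2:1 and 45 for 20:2:1. The weight layout and the function names are hypothetical. `percent_sep` computes %SEP as 100/p̄ times the root mean squared error, the usual definition of this criterion.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden, output):
    """Forward pass of an n:2s:1l net.

    hidden: one list per hidden node, len(x) input weights followed by a bias;
    output: one weight per hidden node followed by a bias.
    A pruned connection can be represented by a zero weight.
    """
    h = [sigmoid(sum(w * xi for w, xi in zip(ws[:-1], x)) + ws[-1]) for ws in hidden]
    return sum(w * hj for w, hj in zip(output[:-1], h)) + output[-1]   # linear output unit

def n_connections(n_in, n_hidden, n_out=1):
    """Weights plus biases of a fully connected single-hidden-layer net."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

def percent_sep(observed, predicted):
    """%SEP: 100 / mean(observed) times the root mean squared error."""
    n = len(observed)
    sse = sum((p - q) ** 2 for p, q in zip(observed, predicted))
    return 100.0 * math.sqrt(sse / n) / (sum(observed) / n)

print(n_connections(28, 2), n_connections(20, 2))                      # prints: 61 45
print(forward([0.5] * 20, [[0.0] * 21, [0.0] * 21], [1.0, 1.0, 0.0]))  # prints: 1.0
print(percent_sep([2.0, 2.0], [1.0, 3.0]))                             # prints: 50.0
```

Representing pruned connections as zero weights keeps the same forward pass usable for both the pruned 28-input nets and the fully connected 20-input net.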

ACKNOWLEDGEMENT
This work was partly financed by the CICYT ALI98-0676-CO2-01 and ALI98-0676-CO2-02, and by Research Groups AGR-170 HIBRO and TIC-148 AYRNA.


REFERENCES
Baranyi J. and Roberts T.A. (1994). A dynamic approach to predicting bacterial growth in food. International Journal of Food Microbiology 23: 277–294.
Bebis G., Georgiopoulos M. and Kasparis T. (1997). Coupling weight elimination with genetic algorithm to reduce network size and preserve generalization. Neurocomputing 17: 167–194.
Geeraerd A.H., Herremans C.H., Cenens C. and Van Impe J.F. (1998). Application of neural networks as a non-linear modular modelling technique to describe bacterial growth in chilled food products. International Journal of Food Microbiology 44: 49–68.
Gibson A., Bratchell N. and Roberts T. (1988). Predicting microbial growth: growth responses of salmonellae in a laboratory medium as affected by pH, sodium chloride and storage temperature. International Journal of Food Microbiology 6: 155–178.
Hajmeer M., Basheer I. and Najjar Y. (1997). Computational neural networks for predictive microbiology II. Application to microbial growth. International Journal of Food Microbiology 34: 51–66.
Hervás C., Algar J.A. and Silva M. (2000). Correction of temperature variations in kinetic-based determinations by use of pruning neural networks in conjunction with genetic algorithms. Journal of Chemical Information and Computer Sciences 40: 724–731.
Krogh A. and Hertz J. (1992). A simple weight decay can improve generalization. In: Touretzky D.S. (ed.), Advances in Neural Information Processing Systems 4. San Mateo, CA: Morgan Kaufmann, pp. 950–957.
Le Cun Y., Denker J. and Solla S. (1990). Optimal brain damage. In: Touretzky D.S. (ed.), Advances in Neural Information Processing Systems 2. San Mateo, CA: Morgan Kaufmann, pp. 598–605.
Miller G., Todd P. and Hegde S. (1989). Designing neural networks using genetic algorithms. In: Schaffer J. (ed.), Proc. 3rd Int. Conf. Genetic Algorithms and Their Applications. Morgan Kaufmann, pp. 379–384.
Minai A. and Williams R.J. (1990). Back-propagation heuristics: a study of the extended delta-bar-delta. IEEE International Joint Conference on Neural Networks, pp. 595–600.
Najjar Y., Basheer I. and Hajmeer M. (1997). Computational neural networks for predictive microbiology: I. Methodology. International Journal of Food Microbiology 34: 27–49.
Weigend A., Rumelhart D. and Huberman B. (1991). Generalization by weight elimination with application to forecasting. Advances in Neural Information Processing Systems 3: 875–882.
Williams P.M. (1995). Bayesian regularization and pruning using a Laplace prior. Neural Computation 7: 117–143.
Zwietering M., Jongenburger I., Rombouts F. and Van't Riet K. (1990). Modelling of the bacterial growth curve. Applied and Environmental Microbiology 56: 1875–1881.
Zwietering M., de Wit J.C., Cuppers H. and Van't Riet K. (1994). Modelling of bacterial growth with shifts in temperature. Applied and Environmental Microbiology 60: 204–213.