Technical Papers
Refining automated modeling of operational data by identifying the most important input factors by Siddhartha Agarwal and Rajive Ganguli
Abstract
The mining industry collects a significant amount of operational data. However, gleaning useful information from the terabytes of data is difficult, and not just because of the sheer volume of the data. Therefore, an automated tool was developed at the University of Alaska Fairbanks to go through data and apply sophisticated statistical and neural network techniques in order to identify the data streams that are important to a process. This paper presents results from the tool as applied to SAG mill data from a gold mine. The results were compared to results achieved earlier with available commercial modeling tools. The comparison indicates that there was little or no loss in performance by automating the very complicated process of neural network modeling. Therefore, the intent of the exercise, to examine if complicated modeling tasks can be automated, was realized.
Mining Engineering, 2011, Vol. 63, No. 12, pp. Official publication of the Society for Mining, Metallurgy and Exploration, Inc.
Introduction
The mining industry collects a large amount of real-time operational data. However, given the sheer volume of the data, it is very difficult to monitor and use the data effectively. Therefore, the information in the data is often not completely extracted, especially when extraction requires advanced mathematical skills. Mine sites rarely have engineers with the skills or the time to apply mathematical modeling to operational data.
Siddhartha Agarwal and Rajive Ganguli, members SME, are ___[TITLES] at the University of Alaska Fairbanks, Fairbanks, AK. Paper number TP-10-069. Original manuscript submitted November 2010. Revised manuscript accepted for publication April 2011. Discussion of this peer-reviewed and approved paper is invited and must be submitted to SME Publications Dept. by March 31, 2012.
www.miningengineeringmagazine.com
Mathematical modeling is a logical, quantitative process and, therefore, can be programmed. However, modeling can be complex and is easily affected by nuances. For example, Ganguli et al. (2006) discovered, when applying neural networks to model semi-autogenous grinding (SAG) mill power draw, that not all seemingly relevant inputs were relevant. Irrelevant or spurious inputs can compromise model performance. Automation of modeling will, therefore, need to handle such complexities. This paper explores the viability of automating the modeling process by developing a Microsoft Excel-based tool that automatically applies neural network technology to data from a SAG mill. The success of the approach was measured against the manual modeling process using commercial neural network software. Success of the automated modeling approach would indicate that the industry could deploy such tools to make up for the lack of modeling expertise at mine sites.
Problem statement
Ganguli et al. (2006) applied recurrent neural networks to model the power consumption of the SAG mill at the Fort Knox mine in Fairbanks, AK. SAG density, bearing pressure, rpm, noise, recycle and feed rate were used to model the power draw. The modeling exercise was fairly successful: the coefficient of determination (COD, or R²) of the prediction dataset was 0.87. However, when the inputs were limited to rpm, recycle and feed rate, the COD improved to 0.91, without impacting the mean absolute error. The exercise was helpful in identifying the challenges that would need to be addressed by an automated tool, especially one using neural networks at its core:
• Conditioning of data.
• Selection of modeling parameters and process.
• Evaluation of competing models.
• Elimination of irrelevant inputs.

Table 1 Results from the automated tool.
Statistic                                            Six inputs   Three inputs   Four inputs
Prediction-set coefficient of determination (COD)    0.84         0.81           0.82
Prediction-set root mean square error (RMSE)         330.95       406.4          347.22
Bias                                                 -13.08       -17.44         -10.85

Mining Engineering, December 2011, p. 47
These challenges are discussed next.
Conditioning of data. Ill-conditioning in data manifests itself in many ways: failure to converge or converging to a solution with a large variance, poor prediction in some regions, over-fitting of data and extreme sensitivity to variation in input, i.e., small changes in input producing dramatically different results (Saarinen et al., 1993). Causes of ill-conditioning in neural networks include large values of initial weights and biases, large values of inputs and outputs, an excessive number of weights and biases and limits of the algorithm (Sarle, 1997).
Selection of modeling parameters and process. Many popular modeling techniques, especially neural networks, require the appropriate selection of parameters. In neural networks, these include factors such as the number of weights and biases, the number of layers and the activation functions. Some of these selections also affect the conditioning of data. In the application of neural networks, the modeling process is also important, including algorithm selection and length of training.
Evaluation of competing models. This is, in some ways, one of the easiest aspects of automated modeling. Models are evaluated using statistics such as the root mean square error (RMSE) or COD of a prediction set, or the Akaike information criterion (AIC) (Anderson and Burnham, 2002). However, there is always a tension between reducing the RMSE, increasing the COD and keeping the number of parameters small.
Elimination of irrelevant inputs. Whether an input is important to the process being modeled can be determined by eliminating it as an input. If the modeling performance remains the same or improves, it can be assumed that the input was not useful for modeling. Note that a perfectly legitimate input may not be selected for modeling if the sensor generating its data is faulty or out of calibration. Thus, it should not be automatically assumed that an input not selected for modeling is truly irrelevant. An input not selected for modeling should be investigated for relevancy, and the investigation could help identify an out-of-calibration sensor.
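The evaluation and elimination steps described above can be illustrated with a minimal Python sketch. It is not the authors' Excel tool. Several things are assumptions made for illustration: an ordinary least-squares fit stands in for the neural network, bias is taken as the mean of predicted minus observed values (the paper does not state its sign convention), the tolerance `tol` and all function names are hypothetical, and the data are synthetic rather than the Fort Knox measurements.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def cod(y_true, y_pred):
    """Coefficient of determination: 1 - SSE/SST."""
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - sse / sst)

def bias(y_true, y_pred):
    """Mean signed error, predicted minus observed (sign convention assumed)."""
    return float(np.mean(y_pred - y_true))

def fit_predict(X_train, y_train, X_test):
    """Stand-in model: least squares with an intercept (NOT a neural network)."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([X_test, np.ones(len(X_test))]) @ coef

def backward_eliminate(X_train, y_train, X_test, y_test, names,
                       min_inputs=3, tol=0.01):
    """Drop one input at a time while prediction-set COD stays within tol of the best."""
    keep = list(range(X_train.shape[1]))
    best = cod(y_test, fit_predict(X_train[:, keep], y_train, X_test[:, keep]))
    while len(keep) > min_inputs:
        # Score every candidate model obtained by removing one remaining input
        trials = []
        for j in keep:
            cols = [k for k in keep if k != j]
            pred = fit_predict(X_train[:, cols], y_train, X_test[:, cols])
            trials.append((cod(y_test, pred), j))
        score, drop = max(trials)
        if score < best - tol:
            break  # every removal degrades performance noticeably; stop
        keep.remove(drop)
        best = max(best, score)
    return [names[k] for k in keep], best

# Synthetic demonstration data (NOT the Fort Knox data): only three columns drive y
rng = np.random.default_rng(0)
names = ["density", "bearing_pressure", "rpm", "noise", "recycle", "feed_rate"]
X = rng.normal(size=(600, 6))
y = 3.0 * X[:, 2] + 2.0 * X[:, 4] + 1.5 * X[:, 5] + 0.3 * rng.normal(size=600)
selected, best_cod = backward_eliminate(X[:400], y[:400], X[400:], y[400:], names)
print(selected, round(best_cod, 3))
```

On this synthetic example the three uninformative columns are removed because dropping them leaves the prediction-set COD essentially unchanged. As the text cautions, an input dropped in this way on real data should still be checked for a faulty or out-of-calibration sensor.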
Tool development
A Microsoft Excel tool was developed to apply neural network modeling to SAG mill operational data with minimal human intervention. It was designed to handle the complications listed above. Readers who need a primer on neural networks are directed to Hagan et al. (1996). Features of the tool include:
1. Data normalization to the [-1, 1] range. This helps condition the data.
2. Number of hidden layers set to one. Many researchers, such as Hagan et al. (1996), contend that a single hidden layer is sufficient to model any problem.
3. Number of neurons fixed at 15 by default. The user can specify a different number.
4. Multiple runs for each model. Since one cannot guarantee arriving at a global minimum with neural networks, it is always advisable to run a model numerous times with different initial weights.
5. Random activation functions for hidden neurons. The activation functions in the hidden layer were randomly set to Gaussian, Gaussian-complement, hyperbolic tangent or log-sigmoid (logsig). The activation function for the output layer was fixed at linear.
6. Length of training determined by the quick stop process (commonly called early stopping), in which training is stopped when the error on the independent calibration subset bottoms out.
7. Identification of insignificant inputs. Inputs were systematically eliminated one at a time, and the resulting network performance was compared to the best network performance. This was, therefore, a massively combinatorial exercise. Since six inputs can be eliminated in as many as 720 (6!) different orders, the minimum number of inputs was set to three to limit the computations.
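Features 1, 2, 4 and 6 can be sketched in a few lines of Python with NumPy. This is an illustrative sketch under stated assumptions, not the Excel tool itself. The paper does not name its training algorithm, so plain batch gradient descent is assumed. All hidden neurons use hyperbolic tangent here instead of the randomly assigned activation functions. The learning rate, the `patience` threshold, the number of restarts and every function name are hypothetical, and the data are synthetic.

```python
import numpy as np

def scale_to_unit_range(x, lo, hi):
    """Map values linearly so that lo -> -1 and hi -> +1."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def predict(params, X):
    """Forward pass: one tanh hidden layer, linear output."""
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

def train_once(X_tr, y_tr, X_cal, y_cal, n_hidden=15, lr=0.05,
               max_epochs=2000, patience=50, rng=None):
    """Batch gradient descent (assumed algorithm) with early stopping."""
    rng = np.random.default_rng() if rng is None else rng
    W1 = rng.normal(0.0, 0.5, (X_tr.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, n_hidden)
    b2 = 0.0
    best_err, best_params, stale = np.inf, None, 0
    for _ in range(max_epochs):
        H = np.tanh(X_tr @ W1 + b1)          # hidden activations
        err = H @ W2 + b2 - y_tr             # residuals on the training subset
        # Gradients of half the mean squared error
        dH = np.outer(err, W2) * (1.0 - H ** 2)
        gW1, gb1 = X_tr.T @ dH / len(y_tr), dH.mean(axis=0)
        gW2, gb2 = H.T @ err / len(y_tr), err.mean()
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
        # Monitor the independent calibration subset
        cal_err = np.mean((predict((W1, b1, W2, b2), X_cal) - y_cal) ** 2)
        if cal_err < best_err:
            best_err, stale = cal_err, 0
            best_params = (W1.copy(), b1.copy(), W2.copy(), b2)
        else:
            stale += 1
            if stale >= patience:
                break  # calibration error has bottomed out
    return best_err, best_params

def train_with_restarts(X_tr, y_tr, X_cal, y_cal, n_runs=5, seed=0):
    """Repeat training from different initial weights and keep the best run."""
    runs = [train_once(X_tr, y_tr, X_cal, y_cal, rng=np.random.default_rng(seed + r))
            for r in range(n_runs)]
    return min(runs, key=lambda run: run[0])

# Synthetic demonstration data (NOT mill data)
rng = np.random.default_rng(1)
X_raw = rng.uniform([0.0, 10.0, 100.0], [1.0, 20.0, 300.0], size=(500, 3))
y_raw = (2.0 * X_raw[:, 0] - 0.1 * X_raw[:, 1] + 0.01 * X_raw[:, 2]
         + 0.05 * rng.normal(size=500))

# Normalize with the ranges of the training rows only
X = scale_to_unit_range(X_raw, X_raw[:300].min(axis=0), X_raw[:300].max(axis=0))
y = scale_to_unit_range(y_raw, y_raw[:300].min(), y_raw[:300].max())
X_tr, y_tr, X_cal, y_cal = X[:300], y[:300], X[300:], y[300:]

best_err, best_params = train_with_restarts(X_tr, y_tr, X_cal, y_cal)
print("best calibration MSE:", float(best_err))
```

Keeping the weights from the epoch with the lowest calibration error, instead of the final epoch, is one common way to implement this kind of stopping rule. Whether the original tool restores the best weights or simply halts is not stated in the paper.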
Results
The tool was tested on a subset of the dataset used by Ganguli et al. (2006). The intent, as with Ganguli et al. (2006), was to model power draw as a function of the six inputs. The data consisted of 18,652 minutes of operational data (one row per minute). Of this, 3,730 rows were used for calibration and an equal number for prediction. Table 1 shows the results from the modeling exercise. To serve as a benchmark, in an initial run the tool was forced to use all six inputs. When left unconstrained, it eliminated density and noise as inputs, since the performance of the resulting four-input network was very similar to that of the six-input model.
As a benchmark against a commercial modeling product, the performance of the automated six-input version (COD of 0.84) was similar to that obtained by Ganguli et al. (2006), who used commercial software and a manual modeling process. As reported in a previous section, Ganguli et al. (2006) achieved a COD of 0.87 with six inputs and 0.91 with three inputs. Direct comparison between the two is, however, not entirely valid, for two reasons: 1) the datasets differed slightly (20,120 rows in Ganguli et al. (2006) versus 18,652 here) and 2) the two studies used different classes of neural networks: time-domain-based recurrent networks in Ganguli et al. (2006) and non-time-domain-based networks in this paper. Despite these discrepancies, there are no dramatic differences in performance at the six-input level. The four-input performance, however, was somewhat lower than that obtained with three inputs by Ganguli et al. (2006) (COD of 0.82 compared to 0.91). When the tool was limited to the same three inputs as Ganguli et al. (2006), the performance was also lower than the one they achieved (COD of 0.81 compared to 0.91).
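The data partition quoted above works out to roughly 20% of the rows for calibration and 20% for prediction. The short Python sketch below reproduces that arithmetic. The paper does not say how rows were assigned to subsets or what the remaining rows were used for, so the random assignment and the use of the remainder as the training subset are assumptions made for illustration.

```python
import numpy as np

n_rows = 18_652          # minutes of operational data (from the text)
n_calibration = 3_730    # rows used for calibration (from the text)
n_prediction = 3_730     # "an equal number" for prediction (from the text)
n_training = n_rows - n_calibration - n_prediction  # assumed: remainder trains the network

# Assumed: rows are assigned at random; a chronological split is equally plausible
rng = np.random.default_rng(0)
order = rng.permutation(n_rows)
cal_idx = order[:n_calibration]
pred_idx = order[n_calibration:n_calibration + n_prediction]
train_idx = order[n_calibration + n_prediction:]

print(n_training, round(n_calibration / n_rows, 3))  # prints: 11192 0.2
```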
Discussion
The results demonstrate the promise of automated modeling. The automated tool performed similarly to, or somewhat below, manual expert modeling with a commercial tool. This was despite the fact that the expert used a more advanced neural network (recurrent network). It is very possible that a more advanced automated tool would have matched the expert performance. After all, the logical process followed by the expert can be fully automated. Additional challenges to automation that were not discussed here include:
1. Need to preprocess data. Sensors often produce error-state data (garbage) that needs to be filtered out by the tool.
2. Computational intensity. The automated tool would be computationally intensive. The developed tool was fairly simplistic (fixed number of neurons and limitations on inputs), and it handled a small amount of data. A more real-world tool would have to be more robust. It would need to exploit the most advanced computational technologies (hardware and programming). With some common sense, however, automated tools can still be limited appropriately so that they are useful without requiring unbounded computation.
Conclusions
An automated neural network tool was developed in Microsoft Excel to model power draw in a SAG mill. The performance of the automated tool was similar to or slightly lower than that of an expert using commercial software. The tool was also able to reduce the number of inputs and identify the important factors. Therefore, the tool demonstrated that complicated tasks such as modeling and identification of important inputs can be automated, a key step toward the development of a commercial application.
References
Anderson, D.R., and Burnham, K.P., 2002, “Avoiding pitfalls when using information-theoretic methods,” Journal of Wildlife Management, Vol. 66, No. 3, pp. 912-918.
Ganguli, R., Dutta, S., and Bandopadhyay, S., 2006, “Determining relevant inputs for SAG mill power draw modeling,” Advances in Comminution, S.K. Kawatra, ed., Littleton, CO: Society for Mining, Metallurgy and Exploration, Inc., pp. 161-168.
Hagan, M.T., Demuth, H.B., and Beale, M., 1996, Neural Network Design, PWS Publishing Company.
Saarinen, S., Bramley, R., and Cybenko, G., 1993, “Ill-conditioning in neural network training problems,” SIAM J. Sci. Comput., Vol. 14, No. 3 (May), pp. 693-714.
Sarle, W.S., ed., 1997, Neural Networks FAQ, ftp://ftp.sas.com/pub/neural/FAQ.html, Accessed June 2010.