ScienceDirect Procedia CIRP 33 (2015) 133 – 138
9th CIRP Conference on Intelligent Computation in Manufacturing Engineering - CIRP ICME '14
Cognitive Parameter Adaption for Model Based Control Systems

Martin Schmid a,*, Simon Berger a, Gunther Reinhart b

a Fraunhofer Project Group for Resource-efficient Mechatronic Processing Machines, Beim Glaspalast 5, 86153 Augsburg, Germany
b Technical University of Munich, Department for Tooling Machines and Industrial Management, Boltzmannstraße 15, 85748 Garching, Germany

* Corresponding author. Tel.: +49-821-568-8383; fax: +49-821-568-8350. E-mail address: [email protected]
Abstract

Modern printing machines are designed to produce large quantities with high product quality in a short time. Due to large dead times in the control system, the first sheets have insufficient quality and must be discarded. In the context of decreasing lot sizes, the total production costs therefore increase. To improve the resource and cost efficiency, a model based control system was developed. The model parameters are adapted by a neural network according to all relevant influence parameters, which enables a high simulation accuracy. A database contains all relevant production parameters and influences, which are used to train the neural network. It is designed to learn the correlation between 26 different influence parameters and the resulting system behaviour without any manual assistance. This paper describes a method to handle irregularly distributed data sets and to improve the training performance by using clustering algorithms. The initialization of the net weights is done according to the Nguyen-Widrow algorithm, and different training algorithms are compared. Additionally, different network topologies are tested automatically to identify the best suited network. To combine the real-time simulation model with the non-deterministic training process, the system is divided into two platforms. With the described control system it is possible to reduce the waste by up to 80 % at the start of the production.

© 2014 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Selection and peer-review under responsibility of the International Scientific Committee of the 9th CIRP ICME Conference.

Keywords: Machine Learning; Adaptive Control
1. Progress in IT changes the Printing Business

1.1. Increased Competition caused by changing Markets

In the last two decades information technology (IT) has evolved and the internet has grown, which changed the flow of information in society. As a result, a lot of information is available online nowadays for a wide range of people worldwide. Lower production costs enable the wide propagation of modern tablet computers and eBooks. These handy and portable devices push conventional printed media out of the market and thus decrease the demand for them. The total sales of books and newspapers decline, so that they become less attractive for advertisements [1]. On the one hand the printing industry has to face the challenge of decreasing revenues, and on the other hand huge printing capacities are still available since printing machines are very durable. Consequently the competition increases and the profit decreases. Therefore the printing companies are forced to analyze their cost structure in order to minimize their expenses.
1.2. Life Cycle Analysis shows prevailing Cost Factors

To find the dominating cost factors, the structure of the production expenditures was analyzed by the Fraunhofer project group for Resource-efficient Mechatronic Processing Machines. Figure 1 shows the distribution of expenses of a medium-sized printing machine (48 pages per revolution). Such a machine prints one multicolor sheet with a size of approx. 1.5 m² per revolution. This format corresponds to 24 DIN A4 pages, printed on both sides. For this analysis the common cost structure and cost ratios of the printing business are used [2]. High performance printing machines can print twice as many pages per revolution at a lower speed, which increases their total output per time by 75 %.
Fig. 1. Production costs in a printing room (life cycle cost analysis of an offset printing machine; cost shares for paper, ink, energy, operating materials, purchase cost, removal and disposal, interest, indirect costs, maintenance, occupancy costs and labor).

The investigation shows that the consumables are the most prevailing factors and cause nearly 80 % of all costs. The purchase cost of the machine as well as the labor and energy costs are significantly smaller. Reducing the consumption of paper and ink thus has the biggest impact on the cost structure.

1.3. Cost Reduction by Applying Resource Efficiency

Modern printing machines process up to 4000 kg of paper per hour and generate high quality printing products. Integrated measurement devices recognize even tiny quality deficits, so high quality standards can be guaranteed. Sheets which do not fulfil the quality requirements are discarded. The relative amount of waste depends strongly on the printing image and on the batch size and is approx. 6 % on average. There are many diverse factors which can cause paper wastage, and many strategies and techniques have been developed to reduce it. Figure 2 shows methods to decrease the amount of wastage, which differ in their complexity of realization and their potential.

Fig. 2: Opportunities to improve the life cycle efficiency of a machine (life cycle efficiency potential versus complexity of realization; new processes offer the highest potential at the highest complexity).

A mechanical and electrical upgrade of older or inefficient components is complicated and expensive. New machines or processes like digital printing machines can improve the efficiency considerably, but are linked with high costs. The easiest way to improve the efficiency is to reorganize the infrastructure and to train the operators. A multitude of parameters must be controlled to run a printing machine at an optimal quality level, for example the paper and ink characteristics or wear and deposits. Only very experienced operators can realize an improved process run. Due to the complexity of the printing process and the huge space of possible machine settings, it is difficult to improve the efficiency sustainably. So the efficiency of a machine depends on the operator and can therefore fluctuate.

1.4. Maximizing the Performance by Automation

When the batch size decreases, automation systems can keep the productivity constant. This is achieved by reaching the desired product quality earlier compared to conventional systems. Powerful and inexpensive computers enable an improved process control, and as a result the resource efficiency can be increased easily.

1.5. Quality Criteria for the Printing Quality

Different kinds of aspects are used to evaluate the printing quality, especially fold, cut-off compensator, colour register and the optical density. The optical density $D_V$ is defined by the logarithm of the ratio of the reflected light intensities without and with ink on the paper, as shown in equation (1) and figure 3 [3]:

$D_V = \lg \frac{I_B}{I_V}$  (1)

$I_B$ is the reflected light intensity of a blank paper; $I_V$ is the reflected light intensity of a fully printed area.

Fig. 3: Principle of measuring the optical density ($I_B$: no ink on the paper; $I_V$: ink on the surface and in the paper).

The nominal value is in a range between 1.2 and 1.8. To start a measurement, a printed marker with a minimum optical density of 0.5 has to be found. Furthermore, the system behavior depends on the area coverage: the lower the area coverage, the slower the density reaches its stationary value. Both effects are shown in figure 4.

Fig. 4: Optical density with different area coverage (run of the optical density over the number of printed sheets in thousands, for 30 % and 3 % area coverage, with the desired printing quality marked).
The curves show the progress of the optical density at production start. The batch with an area coverage of 30 % reaches its desired value very fast. The batch with the lower area coverage of 3 % needs many more copies to reach the target value. The first measured values are received at an optical density of 0.3, because this is the lowest measurable value. These effects cause an insufficient system behavior and, as a result, a poor resource efficiency at the beginning of a production lot.
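To illustrate equation (1), the following minimal sketch computes the optical density from two reflected light intensities; the intensity values used here are invented for illustration only.

```python
from math import log10

def optical_density(i_blank: float, i_inked: float) -> float:
    """Optical density D_V = lg(I_B / I_V), cf. equation (1)."""
    return log10(i_blank / i_inked)

# Hypothetical reflected intensities (arbitrary units): a fully inked area
# reflecting 1/40 of the blank-paper intensity lies inside the nominal
# range of 1.2 to 1.8.
print(optical_density(100.0, 2.5))   # ~1.60
# The measurement threshold of D_V = 0.5 corresponds to I_V ~ I_B / 3.16.
print(optical_density(100.0, 31.6))  # ~0.50
```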
2. Model-based Adaptive Control by Assistance of Learning Mechanisms

To increase the resource efficiency, an analytic simulation model of the printing machine was developed in cooperation with a printing machine manufacturer [4]. It uses the machine settings to calculate the optical density at any time without dead times according to equation (2):

$G(s) = \frac{K}{1 + sT} \, e^{-s T_t}$  (2)

The behavior of the system is described by the system gain $K$, the time constant $T$ and the dead time $T_t$, which represents the delay caused by the ink transport.
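As a rough illustration of the model structure in equation (2), the sketch below simulates the step response of such a first-order system with dead time in discrete time. The parameter values are invented for illustration; the paper does not report the actual gain, time constant or dead time of the printing process.

```python
import numpy as np

def simulate_fopdt(k: float, t_const: float, t_dead: float,
                   u: np.ndarray, dt: float) -> np.ndarray:
    """Discrete-time simulation of G(s) = K / (1 + s*T) * exp(-s*T_t)."""
    y = np.zeros_like(u)
    delay_steps = int(round(t_dead / dt))
    for i in range(1, len(u)):
        # input delayed by the dead time T_t
        u_delayed = u[i - delay_steps] if i >= delay_steps else 0.0
        # explicit Euler step of the first-order lag
        y[i] = y[i - 1] + dt / t_const * (k * u_delayed - y[i - 1])
    return y

# Hypothetical values: unit step in the ink feed, K = 1.5, T = 40 s, T_t = 20 s
dt = 0.5
u = np.ones(int(400 / dt))
y = simulate_fopdt(k=1.5, t_const=40.0, t_dead=20.0, u=u, dt=dt)
print(y[-1])  # approaches the gain K = 1.5
```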
The calculation is done in real time so that a control loop can be closed. The simulation model calculates the amount of ink on the paper, whereas the relevant quality criterion is the optical density. The ratio of ink and optical density, the so-called ink efficiency, is an unknown parameter. It depends, for example, on the area coverage, the relative air humidity and paper properties like gloss, porosity and roughness [5]. As mentioned before, the printing process is influenced by a lot of factors. So an adaption mechanism is necessary to reach a higher simulation accuracy and thereby a constantly high production quality. Yet no correlations between single influences and the ink efficiency could be identified in our analysis. For example, in figure 5 the paper roughness and the ink efficiency are compared.

Fig. 5: Correlation analysis between paper roughness and ink efficiency.

A lower paper roughness tends to result in a lower ink efficiency, but with a wide fluctuation range. Overall, there exist over 25 influences which affect the printing process. Therefore, a machine learning algorithm is needed to calculate the ink efficiency according to the influences. Because the learning algorithm runs autonomously, no manual supervision is possible.

3. Machine Learning in Real-World Applications

Machine learning systems consist of "computational methods using experience to improve performance or to make accurate predictions" [6]. Several classes of machine learning are known which can be used for the parameter identification. The most important learning systems are neural networks, support vector machines and regression analyses [7; 8; 9; 10; 11]. In most cases machine learning is used to simulate biological [12; 13], technical [14; 15; 16] or organizational processes [17; 18]. In all these cases no analytic process description is known. In contrast, a machine learning system can reproduce any nonlinear and complex process behavior. For this, a training run is necessary. The training data can either be generated by performing experiments or by evaluating production data. Especially in research the training data are evaluated manually and only once, because the data acquisition is time consuming and expensive. In an industrial setting this is not sufficient to ensure a high accuracy; there, an autonomous data acquisition and recurrent training runs without manual actions are needed.

4. Machine Learning for an Optimal Process Control

4.1. Requirements for the Integration in the Production

Different requirements need to be fulfilled to integrate machine learning systems into real control applications. Many of them are technical issues, like
• stability of the closed-loop control system,
• robustness against disturbances and varying influences,
• reaching the desired product quality in a short period. [19]
Furthermore, practical requirements exist as well. One criterion is, for example, a reliable operation at run time without manual intervention. Additionally, the system has to be able to adapt to variable production conditions or machine configurations. Users without special knowledge in computation or information technologies also have to be considered to increase the acceptance of the system. [19]

4.2. Fundamentals of Neural Networks
There are different types of neural networks. If the correlation between input and output is time independent and a training data set is available, the most appropriate type is the feed-forward net, also called Multi-Layer Perceptron (MLP) [19, 20]. In figure 6 an example of a feed-forward net with five inputs and one output is shown.
Fig. 6: Structure of a feed-forward neural network.

A feed-forward neural network consists of several neurons, which are linked with each other by weighted connections. Each weight amplifies the output passed from one neuron to the following neuron. The strength of the lines in figure 6 represents the weighting factors: bold lines indicate a high weight and vice versa. An MLP can be used to reproduce any transfer function between the inputs and the output.
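A minimal sketch of such a feed-forward pass, assuming a single hidden layer; the layer sizes and weights below are invented placeholders, not the topology of the network described later in the paper.

```python
import numpy as np

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Single forward pass of a one-hidden-layer MLP with tanh activation."""
    hidden = np.tanh(x @ w_hidden + b_hidden)   # weighted sum plus activation
    return hidden @ w_out + b_out               # linear output neuron

rng = np.random.default_rng(0)
n_inputs, n_hidden = 5, 8                       # five inputs as in figure 6
w_hidden = rng.normal(scale=0.5, size=(n_inputs, n_hidden))
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(scale=0.5, size=(n_hidden, 1))
b_out = np.zeros(1)

x = rng.normal(size=(1, n_inputs))              # one example input vector
print(mlp_forward(x, w_hidden, b_hidden, w_out, b_out))
```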
4.3. Data Acquisition

To ensure an efficient training, the training data have to be analyzed automatically. The training data are acquired continuously, so there is no possibility for a manual check of the data. The first step of preprocessing is to check the plausibility of the data sets. Incorrect data can be caused e.g. by breakdowns of devices or lost communication between different components. Furthermore, it is necessary to condense the time series to key figures which describe the process in a clear and unique way. These key figures need to be strictly defined for many different measured variables. One important point is the starting point of the stationary run. To identify this point in time, the standard deviation of the process inputs and outputs has to be lower than a certain value. Thereby each production run can be condensed to a single data set which is used for the training phase.

4.4. Automatic Data Selection and Clustering

To train the neural network accurately, there have to be as many data sets as possible. Furthermore, the entire parameter range needs to be covered [21]. Theoretically, experiments could be performed to generate a lot of data spread over a wide range. However, this would be very cost intensive and would increase the expenses for the system. Hence, it is necessary to generate all data needed for the training of the neural network during production. In real production there are operating points which are similar to each other. As a result, some combinations of settings and influences occur very often and data clusters form. Figure 7 shows two dimensions, which could be physical parameters like the paper gloss and the printed ink. There are 20 different operating points marked.
Fig. 7: Illustration of an irregularly distributed parameter space (20 operating points in the plane of parameters P1 and P2).
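The key-figure extraction described in section 4.3 can be sketched as follows: the stationary part of a production run is assumed to start once a rolling standard deviation of the measured signal falls below a threshold, and the run is then condensed to a single mean value. The window length, the threshold and the synthetic run-up signal are invented placeholders.

```python
import numpy as np

def condense_run(signal: np.ndarray, window: int = 50, max_std: float = 0.02):
    """Return the mean of the stationary part of one production run,
    or None if no stationary section is found."""
    for start in range(0, len(signal) - window):
        if np.std(signal[start:start + window]) < max_std:
            return float(np.mean(signal[start:]))   # key figure of this run
    return None

# Hypothetical run-up of the optical density towards a stationary value of 1.5
sheets = np.arange(1000)
density = (1.5 * (1 - np.exp(-sheets / 200))
           + np.random.default_rng(1).normal(0, 0.005, 1000))
print(condense_run(density))
```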
In the training, the commonly occurring parameter combinations can be learned well. The remaining parameter combinations are underrepresented in the training data set, so the net output for them is inadequate. To enable an accurate and generalized neural network at all parameter combinations, it is necessary to cluster the training data. All parameter combinations and clusters should be distributed uniformly. For this the agglomerative Ward algorithm was used. At the beginning, each data set forms its own cluster. At each step two clusters are merged into one such that the total within-cluster variance increases as little as possible [22; 23]. There exist different stopping rules to determine the optimal number of steps. The most famous is the Calinski-Harabasz index (CH-index). The optimum number is reached when the CH-index reaches its maximum [24]. In figure 8 there have been 15 steps of the Ward algorithm, so 4 clusters are left. The CH-index reaches its maximum with 37. The next step would merge clusters 3 and 4 into one; this is marked with a dashed circle and reduces the CH-index to 26.
Fig. 8: Clustered operating points with a maximum CH-index.
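A minimal sketch of this clustering step using scikit-learn, assuming the operating points are rows of a feature matrix; the data below are random placeholders, not the roughly 1500 production data sets.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))          # placeholder operating points (P1, P2)

# Ward-agglomerative clustering for several cluster counts,
# keeping the number of clusters with the highest CH-index.
best_k, best_score = None, -np.inf
for k in range(2, 11):
    labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    score = calinski_harabasz_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score

print(best_k, round(best_score, 1))
```

One way to balance the training set, in the spirit of section 4.4, would then be to draw a comparable number of data sets from each resulting cluster.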
The real production data are clustered automatically in this way, so that approximately 1500 data sets are available for the training.

5. Network Training and Validation

5.1. Optimal Network Training without Overfitting

The neural network can change its transfer function arbitrarily by adapting its weights. The goal of the training is to minimize the deviation between the net output o_j and the desired output t_j, which is included in the training data set. The performance index for the network quality is the mean squared error (MSE) according to equation (3):
$MSE = \frac{1}{N} \sum_{j=1}^{N} (t_j - o_j)^2$  (3)

where N is the number of data sets.
At the beginning of the training the weights are chosen randomly, so the MSE is at its maximum. Each training run decreases the MSE. In most cases the backpropagation or the resilient backpropagation learning rule is used; both are based on the delta rule [11, 20]. To prevent the neural network from merely memorizing the training data, the available data are separated into a training data set and a validation data set. The training data set is used to adapt the weights, so the MSE_Training sinks continuously. At the start of the training the MSE_Validation, which is calculated on the validation data set, goes down in a similar way. The optimal network configuration without overfitting is at the minimum of the MSE_Validation curve, which is marked in figure 9. This minimum of the validation MSE can be found automatically in an easy way.
Fig. 9: Trends of the MSE for the training and validation data sets over the training iterations; the optimal network configuration is marked at the minimum of the MSE_Validation curve.
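A minimal sketch of this early-stopping criterion, assuming hypothetical train_epoch and mse helpers on the network object; it simply keeps the weights belonging to the lowest validation MSE.

```python
import copy

def train_with_early_stopping(net, train_data, val_data, max_epochs=500, patience=20):
    """Stop training at the minimum of the validation MSE (cf. figure 9).

    `net` is assumed to offer train_epoch(data) and mse(data); both helpers
    are hypothetical placeholders for whatever network library is used."""
    best_mse, best_net, epochs_without_improvement = float("inf"), None, 0
    for epoch in range(max_epochs):
        net.train_epoch(train_data)              # one pass over the training set
        val_mse = net.mse(val_data)              # MSE_Validation of this epoch
        if val_mse < best_mse:
            best_mse, best_net = val_mse, copy.deepcopy(net)
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                            # validation error rises: overfitting
    return best_net, best_mse
```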
Different training algorithms can be found in the literature. The most common of them are implemented in software libraries such as NEUROPH, ENCOG and similar packages. The learning algorithms have been compared with the same training and validation data. The training stops when the minimum MSE of the validation data set is reached. The results are compared in figure 10.
Fig. 10: Effects of different improvements: validation MSE of the arithmetic average and of neural networks trained with backpropagation, momentum backpropagation, dynamic backpropagation, resilient propagation, and resilient propagation with reduced inputs.
The MSE of the arithmetic mean is similar to that of the neural network trained with the backpropagation or the dynamic backpropagation algorithm; in these cases both learning rules only reached a local optimum of the error function. The momentum backpropagation learning rule reached an MSE of 0.035, which is better than the other learning rules. The best MSE of 0.026 could be reached with the resilient backpropagation algorithm.
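For illustration, a strongly simplified sketch of the resilient backpropagation update (here the iRprop- variant), which adapts a per-weight step size from the sign of the gradient only. The hyperparameters are the commonly used defaults, and the gradient computation itself is assumed to be provided elsewhere.

```python
import numpy as np

def rprop_step(weights, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One resilient-backpropagation update with per-weight step sizes."""
    sign_change = grad * prev_grad
    # gradient kept its sign: enlarge the step; sign flipped: shrink it
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    # iRprop-: after a sign flip the gradient is zeroed, so that weight
    # is not moved in this iteration and no flip is detected in the next one
    grad = np.where(sign_change < 0, 0.0, grad)
    weights = weights - np.sign(grad) * step
    return weights, grad, step
```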
5.2. Analysis of Influence Parameters

Experiments showed that some input parameters do not affect the system behavior significantly, so they have been removed from the input of the neural network. Instead of 26 input variables, only 19 parameters are used for the training and for the validation. In theory, non-relevant inputs should simply be weighted very low; this was, however, not observed. In fact, the MSE decreased from 0.026 to 0.016 with the lower number of inputs. This is additionally shown in figure 10.

5.3. Optimal Network Topology

The network topology describes the structure and number of neurons in the different layers. In general it is not possible to predict the optimal network topology. That is why a predefined number of network topologies are compared automatically. The network topology with the lowest MSE, based on the validation data set, is considered to be the best network for this application. Because the database grows with each production run, the training is repeated frequently. At each run the optimal network topology is determined anew.
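A minimal sketch of this topology comparison, assuming scikit-learn's MLPRegressor as a stand-in for the actual network library used in the paper (NEUROPH or ENCOG); the candidate hidden-layer sizes are invented placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def select_topology(x_train, y_train, x_val, y_val,
                    candidates=((5,), (10,), (20,), (10, 10))):
    """Train one network per candidate topology and keep the one with the
    lowest validation MSE (cf. section 5.3)."""
    best = None
    for hidden in candidates:
        net = MLPRegressor(hidden_layer_sizes=hidden, max_iter=2000, random_state=0)
        net.fit(x_train, y_train)
        mse = np.mean((net.predict(x_val) - y_val) ** 2)
        if best is None or mse < best[1]:
            best = (hidden, mse, net)
    return best   # (topology, validation MSE, trained network)
```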
6. Accuracy of Predicted Model Parameters

For the application in a printing machine, the neural network is applied to estimate the unknown model parameter "ink efficiency". In figure 11 the desired and actual outputs are compared for 20 data sets.

Fig. 11: Parameter estimation of the optimized neural network (network output versus training data for 20 data sets).
It can be seen that the actual output of the neural network almost matches the desired value of the training data. The average value is about 1 and fits the training data. The data sets 8 and 12, which deviate from the average, are also well approximated. So the neural network identifies suitable model parameters just by regarding the influence parameters.
7. Economic Benefit of the Model Based Process Control

With the optimal model parameters it is possible to calculate the optical density D_V without a dead time. The time behavior of the optical density is then independent from the influences. In figure 12 different production run-ups are
shown. The desired value is 1.6 with a tolerance band from 1.48 to 1.72.
Fig. 12: Shorter ramp-up with parameter adaption (optical density D_V over time in s, comparing the state of the art with the usage of parameter adaption).
The dashed lines show the conventional progress of the optical density. At a time of about 70 s the starting values vary in a range between 0.7 and 1.7 because of the impact of different influence parameters like the paper or ink properties. The tolerance band is reached stably at 300 to 400 s. The solid lines show the optical density with a parameter adaption by using the neural network. Under the same production conditions the starting values are in a range of 1.35 to 1.55. This is close to the tolerance band, which is reached at 200 s. The reduced run-up time enables savings of about 100,000 € per year because of the reduced paper waste. Additionally, costs can be reduced for ink, energy and personnel.
8. Summary

This paper describes a method for controlling processes with large dead times. A simulation model calculates the behavior of the process without any dead time, so the controller can react very dynamically. To ensure an optimal simulation quality, a neural network is used to find the best-fitting model parameters. For this, the last production runs are analyzed to build up a database for the neural network. Because the system runs autonomously, several preprocessing steps are necessary. These steps include the selection of the training data and the clustering of similar data sets. Beyond that, several training algorithms and also different network topologies are compared to each other. So a reproducible process control can be established which considers varying influence parameters as well as different qualities of the consumables. This approach reduces the resource wastage at production start and enables a much more efficient production with lower costs.
9. Outlook

The research activities demonstrate the ability to improve the life cycle costs of a printing machine. Further research will be done to transfer this system to other continuous processes.
Acknowledgements

The authors want to express their gratitude to the state government of Bavaria for its financial support, to the project sponsor VDI/VDE IT and to the project partners for the good cooperation.

References

[1] Apenberg + Partner (2010): Press release from 2010-04-28.
[2] Bundesverband Druck- und Medientechnik (2012): Die Druckindustrie in Zahlen. Online source: http://www.bvdm-online.de/zahlen/
[3] Kipphan (2002): Handbuch der Printmedien. Technologien und Produktionsverfahren. Berlin: Springer.
[4] Schmid, M. et al. (2013): Cognitive Parameter Adaption in Regular Control Structures. ICINCO 2013.
[5] Wang, D. (1984): An investigation of the applicability of the Walker and Fetsko ink transfer equation. Rochester, New York.
[6] Mohri, M.; Rostamizadeh, A.; Talwalkar, A. (2012): Foundations of Machine Learning. Cambridge, MA: MIT Press.
[7] Rangwala, S.; Dornfeld, D. A. (1989): Learning and optimization of machining operations using computing abilities of neural networks. IEEE Transactions on Systems, Man, and Cybernetics, 19, p. 299.
[8] Ramesh, R.; Mannan, M. A.; Poo, A. N. (2002): Support vector machines model for classification of thermal error in machine tools. The International Journal of Advanced Manufacturing Technology, pp. 114-120.
[9] Anand, G. (2012): Application of Artificial Neural Networks in Electrical Machines: An Overview.
[10] Isermann, M. (2013): Kognitive Regelung komplexer Produktionsprozesse. Aachen: Apprimus Verlag.
[11] Rajagopalan, R.; Rajagopalan, P. (1996): Applications of Neural Network in Manufacturing. In: IEEE: International Conference on System Sciences, Hawaii.
[12] Carrasco, R.; Sanchez, E. N.; Carlos-Hernandez, S. (2011): Neural network identification for biomass gasification kinetic model. In: The 2011 International Joint Conference on Neural Networks. IEEE.
[13] Faridi, A.; Golian, A. (2011): Use of neural network models to estimate early egg production in broiler breeder hens through dietary nutrient intake. Poultry Science 90 (12).
[14] Poznyak, A.; Chairez, I.; He, H.; Yu, W. (2012): Dynamic Neural Networks for Model-Free Control and Identification. Journal of Control Science and Engineering 2012.
[15] Mayosky, M. A.; Cancelo, I. E. (1999): Direct adaptive control of wind energy conversion systems using Gaussian networks. IEEE Transactions on Neural Networks.
[16] Zeng, G. M. et al. (2003): A neural network predictive control system for paper mill wastewater treatment. Special Issue on Applications of Artificial Intelligence in Process Systems Engineering 16 (2).
[17] Viharos, Z. J.; Kis, K. B. (2011): Support Vector Machine (SVM) based general model building algorithm for production control. In: IFAC World Congress, Milano.
[18] Rashmi, Sharma; Ashok, K. Sinha (2012): A Production Planning Model using Fuzzy Neural Network: A Case Study of an Automobile Industry. International Journal of Computer Applications.
[19] Rauch, B. (2005): Optimierung eines Produktionssystems bei der VW AG mit Hilfe der Anlagen- und Logistiksimulation. Otto-von-Guericke-Universität Magdeburg.
[20] Hassoun, M. H. (1995): Fundamentals of Artificial Neural Networks. Cambridge, MA: MIT Press.
[21] Edwards, A. W. F.; Cavalli-Sforza, L. L. (1965): A Method for Cluster Analysis. Biometrics 21 (2).
[22] Ward, J. H. (1963): Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, Vol. 58.
[23] El-Hamdouchi, A.; Willett, P. (1986): Hierarchic document clustering using Ward's method. In: Conference on Research and Development.
[24] Handl, A. (2002): Multivariate Verfahren. Theorie und Praxis unter besonderer Berücksichtigung von S-PLUS. Berlin: Springer.