Neurocomputing 149 (2015) 397–404
Sparse Bayesian extreme learning machine and its application to biofuel engine performance prediction

Ka In Wong a,*, Chi Man Vong b, Pak Kin Wong a, Jiahua Luo b

a Department of Electromechanical Engineering, University of Macau, Macau, PR China
b Department of Computer and Information Science, University of Macau, Macau, PR China
Article info

Article history:
Received 2 August 2013
Received in revised form 16 September 2013
Accepted 20 September 2013
Available online 10 September 2014
Abstract

Biofuels are important for the reduction of engine exhaust emissions and fossil fuel consumption. To use different blends of biofuels, the electronic control unit (ECU) of the engine must be modified and calibrated. However, the calibration process of the ECU is very costly and time-consuming, so most engines can only use one specific biofuel blend; otherwise the engines cannot run properly. To alleviate this problem, a mathematical engine model can be used to predict the engine performance at different ECU settings and biofuel blends, so that the ECU can be re-calibrated in real time by a controller. For such online control purposes, the prediction of the engine model must be very fast and accurate, and the model must also be very compact owing to the limited memory size of the ECU. Accordingly, a new method called sparse Bayesian extreme learning machine (SBELM) is proposed in this paper to fulfill these requirements for fast engine performance prediction and online ECU re-calibration. Experiments were conducted to compare SBELM with conventional ELM, Bayesian ELM (BELM) and the back-propagation neural network (BPNN). Evaluation results show that SBELM performs at least as well as, and mostly better than, ELM, BELM and BPNN in terms of prediction accuracy. In terms of execution time, model size and insensitivity to the number of hidden neurons, SBELM completely outperforms the other three methods. These results verify that SBELM better fulfills the practical requirements of a mathematical engine model for online engine performance prediction.

© 2014 Elsevier B.V. All rights reserved.
Keywords: Extreme learning machine; Sparse Bayesian; Biofuel; Dual-fuel engine; Engine performance
1. Introduction

The rapid growth of motorization and over-dependence on fossil fuels have led to the search for promising renewable, sustainable, clean and reliable alternative fuels. Biofuels are among the most common alternative fuels to gasoline and can help to reduce undesirable engine emissions. Different blends (or ratios) of biofuel with gasoline are used to fulfill the requirements of different regions of the world. However, in order to use a specific biofuel–gasoline blend, the engine control system (i.e., the hardware) must be modified to fit the combustion properties of that blend. Furthermore, the engine controllable parameters (i.e., the software), such as fuel injection time and ignition timing, must also be adjusted to optimize the engine performance for that blend. In most modern automotive engines, these controllable parameters are stored and controlled by the electronic control unit (ECU). Therefore, the group of engine controllable parameters is usually referred to as the ECU setup.
* Corresponding author. E-mail address: [email protected] (K.I. Wong).
http://dx.doi.org/10.1016/j.neucom.2013.09.074
To calibrate the ECU setup, the traditional practical method is trial-and-error. The process involves the measurement of a large amount of engine performance data, and requires a large quantity of fuel and expensive equipment such as a programmable ECU, an engine dynamometer, engine emission analyzers and a fuel-consumption detector. Additionally, there are a large number of ECU parameters to adjust, so numerous tests have to be conducted before an optimal ECU setup can be obtained. Obviously, the ECU calibration process is costly in fuel and time, and every engine model must undergo a similar calibration process in the engine development cycle [1,2]. Even for an experienced automotive engineer, the tune-up and calibration of an ECU for an engine can take more than one year and cost millions of dollars [3]. As a result, there is no universal ECU setup for various biofuel blends; an ECU setup in the market is only suitable for either pure gasoline or one specific biofuel blend. In fact, such a tedious and costly calibration process can be alleviated if a reliable mathematical engine model is known, because all the costly data can be predicted by the model. Then, by using model predictive controllers and optimizers based on the engine model, the ECU setup can be re-calibrated online to suit different biofuel blends. However, this mathematical model is
very difficult to determine, because the relationship between the input parameters (i.e., the ECU setup) and the output parameters (i.e., the engine performance) is a complex multi-variable nonlinear function, and a huge number of combinations of engine control parameters must be handled. Using different biofuel blends may also alter the relationship between the input and output parameters of the engine, leading to poor accuracy [4]. Moreover, the current engine control systems in the market are real-time systems with limited memory, so execution time and model size are additional difficulties for the successful application of a mathematical engine model. To address these concerns, this study aims to create an accurate, fast and compact biofuel engine model.

In the current literature [5–10], many studies have attempted to apply artificial neural networks (ANNs) to model and predict spark-ignition (SI) engine performance, with or without biofuels. The traditional ANN approach, such as the back-propagation neural network (BPNN), was adopted in these studies. However, according to [1,11–13], this approach suffers from slow learning and execution speed, large network size and poor generalization performance, which together make the resulting engine models unreliable and unsuitable for practical use. Support vector machines (SVMs), popular over the last two decades, were designed to overcome the drawbacks of BPNN. However, the size of an SVM model is usually large for a large training dataset, because the number of selected support vectors increases with the size of the training dataset, and an SVM model with more support vectors takes longer to execute; SVM may therefore not fit the current requirements of a mathematical engine model. Extreme learning machine (ELM) [14,15], a popular approach in recent years, provides good generalization performance and extremely fast learning speed, but its execution time is quite unstable, depending on the number of hidden neurons (the network size). Although a kernel-based ELM [15] has been proposed that does not require hidden neurons and tends to provide better accuracy than basic ELM, it still suffers from the same issues as SVM (i.e., execution time and memory size). Besides, these two ELM versions learn the output weights with the least-squares method, which can easily overfit the training data.

In this study, a new version of ELM, called sparse Bayesian ELM (SBELM), is proposed. The proposed SBELM is similar to Bayesian ELM (BELM) [16], which learns the output weights of basic ELM by parametric Bayesian methods. Both BELM and SBELM estimate the probability distribution of the output values, and hence the overfitting problem of conventional ELM is solved. The main difference from BELM is that SBELM determines the output weights by means of sparse Bayesian learning [17,18], in which an independent regularization prior is imposed on each weight instead of one shared prior for all weights. Consequently, some weights with large regularization priors are automatically tuned to zero, and the hidden neurons corresponding to the zero weights can be pruned, leading to a sparse network. With these advantageous properties, a compact, fast and accurate engine model can be constructed, which is favorable for this application. In addition, it is worth noting that an SBELM for multi-classification problems has been proposed by some of the authors of this study [19].
The rest of this paper is organized as follows. The proposed SBELM approach is presented in Section 2. The experimental setup for sample data acquisition is described in Section 3; for demonstration purposes, the biofuel used in this study is ethanol. The construction of the engine model using SBELM is discussed in Section 4, where the model performance is evaluated by comparison with basic ELM, BELM and traditional BPNN. Finally, the conclusions of this study are summarized in Section 5.
2. Sparse Bayesian extreme learning machine (SBELM)

2.1. Review of ELM and BELM

ELM is a learning scheme for single-hidden-layer feedforward networks, in which the hidden node parameters are initialized randomly and the output weights are optimized using the Moore–Penrose pseudoinverse [14,15]. Consider a set of N training samples D = {(x_i, y_i), i = 1, …, N}, with each x_i being a d-dimensional input vector and y_i the target scalar output; the goal of ELM is to find the relation between x_i and y_i. For a single-hidden-layer feedforward network with L hidden nodes and activation function h(·), there exist β and w such that Eq. (1) is satisfied:

    ∑_{k=1}^{L} β_k h_k(w_k, x_i) = h(w, x_i)β = y_i,  1 ≤ i ≤ N    (1)

where h(w, x_i) = [h_1(w_1, x_i), …, h_L(w_L, x_i)] is the hidden-layer feature mapping, w = [w_1, …, w_L] are the randomly generated parameters of the hidden layer, and β = [β_1, …, β_L]ᵀ is the output weight vector. Eq. (1) can be written compactly as Hβ = Y, with Y = [y_1, …, y_N]ᵀ and the N×L hidden matrix H expressed as:

    H = [ h_1(w_1, x_1)  ⋯  h_L(w_L, x_1)
                ⋮        ⋱        ⋮
          h_1(w_1, x_N)  ⋯  h_L(w_L, x_N) ]_{N×L}    (2)
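For concreteness, Eqs. (1) and (2), together with the least-squares solution given in Eq. (3) below, translate into a few lines of NumPy. The sketch is illustrative only, not the authors' implementation; the sigmoid activation, the uniform (−1, +1) initialization and all function names are assumptions:

```python
import numpy as np

def elm_train(X, y, L, seed=0):
    """Basic ELM: random hidden layer (Eq. (2)) + least-squares output weights (Eq. (3))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(d, L))    # random hidden parameters w_k (assumed range)
    b = rng.uniform(-1.0, 1.0, size=L)         # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))     # N x L hidden matrix H; sigmoid is an assumed choice
    beta = np.linalg.pinv(H) @ y               # beta = H_dagger @ Y, Eq. (3)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```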
As Eq. (2) defines a linear system, ELM estimates the corresponding output weights with the smallest-norm least-squares solution:

    β = H†Y    (3)

where H† is the Moore–Penrose pseudoinverse of H, which can be calculated using the orthogonal projection method [20]: H† = (HᵀH)⁻¹Hᵀ when HᵀH is nonsingular, or H† = Hᵀ(HHᵀ)⁻¹ when HHᵀ is nonsingular. As this least-squares method may lead to overfitting, a penalty term can be added to the diagonal of HᵀH or HHᵀ for regularization [15]. However, the optimum value of this penalty term is still subject to the minimization of validation error [18].

BELM, in contrast, determines the output weights by Bayesian inference [16]. To prevent overfitting to noisy data, each observed y_i is assumed to carry an independent noise component ϵ_i that is Gaussian distributed with zero mean and variance σ²; that is, y_i = h(w, x_i)β + ϵ_i, with p(ϵ_i | σ²) = N(0, σ²). The probabilistic model is then given as:

    p(y_i | H, β, σ²) = N(y_i | h(w, x_i)β, σ²)    (4)

For all the training samples, the likelihood function is obtained as:

    p(Y | H, β, σ²) = ∏_{i=1}^{N} p(y_i | H, β, σ²) = ∏_{i=1}^{N} (2πσ²)^{−1/2} exp[−(y_i − h(w, x_i)β)² / (2σ²)]    (5)

Then, in order to penalize large weights, a natural prior distribution is imposed:

    p(β | α) = N(β | 0, α⁻¹I) = (α/2π)^{L/2} exp[−(α/2) βᵀβ]    (6)

where I is the identity matrix and α is a shared prior. As both the prior distribution and the likelihood function are Gaussian, the posterior is also Gaussian, p(β | Y, H, α, σ²) = N(β | m, S), with mean m and covariance S
defined as:

    m = σ⁻² S Hᵀ Y    (7)

    S = (αI + σ⁻² Hᵀ H)⁻¹    (8)
Since the posterior distribution of the hyperparameters α and σ² is p(α, σ² | Y, H) ∝ p(Y | H, α, σ²) p(α) p(σ²), the optimal values of the hyperparameters can be obtained with type-II maximum likelihood (ML-II), also known as the evidence procedure [16,21]. The process involves maximizing the marginal likelihood p(Y | H, α, σ²), inferred from the integral ∫ p(Y | H, β, σ²) p(β | α) dβ. Using expectation maximization, or differentiating the marginal log-likelihood log p(Y | H, α, σ²) with respect to α and σ² and setting the derivatives to zero, the optimal conditions are obtained as two fixed-point equations:

    α_new = (L − α·trace(S)) / (mᵀm)    (9)

    σ²_new = ∑_{i=1}^{N} (y_i − h(w, x_i)m)² / (N − L + α·trace(S))    (10)
By initializing α and σ², m and S are updated iteratively via Eqs. (7)–(10) until convergence. The resulting m can then be used for predicting the new output y_new for an unseen input x_new, which follows the distribution:

    p(y_new | h(w, x_new), m, α, σ²) = N(h(w, x_new)m, σ²(x_new))    (11)

where

    σ²(x_new) = σ² + h(w, x_new) S h(w, x_new)ᵀ    (12)

Unlike basic ELM, the regularization term of BELM, α, arises as a natural consequence of the Gaussian process [16]. Therefore, BELM does not require any user-defined regularization term, and should achieve better generalization performance than basic ELM.
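The BELM evidence procedure of Eqs. (7)–(10) amounts to a short fixed-point loop. Below is a minimal sketch under assumed names; the initial hyperparameter values and the convergence test are arbitrary choices, not the authors' settings:

```python
import numpy as np

def belm_fit(H, y, max_iter=200, tol=1e-6):
    """Iterate Eqs. (7)-(10): posterior mean/covariance, then hyperparameter updates."""
    N, L = H.shape
    alpha, sigma2 = 1.0, 0.1                                       # illustrative initial values
    for _ in range(max_iter):
        S = np.linalg.inv(alpha * np.eye(L) + (H.T @ H) / sigma2)  # Eq. (8)
        m = (S @ H.T @ y) / sigma2                                 # Eq. (7)
        gamma = L - alpha * np.trace(S)                            # effective number of parameters
        alpha_new = gamma / (m @ m)                                # Eq. (9)
        sigma2_new = np.sum((y - H @ m) ** 2) / (N - gamma)        # Eq. (10)
        converged = abs(alpha_new - alpha) < tol and abs(sigma2_new - sigma2) < tol
        alpha, sigma2 = alpha_new, sigma2_new
        if converged:
            break
    return m, S, alpha, sigma2
```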
2.2. Proposed SBELM

Although BELM can overcome the overfitting problem of basic ELM with Gaussian regularization, it uses only one shared hyperprior α to penalize the cost function, which may not be flexible enough to control the precisions of the output weights and may result in underfitting [18]. SBELM resolves this issue by introducing an independent prior on each weight to govern the complexity. To begin with, the zero-mean Gaussian prior distribution in Eq. (6) is modified so that there are L independent priors instead of one shared prior:

    p(β_k | α_k) = N(β_k | 0, α_k⁻¹)    (13)

    p(β | α) = ∏_{k=1}^{L} p(β_k | α_k) = ∏_{k=1}^{L} √(α_k/2π) exp[−(α_k/2) β_k²]    (14)
where α = [α_1, …, α_L]ᵀ are the independent priors associated with each β_k. Following the steps in BELM, the posterior distribution over the output weights is Gaussian, p(β | Y, H, α, σ²) = N(β | μ, Σ), and can be estimated as:

    p(β | Y, H, α, σ²) = p(Y | H, β, σ²) p(β | α) / p(Y | H, α, σ²)
                       = (2π)^{−(L+1)/2} |Σ|^{−1/2} exp[−(1/2)(β − μ)ᵀ Σ⁻¹ (β − μ)]    (15)

with μ and Σ being the mean and covariance defined as:

    μ = σ⁻² Σ Hᵀ Y    (16)

    Σ = (diag(α) + σ⁻² Hᵀ H)⁻¹    (17)
It can be noticed that μ and Σ have a similar form to m and S of BELM; the only difference is that α has L independent values instead of only one. Moreover, if any α_k becomes infinite, the corresponding μ_k (the kth mean) and Σ_kk (the kth diagonal element of the covariance) become zero too:

    lim_{α_k→∞} Σ_kk = lim_{α_k→∞} (α_k + σ⁻² k_kᵀ k_k)⁻¹ = 0    (18)

    lim_{α_k→∞} μ_k = lim_{α_k→∞} σ⁻² ε_k Hᵀ Y = 0    (19)
where k_k is the kth column of H and ε_k is the kth row of Σ. Then, as in BELM, the hyperparameters α and σ² of SBELM can be determined via ML-II. This time, the optimization involves differentiating the marginal log-likelihood with respect to each α_k and σ², and the corresponding optimal conditions are [18]:

    α_k,new = (1 − α_k Σ_kk) / μ_k²    (20)

    σ²_new = ‖Y − Hμ‖² / (N − ∑_{k=1}^{L} (1 − α_k Σ_kk))    (21)
Again, after α and σ² are initialized, μ and Σ are updated via Eqs. (16)–(21), and the iteration is repeated until convergence. The resulting μ is used for predicting new data as in BELM. Since some of the elements of μ are tuned to zero during the learning phase, only the non-zero elements are used for predicting new data:

    y_new = h_active(w_active, x_new) μ_active    (22)

where h_active(·) is the vector of active hidden nodes and μ_active is the vector containing the non-zero μ_k. In SBELM, each output weight is associated with an independent prior, so the complexity, or the precisions of the output weights, can be controlled effectively. Furthermore, as a prior α_k tends towards infinity during the learning process, the associated weight μ_k tends to zero. Consequently, such weights can be deleted and their corresponding hidden nodes h_k(w_k, x_i) pruned. As a result, only the hidden nodes with the best contribution, h_active(·), are left, and sparsity is achieved.
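Putting Eqs. (16), (17), (20) and (21) together with the pruning behavior implied by Eqs. (18) and (19), the SBELM training loop can be sketched as follows. This is an illustrative reconstruction under assumed names; the pruning threshold and iteration count are arbitrary, and numerical safeguards are omitted:

```python
import numpy as np

def sbelm_fit(H, y, max_iter=500, prune_threshold=1e9):
    """SBELM: per-weight priors alpha_k; prune hidden nodes whose alpha_k diverges."""
    N = H.shape[0]
    active = np.arange(H.shape[1])       # indices of surviving hidden nodes
    alpha = np.ones(H.shape[1])          # illustrative initial priors
    sigma2 = 0.1
    for _ in range(max_iter):
        Ha = H[:, active]
        Sigma = np.linalg.inv(np.diag(alpha[active]) + (Ha.T @ Ha) / sigma2)  # Eq. (17)
        mu = (Sigma @ Ha.T @ y) / sigma2                                      # Eq. (16)
        gamma = 1.0 - alpha[active] * np.diag(Sigma)
        alpha[active] = gamma / mu ** 2                                       # Eq. (20); mu_k -> 0 sends alpha_k -> inf
        sigma2 = np.sum((y - Ha @ mu) ** 2) / (N - gamma.sum())               # Eq. (21)
        active = active[alpha[active] < prune_threshold]   # drop nodes as alpha_k grows unbounded
    Ha = H[:, active]
    Sigma = np.linalg.inv(np.diag(alpha[active]) + (Ha.T @ Ha) / sigma2)
    mu = (Sigma @ Ha.T @ y) / sigma2
    return active, mu    # predict with y_new = h_active(x_new) @ mu, Eq. (22)
```

Only the columns of H indexed by `active` are needed at prediction time, which is what makes the stored model compact.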
3. Experimental setup for sample data collection

3.1. Test fuel

Among the variety of biofuels available nowadays, ethanol is the most competitive and attractive one, because it produces relatively few pollutants when burnt and can be easily obtained from many sources, such as the fermentation of corn [22]. Many studies have discussed the use of ethanol–gasoline fuel blends in SI engines [23–28]. These studies show that the addition of ethanol to gasoline not only improves the combustion efficiency and stability, but also significantly reduces the carbon monoxide (CO) and hydrocarbon (HC) emissions. The high octane rating of ethanol also allows the engine to operate at higher compression ratios. Therefore, for demonstration purposes, ethanol was selected as the alternative fuel to gasoline.

3.2. Test engine & ECU

A high-performance, electronically controlled, water-cooled, 4-cylinder gasoline engine was employed as the base engine for modification; its specifications are provided in Table 1.
Table 1
Engine specifications.

Base model                        Honda K20A – Type R
Type                              Water-cooled, four-stroke, DOHC i-VTEC
Cylinder arrangement              Inline four-cylinder, transverse
Bore and stroke                   86 mm × 86 mm
Displacement                      1998 cc
Compression ratio                 11.5:1
Valve train                       Chain drive, 16 valves
Maximum power                     160 kW @ 8000 rpm
Maximum torque                    196 N·m @ 7000 rpm
Features                          Variable valve timing and lift control
New features after modification   Variable differential injection timing for gasoline and ethanol;
                                  variable gasoline-to-ethanol ratio; variable supplementary air intake
The engine was modified to fit eight injectors, four of which were for gasoline and the remaining four for ethanol. A MoTeC M800 programmable ECU was adopted in this study to control the engine. The ECU was not only capable of driving eight injector outputs, but also provided variable valve timing and lift control and supplementary air-intake control, so the normal position of the intake valve could be adjusted for different amounts of air intake. With the modified engine and the programmable ECU, variable differential injection timing for the different injectors also became available (i.e., the injection timing for gasoline and ethanol can differ). Therefore, the gasoline-to-ethanol ratio could be controlled dynamically under different engine operating conditions.

3.3. Experimental setup

To collect sample data at different ethanol–gasoline ratios and ECU settings, experiments were set up as illustrated in Fig. 1. In this experimental setup, the fuel consumption was calculated by the ECU based on the injector flow rate and total injection time. The exhaust gases from the engine, including HC, CO and carbon dioxide (CO2), were measured on a continuous basis using an FGA-4100A automotive gas analyzer, which uses the non-dispersive infrared sensing method to measure HC, CO and CO2 concentrations. The tolerances of the measurement are within ±5% (relative error). Before each test, the gas analyzer was calibrated with standard sample and zero gases. As the gas analyzer is not suitable for measuring nitrogen oxide (NOx) concentrations, a lambda sensor was installed at the exhaust pipe to measure the lambda (λ) value, which indicates the engine air–fuel ratio. When the λ value is higher than 1, more NOx is produced, so the λ value can be used to reflect the NOx emissions.

3.4. Experiment procedure

Experiments were carried out at different ECU settings. Although there are many parameters in the ECU, most of them do not significantly affect the engine performance. Hence, only the basic controllable parameters that greatly affect the engine performance were adjusted in this study: the fuel injection time, ignition advance, normal position of the intake valve, and valve timing. For demonstration purposes, only the ECU settings at speeds from 1000 rpm to 2500 rpm were altered, and the corresponding engine performance data were collected. Furthermore, the experiments were conducted under a total of eleven different ethanol–gasoline ratios varied from 0 to 100 (i.e., 0%, 10%, …, 90%, 100% volume of ethanol). To ensure the repeatability and comparability of the measurements, the coolant temperature was automatically controlled by a temperature controller to 80 ± 5 °C.
Fig. 1. Schematic diagram of experimental setup.
Table 2
Standard errors of the experiment results.

Parameter          Standard error (%)
Fuel consumption   2.77
λ                  1.30
HC                 4.95
CO                 4.34
CO2                1.07
To prevent the air density from varying appreciably between experiments, the inlet air temperature was controlled to 30 ± 2 °C. In total, 130 different settings were tested in this study. For each ECU setup and ratio, data were recorded after the engine had reached steady state, as indicated by the engine temperature and CO2 concentration. To reduce experimental uncertainties, the data were recorded continuously for 5 min, and each test was carried out three times. The results of the three tests for each setting agreed with each other within the experimental uncertainties of the measurements, and the corresponding standard errors are shown in Table 2. The average values of the three tests were used.
4. Engine model design

4.1. Inputs, outputs & activation function

As the objective of this study is to create an engine model suitable for predicting the engine performance at different ECU settings, all the adjusted parameters from the experiments were selected as the inputs, namely the fuel injection time, ignition advance, idle valve position, ethanol–gasoline ratio and the corresponding engine speed, while all the engine performance parameters, namely the fuel consumption, the λ value and the concentrations of the HC, CO and CO2 emissions, were employed as the outputs. Moreover, to evaluate the performance of SBELM and
demonstrate its effectiveness, engine models were also created using basic ELM, BELM and traditional BPNN. The radial basis function was used as the activation function for SBELM, basic ELM and BELM, while the tangent-sigmoid function was used for BPNN.
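The paper does not spell out the radial-basis parameterization. A common choice, assuming randomly drawn centres a_k and positive widths b_k, is h_k(x) = exp(−b_k‖x − a_k‖²), which in NumPy might look like the following sketch (function and variable names are assumptions):

```python
import numpy as np

def rbf_hidden_matrix(X, centers, widths):
    """Hidden matrix H for RBF nodes: H[i, k] = exp(-widths[k] * ||X[i] - centers[k]||^2)."""
    # X: (N, d); centers: (L, d); widths: (L,) positive scalars
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (N, L) squared distances
    return np.exp(-widths * sq_dist)
```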
4.2. Data division

From the experiments, a total of 130 sample data sets corresponding to different ECU settings and ethanol–gasoline ratios were acquired. They were randomly divided in the ratio of 4:1; that is, 104 of the 130 data sets were used as training data and the remaining 26 sets as testing data. The training data sets were used to build the models, while the testing data sets were used to verify the model generalization capability. Moreover, before training the models, all the inputs and outputs in the data sets were linearly normalized to the range [0, +1] in order to increase the model accuracy and prevent any parameter from dominating the output values [29].
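The normalization and 4:1 division can be reproduced along these lines (a sketch; the random seed, array layout and function name are assumptions):

```python
import numpy as np

def normalize_and_split(X, Y, seed=0):
    """Scale each column linearly to [0, 1], then split the samples 4:1 at random."""
    Xn = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    Yn = (Y - Y.min(axis=0)) / (Y.max(axis=0) - Y.min(axis=0))
    idx = np.random.default_rng(seed).permutation(len(Xn))
    n_train = int(0.8 * len(Xn))             # with 130 samples: 104 training, 26 testing
    tr, te = idx[:n_train], idx[n_train:]
    return Xn[tr], Yn[tr], Xn[te], Yn[te]
```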
Fig. 3. Selection of hidden nodes number for lambda value using LOOCV.
4.3. Performance indices
To illustrate the performance of the modeling methods, the training time (t_train) and the execution time over all the testing data (t_test) are logged, and the prediction error is presented as the mean absolute percentage error (MAPE), evaluated against the experimental data sets using Eq. (23):

    MAPE = (1/N_t) ∑_{i=1}^{N_t} |y*_i − y_i| / y_i × 100%    (23)

where y*_i is the ith model prediction corresponding to x_i, y_i is the actual experimental value corresponding to x_i, and N_t is the number of data points used to calculate the MAPE. Generally, the smaller the MAPE, the better the model accuracy.
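Eq. (23) translates directly to code; a one-line sketch (assuming nonzero targets):

```python
import numpy as np

def mape(y_pred, y_true):
    """Mean absolute percentage error of Eq. (23); y_true must be nonzero."""
    return 100.0 * np.mean(np.abs(y_pred - y_true) / np.abs(y_true))
```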
Fig. 4. Selection of hidden nodes number for HC emission using LOOCV.
4.4. Parameter tuning
It is well known that the performance of ELM and BPNN is sensitive to the number of hidden neurons, while the choice of the optimal number remains an open problem, even though several methods exist for determining it [30,31]. For a fair comparison, the simple leave-one-out cross-validation (LOOCV) method is adopted here to determine the optimal number. The training error and validation error are both presented as MAPE. The MAPE results of the LOOCV hidden-neuron-number selection for the four methods are presented in Figs. 2–6. From Figs. 2–6, it can be seen that after the number of hidden neurons exceeds 20, the performance of SBELM remains relatively unchanged even though hidden neurons are continuously added to the model.
Fig. 2. Selection of hidden nodes number for fuel consumption using LOOCV.

Fig. 5. Selection of hidden nodes number for CO emission using LOOCV.

Fig. 6. Selection of hidden nodes number for CO2 emission using LOOCV.
However, for ELM and BPNN, when the number of hidden neurons increases, both the training error and the validation error decrease at first; after that, the training error keeps decreasing while the validation error turns to increase. This phenomenon reflects the overfitting issue of ELM and BPNN, and demonstrates that both SBELM and BELM successfully overcome this drawback. Moreover, the performance of both SBELM and BELM tends to be relatively insensitive to the number of hidden neurons, while SBELM usually performs better than BELM. This also indicates that the single shared hyperprior in BELM may not be sufficient to govern the complexity of the model. Furthermore, as indicated in Section 2.2, most of the redundant hidden neurons in SBELM are pruned during model training, so the hidden neuron number determined by LOOCV for SBELM serves only as the initial hidden neuron number, which may differ from the number of active hidden neurons in the trained SBELM model. Details of the active hidden neuron number are discussed in Section 4.5.

4.5. Modeling results and discussion

All the modeling algorithms were implemented in MATLAB R2012a and executed under Windows 7 on a computer with an Intel Core i7 processor (3.4 GHz with 8 MB L3 cache) and 8 GB of RAM. Each algorithm was run 100 times with arbitrary initial hidden node parameters generated from the uniform distribution on the open interval (−1, +1), and the average results over the 100 runs are presented. The prediction MAPE over the testing data sets and the corresponding standard deviations are summarized in Table 3. The computational complexity, including the training time (t_train), execution time (t_test) and number of active hidden neurons (L_active) in the trained model, is provided in Tables 4 and 5, respectively.

Table 3 shows that SBELM, ELM and BELM are superior to BPNN, while SBELM achieves similar or even better performance than ELM and BELM. From the overall results, the MAPE of SBELM is 0.11%, 0.08% and 1.66% (absolute error) better than those of ELM, BELM and BPNN, respectively. Its deviation over 100 runs is also smaller than those of ELM, BELM and BPNN, showing that the performance of SBELM is quite stable. Table 4 reveals the extremely slow training and execution times of BPNN, which is the major reason why traditional BPNN is not acceptable for this application. On the other hand, although the training time of SBELM is longer than that of ELM, its execution time is slightly shorter. Since the current application is online model prediction, the execution time and prediction accuracy are far more important than the training time, as the model requires no further training once it has been trained; hence, SBELM is more favorable. In addition, Table 5 shows that the number of active hidden nodes of SBELM is much smaller than those of ELM, BELM and BPNN, which means that the execution time and memory size of the trained model are significantly
reduced. Where the model is used as a virtual sensor, as in the current application, a smaller memory size is more beneficial.

From the modeling results, the number of active hidden neurons of the trained SBELM model is much smaller than the initial hidden neuron number, and this interesting property of SBELM is worth studying. Therefore, models were trained under different initial numbers of hidden neurons and the resulting numbers of active hidden neurons were recorded; the corresponding relation is provided in Fig. 7. It can be seen that, as the initial number of hidden neurons increases, the number of active hidden neurons remains the same once the initial number exceeds around 20.

Table 4
Comparison of the training time and executing time of the engine models.

Output parameters    SBELM               ELM                 BELM                BPNN
                     t_train   t_test   t_train   t_test   t_train   t_test   t_train    t_test
                     (ms)      (ms)     (ms)      (ms)     (ms)      (ms)     (ms)       (ms)
Fuel consumption     2.94      0.03     0.51      0.06     3.69      0.05     5805.2     6.23
λ                    1.42      0.03     0.33      0.06     7.46      0.05     7193.5     6.29
HC                   1.64      0.03     0.22      0.06     3.29      0.04     6855.9     6.26
CO                   3.34      0.03     0.27      0.06     8.04      0.05     7305.8     6.29
CO2                  1.76      0.03     0.20      0.05     2.22      0.04     7515.9     6.32
Table 5
Comparison of the active hidden neuron number of the engine models.

Output parameters    SBELM    ELM    BELM    BPNN
Fuel consumption     10.92    45     26      14
λ                    4.45     41     19      16
HC                   6.85     40     16      22
CO                   4.78     43     17      37
CO2                  4.21     27     13      18
Fig. 7. Number of active hidden neurons in the SBELM model after training.
Table 3
Comparison of the average testing MAPE (mean) and its standard deviation (dev) of the engine models.

Output parameters    SBELM              ELM                BELM               BPNN
                     Mean (%)  Dev (%)  Mean (%)  Dev (%)  Mean (%)  Dev (%)  Mean (%)  Dev (%)
Fuel consumption     4.00      0.35     4.27      0.51     3.96      0.27     9.37      4.92
λ                    4.13      0.18     4.01      0.29     4.42      0.15     4.80      1.33
HC                   5.15      0.27     5.45      0.39     5.08      0.20     5.85      1.96
CO                   8.12      0.30     8.30      0.50     8.23      0.69     8.50      2.68
CO2                  5.02      0.27     4.90      0.37     5.12      0.21     6.19      1.86
Overall average      5.28      0.27     5.39      0.41     5.36      0.30     6.94      2.55

* Bold in the original indicates better performance.
This significant result once again verifies that SBELM is relatively insensitive to the initial number of hidden neurons. Referring to Figs. 2–7, SBELM usually requires only a relatively small number of hidden neurons, say 20, to achieve near-optimal generalization performance. Moreover, experiments over UCI benchmark datasets produced the same results (not shown here). With this property, the time-consuming optimization for determining the number of hidden neurons can be greatly reduced. As a result, the difficulty of choosing the optimal hidden neuron number in ELM is also overcome in SBELM.
5. Conclusions

In this study, a new ELM learning scheme, called SBELM, is proposed to construct a mathematical biofuel engine model for online ECU calibration. Since the ECU is a real-time system with very limited memory, the engine model must be very fast and compact. To verify that the proposed SBELM is suitable for this application, the engine model created using SBELM is compared with models built using conventional ELM, BELM and traditional BPNN. Evaluation results show that the SBELM engine model achieves similar or even better generalization performance than the ELM, BELM and BPNN engine models. Most importantly, the computational cost of SBELM, in terms of prediction time and model size, is much lower than that of the others, indicating that the SBELM model better fulfills the requirements of the engine model. In addition, the results show that SBELM is relatively insensitive to the initial number of hidden neurons, while ELM, BELM and BPNN all depend strongly on the optimal hidden neuron number. This implies that, if an improper hidden neuron number is chosen, SBELM is more likely to perform better than the other traditional methods. With the advantages of extremely fast computational time, compact model size and insensitivity to the hidden neuron number, SBELM is more suitable for practical applications, such as the present one.
Acknowledgments

The research is supported by the University of Macau Research Grant, grant numbers MYRG2014-00178-FST and MYRG075(Y1-L2)FST13-VCM, and the Science and Technology Development Fund of Macau, grant number 075/2013/A. The authors would like to thank Mr. Wa Chio Lei for his assistance in the experimental work.

References

[1] P.K. Wong, L.M. Tam, K. Li, C.M. Vong, Engine idle-speed system modelling and control optimization using artificial intelligence, Proc. Inst. Mech. Eng. Part D: J. Automob. Eng. 224 (2010) 55–72.
[2] P.K. Wong, L.M. Tam, K. Li, Automotive engine power performance tuning under numerical and nominal data, Control Eng. Pract. 20 (2012) 300–314.
[3] A.G. Bell, Four-Stroke Performance Tuning, 4th ed., Haynes Publishing, United Kingdom, 2012.
[4] K.I. Wong, P.K. Wong, C.S. Cheung, C.M. Vong, Modeling and optimization of biodiesel engine performance using advanced machine learning methods, Energy 55 (2013) 519–528.
[5] C. Sayin, H.M. Ertunc, M. Hosoz, I. Kilicaslan, M. Canakci, Performance and exhaust emissions of a gasoline engine using artificial neural network, Appl. Therm. Eng. 27 (2007) 46–54.
[6] G. Najafi, B. Ghobadian, T. Tavakoli, D.R. Buttsworth, T.F. Yusaf, M. Faizollahnejad, Performance and exhaust emissions of a gasoline engine with ethanol blended gasoline fuels using artificial neural network, Appl. Energy 86 (2009) 630–639.
[7] N.K. Togun, S. Baysec, Prediction of torque and specific fuel consumption of a gasoline engine by using artificial neural networks, Appl. Energy 87 (2010) 349–355.
[8] M.K.D. Kiani, B. Ghobadian, T. Tavakoli, A.M. Nikbakht, G. Najafi, Application of artificial neural networks for the prediction of performance and exhaust emissions in SI engine using ethanol-gasoline blends, Energy 35 (2010) 65–69.
[9] S. Tasdemir, I. Saritas, M. Ciniviz, N. Allahverdi, Artificial neural network and fuzzy expert system comparison for prediction of performance and emission parameters on a gasoline engine, Expert Syst. Appl. 38 (2011) 13912–13923.
[10] Y. Cay, A. Cicek, F. Kara, S. Sagiroglu, Prediction of engine performance for an alternative fuel using artificial neural network, Appl. Therm. Eng. 37 (2012) 217–225.
[11] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice-Hall, New Jersey, 1999.
[12] C.M. Vong, P.K. Wong, Y.P. Li, Prediction of automotive engine power and torque using least squares support vector machines and Bayesian inference, Eng. Appl. Artif. Intell. 19 (2006) 277–287.
[13] K.I. Wong, P.K. Wong, C.S. Cheung, C.M. Vong, Modeling of diesel engine performance using advanced machine learning methods under scarce and exponential data set, Appl. Soft Comput. 13 (2013) 4428–4441.
[14] G.B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (2006) 489–501.
[15] G.B. Huang, H.M. Zhou, X.J. Ding, R. Zhang, Extreme learning machine for regression and multiclass classification, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 42 (2012) 513–529.
[16] E. Soria-Olivas, J. Gomez-Sanchis, J.D. Martin, J. Vila-Frances, M. Martinez, J.R. Magdalena, et al., BELM: Bayesian extreme learning machine, IEEE Trans. Neural Netw. 22 (2011) 505–509.
[17] M.E. Tipping, Sparse Bayesian learning and the relevance vector machine, J. Mach. Learn. Res. 1 (2001) 211–244.
[18] M.E. Tipping, Bayesian inference: an introduction to principles and practice in machine learning, in: O. Bousquet, U. von Luxburg, G. Ratsch (Eds.), Advanced Lectures on Machine Learning, Springer, 2004, pp. 41–62.
[19] J. Luo, C.M. Vong, P.K. Wong, Sparse Bayesian extreme learning machine for multi-classification, IEEE Trans. Neural Netw. Learn. Syst. 25 (4) (2014) 836–843.
[20] C.R. Rao, S.K. Mitra, Generalized Inverse of Matrices and its Applications, Wiley, New York, 1971.
[21] D. Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press, United Kingdom, 2012.
[22] W.W. Pulkrabek, Engineering Fundamentals of the Internal Combustion Engine, 2nd ed., Pearson Prentice Hall, New Jersey, 2004.
[23] W.D. Hsieh, R.H. Chen, T.L. Wu, T.H. Lin, Engine performance and pollutant emission of an SI engine using ethanol–gasoline blended fuels, Atmos. Environ. 36 (2002) 403–410.
[24] A.K. Agarwal, Biofuels (alcohols and biodiesel) applications as fuels for internal combustion engines, Prog. Energy Combust. Sci. 33 (2007) 233–271.
[25] M.B. Celik, Experimental determination of suitable ethanol–gasoline blend rate at high compression ratio for gasoline engine, Appl. Therm. Eng. 28 (2008) 396–404.
[26] A.N. Ozsezen, M. Canakci, Performance and combustion characteristics of alcohol–gasoline blends at wide-open throttle, Energy 36 (2011) 2747–2752.
[27] D. Turner, H.M. Xu, R.F. Cracknell, V. Natarajan, X.D. Chen, Combustion performance of bio-ethanol at various blend ratios in a gasoline direct injection engine, Fuel 90 (2011) 1999–2006.
[28] M. Canakci, A.N. Ozsezen, E. Alptekin, M. Eyidogan, Impact of alcohol–gasoline fuel blends on the exhaust emission of an SI engine, Renew. Energy 52 (2013) 111–117.
[29] D. Pyle, Data Preparation for Data Mining, Morgan Kaufmann, San Francisco, 1999.
[30] G.R. Feng, G.B. Huang, Q.P. Lin, R. Gay, Error minimized extreme learning machine with growth of hidden nodes and incremental learning, IEEE Trans. Neural Netw. 20 (2009) 1352–1357.
[31] Y. Miche, A. Sorjamaa, P. Bas, O. Simula, C. Jutten, A. Lendasse, OP-ELM: optimally pruned extreme learning machine, IEEE Trans. Neural Netw. 21 (2010) 158–162.
Ka In Wong received the B.S. degree in electromechanical engineering from the University of Macau, Macao, China, in 2012. He is currently working towards the Ph.D. degree from the University of Macau, Macao, China. His research interests include automotive engineering, biofuels and engineering applications of artificial intelligence.
Chi Man Vong received the M.S. and Ph.D. degrees in software engineering from the University of Macau, Macao, China, in 2000 and 2005, respectively. He is currently an Associate Professor with the Department of Computer and Information Science, Faculty of Science and Technology, University of Macau. His research interests include machine learning methods and intelligent systems.
Pak-Kin Wong received the Ph.D. degree in Mechanical Engineering from The Hong Kong Polytechnic University, Hong Kong, in 1997. He is currently a Professor in the Department of Electromechanical Engineering and Associate Dean (Academic Affairs), Faculty of Science and Technology, University of Macau. His research interests include automotive engineering, fluid transmission and control, engineering applications of artificial intelligence, and mechanical vibration. He has published over 155 scientific papers in refereed journals, book chapters, and conference proceedings.
Jiahua Luo received his B.S. degree in Information and Computational Science from the Guilin University of Electronic Technology, Guilin, Guangxi Province, China, in 2011. He is currently working towards his M.S. degree in Software Engineering at the Department of Computer and Information Science, University of Macau, under the supervision of Prof. Chi-Man Vong. His research interests focus on machine learning and its applications.