Generalized Constraint Neural Network Regression Model Subject to Equality Function Constraints

Linlin Cao
NLPR/LIAMA, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Email: linlincao [email protected]

Bao-Gang Hu
NLPR/LIAMA, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Email: [email protected]

Abstract—This paper describes progress on our previous study of generalized constraint neural networks (GCNN). The GCNN model aims to utilize any type of prior knowledge in an explicit form so that the model can achieve improved performance and better transparency. A specific type of prior, namely equality function constraints, is investigated in this work. Whereas existing approaches impose such constraints on the given function only at discretized points, our approach, called GCNN-EF, satisfies the constraints exactly and completely over the entire constraint equation. We realize GCNN-EF by a weighted combination of the output of a conventional radial basis function neural network (RBFNN) and the output expressed by the constraints. Numerical studies are conducted on three synthetic data sets in comparison with other existing approaches. Simulation results demonstrate the benefit and efficiency of GCNN-EF.

I. INTRODUCTION

Artificial neural networks (ANNs) have been among the most popular models in artificial intelligence over the past decades. They have been widely applied to many different fields, such as system identification and control [1], decision making [2], and data mining [1]. However, traditional ANNs suffer from an inherent limitation, namely their “black box” character [3]. To alleviate this limitation, incorporating constraints into the conventional neural network is a desirable approach. Equality constraints [4], [5] are a common form of constraint; they cover a variety of expressions and meanings, such as invariance transformations [6], derivative constraints [7], and boundary constraints [8], [9]. A body of research has addressed the equality constraint satisfaction problem [10], [11], [12], [13]. In particular, incorporating equality constraints into a conventional ANN has been shown to enhance its learning ability [14]. The problem can be formulated as

$$\min_{\theta} \; L(x, f(x, \theta)) \quad \text{s.t.} \quad g_i(x, \theta) = 0, \; i = 1, \dots, p, \qquad (1)$$

where $x$ denotes the input of the neural network, $\theta$ denotes the network parameters to be optimized, $f(x, \theta)$ denotes the output of the neural network, $L(x, f(x, \theta))$ denotes the loss function, and $g_i(x, \theta) = 0$ denotes the equality constraints. In this paper, we only consider a class of equality function constraints that specify the exact output function of the model in a sub-region; the mathematical definition is given in Section III.
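To make the formulation concrete, the following small sketch (not from the paper; the toy model, data, and penalty weight `lam` are all illustrative) shows the most common practical relaxation of problem (1): sample the constraints at finitely many points and add them to the loss as a quadratic penalty.

```python
import numpy as np

# Toy instance of problem (1), illustrative only (not taken from the paper):
# fit f(x, theta) = theta0 + theta1 * x to data, subject to the equality
# constraint g(x, theta) = f(1, theta) - 1 = 0, relaxed as a quadratic penalty.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 50)
y = 1.5 * x + 0.5 + 0.05 * rng.standard_normal(50)

def f(x, theta):
    return theta[0] + theta[1] * x

def penalized_loss(theta, lam):
    data_term = np.mean((y - f(x, theta)) ** 2)        # L(x, f(x, theta))
    constraint_term = (f(1.0, theta) - 1.0) ** 2       # g(x, theta)^2 at the sampled point
    return data_term + lam * constraint_term

theta, lam, lr, eps = np.zeros(2), 10.0, 0.01, 1e-6
for _ in range(5000):                                  # plain gradient descent
    grad = np.zeros(2)
    for j in range(2):                                 # numerical gradient, for brevity
        d = np.zeros(2); d[j] = eps
        grad[j] = (penalized_loss(theta + d, lam) - penalized_loss(theta - d, lam)) / (2 * eps)
    theta -= lr * grad

# The penalty drives f(1, theta) toward 1 but does not enforce it exactly.
print(theta, f(1.0, theta))
```

The penalty enforces the constraint only approximately and only at the sampled points, which is precisely the gap that the construction proposed in Section III is designed to close.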

The constrained optimization problem above may be tackled with a suitable constrained optimization method, for instance Lagrange multipliers [11], active set methods, or a penalty function approach. A wide variety of algorithms have been developed for incorporating equality constraints, and doing so has been shown to enhance learning ability. Hong et al. [15], [16] presented a boundary-value-constrained radial basis function neural network (BVC-RBF). BVC-RBF has the capability of automatic constraint satisfaction, and many existing modeling algorithms for a conventional RBFNN are almost directly applicable to BVC-RBF without additional algorithmic complexity or computational cost. However, it has two drawbacks: first, it only satisfies the known, finite set of boundary value constraints (BVC) and cannot guarantee satisfaction of unknown, infinitely many BVC; second, it can only handle discrete BVC, not continuous ones. Qu and Hu [6] proposed the generalized constraint neural network with linear priors (GCNN-LP), which incorporates linear equality constraints by constructing an optimization problem with equality constraints. It can incorporate almost any equality and inequality constraints, but it also has disadvantages. First, too many constraints may lead to overfitting. Second, the constraints consume the resources of the radial kernel functions, so the method performs badly when the network is small. Third, the method can only handle discrete constraints. Lauer and Bloch [17], [18] incorporate nonlinear prior knowledge on the function by treating it as a set of equality constraints; that is, they can only handle a discrete constraint set rather than an arbitrary one. To deal with continuous constraints, they first discretize them, but this cannot guarantee that the regression result fully fulfills the original constraints.

In this work, we propose a generalized constraint neural network regression model subject to equality function constraints. It is a simple but effective method to incorporate a class of equality function constraints into the conventional RBFNN. A novel output function is derived from a weighted combination of the output of the RBFNN and the output hinted by the constraints. The combination weight is determined by the distance between the instance and the constraint space (defined by the equality constraints). The proposed output function ensures that the equality function constraints are satisfied exactly. It also has a natural intuition: when an instance is far from the constraint space, its output depends on the RBFNN; when it is close to or within the constraint space, its output is dominated by the equality constraints.

The remainder of this paper is organized as follows: the conventional RBFNN model and its learning are briefly introduced in Section II; Section III presents the proposed model and its learning process; experiments on several synthetic data sets are reported in Section IV, followed by the conclusions and discussion in Section V.

II. CONVENTIONAL RBF NEURAL NETWORKS

Given the training data set $X = \{x_1, \dots, x_n\} \in \mathbb{R}^{d \times n}$ and its desired outputs $y = \{y_1, \dots, y_n\} \in \mathbb{R}^{1 \times n}$, our goal is to learn an RBF network model based on a specific learning criterion. Here the least squares criterion [1] is adopted:

$$\arg\min_W \; \ell_2(W) = \sum_{i=1}^{n} (y_i - f_W(x_i))^2 = \|y - f_W(X)\|_2^2, \qquad (2)$$

where $f_W(X) = (f_W(x_1), \dots, f_W(x_n)) \in \mathbb{R}^{1 \times n}$ denotes the predicted outputs of the RBF network. It is specified as [1]

$$f_W(x_i) = \sum_{j=1}^{m} w_j \, \phi_j(x_i) = W^\top \Phi(x_i), \qquad (3)$$

$$\phi_j(x_i) = \exp\!\left(-\|x_i - \mu_j\|^2 / \sigma_j^2\right), \qquad (4)$$

where $W = (w_0, w_1, \dots, w_m) \in \mathbb{R}^{(m+1) \times 1}$ represents the model parameters and $m$ is the number of neurons in the hidden layer. In terms of the feature mapping function $\Phi(X) = (1, \phi_1(X), \dots, \phi_m(X)) \in \mathbb{R}^{(m+1) \times n}$ (denoted as $\Phi$ hereafter), both the centers $U = (\mu_1, \dots, \mu_m) \in \mathbb{R}^{d \times m}$ and the widths $\sigma = (\sigma_1, \dots, \sigma_m) \in \mathbb{R}^{m \times 1}$ can be determined using the method proposed in [19]. By substituting Eqs. (3) and (4) into problem (2), this convex problem can be solved by setting the derivative with respect to $W$ to zero,

$$\frac{\partial \ell_2(W)}{\partial W} = 2\Phi(y - W^\top \Phi)^\top = 0. \qquad (5)$$

It is easy to obtain the optimal model parameter

$$W^* = (\Phi^\top \Phi)^{+} \Phi^\top y, \qquad (6)$$

where $(\Phi^\top \Phi)^{+}$ denotes the pseudo-inverse of $\Phi^\top \Phi$.
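A minimal NumPy sketch of Eqs. (2)-(6) follows. The center/width choices here (a random subset of the training inputs and a single common width) are simplifying assumptions standing in for the method of [19], the pseudo-inverse is written in the column convention of Eq. (5), and the helper names are illustrative.

```python
import numpy as np

def rbf_features(X, centers, sigma):
    """Phi(X): Gaussian basis functions per Eq. (4), plus a bias row (Eq. (3))."""
    # X: (d, n), centers: (d, m) -> squared distances (m, n)
    d2 = ((X[:, None, :] - centers[:, :, None]) ** 2).sum(axis=0)
    phi = np.exp(-d2 / sigma**2)
    return np.vstack([np.ones((1, X.shape[1])), phi])      # (m+1, n)

def fit_rbfnn(X, y, m=20, sigma=1.0, seed=0):
    """Least-squares fit of W via the pseudo-inverse (normal equations of Eq. (5))."""
    rng = np.random.default_rng(seed)
    centers = X[:, rng.choice(X.shape[1], size=m, replace=False)]
    Phi = rbf_features(X, centers, sigma)                   # (m+1, n)
    W = np.linalg.pinv(Phi @ Phi.T) @ Phi @ y               # (m+1,)
    return W, centers

# Toy usage: y = sin(x) on 100 one-dimensional samples.
X = np.linspace(-3, 3, 100)[None, :]                        # (d=1, n=100)
y = np.sin(X[0])
W, centers = fit_rbfnn(X, y, m=15, sigma=1.0)
y_hat = W @ rbf_features(X, centers, 1.0)                   # f_W(X) = W^T Phi(X)
print(np.mean((y - y_hat) ** 2))
```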

III. GENERALIZED CONSTRAINT NEURAL NETWORK REGRESSION MODEL SUBJECT TO EQUALITY FUNCTION CONSTRAINTS

Fig. 1. A brief graphical illustration of the proposed GCNN-EF model.

Note that a conventional RBFNN is learned from the training data alone and cannot deal with additional constraints. Suppose the output of the RBFNN is required to strictly satisfy the known equality function constraints given by

$$f_W(x) = f_C(x), \quad x \in C, \qquad (7)$$

where $C$ denotes an equality function constraint set and $f_C(x)$ can be any numerical value or function. The conventional RBFNN model (Eq. (2)) is then transformed into the following constrained minimization problem:

$$\min_W \; \ell_2(W) = \|y - f_W(X)\|_2^2 \quad \text{s.t.} \quad f_W(x) = f_C(x), \; x \in C. \qquad (8)$$

The constrained optimization problem above may be tackled with a suitable constrained optimization method, for instance Lagrange multipliers, active set methods, or a penalty function approach. However, such methods carry more algorithmic complexity and computational cost than an unconstrained problem. In addition, they must first discretize the continuous constraints, and the discretized problem is not guaranteed to be equivalent to the original one.

In this paper we devise a model such that the constraints are exactly satisfied by construction, which allows us to use unconstrained optimization techniques. We propose to modify the output function $f_W(x)$ of the conventional RBFNN so that the modified output function $f_{W,C}(x)$ strictly meets the equality function constraints, i.e.,

$$f_{W,C}(x) = f_C(x), \quad x \in C. \qquad (9)$$

To satisfy the equality constraints in Eq. (9), we specify the modified output $f_{W,C}(x_i)$ as

$$f_{W,C}(x_i) = r_W(x_i) f_W(x_i) + r_C(x_i) f_C(x_i), \qquad (10)$$

where

$$r_C(x_i) = \begin{cases} \dfrac{2}{1+\exp(\beta \Delta_i)} & \text{if } \Delta_i > \xi \\[4pt] 1 & \text{if } \Delta_i \le \xi, \end{cases} \qquad (11)$$

$$r_W(x_i) = 1 - r_C(x_i) \in [0, 1], \qquad (12)$$

$$\Delta_i = \min_{c \in C} \|x_i - c\|_2, \qquad (13)$$

and $\Delta_i$ denotes the minimal distance from $x_i$ to $C$. $r_C$ is a monotonically decreasing function of $\Delta_i$, where the parameter $\beta$ ($\beta > 0$) adjusts the slope of the curve and must be determined manually. The user-defined threshold $\xi$ is introduced to handle noisy training data, and its value is specified in the experiments. In contrast, $r_W$ is monotonically increasing. A general form of $f_{W,C}(x_i)$ is shown in Fig. 2. Considering the modified output function $f_{W,C}(x_i)$ in Fig. 2, we note the following basic features (see also the sketch after this list):

• When the instance $x_i$ is within the constraint set $C$, its output strictly satisfies the equality function constraints: $f_{W,C}(x_i) = f_C(x_i)$.

• When the instance $x_i$ is close to the constraint set $C$, its output $f_{W,C}(x_i)$ relies more on the constraints; specifically, $f_{W,C}(x_i)$ is close to the constraint output $f_C(x_i)$.

• When the instance $x_i$ is far from the constraint set $C$, its output is dominated by the conventional RBFNN and its own feature information; specifically, the modified output $f_{W,C}(x_i)$ is close to the conventional output $f_W(x_i)$.
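A small sketch of the gating weights in Eqs. (11)-(13). The constraint set is represented here by a finite sample of points so that $\Delta_i$ can be computed numerically; for simple sets such as the circles used later, $\Delta_i$ could instead be computed in closed form. Function names and the sampling density are illustrative assumptions.

```python
import numpy as np

def gate_weights(X, C_points, beta=1.2, xi=1e-3):
    """r_C, r_W and Delta per Eqs. (11)-(13). X: (d, n); C_points: (d, k)."""
    # Delta_i = min_c ||x_i - c||_2, computed against the sampled constraint set.
    d2 = ((X[:, None, :] - C_points[:, :, None]) ** 2).sum(axis=0)    # (k, n)
    delta = np.sqrt(d2.min(axis=0))                                   # (n,)
    r_c = np.where(delta > xi, 2.0 / (1.0 + np.exp(beta * delta)), 1.0)
    return r_c, 1.0 - r_c, delta

# Example: constraint set C1 of the sinc experiment, the circle of radius 3*pi/2.
theta = np.linspace(0, 2 * np.pi, 500)
C1 = (3 * np.pi / 2) * np.vstack([np.cos(theta), np.sin(theta)])      # (2, 500)
X = np.array([[0.0, 4.7, 9.0], [0.0, 0.0, 0.0]])                      # three test points
r_c, r_w, delta = gate_weights(X, C1)
print(delta)   # distance to the circle: roughly 4.71, 0.01, 4.29
print(r_c)     # close to 0 far from C, close to 1 on C
```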

After constructing the output of the new model, the proposed method is optimized in an unconstrained manner without adding algorithmic complexity or computational cost. The unconstrained problem can be written as

$$\min_{W,C} \; \ell_2(W, C) = \|y - f_{W,C}(X)\|_2^2. \qquad (14)$$

Substituting Eq. (10) into problem (2), the optimal parameter is obtained in closed form as

$$W^* = \left[(R_W \circ \Phi)^\top (R_W \circ \Phi)\right]^{+} (R_W \circ \Phi)^\top \left[y - r_C(X) \circ f_C(X)\right], \qquad (15)$$

where $\circ$ denotes the Hadamard product [20] and $R_W = [r_W(X); \dots; r_W(X)] \in \mathbb{R}^{(m+1) \times n}$ with $r_W(X) = [r_W(x_1), \dots, r_W(x_n)] \in \mathbb{R}^{1 \times n}$. Besides, $r_C(X) = [r_C(x_1), \dots, r_C(x_n)] \in \mathbb{R}^{1 \times n}$ and $f_C(X) = [f_C(x_1), \dots, f_C(x_n)] \in \mathbb{R}^{1 \times n}$. Consequently, the predicted output of a new testing instance is easily obtained by substituting $W^*$ into Eq. (10). A brief graphical representation of the GCNN-EF model is shown in Fig. 1.

Note that the simplicity of the proposed method shows in three aspects. First, only a weighted combination is used to incorporate the data and the equality function constraints; there are other coupling manners to combine them, such as multiplication and composition [3], which we will discuss in future work. Second, the learning process of the resulting model is similar to that of the conventional model: the combined output function can be used with any learning criterion, such as least squares, to learn the model parameters. Last, since the combination is independent of the model, any model, such as SVM or logistic regression, can be adopted.
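A sketch of the closed-form training step of Eq. (15) and the prediction rule of Eq. (10), under the same assumptions as the previous sketches: the helper names are illustrative, the design-matrix rows correspond to training samples (the sample-wise form of Eq. (15)), and `f_c` is the vector of constraint outputs $f_C(x_i)$.

```python
import numpy as np

def rbf_features(X, centers, sigma):
    d2 = ((X[:, None, :] - centers[:, :, None]) ** 2).sum(axis=0)
    return np.vstack([np.ones((1, X.shape[1])), np.exp(-d2 / sigma**2)])   # (m+1, n)

def gate_weights(X, C_points, beta, xi):
    d2 = ((X[:, None, :] - C_points[:, :, None]) ** 2).sum(axis=0)
    delta = np.sqrt(d2.min(axis=0))
    r_c = np.where(delta > xi, 2.0 / (1.0 + np.exp(beta * delta)), 1.0)
    return r_c, 1.0 - r_c

def fit_gcnn_ef(X, y, f_c, C_points, centers, sigma, beta=1.2, xi=1e-3):
    """Closed-form W* of Eq. (15): each design-matrix row is r_W(x_i) * Phi(x_i)^T."""
    Phi = rbf_features(X, centers, sigma)            # (m+1, n)
    r_c, r_w = gate_weights(X, C_points, beta, xi)   # (n,), (n,)
    A = (r_w * Phi).T                                # (n, m+1): (R_W o Phi)^T
    b = y - r_c * f_c                                # y - r_C(X) o f_C(X)
    return np.linalg.pinv(A.T @ A) @ A.T @ b         # W*

def predict_gcnn_ef(X, W, f_c, C_points, centers, sigma, beta=1.2, xi=1e-3):
    """Modified output of Eq. (10): f_{W,C} = r_W * f_W + r_C * f_C."""
    Phi = rbf_features(X, centers, sigma)
    r_c, r_w = gate_weights(X, C_points, beta, xi)
    return r_w * (W @ Phi) + r_c * f_c
```

A new instance is predicted by recomputing its gating weights and applying Eq. (10) with the learned $W^*$; whenever $\Delta_i \le \xi$ the weight $r_C$ equals 1, so the constraint output is returned exactly.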

IV. EXPERIMENTS

In this section, we use three synthetic data sets to demonstrate the effectiveness of the proposed method. The mean squared error (MSE) is used to measure the performance of the models, defined as

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2, \qquad (16)$$

where $y_i$ and $\hat{y}_i$ represent the desired output and the predicted output of the observations, respectively. Several related works are compared, including BVC-RBF [15], GCNN-LP [6], the conventional RBFNN [1], and SVR [21]. The centers $U$ and the kernel widths $\sigma$ of the RBF kernel are determined by the method used in [6].

Fig. 2. Schematic plots for the relationship between $f_W$, $f_C$ and $f_{W,C}$ (horizontal axis: the distance $\Delta_i$).

A. “Sinc” function

In this section, we use the “sinc” function to illustrate the effectiveness of the proposed GCNN-EF:

$$y = \frac{\sin\sqrt{x_1^2 + x_2^2}}{\sqrt{x_1^2 + x_2^2}}, \quad x_1, x_2 \in [-10, 10]. \qquad (17)$$

The equality constraints in this experiment are specified as

$$f_{W,C}(x) = f_{C1}(x) = \sin(3\pi/2)/(3\pi/2), \quad x \in C1, \qquad (18)$$
$$f_{W,C}(x) = f_{C2}(x) = \sin(5\pi/2)/(5\pi/2), \quad x \in C2, \qquad (19)$$

where $C1 = \{x \mid x_1^2 + x_2^2 = (3\pi/2)^2, \; x_1, x_2 \in [-10, 10]\}$ and $C2 = \{x \mid x_1^2 + x_2^2 = (5\pi/2)^2, \; x_1, x_2 \in [-10, 10]\}$. Our goal is to fit the sinc function based on the “sinc” data and the equality constraints above. Without loss of generality, one of the two constraint sets is chosen automatically for each instance according to its distances to the two sets, and the closer one is used in Eq. (10). Training data are selected evenly within $x_1, x_2 \in [-10, 10]$ with $\Delta x_1 = \Delta x_2 = 2.0$. Testing data (800 instances) are randomly sampled within $x_1, x_2 \in [-10, 10]$. In addition, we add some instances within the constraint space to the training and testing data (12 and 80 instances, respectively). A sketch of this setup appears at the end of this subsection.

Fig. 3 shows the simulation results of the different methods. Fig. 3(a) and 3(f) depict the surface of the exact “sinc” function and the surface reconstructed by the GCNN-EF model, respectively. The surface reconstructed by GCNN-EF is clearly the closest to the exact surface. Compared with RBFNN, the equality constraints significantly enhance the model performance. Note that not only the area within the constraint space is fitted exactly, but the area around the constraint space is also fitted well, which confirms that the weighted combination in Eq. (10) is effective. In contrast, although they also utilize the equality constraints, the surfaces of BVC-RBF and GCNN-LP are far from the exact surface, as shown in Fig. 3(c) and Fig. 3(d), respectively. We also compare the performance of the five methods when the training data are noisy. Specifically, Gaussian noise $N(0, 0.3^2)$ is added to the training data, and parameter $\xi$ (see Eq. (11)) is set to 0.3 in this case. The mean and standard deviation of the MSE values over 10 different testing sets are reported in Table I. GCNN-EF still shows much better performance than the other models.
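The setup above can be sketched as follows (illustrative code, not the authors'): the 11 × 11 training grid gives 121 points, which together with the 12 added constraint-space instances matches the $N_{train} = 133$ reported in Table I, and the distance to each circular constraint set is available in closed form.

```python
import numpy as np

def sinc2d(x1, x2):
    r = np.sqrt(x1**2 + x2**2)
    return np.where(r == 0, 1.0, np.sin(r) / np.maximum(r, 1e-12))   # Eq. (17)

# Regular training grid, spacing 2.0 on [-10, 10] x [-10, 10].
g = np.arange(-10, 10 + 1e-9, 2.0)
x1, x2 = np.meshgrid(g, g)
X = np.vstack([x1.ravel(), x2.ravel()])          # (2, 121)
y = sinc2d(X[0], X[1])

# Constraint circles C1, C2 and their constant outputs f_C1, f_C2 (Eqs. (18)-(19)).
radii = np.array([3 * np.pi / 2, 5 * np.pi / 2])
f_c_values = np.sin(radii) / radii               # [-2/(3*pi), 2/(5*pi)]

# For each sample, the distance to a circle is | ||x|| - radius |;
# the closer constraint supplies Delta_i and f_C(x_i) for Eq. (10).
norms = np.linalg.norm(X, axis=0)                        # (121,)
dists = np.abs(norms[:, None] - radii[None, :])          # (121, 2)
closest = dists.argmin(axis=1)
delta = dists[np.arange(X.shape[1]), closest]
f_c = f_c_values[closest]
print(delta[:5], f_c[:5])
```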

Fig. 3. Simulations on the “sinc” function (β = 1.2, ξ = 0.001). Panels: (a) Sinc, (b) RBFNN (MSE = 0.0109), (c) BVC-RBF (MSE = 0.0091), (d) GCNN-LP (MSE = 0.0201), (e) SVR (MSE = 0.0178), (f) GCNN-EF (MSE = 0.0014).

Fig. 4. Simulations in the constraint space (β = 1.2, ξ = 0.001). ‘◦’ denotes the desired outputs and ‘∗’ the predicted outputs in constraint sets C1 and C2. Panels: (a) RBFNN (MSE = 0.0244), (b) BVC-RBF (MSE = 0.0190), (c) GCNN-LP (MSE = 0.0325), (d) SVR (MSE = 0.0174), (e) GCNN-EF (MSE = 0).

This shows that GCNN-EF fully utilizes the effective information in the equality constraints, giving it good robustness and generalization performance. To highlight the difference in constraint satisfaction between GCNN-EF and the other models, we also present the fitting results for the instances within the constraint space in Fig. 4. These instances are fitted exactly by GCNN-EF, whereas the other methods cannot satisfy the equality constraints strictly. Based on this comparison, we conclude that the proposed GCNN-EF model performs much better on constraint satisfaction than the other related works.

B. “Hyperboloid” function

In this section, we use the “hyperboloid” function [22] to illustrate the effectiveness of the equality function constraints:

$$y = x_1 \cdot x_2, \quad x_1, x_2 \in [-1, 1], \qquad (20)$$

and the data generated from it are referred to as the hyperboloid data. The equality function constraints are specified as

$$f_{W,C}(x) = f_C(x) = x_1^2, \quad x \in C, \qquad (21)$$

where $C = \{x \mid x_1 = x_2, \; x_1, x_2 \in [-1, 1]\}$. Our goal is to fit the function based on the data and the equality constraints when the training data are insufficient. Training data are selected evenly within $x_1, x_2 \in [-1, 1]$ with $\Delta x_1 = \Delta x_2 = 0.5$. Testing data (800 instances) are randomly sampled within $x_1, x_2 \in [-1, 1]$. In addition, we add some instances within the constraint space to the training and testing data (3 and 80 instances, respectively). Fig. 5 shows the outputs of GCNN-EF, BVC-RBF, GCNN-LP, RBFNN, and SVR; Fig. 5(f) depicts the surface reconstructed by the GCNN-EF model. GCNN-EF stays close to the exact surface of the function (20), whereas the other methods cannot fit it well due to the lack of training data. In this example, high-quality equality constraints improve the fitting capability. The simulation results for constraint satisfaction are shown in Fig. 6: as in Fig. 4, GCNN-EF fits the equality constraints perfectly, while the other methods cannot. The MSE results with noisy training data are also presented in Table I. Specifically, Gaussian noise $N(0, 0.1^2)$ is added to the training data, and parameter $\xi$ (see Eq. (11)) is set to 0.1 in this case. The mean and standard deviation of the MSE values over 10 different testing sets are calculated; GCNN-EF still shows much better performance than the others.
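For this example the constraint set is the line $x_1 = x_2$, so $\Delta_i$ is available in closed form. The sketch below additionally assumes that, off the constraint line, $f_C$ is evaluated at the nearest point of $C$ (the orthogonal projection); the paper does not spell out this detail, so treat it as one plausible choice rather than the authors' implementation.

```python
import numpy as np

def hyperboloid_constraint(X):
    """Constraint C = {x1 = x2}: distance to the line and f_C at the projection."""
    x1, x2 = X[0], X[1]
    delta = np.abs(x1 - x2) / np.sqrt(2.0)       # Delta_i = min_c ||x_i - c||_2
    proj = (x1 + x2) / 2.0                       # nearest point on the line is (p, p)
    f_c = proj**2                                # Eq. (21): f_C = x_1^2 on C
    return delta, f_c

# Sparse training grid (spacing 0.5 on [-1, 1]) for y = x1 * x2 (Eq. (20)):
# 25 grid points, plus the 3 added constraint-space instances, gives Ntrain = 28.
g = np.arange(-1, 1 + 1e-9, 0.5)
x1, x2 = np.meshgrid(g, g)
X = np.vstack([x1.ravel(), x2.ravel()])          # (2, 25)
y = X[0] * X[1]
delta, f_c = hyperboloid_constraint(X)
print(delta.min(), delta.max())                  # 0 on the diagonal, larger off it
```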

C. An example of partial differential equation

Consider the partial differential equation (PDE) [15]

$$\left[\frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2}\right] f(x_1, x_2) = e^{-x_1}(x_1 - 2 + x_2^3 + 6x_2), \quad x_1 \in [0, 1], \; x_2 \in [0, 1]. \qquad (22)$$

The equality function constraints are given by the boundary conditions

$$f(0, x_2) = x_2^3, \qquad (23)$$
$$f(1, x_2) = (1 + x_2^3)/e, \qquad (24)$$
$$f(x_1, 0) = x_1 e^{-x_1}, \qquad (25)$$
$$f(x_1, 1) = e^{-x_1}(1 + x_1). \qquad (26)$$

The analytic solution is

$$f(x_1, x_2) = e^{-x_1}(x_1 + x_2^3). \qquad (27)$$
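As a quick sanity check of Eqs. (23)-(27), the sketch below evaluates the analytic solution on the four boundaries and confirms that it satisfies each boundary condition; these boundaries play the role of the constraint set $C$ for GCNN-EF in this example.

```python
import numpy as np

def f_exact(x1, x2):                                       # analytic solution, Eq. (27)
    return np.exp(-x1) * (x1 + x2**3)

t = np.linspace(0.0, 1.0, 11)

# Boundary conditions of Eqs. (23)-(26), each as (x1, x2, target values).
bc = [
    (np.zeros_like(t), t, t**3),                           # f(0, x2) = x2^3
    (np.ones_like(t), t, (1 + t**3) / np.e),               # f(1, x2) = (1 + x2^3)/e
    (t, np.zeros_like(t), t * np.exp(-t)),                 # f(x1, 0) = x1 e^{-x1}
    (t, np.ones_like(t), np.exp(-t) * (1 + t)),            # f(x1, 1) = e^{-x1}(1 + x1)
]
for x1, x2, target in bc:
    assert np.allclose(f_exact(x1, x2), target)            # analytic solution satisfies each BC
print("all boundary conditions satisfied")
```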

Training data are selected evenly within $x_1, x_2 \in [0, 1]$ with $\Delta x_1 = \Delta x_2 = 0.1$. Testing data (800 instances) are randomly sampled within $x_1, x_2 \in [0, 1]$. In addition, we add some instances within the constraint space to the training and testing data (12 and 80 instances, respectively). Fig. 7 shows the outputs of GCNN-EF, BVC-RBF, GCNN-LP, RBFNN, and SVR; Fig. 7(f) depicts the surface reconstructed by the GCNN-EF model. GCNN-EF stays close to the exact surface of the solution (27), whereas the other methods cannot fit it well. The simulation results for constraint satisfaction are shown in Fig. 8: GCNN-EF fits the equality constraints perfectly, while the other methods cannot. The MSE results with noisy training data are also presented in Table I. Specifically, Gaussian noise $N(0, 0.05^2)$ is added to the training data, and parameter $\xi$ (see Eq. (11)) is set to 0.05 in this case. The mean and standard deviation of the MSE values over 10 different testing sets are calculated; GCNN-EF still shows much better performance than the others.

D. Parameter tuning

Here we explore the impact of the parameter $\beta$ on the proposed model. As shown in Fig. 9, we vary $\beta$ in the range $\{0.1, 0.2, \dots, 3\}$ for the “sinc” and “hyperboloid” functions and in the range $\{2, 4, \dots, 20\}$ for the PDE function. Recalling Eqs. (10) and (11), a smaller $\beta$ means a larger influence of the equality function constraints in model training, while a larger $\beta$ means a smaller influence. The corresponding MSE values on the testing data are reported. Generally speaking, the error variation on all data sets is gentle, which reflects the robustness of the proposed GCNN-EF model.

V. CONCLUSIONS

A novel GCNN-EF model has been proposed to incorporate a class of equality function constraints. A simple but effective output function is designed as a weighted combination of the output of the conventional RBFNN model and the output hinted by the constraints, with the weight based on the distance between the training data and the constraint space. This output function brings two advantages. First, the equality function constraints are guaranteed to be satisfied exactly. Second, GCNN-EF can be learned in closed form, with a computational cost similar to that of learning the conventional RBFNN model. Experimental results on three synthetic data sets with different equality function constraints demonstrate the superior performance of the proposed model over state-of-the-art models, including good robustness and generalization performance as well as the benefit of the equality constraints. Note that although this work focuses on the RBFNN model, many other popular models, such as SVR and linear regression, can easily be substituted, because the proposed output function is independent of the specific model. This will be explored in our future work.

TABLE I. MSE (Mean ± Standard) results with noisy training data (%).

Data set       | Ntrain | Ntest | Noise        | RBFNN [1]   | BVC-RBF [15] | GCNN-LP [6] | SVR [21]     | GCNN-EF
sinc           | 133    | 880   | N(0, 0.3^2)  | 1.99 ± 0.67 | 1.82 ± 0.68  | 2.29 ± 0.21 | 1.94 ± 0.12  | 1.01 ± 0.66
hyperboloid    | 28     | 880   | N(0, 0.1^2)  | 2.75 ± 0.64 | 1.24 ± 0.44  | 3.26 ± 0.41 | 6.44 ± 0.65  | 0.25 ± 0.12
A PDE example  | 133    | 880   | N(0, 0.05^2) | 0.20 ± 0.05 | 0.04 ± 0.006 | 0.15 ± 0.04 | 0.20 ± 0.002 | 0.03 ± 0.001

Fig. 5. Simulations on the “hyperboloid” function (β = 0.3, ξ = 0.001). Panels: (a) Hyperboloid function, (b) RBFNN (MSE = 0.0200), (c) BVC-RBF (MSE = 0.0092), (d) GCNN-LP (MSE = 0.0241), (e) SVR (MSE = 0.0604), (f) GCNN-EF (MSE = 0.0010).

Fig. 6. Simulations in the constraint space (β = 0.3, ξ = 0.001). ‘◦’ denotes the desired outputs in the constraint space, ‘∗’ denotes the predicted outputs. Panels: (a) RBFNN (MSE = 0.0206), (b) BVC-RBF (MSE = 0.0122), (c) GCNN-LP (MSE = 0.0235), (d) SVR (MSE = 0.1136), (e) GCNN-EF (MSE = 0).

Fig. 7. Simulations on a PDE function (β = 10, ξ = 0.001). Panels: (a) PDE function, (b) RBFNN (MSE = 0.001510), (c) BVC-RBF (MSE = 0.000315), (d) GCNN-LP (MSE = 0.000308), (e) SVR (MSE = 0.002219), (f) GCNN-EF (MSE = 0.000145).

Fig. 8. Simulations in constraint space on a PDE function (β = 10, ξ = 0.001). ‘◦’ denotes the desired outputs in the constraint space, ‘∗’ denotes the predicted outputs. Panels: (a) RBFNN (MSE = 0.002773), (b) BVC-RBF (MSE = 0.000537), (c) GCNN-LP (MSE = 0.000482), (d) SVR (MSE = 0.003798), (e) GCNN-EF (MSE = 0).

Fig. 9. MSE result with the change of β on the “sinc” function, the “hyperboloid” function, and a PDE function. Panels: (a) Sinc function, (b) Hyperboloid function, (c) PDE function.

ACKNOWLEDGMENT

This work was supported by the National Science Foundation of China (No. 61273196).

REFERENCES

[1] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. Upper Saddle River, NJ, USA: Prentice Hall PTR, 1998.
[2] T. A. Jarrell, Y. Wang, A. E. Bloniarz, C. A. Brittin, M. Xu, J. N. Thomson, D. G. Albertson, D. H. Hall, and S. W. Emmons, “The connectome of a decision-making neural network,” Science, vol. 337, no. 6093, pp. 437–444, 2012.
[3] B.-G. Hu, H.-B. Qu, Y. Wang, and S.-H. Yang, “A generalized-constraint neural network model: Associating partially known relationships for nonlinear regressions,” Information Sciences, vol. 179, no. 12, pp. 1929–1943, 2009.
[4] C. Hametner and S. Jakubek, “Nonlinear identification with local model networks using GTLS techniques and equality constraints,” IEEE Transactions on Neural Networks, vol. 22, no. 9, pp. 1406–1418, 2011.
[5] M. Bodirsky and H. Chen, “Quantified equality constraints,” SIAM Journal on Computing, vol. 39, no. 8, pp. 3682–3699, 2010.
[6] Y.-J. Qu and B.-G. Hu, “Generalized constraint neural network regression model subject to linear priors,” IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2447–2459, 2011.
[7] H. Lin, Z. Wang, Z. Li, and S. Li, “Weighing fusion method for truck scale based on an optimal neural network with derivative constraints and a Lagrange multiplier,” Measurement, vol. 63, pp. 322–329, 2015.
[8] G. Gnecco, M. Gori, and M. Sanguineti, “Learning with boundary conditions,” Neural Computation, vol. 25, no. 4, pp. 1029–1106, 2013.
[9] I. G. Tsoulos, D. Gavrilis, and E. Glavas, “Solving differential equations with constructed neural networks,” Neurocomputing, vol. 72, no. 10, pp. 2385–2391, 2009.
[10] M. Bodirsky and J. Kára, “The complexity of temporal constraint satisfaction problems,” Journal of the ACM, vol. 57, no. 2, p. 9, 2010.
[11] M. Bodirsky and J. Kára, “The complexity of equality constraint languages,” Theory of Computing Systems, vol. 43, no. 2, pp. 136–158, 2008.
[12] J.-B. Park, K.-S. Lee, J.-R. Shin, and K. Y. Lee, “A particle swarm optimization for economic dispatch with nonsmooth cost functions,” IEEE Transactions on Power Systems, vol. 20, no. 1, pp. 34–42, 2005.
[13] M. Bodirsky, “Complexity classification in infinite-domain constraint satisfaction,” arXiv preprint arXiv:1201.0856, 2012.
[14] K. S. McFall and J. R. Mahan, “Artificial neural network method for solution of boundary value problems with exact satisfaction of arbitrary boundary conditions,” IEEE Transactions on Neural Networks, vol. 20, no. 8, pp. 1221–1233, 2009.
[15] X. Hong and S. Chen, “A new RBF neural network with boundary value constraints,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 39, no. 1, pp. 298–303, 2009.
[16] S. Chen, X. Hong, and C. J. Harris, “Grey-box radial basis function modelling,” Neurocomputing, vol. 74, no. 10, pp. 1564–1571, 2011.
[17] F. Lauer and G. Bloch, “Incorporating prior knowledge in support vector regression,” Machine Learning, vol. 70, no. 1, pp. 89–118, 2008.
[18] F. Lauer and G. Bloch, “Incorporating prior knowledge in support vector machines for classification: A review,” Neurocomputing, vol. 71, no. 7, pp. 1578–1594, 2008.
[19] F. Schwenker, H. A. Kestler, and G. Palm, “Three learning phases for radial-basis-function networks,” Neural Networks, vol. 14, no. 4, pp. 439–458, 2001.
[20] R. A. Horn, “The Hadamard product,” in Proc. Symp. Appl. Math., vol. 40, 1990, pp. 87–169.
[21] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. Wiley-Interscience, 2000.
[22] O. L. Mangasarian and E. W. Wild, “Nonlinear knowledge in kernel approximation,” IEEE Transactions on Neural Networks, vol. 18, no. 1, pp. 300–306, 2007.