Appl. Math. J. Chinese Univ. 2015, 30(2): 151-162
The construction and approximation of feedforward neural network with hyperbolic tangent function

CHEN Zhi-xiang¹    CAO Fei-long²,∗
Abstract. In this paper, we discuss some analytic properties of the hyperbolic tangent function and estimate the approximation errors of neural network operators with the hyperbolic tangent activation function. Firstly, an equation of partitions of unity for the hyperbolic tangent function is given. Then, two kinds of quasi-interpolation type neural network operators are constructed to approximate univariate and bivariate functions, respectively, and the approximation errors are estimated by means of the modulus of continuity of the function. Moreover, for target functions with higher order derivatives, the approximation errors of the constructed operators are estimated.

Received: 2012-01-19. MR Subject Classification: 41A25, 41A63. Keywords: Hyperbolic tangent function, neural networks, approximation, modulus of continuity. Digital Object Identifier (DOI): 10.1007/s11766-015-3000-9. Supported by the National Natural Science Foundation of China (61179041, 61272023, and 11401388). ∗Corresponding author: [email protected].
§1  Introduction

The sigmoid functions form an important class of activation functions for feedforward neural networks (FNNs). A function σ defined on ℝ is called a sigmoid function if the following conditions are satisfied:
    lim_{x→+∞} σ(x) = 1,    lim_{x→−∞} σ(x) = 0.
The hyperbolic tangent function, defined by
    tanh(x) := (e^x − e^{−x}) / (e^x + e^{−x}) = (e^{2x} − 1) / (e^{2x} + 1),                                (1)
is a sigmoid function, usually called the bipolar sigmoid function, and is commonly used as the activation function of FNNs. Mathematically, an FNN with one hidden layer can be expressed as
    N_n(x) = Σ_{j=1}^{n} c_j σ(a_j · x + b_j),    x ∈ ℝ^s,                                                    (2)
where σ is the activation function of the network, for 1 ≤ j ≤ n, b_j ∈ ℝ are the thresholds,
a_j ∈ ℝ^s and c_j ∈ ℝ are the input weights and output weights, respectively, and a_j · x is the inner product of a_j and x. Theoretically, any continuous function defined on a compact set can be approximated by an FNN to any desired degree of accuracy by increasing the number of hidden neurons (see [1], [11]-[13], [15], [21]-[24]); this is called the density problem of FNN approximation. Another important problem for FNN approximation is the so-called complexity problem, which determines the number of neurons required to guarantee that all functions can be approximated to the prescribed degree of accuracy. By now, many results on this problem have been published; we refer the reader to [8]-[10], [14], [25], [26], [28], [29]. Recently, Anastassiou considered the construction and approximation of FNNs with the hyperbolic tangent activation function in [3]-[6], where a class of rational function type FNN operators was constructed, and some results on approximation by similar rational function network operators with sigmoid activation functions were obtained in [7], [16]-[20]. Motivated by [3], in this paper we first discuss some analytic properties of the hyperbolic tangent activation function, and then construct another kind of FNN operators, called quasi-interpolation type FNN operators. In particular, Jackson type theorems for the approximation errors of these operators are established.
§2  Preliminaries

For n ∈ ℕ, we set
    ϕ̃_n(x) := tanh(nx) = (e^{2nx} − 1) / (e^{2nx} + 1),
and
    Φ_n(x) := (1/2) [ ϕ̃_n(x + 1/2) − ϕ̃_n(x − 1/2) ].
Then Φ_n(x) has the following properties.
Theorem 2.1. Φ_n(x) is an even function. Moreover, for x > 0, Φ_n is positive and strictly decreasing.

Proof. Since
    ϕ̃_n(x + 1/2) = (e^{2n(x+1/2)} − 1)/(e^{2n(x+1/2)} + 1),    ϕ̃_n(x − 1/2) = (e^{2n(x−1/2)} − 1)/(e^{2n(x−1/2)} + 1),
a straightforward calculation implies
    ϕ̃_n(−x + 1/2) − ϕ̃_n(−x − 1/2) = (e^{2n(−x+1/2)} − 1)/(e^{2n(−x+1/2)} + 1) − (e^{2n(−x−1/2)} − 1)/(e^{2n(−x−1/2)} + 1)
      = (e^{−2n(x−1/2)} − 1)/(e^{−2n(x−1/2)} + 1) − (e^{−2n(x+1/2)} − 1)/(e^{−2n(x+1/2)} + 1)
      = (1 − e^{2n(x−1/2)})/(1 + e^{2n(x−1/2)}) − (1 − e^{2n(x+1/2)})/(1 + e^{2n(x+1/2)})
      = −(e^{2n(x−1/2)} − 1)/(e^{2n(x−1/2)} + 1) + (e^{2n(x+1/2)} − 1)/(e^{2n(x+1/2)} + 1)
      = ϕ̃_n(x + 1/2) − ϕ̃_n(x − 1/2).
So Φ_n(−x) = Φ_n(x).
That is, Φ_n(x) is an even function. Clearly,
    ϕ̃_n(x + 1/2) − ϕ̃_n(x − 1/2) = (e^{2n(x+1/2)} − 1)/(e^{2n(x+1/2)} + 1) − (e^{2n(x−1/2)} − 1)/(e^{2n(x−1/2)} + 1)
                                    = 2 e^{2nx} (e^{n} − e^{−n}) / ( (e^{2nx+n} + 1)(e^{2nx−n} + 1) ),
which shows Φ_n(x) > 0.

Finally, we prove that Φ_n(x) is strictly decreasing for x > 0. Note that
    2 Φ_n(x/2) = ϕ̃_n((x+1)/2) − ϕ̃_n((x−1)/2) = (e^{n(x+1)} − 1)/(e^{n(x+1)} + 1) − (e^{n(x−1)} − 1)/(e^{n(x−1)} + 1).
In fact, it is easy to find
    d/dx [ (e^{n(x+1)} − 1)/(e^{n(x+1)} + 1) ] = 2n (e^{n(x+1)} + 1)^{−2} e^{n(x+1)} =: Λ_1,
    d/dx [ (e^{n(x−1)} − 1)/(e^{n(x−1)} + 1) ] = 2n (e^{n(x−1)} + 1)^{−2} e^{n(x−1)} =: Λ_2,
and, for x > 0,
    Λ_1 − Λ_2 = 2n e^{nx} ( e^{n}/(e^{n(x+1)} + 1)^2 − e^{−n}/(e^{n(x−1)} + 1)^2 )
              = 2n e^{nx} (e^{2nx} − 1)(e^{−n} − e^{n}) / ( (e^{n(x+1)} + 1)^2 (e^{n(x−1)} + 1)^2 ) < 0.
This shows that Φ_n(x/2) is strictly decreasing on (0, +∞), so Φ_n(x) is also strictly decreasing for x > 0.

In addition, from
    ϕ̃_n(x + 1/2) − ϕ̃_n(x − 1/2) = 2 e^{2nx} (e^{n} − e^{−n}) / ( (e^{2nx+n} + 1)(e^{2nx−n} + 1) ),
we get Φ_n(x) < e^{n}/e^{2nx}. So we have the following theorem.
Theorem 2.2. For the function Φ_n(x), the equation
    Σ_{i=−∞}^{∞} Φ_n(x − i) = 1    (x ∈ ℝ)                                                                   (3)
holds.

Proof. For x ∈ ℝ, we have
    2 Σ_{i=−∞}^{∞} Φ_n(x − i) = Σ_{i=−∞}^{∞} [ ϕ̃_n(x + 1/2 − i) − ϕ̃_n(x − 1/2 − i) ]
      = Σ_{i=0}^{∞} [ ϕ̃_n(x + 1/2 − i) − ϕ̃_n(x − 1/2 − i) ] + Σ_{i=−∞}^{−1} [ ϕ̃_n(x + 1/2 − i) − ϕ̃_n(x − 1/2 − i) ].
Since
    Σ_{i=0}^{j} [ ϕ̃_n(x + 1/2 − i) − ϕ̃_n(x − 1/2 − i) ]
      = [ ϕ̃_n(x + 1/2) − ϕ̃_n(x − 1/2) ] + [ ϕ̃_n(x − 1/2) − ϕ̃_n(x − 3/2) ] + · · · + [ ϕ̃_n(x + 1/2 − j) − ϕ̃_n(x − 1/2 − j) ]
      = ϕ̃_n(x + 1/2) − ϕ̃_n(x − 1/2 − j),
we see that
    Σ_{i=−j}^{−1} [ ϕ̃_n(x + 1/2 − i) − ϕ̃_n(x − 1/2 − i) ] = Σ_{i=1}^{j} [ ϕ̃_n(x + 1/2 + i) − ϕ̃_n(x − 1/2 + i) ]
      = [ ϕ̃_n(x + 3/2) − ϕ̃_n(x + 1/2) ] + [ ϕ̃_n(x + 5/2) − ϕ̃_n(x + 3/2) ] + · · · + [ ϕ̃_n(x + 1/2 + j) − ϕ̃_n(x − 1/2 + j) ]
      = ϕ̃_n(x + 1/2 + j) − ϕ̃_n(x + 1/2).
From
    lim_{j→∞} ϕ̃_n(x + 1/2 + j) = 1,    lim_{j→∞} ϕ̃_n(x − 1/2 − j) = −1,
it follows that
    2 Σ_{i=−∞}^{∞} Φ_n(x − i) = 2.
Thus, we have proved Theorem 2.2.
Particularly, for n ∈ ℕ, we have
    Σ_{i=−∞}^{∞} Φ_n(nx − i) = 1,    x ∈ ℝ.                                                                  (4)
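The following short numerical sketch (not part of the original derivation; written in Python with illustrative names) can be used to check the properties established in this section: Φ_n is even, positive and strictly decreasing on (0, +∞), bounded by e^{n} e^{−2nx}, and satisfies the partition of unity (3)-(4).

import numpy as np

def phi_tilde(n, x):
    # phi~_n(x) = tanh(nx)
    return np.tanh(n * x)

def Phi(n, x):
    # Phi_n(x) = (1/2)[phi~_n(x + 1/2) - phi~_n(x - 1/2)]
    return 0.5 * (phi_tilde(n, x + 0.5) - phi_tilde(n, x - 0.5))

n = 3
xs = np.linspace(0.01, 5.0, 500)
vals = Phi(n, xs)
assert np.allclose(Phi(n, -xs), vals)                    # Theorem 2.1: evenness
assert np.all(vals > 0) and np.all(np.diff(vals) < 0)    # positive, strictly decreasing on (0, +inf)
assert np.all(vals <= np.exp(n) * np.exp(-2 * n * xs))   # the bound Phi_n(x) < e^n e^{-2nx}

# Partition of unity (3): sum_i Phi_n(x - i) = 1; a finite window suffices
# numerically because Phi_n decays exponentially.
x0, window = 0.37, np.arange(-50, 51)
print(np.sum(Phi(n, x0 - window)))                       # approximately 1.0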
§3  The main results and proofs

Let C([−1, 1]) be the space of continuous functions defined on [−1, 1], endowed with the uniform norm ‖·‖. For f ∈ C([−1, 1]), we construct the FNN operators
    F_n(f, x) := Σ_{k=−n}^{n} f(k/n) Φ_n(nx − k).
Then, we obtain

Theorem 3.1. Let 0 < α < 1, n ∈ ℕ, and 2n^{1−α} − 3 > 0. Then for any f ∈ C([−1, 1]), there holds
    |f(x) − F_n(f, x)| ≤ ω(f, 1/n^α) + 4 ( e^{−n} + (1/n) e^{−n(2n^{1−α}−3)} ) ‖f‖,
where ω(f, δ) is the modulus of continuity of f defined by
    ω(f, δ) := sup_{x,y∈[−1,1], |x−y|≤δ} |f(x) − f(y)|.

Proof. Obviously,
    |f(x) − F_n(f, x)| = | f(x) Σ_{k=−∞}^{∞} Φ_n(nx − k) − Σ_{k=−n}^{n} f(k/n) Φ_n(nx − k) |
      ≤ Σ_{k=−n}^{n} | f(x) − f(k/n) | Φ_n(nx − k) + Σ_{|k|≥n+1} |f(x)| Φ_n(nx − k)
      ≤ Σ_{k=−n}^{n} | f(x) − f(k/n) | Φ_n(nx − k) + ‖f‖ Σ_{|k|≥n+1} Φ_n(nx − k) =: Δ_1 + ‖f‖ Δ_2,
where we used the fact Φ_n(x) > 0 given in Theorem 2.1. Below we estimate Δ_1 and Δ_2, respectively. For 0 < α < 1, we have
    Δ_1 = Σ_{k:|x−k/n|≤1/n^α} | f(x) − f(k/n) | Φ_n(nx − k) + Σ_{k:|x−k/n|>1/n^α} | f(x) − f(k/n) | Φ_n(nx − k)
      ≤ ω(f, 1/n^α) Σ_{k=−∞}^{∞} Φ_n(nx − k) + 2‖f‖ Σ_{k:|x−k/n|>1/n^α} Φ_n(nx − k)
      ≤ ω(f, 1/n^α) + 2‖f‖ Σ_{k:|nx−k|>n^{1−α}} Φ_n(nx − k).
We have used (4) in the last step. Combining (3) and the fact that Φ_n(x) is strictly decreasing, given by Theorem 2.1, we obtain
    Σ_{k:|nx−k|>n^{1−α}} Φ_n(nx − k) ≤ 2 ∫_{n^{1−α}−1}^{+∞} Φ_n(x) dx ≤ 2 ∫_{n^{1−α}−1}^{+∞} (e^{n}/e^{2nx}) dx = (1/n) e^{−n(2n^{1−α}−3)}.      (5)
Since −n ≤ nx ≤ n and |k| ≥ n + 1, we have |nx − k| ≥ 1. Then
    Δ_2 ≤ 2 ( Φ_n(1) + ∫_{1}^{∞} Φ_n(x) dx ) ≤ 2 ( e^{−n} + ∫_{1}^{∞} (e^{n}/e^{2nx}) dx ) ≤ 4 e^{−n}.
The above arguments lead to
    |f(x) − F_n(f, x)| ≤ ω(f, 1/n^α) + (2/n) e^{−n(2n^{1−α}−3)} ‖f‖ + 4 e^{−n} ‖f‖
                       ≤ ω(f, 1/n^α) + 4 ( e^{−n} + (1/n) e^{−n(2n^{1−α}−3)} ) ‖f‖.
This completes the proof of Theorem 3.1.

Remark 3.1. Theorem 3.1 gives an estimate of the upper bound for F_n approximating f. It is easy to see that when n satisfies n^{1−α} > 2, we have
    |f(x) − F_n(f, x)| ≤ ω(f, 1/n^α) + 8 e^{−n} ‖f‖.
Especially, when f ∈ Lip 1, there holds |f(x) − F_n(f, x)| ≤ 1/n^α + 8 e^{−n} ‖f‖, which is convenient in practical applications.

Let C_B(ℝ) be the set of continuous and bounded functions on ℝ. For f ∈ C_B(ℝ), we construct the FNN operators
    F̄_n(f, x) := Σ_{k=−∞}^{∞} f(k/n) Φ_n(nx − k).
Then
    |F̄_n(f, x) − f(x)| = | Σ_{k=−∞}^{∞} f(k/n) Φ_n(nx − k) − Σ_{k=−∞}^{∞} f(x) Φ_n(nx − k) |
      ≤ Σ_{k=−∞}^{∞} | f(k/n) − f(x) | Φ_n(nx − k)
      = Σ_{k:|x−k/n|≤1/n^α} | f(x) − f(k/n) | Φ_n(nx − k) + Σ_{k:|x−k/n|>1/n^α} | f(x) − f(k/n) | Φ_n(nx − k)
      ≤ ω(f, 1/n^α) + 2‖f‖ Σ_{k:|nx−k|>n^{1−α}} Φ_n(nx − k) ≤ ω(f, 1/n^α) + (2‖f‖/n) e^{−n(2n^{1−α}−3)}.
Hence, we have proved the following theorem.

Theorem 3.2. Let 0 < α < 1, n ∈ ℕ, and 2n^{1−α} − 3 > 0. Then for any f ∈ C_B(ℝ), we have
    |F̄_n(f, x) − f(x)| ≤ ω(f, 1/n^α) + (2‖f‖/n) e^{−n(2n^{1−α}−3)},
where ‖f‖ is the uniform norm of f on ℝ.
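To illustrate Theorem 3.1 and Remark 3.1, the following minimal sketch (Python; the helper names are ours, not from the paper) evaluates F_n(f, x) = Σ_{k=−n}^{n} f(k/n) Φ_n(nx − k) on a grid and compares the observed error with the bound ω(f, 1/n^α) + 8 e^{−n}‖f‖ for f(x) = cos(πx), whose modulus of continuity satisfies ω(f, δ) ≤ πδ.

import numpy as np

def Phi(n, x):
    return 0.5 * (np.tanh(n * (x + 0.5)) - np.tanh(n * (x - 0.5)))

def F(n, f, x):
    # F_n(f, x) = sum_{k=-n}^{n} f(k/n) Phi_n(nx - k), x in [-1, 1]
    k = np.arange(-n, n + 1)
    return np.sum(f(k / n) * Phi(n, n * x - k))

f = lambda t: np.cos(np.pi * t)      # ||f|| = 1, omega(f, delta) <= pi * delta
alpha = 0.5
xs = np.linspace(-1, 1, 201)
for n in (10, 100, 1000):            # n^{1-alpha} > 2 holds for these n
    err = max(abs(F(n, f, x) - f(x)) for x in xs)
    bound = np.pi / n**alpha + 8 * np.exp(-n)
    print(n, err, bound)             # the observed error stays below the bound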
If we define Ψ_n(x) := Ψ_n(x_1, x_2) := Φ_n(x_1) Φ_n(x_2), x = (x_1, x_2) ∈ ℝ^2, then
    Σ_{k=−∞}^{∞} Ψ_n(x − k) := Σ_{k_1=−∞}^{∞} Σ_{k_2=−∞}^{∞} Ψ_n(x_1 − k_1, x_2 − k_2) = 1,                     (6)
and
    Σ_{k=−∞}^{∞} Ψ_n(nx − k) := Σ_{k_1=−∞}^{∞} Σ_{k_2=−∞}^{∞} Ψ_n(nx_1 − k_1, nx_2 − k_2) = 1.                  (7)
For f(x_1, x_2) ∈ C([−1, 1]^2), we introduce the FNN operators
    G_n(f; x_1, x_2) := Σ_{k_1=−n}^{n} Σ_{k_2=−n}^{n} f(k_1/n, k_2/n) Ψ_n(nx_1 − k_1, nx_2 − k_2)
                      =: Σ_{k=−n}^{n} f(k/n) Ψ_n(nx − k).                                                       (8)
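A short sketch of the bivariate operator G_n defined above (Python; the names are illustrative), built from the tensor-product kernel Ψ_n(x_1, x_2) = Φ_n(x_1) Φ_n(x_2):

import numpy as np

def Phi(n, x):
    return 0.5 * (np.tanh(n * (x + 0.5)) - np.tanh(n * (x - 0.5)))

def G(n, f, x1, x2):
    # G_n(f; x1, x2) = sum_{k1,k2=-n}^{n} f(k1/n, k2/n) Phi_n(n*x1 - k1) Phi_n(n*x2 - k2)
    k = np.arange(-n, n + 1)
    w1 = Phi(n, n * x1 - k)                      # weights in the first variable
    w2 = Phi(n, n * x2 - k)                      # weights in the second variable
    samples = f(k[:, None] / n, k[None, :] / n)  # grid samples f(k1/n, k2/n)
    return w1 @ samples @ w2

f = lambda u, v: np.cos(np.pi * u) * np.sin(np.pi * v)
for n in (10, 50, 200):
    print(n, abs(G(n, f, 0.3, -0.4) - f(0.3, -0.4)))  # the error decreases as n grows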
We now estimate the error f(x_1, x_2) − G_n(f; x_1, x_2).

Theorem 3.3. Let 0 < α < 1, n ∈ ℕ, and 2n^{1−α} − 3 > 0. Then for any f ∈ C([−1, 1]^2), we have
    |G_n(f; x_1, x_2) − f(x_1, x_2)| ≤ ω(f; 1/n^α, 1/n^α) + 6 ( 4 e^{−n} + (1/n) e^{−n(2n^{1−α}−3)} ) ‖f‖,
where ω(f; δ_1, δ_2) is the modulus of continuity of f defined by
    ω(f; δ_1, δ_2) = sup_{x,y∈[−1,1]^2, |x_i−y_i|≤δ_i} |f(x) − f(y)|.
Proof. It is easy to see that
    G_n(f; x_1, x_2) − f(x_1, x_2) = Σ_{k=−n}^{n} f(k/n) Ψ_n(nx − k) − f(x) Σ_{k=−∞}^{∞} Ψ_n(nx − k)
      = Σ_{k_1=−n}^{n} Σ_{k_2=−n}^{n} [ f(k_1/n, k_2/n) − f(x_1, x_2) ] Ψ_n(nx_1 − k_1, nx_2 − k_2)
        − f(x) ( Σ_{|k_1|≤n, |k_2|>n} Ψ_n(nx − k) + Σ_{|k_1|>n, |k_2|≤n} Ψ_n(nx − k) + Σ_{|k_1|>n, |k_2|>n} Ψ_n(nx − k) )
      =: Δ_3 − f(x)(Δ_4 + Δ_5 + Δ_6).
From the deduction process of Theorem 3.1, we easily get
    Δ_4 = Σ_{|k_1|≤n, |k_2|>n} Ψ_n(nx_1 − k_1, nx_2 − k_2) = Σ_{|k_1|≤n, |k_2|>n} Φ_n(nx_1 − k_1) Φ_n(nx_2 − k_2)
        ≤ Σ_{|k_2|>n} Φ_n(nx_2 − k_2) ≤ 4 e^{−n},
and
    Δ_5 = Σ_{|k_1|>n, |k_2|≤n} Ψ_n(nx_1 − k_1, nx_2 − k_2) ≤ 4 e^{−n}.
Also,
    Δ_6 = Σ_{|k_1|>n, |k_2|>n} Ψ_n(nx_1 − k_1, nx_2 − k_2) = ( Σ_{|k_1|>n} Φ_n(nx_1 − k_1) ) ( Σ_{|k_2|>n} Φ_n(nx_2 − k_2) ) ≤ 16 e^{−2n}.
So we have Δ_4 + Δ_5 + Δ_6 ≤ 24 e^{−n}. On the other hand,
    |Δ_3| = | Σ_{k_1=−n}^{n} Σ_{k_2=−n}^{n} [ f(x_1, x_2) − f(k_1/n, k_2/n) ] Ψ_n(nx_1 − k_1, nx_2 − k_2) |
      ≤ Σ_{k_1:|x_1−k_1/n|≤1/n^α} Σ_{k_2:|x_2−k_2/n|≤1/n^α} | f(x_1, x_2) − f(k_1/n, k_2/n) | Ψ_n(nx − k)
        + Σ_{k_1:|x_1−k_1/n|>1/n^α} Σ_{k_2:|x_2−k_2/n|≤1/n^α} | f(x_1, x_2) − f(k_1/n, k_2/n) | Ψ_n(nx − k)
        + Σ_{k_1:|x_1−k_1/n|≤1/n^α} Σ_{k_2:|x_2−k_2/n|>1/n^α} | f(x_1, x_2) − f(k_1/n, k_2/n) | Ψ_n(nx − k)
        + Σ_{k_1:|x_1−k_1/n|>1/n^α} Σ_{k_2:|x_2−k_2/n|>1/n^α} | f(x_1, x_2) − f(k_1/n, k_2/n) | Ψ_n(nx − k)
      ≤ ω(f; 1/n^α, 1/n^α) + 4‖f‖ (1/n) e^{−n(2n^{1−α}−3)} + 2‖f‖ (1/n^2) e^{−2n(2n^{1−α}−3)}
      ≤ ω(f; 1/n^α, 1/n^α) + 6‖f‖ (1/n) e^{−n(2n^{1−α}−3)}.
From the above results, we have
    |G_n(f; x_1, x_2) − f(x_1, x_2)| ≤ ω(f; 1/n^α, 1/n^α) + 6 ( 4 e^{−n} + (1/n) e^{−n(2n^{1−α}−3)} ) ‖f‖.
Remark 3.2. As is well known, approximating multivariate functions by translations of a univariate function is more difficult. Theorem 3.3 shows that the constructed operators G_n provide a good approach, together with an error estimate, for such approximation.

Finally, we discuss the high order approximation by using the smoothness of f.

Theorem 3.4. Let 0 < α < 1, 2n^{1−α} − 3 > 0. Then for any f ∈ C^N([−1, 1]), there holds
    |F_n(f, x) − f(x)| ≤ 4 e^{−n} ‖f‖ + ( 3/n^α + (8/n) e^{−n(2n^{1−α}−3)} ) ‖f‖_N
        + ω(f^{(N)}, 1/n^α) · 1/(n^{αN} N!) + (2^{N+1}/(n N!)) e^{−n(2n^{1−α}−3)} ‖f^{(N)}‖,
where ‖f‖_N = max{‖f‖, ‖f′‖, . . . , ‖f^{(N)}‖}.
Proof. We apply Taylor’s formula with integral remainder:
    f(k/n) = Σ_{j=0}^{N} (f^{(j)}(x)/j!) (k/n − x)^j + ∫_{x}^{k/n} [ f^{(N)}(t) − f^{(N)}(x) ] ( (k/n − t)^{N−1} / (N−1)! ) dt.
Hence,
    Σ_{k=−n}^{n} f(k/n) Φ_n(nx − k) = Σ_{j=0}^{N} (f^{(j)}(x)/j!) Σ_{k=−n}^{n} Φ_n(nx − k) (k/n − x)^j
        + Σ_{k=−n}^{n} Φ_n(nx − k) ∫_{x}^{k/n} [ f^{(N)}(t) − f^{(N)}(x) ] ( (k/n − t)^{N−1} / (N−1)! ) dt
      = f(x) Σ_{k=−n}^{n} Φ_n(nx − k) + Σ_{j=1}^{N} (f^{(j)}(x)/j!) Σ_{k=−n}^{n} Φ_n(nx − k) (k/n − x)^j
        + Σ_{k=−n}^{n} Φ_n(nx − k) ∫_{x}^{k/n} [ f^{(N)}(t) − f^{(N)}(x) ] ( (k/n − t)^{N−1} / (N−1)! ) dt.
Therefore, we get
    Σ_{k=−n}^{n} f(k/n) Φ_n(nx − k) − f(x) = −f(x) Σ_{|k|>n} Φ_n(nx − k)
        + Σ_{j=1}^{N} (f^{(j)}(x)/j!) Σ_{k=−n}^{n} Φ_n(nx − k) (k/n − x)^j
        + Σ_{k=−n}^{n} Φ_n(nx − k) ∫_{x}^{k/n} [ f^{(N)}(t) − f^{(N)}(x) ] ( (k/n − t)^{N−1} / (N−1)! ) dt
      =: Ξ_1 + Ξ_2 + Ξ_3.
For Ξ_1 and Ξ_2, we immediately obtain that |Ξ_1| ≤ 4 e^{−n} ‖f‖, and
    |Ξ_2| ≤ Σ_{j=1}^{N} ( |f^{(j)}(x)|/j! ) Σ_{k=−n}^{n} Φ_n(nx − k) | k/n − x |^j.
Using (4) and (5) we have
    Σ_{k=−n}^{n} Φ_n(nx − k) | k/n − x |^j
      ≤ Σ_{k:|k/n−x|≤1/n^α} Φ_n(nx − k) | k/n − x |^j + Σ_{k:|k/n−x|>1/n^α} Φ_n(nx − k) | k/n − x |^j
      ≤ (1/n^{αj}) Σ_{k=−∞}^{∞} Φ_n(nx − k) + 2^j Σ_{k:|k/n−x|>1/n^α} Φ_n(nx − k)
      ≤ 1/n^{αj} + (2^j/n) e^{−n(2n^{1−α}−3)},
so we get an estimation for Ξ_2:
    |Ξ_2| ≤ Σ_{j=1}^{N} ( |f^{(j)}(x)|/j! ) ( 1/n^{αj} + (2^j/n) e^{−n(2n^{1−α}−3)} ).
From the expansion of e^x,
    e^x = 1 + x + x^2/2! + · · · + x^k/k! + · · · ,
and the inequality (see 3.6.6 of [27])
    e^x ≤ 1 + x + x^2/2 + x^3/(2(3 − x)),    0 ≤ x < 3,
it follows (taking x = 1/n^α and x = 2, respectively) that
    Σ_{j=1}^{N} 1/(j! n^{αj}) < 3/n^α,    Σ_{j=1}^{N} 2^j/j! ≤ 8.
Therefore,
    |Ξ_2| ≤ ( 3/n^α + (8/n) e^{−n(2n^{1−α}−3)} ) ‖f‖_N.
For estimating Ξ_3, we use the results that (see pp. 72-73 of [2])
    | ∫_{x}^{k/n} [ f^{(N)}(t) − f^{(N)}(x) ] ( (k/n − t)^{N−1}/(N−1)! ) dt |
      ≤ ω(f^{(N)}, 1/n^α) · 1/(n^{αN} N!),    if |k/n − x| ≤ 1/n^α,
      ≤ ‖f^{(N)}‖ · 2^{N+1}/N!,               if |k/n − x| > 1/n^α.
So,
    |Ξ_3| ≤ ω(f^{(N)}, 1/n^α) · (1/(n^{αN} N!)) Σ_{k:|k/n−x|≤1/n^α} Φ_n(nx − k) + ‖f^{(N)}‖ · (2^{N+1}/N!) Σ_{k:|k/n−x|>1/n^α} Φ_n(nx − k)
          ≤ ω(f^{(N)}, 1/n^α) · 1/(n^{αN} N!) + ‖f^{(N)}‖ · (2^{N+1}/(n N!)) e^{−n(2n^{1−α}−3)}.
Combining the estimates of Ξ_1, Ξ_2 and Ξ_3, we obtain
    |F_n(f, x) − f(x)| ≤ 4 e^{−n} ‖f‖ + ( 3/n^α + (8/n) e^{−n(2n^{1−α}−3)} ) ‖f‖_N
        + ω(f^{(N)}, 1/n^α) · 1/(n^{αN} N!) + (2^{N+1}/(n N!)) e^{−n(2n^{1−α}−3)} ‖f^{(N)}‖.
The proof of Theorem 3.4 is completed.

Remark 3.3. Compared with Theorem 3.1, Theorem 3.4 shows that when the target function f has better smoothness, the approximation effect of F_n is better. This demonstrates that the approximating capacity of the network operators F_n depends on the smoothness of the target function f. Below, we give a further discussion on the approximation capacity of the operators F_n, F̄_n, and G_n.

Remark 3.4. For f ∈ C([−1, 1]^2), we can establish the same result as that of Theorem 3.2.

Remark 3.5. For f ∈ C^N([−1, 1]^2), we can also obtain a result similar to Theorem 3.4.

We have given a series of error estimates of approximation by means of ω(f, 1/n^α). Naturally, we expect that ω(f, 1/n) can also be used to describe the approximation error. It is easy to deduce that
    |f(x) − F_n(f, x)| = | f(x) Σ_{k=−∞}^{∞} Φ_n(nx − k) − Σ_{k=−n}^{n} f(k/n) Φ_n(nx − k) |
      ≤ Σ_{k=−n}^{n} | f(x) − f(k/n) | Φ_n(nx − k) + Σ_{|k|≥n+1} |f(x)| Φ_n(nx − k)
      ≤ Σ_{k=−n}^{n} | f(x) − f(k/n) | Φ_n(nx − k) + ‖f‖ Σ_{|k|≥n+1} Φ_n(nx − k)
      ≤ ω(f, 1/n) + 2‖f‖ Σ_{|nx−k|>1} Φ_n(nx − k) + ‖f‖ Σ_{|k|≥n+1} Φ_n(nx − k) ≤ ω(f, 1/n) + 12 e^{−n} ‖f‖.
Therefore, we obtain the first corollary.

Corollary 3.1. For f ∈ C([−1, 1]), n ∈ ℕ, we have
    |f(x) − F_n(f, x)| ≤ ω(f, 1/n) + 12 e^{−n} ‖f‖.
Similarly, we can obtain the following corollaries.

Corollary 3.2. Let f ∈ C_B(ℝ), n ∈ ℕ. Then
    |F̄_n(f, x) − f(x)| ≤ ω(f, 1/n) + 8 e^{−n} ‖f‖.

Corollary 3.3. Let f ∈ C([−1, 1]^2), n ∈ ℕ. Then
    |G_n(f; x_1, x_2) − f(x_1, x_2)| ≤ ω(f; 1/n, 1/n) + 72 e^{−n} ‖f‖.
Finally, we give the following remark.

Remark 3.6. Comparing the proofs of Theorem 3.1 and Corollary 3.1, we see that the difference between these two results lies in the partition of
    Σ_{k=−n}^{n} | f(x) − f(k/n) | Φ_n(nx − k).
As far as our method is concerned, the α used in the condition |x − k/n| ≤ 1/n^α only lies in (0, 1]. It is not difficult to see that, for given x ∈ [−1, 1] and n, there are at most two values of k satisfying |x − k/n| ≤ 1/n, while there are many values of k for the case |x − k/n| ≤ 1/n^α with 0 < α < 1. When n is large enough, ω(f, 1/n^α) and ω(f, 1/n) are the main parts of ω(f, 1/n^α) + 4(e^{−n} + (1/n) e^{−n(2n^{1−α}−3)})‖f‖ and ω(f, 1/n) + 12 e^{−n}‖f‖, respectively. Therefore, the result of Corollary 3.1 is better than that achieved in Theorem 3.1. This shows that the operators F_n(f, x) have good localization. The approximation properties of the operators F̄_n(f, x) and G_n(f; x_1, x_2) are similar to those of F_n(f, x).
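The localization just described can also be observed numerically. The sketch below (Python; an illustration under our own choices, not taken from the paper) uses f(x) = |x| ∈ Lip 1 with ‖f‖ = 1 on [−1, 1] and compares the observed uniform error of F_n on a grid with the bound ω(f, 1/n) + 12 e^{−n}‖f‖ = 1/n + 12 e^{−n} of Corollary 3.1.

import numpy as np

def Phi(n, x):
    return 0.5 * (np.tanh(n * (x + 0.5)) - np.tanh(n * (x - 0.5)))

def F(n, f, x):
    k = np.arange(-n, n + 1)
    return np.sum(f(k / n) * Phi(n, n * x - k))

f = np.abs                                   # Lipschitz constant 1, ||f|| = 1 on [-1, 1]
xs = np.linspace(-1, 1, 401)
for n in (10, 100, 1000):
    err = max(abs(F(n, f, x) - f(x)) for x in xs)
    print(n, err, 1 / n + 12 * np.exp(-n))   # observed error vs the bound of Corollary 3.1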
§4  Conclusions
It is well known that the universal approximation property of FNNs is an important theoretical guarantee for the famous back-propagation (BP) algorithm, as it ensures the convergence of the BP algorithm in theory. The complexity problem of FNN approximation mainly focuses on the quantitative estimate of the approximation error of an FNN, and further determines the number of neurons required to guarantee that all functions can be approximated to the prescribed degree of accuracy. It reflects the relation between the topological structure of the hidden layer of an FNN and the degree of approximation. Therefore, the study of the complexity problem is very important for constructing an FNN and characterizing its approximation capacity.
In this paper, we have addressed some analytic properties of the hyperbolic tangent activation function. A class of FNN operators, called quasi-interpolation type FNN operators, which can be used to approximate continuous functions, has been constructed. Further, we employed the modulus of continuity of the function and established quantitative estimates of the approximation errors of these operators, i.e., Jackson type theorems of approximation. Thus the complexity problem of approximation by the constructed FNN operators has been solved.

Acknowledgements. The authors are very grateful to the referees for their suggestions and comments on the improvement of the paper.
References

[1] G A Anastassiou. Rate of convergence of some neural network operators to the unit-univariate case, J Math Anal Appl, 1997, 212: 237-262.
[2] G A Anastassiou. Quantitative Approximations, Chapman & Hall/CRC, Boca Raton, New York, 2001.
[3] G A Anastassiou. Univariate hyperbolic tangent neural network approximation, Math Comput Modelling, 2011, 53: 1111-1132.
[4] G A Anastassiou. Multivariate hyperbolic tangent neural network approximation, Comput Math Appl, 2011, 61: 809-821.
[5] G A Anastassiou. Multivariate sigmoidal neural network approximation, Neural Networks, 2011, 24: 378-386.
[6] G A Anastassiou. Intelligent Systems: Approximation by Artificial Neural Networks, Intelligent Systems Reference Library 19, Berlin, Springer-Verlag, 2011.
[7] G A Anastassiou. Univariate sigmoidal neural network approximation, J Comput Anal Appl, 2012, 14: 659-690.
[8] A R Barron. Universal approximation bounds for superpositions of a sigmoidal function, IEEE Trans Inform Theory, 1993, 39: 930-945.
[9] F L Cao, T F Xie, Z B Xu. The estimate for approximation error of neural networks: A constructive approach, Neurocomputing, 2008, 71: 626-630.
[10] D B Chen. Degree of approximation by superpositions of a sigmoidal function, Approx Theory Appl, 1993, 9: 17-28.
[11] T P Chen, H Chen. Approximation capability to functions of several variables, nonlinear functionals, and operators by radial basis function neural networks, IEEE Trans Neural Networks, 1995, 6: 904-910.
[12] T P Chen, H Chen. Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to a dynamic system, IEEE Trans Neural Networks, 1995, 6: 911-917.
[13] T P Chen, H Chen, R W Liu. Approximation capability in C(R^n) by multilayer feedforward networks and related problems, IEEE Trans Neural Networks, 1995, 6: 25-30.
[14] Z X Chen, F L Cao. The approximation operators with sigmoidal functions, Comput Math Appl, 2009, 58: 758-765.
[15] C K Chui, X Li. Approximation by ridge functions and neural networks with one hidden layer, J Approx Theory, 1992, 70: 131-141.
[16] D Costarelli, R Spigler. Approximation results for neural network operators activated by sigmoidal functions, Neural Networks, 2013, 44: 101-106.
[17] D Costarelli, R Spigler. Constructive approximation by superposition of sigmoidal functions, Anal Theory Appl, 2013, 29: 169-196.
[18] D Costarelli, R Spigler. Multivariate neural network operators with sigmoidal activation functions, Neural Networks, 2013, 48: 72-77.
[19] D Costarelli. Interpolation by neural network operators activated by ramp functions, J Math Anal Appl, 2014, 419: 574-582.
[20] D Costarelli, R Spigler. Convergence of a family of neural network operators of the Kantorovich type, J Approx Theory, 2014, 185: 80-90.
[21] G Cybenko. Approximation by superpositions of a sigmoidal function, Math Control Signals Systems, 1989, 2: 303-314.
[22] K I Funahashi. On the approximate realization of continuous mappings by neural networks, Neural Networks, 1989, 2: 183-192.
[23] B Gao, Y Xu. Univariate approximation by superpositions of a sigmoidal function, J Math Anal Appl, 1993, 178: 221-226.
[24] K Hornik, M Stinchcombe, H White. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks, Neural Networks, 1990, 3: 551-560.
[25] P C Kainen, V Kurková. An integral upper bound for neural network approximation, Neural Comput, 2009, 21: 2970-2989.
[26] Y Makovoz. Uniform approximation by neural networks, J Approx Theory, 1998, 95: 215-228.
[27] D S Mitrinovic. Analytic Inequalities, Springer-Verlag, 1970.
[28] A Pinkus. Approximation theory of the MLP model in neural networks, Acta Numer, 1999, 8: 143-195.
[29] S Suzuki. Constructive function approximation by three-layer neural networks, Neural Networks, 1998, 11: 1049-1058.
1 Department of Mathematics, Shaoxing University, Shaoxing 312000, China. Email: [email protected]
2 Department of Mathematics, China Jiliang University, Hangzhou 310018, China. Email: [email protected]