Toward Constructive Methods for Sigmoidal Neural Networks - Function Approximation in Engineering Mechanics Applications

Jin-Song Pei, Joseph P. Wright, Sami F. Masri, Eric C. Mai, and Andrew W. Smyth

Abstract— This paper reports the continuing development of the work by the authors presented at IJCNN 2005 and 2007 [1, 2]. A series of parsimonious universal approximator architectures with pre-defined values for weights and biases, called "neural network prototypes", is proposed and used in a repetitive and systematic manner for the initialization of sigmoidal neural networks in function approximation. This paper provides a more in-depth literature review, presents one training example using laboratory data that shows quick convergence and yields trained sigmoidal neural networks with stable generalization capability, and discusses the complexity measure in [3, 4]. This study centers on approximating a subset of static nonlinear target functions: the mechanical restoring force considered as a function of system states (displacement and velocity) for single-degree-of-freedom systems. We strive for efficient and rigorous constructive methods for sigmoidal neural networks to solve function approximation problems in this engineering mechanics application and beyond. Future work is identified.

I. INTRODUCTION

The universal approximation theorem [5, 6] establishes the feasibility of sigmoidal neural networks for function approximation. [3, 4] further provide approximation and estimation error bounds and a theoretical justification for the advantage of sigmoidal neural networks over linear combinations of fixed basis functions. These seminal studies, however, do not offer constructive solutions, especially concerning neural network initialization, which has long been considered an important but difficult research topic. Following [7, 8], [9] proves that training a sigmoidal neural network for function approximation is an "NP-complete" problem, i.e., it is computationally intractable. Among the available choices, it is therefore reasonable to accept suboptimal local minima (called "good" solutions), say, by using some empirical but rational approach, and to keep improving these solutions. This indicates the challenging nature of this topic.

Jin-Song Pei is with the School of Civil Engineering and Environmental Science, the University of Oklahoma, Norman, OK 73019, USA (phone: 405-325-4272; fax: 405-325-4217; email: [email protected]). Joseph P. Wright is with the Applied Science Division, Weidlinger Associates Inc., New York, NY 10014, USA (phone: 212-367-3084; fax: 212-367-3030; email: [email protected]). Sami F. Masri is with the Sonny Astani Department of Civil and Environmental Engineering, University of Southern California, Los Angeles, CA 90089, USA (phone: 213-740-0602; fax: 213-744-1426; email: [email protected]). Eric C. Mai is with Berkeley Transportation Systems, Berkeley, CA 94704, USA (phone: 405-503-1379; email: [email protected]). Andrew W. Smyth is with the Department of Civil Engineering and Engineering Mechanics, Columbia University, New York, NY 10027, USA (phone: 212-854-3369; fax: 212-854-6272; email: [email protected]).

II. LITERATURE REVIEW

According to [10], constructive approaches exist for both initialization and training. While the former is our focus, these two aspects may be inherently inseparable. For function approximation, [11, 12, 13, 14] summarize constructive methods for the initialization of the universal approximator to various extents. Simply put, we see four insights standing out:

A. Understanding the function to be approximated

According to previous studies, typified by [15, 12], good initialization may only be addressed properly by examining the features of the function to be approximated, or equivalently, the features of the error surface. Work along this line can be found in [16, 15]. We see this understanding as another way of expressing the importance of a complexity measure of the target function. While [3] uses the Fourier spectrum of the target function in the approximation error bound, it is questionable whether that is sufficient from a practical computational point of view, as discussed in [11]. [13] uses the dimension of the feature vectors. We use dominant features, but have not yet defined them precisely or generalized the concept.

B. Understanding the tool used to approximate the function

Prior work built on a good understanding of the capabilities of sigmoidal neural networks includes [17, 18, 19, 16, 20, 21, 22]. Given the existence of some well-established operations and basis functions, first and foremost we need to know how to construct sigmoidal neural networks to reproduce what these conventional basis functions can do. Therefore, we have approximated (without error bounds in most cases) the four elementary arithmetic operations, polynomial fitting, and the truncated sinc, exponential, Gaussian and Mexican hat functions; a minimal numerical sketch of one such construction is given at the end of this section. More importantly, we believe it is important to show how sigmoidal neural networks can be constructed to outperform conventional basis functions. Theoretically they can [3], but we need to make it happen computationally. Along this line, we have approximated the dead-space (or clearance) type of nonlinearity as well as some highly asymmetrical nonlinearities that are useful in our applications [23].

C. Generalizing well

Based on [24], the purpose of validation is to clarify whether the estimated model "represents the underlying system adequately". This is why regularization is performed. Indeed, good generalization is always our ultimate goal.

D. Coming from and returning to applications

Sigmoidal neural networks are indeed used to approximate unknown functions; however, the users of ANNs, given their domain knowledge, may have some idea of what nonlinearity to expect from the data. What we are trying to do is to bridge the gap between these ideas and the initialization of sigmoidal neural networks. By knowing what to do with a priori information, we can better handle situations where there is no a priori information at all. An illustrative example is given in [25]. In a constructive manner, [11] gives the most rigorous solution to the approximation of polynomials using sigmoidal neural networks. Other attempts for the same type of target function can be found in [26] and [27, 28]. Though polynomials are a very small subset of static nonlinear functions, these efforts are noteworthy because (i) we should develop constructive methods to approximate other known types of nonlinear functions (as mentioned before), and (ii) we should connect sigmoidal neural networks with existing knowledge so that engineers and researchers would not perceive ANNs as a total "black box".
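To make the point of Section II-B concrete, the Python sketch below (our illustration, not part of the original paper) shows one well-known construction: the difference of two shifted logistic sigmoids forms a localized, Gaussian-like bump, which is one way a small sigmoidal network can reproduce what a conventional basis function does. The slope and shift values are illustrative assumptions, not the values used in our published prototypes.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid used throughout the paper."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_bump(x, w=6.0, c=0.5):
    """A Gaussian-like 'bump' built from two hidden sigmoidal nodes:
    sigma(w*(x + c)) - sigma(w*(x - c)) is localized around x = 0.
    w (slope) and c (half-width) are illustrative values only."""
    return sigmoid(w * (x + c)) - sigmoid(w * (x - c))

if __name__ == "__main__":
    x = np.linspace(-3.0, 3.0, 13)
    # The bump peaks near x = 0 and decays toward 0 for |x| >> c.
    for xi, bi in zip(x, sigmoid_bump(x)):
        print(f"x = {xi:+.2f}   bump = {bi:.4f}")
```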

III. DOMAIN PROBLEM

We study the approximation of some static nonlinear functions using the universal approximator employing a logistic sigmoidal activation function (with one vector input, one hidden layer, and one scalar output) and trained using backpropagation; a minimal sketch of this architecture is given below. Here, the inputs represent the state of a nonlinear dynamical system while the output represents the underlying restoring force of this system, an internal force that restores a moving mass to its neutral position. This force-state mapping technique [29] is one of the proven powerful methods for modeling nonlinear dynamical systems in aerospace, mechanical, civil and some bio-mechanical applications. However, the nonlinear restoring force can be very complex, especially when it involves hysteresis and when it is time varying (e.g., caused by deteriorating system properties). Furthermore, multi-component systems involving multiple degrees of freedom (MDOF) can be extremely challenging. This research topic creates an opportunity to showcase the power of sigmoidal and other neural networks when phenomenological representations are not available and commonly used fixed basis functions are not parsimonious and are hard to adapt to data.

Training sigmoidal neural networks to approximate static nonlinear functions, however, is known to be computationally intractable. Therefore we focus on a "subset" of all possible target functions and tackle this challenging problem by moving forward one solid step at a time. The central issue herein is to develop constructive methods for sigmoidal neural networks to fit the aforementioned input-output relations. While we do look into the relevant training issues in our study, we focus on answering two questions regarding the initialization, i.e., the determination of (i) the number of hidden nodes, and (ii) the initial values for weights and biases. The specified applications put forth these expectations on ANNs: (i) fast training convergence, (ii) reasonable accuracy, (iii) great generalization capability, e.g., required when experimental investigations that can mimic extreme events are difficult and/or expensive to conduct, (iv) little or modest computational resources in terms of network size and learning time, e.g., for wireless structural health monitoring and intelligent control systems, and (v) constructive and insightful methods.
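For concreteness, the following Python sketch (our illustration, not code from the paper) spells out the universal approximator architecture used here: a single hidden layer of logistic sigmoidal nodes mapping the state vector (displacement, velocity) to a scalar restoring force. The weight values below are placeholders; the constructive method of Section IV prescribes them instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def restoring_force_net(states, W_hidden, b_hidden, w_out, b_out):
    """Single-hidden-layer sigmoidal network for force-state mapping.

    states   : (N, 2) array of [displacement, velocity] samples
    W_hidden : (2, n) input-to-hidden weights for n hidden nodes
    b_hidden : (n,)  hidden biases
    w_out    : (n,)  hidden-to-output weights
    b_out    : scalar output bias
    Returns an (N,) array of predicted restoring forces.
    """
    hidden = sigmoid(states @ W_hidden + b_hidden)
    return hidden @ w_out + b_out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = rng.uniform(-1.0, 1.0, size=(5, 2))   # normalized (x, x_dot)
    n = 7                                          # e.g., seven hidden nodes as in Section IV
    # Placeholder values; the constructive method supplies these in practice.
    W, b, c, d = rng.normal(size=(2, n)), np.zeros(n), rng.normal(size=n), 0.0
    print(restoring_force_net(states, W, b, c, d))
```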

IV. AN ILLUSTRATIVE EXAMPLE

For clarity, one example from a carefully designed laboratory device for studying coupled multi-body mechanical systems [36] is illustrated in Fig. 1, where test data and approximated surfaces are shown: the inputs, the normalized relative rotation and angular velocity of the joints, are along the x- and y-axes, while the output, the normalized torsional restoring force, is along the z-axis, following the force-state mapping formulation. Note that the nonlinearities, including their mathematical representations and parameter values, are unknown. It is thus a non-trivial problem with which to validate our constructive method.

Fig. 1. (a) The "AF2" device designed and tested at USC [36], (b) the test data points for one torsional DOF alone and the trained surface obtained using our constructive method, and (c) the generalization error surface.

We have applied our constructive method to the SDOF-only cases for now. Shown here are the rotational-motion-only data sets, Sets 14 and 17, from two independently conducted experiments; the former is used for training and the latter for validation. As is common practice, normalized inputs and outputs were used to train the neural networks. The normalization factors for the relative rotation, relative rotational velocity and restoring torque are 0.2, 8 and 50, respectively.

A. Proposed Method

In this subsection, we use informal language to strive for an intuitive understanding of our constructive method [27, 37, 38, 28, 25, 23, 39, 1, 2]. Nicknames are first given in quotation marks for a quick grasp of the key idea; a flowchart of the same idea is presented in [23]. We have been developing a methodology and a couple of empirical techniques paired with our "neural network prototypes": pre-defined sigmoidal neural networks with a fixed (usually very small) number of hidden nodes and specified weight and bias values (some fixed, some constrained and others free). For example, Prototype 1 is derived to have three hidden nodes to approximate a softening-to-hard-limiting type of nonlinearity, while Prototype 2 is derived to have four hidden nodes to approximate a hardening-to-dead-space (clearance) type of nonlinearity. See Fig. 2(e) for the nonlinearities generated by Prototypes 1 and 2 without training. Three curves are shown in each panel, giving an idea of how the free values in the prototypes can affect the profile of the nonlinearity; we call them "variants" of a prototype.
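The prototype weights themselves are defined in our earlier work [27, 37, 38, 23]; the Python sketch below is only a hedged illustration of the idea, using assumed slope and shift values to show how a handful of sigmoidal nodes with shared structure generate a family of softening/hard-limiting "variant" curves such as those in Fig. 2(e).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def prototype_like_curve(x, w, c=0.8):
    """Illustrative three-node, antisymmetric combination of sigmoids that
    saturates at large |x| (softening to hard limiting).  The slope w is the
    'free' value distinguishing variants; c and the unit output weights are
    assumed here, NOT the published prototype values."""
    return (sigmoid(w * (x - c)) + sigmoid(w * x) + sigmoid(w * (x + c))) - 1.5

if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 9)
    for w in (3.0, 6.0, 12.0):       # three 'variants' in the spirit of Fig. 2(e)
        print(f"w = {w:4.1f}:", np.round(prototype_like_curve(x, w), 3))
```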

Differing from what is commonly seen in the literature, the two initialization questions are answered in a non-iterative (deterministic) manner through a three-step procedure. First, we examine the normalized data in the specified Cartesian coordinate system in Fig. 2(a) and obtain a dominant feature in each principal direction, i.e., z-x and z-y, for this bivariate function; see the thickened lines in Fig. 2(b) and (c). Next, we locate these features in Fig. 2(d), a "guideline" prepared by us beforehand. The two features point us to the joint use of Prototypes 1 and 2. Last, we simply assemble the prototypes, including adding zero weights, to obtain the fully connected neural network shown in Fig. 2(f), which has seven hidden nodes and pre-defined initial values. Then we are ready for training using backpropagation [40], e.g., the Levenberg-Marquardt algorithm. The training leads to a surface with reasonably stable extrapolation that the Nguyen-Widrow initialization cannot always achieve (see Fig. 3 later).

These three steps are indeed simple, fast, clear, and fruitful. However, it has taken nontrivial effort to develop the existing prototypes and the guideline shown here and in our published work. We name the prototypes as such because we use them either individually or in combination, like "building blocks". Our guideline is equivalent to a "hint book" for the specific "patterns" that our current "building blocks" can potentially create. We need to break down our target function into these "patterns" (Step 1 above) so that we can simply follow the "hint book" and use the right "building blocks" in the right place to create the "patterns"; a hedged code sketch of this assembly step follows Fig. 2. Other results will be shown later in Fig. 3, and both fundamental and practical issues will be discussed in Section VI.


Fig. 2. A step-by-step procedure to explain the proposed neural network initialization: (a) the normalized data; (b) and (c) the dominant features of the normalized restoring force versus normalized displacement and versus normalized velocity, respectively; (d) the "hint book" for ten types of nonlinearities (I. Linear; II. Cubic stiffness and more; III. Bilinear stiffness and more; IV. Multislope; V. Fractional power; VI. Softening cubic and more; VII. Clearance (dead space); VIII. Hard saturation; IX. Saturation; X. Stiction) with the associated prototypes; (e) the two "building blocks" (Prototypes 1 and 2) and their variants; (f) constructing the network using the two "building blocks".
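The assembly in Step 3 amounts to stacking the prototype weight vectors block-wise and padding with zeros so that each prototype initially acts on its own input variable only, yielding a fully connected network that training can later refine. The sketch below (Python, with made-up prototype sizes and values; the real ones come from the published prototypes) illustrates this block assembly for a 3-node and a 4-node prototype, giving a seven-hidden-node initialization as in Fig. 2(f).

```python
import numpy as np

def assemble_prototypes(proto_disp, proto_vel):
    """Block-assemble two single-input prototypes into one fully connected
    two-input network.  Each prototype is a dict with 1-D arrays:
    'w_in' (input weights), 'b' (biases), 'w_out' (output weights).
    Zero entries connect each prototype to the other input, so the assembled
    network is fully connected yet initially decoupled."""
    n1, n2 = len(proto_disp["w_in"]), len(proto_vel["w_in"])
    W = np.zeros((2, n1 + n2))            # rows: displacement, velocity
    W[0, :n1] = proto_disp["w_in"]        # Prototype 1 acts on displacement only
    W[1, n1:] = proto_vel["w_in"]         # Prototype 2 acts on velocity only
    b = np.concatenate([proto_disp["b"], proto_vel["b"]])
    w_out = np.concatenate([proto_disp["w_out"], proto_vel["w_out"]])
    return W, b, w_out                    # pre-defined initial values for training

if __name__ == "__main__":
    # Illustrative (NOT the published) prototype values: 3 + 4 = 7 hidden nodes.
    proto1 = {"w_in": np.array([3.0, 6.0, 3.0]),
              "b": np.array([-2.4, 0.0, 2.4]),
              "w_out": np.array([1.0, 1.0, 1.0])}
    proto2 = {"w_in": np.array([8.0, 8.0, -8.0, -8.0]),
              "b": np.array([-4.0, 4.0, -4.0, 4.0]),
              "w_out": np.array([1.0, -1.0, -1.0, 1.0])}
    W0, b0, c0 = assemble_prototypes(proto1, proto2)
    print("Input-to-hidden weights W0 (2 x 7):\n", W0)
```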

B. Result and Validation

Regularization is not used for this problem, given our large number of data points and parsimonious model: the ratio of the number of training data points to the number of network weights, 2400/21, is much greater than 20, the threshold recommended for regularization in [24]. The training and validation results are shown in Fig. 1(b) and (c). The method for validation is problem-dependent [24]. Here we chose to examine the prediction and generalization errors, the error surfaces, and the extrapolated restoring force surface (a hedged numerical sketch of this check is given at the end of this subsection). The prediction error (using the training data) is close to the generalization error (using another, independent data set) in terms of MSE. The surface of the prediction error in [39] is similar to that of the generalization error (shown in Fig. 1(c)). Given the closeness of these two error values and the consistent patterns of the two error surfaces, we can say that, very likely, the neural network generalizes well [24]. Nonetheless, the clear features of the error surfaces and the magnitude of the MSE indicate the need to further approximate these errors using more hidden nodes. This is not a surprise: as will be shown later in Eq. (1), the approximation error tends to be large when the number of hidden nodes is small.

The predicted restoring force surface is extrapolated slightly: the one obtained using the proposed constructive method is shown in Fig. 1(b) and is deemed rational and stable, i.e., a small change in the inputs leads to only a small change in the output. The predicted restoring force surfaces obtained using the Nguyen-Widrow algorithm [19], however, are often not as good. Following what is recommended for practical applications [24], we made seven trials; the results are presented in Fig. 3. It can be seen that four out of the seven trials are not stable. This contrast indicates the good stability of the proposed constructive method, which we want to preserve. While this result shows the possibly poor stability of the Nguyen-Widrow algorithm [19], our published work indicates that its training convergence can also be poor from time to time when using the number of hidden nodes obtained from our constructive method. On another note, we have proposed an iterative procedure for cases where the training result is undesirable [23]; however, we rarely use it, as our method often leads to a reasonable training result right away. This is why we call our method non-iterative. This result confirms the earlier work [41] suggesting that random initial values for weights and biases not be used, and it also indicates the usefulness of our proposed constructive method.
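The following Python sketch (our illustration, not code from the paper) shows the kind of check described above: compare the mean-squared prediction error on the training set with the generalization error on an independent validation set for a trained single-hidden-layer network. The toy data and stand-in model are assumptions; in our study the data come from the laboratory device and the model from the constructive initialization plus training.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean-squared error, used for both the prediction and generalization errors."""
    return float(np.mean((y_true - y_pred) ** 2))

def generalization_check(net, train, valid):
    """net   : callable mapping an (N, 2) state array -> (N,) restoring forces
       train : (states, forces) from the training set (e.g., Set 14)
       valid : (states, forces) from an independent set (e.g., Set 17)
    Returns (prediction MSE, generalization MSE); if the two are close and the
    error surfaces look alike, the network very likely generalizes well [24]."""
    e_train = mse(train[1], net(train[0]))   # prediction error (training data)
    e_valid = mse(valid[1], net(valid[0]))   # generalization error (validation data)
    return e_train, e_valid

if __name__ == "__main__":
    # Toy stand-in data and model for illustration only.
    rng = np.random.default_rng(1)
    states_tr = rng.uniform(-1, 1, (2400, 2))
    states_va = rng.uniform(-1, 1, (2400, 2))
    true_force = lambda s: np.tanh(3.0 * s[:, 0]) + 0.3 * s[:, 1]
    trained_net = lambda s: true_force(s) + 0.01 * rng.normal(size=len(s))
    e_tr, e_va = generalization_check(trained_net,
                                      (states_tr, true_force(states_tr)),
                                      (states_va, true_force(states_va)))
    print(f"prediction MSE = {e_tr:.6f}   generalization MSE = {e_va:.6f}")
```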

Fig. 3. Validation results for the same problem shown in Fig. 1(b) but using the Nguyen-Widrow algorithm [19] for the initial values of weights and biases. Seven trials were made following the suggestion in [24].

V. THOUGHTS ON COMPLEXITY MEASURE

A. Review of Cf in [3, 4]

Following [3, 4], the total statistical risk (or, equivalently, the generalization error [24]) in MSE can be decomposed into two parts:

\[
\text{total statistical risk (generalization error)}
= \underbrace{O\!\left(\frac{C_f^2}{n}\right)}_{\text{approximation error bound (squared bias)}}
+ \underbrace{O\!\left(\frac{n d}{N}\,\log N\right)}_{\text{estimation error bound (variance)}}
\tag{1}
\]

where n is the number of hidden nodes, d is the input dimension of the function, N is the number of training observations, and C_f is the first absolute moment of the Fourier magnitude distribution of f, i.e.,

\[
C_f = \int_{\mathbb{R}^d} |\omega|\,\bigl|\tilde{f}(\omega)\bigr|\, d\omega
\tag{2}
\]

where

\[
\tilde{f}(\omega) \triangleq \mathcal{F}\bigl(f(x)\bigr) \triangleq \int_{-\infty}^{+\infty} f(x)\, e^{-i\omega x}\, dx.
\tag{3}
\]

C_f plays an important role in [3, 4] as the complexity measure. For example, the number of hidden nodes is optimized as

\[
n \sim C_f \left(\frac{N}{d \log N}\right)^{1/2}.
\tag{4}
\]

The two-part division in Eq. (1) is very insightful in quantifying the bias/variance (i.e., approximation error/estimation error) dilemma: the approximation error (squared bias) bound is inversely proportional to n, while the estimation error (variance) bound is proportional to n. The bias does not depend on the training algorithm; the variance does. Following [4], "approximation error refers to the distance between the target function and the closest neural network function of a given architecture and estimation error refers to the distance between this ideal network function and an estimated network function." Computationally, how can we reach the trained sigmoidal neural network whose output is the closest to the target function among all sigmoidal neural networks with the same value of n? This is an NP-complete problem. Our work has shown the importance of understanding both the complexity of the function to be approximated and the capabilities of the universal approximator.

B. Discussion of Cf Alone as Complexity Measure

Let us convince ourselves why C_f alone may not be a sufficient complexity measure for constructive methods by examining a set of target functions, the family of univariate sigmoidal functions shown in Fig. 4:

\[
f(x, w, b) = \sigma(wx + b) = \frac{1}{1 + e^{-(wx + b)}}
\tag{5}
\]

with b ≡ 0, w > 0, and x ∈ [−a, a].

Fig. 4. Plots of the family of sigmoidal functions in Eq. (5) when w = 1, 3, 5, 7, respectively, and b = 0. Only x ∈ [−1, +1] is shown in the figure.

Let us estimate the values of C_f for this family of target functions. Define z = wx + b and

\[
\sigma(z) = \frac{1}{1 + e^{-z}}.
\]

Then the family of target functions can be equivalently expressed as f(x, w, b) = σ(z) with z ∈ [−wa, +wa]. Continuing on, we have

\[
\frac{\partial f}{\partial x} = w\,\frac{d\sigma}{dz} = w\,\frac{e^{-z}}{(1 + e^{-z})^2}.
\tag{6}
\]

We know that dσ/dz = e^{−z}/(1 + e^{−z})^2 is an even function of z with its maximum at z = 0, i.e., 0 < e^{−z}/(1 + e^{−z})^2 ≤ 1/4. It is well known that

\[
\mathcal{F}\!\left(\frac{\partial f}{\partial x}\right) = (i\omega)\,\tilde{f}(\omega).
\tag{7}
\]

Also we have

\[
\mathcal{F}\bigl(\text{Eq.\,(6)}\bigr) = w\,\mathcal{F}\!\left(\frac{d\sigma}{dz}\right).
\tag{8}
\]

Then we have

\[
\begin{aligned}
C_f &\triangleq \int_{-\infty}^{+\infty} |\omega|\,\bigl|\tilde{f}(\omega)\bigr|\, d\omega
    = \int_{-\infty}^{+\infty} \bigl|i\omega\,\tilde{f}(\omega)\bigr|\, d\omega \\
    &\overset{\text{Eq.\,(7)}}{=} \int_{-\infty}^{+\infty} \left|\mathcal{F}\!\left(\frac{\partial f}{\partial x}\right)\right| d\omega
    \overset{\text{Eq.\,(8)}}{=} \int_{-\infty}^{+\infty} \left|w\,\mathcal{F}\!\left(\frac{d\sigma}{dz}\right)\right| d\omega
    = |w| \int_{-\infty}^{+\infty} \left|\mathcal{F}\!\left(\frac{d\sigma}{dz}\right)\right| d\omega.
\end{aligned}
\]

This last equation is actually given in [3]; we have verified it. Now let us focus on the compact support, x ∈ [−a, +a], i.e., z ∈ [−wa, +wa]. We have the following steps:

\[
\frac{d\sigma}{dz}\,\mathbf{1}_{[-wa,+wa]} \;\Longrightarrow\;
\mathcal{F}\!\left(\frac{d\sigma}{dz}\,\mathbf{1}_{[-wa,+wa]}\right)
\overset{[42,\,43]}{=}
\mathcal{F}\!\left(\frac{d\sigma}{dz}\right) \star \mathcal{F}\!\left(\mathbf{1}_{[-wa,+wa]}\right)
\]

where ⋆ stands for convolution. Also, we know that

\[
\mathcal{F}\!\left(\frac{d\sigma}{dz}\right) \star \mathcal{F}\!\left(\mathbf{1}_{[-wa,+wa]}\right)
= \underbrace{\mathcal{F}\!\left(\frac{d\sigma}{dz}\right)}_{\text{independent of } w \text{ and } a}
\star\; \underbrace{\frac{2\sin(wa\,\omega)}{\omega}}_{\text{dependent on } wa}.
\]

Substituting this back into C_f, we have

\[
C_f = |w| \int_{-\infty}^{+\infty}
\left| \underbrace{\mathcal{F}\!\left(\frac{d\sigma}{dz}\right)}_{\text{independent of } w \text{ and } a}
\star\; \underbrace{\frac{2\sin(wa\,\omega)}{\omega}}_{\text{dependent on } wa} \right| d\omega
\tag{9}
\]

which means that C_f can be w-dependent. This contradicts our common sense that the actual complexity of approximating this family of target functions is the same: one logistic sigmoidal function would achieve zero approximation error. Note that we do not question the correctness and usefulness of C_f in proving the error bounds. Our focus, however, is constructive methods, a topic that is not touched upon in [3, 4].

C. What is Next?

For the sake of constructive methods, C_f perhaps tells only half of the story: C_f indicates the smoothness of the function to be approximated, especially in terms of the boundedness of its first derivative. The other half of the story, perhaps, is to know what is considered complex for sigmoidal neural networks and what is not. In Fig. 4, the family of target functions can be approximated equally well (in terms of approximation error) using a universal approximator with one hidden node, regardless of how drastically different their values of C_f or the maximum values of their first derivatives are. In other words, we see the need to understand both the complexity of the function to be approximated and the capabilities of sigmoidal neural networks. This is why we have shown that sigmoidal neural networks can be constructed (and then, perhaps, trained) to approximate certain popular univariate and bivariate functions (often without error bounds in our existing work).

In sparse representations of signals, the discontinuities of the signal are the highlights [44, 43]. In other words, the discontinuities are considered the features of the function to be approximated. When we explore nonlinear functions that are basically monotonic (i.e., with low values of ω), they may seem "featureless". Nonetheless, these are the useful nonlinearities for force-state mapping, the focus of our study. Some features are inherently easy for sigmoidal neural networks and some are not; the "features" may need to be discovered in this manner.

In terms of looking for meaningful features to serve as a complexity measure and to be used in error bounds, it seems necessary to re-examine and refine the classifications of the nonlinearities shown in the "hint book" in Fig. 2(d) and elsewhere in our published work. Considering no features but C_f may not work for constructive methods, as discussed in Section V-B. Considering too many features may make it hard to introduce mathematical rigor. A good balance is needed.
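To complement the Eq. (9) discussion and the observation above that the members of the family in Eq. (5) have drastically different values of C_f yet identical approximation difficulty, the following Python sketch computes a band-limited numerical proxy of C_f for the truncated sigmoid family. It is only an illustration under stated assumptions (finite frequency band, Riemann-sum Fourier transform), not the exact first absolute moment.

```python
import numpy as np

def cf_proxy(w, a=1.0, omega_max=100.0, n_x=1001, n_omega=1001):
    """Band-limited numerical proxy for C_f = int |omega| |f~(omega)| d omega
    of f(x) = sigma(w x) restricted to the compact support x in [-a, a].
    The Fourier transform is approximated by a Riemann sum and the omega
    integral is truncated at +/- omega_max, so this is a trend indicator only."""
    x = np.linspace(-a, a, n_x)
    dx = x[1] - x[0]
    f = 1.0 / (1.0 + np.exp(-w * x))              # logistic sigmoid, b = 0
    omega = np.linspace(-omega_max, omega_max, n_omega)
    # f~(omega) ~= sum_k f(x_k) exp(-i omega x_k) dx
    F = np.exp(-1j * np.outer(omega, x)) @ f * dx
    return float(np.trapz(np.abs(omega) * np.abs(F), omega))

if __name__ == "__main__":
    for w in (1.0, 3.0, 5.0, 7.0):                # the family shown in Fig. 4
        print(f"w = {w:3.1f}   C_f proxy ~= {cf_proxy(w):8.3f}")
    # The proxy differs markedly across the family (growing with w), even though
    # a single hidden node reproduces every member exactly (Section V-B).
```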

Our work has shown it necessary to treat hardening and softening nonlinearities separately when considering constructive methods. We have also learned, however, that it may not be as necessary to differentiate some unsymmetrical nonlinearities from symmetrical ones. If we initialize a sigmoidal neural network as if approximating a symmetrical nonlinearity, backpropagation can very likely make it adapt efficiently to a related asymmetrical nonlinearity. This performance is very useful in engineering mechanics applications; e.g., plain concrete has a highly asymmetrical constitutive relation, and we have successfully trained a network to fit an idealized relation adopted from a design standard [23].

An affine interpolation for using sigmoidal neural networks to approximate some hardening and softening nonlinear functions is shown in Fig. 5(a) and (c), respectively, in contrast to the families of polynomials and fractional powers shown in Fig. 5(b) and (d), respectively. For x ∈ [0, 1], we use

\[
\frac{f(x, w, -w \cdot 1) - f(0, w, -w \cdot 1)}{f(1, w, -w \cdot 1) - f(0, w, -w \cdot 1)}
\]

for the hardening nonlinearity and

\[
\frac{f(x, w, 0) - 0.5}{f(1, w, 0) - 0.5}
\]

for the softening nonlinearity, with f as defined in Eq. (5). Note that a constant can be approximated exactly using two hidden nodes, as in [27, 28]. This is just one example of how we will continue to systematically study the capabilities of sigmoidal neural networks; a hedged numerical check of these two families is sketched below.
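The sketch below (Python, assuming the reconstruction of the two normalized expressions above) evaluates the hardening and softening families on [0, 1] for a few slopes w, analogous to the curves in Fig. 5(a) and (c). Both families pass through (0, 0) and (1, 1) by construction; convexity versus concavity distinguishes hardening from softening.

```python
import numpy as np

def f(x, w, b):
    """The sigmoidal family of Eq. (5): f(x, w, b) = sigma(w x + b)."""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def hardening(x, w):
    """Affine-normalized hardening curve on [0, 1] (convex), per the
    reconstructed expression in Section V-C."""
    return (f(x, w, -w) - f(0.0, w, -w)) / (f(1.0, w, -w) - f(0.0, w, -w))

def softening(x, w):
    """Affine-normalized softening curve on [0, 1] (concave)."""
    return (f(x, w, 0.0) - 0.5) / (f(1.0, w, 0.0) - 0.5)

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 6)
    for w in (2.0, 5.0, 10.0):                   # illustrative slope values
        print(f"w = {w:4.1f}  hardening:", np.round(hardening(x, w), 3),
              " softening:", np.round(softening(x, w), 3))
```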

Fig. 5. (a) and (c) hardening and softening families formed by sigmoidal neural networks with two and three hidden nodes, respectively, and (b) and (d) hardening and softening families formed by polynomials and fractional powers, respectively.

VI. DISCUSSION

The following Q & A list gives an overview of the past, present and future of our work. Nicknames follow those in Section IV-A.

A. Is it possible for a user not to start from scratch when approximating a new function?

Yes, based on our existing effort on the specified subset of static nonlinear functions [27, 37, 38, 28, 25, 23, 39, 1, 2]. Note that we do not rely on random initial values in our constructive method.

B. Do "building blocks" and "hint books" work for constructive methods?

Yes, based on our existing effort; however, what we have so far is far from enough. More "building blocks" and "hint books" need to be developed using real-world, laboratory and simulation data targeting complex, coupled, multivariate nonlinear functions. Ways to select the best "building blocks" and "hint books" and to greatly improve approximation accuracy are needed. More complex ANNs may be used in the future.

C. Can we justify the reliability of "building blocks" and "hint books"?

Yes, based on our existing training results, but not yet on the basis of rigorous proofs or well-organized evidence. Refining or replacing the complexity measure in [3, 4] and showing error bounds or evidence for what we can achieve using our "building blocks" and "hint books" would be the most challenging task in our pursuit.

D. Can we do better?

Fig. 1 indicates the necessity of improving the approximation accuracy while keeping good generalization. In addition, we will approximate the nonlinear restoring forces of systems with multiple components and/or MDOF.

VII. CONCLUSION

Constructive methods for the initialization of sigmoidal neural networks in function approximation have been reviewed. By using an example, we have shown that the complexity measure used for the approximation error bounds in [3, 4] may not be conveniently adopted for constructive methods. The alternatives are challenging to define; however, our experiences have been shared in this paper. Given the NP-complete nature of training sigmoidal neural networks in function approximation, an empirical constructive method proposed by us to approximate a subset of static nonlinear functions has been introduced. An illustrative example using experimental data has been presented, and the training and validation results have been compared with those of the Nguyen-Widrow algorithm [19]. A vivid metaphor has been offered for an intuitive understanding of our method. The proposed constructive method has shown its usefulness for the target functions studied by us.

ACKNOWLEDGMENTS

This study is partially funded by NSF CMMI 0626401 (Program Officer, Dr. Shih-Chi Liu). The first author thanks Dr. Paul Werbos for helpful discussions on [3, 4]. She would also like to thank Professor Jeff Scruggs for introducing [43]. Part of this work was developed during the first author's sabbatical leave; she would like to thank Professors Jim Beck and Jeff Scruggs for their hospitality.

REFERENCES

[1] J. S. Pei, J. P. Wright, and A. W. Smyth, "Neural network initialization with prototypes - a case study in function approximation," in Proceedings of the International Joint Conference on Neural Networks 2005 (IJCNN'05), Montreal, Canada, July 31 - August 4, 2005, pp. 1377–1382.
[2] J. S. Pei, E. C. Mai, J. P. Wright, and A. W. Smyth, "Neural network initialization with prototypes - function approximation in engineering mechanics applications," in Proceedings of the International Joint Conference on Neural Networks 2007 (IJCNN'07), Orlando, FL, August 12-17, 2007, IEEE Catalog Number 07CH37922C, ISBN 0-4244-1380-X.
[3] A. R. Barron, "Universal approximation bounds for superpositions of a sigmoidal function," IEEE Transactions on Information Theory, vol. 39, no. 3, pp. 930–945, May 1993.
[4] ——, "Approximation and estimation bounds for artificial neural networks," Machine Learning, vol. 14, pp. 115–133, 1994.
[5] G. Cybenko, "Approximation by superpositions of a sigmoidal function," Mathematics of Control, Signals, and Systems, vol. 2, pp. 303–314, 1989.
[6] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, pp. 359–366, 1989.
[7] J. S. Judd, Neural Network Design and the Complexity of Learning. Cambridge, MA: MIT Press, 1990.
[8] A. L. Blum and R. L. Rivest, "Training a 3-node neural network is NP-complete," Neural Networks, vol. 5, pp. 117–127, 1995.
[9] L. K. Jones, "The computational intractability of training sigmoidal neural networks," IEEE Transactions on Information Theory, vol. 43, no. 1, pp. 167–173, January 1997.
[10] L. Franco, D. A. Elizondo, and J. M. Jerez, Eds., Constructive Neural Networks, ser. Studies in Computational Intelligence. Berlin Heidelberg: Springer-Verlag, 2009, vol. 258.

[11] F. Scarselli and A. C. Tsoi, "Universal approximation using feedforward neural networks: A survey of some existing methods, and some new results," Neural Networks, vol. 11, no. 1, pp. 15–37, 1998.
[12] M. Lehtokangas, "Fast initialization for cascade-correlation learning," IEEE Transactions on Neural Networks, vol. 10, no. 2, pp. 410–414, 1999.
[13] S. Ma and C. Ji, "Performance and efficiency: Recent advances in supervised learning," Proceedings of the IEEE, vol. 87, no. 9, pp. 1519–1535, 1999.
[14] J. Y. F. Yam and T. W. S. Chow, "Feedforward networks training speed enhancement by optimal initialization of the synaptic coefficients," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 430–434, 2001.
[15] T. Denoeux and R. Lengellé, "Initializing back propagation networks with prototypes," Neural Networks, vol. 6, pp. 351–363, 1993.
[16] S. Osowski, "New approach to selection of initial values of weights in neural function approximation," Electronics Letters, vol. 29, no. 3, pp. 313–315, 1993.
[17] A. Lapedes and R. Farber, "How neural nets work," in Neural Information Processing Systems, D. Anderson, Ed. American Institute of Physics, 1988, pp. 442–456.
[18] L. K. Jones, "Constructive approximations for neural networks by sigmoidal functions," Proceedings of the IEEE, vol. 78, no. 10, pp. 1586–1589, 1990.
[19] D. Nguyen and B. Widrow, "Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights," in Proceedings of the IJCNN, vol. III, July 1990, pp. 21–26.
[20] T. L. Burrows and M. Niranjan, "Feed-forward and recurrent neural networks for system identification," Cambridge University Engineering Department, Tech. Rep. CUED/F-INFENG/TR158, 1993.
[21] P. Costa and P. Larzabal, "Initialization of supervised training for parametric estimation," Neural Processing Letters, vol. 9, pp. 53–61, 1999, Kluwer Academic Publishers, printed in the Netherlands.
[22] S. Ferrari and R. F. Stengel, "Smooth function approximation using neural networks," IEEE Transactions on Neural Networks, vol. 16, no. 1, pp. 24–38, January 2005.
[23] J. S. Pei and E. C. Mai, "Constructing multilayer feedforward neural networks to approximate nonlinear functions in engineering mechanics applications," ASME Journal of Applied Mechanics, 2008.
[24] M. Norgaard, O. Ravn, N. K. Poulsen, and L. K. Hansen, Neural Networks for Modelling and Control of Dynamic Systems: A Practitioner's Handbook, ser. Advanced Textbooks in Control and Signal Processing. Springer, 2000.
[25] J. S. Pei, E. C. Mai, J. P. Wright, and S. F. Masri, "Mapping some functions and four arithmetic operations to multilayer feedforward neural networks," Computer Methods in Applied Mechanics and Engineering, 2011, under review.
[26] B. Curry, P. Morgan, and M. Beynon, "Neural networks and flexible approximations," IMA Journal of Mathematics Applied in Business and Industry, vol. 11, pp. 19–35, 2000.
[27] J. S. Pei, "Parametric and nonparametric identification of nonlinear systems," Ph.D. dissertation, Columbia University, 2001.
[28] J. S. Pei, J. P. Wright, and A. W. Smyth, "Mapping polynomial fitting into feedforward neural networks for modeling nonlinear dynamic systems and beyond," Computer Methods in Applied Mechanics and Engineering, vol. 194, no. 42-44, pp. 4481–4505, 2005.
[29] S. F. Masri and T. K. Caughey, "A nonparametric identification technique for nonlinear dynamic problems," Journal of Applied Mechanics, vol. 46, pp. 433–447, June 1979.

[30] K. J. O'Donnell and E. F. Crawley, "Identification of nonlinear system parameters in space structure joints using the force-state mapping technique," MIT Space Systems Lab., Tech. Rep. SSL #16-85, July 1985.
[31] R. Bouc, "Forced vibration of mechanical systems with hysteresis," in Proceedings of the 4th Conference on Nonlinear Oscillations, 1967.
[32] Y. K. Wen, "Method for random vibration of hysteretic systems," ASCE Journal of Engineering Mechanics, vol. 102, no. 2, pp. 249–263, 1976.
[33] T. T. Baber and M. N. Noori, "Random vibration of degrading, pinching systems," ASCE Journal of Engineering Mechanics, vol. 111, no. 8, pp. 1010–1026, 1985.
[34] ——, "Modeling general hysteresis behavior and random vibration application," ASME Journal of Vibration, Acoustics, Stress, and Reliability in Design, vol. 108, pp. 411–420, 1986.
[35] D. Jeltsema and J. M. A. Scherpen, "Multidomain modeling of nonlinear networks and systems: Energy- and power-based perspectives," IEEE Control Systems Magazine, pp. 28–59, 2009.
[36] F. Tasbighoo, J. P. Caffrey, and S. F. Masri, "Development of data-based model-free representation of non-linear nonconservative dissipative systems," International Journal of Non-Linear Mechanics, 2004.
[37] J. S. Pei and A. W. Smyth, "A new approach to design multilayer feedforward neural network architecture in modeling nonlinear restoring forces: Part I - formulation," ASCE Journal of Engineering Mechanics, vol. 132, no. 12, pp. 1290–1300, December 2006.
[38] ——, "A new approach to design multilayer feedforward neural network architecture in modeling nonlinear restoring forces: Part II - applications," ASCE Journal of Engineering Mechanics, vol. 132, no. 12, pp. 1310–1312, December 2006.
[39] J. S. Pei and S. F. Masri, "Demonstration and validation of a constructive neural network initialization method to approximate nonlinear restoring force using experimental data," ASME Journal of Applied Mechanics, 2011, to be submitted for review.
[40] P. J. Werbos, The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting, ser. Adaptive and Learning Systems for Signal Processing, Communications and Control Series. Wiley-Interscience, 1994.
[41] ——, "Neural networks, system identification, and control in the chemical process industries," in Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, D. A. White and D. A. Sofge, Eds. New York: Van Nostrand Reinhold, 1992, pp. 283–356.
[42] Z. Gajić, Linear Dynamic Systems and Signals. Prentice Hall, 2003.
[43] S. Mallat, A Wavelet Tour of Signal Processing, 3rd ed. Academic Press, 2009.
[44] I. Daubechies, Ten Lectures on Wavelets, ser. CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia, PA: SIAM, 1992.
