A Time-Unrolling Method to Compute Sensitivity of Dynamic Systems

Frank Liu
IBM Research Austin, Austin, TX
[email protected]

Peter Feldmann
D. E. Shaw Research, New York, NY
[email protected]
ABSTRACT
Sensitivities of dynamic system responses with respect to the system parameters are highly valuable, with broad applications such as system tuning and uncertainty quantification. Compared to direct methods, adjoint methods are much more efficient when the number of parameters is large. In this paper, we present a time-unrolling method to compute adjoint sensitivities. Instead of explicitly constructing the adjoint system, which is often nontrivial, our time-unrolling method implicitly retraces the response trajectory by utilizing the fitting polynomial of the integration method. This paper provides the theoretical foundation of the method as well as experimental demonstrations of its effectiveness.
1. INTRODUCTION
Many scientific and engineering problems require the simulation of large-scale dynamical systems, represented by linear or nonlinear differential-algebraic equations (DAEs) or partial differential equations (PDEs). With the advancement in algorithmic development and computing power in the past decades, simulating dynamical systems with millions to hundreds of millions of unknowns has become a reality. While accurate dynamic responses of large dynamical systems are valuable, the sensitivities of those responses with respect to the system parameters or input excitations are equally useful. It is well recognized in the scientific community that performance sensitivities with respect to model parameters are highly valuable for data assimilation, parameter estimation, uncertainty quantification, as well as stability analysis [1]. In engineering, sensitivities are especially valuable for design optimization [2] because they provide the gradients that guide many powerful algorithms. In VLSI circuit analysis and design, sensitivities have been used for design optimization in a formal theoretical framework [3, 4]. However, the practice is not widely adopted, partially due to the impression that sensitivity computation, in particular adjoint sensitivity computation, is difficult to
implement. The commonly accepted notion is that adjoint sensitivities have to be computed by solving the adjoint circuit, which can be elegantly derived from Tellegen's theorem [5] or from Lagrange multipliers [6]. Although the adjoint circuit has the same topology as the nominal circuit, it consists of dual circuit elements with different branch constitutive relations (BCRs). Since the adjoint circuit has to be solved backward in time, a further difficulty is determining the correct boundary conditions for the adjoint circuit. It is well recognized that consistent boundary conditions are difficult to derive for general nonlinear adjoint systems [7, 8]. In this paper, we demonstrate that the adjoint sensitivities can be computed without explicitly constructing the adjoint circuit. By applying a time-unrolling scheme and storing information about the dynamic trajectory of the circuit, the adjoint sensitivities can be computed by utilizing the fitting polynomial of the numerical integration scheme. Furthermore, the method is applicable to other nonlinear dynamic systems, including complicated fluid dynamics problems in which the underlying physical phenomena are modeled by nonlinear partial differential equations.
2. SENSITIVITIES OF DYNAMIC SYSTEMS
We first derive the sensitivity of a general nonlinear system, then extend the result to a general nonlinear dynamic system.
2.1 Sensitivity of a general nonlinear system
Without loss of generality, we consider a nonlinear system, which depends on a state variable vector $X \in \mathbb{R}^N$ and a set of parameters $p \in \mathbb{R}^n$:
$$\Psi(X, p) = 0 \qquad (1)$$
We are typically interested in studying various performances of the system $\phi \in \mathbb{R}^m$, which are functions of both the state variables $X$ and the parameters $p$:
$$\phi = \Phi(X, p) \qquad (2)$$
In particular, we are interested in the small-signal sensitivities of the system performances $\phi$ with respect to the parameters $p$. We differentiate the system in (1) with respect to $p$, under the assumption that $X_0(p)$ is a solution, implicitly dependent on the parameters $p$:
$$\frac{\partial \Psi}{\partial X} \cdot \frac{\partial X_0}{\partial p} + \frac{\partial \Psi}{\partial p} = 0 \qquad (3)$$
Based on the chain rule, we have:
$$\frac{d\phi}{dp} = \frac{\partial \Phi}{\partial X} \cdot \frac{\partial X_0}{\partial p} + \frac{\partial \Phi}{\partial p} \qquad (4)$$
and substituting (3) into (4), we obtain:
$$\frac{d\phi}{dp} = -\frac{\partial \Phi}{\partial X} \left(\frac{\partial \Psi}{\partial X}\right)^{-1} \frac{\partial \Psi}{\partial p} + \frac{\partial \Phi}{\partial p} \qquad (5)$$
Observe that the sensitivity expression contains the product of three matrices: $\frac{\partial \Phi}{\partial X}$ of dimensions $m \times N$, $\left(\frac{\partial \Psi}{\partial X}\right)^{-1}$ of dimensions $N \times N$ ($\frac{\partial \Psi}{\partial X}$ is often called the Jacobian matrix), and $\frac{\partial \Psi}{\partial p}$ of dimensions $N \times n$. Depending on the relative dimensions of the parameter ($n$) and performance ($m$) vectors, there are two ways to compute it:

direct sensitivity formula
$$\frac{d\phi}{dp} = -\frac{\partial \Phi}{\partial X} \left[\left(\frac{\partial \Psi}{\partial X}\right)^{-1} \frac{\partial \Psi}{\partial p}\right] + \frac{\partial \Phi}{\partial p} \qquad (6)$$
The term within the square brackets is obtained by solving an $N \times N$ system with $n$ right-hand sides (RHS). This is the computationally preferred association when $n$ is small.

adjoint sensitivity formula
$$\frac{d\phi}{dp} = -\left[\frac{\partial \Phi}{\partial X} \left(\frac{\partial \Psi}{\partial X}\right)^{-1}\right] \frac{\partial \Psi}{\partial p} + \frac{\partial \Phi}{\partial p} \qquad (7)$$
In this case, the term within the square brackets is obtained by solving a system involving the adjoint (transpose) of the same matrix as in the direct approach, with $m$ right-hand sides (RHS). This is the computationally preferred association when $n$ is large relative to $m$. Observe that the direct sensitivity in (6) and the adjoint sensitivity in (7) are just two different ways to evaluate the same expression (5); the choice of using one or the other is only a matter of computational efficiency.
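As a minimal numerical sketch (random matrices, purely illustrative and not part of the paper), the two associations produce identical sensitivities while requiring linear solves with very different numbers of right-hand sides:

```python
import numpy as np

# Toy dimensions: many parameters (n), few performances (m).
N, n, m = 200, 1000, 2
rng = np.random.default_rng(0)
dPsi_dX = rng.standard_normal((N, N)) + N * np.eye(N)   # well-conditioned Jacobian
dPsi_dp = rng.standard_normal((N, n))
dPhi_dX = rng.standard_normal((m, N))
dPhi_dp = rng.standard_normal((m, n))

# Direct association (6): one N x N solve with n right-hand sides.
theta = np.linalg.solve(dPsi_dX, dPsi_dp)                # N x n
dphi_dp_direct = -dPhi_dX @ theta + dPhi_dp

# Adjoint association (7): one transposed N x N solve with only m right-hand sides.
theta_star = np.linalg.solve(dPsi_dX.T, dPhi_dX.T)       # N x m
dphi_dp_adjoint = -theta_star.T @ dPsi_dp + dPhi_dp

# Both evaluate the same expression (5).
assert np.allclose(dphi_dp_direct, dphi_dp_adjoint)
```

With many parameters and few performances ($n \gg m$), the adjoint association solves only $m$ right-hand sides instead of $n$, which is the efficiency argument made above.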
2.2 Sensitivities of dynamic systems
The same methods can be used to compute the sensitivities of dynamic system performances with respect to the parameters. Without loss of generality, we assume the underlying dynamic system is described by a system of differential-algebraic equations with $F(\cdot) \in \mathbb{R}^M$:
$$F(x, \dot{x}, t, p) = 0 \qquad (8)$$
where $x \in \mathbb{R}^M$ represents the state of the dynamic system and $\dot{x}$ represents its time derivative. The performances of interest of the dynamic system are typically defined as functions of the time-dependent state variables, and will implicitly depend on the same parameters, e.g.,
$$\phi(p) = \int_0^{t_D} G(x(t, p))\, d\tau \qquad (9)$$
The sensitivity of the performance with respect to a parameter can be expressed in the form of an inner product in a linear function space
$$\frac{\partial \phi}{\partial p} = \int_0^{t_D} \frac{\partial G}{\partial x}(t)\, \frac{\partial x}{\partial p}(t)\, d\tau = \left\langle \frac{\partial x}{\partial p}(t), \frac{\partial G}{\partial x}(t) \right\rangle \qquad (10)$$
The functions $\frac{\partial x}{\partial p}$ are the solutions of the linear DAE system with time-varying coefficients obtained by differentiating the system in (8):
$$\frac{\partial F}{\partial x} \frac{\partial x}{\partial p} + \frac{\partial F}{\partial \dot{x}} \frac{\partial \dot{x}}{\partial p} = -\frac{\partial F}{\partial p} \qquad (11)$$
Assuming appropriate boundary conditions, the solution of this linear system can be expressed in linear operator form
$$\frac{\partial x}{\partial p} = L\left(-\frac{\partial F}{\partial p}\right) \qquad (12)$$
The sensitivities expressed as an inner product in (10) can therefore also be expressed using linear operators:
$$\frac{\partial \phi}{\partial p} = \left\langle L\left(-\frac{\partial F}{\partial p}\right)(t), \frac{\partial G}{\partial x}(t) \right\rangle = \left\langle -\frac{\partial F}{\partial p}, L^*\left(\frac{\partial G}{\partial x}\right)(t) \right\rangle \qquad (13)$$
Here $L^*$ represents the adjoint operator of $L$, and $L^*\left(\frac{\partial G}{\partial x}\right)$ represents the solution of a related (adjoint) linear DAE system with time-varying coefficients, with appropriately chosen boundary conditions. Traditional dynamic adjoint sensitivity computation methods concentrate on formulating this adjoint DAE, often in circuit form, and solving it. Instead, the method in this paper works directly with the discretized form of these operators and thus avoids most theoretical and practical complications associated with formulating and solving the adjoint system.

In order to compute the dynamic response of the system, a time-marching method is usually used. Backward Differentiation Formula (BDF) methods [9, 10] are a family of time-marching methods that are particularly useful in solving stiff DAE systems. The essence of the BDF method is to construct a polynomial interpolating the solutions at the current time point $t_n$ and a subset of past time points ($t_{n-1}, \ldots, t_{n-k}$), as well as the derivative at the current time point. When a constant time step is used, the derivative approximation derived from the interpolating polynomial can be written as [9]:
$$h \beta_0 \dot{x}_n = \sum_{j=0}^{k} \alpha_j x_{n-j} \qquad (14)$$
where $h$ is the step size, and $\beta_0$ and $\alpha_j$ are coefficients. For example, the simplest BDF method ($k = 1$), or BDF1, is also known as the backward Euler method, and can be expressed as $h \dot{x}_n = x_n - x_{n-1}$. After substituting this approximation of the time derivative into the nonlinear DAE in (8), we have the following system of implicit nonlinear equations:
$$F\left(x_n, \frac{1}{\beta_0 h} \sum_{j=0}^{k} \alpha_j x_{n-j}, t_n, p\right) = 0 \qquad (15)$$
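As an illustrative sketch of (15) (not the paper's implementation; the scalar RC residual and fixed step size are assumptions, mirroring the unit-RC example of Sec. 4.1), one BDF1 (backward Euler) step can be solved with Newton's method as follows:

```python
import numpy as np

# Illustrative sketch only: one unknown per time point, BDF1 (backward Euler),
# for a unit RC segment with a sine-square input (R = C = 1, assumed values).
def u(t):
    return np.sin(t) ** 2                       # input excitation (assumed)

def F(x, xdot, t, p):
    R, C = p                                    # residual of C*dv/dt + (v - u)/R = 0
    return C * xdot + (x - u(t)) / R

def bdf1_step(x_prev, t_n, h, p, iters=20):
    """Solve F(x_n, (x_n - x_prev)/h, t_n, p) = 0, i.e. (15) with k = 1."""
    R, C = p
    x_n = x_prev
    for _ in range(iters):                      # Newton iteration
        r = F(x_n, (x_n - x_prev) / h, t_n, p)
        J = 1.0 / R + C / h                     # dF/dx_n = dF/dx + (1/h) dF/dxdot
        x_n -= r / J
        if abs(r) < 1e-12:
            break
    return x_n

h, p = 0.01, (1.0, 1.0)
xs = [0.0]                                      # initial condition
for t_n in np.arange(h, 15.0, h):               # time-march and store the trajectory
    xs.append(bdf1_step(xs[-1], t_n, h, p))
```

The stored list `xs` plays the role of the trajectory that the time-unrolling method later reuses for the backward sweep.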
At time point $t_n$, the past values of the state variables, $x_{n-j}$, $j = 1, \ldots, k$, are known, and the present state, $x_n$, is the solution of the above system of nonlinear equations. After the successful solution of the associated nonlinear equations, we have:
$$\frac{\partial F}{\partial x_n} = \frac{\partial F}{\partial x} + \frac{\partial F}{\partial \dot{x}} \frac{\alpha_0}{\beta_0 h} \qquad (16)$$
and
$$\frac{\partial F}{\partial x_{n-1}} = \frac{\partial F}{\partial \dot{x}} \frac{\alpha_1}{\beta_0 h} \qquad (17)$$
Expressions for the other terms can be derived similarly. The BDF method determines a temporal discretization of the continuous response of the dynamic system. The discretization results in a sequence of time points $\{t_0, t_1, t_2, \ldots, t_D\}$, and the discretized solution consists of the concatenation of the state variables that satisfy the system (15) at each time point:
$$X = [x_0, x_1, x_2, \ldots, x_D]^T \qquad (18)$$
which is of size $n = D \cdot M$, because the size of each $x$ is $M$. Similarly, we can define the nonlinear equation for the dynamic system unrolled in time as:
$$\Psi(X) = \begin{bmatrix} F(x_0, \dot{x}_0, t_0, p) \\ F(x_1, \dot{x}_1, t_1, p) \\ \vdots \\ F(x_D, \dot{x}_D, t_D, p) \end{bmatrix} = 0 \qquad (19)$$
which is also a vector of size $n = D \cdot M$. A common form of performance is the difference between the simulated output at a particular node and given observations or desirable behavior within a time interval. If we are interested in the system output at node $j$ and we use the $L_1$ norm, the performance is defined as:
$$\phi(x) = \int_{t_0}^{t_D} |x_j(\tau) - \hat{x}_j(\tau)|\, d\tau \qquad (20)$$
After discretization, the $L_1$ norm in (20) becomes:
$$\phi \approx \sum_{k=0}^{D} |x_j(k) - \hat{x}_j(k)| \cdot h \qquad (21)$$
or, in matrix form:
$$\phi = L^T \cdot |X - \hat{X}| \qquad (22)$$
where $L$ is the selection matrix. Through the process of discretization and time-unrolling, the dynamic system of equations is just a special case of the nonlinear system shown in (1)–(2), and the direct and adjoint sensitivity formulas are defined by (6) and (7), respectively.

In the direct sensitivity formula in (6) we need to solve the linear system
$$\frac{\partial \Psi}{\partial X}\, \theta = \frac{\partial \Psi}{\partial p} \qquad (23)$$
where the matrix has a block lower triangular (banded) form, with each block being a Jacobian at a particular time point:
$$\frac{\partial \Psi}{\partial X} = \begin{bmatrix}
\frac{\partial F_0}{\partial x_0} & & & & & \\
\frac{\partial F_1}{\partial x_0} & \frac{\partial F_1}{\partial x_1} & & & & \\
\frac{\partial F_2}{\partial x_0} & \frac{\partial F_2}{\partial x_1} & \frac{\partial F_2}{\partial x_2} & & & \\
\frac{\partial F_3}{\partial x_0} & \frac{\partial F_3}{\partial x_1} & \frac{\partial F_3}{\partial x_2} & \frac{\partial F_3}{\partial x_3} & & \\
& & \ddots & \ddots & \ddots & \\
& & & & \frac{\partial F_D}{\partial x_{D-1}} & \frac{\partial F_D}{\partial x_D}
\end{bmatrix} \qquad (24)$$
The width of the band depends on the order of the integration formula. The RHS is obtained by concatenating components from each time point:
$$\frac{\partial \Psi}{\partial p} = \left[\frac{\partial F_0}{\partial p}, \frac{\partial F_1}{\partial p}, \ldots, \frac{\partial F_D}{\partial p}\right]^T \qquad (25)$$
We illustrate this structure using a fixed time-step BDF1 method. At time point $t_0$, the system of equations simply says that the initial condition $x_0$ satisfies the nonlinear equation:
$$F(x_0, p) = 0 \qquad (26)$$
and at every subsequent time point the nonlinear equation will only involve the present and immediately previous state variables:
$$F(x_{k-1}, x_k, t_k, p) = 0, \quad k = 1, \ldots, D \qquad (27)$$
It is obvious that the matrix in (24) then has the form:
$$\frac{\partial \Psi}{\partial X} = \begin{bmatrix}
\frac{\partial F_0}{\partial x_0} & & & & & \\
\frac{\partial F_1}{\partial x_0} & \frac{\partial F_1}{\partial x_1} & & & & \\
& \frac{\partial F_2}{\partial x_1} & \frac{\partial F_2}{\partial x_2} & & & \\
& & \frac{\partial F_3}{\partial x_2} & \frac{\partial F_3}{\partial x_3} & & \\
& & & \ddots & \ddots & \\
& & & & \frac{\partial F_D}{\partial x_{D-1}} & \frac{\partial F_D}{\partial x_D}
\end{bmatrix} \qquad (28)$$
In the adjoint sensitivity formula in (7) we need to solve a system involving the transpose of the direct linear system to obtain the adjoint solution:
$$\left(\frac{\partial \Psi}{\partial X}\right)^T \theta^* = \left(\frac{\partial \Phi}{\partial X}\right)^T \qquad (29)$$
In other words, we need to solve a block upper triangular system, which in the special case of the BDF1 discretization has the following structure:
$$\begin{bmatrix}
\left(\frac{\partial F_0}{\partial x_0}\right)^T & \left(\frac{\partial F_1}{\partial x_0}\right)^T & & & & \\
& \left(\frac{\partial F_1}{\partial x_1}\right)^T & \left(\frac{\partial F_2}{\partial x_1}\right)^T & & & \\
& & \left(\frac{\partial F_2}{\partial x_2}\right)^T & \left(\frac{\partial F_3}{\partial x_2}\right)^T & & \\
& & & \ddots & \ddots & \\
& & & & & \left(\frac{\partial F_D}{\partial x_D}\right)^T
\end{bmatrix} \qquad (30)$$
To solve the linear system in (23), since the LHS matrix is in the block lower triangular form shown in (24), we can solve it block-wise as we solve the original system. After the nonlinear discretized system is solved at a particular time point $t_k$, we can utilize the solution up to this point and use (24) to solve for $\theta_k$, of size $M \times n$. The updating of $\theta_k$ becomes an iterative accumulation process. When the simulation of the original system is completed, we have effectively computed $\theta$.

The above procedure will not work for adjoint sensitivity, since the LHS shown in (30) is block upper triangular. We have to start from the last block (corresponding to time point $t_D$) and solve for $\theta^*$ in (29) backwards to the first block. Conceptually, this is equivalent to solving the system backwards in time, as is commonly done in adjoint sensitivity computation. Computationally, we have to store the Jacobian matrices at each time point as the original dynamic system is being solved forward in time. Once we have reached the last time point $t_D$, we can construct the LHS matrix in (30). Since this LHS matrix is uniquely determined by the time-unrolled solution (or trajectory) $X$ in (18), we can also store $X$ instead and re-evaluate the functions to compute the Jacobian matrices.
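To make the two solution patterns concrete, a minimal numerical sketch follows (illustrative only; the block sizes and random Jacobian blocks are assumptions, not the paper's circuit matrices). It solves the block lower bidiagonal system (23)/(28) forward in time for θ, and the transposed, block upper bidiagonal system (29)/(30) backward in time for θ*; both yield the same sensitivity, as in (6) and (7):

```python
import numpy as np

rng = np.random.default_rng(1)
D, M, n_par = 5, 3, 4                   # last time index, states per point, parameters

# Block bidiagonal Jacobian of the time-unrolled system (BDF1 structure, eq. (28)):
# diagonal blocks B_k = dF_k/dx_k, sub-diagonal blocks A_k = dF_k/dx_{k-1}.
B = [rng.standard_normal((M, M)) + 3 * np.eye(M) for _ in range(D + 1)]
A = [rng.standard_normal((M, M)) for _ in range(D + 1)]          # A[0] unused
dF_dp = [rng.standard_normal((M, n_par)) for _ in range(D + 1)]  # blocks of dPsi/dp
dPhi_dX = [rng.standard_normal((1, M)) for _ in range(D + 1)]    # one performance

# Direct sweep (23): solve for theta block by block while marching forward in time.
theta = []
for k in range(D + 1):
    rhs = dF_dp[k] - (A[k] @ theta[k - 1] if k > 0 else 0.0)
    theta.append(np.linalg.solve(B[k], rhs))
sens_direct = -sum(dPhi_dX[k] @ theta[k] for k in range(D + 1))

# Adjoint sweep (29): solve the transposed system backwards, from t_D down to t_0.
theta_star = [None] * (D + 1)
for k in range(D, -1, -1):
    rhs = dPhi_dX[k].T - (A[k + 1].T @ theta_star[k + 1] if k < D else 0.0)
    theta_star[k] = np.linalg.solve(B[k].T, rhs)
sens_adjoint = -sum(theta_star[k].T @ dF_dp[k] for k in range(D + 1))

assert np.allclose(sens_direct, sens_adjoint)   # same sensitivity, two associations
```

The backward sweep only touches the diagonal and off-diagonal blocks at the current time point, which is why storing either the Jacobian blocks or the trajectory itself is sufficient.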
Figure 1: Forward and backward discretization operators of the same dynamic trajectory.
However, the above discussion omits an important aspect of the system response computation. In order to compute the dynamic response of a stiff DAE system, variable time steps have to be applied, based on the estimation of the Local Truncation Error (LTE). This is equivalent to applying a temporal discretization operator to the continuous trajectory. In theory, the original system and the adjoint system can have different discretization operators, which result in different time point selections, as illustrated in Fig. 1. Effectively, the different discretization operators of the original and adjoint systems are simply different mappings of the same continuous trajectory. If we can ensure that the discretization operator of the original dynamic system satisfies the error criterion of the adjoint system, we can simply reuse the stored solutions of the original system in order to compute the adjoint solutions, as illustrated in Fig. 2.
Figure 2: Illustration of forward and backward steps for a BDF2 method.

3. LTE OF FORWARD AND ADJOINT OPERATORS
The basic idea of BDF methods is to interpolate the current and past solutions with a polynomial, and use it to approximate the derivative. The error introduced by a BDF method is the difference between the computed solution and the true solution. Since the true solution is rarely known, the true error (or global error) is often difficult to compute. Instead, the error is usually characterized by two aspects. The first aspect is the LTE, which is computed by assuming the solution is exact up to time point $t_{n-1}$; the error introduced by the one-step numerical integration from point $t_{n-1}$ to point $t_n$ is the LTE. The second aspect is the propagation of the LTE introduced at time point $t_n$ to the subsequent time points $t_{n+1}$, $t_{n+2}$, etc.

3.1 LTE of adjoint solutions
In the forward step, the LTE is controlled by properly selecting the time step sizes. Here we show that if we use the same discretization operator, the LTE is also bounded for the backward step in the adjoint solve. To simplify the notation, we assume a second-order method (BDF2) is used; the discussion can be extended to other multi-step methods. As shown in Fig. 2, during the forward step, we have the solutions at time points $t_{n-2}$ and $t_{n-1}$ and compute the solution at time point $t_n$. After convergence, the solutions at the three time points, $x_{n-2}$, $x_{n-1}$ and $x_n$, determine an interpolating polynomial. The solutions at the three points also satisfy the LTE criterion, which is estimated as [11]:
$$\xi = C_{BDF2} \cdot (t_n - t_{n-1}) \cdot (t_n - t_{n-2}) \cdot x^{(3)}(\zeta_1) \qquad (31)$$
where $C_{BDF2}$ is a coefficient uniquely determined by the BDF2 method, and $x^{(3)}(\zeta_1)$ is the estimate of the third-order derivative of the polynomial function. In the backward step, we are effectively using the same polynomial to interpolate the three solution points on the trajectory. The LTE estimate becomes:
$$\eta = C_{BDF2} \cdot (t_{n-2} - t_{n-1}) \cdot (t_{n-2} - t_n) \cdot x^{(3)}(\zeta_2) \qquad (32)$$
Since we are using the same polynomial to interpolate the solutions in both the forward and backward steps, we have:
$$\|x^{(3)}(\zeta_1)\| \approx \|x^{(3)}(\zeta_2)\| \qquad (33)$$
Hence the ratio of the two LTE estimates can be approximated by:
$$\frac{\|\xi\|}{\|\eta\|} \approx \frac{\|t_n - t_{n-1}\|}{\|t_{n-2} - t_{n-1}\|} \qquad (34)$$
It has been shown in [12] that for variable step-size BDF methods to be stable, the ratio of adjacent steps has to be bounded. Assuming the bound on two adjacent time steps is $\psi$ (a typical value is 2.0), the LTEs of the forward and backward steps are bounded by $\|\xi\|/\|\eta\| \leq \psi$.

3.2 Propagation of LTE
Another aspect of the error is the propagation of the LTE to the next time point. Following the derivation in (24), and assuming that the only error is the LTE $\epsilon_{n-1}$ introduced at time step $t_{n-1}$, we can see that for the forward step, the propagated error at time point $t_n$ is:
$$\epsilon_n = -\left(\frac{\partial F_n}{\partial x_n}\right)^{-1} \frac{\partial F_n}{\partial x_{n-1}}\, \epsilon_{n-1} \qquad (35)$$
Following (16) and (17), we can write (35) as:
$$\epsilon_n = -\left(\frac{\partial F_n}{\partial x} + \frac{\partial F_n}{\partial \dot{x}} \frac{\alpha_0}{\beta_0 h_n}\right)^{-1} \left(\frac{\partial F_n}{\partial \dot{x}} \frac{\alpha_1}{\beta_0 h_n}\right) \epsilon_{n-1} \qquad (36)$$
or:
$$\frac{\epsilon_n}{\epsilon_{n-1}} = -\left(\frac{\partial F_n}{\partial x} + \frac{\partial F_n}{\partial \dot{x}} \frac{\alpha_0}{\beta_0 h_n}\right)^{-1} \left(\frac{\partial F_n}{\partial \dot{x}} \frac{\alpha_1}{\beta_0 h_n}\right) \qquad (37)$$
For the backward step, note that we are solving the system backwards, so we need to know how the LTE at time point $t_n$ (denoted $\hat{\epsilon}_n$) propagates to time point $t_{n-1}$. Following (30):
$$\hat{\epsilon}_{n-1} = -\left(\frac{\partial F_n}{\partial x_n}\right)^{-T} \left(\frac{\partial F_{n+1}}{\partial x_n}\right)^T \hat{\epsilon}_n \qquad (38)$$
or:
$$\frac{\hat{\epsilon}_{n-1}}{\hat{\epsilon}_n} = -\left(\frac{\partial F_n}{\partial x} + \frac{\partial F_n}{\partial \dot{x}} \frac{\alpha_0}{\beta_0 h_n}\right)^{-T} \left(\frac{\partial F_{n+1}}{\partial \dot{x}} \frac{\alpha_1}{\beta_0 h_n}\right)^T \qquad (39)$$
Since $\partial F_n/\partial \dot{x}$ and $\partial F_{n+1}/\partial \dot{x}$ are derivatives of the same nonlinear function, they are in fact the same. Also note that a matrix transpose does not change its eigenvalues. Comparing (37) to (39), we come to the conclusion that:
$$\left\|\frac{\epsilon_n}{\epsilon_{n-1}}\right\| = \left\|\frac{\hat{\epsilon}_{n-1}}{\hat{\epsilon}_n}\right\| \qquad (40)$$
A similar proof can be done for higher-order methods.
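A quick numerical sanity check of this argument (illustrative only; the matrices and BDF coefficients below are assumed values) compares the spectral radius of the forward propagation matrix in (37) with that of the backward propagation matrix in (39). Since one matrix is, up to a transpose, a similarity transform of the other, their spectra coincide, consistent with (40):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 4
dF_dx = rng.standard_normal((M, M))
dF_dxdot = rng.standard_normal((M, M))
alpha0, alpha1, beta0, h = 1.0, -1.0, 1.0, 0.01    # assumed coefficients (BDF1 values)

J = dF_dx + dF_dxdot * (alpha0 / (beta0 * h))      # bracketed term in (37)/(39)
K = dF_dxdot * (alpha1 / (beta0 * h))

P_fwd = -np.linalg.solve(J, K)                     # forward propagation matrix, eq. (37)
P_bwd = -np.linalg.solve(J.T, K.T)                 # backward propagation matrix, eq. (39)

# Same spectral radius, hence the same asymptotic LTE propagation factor (40).
rho_fwd = max(abs(np.linalg.eigvals(P_fwd)))
rho_bwd = max(abs(np.linalg.eigvals(P_bwd)))
assert np.isclose(rho_fwd, rho_bwd)
```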
3.3 Overall Flow
As shown in (29), we need the time-unrolled Jacobian $\partial \Psi / \partial X$ in order to compute the adjoint solution. Since at each time point the Jacobian is uniquely determined by the solution, we store the solution at each time point during the forward step, to reduce the memory requirement. When we solve for the adjoint solutions, the sensitivity of the performances with respect to the solutions ($\partial \Phi / \partial X$) acts like the "excitation" to the adjoint system, although it propagates backwards in time. We also need the sensitivity of the time-unrolled nonlinear function with respect to the parameters ($\partial \Psi / \partial p$). It can be computed analytically, or by a black-box approach such as the complex-step derivative method [13]. A simplified version of the flow is summarized in Algorithm 1. Note that when carried out one time point at a time, the convolution of the adjoint solutions with the function sensitivity $\partial \Psi / \partial p$ becomes an accumulation into $S$.

Algorithm 1 Time-Unrolling Adjoint Sensitivity Method
 1: procedure AdjSens(Φ, p, ε, tD, tinit)    ▷ Φ: performance, p: parameter, ε: tol, tD: t-end, tinit: init-step
 2:   t ← 0
 3:   ε ← ε/ψ                                ▷ Adjust LTE tolerance
 4:   tstep ← tinit
 5:   repeat                                 ▷ Forward step: simplified
 6:     t ← t + tstep
 7:     Solve system at time point t: X(t)
 8:     Store solution X(t)
 9:     Determine next time step tstep based on ε
10:   until t == tD
11:   t ← tD                                 ▷ Backward step
12:   J ← Jacobian at t
13:   r ← ∂Φ/∂X
14:   YA ← [J]^{-T} · r                      ▷ Solve adjoint solutions
15:   Evaluate function sensitivity at t: ∂F/∂p
16:   S ← ∂F/∂p · YA
17:   YA0 ← YA
18:   repeat
19:     t ← t − tstep                        ▷ Use stored trajectory
20:     Retrieve X(t)
21:     J ← Jacobian at t
22:     r ← ∂Φ/∂X
23:     r ← r − Jprev^T · YA0                ▷ For off-diagonal blocks
24:     YA ← [J]^{-T} · r
25:     Evaluate function sensitivity at t: ∂F/∂p
26:     S ← S + ∂F/∂p · YA
27:     YA0 ← YA                             ▷ Store for next time point
28:   until t == 0
29:   return −S + ∂Φ/∂p
30: end procedure
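To make the flow concrete, the sketch below implements the skeleton of Algorithm 1 in Python for a small, assumed nonlinear ODE (a damped cubic oscillator) discretized with BDF1 at a fixed step. The variable step-size control, the DAE generality, and the complex-step option for ∂F/∂p are omitted for brevity, so this is an illustration of the structure rather than the paper's implementation:

```python
import numpy as np

# Assumed toy model (not from the paper): xdot = f(x, p), written in residual
# form F(x, xdot, p) = xdot - f(x, p) = 0 and discretized with BDF1.
def f(x, p):
    return np.array([x[1], -p[0] * x[0] - p[1] * x[1] - x[0] ** 3])

def df_dx(x, p):
    return np.array([[0.0, 1.0], [-p[0] - 3.0 * x[0] ** 2, -p[1]]])

def df_dp(x, p):
    return np.array([[0.0, 0.0], [-x[0], -x[1]]])

def g(x):                                   # performance integrand, phi = sum_k h*g(x_k)
    return x[0] ** 2

def dg_dx(x):
    return np.array([2.0 * x[0], 0.0])

h, D, x0, I = 0.01, 500, np.array([1.0, 0.0]), np.eye(2)

def simulate(p):
    """Forward step of Algorithm 1 (fixed step for simplicity): store the trajectory."""
    X = [x0]
    for _ in range(D):
        x = X[-1].copy()
        for _ in range(20):                 # Newton on F_k = (x - x_prev)/h - f(x, p) = 0
            r = (x - X[-1]) / h - f(x, p)
            x -= np.linalg.solve(I / h - df_dx(x, p), r)
        X.append(x)
    return X

def adjoint_sensitivity(p):
    """Backward step of Algorithm 1: reuse the stored trajectory, accumulate S."""
    X = simulate(p)
    S, YA0 = np.zeros(2), np.zeros(2)
    for k in range(D, 0, -1):               # initial condition is parameter-independent,
        J = I / h - df_dx(X[k], p)          # so the k = 0 block is skipped
        r = h * dg_dx(X[k]) + YA0 / h       # dPhi/dx_k minus A_{k+1}^T*YA0, with A = -I/h
        YA = np.linalg.solve(J.T, r)
        S += (-df_dp(X[k], p)).T @ YA       # dF_k/dp = -df/dp, accumulated into S
        YA0 = YA
    return -S                               # dPhi/dp = 0 for this performance

p = np.array([4.0, 0.5])
print("adjoint  :", adjoint_sensitivity(p))

# Finite-difference cross-check (illustrative).
def phi(p):
    return h * sum(g(xk) for xk in simulate(p)[1:])
d = 1e-6
print("finite-d :", np.array([(phi(p + d * e) - phi(p - d * e)) / (2 * d) for e in I]))
```

The forward loop corresponds to lines 5–10 of Algorithm 1 (storing only the solutions), and the backward loop to lines 11–28, with S accumulated one time point at a time.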
3.4 Application to PDE
Nonlinear PDEs can be converted to DAEs by applying spatial discretization methods such as the method of lines [14]. Even for more general numerical integration methods, if the spatial discretization does not change, the time-unrolling method can still be used numerically. However, if the spatial computational grid has to be adjusted for the adjoint solve, then the time-unrolling method cannot be directly applied.
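As a minimal method-of-lines sketch (the 1D diffusion equation, grid, and coefficient below are assumptions chosen for brevity, not the Saint-Venant model used in Sec. 4.3), spatial discretization turns a PDE into an ODE/DAE system of the form (8), after which the time-unrolling machinery applies unchanged:

```python
import numpy as np

# Method-of-lines sketch: discretize u_t = nu * u_xx on (0, 1) with N interior nodes
# and zero Dirichlet boundaries, yielding the semi-discrete system du/dt = nu * L @ u.
N = 51
dx = 1.0 / (N + 1)
L = (np.diag(-2.0 * np.ones(N)) +
     np.diag(np.ones(N - 1), 1) +
     np.diag(np.ones(N - 1), -1)) / dx ** 2        # 1D Laplacian stencil

nu = 0.1                                           # the (tunable) model parameter
h = 1e-3
u = np.sin(np.pi * np.linspace(dx, 1.0 - dx, N))   # initial spatial profile

# BDF1 (backward Euler) steps of the semi-discrete system, cf. (15):
# (u_n - u_prev)/h - nu * L @ u_n = 0  ->  (I/h - nu*L) u_n = u_prev/h
A = np.eye(N) / h - nu * L
for _ in range(100):
    u = np.linalg.solve(A, u / h)
# From here on, the time-unrolling adjoint machinery of Sec. 3.3 applies unchanged,
# with nu playing the role of the parameter p.
```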
4. EXPERIMENTAL RESULTS

4.1 A simple RC example
The first example is a simple RC segment with a unit resistor and a unit capacitor [15]. The input excitation is a sine-square waveform. The performance is defined as the output voltage at the capacitor compared with a given waveform. The integration method is BDF2. Its adjoint circuit is also an RC circuit [15]. We computed the adjoint solutions in two ways: by solving the associated adjoint circuit backward in time, and by using our time-unrolling method. Both adjoint solutions are plotted in Fig. 3; the differences are not distinguishable at the scale of the plot.

Figure 3: Adjoint solutions from the time-unrolling method and from the adjoint circuit. They are indistinguishable at the scale of the plot.
4.2 Clock tree example
This example is a three-level clock tree in a 35nm CMOS technology. The devices are modeled by PSP [16]. The nonlinear solve is done by Newton's method and the integration is implemented by BDF methods. The topology of the clock tree is shown in Fig. 4, which is an H-tree. The performance is the skew between the clock output signals at buffer number 6 and at buffer number 5. We are interested in the sensitivities of the defined clock skew with respect to the "driving strength" of each clock buffer; we therefore assume all transistors in each buffer are subject to a common perturbation parameter. The computed sensitivities using the time-unrolling adjoint method are listed in Table 1. As a comparison, we also list the sensitivities computed by the direct method, as well as by the finite-difference method. Note that there are slight differences between the direct and adjoint results, which are due to numerical noise. The finite-difference results are computed by slightly changing the parameter of one buffer and running the full transient simulation. The finite-difference results differ from the adjoint (and direct) results because the Newton iteration converged to slightly different solutions compared to the original system. As expected, buffers 6 and 5 have the most influence on the skew. The upper-level buffers 2 and 3 also have considerable sensitivity, while the top-level buffer 1 has practically no influence on the clock skew.

Figure 4: LEFT: topology of the clock tree; individual buffers are labeled. RIGHT: illustration of an open channel segment for parameter tuning.
Table 1: Sensitivity of the skew between buffers 6 and 5 with respect to the "driving strength" of each buffer.

buffer #    adjoint    direct     finite-diff
1            0.0002    -0.0001    -0.0001
2           -2.6440    -2.6930    -3.5888
3            2.6439     2.6927     3.7811
4            0.0401     0.0408     0.0563
5           -3.1961    -3.1933    -4.2821
6            2.8171     2.8131     3.9274
7           -0.0401    -0.0408    -0.0562
4.3 Fluid dynamics example
This example is a river segment modeled as a 1D open channel flow, shown in Fig. 4. It is modeled by a set of nonlinear PDEs called the Saint-Venant equations, which are a special case of the Navier-Stokes equations [7, 17]. Two variables have to be solved at each spatial node: flow rate (Q) and depth (h). There is an observation gauge at the downstream node. Without calibration, the computed depths at the gauge often differ from the observations. The task of calibration is to tune the model parameters (the friction terms at each node) to improve the model fidelity. The adjoint system approach was previously considered too complex and expensive to be practical [17]. The river channel is about 2.68 km long, modeled by 51 computational nodes. The physical data, including bathymetry and river bed slope, are obtained from a land survey of a river in the Southern US. The simulation in the first interval, from zero to 20.4 hours, was used for parameter tuning. The performance is computed from the depth data in the time window of [19.3, 20.4] hours. The time-unrolling method was used to compute the adjoint sensitivity of the peak flooding depth with respect to the friction terms at all 51 nodes. The tuning of the parameters was done by the conjugate gradient method [18], which converged in 7 steps. To cross-validate the effectiveness of the calibrated model, both the calibrated and uncalibrated models were simulated further, to the 28-hour time point. The results are shown in Fig. 5. For the flood peak around the 23-hour time point, compared to the observed data, the error of the calibrated model is about 0.6 inch (less than 1%), while the uncalibrated model has an error of over 2.6 feet, which is equivalent to 35.6%.
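To illustrate the structure of such a calibration loop, the sketch below shows how adjoint gradients plug into a conjugate-gradient parameter-tuning step via SciPy. The linear "simulator", observation vector, and initial friction values are purely synthetic stand-ins for the Saint-Venant model and gauge data, not the actual setup used here:

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic stand-in for the calibration problem (purely illustrative): a linear
# "simulator" maps 51 friction parameters to depths at the gauge; the objective is
# the squared mismatch to observed depths, and its gradient plays the role of the
# adjoint sensitivity supplied to the optimizer.
rng = np.random.default_rng(3)
n_nodes = 51
A = rng.standard_normal((40, n_nodes)) / n_nodes      # fake parameter-to-depth map
p_true = 0.03 + 0.01 * rng.random(n_nodes)
depth_obs = A @ p_true

def phi(p):                                            # performance (mismatch) to minimize
    r = A @ p - depth_obs
    return 0.5 * r @ r

def grad_phi(p):                                       # in practice: adjoint sensitivity
    return A.T @ (A @ p - depth_obs)

p0 = 0.03 * np.ones(n_nodes)                           # assumed initial friction values
res = minimize(phi, p0, jac=grad_phi, method="CG")
```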
Figure 5: Depth of the uncalibrated and calibrated models, as well as the observed data, for a 28-hour time window. (Axes: Time (hr) vs. Depth (ft); the plot is annotated with the parameter-tuning, performance-metric-definition, and cross-validation time windows.)
5. CONCLUSION
In this paper, we proposed a time-unrolling method to compute adjoint sensitivities. By utilizing the fitting polynomials of the numerical integration scheme of the original circuit, the method eliminates the need to construct the adjoint circuit, and hence makes the adjoint sensitivity method much easier to implement. The method can also be readily extended to other nonlinear dynamic systems, including complex fluid dynamics problems modeled by nonlinear PDEs. Compared to the traditional adjoint circuit approach, the method requires more memory to store the trajectory of the dynamic system, but we believe the ease of implementation is well worth the additional memory requirement.
6. REFERENCES
[1] R. M. Errico, "What is an adjoint model?" Bulletin of the American Meteorological Society, vol. 78, no. 11, 1997.
[2] M. B. Giles and N. A. Pierce, "An introduction to the adjoint approach to design," Flow, Turbulence and Combustion, vol. 65, no. 3-4, 2000.
[3] J. Fishburn and A. Dunlop, "TILOS: A posynomial programming approach to transistor sizing," in The Best of ICCAD. Springer, 2003.
[4] A. R. Conn, P. K. Coulman, R. A. Haring, G. L. Morrill, and C. Visweswariah, "Optimization of custom MOS circuits by transistor sizing," in Proceedings of the IEEE/ACM Intl. Conf. on Computer-Aided Design. IEEE Computer Society, 1996.
[5] S. W. Director and R. Rohrer, "The generalized adjoint network and network sensitivities," Circuit Theory, IEEE Transactions on, vol. 16, no. 3, 1969.
[6] A. Meir and J. Roychowdhury, "BLAST: efficient computation of nonlinear delay sensitivities in electronic and biological networks using barycentric lagrange enabled transient adjoint analysis," in Proceedings of the ACM/IEEE Design Automation Conference. ACM, 2012.
[7] B. F. Sanders and N. D. Katopodes, "Adjoint sensitivity analysis for shallow-water wave control," Journal of Engineering Mechanics, vol. 126, no. 9, 2000.
[8] L. Petzold, S. Li, Y. Cao, and R. Serban, "Sensitivity analysis of differential-algebraic equations and partial differential equations," Computers & Chemical Engineering, vol. 30, no. 10, 2006.
[9] C. W. Gear, Numerical Initial Value Problems in Ordinary Differential Equations. Prentice-Hall, 1971.
[10] ——, "The automatic integration of ordinary differential equations," Communications of the ACM, vol. 14, no. 3, 1971.
[11] U. M. Ascher and L. R. Petzold, Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations. Philadelphia, PA: SIAM, 1998.
[12] C. Gear, H. Hsu, and L. Petzold, "Differential-algebraic equations revisited," in Proc. ODE Meeting, Oberwolfach, West Germany, 1981.
[13] J. R. R. A. Martins, P. Sturdza, and J. J. Alonso, "The complex-step derivative approximation," ACM Trans. on Math. Software (TOMS), vol. 29, no. 3, 2003.
[14] W. E. Schiesser, The Numerical Method of Lines: Integration of Partial Differential Equations. Academic Press, 1991, vol. 212.
[15] X. Ye, P. Li, and F. Y. Liu, "Exact time-domain second-order adjoint-sensitivity computation for linear circuit analysis and optimization," Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 57, no. 1, 2010.
[16] G. Gildenblat, X. Li, H. Wang, W. Wu, R. Van Langevelde, A. Scholten, G. Smit, and D. Klaassen, "Introduction to PSP MOSFET model," in Proc. the MSM 2005 Int. Conf., Nanotech 2005, 2005.
[17] B. F. Sanders and N. D. Katopodes, "Control of canal flow by adjoint sensitivity method," J. of Irrigation and Drainage Engineering, vol. 125, no. 5, 1999.
[18] J. R. Shewchuk, "An introduction to the conjugate gradient method without the agonizing pain," 1994.