Approximate Solutions to a Nonlinear Abel Equation Derived from a Variational Method

L. Edmonds
5/11/2018

Abstract

A variational method is derived for an Abel equation having the form

U(x) \frac{dU(x)}{dx} = \alpha(x) U(x) + \beta(x) .

The variational method is used to construct approximate solutions. Examples and numerical routines are given. Two kinds of error estimates are also given. One is a conservative estimate of function-value error and the other is the right-side error.

1. Introduction

An example of an Abel equation of the second kind [1] can be expressed in the form

U(x) \frac{dU(x)}{dx} = \alpha(x) U(x) + \beta(x)        (1.1)

where α and β are given real-valued functions of a real variable x, and U is the unknown to be solved subject to some endpoint condition. A variety of applications in science and engineering encounter equations of the form of (1.1). For example, many applications encounter second-order equations of the type

\frac{d^2 x}{dt^2} = \alpha(x) \frac{dx}{dt} + \beta(x) .        (1.2)

A more specific example of an application that encounters an equation having the form of (1.2) is the classical mechanics problem of the one-dimensional motion of a particle subjected to a position-dependent damping coefficient (proportional to α) in addition to a position-dependent force (proportional to β) (Footnote 1). If we define V ≡ dx/dt and use the chain rule to get dV/dt = V dV/dx, then (1.2) has the structure of (1.1). Another application that encountered such an equation is the study of drift-diffusion of charge carriers in a semiconductor [2]. Therefore, if we are able to solve (1.1) then we are able to solve a number of problems in science and engineering. The goal of this work is to derive a variational principle for constructing approximate solutions of (1.1). The analysis in this report applies if either α(x) ≥ 0 for all x, or α(x) ≤ 0 for all x.
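To spell the reduction out (a restatement of the sentence above, not new material): with V ≡ dx/dt regarded as a function of position x,

\frac{d^2 x}{dt^2} = \frac{dV}{dt} = \frac{dV}{dx}\,\frac{dx}{dt} = V \frac{dV}{dx} ,

so (1.2) becomes V dV/dx = α(x)V + β(x), which is (1.1) with U replaced by V.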

Footnote 1: Incidentally, while many problems in classical mechanics can be solved using a variational principle utilizing a Lagrangian satisfying Euler's equation, a Lagrangian satisfying Euler's equation cannot be constructed that includes damping forces. An alternate variational method applicable to (1.2), after the equation has been converted into (1.1), is derived in this report.

2. The Complete Statement of the Problem

The statement of the problem considered here is made complete by listing the assumptions that accompany (1.1). An interval on the x-axis denoted [x1, x2] with x2 > x1 is given (Footnote 2), an endpoint value denoted A (a real number) is given, and the functions α and β (continuous real-valued functions of a real variable) are given. The analysis in this report applies to the case in which α does not change sign. It is either non-negative on the entire interval [x1, x2], or it is non-positive on the entire interval [x1, x2]. We denote this condition as

Either α ≥ 0 or α ≤ 0        (2.1)

with the understanding that "α ≥ 0" is abbreviated notation for α(x) ≥ 0 for all x ∈ [x1, x2], and "α ≤ 0" is abbreviated notation for α(x) ≤ 0 for all x ∈ [x1, x2]. Two of the conditions that define the problem considered are

U(x) \frac{dU(x)}{dx} = \alpha(x) U(x) + \beta(x)   at each x ∈ (x1, x2)        (2.2a)

U(x1) = A .        (2.2b)

Another imposed condition warrants some discussion. Many (albeit not all) equations encountered in science and engineering have the property that there exists a unique solution. We can therefore expect the analysis given here to have some practical applications even if we impose a constraint that ensures that solutions are unique when they exist. It is shown in Section 4 that solutions are unique (when they exist) if we stipulate that the solution be a "positive solution", which is defined by the property that U(x) > 0 for all x ∈ [x1, x2]. A similar analysis will show that solutions are also unique (when they exist) if we stipulate that the solution be a "negative solution", which is defined by the property that U(x) < 0 for all x ∈ [x1, x2] (Footnote 3). Both cases, positive and negative solutions, can be treated by an analysis of positive solutions because if U is a positive solution to (2.2a) then −U is a negative solution to the equation obtained by replacing α with −α in (2.2a) (the theory allows α to have either sign as long as it does not change sign, as previously explained). It is therefore sufficiently general to confine our attention to positive solutions, so another assumption that defines the class of problems considered is

U(x) is real and positive at each x ∈ [x1, x2].        (2.2c)

Note that (2.2b) and (2.2c) imply that A > 0. The question of existence of a solution to (2.2) is difficult. Depending on the functions α and β, and the endpoint value A, it might not be possible to allow x2 to be arbitrarily large and still satisfy (2.2c). Some restriction might have to be imposed on x2 in order to satisfy (2.2c). The difficult topic of existence is avoided by taking existence as given. The last assumption that defines the class of problems considered is that the functions α and β, the value A, and the interval [x1, x2] are such that (2.2) has a solution. A necessary condition for the existence of a solution U to (2.2) when α ≥ 0 is given at the end of Section 3. The problem considered is called the forward problem here because an endpoint value is assigned to U at x = x1 and the goal is to solve for U(x) when x > x1. The reverse problem assigns an endpoint value to U at x = x2 and the goal is to solve for U(x) when x < x2. A reverse problem can be converted into a forward problem by using a coordinate transformation that reverses the direction of the x-axis. However, it is easy to show that this coordinate transformation also reverses the sign of α. Because α is allowed to have either sign, the analysis given here of the forward problem can also be applied (when combined with a coordinate transformation) to the reverse problem.

Footnote 2: In the notation used here, (x1, x2) is an open interval that includes all numbers between x1 and x2 but does not include x1 or x2. [x1, x2) is a half-open interval that includes x1 but not x2, while [x1, x2] is a closed interval that includes both x1 and x2. The symbol "∈" refers to an element of a set, so the statement x ∈ (x1, x2) is equivalent to x1 < x < x2, while x ∈ [x1, x2) is equivalent to x1 ≤ x < x2, and x ∈ [x1, x2] is equivalent to x1 ≤ x ≤ x2.

Footnote 3: In contrast, if we do not impose either the positive requirement or the negative requirement, we can find examples in which solutions exist but are not unique [2].
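To make the sign-reversal claim explicit (a verification, not new material): one convenient reversing transformation is y = −x with W(y) ≡ U(−y). Then dW/dy = −dU/dx, so

W(y)\frac{dW(y)}{dy} = -U(x)\frac{dU(x)}{dx} = -\alpha(-y)\, W(y) - \beta(-y) ,

which again has the form (1.1) with α(x) replaced by −α(−y); the sign of α is reversed (the sign of β also flips, but the theory places no sign restriction on β).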

3. Inequalities

Inequalities derived in this section will be used in later sections to prove uniqueness of solutions and a variational principle. Before deriving the inequalities it is necessary to first define an admissible function. Select an X ∈ [x1, x2]. A function γ is called "admissible on the interval [x1, X]" (Footnote 4) if (and only if)

γ is differentiable on [x1, X]        (3.1a)

γ(x) is real and positive at each x ∈ [x1, X].        (3.1b)

Footnote 4: What is being called an interval is actually a single point if X = x1.

Footnote 5: The derivative is not required to be defined if X = x1 because all conclusions in this report are trivially correct for that case.

The derivative at an endpoint of a closed interval is defined here to be the one-sided limit of the derivative, so differentiability on a closed interval requires that the one-sided limits exist (Footnote 5). The set of all functions that are admissible on the interval [x1, X] is denoted A(X). An inequality is derived as follows. Let U be a solution to (2.2) and let γ ∈ A(X). Note that

\frac{[U(x) - \gamma(x)]^2}{2\gamma(x)} \ge 0   at each x ∈ [x1, X]        (3.2)

which gives

\frac{\alpha(x)\,[U(x) - \gamma(x)]^2}{2\gamma(x)} \ge 0   at each x ∈ [x1, X] if α ≥ 0        (3.3a)

\frac{\alpha(x)\,[U(x) - \gamma(x)]^2}{2\gamma(x)} \le 0   at each x ∈ [x1, X] if α ≤ 0 .        (3.3b)

Expanding the squares on the left of (3.3) and rearranging terms gives


\alpha(x) U(x) \le \frac{1}{2}\frac{\alpha(x)}{\gamma(x)} U^2(x) + \frac{1}{2}\alpha(x)\gamma(x)   at each x ∈ [x1, X] if α ≥ 0        (3.4a)

\alpha(x) U(x) \ge \frac{1}{2}\frac{\alpha(x)}{\gamma(x)} U^2(x) + \frac{1}{2}\alpha(x)\gamma(x)   at each x ∈ [x1, X] if α ≤ 0 .        (3.4b)

The conclusion to be derived in this section is trivial if X = x1, so we consider the case in which X > x1. Using (3.4a), applicable when α ≥ 0, to substitute for αU on the right side of (2.2a), while expressing the left side of (2.2a) in terms of the derivative of U², gives

\frac{d}{dx} U^2(x) - \frac{\alpha(x)}{\gamma(x)} U^2(x) \le \alpha(x)\gamma(x) + 2\beta(x)   at each x ∈ (x1, X)

which can also be written

\frac{d}{dx}\left[ U^2(x) \exp\left( -\int_{x_1}^{x} \frac{\alpha(\xi)}{\gamma(\xi)}\,d\xi \right) \right] \le \exp\left( -\int_{x_1}^{x} \frac{\alpha(\xi)}{\gamma(\xi)}\,d\xi \right) \left[ \alpha(x)\gamma(x) + 2\beta(x) \right]   at each x ∈ (x1, X) when α ≥ 0 .

Integrating the above inequality while using U(x1) = A produces

U^2(x) \le A^2 \exp\left( \int_{x_1}^{x} \frac{\alpha(\xi)}{\gamma(\xi)}\,d\xi \right) + \int_{x_1}^{x} \exp\left( \int_{\xi}^{x} \frac{\alpha(\eta)}{\gamma(\eta)}\,d\eta \right) \left[ \alpha(\xi)\gamma(\xi) + 2\beta(\xi) \right] d\xi   when α ≥ 0 .        (3.5a)

The same steps used with (3.4b) produce

U^2(x) \ge A^2 \exp\left( \int_{x_1}^{x} \frac{\alpha(\xi)}{\gamma(\xi)}\,d\xi \right) + \int_{x_1}^{x} \exp\left( \int_{\xi}^{x} \frac{\alpha(\eta)}{\gamma(\eta)}\,d\eta \right) \left[ \alpha(\xi)\gamma(\xi) + 2\beta(\xi) \right] d\xi   when α ≤ 0 .        (3.5b)

The inequalities in (3.5) apply to any x ∈ [x1, X] when γ ∈ A(X). In particular, evaluating them at x = X gives

U^2(X) \le A^2 \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{\gamma(x)}\,dx \right) + \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{\gamma(\xi)}\,d\xi \right) \left[ \alpha(x)\gamma(x) + 2\beta(x) \right] dx   when γ ∈ A(X) and α ≥ 0        (3.6a)

U^2(X) \ge A^2 \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{\gamma(x)}\,dx \right) + \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{\gamma(\xi)}\,d\xi \right) \left[ \alpha(x)\gamma(x) + 2\beta(x) \right] dx   when γ ∈ A(X) and α ≤ 0 .        (3.6b)

Note that if γ(x) = U(x) for each x ∈ [x1, X] then the equality applies in (3.2). A review of the derivation of (3.6) shows that this implies the equality in (3.6), which gives

U^2(X) = A^2 \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{U(x)}\,dx \right) + \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{U(\xi)}\,d\xi \right) \left[ \alpha(x)U(x) + 2\beta(x) \right] dx .        (3.7)

A variety of inequalities equivalent to (3.6) can be derived by utilizing integrable combinations. A specific example is derived by starting with the identity (which applies to an arbitrary function f ∈ A(X)) given by

\frac{d}{dx}\left[ \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{f(\xi)}\,d\xi \right) f^2(x) \right] = \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{f(\xi)}\,d\xi \right) \left[ 2 f(x)\frac{df(x)}{dx} - \alpha(x) f(x) \right] .

Integrating the above from x1 to X and subtracting the right side from both sides gives the identity

f^2(X) - \exp\left( \int_{x_1}^{X} \frac{\alpha(\xi)}{f(\xi)}\,d\xi \right) f^2(x_1) - \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{f(\xi)}\,d\xi \right) \left[ 2 f(x)\frac{df(x)}{dx} - \alpha(x) f(x) \right] dx = 0 .        (3.8)

Incidentally, (3.8) provides an alternate derivation of (3.7) by letting f = U and then using (2.2a) to substitute for the square bracket in (3.8). To obtain other inequalities that are equivalent to (3.6) we let f = γ and use (3.8) to write the right sides of (3.6) as

A^2 \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{\gamma(x)}\,dx \right) + \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{\gamma(\xi)}\,d\xi \right) \left[ \alpha(x)\gamma(x) + 2\beta(x) \right] dx
= \gamma^2(X) + \left[ A^2 - \gamma^2(x_1) \right] \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{\gamma(x)}\,dx \right) - 2 \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{\gamma(\xi)}\,d\xi \right) \left[ \gamma(x)\frac{d\gamma(x)}{dx} - \alpha(x)\gamma(x) - \beta(x) \right] dx .        (3.9)

Using (3.9) allows (3.6) to be rewritten as

U^2(X) \le \gamma^2(X) + \left[ A^2 - \gamma^2(x_1) \right] \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{\gamma(x)}\,dx \right) - 2 \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{\gamma(\xi)}\,d\xi \right) \left[ \gamma(x)\frac{d\gamma(x)}{dx} - \alpha(x)\gamma(x) - \beta(x) \right] dx   when α ≥ 0        (3.10a)

U^2(X) \ge \gamma^2(X) + \left[ A^2 - \gamma^2(x_1) \right] \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{\gamma(x)}\,dx \right) - 2 \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{\gamma(\xi)}\,d\xi \right) \left[ \gamma(x)\frac{d\gamma(x)}{dx} - \alpha(x)\gamma(x) - \beta(x) \right] dx   when α ≤ 0        (3.10b)

The above derivation of (3.10) is useful for showing, via (3.9), that (3.10) is equivalent to (3.6) in the sense that the right sides of (3.10) are equal to the right sides of (3.6). However, an alternate derivation of (3.10) is useful for deriving additional inequalities. This derivation starts with the differential identity (which applies to arbitrary functions f ∈ A(X) and g ∈ A(X)) given by

\frac{d}{dx}\left\{ \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{f(\xi)}\,d\xi \right) \left[ f^2(x) - g^2(x) \right] \right\} = \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{f(\xi)}\,d\xi \right) \left[ 2 f(x)\frac{df(x)}{dx} - \alpha(x) f(x) - 2 g(x)\frac{dg(x)}{dx} + \frac{\alpha(x)}{f(x)} g^2(x) \right] .

Rearranging terms on the right converts this into

\frac{d}{dx}\left\{ \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{f(\xi)}\,d\xi \right) \left[ f^2(x) - g^2(x) \right] \right\} = 2 \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{f(\xi)}\,d\xi \right) \left[ \left( f(x)\frac{df(x)}{dx} - \alpha(x) f(x) - \beta(x) \right) - \left( g(x)\frac{dg(x)}{dx} - \alpha(x) g(x) - \beta(x) \right) + \frac{\alpha(x)}{2 f(x)} \left( f(x) - g(x) \right)^2 \right] .

The sign of the far right term in the large square bracket on the right depends on the sign of α, and this produces the inequalities

\frac{d}{dx}\left\{ \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{f(\xi)}\,d\xi \right) \left[ f^2(x) - g^2(x) \right] \right\} \ge 2 \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{f(\xi)}\,d\xi \right) \left[ \left( f(x)\frac{df(x)}{dx} - \alpha(x) f(x) - \beta(x) \right) - \left( g(x)\frac{dg(x)}{dx} - \alpha(x) g(x) - \beta(x) \right) \right]   (α ≥ 0)        (3.11a)

\frac{d}{dx}\left\{ \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{f(\xi)}\,d\xi \right) \left[ f^2(x) - g^2(x) \right] \right\} \le 2 \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{f(\xi)}\,d\xi \right) \left[ \left( f(x)\frac{df(x)}{dx} - \alpha(x) f(x) - \beta(x) \right) - \left( g(x)\frac{dg(x)}{dx} - \alpha(x) g(x) - \beta(x) \right) \right]   (α ≤ 0) .        (3.11b)

Integrating (3.11) gives

f^2(X) - g^2(X) \ge \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{f(x)}\,dx \right) \left[ f^2(x_1) - g^2(x_1) \right] + 2 \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{f(\xi)}\,d\xi \right) \left[ \left( f(x)\frac{df(x)}{dx} - \alpha(x) f(x) - \beta(x) \right) - \left( g(x)\frac{dg(x)}{dx} - \alpha(x) g(x) - \beta(x) \right) \right] dx   (when α ≥ 0)        (3.12a)

f^2(X) - g^2(X) \le \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{f(x)}\,dx \right) \left[ f^2(x_1) - g^2(x_1) \right] + 2 \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{f(\xi)}\,d\xi \right) \left[ \left( f(x)\frac{df(x)}{dx} - \alpha(x) f(x) - \beta(x) \right) - \left( g(x)\frac{dg(x)}{dx} - \alpha(x) g(x) - \beta(x) \right) \right] dx   (when α ≤ 0) .        (3.12b)

We can reproduce (3.10) by using (3.12) with f = γ and g = U and then using (2.2). Another inequality can be obtained by using (3.12) with g = γ and f = U and then using (2.2). The result is

U^2(X) \ge \gamma^2(X) + \left[ A^2 - \gamma^2(x_1) \right] \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{U(x)}\,dx \right) - 2 \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{U(\xi)}\,d\xi \right) \left[ \gamma(x)\frac{d\gamma(x)}{dx} - \alpha(x)\gamma(x) - \beta(x) \right] dx   when α ≥ 0        (3.13a)

U^2(X) \le \gamma^2(X) + \left[ A^2 - \gamma^2(x_1) \right] \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{U(x)}\,dx \right) - 2 \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{U(\xi)}\,d\xi \right) \left[ \gamma(x)\frac{d\gamma(x)}{dx} - \alpha(x)\gamma(x) - \beta(x) \right] dx   when α ≤ 0 .        (3.13b)

The inequalities (3.6), (3.10), and (3.13) apply to any γ ∈ A(X). It is evident from (3.6a) (or, equivalently, (3.10a)) that a necessary condition for the existence of a solution U to (2.2) when α ≥ 0 is that the right side of (3.6a) (or, equivalently, (3.10a)) be positive for every γ ∈ A(X). Conversely, if α ≥ 0 and we are able to find any γ ∈ A(X) that produces a negative or zero value for the right side of (3.6a) (or, equivalently, (3.10a)), then there is no solution U to (2.2).

4. Uniqueness of Solutions

Given that there exists a solution U to (2.2), suppose some function γ is also a solution. By assumption we have γ ∈ A(x2), which implies γ ∈ A(X) for each X ∈ [x1, x2], so (3.10) and (3.13) apply to each X ∈ [x1, x2]. Because γ satisfies (2.2), (3.10a) reduces to U²(X) ≤ γ²(X) when α ≥ 0, while (3.13a) reduces to U²(X) ≥ γ²(X) when α ≥ 0, implying that U²(X) = γ²(X) when α ≥ 0. This conclusion together with the fact that U(X) and γ(X) are positive implies that U(X) = γ(X) when α ≥ 0. Similar arguments utilizing (3.10b) and (3.13b) conclude that U(X) = γ(X) when α ≤ 0. These conclusions apply to arbitrary X ∈ [x1, x2], which implies uniqueness of solutions if either α ≥ 0 or α ≤ 0.

5. A Variational Principle

Recall from Section 3 that the equalities in (3.6) apply when γ = U. The inequalities for arbitrary γ ∈ A(X), together with the equalities when γ = U, imply that, for any X ∈ [x1, x2], we have

U^2(X) = \min_{\gamma \in A(X)} \left\{ A^2 \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{\gamma(x)}\,dx \right) + \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{\gamma(\xi)}\,d\xi \right) \left[ \alpha(x)\gamma(x) + 2\beta(x) \right] dx \right\}   when α ≥ 0        (5.1a)

U^2(X) = \max_{\gamma \in A(X)} \left\{ A^2 \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{\gamma(x)}\,dx \right) + \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{\gamma(\xi)}\,d\xi \right) \left[ \alpha(x)\gamma(x) + 2\beta(x) \right] dx \right\}   when α ≤ 0 .        (5.1b)

The function γ that minimizes the curly bracket in (5.1) when α ≥ 0 is U, and the function γ that maximizes the curly bracket in (5.1) when α ≤ 0 is U. Alternate expressions are obtained by combining (5.1) with (3.9) to get

U^2(X) = \min_{\gamma \in A(X)} \left\{ \gamma^2(X) + \left[ A^2 - \gamma^2(x_1) \right] \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{\gamma(x)}\,dx \right) - 2 \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{\gamma(\xi)}\,d\xi \right) \left[ \gamma(x)\frac{d\gamma(x)}{dx} - \alpha(x)\gamma(x) - \beta(x) \right] dx \right\}   when α ≥ 0        (5.2a)

U^2(X) = \max_{\gamma \in A(X)} \left\{ \gamma^2(X) + \left[ A^2 - \gamma^2(x_1) \right] \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{\gamma(x)}\,dx \right) - 2 \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{\gamma(\xi)}\,d\xi \right) \left[ \gamma(x)\frac{d\gamma(x)}{dx} - \alpha(x)\gamma(x) - \beta(x) \right] dx \right\}   when α ≤ 0 .        (5.2b)

Again, the function γ that minimizes the curly bracket in (5.2) when α ≥ 0 is U, and the function γ that maximizes the curly bracket in (5.2) when α ≤ 0 is U.
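To illustrate how the curly bracket in (5.1a) can be evaluated for a candidate trial function, here is a minimal Octave sketch in the style of Appendix C (not part of the original report; the names are hypothetical and alpha, bta, gam are assumed to be vectorized function handles):

% Evaluates the curly bracket of (5.1a) for a trial function gam by nested
% quadrature. By (3.6a) the result is an upper bound for U(X)^2 when
% alpha >= 0, and by (3.6b) a lower bound when alpha <= 0.
function F = bracket51a(alpha, bta, gam, x1, X, A)
  % E(x) = exp( int_x^X alpha(xi)/gam(xi) dxi ), evaluated at a scalar x
  E = @(x) exp(quadgk(@(xi) alpha(xi)./gam(xi), x, X));
  % integrand of the second term; arrayfun makes it safe for vector input
  igrnd = @(x) arrayfun(@(s) E(s).*(alpha(s).*gam(s) + 2*bta(s)), x);
  F = A^2*E(x1) + quadgk(igrnd, x1, X);
endfunction

For example, with alpha = @(x) ones(size(x)), bta = @(x) -x, and the admissible trial function gam = @(x) 1 + x, the returned F is an upper bound for U²(X).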

6. A Conservative Estimate of Function Value Error When α ≥ 0

An error estimate derived here applies to the case in which α ≥ 0 and an upper bound estimate, denoted UUB, for U has been found, and the goal is to estimate the error when using this upper bound as an approximation for U. One way to obtain an upper bound that has a reasonable chance of being a fairly good approximation for U is to select some convenient γ ∈ A(X) that is suspected of being a rough (or better) approximation for U and define UUB to be the positive square root of the right side of (3.5a). We conclude from (3.5a) that this choice will give

U(x) \le U_{UB}(x)   at each x ∈ [x1, X] .        (6.1)

The analysis in this section does not require that UUB be calculated from the right side of (3.5a) but does require that it satisfy (6.1). A conservative estimate of the error in the approximation U(X) ≈ UUB(X) is derived by bracketing U between the upper bound UUB and a lower bound that is constructed as follows. It follows from (6.1), together with U being positive (implying that UUB is positive), that

U(x) \ge \frac{U^2(x)}{U_{UB}(x)}

and combining this with (2.2a) while using α ≥ 0 gives

U(x)\frac{dU(x)}{dx} \ge \frac{\alpha(x)}{U_{UB}(x)} U^2(x) + \beta(x)

which can be rewritten as

\frac{d}{dx} U^2(x) \ge 2\frac{\alpha(x)}{U_{UB}(x)} U^2(x) + 2\beta(x) .

Multiplying both sides by a common factor gives

\exp\left( -2\int_{x_1}^{x} \frac{\alpha(\xi)}{U_{UB}(\xi)}\,d\xi \right) \left[ \frac{d}{dx} U^2(x) - 2\frac{\alpha(x)}{U_{UB}(x)} U^2(x) \right] \ge 2 \exp\left( -2\int_{x_1}^{x} \frac{\alpha(\xi)}{U_{UB}(\xi)}\,d\xi \right) \beta(x) .

Rewrite the left side to get

\frac{d}{dx}\left[ U^2(x) \exp\left( -2\int_{x_1}^{x} \frac{\alpha(\xi)}{U_{UB}(\xi)}\,d\xi \right) \right] \ge 2 \exp\left( -2\int_{x_1}^{x} \frac{\alpha(\xi)}{U_{UB}(\xi)}\,d\xi \right) \beta(x)

and integrate from x1 to X while using U(x1) = A to get

U(X) \ge U_{LB}(X)        (6.2)

where the lower bound ULB is defined to be the positive quantity satisfying

U_{LB}^2(X) = \exp\left( 2\int_{x_1}^{X} \frac{\alpha(x)}{U_{UB}(x)}\,dx \right) \left[ A^2 + 2\int_{x_1}^{X} \exp\left( -2\int_{x_1}^{x} \frac{\alpha(\xi)}{U_{UB}(\xi)}\,d\xi \right) \beta(x)\,dx \right] .        (6.3)

The inequalities ULB ≤ U ≤ UUB can be manipulated into

0 \le U_{UB}^2(X) - U^2(X) \le \Delta_{FV}(X)        (6.4)

where ΔFV(X) is defined by

\Delta_{FV}(X) \equiv U_{UB}^2(X) - U_{LB}^2(X)        (6.5)

and is calculated for a given UUB by using (6.3). Note that (6.4) interprets ΔFV(X) as a conservative estimate of the error obtained by approximating U²(X) with UUB²(X). The subscript FV used with ΔFV(X) indicates that the error refers to error in the function value, with the function being U². By using integrable combinations derived from differential identities (similar to the steps used to obtain (3.9)), together with (6.3) and (6.5), it can be shown that an equation for calculating ΔFV(X) that is an alternative to (6.5) is

\Delta_{FV}(X) = \left[ U_{UB}^2(x_1) - A^2 \right] \exp\left( 2\int_{x_1}^{X} \frac{\alpha(x)}{U_{UB}(x)}\,dx \right) + 2\int_{x_1}^{X} \exp\left( 2\int_{x}^{X} \frac{\alpha(\xi)}{U_{UB}(\xi)}\,d\xi \right) \left[ U_{UB}(x)\frac{dU_{UB}(x)}{dx} - \alpha(x)U_{UB}(x) - \beta(x) \right] dx .        (6.6)
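The computation implied by (6.3) and (6.5) is compact enough to sketch in the same Octave style as Appendix C (hypothetical names; alpha, bta, and Uub are assumed to be vectorized function handles, with Uub satisfying (6.1)):

% Conservative function-value error of Section 6 (alpha >= 0), from (6.3)/(6.5).
function Dfv = fv_error(alpha, bta, Uub, x1, X, A)
  r  = @(x) alpha(x)./Uub(x);
  w  = @(x) exp(-2*quadgk(r, x1, x));          % exp( -2 int_x1^x alpha/Uub )
  ig = @(x) arrayfun(@(s) w(s).*bta(s), x);    % integrand in (6.3), vector-safe
  Ulb2 = (A^2 + 2*quadgk(ig, x1, X)) / w(X);   % (6.3); note 1/w(X) = exp(+2 int)
  Dfv  = Uub(X).^2 - Ulb2;                     % (6.5)
endfunction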

7. A Conservative Estimate of Function Value Error When α ≤ 0

An error estimate derived here applies to the case in which α ≤ 0 and a lower bound estimate, denoted ULB, for U has been found, and the goal is to estimate the error when using this lower bound as an approximation for U. One way to obtain a lower bound that has a reasonable chance of being a fairly good approximation for U is to select some convenient γ ∈ A(X) that is suspected of being a rough (or better) approximation for U and define ULB to be the positive square root of the right side of (3.5b). We conclude from (3.5b) that this choice will give

U(x) \ge U_{LB}(x)   at each x ∈ [x1, X] .        (7.1)

The analysis in this section does not require that ULB be calculated from the right side of (3.5b) but does require that it satisfy (7.1). A conservative estimate of the error in the approximation U(X) ≈ ULB(X) is derived by bracketing U between the lower bound ULB and an upper bound that is constructed as follows. Using (7.1) together with α ≤ 0 gives

\alpha(x) U(x) \le \alpha(x) U_{LB}(x)

and combining this with (2.2a) gives

U(x)\frac{dU(x)}{dx} \le \alpha(x) U_{LB}(x) + \beta(x) .

Integrating the above while using U(x1) = A gives

U(X) \le U_{UB}(X)        (7.2)

where the upper bound UUB is the positive quantity satisfying

U_{UB}^2(X) = A^2 + 2\int_{x_1}^{X} \left[ \alpha(x) U_{LB}(x) + \beta(x) \right] dx .        (7.3)

The inequalities ULB ≤ U ≤ UUB can be manipulated into

0 \le U^2(X) - U_{LB}^2(X) \le \Delta_{FV}(X)        (7.4)

where ΔFV(X) is defined by

\Delta_{FV}(X) \equiv U_{UB}^2(X) - U_{LB}^2(X)        (7.5)

and is calculated for a given ULB by using (7.3). Note that (7.4) interprets ΔFV(X) as a conservative estimate of the error obtained by approximating U²(X) with ULB²(X). The subscript FV used with ΔFV(X) indicates that the error refers to error in the function value, with the function being U².
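A minimal sketch of this computation in the same Octave style as Appendix C (a hypothetical fragment: x1, X, A are assumed to be in the workspace, and alpha, bta, Ulb to be vectorized function handles with Ulb satisfying (7.1)):

% Conservative function-value error of Section 7 (alpha <= 0), from (7.3)/(7.5).
Uub2 = A^2 + 2*quadgk(@(x) alpha(x).*Ulb(x) + bta(x), x1, X);   % (7.3)
Dfv  = Uub2 - Ulb(X).^2;                                        % (7.5)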

8. The Best Available Estimate

An available estimate refers to a family of functions that a trial function γ, intended to be an approximation for U, is selected from. The user selects this family, probably for computational convenience, but the choice for whatever reason is made by the user. We indicate that γ is from a user-selected family having n adjustable parameters by writing γ(x, k1, k2,…, kn) instead of writing γ(x), where the parameters k1, k2,…, kn identify which element of the user-selected family the function γ(*, k1, k2,…, kn) is (Footnote 6). We shorten the notation by writing γ(*, k) with the understanding that γ(*, k) is abbreviated notation for γ(*, k1, k2,…, kn). At any point of evaluation X ∈ [x1, x2] at which U² is to be estimated, the theory will require that γ(*, k) ∈ A(X) (recall that A(X) is the set of admissible functions defined in Section 3). This will typically impose some constraint on the allowed values of k1, k2,…, kn. After the user selects a family of trial functions, the user is then required to determine the set of n-tuples, denoted K(X), having the property that γ(*, k) ∈ A(X) if and only if (k1, k2,…, kn) ∈ K(X). (See Appendix A for an example family of trial functions, and Appendix B for example applications.) We shorten the notation by writing this condition as k ∈ K(X). The best available estimate of U² from the user-selected family of trial functions is defined to be the minimum (if α ≥ 0) or maximum (if α ≤ 0) given by (5.1), but with the minimum or maximum over A(X) replaced by the minimum or maximum over that subset of A(X) that consists of the user-selected family of trial functions. Therefore the best available estimate, denoted UBE², is given by

U_{BE}^2(X) = \min_{k \in K(X)} \left\{ A^2 \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{\gamma(x,k)}\,dx \right) + \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{\gamma(\xi,k)}\,d\xi \right) \left[ \alpha(x)\gamma(x,k) + 2\beta(x) \right] dx \right\}   when α ≥ 0        (8.1a)

U_{BE}^2(X) = \max_{k \in K(X)} \left\{ A^2 \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{\gamma(x,k)}\,dx \right) + \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{\gamma(\xi,k)}\,d\xi \right) \left[ \alpha(x)\gamma(x,k) + 2\beta(x) \right] dx \right\}   when α ≤ 0 .        (8.1b)

Footnote 6: The notation γ(*, k1, k2,…, kn) denotes a function of a single variable. It is the function that γ(x, k1, k2,…, kn) is of x, with the function itself dependent on the values given to k1, k2,…, kn.

Alternate expressions are obtained by combining (8.1) with (3.9) to get

U_{BE}^2(X) = \min_{k \in K(X)} \left\{ \gamma^2(X,k) + \left[ A^2 - \gamma^2(x_1,k) \right] \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{\gamma(x,k)}\,dx \right) - 2\int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{\gamma(\xi,k)}\,d\xi \right) \left[ \gamma(x,k)\frac{\partial\gamma(x,k)}{\partial x} - \alpha(x)\gamma(x,k) - \beta(x) \right] dx \right\}   when α ≥ 0        (8.2a)

U_{BE}^2(X) = \max_{k \in K(X)} \left\{ \gamma^2(X,k) + \left[ A^2 - \gamma^2(x_1,k) \right] \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{\gamma(x,k)}\,dx \right) - 2\int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{\gamma(\xi,k)}\,d\xi \right) \left[ \gamma(x,k)\frac{\partial\gamma(x,k)}{\partial x} - \alpha(x)\gamma(x,k) - \beta(x) \right] dx \right\}   when α ≤ 0 .        (8.2b)

The minimum in (8.1a) is over a subset of the set that the minimum in (5.1a) is over, so

U^2(X) \le U_{BE}^2(X)   when α ≥ 0 .        (8.3a)

Similarly,

U^2(X) \ge U_{BE}^2(X)   when α ≤ 0 .        (8.3b)

It is evident from (8.1) that the best available estimate satisfies the endpoint condition

U_{BE}(x_1) = A .        (8.4)

A review of the derivation of the inequality (3.6a) will show that if α > 0 (Footnote 7) and if X > x1, then the strict inequality in (3.6a) applies unless γ(x) = U(x) for all x ∈ [x1, X]. This implies the strict inequality in (8.3a). In other words, if α > 0 and X > x1, and the user-selected family of trial functions does not contain the solution U, then we have the strict inequality U(X) < UBE(X). Similar arguments conclude that if α < 0 and X > x1, and the user-selected family of trial functions does not contain the solution U, then we have the strict inequality U(X) > UBE(X). However, an interesting special case is that in which α = 0. We can see by comparing (8.1) to the integral of (2.2a) that any user-selected family of trial functions that are admissible functions will produce U(X) = UBE(X) when α = 0.

Footnote 7: "α > 0" is abbreviated notation for α(x) > 0 for each x ∈ [x1, x2]. "α < 0" is abbreviated notation for α(x) < 0 for each x ∈ [x1, x2]. "α = 0" is abbreviated notation for α(x) = 0 for each x ∈ [x1, x2].
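For the α = 0 special case noted above, the comparison can be written out (a restatement of the text in equation form): every exponential in (8.1) equals one and the dependence on γ(*, k) disappears, so

U_{BE}^2(X) = A^2 + 2\int_{x_1}^{X} \beta(x)\,dx ,

while integrating (2.2a) with α = 0 gives the same expression for U²(X); hence UBE(X) = U(X) for any admissible family when α = 0.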

9. Right-Side Error of the Best Available Estimate

When U is the quantity to be solved for and Uapprox is some approximation for U, an error measure that is sometimes of interest is the difference between the differential equation satisfied by U and the exact differential equation satisfied by Uapprox. To be more specific, recall that U satisfies

U(x)\frac{dU(x)}{dx} = \alpha(x) U(x) + \beta(x) .

If Uapprox exactly satisfies

U_{approx}(x)\frac{dU_{approx}(x)}{dx} = \alpha(x) U_{approx}(x) + \beta(x) + \Delta_{RS}(x)

for some function ΔRS(x), then the right-side error is defined here to be ΔRS(x). Continuing with the terminology and notation of Section 8, the goal of this section is to derive the right-side error when the approximating function is UBE. Note that UBE satisfies the endpoint condition (8.4), so the right-side error completely controls the accuracy of approximating U with UBE. For any given k ∈ K(X) we define the estimate denoted UE to be the positive quantity given by

U_E^2(X,k) = A^2 \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{\gamma(x,k)}\,dx \right) + \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{\gamma(\xi,k)}\,d\xi \right) \left[ \alpha(x)\gamma(x,k) + 2\beta(x) \right] dx .        (9.1a)

An equivalent expression, obtained by using (9.1a) with (3.9), is

U_E^2(X,k) = \gamma^2(X,k) + \left[ A^2 - \gamma^2(x_1,k) \right] \exp\left( \int_{x_1}^{X} \frac{\alpha(x)}{\gamma(x,k)}\,dx \right) - 2\int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{\gamma(\xi,k)}\,d\xi \right) \left[ \gamma(x,k)\frac{\partial\gamma(x,k)}{\partial x} - \alpha(x)\gamma(x,k) - \beta(x) \right] dx .        (9.1b)

Using (9.1a) together with (8.1), or (9.1b) together with (8.2), we have

U_{BE}^2(X) = \min_{k \in K(X)} U_E^2(X,k)   if α ≥ 0        (9.2a)

U_{BE}^2(X) = \max_{k \in K(X)} U_E^2(X,k)   if α ≤ 0 .        (9.2b)

Let the k's that produce the extreme values in (9.2) be given by k1 = K1(X), …, kn = Kn(X). We abbreviate this notation by writing k = K(X), and (9.2) gives

U_{BE}^2(X) = U_E^2(X, K(X)) .        (9.3)

For any particular i = 1, …, n, we consider three possibilities for ki. The first possibility is that the user restricted the selected family of trial functions (perhaps for analytical convenience) by assigning a definite value to ki. In other words, the assigned value of ki is part of the definition of the user-defined family of trial functions. This makes Ki(X) independent of X, so

\frac{d K_i(X)}{dX} = 0   (first possibility).        (9.4a)

The second possibility is that the minimum or maximum in (9.2) allowed ki to be varied but the minimizing or maximizing value happened to be independent of X. This also makes Ki(X) independent of X, so

\frac{d K_i(X)}{dX} = 0   (second possibility).        (9.4b)

The third possibility is that the minimum or maximum in (9.2) allowed ki to be varied and the minimizing or maximizing value does depend on X. The theory in this section applies to the third possibility if the minimum or maximum is a relative minimum or maximum, i.e., if

0 = \left[ \frac{\partial}{\partial k_i} U_E^2(X,k) \right]_{k = K(X)}   (third possibility).        (9.4c)

With (9.4) describing the cases considered, we can now calculate the X derivative of UBE²(X). Applying the chain rule to (9.3) gives

\frac{d}{dX} U_{BE}^2(X) = \left[ \frac{\partial}{\partial X} U_E^2(X,k) \right]_{k=K(X)} + \sum_{i=1}^{n} \left[ \frac{\partial}{\partial k_i} U_E^2(X,k) \right]_{k=K(X)} \frac{d K_i(X)}{dX} .

Given that for each X ∈ (x1, x2] one of the three possibilities in (9.4) applies, the above equation becomes

\frac{d}{dX} U_{BE}^2(X) = \left[ \frac{\partial}{\partial X} U_E^2(X,k) \right]_{k=K(X)}   for each X ∈ (x1, x2] .        (9.5)

The next step takes the partial derivative of (9.1a) with respect to X and then uses (9.1a) to substitute for one of the terms in the derivative. The result can be written as

\frac{\partial}{\partial X} U_E^2(X,k) = \frac{\alpha(X)}{\gamma(X,k)} U_E^2(X,k) + \alpha(X)\gamma(X,k) + 2\beta(X) .

Evaluating at k = K(X) while using (9.3) gives

\left[ \frac{\partial}{\partial X} U_E^2(X,k) \right]_{k=K(X)} = \frac{\alpha(X)}{\gamma(X,K(X))} U_{BE}^2(X) + \alpha(X)\gamma(X,K(X)) + 2\beta(X)

so (9.5) becomes

\frac{d}{dX} U_{BE}^2(X) = \frac{\alpha(X)}{\gamma(X,K(X))} U_{BE}^2(X) + \alpha(X)\gamma(X,K(X)) + 2\beta(X)

which can be rewritten as

U_{BE}(X)\frac{dU_{BE}(X)}{dX} = \alpha(X) U_{BE}(X) + \beta(X) + \frac{\alpha(X)}{2\gamma(X,K(X))} \left[ U_{BE}(X) - \gamma(X,K(X)) \right]^2   for each X ∈ (x1, x2] .

Because this applies to each X ∈ (x1, x2], we can change the notation by writing x instead of X to get

U_{BE}(x)\frac{dU_{BE}(x)}{dx} = \alpha(x) U_{BE}(x) + \beta(x) + \Delta_{RS}(x)   for each x ∈ (x1, x2]        (9.6)

where ΔRS(x) is the right-side error given by

\Delta_{RS}(x) = \frac{\alpha(x)}{2\gamma(x,K(x))} \left[ U_{BE}(x) - \gamma(x,K(x)) \right]^2 .        (9.7)

As previously stated, UBE satisfies the endpoint condition (8.4), so the right-side error completely controls the error between UBE and U. Recall from the last paragraph in Section 8 that if X > x1 and either α > 0 or α < 0, and if the user-selected family of trial functions does not contain the solution U, then UBE(X) ≠ U(X). However, UBE satisfies the endpoint condition (8.4), so the inequality UBE(X) ≠ U(X) implies that there is some x ∈ (x1, X] such that ΔRS(x) ≠ 0. In summary, if X > x1 and either α > 0 or α < 0, and if the user-selected family of trial functions does not contain the solution U, then there is some x ∈ (x1, X] such that ΔRS(x) ≠ 0.

10. A Critique of the Significance of J(X)

In this section we confine our attention to those cases in which the user-selected family of trial functions satisfies the endpoint condition

\gamma(x_1, k) = A   for each k ∈ K(X) and each X ∈ (x1, x2] .        (10.1)

An interesting choice for the fitting parameter k is J(X), which is selected to satisfy

\int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{\alpha(\xi)}{\gamma(\xi,J(X))}\,d\xi \right) \left[ \gamma(x,J(X))\frac{\partial\gamma(x,J(X))}{\partial x} - \alpha(x)\gamma(x,J(X)) - \beta(x) \right] dx = 0 .        (10.2)

Note that (10.2) states that J(X) is selected to make the weighted average of the square bracket equal to zero when the weight function is proportional (to satisfy a normalization condition for a weighted-average interpretation) to the exponential function in the integral. What makes this choice interesting is that (10.1), (10.2), (9.1b), and the fact that UE and γ are both positive, imply

U_E(X, J(X)) = \gamma(X, J(X)) .

This applies to arbitrary X ∈ [x1, x2], so we can select an X > x1 and write the condition as

U_E(x, J(x)) = \gamma(x, J(x))   for each x ∈ [x1, X] .        (10.3)

The special condition (10.3) makes J an interesting choice for the fitting parameters, and a reasonable question is whether K(x) = J(x) for each x ∈ [x1, X]. Unfortunately, a "yes" answer applies only to special cases. In particular, if X > x1 and either α > 0 or α < 0, and if the user-selected family of trial functions does not contain the solution, then there is an x ∈ [x1, X] such that K(x) ≠ J(x). This is proven by contradiction. Assume that K(x) = J(x) for each x ∈ [x1, X], so (9.3) gives UBE(x) = UE(x, K(x)) = UE(x, J(x)) and (10.3) gives UE(x, J(x)) = γ(x, J(x)) = γ(x, K(x)). Combining these equations gives UBE(x) = γ(x, K(x)), which implies that the right-side error in (9.7) is zero for each x ∈ [x1, X]. This contradicts the conclusion in the last paragraph of Section 9, which states that if X > x1 and either α > 0 or α < 0, and if the user-selected family of trial functions does not contain the solution U, then there is some x ∈ (x1, X] such that ΔRS(x) ≠ 0. The conclusion here is that the condition K(x) = J(x) applies only to special cases. A special case in which the condition K(x) = J(x) does apply is that in which the family of trial functions does contain the exact solution U. In other words, there is some k0 ∈ K(x2) such that γ(*, k0) = U. This will give K(X) = k0, which is a solution to (10.2) because the square bracket inside the integral is zero, so K(X) = J(X) when the family of trial functions contains the exact solution. Also, if the family of trial functions contains the exact solution, then UBE = γ(*, K(X)) = γ(*, k0) = U and the right-side error is zero. However, while selecting k to satisfy k = J(X) does not produce the best available estimate, except for special cases, this choice might (depending on the example, but the examples in Appendix B are not such examples) make the estimate UE(X, J(X)) a good enough estimate of U(X) to satisfy the user's accuracy requirements if the selected family of trial functions was a good choice. Accuracy tests were given in Sections 6 and 7.


Appendix A: Available Estimates from a Specific Family of Trial Functions

Concepts such as the set K(X) might seem fairly abstract in the general context, but an illustrative example might add some clarity. To avoid obscuring basic concepts with distracting details we consider a simplified version of (2.2a) in which α(x) = 1 for all x, so (2.2) becomes

U(x)\frac{dU(x)}{dx} = U(x) + \beta(x)   at each x ∈ (x1, x2)        (A.1a)

U(x1) = A        (A.1b)

U(x) is real and positive at each x ∈ [x1, x2].        (A.1c)

If existence of a solution to (A.1) imposes any restrictions on the interval endpoint x2, then x2 must be small enough for a solution to exist but is otherwise an arbitrary number greater than x1. The special case (A.1a) has more generality than it might appear to have. The change in variables

V(x) = U\left( \int_{x_1}^{x} \alpha(\xi)\,d\xi \right)        (A.2)

will convert an equation for U having the structure of (A.1a) into an equation for V having the structure of (2.2a), but with the function β in (2.2a) replaced by αβ to obtain the V version of (2.2a). The equation for V will be separable if the β in (A.1a) is a constant, because this makes the αβ that appears in the V version of (2.2a) proportional to α. The example family of trial functions selected for illustration here has the property that the best available estimate is the exact solution for U given by (A.1) for the special case in which the function β in (A.1a) is a constant. As explained in the previous paragraph, this solution together with the change in variables (A.2) can be used to obtain a solution to (2.2a) when (2.2a) is separable. The family γ(*, k) used here has the property that k is a real number and γ(*, k) is defined by

\gamma(x,k)\frac{\partial\gamma(x,k)}{\partial x} = \gamma(x,k) + k   for x ∈ (x1, X] when k ∈ K(X)        (A.3a)

\gamma(x_1, k) = A   for each k ∈ K(X) and each X ∈ (x1, x2] .        (A.3b)
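To verify the claim made below (A.2) (a checking step, not new material): write y(x) = \int_{x_1}^{x} \alpha(\xi)\,d\xi so that V(x) = U(y(x)). If U satisfies an equation with the structure of (A.1a), then the chain rule gives

V(x)\frac{dV(x)}{dx} = \alpha(x)\, U(y)\frac{dU(y)}{dy} = \alpha(x)\left[ U(y) + \beta(y) \right] = \alpha(x) V(x) + \alpha(x)\beta(y(x)) ,

which has the structure of (2.2a) with the β there replaced by αβ (the β being evaluated at the transformed argument).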

An explicit expression for the set K(X) is constructed as follows. The condition k ∈ K(X) was defined to mean that

\gamma(*, k) \in A(X) .        (A.4)

First consider the case in which k > −A. The right side of (A.3a) will be positive in a neighborhood of the point x1, and the coefficient γ(x, k) of the derivative on the left is also positive in a sufficiently small neighborhood of x1 (because of (A.3b)), so the derivative is positive, which implies that γ(x, k) is increasing in x, which implies that the right side of (A.3a) increases and therefore remains positive (it does not change sign). Therefore, γ(x, k) is increasing in x without any restriction on the x-interval for this case, so X can be as large as desired and still satisfy (A.4). In other words, X can be arbitrarily large and we will still have k ∈ K(X) when k > −A. Now consider the case in which k < −A. The right side of (A.3a) will be negative in a neighborhood of the point x1, and the coefficient γ(x, k) of the derivative on the left is positive, so the derivative is negative, which implies that γ(x, k) is decreasing in x, which implies that the right side of (A.3a) decreases and therefore remains negative (it does not change sign). However, this conclusion applies only as long as the coefficient γ(x, k) on the left side of (A.3a) is positive. When positive, γ(x, k) is decreasing in x for this case. There is now a restriction on how large the x-interval can be and still have γ(x, k) > 0 throughout the interval. Stated another way, for a given X, there is a restriction on how negative k can be and still satisfy (A.4). This restriction can be found by using elementary methods to solve (A.3) to get

\gamma(x,k) - k \ln\left| \frac{\gamma(x,k) + k}{A + k} \right| = x - x_1 + A   (when k ≠ −A).

It was concluded above that γ(x, k) + k does not change sign, so it has the same sign as A + k, which allows the absolute values in the argument of the logarithm to be omitted, and the result is

\gamma(x,k) - k \ln\left( \frac{\gamma(x,k) + k}{A + k} \right) = x - x_1 + A   (when k ≠ −A).        (A.5)

A value of k that is too negative, because it results in γ(X, k) = 0, is a limiting value of k and is denoted KL(X). It is determined by evaluating (A.5) at x = X and setting γ(X, k) = 0 to get

-K_L(X) \ln\left( \frac{K_L(X)}{A + K_L(X)} \right) = X - x_1 + A

or

K_L(X) \ln\left( 1 + \frac{A}{K_L(X)} \right) = X - x_1 + A .        (A.6a)

It can be shown that there exists a unique KL(X) satisfying (A.6a), and it also satisfies (Footnote 8)

K_L(X) < -A .        (A.6b)

Footnote 8: The logarithm in (A.6a) is undefined if KL(X) is between −A and zero, so there is no solution in that interval. The inequality ln(1 + x) ≤ x (when x > −1) implies that there is no positive solution. The left side of (A.6a) increases without bound as KL(X) approaches −A from below, and the left side approaches A (which is less than the right side) as KL(X) → −∞, so there is at least one solution that is less than −A. Differentiating the left side of (A.6a) and using the inequality ln(1 + x) > x/(1 + x) (when x > −1 and x ≠ 0) shows that the left side is strictly increasing, so the solution is unique.

The condition that is equivalent to (A.4), but expressed as an explicit requirement on k, is

\gamma(*, k) \in A(X)   if and only if   k > K_L(X)        (A.7)

where KL(X) is given by (A.6).
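As a concrete illustration (an addition in the style of the Octave routines of Appendix C, not part of the original text), KL(X) can be computed from (A.6a) by root finding. The bracket below −A is justified by footnote 8; the specific lower endpoint −50A is an assumption that suffices unless X − x1 is extremely small.

% Solve (A.6a) for KL(X): KL*log(1 + A/KL) = X - x1 + A, with KL < -A.
x1 = 0; A = 1; X = 2;                            % example values from (B.1)
f  = @(k) k.*log(1 + A./k) - (X - x1) - A;       % zero at k = KL(X)
KL = fzero(f, [-50*A, -A - 1e-9])                % root lies below -A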

The relevant equation in (8.2) is (8.2a) because α = 1. The minimum in (8.2a) is the minimum over k with k ∈ K(X), and this is the minimum over k with k > KL(X). Using this fact together with (A.3), the best available estimate applied to the case α = 1 can be written as

U_{BE}^2(X) = \min_{k > K_L(X)} \left\{ \gamma^2(X,k) + 2 \int_{x_1}^{X} \exp\left( \int_{x}^{X} \frac{d\xi}{\gamma(\xi,k)} \right) \left[ \beta(x) - k \right] dx \right\}   when α = 1 .        (A.8)

For the special case in which the function β in (A.1a) is a constant, with the value consistent with U ∈ A(X), it is evident that γ(*, k) given by (A.3) for some suitably selected k, which will not depend on X, will be the exact solution U for this special case. For a more general case, an approximate solution is the best available estimate given by (A.8). The flexibility provided by allowing k to depend on X (i.e., selecting k to be the best choice from the set of values satisfying k > KL(X)), instead of insisting that the same k be used for all X, improves the accuracy of the approximation. An alternate and simpler expression for the right side of (A.8) can be obtained by rearranging (A.3a) into

\frac{1}{\gamma(x,k)} = \frac{1}{\gamma(x,k)+k}\,\frac{\partial\gamma(x,k)}{\partial x} = \frac{\partial}{\partial x} \ln\left( \frac{\gamma(x,k)+k}{A+k} \right)

where the absolute value was omitted from the argument of the logarithm for reasons explained in the discussion above (A.5). Integrating the above gives

\int_{x}^{X} \frac{d\xi}{\gamma(\xi,k)} = \ln\left( \frac{\gamma(X,k)+k}{\gamma(x,k)+k} \right)   (when k ≠ −A).        (A.9a)

The right side is a simplification compared to the left side except for a difficulty that occurs when k = −A. The solution to (A.3) when k = −A is γ(x, k) = A = −k for all x, so the numerator and denominator in the argument of the logarithm both become zero. This indeterminate form can be evaluated by using γ(ξ, k) = A inside the integral when k = −A to get

\int_{x}^{X} \frac{d\xi}{\gamma(\xi,k)} = \frac{X - x}{A}   (when k = −A).        (A.9b)

Still another expression, applicable when k ≠ 0, is obtained by using (A.5) to rewrite the right side of (A.9a), so the equation becomes

\int_{x}^{X} \frac{d\xi}{\gamma(\xi,k)} = \frac{\gamma(X,k) - \gamma(x,k)}{k} - \frac{X - x}{k}   (when k ≠ 0).        (A.9c)

Using (A.9) we obtain, with some redundancy in the bottom expression below, that

\exp\left( \int_{x}^{X} \frac{d\xi}{\gamma(\xi,k)} \right) =
\begin{cases}
\dfrac{\gamma(X,k)+k}{\gamma(x,k)+k} & \text{when } k \ne -A \\
\exp\left( \dfrac{X-x}{A} \right) & \text{when } k = -A \\
\exp\left( \dfrac{\gamma(X,k)-X}{k} \right) \exp\left( \dfrac{x-\gamma(x,k)}{k} \right) & \text{when } k \ne 0 .
\end{cases}        (A.10)

The final result is given by (A.8) and (A.10). Numerical evaluation of γ(x, k) can be performed by first selecting k > KL(X) and then numerically evaluating γ(x, k), which can be done as follows. If k = −A then γ(x, k) = A. If k > −A then the left side of (A.5), regarded as a function of γ(x, k), is strictly increasing (Footnote 9), so any convenient root-finding routine can easily solve (A.5) for γ(x, k). If KL(X) < k < −A then the left side of (A.5), regarded as a function of γ(x, k), is strictly decreasing so, again, any convenient root-finding routine can easily solve (A.5) for γ(x, k) (Footnote 10). Numerical examples are in Appendix B.

Footnote 9: This can be verified by differentiating the left side with respect to γ and then using the fact that γ(x, k) + k has the same sign as A + k.

Footnote 10: Root-finding routines are more efficient when supplied with bracketing bounds for the solution. Bracketing bounds are as follows. If KL(X) < k < −A then γ(x, k) is decreasing in x, which implies 0 < γ(x, k) ≤ A. If −A < k ≤ 0 then the fact that γ(x, k) is increasing in x, together with the inequality ln(ξ) ≥ 0 when ξ ≥ 1, can be used to show that A ≤ γ(x, k) ≤ x − x1 + A. If k ≥ 0 then the fact that γ(x, k) is increasing in x, together with the inequality ln(1 + ξ) ≤ ξ when ξ > −1, can be used to show that A ≤ γ(x, k) ≤ (1 + k/A)(x − x1) + A.

Appendix B: Numerical Examples

The family of trial functions given by (A.3) in Appendix A is applied to specific numerical examples of the type given by (A.1). The selected numerical examples would, ideally, give an indication of the error that can be expected from the best available estimate UBE(X) for a typical application. Unfortunately there is no definition of a typical application. The next-best goal is to select numerical examples that do not give UBE(X) an accuracy advantage that it would not have in most applications. The discussion below begins with examples that do give UBE(X) an accuracy advantage, so that we will know what to avoid when selecting examples. In the context of (2.2), it is easy to show that UBE(X) given by (8.1) is the exact solution, for any trial function γ, when α = 0. Therefore there is an accuracy advantage for those examples in which α ≈ 0. In the context of (A.1), there is an accuracy advantage when the absolute value of β is large enough to make β the dominant term on the right side of (A.1a). The opposite extreme, in which β is small enough in absolute value to be neglected, also gives an accuracy advantage because this is a special case in which β can be approximated as constant (zero for this case). More generally, the family of trial functions given by (A.3) has an accuracy advantage when β can be approximated by any constant greater than KL(X). In order for an example to not give UBE(X) an accuracy advantage that it would not have in other applications, the example should be selected so that neither term on the right side of (A.1a) can be neglected compared to the other term, and β is not approximately constant. Two such examples are given. For one example β is decreasing in x, and for the other example β is increasing in x. The examples are:

x_1 = 0, \qquad A = 1        (B.1)

U(x)\frac{dU(x)}{dx} = U(x) - x   (Example 1)        (B.2a)

U(x)\frac{dU(x)}{dx} = U(x) + x   (Example 2).        (B.2b)

One motive for this choice of examples is that the exact solutions can be expressed in terms of elementary functions when expressed in parametric form (Footnote 11). This allows the best available estimate to be quantitatively compared to the exact solutions. The parametric equations are

x = \frac{2}{\sqrt{3}}\, e^{t/2} \sin\left( \frac{\sqrt{3}}{2} t \right), \qquad U = \frac{x}{2} + e^{t/2} \cos\left( \frac{\sqrt{3}}{2} t \right)   (Example 1)        (B.3a)

x = \frac{2}{\sqrt{5}}\, e^{t/2} \sinh\left( \frac{\sqrt{5}}{2} t \right), \qquad U = \frac{x}{2} + e^{t/2} \cosh\left( \frac{\sqrt{5}}{2} t \right)   (Example 2).        (B.3b)

Footnote 11: Introducing a parameter t using the equation U = dx/dt converts (B.2) into second-order linear equations with constant coefficients that can be solved using elementary methods.

Fig. A1 compares U to UBE, with UBE calculated via the numerical routines in Appendix C, for each example. The first example is the more interesting, and the greater challenge, because

U(X) < 0 when X ≥ 3.36. All analysis in this report is void when X ≥ 3.36. A change in the sign of U can be predicted even if the exact solution is unknown. When attempting to calculate UBE(X) when X = 3.6, it was found that the right side of (A.8) is negative. Recall from the last paragraph in Section 3 that this implies that there does not exist a solution U that satisfies U(X) > 0 for all X up to 3.6. This, in turn, implies that the theory is void and UBE(X) is meaningless when X ≥ 3.6. However, the figure shows fairly good agreement between U and UBE over the plotted range. Accuracy can be improved, if necessary, by using an iteration in which the new family of trial functions on the right side of (8.1a) consists of the single function UBE, and the left side is the square of the new estimate. Accuracy is much better for Example 2 in Fig. A1. The close view shows that the plot is not a perfect straight line in spite of the appearance of a perfect straight line in the expanded view. The plot does become a perfect straight line in the large-X limit. This is seen from (B.3b). A large value of the parameter t makes the hyperbolic sine nearly equal to the hyperbolic cosine, so U(x) given by (B.3b) becomes proportional to x. For X as large as 2, and even larger, the error between U and UBE is less than the thickness of the plotted curves, so only a single curve is seen in the close view. For the expanded view, the error is barely discernible from the thickness of the curves.
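To make Footnote 11 concrete (a verification of (B.3), not new material): with U = dx/dt, the chain rule gives U dU/dx = dU/dt = d²x/dt², so (B.2a) and (B.2b) become

\frac{d^2 x}{dt^2} - \frac{dx}{dt} + x = 0   (Example 1), \qquad \frac{d^2 x}{dt^2} - \frac{dx}{dt} - x = 0   (Example 2),

with characteristic roots (1 ± i√3)/2 and (1 ± √5)/2, respectively. The solutions satisfying x = 0 and U = dx/dt = A = 1 at t = 0 are (B.3a) and (B.3b); the oscillatory factor in (B.3a) is what eventually drives U negative near X ≈ 3.36.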

Fig. A1: The best available estimate UBE is compared to the exact solution U for two examples. Only one curve is visible in the close view of Example 2 because the error is smaller than the curve thickness.


Appendix C: Numerical Routines

The routines that produced the numerical examples in Appendix B, using the family of trial functions in Appendix A, are given here. The syntax conforms to Octave, which is a GNU package that closely resembles MATLAB. A routine will be described as being above another routine if the former calls (or executes) the latter. The bottom-level routine is the function file named gama.m and is in the first listing below. Prior to calling this routine, values are assigned to x1 (a scalar representing x1), A (a positive scalar representing A), x (which can be a vector in order to be compatible with an integration routine, but if any element is less than x1, that element is treated as being equal to x1 to avoid crashes), and k (a scalar representing k). It is up to the calling code, or user, to ensure that the value assigned to k is greater than KL(z) calculated from (A.6) when z is the largest element of the x vector. Otherwise the routine will crash. When executed, the function routine returns the value (or set of values when x is a vector) of γ(x, k) defined in Appendix A. The routine that is above gama.m is named abel.m and is in the second listing below. Prior to calling this routine, values are assigned to x1 (a scalar representing x1), A (a positive scalar representing A), X (a scalar that is greater than or equal to x1), and k (a scalar representing k). The value assigned to k must be greater than KL(X) calculated from (A.6). This will ensure that the condition given in the previous paragraph regarding k will be satisfied. The function β(x) is defined inside the routine on the first line. The version shown uses β(x) = −x. This can be changed by editing the file. When executed, the routine abel.m calls the function gama and calculates the square root of the curly bracket on the right side of (A.8), which is denoted UE(X, k) in the main text of this report. This quantity is stored in the variable UE. When evaluated at the minimizing k, which requires a top-level routine, UE becomes the best available estimate denoted UBE(X) in the text. Users can write their own top-level routine that assigns parameters needed by abel.m and automates the search for the minimizing k. This top-level routine is the one that must ensure that the value assigned to k is greater than KL(X). When executed, the top-level routine will search for and then display the minimum of UE. The routine will also keep track of whether any calculated square of UE is negative (i.e., UE is imaginary), which implies that the exact solution is not positive at the selected value of X.


gama.m

function P = gama(x1,A,x,k)
  % Returns gamma(x,k) defined by (A.3), found by solving (A.5) numerically.
  % x may be a vector; elements below x1 are treated as x1.
  P = zeros(size(x));
  for i = 1:length(x)
    % Left side of (A.5) minus its right side; zero when z = gamma(x(i),k).
    f = @(z) z - k.*log((z+k)./(A+k)) - x(i) + x1 - A;
    % Bracketing bounds from footnote 10 of Appendix A.
    B1 = 0; B2 = A;                              % case KL(X) < k < -A
    if (k > -A) B1 = A; B2 = x(i)-x1+A; endif
    if (k >= 0) B1 = A; B2 = (1+k./A).*(x(i)-x1)+A; endif
    if (k == -A)
      P(i) = A;                  % exact solution of (A.3) when k = -A
    elseif (x(i) <= x1)
      P(i) = A;                  % treat x < x1 as x = x1, where gamma = A
    else
      P(i) = fzero(f, [B1, B2]); % solve (A.5) by root finding
    endif
  endfor
endfunction
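The listing for abel.m did not survive in this copy of the report, so the following is a reconstruction from the description above rather than the author's original file. It computes UE(X, k), the square root of the curly bracket in (A.8), using (A.10) for the inner exponential and β(x) = −x as described; the use of quadgk is an implementation choice of this sketch.

abel.m (reconstructed sketch)

function UE = abel(x1,A,X,k)
  % UE(X,k) = square root of the curly bracket in (A.8), with beta(x) = -x.
  bta = @(x) -x;                 % beta(x); edit this line to change beta
  gX = gama(x1,A,X,k);           % gamma(X,k)
  if (k == -A)
    w = @(x) exp((X-x)./A);      % exp( int_x^X dxi/gamma ), middle case of (A.10)
  else
    w = @(x) (gX+k)./(gama(x1,A,x,k)+k);   % same quantity, top case of (A.10)
  endif
  UE2 = gX.^2 + 2*quadgk(@(x) w(x).*(bta(x)-k), x1, X);  % curly bracket of (A.8)
  if (UE2 < 0)
    warning("UE^2 < 0: no positive solution exists out to this X");
  endif
  UE = sqrt(UE2);                % an imaginary result accompanies the warning
endfunction

The original leaves the top-level routine to the user. A minimal hypothetical version (a sketch, not the author's code) computes KL(X) from (A.6a) and minimizes UE(X, k) over a grid of k > KL(X), which sidesteps imaginary values of UE during the search:

% Top-level sketch: best available estimate UBE(X) for Example 1 of Appendix B.
x1 = 0; A = 1; X = 2;                                                   % (B.1)
KL  = fzero(@(k) k.*log(1 + A./k) - (X - x1) - A, [-50*A, -A - 1e-9]);  % (A.6a)
ks  = KL + logspace(-3, 2, 400);     % trial grid of k > KL(X) (a design choice)
UEs = arrayfun(@(k) abel(x1, A, X, k), ks);
UEs(imag(UEs) ~= 0) = Inf;           % discard any k that gave UE^2 < 0
[UBE, imin] = min(real(UEs));        % grid version of the minimum in (A.8)
printf("UBE(%g) is approximately %g at k = %g\n", X, UBE, ks(imin));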