simple general approximations for a random variable and its ... - SIAM

19 downloads 0 Views 2MB Size Report
Jan 31, 1984 - Key words, approximations, binomial, chi-square, distributions, ... z and F, gamma (chi-square in particular) and the distributions, andtheir.
1986 Society for Industrial and Applied Mathematics 001

SIAM J. ScI. STAT. COMPUT. Vol. 7, No. 1, January 1986

SIMPLE GENERAL APPROXIMATIONS FOR A RANDOM VARIABLE AND ITS INVERSE DISTRIBUTION FUNCTION BASED ON LINEAR TRANSFORMATIONS OF A NONSKEWED VARIATE* HAIM SHOREf Abstract. Linear transformations of a nonskewed random variable are employed to derive simple general approximations for a random variable having known cumulants. Introducing the unit normal variate, these become linear normal approximations. Some nonskewed variates with explicit inverse cumulative density function are then used to derive general approximations for the inverse DF of the approximated variable. The approximations are applied to the binomial, Poisson, Fisher’s z and F, gamma (chi-square in particular) and the distributions and their accuracy examined. Simple general approximations for the loss function of a random variable either continuous or discrete are developed. A simple approximation for the loss function of the Poisson distribution is then derived and demonstrated by an example from inventory analysis. Two further examples from interval estimation and from hypothesis testing highlight the usefulness of the new approximations.

Key words, approximations, binomial, chi-square, distributions, F distribution, gamma distribution, hypothesis testing, inverse distribution function, loss function, normal approximation, Poisson, distribution

1. Introduction. Many of the statistical problems that a practitioner encounters are difficult to solve due to their inherent mathematical nontractability. If, for instance, he wishes to find a sample size needed in estimating the ratio of the variances of two normal populations for a nonstandard confidence or precision level he will find it mathematically impossible and otherwise difficult to accomplish since available tables are confined to standard values only. Sensitivity analysis of optimal solutions in inventory analysis, to take another example, is rarely possible even though derivation of the optimal solution with today’s computing facilities is easy to attain. A need thus arises for approximations which while accurate enough will preserve that degree of simplicity required to derive closed form expressions for various decision variables incorporated in statistical models. With regard to the majority of existing approximations this simplicity requirement is rarely met. For example, while most approximations based on transformations of the normal deviate (general ones like the Cornish-Fisher expansion (1937) or approximations aimed for individual distributions like Bailey’s (1980) are highly accurate their algebraic structure is too complex for the aforementioned objective. In this paper we develop a series of simple general approximations based on linear transformations of a standardized nonskewed random variable. In 2 we show that by a proper choice of the linear transformation any random variable (either nonskewed or skewed) with known cumulants may be approximated by a random variable with a symmetrical distribution, where accuracy is determined by the ensuing approximate equality of the first three or four cumulants. The structure these approximations assume when the standard normal variate serves as the approximating variable is shown in 3. In 4 several symmetrical random variables having explicit inverse DF are employed to derive simple general approximations for the inverse of a skewed variate. Section 5 repeats the latter for a nonskewed variable. * Received by the editors January 31, 1984, and in final form December 10, 1984. f 16 Yeguada Hanassi Street, Tel Aviv 69200, Israel.

2

HAIM SHORE

The above linear transformations are applied in 6 to the binomial, the Poisson, Fisher’s z and F, gamma (chi-square in particular) and the distributions, and their accuracy demonstrated. Simple general approximations for the loss function of a random variable expressed in terms of the distribution function are derived in 7, and demonstrated for the Poisson. A problem from inventory analysis exemplifies their usefulness. Eventually, by employing some of the above approximations we arrive in 8 at approximate explicit expressions for the decision variables of two problems in statistical inference commonly encountered by applicants. Comparison of the new approximations with existing ones in terms of both algebraic tractability and accuracy is deferred to a projected paper currently under preparation. 2. Presentation of general approximations. Let X be a standardized random variable, the distribution of which depends on a certain parameter, n, so that as n tends to infinity X approaches normality. Assume that l,., the rth cumulant of X, is of order n 1-r/2, and denote the cumulative density function of X by F(x). Let Z be a standardized random variable with a symmetrical distribution and known partial moments, Mi, where

Mi

I

Z

dG(z)

1/2

and G(. is the cumulative density function. Let (x, z) be a pair of values of the respective variates related by F(x)= G(z)= P. In this paper we examine a linear transformation of z that approximates x, where the transformation, denoted has the general algebraic structure

:,

(1)

"

-

A2z+B:,_,

.

P< 1/2, P>

has mean, variance, third and fourth cumulants equal to, respectively,

(2) (3)

E(f)=(A2-A)M+1/2(B2+B), V() (A22 + AI)( M2 MI) + 2A2AI M21

3 (4)

1"4 (5)

+ (B2- B)[(B2- B)+ M(A + A)], (a- A31)(M3 3 M M1 + 2M31) + 3(A2 A,- A= A) (M2 M,- 2M) + 3 (A. A)(B2- B1)(1/2 M2 M) +(A2- A,)(B2- B1)-M,, A + A4)( M4 4M3 M, +6M_ M- 3M) + 4(A32 A, + A2 A31)( M3 M1 3 M2 M2 + 3 M4) + 6A22 A21(2 M2 M2 3 M4) + 2(B2- B,)[(A32 + A31) (M3 3M2 M1 + 3M31) + (A A, + A2 A2)3M,(M2 M21)] + ( B2 B,)2[ (A2 + A)( M M1) + 2A A M12 ] + 1/2(B2- B1)3(A2 + A,)M1 + 6(B2- B,) 4- 3.

Let us now consider a few special cases.

GENERAL APPROXIMATIONS FOR A RANDOM VARIABLE

3

First, if X has a symmetrical distribution we should have 13 --0. This we obtain by putting A1 A2- A, which necessarily leads to B1 --BE-- B since then E(X) -0. The solution for this case is (see Appendix A for details)

A=[1 + M(2/ME)/EB]/(EM2) 1/2 B [ 14 + 3 0.5 M4/M2]/[(2/M2)3/2( M M4/M2 M3 ].

(6)

This approximation has its variance and fourth cumulant identical with those of X to O(B2). If M4 3/2 then B is O(n -1) and we achieve identity to O(n-2). Second, let A 1- C, A2= 1 + C and B1 =-hC, BE’--hC. Then

E(2)=C(2MI-h), V(2) 2M2 + C2(2M2-4M),

(2a) (3a)

13 14

(4a)

6C(M3 2M2 M1)+ 2 C3(M3-6M2 M1 + 8M3), 2(1 +6C2+ C4)(M4-aM3M1 +6M2M-3M)

+ 8(1 C4)(M M,- 3 Mz M] + 3 +6(1 C2)Z(2M: M- 3MI) 3.

(5a)

To have

E()=0 we first put h 2M. To find C we equate ’3 13 to obtain a cubic

equation,

C + 3C( U V)-(13/2)/V= O,

(7) where

U= M3- 2 M: M,

V= Ma-6M: M, + 8M3

U-4MI(M:- 2M).

> 0, the only real root of (7) is On condition that I]/(16 V-) + (U V)3=

Ca { 13/(4 V) + [l/( 16 V2) + U/V)3]/:} 1/3 + { 13/(4 V) l/(16 V:) + U/V)3] /2}1/3. However, equality to O(n -3/:) is obtained for the third cumulant if the term O(C a) (8)

in (7) is neglected and we obtain the simpler

(9)

Cb la/ (6 U).

For U 0 this solution yields equality of the variance and the third cumulant to O(n -) and O(n-3/:), respectively. For the majority of special cases we have examined, the more complex Ca does not add much to the accuracy of the approximation, so the simpler Cb should be preferred. Third, let

(10)

A

1

C1, A2 1 + C2,

B -hC, B2- -hUE,

where C C- D, C2 --C + D, and h, C and D have yet to be determined. Introducing this solution into (2)-(4), we have

(2b)

E(f()=(C,+C:)(MI-h/2)=C(2M-h),

4

HAIM SHORE

V([) 2M2 + C + C)(M_- M- hM1 + h2/4) 2C2C,(M2 hM1 + h2/4) + 2(C2- C1)(M2- hM,) 2M2 + D2(2M2-4M, h + h 2) + C2(2M2-4M) + 4D(M2 Mh ),

(3b)

I3 (4b)

[( C3 + C3) + 3( C C2) + 3( C_ + C1)]( M3 3 M2 M, + 2M3) + 3[( C- C) C2C(C2 + C1) + (C2 + C1)](M2 M- 2M)

3h(C- C)(C2- C + 2)(1/2 M2- M) +h2(C2-C1)2(C2+C1)M1 2C(C2+ D2+ 6D + 3)(M3 3M2 M1 + 2M) + 6C(-C2+ D2+ 2D + 1)(M2M1-2M3) + 24hCD(D+ 1)(1/2M2- M)+ 12CD2h2M1.

’4

The expression for is cumbersome and will not be given here. From (2b) in order to have zero mean, we first have h 2M1. Now assume that C is O(n -1/2) and D is O(n-1). Neglecting terms which are O(n -) in (4b) we obtain thereof

(11)

C

(13/6)/U= O(n-I/2),

which is identical with Cb (9). Introducing this solution into the expression for are O(r/-2) we obtain for D

(12)

14

and neglecting terms which

D={14-(2M4-3)-12C2[M4-4M3M+4M2M2]}/[8(M4-hM3)].

Under the above assumption this solution yields equality of the variance, the third and the fourth cumulants to O(n-1), O(n -1’/2) and O(n-2), respectively. Note that for D to be O(n -) we should have M4=. For a nonskewed X we have C 0, and D > 0. However D is not equal to B (6) and the latter should be preferred when approximating nonskewed X since it results in an approximation identical with X to O(B 2) in both its variance and fourth cumulant. 3. General approximations based on linear transformations of the unit normal variate. The partial moments of the standard normal variate, hereafter denoted z, are

-.

M1-l/(2"rt’) 1/2 M2-" 1/2, M3-(2/Tr) /2 M4 Introducing first into (6) we obtain a normal approximation for a nonskewed variate

Azi + B,

’=[Azl-B,

Z

< O,

z>0,

where

A= 1 + (2/’n’)’/2B 1 + (1/4)14,

B=(’n’/32)1/214.

For a skewed variate in order to solve (7) we have U= M3-2M2M=1/2(2/,rr)/-= M 0.3989, V= M3 -6M M1 + 8M3 [(4- 7r)/(27r)](2/r) 1/2 0.1090,

5

GENERAL APPROXIMATIONS FOR A RANDOM VARIABLE

hence from (8)

C, { 13/0.4360 + [ l/ 0.1901 + 49.0132] 1/}1/3

+ {13/0.4360-[I/0.1901 + 49.0132]/2} /3, and from (9) and (11)

Cb (’rr/18)1/13 0.417813 C. Also we have h 2M1 M3 =0.7979. Finally from (12) we have

D

14

6.5408 C 2) / 6.9070

0.1448( 14 1.1417132).

In 6 we apply these approximations to some commonly used distributions. 4. General approximations for the inverse cumulative function of a skewed variate. In this section we introduce for z several random variables with known inverse distribution function so that general approximations for the inverse DF of X can be derived. The headings of the subsections specify the respective z. The concluding subsection notes some of the considerations relevant to the choice of the appropriate approximation in

applications. 4.1. Shore’s approximations for the inverse unit normal distribution function. Recently we have presented several approximations for the inverse of the standard normal distribution function (Shore (1982)), among which the most accurate is

z2=-5.5310{[(1-P)/P]1193-1},

(13)

P>-_1/2

and the simplest is

(14)

z3

-0.4506 In [(1

P)/ P] + 0.2253 =0.2253 In {e[P/(1 p)]2},

=-,

.

P>1/2.

These approximations have M2 M4 =-32, identical with those of the unit normal. The partial moments of (13) are (see detailed derivation in Appendix B)

M1 0.4002, M2 1/2, M3 0.7990, M4

These are very close to those of the standard normal distribution. For (13) we have (see (7)) U=0.3988, V=0.1112; hence from (8),

Ca { 13/0.4448 + l]/O. 1978 + 46.1266] 1/2}1/3 + { 13/0.4448 [ 1/0.1978 + 46.1266] 1/2} 1/3, and from (9) and (11)

Cb 0.4179/3 C. Also we have h

2M

0.8004. Finally from (12)

D 0.1453(14-1.1344/]). The partial moments of (14) are (see Appendix B)

M

0.4249, M2 1/2, M3 0.7738,

6

HAIM SHORE

which lead to

U 0.3489,

V 0.1128, h 0.8498,

Ca { 13/0.4512 + l/ 0.2036 + 29.592111/2}1/3 + {/3/0.4512 [//0.2036 + 29.592111/2} 1/3, Cb 0.4777/3 C. Finally

D 0.1484(14 1.4949 l). 4.2. The logistic variate. Let

(15)

z4

0.5513 In [P/(1-P)]

be a standardized logistic variate, the partial moments of which are

M1 0.3821, M2 1/2, M3 0.9064, M4 2.0996. Introducing into (8), (9) and (12) we get

U 0.5243,

V 0.2064, h

0.7642,

Ca {13/0.8256+[1/0.6816+ 16.391211/2} 1/3 + {13/0.8256-[1/0.6816+ 16.391211/2} 1/3, Cb=0.317913=C. Finally

D 0.0888(14-1.2-1.2203132). 4.3. The uniform variate. A standardized uniform variate on the interval (0, 1) is

zs=(12)l/2(p-1/2),

(16) which has partial moments

M1 =0.4330, M_=1/2, M3--0.6495, M4=0.9. We obtain U=0.2165,

V=0, h=0.8660.

Therefore

Ca Cb C 0.7698/3 which yields equality with the approximated variable of the third cumulant! Finally

D 0.3703(14 + 1.2 1.0670/). z5 approximates x as a linear function of its distribution function with both mean and the third cumulant preserved. Yet the density function of the resultant approximation is a constant. Expectedly, the accuracy associated with this approximation will for the majority of approximated distributions be below acceptable standards, therefore it will not be elaborated upon any more in this paper. Notwithstanding, we chose to present the approximation here for two reasons. First, it occasionally occurs in simulation

GENERAL APPROXIMATIONS FOR A RANDOM VARIABLE

7

studies that we wish to replace a constant factor by a random one that will only roughly exhibit the characteristic behaviour of the original agent. If accuracy is not of prime concern and the distribution of the simulated factor is unspecified, the above approximation may prove an easy and simple application tool for the aforementioned purpose. Second, since the uniform approximation, when differentiated, results in a constant it may well replace more complicated expressions that appear in the target function of optimization models, thus assisting in deriving closed form solutions. However, the robustness of the optimal solution to deviations from the exact values should be carefully studied in order to avoid improper application of this approximation.

4.4. Comparison of the approximations for applicability---a note. When applied to individual distributions the above general approximations may differ on accuracy. Yet even when no dominance may be noticed in terms of the latter, we hold the view that no approximation may a priori be discarded on grounds of redundancy. In attempting to derive closed form solutions to stochastic models some approximations may prove fruitful in certain cases and some in others. For example: 2 by our experience usually results in the more accurate approximations. However z3 obviously leads to more tractable solutions since on differentiating it with respect to P/(1- P) an expression linear in this term results, from which the optimal value of P may be easily isolated (see a demonstration to inventory analysis in 7). Consequently, while the zi of this section have been introduced in an increasing order of simplicity (as judged for instance by the criterion referred to above) they are most likely to be associated with a decreasing order of accuracy. A practitioner should be advised to select the approximation not only to suit the needs of accuracy but also in accordance with the simplicity requirements, as presented by the case on hand.

5. General approximations for the inverse cumulative function of a nonskewed variate. Introducing the partial moments of the zi of 4 into (6) we obtain

for g2: B 0.3112/4, A 1 d-/4, for z3: B 14, A 1 + 0.2124/4, for z4: B 0.1791(/4-1.2), A= 1 +0.1368(/4-1.2). 6. Approximations for the inverse distribution function of individual distributions. 6.1. The binomial variate. Let x be a standardized binomial with parameters (n, p), that is

x

(y-/)/ty (y- np)/[np(1 p)]l/2,

where y, here and in subsequent subsections, is the unstandardized deviate. For x we have

13=(1-2p)/o",

14=[1-6p(1-p)]/o2.

As is common practice in approximating the binomial by a continuous variate (see for example Benedetti (1956), we apply a continuity correction, and to approximate x substract from each of the above approximations the term 1/(2t). To conserve space and in view of an earlier remark that solution (9) yields accuracy comparable to that of the more complex solution (8), the latter is discarded here and in subsequent subsections. The interested reader may work out this solution for himself from the respective general equations given in earlier sections.

8

HAIM SHORE

The resultant approximations For solution (9)"

are"

z>0, :=[l+0.4178(1-2p)/o’]z-(5-4p)/(6o’), z2>0, =[1 +0.4179(1-2p)/tr]z2-(5-4p)/(6cr), := [1 +0.4777(1-2p)/tr]z3-(O.9059-O.8118p)/o’, z3>0, [1 +0.3179(1-2p)/tr]Zn-(O.7429-O.4859p)/tr, z4>0. For solution (10)"

=

= (1-C)zi-hC+ hC(1 C)z,-

1/(2tr), 1/(2),

where

C C D, C2= C + D,

h

2M.

For zl we have C =0.4178(1-2p)/tr, D -(1/)[0.0e05 + 0.2075p( -p)], h =0.7979, for z2 we have C =0.4179(1-2p)/o,

D -(1/tr2)[0.0195 + 0.2125p(1 -p)], h =0.8004,

for z3 we have

C =0.4777(1-2p)/tr, D

-(1/tr2) [0.0734 + 0.0030p(1 p)], h =0.8498,

for z4 we have

C=0.3179(1-2p)/cr, D=-(1/tr2)[O.O196+O.lO66cr:Z+O.O993p(1-p)], h=0.7642. Some comparative values of for the various approximations, together with values derived from the standard commonly used approximation

o

z- 1/(2r),

.

The upper part of the table are given in Table 1. Values are the unstandardized contains values derived from (9) while the lower values derived from solution (10) (same order is preserved in all subsequent tables). Values of derived from z are missing (here and in subsequent tables) since they are nearly identical with those derived from z2. Also only P > 1/2 is referred to, because the same accuracy obtains for

P 0 and n- c so that pn tr 2 const. h), the former may be approximated by setting p 0 and np h in the approximations for the latter. To conserve space these approximations are not given here. However, they are utilized to form Table 2, where some comparative values of the approximate unstandardized are presented. Again we also exhibit values derived from the traditional

-o

Zl

1 / (2o’).

The pattern revealed in Table 2 is similar to that of Table 1. Z2 usually gives the more accurate results, while among the logistic variates (z3 and z4) z3 is better. Note the improvement in the accuracy for far tail probabilities of approximations z3 and z4 as we move from solution (9) to (10). This improvement, unnoticeable for z2, is due probably to the relatively longer tail of the logistic distribution as compared with that of the standard normal (recall that solution (10) gives near equality of both 13 and/4).

10

HAIM SHORE

TABLE 2 Comparative values

of approximations for the Poisson.

0.5 0.607 0.27

Exact

Z

Z3 24

0.910 1.34

h

0.986 2.20

0.998 2.88

2

3

0

0.736 0.63

0.920 1.41

0.981 2.08

0.996 2.65

2

3

4

-.016

0.740

1.349

1.829

1.130

1.910

2.580

3.150

-.020 0.093 0.003

1.166 1.096 1.064

2.142 2.132 2.161

2.923 3.178 3.268

1.068 1.110 1.002

2.148 2.053 2.032

3.109 3.053 3.123

3.947 4.101 4.266

-.006 0.138 0.057

1.152 1.053 1.012

2.104 1.998 1.998

2.865 2.952 2.994

1.072 1.122 1.027

2.136 2.018 1.958

3.084 2.969 2.945

3.911 3.964 3.978

A =2.0

Exact

1.0

A =3.0

0.677 0.46

0.947 1.62

0.995 2.57

0.647 0.38

0.815 0.90

0.966 1.82

0.996 2.65

2

4

6

3

4

6

8

2.297

3.937

5.281

3.369

4.270

5.864

7.301

2.022 2.151 1.964

4.115 3.978 4.010

5.910 6.033 6.312

2.995 3.195 2.942

4.094 4.068 3.933

6.080 5.924 6.039

7.900 8.086 8.493

2.026 2.166 2.022

4.103 3.943 3.874

5.885 5.941 5.957

3.000 3.210 3.026

4.092 4.067 3.922

6.069 5.887 5.828

7.879 8.007 8.046

6.3. The gamma variate (including chi-square). For the gamma distribution with parameters (t, r), the mean is a, the standard deviation is (r/t2) 1/2, and the kth cumulant of the standardized variate is Ik (k-1)!r 1-k/2. We obtain

r

13=2r -1/2,

14=6r -1"

2

In particular, for a X variate with v degrees of freedom (r v/2, t 1/2) we obtain 1/2 and 14 =12/v. /z=v, tr=(2v) 1/2, 13=(8/v) Introducing into the approximations of 3 and 4 we get the following.

For solution (9)"

:=[1 + 1.1817/(v)l/2]Zl-O.9429/(v)l/2, =[1 + 1.1820/(v)t/2]z2-O.9461/(v)l/2,

zl >=0,

:=[l+l.3511/(v)l/2]z3-1.1482/(v)l/2 =[1 +0.8992/(v)1/2]z4-O.6872/(v)l/2,

z3>_--0,

For solution (10)" x

{(1-C1)zi-hC1, (1 +

C2)zi- hC2,

zi 0,

z2

O,

z4>=0.

GENERAL APPROXIMATIONS FOR A RANDOM VARIABLE

11

where

C

For

Z

C

O C2 C + D,

h

2M1.

we have

C =1.1817/v 1/2, D=O.4151/v, h=0.7979, for

Z2

we have

C for

Z3

1.1820/v 1/2, D=0.4250/v, h =0.8004,

we have

C= 1.3511/v 1/2, D=0.0060/v, h=0.8498, for

Z4

we have

C 0.8992/u /2, D 0.1987/u-0.1066, h 0.7642. Some comparative values of these approximations, together with values derived from the simple z are given in Table 3. For solution (9) the logistic approximations seem to be the more accurate, z4 slightly better than z3. For solution (10) it is z3 which is usually the more accurate.

:o

TABLE 3 Comparative values

of approximations for the X 2. v=lO

P

Z1

0.7 0.524

0.8 0.842

0.9 1.282

0.95 1.645

0.99 2.326

0.995 2.576

Exact

11.781

13.442

15.987

18.307

23.209

25.188

12.345

13.764

15.731

17.356

20.404

21.519

z

11.929 12.251 11.711

13.842 13.801 13.418

16.498 16.134 15.986

18.727 18.283 18.352

23.002 23.030 23.579

24.572 25.038 25.790

Z2 Z3 Z4

11.878 12.250 11.826

13.850 13.801 13.418

16.588 16.135 15.812

18.887 18.285 18.019

23.294 23.034 22.893

24.912 25.043 24.955

v= 25

Exact

Z2 Z

Z4 z z

Z4

0.7 0.524

0.8 0.842

0.9 1.282

0.95 1.645

0.99 2.326

0.995 2.576

28.172

30.675

34.382

37.652

44.314

46.928

28.708

30.951

34.062

36.631

41.450

43.214

28.311 28.829 27.925

31.033 31.010 30.404

34.812 34.292 34.134

37.985 37.317 37.571

44.069 43.997 45.163

46.302 46.823 48.374

28.279 28.829 28.132

31.038 31.010 30.404

34.869 34.293 33.822

38.086 37.318 36.972

44.253 44.000 43.929

46.517 46.826 46.872

12

HAIM SHORE

0.5 In F, where

6.4. Fisher’s and the F distribution. Fisher’s is defined by F has an F distribution with vl and v2 degrees of freedom. Let

,

rl=l/(v-l), r2=l/(v2-1), /x=0.5(r2-r), r=[O.5(r+r2)]/; standardization of is based on Fisher’s normal approximation to and rl and r2 are defined in accordance with Fisher’s suggestion so as to normalize the standardized (for details see Johnson and Kotz (1970, Ch. 26, Section 4)). From Wishart (1947), an approximate formula for the cumulants of the standardized is

.

lk 0.5(k-2)! [rk- + (--1)krk-]/r k. Thus

/3 2/or,

14

2r

+ 6( x / or)

Introducing these into the equations of 3 and 4 simple approximations for ensue. However, of more interest to applied statisticians are approximations for the inverse of the F distribution. These are easily derived from the above approximations. In particular, let us consider that based on z3 and solution (9):

(-/z)/tr=

(1-0.477713)z3-0.405913, (1 +0.477713)z3-0.405913,

z30.

TABLE 4 Comparative values

v

Exact

0.75

0.90

0.95

0.975

0.99

1.89

3.30

4.74

6.62

10.10

13.60

2.08

3.39

4.72

6.48

9.80

13.34

30

30

v

0.90

0.95

0.975

0.99

0.995

1.28

1.61

1.84

2.07

2.39

2.63

1.31

1.57

1.78

2.01

2.35

2.64

20

v

10

0.75

0.90

0.95

0.975

0.99

0.995

1.52

2.20

2.77

3.42

4.41

5.27

1.59

2.17

2.68

3.29

4.28

5.21

v

Exact

0.995

0.75

v

Exact

v2= 5

10

v

Exact

of the approximation for F.

40

v2=24

0.75

0.90

0.95

0.975

0.99

0.995

1.30

1.64

1.89

2.15

2.49

2.77

1.33

1.61

1.84

2.08

2.45

2.77

GENERAL APPROXIMATIONS FOR A RANDOM VARIABLE

13

Introducing in terms of F, a simple approximation for F in terms of its distribution function is readily derived"

/3= [P/(1 p)]O.9012r+SpO.8610 , exp [BP0.4506cr + 0.8069/z] where

P1/2.

1,

Some comparative values

of/3 are introduced in Table 4.

6.5. The t distribution. For a variate with v degrees of freedom we have all odd order cumulants identically zero. For the standardized t, that is x t(1-2/v) ’/2 we have 1 6/( v 4). Introducing it into the expressions of 3 and 5 we get:

for

zv A= 1 +(3/2)/(v-4),

for z2" A= 1 +(3/2)/(v-4),

B=6(’rr/32)/2/(u-4), B= 1.8672/(v-4),

for z3" A= 1 + 1.2744/(v-4), for z4"

;o

B=(3/2)/(v-4), B 1.0746/(v-4)-0.2149. A=O.8208/(v-4)+0.8358,

Values of the approximate unstandardized t, together with values derived from z are shown in Table 5. Though for small P z performs worse than z3 and z, TABLE 5 Comparative values

of the approximations for t. v=10

exact

Z

Z3 24

0.70 0.524

0.80 0.842

0.90 1.282

0.95 1.645

0.99 2.326

0.995 2.576

0.542

0.879

1.372

1.812

2.764

3.169

0.586

0.941

1.433

1.839

2.601

2.880

0.395 0.543 0.548

0.830 0.873 0.871

1.434 1.368 1.357

1.942 1.824 1.805

2.914 2.832 2.795

3.271 3.259 3.213

v=20

exact

Z

Z3 Z4

0.70 0.524

0.80 0.842

0.90 1.282

0.95 1.645

0.99 2.326

0.995 2.576

0.553

0.860

1.325

1.725

2.528

2.845

0.553

0.887

1.351

1.734

2.452

2.715

0.490 0.592 0.593

0.849 0.868 0.870

1.347 1.284 1.288

1.766 1.667 1.674

2.568 2.514 2.525

2.863 2.872 2.885

14

HAIM SHORE

no uniform dominance may be noticed for higher values. All three of the new approximations obviously dominate in terms of accuracy, having the additional advantage of expressing x explicitly in terms of its distribution function.

o

7. Simple general approximations for the loss function of a random variable (continuous or discrete) and an application to the Poisson distribution. The loss function of a continuous variable, Y, is defined by

(17)

L(y)=

(x-u)f(x) dx=o.

(t-y)f(t) dt=o.

[1- F(x)] dx,

where f(. and F(. are the density and cumulative distribution functions, respectively, and x and u are in standard units. Likewise for a discrete variable we have

(17a)

.

L(y)

t- y)f( t)

o.

E (x

u)f(x)

o. 2 1

F(x)].

t--’y

The loss function plays a central role in many stochastic optimization models which incorporate it as a major component of their target function. Outstanding examples are inventory control problems, like the well-known "newsboy problem", and optimization problems associated with Bayesian statistics. Deriving a simple approximation for L(y) may enhance the development of closed form solutions for these problems. To start with, we first develop general approximations for the loss function of a continuous variable. In 2 we introduced a general approximation for x based on solution (9)

:=((1-c)z-hc,

(18)

(l+c)z-hc,

z-O.

Deriving dx from (18), introducing it in terms of dz into the rightmost wing of (17) and integrating, we obtain for P < 1/2

(19a) L(y)=o.{(1-c)[Lz(P)-Lz(1/2)]+(1 +c)Lz(1/2)}=o.{(1-c)Lz(P)+2cLz(1/2)} and for P > 1/2 L(y)= o.(l + c)Lz(P) (19b) where Lz(P) is the loss function of Z at G(z)= P. Let us now introduce the various approximating zi presented in 3 and 4, together with the respective c. First, approximating Lz(P) for the standard normal deviate from Shore (1982)

Lz(P)

0.4115P/(1 P)- z, 0.4115(1-P)/P,

where P (zl), and (. is the cumulative standard normal distribution function, we obtain

P1/2. o’[(1 + 0.417813)0.4115(1- P)/ P], Note that to derive (20) we introduced from (18) o’(1-0.417813)z=o.(:+hc)=

(20)

L(y)=

o.[-k-()/3], and also put the exact Lz(1/2) =0.3989.

GENERAL APPROXIMATIONS FOR A RANDOM VARIABLE

Next we derive the loss function of Z2, To do that, note that

Lz(P)

Z3

and

[1- G(z)] dz=

z4,

15

4.

as defined in

(1-G)[Oz/OG] dG.

Thus for z2 we obtain 0.6598

’ G-0.8807 (l-G)

-0.9

dG,

P1/2

P

which is most unlikely to lead to any useful results in the sense expounded above. For z3 we have

Lz( P) 0.4506

(I/G) dG -0.45061n P

which on introduction into (19) yields

L(y)

o’{-(1 0.4777/3)0.4506 In P+ 0.30/3}, -o-(1 +0.4777/3)0.4506 In P,

P 1/2.

To derive a simple approximation for the loss function of a discrete variate let us draw a graph of 1- F(x) (the vertical axis) as a function of x (the horizontal axis). From simple geometric considerations it can be easily verified that

L(y)

(21)

,=,2

1

F(x)]

0-

=0"{Iu

1

F(x)] dx + (1/0")0.5

[1-F(x)]

x=u"l/o.f(x)

dx}+O.5[1-F(x)].

The above first term on the right side is virtually the loss function of X were it regarded as a continuous variate, so that the second term may be considered a continuity correction. Introducing for the "continuous" loss function its approximation as derived above (19) we finally have for a discrete variate

(22)

L(y)

tr{(1-c)Lz(P)+2cLz(1/2)}+O.5(1-P), tr(1 + c)Lz(P)+0.5(1 P),

P

To demonstrate the accuracy associated with the above approximations we apply them to the Poisson loss function which is being in extensive use in the formulation of stochastic inventory control models. Introducing the respective parameters in (22) we obtain for zl

P 1/2" 0.5(1 P) by 0.5(1 P)/P, so that (23) simplifies to o’{(1-1.6329/o’)O.4115P/(1-P)-x}, L(y)-(23a) cr(l+l.6329/o’)0.4115(1-P)/P,

P1/2.

Introducing for x in terms of P from one of the approximations of 6, L(y) is expressible in terms of P only, which is very useful in application to inventory control models. For z3 we obtain likewise

(24) and for

L(y)=

-or(1 0.4777/o’)0.4506 In P+ 0.8- 0.5P, -cr(l+0.4777/r)0.4506 In P+0.5(1-P),

P1/2,

L(y)=

-tr(1 -0.3179/cr)0.5513 In P+0.74-0.5P, -cr(l+O.3179/tr)O.55131nP+O.5(1-P),

P1/2.

Z4

(25)

Since (24) and (25) are of the same algebraic structure a choice between them has to be made in terms of accuracy. Comparison shows that in terms of maximum deviation (25) is uniformly dominant. Table 6 presents some values for approximations (23), (23a) and (25). In terms of maximum deviation (23) generally yields the most accurate TABLE 6 Comparative values

0

of the approximations for L(y).

2

3

4

5

6

7

8

9

0.0183

0.0916

0.2381

0.4335

0.6288

0.7851

0.8893

0.9489

0.9786

0.9919

L(y) (exact)

4.0000

3.0185

2.1098

1.3481

0.7815

0.4103

0.1954

0.0848

0.0336

0.0123

App. (23) App. (23a) App. (25)

4.0030 4.0028 4.4410

3.0199 3.0152 2.9108

2.0844 2.0472 1.9517

1.2815 1.1156 1.2984

0.7729 0.8825 0.7785

0.3798 0.4092 0.4166

0.1792 0.1861 0.2053

0.0791 0.0805 0.0926

0.0325 0.0327 0.0383

0.0122 0.0122 0.0144

8

10

12

14

16

18

20

22

24

0.0220

0.0774

0.1931

0.3675

0.5660

0.7423

0.8682

0.9418

0.9777

L(y) (exact)

8.0159

6.0812

4.2852

2.7529

1.5872

0.8114

0.3672

0.1464

0.0504

App. (23) App. (23a) App. (25)

8.0222 8.0219 8.4767

6.0850 6.0817 5.8954

4.2562 4.2331 3.9818

2.6727 2.5659 2.5883

1.6110 1.7774 1.5719

0.7600 0.8047 0.8382

0.3419 0.3519 0.4023

0.1414 0.1432 0.1718

0.0526 0.0529 0.0648

GENERAL APPROXIMATIONS FOR A RANDOM VARIABLE

17

results, but as expected no meaningful differences in accuracy may be noticed between (23) and (233) for tail probabilities. Equation (25) performs best in middle range probabilities. To demonstrate the applicability of the above approximations we take a simple standard model where inventory supply is periodical, and it is required to determine the reorder level and order quantity which simultaneously minimize the total cost per unit time. The well-known fundamental equation is

T- Sd/ Q+ (g- ld)IC + ICQ/2 + (Trd/ Q)L(R), where S is the order cost; d the mean demand per unit time; the mean lead time; C the unit cost; I the interest rate; 7r the loss of revenue per item in case of shortage; R the reorder level; Q the order quantity; and T the total cost per unit time. Demand in the lead time is assumed to be Poisson with mean cr 2= ld. To find the optimal reorder level, R*, let us introduce for (R-ld) from of the Poisson based on z3, and for L(R) from (23a) (assuming P*= F[(R*-ld)/cr]>0.5), to obtain

(26)

T= Sd/Q+ ICcr{(1 + 0.48/cr)[0.4506 In [P/(1- P)] + 0.2253] 0.91/or}

+ ICQ/2 + (rd/Q)cr(1 + 1.63/or)0.4115(1 P)/P.

Differentiating with respect to (1- P)/P and with respect to Q and equating to zero we obtain

(27) (27a)

(1-P*)/P*= l.O950(ICQ*/Trd)[(cr+O.48)/(cr+l.63)], Q.)2 (2/IC)[Sd / rd(cr + 1.63)0.4115(1 P*)/P*]

from which

,

Q*= r{0.4506(1 + 0.48/r) + [0.2030(1 +0.48/o’)2+2Sd/(ICo’2)]/2}. Introducing (28) into (27) and then into x we obtain the optimal value of the reorder

(28)

level

(29)

R*= ld- (or + 0.48){0.4506 In [(1- P*)/P*]-0.2253}-0.91.

To demonstrate the accuracy of this solution let S 50, d 20 items, 1/2 time unit, C=2, I=0.10, 7r=5, r=/d=10. Then Q*- 101.6 (28), (1-P*)/P*=O.1692 (27), and from (29) R*= 12.83. The exact optimal solution is Q*= 102, R*= 13 with an associated cost of T 20.9201. To demonstrate the usefulness of the above solution, suppose that costs incurred by a change in the current inventory policy leave us indifferent to a deviation of +/-p% in Q*. What then is the permissible range of variation of the relevant parameters of the model (assuming cr unchanged)? From (28) we find:

(Sd)/(IC) {[Q*/r- 0.4506(1 + 0.48/r)]- 0.2030(1 + 0.48/cr)}(r/2) 0.5{(Q* 1.6164) 2- 2.6123}. Thus for a permissible change in Q* of say +30% from its current value, the relevant Sd/IC may be allowed to vary from -51% to +72% from its current value before a change in policy needs to take place. Finally note that (28) implies Q* being independent of 7r, unlike R* which increases with 7r (see (29)). 8. Two approximate solutions in statistical inference. 8.1. Determining sample size in estimating the ratio of the variances of two normal and r2 be the unknown variances of two populations---an approximate solution. Let

r

18

HAIM SHORE

normal populations. Let A try/tr2 be the ratio of the two variances, which has to be estimated by two random samples of equal size n. It is required to choose n so that the positive relative sampling error will not exceed Pl% with probability (1- a), and that the relative negative sampling error will not exceed P2% with probability (1- 32). Let F_ (n 1, n 1) be the (1 a) 100 percentile of an F distribution with S/S2 be the samples’ ratio. Vl v2 n- 1 degrees of freedom and let Since A/ is distributed as F with v v2 n- 1 degrees of freedom we obtain the following conditions:

(30) (31)

Fl_,(n-1, n-1)- 1-P2/100.

In order to determine a proper n we use/, where separately, for the equality sign to obtain

=0, and solve (30) and (31), each

(32) n,= {0.4506 ln[[(1-a,)/a,]2e]/[ln (1 +p/100)]}2+2 (+ for i= 1,- for i=2). To select the sample size, choose max (n, n). To improve the accuracy of the above solution the numerical coefficient in (32) was readjusted for the commonly used range" a-

Suggest Documents