Metodološki zvezki, Vol. 1, No. 1, 2004, 99-108

The Distribution of the Ratio of Jointly Normal Variables

Anton Cedilnik¹, Katarina Košmelj², and Andrej Blejec³

Abstract

We derive the probability density of the ratio of components of the bivariate normal distribution with arbitrary parameters. The density is a product of two factors: the first is a Cauchy density, the second a very complicated function. We show that the distribution under study does not possess an expected value or other moments of higher order. Our particular interest is focused on the shape of the density. We introduce a shape parameter and show that according to its sign the densities are classified into three main groups. As an example, we derive the distribution of the ratio $Z = -B_{m-1}/(mB_m)$ for a polynomial regression of order $m$. For $m = 1$, $Z$ is the estimator for the zero of a linear regression; for $m = 2$, an estimator for the abscissa of the extreme of a quadratic regression; and for $m = 3$, an estimator for the abscissa of the inflection point of a cubic regression.
1 Introduction
The ratio of two normally distributed random variables occurs frequently in statistical analysis. For example, in linear regression, $E(Y|x) = \beta_0 + \beta_1 x$, the value $x_0$ for which the expected response $E(Y)$ has a given value $y_0$ is often of interest. The estimator for $x_0$, the random variable $X_0 = (y_0 - B_0)/B_1$, is under the standard regression assumptions expressed as the ratio of two normally distributed and dependent random variables $B_0$ and $B_1$, which are the estimators for $\beta_0$ and $\beta_1$ and whose distributions and dependence are known from regression theory.
¹ Biotechnical Faculty, University of Ljubljana, Jamnikarjeva 101, 1000 Ljubljana, Slovenia; [email protected]
² Biotechnical Faculty, University of Ljubljana, Jamnikarjeva 101, 1000 Ljubljana, Slovenia; [email protected]
³ National Institute of Biology, University of Ljubljana, Večna pot 111, 1000 Ljubljana, Slovenia; [email protected]
Similar to the example above is the situation of a quadratic regression, $E(Y|x) = \beta_0 + \beta_1 x + \beta_2 x^2$, where the value sought is the $x_0$ for which $E(Y)$ reaches its extreme value. At this point, the first derivative must be zero. Hence, $X_0 = -B_1/2B_2$ is expressed as the ratio of two normally distributed and dependent variables as well.

From the literature it is known that the distribution of the ratio $Z = X/Y$, when $X$ and $Y$ are independent, is Cauchy. The probability density function for a Cauchy variable $U : C(a, b)$ is

$$p_U(x) = \frac{b}{\pi\left((x-a)^2 + b^2\right)},$$

where the location parameter $a$ is the median, while the quartiles are obtained from the location parameter $a$ and the positive scale parameter $b$: $q_{1,3} = a \mp b$. This density function $p_U(x)$ has 'fat tails', hence $U$ does not possess an expected value or moments of higher order (Johnson et al., 1994).

Some results about the ratio from the literature are:

(a) The ratio $Z$ of two centred normal variables is a Cauchy variable (Jamnik, 1971: 149):

$$[X\;Y]^T : N(\mu_X = \mu_Y = 0, \sigma_X, \sigma_Y, \rho \ne \pm 1) \;\Rightarrow\; Z = \frac{X}{Y} : C\!\left(a = \rho\frac{\sigma_X}{\sigma_Y},\; b = \frac{\sigma_X}{\sigma_Y}\sqrt{1-\rho^2}\right).$$

The simplest case is the ratio of two independent standardised normal variables, which is a 'standard' Cauchy variable $C(0,1)$.

(b) The ratio $Z$ of two non-centred independent normal variables has a particular Cauchy-like distribution. This result is shown in Kamerud (1978).

(c) The ratio of two arbitrary normal variables is discussed in Marsaglia (1965) and leads again to a Cauchy-like distribution.
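As a quick illustration of result (a), the following sketch (ours, not part of the paper; the parameter values are arbitrary) simulates centred correlated normal pairs and compares the sample quartiles of $Z = X/Y$ with the Cauchy quartiles $a \mp b$:

```python
import numpy as np

rng = np.random.default_rng(0)
sig_x, sig_y, rho = 2.0, 1.0, 0.5

# Draw centred bivariate normal pairs and form the ratio Z = X/Y.
cov = [[sig_x**2, rho * sig_x * sig_y],
       [rho * sig_x * sig_y, sig_y**2]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T
z = x / y

# Result (a): Z is Cauchy with these location/scale parameters.
a = rho * sig_x / sig_y
b = (sig_x / sig_y) * np.sqrt(1 - rho**2)

print(np.quantile(z, [0.25, 0.5, 0.75]))  # sample quartiles
print(a - b, a, a + b)                    # theoretical q1, median, q3
```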
The case considered in (b) is not general, and the result in the cited article is presented in a very implicit way. Marsaglia dealt with the ratio of two independent normal variables, having shown previously, however, that any case can be transformed into this setting.

The objective of our work is to derive the probability density for the ratio of components of the bivariate normal distribution in a general setting. Let the vector $\mathbf{W} = [X\;Y]^T : N(\mu_X, \mu_Y, \sigma_X > 0, \sigma_Y > 0, \rho)$ be distributed normally, with the density (for $\rho \ne \pm 1$):
$$p_{\mathbf{W}}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \cdot \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right)$$

and with the expected value and the variance-covariance matrix
$$E(\mathbf{W}) = \begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix}, \qquad \mathrm{var}(\mathbf{W}) = \begin{bmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{bmatrix}.$$

Our aim is to express the density function of the ratio $Z = X/Y$ explicitly, in terms of the parameters of the bivariate normal distribution. We shall also discuss the degenerate situation, $\rho = \pm 1$.
2 Probability density for the ratio
The following theorem is the basis for our derivation of the probability density for the ratio (Jamnik, 1971: 148).

Theorem 1. Let $\mathbf{W} = [X\;Y]^T$ be a continuously distributed random vector with a probability density function $p_{\mathbf{W}}(x,y)$. Then $Z = X/Y$ is a continuously distributed random variable with the probability density function

$$p_Z(z) = \int_{-\infty}^{\infty} |y|\, p_{\mathbf{W}}(zy, y)\, dy = \int_{0}^{\infty} y\, p_{\mathbf{W}}(zy, y)\, dy - \int_{-\infty}^{0} y\, p_{\mathbf{W}}(zy, y)\, dy. \tag{2.1}$$
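Before turning to the closed form, formula (2.1) can be evaluated directly by numerical quadrature. The sketch below is our illustration (parameter values are arbitrary), using scipy's bivariate normal density:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_normal

def ratio_density_numeric(z, mean, cov):
    """Evaluate (2.1) by quadrature: p_Z(z) = int |y| p_W(zy, y) dy."""
    pw = multivariate_normal(mean, cov).pdf
    integrand = lambda y: abs(y) * pw([z * y, y])
    pos, _ = quad(integrand, 0, np.inf)
    neg, _ = quad(integrand, -np.inf, 0)
    return pos + neg

# Example: mu_X = 1, mu_Y = 2, sigma_X = 1, sigma_Y = 0.5, rho = 0.3,
# so cov(X, Y) = rho * sigma_X * sigma_Y = 0.15.
print(ratio_density_numeric(0.5, [1, 2], [[1, 0.15], [0.15, 0.25]]))
```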
For the derivation of $p_Z(z)$ for the ratio of the components of a bivariate normal vector we calculated the integral (2.1) using the formulae in the Appendix. A long but straightforward calculation gives the next theorem.

Theorem 2. The probability density for $Z = X/Y$, where $[X\;Y]^T : N(\mu_X, \mu_Y, \sigma_X, \sigma_Y, \rho \ne \pm 1)$, is expressed as a product of two terms:

$$p_Z(z) = \frac{\sigma_X\sigma_Y\sqrt{1-\rho^2}}{\pi(\sigma_Y^2 z^2 - 2\rho\sigma_X\sigma_Y z + \sigma_X^2)} \cdot \exp\left(-\tfrac{1}{2}\sup R^2\right)\cdot\left[1 + \frac{R\,\Phi(R)}{\varphi(R)}\right] =$$
$$= \frac{\sigma_X\sigma_Y\sqrt{1-\rho^2}}{\pi(\sigma_Y^2 z^2 - 2\rho\sigma_X\sigma_Y z + \sigma_X^2)} \cdot \left(\exp\left(-\tfrac{1}{2}\sup R^2\right) + \sqrt{2\pi}\, R\,\Phi(R)\cdot \exp\left(-\tfrac{1}{2}\left[\sup R^2 - R^2\right]\right)\right) \tag{2.2}$$

where:

$$R = R(z) = \frac{(\sigma_Y^2\mu_X - \rho\sigma_X\sigma_Y\mu_Y)z - \rho\sigma_X\sigma_Y\mu_X + \sigma_X^2\mu_Y}{\sigma_X\sigma_Y\sqrt{1-\rho^2}\cdot\sqrt{\sigma_Y^2 z^2 - 2\rho\sigma_X\sigma_Y z + \sigma_X^2}} = \frac{\left(\frac{\mu_X}{\sigma_X} - \rho\frac{\mu_Y}{\sigma_Y}\right)\frac{\sigma_Y}{\sigma_X}z - \left(\rho\frac{\mu_X}{\sigma_X} - \frac{\mu_Y}{\sigma_Y}\right)}{\sqrt{1-\rho^2}\cdot\sqrt{\frac{\sigma_Y^2}{\sigma_X^2}z^2 - 2\rho\frac{\sigma_Y}{\sigma_X}z + 1}} \tag{2.2a}$$

$$\sup R^2 = \frac{\sigma_Y^2\mu_X^2 - 2\rho\sigma_X\sigma_Y\mu_X\mu_Y + \sigma_X^2\mu_Y^2}{\sigma_X^2\sigma_Y^2(1-\rho^2)} = \frac{\left(\frac{\mu_X}{\sigma_X}\right)^2 - 2\rho\frac{\mu_X}{\sigma_X}\frac{\mu_Y}{\sigma_Y} + \left(\frac{\mu_Y}{\sigma_Y}\right)^2}{1-\rho^2} \tag{2.2b}$$
$$\sup R^2 - R^2 = \frac{(\mu_X - \mu_Y z)^2}{\sigma_Y^2 z^2 - 2\rho\sigma_X\sigma_Y z + \sigma_X^2} = \frac{\left(\frac{\mu_X}{\sigma_Y} - \frac{\mu_Y}{\sigma_Y}z\right)^2}{z^2 - 2\rho\frac{\sigma_X}{\sigma_Y}z + \frac{\sigma_X^2}{\sigma_Y^2}}. \tag{2.2c}$$
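For numerical work, the second form of (2.2) translates directly into code. The following is a minimal sketch of ours (not part of the paper); note that $\Phi$ in Gauss form equals the standard normal CDF minus one half:

```python
import numpy as np
from scipy.stats import norm

def ratio_density(z, mu_x, mu_y, sig_x, sig_y, rho):
    """Density of Z = X/Y for a bivariate normal (X, Y), per (2.2)."""
    z = np.asarray(z, dtype=float)
    q = sig_y**2 * z**2 - 2 * rho * sig_x * sig_y * z + sig_x**2
    s = sig_x * sig_y * np.sqrt(1 - rho**2)
    cauchy = s / (np.pi * q)                     # standard (Cauchy) part
    R = ((sig_y**2 * mu_x - rho * sig_x * sig_y * mu_y) * z
         - rho * sig_x * sig_y * mu_x + sig_x**2 * mu_y) / (s * np.sqrt(q))
    supR2 = (sig_y**2 * mu_x**2 - 2 * rho * sig_x * sig_y * mu_x * mu_y
             + sig_x**2 * mu_y**2) / s**2
    Phi = norm.cdf(R) - 0.5                      # error function in Gauss form
    deviant = (np.exp(-supR2 / 2)
               + np.sqrt(2 * np.pi) * R * Phi * np.exp(-(supR2 - R**2) / 2))
    return cauchy * deviant
```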
The first factor in (2.2), the standard part, is the density of a non-centred Cauchy variable, $C\!\left(a = \rho\frac{\sigma_X}{\sigma_Y},\; b = \frac{\sigma_X}{\sigma_Y}\sqrt{1-\rho^2}\right)$. We have to stress that this factor is independent of the expected values $\mu_X$ and $\mu_Y$.

The second factor, the deviant part, is a complicated function of $z$, which includes the error function $\Phi(\cdot)$ (in Gauss form; see Appendix). We need four parameters: $\rho$, $\frac{\mu_X}{\sigma_X}$, $\frac{\mu_Y}{\sigma_Y}$, and $\frac{\sigma_X}{\sigma_Y}$, to fully describe the distribution. The deviant part is strictly positive and asymptotically constant; it has the same positive value for both $z = \pm\infty$, due to the fact that $R(\pm\infty) = \pm\frac{\sigma_Y\mu_X - \rho\sigma_X\mu_Y}{\sigma_X\sigma_Y\sqrt{1-\rho^2}}$. Therefore, the asymptotic behaviour of $p_Z(z)$ is the same as that of the Cauchy density, so $E(Z)$ and other moments do not exist.

We wrote the deviant part in (2.2) in two forms. The first form is nicer and can also be found in Marsaglia (1965), but the second form is better for numerical purposes.
A more detailed analysis of $p_Z(z)$ led us to the definition of the shape parameter $\omega$:

$$\omega = \frac{\mu_Y}{\sigma_Y}\left(\frac{\mu_X}{\sigma_X} - \rho\frac{\mu_Y}{\sigma_Y}\right), \tag{2.3}$$

based on $R(\pm\infty)$ and

$$\frac{dR}{dz} = \frac{\sigma_X\sigma_Y\sqrt{1-\rho^2}\,(\mu_X - \mu_Y z)}{\left(\sigma_Y^2 z^2 - 2\rho\sigma_X\sigma_Y z + \sigma_X^2\right)^{3/2}}.$$

The sign of $\omega$ separates three different types of shape of $p_Z(z)$: I. $\omega > 0$; II. $\omega < 0$; III. $\omega = 0$. Denoting the median by $M(Z)$, we have for type I: $M(Z) > \rho\frac{\sigma_X}{\sigma_Y}$; for type II: $M(Z) < \rho\frac{\sigma_X}{\sigma_Y}$; for type III: $p_Z(z)$ is symmetric and $M(Z) = \rho\frac{\sigma_X}{\sigma_Y}$. Variants IIIa and IIIc are unimodal; generally, $p_Z(z)$ may be uni- or bimodal. The distribution function and quantiles require numerical integration.
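The classification can be checked numerically. This is our sketch, reusing ratio_density from the sketch above; the root-search bracket is an arbitrary assumption:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def omega(mu_x, mu_y, sig_x, sig_y, rho):
    """Shape parameter (2.3)."""
    return (mu_y / sig_y) * (mu_x / sig_x - rho * mu_y / sig_y)

def median_Z(mu_x, mu_y, sig_x, sig_y, rho, bracket=(-100.0, 100.0)):
    """Median of Z by numerically integrating ratio_density."""
    cdf = lambda t: quad(ratio_density, -np.inf, t,
                         args=(mu_x, mu_y, sig_x, sig_y, rho), limit=200)[0]
    return brentq(lambda t: cdf(t) - 0.5, *bracket)

# omega > 0 (type I) should give M(Z) > rho * sig_x / sig_y:
pars = (1.0, 0.5, 1.0, 1.0, 0.2)
print(omega(*pars), median_Z(*pars), 0.2 * 1.0 / 1.0)
```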
3 Degenerate situation
Now, let us consider the case $\rho = \pm 1$, but still with $\sigma_X > 0$, $\sigma_Y > 0$. Then the distribution of $\mathbf{W} = [X\;Y]^T$ is degenerate, and with probability 1 it holds that

$$\frac{Y - \mu_Y}{\sigma_Y} = \rho\cdot\frac{X - \mu_X}{\sigma_X};\qquad\text{hence:}\qquad Z = \frac{X}{Y} = \rho\frac{\sigma_X}{\sigma_Y} + \frac{\mu_X - \rho\frac{\sigma_X}{\sigma_Y}\mu_Y}{Y}.$$

Since the marginal distribution $Y : N(\mu_Y, \sigma_Y)$ is the usual normal distribution, it is easy to find the probability density for $Z$ from the following theorem.
Theorem 3. If $Y : N(\mu_Y, \sigma_Y)$ and $Z = a + \dfrac{c}{Y}$, $c \ne 0$, then $Z$ has the density

$$p_Z(z) = \frac{|c|}{\sigma_Y\sqrt{2\pi}}\cdot(z-a)^{-2}\cdot\exp\left(-\frac{1}{2\sigma_Y^2}\left[\frac{c}{z-a} - \mu_Y\right]^2\right).$$
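A minimal numerical check of Theorem 3 (our sketch; the parameter values are arbitrary): a histogram of simulated values $a + c/Y$ should match the density above.

```python
import numpy as np

def degenerate_density(z, a, c, mu_y, sig_y):
    """Density of Z = a + c/Y, Y ~ N(mu_y, sig_y), per Theorem 3."""
    z = np.asarray(z, dtype=float)
    return (np.abs(c) / (sig_y * np.sqrt(2 * np.pi) * (z - a) ** 2)
            * np.exp(-(c / (z - a) - mu_y) ** 2 / (2 * sig_y ** 2)))

rng = np.random.default_rng(1)
a, c, mu_y, sig_y = 0.5, 2.0, 1.0, 0.8
zs = a + c / rng.normal(mu_y, sig_y, size=100_000)

# Compare the empirical histogram with the theoretical density.
hist, edges = np.histogram(zs, bins=np.linspace(-5, 10, 61), density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - degenerate_density(mid, a, c, mu_y, sig_y))))
```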
The function $p_Z(z)$ from this theorem is much simpler than (2.2), and it is rather easy to find its characteristics, including the quantiles and the distribution function. Also, there are two modes that can be found explicitly, and between them there is a removable singularity, $p_Z(a) = 0$. The expected value, as in the non-degenerate case, does not exist. It is worth noting that in the degenerate case the shape parameter (2.3) is zero precisely when $p_Z(z)$ is symmetric, as in the non-degenerate case. According to the sign of the shape parameter, the relations between the median $M(Z)$ and the quantity $a = \rho\frac{\sigma_X}{\sigma_Y}$ remain the same as well.
4 Examples
Now, let us discuss the two problems presented in the Introduction. First, we consider a linear regression, $E(Y|x) = \beta_0 + \beta_1 x$. We shall be interested in the x-axis intercept: $X_0 = -B_0/B_1$, where $B_0$ and $B_1$ denote the estimators for $\beta_0$ and $\beta_1$. Under the assumption that $Y|x : N(\beta_0 + \beta_1 x, \sigma_{\mathrm{reg}})$, the variable $X_0$ is expressed as the ratio of two normally distributed and dependent random variables, $-B_0$ and $B_1$. Given the data $\{(x_i, y_i),\; i = 1,\ldots,n\}$ ($x_1 < x_n$), we denote:

$$\bar{x} = \frac{1}{n}\sum x_i,\qquad w^2 = \frac{1}{n}\sum x_i^2,\qquad q = \frac{\sigma_{\mathrm{reg}}}{\sqrt{n(w^2 - \bar{x}^2)}}.$$

Then:

$$\begin{bmatrix} B_0 \\ B_1 \end{bmatrix} : N\!\left(\beta_0,\, \beta_1,\, qw,\, q,\, -\frac{\bar{x}}{w}\right),\qquad \begin{bmatrix} -B_0 \\ B_1 \end{bmatrix} : N\!\left(-\beta_0,\, \beta_1,\, qw,\, q,\, \frac{\bar{x}}{w}\right).$$

Hence, $X_0$ has a distribution with density function (2.2) on making the substitutions: $\mu_X \to -\beta_0$, $\mu_Y \to \beta_1$, $\sigma_X \to qw$, $\sigma_Y \to q$, $\rho \to \bar{x}/w$.
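These substitutions are mechanical, so a short helper can compute the five parameters of (2.2) from the design points. This is our sketch with hypothetical names; the true $\sigma_{\mathrm{reg}}$, $\beta_0$, $\beta_1$ are assumed known for illustration:

```python
import numpy as np

def x0_density_params(x, sigma_reg, beta0, beta1):
    """Parameters of density (2.2) for X_0 = -B_0/B_1 in linear regression."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    w = np.sqrt((x ** 2).mean())
    q = sigma_reg / np.sqrt(n * (w ** 2 - xbar ** 2))
    # Order: mu_X, mu_Y, sigma_X, sigma_Y, rho
    return -beta0, beta1, q * w, q, xbar / w

pars = x0_density_params(np.arange(1, 11), sigma_reg=0.5,
                         beta0=-2.0, beta1=1.0)
print(pars)   # feed into ratio_density(z, *pars)
```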
Now we shall be concerned with a general polynomial regression, $E(Y|x) = \beta_0 + \beta_1 x + \cdots + \beta_m x^m$, $m \ge 1$. Let us define $Z = -B_{m-1}/(mB_m)$. For $m = 1$, $Z = X_0$ from the first example, the estimator for the zero of a linear regression. For $m = 2$, $Z$ is an estimator for the abscissa of the extreme of a quadratic regression, and for $m = 3$, $Z$ is an estimator for the abscissa of the inflection point of a cubic regression.
Introduce the following two data matrices: $\mathbf{v} = \left[x_i^k;\; i = 1,\ldots,n,\; k = 0,\ldots,m\right]_{n\times(m+1)}$, the matrix of powers of the x-s, and $\mathbf{Y} = \left[y_i\;(i = 1,\ldots,n)\right]_{n\times 1}$. The regularity condition, that there are at least $m+1$ distinct x-s, implies that the rank of $\mathbf{v}$ is precisely $m+1$. Hence, $\mathbf{v}^T\mathbf{v}$ is invertible and $\mathbf{d} = (\mathbf{v}^T\mathbf{v})^{-1} = \left[d_{jk}\;(j,k = 0,\ldots,m)\right]_{(m+1)\times(m+1)}$. Let $\boldsymbol{\beta} = \left[\beta_k\;(k = 0,\ldots,m)\right]_{(m+1)\times 1}$ be the column of the regression coefficients, and $\mathbf{B} = \left[B_k\;(k = 0,\ldots,m)\right]_{(m+1)\times 1}$ the column of their estimators. The normal system of equations in matrix form is then $\mathbf{v}^T\mathbf{v}\,\mathbf{B} = \mathbf{v}^T\mathbf{Y}$, and its solution is

$$\mathbf{B} = \mathbf{d}\,\mathbf{v}^T\,\mathbf{Y}. \tag{4.1}$$
As usual, we shall suppose that the y-s are independent normally distributed random variables with $E(y_i) = \beta_0 + \beta_1 x_i + \cdots + \beta_m x_i^m$ and $\mathrm{var}(y_i) = \sigma_{\mathrm{reg}}^2$. Hence, the
vector $\mathbf{Y}$ is normally distributed with $E(\mathbf{Y}) = \mathbf{v}\boldsymbol{\beta}$ and $\mathrm{var}(\mathbf{Y}) = \sigma_{\mathrm{reg}}^2\,\mathbf{I}$. According to (4.1), $\mathbf{B}$ is also normally distributed, with

$$E(\mathbf{B}) = (\mathbf{d}\,\mathbf{v}^T)\,E(\mathbf{Y}) = \boldsymbol{\beta},\qquad \mathrm{var}(\mathbf{B}) = (\mathbf{d}\,\mathbf{v}^T)\,\mathrm{var}(\mathbf{Y})\,(\mathbf{d}\,\mathbf{v}^T)^T = \sigma_{\mathrm{reg}}^2\,\mathbf{d}.$$
Introduce two matrices:

$$\mathbf{u} = \begin{bmatrix} 0 & \cdots & 0 & -1 & 0 \\ 0 & \cdots & 0 & 0 & m \end{bmatrix}_{2\times(m+1)} \qquad\text{and}\qquad \mathbf{W} = \mathbf{u}\,\mathbf{B} = \begin{bmatrix} -B_{m-1} \\ mB_m \end{bmatrix}_{2\times 1}.$$

$\mathbf{W}$ is also a normal variable, with

$$E(\mathbf{W}) = \mathbf{u}\,E(\mathbf{B}) = \mathbf{u}\boldsymbol{\beta} = \begin{bmatrix} -\beta_{m-1} \\ m\beta_m \end{bmatrix}\quad\text{and}\quad \mathrm{var}(\mathbf{W}) = \mathbf{u}\,\mathrm{var}(\mathbf{B})\,\mathbf{u}^T = \sigma_{\mathrm{reg}}^2\,\mathbf{u}\,\mathbf{d}\,\mathbf{u}^T = \sigma_{\mathrm{reg}}^2\begin{bmatrix} d_{m-1,m-1} & -m\,d_{m-1,m} \\ -m\,d_{m-1,m} & m^2\,d_{m,m} \end{bmatrix}.$$

Therefore, the distribution of $\mathbf{W}$ is

$$N\!\left(-\beta_{m-1},\; m\beta_m,\; \sigma_{\mathrm{reg}}\sqrt{d_{m-1,m-1}},\; m\sigma_{\mathrm{reg}}\sqrt{d_{m,m}},\; -\frac{d_{m-1,m}}{\sqrt{d_{m-1,m-1}\,d_{m,m}}}\right).$$

Hence, $Z$ has a distribution with density function (2.2) with the exchange:

$$\mu_X \to -\beta_{m-1},\quad \mu_Y \to m\beta_m,\quad \sigma_X \to \sigma_{\mathrm{reg}}\sqrt{d_{m-1,m-1}},\quad \sigma_Y \to m\sigma_{\mathrm{reg}}\sqrt{d_{m,m}},\quad \rho \to -\frac{d_{m-1,m}}{\sqrt{d_{m-1,m-1}\,d_{m,m}}}.$$
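As with the linear case, a short helper (our sketch, hypothetical names; $\sigma_{\mathrm{reg}}$ and $\boldsymbol{\beta}$ assumed known for illustration) can assemble these parameters from the design points:

```python
import numpy as np

def z_density_params(x, m, sigma_reg, beta):
    """Parameters of (2.2) for Z = -B_{m-1}/(m B_m), polynomial regression."""
    x = np.asarray(x, dtype=float)
    v = np.vander(x, m + 1, increasing=True)   # v[i, k] = x_i**k
    d = np.linalg.inv(v.T @ v)                 # d = (v^T v)^{-1}
    mu_x, mu_y = -beta[m - 1], m * beta[m]
    sig_x = sigma_reg * np.sqrt(d[m - 1, m - 1])
    sig_y = m * sigma_reg * np.sqrt(d[m, m])
    rho = -d[m - 1, m] / np.sqrt(d[m - 1, m - 1] * d[m, m])
    return mu_x, mu_y, sig_x, sig_y, rho

# Quadratic regression (m = 2): Z estimates the abscissa of the extreme.
pars = z_density_params(np.linspace(-1, 3, 15), m=2, sigma_reg=0.3,
                        beta=np.array([1.0, 2.0, -1.0]))
print(pars)   # feed into ratio_density(z, *pars)
```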
References

[1] Jamnik, R. (1971): Verjetnostni račun. Ljubljana: Mladinska knjiga.
[2] Johnson, N.L., Kotz, S., and Balakrishnan, N. (1994): Continuous Univariate Distributions, Vol. 1. New York: John Wiley and Sons.
[3] Kamerud, D. (1978): The random variable X/Y, X, Y normal. The American Mathematical Monthly, 85, 207.
[4] Marsaglia, G. (1965): Ratios of normal variables and ratios of sums of uniform variables. Journal of the American Statistical Association, 60, 193-204.
Appendix

For $a > 0$:

$$\int_0^m t\,e^{-at^2+bt+c}\,dt = \frac{e^c}{2a}\left\{1 - e^{bm-am^2} + b\sqrt{\frac{\pi}{a}}\; e^{b^2/4a}\left[\Phi\!\left(m\sqrt{2a} - \frac{b}{\sqrt{2a}}\right) + \Phi\!\left(\frac{b}{\sqrt{2a}}\right)\right]\right\}$$

$$\int_{-\infty}^{\infty} |t|\,e^{-at^2+bt+c}\,dt = \int_0^{\infty} t\,e^{-at^2+bt+c}\,dt - \int_{-\infty}^{0} t\,e^{-at^2+bt+c}\,dt = \frac{e^c}{a}\left[1 + r\,\frac{\Phi(r)}{\varphi(r)}\right]$$

where:

$$r = \frac{b}{\sqrt{2a}},\qquad \varphi(r) = \frac{1}{\sqrt{2\pi}}\,e^{-r^2/2},\qquad \Phi(r) = \int_0^r \varphi(x)\,dx = \frac{1}{2}\,\mathrm{erf}\!\left(\frac{r}{\sqrt{2}}\right).$$
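The second formula can be verified by quadrature. This is our sketch (the values of $a$, $b$, $c$ are arbitrary); the integral is split at the kink of $|t|$ for accuracy:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

a, b, c = 1.5, 0.7, -0.2
r = b / np.sqrt(2 * a)

# Left side: integral of |t| exp(-a t^2 + b t + c) over the real line.
f = lambda t: abs(t) * np.exp(-a * t**2 + b * t + c)
lhs = quad(f, -np.inf, 0)[0] + quad(f, 0, np.inf)[0]

# Right side: closed form, with Phi in Gauss form (cdf minus 1/2).
Phi = norm.cdf(r) - 0.5
rhs = np.exp(c) / a * (1 + r * Phi / norm.pdf(r))
print(lhs, rhs)   # should agree to quadrature accuracy
```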