MODULE 7
BASIC PROBABILITY AND STATISTICS PART II: STATISTICS

CHAPTER 3

INTERVAL ESTIMATION

3.0. Introduction

In Chapter 2 we looked into point estimation, in the sense of giving single values or points as estimates for well-defined parameters in a preselected population density or probability function. If $p$ is the probability that someone contesting an election will win and we give the estimate $p = 0.7$, then we are saying that there is exactly a 70% chance of winning. From a layman's point of view such an exact number may not be that reasonable; if we say that the chance is between 60 and 75 percent, it may be more acceptable. If the waiting time in a queue at a checkout counter in a grocery store is exponentially distributed with expected waiting time $\theta$ minutes, time being measured in minutes, and we give an estimate of $\theta$ as between 5 and 10 minutes, it may be more reasonable than giving a single number such as "the expected waiting time is exactly 6 minutes". If we give an estimate of the expected lifetime of individuals in a certain community as between 80 and 90 years, it may be more acceptable than saying that the expected lifetime is exactly 83 years. Thus, when the unknown parameter $\theta$ has a continuous parameter space $\Omega$, it may be more reasonable to come up with an interval so that we can say that the unknown parameter $\theta$ is somewhere in this interval. We will examine such interval estimation problems here.


3.1. Interval Estimation Problems

In order to explain the various technical terms in this area it is better to examine a simple problem first and then define the various terms appearing there, in the light of the illustration.

Example 3.1. Let $x_1,\ldots,x_n$ be iid variables from an exponential population with density
$$f(x,\theta) = \frac{1}{\theta}\,e^{-x/\theta},\quad x \ge 0,\ \theta > 0$$

and zero elsewhere. Compute the densities of (1): $u = x_1+\cdots+x_n$; (2): $v = u/\theta$, and then evaluate $a$ and $b$ such that $Pr\{a \le v \le b\} = 0.95$.

Solution 3.1. The moment generating function (mgf) of $x$ is known: $M_x(t) = (1-\theta t)^{-1}$, $1-\theta t > 0$. Since $x_1,\ldots,x_n$ are iid, the mgf of $u = x_1+\cdots+x_n$ is $M_u(t) = (1-\theta t)^{-n}$, $1-\theta t > 0$, so $u$ has a gamma distribution with parameters $(\alpha = n, \beta = \theta)$. The mgf of $v$ is available from $M_u(t)$ as $M_v(t) = (1-t)^{-n}$, $1-t > 0$. In other words, $v$ has a gamma density with parameters $(\alpha = n, \beta = 1)$, which is free of all parameters since $n$ is known. Let the density of $v$ be denoted by $g(v)$. Then all sorts of probability statements can be made about $v$. Suppose that we wish to find an $a$ such that $Pr\{v \le a\} = 0.025$; then we have
$$\int_0^a \frac{v^{n-1}}{\Gamma(n)}\,e^{-v}\,dv = 0.025.$$
We can either integrate by parts or use incomplete gamma function tables to obtain the exact value of $a$, since $n$ is known. Similarly we can find a $b$ such that
$$Pr\{v \ge b\} = 0.025 \ \Rightarrow\ \int_b^\infty \frac{v^{n-1}}{\Gamma(n)}\,e^{-v}\,dv = 0.025.$$
This $b$ is also available either by integrating by parts or from the incomplete gamma function tables. Then the probability coverage over the interval $[a,b]$ is 0.95, or $Pr\{a \le v \le b\} = 0.95$.
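For a quick numerical check of Solution 3.1, the two cut-off points can be read from the gamma quantile function instead of tables. The following is a minimal sketch in Python, assuming SciPy is available; the sample size $n = 5$ is an arbitrary choice for illustration.

```python
# A sketch of Solution 3.1: find a and b with Pr{v <= a} = 0.025 and
# Pr{v >= b} = 0.025 when v ~ gamma(shape = n, scale = 1).
from scipy import stats

n = 5  # sample size, chosen arbitrarily for illustration
a = stats.gamma.ppf(0.025, a=n)   # lower 2.5% point
b = stats.gamma.ppf(0.975, a=n)   # upper 2.5% point
print(a, b)
print(stats.gamma.cdf(b, a=n) - stats.gamma.cdf(a, a=n))  # check: 0.95
```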


We are successful in finding $a$ and $b$ because the distribution of $v$ is free of all parameters. If the density of $v$ contained some parameters then we could not have found $a$ and $b$, because those points would have been functions of the parameters involved. Hence the success of our procedure depends upon finding a quantity such as $v$ here, which is a function of the sample values $x_1,\ldots,x_n$ and the parameter (or parameters) under consideration, but whose distribution is free of all parameters. Such quantities are called pivotal quantities.

Definition 3.1 Pivotal quantities. A function of the sample values $x_1,\ldots,x_n$ and the parameters under consideration, whose distribution is free of all parameters, is called a pivotal quantity.

Let us examine Example 3.1 once again. We have a probability statement $Pr\{a \le v \le b\} = 0.95$. Let us examine the mathematical inequalities here:
$$a \le v \le b \ \Rightarrow\ a \le \frac{x_1+\cdots+x_n}{\theta} \le b \ \Rightarrow\ \frac{1}{b} \le \frac{\theta}{x_1+\cdots+x_n} \le \frac{1}{a} \ \Rightarrow\ \frac{x_1+\cdots+x_n}{b} \le \theta \le \frac{x_1+\cdots+x_n}{a}.$$
Since these inequalities are mathematically identical, the probability statements over these intervals must be identical. That is,
$$Pr\left\{a \le \frac{x_1+\cdots+x_n}{\theta} \le b\right\} = Pr\left\{\frac{x_1+\cdots+x_n}{b} \le \theta \le \frac{x_1+\cdots+x_n}{a}\right\}. \tag{3.1.1}$$
Thus we have converted a probability statement over $v$ into a probability statement over $\theta$. What is the difference between these two probability statements? The first one says that the probability that the random variable $v$ falls in the fixed interval $[a,b]$ is 0.95. In the second statement $\theta$ is not a random variable but a fixed, unknown parameter, and the random variables are at the end points of the interval; here the interval is random,


not $\theta$. Hence the probability statement over $\theta$ is to be interpreted as: the probability that the random interval $[u/b,\ u/a]$ covers the unknown $\theta$ is 0.95. In this example we have cut off 0.025 area at the right tail and 0.025 area at the left tail, so that the total area cut off is $0.025+0.025 = 0.05$. If we had cut off an area $\alpha/2$ at each tail then the total area cut off would be $\alpha$ and the area in the middle $1-\alpha$. In our Example 3.1, $\alpha = 0.05$ and $1-\alpha = 0.95$. We will introduce some standard notations which will come in handy later on.

Notation 3.1. Let $y$ be a random variable whose density $f(y)$ is free of all parameters. Then we can compute a point $b$ such that from that point onward to the right the area cut off is a specified number, say $\alpha$. This $b$ is usually denoted by $y_\alpha$: the value of $y$ from which onward to the right the area under the density curve or probability function is $\alpha$, or
$$Pr\{y \ge y_\alpha\} = \alpha. \tag{3.1.2}$$
Then, from Notation 3.1, if $a$ is a point below which the left tail area is $\alpha$, the point $a$ should be denoted by $y_{1-\alpha}$: the point from which onward to the right the area under the curve is $1-\alpha$, that is, the left tail area is $\alpha$. In Example 3.1, if we wanted to compute $a$ and $b$ so that an equal area $\alpha/2$ is cut off at each of the right and left tails, then the first part of equation (3.1.1) could have been written as $Pr\{v_{1-\alpha/2} \le v \le v_{\alpha/2}\} = 1-\alpha$.

Definition 3.2 Confidence intervals. Let $x_1,\ldots,x_n$ be a sample from the population $f(x,\theta)$ where $\theta$ is the parameter. Suppose that it is possible to construct two functions of the sample values, $\phi_1(x_1,\ldots,x_n)$ and $\phi_2(x_1,\ldots,x_n)$, so that the probability that the random interval $[\phi_1,\phi_2]$ covers the unknown parameter $\theta$ is $1-\alpha$ for a given $\alpha$. That is,
$$Pr\{\phi_1(x_1,\ldots,x_n) \le \theta \le \phi_2(x_1,\ldots,x_n)\} = 1-\alpha$$
for all $\theta$ in the parameter space $\Omega$. Then $1-\alpha$ is called the confidence coefficient, and the interval $[\phi_1,\phi_2]$ is called a $100(1-\alpha)\%$ confidence interval


for $\theta$, $\phi_1$ is called the lower confidence limit, $\phi_2$ the upper confidence limit, and $\phi_2-\phi_1$ the length of the confidence interval. When a random interval $[\phi_1,\phi_2]$ is given, we are placing $100(1-\alpha)\%$ confidence on our interval, saying that this interval will cover the true parameter value $\theta$ with probability $1-\alpha$. The meaning is that if we construct the same interval repeatedly, using samples of the same size $n$, then in the long run $100(1-\alpha)\%$ of the intervals will contain the true parameter $\theta$. If one interval is constructed then that interval need not contain the true parameter $\theta$; the chance that it does is $1-\alpha$. In our Example 3.1 we were placing 95% confidence in the interval $[(x_1+\cdots+x_n)/v_{0.025},\ (x_1+\cdots+x_n)/v_{0.975}]$ to contain the unknown parameter $\theta$.

From Example 3.1 and the discussion above, it is clear that we will be successful in coming up with a $100(1-\alpha)\%$ confidence interval for a given parameter $\theta$ if we have the following: (i): a pivotal quantity $Q$, that is, a quantity containing the sample values and the parameter $\theta$ but whose distribution is free of all parameters [note that there may be many pivotal quantities in a given situation]; (ii): $Q$ enables us to convert a probability statement on $Q$ into an equivalent statement on $\theta$.

How many such $100(1-\alpha)\%$ confidence intervals can be constructed for a given $\theta$, if one such interval can be constructed? The answer is: infinitely many. From Example 3.1 it is seen that instead of cutting off 0.025, or in general $\alpha/2$, at both ends, we could have cut off $\alpha$ at the right tail, or $\alpha$ at the left tail, or any $\alpha_1$ at the left tail and $\alpha_2$ at the right tail with $\alpha_1+\alpha_2 = \alpha$. In our example, cutting off all of $\alpha$ at the right tail of $v$, that is using $0 \le v \le v_\alpha$, would have produced the interval $[u/v_\alpha,\ \infty)$, an interval of infinite length. Such an interval may not be of much use: our aim is to give an interval which covers the unknown $\theta$ with a given confidence coefficient $1-\alpha$, and a statement that an interval of infinite length covers the unknown parameter does not carry much significance. Hence a very desirable property is that the expected length of the interval be as short as possible.

Definition 3.3 Central intervals. Confidence intervals obtained by cutting off equal areas $\alpha/2$ at both tails of the distribution of the pivotal quantity, so that we obtain a $100(1-\alpha)\%$ confidence interval, are called


central intervals. It can be shown that if the pivotal quantity has a symmetric distribution then the central interval is usually the shortest in expected value. Observe also that when the length, which is the upper confidence limit minus the lower confidence limit, is computed, it may turn out to be free of all random variables; in that case the length and the expected length are one and the same.

3.2. Confidence Interval for Parameters in an Exponential Population

We have already given one example of setting up a confidence interval for the parameter $\theta$ in the exponential population
$$f(x,\theta) = \frac{1}{\theta}\,e^{-x/\theta},\quad x \ge 0,\ \theta > 0$$

and zero elsewhere. Our pivotal quantity was $u = (x_1+\cdots+x_n)/\theta$, where $u$ has a gamma distribution with parameters $(\alpha = n, \beta = 1)$, $n$ being the sample size, which is known. Hence there is no free parameter, and the required probabilities can be read from incomplete gamma tables or obtained by integration by parts. Then a $100(1-\alpha)\%$ confidence interval for $\theta$ in an exponential population is given by
$$\left[\frac{x_1+\cdots+x_n}{u_{\alpha/2}},\ \frac{x_1+\cdots+x_n}{u_{1-\alpha/2}}\right]$$
where
$$\int_0^{u_{1-\alpha/2}} g(u)\,du = \frac{\alpha}{2}, \qquad \int_{u_{\alpha/2}}^\infty g(u)\,du = \frac{\alpha}{2} \tag{3.2.1}$$
and
$$g(u) = \frac{u^{n-1}}{\Gamma(n)}\,e^{-u},\quad u \ge 0.$$
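A minimal sketch of (3.2.1) in Python, replacing the incomplete gamma tables with the gamma quantile function; the sample values below are hypothetical, used only to illustrate the computation:

```python
# A sketch of (3.2.1): central 100(1-alpha)% confidence interval for theta
# in an exponential population, using u = (x1+...+xn)/theta ~ gamma(n, 1).
from scipy import stats

x = [5.2, 1.7, 3.1, 8.4, 2.6]    # hypothetical observed sample
n, s, alpha = len(x), sum(x), 0.05
u_hi = stats.gamma.ppf(1 - alpha / 2, a=n)  # u_{alpha/2}, upper point
u_lo = stats.gamma.ppf(alpha / 2, a=n)      # u_{1-alpha/2}, lower point
print(s / u_hi, s / u_lo)        # [sum/u_{alpha/2}, sum/u_{1-alpha/2}]
```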

Example 3.2. Construct a 100(1−α)% confidence interval for the location parameter γ in an exponential population, where the scale parameter θ


is known, say $\theta = 1$. Assume that a simple random sample of size $n$ is available.

Solution 3.2. The density function is given by $f(x,\gamma) = e^{-(x-\gamma)}$, $x \ge \gamma$, and zero elsewhere. Let us consider the MLE of $\gamma$, which is the smallest order statistic $y_1 = x_{n:1}$. Then the density of $y_1$ is available as
$$g(y_1,\gamma) = -\frac{d}{dz}\left[Pr\{x_j \ge z\}\right]^n \Big|_{z=y_1} = n\,e^{-n(y_1-\gamma)},\quad y_1 \ge \gamma$$
and zero elsewhere. Let $u = y_1-\gamma$. Then $u$ has the density, denoted by $g_1(u)$,
$$g_1(u) = n\,e^{-nu},\quad u \ge 0$$
and zero elsewhere. Then we can read off $u_{\alpha/2}$ and $u_{1-\alpha/2}$ for any given $\alpha$ from this density. That is,
$$\frac{\alpha}{2} = \int_0^{u_{1-\alpha/2}} n\,e^{-nu}\,du = 1 - e^{-n\,u_{1-\alpha/2}} \ \Rightarrow\ u_{1-\alpha/2} = -\frac{1}{n}\ln\!\left(1-\frac{\alpha}{2}\right) \tag{a}$$
$$\frac{\alpha}{2} = \int_{u_{\alpha/2}}^\infty n\,e^{-nu}\,du = e^{-n\,u_{\alpha/2}} \ \Rightarrow\ u_{\alpha/2} = -\frac{1}{n}\ln\!\left(\frac{\alpha}{2}\right). \tag{b}$$
Now we have the probability statement $Pr\{u_{1-\alpha/2} \le y_1-\gamma \le u_{\alpha/2}\} = 1-\alpha$. That is, $Pr\{y_1-u_{\alpha/2} \le \gamma \le y_1-u_{1-\alpha/2}\} = 1-\alpha$. Hence a $100(1-\alpha)\%$ confidence interval for $\gamma$ is given by
$$[y_1-u_{\alpha/2},\ y_1-u_{1-\alpha/2}]. \tag{3.2.2}$$


For example, for an observed sample 2, 8, 5 of size $n = 3$, a 95% confidence interval for $\gamma$ is obtained as follows: $\alpha = 0.05 \Rightarrow \alpha/2 = 0.025$, and
$$u_{\alpha/2} = -\frac{1}{n}\ln\!\left(\frac{\alpha}{2}\right) = -\frac{1}{3}\ln(0.025), \qquad u_{1-\alpha/2} = -\frac{1}{3}\ln(0.975).$$
The observed value of $y_1$ is 2, the smallest sample value. Hence a 95% confidence interval for $\gamma$ is $[2+\frac{1}{3}\ln(0.025),\ 2+\frac{1}{3}\ln(0.975)] \approx [0.77,\ 1.99]$.
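A quick numerical check of Example 3.2, as a sketch in Python:

```python
# A sketch of Example 3.2: confidence interval for the location parameter
# gamma, based on the sample minimum y1, with u = y1 - gamma ~ exp(rate n).
import math

x = [2, 8, 5]                          # observed sample from the text
n, alpha = len(x), 0.05
y1 = min(x)                            # smallest order statistic, MLE of gamma
u_hi = -math.log(alpha / 2) / n        # u_{alpha/2}
u_lo = -math.log(1 - alpha / 2) / n    # u_{1-alpha/2}
print(y1 - u_hi, y1 - u_lo)            # about (0.77, 1.99)
```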

Note 3.1. If both the scale parameter $\theta$ and the location parameter $\gamma$ are present, then we need simultaneous confidence intervals, or a confidence region for the point $(\theta,\gamma)$. Confidence regions will be considered later.

Note 3.2. In Example 3.2 we took the pivotal quantity based on the smallest order statistic, $y_1-\gamma$. We could also have constructed a confidence interval by using a single observation, the sum of the observations, or the sample mean.

3.3. Confidence Interval for the Parameters in a Uniform Density

Consider $x_1,\ldots,x_n$, iid from a one-parameter uniform density
$$f(x,\theta) = \frac{1}{\theta},\quad 0 \le x \le \theta$$

and zero elsewhere. Let us construct a $100(1-\alpha)\%$ confidence interval for $\theta$. Assume that a simple random sample of size $n$ is available. The largest order statistic seems to be a convenient starting point. Let $y_n = x_{n:n}$ be the largest order statistic. Then $y_n$ has the density
$$g(y_n,\theta) = \frac{d}{dz}\left[Pr\{x_j \le z\}\right]^n \Big|_{z=y_n} = \frac{n}{\theta^n}\,y_n^{\,n-1},\quad 0 \le y_n \le \theta.$$
Let us take the pivotal quantity as $u = y_n/\theta$. The density of $u$, denoted by $g_1(u)$, is given by
$$g_1(u) = n\,u^{n-1},\quad 0 \le u \le 1$$


and zero elsewhere. Hence
$$\int_0^{u_{1-\alpha/2}} n\,u^{n-1}\,du = \frac{\alpha}{2} \ \Rightarrow\ u_{1-\alpha/2} = \left[\frac{\alpha}{2}\right]^{1/n}$$
and
$$\int_{u_{\alpha/2}}^1 n\,u^{n-1}\,du = \frac{\alpha}{2} \ \Rightarrow\ u_{\alpha/2} = \left[1-\frac{\alpha}{2}\right]^{1/n}.$$
Therefore
$$Pr\{u_{1-\alpha/2} \le u \le u_{\alpha/2}\} = 1-\alpha \ \Rightarrow\ Pr\left\{\left[\frac{\alpha}{2}\right]^{1/n} \le \frac{y_n}{\theta} \le \left[1-\frac{\alpha}{2}\right]^{1/n}\right\} = 1-\alpha \ \Rightarrow\ Pr\left\{\frac{y_n}{(1-\frac{\alpha}{2})^{1/n}} \le \theta \le \frac{y_n}{(\frac{\alpha}{2})^{1/n}}\right\} = 1-\alpha.$$
Hence a $100(1-\alpha)\%$ confidence interval for $\theta$ in this case is
$$\left[\frac{y_n}{(1-\frac{\alpha}{2})^{1/n}},\ \frac{y_n}{(\frac{\alpha}{2})^{1/n}}\right]. \tag{3.3.1}$$

For example, for an observed sample 8, 2, 5 from this one-parameter uniform population, a 90% confidence interval for $\theta$ is given by $\left[\dfrac{8}{(0.95)^{1/3}},\ \dfrac{8}{(0.05)^{1/3}}\right]$.
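As a sketch in Python, with the same numbers as the example above:

```python
# A sketch of (3.3.1): confidence interval for theta in uniform(0, theta),
# based on the largest order statistic y_n, whose pivotal quantity
# u = y_n/theta has density n*u^(n-1) on (0, 1).
x = [8, 2, 5]                 # observed sample from the text
n, alpha = len(x), 0.10
yn = max(x)                   # largest order statistic
lower = yn / (1 - alpha / 2) ** (1 / n)   # y_n / (1-alpha/2)^(1/n)
upper = yn / (alpha / 2) ** (1 / n)       # y_n / (alpha/2)^(1/n)
print(lower, upper)           # about (8.14, 21.72)
```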

Note 3.3. If the uniform population is over $[a,b]$, $b > a$, then by using the largest and smallest order statistics one can construct confidence intervals for $b$ when $a$ is known, and for $a$ when $b$ is known. Simultaneous intervals for $a$ and $b$ will be discussed later.

3.4. Confidence Intervals in Discrete Distributions

Here we will consider a general procedure for setting up confidence intervals for the Bernoulli parameter $p$ and the Poisson parameter $\lambda$. In discrete cases, such as the binomial, cutting off a tail probability exactly equal to $\alpha/2$ at each tail may not be possible, because the probability masses sit at individually distinct points. When we add up the tail probabilities we may not get the exact value $\alpha/2$, for example 0.025: when we add up a few points the sum of the probabilities may be less than 0.025, and when we add the next probability


the total may exceed 0.025. Hence in discrete situations we take the tail probabilities as $\le \alpha/2$, so that the middle probability will be $\ge 1-\alpha$. Take the nearest point so that the tail probability is closest to $\alpha/2$ but less than or equal to $\alpha/2$.

3.4.1. Confidence interval for the Bernoulli parameter p

We can set up confidence intervals for the Bernoulli parameter $p$ by taking $n$ observations from a Bernoulli population, or one observation from a binomial population. The binomial population has the probability function
$$f(x,p) = \binom{n}{x} p^x (1-p)^{n-x},\quad 0 < p < 1,\ x = 0,1,\ldots,n$$
and zero elsewhere. We can assume $n$ to be known. We will see that we cannot find a pivotal quantity $Q$ such that the probability function of $Q$ is free of $p$. For a binomial random variable $x$ we can make a statement
$$Pr\{x \le x_{1-\alpha/2}\} \le \frac{\alpha}{2}, \tag{3.4.1}$$
that is, the left tail probability is less than or equal to $\alpha/2$ for any given $\alpha$, provided $p$ is known. But since $x$ is not a pivotal quantity, $x_{1-\alpha/2}$ will be a function of $p$, that is, $x_{1-\alpha/2}(p)$. For a given $p$ we can compute $x_{1-\alpha/2}$ for any given $\alpha$. For a given $p$ we can compute two points $x_1(p)$ and $x_2(p)$ such that
$$Pr\{x_1(p) \le x \le x_2(p)\} \ge 1-\alpha \tag{a}$$
or we can select $x_1(p)$ and $x_2(p)$, for a given $p$, such that
$$Pr\{x \le x_1(p)\} \le \frac{\alpha}{2} \tag{b}$$
and
$$Pr\{x \ge x_2(p)\} \le \frac{\alpha}{2}. \tag{c}$$

Figure 3.1: Lower and upper confidence bands


For every given $p$ the points $x_1(p)$ and $x_2(p)$ are available. If we plot $x = x_1(p)$ and $x = x_2(p)$ against $p$, then we may get graphs as shown in Figure 3.1. Let the observed value of $x$ be $x_0$. If the line $x = x_0$ cuts the bands $x_1(p)$ and $x_2(p)$, then the inverse images will be $p_1$ and $p_2$ as shown in Figure 3.1. The cut on $x_1(p)$ will give $p_2$ and that on $x_2(p)$ will give $p_1$, so a $100(1-\alpha)\%$ confidence interval for $p$ is $[p_1,p_2]$. Note that the region below the line $x = x_0$ is characterized by the probability $\alpha/2$, and similarly the region above the line $x = x_0$ is characterized by $\alpha/2$. Hence the practical procedure is the following: consider equation (b) with $x_1(p) = x_0$ and search through the binomial tables for a $p$; the solution in (b) will give $p_2$. Take equation (c) with $x_2(p) = x_0$ and search through the binomial tables for a $p$; the solution in (c) gives $p_1$. Note that in some situations the line $x = x_0$ may not cut one or both of the curves $x_1(p)$ and $x_2(p)$. We may have situations where $p_1$ and $p_2$ cannot be found, or $p_1$ may be 0, or $p_2$ may be 1.

Let the observed value of $x$ be $x_0$; for example, suppose that we observed 3 successes in $n = 10$ trials, so that $x_0 = 3$. We can take $x_1(p) = x_{1-\alpha/2}(p) = x_0$ and search for that $p$, say $p_2$, which will satisfy the inequality
$$\sum_{x=0}^{x_0} \binom{n}{x} p_2^x (1-p_2)^{n-x} \le \frac{\alpha}{2}. \tag{3.4.2}$$
This gives the value of $p$, namely $p_2$, for which (3.4.1) holds. Now consider the upper tail probability, that is, the inequality
$$Pr\{x \ge x_{\alpha/2}(p)\} \le \frac{\alpha}{2}. \tag{3.4.3}$$
Again let us take $x_2(p) = x_{\alpha/2} = x_0$ and search for the $p$ for which (3.4.3) holds. Call it $p_1$. That is,
$$\sum_{x=x_0}^{n} \binom{n}{x} p_1^x (1-p_1)^{n-x} \le \frac{\alpha}{2}. \tag{3.4.4}$$
Then
$$Pr\{p_1 \le p \le p_2\} \ge 1-\alpha \tag{3.4.5}$$
gives the required $100(1-\alpha)\%$ confidence interval for $p$.


Example 3.3. If 10 Bernoulli trials gave 3 successes, compute a 95% confidence interval for the probability of success $p$. Note that for the same $p$ both (3.4.2) and (3.4.4) cannot hold simultaneously.

Solution 3.3. Consider the inequality
$$\sum_{x=0}^{3} \binom{10}{x} p_2^x (1-p_2)^{10-x} \le 0.025.$$
Look through a binomial table for $n = 10$ and all values of $p$. From the tables we see that for $p = 0.5$ the sum is 0.1719, which indicates that the value of $p_2$ is bigger than 0.5. Most tables are given only for $p$ up to 0.5, the reason being that for $p > 0.5$ we can still use the same table. Putting $y = n-x$ and $q = 1-p$, we can write
$$\sum_{x=0}^{3} \binom{10}{x} p^x (1-p)^{10-x} = \sum_{y=7}^{10} \binom{10}{y} q^y (1-q)^{10-y} \le 0.025$$

where $q = 1-p$. Looking through the binomial tables, $q = 0.35$ makes this upper-tail sum approximately 0.026, the table value nearest to 0.025; hence $p_2 = 1-q \approx 0.65$. Now we consider the inequality
$$\sum_{x=3}^{10} \binom{10}{x} p_1^x (1-p_1)^{10-x} \le 0.025,$$
which is the same as saying
$$\sum_{x=0}^{2} \binom{10}{x} p_1^x (1-p_1)^{10-x} \ge 0.975.$$
Looking through the binomial table for $n = 10$ and all $p$, we see that $p_1 = 0.05$. Hence the required 95% confidence interval for $p$ is approximately $[p_1, p_2] = [0.05, 0.65]$. We have 95% confidence in this interval.
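With software, the two boundary equations can be solved numerically instead of searching tables. The following is a minimal sketch in Python using SciPy's Brent root finder; the exact limits it returns for $x_0 = 3$, $n = 10$ come out near 0.067 and 0.652, consistent with the table-based answer above:

```python
# A sketch of the exact procedure of Section 3.4.1: solve the two tail
# equations in p, given x0 observed successes in n trials.
from scipy import stats, optimize

n, x0, alpha = 10, 3, 0.05

# upper limit p2: Pr{x <= x0 | p2} = alpha/2
p2 = optimize.brentq(lambda p: stats.binom.cdf(x0, n, p) - alpha / 2,
                     1e-9, 1 - 1e-9)
# lower limit p1: Pr{x >= x0 | p1} = alpha/2
p1 = optimize.brentq(lambda p: 1 - stats.binom.cdf(x0 - 1, n, p) - alpha / 2,
                     1e-9, 1 - 1e-9)
print(p1, p2)   # about 0.067 and 0.652
```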


Note 3.4. We can use the exact procedure of this section to construct a confidence interval for the parameter $\theta$ of a one-parameter distribution whether we have a pivotal quantity or not. Take any convenient statistic $T$ whose distribution can be derived; this distribution will contain $\theta$. Let $T_0$ be the observed value of $T$. Consider the inequalities
$$Pr\{T \le T_0\} \le \frac{\alpha}{2} \tag{a}$$
and
$$Pr\{T \ge T_0\} \le \frac{\alpha}{2}. \tag{b}$$
If the inequalities have solutions (note that both cannot be satisfied by the same $\theta$ value), then the solution of (a) gives $\theta_2$ and the solution of (b) gives $\theta_1$, and $[\theta_1,\theta_2]$ is a $100(1-\alpha)\%$ confidence interval for $\theta$. As an exercise the student is advised to use this exact procedure to construct a confidence interval for $\theta$ in an exponential population, using the sample sum as $T$. The same exact procedure can be adopted for getting confidence intervals for the Poisson parameter $\lambda$; in this case make use of the property that the sample sum is again Poisson, with parameter $n\lambda$. This is left as an exercise to the student.

Exercises 3.1-3.4.

3.4.1. Construct a 95% confidence interval for the location parameter $\gamma$ in an exponential population as in Example 3.2 by using (1): $\bar{x}$, the sample mean of a sample of size $n$; (2): the sample sum for a sample of size 2; (3): one observation from the population.

3.4.2. By using the observed sample 3, 8, 4, 5 from an exponential population
$$f(x,\theta,\gamma) = \frac{1}{\theta}\,e^{-(x-\gamma)/\theta},\quad x \ge \gamma,\ \theta > 0$$
and zero elsewhere, construct a 95% confidence interval for (1): $\theta$ if $\gamma = 2$; (2): $\gamma$ if $\theta = 4$.

3.4.3. Consider a uniform population over $[a,b]$, $b > a$. Assume that the observed sample 2, 8, 3 is available from this population. Construct a 95% confidence interval for (1): $a$ when $b = 8$; (2): $b$ when $a = 1$, by using order statistics.


3.4.4. Consider the same uniform population as in Exercise 3.4.3 with $a = 0$. Assume that a sample of size 2 is available. (1): Compute the density of the sample sum $y = x_1+x_2$; (2): by using $y$, construct a 95% confidence interval for $b$ if the observed sample is 2, 6.

3.4.5. Construct a 90% confidence interval for the Bernoulli parameter $p$ if 2 successes are obtained in (1): ten trials; (2): eight trials.

3.4.6. Consider a Poisson population with parameter $\lambda$. Construct a 90% confidence interval for $\lambda$ if 3, 7, 4 is an observed sample.

3.5. Confidence Intervals for Parameters in $N(\mu, \sigma^2)$

First we will consider the simple problem of constructing a confidence interval for the mean value $\mu$ in a normal population when the population variance is known. Then we will consider intervals for $\mu$ when $\sigma^2$ is not known, and then intervals for $\sigma^2$. In the following situations we will construct central intervals for convenience. These central intervals will be the shortest when the pivotal quantities have symmetric distributions. For confidence intervals for the population variance the pivotal quantity taken is a chisquare variable, which does not have a symmetric distribution; hence the central interval cannot be expected to be the shortest, but for convenience we will consider central intervals in all situations.

3.5.1. Confidence intervals for µ

Case 1: Population variance $\sigma^2$ is known. Here we can take as pivotal quantity the standardized sample mean
$$z = \frac{\sqrt{n}(\bar{x}-\mu)}{\sigma} \sim N(0,1)$$

which is free of all parameters when $\sigma$ is known. Hence we can read off $z_{\alpha/2}$ and $z_{1-\alpha/2}$ so that $Pr\{z_{1-\alpha/2} \le z \le z_{\alpha/2}\} = 1-\alpha$. Since the standard normal density is symmetric about $z = 0$, we have $z_{1-\alpha/2} = -z_{\alpha/2}$. Let us examine the mathematical inequalities:


$$-z_{\alpha/2} \le z \le z_{\alpha/2} \ \Rightarrow\ -z_{\alpha/2} \le \frac{\sqrt{n}(\bar{x}-\mu)}{\sigma} \le z_{\alpha/2} \ \Rightarrow\ -z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{x}-\mu \le z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \ \Rightarrow\ \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
and hence
$$Pr\{-z_{\alpha/2} \le z \le z_{\alpha/2}\} = Pr\left\{\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right\} = 1-\alpha.$$
Hence a $100(1-\alpha)\%$ confidence interval for $\mu$, when $\sigma^2$ is known, is given by
$$\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]. \tag{3.5.1}$$
Figure 3.2 illustrates the construction of the central confidence interval for $\mu$ in a normal population with $\sigma^2$ known.

Figure 3.2: Confidence interval for $\mu$ in a $N(\mu,\sigma^2)$, $\sigma^2$ known

Example 3.4. Construct a 95% confidence interval for $\mu$ in a $N(\mu, \sigma^2 = 4)$ from the following observed sample: $-5, 0, 2, 15$.

Solution 3.4. Here the sample mean is $\bar{x} = (-5+0+2+15)/4 = 3$. $1-\alpha = 0.95$ means $\alpha/2 = 0.025$, and from a standard normal table $z_{0.025} \approx 1.96$. $\sigma^2$ is given to be 4 and hence $\sigma = 2$. Therefore, from (3.5.1), one 95% confidence interval for $\mu$ is given by
$$\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right] = \left[3 - 1.96\left(\frac{2}{2}\right),\ 3 + 1.96\left(\frac{2}{2}\right)\right] = [1.04,\ 4.96].$$
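A minimal sketch of Example 3.4 in Python, assuming SciPy is available:

```python
# A sketch of (3.5.1): central CI for mu in N(mu, sigma^2) with sigma known.
import math
from scipy import stats

x = [-5, 0, 2, 15]                 # observed sample from Example 3.4
sigma, alpha = 2.0, 0.05
n = len(x)
xbar = sum(x) / n
z = stats.norm.ppf(1 - alpha / 2)  # z_{alpha/2}, about 1.96
half = z * sigma / math.sqrt(n)
print(xbar - half, xbar + half)    # about (1.04, 4.96)
```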


We have 95% confidence that the unknown $\mu$ lies on this interval. Note that the length of the interval in this case is
$$\left[\bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right] - \left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right] = 2\,z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
which is free of all random variables and hence equal to its expected value; the expected length of the interval in Example 3.4 is $2\,z_{\alpha/2}\,\sigma/\sqrt{n} = 2(1.96) = 3.92$.

Example 3.5. For a binomial random variable $x$ it is known that for large $n$ ($n \ge 20$, $np \ge 5$, $n(1-p) \ge 5$) the standardized binomial variable is approximately a standard normal. By using this approximation, set up an approximate $100(1-\alpha)\%$ confidence interval for $p$, the probability of success.

Solution 3.5. We will construct a central interval. We have
$$\frac{x - np}{\sqrt{np(1-p)}} \approx z,\quad z \sim N(0,1).$$
From a standard normal table we can obtain $z_{\alpha/2}$ so that, approximately,
$$Pr\left\{-z_{\alpha/2} \le \frac{x - np}{\sqrt{np(1-p)}} \le z_{\alpha/2}\right\} \approx 1-\alpha.$$

The inequality can be written as
$$\frac{(x-np)^2}{np(1-p)} \le z_{\alpha/2}^2.$$
Opening this up as a quadratic equation in $p$, when equality holds, and solving for $p$, one has
$$p = \frac{\left(x + \frac{1}{2}z_{\alpha/2}^2\right) \mp \sqrt{\left(x + \frac{1}{2}z_{\alpha/2}^2\right)^2 - x^2\left(1 + \frac{1}{n}z_{\alpha/2}^2\right)}}{n\left(1 + \frac{1}{n}z_{\alpha/2}^2\right)}. \tag{3.5.2}$$

These two roots are the lower and upper $100(1-\alpha)\%$ central confidence limits for $p$, approximately.
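The two roots in (3.5.2) are easy to evaluate directly; in the literature this interval is known as the Wilson score interval. Below is a minimal sketch in Python, using the numbers of the worked case that follows ($n = 20$, $x = 8$):

```python
# A sketch of (3.5.2): roots of the quadratic from the normal approximation
# to the binomial (the Wilson score interval).
import math
from scipy import stats

n, x, alpha = 20, 8, 0.05
z = stats.norm.ppf(1 - alpha / 2)               # z_{alpha/2}
a = x + 0.5 * z ** 2
disc = math.sqrt(a ** 2 - x ** 2 * (1 + z ** 2 / n))
denom = n * (1 + z ** 2 / n)
print((a - disc) / denom, (a + disc) / denom)   # about (0.22, 0.61)
```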


For example, for $n = 20$, $x = 8$, $\alpha = 0.05$, we have $z_{0.025} = 1.96$. Substituting these values in (3.5.2), we obtain the approximate roots 0.22 and 0.61. Hence an approximate 95% central confidence interval for the binomial parameter $p$ in this case is $[0.22, 0.61]$. [Simplification of the computations is left to the student.]

Case 2: Confidence intervals for µ when σ² is unknown. In this case we cannot take the standardized normal variable as our pivotal quantity because, even though the distribution of the standardized sample mean is free of all parameters, a $\sigma$ is present in the standardized variable, and it acts as a nuisance parameter here.

Definition 3.4 Nuisance parameters. These are parameters which are not relevant for the problem under consideration but which are going to be present in the computations.

Hence our aim is to come up with a pivotal quantity involving the sample values and $\mu$ only, and whose distribution is free of all parameters. We have such a quantity here: the Student-t variable. Consider the following pivotal quantity, which has a Student-t distribution:
$$\frac{\sqrt{n}(\bar{x}-\mu)}{s_1} \sim t_{n-1},\qquad s_1^2 = \frac{\sum_{j=1}^{n}(x_j-\bar{x})^2}{n-1} \tag{3.5.3}$$
where $s_1^2$ is an unbiased estimator for the population variance $\sigma^2$. Note that a Student-t distribution is symmetric around $t = 0$; hence we can expect the central interval to be the shortest in expected length. For constructing a central $100(1-\alpha)\%$ confidence interval for $\mu$, read off the upper tail point $t_{n-1,\alpha/2}$ such that
$$Pr\{t_{n-1} \ge t_{n-1,\alpha/2}\} = \frac{\alpha}{2}.$$
Then we can make the probability statement
$$Pr\{-t_{n-1,\alpha/2} \le t_{n-1} \le t_{n-1,\alpha/2}\} = 1-\alpha. \tag{3.5.4}$$
Substituting for $t_{n-1}$ and converting the inequalities into inequalities over $\mu$, we have the following:
$$Pr\left\{\bar{x} - t_{n-1,\alpha/2}\frac{s_1}{\sqrt{n}} \le \mu \le \bar{x} + t_{n-1,\alpha/2}\frac{s_1}{\sqrt{n}}\right\} = 1-\alpha \tag{3.5.5}$$


which gives a central $100(1-\alpha)\%$ confidence interval for $\mu$. Figure 3.3 illustrates the percentage points.

Figure 3.3: Percentage points from a Student-t density

The interval is of length $2\,t_{n-1,\alpha/2}\,s_1/\sqrt{n}$, which contains the variable $s_1$ and hence is a random quantity. We can compute the expected value of this length by using the fact that
$$\frac{(n-1)s_1^2}{\sigma^2} \sim \chi^2_{n-1}$$
where $\chi^2_{n-1}$ is a chisquare variable with $n-1$ degrees of freedom.

Example 3.6. Construct a 99% confidence interval for $\mu$ in a normal population with unknown variance, by using the observed sample 1, 0, 5 from this normal population.

Solution 3.6. The sample mean is $\bar{x} = (1+0+5)/3 = 2$. An observed value of $s_1^2$ is given by
$$s_1^2 = \frac{1}{2}\left[(1-2)^2 + (0-2)^2 + (5-2)^2\right] = 7 \ \Rightarrow\ s_1 = \sqrt{7} \approx 2.6458.$$
Now, $\alpha = 0.01 \Rightarrow \alpha/2 = 0.005$. From a Student-t table for $n-1 = 2$ degrees of freedom, $t_{2,0.005} = 9.925$. Hence a 99% central confidence interval for $\mu$ here is given by
$$\left[2 - 9.925\frac{\sqrt{7}}{\sqrt{3}},\ 2 + 9.925\frac{\sqrt{7}}{\sqrt{3}}\right] \approx [-13.16,\ 17.16].$$
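A minimal sketch of Example 3.6 in Python:

```python
# A sketch of (3.5.5): central t-interval for mu with sigma^2 unknown.
import math
from scipy import stats

x = [1, 0, 5]                        # observed sample from Example 3.6
n, alpha = len(x), 0.01
xbar = sum(x) / n
s1 = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
t = stats.t.ppf(1 - alpha / 2, df=n - 1)   # t_{n-1, alpha/2} = 9.925
half = t * s1 / math.sqrt(n)
print(xbar - half, xbar + half)      # about (-13.16, 17.16)
```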


Note 3.5. In some books the student may find the statement that when the sample size is $n \ge 30$ one gets a good normal approximation to Student-t, and hence may take $z_\alpha$ from a standard normal table instead of $t_{\nu,\alpha}$ from the Student-t table with $\nu$ degrees of freedom, for $\nu \ge 30$. The student may look at the exact percentage points in the Student-t table to see that even for $\nu = 120$ degrees of freedom the upper tail points of the standard normal and Student-t do not agree with each other. Hence taking $z_\alpha$ instead of $t_{\nu,\alpha}$ for $\nu \ge 30$ is not a proper procedure.

3.5.2. Confidence intervals for σ² in N(µ, σ²)

Here we consider two situations: (1): $\mu$ is known, (2): $\mu$ is not known, and we wish to construct confidence intervals for $\sigma^2$ in $N(\mu,\sigma^2)$. Convenient pivotal quantities are the following:
$$\sum_{j=1}^{n}\frac{(x_j-\mu)^2}{\sigma^2} \sim \chi^2_n \ \ \text{(when $\mu$ is known)} \qquad\text{and}\qquad \sum_{j=1}^{n}\frac{(x_j-\bar{x})^2}{\sigma^2} \sim \chi^2_{n-1}.$$
Then from a chisquare density we have
$$Pr\left\{\chi^2_{n,1-\alpha/2} \le \sum_{j=1}^{n}\frac{(x_j-\mu)^2}{\sigma^2} \le \chi^2_{n,\alpha/2}\right\} = 1-\alpha \tag{3.5.6}$$
and
$$Pr\left\{\chi^2_{n-1,1-\alpha/2} \le \sum_{j=1}^{n}\frac{(x_j-\bar{x})^2}{\sigma^2} \le \chi^2_{n-1,\alpha/2}\right\} = 1-\alpha. \tag{3.5.7}$$
The percentage points are marked in Figure 3.4.

Figure 3.4: Percentage points from a chisquare density


Note that (3.5.6) can be rewritten as
$$Pr\left\{\frac{\sum_{j=1}^{n}(x_j-\mu)^2}{\chi^2_{n,\alpha/2}} \le \sigma^2 \le \frac{\sum_{j=1}^{n}(x_j-\mu)^2}{\chi^2_{n,1-\alpha/2}}\right\} = 1-\alpha.$$
A similar probability statement can be obtained by rewriting (3.5.7). Therefore a $100(1-\alpha)\%$ central confidence interval for $\sigma^2$ is the following:
$$\left[\frac{\sum_{j=1}^{n}(x_j-\mu)^2}{\chi^2_{n,\alpha/2}},\ \frac{\sum_{j=1}^{n}(x_j-\mu)^2}{\chi^2_{n,1-\alpha/2}}\right]\ ;\quad \left[\frac{\sum_{j=1}^{n}(x_j-\bar{x})^2}{\chi^2_{n-1,\alpha/2}},\ \frac{\sum_{j=1}^{n}(x_j-\bar{x})^2}{\chi^2_{n-1,1-\alpha/2}}\right]. \tag{3.5.8}$$
Note that a $\chi^2$ distribution is not symmetric, and hence we cannot expect to get the shortest interval by taking the central interval; the central intervals are taken only for convenience. When $\mu$ is unknown we cannot use $\sum_{j=1}^{n}(x_j-\mu)^2/\sigma^2 \sim \chi^2_n$, because the nuisance parameter $\mu$ is present. We can use the pivotal quantity
$$\sum_{j=1}^{n}\frac{(x_j-\bar{x})^2}{\sigma^2} \sim \chi^2_{n-1}$$
and construct a $100(1-\alpha)\%$ central confidence interval; it is the second one given in (3.5.8). When $\mu$ is known we can also use the standardized normal
$$\frac{\sqrt{n}(\bar{x}-\mu)}{\sigma} \sim N(0,1)$$
as a pivotal quantity to construct a confidence interval for $\sigma$, and thereby a confidence interval for $\sigma^2$. Note that if $[T_1,T_2]$ is a $100(1-\alpha)\%$ confidence interval for $\theta$, then $[g(T_1), g(T_2)]$ is a $100(1-\alpha)\%$ confidence interval for $g(\theta)$ when $\theta$ to $g(\theta)$ is a one-to-one function.

Example 3.7. If $-2, 1, 7$ is an observed sample from a $N(\mu,\sigma^2)$, construct a 95% confidence interval for $\sigma^2$ when (1): $\mu = 1$, (2): $\mu$ is unknown.

Solution 3.7. $\bar{x} = (-2+1+7)/3 = 2$, $\sum_{j=1}^{3}(x_j-\bar{x})^2 = (-2-2)^2 + (1-2)^2 + (7-2)^2 = 42$, and $\sum_{j=1}^{3}(x_j-\mu)^2 = (-2-1)^2 + (1-1)^2 + (7-1)^2 = 45$.


$1-\alpha = 0.95 \Rightarrow \alpha/2 = 0.025$. From a chisquare table, $\chi^2_{n,\alpha/2} = \chi^2_{3,0.025} = 9.35$, $\chi^2_{n-1,\alpha/2} = \chi^2_{2,0.025} = 7.38$, $\chi^2_{n,1-\alpha/2} = \chi^2_{3,0.975} = 0.216$, and $\chi^2_{n-1,1-\alpha/2} = \chi^2_{2,0.975} = 0.0506$.

(2): When $\mu$ is unknown, a 95% central confidence interval for $\sigma^2$ is given by
$$\left[\frac{\sum_{j=1}^{n}(x_j-\bar{x})^2}{\chi^2_{n-1,\alpha/2}},\ \frac{\sum_{j=1}^{n}(x_j-\bar{x})^2}{\chi^2_{n-1,1-\alpha/2}}\right] = \left[\frac{42}{7.38},\ \frac{42}{0.0506}\right] = [5.69,\ 830.04].$$
(1): When $\mu = 1$ we can use the above interval as well as the following interval:
$$\left[\frac{\sum_{j=1}^{n}(x_j-\mu)^2}{\chi^2_{n,\alpha/2}},\ \frac{\sum_{j=1}^{n}(x_j-\mu)^2}{\chi^2_{n,1-\alpha/2}}\right] = \left[\frac{45}{9.35},\ \frac{45}{0.216}\right] = [4.81,\ 208.33].$$
Note that when the information about $\mu$ is used the interval is shorter.

Note 3.6. The student may wonder whether it is possible to construct a confidence interval for $\sigma$ once a confidence interval for $\sigma^2$ is established. Yes: take the corresponding square roots. If $[\phi_1(x_1,\ldots,x_n),\ \phi_2(x_1,\ldots,x_n)]$ is a $100(1-\alpha)\%$ confidence interval for $\theta$, then $[h(\phi_1), h(\phi_2)]$ is a $100(1-\alpha)\%$ confidence interval for $h(\theta)$ as long as $\theta$ to $h(\theta)$ is a one-to-one function.

Exercises 3.5.

3.5.1. Consider a $100(1-\alpha)\%$ confidence interval for $\mu$ in a $N(\mu,\sigma^2)$ where $\sigma^2$ is known, by using the standardized sample mean. Construct the interval so that the left tail area left out is $\alpha_1$ and the right tail area left out is $\alpha_2$, with $\alpha_1+\alpha_2 = \alpha$. Show that the length of the interval is shortest when $\alpha_1 = \alpha_2 = \alpha/2$.

3.5.2. Let $x_1,\ldots,x_n$ be iid as $N(\mu,\sigma^2)$ where $\sigma^2$ is known. Construct a $100(1-\alpha)\%$ central confidence interval for $\mu$ by using the statistic $c_1 x_1 + \cdots + c_n x_n$, where $c_1,\ldots,c_n$ are known constants. Illustrate the result for $c_1 = 2$, $c_2 = -3$, $c_3 = 5$, based on the observed sample $2, 1, -5$.


3.5.3. Construct (1): a 90%, (2): a 95%, (3): a 99% central confidence interval for $\mu$ in Exercise 3.5.1 with $\sigma^2 = 2$, based on the observed sample $-1, 2, 5, 7$.

3.5.4. Do the same as in Exercise 3.5.3 if $\sigma^2$ is unknown.

3.5.5. Compute the expected length of the central interval for the parameter $\mu$ in a $N(\mu,\sigma^2)$, where $\sigma^2$ is unknown, based on a Student-t statistic.

3.5.6. Compute the expected length as in Exercise 3.5.5 if the interval is obtained by cutting off the area $\alpha_1$ at the left tail and $\alpha_2$ at the right tail. Show that the expected length is least when $\alpha_1 = \alpha_2$.

3.5.7. Construct a 95% central confidence interval for $\mu$ in a $N(\mu,\sigma^2)$, when $\sigma^2$ is unknown, by using the statistic $u = 2x_1 + x_2 - 5x_3$, based on the observed sample $5, -2, 6$.

3.5.8. By using the standard normal approximation for a standardized binomial variable, construct a 90% central confidence interval for $p$, the probability of success, if (1): 7 successes are obtained in 20 trials; (2): 12 successes are obtained in 22 trials.

3.5.9. The grades obtained by students in a Statistics course are assumed to be normally distributed with mean value $\mu$ and variance $\sigma^2$. Construct a 95% confidence interval for $\sigma^2$ when (1): $\mu = 80$, (2): $\mu$ is unknown, based on the following observed sample: 75, 85, 90, 90. (a) Consider central intervals; (b) consider cutting off 0.05 at the right tail.

3.5.10. Show that for the problem of constructing a confidence interval for $\sigma^2$ in a $N(\mu,\sigma^2)$, based on a pivotal quantity having a chisquare distribution, the central interval is not the shortest in expected length when the degrees of freedom is small.

3.6. Confidence Intervals for Linear Functions of Mean Values

Here we are mainly interested in situations of the following types:

(1) A new drug is administered to lower blood pressure in human beings. A random sample of $n$ individuals is taken. Let $x_j$ be the blood pressure before administering the drug and $y_j$ the blood pressure after


administering the drug on the $j$-th individual, for $j = 1,\ldots,n$. Then we have paired values $(x_j,y_j)$, $j = 1,\ldots,n$. Our aim may be to estimate the expected difference, namely $\mu_2-\mu_1$, $\mu_2 = E(y_j)$, $\mu_1 = E(x_j)$, and test a hypothesis that $(x_j,y_j)$, $j = 1,\ldots,n$ are identically distributed. But obviously $y$, the blood pressure after administering the drug, depends on $x$, the blood pressure before administering the drug. Here $x$ and $y$ are dependent variables and may have a joint distribution.

(2) A sample of $n_1$ test plots is planted with corn variety 1 and a sample of $n_2$ test plots is planted with corn variety 2. Let $x_1,\ldots,x_{n_1}$ be the observations on the yield $x$ of corn variety 1 and let $y_1,\ldots,y_{n_2}$ be the observations on the yield $y$ of corn variety 2. Let the test plots be homogeneous in all respects, and let $E(x) = \mu_1$ and $E(y) = \mu_2$. Someone may claim that the expected yield of variety 2 is 3 times that of variety 1; then our aim may be to estimate $\mu_2 - 3\mu_1$. If someone claims that variety 2 is better than variety 1, then our aim may be to estimate $\mu_2-\mu_1$. In this example, without loss of generality, we may assume that $x$ and $y$ are independently distributed.

(3) A random sample of $n_1$ students of the same background is subjected to method 1 of teaching (consisting of lectures followed by one final examination), and a random sample of $n_2$ students of the same background as the first set is subjected to method 2 of teaching (consisting, say, of each lecture followed by problem sessions and three cumulative tests). Our aim may be to establish the claim that method 2 is superior to method 1. Let $\mu_2 = E(y)$, $\mu_1 = E(x)$, where $x$ and $y$ represent the grades under method 1 and method 2 respectively. Then we may want to estimate $\mu_2-\mu_1$. Here also it can be assumed that $x$ and $y$ are independently distributed.

(4) Suppose that a farmer has planted 5 varieties of paddy (rice). Let the yields per test plot of the 5 varieties be denoted by $x_1,\ldots,x_5$ with $\mu_i = E(x_i)$, $i = 1,\ldots,5$. The market prices of these varieties are, respectively, Rs 20, Rs 25, Rs 30, Rs 32, Rs 38 per kilogram. Then the farmer's interest may be to estimate the money value, that is, $20\mu_1 + 25\mu_2 + 30\mu_3 + 32\mu_4 + 38\mu_5$. Variety $i$ may be planted in $n_i$ test plots so that the yields are $x_{ij}$, $j = 1,\ldots,n_i$, $i = 1,\ldots,5$, where $x_{ij}$ is the yield of the $j$-th test plot under variety $i$.

Problems of the above types are of interest in this section. We will consider only situations involving two variables; the procedure is exactly parallel when more variables are involved. In the two-variable case we will look at situations where the variables are dependent, in the sense of having a joint distribution; situations where the variables are assumed


to be statistically independently distributed, in the sense of the product probability property holding, will be considered later.

3.6.1. Confidence intervals for mean values when the variables are dependent

When we have paired variables $(x,y)$, where $x$ and $y$ are dependent, one way of handling the situation is to consider $u = y-x$, in situations such as blood pressure before administering the drug ($x$) and after administering the drug ($y$), if we wish to estimate $\mu_2-\mu_1 = E(y)-E(x)$. If we wish to estimate a linear function $a\mu_1+b\mu_2$, then consider the function $u = ax+by$; for example, $a = -1$ and $b = 1$ gives $\mu_2-\mu_1$. When $(x,y)$ has a bivariate normal distribution it can be proved that every linear function is univariate normal. That means $u \sim N(\tilde{\mu},\tilde{\sigma}^2)$ where $\tilde{\mu} = a\mu_1+b\mu_2$ and $\tilde{\sigma}^2 = a^2\sigma_1^2 + b^2\sigma_2^2 + 2ab\,\mathrm{Cov}(x,y)$, $\sigma_1^2 = \mathrm{Var}(x)$, $\sigma_2^2 = \mathrm{Var}(y)$. Now construct confidence intervals for the mean value of $u$ in the situations where (1): $\mathrm{Var}(u)$ is known, (2): $\mathrm{Var}(u)$ is unknown, and confidence intervals for $\mathrm{Var}(u)$ for the cases where (1): $E(u)$ is known, (2): $E(u)$ is unknown, by using the procedures in Section 3.5. Note that we need not know the individual parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$ and $\mathrm{Cov}(x,y)$ in this procedure.

Note 3.7. Many books proceed with the assumption that $x$ and $y$ are independently distributed, in situations like the blood pressure example, claiming that the effect of the drug is washed out after two hours or that the dependency is gone after two hours. Assuming statistical independence in such situations is not a proper procedure. When paired values are available we can proceed by using $u$ as described above, which is a correct procedure when the joint distribution is normal. If the joint distribution is not normal, then we must first work out the distribution of the relevant linear function and then use it to construct confidence intervals for linear functions of mean values.

Example 3.8. The following are paired observations on $(x,y)$: $(1,4), (4,8), (3,6), (2,7)$, where $x$ is the amount of a special animal feed and $y$ is the gain in weight. It is conjectured that $y$ is approximately $2x+1$. Construct a 95% confidence interval for (1): $E(u) = E[y-(2x+1)] = \mu_2-2\mu_1-1$, $E(y) = \mu_2$, $E(x) = \mu_1$; (2): the variance of $u$, assuming that $(x,y)$ has a bivariate normal distribution.


Solution 3.8. Let $u = y - 2x - 1$. Then the observations on $u$ are the following:
$$u_1 = 4-2(1)-1 = 1,\quad u_2 = 8-2(4)-1 = -1,\quad u_3 = 6-2(3)-1 = -1,\quad u_4 = 7-2(2)-1 = 2,$$
so that $\bar{u} = \frac{1}{4}(1-1-1+2) = \frac{1}{4}$. The observed value of
$$s_1^2 = \frac{\sum_{j=1}^{n}(u_j-\bar{u})^2}{n-1}$$
is
$$s_1^2 = \frac{1}{3}\left[(1-\tfrac{1}{4})^2 + (-1-\tfrac{1}{4})^2 + (-1-\tfrac{1}{4})^2 + (2-\tfrac{1}{4})^2\right] = \frac{108}{16\times 3}.$$
Since all linear functions of normal variables (correlated or not) are normally distributed, $u \sim N(\mu,\sigma^2)$ with $\mu = E(u)$, $\sigma^2 = \mathrm{Var}(u)$, and hence
$$\frac{\sqrt{n}\,[\bar{u}-E(u)]}{s_1} \sim t_{n-1} = t_3 \tag{3.6.1}$$
is Student-t with 3 degrees of freedom. From the Student-t tables, $t_{n-1,\alpha/2} = t_{3,0.025} = 3.182$ (see the illustration in Figure 3.3). Hence a 95% central confidence interval for $E(u) = \mu_2-2\mu_1-1$ is the following:
$$\left[\bar{u} - t_{n-1,\alpha/2}\frac{s_1}{\sqrt{n}},\ \bar{u} + t_{n-1,\alpha/2}\frac{s_1}{\sqrt{n}}\right] = \left[\frac{1}{4} - 3.182\frac{\sqrt{108}}{4\sqrt{12}},\ \frac{1}{4} + 3.182\frac{\sqrt{108}}{4\sqrt{12}}\right] = [-2.14,\ 2.64].$$
For constructing a 95% confidence interval for $\mathrm{Var}(u)$ one can take the pivotal quantity
$$\sum_{j=1}^{n}\frac{(u_j-\bar{u})^2}{\sigma^2} \sim \chi^2_{n-1} = \chi^2_3;\qquad \chi^2_{3,0.025} = 9.35,\ \chi^2_{3,0.975} = 0.216.$$
See the illustration of the percentage points in Figure 3.4. Then a 95% central confidence interval is given by the following:
$$\left[\frac{\sum_{j=1}^{n}(u_j-\bar{u})^2}{\chi^2_{n-1,\alpha/2}},\ \frac{\sum_{j=1}^{n}(u_j-\bar{u})^2}{\chi^2_{n-1,1-\alpha/2}}\right] = \left[\frac{108}{16(9.35)},\ \frac{108}{16(0.216)}\right] = [0.72,\ 31.25].$$
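A minimal sketch of Solution 3.8 in Python:

```python
# A sketch of Example 3.8: paired data reduced to u = y - 2x - 1, then a
# t-interval for E(u) and a chisquare interval for Var(u).
import math
from scipy import stats

pairs = [(1, 4), (4, 8), (3, 6), (2, 7)]      # observed (x, y) pairs
u = [y - 2 * x - 1 for x, y in pairs]
n, alpha = len(u), 0.05
ubar = sum(u) / n
ss = sum((ui - ubar) ** 2 for ui in u)        # sum of squared deviations
s1 = math.sqrt(ss / (n - 1))

t = stats.t.ppf(1 - alpha / 2, df=n - 1)      # 3.182
half = t * s1 / math.sqrt(n)
print(ubar - half, ubar + half)               # about (-2.14, 2.64)
print(ss / stats.chi2.ppf(1 - alpha / 2, n - 1),
      ss / stats.chi2.ppf(alpha / 2, n - 1))  # about (0.72, 31.25)
```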


Note 3.8. Note that in the paired variable $(x,y)$ case, if our interest is to construct a confidence interval for $\mu_2-\mu_1$, then take $u = y-x$ and proceed as above. Whatever the linear function of $\mu_1$ and $\mu_2$ for which a confidence interval is needed, take the corresponding linear function of $x$ and $y$ as $u$ and then proceed. Do not assume statistical independence of $x$ and $y$ unless there is theoretical justification to do so.

3.6.2. Confidence intervals for linear functions of mean values when there is statistical independence

If $x$ and $y$ are statistically independently distributed with $E(x) = \mu_1$, $E(y) = \mu_2$, $\mathrm{Var}(x) = \sigma_1^2$, $\mathrm{Var}(y) = \sigma_2^2$, and if simple random samples of sizes $n_1$ and $n_2$ are available on $x$ and $y$, then how can we set up confidence intervals for $a\mu_1+b\mu_2+c$, where $a, b, c$ are known constants? Let $x_1,\ldots,x_{n_1}$ and $y_1,\ldots,y_{n_2}$ be the samples on $x$ and $y$ respectively. If $x$ and $y$ are normally distributed, then the problem is easy; otherwise one has to work out the distribution of the linear function first and then proceed. Let us assume that $x \sim N(\mu_1,\sigma_1^2)$ and $y \sim N(\mu_2,\sigma_2^2)$, independently distributed. Let
$$\bar{x} = \frac{\sum_{j=1}^{n_1} x_j}{n_1},\quad \bar{y} = \frac{\sum_{j=1}^{n_2} y_j}{n_2},\quad v_1^2 = \sum_{j=1}^{n_1}(x_j-\bar{x})^2,\quad v_2^2 = \sum_{j=1}^{n_2}(y_j-\bar{y})^2 \tag{3.6.2}$$
and $u = a\bar{x} + b\bar{y} + c$. Then $u \sim N(\mu,\sigma^2)$, where
$$\mu = E(u) = aE[\bar{x}] + bE[\bar{y}] + c = a\mu_1 + b\mu_2 + c$$
and, since $\bar{x}$ and $\bar{y}$ are independently distributed,
$$\sigma^2 = \mathrm{Var}(u) = \mathrm{Var}(a\bar{x}+b\bar{y}+c) = \mathrm{Var}(a\bar{x}+b\bar{y}) = a^2\,\mathrm{Var}(\bar{x}) + b^2\,\mathrm{Var}(\bar{y}) = a^2\frac{\sigma_1^2}{n_1} + b^2\frac{\sigma_2^2}{n_2}.$$
Our interest here is to set up confidence intervals for $a\mu_1+b\mu_2+c$. The usual situation is to set up confidence intervals for $\mu_2-\mu_1$; in that case $c = 0$, $b = 1$, $a = -1$. Various situations are possible.

Case 1: $\sigma_1^2$ and $\sigma_2^2$ are known. In this case we can take the pivotal quantity as the standardized $u$. That is,


$$\frac{u - E(u)}{\sqrt{\mathrm{Var}(u)}} = \frac{u - (a\mu_1+b\mu_2+c)}{\sqrt{a^2\frac{\sigma_1^2}{n_1} + b^2\frac{\sigma_2^2}{n_2}}} \sim N(0,1). \tag{3.6.3}$$

Hence a $100(1-\alpha)\%$ central confidence interval for $a\mu_1+b\mu_2+c$ is the following:
$$\left[u - z_{\alpha/2}\sqrt{a^2\frac{\sigma_1^2}{n_1} + b^2\frac{\sigma_2^2}{n_2}},\ \ u + z_{\alpha/2}\sqrt{a^2\frac{\sigma_1^2}{n_1} + b^2\frac{\sigma_2^2}{n_2}}\right] \tag{3.6.4}$$
where $z_{\alpha/2}$ is illustrated in Figure 3.2.

Case 2: $\sigma_1^2 = \sigma_2^2 = \sigma^2$, unknown. In this case the population variances are given to be equal, but the common value is unknown. We can then use a Student-t statistic. Note from (3.6.2) that $E[v_1^2] = (n_1-1)\sigma_1^2$ and $E[v_2^2] = (n_2-1)\sigma_2^2$, and hence when $\sigma_1^2 = \sigma_2^2 = \sigma^2$ we have $E[v_1^2+v_2^2] = (n_1+n_2-2)\sigma^2$, or
$$E[v^2] = E\left[\frac{\sum_{j=1}^{n_1}(x_j-\bar{x})^2 + \sum_{j=1}^{n_2}(y_j-\bar{y})^2}{n_1+n_2-2}\right] = \sigma^2. \tag{3.6.5}$$
Hence $\hat{\sigma}^2 = v^2$ can be taken as an unbiased estimator of $\sigma^2$. If we replace $\sigma^2$ by $\hat{\sigma}^2$ in the standardized normal variable, then we get a Student-t with $n_1+n_2-2$ degrees of freedom, because the corresponding chisquare has $n_1+n_2-2$ degrees of freedom. Hence the pivotal quantity that we will use is the following:
$$\frac{(a\bar{x}+b\bar{y}+c) - (a\mu_1+b\mu_2+c)}{\hat{\sigma}\sqrt{\frac{a^2}{n_1} + \frac{b^2}{n_2}}} = \frac{(a\bar{x}+b\bar{y}+c) - (a\mu_1+b\mu_2+c)}{v\sqrt{\frac{a^2}{n_1} + \frac{b^2}{n_2}}} \sim t_{n_1+n_2-2} \tag{3.6.6}$$
where $v$ is defined in (3.6.5). Now a $100(1-\alpha)\%$ central confidence interval for $a\mu_1+b\mu_2+c$ is given by
$$(a\bar{x}+b\bar{y}+c) \mp t_{n_1+n_2-2,\alpha/2}\; v\,\sqrt{\frac{a^2}{n_1} + \frac{b^2}{n_2}}. \tag{3.6.7}$$
The percentage point $t_{n_1+n_2-2,\alpha/2}$ is illustrated in Figure 3.3 and $v$ is available from (3.6.5). If a confidence interval for $\mu_2-\mu_1$ is needed, put $c = 0$, $b = 1$, $a = -1$ in (3.6.7).
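A minimal sketch of (3.6.7) in Python, for the common case $\mu_2-\mu_1$; the two samples below are hypothetical:

```python
# A sketch of (3.6.7): pooled-variance t-interval for a*mu1 + b*mu2 + c,
# here with a = -1, b = 1, c = 0, i.e., for mu2 - mu1.
import math
from scipy import stats

x = [12.1, 9.8, 11.4, 10.6]         # hypothetical sample on x
y = [13.0, 14.2, 12.5, 13.7, 12.9]  # hypothetical sample on y
a, b, c, alpha = -1.0, 1.0, 0.0, 0.05
n1, n2 = len(x), len(y)
xbar, ybar = sum(x) / n1, sum(y) / n2
v2 = (sum((xi - xbar) ** 2 for xi in x) +
      sum((yi - ybar) ** 2 for yi in y)) / (n1 + n2 - 2)   # (3.6.5)
point = a * xbar + b * ybar + c
t = stats.t.ppf(1 - alpha / 2, df=n1 + n2 - 2)
half = t * math.sqrt(v2) * math.sqrt(a ** 2 / n1 + b ** 2 / n2)
print(point - half, point + half)
```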


Case 3: $\sigma_1^2$ and $\sigma_2^2$ unknown but $n_1 \ge 30$, $n_2 \ge 30$. In this case one may use the following approximation to the standard normal for setting up confidence intervals:
$$\frac{(a\bar{x}+b\bar{y}+c) - (a\mu_1+b\mu_2+c)}{\sqrt{\frac{a^2 s_1^2}{n_1} + \frac{b^2 s_2^2}{n_2}}} \sim N(0,1)\ \ \text{approximately}, \tag{3.6.8}$$
where $s_1^2 = \frac{1}{n_1}\sum_{j=1}^{n_1}(x_j-\bar{x})^2$ and $s_2^2 = \frac{1}{n_2}\sum_{j=1}^{n_2}(y_j-\bar{y})^2$ are the sample variances. When $n_1$ and $n_2$ are large, dividing by $n_i$ or $n_i-1$, $i = 1,2$, makes no practical difference. Then the approximate $100(1-\alpha)\%$ central confidence interval for $a\mu_1+b\mu_2+c$ is given by
$$(a\bar{x}+b\bar{y}+c) \mp z_{\alpha/2}\sqrt{\frac{a^2 s_1^2}{n_1} + \frac{b^2 s_2^2}{n_2}} \tag{3.6.9}$$
where the percentage point $z_{\alpha/2}$ is available from the standard normal density in Figure 3.2.

3.6.3. Confidence intervals for the ratio of variances

Here again we consider two independently distributed normal variables, $x \sim N(\mu_1,\sigma_1^2)$ and $y \sim N(\mu_2,\sigma_2^2)$, and simple random samples of sizes $n_1$ and $n_2$ on $x$ and $y$ respectively. We would like to construct a $100(1-\alpha)\%$ confidence interval for $\theta = \sigma_1^2/\sigma_2^2$. We will make use of the property that

$$\frac{\sum_{j=1}^{n_1}(x_j-\bar{x})^2}{\sigma_1^2} \sim \chi^2_{n_1-1},\qquad \frac{\sum_{j=1}^{n_2}(y_j-\bar{y})^2}{\sigma_2^2} \sim \chi^2_{n_2-1},$$
$$u\left(\frac{1}{\theta}\right) = \left[\frac{\sum_{j=1}^{n_1}(x_j-\bar{x})^2/(n_1-1)}{\sum_{j=1}^{n_2}(y_j-\bar{y})^2/(n_2-1)}\right]\left(\frac{1}{\theta}\right) \sim F_{n_1-1,\,n_2-1}. \tag{3.6.10}$$


From this one can make the following probability statement:
$$Pr\left\{F_{n_1-1,n_2-1,1-\alpha/2} \le u\left(\frac{1}{\theta}\right) \le F_{n_1-1,n_2-1,\alpha/2}\right\} = 1-\alpha.$$
Rewriting this as a statement on $\theta$, we have
$$Pr\left\{\frac{u}{F_{n_1-1,n_2-1,\alpha/2}} \le \theta \le \frac{u}{F_{n_1-1,n_2-1,1-\alpha/2}}\right\} = 1-\alpha \tag{3.6.11}$$
where the percentage points $F_{n_1-1,n_2-1,\alpha/2}$ and $F_{n_1-1,n_2-1,1-\alpha/2}$ are shown in Figure 3.5, and
$$u = \frac{\sum_{j=1}^{n_1}(x_j-\bar{x})^2/(n_1-1)}{\sum_{j=1}^{n_2}(y_j-\bar{y})^2/(n_2-1)} \sim \theta\,F_{n_1-1,\,n_2-1}. \tag{3.6.12}$$

Figure 3.5: Percentage points from an F-density

Note 3.9. If a confidence interval for $a\,\sigma_1^2/\sigma_2^2 = a\theta$, where $a$ is a constant, is needed, then multiply and divide $u$ in (3.6.10) by $a$, absorb the denominator $a$ with $\theta$, and proceed to get the confidence interval from (3.6.11). Also note that only the central interval is considered in (3.6.11).

Note 3.10. Since an F random variable has the property that $F_{m,n} = \frac{1}{F_{n,m}}$ (in distribution), we can convert the lower percentage point $F_{m,n,1-\alpha/2}$ to an upper percentage point on $F_{n,m,\alpha/2}$. That is,
$$F_{m,n,1-\alpha/2} = \frac{1}{F_{n,m,\alpha/2}}. \tag{3.6.13}$$
Hence the lower percentage points are usually not given in F-tables.

Example 3.9. Nine test plots of variety 1 and 5 test plots of variety 2 of tapioca gave the following summary data: $s_1^2 = 10$ and $s_2^2 = 5$ (in kg$^2$),


where $s_1^2$ and $s_2^2$ are the sample variances. The yield $x$ under variety 1 is assumed to be distributed as $N(\mu_1,\sigma_1^2)$ and the yield $y$ of variety 2 as $N(\mu_2,\sigma_2^2)$, independently of $x$. Construct a 90% confidence interval for $3\sigma_1^2/\sigma_2^2$.

Solution 3.9. We want to construct a 90% confidence interval, so in our notation $\alpha = 0.10$, $\alpha/2 = 0.05$. The parameter of interest is $3\theta = 3\sigma_1^2/\sigma_2^2$; construct an interval for $\theta$ and then multiply by 3. The required statistic is
$$u = \frac{\sum_{j=1}^{n_1}(x_j-\bar{x})^2/(n_1-1)}{\sum_{j=1}^{n_2}(y_j-\bar{y})^2/(n_2-1)} = \frac{9 s_1^2/8}{5 s_2^2/4} \sim \theta\,F_{8,4},$$
and in observed value
$$u = \frac{(9)(10)/8}{(5)(5)/4} = \frac{9}{5}.$$
From F-tables we have $F_{8,4,0.05} = 6.04$ and $F_{4,8,0.05} = 3.84$. Hence a 90% central confidence interval for $3\theta$ is given by
$$\left[\frac{27}{5(F_{8,4,0.05})},\ \frac{27}{5(F_{8,4,0.95})}\right] = \left[\frac{27}{5(6.04)},\ \frac{27(F_{4,8,0.05})}{5}\right] = \left[\frac{27}{5(6.04)},\ \frac{27(3.84)}{5}\right] = [0.89,\ 20.74].$$
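A minimal sketch of Solution 3.9 in Python, reading the percentage points from the F quantile function instead of tables:

```python
# A sketch of (3.6.11): central CI for 3*theta, theta = sigma1^2/sigma2^2,
# from the summary data of Example 3.9.
from scipy import stats

n1, n2, s1_sq, s2_sq, alpha = 9, 5, 10.0, 5.0, 0.10
u = (n1 * s1_sq / (n1 - 1)) / (n2 * s2_sq / (n2 - 1))   # observed u = 9/5
f_hi = stats.f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)       # F_{8,4,0.05} = 6.04
f_lo = stats.f.ppf(alpha / 2, n1 - 1, n2 - 1)           # F_{8,4,0.95} = 1/3.84
print(3 * u / f_hi, 3 * u / f_lo)                       # about (0.89, 20.74)
```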

Note 3.11. Confidence regions. In a population such as the gamma (a real scalar random variable) there are usually two parameters, the scale parameter $\beta$, $\beta > 0$, and the shape parameter $\alpha$, $\alpha > 1$. If relocation of the variable is involved, then there is an additional location parameter $\gamma$, $-\infty < \gamma < \infty$. In a real scalar normal population $N(\mu,\sigma^2)$ there are two parameters, $\mu$, $-\infty < \mu < \infty$, and $\sigma^2$, $0 < \sigma^2 < \infty$. The parameter space of the 3-parameter gamma density is
$$\Omega = \{(\alpha,\beta,\gamma) \mid 1 < \alpha < \infty,\ 0 < \beta < \infty,\ -\infty < \gamma < \infty\}.$$


In the normal case the parameter space is
$$\Omega = \{(\mu,\sigma^2) \mid -\infty < \mu < \infty,\ 0 < \sigma^2 < \infty\}.$$
Let $\theta = (\theta_1,\ldots,\theta_s)$ represent the set of all parameters in a real scalar population. In the above gamma case $\theta = (\alpha,\beta,\gamma)$, $s = 3$, and in the above normal case $\theta = (\mu,\sigma^2)$, $s = 2$. We may be able to come up with a collection of one or more functions of the sample values $x_1,\ldots,x_n$ and some of the parameters from $\theta$, say $P = (P_1,\ldots,P_r)$, such that the joint distribution of $P$ is free of all parameters in $\theta$. Then we will be able to make a statement of the type
$$Pr\{P \in R_1\} = 1-\alpha \tag{3.6.14}$$
for a given $\alpha$, where $R_1$ is a subspace of $R^r = R\times R\times\cdots\times R$, $R$ being the real line. If we can convert this statement into a statement of the form
$$Pr\{S_1 \text{ covers } \theta\} = 1-\alpha \tag{3.6.15}$$
where $S_1$ is a subspace of the sample space $S$, then $S_1$ is the confidence region for $\theta$. Since the computation of confidence regions is more involved, we will not discuss this topic further.

Exercises 3.6.

3.6.1. In a weight reduction experiment, a random sample of 5 individuals underwent a certain dieting program. The weight of a randomly selected person before the program started is $x$, and when the program finished it is $y$; $(x,y)$ is assumed to have a bivariate normal distribution. The following are the observations on $(x,y)$: $(80,80), (90,85), (100,80), (60,55), (65,70)$. Construct a 95% central confidence interval for (a) $\mu_1-\mu_2$, when (1): the variance of $x-y$ is 4, (2): the variance of $x-y$ is unknown; (b) $0.2\mu_1-\mu_2$, when (1): the variance of $u = 0.2x-y$ is known to be 5, (2): the variance of $u$ is unknown.

3.6.2. Two methods of teaching are tried on sets of $n_1 = 10$ and $n_2 = 15$ students. These students are assumed to have the same background and are independently selected. If $x$ and $y$ are the grades of randomly selected students under the two methods respectively, with $x \sim N(\mu_1,\sigma_1^2)$ and $y \sim N(\mu_2,\sigma_2^2)$, construct 90% confidence intervals for (a) $\mu_1-2\mu_2$ when (1): $\sigma_1^2 = 2$, $\sigma_2^2 = 5$, (2): $\sigma_1^2 = \sigma_2^2$ but unknown; (b)


$2\sigma_1^2/\sigma_2^2$ when (1): $\mu_1 = -10$, $\mu_2 = 5$, (2): $\mu_1, \mu_2$ are unknown. The following summary statistics are given, in the usual notation: $\bar{x} = 90$, $\bar{y} = 80$, $s_1^2 = 25$, $s_2^2 = 10$.

3.6.3. Consider the same problem as in Exercise 3.6.2, with $n_1 = 40$, $n_2 = 50$ but $\sigma_1^2$ and $\sigma_2^2$ unknown. Construct a 95% confidence interval for $\mu_1-\mu_2$ by using the same summary data as in Exercise 3.6.2.

3.6.4. Prove that $F_{m,n,1-\alpha} = \dfrac{1}{F_{n,m,\alpha}}$.

3.6.5. Let $x_1,\ldots,x_n$ be iid variables from some population (discrete or continuous) with mean value $\mu$ and variance $\sigma^2 < \infty$. Use the result that
$$\frac{\sqrt{n}(\bar{x}-\mu)}{\sigma} \sim N(0,1)$$
approximately for large $n$, and set up a $100(1-\alpha)\%$ confidence interval for $\mu$ when $\sigma^2$ is known.

3.6.6. The temperature readings $x$ at location 1 and $y$ at location 2 gave the following data: a simple random sample of size $n_1 = 5$ on $x$ gave $\bar{x} = 20$ and $s_1^2 = 5$, and a random sample of size $n_2 = 8$ on $y$ gave $\bar{y} = 30$ and $s_2^2 = 8$. If $x \sim N(\mu_1,\sigma_1^2)$ and $y \sim N(\mu_2,\sigma_2^2)$, independently distributed, construct a 90% confidence interval for $\sigma_1^2/\sigma_2^2$.