Statistical hypothesis testing for dot-matrix type

Int. J. Quality Engineering and Technology, Vol. 1, No. 1, 2009


Statistical hypothesis testing for dot-matrix type products

Chanseok Park
Department of Mathematical Sciences, Clemson University, Clemson, SC 29634, USA
Fax: +1 864 656 5230
E-mail: [email protected]

Abstract: In this paper, we develop statistical hypothesis testing procedures for determining whether the manufacturing process of a dot-matrix type product is currently passing its quality validation test. We provide a test statistic and a rejection region for a given Type I error probability for (1) the case where the observed quality characteristic has a normal distribution and (2) the case where the observed quality characteristic has a uniform distribution. In addition to developing these hypothesis testing procedures, we also develop procedures for determining the sample size required for a specified Type II error probability. The proposed methods can be applied to a variety of products made up of identical components or items. Such products include candies in a box, light emitting diode (LED) traffic lights, charge coupled device (CCD) sensors, monitors, scanners, etc. Illustrative examples are provided.

Keywords: hypothesis test; Type I and II errors; statistical power; quality assurance.

Reference to this paper should be made as follows: Park, C. (2009) ‘Statistical hypothesis testing for dot-matrix type products’, Int. J. Quality Engineering and Technology, Vol. 1, No. 1, pp.27–39.

Biographical notes: Chanseok Park is an Associate Professor of Mathematical Sciences at Clemson University, Clemson, SC. He received his BS in Mechanical Engineering from Seoul National University; his MA in Mathematics from the University of Texas at Austin; and his PhD in Statistics in 2000 from the Pennsylvania State University. His research interests include engineering statistics, robust inference, reliability, competing risks model, statistical computing and simulation, acoustics, and solid mechanics.

1 Introduction

Many products consist of a number of identical components or units. Such products include candies in a box, light emitting diode (LED) traffic lights, charge coupled device (CCD) sensors, monitors, scanner sensors, etc. For example, suppose that we are interested in the quality of candies in a box. To test the quality of candies in a box, we measure the weights of the candies and investigate whether each weight is in a specified weight range, say [a, b].

Copyright © 2009 Inderscience Enterprises Ltd.

Another example of this kind of product is the CCD in a digital camera. Sensors in a CCD read light (chrominance) and convert it into digital data (Ang, 2002). This dot-matrix type CCD device is also a typical example of the above products. Each sensor in a CCD device reads light in some range of wavelengths. Hence, the quality characteristic of interest is that a sensor passes a quality validation test if it reads wavelengths in an interval [a, b]. Many other electronic products also have a quality criterion of this type. Other typical examples include monitors, LED traffic lights, scanners, etc.

In this paper, we develop a statistical hypothesis test and a rejection region for

1 the case where the observed quality characteristic has a normal distribution

2 the case where the observed quality characteristic has a uniform distribution.

In addition, we develop a procedure for determining the sample size n needed for a specified Type II error probability γ. The rest of the paper is organised as follows. In Section 2, we provide procedures for testing the quality of manufacturing processes. In Section 3, we provide procedures for determining the sample size required for a specified Type II error probability. The paper ends with concluding remarks in Section 4.

2 Statistical quality test

In this section, we develop statistical hypothesis test procedures for determining whether a manufacturing process is considered to be currently passing its quality validation test. We provide a test statistic and a rejection region for a given Type I error probability for the case where the observed quality characteristic measurements are distributed as a normal distribution and for the case where those are distributed as a uniform distribution. For convenience, we explain our methodology with a CCD device although we can apply the developed methodology to a variety of products.

Consider a CCD device. Let the wavelength that a certain sensor in the CCD device can read be represented by the random variable $X$. A sensor is said to be acceptable if it can read the wavelength in the interval $[a, b]$. Then, the probability that a sensor is acceptable is given by

$$p(\theta) = \Pr\{a \le X \le b\},$$

where $\theta$ is an unknown parameter of the distribution of the random variable $X$. An array passes a quality validation test if a minimum portion $p_0$ of its sensors read a wavelength in the interval $[a, b]$. Consequently, the statistical hypotheses are of the form:

$$H_0: p(\theta) \ge p_0 \quad \text{against} \quad H_1: p(\theta) < p_0. \tag{1}$$

For the above hypotheses, we provide a statistical test procedure for the normal and uniform distribution models for the wavelength $X$.


2.1 Normal distribution model

If the wavelength $X$ that a sensor can read has a normal distribution with unknown mean $\theta$ and variance $\sigma^2$, denoted by $N(\theta, \sigma^2)$, then the probability that a sensor is acceptable is calculated as

$$p(\theta) = \Phi\left(\frac{b-\theta}{\sigma}\right) - \Phi\left(\frac{a-\theta}{\sigma}\right),$$

where $\Phi(\cdot)$ is the cumulative distribution function (cdf) of the standard normal distribution, $N(0, 1)$. Let $c = (a+b)/2$ be the centre of the interval $[a, b]$. It is easily verified that $p(c-t) = p(c+t)$ for any $t$, so $p(\theta)$ is symmetric about $\theta = c$. The derivative of $p(\theta)$ is

$$p'(\theta) = \frac{1}{\sigma}\left\{\phi\left(\frac{a-\theta}{\sigma}\right) - \phi\left(\frac{b-\theta}{\sigma}\right)\right\},$$

where $\phi(\cdot)$ is the probability density function (pdf) of $N(0, 1)$. We have $p'(c) = 0$, $p'(\theta) > 0$ for $\theta < c$, and $p'(\theta) < 0$ for $\theta > c$. So $p(\theta)$ is unimodal and has a maximum at $\theta = c$. Hence, the inequality $p(\theta) \ge p_0$ is equivalent to $|\theta - c| \le d$, where $d$ is positive and satisfies

$$p(c+d) = \Phi\left(\frac{L-2d}{2\sigma}\right) - \Phi\left(\frac{-L-2d}{2\sigma}\right) = p_0, \tag{2}$$

where $L = b - a$ is the interval length. Hence, we can rewrite the hypotheses in equation (1) as follows:

$$H_0: |\theta - c| \le d \quad \text{against} \quad H_1: |\theta - c| > d. \tag{3}$$
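Equation (2) has no closed-form solution for d, but p(c + d) is strictly decreasing in d > 0, so a bisection search is enough. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` for Φ. The function names `accept_prob` and `solve_d` are illustrative, and p0 = 0.999 is a hypothetical specification chosen only for the demonstration; a, b and σ are the values that appear in the paper's normal-model examples.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # cdf of the standard normal distribution


def accept_prob(theta, a, b, sigma):
    """p(theta) = Phi((b - theta)/sigma) - Phi((a - theta)/sigma)."""
    return PHI((b - theta) / sigma) - PHI((a - theta) / sigma)


def solve_d(a, b, sigma, p0, tol=1e-12):
    """Solve p(c + d) = p0 for d > 0 (equation (2)) by bisection."""
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    c = (a + b) / 2.0
    if accept_prob(c, a, b, sigma) <= p0:
        raise ValueError("p(c) <= p0, so no d satisfies equation (2)")
    lo, hi = 0.0, b - a
    while accept_prob(c + hi, a, b, sigma) >= p0:  # widen until p(c + hi) < p0
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if accept_prob(c + mid, a, b, sigma) > p0:
            lo = mid  # p is still above p0, so d must be larger
        else:
            hi = mid
    return (lo + hi) / 2.0


a, b, sigma, p0 = 0.492, 0.598, 0.01, 0.999  # p0 is a hypothetical value
d = solve_d(a, b, sigma, p0)
print(round(d, 4))
```

The guard on p(c) mirrors the remark that a solution exists only when the maximum acceptance probability exceeds p0.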

Since $p(c)$ is the maximum, the inequality $p(c) > p_0$ should be satisfied. Otherwise, there is no $d$ satisfying equation (2). It follows that

$$p(c) = \Phi\left(\frac{b-c}{\sigma}\right) - \Phi\left(\frac{a-c}{\sigma}\right) = 2\Phi\left(\frac{L}{2\sigma}\right) - 1 > p_0.$$

From this, we have

$$\sigma < \frac{L}{2\,\Phi^{-1}\left(\frac{1+p_0}{2}\right)}. \tag{4}$$

[...] The rejection region of the test is

$$\frac{|\bar{X}_n - c|}{\sigma/\sqrt{n}} > k,$$

where $\bar{X}_n = (1/n)\sum_{i=1}^{n} X_i$ and the cutoff $k$ is chosen so that the test has the size $\alpha$. Since the null hypothesis is composite, the size $\alpha$ is defined by

$$\alpha = \sup_{|\theta - c| \le d} \Pr\left\{\frac{|\bar{X}_n - c|}{\sigma/\sqrt{n}} > k \;\middle|\; \theta\right\}.$$

We can see that $\alpha = \Pr\{|\bar{X}_n - c|/(\sigma/\sqrt{n}) > k \mid \theta = c \pm d\}$. Hence, the test with the above rejection region is $\alpha$-similar on the boundary $c \pm d$. Therefore, this is a uniformly most powerful (UMP) unbiased test. For details on UMP tests, see Ferguson (1967) and Lehmann and Romano (2008).

Assuming that the variance $\sigma^2$ is known, it follows that $\bar{X}_n$ is distributed as $N(\theta, \sigma^2/n)$. Therefore, the power function is easily calculated as

$$K(\theta) = \Pr\left\{\frac{|\bar{X}_n - c|}{\sigma/\sqrt{n}} > k \;\middle|\; \theta\right\} = \Phi\left(\frac{c-\theta}{\sigma/\sqrt{n}} - k\right) + 1 - \Phi\left(\frac{c-\theta}{\sigma/\sqrt{n}} + k\right). \tag{5}$$

It is easily shown that $K(\theta)$ is symmetric about $\theta = c$ and has its minimum at $c$. So $\alpha$ is attained at either $c - d$ or $c + d$. Hence, we only need to solve the following equation for $k$:

$$\Phi\left(\frac{d}{\sigma/\sqrt{n}} + k\right) - \Phi\left(\frac{d}{\sigma/\sqrt{n}} - k\right) = 1 - \alpha. \tag{6}$$
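Equation (6) can be solved numerically, since its left-hand side is increasing in k. The sketch below is one possible implementation using bisection and the standard-library `statistics.NormalDist`; the function names `power` and `solve_k` are illustrative. The inputs d = 0.02207, σ = 0.01, n = 25, c = 0.545 and α = 0.05 are the values used in the paper's normal-model examples.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # cdf of the standard normal distribution


def power(theta, c, sigma, n, k):
    """Power function K(theta) of equation (5)."""
    delta = (c - theta) / (sigma / sqrt(n))
    return PHI(delta - k) + 1.0 - PHI(delta + k)


def solve_k(d, sigma, n, alpha, tol=1e-10):
    """Solve equation (6) for the cutoff k by bisection."""
    delta = d / (sigma / sqrt(n))

    def g(k):  # increasing in k; its root is the cutoff
        return PHI(delta + k) - PHI(delta - k) - (1.0 - alpha)

    lo, hi = 0.0, delta + 10.0  # g(lo) < 0 and g(hi) > 0 for usual alpha
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


k = solve_k(d=0.02207, sigma=0.01, n=25, alpha=0.05)
size = power(0.545 + 0.02207, c=0.545, sigma=0.01, n=25, k=k)
print(round(k, 2), round(size, 4))
```

Evaluating the power function at the boundary c + d returns the nominal size, which is a quick check that the cutoff is consistent with equation (5).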

In practice, the variance of the wavelength is unknown and needs to be estimated using the sample variance

$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}_n\right)^2.$$

Before doing the hypothesis test, we have to check if the sensor has enough precision by considering equation (4). Since $\sigma^2$ is unknown in this case, we propose to use the sample variance. It is immediate upon replacing $\sigma$ with $S_n$ in equation (4) that we have

$$S_n < \frac{L}{2\,\Phi^{-1}\left(\frac{1+p_0}{2}\right)}. \tag{7}$$

[...]

$$\frac{|\bar{X}_n - 0.545|}{0.01/\sqrt{25}} > 12.68.$$

Example 2: In this example, we assume that the true variance $\sigma^2$ is unknown and needs to be estimated. Therefore, we will need to use the sample variance $S_n^2$ rather than the actual variance $\sigma^2$. Also, we will need to use the Student's t-distribution rather than the normal distribution as the reference distribution. To be able to compare the results, we will use the same data that was used in Example 1 but assume that the sample variance is calculated to be $S_n^2 = 0.01^2$. By using $S_n = 0.01$ in equation (2), we calculate $d = 0.02207$, the same value of $d$ as in Example 1. Just as was the case in Example 1, we need to check if the sensor has enough precision by checking whether equation (7) is satisfied. With $S_n = 0.01$, equation (7) is satisfied. Next, we solve equation (8) and obtain $k = 12.75$ so that the rejection region becomes

$$\frac{|\bar{X}_n - 0.545|}{0.01/\sqrt{25}} > 12.75.$$

This result is very close to the result in Example 1 where σ 2 was assumed to be known. We should also note that, in practice, if the sample size is greater than 30, the normal distribution can be used as the reference distribution. This is because, as n becomes large, the Student’s t-distribution converges to the normal distribution.

2.2 Uniform distribution model

Suppose that the wavelength $X$ that a sensor can read has a uniform distribution $U(\theta - \beta, \theta + \beta)$ with the cdf

$$F_U(x) = \begin{cases} 1 & \text{if } x > \theta + \beta \\ \dfrac{x - (\theta - \beta)}{2\beta} & \text{if } |x - \theta| \le \beta \\ 0 & \text{if } x < \theta - \beta, \end{cases}$$

where $-\infty < \theta < \infty$ and $\beta > 0$. Then, the probability that a sensor is acceptable is calculated as

$$p(\theta) = F_0\left(\frac{b-\theta}{\beta}\right) - F_0\left(\frac{a-\theta}{\beta}\right),$$

where $F_0(\cdot)$ is the cdf of $U(-1, 1)$. This probability is displayed in Figure 1. It is easily seen that $p(\theta)$ is unimodal and symmetric about $c = (a+b)/2$. Hence, the inequality $p(\theta) \ge p_0$ is equivalent to $|\theta - c| \le d$, where $d$ is positive and satisfies

$$p(c+d) = F_0\left(\frac{L-2d}{2\beta}\right) - F_0\left(\frac{-L-2d}{2\beta}\right) = p_0,$$

where $L = b - a$ is the interval length. Following the same line of reasoning that was used in the case of the normal distribution, we can rewrite the hypotheses in equation (1) as follows:

$$H_0: |\theta - c| \le d \quad \text{against} \quad H_1: |\theta - c| > d.$$

Since $p(c)$ is the maximum, the inequality $p(c) > p_0$ should be satisfied. It follows that

$$p(c) = F_0\left(\frac{b-c}{\beta}\right) - F_0\left(\frac{a-c}{\beta}\right) = 2F_0\left(\frac{L}{2\beta}\right) - 1 > p_0. \tag{9}$$
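The acceptance probability under the uniform model is simple to evaluate directly. The sketch below is illustrative only: the function names `f0` and `accept_prob_uniform` are not from the paper, a = 0.600, b = 0.780 and β = 0.02 are the values used in the paper's uniform-model examples, and β = 0.12 is a hypothetical value chosen to show the case β > L/2. It also checks the consequence of equation (9) that p(c) equals min{1, L/(2β)}.

```python
def f0(u):
    """cdf of the uniform distribution U(-1, 1)."""
    return min(1.0, max(0.0, (u + 1.0) / 2.0))


def accept_prob_uniform(theta, a, b, beta):
    """p(theta) = F0((b - theta)/beta) - F0((a - theta)/beta)."""
    return f0((b - theta) / beta) - f0((a - theta) / beta)


a, b = 0.600, 0.780
c, L = (a + b) / 2.0, b - a

# beta = 0.02 is at most L/2; beta = 0.12 (hypothetical) exceeds L/2.
results = {}
for beta in (0.02, 0.12):
    results[beta] = accept_prob_uniform(c, a, b, beta)
    print(beta, results[beta], min(1.0, L / (2.0 * beta)))
```

When β is at most L/2 the whole support of X fits inside [a, b] at θ = c, so p(c) = 1; otherwise the maximum is L/(2β), which is why the precision of the sensor has to be checked before testing.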

Figure 1 The probability that a sensor is acceptable, $p(\theta) = F_0((b-\theta)/\beta) - F_0((a-\theta)/\beta)$, when (a) $\beta \le L/2$ (b) $\beta > L/2$

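From this, we have

$$\beta < \frac{L}{2p_0}. \tag{10}$$

Solving $p(c+d) = p_0$ for $d$ then gives

$$d = \frac{L}{2} - \beta\,(2p_0 - 1). \tag{11}$$

[...] With $M = (X_{(1)} + X_{(n)})/2$ and $R = (X_{(n)} - X_{(1)})/2$ denoting the midrange and half-range of a random sample $X_1, \ldots, X_n$, the rejection region is

$$\frac{|M - c|}{\beta} > k, \tag{12}$$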

where the cutoff value k is chosen so that the test has the size α. To find the cutoff value k , we need to find the pdf of M . It is well-known [see Carlton (1946) and Example 5.4.7 of Casella and Berger (2002)] that the joint pdf of X (1) and X (n ) is

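$$h(x_{(1)}, x_{(n)}) = \begin{cases} \dfrac{n(n-1)\left(x_{(n)} - x_{(1)}\right)^{n-2}}{(2\beta)^n} & \text{if } \theta - \beta < x_{(1)} < x_{(n)} < \theta + \beta \\ 0 & \text{otherwise,} \end{cases}$$

and the joint pdf of $M$ and $R$ is

$$f(m, r) = \begin{cases} \dfrac{n(n-1)}{2\beta^n}\, r^{n-2} & \text{if } \theta - \beta < m - r < m + r < \theta + \beta \\ 0 & \text{otherwise.} \end{cases}$$

The marginal density of $M$ is then

$$f_M(m \mid \theta, \beta) = \begin{cases} \dfrac{n}{2\beta}\left\{1 - \dfrac{|m - \theta|}{\beta}\right\}^{n-1} & \text{if } |m - \theta| < \beta \\ 0 & \text{otherwise.} \end{cases}$$

[...] $\alpha = \Pr\{|M - c|/\beta > k \mid \theta = c \pm d\}$. Then, we have

$$\begin{aligned} \alpha &= \int_{-\infty}^{c - \beta k} f_M(m \mid c - d, \beta)\,dm + \int_{c + \beta k}^{\infty} f_M(m \mid c - d, \beta)\,dm \\ &= \frac{1}{2}\left[\left(1 + \frac{d}{\beta} - k\right)^{n} I\left(1 + \frac{d}{\beta} > k\right) + \left(1 - \frac{d}{\beta} - k\right)^{n} I\left(1 - \frac{d}{\beta} > k\right)\right], \end{aligned} \tag{14}$$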

where $I(\cdot)$ is an indicator function. Since the above term is strictly decreasing in $k$ when $k \le 1 + d/\beta$ and becomes zero when $k > 1 + d/\beta$, the cutoff $k$ is uniquely determined. The test given by equation (12) is UMP unbiased, provided that the scale parameter $\beta$ is known.

In practice, the scale parameter $\beta$ is unknown. It is a nuisance parameter. One simple remedy is to plug in a consistent estimator of $\beta$. It is easily shown that $\hat{\theta} = \frac{1}{2}(X_{(1)} + X_{(n)})$ and $\hat{\beta} = \frac{1}{2}(X_{(n)} - X_{(1)})$ are the MLEs of $\theta$ and $\beta$, respectively. It can also be verified that $T = (X_{(1)}, X_{(n)})$ is a complete sufficient statistic for $(\theta, \beta)$. It follows that the best unbiased estimator (BUE) of $\beta$ is

$$\hat{\beta}^* = \left(\frac{n+1}{n-1}\right)\frac{X_{(n)} - X_{(1)}}{2}. \tag{15}$$
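The two estimators of β differ only by the factor (n + 1)/(n − 1). The following sketch computes both from a sample and runs a small seeded simulation to illustrate the bias of the MLE. The function name `uniform_estimates` and the simulation settings (θ = 0, β = 1, n = 10, 5000 replications, seed 0) are hypothetical choices made for illustration, not values from the paper.

```python
import random


def uniform_estimates(sample):
    """Return (theta_hat, beta_mle, beta_bue) for a U(theta - beta, theta + beta) sample."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    x_min, x_max = min(sample), max(sample)
    theta_hat = (x_min + x_max) / 2.0        # MLE of theta (sample midrange)
    beta_mle = (x_max - x_min) / 2.0         # MLE of beta (sample half-range)
    beta_bue = (n + 1) / (n - 1) * beta_mle  # equation (15)
    return theta_hat, beta_mle, beta_bue


# Seeded simulation with hypothetical settings: average both estimators of beta.
rng = random.Random(0)
theta, beta, n, reps = 0.0, 1.0, 10, 5000
mle_sum = bue_sum = 0.0
for _ in range(reps):
    xs = [rng.uniform(theta - beta, theta + beta) for _ in range(n)]
    _, b_mle, b_bue = uniform_estimates(xs)
    mle_sum += b_mle
    bue_sum += b_bue
mle_mean, bue_mean = mle_sum / reps, bue_sum / reps
print(mle_mean, bue_mean)
```

The average of the MLE should sit near β(n − 1)/(n + 1), below the true β, while the average of the corrected estimator should sit near β.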

Both the estimators $\hat{\beta}$ and $\hat{\beta}^*$ are consistent. So, although in practice either estimator can be used, we suggest using the BUE because the MLE is biased. It follows from the consistency of this estimator and Slutsky's theorem (Lehmann, 1999) that $F_M(m \mid \theta, \hat{\beta}^*)$ converges to $F_M(m \mid \theta, \beta)$. Therefore, before doing the hypothesis test, we have to check the precision of the sensor. By replacing $\beta$ with $\hat{\beta}^*$ obtained in equation (15), we have

$$\hat{\beta}^* < \frac{L}{2p_0}. \tag{16}$$

[...] the test with the rejection region

$$\frac{|M - c|}{\hat{\beta}^*} > k \tag{17}$$

has asymptotic size $\alpha$.

Example 3: Liquid crystal display (LCD) screens are made up of liquid crystal cells. Each cell in a filter array is acceptable if it can pass wavelengths in a range of 0.600–0.780 μm. The filter array is said to pass the overall quality validation test if the proportion of its sensors which pass the quality validation test is $p_0 = 0.995$ or higher. Suppose that we sample $n = 30$ cells in the filter and the scale parameter is known to be $\beta = 0.02$. If the size of the test is set at $\alpha = 0.05$, what is the rejection region of the hypothesis test?

First, we need to check if the sensor has enough precision by checking whether equation (10) is satisfied. We have $a = 0.600$, $b = 0.780$, $c = 0.690$, $p_0 = 0.995$, $n = 30$, and $L = 0.180$. From equation (10), $\beta = 0.02$ satisfies the desired precision $\beta < 0.0905$, so we can proceed with the hypothesis test. From equation (11), we have $d = 0.0702$. Using $d = 0.0702$, we can find the rejection region by first solving equation (14) for $k$, which gives $k = 3.584$. Therefore, we reject the null hypothesis at the $\alpha = 0.05$ level if

$$\frac{|M - 0.690|}{0.02} > 3.584.$$

In this example, we consider the case when $\beta$ is known. In the case that $\beta$ is unknown, we can perform the statistical quality test by using equations (16) and (17).
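The numbers in Example 3 can be reproduced with a few lines of code. The sketch below is a hedged illustration rather than the author's implementation: the function names are made up, the closed forms used for the precision bound L/(2p0) and for d = L/2 − β(2p0 − 1) are derived here from equation (9) and from p(c + d) = p0 (assumed to correspond to the paper's equations (10) and (11)), and the search interval [d/β, 1 + d/β] for k is an assumption that requires α < 1/2.

```python
def size_uniform(k, d, beta, n):
    """Right-hand side of equation (14): size of the test with cutoff k."""
    r = d / beta
    left = (1.0 + r - k) ** n if 1.0 + r > k else 0.0
    right = (1.0 - r - k) ** n if 1.0 - r > k else 0.0
    return 0.5 * (left + right)


def solve_k_uniform(d, beta, n, alpha, tol=1e-12):
    """Solve equation (14) for k by bisection on [d/beta, 1 + d/beta]."""
    lo, hi = d / beta, 1.0 + d / beta  # size(lo) >= 1/2 > alpha and size(hi) = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if size_uniform(mid, d, beta, n) > alpha:
            lo = mid  # size still too large, so the cutoff must grow
        else:
            hi = mid
    return (lo + hi) / 2.0


a, b, p0, beta, n, alpha = 0.600, 0.780, 0.995, 0.02, 30, 0.05
L = b - a
beta_bound = L / (2.0 * p0)            # precision check: beta must be below this
d = L / 2.0 - beta * (2.0 * p0 - 1.0)  # solves p(c + d) = p0
k = solve_k_uniform(d, beta, n, alpha)
print(round(beta_bound, 4), round(d, 4), round(k, 3))
```

Because the second indicator in equation (14) vanishes here, the cutoff also has the closed form k = 1 + d/β − (2α)^(1/n), which is a convenient cross-check on the bisection.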

3 Determination of sample size

In this section, we develop a procedure for determining the sample size required when the Type II error probability γ is specified for a given proportion, p1 < p0 . In other words, if the true proportion is known to be p1 and is less than p0 , what is the sample size required for rejecting the null hypothesis with a probability of 1 − γ ? We provide sample size determination methods for both the normal and uniform distribution models.

3.1 Normal distribution model

First, we have to find the value of $\theta$ that satisfies

$$p(\theta) = \Phi\left(\frac{b-\theta}{\sigma}\right) - \Phi\left(\frac{a-\theta}{\sigma}\right) = p_1. \tag{18}$$

Since $p(\cdot)$ is unimodal and symmetric, we have two solutions satisfying (18). Let $\theta_1$ and $\theta_2$ be the solutions of the above equation with $\theta_1 < \theta_2$. Then $p(\cdot)$ is symmetric about $c = (\theta_1 + \theta_2)/2$. The power function $K(\cdot)$ is also symmetric about $c$, so we have $K(\theta_1) = K(\theta_2)$. Using equation (5) and the fact that the power $K(\theta)$ is equal to $1 - \gamma(\theta)$, we have

$$\gamma(\theta_1) = \gamma(\theta_2) = \Phi\left(\frac{c-\theta_1}{\sigma/\sqrt{n}} + k\right) - \Phi\left(\frac{c-\theta_1}{\sigma/\sqrt{n}} - k\right). \tag{19}$$

To determine the sample size $n$, we have to solve equations (6) and (19) simultaneously. The variable $n$ is required to be an integer and, in general, it is not possible to solve equations (6) and (19) simultaneously and guarantee that $n$ will take on an integer value. Instead, we find the value of $n$ that satisfies equation (6) and

$$\Phi\left(\frac{c-\theta_1}{\sigma/\sqrt{n}} + k\right) - \Phi\left(\frac{c-\theta_1}{\sigma/\sqrt{n}} - k\right) \le \gamma(\theta_1). \tag{20}$$

This iterative procedure is illustrated in Example 4 which is described below.

Example 4: We use the same data that was used in Example 1 except that, in this case, the required sample size $n$ needs to be determined. From previous experience, we know that the CCD device should fail the quality validation test if the proportion of acceptable sensors is $p_1 = 0.99$ or less. We want to determine the sample size $n$ with the Type II error probability $\gamma = 0.01$ at the level $\alpha = 0.05$. From Example 1, we have $c = 0.545$ and $d = 0.02207$. First, we need to solve equation (18) for $\theta_1$:

$$\Phi\left(\frac{0.598 - \theta_1}{0.01}\right) - \Phi\left(\frac{0.492 - \theta_1}{0.01}\right) = p_1 = 0.99,$$

which gives $\theta_1 = 0.5153$. We need to solve equation (6) for $k$ in

$$\Phi\left(\frac{0.02207}{0.01/\sqrt{n}} + k\right) - \Phi\left(\frac{0.02207}{0.01/\sqrt{n}} - k\right) = 1 - \alpha = 0.95, \tag{21}$$

subject to equation (20):

$$\Phi\left(\frac{0.545 - 0.5153}{\sigma/\sqrt{n}} + k\right) - \Phi\left(\frac{0.545 - 0.5153}{\sigma/\sqrt{n}} - k\right) \le 0.01. \tag{22}$$

Since the power increases as the sample size $n$ increases, the Type II error probability decreases as $n$ increases. Using this property, we can find $n$ as follows:

1 For fixed $n$, say $n = 2$, find $k$ by solving equation (21).

2 Using this $k$, find the Type II error probability [the left hand side term in equation (22)].

3 If equation (22) is not satisfied, increase $n$ and then repeat from 1.

Using the method described above, we obtain the results shown below:

Table 1 Type II error probability (γ) for n

n     k        Type II error probability
2     4.766    0.712
26    12.898   0.012
27    13.112   0.0095

The table leads to the conclusion that if we want the probability of rejecting the null hypothesis to be K ( θ1 ) = 0.99 (γ = 0.01) when the true proportion of acceptable sensors is p1 = 0.99, then the required sample size n is 27.
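The three-step search of Example 4 is easy to automate. The following is a minimal sketch under stated assumptions: it uses the standard-library `statistics.NormalDist`, a generic bisection helper named `bisect` (an illustrative name), and solves equation (18) for θ1 numerically instead of using the rounded value 0.5153. All numerical inputs are those given in Example 4.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def bisect(f, lo, hi, tol=1e-12):
    """Root of an increasing function f on [lo, hi] with f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def cutoff(d, sigma, n, alpha):
    """Step 1: solve equation (6) for k."""
    delta = d * sqrt(n) / sigma
    return bisect(lambda k: STD.cdf(delta + k) - STD.cdf(delta - k) - (1.0 - alpha),
                  0.0, delta + 10.0)


def type2(c, theta1, sigma, n, k):
    """Step 2: Type II error probability, the left-hand side of equation (20)."""
    delta = (c - theta1) * sqrt(n) / sigma
    return STD.cdf(delta + k) - STD.cdf(delta - k)


def sample_size_normal(a, b, sigma, d, p1, alpha, gamma, n_max=10000):
    c = (a + b) / 2.0
    # Equation (18): p(theta) is increasing on [a - 10*sigma, c], so bisect there.
    theta1 = bisect(lambda t: STD.cdf((b - t) / sigma) - STD.cdf((a - t) / sigma) - p1,
                    a - 10.0 * sigma, c)
    for n in range(2, n_max + 1):
        k = cutoff(d, sigma, n, alpha)
        g = type2(c, theta1, sigma, n, k)
        if g <= gamma:  # step 3: stop at the first n that meets the target
            return n, k, g, theta1
    raise ValueError("no n <= n_max meets the Type II error target")


n, k, g, theta1 = sample_size_normal(0.492, 0.598, 0.01, 0.02207, 0.99, 0.05, 0.01)
print(n, round(k, 3), round(theta1, 4))
```

Keeping θ1 at full precision matters here, because the Type II error probability is close to the 0.01 threshold near the selected sample size.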

3.2 Uniform distribution model

In this case, we have to find the value of $\theta$ that satisfies

$$p(\theta) = F_0\left(\frac{b-\theta}{\beta}\right) - F_0\left(\frac{a-\theta}{\beta}\right) = p_1. \tag{23}$$

Since $p(\cdot)$ is unimodal and symmetric, we have two solutions satisfying (23). Let $\theta_1$ and $\theta_2$ be the solutions of the above equation with $\theta_1 < \theta_2$. Then $p(\cdot)$ is symmetric about $c = (\theta_1 + \theta_2)/2$. The solutions are calculated as

$$\theta_1 = a + \beta\,(2p_1 - 1) \quad \text{and} \quad \theta_2 = b - \beta\,(2p_1 - 1). \tag{24}$$

The power function $K(\theta)$ is also symmetric about $\theta = c$, so we have $K(\theta_1) = K(\theta_2)$. Using equation (13) and the fact that the power $K(\theta)$ is equal to $1 - \gamma(\theta)$, we have

$$\gamma(\theta_1) = \gamma(\theta_2) = F_M(c + \beta k \mid \theta_1, \beta) - F_M(c - \beta k \mid \theta_1, \beta). \tag{25}$$

To determine the sample size $n$, we have to solve equations (14) and (25) simultaneously. Again, just as in the case of the normal distribution, the variable $n$ is required to be an integer and, in general, it is not possible to solve equations (14) and (25) simultaneously and guarantee that $n$ will take on an integer value. Instead, we find the value of $n$ that satisfies equation (14) and

$$F_M(c + \beta k \mid \theta_1, \beta) - F_M(c - \beta k \mid \theta_1, \beta) \le \gamma(\theta_1). \tag{26}$$

This iterative procedure is illustrated in Example 5 which is described below.

Example 5: We use the same data as in Example 3 with the sample size $n$ undetermined. From previous experience, we know that the CCD device should fail the quality validation test if the proportion of acceptable sensors is $p_1 = 0.90$ or less. We want to determine the sample size $n$ with the Type II error probability $\gamma = 0.01$ at the level $\alpha = 0.05$. First, we need to solve equation (23) for $\theta_1$ and $\theta_2$. From equation (24), we have $\theta_1 = 0.616$ and $\theta_2 = 0.764$. Next, we need to find the cutoff $k$ at the level $\alpha = 0.05$ subject to equation (26). We can obtain the cutoff $k$ using equation (14) with (26):

$$\alpha = \frac{1}{2}\left[\left(1 + \frac{d}{\beta} - k\right)^{n} I\left(1 + \frac{d}{\beta} > k\right) + \left(1 - \frac{d}{\beta} - k\right)^{n} I\left(1 - \frac{d}{\beta} > k\right)\right] \tag{27}$$

subject to

$$F_M(c + \beta k \mid \theta_1, \beta) - F_M(c - \beta k \mid \theta_1, \beta) \le \gamma(\theta_1), \tag{28}$$

where $c = 0.690$, $\beta = 0.02$, $\theta_1 = 0.616$, and $d = 0.0702$. Since the power increases as the sample size $n$ increases, the Type II error probability decreases as $n$ increases. Using this property, we can find $n$ as follows:

1 For fixed $n$, say $n = 2$, find $k$ by solving equation (27) with $\alpha = 0.05$.

2 Using this $k$, find the Type II error probability [the left hand side term in equation (28)].

3 If equation (28) is not satisfied, increase $n$ and then repeat from 1.

Using the method described above, we obtain the results shown below:

Table 2 Type II error probability (γ) for n

n     k        Type II error probability
2     4.194    0.8719
30    3.584    0.0123
31    3.582    0.0101
32    3.579    0.0082

The table leads to the conclusion that if we want the probability of rejecting the null hypothesis to be K ( θ1 ) = 0.99 (γ = 0.01) when the true proportion of acceptable sensors is p1 = 0.90, then the required sample size n is 32.
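The search in Example 5 can be sketched in the same way. This is an illustration under stated assumptions, not the author's code: the midrange cdf in `midrange_cdf` is obtained here by integrating the marginal density of M, the closed form for d is derived from p(c + d) = p0, the cutoff search interval [d/β, 1 + d/β] assumes α < 1/2, and all function names are hypothetical. The numerical inputs are those of Examples 3 and 5.

```python
def midrange_cdf(m, theta, beta, n):
    """cdf of the midrange M, from integrating its marginal density."""
    u = (m - theta) / beta
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    if u <= 0.0:
        return 0.5 * (1.0 + u) ** n
    return 1.0 - 0.5 * (1.0 - u) ** n


def size_uniform(k, d, beta, n):
    """Right-hand side of equation (27)."""
    r = d / beta
    left = (1.0 + r - k) ** n if 1.0 + r > k else 0.0
    right = (1.0 - r - k) ** n if 1.0 - r > k else 0.0
    return 0.5 * (left + right)


def cutoff_uniform(d, beta, n, alpha, tol=1e-12):
    """Step 1: solve equation (27) for k by bisection on [d/beta, 1 + d/beta]."""
    lo, hi = d / beta, 1.0 + d / beta
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if size_uniform(mid, d, beta, n) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def sample_size_uniform(a, b, beta, p0, p1, alpha, gamma, n_max=10000):
    c, L = (a + b) / 2.0, b - a
    d = L / 2.0 - beta * (2.0 * p0 - 1.0)  # solves p(c + d) = p0
    theta1 = a + beta * (2.0 * p1 - 1.0)   # equation (24)
    for n in range(2, n_max + 1):
        k = cutoff_uniform(d, beta, n, alpha)
        # Step 2: Type II error probability, the left-hand side of equation (28).
        g = (midrange_cdf(c + beta * k, theta1, beta, n)
             - midrange_cdf(c - beta * k, theta1, beta, n))
        if g <= gamma:  # step 3: stop at the first n that meets the target
            return n, k, g
    raise ValueError("no n <= n_max meets the Type II error target")


n, k, g = sample_size_uniform(0.600, 0.780, 0.02, 0.995, 0.90, 0.05, 0.01)
print(n, round(k, 3), round(g, 4))
```

Unlike the normal model, every step here is available in closed form, so the bisection in `cutoff_uniform` could be replaced by a direct formula when only the first term of equation (27) is active.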

4 Concluding remarks

In this paper, we developed statistical hypothesis testing procedures for determining whether the manufacturing process of a dot-matrix type product is currently passing its quality validation test. In the first case, we assumed that the quality characteristic had a normal distribution and followed that up with a similar derivation in which we assumed that the quality characteristic had a uniform distribution. In addition to developing these hypothesis testing procedures, we also constructed algorithms for the case where one wants to detect the lack of validation with a certain probability of Type II error. Both of the algorithms can be easily implemented programmatically and can be very useful for practitioners dealing with the question of whether their manufacturing process is currently satisfying its quality standard.

Acknowledgements

This work was done in memory and honour of Professor Byung Ho Lee of Nuclear Engineering at Seoul National University and KAIST. Professor Lee passed away in July 2001. His mentoring was extremely influential in the development of the author’s current interests in applied statistics and engineering.

References

Ang, T. (2002) Digital Photographer’s Handbook, DK Publishing, Inc., New York.
Carlton, A.G. (1946) ‘Estimating the parameters of a rectangular distribution’, Annals of Mathematical Statistics, Vol. 17, pp.355–358.
Casella, G. and Berger, R.L. (2002) Statistical Inference, 2nd ed., Duxbury.
Ferguson, T.S. (1967) Mathematical Statistics: A Decision Theoretic Approach, Academic Press.
Lehmann, E.L. (1999) Elements of Large-Sample Theory, Springer.
Lehmann, E.L. and Romano, J.P. (2008) Testing Statistical Hypotheses, 3rd ed., Springer.