Bootstrap Methods for the Nonparametric Assessment of Population Bioequivalence and Similarity of Distributions

Claudia Czado¹
Technische Universität München, SCA Zentrum Mathematik, 80290 München, Germany

and

Axel Munk
Ruhr-Universität Bochum, Fakultät für Mathematik, 44780 Bochum, Germany
Abstract

A completely nonparametric approach to population bioequivalence in crossover trials has been suggested by Munk and Czado (1999). It is based on the Mallows (1972) metric as a nonparametric distance measure which allows the comparison between the entire distribution functions of test and reference formulations. It was shown that a separation between carry-over and period effects is not possible in the nonparametric setting. However, when carry-over effects can be excluded, treatment effects can be assessed whether or not period effects are present. Munk and Czado (1999) proved bootstrap limit laws for the corresponding test statistics because estimation of the limiting variance of the test statistic is very cumbersome. The purpose of this paper is to investigate the small sample behavior of various bootstrap methods and to compare it with the asymptotic test obtained by estimation of the limiting variance. The percentile (PC) and bias corrected and accelerated (BCA) bootstrap were compared for multivariate normal and nonnormal populations. From the simulation results presented, the BCA bootstrap is found to be less conservative and provides higher power compared to the PC bootstrap, especially when skewed multivariate populations are present.
Keywords and phrases: bioequivalence, bootstrap, bias correction, population equivalence, period effects, crossover trials, Mallows distance, small sample properties.
¹ Address for correspondence: Claudia Czado, Technische Universität München, SCA Zentrum Mathematik, 80290 München, Germany. e-mail: [email protected]
Claudia Czado is Associate Professor for Applied Mathematical Statistics at Munich University of Technology, Center for Mathematical Sciences, Munich, Germany. C. Czado was supported by research grant OGP0089858 of the Natural Sciences and Engineering Research Council of Canada. Axel Munk is Assistant Professor for Mathematics at the Fakultät und Institut für Mathematik, Ruhr-Universität Bochum, 44780 Bochum, Germany.

1 Introduction

Bioequivalence studies are conducted in order to show similar bioavailability for different formulations of a drug, typically a reference formulation and a generic one. In this case it is accepted that the formulations are therapeutically similar, which implies that the generic one is allowed to replace the standard drug. Even though current regulatory guidelines (FDA (1993), CPMP (1991) or EC-GCP (1993)) consider only average bioequivalence, Hauck and Anderson (1992) have argued that prescribability requires the assessment of population bioequivalence, which means equivalence with respect to the underlying distribution functions (cf. also the recent draft guidance for industry: Average, Population and Individual Approaches to Establishing Bioequivalence. (1999). U.S. Department of Health and Human Services FDA (CDER), Rockville, MD). Although there is general agreement that the entire distribution functions of the test and reference formulation should be taken into account for
the assessment of prescribability, the suggested methodology of testing is restricted in most cases to moment-based criteria (e.g. Bauer & Bauer (1994), Guilbauld (1993), Holder & Hsuan (1993), Hauck et al. (1997) or Wang (1997)). In a parametric setup (typically it is assumed that data are lognormally or normally distributed) similarity of the first two moments also implies similarity of the distribution functions; in a nonparametric framework, however, this is not sufficient.

To overcome these methodological shortcomings, Munk & Czado (1999) developed a completely nonparametric assessment of population equivalence. It is based on the trimmed Mallows distance between distributions and generalizes the approach taken in Munk & Czado (1998) and Czado & Munk (1998) to dependent samples as they occur in crossover trials. Here trimming is important in order to protect against outliers, as pointed out by Chow & Tse (1990). It is also shown that period and carry-over effects can no longer be distinguished in a nonparametric setting, in contrast to a parametric setting with normal errors. However, when carry-over effects are excluded (which can be guaranteed, e.g., by a sufficient washout period), a nonparametric assessment of population bioequivalence is possible in the presence of period effects. For this Munk & Czado (1999) distinguish between strong and weak period effects, and the test procedures are modified when period effects can be excluded a priori.

The scope of applicability of the presented tests is by no means restricted to bioequivalence testing. Whenever the aim of a study is to show the similarity of distributions rather than a difference, the proposed test can be used. For various examples see Munk & Czado (1998) or Munk & Pflüger (1999).

We will present the results of a large simulation study to investigate the small sample behavior of the proposed bootstrap procedures in a 2 × 2 crossover trial.
Two different bootstrap methods, the percentile (PC) and the bias corrected and accelerated (BCA) method (cf. Efron & Tibshirani 1993), were suggested for the nonparametric assessment of population equivalence. These methods will be compared in the case of two independent groups with the test obtained by estimating the limiting variance as suggested by Munk & Czado (1998). We are interested in answering the following questions:

1. How do the PC and BCA methods compare in small samples with regard to the observed significance level and power?
2. What is the effect on the performance of the PC and BCA method when normal or nonnormal populations are present?
3. What is the gain in power when period effects are excluded a priori?
4. Does the degree of dependence in the crossover trial influence the performance of the proposed bootstrap procedures?
5. How do these bootstrap tests perform compared to the test when the limiting variance is estimated?

To answer the above questions 6 simulation studies have been designed. Normal bivariate populations were assumed in the first 3 studies. The first study allows for period effects, while the second one assumes no weak period effects and the third one assumes no strong period effects a priori. In addition, two settings of mean and variance parameters were investigated for a variety of correlations. For the last three simulation studies we allowed for skewed bivariate populations. Scaled bivariate Gamma populations were utilized with different degrees of skewness and peakedness. Again 3 sub-simulations were conducted to study the performance when period effects are allowed, when no weak period effects are allowed and when no strong period effects are allowed, respectively. For each simulation study, observed significance levels and the observed power at several points in the null and alternative hypothesis were calculated. We also investigated the standardized bias and MSE of the empirical estimate of the quantity used to measure population bioequivalence.

The principal conclusions of the simulation study are that the PC method is conservative with lower power compared to the BCA method under normal or nonnormal populations. Power can be gained when period effects can be excluded and/or when high positive correlation within the sequences can be expected. Furthermore, we found that the differences in accuracy of the approximation of the type I error as well as in power are practically negligible. In summary, our simulation study reveals the BCA test as a powerful and accurate method for the nonparametric assessment of equivalence. In addition, the standardized bias and MSE values of the empirical estimate of the quantity used to assess population bioequivalence show that this estimate gives good results in normal populations and reasonable results in nonnormal populations for the sample size investigated.

The paper is organized as follows. In Section 2 the setup and results of Munk & Czado (1999) necessary to perform a nonparametric assessment of population equivalence are given. This includes the definition of the trimmed Mallows distance as a nonparametric measure of population equivalence as well as the bootstrap limit law derived. Section 3 describes in detail the bootstrap procedures and their modification necessary in the presence of period effects. The simulation setup and the results are presented in Section 4. Section 5 contains the comparison of the bootstrap tests with the test when the limiting variance is estimated. The paper concludes with a summary and discussion of the results achieved.
2 Nonparametric Crossover Designs

In this section we summarize the setup and results needed for a completely nonparametric analysis of bioequivalence given in Munk & Czado (1999).

2.1. A Nonparametric Measure of Population Bioequivalence.

The Mallows (1972) distance between distribution functions is used as a measure of population bioequivalence. This distance was previously investigated for the situation of two independent treatment groups by Munk & Czado (1998) and Czado & Munk (1998) and has been generalized to dependent observations as they occur in a pre-post or crossover trial in Munk & Czado (1999). For two continuous c.d.f.'s $F$ and $G$ the $\alpha$-trimmed Mallows distance is defined as

(2.1)  $$\Gamma_\alpha(F,G) = \left[ \frac{1}{1-2\alpha} \int_{\alpha}^{1-\alpha} \left| F^{-1}(u) - G^{-1}(u) \right|^2 \, du \right]^{1/2}, \qquad \alpha \in [0, \tfrac{1}{2}),$$

provided $F$ and $G$ are in the class

(2.2)  $$\mathcal{F}_2 := \left\{ F : F \text{ is a continuous c.d.f. and } \int |x|^2 \, dF(x) < \infty \right\}.$$
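As an illustration of the definition above, the following is a minimal sketch of the plug-in (empirical) trimmed Mallows distance for two samples of equal size. The function name `trimmed_mallows` and the restriction to equal sample sizes with trimming fraction k/n are assumptions made for this sketch and are not taken from the paper. For equal sample sizes both empirical quantile functions are step functions with the same breakpoints, so the integral reduces to an average of squared differences of the untrimmed order statistics.

```python
import math


def trimmed_mallows(x, y, k=0):
    """Empirical trimmed Mallows distance for equal-size samples.

    Hypothetical helper: trims k order statistics in each tail,
    i.e. uses the trimming fraction alpha = k / n.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    n = len(x)
    if not 0 <= 2 * k < n:
        raise ValueError("need 0 <= 2k < n")
    xs, ys = sorted(x), sorted(y)
    # Sum of squared differences between matching untrimmed order statistics.
    total = sum((a - b) ** 2 for a, b in zip(xs[k:n - k], ys[k:n - k]))
    # Dividing by n - 2k corresponds to the factor 1 / (1 - 2 * alpha).
    return math.sqrt(total / (n - 2 * k))
```

In a pure location shift the distance equals the size of the shift, and trimming one observation per tail removes the influence of a single outlier.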
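For the untrimmed case ($\alpha = 0$) the Mallows (1972) $L_2$-distance is obtained, which is also called the Wasserstein or Kantorovitch-Rubinstein metric (cf. also Dobrushin (1970) and Wasserstein (1969)). Additionally, $\Gamma_\alpha$ also has an appealing interpretation in location-scale families which are generated by some $H \in \mathcal{F}_2$,

(2.3)  $$F, G \in \mathcal{F}_{LS} := \left\{ H\!\left(\frac{\cdot - \mu}{\sigma}\right) : \int x \, dH(x) = 0, \ \int x^2 \, dH(x) = 1 \right\}.$$

Here we have

(2.4)  $$\Gamma_\alpha^2(F,G) = (\mu_F - \mu_G)^2 + (\sigma_F - \sigma_G)^2 c_\alpha$$

where $c_\alpha$ denotes a quantity depending on $H$ and on $\alpha$ (cf. Munk & Czado (1998)) and $\Gamma_\alpha^2 := (\Gamma_\alpha)^2$. Here, $\mu_F, \sigma_F$ and $\mu_G, \sigma_G$ are the location and scale parameters of $F$ and $G$, respectively, i.e. $F(x) = H((x - \mu_F)/\sigma_F)$ and $G(x) = H((x - \mu_G)/\sigma_G)$. If $\alpha = 0$, this reduces to the Euclidean distance of these parameters

(2.5)  $$\Gamma_0^2(F,G) = (\mu_F - \mu_G)^2 + (\sigma_F - \sigma_G)^2.$$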
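The untrimmed identity (2.5) can be checked directly. The short derivation below is a sketch in which the symbols $\mu_F, \sigma_F, \mu_G, \sigma_G$ (location and scale of $F$ and $G$) and $\Gamma_0$ (untrimmed Mallows distance) are notation assumed here; it uses only that the generating c.d.f. $H$ has mean 0 and variance 1.

```latex
\begin{align*}
F^{-1}(u) &= \mu_F + \sigma_F H^{-1}(u), \qquad
G^{-1}(u) = \mu_G + \sigma_G H^{-1}(u), \\
\Gamma_0^2(F,G)
  &= \int_0^1 \bigl[(\mu_F - \mu_G) + (\sigma_F - \sigma_G) H^{-1}(u)\bigr]^2 \, du \\
  &= (\mu_F - \mu_G)^2
     + 2(\mu_F - \mu_G)(\sigma_F - \sigma_G) \int_0^1 H^{-1}(u)\, du
     + (\sigma_F - \sigma_G)^2 \int_0^1 H^{-1}(u)^2 \, du \\
  &= (\mu_F - \mu_G)^2 + (\sigma_F - \sigma_G)^2 ,
\end{align*}
% since \int_0^1 H^{-1}(u)\,du = \int x\,dH(x) = 0
% and   \int_0^1 H^{-1}(u)^2\,du = \int x^2\,dH(x) = 1.
```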
Hence, in location-scale families the Mallows distance leads to an aggregate bioequivalence criterion which simultaneously controls differences in means and in variability. Further, in a pure location model we find for any $\alpha \in [0, 1/2)$ that $\Gamma_\alpha(F,G) = |\mu_F - \mu_G|$, the absolute difference of the location parameters of $F$ and $G$, i.e. it coincides with the classical criterion of average bioequivalence. For the case of two independent samples $X_1, \ldots, X_m \sim F$ and $Y_1, \ldots, Y_n \sim G$ and under some smoothness and growth conditions on $F$ and $G$, Munk & Czado (1998) derived asymptotic normality of

$$\left(\frac{nm}{n+m}\right)^{1/2} \left\{ \Gamma^2_{\alpha_{m \wedge n}}(F_m, G_n) - \Gamma^2_0(F,G) \right\}$$

provided $\Gamma_0(F,G) > 0$. Here $m \wedge n$ denotes the minimum of $m$ and $n$ and $\alpha_{m \wedge n}$ is a sequence of trimming bounds which does not converge too fast to zero. Throughout this paper let $F_m$ and $G_n$ denote the empirical c.d.f. of $X_1, \ldots, X_m \sim F$ and $Y_1, \ldots, Y_n \sim G$, respectively, i.e. $F_m(x) = \frac{1}{m} \sum_{i=1}^{m} 1\{X_i \le x\}$ with corresponding quantile function (2.6)
X Fm 1 (t) = inf fx : Fm (x) tg = X(i) 1f(i 1)=m 20 versus K : (F; G) 20 (3.5) at level sig if T0 ;B;1 sig 0:
Efron and Tibshirani (1993) proposed a bias corrected and accelerated (BCA) method for constructing bootstrap confidence intervals. For this the $(1 - \alpha_{sig})$-th percentile of the bootstrap sample is replaced by the $u_p$-th percentile, where $u_p$ is defined as follows:

$$u_p = \Phi\left( \hat{Z}_0 + \frac{\hat{Z}_0 + Z_{1-\alpha_{sig}}}{1 - \hat{a}\,(\hat{Z}_0 + Z_{1-\alpha_{sig}})} \right).$$

Here $\Phi$ denotes the standard normal c.d.f., $Z_{1-\alpha_{sig}}$ its $(1-\alpha_{sig})$ quantile,

$$\hat{Z}_0 = \Phi^{-1}\left( \frac{\#\{T^b < T;\ b = 1, \ldots, B\}}{B} \right) \quad \text{and} \quad \hat{a} = \frac{\sum_{i=1}^{n} (T_{(\cdot)} - T_{(i)})^3}{6 \left( \sum_{i=1}^{n} (T_{(\cdot)} - T_{(i)})^2 \right)^{3/2}},$$

and $T_{(i)}$ is the resulting test statistic when the $i$-th observation is removed. Finally $T_{(\cdot)}$ is the mean of the $T_{(i)}$, i.e. $T_{(\cdot)} = \frac{1}{n} \sum_{i=1}^{n} T_{(i)}$. Note that if $\hat{a} = \hat{Z}_0 = 0$ we have $u_p = 1 - \alpha_{sig}$, i.e. the BCA method coincides with the PC method. This correction allows the bootstrap confidence interval to be second order accurate (Efron and Tibshirani (1993), p. 325). Munk & Czado (1999) showed that it is possible to calculate corresponding p-values. For the PC method the p-value is given by

(3.6)  $$\text{p-value(PC)} = 1 - \frac{1}{B} \sum_{b=1}^{B} 1\{T^{b}_{\Delta_0} \le 0\},$$

while for the BCA method it is

(3.7)  $$\text{p-value(BCA)} = u_p^{-1}\bigl(1 - \text{p-value(PC)}\bigr).$$
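The BCA level adjustment and the PC p-value described above can be sketched as follows. This is only an illustration: the function names `bca_level` and `pc_p_value`, the argument names, and the clipping of the bootstrap fraction away from 0 and 1 (so that the normal quantile stays finite) are assumptions of this sketch, not part of the paper. The normal c.d.f. and quantile come from Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist


def bca_level(boot_stats, t_obs, jackknife_stats, alpha_sig=0.05):
    """Return u_p, the adjusted percentile level of the BCA method."""
    nd = NormalDist()
    B = len(boot_stats)
    # Bias correction: fraction of bootstrap statistics below the observed one.
    frac = sum(tb < t_obs for tb in boot_stats) / B
    # Assumption of this sketch: clip so that inv_cdf is defined.
    frac = min(max(frac, 1.0 / (2 * B)), 1.0 - 1.0 / (2 * B))
    z0 = nd.inv_cdf(frac)
    # Acceleration from leave-one-out (jackknife) statistics T_(i).
    n = len(jackknife_stats)
    t_dot = sum(jackknife_stats) / n
    num = sum((t_dot - ti) ** 3 for ti in jackknife_stats)
    den = 6.0 * sum((t_dot - ti) ** 2 for ti in jackknife_stats) ** 1.5
    a = num / den if den > 0 else 0.0
    z = nd.inv_cdf(1.0 - alpha_sig)
    return nd.cdf(z0 + (z0 + z) / (1.0 - a * (z0 + z)))


def pc_p_value(boot_stats):
    """Percentile p-value: one minus the share of bootstrap statistics <= 0."""
    return 1.0 - sum(tb <= 0 for tb in boot_stats) / len(boot_stats)
```

With zero bias correction and zero acceleration, `bca_level` returns 1 - alpha_sig, which is the case in which the BCA and PC methods coincide.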
3.2 Allowing for Period Effects.

Modifications of the bootstrap are now given for the case when period effects are present, while carry-over effects are excluded (see Assumption A). The distinction between strong and weak period effects becomes important.

In the case of no strong period effect but a weak period effect, the bioequivalence measure reduces to $\Gamma_\alpha(F,G)$ but bootstrap samples have to be drawn separately for each sequence. Nevertheless, the empirical Mallows distance is still computed from the entire sample $Z_1, \ldots, Z_n$. For this we resample $n_1$ times from $\{Z_i;\ i = 1, \ldots, n_1\}$ and $n_2$ times from $\{Z_i;\ i = n_1 + 1, \ldots, n_1 + n_2\}$ to construct the same empirical marginal distributions $\hat{F}_n$ and $\hat{G}_n$. For the case of a strong period effect not only separate resampling (for different sequences) but also separate estimation of $\Gamma_\alpha(F_1, G_2)$ and $\Gamma_\alpha(F_2, G_1)$ is necessary. The appropriate bootstrap test statistic is therefore

(3.8)  $$T_{\Delta_0,\alpha}(\hat{F}_1^{n_1}, \hat{F}_2^{n_2}, \hat{G}_1^{n_1}, \hat{G}_2^{n_2}) = \sqrt{\frac{n_1 n_2}{n_1 + n_2}} \left[ \frac{1}{2} \left( \Gamma_\alpha^2(\hat{F}_1^{n_1}, \hat{G}_2^{n_2}) + \Gamma_\alpha^2(\hat{F}_2^{n_2}, \hat{G}_1^{n_1}) \right) - \Delta_0^2 \right].$$

We now bootstrap from each sequence separately, i.e. we draw $n_1$ times from the bivariate data $\{(Y_{i11}, Y_{i21});\ i = 1, \ldots, n_1\}$ and $n_2$ times from $\{(Y_{i22}, Y_{i12});\ i = 1, \ldots, n_2\}$. Again the PC and BCA method can be used. In particular, we have for the PC method:

(3.9)  Reject $H: \frac{1}{2}[\Gamma_\alpha^2(F_1, G_2) + \Gamma_\alpha^2(F_2, G_1)] > \Delta_0^2$ versus $K: \frac{1}{2}[\Gamma_\alpha^2(F_1, G_2) + \Gamma_\alpha^2(F_2, G_1)] \le \Delta_0^2$ at level $\alpha_{sig}$ if $T_{\Delta_0, B, 1-\alpha_{sig}} \le 0$,

where $T_{\Delta_0, B, 1-\alpha_{sig}}$ is the $(1 - \alpha_{sig})$-th empirical quantile based on $B$ bootstrapped $T_{\Delta_0,\alpha}(\hat{F}_1^{n_1}, \hat{F}_2^{n_2}, \hat{G}_1^{n_1}, \hat{G}_2^{n_2})$ (see (3.8)). Similar results to Theorem 3.1 can be proved for the case of period effects.
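The sequence-wise percentile bootstrap test for the strong period effect case can be sketched as below. Several points are assumptions of this illustration rather than statements from the paper: the function names, the restriction to equal sequence sizes, trimming of k order statistics per tail, the convention that the first component of each pair has marginal F_k and the second G_k, and the order-statistic definition of the empirical quantile.

```python
import math
import random


def mallows_sq(x, y, k):
    """Squared empirical trimmed Mallows distance (equal sizes, k trimmed per tail)."""
    n = len(x)
    xs, ys = sorted(x), sorted(y)
    return sum((a - b) ** 2 for a, b in zip(xs[k:n - k], ys[k:n - k])) / (n - 2 * k)


def t_stat(seq1, seq2, delta0, k):
    """Statistic of the form (3.8); seq1 and seq2 are lists of pairs per sequence."""
    n1, n2 = len(seq1), len(seq2)
    f1, g1 = zip(*seq1)  # assumed marginals F1, G1 of sequence 1
    f2, g2 = zip(*seq2)  # assumed marginals F2, G2 of sequence 2
    d = 0.5 * (mallows_sq(f1, g2, k) + mallows_sq(f2, g1, k))
    return math.sqrt(n1 * n2 / (n1 + n2)) * (d - delta0 ** 2)


def pc_bootstrap_test(seq1, seq2, delta0, k=1, B=1000, alpha_sig=0.05, rng=None):
    """Return True if H (non-equivalence) is rejected by the PC method."""
    if len(seq1) != len(seq2):
        raise ValueError("this sketch assumes n1 == n2")
    rng = rng or random.Random(0)
    stats = []
    for _ in range(B):
        # Resample whole pairs within each sequence to keep the dependence.
        b1 = [rng.choice(seq1) for _ in seq1]
        b2 = [rng.choice(seq2) for _ in seq2]
        stats.append(t_stat(b1, b2, delta0, k))
    stats.sort()
    # (1 - alpha_sig) empirical quantile taken as an order statistic.
    q = stats[math.ceil((1 - alpha_sig) * B) - 1]
    return q <= 0
```

Resampling pairs rather than single observations preserves the within-subject correlation of each sequence, which is the reason the sketch never shuffles the two components independently.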
4 Simulation Results for the 2 × 2 Crossover Designs

Since treatment effects are of primary interest in crossover bioequivalence trials, we investigate, for brevity, the behavior of the nonparametric bootstrap tests for treatment effects only. The cases allowing for period effects and allowing for no period effects are considered separately. The PC and the BCA method will be studied for bivariate normal and nonnormal populations. Simulations were conducted on Sun workstations using Splus, with 500 replications for each simulation setup.
4.1 Simulation Results for Crossover Designs with Normal Populations 4.1.1 Allowing for Period Eects To compare to standard parametric crossover bioequivalence trials (cf. Chow & Liu 1992) we choose 1 p1 :225) n1 = n2 = 12 and 0 = log(1 . We trim on both tails by 1 observation, i.e. = 12 . For normal bivariate populations we assume that Hk ; k = 1; 2 are bivariate normal with mean vector
$\mu_k = (\mu_{k1}, \mu_{k2})^T$ and covariance matrix

$$\Sigma_k = \begin{pmatrix} \sigma_{k1}^2 & \rho_k \sigma_{k1} \sigma_{k2} \\ \rho_k \sigma_{k1} \sigma_{k2} & \sigma_{k2}^2 \end{pmatrix}$$

for $k = 1, 2$. A wide range of correlations is investigated. Note that $\sigma_{k1}^2 = \sigma_{k2}^2$ and $\rho_1 = \rho_2$ corresponds to the case of the same intra-subject variability when inter-subject variability is present. We chose the following combinations for $(\rho_1, \rho_2)$:

$(\rho_1, \rho_2) \in \{(-.8,-.8), (-.5,-.5), (0,0), (.5,.5), (.8,.8), (-.8,.8), (-.5,.5), (.5,-.5), (.8,-.8)\}.$
The cases of equal means but unequal variances and equal variances but unequal means were studied and Table 4.1 summarizes the particular parameter choices: Case Unequal Means Equal Variances Unequal Variances Equal Means
Means k1 = 0 k = 1; 2 k1 = k2 = log(14:25) ; k = 1; 2
Standard Deviations k 1 = k 2 = :5; :25 = log(1:25); k = 1; 2 k1 = log(14:25) = 2; 1 k = 1; 2
Table 4.1: Parameter Settings for the Bivariate Normal Simulation

To evaluate the performance of the bootstrap tests for testing

(4.1)  $H: \frac{1}{2}[\Gamma_\alpha^2(F_1, G_2) + \Gamma_\alpha^2(F_2, G_1)] > \Delta_0^2$ versus $K: \frac{1}{2}[\Gamma_\alpha^2(F_1, G_2) + \Gamma_\alpha^2(F_2, G_1)] \le \Delta_0^2$

we are interested in determining the observed power function of the tests at $\Delta = \sqrt{\frac{1}{2}[\Gamma_\alpha^2(F_1, G_2) + \Gamma_\alpha^2(F_2, G_1)]}$. Four values of $\Delta$ have been chosen: $1.25\Delta_0$, $\Delta_0$, $.5\Delta_0$ and $0$. Since the test hypothesis is composite, the power at $\Delta$ does not uniquely determine the underlying individual Mallows distances $\Gamma_\alpha(F_1, G_2)$ and $\Gamma_\alpha(F_2, G_1)$. The choices made are given in Table 4.2. Note that fixing $\Gamma_\alpha(F_1, G_2)$ and $\Gamma_\alpha(F_2, G_1)$ will now uniquely determine $\mu_{k2}$ ($\sigma_{k2}$) for $k = 1, 2$ in the unequal means - equal variances (unequal variances - equal means) simulation. 500 independent bivariate samples from $H_1$ and $H_2$ were generated and the bootstrap tests for (4.1) using the PC and BCA method, respectively, were conducted at level $\alpha_{sig} = .05$ and trimming $\alpha = 1/12$ based on $B = 1000$ bootstraps.
$\Delta$              $\Gamma_\alpha(F_1, G_2)$      $\Gamma_\alpha(F_2, G_1)$
$1.25\Delta_0$        $1.25\Delta_0$                 $1.25\Delta_0$
$1.25\Delta_0$        $1.25\sqrt{2}\Delta_0$         $0$
$1.25\Delta_0$        $0$                            $1.25\sqrt{2}\Delta_0$
$\Delta_0$            $\Delta_0$                     $\Delta_0$
$\Delta_0$            $\sqrt{2}\Delta_0$             $0$
$\Delta_0$            $0$                            $\sqrt{2}\Delta_0$
$.5\Delta_0$          $.5\Delta_0$                   $.5\Delta_0$
$.5\Delta_0$          $.5\sqrt{2}\Delta_0$           $0$
$.5\Delta_0$          $0$                            $.5\sqrt{2}\Delta_0$
$0$                   $0$                            $0$

Table 4.2: Settings of $\Gamma_\alpha(F_1, G_2)$ and $\Gamma_\alpha(F_2, G_1)$ to evaluate the power at $\Delta = \sqrt{\frac{1}{2}[\Gamma_\alpha^2(F_1, G_2) + \Gamma_\alpha^2(F_2, G_1)]}$

Table 4.3 gives the observed significance level of these tests. From this table we see that tests based on the BCA method are more liberal than tests based on the PC method. The largest observed significance level is .108 (.054) for the BCA (PC) method. The theoretical value is $\alpha_{sig} = .05$. However, the liberalness of the BCA method is not too severe since only about 20% of the cases have an observed significance level > .075. The tests based on the PC method are, however, quite conservative in the Unequal Variances - Equal Means cases. Here the PC method achieves a maximal observed significance level of only .028. As important results we find that the effect of correlation on the observed significance level is negligible, as is the effect of $\sigma$, the magnitude of the variances.
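The settings in Table 4.2 can be checked numerically: each pair of individual distances is chosen so that the aggregate value equals the stated multiple of the equivalence bound. The helper name `aggregate_delta` and the numeric value used for the bound in the usage lines are placeholders assumed for this sketch only.

```python
import math


def aggregate_delta(d12, d21):
    """Aggregate distance: square root of the mean of the two squared distances."""
    return math.sqrt(0.5 * (d12 ** 2 + d21 ** 2))


# Placeholder value for the equivalence bound, used only to illustrate the check.
delta0 = 0.2
settings = [
    (delta0, delta0),
    (math.sqrt(2) * delta0, 0.0),
    (0.0, math.sqrt(2) * delta0),
]
aggregates = [aggregate_delta(a, b) for a, b in settings]
```

All three settings in `settings` give the same aggregate value, which is why the power at a given point is reported for several underlying pairs.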
(1 ; 2 ) = (.5,.5) (.8,.8) (-.8,.8) (-.5,.5) (.5,-.5) (.8,-.8)
( (F1 ; G2 ); (F2 ; G1 )) (-.8,-.8) (-.5,-.5) (0,0) PC Method ( p 0 ; ;00)) ( 2 p0 (0; 20 ) BCA Method ( p 0 ; 0 ) ( 2 p 0; 0) (0; 20 ) PC
BCA
PC
BCA
PC
BCA
Unequal Means - Equal Variances Case ( = :5) 0.040 0.028 0.036
0.038 0.026 0.026
0.014 0.020 0.020
0.024 0.030 0.024
0.022 0.028 0.026
0.018 0.010 0.026
0.058 0.068 0.094 0.048 0.046 0.062 0.082 0.098 0.062 0.074 0.108 0.058 0.100 0.068 0.056 0.086 0.078 0.066 Unequal Means - Equal Variances Case ( = :25)
0.066 0.068 0.066
0.064 0.070 0.068
0.070 0.058 0.064
( p 0 ; 0 ) ( 2 p 0; 0) (0; 20 )
0.054 0.052 0.024
0.054 0.036 0.060
0.038 0.030 0.046
0.026 0.044 0.034
0.034 0.030 0.042
( p 0 ; ;00)) ( 2 p0 (0; 20 )
0.068 0.062 0.054 0.068 0.076 0.082 0.068 0.040 0.054 0.066 0.050 0.046 0.046 0.064 0.056 0.072 0.070 0.086 Unequal Variances - Equal Means Case ( = 2)
0.068 0.052 0.074
0.034 0.068 0.048
0.064 0.040 0.062
( p 0 ; 0 ) ( 2 p 0; 0) (0; 20 )
0.000 0.016 0.014
0.000 0.016 0.014
0.000 0.022 0.012
0.000 0.018 0.010
0.006 0.012 0.030
( p 0 ; ;00)) ( 2 p0 (0; 20 )
0.048 0.066 0.056 0.052 0.058 0.062 0.100 0.074 0.072 0.070 0.060 0.066 0.058 0.070 0.086 0.054 0.074 0.042 Unequal Variances - Equal Means Case ( = 1)
0.044 0.068 0.068
0.052 0.046 0.070
0.084 0.058 0.074
( p 0 ; 0 ) ( 2 p 0; 0) (0; 20 )
0.004 0.010 0.028
0.004 0.008 0.006
0.002 0.020 0.020
0.002 0.022 0.016
0.052 0.028 0.040
0.004 0.014 0.014
0.000 0.020 0.028
0.036 0.006 0.024 0.026 0.014 0.040
0.038 0.032 0.038 0.038 0.034 0.042
0.000 0.002 0.010 0.020 0.016 0.014
0.002 0.002 0.010 0.016 0.010 0.004
0.002 0.038 0.030
0.016 0.032 0.046
0.002 0.008 0.020
0.000 0.010 0.014
0.000 0.020 0.022
( 0.058 0.050 0.054 0.050 0.048 0.052 0.068 0.066 0.052 p 0 ; 0 ) ( 2 0.078 0.084 0.064 0.092 0.072 0.090 0.066 0.072 0.066 0 ; 0) p (0; 20 ) 0.082 0.084 0.046 0.050 0.060 0.088 0.064 0.078 0.096 Table 4.3: Observed signi cance level of bootstrapped tests for treatment eects allowing for period eects for the bivariate normal simulation (sig = :05)
Figure 4.1 (Figure 4.2) shows the observed power curves of the bootstrapped tests for the Unequal Means - Equal Variances Case (Unequal Variances - Equal Means Case) for $\sigma = .5$ ($\mu = 2$). To arrive at a single power curve we took the maximum power over the three underlying individual Mallows distances $\Gamma_\alpha(F_1, G_2)$ and $\Gamma_\alpha(F_2, G_1)$ for $\Delta \le \Delta_0$ and the minimum power for $\Delta > \Delta_0$. The dotted line corresponds to BCA, while the solid line corresponds to PC. We observe that the test based on BCA has higher power than the test based on PC. This gain in power for the BCA compared to the PC is largest in the Unequal Variances - Equal Means cases. However, recall that the BCA method is liberal with regard to the significance level. There is a slight effect of correlation on the power. The correlations $(-.8, -.8)$ and $(-.5, -.5)$ give the lowest power. A smaller variance ($\sigma = .25$) in the Unequal Means - Equal Variances Case gives higher power, while a smaller mean ($\mu = 1$) in the Unequal Variances - Equal Means Case has a negligible effect on the power (figures not shown). This indicates that a lower variance improves the performance.

Finally we investigated the quality of the estimation of $D(F_1, F_2, G_1, G_2) = \frac{1}{2}(\Gamma_\alpha^2(F_1, G_2) + \Gamma_\alpha^2(F_2, G_1))$ using $D(\hat{F}_1^{n_1}, \hat{F}_2^{n_2}, \hat{G}_1^{n_1}, \hat{G}_2^{n_2}) = \frac{1}{2}(\Gamma_\alpha^2(\hat{F}_1^{n_1}, \hat{G}_2^{n_2}) + \Gamma_\alpha^2(\hat{F}_2^{n_2}, \hat{G}_1^{n_1}))$. For this we calculated the standardized bias and standardized MSE defined as

$$\text{SBIAS} = \frac{D(\hat{F}_1^{n_1}, \hat{F}_2^{n_2}, \hat{G}_1^{n_1}, \hat{G}_2^{n_2}) - D(F_1, F_2, G_1, G_2)}{D(F_1, F_2, G_1, G_2)}$$

and

$$\text{SMSE} = \frac{[D(\hat{F}_1^{n_1}, \hat{F}_2^{n_2}, \hat{G}_1^{n_1}, \hat{G}_2^{n_2}) - D(F_1, F_2, G_1, G_2)]^2}{D^2(F_1, F_2, G_1, G_2)}.$$
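A minimal sketch of how standardized bias and standardized MSE might be computed from simulation output is given below. It assumes that both quantities are averaged over the simulation replications and that the MSE is standardized by the squared true value; the function name `sbias_smse` and its arguments are hypothetical. Both quantities are undefined when the true value is zero, so the sketch rejects that case.

```python
def sbias_smse(estimates, d_true):
    """Standardized bias and MSE of replicated estimates of a positive target."""
    if d_true <= 0:
        raise ValueError("standardization requires a positive true value")
    m = len(estimates)
    # Average deviation relative to the true value.
    sbias = sum(e - d_true for e in estimates) / m / d_true
    # Average squared deviation relative to the squared true value (assumed).
    smse = sum((e - d_true) ** 2 for e in estimates) / m / d_true ** 2
    return sbias, smse
```

For example, `sbias_smse([1.5, 2.5], 2.0)` describes two replications that miss the target 2.0 by half a unit in opposite directions, so the bias cancels while the squared error does not.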
(-.8,-.8) (-.5,-.5) (0,0)
1 2
(1 ; 2 ) = (.5,.5) (.8,.8) (-.8,.8) (-.5,.5) (.5,-.5) (.8,-.8)
Unequal Means - Equal Variances Case ( = :5) 0.061 0.064 0.059 0.064 0.068 0.064 0.068 0.067 0.110 0.070 0.072 0.071 0.077 0.077 0.073 0.074 0.101 0.093 0.100 0.090 0.109 0.100 0.097 0.098 0.180 0.120 0.126 0.119 0.122 0.131 0.113 0.121 0.362 0.381 0.370 0.366 0.383 0.389 0.384 0.413 0.766 0.636 0.642 0.593 0.619 0.628 0.637 0.653 Unequal Means - Equal Variances Case ( = :25) SBIAS 1.250 0.017 0.015 0.017 0.018 0.019 0.016 0.018 0.019 0.020 SMSE 0.031 0.025 0.017 0.018 0.017 0.018 0.018 0.017 0.017 SBIAS 0 0.030 0.028 0.030 0.024 0.027 0.027 0.020 0.031 0.031 SMSE 0.048 0.040 0.027 0.026 0.027 0.027 0.026 0.029 0.028 SBIAS 0 /2 0.076 0.092 0.091 0.090 0.098 0.094 0.099 0.088 0.096 SMSE 0.205 0.174 0.121 0.117 0.120 0.117 0.123 0.112 0.118 Unequal Variances - Equal Means Case ( = 2) SBIAS 1.250 0.36 0.36 0.33 0.34 0.34 0.34 0.34 0.34 0.33 SMSE 0.60 0.55 0.53 0.53 0.59 0.55 0.59 0.61 0.57 SBIAS 0 0.35 0.37 0.37 0.38 0.38 0.40 0.39 0.38 0.37 SMSE 0.60 0.66 0.59 0.66 0.67 0.61 0.59 0.61 0.63 SBIAS 0 /2 0.44 0.55 0.53 0.55 0.58 0.57 0.58 0.55 0.53 SMSE 0.90 0.99 0.90 1.00 0.97 0.89 0.94 0.95 0.96 Unequal Variances - Equal Means Case ( = 1) SBIAS 1.250 0.35 0.35 0.37 0.35 0.34 0.35 0.36 0.34 0.37 SMSE 0.54 0.56 0.58 0.58 0.55 0.55 0.57 0.55 0.67 SBIAS 0 0.38 0.37 0.40 0.36 0.37 0.36 0.37 0.38 0.39 SMSE 0.61 0.62 0.65 0.66 0.65 0.60 0.66 0.63 0.68 SBIAS 0 /2 0.41 0.53 0.55 0.57 0.59 0.56 0.52 0.52 0.53 SMSE 0.98 0.90 1.01 0.90 0.85 0.95 1.03 0.93 0.92 Table 4.4: Standardized bias and MSE of 21 ( (F^1n1 ; G^ n2 2 ) + (F^2n2 ; G^ n1 1 )) in the bivariate normal simulation ( = :05). SBIAS 1.25d0 SMSE SBIAS 0 SMSE SBIAS 0 /2 SMSE
0.065 0.114 0.100 0.189 0.329 0.881
For the same simulation setup Table 4.4 gives the calculated values. Here the maximal values of SBIAS and SMSE are recorded over the different $\Gamma_\alpha(F_1, G_2)$ and $\Gamma_\alpha(F_2, G_1)$ values which yield the same $\Delta = \sqrt{\frac{1}{2}[\Gamma_\alpha^2(F_1, G_2) + \Gamma_\alpha^2(F_2, G_1)]}$. We see that SBIAS and SMSE increase as $\Delta$ decreases. The magnitudes of SBIAS and SMSE are much smaller for the unequal means - equal variances cases than for the unequal variances - equal means cases. A smaller $\sigma$ value, i.e. smaller variances, in the unequal means - equal variances case decreases the SBIAS and SMSE values considerably. Higher negative correlations increase SBIAS and SMSE. Overall we can conclude that the estimate of $D(F_1, F_2, G_1, G_2) = \frac{1}{2}(\Gamma_\alpha^2(F_1, G_2) + \Gamma_\alpha^2(F_2, G_1))$ performs well, especially in the unequal means - equal variances cases, but less well in the unequal variances - equal means cases.
4.1.2 Allowing for No Period Effects

Recall that no period effect in the weak sense means that the marginals are equal, i.e. $F_1 = F_2$ and $G_1 = G_2$, while for no period effect in the strong sense we assume that $H_1 = H_2$. In any case we combine the samples with marginals $F_1$ and $F_2$ as well as those with marginals $G_1$ and $G_2$. As discussed in Section 3 we can now test for a treatment effect by using the testing problem

(4.2)  $H: \Gamma_\alpha^2(F, G) > \Delta_0^2$ versus $K: \Gamma_\alpha^2(F, G) \le \Delta_0^2$.
For bivariate normal populations, no period effect in the weak sense means that $\mu_{1k} = \mu_{2k}$ and $\sigma_{1k}^2 = \sigma_{2k}^2$ for $k = 1, 2$. If in addition $\rho_1 = \rho_2$, we have no period effect in the strong sense. Recall that even though the test problem is (4.2) for both cases, the resampling is done differently. In the case of allowing no weak period effects ($(\rho_1, \rho_2) = (-.8,-.8), (-.5,-.5), (0,0), (.5,.5), (.8,.8)$) the resampling is done on the combined sample, while in the case allowing for no strong period effect ($(\rho_1, \rho_2) = (-.8,.8), (-.5,.5), (.5,-.5), (.8,-.8)$) the resampling has to be done separately for each sequence. The requirements on the mean and variance now uniquely determine the power at $\Delta = \Gamma_\alpha(F, G)$ where $F = F_1 = F_2$ and $G = G_1 = G_2$. As in the simulation allowing for period effects we chose to evaluate the power at $\Delta = 1.25\Delta_0, \Delta_0, .5\Delta_0, 0$ and the following parameter settings for the mean and variance values:
Case Unequal Means Equal Variances Unequal Variances Equal Means
Means k1 = 0; 12 = 22 k = 1; 2 k1 = k2 = log(14:25) ; k = 1; 2
Standard Deviations k 1 = k 2 = :5; :25 = log(1:25); k = 1; 2 2 = 2; 1 2 = 22 k1 = log(14:25) ; 12 k = 1; 2
Table 4.5: Parameter Settings for the Bivariate Normal Simulation with no Period Effects

The observed power for the bivariate normal simulation allowing for no period effects is given in Table 4.6. In contrast to the simulation allowing for period effects, we see that the BCA method and the PC method nearly always maintain the required significance level of .05. The maximal observed significance level for the BCA (PC) method is now .082 (.076). Further, only in 8% (3%) of the cases was a significance level > .075 observed for the BCA (PC) method. However, the PC method is much more conservative than the BCA method in the Unequal Variances - Equal Means cases. Here the maximal observed significance level for the PC method is only .018. Again, there is little effect of the correlation $(\rho_1, \rho_2)$ and $\sigma$ on the observed significance level. We observe that the test based on the BCA method has higher power than the test based on the PC method. There is an effect of the correlation; the power increases as the degree of positive correlation in both samples increases. Even if only one population is highly positively correlated the power is high. This explains the gain in power for the no weak period effect a priori simulation compared to the no strong period effect a priori simulation. As in the simulation allowing for period effects, a smaller variance ($\sigma = .25$) in the Unequal Means - Equal Variances case gives higher power, while a smaller mean ($\mu = 1$) in the Unequal Variances - Equal Means case has a negligible effect on the power.

                          No strong period effect, $(\rho_1, \rho_2)$ =            No weak period effect, $(\rho_1, \rho_2)$ =
Method  $\Gamma_\alpha(F,G)$   (-.8,-.8)  (-.5,-.5)  (0,0)  (.5,.5)  (.8,.8)    (-.8,.8)  (-.5,.5)  (.5,-.5)  (.8,-.8)

Unequal Means - Equal Variances Case ($\sigma = .5$)
PC      $1.25\Delta_0$   0.002  0.002  0.000  0.000  0.000    0.000  0.000  0.000  0.000
PC      $\Delta_0$       0.036  0.054  0.036  0.030  0.032    0.056  0.046  0.044  0.046
PC      $.5\Delta_0$     0.722  0.778  0.912  0.988  1.000    0.914  0.902  0.904  0.918
PC      $0$              0.998  1.000  1.000  1.000  1.000    1.000  1.000  1.000  1.000
BCA     $1.25\Delta_0$   0.002  0.002  0.002  0.000  0.00     0.000  0.000  0.000  0.000
BCA     $\Delta_0$       0.048  0.076  0.060  0.052  0.07     0.064  0.060  0.056  0.052
BCA     $.5\Delta_0$     0.776  0.816  0.930  0.994  1.00     0.926  0.912  0.926  0.938
BCA     $0$              0.998  1.000  1.000  1.000  1.00     1.000  1.000  1.000  1.000

Unequal Means - Equal Variances Case ($\sigma = .25$)
PC      $1.25\Delta_0$   0.00   0.000  0.000  0.000  0.000    0.000  0.000  0.000  0.000
PC      $\Delta_0$       0.05   0.042  0.032  0.048  0.052    0.054  0.064  0.064  0.076
PC      $.5\Delta_0$     1.00   1.000  1.000  1.000  1.000    1.000  1.000  1.000  1.000
PC      $0$              1.00   1.000  1.000  1.000  1.000    1.000  1.000  1.000  1.000
BCA     $1.25\Delta_0$   0.00   0.00   0.000  0.000  0.000    0.000  0.000  0.00   0.000
BCA     $\Delta_0$       0.05   0.05   0.036  0.058  0.074    0.062  0.076  0.07   0.082
BCA     $.5\Delta_0$     1.00   1.00   1.000  1.000  1.000    1.000  1.000  1.00   1.000
BCA     $0$              1.00   1.00   1.000  1.000  1.000    1.000  1.000  1.00   1.000

Unequal Variances - Equal Means Case ($\mu = 2$)
PC      $1.25\Delta_0$   0.002  0.000  0.000  0.000  0.004    0.000  0.000  0.000  0.000
PC      $\Delta_0$       0.002  0.002  0.010  0.010  0.018    0.014  0.014  0.012  0.010
PC      $.5\Delta_0$     0.776  0.822  0.832  0.902  0.958    0.920  0.878  0.880  0.912
PC      $0$              1.000  1.000  1.000  1.000  1.000    1.000  1.000  1.000  1.000
BCA     $1.25\Delta_0$   0.012  0.006  0.008  0.002  0.012    0.004  0.004  0.006  0.012
BCA     $\Delta_0$       0.036  0.050  0.070  0.052  0.056    0.056  0.066  0.066  0.058
BCA     $.5\Delta_0$     0.914  0.934  0.950  0.960  0.984    0.974  0.960  0.952  0.980
BCA     $0$              1.000  1.000  1.000  1.000  1.000    1.000  1.000  1.000  1.000

Unequal Variances - Equal Means Case ($\mu = 1$)
PC      $1.25\Delta_0$   0.000  0.000  0.000  0.000  0.000    0.002  0.002  0.000  0.000
PC      $\Delta_0$       0.004  0.008  0.012  0.010  0.012    0.004  0.006  0.008  0.008
PC      $.5\Delta_0$     0.770  0.814  0.840  0.912  0.968    0.884  0.888  0.898  0.898
PC      $0$              1.000  1.000  1.000  1.000  1.000    1.000  1.000  1.000  1.000
BCA     $1.25\Delta_0$   0.008  0.006  0.006  0.002  0.010    0.004  0.010  0.002  0.008
BCA     $\Delta_0$       0.066  0.068  0.060  0.044  0.056    0.044  0.056  0.062  0.048
BCA     $.5\Delta_0$     0.906  0.946  0.942  0.974  0.996    0.964  0.964  0.958  0.960
BCA     $0$              1.000  1.000  1.000  1.000  1.000    1.000  1.000  1.000  1.000

Table 4.6: Observed significance level and power of bootstrapped tests for treatment effects allowing for no period effects for the bivariate normal simulation ($\alpha_{sig} = .05$, PC = percentile, BCA = bias corrected and accelerated)
Comparing these results to those obtained when one does allow for period effects, we see a significant gain in power when period effects can be excluded. This is of course to be expected, since only in the case of no period effects are we able to pool the samples.
4.2 Simulation Results for Crossover Designs with Nonnormal Populations

4.2.1 With Period Effects

As for normal populations we use $n_1 = n_2 = 12$, the same bound $\Delta_0$ as in Section 4.1.1, and $\alpha = 1/12$ for trimming. For the bivariate populations $H_k$, $k = 1, 2$, we choose scaled bivariate gamma populations. In particular, let $G(a)$ denote the Gamma distribution with parameter $a > 0$ having density

$$g_a(x) = \frac{x^{a-1} \exp(-x)}{\Gamma(a)} \quad \text{for } x > 0,$$

where $\Gamma(a)$ is the Gamma function. Bivariate Gamma random variables $(X_1, X_2)$ with marginals given by $G(a_1)$ and $G(a_2)$, respectively, and correlation $\rho = \mathrm{Cor}(X_1, X_2)$ can be generated easily by using trivariate reduction methods (see for example Devroye (1986), p. 586-588). For this we need to require

(4.3)  $$0 < \rho < \frac{\min(a_1, a_2)}{\sqrt{a_1 a_2}}.$$
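Trivariate reduction, as referred to above, builds the two correlated Gamma variables from three independent Gamma variables, one of which is shared. The following is a minimal sketch under stated assumptions: unit scale, the function names `max_correlation` and `bivariate_gamma`, and the handling of correlation zero as two independent draws are choices made for this illustration and are not taken from the paper or from Devroye (1986). With shared shape $a_0 = \rho\sqrt{a_1 a_2}$ the covariance equals $a_0$ and the marginal variances are $a_1$ and $a_2$, which gives the requested correlation.

```python
import math
import random


def max_correlation(a1, a2):
    """Upper bound on the correlation attainable by trivariate reduction."""
    return min(a1, a2) / math.sqrt(a1 * a2)


def bivariate_gamma(n, a1, a2, rho, rng=None):
    """Draw n pairs with Gamma(a1), Gamma(a2) marginals and correlation rho."""
    rng = rng or random.Random(0)
    if not 0 <= rho < max_correlation(a1, a2):
        raise ValueError("rho violates the bound min(a1, a2) / sqrt(a1 * a2)")
    a0 = rho * math.sqrt(a1 * a2)  # shape of the shared component
    pairs = []
    for _ in range(n):
        # Shared Gamma(a0) component; absent when rho == 0 (independent case).
        g0 = rng.gammavariate(a0, 1.0) if a0 > 0 else 0.0
        x1 = g0 + rng.gammavariate(a1 - a0, 1.0)
        x2 = g0 + rng.gammavariate(a2 - a0, 1.0)
        pairs.append((x1, x2))
    return pairs
```

The bound explains why some correlation settings are infeasible: for shapes 4 and 10, for instance, `max_correlation(4, 10)` is about 0.63.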
To evaluate the power at specified values of $\Delta$ we need to calculate the Mallows distance $\Gamma_\alpha(F, G)$, where $F$ and $G$ are Gamma distributions. We use the following approximation for the inverse distribution function $F_W^{-1}$ of a scaled and centered Gamma variable $W = (X - a)/\sqrt{a}$, where $X \sim G(a)$, given by

$$F_W^{-1}(p) \approx \sqrt{a} \left[ \left\{ 1 - \frac{1}{9a} + \frac{1}{3\sqrt{a}} \Phi^{-1}(p) \right\}^3 - 1 \right]$$

(Johnson et al. (1994), p. 348). Hence for $F \sim G(a_1)$, $G \sim G(a_2)$:

$$\Gamma_\alpha(F, G) \approx |a_1 - a_2|,$$

which is numerically found to be a good approximation for small $\alpha$ and $a_1, a_2$ not too close to zero. Utilizing the above results we choose for the simulation $H_k$ bivariate Gamma with correlation $\rho_k$ and marginals $F_k \sim G(a_{k1})$ and $G_k \sim G(a_{k2})$ for $k = 1, 2$. We investigated the following correlations: $(\rho_1, \rho_2) \in \{(0,0), (.3,.3), (.6,.6), (.6,0), (0,.6)\}$, and Table 4.7 summarizes the remaining parameter settings:

$a_{11}$   $a_{21}$   $\sigma$
4          4          .25 log(1.25)
7          7          .25 log(1.25)
4          10         .25 log(1.25)
10         4          .25 log(1.25)
10         10         .25 log(1.25)
10         10         .1 log(1.25)

Table 4.7: Parameter Settings for the Bivariate Gamma Simulation

Figure 4.3 plots the underlying marginal densities used in the simulation. The power is evaluated at the same points as for the normal populations simulation, except that for the point in the null hypothesis we used $\Delta = 1.1\Delta_0$. Table 4.8 gives the observed power of the bootstrapped tests based on 500 replications and $B = 1000$. Missing entries correspond to correlation combinations $(\rho_1, \rho_2)$ which did not satisfy the condition (4.3).
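Columns give the correlations $(\rho_1, \rho_2)$; rows give the point $(\delta(F_1, G_2), \delta(F_2, G_1))$ on the boundary of the null hypothesis.

| $a_{11}$ | $a_{21}$ | $\lambda$ | Method | $(\delta(F_1,G_2), \delta(F_2,G_1))$ | (0,0) | (.3,.3) | (.6,.6) | (0,.6) | (.6,0) |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 10 | .25 log(1.25) | PC | $(\Delta_0, \Delta_0)$ | 0.002 | 0.000 | 0.000 | 0.000 | 0.002 |
| | | | | $(0, \sqrt{2}\Delta_0)$ | 0.002 | 0.004 | 0.004 | 0.004 | 0.004 |
| | | | | $(\sqrt{2}\Delta_0, 0)$ | 0.004 | 0.010 | 0.004 | 0.000 | 0.002 |
| | | | BCA | $(\Delta_0, \Delta_0)$ | 0.040 | 0.026 | 0.010 | 0.024 | 0.028 |
| | | | | $(0, \sqrt{2}\Delta_0)$ | 0.046 | 0.058 | 0.054 | 0.040 | 0.068 |
| | | | | $(\sqrt{2}\Delta_0, 0)$ | 0.040 | 0.064 | 0.066 | 0.068 | 0.042 |
| 10 | 4 | .25 log(1.25) | PC | $(\Delta_0, \Delta_0)$ | 0.004 | 0.004 | | | 0.000 |
| | | | | $(0, \sqrt{2}\Delta_0)$ | 0.014 | 0.008 | | | 0.010 |
| | | | | $(\sqrt{2}\Delta_0, 0)$ | 0.006 | 0.002 | | | 0.000 |
| | | | BCA | $(\Delta_0, \Delta_0)$ | 0.034 | 0.032 | | | 0.030 |
| | | | | $(0, \sqrt{2}\Delta_0)$ | 0.040 | 0.056 | | | 0.062 |
| | | | | $(\sqrt{2}\Delta_0, 0)$ | 0.028 | 0.030 | | | 0.026 |
| 4 | 10 | .25 log(1.25) | PC | $(\Delta_0, \Delta_0)$ | 0.000 | 0.002 | | 0.002 | |
| | | | | $(0, \sqrt{2}\Delta_0)$ | 0.002 | 0.008 | | 0.002 | |
| | | | | $(\sqrt{2}\Delta_0, 0)$ | 0.002 | 0.010 | | 0.030 | |
| | | | BCA | $(\Delta_0, \Delta_0)$ | 0.032 | 0.030 | | 0.020 | |
| | | | | $(0, \sqrt{2}\Delta_0)$ | 0.028 | 0.042 | | 0.030 | |
| | | | | $(\sqrt{2}\Delta_0, 0)$ | 0.050 | 0.044 | | 0.072 | |
| 4 | 4 | .25 log(1.25) | PC | $(\Delta_0, \Delta_0)$ | 0.006 | 0.002 | 0.002 | 0.008 | 0.006 |
| | | | | $(0, \sqrt{2}\Delta_0)$ | 0.012 | 0.010 | 0.016 | 0.012 | 0.018 |
| | | | | $(\sqrt{2}\Delta_0, 0)$ | 0.002 | 0.010 | 0.008 | 0.014 | 0.012 |
| | | | BCA | $(\Delta_0, \Delta_0)$ | 0.034 | 0.016 | 0.012 | 0.024 | 0.030 |
| | | | | $(0, \sqrt{2}\Delta_0)$ | 0.030 | 0.036 | 0.030 | 0.038 | 0.046 |
| | | | | $(\sqrt{2}\Delta_0, 0)$ | 0.026 | 0.018 | 0.046 | 0.026 | 0.024 |
| 7 | 7 | .25 log(1.25) | PC | $(\Delta_0, \Delta_0)$ | 0.000 | 0.002 | 0.000 | 0.008 | 0.000 |
| | | | | $(0, \sqrt{2}\Delta_0)$ | 0.008 | 0.002 | 0.008 | 0.008 | 0.002 |
| | | | | $(\sqrt{2}\Delta_0, 0)$ | 0.006 | 0.012 | 0.006 | 0.006 | 0.008 |
| | | | BCA | $(\Delta_0, \Delta_0)$ | 0.020 | 0.030 | 0.014 | 0.024 | 0.026 |
| | | | | $(0, \sqrt{2}\Delta_0)$ | 0.048 | 0.064 | 0.044 | 0.044 | 0.044 |
| | | | | $(\sqrt{2}\Delta_0, 0)$ | 0.034 | 0.062 | 0.052 | 0.036 | 0.036 |
| 10 | 10 | .1 log(1.25) | PC | $(\Delta_0, \Delta_0)$ | 0.002 | 0.004 | 0.000 | 0.002 | 0.000 |
| | | | | $(0, \sqrt{2}\Delta_0)$ | 0.008 | 0.004 | 0.004 | 0.008 | 0.006 |
| | | | | $(\sqrt{2}\Delta_0, 0)$ | 0.004 | 0.010 | 0.010 | 0.004 | 0.004 |
| | | | BCA | $(\Delta_0, \Delta_0)$ | 0.010 | 0.008 | 0.004 | 0.010 | 0.002 |
| | | | | $(0, \sqrt{2}\Delta_0)$ | 0.016 | 0.014 | 0.016 | 0.020 | 0.014 |
| | | | | $(\sqrt{2}\Delta_0, 0)$ | 0.008 | 0.020 | 0.018 | 0.012 | 0.010 |

Table 4.8: Observed significance level of the bootstrapped tests for treatment effects allowing for period effects for the bivariate Gamma simulation (significance level = .05)

As $a_{k1}$ ($a_{k2}$) increases the skewness of $F_k$ ($G_k$) decreases. The effect of decreasing $\lambda$ is to make the distribution more peaked while decreasing the mode of the distribution as well. From Table 4.8 we see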
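that the PC method is very conservative, especially for smaller $a_{k1}$, $k = 1, 2$, and smaller $\lambda$ values. The BCA method is less liberal in the bivariate gamma simulation than in the normal simulation. The maximal observed significance level for the BCA (PC) method is .072 (.03). However, as skewness decreases the observed significance level increases. This is to be expected since the gamma distributions become more symmetric in this case. An opposite effect can be seen as the populations become more peaked; here the BCA method is also quite conservative ($a_{11} = a_{21} = 10$, $\lambda = .1\log(1.25)$).

Columns give the correlations $(\rho_1, \rho_2)$.

| $a_{11}$ | $a_{21}$ | $\lambda/\Delta_0$ | $\delta$ | | (0,0) | (.3,.3) | (.6,.6) | (0,.6) | (.6,0) |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 10 | .25 | $1.1\Delta_0$ | SBIAS | 0.462 | 0.451 | 0.441 | 0.469 | 0.438 |
| | | | | SMSE | 0.603 | 0.613 | 0.544 | 0.564 | 0.566 |
| | | | $\Delta_0$ | SBIAS | 0.493 | 0.505 | 0.483 | 0.490 | 0.488 |
| | | | | SMSE | 0.680 | 0.703 | 0.648 | 0.679 | 0.676 |
| | | | $\Delta_0/2$ | SBIAS | 1.181 | 1.226 | 1.209 | 1.205 | 1.213 |
| | | | | SMSE | 3.161 | 3.562 | 3.554 | 3.384 | 3.467 |
| 4 | 10 | .25 | $1.1\Delta_0$ | SBIAS | 0.378 | 0.416 | | 0.395 | |
| | | | | SMSE | 0.508 | 0.536 | | 0.490 | |
| | | | $\Delta_0$ | SBIAS | 0.413 | 0.438 | | 0.437 | |
| | | | | SMSE | 0.594 | 0.580 | | 0.596 | |
| | | | $\Delta_0/2$ | SBIAS | 0.946 | 0.950 | | 0.923 | |
| | | | | SMSE | 2.729 | 2.797 | | 2.468 | |
| 10 | 4 | .25 | $1.1\Delta_0$ | SBIAS | 0.392 | 0.405 | | | 0.397 |
| | | | | SMSE | 0.486 | 0.510 | | | 0.509 |
| | | | $\Delta_0$ | SBIAS | 0.417 | 0.424 | | | 0.420 |
| | | | | SMSE | 0.572 | 0.587 | | | 0.554 |
| | | | $\Delta_0/2$ | SBIAS | 0.951 | 0.938 | | | 0.972 |
| | | | | SMSE | 2.627 | 2.547 | | | 2.901 |
| 4 | 4 | .25 | $1.1\Delta_0$ | SBIAS | 0.338 | 0.346 | 0.339 | 0.352 | 0.346 |
| | | | | SMSE | 0.325 | 0.326 | 0.308 | 0.318 | 0.327 |
| | | | $\Delta_0$ | SBIAS | 0.353 | 0.348 | 0.352 | 0.384 | 0.375 |
| | | | | SMSE | 0.338 | 0.336 | 0.359 | 0.374 | 0.347 |
| | | | $\Delta_0/2$ | SBIAS | 0.636 | 0.691 | 0.697 | 0.685 | 0.692 |
| | | | | SMSE | 1.188 | 1.295 | 1.446 | 1.326 | 1.228 |
| 7 | 7 | .25 | $1.1\Delta_0$ | SBIAS | 0.404 | 0.386 | 0.390 | 0.369 | 0.406 |
| | | | | SMSE | 0.461 | 0.456 | 0.448 | 0.426 | 0.462 |
| | | | $\Delta_0$ | SBIAS | 0.422 | 0.424 | 0.397 | 0.426 | 0.423 |
| | | | | SMSE | 0.508 | 0.495 | 0.505 | 0.478 | 0.487 |
| | | | $\Delta_0/2$ | SBIAS | 0.953 | 0.971 | 0.937 | 0.961 | 0.914 |
| | | | | SMSE | 2.424 | 2.222 | 2.283 | 2.276 | 2.186 |
| 10 | 10 | .10 | $1.1\Delta_0$ | SBIAS | 0.255 | 0.268 | 0.265 | 0.254 | 0.252 |
| | | | | SMSE | 0.140 | 0.141 | 0.146 | 0.143 | 0.136 |
| | | | $\Delta_0$ | SBIAS | 0.274 | 0.269 | 0.260 | 0.273 | 0.263 |
| | | | | SMSE | 0.167 | 0.163 | 0.155 | 0.170 | 0.162 |
| | | | $\Delta_0/2$ | SBIAS | 0.392 | 0.410 | 0.404 | 0.382 | 0.398 |
| | | | | SMSE | 0.453 | 0.464 | 0.438 | 0.444 | 0.462 |

Table 4.9: Observed standardized bias (SBIAS) and MSE (SMSE) of $\frac{1}{2}\big(\delta(\hat F_1^{n_1}, \hat G_2^{n_2}) + \delta(\hat F_2^{n_2}, \hat G_1^{n_1})\big)$ in the bivariate gamma simulation for $\alpha = .05$.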
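When reading the observed significance levels reported above, it helps to keep in mind the Monte Carlo error of a rejection rate based on 500 replications. The following sketch computes the binomial standard error at the nominal level and expresses the largest observed BCA level (.072) in standard error units. The helper name `mc_standard_error` is hypothetical, and treating the rejection count as binomial is an assumption of this illustration.

```python
import math


def mc_standard_error(p, n_rep):
    """Standard error of an observed rejection rate when the true rejection
    probability is p and n_rep independent replications are run."""
    return math.sqrt(p * (1.0 - p) / n_rep)


nominal = 0.05
se = mc_standard_error(nominal, 500)   # roughly 0.0097
excess = (0.072 - nominal) / se        # distance of .072 from .05 in SE units
```

An observed level of .072 is therefore a little more than two standard errors above the nominal level, while the PC levels close to zero lie several standard errors below it.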
Figures 4.4-4.6 display the observed power curves for selected parameter settings. With regard to the observed power it can be concluded that the BCA method has considerably higher power than the PC method when skewness is low (see Figure 4.5). This is consistent with the normal simulation results. When the populations are more skewed ($a_{11} = a_{21} = 4$, $\lambda = .25\log(1.25)$) the difference in power between the two methods becomes smaller (not shown). The same opposite effect as for the significance level can be observed: one achieves higher power when the populations are more peaked ($a_{11} = a_{21} = 10$, $\lambda = .1\log(1.25)$). Overall we can say that if populations are highly skewed and less peaked the BCA method is preferred over the PC method. However, for more peaked distributions the difference decreases.

As in the normal population case we investigate the performance of $D(\hat F_1^{n_1}, \hat F_2^{n_2}, \hat G_1^{n_1}, \hat G_2^{n_2})$ as an estimator of $D(F_1, F_2, G_1, G_2)$ under bivariate Gamma sampling. The results are presented in Table 4.9. Missing entries correspond to correlation combinations $(\rho_1, \rho_2)$ which did not satisfy the condition (4.3). As for normal populations, SBIAS and SMSE increase as $\delta$ decreases. As skewness increases, SBIAS and SMSE decrease. Peakedness ($a_{11} = a_{21} = 10$, $\lambda/\Delta_0 = .1$) improves the situation considerably, especially for small $\delta$ values. The effect of correlation on SBIAS and SMSE is negligible. Overall, for nonnormal populations the results indicate that higher sample sizes $n$ and $m$ are preferable, especially for small $\delta$ values.
4.2.2 Without Period Effects

It is easy to see that if we allow for no period effects in the weak sense we require $a_{1k} = a_{2k}$ for $k = 1, 2$. If in addition we have $\rho_1 = \rho_2$, then we have no period effects in the strong sense. We are interested in evaluating the power at $\delta = 1.1\Delta_0, \Delta_0, .5\Delta_0, 0$. In the case of no period effects this means that only $a_{11}$ and $\lambda$ can be chosen freely. We chose $a_{11} = a_{21} = 4, 7, 10$ with $\lambda = .25\Delta_0$, $a_{11} = a_{21} = 7, 10$ with $\lambda = .1\Delta_0$, and $(\rho_1, \rho_2) = (0, 0), (.3, .3), (.6, .6), (.6, 0)$. For $(\rho_1, \rho_2) = (0, 0), (.3, .3), (.6, .6)$ both sequences have the same distribution, so there are no period effects in the strong sense, while $(\rho_1, \rho_2) = (.6, 0)$ excludes period effects only in the weak sense. Different resampling is required for the two cases. For reasons of symmetry $(\rho_1, \rho_2) = (0, .6)$ does not need to be considered.

Table 4.10 gives the simulation results for the bivariate Gamma simulation allowing for no period effects. We see that the tests become less conservative in the less skewed and less peaked cases ($a_{11} = 10$, $\lambda = .25\Delta_0$). If the distributions are less skewed and more peaked ($a_{11} = 7, 10$, $\lambda = .1\Delta_0$) both tests are conservative. The power increases as the degree of positive correlation increases for both methods. The power decreases as the distributions become less skewed ($a_{11} = 10$). The difference between the PC method and the BCA method with regard to power is smaller for more skewed distributions ($a_{11} = 4$). Comparison of these results with the corresponding ones allowing for period effects shows that the power of the tests is about doubled when no period effects are allowed. Again, this is to be expected, since in the case of no period effects we can pool the samples.
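Columns give the correlations $(\rho_1, \rho_2)$. The first three correlation columns correspond to no period effects in the strong sense, the last column to no period effects in the weak sense.

| $a_{11}$ | $a_{21}$ | $\lambda/\Delta_0$ | Method | $\delta(F,G)$ | (.0,.0) | (.3,.3) | (.6,.6) | (.6,.0) |
|---|---|---|---|---|---|---|---|---|
| 4 | 4 | .25 | PC | $1.1\Delta_0$ | 0.002 | 0.000 | 0.002 | 0.004 |
| | | | | $\Delta_0$ | 0.014 | 0.002 | 0.004 | 0.008 |
| | | | | $.5\Delta_0$ | 0.688 | 0.826 | 0.928 | 0.846 |
| | | | | 0 | 0.998 | 1.000 | 1.000 | 0.998 |
| | | | BCA | $1.1\Delta_0$ | 0.002 | 0.002 | 0.002 | 0.006 |
| | | | | $\Delta_0$ | 0.020 | 0.014 | 0.010 | 0.012 |
| | | | | $.5\Delta_0$ | 0.768 | 0.868 | 0.944 | 0.894 |
| | | | | 0 | 0.998 | 1.000 | 1.000 | 1.000 |
| 7 | 7 | .25 | PC | $1.1\Delta_0$ | 0.002 | 0.004 | 0.002 | 0.004 |
| | | | | $\Delta_0$ | 0.010 | 0.018 | 0.010 | 0.014 |
| | | | | $.5\Delta_0$ | 0.552 | 0.622 | 0.830 | 0.690 |
| | | | | 0 | 0.982 | 0.998 | 1.000 | 0.998 |
| | | | BCA | $1.1\Delta_0$ | 0.006 | 0.010 | 0.002 | 0.006 |
| | | | | $\Delta_0$ | 0.024 | 0.022 | 0.028 | 0.022 |
| | | | | $.5\Delta_0$ | 0.660 | 0.744 | 0.892 | 0.762 |
| | | | | 0 | 0.994 | 1.000 | 1.000 | 1.000 |
| 10 | 10 | .25 | PC | $1.1\Delta_0$ | 0.008 | 0.000 | 0.000 | 0.000 |
| | | | | $\Delta_0$ | 0.006 | 0.012 | 0.012 | 0.014 |
| | | | | $.5\Delta_0$ | 0.376 | 0.540 | 0.696 | 0.526 |
| | | | | 0 | 0.940 | 0.984 | 1.000 | 0.980 |
| | | | BCA | $1.1\Delta_0$ | 0.018 | 0.006 | 0.002 | 0.008 |
| | | | | $\Delta_0$ | 0.024 | 0.034 | 0.030 | 0.032 |
| | | | | $.5\Delta_0$ | 0.550 | 0.652 | 0.796 | 0.672 |
| | | | | 0 | 0.982 | 0.998 | 1.000 | 0.994 |
| 10 | 10 | .1 | PC | $1.1\Delta_0$ | 0.000 | 0.000 | 0.000 | 0.000 |
| | | | | $\Delta_0$ | 0.016 | 0.004 | 0.002 | 0.008 |
| | | | | $.5\Delta_0$ | 0.966 | 0.992 | 1.000 | 0.996 |
| | | | | 0 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | | BCA | $1.1\Delta_0$ | 0.000 | 0.000 | 0.000 | 0.000 |
| | | | | $\Delta_0$ | 0.018 | 0.010 | 0.002 | 0.008 |
| | | | | $.5\Delta_0$ | 0.984 | 0.992 | 1.000 | 0.998 |
| | | | | 0 | 1.000 | 1.000 | 1.000 | 1.000 |
| 7 | 7 | .1 | PC | $1.1\Delta_0$ | 0.000 | 0.000 | 0.000 | 0.000 |
| | | | | $\Delta_0$ | 0.000 | 0.002 | 0.000 | 0.002 |
| | | | | $.5\Delta_0$ | 0.990 | 0.998 | 1.000 | 1.000 |
| | | | | 0 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | | BCA | $1.1\Delta_0$ | 0.000 | 0.000 | 0.000 | 0.000 |
| | | | | $\Delta_0$ | 0.004 | 0.004 | 0.002 | 0.004 |
| | | | | $.5\Delta_0$ | 0.996 | 0.998 | 1.000 | 1.000 |
| | | | | 0 | 1.000 | 1.000 | 1.000 | 1.000 |

Table 4.10: Observed power of the bootstrapped tests for treatment effects for the bivariate Gamma simulation allowing for no period effects (significance level = .05, PC = percentile, BCA = bias corrected and accelerated)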
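The remark that samples can be pooled when period effects are excluded can be made concrete with a small sketch. It contrasts a distance estimate that averages the two cross-sequence comparisons (the form appearing in the caption of Table 4.9) with one that pools test and reference observations across sequences. The function names are hypothetical, equal sample sizes are assumed so that order statistics can be matched one to one, and $\delta$ is taken to be the unsquared $L_2$ Mallows distance; none of this is claimed to reproduce the resampling schemes actually used.

```python
import numpy as np


def mallows(x, y):
    """Empirical L2 Mallows distance for two samples of equal size
    (root mean squared difference of the order statistics)."""
    return float(np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2)))


def distance_with_period_effects(x1, y1, x2, y2):
    """Average of the cross-sequence comparisons; x_k (y_k) holds the test
    (reference) observations of sequence k."""
    return 0.5 * (mallows(x1, y2) + mallows(x2, y1))


def distance_pooled(x1, y1, x2, y2):
    """Pool both sequences before comparing test and reference, which is
    only sensible when period effects can be excluded."""
    return mallows(np.concatenate([x1, x2]), np.concatenate([y1, y2]))
```

The pooled version compares two samples of size $n_1 + n_2$ instead of averaging two comparisons of smaller samples, which is one way to see where the gain in power comes from.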
5 Test Comparison for Assessing the Similarity of Distributions

As remarked in Section 3, the limiting variance $\sigma^2(H)$ can be estimated when two independent treatment groups are available. In this case we are assessing the similarity of two distributions and the asymptotic test (AM) developed in Munk & Czado (1998) can be utilized. In the following we compare this test to the bootstrap tests developed in Section 3.

| Case | $\delta(F,G)$ | AM | PC | BCA |
|---|---|---|---|---|
| Unequal Means - Equal Variances ($\sigma = .5$) | $1.25\Delta_0$ | 0.004 | 0.000 | 0.002 |
| | $\Delta_0$ | 0.054 | 0.036 | 0.060 |
| | $.5\Delta_0$ | 0.950 | 0.912 | 0.930 |
| | 0 | 1.000 | 1.000 | 1.000 |
| Unequal Means - Equal Variances ($\sigma = .25$) | $1.25\Delta_0$ | 0.000 | 0.000 | 0.000 |
| | $\Delta_0$ | 0.058 | 0.032 | 0.036 |
| | $.5\Delta_0$ | 1.000 | 1.000 | 1.000 |
| | 0 | 1.000 | 1.000 | 1.000 |
| Unequal Variances - Equal Means ($\sigma = 2$) | $1.25\Delta_0$ | 0.002 | 0.000 | 0.008 |
| | $\Delta_0$ | 0.056 | 0.010 | 0.070 |
| | $.5\Delta_0$ | 0.926 | 0.832 | 0.950 |
| | 0 | 1.000 | 1.000 | 1.000 |
| Unequal Variances - Equal Means ($\sigma = 1$) | $1.25\Delta_0$ | 0.006 | 0.000 | 0.006 |
| | $\Delta_0$ | 0.066 | 0.012 | 0.060 |
| | $.5\Delta_0$ | 0.932 | 0.840 | 0.942 |
| | 0 | 1.000 | 1.000 | 1.000 |

Table 5.1: Observed significance level and power of the asymptotic test and bootstrapped tests for treatment effects allowing for no period effects in the bivariate independent normal simulation (AM = Asymptotic Method, PC = Percentile Method, BCA = Bias Corrected and Accelerated Method)
5.1 Normal Populations

Allowing for no period effects and assuming independence between treatment groups, the observed significance level and power for the different tests are given in Table 5.1. We can see that the asymptotic test and the bootstrap test based on the BCA method perform similarly, while the bootstrap test based on the PC method is more conservative and less powerful than the other tests. Note that the results for BCA and PC are taken from Table 4.6 with $(\rho_1, \rho_2) = (0, 0)$.
5.2 Nonnormal Populations

We use independent bivariate Gamma sampling to investigate the behavior of the tests under nonnormal populations. The corresponding results are given in Table 5.2. From Table 5.2 the same conclusions can be drawn as for the normal populations. There is little loss in power and significance level when bootstrapping based on the BCA method is used instead of estimating the asymptotic variance. The PC method is clearly inferior here and should not be used.
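| $a_{11}$ | $a_{21}$ | $\lambda/\Delta_0$ | $\delta(F,G)$ | AM | PC | BCA |
|---|---|---|---|---|---|---|
| 4 | 4 | .25 | $1.1\Delta_0$ | 0.004 | 0.002 | 0.002 |
| | | | $\Delta_0$ | 0.034 | 0.014 | 0.020 |
| | | | $.5\Delta_0$ | 0.836 | 0.688 | 0.768 |
| | | | 0 | 1.000 | 0.998 | 0.998 |
| 7 | 7 | .25 | $1.1\Delta_0$ | 0.020 | 0.002 | 0.006 |
| | | | $\Delta_0$ | 0.040 | 0.010 | 0.024 |
| | | | $.5\Delta_0$ | 0.678 | 0.552 | 0.660 |
| | | | 0 | 0.998 | 0.982 | 0.994 |
| 10 | 10 | .25 | $1.1\Delta_0$ | 0.028 | 0.008 | 0.018 |
| | | | $\Delta_0$ | 0.040 | 0.006 | 0.024 |
| | | | $.5\Delta_0$ | 0.586 | 0.376 | 0.550 |
| | | | 0 | 0.980 | 0.940 | 0.982 |
| 10 | 10 | .1 | $1.1\Delta_0$ | 0.000 | 0.000 | 0.000 |
| | | | $\Delta_0$ | 0.016 | 0.016 | 0.018 |
| | | | $.5\Delta_0$ | 0.992 | 0.966 | 0.984 |
| | | | 0 | 1.000 | 1.000 | 1.000 |
| 7 | 7 | .1 | $1.1\Delta_0$ | 0.002 | 0.000 | 0.000 |
| | | | $\Delta_0$ | 0.010 | 0.000 | 0.004 |
| | | | $.5\Delta_0$ | 1.000 | 0.990 | 0.996 |
| | | | 0 | 1.000 | 1.000 | 1.000 |

Table 5.2: Observed significance level and power of the asymptotic test and the bootstrapped tests for treatment effects for the bivariate independent Gamma simulation allowing for no period effects (AM = Asymptotic Method, PC = Percentile Method, BCA = Bias Corrected and Accelerated Method)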
6 Discussion and Summary

When selecting from the various bootstrap procedures available in the literature (see for example DiCiccio & Efron (1996)), we focused on bootstrap procedures which do not rely on the availability of estimates of the limiting variance, due to its complicated structure. This excludes bootstrap tests based on the inversion of bootstrap t-intervals such as percentile t-intervals. In addition they are not translation invariant. In contrast, the PC and the BCA method do not require such estimates. Even though it is known that BCA confidence intervals are second order accurate while the PC confidence intervals are only first order accurate for smooth functions of the mean (see for example Hall (1988, 1992)), this might not hold when the statistic is not a function of the mean, as is the case in this paper. Another method which does not require estimates of the limiting variance is the approximate bootstrap confidence (ABC) interval introduced by Efron (see for example Efron & Tibshirani (1993), p. 188). ABC intervals are designed to approximate the BCA interval endpoints analytically without using any Monte Carlo replications. However, they require that the statistic is smooth in the data, which is not the case here, and therefore we did not use these confidence intervals to construct the appropriate bootstrap tests.

The simulation results based on PC and BCA confidence intervals provide the following answers to the questions asked in the introduction. Comparing the two bootstrap methods we conclude that the BCA method shows a better performance than the PC method. With regard to the observed significance level the PC method is very conservative in several cases. These cases are the Unequal Variances - Equal Means Cases when normal populations are used. Note that these are cases where average bioequivalence holds but not population bioequivalence. The PC method is also very conservative in nearly all cases considered in the bivariate Gamma simulation. The BCA method is less conservative and in some cases moderately liberal.

With regard to the power performance the BCA method shows higher power than the PC method in all cases considered. This includes the normal and nonnormal simulations when period effects are allowed or not allowed. For normal populations this difference is especially large in the Unequal Variances - Equal Means Cases. Here the PC method achieves only a power of below .6 at $\delta = .5\Delta_0$ while the BCA method has a power above .8. In the bivariate Gamma simulation allowing for period effects the PC method has insufficient power (below .2 at $\delta = .5\Delta_0$), especially when skewness and peakedness are low. In contrast the BCA method yields power of .6 in these cases.

Comparing these methods under normal and nonnormal populations we see that skewness and peakedness are important. A higher peakedness increases the power of the tests. This is seen in the normal simulation (Unequal Means - Equal Variance Cases) as well as in the bivariate gamma simulation. A more unusual observation is that both methods perform better when skewness is higher.

As expected there is a gain in power when period effects can be excluded a priori. This gain is especially large for the PC method and less pronounced for the BCA method. For example, the PC method now has sufficient power in the low skewness and peakedness cases with higher positive correlation.

Another interesting aspect is the role of correlation within the two sequences. While the effect of correlation is marginal when period effects are allowed, the effect is stronger when period effects are excluded. A high degree of positive correlation in the sequences will yield a higher power both for normal and bivariate gamma populations. This gain in power is high for the Unequal Means - Equal Variances Case with larger variance and for highly skewed populations.

The results of Section 5 show that the BCA method behaves quite similarly to the asymptotic test with estimated limiting variance when similarity of distributions is to be assessed. Again, the PC method is shown to be inferior in this case.

Another aspect of the simulation study was the investigation of the performance of the empirical estimate of the quantity used to measure population bioequivalence. The results show that the empirical estimate has considerably lower standardized bias and MSE when the true quantity is larger for nonnormal populations. This might explain why both the asymptotic test with estimated limiting variance and the bootstrap tests tend to be conservative for nonnormal populations.

In summary, the BCA method performs similarly with respect to level and power to the test obtained by estimating the asymptotic variance. The BCA method is preferred over the rather conservative PC method. Therefore, we recommend the use of the BCA method. Gains in power are possible when period effects can be excluded and when positive correlation within the sequences can be assumed.
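To make the difference between the two bootstrap constructions concrete, the following sketch computes percentile and BCA upper confidence bounds for a distance statistic and turns them into a test. It is an illustration under stated assumptions, not the procedure used in the simulations: the data are the paired observations of a single sequence, the statistic is the empirical $L_2$ Mallows distance between the two columns, subjects are resampled as pairs, the acceleration constant comes from a leave-one-out jackknife, and the hypothesis of non-equivalence is rejected when the upper $1 - \alpha$ bound does not exceed $\Delta_0$. The BCA formulas follow the standard form in Efron & Tibshirani (1993); all function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def mallows(x, y):
    """Empirical L2 Mallows distance for two samples of equal size."""
    return float(np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2)))


def stat(pairs):
    """pairs: (n, 2) array, column 0 = test, column 1 = reference."""
    return mallows(pairs[:, 0], pairs[:, 1])


def upper_bounds(pairs, alpha=0.05, B=1000, seed=0):
    """Return the (PC, BCA) upper 1 - alpha confidence bounds for stat."""
    rng = np.random.default_rng(seed)
    n = len(pairs)
    est = stat(pairs)
    boot = np.array([stat(pairs[rng.integers(0, n, n)]) for _ in range(B)])
    pc = float(np.quantile(boot, 1.0 - alpha))
    # Bias correction: normal quantile of the share of replicates below est
    # (clipped so that the quantile stays finite).
    share = float(np.clip(np.mean(boot < est), 1 / (B + 1), B / (B + 1)))
    z0 = _STD_NORMAL.inv_cdf(share)
    # Acceleration from leave-one-out jackknife values.
    jack = np.array([stat(np.delete(pairs, i, axis=0)) for i in range(n)])
    d = jack.mean() - jack
    denom = 6.0 * np.sum(d ** 2) ** 1.5
    acc = float(np.sum(d ** 3) / denom) if denom > 0 else 0.0
    z = _STD_NORMAL.inv_cdf(1.0 - alpha)
    level = _STD_NORMAL.cdf(z0 + (z0 + z) / (1.0 - acc * (z0 + z)))
    bca = float(np.quantile(boot, level))
    return pc, bca


def bootstrap_test(pairs, delta0, **kwargs):
    """Claim equivalence (True) if the upper bound is at most delta0."""
    pc, bca = upper_bounds(pairs, **kwargs)
    return {"PC": pc <= delta0, "BCA": bca <= delta0}
```

The PC bound reads the $1 - \alpha$ quantile directly off the bootstrap distribution, whereas the BCA bound shifts the quantile level according to the bias correction `z0` and the acceleration `acc`; when the bootstrap distribution sits above the estimate, as is typical for a nonnegative distance, the shift moves the bound downwards and the test becomes less conservative.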
References

- Bauer, P., Bauer, M.M. (1994). Testing equivalence simultaneously for location and dispersion of two normally distributed populations. Biom. Journ. 36, 643-60.
- Chow, S-C., Liu, J-P. (1992). Design and Analysis of Bioavailability and Bioequivalence Studies, Marcel Dekker, New York.
- Chow, S.C., Tse, S.K. (1990). Outlier detection in bioavailability/bioequivalence studies. Statist. in Medicine 9, 549-558.
- CPMP (1991). Committee for Proprietary Medicinal Products. 'Working Party on Efficacy of Medicinal Products. Note for Guidance: Investigations of Bioavailability and Bioequivalence'.
- Czado, C. and Munk, A. (1998). Assessing the similarity of distributions - Finite Sample Performance of the Empirical Mallows Distance, J. Statist. Comput. Simul., 60, 319-346.
- Devroye, L. (1986). Non-Uniform Random Variate Generation, Springer Verlag, New York.
- DiCiccio, T.J. and Efron, B. (1996). Bootstrap Confidence Intervals, Stat. Sc., 11, 189-228.
- Dobrushin, R.L. (1970). Describing a system of random variables by conditional distributions. Theor. Prob. Appl. 15, 458-86.
- EC-GCP (1993). Biostatistical methodology in clinical trials in applications for marketing authorization for medical products. CPMP Working Party on Efficacy of Medical Products, Commission of the European Communities, Brussels, Draft Guideline edition.
- Efron, B. and Tibshirani, R.J. (1993). An Introduction to the Bootstrap, Chapman & Hall, London.
- FDA (1993). Food and Drug Administration. Guidance on statistical procedures for bioequivalence studies using a standard two-treatment crossover design. Office of Generic Drugs, Rockville, MD.
- Guilbaud, O. (1993). Exact inferences about the within subject variability in $2 \times 2$ crossover trials. Journ. Americ. Statist. Assoc. 88, 939-946.
- Hall, P. (1988). Theoretical Comparison of Bootstrap Confidence Intervals. Ann. Stat., 16, 927-953.
- Hall, P. (1992). The Bootstrap and Edgeworth Expansion, Springer, New York.
- Hauck, W.W., Anderson, S. (1992). Types of bioequivalence and related statistical considerations. Int. Journ. Clinic. Pharmacol., Ther. Toxic. 30, 181-187.
- Hauck, W.W., Bois, F.Y., Hyslop, T., Gee, L., Anderson, S. (1997). A parametric approach to population bioequivalence. Statist. Medicine 16, 441-454.
- Holder, D.J. and Hsuan, F. (1993). Moment-based criteria for determining bioequivalence. Biometrika, 80, 835-46.
- Johnson, N.L., Kotz, S. and Balakrishnan, N. (1994). Continuous univariate distributions, Volume 1, second edition, John Wiley & Sons, New York.
- Munk, A. and Czado, C. (1998). Nonparametric validation of similar distributions and assessment of goodness of fit. Journ. Roy. Statist. Soc. B, 60, 223-241.
- Munk, A. and Czado, C. (1999). A completely nonparametric approach to population bioequivalence in crossover trials. Preprint.
- Munk, A. and Pflüger, R. (1999). $1 - \alpha$ confidence rules are $\alpha/2$-level tests for convex hypotheses with applications to the multivariate assessment of bioequivalence, Journ. Americ. Statist. Assoc., 94, 1311-1320.
- Wang, W. (1997). Optimal unbiased tests for equivalence in intrasubject variability. Journ. Americ. Statist. Assoc. 92, 1163-1170.
- Wasserstein, L.N. (1969). Markov processes with countable state space describing a large system of automata. Problems of Inform. Transm. 5, 47-52.
[Figure 4.1: nine panels, one for each $(\rho_1, \rho_2)$ in (0,0), (-.5,-.5), (-.8,-.8), (.5,-.5), (-.5,.5), (.8,-.8), (-.8,.8), (.5,.5), (.8,.8); x-axis: weighted true Mallows distance (0.0 to 0.25); y-axis: power (0.0 to 1.0).]

Figure 4.1: Observed power of bootstrapped tests for treatment effects allowing for period effects in the Unequal Means - Equal Variance Case with $\sigma = .5$ (solid line: PC, dotted line: BCA)
[Figure 4.2: nine panels, one for each $(\rho_1, \rho_2)$ in (0,0), (-.5,-.5), (-.8,-.8), (.5,-.5), (-.5,.5), (.8,-.8), (-.8,.8), (.5,.5), (.8,.8); x-axis: weighted true Mallows distance (0.0 to 0.25); y-axis: power (0.0 to 1.0).]

Figure 4.2: Observed power of bootstrapped tests for treatment effects allowing for period effects in the Unequal Variances - Equal Means Case with $\sigma = 2$ (solid line: PC, dotted line: BCA)
[Figure 4.3: marginal Gamma densities plotted against x (0.0 to 1.0) for a=4, lambda=.25*log(1.25); a=7, lambda=.25*log(1.25); a=10, lambda=.25*log(1.25); a=10, lambda=.1*log(1.25).]

Figure 4.3: Marginal Densities used in the Bivariate Gamma Simulation
[Figure 4.4: five panels, one for each $(\rho_1, \rho_2)$ in (0,0), (.3,.3), (.6,.6), (.6,0), (0,.6); x-axis: weighted true Mallows distance (0.0 to 0.25); y-axis: power (0.0 to 1.0).]

Figure 4.4: Observed power of the bootstrapped tests for treatment effects allowing for period effects when $a_{11} = 4$, $a_{21} = 4$, $\lambda = .25\log(1.25)$ (solid line: PC, dotted line: BCA)
[Figure 4.5: five panels, one for each $(\rho_1, \rho_2)$ in (0,0), (.3,.3), (.6,.6), (.6,0), (0,.6); x-axis: weighted true Mallows distance (0.0 to 0.25); y-axis: power (0.0 to 1.0).]

Figure 4.5: Observed power of the bootstrapped tests for treatment effects allowing for period effects when $a_{11} = 10$, $a_{21} = 10$, $\lambda = .25\log(1.25)$ (solid line: PC, dotted line: BCA)
[Figure 4.6: three panels, one for each $(\rho_1, \rho_2)$ in (0,0), (.3,.3), (.6,0); x-axis: weighted true Mallows distance (0.0 to 0.25); y-axis: power (0.0 to 1.0).]

Figure 4.6: Observed power of the bootstrapped tests for treatment effects allowing for period effects when $a_{11} = 10$, $a_{21} = 4$, $\lambda = .25\log(1.25)$ (solid line: PC, dotted line: BCA)