LETTER TO THE EDITOR

Comments on ‘A boundary-optimized rejection region test for the two-sample binomial problem’

From: Antonio Martín Andrés
Bioestadística, Facultad de Medicina, Universidad de Granada, 18071 Granada, Spain
Email: [email protected]

Testing the equality of two independent proportions (θC and θT) is a classic problem in statistics which, though apparently simple to resolve, presents numerous conceptual and computational complications; in fact, hundreds of articles have been devoted to this test over the past 50 years. Gabriel et al.1 make a commendable effort to obtain an exact test that improves on the performance of the best classic tests, including Barnard's until now well-nigh unassailable CSM test,2 with the aim of contrasting H0: θC ≤ θT vs. HA: θC > θT. Unfortunately, the authors do not take into account that the ORRT (Optimized Rejection Region Test) they propose does not verify certain properties of inferential coherence, which means that the test is not valid. In the following, these problems are set out, together with other questions which, in my opinion, are relevant. The notation used by the authors is also utilized here for this purpose.

The first problem with the ORRT is that there is no guarantee that it verifies one of the two convexity properties of Barnard2 (the “monotonicity properties” of Ripamonti et al.3). If (YC = c, YT = t) is a point in the pure rejection region RR(α) defined to an error α, then Barnard2 justified the need for all the points (YC = c, YT ≤ t) and (YC ≥ c, YT = t) to belong to RR(α) as well. The ORRT (by its definition) verifies the first condition, but there is no guarantee that it will also verify the second. The non-verification of Barnard's convexity properties has two negative effects. On the one hand, it means that the test is inferentially incoherent.2 On the other, it means that the maximization that determines the size of the RR must be calculated4 in θC ≤ θT, not in θC = θT = θ as Gabriel et al. did; for this reason, their results could be incorrect. The Berger and Boos5 test (UET SD BB) cited by the authors has the same problems.6

The second and greater problem with the ORRT is that it does not guarantee that if the point (yC, yT) belongs to the RR, then the point (NT − yT, NC − yC) belongs to it as well: contrasting H0: θC ≤ θT vs. HA: θC > θT is equivalent to contrasting H0: 1 − θT ≤ 1 − θC vs. HA: 1 − θT > 1 − θC. The two points should lead to the same conclusion, because they refer to different presentations of the same problem; this constitutes the property of “symmetry in the same tail”.7 But if the authors decide to break the ties arbitrarily (as in their example NC = NT = N = 4, in which the RR includes the point (4, 1) but not the point (3, 0)), this can also be done with the other unconditional exact tests with which they compare their ORRT. If one proceeds in this fashion, the differences in power between one set of methods and the other would not be the same.
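For illustration only, the following minimal sketch in Python (my own; the function names and the toy region are assumptions, not code from Gabriel et al.1 or Barnard2) checks the two coherence properties just discussed for a candidate RR given as a set of points (yC, yT), and computes the size of a possibly non-convex RR by maximizing over the whole null region θC ≤ θT rather than only over θC = θT = θ; the symmetry check assumes NC = NT for simplicity.

```python
# Minimal illustrative sketch (not code from Gabriel et al. or Barnard):
# coherence checks for a candidate rejection region given as a set of
# points (yC, yT).
from math import comb

def binom_pmf(y, n, p):
    return comb(n, y) * p ** y * (1 - p) ** (n - y)

def satisfies_convexity(rr, n_c):
    """Barnard's convexity: if (c, t) is rejected, then every point
    (YC >= c, YT = t) and (YC = c, YT <= t) must be rejected too, since
    such points favour HA: thetaC > thetaT at least as strongly."""
    return all((c2, t) in rr for (c, t) in rr for c2 in range(c, n_c + 1)) and \
           all((c, t2) in rr for (c, t) in rr for t2 in range(t + 1))

def satisfies_symmetry(rr, n):
    """'Symmetry in the same tail' when NC = NT = n: (yC, yT) is rejected iff
    (n - yT, n - yC) is, because the two points are different presentations
    of the same data (interchange success/failure and the roles of the arms)."""
    return all((n - t, n - c) in rr for (c, t) in rr)

def size_over_null(rr, n_c, n_t, grid=100):
    """Size obtained by maximizing the rejection probability over the whole
    null region thetaC <= thetaT (grid search); for a non-convex region this
    need not be attained on the boundary thetaC = thetaT = theta."""
    thetas = [i / grid for i in range(grid + 1)]
    return max(
        sum(binom_pmf(c, n_c, tc) * binom_pmf(t, n_t, tt) for (c, t) in rr)
        for tc in thetas for tt in thetas if tc <= tt
    )

# Toy region mimicking the NC = NT = 4 situation mentioned above:
# (4, 1) is rejected but (3, 0) is not.
rr = {(4, 0), (4, 1)}
print(satisfies_convexity(rr, 4))          # True: every point implied by convexity is present
print(satisfies_symmetry(rr, 4))           # False: the mirror (3, 0) of (4, 1) is missing
print(round(size_over_null(rr, 4, 4), 4))  # exact size of this toy region
```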
A second aspect refers to Gabriel et al.'s bibliographic references, one of which is incorrect and another of which is omitted. The authors refer to the “one-sided unconditioned exact test with test statistic equal to the score statistic on the difference”, indicating that it originates with Farrington and Manning8; but the method is the work of Garside and Mack9. Similarly, they refer to the “one-sided Unconditioned Exact Test ordered by a one-sided Fisher's exact test mid P value” without furnishing any reference; the procedure is the work of Martín Andrés et al.10

Finally, one should point out that Barnard's CSM method is difficult to improve on because, at each step, it enters into the RR the point which least increases the test size obtained in the previous step. For this reason, the CSM test is the “generally more powerful”10, 11 one, but it is also the most computationally intensive (because, on too many occasions, it has to determine the maximum). An alternative that is almost as good as the CSM test, but computationally less intensive, is the CSM' test, whose ordering is based on an estimation of the maximum.11 Slightly inferior, but requiring even less computation, is the unconditional test based on the ordering “Fisher exact test mid P value”.10 These three tests may be executed with the free program SMP.EXE, available since 1994 at http://www.ugr.es/local/bioest.
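To make the greedy construction just described concrete, here is a highly simplified sketch (again my own illustrative Python, assuming a grid search over the common nuisance parameter; it is not Barnard's published CSM algorithm, whose symmetry step and exact maximization are omitted, and all names are assumptions): at each step it adds, among the points that keep the region convex, the one that least increases the maximized size, stopping before the size would exceed α.

```python
# Simplified greedy sketch of the idea behind CSM (illustrative only).
from itertools import product
from math import comb

def binom_pmf(y, n, p):
    return comb(n, y) * p ** y * (1 - p) ** (n - y)

def size(rr, n_c, n_t, grid=200):
    # Maximized rejection probability under thetaC = thetaT = theta (grid search);
    # using only this boundary is justified here because the region is kept convex.
    return max(
        sum(binom_pmf(c, n_c, th) * binom_pmf(t, n_t, th) for c, t in rr)
        for th in (i / grid for i in range(grid + 1))
    )

def keeps_convex(rr, c, t, n_c):
    # (c, t) may enter the region only if every point it implies by convexity,
    # i.e. (c' > c, t) and (c, t' < t), is already in the region.
    return all((c2, t) in rr for c2 in range(c + 1, n_c + 1)) and \
           all((c, t2) in rr for t2 in range(t))

def greedy_region(n_c, n_t, alpha):
    rr = set()
    pool = set(product(range(n_c + 1), range(n_t + 1)))
    while True:
        admissible = [pt for pt in pool if keeps_convex(rr, pt[0], pt[1], n_c)]
        if not admissible:
            return rr
        best = min(admissible, key=lambda pt: size(rr | {pt}, n_c, n_t))
        if size(rr | {best}, n_c, n_t) > alpha:
            return rr
        rr.add(best)
        pool.remove(best)

# Toy run: one-sided test of H0: thetaC <= thetaT at alpha = 0.025 with NC = NT = 4.
print(sorted(greedy_region(4, 4, 0.025)))
```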

ACKNOWLEDGEMENTS This research was supported by the Spanish Ministry of Economy, Industry and Competitiveness under grant number MTM2016-76938-P (co-financed by funding from FEDER).

REFERENCES

1. Gabriel EE, Nason M, Fay MP, Follmann DA. A boundary-optimized rejection region test for the two-sample binomial problem. Statistics in Medicine 2017;1–12. https://doi.org/10.1002/sim.7579.
2. Barnard GA. Significance tests for 2×2 tables. Biometrika 1947;34:123–138.
3. Ripamonti E, Lloyd C, Quatto P. Contemporary frequentist views of the 2×2 binomial trial. Statistical Science 2017;32(4):600–615.
4. Röhmel J, Mansmann U. Unconditioned non-asymptotic one-sided tests for independent binomial proportions when the interest lies in showing non-inferiority and/or superiority. Biometrical Journal 1999;41(2):149–170.
5. Berger RL, Boos DD. P values maximized over a confidence set for the nuisance parameter. Journal of the American Statistical Association 1994;89(427):1012–1016.
6. Röhmel J, Mansmann U. Exact tests of equivalence and efficacy with a non-zero lower bound for comparative studies by I.S.F. Chan (Letters to the Editor). Statistics in Medicine 1999;18:1734–1737.
7. Silva Mato A, Martín Andrés A. Simplifying the calculation of the P-value for Barnard's test and its derivatives. Statistics and Computing 1997;7:137–143.
8. Farrington CP, Manning G. Test statistics and sample size formula for comparative binomial trials with null hypothesis of non-zero risk difference or non-unity relative risk. Statistics in Medicine 1990;9(12):1447–1454.
9. Garside GR, Mack C. Correct confidence limits for the 2×2 homogeneity contingency table with small frequencies. The New Journal of Statistics and Operational Research 1967;3(2):1–25.
10. Martín Andrés A, Sánchez Quevedo MJ, Silva Mato A. Fisher's mid-p-value arrangement in 2×2 comparative trials. Computational Statistics and Data Analysis 1998;29(1):107–115.
11. Martín Andrés A, Silva Mato A. Choosing the optimal unconditioned test for comparing two independent proportions. Computational Statistics and Data Analysis 1994;17:555–574.
