Journal of Mathematics and System Science 3 (2013) 246-250
David Publishing
Use of REGWQ Multiple Comparisons of Qualitative Data

Siraj O. Omer

Agricultural Research Corporation (ARC), Wad Medani 22212, Sudan

Received: February 18, 2013 / Accepted: March 10, 2013 / Published: May 25, 2013

Abstract: The REGWQ (Ryan-Einot-Gabriel-Welsch Q) test procedure allows us to compare a large number of means while controlling the probability of making at least one Type I error, i.e., the family-wise error rate. The purpose of this study was to apply the REGWQ multiple comparison test to qualitative data. Okra characterization data were submitted to ANOVA (P ≤ 0.05), with REGWQ used for multiple comparisons of the means. The results establish a summary strategy: follow a significant ANOVA F with the REGWQ test on multiple comparisons of means, so that a large number of entries/treatments is summarized into small groups when variances are heterogeneous. Cluster analysis should be especially useful for grouping qualitative treatments and could also be used in conjunction with REGWQ multiple comparison procedures. Further development of this study will address REGWQ multiple comparison procedures in SAS as an option for distributing a large number of treatments into small groups and summarizing the best choice of treatments.

Key words: REGWQ, multiple comparisons, qualitative data.
1. Introduction

MCMs (multiple comparison methods) are used to investigate differences between pairs of population means or, more generally, between subsets of population means using sample data [8]. MCMs are designed to take into account and control the inflation of the overall probability of Type I error or the deflation of the overall confidence coefficient [2]. Howell recommends REGWQ over Tukey and Newman-Keuls because it appears to be the most powerful test generally available that still keeps the family-wise error rate at 0.05 [9]. This paper evaluates the use of REGWQ multiple comparisons of qualitative data and presents aspects that have emerged from the plant genetics data and its workgroups. Multiple comparison tests are therefore needed, and they should be designed to allow testing differences between all possible pairs of means using an appropriate and controlled level of significance (WNIK, 2004). A comparison-wise error rate is relevant whenever sets of statements are considered independently. This is a fundamental problem of practical importance [3]. When multiple hypothesis tests are performed in a study, there is a risk of inflation of the Type I error rate (i.e., the chance of falsely claiming an association when there is none) [10].
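The inflation of the Type I error rate mentioned above can be illustrated with a short sketch. It assumes that all pairwise comparisons among the means are tested at the same level alpha and that the tests are independent, which is a simplifying assumption and not a claim made in this paper; the function name `familywise_error_rate` is hypothetical.

```python
from math import comb


def familywise_error_rate(n_means: int, alpha: float = 0.05) -> float:
    """Approximate family-wise error rate for all pairwise comparisons.

    Assumes every pair among n_means is tested at level alpha and that
    the tests are independent (an illustrative simplification).
    """
    n_comparisons = comb(n_means, 2)  # number of distinct pairs of means
    return 1.0 - (1.0 - alpha) ** n_comparisons


if __name__ == "__main__":
    # The rate grows quickly with the number of means being compared.
    for n_means in (3, 5, 10):
        rate = familywise_error_rate(n_means)
        print(f"{n_means} means, {comb(n_means, 2)} comparisons: {rate:.3f}")
```

Under these assumptions, even three means already push the chance of at least one false rejection well above the nominal 0.05, which is the motivation for procedures that control the family-wise rate.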
2. REGWQ Test

The Ryan-Einot-Gabriel-Welsch procedure is based on the q statistic but adjusts the per-comparison α in such a way that the family-wise error rate is maintained at the specified value (unlike with the SNK), while power is greater than with the Tukey test; it is a compromise between Newman-Keuls and Tukey (Howell, 2002). This procedure controls the MEER (maximum experiment-wise error rate), but these methods are not as well known as those of Duncan and SNK. The approach developed by REGW (Ryan, Einot and Gabriel, and Welsch) sets:

$$\gamma_p = 1 - (1 - \alpha)^{p/t} \quad \text{for } p < t - 1, \qquad \gamma_p = \alpha \quad \text{for } p \ge t - 1,$$

where t is the number of treatment means and p is the number of means spanned by the range being tested.
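As a minimal sketch of the adjusted levels defined above, the following computes the level for each range size p given t means. The function name `regwq_gamma` and the choice t = 5 are illustrative assumptions, not part of the paper.

```python
def regwq_gamma(p: int, t: int, alpha: float = 0.05) -> float:
    """Adjusted significance level for a range of p means out of t.

    Implements gamma_p = 1 - (1 - alpha)**(p / t) for p < t - 1
    and gamma_p = alpha for p >= t - 1.
    """
    if not 2 <= p <= t:
        raise ValueError("p must satisfy 2 <= p <= t")
    if p < t - 1:
        return 1.0 - (1.0 - alpha) ** (p / t)
    return alpha


# Hypothetical example: t = 5 treatment means.
t = 5
levels = {p: regwq_gamma(p, t) for p in range(2, t + 1)}

if __name__ == "__main__":
    for p, gamma in levels.items():
        print(f"p = {p}: gamma_p = {gamma:.4f}")
```

Smaller ranges are tested at a stricter level than alpha, and only the two largest range sizes use alpha itself, which is where the "fractional alpha" of the procedure comes from.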
The REGWQ method performs the comparisons using a range test. This method appears to be among the most powerful step-down multiple range tests and is recommended by SAS for equal replication. Assuming the sample means have been arranged in descending order from $\bar{Y}_1$ through $\bar{Y}_k$ (with k = t, the number of treatment means), the homogeneity of means $\bar{Y}_i, \ldots, \bar{Y}_j$, with $i < j$, is rejected by REGWQ if:

$$\bar{Y}_i - \bar{Y}_j \ge q(\gamma_p; p, \nu)\sqrt{\mathrm{MSE}/n},$$

where $\gamma_p$ is the adjusted significance level for a range of $p = j - i + 1$ means, $\nu$ represents the d.f. (degrees of freedom) of the MSE (mean square error), and n is the common number of observations per mean. The R-E-G-W Q test thus uses a new significance level, based on the number of steps between means. The REGWQ test is a modification of the SNK that maintains the experiment-wise error rate at 0.05 or less. REGWQ was developed by Ryan (1959) and modified by Einot & Gabriel (1975) and then again by Welsch (1977). Its critical value is based on the studentized range, or Q, distribution. Some authors refer to this procedure as the modified Ryan test. In general, the REGWQ procedure results in computing a critical q value with a fractional alpha value. When variances are equal and cell sizes are equal, simulations have shown that the REGWQ procedure keeps $\alpha_{EW} \le 0.05$ and is more powerful than Tukey's HSD; however, we lack the tools to compute this test by hand and check SPSS's calculations [8].

Corresponding author: Siraj O. Omer, MSc, research fields: biometrician, experimental design and analysis unit. E-mail: [email protected].

3. Justification of the Study

REGWQ procedures have recently been applied in many different fields, such as the biosciences, medicine and genetics, for evaluating and controlling the error rate. Researchers in agricultural research are often confronted with the task of evaluating the mean differences of large numbers of entries/varieties; with respect to selecting the appropriate error rate, it is important to adopt the family-wise error rate. In most situations, such as with qualitative data, the error rate still represents a reasonable approximation to, or can be approximated by, the FW (family-wise error rate). The family-wise error rate is also referred to as the experiment-wise error rate: the probability of making a Type I error for the set of all possible comparisons.

3.1 Methodology and Statistical Analysis

The methodology used in this study was based on secondary data on Okra qualitative traits of the vegetative stage and inflorescence traits (which include general aspect, branching, stem colour, number of ridges per fruit and fruit pubescence). This specification means that the analysis must be conducted by comparing many qualitative traits under a homogeneity study with different thicknesses of Okra properties. Initially, standardized statistical packages such as SAS were found most appropriate for comparing a large number of treatments. Data were subjected to descriptive statistics such as frequencies and means, to the REGWQ multiple range test using SAS software, and to cluster analysis of variance using SPSS software. The data described here, following the cluster analysis of variance, were analysed using SAS [7]. This paper is concerned with the analysis of qualitative data measured in experiments in the form of rankings and ratings; the multiple comparisons of means therefore serve as a summary of the data types, in order to introduce a statistical test of significance for qualitative data associated with multiple comparisons. The study suggests using the mean square error instead of the experimental error, owing to the lack of replication in the experiment [8]. This is an alternative option for obtaining simultaneous tests with greater power using multiple-stage tests, namely the REGWQ multiple test, which is available in many software packages (SAS/STAT, 1990).
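The role of the mean square error in the REGWQ range criterion of Section 2 can be sketched as follows. This is an illustration under stated assumptions and not the SAS implementation: equal replication n is assumed, the critical range is taken as q times sqrt(MSE / n), the names `regwq_gamma` and `regwq_rejects_range` are hypothetical, and the studentized-range critical value is passed in as a caller-supplied function `q_crit` because the standard library cannot compute it. The check covers a single range only; a full step-down procedure would also require that every enclosing range had been rejected.

```python
from math import sqrt
from typing import Callable, Sequence


def regwq_gamma(p: int, t: int, alpha: float = 0.05) -> float:
    """Adjusted level: 1 - (1 - alpha)**(p / t) if p < t - 1, else alpha."""
    return 1.0 - (1.0 - alpha) ** (p / t) if p < t - 1 else alpha


def regwq_rejects_range(
    means_desc: Sequence[float],
    i: int,
    j: int,
    mse: float,
    n: int,
    df: int,
    q_crit: Callable[[float, int, int], float],
    alpha: float = 0.05,
) -> bool:
    """Check the range criterion for means_desc[i..j] (0-based, i < j).

    means_desc must be sorted in descending order. q_crit(gamma, p, df)
    must return the upper-gamma critical value of the studentized range
    for p means and df error degrees of freedom (supplied by the caller).
    """
    if list(means_desc) != sorted(means_desc, reverse=True):
        raise ValueError("means must be sorted in descending order")
    if not 0 <= i < j < len(means_desc):
        raise ValueError("need 0 <= i < j < number of means")
    t = len(means_desc)
    p = j - i + 1  # number of means spanned by the range
    gamma = regwq_gamma(p, t, alpha)
    critical_range = q_crit(gamma, p, df) * sqrt(mse / n)
    return means_desc[i] - means_desc[j] >= critical_range


if __name__ == "__main__":
    # Placeholder critical value, NOT a real studentized-range quantile.
    placeholder_q = lambda gamma, p, df: 3.5
    # Hypothetical means, MSE, n and df chosen only for illustration.
    print(regwq_rejects_range([7.0, 5.0, 3.0], 0, 2, 0.5, 10, 27, placeholder_q))
```

In practice `q_crit` would come from statistical software; for instance, recent SciPy versions provide a studentized range distribution whose quantile function could be wrapped for this purpose, though that choice is an assumption here and is not used in the paper.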
4. Results and Discussion

Table 1 shows the frequency of vegetative traits. For general aspect, the number of accessions/entries with the erect trait was 153, with the medium trait 36, and with erect plus medium 17. The branching trait is distributed as 153, 42 and 11 for medium, strong, and medium with strong, respectively. For stem colour, 144 entries were green, 15 purple and 47 green with red patches. In Table 2, the letters a, b and c denote 3 homogeneous groups, which can be presented in the order A > B > C using the maximum of the genotype/trait means in the group generated by the letters.

Table 1 The frequency of vegetative traits.

General aspect    Freq    Branching        Freq    Stem colour               Freq
Erect             153     Medium           153     Green                     144
Medium            36      Strong           42      Green with red patches    47
Erect + Medium    17      Strong medium    11      Purple                    15

Table 2 REGWQ multiple comparisons of the genotypes using SAS (alpha = 0.05 level) of vegetative traits.

General aspect      Branching           Stem colour
NT      Means       NT      Means       NT      Means
17      4b          42      7a          15      3a
36      5a          153     5b          47      2b
153     3c          11      3c          144     1c

NT: Number of accessions/entries (grouping of accessions/entries).

According to the multiple comparisons using REGWQ, general aspect was clustered into 3 treatment groups of 36, 17 and 153 entries with means 5, 4 and 3. The branching trait has 3 treatment groups of 42, 153 and 11 entries with means 7, 5 and 3, respectively. Stem colour was clustered into 3 groups of accessions/entries of 15, 47 and 144 with means 3, 2 and 1, respectively. Table 3 shows the frequency of inflorescence traits: the number of ridges per fruit ranged from 5 to 7 in 191 entries and from 8 to 10 in 15 entries. The fruit pubescence trait is distributed as 102, 60 and 44 for downy, prickly, and prickly with downy, respectively.

Table 3 The frequency of inflorescence traits.

Number of ridges per fruit    Freq    Fruit pubescence      Freq
From 5 to 7                   191     Downy                 102
From 8 to 10                  15      Prickly               60
                                      Prickly with downy    44

Table 4 REGWQ multiple comparisons of the genotypes using SAS (alpha = 0.05 level) of inflorescence traits.

Number of ridges per fruit    Fruit pubescence
NT      Means                 NT      Means
15      5a                    44      7a
191     3b                    60      5b
                              102     3c

NT: Number of accessions/entries (grouping of accessions/entries).

Table 4 shows the results of the multiple comparisons using REGWQ: the number of ridges per fruit was clustered into 2 groups of accessions/entries of 15 and 191 with means 5 and 3. Fruit pubescence has 3 groups of accessions/entries of 44, 60 and 102 with means 7, 5 and 3, respectively. Table 5 shows the results of the analysis of variance, with the mean squares for cluster and error. The results highlight significant differences for general aspect, branching and fruit pubescence, while stem colour and number of ridges per fruit were not significant. The F tests should, however, be used only for descriptive purposes because the clusters have been chosen to maximize the differences among cases in different clusters. The observed significance levels are not corrected for this and thus cannot be interpreted as tests of the hypothesis that the cluster means are equal. The study will develop the use of
MSE of K-means with REGWQ for qualitative data. Based on the error rate for each individual comparison set at alpha = 0.05, the 206 treatments showed 4 different groups, denoted by the letters A, B, C and D. These groups are not necessarily mutually exclusive, but means within each group (i.e., entries with the same letter) are not significantly different at the above error rate. The treatment groups were significantly different from each other. The mean square error values were taken from the analysis of variance results of K-means, one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The use of descriptive statistics such as frequencies in this study helps in understanding the implications of the treatments for grouping and their distribution. In the REGWQ method a preliminary significant F-test is necessary. The results of this study suggest integrating some properties of cluster analysis, such as the mean square error, with the REGWQ test for multiple comparisons of means in small groups, where the Type I error rate is substantially inflated when variances are heterogeneous.

Table 5 Results of analysis of variance.

                              Mean square
Trait                         Cluster     Error     Level of significance
General aspect                49.829      0.356     0.000
Branching                     71.328      0.323     0.000
Stem colour                   0.024       0.438     0.816
Number of ridges per fruit    0.129       0.068     0.169
Fruit pubescence              512.209     0.498     0.000

5. Conclusions

The paper explored the use of the REGWQ test for qualitative data on the Okra crop. REGWQ multiple comparisons should be especially useful for a large number of treatments and could also be used in conjunction with cluster analysis for grouping qualitative treatment means. Further development of this study will address REGWQ multiple comparison procedures in SAS as an option for distributing a large number of treatments into small groups and summarizing the best choice of treatments. The use of REGWQ for multiple comparisons of qualitative data typically produces overlapping groups of means. Multiple comparisons of qualitative data need new approaches in computer programs to calculate exact studentized range values, since it is not possible to compute this test by hand. It is the author's impression, based on his master's thesis on multiple comparisons and on his experience, that researchers conducting designed experiments generally use the LSD and DMRT tests whether or not they meet the objectives of the experiment. This suggests developing multiple comparisons for qualitative data using nonparametric methods.

Acknowledgements

I am very grateful to Dr. Eltahir Ibrahim Mohamed, the head of Plant Genetics Resource Unit, ARC (Agricultural Research Corporation), Wad Medani, Sudan, for providing the data used in the illustration.

References

[1] J.A. Rafter, M.L. Abell, J.P. Braselton, Multiple comparison methods for means, SIAM Review, Society for Industrial and Applied Mathematics 44 (2) (2002) 259-278.
[2] Y. Benjamini, H. Braun, John W. Tukey's contributions to multiple comparisons, The Annals of Statistics 30 (6) (2002) 1576-1594.
[3] C.V. Rao, U. Swarupch, Multiple comparison procedures: A note and a bibliography, Journal of Statistics 16 (2009) 66-109.
[4] S.K. Sarker, False discovery and false non-discovery rates in single-step multiple testing procedures, USA, 2003, 256-260.
[5] R. Stern, G. Arnold, Appropriate Procedure Comparison Means according Experiment Design, 2001, 65-70.
[6] R.A. Cribbie, H.J. Keselman, The effects of non-normality on parametric, nonparametric, and model comparison approaches to pairwise comparisons, Educational and Psychological Measurement 63 (4) (2003) 615-635.
[7] A.K. Jennifer, Strategies for Performing Multiple Comparisons on Means, Jennifer Aquino Kendall, SAS Institute Inc., Cary, NC [Online], http://www.sascommunity.org/sugi/SUGI93/Sugi-93-216%20Kendall.pdf.
[8] Siraj O.O, S. Murari, Appropriate Multiple Comparisons of Means in Statistical Analysis, SJAR Vol. 19 (2012), ARC. SJAR Email: [email protected].
[9] D.E. Hinkle, W. Wiersma, S.G. Jurs, Applied Statistics for the Behavioral Sciences, Houghton Mifflin Company, New York, 2003.
[10] H.O. Cheong, P.C. Sham, Multiple Testing and Power Calculations in Genetic Association Studies, adapted from Genetics of Complex Human Diseases, CSHL Press, Cold Spring Harbor, NY, USA.