BS128 Biostatistics
Statistical Tools in Excel for Mac

Excel for Mac includes most of the same computational facilities that are available for Excel for Windows. In particular, all of the statistical functions identified in the document “Excel for BS128 Biostatistics” as being relevant to the statistical approaches introduced in the module are also available in the Mac version of the software. There are a few differences in how the various Excel menus are arranged in the Mac version, and in the way in which the information about functions is presented. These do not affect the functionality of the package, and have very little impact on the functions and calculations that you will need to perform to answer questions on the module worksheet and in the module exam.

One useful feature mentioned in the above document is the facility to “fix” cell addresses using the F4 key. The same facility is available in Excel for Mac, and can be accessed in two ways. The first is via the “Formulas” menu bar (within the workbook display), where it is labelled “Switch Reference”: one click fixes both row and column addresses, a second click fixes the row but not the column, a third click fixes the column but not the row, and a fourth click returns to the original relative address. The second is the “cmd-T” shortcut (pressing the “cmd” key and the “T” key together), with repeated presses of this key combination cycling through the same settings.

The “Pivot Table” facility can be accessed from the “Data” menu, and works in exactly the same way as the equivalent facility within Excel for Windows.

The only facilities from the Excel for Windows version that are not replicated in the Excel for Mac version are those that were accessed via the “Data Analysis Toolpak”, but we have sourced an alternative “add-on” package, StatPlus:mac, which does provide the equivalent facilities.
The version that we have made available on the iMacs in the new Life Sciences Computing Suite is a free version of the package containing a reduced set of facilities – but these include all of the facilities that were introduced in the module. Mac users can download their own copy from: http://www.analystsoft.com/en/ using the download link at the bottom left of this web-page. Further information about the package is also available from these web-pages.
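The “Switch Reference” (cmd-T) cycle described earlier can be illustrated with a short Python sketch. This is only a model of the behaviour as described in the text, not anything taken from Excel itself; the function name `switch_reference` and the restriction to single-cell references such as `B2` are assumptions made for the illustration.

```python
import re

# States as (column fixed, row fixed), in the order described in the text:
# relative -> both fixed -> row fixed only -> column fixed only -> relative.
_CYCLE = [(False, False), (True, True), (False, True), (True, False)]

# A single-cell reference such as B2, $B$2, B$2 or $B2 (assumed form).
_REF = re.compile(r"^(\$?)([A-Za-z]+)(\$?)(\d+)$")


def switch_reference(ref: str) -> str:
    """Return the next form of a cell reference in the four-step cycle."""
    match = _REF.match(ref)
    if match is None:
        raise ValueError(f"not a single-cell reference: {ref!r}")
    col_fixed = match.group(1) == "$"
    row_fixed = match.group(3) == "$"
    column, row = match.group(2), match.group(4)
    position = _CYCLE.index((col_fixed, row_fixed))
    next_col, next_row = _CYCLE[(position + 1) % len(_CYCLE)]
    return f"{'$' if next_col else ''}{column}{'$' if next_row else ''}{row}"


# Four "presses" starting from a relative reference.
ref = "B2"
for _ in range(4):
    ref = switch_reference(ref)
    print(ref)  # $B$2, then B$2, then $B2, then B2
```

A “$” in front of the column letter or row number is what keeps that part of the address unchanged when a formula is copied to other cells.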
StatPlus:mac facilities for BS128 Biostatistics

All of the relevant statistical tools for BS128 Biostatistics can be accessed via the “Statistics” menu. On the “Basic Statistics and Tables” sub-menu are facilities for “Descriptive Statistics”, “Comparing Means (T-Test)”, “F-Test for Variances”, “Linear Correlation (Pearson)” and “Histogram”. On the “Analysis of Variances (ANOVA)” sub-menu are facilities for “One-way ANOVA (simple)” and “Two-way ANOVA”, and on the “Regression” sub-menu are facilities for “Linear Regression”.
Each of these facilities matches the equivalent facilities available in the Data Analysis Toolpak in Excel for Windows. A brief description is given below:
Histograms
o All that needs to be provided here is the set of cell addresses (under “Continuous Variables”) containing the data to be summarised. Note that the default setting for all of the StatPlus facilities is for there to be a label in the first row of the set of cells selected. Clicking on the “cell selector” button at the right-hand end of each field will take you to Excel to select the cells. Returning to StatPlus will copy the cell addresses into the field.
o The “Bin Range” can also be specified, by selecting a set of cells giving the boundaries to be used to divide the observations into groups (represented by the different bars on the histogram).
o The optional “Frequency Variable” allows us to specify the number of observations represented by each row in the data set (i.e. where we have multiple observations of the same value).
o The optional “Layer (Break) Variable” enables the generation of separate frequency tables and histograms for different sub-groups (treatments), labelled by this variable.
o The output consists of a frequency table and a graphical representation. Remember to reduce the “Gap Interval” to 0% (via the Format options) so that the histogram bars are adjacent to each other.

Descriptive Statistics
o This facility provides a wide range of descriptive statistics, including the arithmetic mean, variance, standard deviation, standard error of the mean, median and quartiles, and upper and lower confidence limits based on sample data (using the appropriate t-distribution). The significance level (Alpha level) for the confidence limits can be set via the “Preferences” button.
o Summary statistics are provided for multiple columns if these are included in the selected cells.

Comparing Means (T-Test)
o Allows the calculation of three different forms of T-Test to compare sample means.
o Data are provided in two separate ranges for the two different variables (samples).
o The three different forms of test can be selected under “T-Test Type” at the bottom of the window. The options are: “Two-Sample T-Test Assuming Equal Variances (homoscedastic)”, the default approach, which should be preceded by an F-Test to check that the sample variances are not too different; “Two-Sample T-Test Assuming Unequal Variances (heteroscedastic)”, which is the version to use if the F-Test for the comparison of the sample variances leads to the rejection of the null hypothesis of equal variances; and “Paired Two-Sample T-Test”, which is appropriate if the data come from paired observations, one observation for each of the two samples or “treatments”.
o The other options to set before doing the test are the significance level (labelled as “Alpha (reliability level)”), which will usually be set to 5%, and the “Hypothesized Mean Difference”, which will be zero if the null hypothesis is that the means are equal.
o The output provides various summaries, including the test statistic, degrees of freedom, significance level (P-level) and critical value (from the appropriate t-distribution). These latter two summaries are provided for both two-tailed and one-tailed tests, so consider carefully your alternative hypothesis before interpreting the results of the test.

F-Test for Variances
o The only required inputs are the two sets of cell addresses containing the data for the two samples.
o The significance level can be set via the “Preferences” button.

Linear Correlation (Pearson)
o The only required input is the set of cell addresses containing the data. Each column within the selected cells is considered to be a separate variable. Column labels can again be included (and are by default).
o The output includes a test of whether the correlation coefficient is different from zero (the null hypothesis is that the correlation coefficient is equal to zero; the two-sided alternative is that it is different from zero).

One-way ANOVA (simple)
o As for the equivalent tool in the Data Analysis Toolpak, the data need to be presented in columns, one for each treatment, ideally with a treatment label in the first row of each column.
o The output produced is almost identical to that from the Data Analysis Toolpak version, including summary statistics for each treatment (group) and an Analysis of Variance table summarising the variability “Between Groups” and “Within Groups”, presenting an F-test for the ratio of these two variances and a p-level and critical value for the test statistic. Remember that the “Within Groups MS” (mean square) is our estimate of the underlying variance, s², that we can use to construct SEMs and SEDs, to produce confidence intervals for means and differences between means, or to test for significant differences between pairs of means, in interpreting the results of the analysis.
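To make the One-way ANOVA quantities concrete, the following Python sketch computes the “Between Groups” and “Within Groups” mean squares, the variance ratio, and an SEM and SED based on the Within Groups MS. The data values and treatment labels are hypothetical, chosen only so that the arithmetic is easy to follow, and the SEM and SED formulas, sqrt(s²/n) and sqrt(2s²/n) for n observations per treatment, are the usual equal-replication forms and are assumed here to match the lecture notes. The p-level and critical value are not computed, since these need the F-distribution.

```python
from math import sqrt
from statistics import mean

# Hypothetical data: one list per treatment (one column each in Excel).
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 8.0, 9.0],
    "C": [10.0, 11.0, 12.0],
}


def one_way_anova(groups):
    """Return the main entries of a one-way analysis of variance table."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    # Between Groups: variation of the treatment means about the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Within Groups: variation of observations about their own treatment mean.
    ss_within = sum(
        (x - mean(values)) ** 2 for values in groups.values() for x in values
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }


table = one_way_anova(groups)
n = 3  # observations per treatment in the hypothetical data
sem = sqrt(table["ms_within"] / n)      # standard error of a treatment mean
sed = sqrt(2 * table["ms_within"] / n)  # standard error of a difference
print(table)
print(f"SEM = {sem:.3f}, SED = {sed:.3f}")
```

For these hypothetical values the Within Groups MS is 1 and the Between Groups MS is 27, so the variance ratio is 27 on 2 and 6 degrees of freedom.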
Two-way ANOVA
o The data for this tool need to be presented in a different format from that used by the equivalent tool in the Data Analysis Toolpak. The response that we want to analyse should be presented in a single column. The values of the treatment factor then need to be presented in a second column. Note that these need to be specified using numeric values, not alphabetic labels. A third column should contain the values of the blocking variable, again using numeric values.
o An example of this layout is shown below for the Egg Production data introduced in Lecture 8, with treatment O (control) labelled as treatment 1, treatment E (extended) as treatment 2, and treatment F (flash lighting) as treatment 3.
Lighting effects on egg production

Block  Treatment  Eggs
1      1          330
1      2          372
1      3          359
2      1          288
2      2          340
2      3          337
3      1          295
3      2          343
3      3          373
4      1          313
4      2          341
4      3          302
o Under “Advanced Options” select “No interaction(s) (Randomized block design)”. The output will produce summaries for the observations for each treatment and each block (as well as information for each individual value!), and an analysis of variance table, showing the variability due to treatments (Factor #1), blocks (Factor #2) and Within Groups, plus variance ratios, p-levels and critical values.
o The output also gives pairwise comparisons between each pair of treatments (and blocks), using a number of so-called “multiple comparison tests”. Please ignore these for the purposes of BS128, and calculate SEDs as described in the lecture notes.

Linear Regression
o The only required inputs are the set of cells containing the values of the Dependent variable (the response variable, usually plotted on the vertical axis) and the set of cells containing the values of the Independent variable (the explanatory variable, usually plotted on the horizontal axis). As with the equivalent facility in the Data Analysis Toolpak, more than one explanatory variable can be specified, but this is beyond the material covered in BS128.
o The output is almost identical to that for the equivalent Data Analysis Toolpak facility, showing various summary statistics associated with the analysis, the summary analysis of variance table, the fitted parameters, and the residuals. The “Advanced Options” provide access to graphics to plot the residuals against the observed values (Residual Plots) and to show the fitted line (Line Fit Plots), and an option to constrain the intercept (Constant) to be zero. Of these, it is probably only the Line Fit Plot that is relevant for BS128.
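As a cross-check on the Two-way ANOVA tool, the randomized block analysis of the Egg Production data can be sketched in a few lines of Python using the same three-column layout (response, treatment, block). This is a hand calculation written for illustration, not the StatPlus implementation. The variable names are invented, the residual line is assumed to correspond to the “Within Groups” row in the StatPlus output, and the SED formula sqrt(2s²/n), with n the number of blocks, is assumed to match the lecture notes. P-levels and critical values are omitted because they need the F-distribution.

```python
from math import sqrt

# Egg Production data in the three-column layout described for StatPlus.
blocks     = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
treatments = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
eggs       = [330, 372, 359, 288, 340, 337, 295, 343, 373, 313, 341, 302]


def level_means(factor, response):
    """Mean response for each level of a factor."""
    means = {}
    for level in sorted(set(factor)):
        values = [y for f, y in zip(factor, response) if f == level]
        means[level] = sum(values) / len(values)
    return means


grand_mean = sum(eggs) / len(eggs)
treatment_means = level_means(treatments, eggs)
block_means = level_means(blocks, eggs)
n_treatments = len(treatment_means)  # 3
n_blocks = len(block_means)          # 4

# Sums of squares: each treatment mean is based on n_blocks observations,
# and each block mean on n_treatments observations.
ss_total = sum((y - grand_mean) ** 2 for y in eggs)
ss_treatments = n_blocks * sum(
    (m - grand_mean) ** 2 for m in treatment_means.values()
)
ss_blocks = n_treatments * sum(
    (m - grand_mean) ** 2 for m in block_means.values()
)
ss_residual = ss_total - ss_treatments - ss_blocks

df_treatments = n_treatments - 1
df_blocks = n_blocks - 1
df_residual = df_treatments * df_blocks

ms_treatments = ss_treatments / df_treatments
ms_blocks = ss_blocks / df_blocks
ms_residual = ss_residual / df_residual  # estimate of s²

f_treatments = ms_treatments / ms_residual
f_blocks = ms_blocks / ms_residual

# SED for comparing two treatment means (assumed formula, see lead-in).
sed = sqrt(2 * ms_residual / n_blocks)

print(f"Treatments: SS={ss_treatments:.2f} df={df_treatments} F={f_treatments:.3f}")
print(f"Blocks:     SS={ss_blocks:.2f} df={df_blocks} F={f_blocks:.3f}")
print(f"Residual:   SS={ss_residual:.2f} df={df_residual} MS={ms_residual:.3f}")
print(f"SED between treatment means = {sed:.2f}")
```

The sums of squares computed this way are 4212.5 for treatments, 2330.25 for blocks and 2321.5 for the residual, which can be compared with the corresponding rows of the analysis of variance table in the StatPlus output.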
NOTE: The outputs from each of the StatPlus facilities are generated in separate Excel workbooks, and there doesn’t appear to be any way of changing this. So, if you want to include these outputs alongside other calculations and the original data in a single workbook/worksheet, you will need to copy the relevant cell values from one workbook to the other.