Confidence Intervals

Chapter 6: Confidence Intervals and Hypothesis Testing When analyzing data, we can’t just accept the sample mean or sample proportion as the official

ˆ (sample mean and sample mean or proportion. When we estimate the statistics x , p proportion), we get different answers due to variability. So we have to perform statistical inference: ‐ Confidence Interval: when you want to estimate a population parameter ‐ Significance Testing: when we want to assess the evidence provided by the data in favor of some claim about the population. Section 6.1: Confidence Intervals ‐allow us to estimate a range of values for the population mean or proportion. The true mean or proportion for the population exists and is a fixed number, but we don’t know it! Using sample statistics we get an estimate of where we expect the population parameter to be.

If we take a single sample, our single confidence interval “net” may or may not include the population parameter. However if we take many samples of the same size and create a confidence interval from each sample statistic, over the long run 95% of our confidence intervals will contain the true population parameter (if we are using a 95% confidence level).

1

If you increase your sample size (n), you decrease your margin of error

If you increase your confidence level (C), then you increase your margin of error

A smaller margin of error is good because we get a smaller range of where to expect the true population parameter. Confidence interval formulas look like estimate  margin of error. We write the intervals as (lower bound, upper bound).

2

Confidence Interval for a Population Mean, :

x  z*

x n

where z* is the value on the standard normal curve with are C between –z* and z*. z*

1.645

1.960

2.576

C

90%

95%

99%

(Table D in the back of the book contains more values, but these are the most common) Sample Size, n, for Desired Margin of Error, m:

 z * x  n   m  2

Note that it is the sample size, n, that influences the margin of error. The population size has nothing to do with it. Ways to reduce your margin of error: 1.) Increase sample size 2.) Use a lower level of confidence (smaller C) 3.) Reduce  x

x  z*

x

n under certain circumstances: Be careful!!!! You can only use the formula  Data must be an SRS from the population.  Do not use if the sampling is anything more complicated than an SRS.  Data must be collected correctly (no bias). The margin of error covers only random sampling errors. Undercoverage and nonresponse are not covered.  Outliers can have a big effect on the confidence interval. (This makes sense because we use the mean and standard deviation to get a CI.) 

You must know the standard deviation of the population,

 x .

3

EXAMPLE 1: A questionnaire of spending habits was given to a random sample of college students. Each student was asked to record and report the amount of money they spent on textbooks in a semester. The sample of 130 students resulted in an average of $422 with standard deviation of $57. a) Give a 90% confidence interval for the mean amount of money spent by college students on textbooks. b) Is it true that 90% of the students spent the amount of money found in the interval from part (a)? Explain your answer. c) What is the margin of error for the 90% confidence interval? d) How many students should you sample if you want a margin of error of $5 for a 90% confidence interval? 4

EXAMPLE 2: A sample of 12 STAT 301 students yields the following Exam 1 scores: 78 62 99 85 94 53 88 90 86 92 75 92 Assume that the population standard deviation is 10. The sample mean can be calculated using SPSS or calculator to be 82.83. (Note: Do NOT use any SPSS confidence intervals—they are good only for Chapter 7, not this type of CI. You must get these Z confidence intervals by hand.) a) Find the 90% confidence interval for the mean score  for STAT 301 students. b) Find the 95% confidence interval. c) Find the 99% confidence interval. d) How do the margins of error in (b), (c), and (d) change as the confidence level increases? Why?

5

Section 6.2: Hypothesis Testing The 4 steps common to all tests of significance: 1. State the null hypothesis H0 and the alternative hypothesis Ha. 2. Calculate the value of the test statistic. 3. Draw a picture of what Ha looks like, and find the P‐value. 4. State your conclusion about the data in a sentence, using the P‐value and/or comparing the P‐value to a significance level for your evidence. STEP 1: State the null hypothesis H0 and the alternative hypothesis Ha. To do a significance test, you need 2 hypotheses:  H0, Null Hypothesis: the statement being tested, usually phrased as “no effect” or “no difference”.  Ha, Alternative Hypothesis: the statement we hope or suspect is true instead of H0. Hypotheses always refer to some population or model. Not to a particular outcome. Hypotheses can be one‐sided or two‐sided.  One‐sided hypothesis: covers just part of the range for your parameter H0:  = 10 OR H0:  = 10 Ha:  10  Two‐sided hypothesis: covers the whole possible range for your parameter H0:  = 10 Ha:   10 Even though Ha is what we hope or believe to be true, our test gives evidence for or against H0 only. We never prove H0 true, we can only state whether we have enough evidence to reject H0 (which is evidence in favor of Ha, but not proof that Ha is true) or that we don’t have enough evidence to reject H0. 6

Example (Exercise 6.37, p. 418): Each of the following situations requires a significance test about a population mean . State the appropriate null hypothesis H0 and alternative hypothesis Ha in each case: a. Census Bureau data shows that the mean household income in the area served by a shopping mall is $72,500 per year. A market research firm questions shoppers at the mall to find out whether the mean household income of mall shoppers is higher than that of the general population. b. Last year, your company’s service technicians took an average of 1.8 hours to respond to trouble calls from business customers who had purchased service contracts. Do this year’s data show a different average response time?

STEP 2: Calculate the value of the test statistic. A test statistic measures compatibility between the H0 and the data. The formula for the test statistic will vary between different types of problems. In problems like those we studied in Chapter 6, the test statistic will be the Z‐score.

STEP 3: Draw a picture of what Ha looks like, and find the P‐value. P‐value: the probability, computed assuming that H0 is true, that the test statistic would take a value as extreme or more extreme than that actually observed due to random fluctuation. It is a measure of how unusual your sample results are.  The smaller the P‐value, the stronger the evidence against H0 provided by the data.  Calculate the P‐value by using the sampling distribution of the test statistic (only the normal distribution for Chapter 6).

STEP 4: Compare your P‐value to a significance level. State your conclusion about the data in a sentence.  Compare P‐value to a significance level, . 

If the P‐value  , we can reject H0.



If you can reject H0, your results are significant.



If you do not reject H0, your results are not significant.

7

Z‐Test for a Population Mean To test the hypothesis H0:  = 0 based on an SRS of size n from a population with unknown mean  and known standard deviation ,

Z0 

x  0 / n



compute the test statistic:



the P‐values for a test of H0 against:

Ha:  > 0 is P( Z  Z0)

Ha: 