Chapter 2: The Basic Concepts of Set Theory

Qualitative (Categorical) Data: Graphs and Tables

1. Construct and interpret a frequency distribution and a relative frequency distribution for qualitative data
2. Construct and interpret bar graphs and Pareto charts
3. Construct and interpret pie charts

Frequency Distribution

• A frequency, or count, of a category is the number of observations in that category.
• A frequency distribution for a qualitative variable is a listing of all categories that the variable can take, together with the frequency for each category.

Relative Frequency Distribution

• The relative frequency of a category is its frequency divided by the sample size.
• A relative frequency distribution for a qualitative variable is a listing of all categories that the variable can take, together with the relative frequency for each category.
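The two definitions above can be sketched in a few lines of Python. The list of continents below is a hypothetical sample made up for illustration; it is not the data from Table 2.8.

```python
from collections import Counter

# Hypothetical sample (NOT Table 2.8): continent recorded for 10 countries.
continents = ["Asia", "Europe", "Africa", "Asia", "Europe",
              "Asia", "Africa", "Europe", "Asia", "Oceania"]

frequency = Counter(continents)   # frequency: number of observations per category
n = len(continents)               # sample size

# Relative frequency: frequency divided by the sample size.
relative_frequency = {category: count / n for category, count in frequency.items()}

for category in sorted(frequency):
    print(f"{category:<8} {frequency[category]:>3} {relative_frequency[category]:.2f}")
```

The relative frequencies always add up to 1, which is a quick check on the arithmetic.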

Example:

Example: page 43, Table 2.8

Example: page 43, problem 6

Use table 2.8 to construct: 6(a) Frequency distribution 6(b) Relative frequency distribution for the variable continent

ANSWER: 6(a), 6(b)

Bar Graph

• A bar graph is used to represent frequencies or relative frequencies for categorical data.

Example: frequency bar graph

Bar Graph

• To construct a bar graph:
1. On the horizontal axis, put the label for each category.
2. Draw rectangles (bars) of equal width for each category. The height of each rectangle represents the frequency or relative frequency for that category. The bars should not touch each other.

Example: page 43, problem 6

6(c) Frequency bar graph 6(d) Relative frequency bar graph

ANSWER: page 43, problem 6(c)

ANSWER: page 43, problem 6(d)

Pareto Chart

• A Pareto chart is a bar graph in which the rectangles are presented in decreasing order from left to right.
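As a rough sketch, the ordering used by a Pareto chart amounts to sorting the categories by frequency, largest first. The counts below are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical category counts, for illustration only.
frequency = {"Africa": 2, "Asia": 4, "Europe": 3, "Oceania": 1}

# Pareto order: categories sorted by frequency, largest first.
pareto_order = sorted(frequency.items(), key=lambda item: item[1], reverse=True)

# Print a text version of the chart, one bar per line.
for category, count in pareto_order:
    print(f"{category:<8} {'#' * count} ({count})")
```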

Example: Pareto chart

Example: page 43, problem 6(e)

6(e) Pareto chart using relative frequencies

Pie Chart

• A pie chart is a circle divided into sections (slices or wedges), with each section representing a particular category. The size of the section is proportional to the relative frequency of the category.
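Because each slice is proportional to the relative frequency, its central angle can be taken as the relative frequency times 360 degrees. A minimal sketch with hypothetical counts:

```python
# Hypothetical category counts, for illustration only.
frequency = {"Africa": 2, "Asia": 4, "Europe": 3, "Oceania": 1}
total = sum(frequency.values())

# Central angle of each slice, in degrees: relative frequency * 360.
angles = {category: 360 * count / total for category, count in frequency.items()}

for category, angle in angles.items():
    print(f"{category:<8} {angle:6.1f} degrees")
```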

Example: pie chart

Example: page 43, problem 6(f)

6(f) Pie chart using relative frequencies

Quantitative Data: Graphs and Tables

1. Construct and interpret a frequency distribution and a relative frequency distribution for quantitative data
2. Use histograms and frequency polygons to summarize quantitative data
3. Recognize distribution shape, symmetry, and skewness

Frequency Distribution

• A frequency distribution for a quantitative variable separates the data into classes and counts the number of data values (the frequency) in each class.

Data: heights (inches) of 25 women

67  64  65  65  64
59  67  67  72  65
64  62  66  67  66
60  70  68  61  64
60  68  65  66  62

Same data, but sorted (least to greatest):

59  60  60  61  62
62  64  64  64  64
65  65  65  65  66
66  66  67  67  67
67  68  68  70  72

Quantitative Data: Graphs and Tables

1. Classes represent a range of data values and are used to group the values in a data set.
2. The frequency for a particular class is the number of original values that fall into that class.

Frequency Distribution

HEIGHT (classes)   FREQUENCY
59-60              3
61-62              3
63-64              4
65-66              7
67-68              6
69-70              1
71-72              1

Lower Class Limits are the smallest numbers that belong to the different classes. In the HEIGHT frequency distribution, the lower class limits are 59, 61, 63, 65, 67, 69, and 71.

Upper Class Limits are the largest numbers that belong to the different classes. In the HEIGHT frequency distribution, the upper class limits are 60, 62, 64, 66, 68, 70, and 72.

Class Width is the difference between two consecutive lower class limits or two consecutive upper class limits. In the HEIGHT frequency distribution, the class width is 2, since:

61 - 59 = 2

Class Boundaries are the numbers midway between the numbers that separate classes, that is, midway between the upper class limit of one class and the lower class limit of the next. For the HEIGHT frequency distribution, the class boundaries are:

58.5, 60.5, 62.5, 64.5, 66.5, 68.5, 70.5, 72.5

Calculate Class Boundaries

Boundary between classes 59-60 and 61-62: (60 + 61) / 2 = 60.5
Boundary between classes 61-62 and 63-64: (62 + 63) / 2 = 62.5

Class Midpoints are the numbers midway between the lower class limit and the upper class limit of each class. For the HEIGHT frequency distribution, the class midpoints are:

59.5, 61.5, 63.5, 65.5, 67.5, 69.5, 71.5

Calculate Class Midpoints

Midpoint of class 59-60: (59 + 60) / 2 = 59.5
Midpoint of class 61-62: (61 + 62) / 2 = 61.5
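The class width, class boundaries, and class midpoints for the HEIGHT frequency distribution can be checked with a short Python sketch. Extending the boundary list by one class width at each end (to get 58.5 and 72.5) is an assumption chosen to match the list of boundaries given in these notes.

```python
# Class limits (lower, upper) from the HEIGHT frequency distribution.
classes = [(59, 60), (61, 62), (63, 64), (65, 66), (67, 68), (69, 70), (71, 72)]

lower_limits = [lower for lower, upper in classes]
upper_limits = [upper for lower, upper in classes]

# Class width: difference between two consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Boundaries between adjacent classes: midway between one class's upper
# limit and the next class's lower limit.
inner = [(classes[i][1] + classes[i + 1][0]) / 2 for i in range(len(classes) - 1)]

# Assumption: the first and last boundaries lie one class width beyond the inner ones.
boundaries = [inner[0] - class_width] + inner + [inner[-1] + class_width]

# Midpoints: midway between the lower and upper limit of each class.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

print("width:", class_width)
print("boundaries:", boundaries)
print("midpoints:", midpoints)
```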

Constructing a frequency distribution by hand

1. Sort the data and determine the number of classes (should be between 5 and 20). Choose the number of classes large enough to show the variability in the data, but not so large that many classes are empty.
2. Calculate the class width using

   class width = (maximum data value - minimum data value) / number of classes

   and round up.
3. Starting point: Choose the minimum data value or a convenient value below it as the first lower class limit.
4. Using the first lower class limit and class width, proceed to list the other lower class limits.
5. List the lower class limits in a vertical column and proceed to enter the upper class limits.
6. Take each individual data value and put a tally mark in the appropriate class. Add the tally marks to get the frequency.
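The six steps above can be sketched as a small Python function. This is only an illustration under stated assumptions: the function name `frequency_distribution` and its parameters are hypothetical, and it assumes whole-number data, so each upper class limit is the next lower class limit minus 1.

```python
import math

def frequency_distribution(data, num_classes, start=None):
    """Return (lower limit, upper limit, frequency) tuples for whole-number data."""
    values = sorted(data)                                        # step 1: sort
    width = math.ceil((values[-1] - values[0]) / num_classes)    # step 2: width, rounded up
    first = values[0] if start is None else start                # step 3: starting point
    table = []
    for i in range(num_classes):
        lower = first + i * width                                # step 4: lower class limits
        upper = lower + width - 1                                # step 5: upper class limits
        count = sum(1 for v in values if lower <= v <= upper)    # step 6: tally
        table.append((lower, upper, count))
    return table

# Heights (inches) of 25 women, from the data set in these notes.
heights = [67, 64, 65, 65, 64, 59, 67, 67, 72, 65, 64, 62, 66,
           67, 66, 60, 70, 68, 61, 64, 60, 68, 65, 66, 62]

height_table = frequency_distribution(heights, num_classes=7)
for lower, upper, count in height_table:
    print(f"{lower}-{upper}: {count}")
```

With 7 classes the width is (72 - 59) / 7, about 1.86, which rounds up to 2 and reproduces the HEIGHT table. Note that `math.ceil` leaves an exact whole-number quotient unchanged, so in that case the last class may stop short of the maximum value and a larger width or an extra class would be needed.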

Data: amounts of Strontium-90 (in millibecquerels) in a simple random sample of baby teeth obtained from Philadelphia residents born after 1979. Note: this data is related to the Three Mile Island nuclear power plant accident in 1979.

Data can be sorted using the graphing calculator. The website below has useful graphing calculator tips: www.mathbits.com/MathBits/TISection/Openpage.htm

Entering Data: Data is stored in Lists on the calculator. Locate and press the STAT button on the calculator. Choose EDIT. The calculator will display the first three of six lists (columns) for entering data. Simply type your data and press ENTER. Use your arrow keys to move between lists.

To clear data from a list: Press STAT. From the EDIT menu, move the cursor up ONTO the name of the list (L1). Press CLEAR. Move the cursor down. NOTE: The list entries will not disappear until the cursor is moved down. (Avoid pressing DEL as it will delete the entire column. If this happens, you can reinstate the column by pressing STAT #5 SetUpEditor.)

You may also clear a list by choosing option #4 under the EDIT menu, ClrList. ClrList will appear on the home screen waiting for you to enter which list to clear. Enter the name of a list by pressing the 2nd button and the yellow L1 (above the 1).

Sorting Data

• Locate and press the STAT button. Choose option #2, SortA(.
• Specify the list you wish to sort by pressing 2nd STAT and entering the chosen list name.
• Press ENTER and the list will be put in ascending order (lowest to highest). SortD( will put the list in descending order.

Data has been sorted.

The range of the data is the maximum data value minus the minimum data value:

188 - 114 = 74

If we use 8 classes, the class width will be

74 / 8 = 9.25, which we will round up to 10.

List lower class limits (start with 110 and use 8 classes of width 10):

Strontium-90 Level   Frequency
110-
120-
130-
140-
150-
160-
170-
180-

List upper class limits:

Strontium-90 Level   Frequency
110-119
120-129
130-139
140-149
150-159
160-169
170-179
180-189

Count the number of data values in each class and enter each count in the frequency column:

Strontium-90 Level   Frequency
110-119              2
120-129              2
130-139              5
140-149              9
150-159              13
160-169              6
170-179              2
180-189              1

Relative Frequency Distribution

Includes the same class limits as a frequency distribution, but the frequency of each class is replaced with a relative frequency (a proportion) or a percentage frequency (a percent).

relative frequency = class frequency / sum of all frequencies (or sample size)


Sum of all frequencies is the total number of data values: 2+2+5+9+13+6+2+1=40

Relative Frequency Distribution

Strontium-90 Level   Relative Frequency
110-119              2/40 = 0.050
120-129              2/40 = 0.050
130-139              5/40 = 0.125
140-149              9/40 = 0.225
150-159              13/40 = 0.325
160-169              6/40 = 0.150
170-179              2/40 = 0.050
180-189              1/40 = 0.025

Note: the sum of the relative frequencies is 1.
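The same relative frequencies can be reproduced with a short Python sketch that divides each class frequency by the sum of all frequencies. The variable names are illustrative.

```python
# Class frequencies from the Strontium-90 frequency distribution.
frequencies = {
    "110-119": 2, "120-129": 2, "130-139": 5, "140-149": 9,
    "150-159": 13, "160-169": 6, "170-179": 2, "180-189": 1,
}

n = sum(frequencies.values())   # sum of all frequencies (sample size)

# relative frequency = class frequency / sum of all frequencies
relative = {label: count / n for label, count in frequencies.items()}

for label, value in relative.items():
    print(f"{label}  {value:.3f}")
print("sum:", round(sum(relative.values()), 3))
```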

A histogram is a graph of the frequency distribution.


Histogram

Relative Frequency Histogram

A histogram consists of bars of equal width drawn adjacent to each other (without gaps). The horizontal scale represents the classes of quantitative data values and the vertical scale represents the frequencies. The heights of the bars correspond to the frequency values.

Horizontal Scale for Histogram: Use class boundaries or class midpoints. Vertical Scale for Histogram: Use the class frequencies or relative frequencies.
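To see why the bars of a histogram touch, the bars can be described by their class boundaries. The sketch below assumes whole-number data, so each boundary is taken as a class limit plus or minus 0.5; the text bars are only a rough stand-in for a real plot.

```python
# (lower limit, upper limit, frequency) from the Strontium-90 frequency distribution.
classes = [(110, 119, 2), (120, 129, 2), (130, 139, 5), (140, 149, 9),
           (150, 159, 13), (160, 169, 6), (170, 179, 2), (180, 189, 1)]

# Assumption: whole-number data, so each bar runs from (lower - 0.5) to (upper + 0.5).
bars = [(lower - 0.5, upper + 0.5, count) for lower, upper, count in classes]

# Each bar's right edge equals the next bar's left edge, so there are no gaps.
for left, right, height in bars:
    print(f"{left:6.1f} to {right:6.1f} | {'#' * height}")
```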

Histogram with Graphing Calculator

• See textbook page 59.
• Press 2nd STATPLOT and choose #1 PLOT 1. Be sure the plot is ON, the histogram icon is highlighted, and that the list you will be using is indicated next to Xlist. Freq: 1 means that each piece of data will be counted one time.
• To see the histogram, press ZOOM and #9 ZoomStat. (ZoomStat automatically sets the window to an appropriate size to view all of the data.)
• Press the TRACE key to see on-screen data about the histogram. The trace cursor will jump from bar to bar, showing the range of values contained within each bar and the number of entries from the list (n) that fall within that range.
• To manually adjust the histogram: under your WINDOW button, the Xscl value controls the width of each bar, beginning with Xmin and ending with Xmax. (If you wish to see EACH piece of data as a separate interval, set Xscl to 1.) Then select GRAPH (not ZoomStat).
• NOTE: If you wish to adjust your own viewing window, (Xmax-Xmin)/Xscl must be less than or equal to 47 for the histogram to be seen in the viewing window.
• NOTE: Choosing ZoomStat automatically adjusts Xmin, Xmax, Ymin, Ymax, and Xscl.

Example: page 61

ANSWERS:

(a) 35 (b) 1 (c) 10; approximately 115 (d) we will answer this later

Example: page 61

ANSWERS:

(a) 81 inches; 61 inches (b) 70 inches (c) we will answer this later

Example: page 62. A portfolio contains stocks of 19 technology firms. The stock prices are shown in this histogram.

Example: page 62, problem 18

ANSWERS: (a) 2 (b) 0.1053 (c) 5 (d) 0.2632

Frequency Polygon: line graph representation of a histogram

Example: frequency polygon for the Strontium-90 frequency distribution (classes 110-119 through 180-189, with frequencies 2, 2, 5, 9, 13, 6, 2, 1).

Constructing a frequency polygon For each class, plot a point at the class midpoint with a height equal to the frequency for that class. Then join each consecutive pair of points with a line segment.
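As a rough sketch of that construction, the points of the frequency polygon for the Strontium-90 distribution can be computed as (class midpoint, frequency) pairs, and the line segments as consecutive pairs of points. The variable names are illustrative.

```python
# (lower limit, upper limit, frequency) from the Strontium-90 frequency distribution.
classes = [(110, 119, 2), (120, 129, 2), (130, 139, 5), (140, 149, 9),
           (150, 159, 13), (160, 169, 6), (170, 179, 2), (180, 189, 1)]

# One point per class: x = class midpoint, y = class frequency.
points = [((lower + upper) / 2, count) for lower, upper, count in classes]

# Join each consecutive pair of points with a line segment.
segments = list(zip(points, points[1:]))

for (x1, y1), (x2, y2) in segments:
    print(f"({x1}, {y1}) -> ({x2}, {y2})")
```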

Example: page 63

ANSWERS: (a) Approximately 72 (b) Approximately 2 (c) Approximately 18

Dot Plot

Each data value is plotted as a point (or dot) along a scale of values. Dots representing equal values are stacked.

Example: construct a dotplot from the following data using females:

From Elementary Statistics, Mario F. Triola


Stemplot (or Stem-and-Leaf Plot)

Represents quantitative data by separating each value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit).

From Elementary Statistics, Mario F. Triola

Construct a stemplot of the data:

Stem (100s/10s) | Leaf (units)
11 | 46
12 | 89
13 | 03678
14 | 022455579
15 | 0001111256688
16 | 133569
17 | 02
18 | 8
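A stemplot like this one can be built by splitting each value into its tens-and-hundreds part (the stem) and its units digit (the leaf). In the sketch below, the 40 data values are read back from the stems and leaves shown in these notes, since the raw data table itself is not reproduced here.

```python
from collections import defaultdict

# Values read back from the stemplot in these notes (40 values in total).
data = [114, 116, 128, 129, 130, 133, 136, 137, 138,
        140, 142, 142, 144, 145, 145, 145, 147, 149,
        150, 150, 150, 151, 151, 151, 151, 152, 155, 156, 156, 158, 158,
        161, 163, 163, 165, 166, 169, 170, 172, 188]

stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)   # e.g. 147 -> stem 14, leaf 7
    stems[stem].append(leaf)

for stem in sorted(stems):
    leaves = "".join(str(leaf) for leaf in stems[stem])
    print(f"{stem:>3} | {leaves}")
```

The number of leaves on each stem matches the class frequencies of the Strontium-90 frequency distribution (2, 2, 5, 9, 13, 6, 2, 1).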

Quantitative Data: Graphs and Tables

3. Recognize distribution shape, symmetry, and skewness

A distribution of a variable is a table, graph, or formula that identifies the variable values and frequencies for all elements in the data set.

Bell-Shaped Distribution

(1) Frequencies increase to a maximum and then decrease, and (2) the graph is symmetric, with the left half roughly a mirror image of the right half.

Figure 2.12 on page 55

Right-Skewed Distribution Right-hand tail is longer. Figure 2.14 on page 56

Left-Skewed Distribution Left-hand tail is longer. Figure 2.15 on page 56

Example: page 61, 10(d)

ANSWERS:

(d) Right skewed

Example: page 61

ANSWER:

(c) Approximately symmetrical

Further Graphs and Tables for Quantitative Data

1. Build cumulative frequency distributions and cumulative relative frequency distributions
2. Create frequency ogives and relative frequency ogives
3. Construct and interpret time series graphs

Cumulative Frequency Distribution

For a discrete variable, this shows the total number of observations less than or equal to the category value. For a continuous variable, this shows the total number of observations less than or equal to the upper class limit.

Cumulative Relative Frequency Distribution

For a discrete variable, this shows the proportion of observations less than or equal to the category value. For a continuous variable, this shows the proportion of observations less than or equal to the upper class limit.

Example: construct the cumulative frequency distribution using the HEIGHT frequency distribution:

HEIGHT   FREQUENCY   CUMULATIVE FREQUENCY
59-60    3           3
61-62    3           6   (3+3)
63-64    4           10  (3+3+4)
65-66    7           17  (3+3+4+7)
67-68    6           23
69-70    1           24
71-72    1           25

Cumulative Frequency Distribution

Makes it easy to answer questions such as: how many heights are 68 inches or less? ANSWER: 23
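The running totals can be reproduced with `itertools.accumulate`, and the question above then becomes a simple lookup by upper class limit. The helper `count_at_or_below` is a hypothetical name used only for this sketch.

```python
from itertools import accumulate

# HEIGHT frequency distribution: upper class limits and class frequencies.
upper_limits = [60, 62, 64, 66, 68, 70, 72]
frequencies = [3, 3, 4, 7, 6, 1, 1]

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))
by_upper_limit = dict(zip(upper_limits, cumulative))

def count_at_or_below(upper_limit):
    """Number of heights less than or equal to a given upper class limit."""
    return by_upper_limit[upper_limit]

print(cumulative)
print(count_at_or_below(68))
```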

Example: page 70, problem 9. Use the histogram from exercise 16 in 2.2.

ANSWERS:

Ogive (“oh-jive”)

A line graph that depicts cumulative frequencies. The x coordinates are the upper class limits and the y coordinates are the cumulative frequencies or cumulative relative frequencies.
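As an illustration, the points of a relative frequency ogive for the Strontium-90 distribution can be computed by pairing each upper class limit with the cumulative relative frequency up to that class. This is a sketch of the point calculation only, not of the plotting.

```python
from itertools import accumulate

# Strontium-90 frequency distribution: upper class limits and class frequencies.
upper_limits = [119, 129, 139, 149, 159, 169, 179, 189]
frequencies = [2, 2, 5, 9, 13, 6, 2, 1]

n = sum(frequencies)                              # sample size
cumulative = list(accumulate(frequencies))        # cumulative frequencies
cumulative_relative = [c / n for c in cumulative]

# Ogive points: x = upper class limit, y = cumulative relative frequency.
points = list(zip(upper_limits, cumulative_relative))

for x, y in points:
    print(f"({x}, {y:.3f})")
```

The last point always has a y coordinate of 1, because every observation is at or below the largest upper class limit.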

Example of Frequency Ogive (problem 13, page 70)

Example: page 70, problem 16

ANSWERS: (a) 5 (b) 44.99 (c) 42.49

Time Series Plot

A line graph that depicts a time series. The x coordinates are time (hours, days, months, etc.) and the y coordinates are the values of the time series data.

Example: page 70, problem 18

ANSWERS: (a) approximately 950 (b) approximately 50

(Cont.) (c) Between which two years do we see the steepest decline in the number of cases? (d) Overall, does this graph represent encouraging or discouraging news?

ANSWERS: (c) Between 1996 and 1997 (d) encouraging

Graphical Misrepresentation of Data

1. Understand what can make a graph misleading, confusing, or deceptive

Methods for Making a Graph Misleading

1. Graphing/selecting an inappropriate statistic
2. Omitting the zero on the relevant scale
3. Manipulating the scale
4. Inconsistent dimensions between graph and data
5. Careless combination of categories in a bar graph
6. Inaccuracy in relative lengths or sizes of bars in a bar graph
7. Biased distortion or embellishment
8. Unclear labeling

What is the problem with this bar graph?

From Elementary Statistics, Mario F. Triola

ANSWER: Misleading because the bars do not start at zero on the y-axis.

Better comparison of stats:

Annual Incomes of Groups with Different Education Levels From Elementary Statistics, Mario F. Triola

What is the problem with this bar graph?

ANSWER: Misleading. Depicts one-dimensional data with three-dimensional boxes. The last box is 64 times as large as the first box (in volume), but the income is only 4 times as large.
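The factor of 64 follows from scaling all three dimensions of the box by the income ratio. A minimal check, assuming each side of the last box is 4 times the corresponding side of the first:

```python
income_ratio = 4                  # last income is 4 times the first

# Assumption: width, height, and depth of the box are each scaled by the income ratio.
volume_ratio = income_ratio ** 3  # volume grows with the cube of the scale factor

print(volume_ratio)
```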

Example 2.18 page 73

What is the problem with this bar graph?

ANSWER: Inappropriate statistic since the population of the US is greater than the other countries depicted. Better to use per capita car theft rate (divide the number of cars stolen by the total number of people in each country).

Example 2.20 page 75 (majors chosen by 25 business school students)

What is the problem with this bar graph?

ANSWER: Scale has been manipulated to deemphasize the differences in relative frequencies.

Example 2.20 page 75 (after scale change)