Qualitative (Categorical) Data: Graphs and Tables
1. Construct and interpret a frequency distribution and a relative frequency distribution for qualitative data
2. Construct and interpret bar graphs and Pareto charts
3. Construct and interpret pie charts
Frequency Distribution
• A frequency (or count) of a category is the number of observations in that category.
• A frequency distribution for a qualitative variable is a listing of all categories that the variable can take, together with the frequency of each category.
Relative Frequency Distribution
• Relative frequency is the frequency divided by the sample size.
• A relative frequency distribution for a qualitative variable is a listing of all categories that the variable can take, together with the relative frequency of each category.
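As a minimal sketch of the two definitions above, the Python below counts the observations in each category and then divides each count by the sample size. The `continents` list is hypothetical illustration data, not the values from Table 2.8.

```python
from collections import Counter

def frequency_distribution(values):
    """Map each category to the number of observations in it."""
    return dict(Counter(values))

def relative_frequency_distribution(values):
    """Map each category to its frequency divided by the sample size."""
    n = len(values)
    return {category: count / n for category, count in Counter(values).items()}

# Hypothetical sample of 10 observations (NOT the textbook's Table 2.8 data).
continents = ["Asia", "Europe", "Asia", "Africa", "Europe",
              "Asia", "North America", "Asia", "Africa", "Europe"]

print(frequency_distribution(continents))
# {'Asia': 4, 'Europe': 3, 'Africa': 2, 'North America': 1}
print(relative_frequency_distribution(continents))
# {'Asia': 0.4, 'Europe': 0.3, 'Africa': 0.2, 'North America': 0.1}
```

The relative frequencies always add up to 1 (up to floating-point rounding), which is a quick check on the arithmetic.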
Example:
Example: page 43, Table 2.8
Example: page 43, problem 6
Use Table 2.8 to construct: 6(a) a frequency distribution and 6(b) a relative frequency distribution for the variable continent.
ANSWER: 6(a), 6(b)
Bar Graph •
A bar graph is used to represent frequencies or relative frequencies for categorical data.
Example: frequency bar graph
Bar Graph
To construct a bar graph:
1. On the horizontal axis, put the label for each category.
2. Draw rectangles (bars) of equal width for each category. The height of each rectangle represents the frequency or relative frequency for that category. The bars should not touch each other.
Example: page 43, problem 6
6(c) Frequency bar graph 6(d) Relative frequency bar graph
ANSWER: page 43, problem 6(c)
ANSWER: page 43, problem 6(d)
Pareto Chart •
A Pareto chart is a bar graph in which the rectangles are presented in decreasing order from left to right.
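A Pareto chart only changes the order of the bars. A minimal Python sketch, using hypothetical frequencies (an assumption, not textbook data), sorts the categories by decreasing frequency and prints a rough text bar for each one (one `#` per observation).

```python
def pareto_order(freq):
    """Return (category, frequency) pairs sorted from largest to smallest frequency."""
    return sorted(freq.items(), key=lambda item: item[1], reverse=True)

# Hypothetical frequencies for illustration only.
freq = {"Africa": 2, "Asia": 4, "North America": 1, "Europe": 3}

for category, count in pareto_order(freq):
    # One '#' per observation gives a rough horizontal text bar.
    print(f"{category:<15}{'#' * count}")
```

Because Python's sort is stable, categories with tied frequencies keep their original order.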
Example: Pareto chart
Example: page 43, problem 6(e)
6(e) Pareto chart using relative frequencies
Pie Chart •
A pie chart is a circle divided into sections (slices or wedges), with each section representing a particular category. The size of each section is proportional to the relative frequency of its category.
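Because each slice is proportional to relative frequency, a slice's central angle is its relative frequency times 360 degrees. A minimal Python sketch of that arithmetic, again with hypothetical frequencies chosen only for illustration:

```python
def pie_angles(freq):
    """Central angle of each slice: relative frequency x 360 degrees."""
    total = sum(freq.values())
    return {category: 360 * count / total for category, count in freq.items()}

# Hypothetical frequencies for illustration only.
freq = {"Asia": 4, "Europe": 3, "Africa": 2, "North America": 1}
print(pie_angles(freq))
# {'Asia': 144.0, 'Europe': 108.0, 'Africa': 72.0, 'North America': 36.0}
```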
Example: pie chart
Example: page 43, problem 6(f)
6(f) Pie chart using relative frequencies
Quantitative Data: Graphs and Tables
1. Construct and interpret a frequency distribution and a relative frequency distribution for quantitative data
2. Use histograms and frequency polygons to summarize quantitative data
3. Recognize distribution shape, symmetry, and skewness
Frequency Distribution •
A frequency distribution for a quantitative variable separates the data into classes and counts the number of data values or frequencies in each class.
Data: heights (inches) of 25 women:
67, 64, 65, 65, 64, 59, 67, 67, 72, 65, 64, 62, 66, 67, 66, 60, 70, 68, 61, 64, 60, 68, 65, 66, 62
Same data, sorted from least to greatest:
59, 60, 60, 61, 62, 62, 64, 64, 64, 64, 65, 65, 65, 65, 66, 66, 66, 67, 67, 67, 67, 68, 68, 70, 72
Quantitative Data: Graphs and Tables
1. Classes represent a range of data values and are used to group the values in a data set.
2. The frequency for a particular class is the number of original values that fall into that class.
Frequency Distribution
Height (inches)   Frequency
59-60             3
61-62             3
63-64             4
65-66             7
67-68             6
69-70             1
71-72             1
Lower class limits are the smallest numbers that belong to the different classes. In the height frequency distribution, the lower class limits are 59, 61, 63, 65, 67, 69, and 71.
Upper class limits are the largest numbers that belong to the different classes. In the height frequency distribution, the upper class limits are 60, 62, 64, 66, 68, 70, and 72.
Class width is the difference between two consecutive lower class limits (or two consecutive upper class limits). In the height frequency distribution, the class width is 61 - 59 = 2.
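Class boundaries are the numbers midway between the upper class limit of one class and the lower class limit of the next class; they separate the classes without leaving gaps. For the height frequency distribution, the class boundaries are:
58.5, 60.5, 62.5, 64.5, 66.5, 68.5, 70.5, 72.5
The first and last boundaries (58.5 and 72.5) extend half a unit below the first lower class limit and above the last upper class limit.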
Calculate Class Boundaries
Each interior boundary is the average of an upper class limit and the next lower class limit:
(60 + 61) / 2 = 60.5
(62 + 63) / 2 = 62.5
Class midpoints are the numbers midway between the lower class limit and the upper class limit of each class. For the height frequency distribution, the class midpoints are: 59.5, 61.5, 63.5, 65.5, 67.5, 69.5, 71.5
Calculate Class Midpoints
Each midpoint is the average of the class's lower and upper limits:
(59 + 60) / 2 = 59.5
(61 + 62) / 2 = 61.5
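The class width, boundary, and midpoint calculations above can be sketched in a few lines of Python. This is a minimal illustration that assumes equally spaced classes given as (lower, upper) limit pairs with the same gap between consecutive classes; the function names are hypothetical.

```python
def class_width(limits):
    """Difference between two consecutive lower class limits."""
    return limits[1][0] - limits[0][0]

def class_boundaries(limits):
    """Boundaries for consecutive (lower, upper) class limits; needs at least two classes."""
    # Interior boundaries: midway between an upper limit and the next lower limit.
    inner = [(upper + next_lower) / 2
             for (_, upper), (next_lower, _) in zip(limits, limits[1:])]
    # The outer boundaries extend half the gap between classes beyond the table.
    half_gap = (limits[1][0] - limits[0][1]) / 2
    return [limits[0][0] - half_gap] + inner + [limits[-1][1] + half_gap]

def class_midpoints(limits):
    """Midway between each class's lower and upper limit."""
    return [(lower + upper) / 2 for lower, upper in limits]

height_classes = [(59, 60), (61, 62), (63, 64), (65, 66), (67, 68), (69, 70), (71, 72)]
print(class_width(height_classes))       # 2
print(class_boundaries(height_classes))  # [58.5, 60.5, 62.5, 64.5, 66.5, 68.5, 70.5, 72.5]
print(class_midpoints(height_classes))   # [59.5, 61.5, 63.5, 65.5, 67.5, 69.5, 71.5]
```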
Constructing a frequency distribution by hand
1. Sort the data and determine the number of classes (usually between 5 and 20). Choose enough classes to show the variability in the data, but not so many that many classes are empty.
2. Calculate the class width, and round up:
   class width = (maximum data value - minimum data value) / (number of classes)
3. Starting point: choose the minimum data value, or a convenient value below it, as the first lower class limit.
4. Using the first lower class limit and the class width, list the other lower class limits.
5. List the lower class limits in a vertical column, then enter the upper class limits.
6. Take each individual data value and put a tally mark in the appropriate class. Add the tally marks to get the frequency.
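The six steps above can be sketched in Python for whole-number data. This is a minimal illustration, not a textbook routine: the function name is hypothetical, and the upper limit is taken as `lower + width - 1`, which only works when the data are whole numbers. Applied to the 25 heights, it reproduces the height frequency distribution.

```python
import math

def build_frequency_distribution(data, num_classes, first_lower=None):
    """Follow the by-hand steps for whole-number data; returns (lower, upper, frequency) rows."""
    data = sorted(data)                                    # Step 1: sort
    width = math.ceil((data[-1] - data[0]) / num_classes)  # Step 2: class width, rounded up
    if first_lower is None:
        first_lower = data[0]                              # Step 3: starting point
    rows = []
    for i in range(num_classes):
        lower = first_lower + i * width                    # Step 4: lower class limits
        upper = lower + width - 1                          # Step 5: upper limits (whole numbers)
        count = sum(1 for x in data if lower <= x <= upper)  # Step 6: tally
        rows.append((lower, upper, count))
    # If the range divides evenly, the maximum can fall past the last class;
    # in that case, choose a larger width or one more class.
    if sum(count for _, _, count in rows) != len(data):
        raise ValueError("some values fall outside the classes; widen the classes")
    return rows

heights = [67, 64, 65, 65, 64, 59, 67, 67, 72, 65, 64, 62, 66,
           67, 66, 60, 70, 68, 61, 64, 60, 68, 65, 66, 62]
for lower, upper, count in build_frequency_distribution(heights, num_classes=7):
    print(f"{lower}-{upper}: {count}")
# 59-60: 3, 61-62: 3, 63-64: 4, 65-66: 7, 67-68: 6, 69-70: 1, 71-72: 1 (one row per line)
```

For the Strontium-90 data, calling it with `num_classes=8` and `first_lower=110` gives a width of ceil(74 / 8) = 10 and the classes 110-119 through 180-189.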
Data: Amounts of Strontium-90 (in millibecquerels) in a simple random sample of baby teeth obtained from Philadelphia residents born after 1979. Note: these data are related to the Three Mile Island nuclear power plant accident in 1979.
Data can be sorted using the graphing calculator. The website below has useful graphing calculator tips: www.mathbits.com/MathBits/TISection/Openpage.htm
Entering Data: Data is stored in Lists on the calculator. Locate and press the STAT button on the calculator. Choose EDIT. The calculator will display the first three of six lists (columns) for entering data. Simply type your data and press ENTER. Use your arrow keys to move between lists.
To clear data from a list: Press STAT. From the EDIT menu, move the cursor up ONTO the name of the list (L1). Press CLEAR. Move the cursor down. NOTE: The list entries will not disappear until the cursor is moved down. (Avoid pressing DEL as it will delete the entire column. If this happens, you can reinstate the column by pressing STAT #5 SetUpEditor.)
You may also clear a list by choosing option #4 under the EDIT menu, ClrList. ClrList will appear on the home screen waiting for you to enter which list to clear. Enter the name of a list by pressing the 2nd button and the yellow L1 (above the 1).
Sorting Data
• Locate and press the STAT button. Choose option #2, SortA(.
• Specify the list you wish to sort by pressing 2nd STAT and entering the chosen list name.
• Press ENTER and the list will be put in ascending order (lowest to highest). SortD will put the list in descending order.
Data has been sorted.
The range of the data is the maximum data value minus the minimum data value:
188 - 114 = 74
If we use 8 classes, the class width is 74 / 8 = 9.25, which we round up to 10.
List the lower class limits (start with 110 and use 8 classes of width 10):
110, 120, 130, 140, 150, 160, 170, 180
List the upper class limits, which gives the classes:
110-119, 120-129, 130-139, 140-149, 150-159, 160-169, 170-179, 180-189
Count the number of data values in each class and enter each count in the frequency column:
Strontium-90 Level   Frequency
110-119              2
120-129              2
130-139              5
140-149              9
150-159              13
160-169              6
170-179              2
180-189              1
A relative frequency distribution includes the same class limits as a frequency distribution, but the frequency of each class is replaced with a relative frequency (a proportion) or a percentage frequency (a percent):
relative frequency = class frequency / sum of all frequencies (or sample size)
Sum of all frequencies is the total number of data values: 2+2+5+9+13+6+2+1=40
Relative Frequency Distribution
Strontium-90 Level   Relative Frequency
110-119              2/40 = 0.050
120-129              2/40 = 0.050
130-139              5/40 = 0.125
140-149              9/40 = 0.225
150-159              13/40 = 0.325
160-169              6/40 = 0.150
170-179              2/40 = 0.050
180-189              1/40 = 0.025
Note: the sum of the relative frequencies is 1.
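The relative frequency column can be checked with a short Python sketch that divides each Strontium-90 class frequency by the sum of all frequencies and also shows the percentage frequency.

```python
# Frequencies from the Strontium-90 frequency distribution.
strontium_freq = {
    "110-119": 2, "120-129": 2, "130-139": 5, "140-149": 9,
    "150-159": 13, "160-169": 6, "170-179": 2, "180-189": 1,
}

n = sum(strontium_freq.values())  # sum of all frequencies = sample size (40)
relative = {cls: count / n for cls, count in strontium_freq.items()}

for cls, count in strontium_freq.items():
    # Proportion to three decimals, then the same value as a percent.
    print(f"{cls}  {count}/{n} = {relative[cls]:.3f}  ({relative[cls]:.1%})")

print(round(sum(relative.values()), 10))  # 1.0
```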
A histogram is a graph of the frequency distribution.
[Figures: histogram and relative frequency histogram of the Strontium-90 frequency distribution]
A histogram consists of bars of equal width drawn adjacent to each other (without gaps). The horizontal scale represents the classes of quantitative data values and the vertical scale represents the frequencies. The heights of the bars correspond to the frequency values.
Horizontal Scale for Histogram: Use class boundaries or class midpoints. Vertical Scale for Histogram: Use the class frequencies or relative frequencies.
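To make the scale choices concrete, a minimal Python sketch labels each bar with its class boundaries and draws the bar as a row of `*` characters (one per observation). It assumes whole-number data, so each boundary sits 0.5 beyond a class limit, and it turns the histogram on its side because text is printed row by row.

```python
def text_histogram(lower_limits, width, freqs):
    """Sideways text histogram: one row per class, labeled by its class boundaries."""
    rows = []
    for lower, freq in zip(lower_limits, freqs):
        left = lower - 0.5              # lower class boundary (whole-number data)
        right = lower + width - 0.5     # upper class boundary
        rows.append(f"{left}-{right} | {'*' * freq}")
    return rows

# Strontium-90 frequency distribution: 8 classes of width 10 starting at 110.
lower_limits = [110, 120, 130, 140, 150, 160, 170, 180]
freqs = [2, 2, 5, 9, 13, 6, 2, 1]
print("\n".join(text_histogram(lower_limits, 10, freqs)))
```

Adjacent rows share a boundary (119.5 ends one bar and starts the next), which is the text equivalent of histogram bars touching without gaps.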
Histogram with Graphing Calculator: see textbook page 59.
Histogram with Graphing Calculator
• Press 2nd STATPLOT and choose #1 PLOT 1. Be sure the plot is ON, the histogram icon is highlighted, and the list you will be using is indicated next to Xlist. Freq: 1 means that each data value will be counted once.
Histogram with Graphing Calculator • To see the histogram, press ZOOM and #9 ZoomStat. (ZoomStat automatically sets the window to an appropriate size to view all of the data.) • Press the TRACE key to see on-screen data about the histogram. The spider will jump from bar to bar showing the range of values contained within each bar and the number of entries from the list (n) that fall within that range.
Histogram with Graphing Calculator
• To manually adjust the histogram:
• Under the WINDOW button, the Xscl value controls the width of each bar, beginning with Xmin and ending with Xmax. (If you wish to see EACH data value as a separate interval, set Xscl to 1.)
• Select GRAPH (not ZoomStat).
• NOTE: If you adjust your own viewing window, (Xmax - Xmin)/Xscl must be less than or equal to 47 for the histogram to be seen in the viewing window.
• NOTE: Choosing ZoomStat automatically adjusts Xmin, Xmax, Ymin, Ymax, and Xscl.
Example: page 61
ANSWERS:
(a) 35 (b) 1 (c) 10; approximately 115 (d) we will answer this later
Example: page 61
ANSWERS:
(a) 81 inches; 61 inches (b) 70 inches (c) we will answer this later
Example: page 62. A portfolio contains stocks of 19 technology firms. The stock prices are shown in this histogram.
Example: page 62, problem 18
ANSWERS: (a) 2 (b) 0.1053 (c) 5 (d) 0.2632
Frequency Polygon: line graph representation of a histogram
Example: frequency polygon for the Strontium-90 frequency distribution
[Figure: frequency polygon]
Constructing a frequency polygon: For each class, plot a point at the class midpoint with a height equal to the frequency for that class. Then join each consecutive pair of points with a line segment.
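The construction rule above amounts to pairing each class midpoint with its frequency. A minimal Python sketch for the Strontium-90 classes (the function name is hypothetical):

```python
def frequency_polygon_points(classes, freqs):
    """One (class midpoint, frequency) point per class; consecutive points are joined by segments."""
    return [((lower + upper) / 2, freq) for (lower, upper), freq in zip(classes, freqs)]

classes = [(110, 119), (120, 129), (130, 139), (140, 149),
           (150, 159), (160, 169), (170, 179), (180, 189)]
freqs = [2, 2, 5, 9, 13, 6, 2, 1]
points = frequency_polygon_points(classes, freqs)
print(points)
# [(114.5, 2), (124.5, 2), (134.5, 5), (144.5, 9), (154.5, 13), (164.5, 6), (174.5, 2), (184.5, 1)]
```

Passing the x values and y values to a plotting routine such as matplotlib's `pyplot.plot` would draw the connecting line segments.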
Example: page 63
ANSWERS: (a) Approximately 72 (b) Approximately 2 (c) Approximately 18
Dot Plot: Each data value is plotted as a point (or dot) along a scale of values. Dots representing equal values are stacked. Example: construct a dotplot from the following data using females:
From Elementary Statistics, Mario F. Triola
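The female data from this example are not reproduced here, so a minimal Python sketch uses the 25 women's heights listed earlier in this section instead. It assumes whole-number data and stacks one `o` per observation above each value, with the scale of values on the last row.

```python
from collections import Counter

def text_dot_plot(data):
    """Stack one 'o' per observation above each whole-number value; last row is the scale."""
    counts = Counter(data)
    values = range(min(data), max(data) + 1)   # every value on the scale, even empty ones
    rows = []
    for level in range(max(counts.values()), 0, -1):  # draw the tallest stack first
        row = "".join(" o " if counts[v] >= level else "   " for v in values)
        rows.append(row.rstrip())
    rows.append("".join(f"{v:^3}" for v in values))  # the scale of values
    return rows

heights = [67, 64, 65, 65, 64, 59, 67, 67, 72, 65, 64, 62, 66,
           67, 66, 60, 70, 68, 61, 64, 60, 68, 65, 66, 62]
print("\n".join(text_dot_plot(heights)))
```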
Stemplot (or Stem-and-Leaf Plot): Represents quantitative data by separating each value into two parts, the stem (such as the leftmost digit) and the leaf (such as the rightmost digit).
From Elementary Statistics, Mario F. Triola
Construct a stemplot of the data:
Stem (100s/10s)   Leaf (units)
11 | 46
12 | 89
13 | 03678
14 | 022455579
15 | 0001111256688
16 | 133569
17 | 02
18 | 8
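A minimal Python sketch of the same construction: the stem is every digit except the last, and the leaf is the units digit. The data values below are read back from the stemplot above (40 values, assumed to be non-negative whole numbers).

```python
from collections import defaultdict

def stemplot(data):
    """Stem = all but the last digit, leaf = last digit (non-negative whole numbers)."""
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        leaves[stem].append(str(leaf))
    # Include every stem in the range so that gaps in the data stay visible.
    return [f"{stem} | {''.join(leaves[stem])}"
            for stem in range(min(leaves), max(leaves) + 1)]

strontium = [114, 116, 128, 129, 130, 133, 136, 137, 138,
             140, 142, 142, 144, 145, 145, 145, 147, 149,
             150, 150, 150, 151, 151, 151, 151, 152, 155, 156, 156, 158, 158,
             161, 163, 163, 165, 166, 169, 170, 172, 188]
print("\n".join(stemplot(strontium)))
```

Listing empty stems (with no leaves) keeps the stemplot's shape honest, much like empty classes in a histogram.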
Quantitative Data: Graphs and Tables
3. Recognize distribution shape, symmetry, and skewness
A distribution of a variable is a table, graph, or formula that identifies the variable values and frequencies for all elements in the data set.
Bell-Shaped Distribution: (1) the frequencies increase to a maximum and then decrease, and (2) the graph is symmetric, with the left half roughly a mirror image of the right half.
Bell-Shaped Distribution: Figure 2.12 on page 55
Right-Skewed Distribution: the right-hand tail is longer. Figure 2.14 on page 56
Left-Skewed Distribution: the left-hand tail is longer. Figure 2.15 on page 56
Example: page 61, 10(d)
ANSWERS:
(d) Right skewed
Example: page 61
ANSWER:
(c) Approximately symmetrical
Further Graphs and Tables for Quantitative Data
1. Build cumulative frequency distributions and cumulative relative frequency distributions
2. Create frequency ogives and relative frequency ogives
3. Construct and interpret time series graphs
Cumulative Frequency Distribution For a discrete variable, this shows the total number of observations less than or equal to the category value. For a continuous variable, this shows the total number of observations less than or equal to the upper class limit.
Cumulative Relative Frequency Distribution For a discrete variable, this shows the proportion of observations less than or equal to the category value. For a continuous variable, this shows the proportion of observations less than or equal to the upper class limit.
Example: construct the cumulative frequency distribution using the height frequency distribution:
Height (inches)   Frequency   Cumulative Frequency
59-60             3           3
61-62             3           6    (3 + 3)
63-64             4           10   (3 + 3 + 4)
65-66             7           17   (3 + 3 + 4 + 7)
67-68             6           23
69-70             1           24
71-72             1           25
A cumulative frequency distribution makes it easy to answer questions such as: how many heights are 68 inches or less? ANSWER: 23
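The running totals in the cumulative column, and the matching cumulative relative frequencies, can be sketched in Python with `itertools.accumulate`. The class labels and frequencies are the height frequency distribution.

```python
from itertools import accumulate

# Height frequency distribution (classes of width 2).
classes = ["59-60", "61-62", "63-64", "65-66", "67-68", "69-70", "71-72"]
freqs = [3, 3, 4, 7, 6, 1, 1]

cumulative = list(accumulate(freqs))            # running totals: 3, 3+3, 3+3+4, ...
n = cumulative[-1]                              # last running total = sample size
cumulative_relative = [c / n for c in cumulative]

for cls, cf, crf in zip(classes, cumulative, cumulative_relative):
    print(f"{cls}  {cf:>2}  {crf:.2f}")

# How many heights are 68 inches or less? Read the row whose upper limit is 68.
print(cumulative[classes.index("67-68")])  # 23
```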
Example: page 70, problem 9
Example: page 70, problem 9 Use histogram from exercise 16 in 2.2
ANSWERS:
Ogive ("oh-jive"): A line graph that depicts cumulative frequencies. The x-coordinates are the upper class limits and the y-coordinates are the cumulative frequencies or cumulative relative frequencies.
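Following the definition above (x-coordinates are upper class limits), a minimal Python sketch computes the ogive's points for the height data. Some texts plot against upper class boundaries instead, which would shift each x-coordinate by 0.5.

```python
from itertools import accumulate

def ogive_points(upper_limits, freqs):
    """(upper class limit, cumulative frequency) pairs; joining them gives the ogive."""
    return list(zip(upper_limits, accumulate(freqs)))

# Height frequency distribution: upper class limits 60, 62, ..., 72.
points = ogive_points([60, 62, 64, 66, 68, 70, 72], [3, 3, 4, 7, 6, 1, 1])
print(points)
# [(60, 3), (62, 6), (64, 10), (66, 17), (68, 23), (70, 24), (72, 25)]
```

Dividing each y-coordinate by the sample size (25) gives the points of the relative frequency ogive.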
Example of Frequency Ogive (problem 13, page 70)
Example: page 70, problem 16
ANSWERS: (a) 5 (b) 44.99 (c) 42.49
Time Series Plot: A line graph that depicts a time series. The x-coordinates are times (hours, days, months, etc.) and the y-coordinates are the values of the time series data.
Example: page 70, problem 18
ANSWERS: (a) approximately 950 (b) approximately 50
(Cont.) (c) Between which two years do we see the steepest decline in the number of cases? (d) Overall, does this graph represent encouraging or discouraging news?
ANSWERS: (c) Between 1996 and 1997 (d) encouraging
Graphical Misrepresentation of Data
1. Understand what can make a graph misleading, confusing, or deceptive
Methods for Making a Graph Misleading
1. Graphing/selecting an inappropriate statistic
2. Omitting the zero on the relevant scale
3. Manipulating the scale
4. Inconsistent dimensions between graph and data
5. Careless combination of categories in a bar graph
6. Inaccuracy in relative lengths or sizes of bars in a bar graph
7. Biased distortion or embellishment
8. Unclear labeling
What is the problem with this bar graph?
From Elementary Statistics, Mario F. Triola
ANSWER: Misleading, because the bars do not start at zero on the y-axis, which exaggerates the differences between them.
Better comparison of stats:
Annual Incomes of Groups with Different Education Levels From Elementary Statistics, Mario F. Triola
What is the problem with this bar graph?
ANSWER: Misleading. It depicts one-dimensional data (income) with three-dimensional boxes. Each dimension of the last box is 4 times that of the first box, so its volume is 4 × 4 × 4 = 64 times as large, but the income is only 4 times as large.
Example 2.18 page 73
What is the problem with this bar graph?
ANSWER: Inappropriate statistic, because the population of the US is larger than the populations of the other countries depicted. It is better to use the per capita car theft rate (divide the number of cars stolen by the total number of people in each country).
Example 2.20 page 75 (majors chosen by 25 business school students)
What is the problem with this bar graph?
ANSWER: Scale has been manipulated to deemphasize the differences in relative frequencies.
Example 2.20 page 75 (after scale change)