Essential Statistics, Regression, and Econometrics
Gary Smith
Pomona College
AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Academic Press is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA
525 B Street, Suite 1800, San Diego, California 92101-4495, USA
84 Theobald’s Road, London WC1X 8RR, UK

© 2012 Gary Smith. Published by Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the Publisher. Details on how to seek permission, further information about the Publisher’s permissions policies, and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency can be found at our website: www.elsevier.com/permissions

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
Smith, Gary, 1945-
Essential statistics, regression, and econometrics / Gary Smith.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-12-382221-5 (hardcover: alk. paper)
1. Regression analysis–Textbooks. I. Title.
QA278.2.S6127 2012
519.5–dc22
2011006233

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN: 978-0-12-382221-5

For information on all Academic Press publications visit our website: www.elsevierdirect.com

Printed in the United States of America
11 12 13 9 8 7 6 5 4 3 2 1
Contents

Introduction

Chapter 1: Data, Data, Data
  1.1 Measurements
    Flying Blind and Clueless
  1.2 Testing Models
    The Political Business Cycle
  1.3 Making Predictions
    Okun’s Law
  1.4 Numerical and Categorical Data
  1.5 Cross-Sectional Data
    The Hamburger Standard
  1.6 Time Series Data
    Silencing Buzz Saws
  1.7 Longitudinal (or Panel) Data
  1.8 Index Numbers (Optional)
    The Consumer Price Index
    The Dow Jones Index
  1.9 Deflated Data
    Nominal and Real Magnitudes
    The Real Cost of Mailing a Letter
    Real Per Capita
  Exercises

Chapter 2: Displaying Data
  2.1 Bar Charts
    Bowlers’ Hot Hands
    Bar Charts with Interval Data
  2.2 Histograms
    Letting the Data Speak
    Understanding the Data
  2.3 Time Series Graphs
    Unemployment during the Great Depression
    Cigarette Consumption
© 2010 by Elsevier Inc. All rights reserved.
  2.4 Scatterplots
    Okun’s Law
    The Fed’s Stock Valuation Model
    Using Categorical Data in Scatter Diagrams
  2.5 Graphs: Good, Bad, and Ugly
    Omitting the Origin
    Changing the Units in Mid-Graph
    Choosing the Time Period
    The Dangers of Incautious Extrapolation
    Unnecessary Decoration
  Exercises

Chapter 3: Descriptive Statistics
  3.1 Mean
  3.2 Median
  3.3 Standard Deviation
  3.4 Boxplots
  3.5 Growth Rates
    Get the Base Period Right
    Watch out for Small Bases
    The Murder Capital of Massachusetts
    The Geometric Mean (Optional)
  3.6 Correlation
  Exercises

Chapter 4: Probability
  4.1 Describing Uncertainty
    Equally Likely
    A Chimp Named Sarah
    Long-Run Frequencies
    A Scientific Study of Roulette
    Experimental Coin Flips and Dice Rolls
    Subjective Probabilities
    Bayes’s Approach
  4.2 Some Helpful Rules
    The Addition Rule
    Conditional Probabilities
    Independent Events and Winning Streaks
    The Fallacious Law of Averages
    The Multiplication Rule
    Legal Misinterpretations of the Multiplication Rule
    The Subtraction Rule
    Bayes’s Rule
    A Bayesian Analysis of Drug Testing
  4.3 Probability Distributions
    Expected Value and Standard Deviation
    The Normal Distribution
    Probability Density Curves
    Standardized Variables
    The Central Limit Theorem
    Finding Normal Probabilities
    The Normal Probability Table
    Nonstandardized Variables
    One, Two, Three Standard Deviations
  Exercises

Chapter 5: Sampling
  5.1 Populations and Samples
  5.2 The Power of Random Sampling
    Random Samples
    Sampling Evidence in the Courtroom
    Choosing a Random Sample
    Random Number Generators
  5.3 A Study of the Break-Even Effect
    Imprecise Populations
    Texas Hold ’Em
  5.4 Biased Samples
    Selection Bias
    Survivor Bias
    Does Anger Trigger Heart Attacks?
  5.5 Observational Data versus Experimental Data
    Conquering Cholera
    Confounding Factors
  Exercises

Chapter 6: Estimation
  6.1 Estimating the Population Mean
  6.2 Sampling Error
  6.3 The Sampling Distribution of the Sample Mean
    The Shape of the Distribution
    Unbiased Estimators
    Sampling Variance
    Putting It All Together
    Confidence Intervals
  6.4 The t Distribution
  6.5 Confidence Intervals Using the t Distribution
    Do Not Forget the Square Root of the Sample Size
    Choosing a Confidence Level
    Choosing a Sample Size
    Sampling from Finite Populations
  Exercises

Chapter 7: Hypothesis Testing
  7.1 Proof by Statistical Contradiction
  7.2 The Null Hypothesis
  7.3 P Values
    Using an Estimated Standard Deviation
    Significance Levels
  7.4 Confidence Intervals
  7.5 Matched-Pair Data
    Immigrant Mothers and Their Adult Daughters
    Fortune’s Most Admired Companies
    Fateful Initials?
  7.6 Practical Importance versus Statistical Significance
  7.7 Data Grubbing
    The Sherlock Holmes Inference
    The Super Bowl and the Stock Market
    Extrasensory Perception
    How (Not) to Win the Lottery
  Exercises

Chapter 8: Simple Regression
  8.1 The Regression Model
  8.2 Least Squares Estimation
  8.3 Confidence Intervals
  8.4 Hypothesis Tests
  8.5 R²
  8.6 Using Regression Analysis
    Okun’s Law Revisited
    The Fair Value of Stocks
    The Nifty 50
  8.7 Prediction Intervals (Optional)
  Exercises

Chapter 9: The Art of Regression Analysis
  9.1 Regression Pitfalls
    Significant Is Not Necessarily Substantial
    Correlation Is Not Causation
    Detrending Time Series Data
    Incautious Extrapolation
    Regression Toward the Mean
    The Real Dogs of the Dow
  9.2 Regression Diagnostics (Optional)
    Looking for Outliers
    Checking the Standard Deviation
    Independent Error Terms
  Exercises

Chapter 10: Multiple Regression
  10.1 The Multiple Regression Model
    The General Model
    Dummy Variables
  10.2 Least Squares Estimation
    Confidence Intervals for the Coefficients
    Hypothesis Tests
    The Coefficient of Determination, R²
    Prediction Intervals for Y (Optional)
    The Fortune Portfolio
    The Real Dogs of the Dow Revisited
    Would a Stock by Any Other Ticker Smell as Sweet?
    The Effect of Air Pollution on Life Expectancy
  10.3 Multicollinearity
    The Coleman Report
  Exercises

Chapter 11: Modeling (Optional)
  11.1 Causality
  11.2 Linear Models
  11.3 Polynomial Models
    Interpreting Quadratic Models
  11.4 Power Functions
    Negative Elasticities
    The Cobb-Douglas Production Function
  11.5 Logarithmic Models
  11.6 Growth Models
    The Miracle of Compounding
    Frequent Compounding
    Continuous Compounding
  11.7 Autoregressive Models
  Exercises

Appendix
References
Index
Introduction

Econometrics is powerful, elegant, and widely used. Many departments of economics, politics, psychology, and sociology require students to take a course in regression analysis or econometrics. So do many business, law, and medical schools. These courses are traditionally preceded by an introductory statistics course that adheres to the fire hose pedagogy: bombard the students with information and hope they do not drown. Encyclopedic statistics courses are a mile wide and an inch deep, and many students remember little after the final exam. This textbook focuses on what students really need to know and remember.

Essential Statistics, Regression, and Econometrics is written for an introductory statistics course that helps students develop the statistical reasoning they need for regression analysis. It can be used either for a statistics class that precedes a regression class or for a one-term course that encompasses statistics and regression analysis.

One reason for this book’s focused approach is that there is not enough time in a one-term course to cover the material in more encyclopedic books. Another reason is that an unfocused course overwhelms students with so much nonessential material that they have trouble remembering the essentials.

This book does not cover the binomial distribution and related tests of a population success probability. Also omitted are difference-in-means tests, chi-square tests, and ANOVA tests. These are not crucial for understanding and using regression analysis. Instructors who cover these topics can use the supplementary material at the book’s website.

The regression chapters at the end of the book set up the transition to a more advanced regression or econometrics course and are also sufficient for students who take only one statistics class but need to know how to use and understand basic regression analysis.
This textbook is intended to give students a deep understanding of the statistical reasoning they need for regression analysis. It is innovative in its focus on this preparation and in the extended emphasis on statistical reasoning, real data, pitfalls in data analysis, modeling issues, and word problems. Too many students mistakenly believe that statistics courses are too abstract, mathematical, and tedious to be useful or interesting. To demonstrate the power, elegance, and even beauty of statistical reasoning, this book includes a large number of
interesting and relevant examples, and discusses not only the uses but also the abuses of statistics. These examples show how statistical reasoning can be used to answer important questions and also expose the errors (accidental or intentional) that people often make. The examples are drawn from many areas to show that statistical reasoning is an important part of everyday life. The goal is to help students develop the statistical reasoning they need for later courses and for life after college.

I am indebted to the reviewers who helped make this a better book: Woody Studenmund, The Laurence de Rycke Distinguished Professor of Economics, Occidental College; Michael Murray, Bates College; Steffen Habermalz, Northwestern University; and Manfred Keil, Claremont McKenna College. Most of all, I am indebted to the thousands of students who have taken statistics courses from me, for their endless goodwill, contagious enthusiasm, and, especially, for teaching me how to be a better teacher.