Stat 135 Summer 2011: syllabus
Michael Lugo
June 20, 2011

Instructor: Michael Lugo. E-mail: mlugo at stat dot berkeley dot edu. Office: 325 Evans. Office hours TBA, 325 Evans.

GSI: Siqi Wu. E-mail: siqi at stat dot berkeley dot edu. Office: 397 Evans. Office hours Tuesday 4-5pm, Thursday 3-5pm and Friday 4-5pm, 397 Evans.

Class schedule: The lecture meets Monday, Wednesday, and Friday, 10:10 AM to 12 noon, in 534 Davis. The laboratory sections meet Monday and Wednesday, either from 1:10 - 3:00 PM (Sec 101) or 3:10 - 5:00 PM (Sec 102), in 241 Cory. The first class is on Monday, June 20; the last class is on Friday, August 12. There is no class on Monday, July 4.

Course description: A comprehensive survey course in statistical theory and methodology. Topics include descriptive statistics, maximum likelihood estimation, goodness-of-fit tests, analysis of variance, and least squares estimation. The laboratory includes computer-based data-analytic applications to science and engineering.

Course web site: The course web site is at http://www.stat.berkeley.edu/~mlugo/stat135. You should check the web site regularly. It will have summaries of each lecture and of any slides, code, or data used, as well as links to further resources. Homework assignments will also be posted there. There will be a bspace site for the course as well; however, this will only be used for distributing solutions to the homework, making announcements, and posting grades.

Textbook: The required text is John Rice, Mathematical Statistics and Data Analysis, third edition. We will follow this book fairly closely. There are older editions of this book; it might be possible to use them, but you should make friends with somebody who has the current edition. In particular:

• the Bayesian approach to statistics is segregated in the final chapter of the second edition; that chapter no longer exists in the third edition and the material in it has been scattered among the previous chapters.
• many of the data sets that we’ll use in homework assignments are on the CD that comes with the third edition.

We’ll cover Chapters 7 through 14 of the text. The first six chapters cover probability at the level of Stat 134. They should be familiar, with a couple of exceptions: sections 4.5 and 4.6 cover the moment-generating function and some approximate methods that we’ll need,
and chapter 6 (which is short) covers certain distributions which rarely arise in probability proper but are useful in statistics. We’ll talk about these as they arise.

The book Stat Labs by Deborah Nolan and Terry Speed has on occasion been used as a text for this course. It is also available through the UC Library’s “ebrary” site. You may find it of interest if you’d like to see some in-depth applications.

Although we won’t make much explicit use of it, Michael Lavine of the University of Massachusetts has written a textbook entitled “Introduction to Statistical Thought” which is available free online. You may find this useful for alternate explanations of some of the concepts we cover. It also makes heavier use of R than the Rice text, so if you want to see more code it’s a good place to look. (The Rice text has some exercises that are fairly data-heavy but it doesn’t assume the use of any particular language.)

R: R is an open-source statistical computing package. We will make a lot of use of it, both in class and on the homework assignments. One major purpose of the lab sections is to get you to use R. You do not need to bring a computer to lecture. It might help if you brought one to the lab section.

If you have your own computer you’ll want to install R on it.

For Windows users: go to http://cran.cnr.berkeley.edu/bin/windows/base/ and click on the link “Download R 2.13.0 for Windows”. This will download the installer, which you can then run.

For Mac users: go to http://cran.cnr.berkeley.edu/bin/macosx/ and download “R2.13.0.pkg”. (If this doesn’t work let me know; I don’t have a Mac, so I can’t test this.)

For Linux users: you probably can figure this out better than I can.

If you don’t have your own computer – or even if you do – it would be a good idea for you to get an account from the Statistical Computing Facility. This enables you to use the computers in the computer lab on the third floor of Evans Hall.
You need to fill out a form for this, which we’ll provide to you. However, if it’s at all possible you should install R on a computer you have access to at all times, since the computer labs are only open during the day on weekdays.

Some pointers to R resources: Nolan and Speed have a web site which gives a short list of commonly used commands: http://www.stat.berkeley.edu/users/statlabs/software.html

Some books I like on R: Joseph Adler, R in a Nutshell; Paul Teetor, The R Cookbook. These are more books on R than on statistics. I don’t recommend buying them specially for this course, but if you expect that you’re going to go on to use R a lot they might be worth purchasing. There are also a variety of textbooks on “statistics with R”, including the Lavine text already mentioned and Dalgaard, Introductory Statistics with R, which you can access through the library web site.

Since R is open-source software, there are a lot of free “how to use R” resources out there on the Internet. Many of them are quite good. If you find some that are not listed here that you think should be shared with the class, send me an e-mail and I’ll post a link to them on the web page.

Google or other search engines are also useful; however, as you might expect, searching for the letter “R” does not help much. Include other terms in your query, such
as “statistics”, “language”, “programming”, or the specific R command you want to know about, to get better answers.

There are also a couple of meta-resources. The R project has a document at http://www.inside-r.org/howto/how-learn-r. The website Stack Overflow, a question-and-answer website for programming, has answers to the question “books for learning the R language” at http://stackoverflow.com/questions/192369/books-for-learning-the-r-language. In particular, one highly ranked answer there includes pointers to a large number of books, some of which are available free online.

Finally, there are a lot of people out there on the Internet who will help you with R-related things. Good question-and-answer sites include stackoverflow.com and stats.stackexchange.com. However, you should not ask people at these sites to do your homework for you.

Homework: Homework will be assigned regularly. Most of the homework problems will come from the Rice textbook. There will most likely be one homework assignment for each chapter. Since the chapters are not all of the same length, this means that the homework schedule will be somewhat irregular. Each homework assignment will indicate the total number of points that it is worth. In general, longer and more complicated problems will count for more points.

You must make sure to stay on top of the homework! No late homework will be accepted. However, life is complicated. Therefore you are allowed to drop one homework assignment; I’ll drop the assignment whose removal gives you the highest weighted average of the remaining grades. If extenuating circumstances will force you to miss more homeworks, please let me know as soon as possible.

Homework is due at the beginning of lecture unless otherwise specified. I’m serious about this. There are two reasons. First, I may sometimes discuss the homework in lecture.
Second, if homework is collected at the end of lecture then many students do the homework in class and don’t pay attention.

Homework should be neatly written in English and use complete sentences. One goal of this course is to give you some training in scientific communication. Homework does not need to be typed; however, if you plan to go on to graduate school in a technical subject you may want to learn LaTeX, a mathematical typesetting system, and now is as good a time as any. See me if you want some advice on this.

Collaboration policy: Feel free to discuss the problems amongst yourselves; I recommend this, as you really need to talk about this material in order to learn it. However, you must write up the homework individually.

Exams: Two exams – a midterm and a final. The midterm will be on Wednesday, July 20; the final exam will be on the last day of class, Friday, August 12.

Note that the latest date to drop a course is Friday, July 1, at the end of the second week of the course. You should have received some graded homework back by this time but you will not have had an exam. The latest date to change the grading option for the course is Friday, July 29; you’ll have the midterm back by this time. If you are a student at another university, you should be aware that taking the course P/NP may create difficulties in obtaining transfer credit.

No makeup exams will be given. If you can’t take the exams at these times, you should not take the course. If you require extra time for the exams due to disability, please contact the Disabled Students’ Program (http://dsp.berkeley.edu/); they will inform me of the necessary accommodations.

Grading: Exams: 30 percent each. Homework: 40 percent. The homeworks are not equally weighted but will be weighted roughly in proportion to their length. The final exam is somewhat cumulative.

The standard grade distribution in upper-division statistics classes assigns roughly equal numbers of As, Bs, and Cs, with a small number of Ds and Fs. I intend to keep roughly to this distribution. However, an overall score of at least 90% will guarantee a grade of at least A−, an overall score of at least 80% will guarantee a grade of at least B−, and an overall score of at least 70% will guarantee a grade of at least C−.

Prerequisites: this course has fairly extensive prerequisites. General mathematical prerequisites are Math 1A-1B (two-semester first-year calculus sequence) and 54 (a one-semester course in linear algebra and differential equations). We will occasionally need to take partial derivatives or multiple integrals as is done in Math 53; we won’t need vector calculus. We’ll pretty much only see linear algebra towards the end of the course, when we do least-squares estimation and regression.

The only statistics course that is a prerequisite is Statistics 134: Concepts of Probability. Stat 134 is usually taught from Pitman, Probability. There are a few topics that aren’t always covered in an intro probability course (and which Pitman relegates to exercises or “optional” sections) that are important in our course – moment generating functions, the δ method, and order statistics are the most prominent of these. I’ll teach these topics when we need them. Otherwise I won’t spend much time on Stat 134 material.
If you’re a student from an institution other than Berkeley and you’re taking this course, you should look at the course descriptions of the prerequisite Berkeley courses.

Non-prerequisites: You do not need to have taken any of the introductory “nonmathematical” statistics courses (Stat 2, 20, 21, or 131) to take this course. If you have taken such a course, you’ll have seen some of the ideas that we discuss in this course (survey sampling, some basic hypothesis testing, correlation, and regression) but you are allowed to take both 135 and one of those courses for credit. You also do not need to have taken Stat 133 (Concepts in Computing with Data) to take this course, though it’s recommended; this course will be self-contained with regard to computation.

Warning: there are eight weeks in the summer session. There are sixteen weeks in the semester. Sixteen divided by eight is two. Therefore in order to cover the same amount of material as is usually covered in the academic year, we will have to go twice as fast and you will need to do twice as much work. Don’t get behind.
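
As a rough illustration of the grading arithmetic described in this syllabus (exams 30 percent each, homework 40 percent, one dropped homework, guaranteed grade floors), here is a minimal sketch. It is written in Python purely for illustration, although the course itself uses R. The function names are hypothetical, and the sketch assumes that the "weighted average" of homework means total points earned divided by total points possible; the actual weighting is only described as "roughly in proportion to length", so treat this as an approximation and not the official computation.

```python
def weighted_average(items):
    """items: list of (points_earned, points_possible) pairs."""
    earned = sum(e for e, _ in items)
    possible = sum(p for _, p in items)
    return earned / possible


def homework_score(scores):
    """Best weighted average obtainable by dropping exactly one assignment.

    Assumption: weights are the point totals of each assignment.
    """
    if len(scores) < 2:
        return weighted_average(scores)
    return max(
        weighted_average(scores[:i] + scores[i + 1:])
        for i in range(len(scores))
    )


def overall_score(midterm, final, homework):
    """All inputs are fractions between 0 and 1."""
    return 0.30 * midterm + 0.30 * final + 0.40 * homework


def guaranteed_floor(overall):
    """Lowest grade guaranteed by the stated cutoffs, or None if no guarantee."""
    if overall >= 0.90:
        return "A-"
    if overall >= 0.80:
        return "B-"
    if overall >= 0.70:
        return "C-"
    return None


# Hypothetical student: three homeworks, midterm 80%, final 85%.
hw = homework_score([(18, 20), (30, 40), (9, 10)])
total = overall_score(0.80, 0.85, hw)
print(round(hw, 3), round(total, 3), guaranteed_floor(total))
```

Note that the dropped assignment is not necessarily the one with the lowest percentage; because assignments carry different point totals, the sketch tries each possible drop and keeps the best result.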

Course schedule

Below is a tentative schedule for the course. We may (indeed, we probably will) deviate from this. You should regularly check the course web site at http://www.stat.berkeley.edu/~mlugo/stat135 for updates. Numbers refer to sections in Rice, third edition.

Monday, June 20: introduction to course. 7.2: population parameters. Start 7.3: simple random sampling.
Wednesday, June 22: Continue 7.3: the normal approximation, confidence intervals. 7.4: ratio estimation.
Friday, June 24: 7.5: stratified sampling. Some interesting problems.
Monday, June 27: 8.2: Fitting to the Poisson distribution. 8.3: parameter estimation. 8.4: method of moments.
Wednesday, June 29: 8.5: maximum likelihood estimation.
Friday, July 1: 8.6: Bayesian parameter estimation. Last day to drop the class.
Monday, July 4: Independence Day. No class. Go eat burgers and see fireworks.
Wednesday, July 6: 8.7: efficiency. 8.8: sufficiency.
Friday, July 8: 9.1-9.4: introduction to hypothesis testing.
Monday, July 11: 9.5-9.10: likelihood ratio tests for the multinomial. Poisson dispersion. Some graphical methods.
Wednesday, July 13: 10.1-10.3: The CDF. Survival and hazard functions. Q-Q plots. Histograms.
Friday, July 15: 10.4-10.7: Mean, median, trimmed mean. Bootstrapping. Measures of dispersion. Boxplots and scatterplots.
Monday, July 18: 11.2: comparing two independent samples (except the Mann-Whitney test).
Wednesday, July 20: MIDTERM EXAM
Friday, July 22: 11.2: Mann-Whitney test. 11.3: comparing paired samples.
Monday, July 25: 11.4: experimental design. 12.2.1: analysis of variance, the F-test.
Wednesday, July 27: 12.2.2-12.3.1: multiple comparisons. Bonferroni and Tukey methods. Kruskal-Wallis test. The two-way layout.
Friday, July 29: 12.3.2-4: randomized block designs. Friedman’s test. 13.2-13.4: Fisher’s exact test, χ² tests. Last day to change grading option.
Monday, August 1: 13.5-6: matched-pair designs and odds ratios. 14.1: introduction to least squares, the bivariate normal.
Wednesday, August 3: 14.2: simple linear regression.
Friday, August 5: 14.3: matrix approach to linear least squares. 14.4: statistical properties of least-squares estimates.
Monday, August 8: 14.5-7: multiple regression, inference, local linear smoothing.
Wednesday, August 10: TBA. (Perhaps review. Perhaps we’ll be behind at this point and we’ll need the time to get caught up.)
Friday, August 12: FINAL EXAM