ERRATA UPDATE as of Nov 24, 2007
Negative Binomial Regression
Cambridge University Press
Joseph M. Hilbe
Arizona State University
[email protected]; [email protected]
The book was first released for sale on July 29, 2007 at the Joint Statistical Meetings (JSM) of the American Statistical Association, held at Salt Lake City, UT. I have read through the text and identified errors that were overlooked during the editing process. I apologize for these oversights. If readers find other errors, please contact me and I shall post them to this site. I hope that these errors can be corrected in the second printing of the text.
Below the list of Errata, I provide a few thoughts regarding the discussion found in the text, perhaps giving you a better insight into the reason I wrote it, to whom it is directed, and thoughts about the statistics involved.
I finished writing the main part of the text in 2006; the book was completed in early 2007. The subject has advanced during the interval, and I wish to provide a brief update. I have rewritten large parts of chapter 10 in light of recent advances in the area – in particular for GEE models. You can download the revised Chapter 10 at: http://www.statistics.com/other/hilbe/index.php
Both this Errata page and the various data sets and user-authored statistical commands for examples used in the text are posted to the web site for the book at: http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=9780521857727
Find the place on the lower left side of the web site to access the 33 data sets identified in Appendix E as well as Stata commands used to create examples in the text. Data files are available in the following formats: Stata, SAS, SPSS, Excel, R, and Limdep. I am posting Stata files in both version 10 format as well as in the older version 8-9 format. Users of Stata 10 can read files saved in older versions, but users of versions under 10 cannot read files saved using version 10. Details of these files, as well as of the other formats, are explained on the web site.
Several of the examples used in the text were modeled using Stata commands not found in commercial Stata or on the text’s web site. These commands should mostly be found at the following web site: http://ideas.repec.org/s/boc/bocode.html
The majority of the relevant commands are from 2005, associated with my name. The site allows for easy searching. Others not on the site are posted to this book’s web site.
I should give a quick note as to my intended audience. The book is not directed to professional statisticians, although many will likely find new and hopefully interesting information in the text. Rather, the book was primarily written for those researchers who have little background in count response models, but who find that they have a need to learn about them for an upcoming project or study. I have written the text in as clear a manner as possible, many times re-emphasizing important items that need to be remembered in the modeling process. My tone is more that of a classroom presentation than of a formal text. I have attempted to speak directly to the reader, giving advice as to the comparative modeling process: outlining the algorithmic basis of the respective count models, detailing different methods of estimation, selecting the appropriate model, assessing fit, interpreting parameter estimates and ancillary parameters, and so forth. Examples are given for each model discussed. The end result is a book that can prove useful to researchers, to graduate students who need to have a workable understanding of count models, as well as anyone else who is simply interested in this area of statistical modeling.
The focus of discussion is negative binomial regression, which the reader will find designates a broad range of models. I have primarily used Stata throughout the text for examples, with Limdep being used for examples where no Stata command yet exists. Stata is one of the most popular statistical applications worldwide, and has commands that accommodate nearly every negative binomial model discussed in this text. I am including user-authored commands that are easily attainable as well. Stata is second only to Limdep in the range of count response model offerings.
SAS, SPSS, S-Plus, Statistica, Genstat, and other popular commercial packages have only minimal count model capabilities. R has many user-authored count models, but not nearly as many as are found in Limdep or Stata. Moreover, I have previously authored a number of published statistical procedures using the Stata language – the majority of which relate to count response modeling. Stata was therefore the reasonable choice for displaying example output.
ERRATA
PREFACE, Page x: 2nd line of 3rd complete paragraph. Web address: “www.cambridge.org/XXXXX” should read “www.cambridge.org/9780521857727”. This latter correct address is found in Appendix E (p. 240).
INTRODUCTION, Page 2: last sentence of first paragraph: The SPSS program name is mistaken. It is now GENLIN, not GLZ.
CH 2, Page 20, 1st paragraph, line 8: Add “family”, so that the line should read: “link, variance, and family functions. The algorithm took care …”
CH 2, Page 21: line 2: First word of line reads “or”. It should read “and”.
CH 2, Page 22: Equation 2.6: Should read: X1 = X0 – f(X0)/f’(X0)
CH 2, Page 22: Equation 2.8: 2nd equation has a missing “=” sign. Should read ∂²L = ∂²L/∂β∂β’
CH 2, Page 27, Table 2.1, AIC statistic note: The note to the program line defining the AIC statistic reads: “/* AIC is sometimes defined w/o η */”. The last term, η, should read, n, not η. See comments below.
CH 3, Page 40, 1st line under equation 2.53: “caste” should be spelled “cast”.
CH 3, Page 47, equation 3.23: There should be an equal sign between Zi and the other terms.
CH 4, Page 52, second to last line of program code on page. The line should read: . gen xb = 1 + .5*x1 - .75*x2 + .25*x3
CH 4, Page 63: last line of code in Table 4.1: It should read: "w = 1/sqrt(sc)" instead of "w = µ/sqrt"
CH 4, Page 73, 6th line of Stata commands: should read: . di 20.02131 + .3*20.02312^2 /* mean + alpha*mean^2 */
CH 5, Page 78, Table 5.1: The last two items should be numbered 24 and 25, and not 22 and 22.
CH 5, Page 82: Figures 5.19 and 5.20: Choose functions should not have division sign between top and bottom terms.
CH 5, Page 94: value of variance under the word “hence”: In the list of formulae there are two equations with a left hand side of V(µ). The 2nd of these two equations should read V’(µ) instead of V(µ). Therefore V’(µ) = 1 + 2αµ
CH 5, Page 95: Table 5.5, 1st line in loop: Should read: w = (/)+(y-){/(1+2+2+2)}
CH 5, Page 96: Section 5.5, line 6: Should be “two-parameter”, not “one-parameter”. The line should read: “Poisson variance and µ²/ν the two-parameter gamma distribution variance. We”
CH 5, Page 96: Section 5.5, Paragraph 3, line 2. Delete comma between “version” and “of”.
CH 6, Page 112: Formula for the negative binomial variance is mistaken. Should read: µ + αµ². The formula as it currently reads is missing α in the 2nd term.
CH 6, Page 119: Example 3, first line: “log” rather than “data”. Should read: “These data come from the 1912 Titanic survival log. It consists…”
CH 6, Page 128: line under first table display on top of page: There is now a “?” at the end of the line. It should be deleted.
CH 7, Page 136: Paragraph 2, line 4, first word: Word “variance” should read “assumptions”. Line 4 of 2nd paragraph should start out as “assumptions of the Poisson distribution. Other models are …”
CH 7, Page 139, line 3 under table of parameter estimates, 1st word of line: Word “restricted” should be changed to “expected”. The line should read, in part: “expected for a geometric model. Recall that unless …”
CH 7, Page 143, 2nd last line of last full paragraph. Replace term “Figure 7.1” with “Figure 7.2”.
CH 7, Page 153, last two lines on page. Should read: -2{(-60322.021) – (-60258.97)} = 126.102
CH 7, Page 154, second line top paragraph: Should read: Chi2tail(1,126.102) = 2.921e-29.
CH 7, Page 155, 7th-6th line from bottom: Yang, Hardin, and Addy (2006), not 2007.
CH 8, Page 171, bottom-page model output. Negative binomial age3 coefficient should read: .023721. The decimal point was inadvertently dropped.
CH 8, Page 172, AIC Formula should read AIChurdle = ((AICZT * (1-N>0/N) + AICbinary
CH 8, Page 174, Final full paragraph, final 4 lines: The three terms predicting zero counts are parameterized in terms of µ rather than xβ, as shown in the two formulae mid-page. To be consistent, the discussion should be in terms of xβ in both places. The last 4 lines should read: “zero count are; (1) logistic inverse link, i.e. 1/(1+exp(-xβ)), the prediction that y==1, (2) 1- [1/(1+exp(-xβ))], and (3) the negative …”
CH 9, Page 180, Section 9.1.1, 2nd paragraph, line 3: “lower” should read “higher”, comma to semicolon. Therefore, the line should read: “wish to extend C to any higher value in the observed distribution; the value to”
CH 9, Page 181, Header for bottom table: Should read: “POISSON: DROPPED VALUES 0-3”
CH 9, Page 193, line 2 under top table of parameter estimates: “selection” instead of “selected”. The sentence should therefore read: “to the selection corrected Poisson”.
CH 10, Pages 213-225, panel models using the progabide data. It is preferable to use i(id) and t(t) options rather than what are used. Both make sense, but using the above two options is more sensible and results in a model that is easier to interpret.
CH 10, Page 226, Section 10.5, line 1: “also” should read “sometimes”. Line should read: “Multilevel models are sometimes called hierarchical models, particularly …” In fact, for clarification purposes, I would rather add an additional sentence after the first. The first three sentences of the paragraph should then read: “Multilevel models are sometimes called hierarchical models, particularly in educational and social science research. However, the majority of statisticians now tend to draw a distinction between multilevel and hierarchical models, primarily because of the manner in which the methods define order of levels, or nesting. Regardless, the idea behind multilevel modeling is to model…”
NOTE: Chapter 10 has been rewritten and can be downloaded from: http://www.statistics.com/other/hilbe/index.php Many of the comments made above and below related to chapter 10 are accommodated in the new Chapter. See comments on the new chapter at the end of this document.
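The corrected figures listed in the Errata for pages 153 and 154 can be checked by hand. The sketch below is in Python rather than Stata and is only an illustration: it uses the two log-likelihood values quoted in the correction and relies on the identity that, for 1 degree of freedom, the upper-tail chi-square probability equals erfc(sqrt(x/2)). The variable names are hypothetical and do not come from the text.

```python
import math

# Log-likelihood values as quoted in the erratum for page 153.
# The names ll_first and ll_second are illustrative only.
ll_first = -60322.021
ll_second = -60258.97

# Likelihood ratio statistic: -2{(first) - (second)}
lr_stat = -2.0 * (ll_first - ll_second)


def chi2_tail_1df(x):
    """Upper-tail probability of a chi-square variate with 1 df.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)), valid for 1 df only.
    """
    return math.erfc(math.sqrt(x / 2.0))


p_value = chi2_tail_1df(lr_stat)

print(f"LR statistic = {lr_stat:.3f}")
print(f"upper-tail probability (1 df) = {p_value:.3e}")
```

The statistic reproduces the 126.102 given in the correction, and the tail probability is of the same order as the 2.921e-29 reported for Chi2tail(1,126.102).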
COMMENTS ON DISCUSSION
Page 27: AIC and BIC statistics. First note that text between /* and */ is a comment, and is not processed by the algorithm. The problem here is the last term, η. It should read, n, not η. n represents the number of observations in the model; η is the link function, which is also the linear predictor. The book also implies in various places that the BIC statistic can only be defined using the deviance function, not the log-likelihood. I don’t believe that I actually stated this, but I think it is implied. Of course, this is not the case; the BIC can be defined in terms of the log-likelihood: BIC = -2(LL – k*ln(k))/n, where k is the number of model predictors and n the number of model observations. LL is the log-likelihood function.
The model having the lowest value for its BIC statistic is the preferred model, fitting better than the others. The degree of model preference is based on the absolute difference between the BIC statistics of two models. A table of preferences can be shown as:
|difference|    Degree of preference
------------------------------------
0 - 2           Weak
2 - 6           Positive
6 - 10          Strong
> 10            Very Strong
Models A and B:
If BICA – BICB < 0, then A preferred
If BICA – BICB > 0, then B preferred
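The decision rule in the table above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cut points are taken to be 2, 6, and 10, a difference falling exactly on a cut point is assigned to the lower category, and the function name and the BIC values in the usage lines are hypothetical.

```python
def bic_preference(bic_a, bic_b):
    """Return (preferred model, degree of preference) for models A and B.

    Assumed cut points for |BIC_A - BIC_B|: 0-2 Weak, 2-6 Positive,
    6-10 Strong, >10 Very Strong. Assigning a boundary value to the
    lower category is an assumption made for this sketch.
    """
    diff = bic_a - bic_b
    size = abs(diff)

    if size <= 2:
        degree = "Weak"
    elif size <= 6:
        degree = "Positive"
    elif size <= 10:
        degree = "Strong"
    else:
        degree = "Very Strong"

    # The model with the lower BIC is preferred; equal values give no preference.
    if diff < 0:
        preferred = "A"
    elif diff > 0:
        preferred = "B"
    else:
        preferred = None

    return preferred, degree


# Hypothetical BIC values, for illustration only.
print(bic_preference(1250.4, 1257.9))
print(bic_preference(980.0, 979.2))
```

Both BIC values passed to the function must be computed with the same definition, whether deviance-based or log-likelihood-based, for the difference to be meaningful.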
It is important to recognize that the AIC statistic appears with and without the denominator, n. Both forms are common, so care must be taken when comparing models to make certain that the same definition has been used in all cases you are comparing. As with the BIC, the model having the lowest AIC is preferred over others.
Page 28: Section 2.2, opening paragraph. This paragraph may seem confusing, and I offer a re-write below this paragraph. The paragraph as it exists in the book implies that we have not yet addressed Fisher scoring, or the IRLS method of estimation – that it will follow the forthcoming discussion of Newton-Raphson type methods. However, the reader will know that we just finished talking about IRLS methods, detailing the theoretical justification as well as the algorithm. I wrote this section a year and a half ago, and do not recall the exact rationale of the wording. However, I believe that I intended to first discuss N-R methods, then IRLS. I later changed it for pedagogical purposes, but failed to make the change to this paragraph. On the other hand, we will discuss GLMs employing the observed information matrix in section 2.2.2, directly following the derivation of the generalized Newton-Raphson type algorithm. Therefore, there is some (only some) truth to what is implied in this paragraph with respect to order of discussion, but it does need a revision as expressed below. My apologies to the reader.
NEW PARAGRAPH
In this section we discuss the derivation of the Newton-Raphson type algorithm. Until recently, the only method used to estimate the standard negative binomial model was maximum likelihood estimation using a Newton-Raphson based algorithm. All other varieties of negative binomial are still estimated using a Newton-Raphson based routine. We shall observe in this section though, that the Iteratively Re-weighted Least Squares method we discussed in the last section, known as Fisher Scoring, is a subset of the Newton-Raphson method. We conclude by showing how the parameterization of the GLM mean, µ, can be converted to x’β.
Page 38, Question 7: The AIC and BIC statistics are defined on page 27, as part of Table 2.1. There are some types of models that do not produce a log-likelihood function, and therefore not a deviance function. Quasi-likelihood models do not produce a viable log-likelihood function that can be used in an AIC statistic. Some software uses a deviance statistic for the basis of IRLS modeling, but does not use or define the log-likelihood. In these cases an AIC statistic is not produced. Likewise for a model that uses a log-likelihood function, but has no defined deviance. If the software employs the BIC statistic requiring a deviance, then it does not display a BIC statistic. The user can usually calculate it for themselves.
Pages 85-90: Graphs. The lack of color in the book makes it difficult to determine which line is associated with a specific mean and/or alpha. For Figures 5.1 – 5.6, which show different means for the same value of alpha (α), the values of the mean from top to bottom at count 0 are: [0.5, 1, 2, 5, 10]. For Figures 5.7 – 5.11, the values of alpha for a specific mean, from top to bottom at count 0, are: [3, 1.5, 1, .67, .33, 0]. The BOOK COVER is the same as Figure 5.10.
Page 126: line 3 of text (under display of table). Sentence beginning with “Age” is missing the word “the”. It should read, “Age and education are not contributory to the model”. This is not really an error, just better grammar.
Page 163: Wald statistic in bottom output (header): The Wald chi2(2) is now reported as a Likelihood Ratio test (LR test) in Stata’s ztnb command. The output was created in a program by the same name that I wrote before Stata offered the command. It was published on the SSC site, and Stata Corp used it as the basis of the new command, with an LR test rather than the Wald test, which came automatically with all ML estimations.
Page 174: Bottom of page: Pursuant to the amendment I made to the text, as shown above under Errata, the following addition to the correction could be made for clarification: “(2) 1- [1/(1+exp(-xβ))], the prediction that y==0, and …”. Also recall that for the final formula of the paragraph, exp(xβ) can be substituted for µ.
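The two logistic terms in the Page 174 note can be illustrated numerically. The following Python sketch is only an illustration: the value of the linear predictor is hypothetical, the function name is not from the text, and only terms (1) and (2) are shown, together with the substitution of exp(xβ) for µ mentioned in the note.

```python
import math


def inverse_logit(xb):
    """Logistic inverse link: 1/(1+exp(-xb))."""
    return 1.0 / (1.0 + math.exp(-xb))


# Hypothetical value of the linear predictor x*beta, for illustration only.
xb = 0.5

p_one = inverse_logit(xb)   # term (1): the prediction that y==1
p_zero = 1.0 - p_one        # term (2): 1 - [1/(1+exp(-xb))], the prediction that y==0
mu = math.exp(xb)           # exp(xb) substituted for mu under a log link

print(f"P(y==1) = {p_one:.4f}")
print(f"P(y==0) = {p_zero:.4f}")
print(f"mu      = {mu:.4f}")
```

The two probabilities sum to one for any value of the linear predictor, which is a quick check that terms (1) and (2) are written consistently in terms of xβ.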
CH 10: GEE models using progabide data
The version of the Stata software used for developing GEE models using the progabide data did not recognize that the stationary, nonstationary, and unstructured correlation structures are infeasible for these data; i.e. the algorithm did not recognize that the correlation structure was not positive definite. The correlation matrices for these specific models are not reliable. The software has since been amended to display an error message when the matrix is infeasible, and will not produce a table of estimates or a postestimation correlation matrix. Stata versions 9.2 and higher work as they should; I am not certain at which lower version the xtgee command was amended. I suggest that the reader ignore the inherent unreliability of these specific model results, reading the text and its interpretation as if appropriate convergence was achieved – as it appears in the output. The pedagogical value of the discussion is nevertheless valid. The correlation values produced, together with parameter estimates and standard errors, appear to be reasonable, and can be used with value in demonstrating how the models are to be estimated and evaluated.
The “stationary 4” correlation structure is feasible for these data, unlike other stationary correlation structures. It can be developed using the command:
. xtgee seizures time timeXprog, fam(poi) i(id) t(t) corr(stat 4) force
It is important to use the force option when time intervals are not all equal. If in fact the time intervals are equal, the force option will have no effect.
Note on new Chapter 10, GEE models: Justine Shults at the University of Pennsylvania School of Medicine (Biostatistics) and her colleagues have built on previous work done by N.R. Chaganty to construct an iterative adjustment to the underlying GEE algorithm which guarantees, for selected correlation structures, a consistent estimate of the correlation parameter and a positive definite estimated correlation matrix. The method is called Quasi-Least Squares (QLS) and has particular use when the corresponding GEE model is misspecified. In such cases the model typically fails to converge. Currently the only correlation structures developed for QLS are the exchangeable, first order autoregressive (AR 1), first order stationary (which Shults calls tri-diagonal), and Markov, which is not in any current commercial GEE application.
I recommend reading J. Shults, S.J. Ratcliffe, and M. Leonard (2007), “Improved generalized estimating equation analysis via xtqls for quasi-least squares in Stata”, The Stata Journal, Vol 7:2, pp. 147-166. In the same issue, James Cui has an article titled “QIC program and model selection in GEE analysis”, pp. 209-220. Both should be of interest to those modeling longitudinal and otherwise correlated data using GEE methodology.