Exercises for Chapter 9 of Vinod's "HANDS-ON INTERMEDIATE ECONOMETRICS USING R"

H. D. Vinod
Professor of Economics, Fordham University, Bronx, New York 10458

Abstract

These are exercises to accompany the above-mentioned book, available at http://www.worldscibooks.com/economics/6895.html. At this time, all of the following exercises are suggested by H. D. Vinod (HDV) himself. Vinod invites readers to suggest new and challenging exercises (along with full and detailed answers) dealing with the discussion in Chapter 9 and related discussion from the econometric literature by sending e-mail to [email protected]. If the answers involve R programs, they must work. Readers may also suggest improvements to the answers and/or hints for existing exercises. If we include exercises or improvements suggested by readers, we promise to credit such readers by name and will attach the readers' initials to the individual exercises. Some R outputs are suppressed for brevity.
9 Exercises Mostly Based on Chapter 9 (Bootstraps) of the text
The bootstrap is a computer-intensive method for statistical inference. One needs basic knowledge of such inference (confidence intervals, p-values, critical values, type I and type II errors, etc.) for a proper understanding of the bootstrap.
9.1 Exercise (Elementary Theory of p-Values)
Show that computation of the sample mean can be viewed as a regression on a column of ones. Assuming we are interested in sample means, what are p-values and how does one compute them in R?

1) How likely is it to see a sample mean of 2.84 (or something further below the mean) if the true mean is µ = 3.0, given that the sample size is n = 100 and σ = 0.8?

2) Find the rejection region for a one-tail test when the left-tail area is 0.10.

3) Suppose a sample is taken with the following results: n = 64, x̄ = 53.1, and σ = 10. Find the test statistic, the p-value, and your conclusion for the test of the null hypothesis µ = 52 against the alternative hypothesis µ ≠ 52.

ANSWER: Consider an artificial example.

set.seed(342); y=sample(20:40); mean(y)
reg=lm(y~1); reg

The sample mean is 30 and it is also the intercept, as seen below.

> mean(y)
[1] 30
> reg=lm(y~1); reg

Call:
lm(formula = y ~ 1)

Coefficients:
(Intercept)
         30
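The claim in the exercise can also be verified from the matrix form of ordinary least squares: with X a column of ones, the OLS formula (X'X)^{-1}X'y reduces to (1/n)Σy. A short check of our own (the names n, X, bhat are not from the text), using the y just generated:

n=length(y)
X=matrix(1, nrow=n, ncol=1)          # the lone "regressor" is a column of ones
bhat=solve(t(X)%*%X)%*%t(X)%*%y      # (X'X)^{-1} X'y = sum(y)/n
all.equal(as.numeric(bhat), mean(y)) # TRUE: the OLS intercept equals the mean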
Now we remind the reader of some elementary theory important for this chapter on the bootstrap. The p-value approach to hypothesis testing is a somewhat recent approach which refines the older approaches based on confidence intervals and critical values. The steps in the p-value approach are:

1. Convert the sample statistic (e.g., x̄) to a test statistic (a z or t statistic).
2. Obtain the p-value from a table or computer.
3. Compare the p-value with α: if p-value < α, reject H0; if p-value ≥ α, do not reject H0.

By definition, the p-value is the "probability of obtaining a test statistic more extreme (≤ or ≥) than the observed sample statistic value, computed after assuming that the given H0 is true." The p-value is also called the "observed level of significance" and "the smallest value of α for which H0 can be rejected."

ANSWER to 1): How likely is it to see a sample mean of 2.84 (or something further below the mean) if the true mean is µ = 3.0, with n = 100 and σ = 0.8? The test statistic is stc = (2.84 − 3)/SE, where SE = 0.8/sqrt(100) = 0.08, so stc = −2.

R has a function called 'pnorm' which computes the area under the standard normal distribution z ∼ N(0,1). If we give 'pnorm' any z value, the function computes the entire area under the normal curve from −∞ to that z. That is, pnorm computes the cumulative distribution function of z. We use the standard normal density since the sample size is ≥ 30 and the population standard deviation σ is known. In general, unless the variance is unknown or the sample size is too small for the central limit theorem to kick in, we do not use the t distribution. For example:

normal.left.tail.tillMinus2=pnorm(-2)
normal.left.tail.till.0=pnorm(0)
rbind(normal.left.tail.tillMinus2, normal.left.tail.till.0)
                                  [,1]
normal.left.tail.tillMinus2 0.02275013
normal.left.tail.till.0     0.50000000

Thus the answer to the first question is a p-value of 0.0228, rounded to 4 places after the decimal point. Since the p-value 0.0228 < 0.05 = α, we reject the null hypothesis that µ = 3.

ANSWER to 2): The rejection region is found from the dividing line between "accept" and "reject." Given the tail area, that is, the probability that the z random variable falls inside the tail, we want the z value which defines the dividing line. The R command 'qnorm(0.10)' finds this quantile of the normal density as −1.28. (Note that 'qnorm(0.10, lower.tail=FALSE)' would instead return +1.28, the cutoff for an upper tail of area 0.10.) Thus the rejection region is to the left of −1.28.
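As a quick numerical check on answer 2), recall that qnorm is the inverse of pnorm, so the cutoff can be confirmed either way (a small verification of our own, not from the text):

qnorm(0.10)                    # -1.281552, the left-tail cutoff for answer 2)
pnorm(qnorm(0.10))             # 0.10, confirming the inversion
-qnorm(0.10, lower.tail=FALSE) # the same cutoff obtained via the upper tail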
ANSWER to 3): Recall that n = 64, x̄ = 53.1, σ = 10, and the null hypothesis is µ = 52. The test statistic is generally obtained by subtracting the hypothesized value from the observed sample value and dividing by the standard error (SE): stc = (53.1 − 52)/SE, where SE = 10/sqrt(64) = 1.25, hence stc = 0.88.

Now we compute the p-value for a two-sided test. Given the test statistic (z value), finding the p-value means finding a probability or area under the N(0,1) curve. The R command pnorm(0.88, lower.tail=FALSE) gives the area above the z value 0.88. Issuing this command shows that the one-tail area is 0.1894. Since this is a two-tail test, we must double the just-computed one-tail area to get the p-value. Thus the p-value = 2 × 0.1894297 = 0.3788594. Since the p-value > 0.05 = α, we conclude that the null is not rejected (is accepted) by the evidence. For a philosophical discussion of why we do not say "accept," see http://en.wikipedia.org/wiki/P-value.

R also comes with some ready-made functions for doing elementary tests in the package 'PASWR' (aligned with the book "Probability and Statistics with R," Arnholt (2008)). Of interest here are the functions 'zsum.test' and 'tsum.test' for situations when summary statistics are input by the user. Readers weak in probability and statistics are encouraged to read the book cited in the package and try various examples in the package.
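For instance, answer 3) can be checked with 'zsum.test' (the argument names below follow the PASWR documentation; readers should verify them against their installed version):

library(PASWR)
zsum.test(mean.x=53.1, sigma.x=10, n.x=64, mu=52, alternative="two.sided")
# should report z = 0.88 and a two-sided p-value of about 0.3789,
# matching the hand computation above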
9.2 Exercise (Hedonic Price, Parametric Bootstrap)
Use data on hedonic prices (called 'Hedonic') of census tracts in Boston, available in the package 'Ecdat' Croissant (2006). Regress price on selected regressors. Let price be mv (median value of owner-occupied homes) and let the regressors be crim (crime rate), nox (annual average nitrogen oxide concentration in parts per hundred million), and rm (average number of rooms) as a control variable. Discuss the traditional inference to check whether the effect of crime is significantly negative. Now simulate with Student's t errors using the actual degrees of freedom.

ANSWER:

library(Ecdat); data(Hedonic); attach(Hedonic)
reg=lm(mv~crim+nox+rm)
confint(reg)

As one can see from the following table, the effect of crime and pollution is significantly negative, while the effect of room size is significantly positive, according to the traditional inference.
                 2.5 %     97.5 %
(Intercept)   9.153440   9.412582
crim         -0.018794  -0.013118
nox          -0.008209  -0.004633
rm            0.020519   0.025648

Use snippet R9.2.1 in the text Vinod (2008). Some hints follow:

df=summary(reg)$df[2]  # residual degrees of freedom
bigj=999               # number of bootstrap replications
thet=rep(NA,bigj)      # place to store theta (here, the crime coefficient)
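A minimal sketch completing these hints follows. It is not the book's snippet R9.2.1; it merely draws Student's t errors with the actual residual degrees of freedom around the fitted values, refits, and collects the crime coefficient (the names sig, fit, and ystar are ours):

sig=summary(reg)$sigma     # residual standard error of the original fit
fit=fitted(reg)            # fitted values of the original fit
set.seed(234)              # for replicability
for (j in 1:bigj) {
  ystar=fit + sig*rt(n=length(mv), df=df)  # simulate y with Student's t errors
  thet[j]=coef(lm(ystar~crim+nox+rm))[2]   # store the crime coefficient
}
quantile(thet, c(0.025, 0.975))  # bootstrap 95% interval for the crime effect

If the entire interval lies below zero, the parametric bootstrap agrees with the traditional inference that the effect of crime is significantly negative.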
9.3 Exercise (Hedonic Price, Nonparametric Bootstrap)
Use a nonparametric bootstrap for the hedonic price model of the earlier exercise. (Hint: modify snippet #R9.3.2 in the text for a nonparametric iid bootstrap; a sketch is given below.)
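A minimal sketch of one such bootstrap follows, under the assumption that iid resampling of the residuals (rather than of the data rows) is intended; it is not the book's snippet #R9.3.2:

library(Ecdat); data(Hedonic); attach(Hedonic)
reg=lm(mv~crim+nox+rm)
res=resid(reg); fit=fitted(reg)
bigj=999; thet=rep(NA,bigj)      # place to store the crime coefficients
set.seed(235)
for (j in 1:bigj) {
  ystar=fit + sample(res, replace=TRUE)    # iid resampling of residuals
  thet[j]=coef(lm(ystar~crim+nox+rm))[2]   # store the crime coefficient
}
quantile(thet, c(0.025, 0.975))  # compare with the parametric interval of 9.2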
9.4 Exercise (Hedonic Price, Kernel Nonlinear Regression Bootstrap)
Use nonparametric kernel regression for the hedonic price model of the earlier exercise. First check whether the average (amorphous partial derivative) effect of crime on house prices is negative under the kernel regression method. Now use the bootstrap by repeating this kernel method a large number of times (say 99, since the kernel method is time consuming). (Hint: modify snippet #R9.3.3 in the text for the nonlinear bootstrap; a sketch is given below.)

library(Ecdat); data(Hedonic); attach(Hedonic)
#reg=lm(mv~crim+nox+rm)
library(np)
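A minimal sketch of one way to proceed follows. It is not the book's snippet #R9.3.3; it assumes that np's npregbw/npreg functions accept the arguments shown, that gradients() extracts the estimated partial derivatives (with crim in the first column), and that reusing the original bandwidths across bootstrap replications is acceptable to save time:

bw=npregbw(mv~crim+nox+rm)           # data-driven bandwidth selection
kreg=npreg(bws=bw, gradients=TRUE)   # kernel regression with gradients
mean(gradients(kreg)[,1])            # amorphous partial derivative wrt crim
dat=Hedonic[,c("mv","crim","nox","rm")]
n=nrow(dat); bigj=99                 # only 99 replications: kernel fits are slow
thet=rep(NA,bigj)
set.seed(236)
for (j in 1:bigj) {
  dj=dat[sample(1:n, replace=TRUE),]         # resample rows with replacement
  kj=npreg(bws=bw, data=dj, gradients=TRUE)  # refit on the bootstrap sample
  thet[j]=mean(gradients(kj)[,1])            # APD of crime in this replicate
}
quantile(thet, c(0.025, 0.975))      # bootstrap interval for the crime APD

If this interval lies entirely below zero, the kernel bootstrap supports a negative average effect of crime on house prices.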