USING THE HFCS WITH STATA

HOUSEHOLD FINANCE AND CONSUMPTION SURVEY TECHNICAL SERIES VERSION 1.5 – JULY 2015

© European Central Bank, 2012
Address: Kaiserstrasse 29, 60311 Frankfurt am Main, Germany
Postal address: Postfach 16 03 19, 60066 Frankfurt am Main, Germany
Telephone: +49 69 1344 0
Internet: http://www.ecb.europa.eu
Fax: +49 69 1344 6000
All rights reserved. Reproduction for educational and non-commercial purposes is permitted provided that the source is acknowledged.

ECB Using the HFCS with Stata July 2015


1 INTRODUCTION

This document proposes a step-by-step approach to working with the Household Finance and Consumption Survey (HFCS) in Stata (version 11.1 or later), taking into account both the multiple imputation framework and the replicate weights, in order to obtain correctly calculated standard errors. The HFCS has several particularities that make it a rather complex data set […]

    […]
    local weight="aweight"
    * statistic by default - the median
    if "`statistic'"=="" local statistic="p50"
    /* calculate the statistic of interest with tabstat */
    tabstat `varlist' [`weight'`exp'] if `touse', s(`statistic') by(`over') save


    matrix _zz=r(Stat1)
    if _zz[1,1]==. matrix _zz=r(StatTotal)
    capture matrix drop _z
    /* construct the e(b) output and assign the correct colnames */
    local i=1
    local names
    while _zz[1,1] ~= . {
        if "`i'"=="1" matrix _z=_zz
        else matrix _z=_z,_zz
        local names="`names' `r(name`i')'"
        local ++i
        matrix _zz=r(Stat`i')
    }
    matrix colnames _z = `names'
    /* the sample used */
    gen one= `touse'
    ereturn post _z, esample(one)
    /* arguments required by svy and mi */
    ereturn local cmd medianize
    ereturn local title "Medianize `statistic'"
    quietly count if `touse'
    /* store the number of observations as a scalar, not as the text "r(N)" */
    ereturn scalar N = r(N)
    capture matrix drop _z _zz
    end

The following commands show the use of medianize:

    mi estimate: svy: medianize hb1701
    mi estimate: svy: medianize hb1701, over(sa0100)
    mi estimate: svy: medianize hb1701, over(sa0100) stat(p10)

(If errors occur, do not forget to add the vceok and esampvaryok options.)

7 USEFUL COMMANDS

The information specific to each household member (e.g. age, gender, education, personal income) is stored in the P files. To merge this information with the household-level […]

    […] +string(ra0010)
    drop id
    reshape wide r* p* f*, i(sa0100 sa0010) j(tmp) string

8 ADDITIONAL COMMANDS AND INSTRUCTIONS

8.1 FINE-TUNING THE CALCULATION OF THE VARIANCE
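To see what the bsn() option discussed in this subsection changes, here is a worked form of the bootstrap variance. It assumes (this is not stated in the document) that Stata uses its default replicate-weight formula, with deviations taken around the mean of the replicate estimates (no mse option) and with the bsn() value multiplying the sum of squared deviations. Here $\hat\theta_b$ is the estimate computed with replicate weight $b$, $\bar\theta$ is the mean of the $B = 1000$ replicate estimates, and $1.001001 \approx 1000/999$:

```latex
\hat V_{\text{default}} = \frac{1}{B}\sum_{b=1}^{B}\bigl(\hat\theta_b-\bar\theta\bigr)^2,
\qquad
\hat V_{\text{bsn}} = \frac{1.001001}{B}\sum_{b=1}^{B}\bigl(\hat\theta_b-\bar\theta\bigr)^2
\approx \frac{1}{B-1}\sum_{b=1}^{B}\bigl(\hat\theta_b-\bar\theta\bigr)^2
\quad\text{for } B = 1000
```

Under this assumption, the adjustment raises the variance by about 0.1% and the standard errors by a factor of $\sqrt{1.001001} \approx 1.0005$, i.e. about 0.05%.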

    svy, bsn(1.001001)

Adding this option to the svy command adjusts the denominator used in the bootstrap variance formula. By default, Stata divides by the number B of replicate weights, whereas the literature also uses B-1 (1.001001 is approximately 1,000/999). With 1,000 replicate weights, the difference is marginal.

8.2 SPEED IMPROVEMENTS

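Working with 60,000 households, 5 implicates and 1,000 replicate weights takes a considerable amount of time, since each statistic has to be estimated for every combination of implicate and replicate weight, i.e. 5 × 1,000 = 5,000 times. Optimizing the use of the mi routines is therefore helpful.

8.2.1 INSTRUCTIONS THAT CAN BE USED IN (ALMOST) ALL CASES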

    mi estimate, noupdate:...

For commands that do not modify the data, the noupdate option skips the automatic data-consistency check (mi update) and allows commands to run slightly faster, especially on large and complex data sets such as the HFCS (in the timings reported below, 117 instead of 119 seconds for the mean).

8.2.2 INSTRUCTIONS WHICH AFFECT THE RESULTS

The following instructions can be helpful during exploratory work, but need to be rolled back when preparing the final material.

    mi estimate, nimputations(2):...

This option uses only the first 2 implicates instead of all 5. The point estimates and the standard errors are therefore not correct.
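Both quantities change because mi estimate combines the implicates with Rubin's rules. With $M$ implicates, $\hat q_m$ the estimate from implicate $m$ and $\hat U_m$ its replicate-weight variance, the combined estimate $\bar q$ and its total variance $T$ are:

```latex
\bar q = \frac{1}{M}\sum_{m=1}^{M}\hat q_m,\qquad
\bar U = \frac{1}{M}\sum_{m=1}^{M}\hat U_m,\qquad
V_{\text{between}} = \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat q_m-\bar q\bigr)^2,\qquad
T = \bar U + \Bigl(1+\frac{1}{M}\Bigr)V_{\text{between}}
```

With nimputations(2), $\bar q$ averages only two implicates, and the between-imputation variance is estimated from two values and scaled by $1 + 1/2 = 1.5$ instead of $1 + 1/5 = 1.2$. Both $\bar q$ and $\sqrt{T}$ will therefore in general differ from the results obtained with all five implicates.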


    mi svyset [pw=hw0010], bsrweight(wr0001-wr0010) ///
        vce(bootstrap)

Restricting the set-up to the first 10 bootstrap replicate weights reduces the number of computations that need to be made. The point estimates remain correct, but the standard errors do not. Combining the two previous instructions reduces the number of estimations from 5,000 (5 implicates × 1,000 replicate weights) to 20 (2 × 10). The time needed to calculate the mean and the median of a variable is shown in the table below.

TABLE  TIME NEEDED TO RUN COMMANDS WITH MI AND SVY

                                          Time (in seconds)
Implicates  Bootstrap weights  noupdate   Mean   Median
5           1000               no         119    759
5           1000               yes        117    757
5           10                 yes        20     25
2           1000               yes        48     305
2           10                 yes        8      12
1           1000               -          21     154
1           10                 -          3      2

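Using a small number of replicate weights therefore seems preferable, since the time saved by using fewer implicates is limited: for the median, going from 1,000 to 10 replicate weights cuts the run time from 757 to 25 seconds, whereas going from 5 to 2 implicates only brings it down to 305 seconds. Moreover, the point estimates then remain correct, and only the standard errors need to be recomputed with the full set of replicate weights.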
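A minimal sketch of switching between the exploratory and the final set-up described in this section. It assumes that the implicates have already been mi-imported and that the full set of replicate weights is stored consecutively as wr0001 to wr1000; the name wr1000 is an assumption, since only wr0001-wr0010 appears above. hb1701 is the variable used in the medianize examples, and the built-in mean command is used here instead of medianize. Run it from a do-file, as the /// continuations are not valid interactively.

```stata
* Exploratory run: first 2 implicates, first 10 replicate weights.
* Both point estimates and standard errors are NOT valid for final results.
mi svyset [pw=hw0010], bsrweight(wr0001-wr0010) ///
    vce(bootstrap)
mi estimate, noupdate nimputations(2): svy: mean hb1701

* Final run: all 5 implicates and all 1,000 replicate weights.
* ASSUMPTION: the replicate weights are named wr0001 ... wr1000 and are
* stored next to each other, so that the range wr0001-wr1000 is valid.
mi svyset [pw=hw0010], bsrweight(wr0001-wr1000) ///
    vce(bootstrap)
mi estimate, noupdate: svy: mean hb1701
```

Only the mi svyset line changes between the two runs, so the exploratory settings are easy to roll back before producing the final material.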
