Stata & Time series

Oct 12, 2011
Center for Teaching, Research & Learning
Social Science Research Lab
American University, Washington, D.C.
http://www.american.edu/provost/ctrl/
202-885-3862

Stata is a general-purpose statistical software package. Its capabilities include data management, statistical analysis, graphics, simulations, and custom programming.

Course Objective
This course is designed to give a basic understanding of some of the features Stata offers for time series analysis. Time series data consist of observations on one or more variables recorded at successive points in time. For this tutorial we use the “Time series.dta” data set, which contains the following variables: a date variable (datevar), the unemployment rate (unemp), the consumer price index (cpi), the federal funds interest rate (interest), and GDP annual growth (gdp). “Time series.dta” contains quarterly observations from the first quarter of 1960 through the first quarter of 2005 (181 quarters).

Learning Outcomes
1. Opening the data set and data description
2. Declaring the data to be time series
3. Useful time series commands
4. Autocorrelation and cross-correlation analysis
5. Unit root test

1. Opening the data set and data description
We recommend that you create a log file before you start working in Stata; this way all of your computations are saved to a file that you can review afterwards. To do this, go to File > Log > Begin. The log file records all the commands you type, as well as all the output Stata produces. Alternatively, you can type in the Command window:

    log using "C:\Users\CTRL\Desktop\TSlog.log"

Opening the data file. For this tutorial, we will use Time series.dta, which can be downloaded from http://www.american.edu/provost/ctrl/trainingguides.cfm. In Stata 11 and earlier versions, you may need to set the memory size before you open a dataset. (Here this isn't necessary, because the example dataset is small and does not require much memory.) To tell Stata how much memory to set aside for data, type:

    set mem 100m

This command is not needed in Stata 12, which manages memory automatically. Once you have downloaded and unzipped the dataset, you can open it by going to File > Open. Alternatively, you can type:

    use "C:\Users\CTRL\Desktop\Time series.dta", clear

The clear option clears Stata's memory before loading, allowing you to open a new dataset.

To get a sense of what the data file contains, we can use two commands, summarize and describe; both provide useful information about the data set and its variables. The summarize command calculates and displays a variety of univariate summary statistics. If no variable list is specified, summary statistics are calculated for all the variables in the dataset. The describe command produces a summary of the dataset in memory or of the data stored in a Stata-format dataset.

Example using “Time series.dta”:

    summarize

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
           unemp |       181    5.914917    1.453928        3.4   10.66667
             cpi |       181    95.91184    54.13317   29.39667   192.1667
        interest |       181    6.167403      3.3706        .98       19.1
             gdp |       181    2.031231    2.001162  -1.703726   9.718504
         datevar |       181          90    52.39434          0        180
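The Std. Dev. column is the sample standard deviation (denominator n - 1). As a quick check, the minimal Python sketch below reproduces the datevar row. It assumes, consistent with the Min and Max shown, that datevar holds the consecutive integers 0 through 180, one per quarter; the summarize_row helper is hypothetical and not part of Stata.

```python
import statistics


def summarize_row(values):
    """Return (obs, mean, std. dev., min, max), mirroring one row of summarize.

    statistics.stdev uses the n - 1 (sample) denominator.
    """
    return (
        len(values),
        sum(values) / len(values),
        statistics.stdev(values),
        min(values),
        max(values),
    )


# Assumption: datevar holds the integers 0..180, one code per quarter.
datevar = list(range(181))
obs, mean, sd, minimum, maximum = summarize_row(datevar)
print(obs, mean, round(sd, 5), minimum, maximum)  # 181 90.0 52.39434 0 180
```

With the population denominator (n) instead, the standard deviation would come out smaller than the 52.39434 reported by summarize.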

    describe

    Contains data from C:\Users\CTRL\Desktop\Time series.dta
      obs:           181
     vars:             5                          12 Oct 2011 10:00
     size:         3,620
    ---------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    ---------------------------------------------------------------------
    unemp           float  %9.0g                  Unemployment Rate
    cpi             float  %9.0g                  Consumer Price Index
    interest        float  %9.0g                  Federal Funds Interest Rate
    gdp             float  %9.0g                  GDP annual growth
    datevar         float  %tq                    Date variable
    ---------------------------------------------------------------------
    Sorted by:  datevar
         Note:  dataset has changed since last saved

2. Declaring the data to be time series
Using the time variable datevar, we can declare the data to be a time series so that Stata's time-series operators can be used. The tsset command declares the data in memory to be a time series. Running tsset is what makes Stata's time-series operators, such as L. and F. (lag and lead), work, and you must tsset the data before using the other time-series commands. If you save the data after running tsset, Stata remembers that the data are a time series, and you will not have to tsset again.

Example using “Time series.dta”:

    tsset datevar
            time variable:  datevar, 1960q1 to 2005q1
                    delta:  1 quarter
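The %tq display format shown by describe means that datevar stores each quarter as an integer count of quarters since 1960q1 (1960q1 = 0), which is why summarize reports a minimum of 0 and a maximum of 180 for datevar. The Python sketch below mimics that encoding; the helper names are hypothetical, and inside Stata the yq() function converts a year and quarter to this code.

```python
def quarter_label(code):
    """Convert a %tq-style code (quarters since 1960q1) to a label like '2000q1'."""
    year, quarter = divmod(code, 4)  # floor division also handles codes before 1960
    return f"{1960 + year}q{quarter + 1}"


def quarter_code(year, quarter):
    """Inverse of quarter_label: the number of quarters since 1960q1."""
    return 4 * (year - 1960) + (quarter - 1)


print(quarter_label(0), quarter_label(180))  # 1960q1 2005q1
print(quarter_code(2000, 1))                 # 160
```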

3. Useful time series commands
In this section, we introduce a few basic but very helpful commands.

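The tin() function (times in: from time A to time B, including both endpoints):

    list datevar unemp if tin(2000q1, 2000q4)

         +-------------------+
         | datevar     unemp |
         |-------------------|
    161. |  2000q1  4.033333 |
    162. |  2000q2  3.933333 |
    163. |  2000q3         4 |
    164. |  2000q4       3.9 |
         +-------------------+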

The twithin() function (times within: between time A and time B, excluding both endpoints):

    list datevar unemp if twithin(2001q1, 2001q3)

         +-------------------+
         | datevar     unemp |
         |-------------------|
    166. |  2001q2       4.4 |
         +-------------------+
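Because %tq dates are integer quarter counts, tin() and twithin() amount to inclusive and exclusive range checks on datevar. The Python sketch below (hypothetical helpers, assuming the 0-to-180 quarter codes of this dataset) reproduces which quarters each listing selects. Observation numbers in the listings are one more than the quarter code because the first observation, 1960q1, has code 0.

```python
def quarter_code(year, quarter):
    """Quarters since 1960q1, the encoding behind %tq dates."""
    return 4 * (year - 1960) + (quarter - 1)


def tin(t, start, end):
    """True if t lies in [start, end]: both endpoints included."""
    return start <= t <= end


def twithin(t, start, end):
    """True if t lies strictly between start and end: endpoints excluded."""
    return start < t < end


codes = range(181)  # 1960q1 .. 2005q1
in_2000 = [t for t in codes if tin(t, quarter_code(2000, 1), quarter_code(2000, 4))]
within = [t for t in codes if twithin(t, quarter_code(2001, 1), quarter_code(2001, 3))]
print(in_2000)  # [160, 161, 162, 163]  -> observations 161-164
print(within)   # [165]                 -> observation 166 (2001q2)
```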

To generate values based on past observations, use the lag operator L.; for forward-looking values, use the lead operator F.:

    generate unempL1 = L1.unemp
    generate unempL2 = L2.unemp
    list datevar unemp unempL1 unempL2 in 1/5

         +------------------------------------------+
         | datevar      unemp    unempL1    unempL2 |
         |------------------------------------------|
      1. |  1960q1   5.133333          .          . |
      2. |  1960q2   5.233333   5.133333          . |
      3. |  1960q3   5.533333   5.233333   5.133333 |
      4. |  1960q4   6.266667   5.533333   5.233333 |
      5. |  1961q1        6.8   6.266667   5.533333 |
         +------------------------------------------+

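    generate unempF1 = F1.unemp
    generate unempF2 = F2.unemp
    list datevar unemp unempF1 unempF2 in 1/5

         +------------------------------------------+
         | datevar      unemp    unempF1    unempF2 |
         |------------------------------------------|
      1. |  1960q1   5.133333   5.233333   5.533333 |
      2. |  1960q2   5.233333   5.533333   6.266667 |
      3. |  1960q3   5.533333   6.266667        6.8 |
      4. |  1960q4   6.266667        6.8          7 |
      5. |  1961q1        6.8          7   6.766667 |
         +------------------------------------------+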
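The lag and lead listings can be reproduced with a minimal Python sketch that shifts a list by k positions and pads with None (playing the role of Stata's missing value "."). It shifts by list position, so it assumes the series has no gaps; Stata's L. and F. operators instead use the tsset time variable and return missing values across gaps. The function names are hypothetical, and the seven unemp values are taken from the listings above.

```python
def lag(series, k=1):
    """L(k).series: the value k periods earlier; None where it does not exist."""
    k = min(k, len(series))
    return [None] * k + series[: len(series) - k]


def lead(series, k=1):
    """F(k).series: the value k periods later; None where it does not exist."""
    k = min(k, len(series))
    return series[k:] + [None] * k


# First seven quarters of unemp (1960q1 to 1961q3), taken from the listings above.
unemp = [5.133333, 5.233333, 5.533333, 6.266667, 6.8, 7.0, 6.766667]

print(lag(unemp, 1)[:5])   # [None, 5.133333, 5.233333, 5.533333, 6.266667]
print(lag(unemp, 2)[:5])   # [None, None, 5.133333, 5.233333, 5.533333]
print(lead(unemp, 1)[:5])  # [5.233333, 5.533333, 6.266667, 6.8, 7.0]
print(lead(unemp, 2)[:5])  # [5.533333, 6.266667, 6.8, 7.0, 6.766667]
```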

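To generate the difference between current and previous values, use the D. operator. The transformations are as follows:

$$\mathrm{D1}.Y_t = Y_t - Y_{t-1}$$
$$\mathrm{D2}.Y_t = (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}) = Y_t - 2Y_{t-1} + Y_{t-2}$$

Note that D2 is the difference of the first difference, not $Y_t - Y_{t-2}$.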

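    generate unempD1 = D1.unemp
    generate unempD2 = D2.unemp
    list datevar unemp unempD1 unempD2 in 1/5

         +------------------------------------------+
         | datevar      unemp    unempD1    unempD2 |
         |------------------------------------------|
      1. |  1960q1   5.133333          .          . |
      2. |  1960q2   5.233333   .0999999          . |
      3. |  1960q3   5.533333   .3000002   .2000003 |
      4. |  1960q4   6.266667   .7333336   .4333334 |
      5. |  1961q1        6.8   .5333333  -.2000003 |
         +------------------------------------------+

Small discrepancies such as .0999999 instead of .1 arise because the variables are stored as float (single precision).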
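A minimal Python sketch of the difference operator, applying the first difference repeatedly so that D2 is the difference of D1, as in the formulas above. The diff helper is hypothetical; None stands in for Stata's missing value, and results are rounded to six decimals for display.

```python
def diff(series, order=1):
    """D(order).series: the first difference applied `order` times.

    None marks values that cannot be computed (the first `order` periods).
    """
    result = list(series)
    for _ in range(order):
        result = [None] + [
            b - a if a is not None and b is not None else None
            for a, b in zip(result, result[1:])
        ]
    return result


unemp = [5.133333, 5.233333, 5.533333, 6.266667, 6.8]  # 1960q1 to 1961q1

d1 = diff(unemp, 1)
d2 = diff(unemp, 2)
print([None if v is None else round(v, 6) for v in d1])  # [None, 0.1, 0.3, 0.733334, 0.533333]
print([None if v is None else round(v, 6) for v in d2])  # [None, None, 0.2, 0.433334, -0.200001]
```

The last D2 value can be checked against the expanded form: 6.8 - 2(6.266667) + 5.533333 = -0.200001.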

4. Autocorrelation and cross-correlation analysis
In this section, we show you how to explore autocorrelation and cross-correlation. Autocorrelation is the correlation between a variable and its own previous values; to examine it, use the ac and pac commands. To explore the relationship between two time series, use the xcorr command, making sure that you list the independent variable first and the dependent variable second.

ac produces a correlogram (a graph of autocorrelations) with pointwise confidence intervals that is based on Bartlett's formula for MA(q) processes. pac produces a partial correlogram (a graph of partial autocorrelations) with confidence intervals calculated using a standard error of 1/sqrt(n). The residual variances for each lag may optionally be included on the graph. xcorr plots the sample cross-correlation function.
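The quantities behind these graphs can be sketched in Python under stated assumptions: autocorrelations computed from the usual 1/n-scaled sample autocovariances, 95% bands taken as plus or minus 1.96 standard errors (Stata uses the exact normal quantile), and an alternating toy series rather than the tutorial data. The sketch computes the Bartlett bands used by ac and the 1/sqrt(n) band used by pac; it does not compute the partial autocorrelations themselves. All function names are hypothetical.

```python
import math


def autocorrelations(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag.

    r_k = c_k / c_0, where c_k = (1/n) * sum_t (x_t - mean)(x_{t+k} - mean).
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
        for k in range(1, max_lag + 1)
    ]


def bartlett_half_widths(r, n, z=1.96):
    """Half-widths of pointwise bands for r_1..r_K (Bartlett's formula for MA(q)).

    se(r_k) = sqrt((1 + 2 * (r_1^2 + ... + r_{k-1}^2)) / n)
    """
    widths, cumulative = [], 0.0
    for rk in r:
        widths.append(z * math.sqrt((1 + 2 * cumulative) / n))
        cumulative += rk * rk
    return widths


def pac_half_width(n, z=1.96):
    """Half-width of the pac band, using se = 1/sqrt(n)."""
    return z / math.sqrt(n)


# Toy series alternating between 1 and -1: strongly negative at lag 1.
x = [(-1) ** t for t in range(100)]
r = autocorrelations(x, 2)
print(r)  # [-0.99, 0.98]
print(bartlett_half_widths(r, len(x)))
print(pac_half_width(len(x)))
```

Because Bartlett's formula accumulates the squared earlier autocorrelations, the ac bands widen with the lag, whereas the pac band stays constant.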

Example using “Time series.dta”:

    ac unemp, lags(10)

[Figure: correlogram of unemp for lags 0 to 10 (x-axis: Lag; y-axis: -0.50 to 1.00), with 95% confidence bands from Bartlett's formula for MA(q)]

In this case, the autocorrelation graph indicates that unemployment is correlated with its own values up to eight quarters back.

    pac unemp, lags(10)

[Figure: partial correlogram of unemp for lags 0 to 10 (x-axis: Lag; y-axis: -1.00 to 1.00), with 95% confidence bands based on se = 1/sqrt(n)]

    xcorr gdp unemp

[Figure: cross-correlogram of gdp and unemp for lags -20 to 20 (x-axis: Lag; y-axis: -1.00 to 1.00)]

The graph above indicates that GDP growth is negatively correlated with unemployment at lags of roughly two to three quarters (six to nine months).
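To make the cross-correlogram concrete, the Python sketch below computes a sample cross-correlation on simulated data (not the tutorial data) in which y responds negatively to x two periods later, so the most negative value appears at lag 2. In this sketch, lag k pairs x_t with y_{t+k}; that sign convention is an assumption, and Stata's xcorr may label lags the other way, so check its documentation before reading the direction of a relationship off the graph. The function name is hypothetical.

```python
import math
import random


def cross_correlation(x, y, k):
    """Sample correlation between x_t and y_{t+k} (k may be negative).

    Full-sample means and standard deviations are used, with 1/n scaling.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    pairs = [(x[t], y[t + k]) for t in range(n) if 0 <= t + k < n]
    return sum((a - mx) * (b - my) for a, b in pairs) / n / (sx * sy)


# Simulated data: y_t = -x_{t-2}, so x leads y by two periods with a negative sign.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [rng.gauss(0, 1), rng.gauss(0, 1)] + [-v for v in x[:-2]]

ccf = {k: cross_correlation(x, y, k) for k in range(-5, 6)}
strongest = min(ccf, key=ccf.get)  # lag with the most negative cross-correlation
print(strongest)  # 2
```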

5. Unit root test
In this section, we demonstrate how to test whether a series has a unit root. When working with time series data, it is important to check for unit roots. A series with a unit root is nonstationary: it follows a stochastic trend, so shocks to the series have permanent effects rather than dying out, and regressions involving such series can produce spurious results. Let's plot unemployment over time and then test it for a unit root.
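To see why a unit root matters, consider the first-order autoregression y_t = rho * y_{t-1} + e_t, which has a unit root when rho = 1. The Python sketch below (a hypothetical helper, using this simple model as an illustrative assumption) traces the effect of a single unit shock at t = 0: with rho = 1 the shock never dies out, while with rho = 0.5 it fades geometrically.

```python
def impulse_response(rho, periods):
    """Effect of a one-time unit shock at t = 0 on y_t = rho * y_{t-1} + e_t."""
    y, path = 0.0, []
    for t in range(periods):
        shock = 1.0 if t == 0 else 0.0  # e_t is 1 at t = 0 and 0 afterwards
        y = rho * y + shock
        path.append(y)
    return path


print(impulse_response(1.0, 5))  # [1.0, 1.0, 1.0, 1.0, 1.0]  unit root: permanent effect
print(impulse_response(0.5, 5))  # [1.0, 0.5, 0.25, 0.125, 0.0625]  stationary: effect fades
```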

    line unemp datevar

[Figure: line plot of Unemployment Rate (y-axis, 4 to 12) against Date variable (x-axis, 1960q1 to 2005q1)]

To test for a unit root, we can use the augmented Dickey-Fuller test, which checks for a stochastic trend, with the following command:

    dfuller unemp, lags(5)

    Augmented Dickey-Fuller test for unit root         Number of obs   =       175

                                   ---------- Interpolated Dickey-Fuller ---------
                      Test         1% Critical       5% Critical      10% Critical
                   Statistic           Value             Value             Value
    ------------------------------------------------------------------------------
     Z(t)             -2.481            -3.485            -2.885            -2.575
    ------------------------------------------------------------------------------
    MacKinnon approximate p-value for Z(t) = 0.1201

In this case, the null hypothesis is that unemployment has a unit root. The test statistic, Z(t) = -2.481, is not more negative than any of the critical values (-3.485 at 1%, -2.885 at 5%, -2.575 at 10%), and the MacKinnon approximate p-value is 0.1201, so we cannot reject the null hypothesis: the data are consistent with unemployment having a unit root. When we test the first difference of unemployment (unempD1, created earlier with the D1. operator), we find that it does not have a unit root:

    dfuller unempD1, lags(5)

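    Augmented Dickey-Fuller test for unit root         Number of obs   =       174

                                   ---------- Interpolated Dickey-Fuller ---------
                      Test         1% Critical       5% Critical      10% Critical
                   Statistic           Value             Value             Value
    ------------------------------------------------------------------------------
     Z(t)             -4.593            -3.485            -2.885            -2.575
    ------------------------------------------------------------------------------
    MacKinnon approximate p-value for Z(t) = 0.0001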

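Here the test statistic (-4.593) is more negative than the 1% critical value (-3.485), and the p-value is 0.0001, so we reject the null hypothesis of a unit root for the first difference. Because unemployment has a unit root in levels but not in first differences, it appears to be integrated of order one, I(1).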
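For readers who want to see what dfuller estimates, here is a minimal Python sketch (standard library only; the function names are hypothetical) of the augmented Dickey-Fuller regression with a constant and no trend. It regresses the first difference on the lagged level, a constant, and the chosen number of lagged differences; Z(t) is the t ratio on the lagged level. Each lag costs one observation, on top of the one lost to differencing, which is why 181 quarters with lags(5) leave 175 observations. The sketch does not compute critical values or the MacKinnon p-value: Z(t) follows the Dickey-Fuller distribution, not the standard normal, and those values come from its tables. The series below are simulated, not the tutorial data.

```python
import math
import random


def ols(X, y):
    """Least squares via the normal equations; returns (coefficients, std. errors)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination with partial pivoting.
    a = [xtx[i] + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(a[i][c]))
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for i in range(k):
            if i != c:
                f = a[i][c]
                a[i] = [vi - f * vc for vi, vc in zip(a[i], a[c])]
    inv = [row[k:] for row in a]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se = [math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return beta, se


def adf_statistic(y, lags):
    """Augmented Dickey-Fuller Z(t) with a constant (no trend).

    Regresses D.y_t on y_{t-1}, a constant, and D.y_{t-1}, ..., D.y_{t-lags}.
    Returns (Z(t), number of observations used).
    """
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # dy[t - 1] holds D.y_t
    X, target = [], []
    for t in range(lags + 1, len(y)):
        X.append([y[t - 1], 1.0] + [dy[t - 1 - i] for i in range(1, lags + 1)])
        target.append(dy[t - 1])
    beta, se = ols(X, target)
    return beta[0] / se[0], len(target)


rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(181)]  # stationary: no unit root
walk = [0.0]
for e in noise[1:]:
    walk.append(walk[-1] + e)                  # y_t = y_{t-1} + e_t: a unit root

print(adf_statistic(noise, 5))  # expect a strongly negative Z(t), with 175 obs
print(adf_statistic(walk, 5))   # a unit root: Z(t) is usually not significant
```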