STATA Tutorial - Pearson

STATA Tutorial to Accompany

Introduction to Econometrics by James H. Stock and Mark W. Watson

Copyright © 2003 Pearson Education Inc.

Adopters of Stock/Watson, Introduction to Econometrics, may modify the information in this tutorial exclusively for the purpose of their classes.

STATA is a powerful statistical and econometric software package. This tutorial is an introduction to some of the most commonly used features in STATA. These features were used for the statistical analysis reported in Chapters 3-11 of Introduction to Econometrics. This tutorial provides the necessary background to reproduce the results in Chapters 3-11 and to carry out related exercises.

1. GETTING STARTED

STATA is available for computer systems such as Windows, Mac, and Unix. Your professor will tell you how to access STATA at your university. Most of you will be using STATA on Windows computers. To access STATA on these computers, you need to double-click on the STATA icon. This icon will probably be labeled Intercooled Stata X, where X is a number indicating the version of STATA available on your system. (All of the results in Chapters 3-11 of Introduction to Econometrics were computed using Intercooled STATA 7.) Once you have started STATA, you will see a large window containing several smaller windows. At this point you can load the dataset and begin the statistical analysis.

STATA can be operated “interactively” or in “batch mode.” When you use STATA interactively, you type a STATA command in the STATA Command window and hit the Return/Enter key on your keyboard. STATA executes the command and the results are displayed in the STATA Results window. Then you enter the next command, STATA executes it, and so forth, until the analysis is complete. Even the simplest statistical analysis will involve several STATA commands.

When STATA is used in batch mode, all of the commands for the analysis are listed in a file, and STATA is told to read the file and execute all of the commands. These files are called do files by STATA and are saved using a .do suffix. For example, all of the STATA commands for the analysis in Chapter 4 of the text are contained in a file called ch4.do available on the Web site. When STATA executes this file, all of the empirical results for Chapter 4 are produced.

Using STATA in batch mode has two important advantages over using STATA interactively. First, the do file provides an audit trail for your work. The file provides an exact record of each STATA command. Second, even the best computer programmers will make typing or other errors when using STATA. When a command contains an error, it won’t be executed by STATA, or worse, it will be executed but produce the wrong result. Following an error, it is often necessary to start the analysis from the beginning. If you are using STATA interactively, you must retype all of the commands. If you are using a do file, then you only need to correct the command containing the error and rerun the file. For these reasons, you are strongly encouraged to use do files, and this tutorial will discuss STATA used in this way.

2. DO-FILE INGREDIENTS AND KEY COMMANDS

A STATA do file has four different kinds of commands or ingredients:

1. Administrative commands that tell STATA where to save results, how to manage computer memory, and so forth
2. Commands that tell STATA to read and manage datasets
3. Commands that tell STATA to modify existing variables or to create new variables
4. Commands that tell STATA to carry out the statistical analysis

There are many commands within each category. A list of some of the most useful commands is given at the end of this tutorial. This section and the next will work through five examples of do files. After you have worked through these examples, you will have a good basic understanding of STATA.

To begin, you need to download the do files. These files are called stata1.do, stata2.do, … , stata5.do and are available on the Web site. Create a folder (or directory) on your computer called statafiles and download the do files to this folder. You also need to download the file caschool.dta to this folder.

The do files are “text” or “ascii” files. They can be read with any text editor. For example, if you are using Windows, you can use Notepad to edit these files. If you use a word processor such as MS Word, make sure that you save the files using the Save as type “Text Only” option, and save them using the suffix .do.

The file stata1.do contains seven lines of text:

    log using \statafiles\stata1.log, replace
    use \statafiles\caschool.dta
    describe
    generate income = avginc*1000
    summarize income
    log close
    exit

The first line is an administrative command that tells STATA where to write the results of the analysis. STATA output files are called log files, and the first line tells STATA to open a log file called stata1.log in the folder (or directory) statafiles. If there is already a file with the same name in the folder, STATA is instructed to replace it.

The second and third lines concern the dataset. Datasets in STATA are called dta files. The dataset used in this example is caschool.dta, which you downloaded to the folder statafiles. The second line tells STATA the location and name of the dataset to be used for the analysis. The third line tells STATA to “describe” the dataset. This command produces a list of the variable names and any variable descriptions stored in the dataset.

The fourth line tells STATA to create a new variable called income. The new variable is constructed by multiplying the variable avginc by 1000. The variable avginc is contained in the dataset and is the average household income in a school district expressed in thousands of dollars. The new variable income will be the average household income expressed in dollars instead of thousands of dollars.

The fifth line of the program tells STATA to compute some summary statistics (mean, standard deviation, and so forth) for income. The STATA command for this is summarize.
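A quick way to check the unit conversion described above is to summarize the original and the rescaled variable together. This is only a sketch: it assumes caschool.dta is already in memory and that income has been created as in stata1.do, and it is not part of the do files distributed with the text.

```stata
* Sketch only: assumes caschool.dta is loaded and income already exists
* (created by: generate income = avginc*1000).

* Summarize both variables in one table; the mean of income should be
* 1000 times the mean of avginc.
summarize avginc income

* The detail option adds percentiles, skewness, and kurtosis.
summarize income, detail
```

Listing several variables after summarize produces one row per variable, which makes it easy to compare the two scales side by side.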

The last two lines end the program. The command log close closes the file stata1.log that contains the output. The command exit tells STATA that the program has ended.

To execute this do file, first open STATA on your computer. (Double-click on the Intercooled STATA icon on a Windows computer. Follow your professor’s instructions for opening STATA on other computers.) Click on the File menu, then Do, and then select the file \statafiles\stata1.do. This will run the do file. You can see the program being executed in the STATA Results window. You might see that the program execution pauses and that –– more –– is displayed at the bottom of the Results window. If this happens, push any key on the keyboard and execution will continue. (We will get rid of this annoyance with a STATA command in the next example.)

You can exit STATA by clicking on the File menu and then Exit. STATA will ask you if you really want to exit. Respond “Yes”. Your output will be in \statafiles\stata1.log. Open this file using a text editor, and you will see the statistical output.

This example has used seven STATA commands:

    log using
    use
    describe
    generate
    summarize
    log close
    exit

These commands are summarized at the end of this tutorial. The next section introduces several more STATA commands in four other do files. After you work through these examples, you will understand the elements of STATA.
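As an alternative to the File menu, a do file can also be run by typing the do command in the Command window. The sketch below assumes the files were downloaded to the statafiles folder used in this tutorial; adjust the path if your folder is located elsewhere.

```stata
* Sketch only: run a do file from the Command window instead of the File menu.
* The path assumes the statafiles folder described in this tutorial.
do \statafiles\stata1.do

* Equivalent approach: make statafiles the working directory first,
* then refer to the do file by name only.
cd \statafiles
do stata1.do
```

Either form executes the same commands in the same order as the menu route, and the log file is written to the location named in the log using line.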

3. MORE EXAMPLES

stata2.do

Here are the commands in stata2.do:

    # delimit ;
    *************************************************;
    * Administrative Commands;
    *************************************************;
    set more off;
    clear;
    log using \statafiles\stata2.log,replace;
    *************************************************;
    * Read in the Dataset;
    *************************************************;
    use \statafiles\caschool.dta;
    describe;
    *************************************************;
    * Transform data and Create New Variables;
    *************************************************;
    **** Construct average district income in $'s;
    generate income = avginc*1000;
    *************************************************;
    * Carry Out Statistical Analysis;
    *************************************************;
    ***** Summary Statistics for Income;
    summarize
      income;
    *************************************************;
    * End the Program ;
    *************************************************;
    log close;
    exit;

The file stata2.do carries out exactly the same calculations as stata1.do; however, it uses four features of STATA for more complicated analyses.

The first new command is # delimit ;. This command tells STATA that each STATA command ends with a semicolon. If STATA does not see a semicolon at the end of the line, then it assumes that the command carries over to the following line. This is useful because complicated commands in STATA are often too long to fit on a single line. stata2.do contains an example of a STATA command written on two lines: near the bottom of the file you see the command summarize income written on two lines. STATA combines these two lines into one command because the first line does not end with a semicolon. While two lines are not necessary for this command, some STATA commands can get long, so it is good to get used to using this feature. A word of warning: if you use the # delimit ; command, it is critical that you end each command with a semicolon. Forgetting the semicolon on even a single line means that the do file will not run properly.

The second new feature in stata2.do is that many of the lines begin with an asterisk. STATA ignores the text that comes after *, so that these lines can be used to describe what the do file is doing. Notice that each of these lines ends with a semicolon. If the semicolon had not been included, then STATA would have included the next line as part of the text description.

The third new feature in the file is the command set more off. This command eliminates the need to type a key whenever STATA fills the Results window, so that –– more –– will no longer be displayed in the Results window.

Finally, the last new command is clear. This command erases any data already in STATA’s memory. It is a good idea to use the clear command before starting a new analysis.

stata3.do

Here are the commands in stata3.do:

    # delimit ;
    *************************************************;
    * Administrative Commands;
    *************************************************;
    set more off;
    clear;
    log using \statafiles\stata3.log,replace;
    *************************************************;
    * Read in the Dataset;
    *************************************************;
    use \statafiles\caschool.dta;
    describe;
    *************************************************;
    * Transform data and Create New Variables;
    *************************************************;
    generate testscr_lo = testscr if (str<20);
    generate testscr_hi = testscr if (str>=20);
    *************************************************;
    * Carry Out Statistical Analysis;
    *************************************************;
    * Compute statistics for test scores;
    summarize testscr;
    ttest testscr=0;
    ttest testscr_lo=0;
    ttest testscr_hi=0;
    ttest testscr_lo=testscr_hi, unequal unpaired;
    *************************************************;
    * Repeat the analysis using STR = 19;
    *************************************************;
    replace testscr_lo = testscr if (str<19);
    replace testscr_hi = testscr if (str>=19);
    *************************************************;
    * Carry Out Statistical Analysis;
    *************************************************;
    * Compute statistics for test scores;
    ttest testscr_lo=testscr_hi, unequal unpaired;
    *************************************************;
    * End the Program ;
    *************************************************;
    log close;
    exit;
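Before reading the explanation of stata3.do, it may help to see what an if qualifier on generate does to the observations that fail the condition. The sketch below is illustrative only: it assumes caschool.dta is in memory and that testscr has no missing values, and testscr_small is a hypothetical variable name chosen so that it does not clash with the variables in stata3.do.

```stata
* Sketch only: assumes caschool.dta is loaded and testscr has no missing values.
* testscr_small is a hypothetical name, not a variable used in stata3.do.
generate testscr_small = testscr if (str<20)

* Observations that fail the condition are set to missing (.),
* so these two counts should agree.
count if testscr_small == .
count if !(str<20)

* summarize skips missing values, so this reports statistics only for
* districts with str below 20.
summarize testscr_small
```

Because STATA commands such as summarize and ttest ignore missing values, a variable built this way restricts the analysis to the selected districts without dropping any observations from the dataset.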

This file introduces three new features. First, it creates new variables using only a portion of the dataset. Two of the variables in the dataset are testscr (the average test score in a school district) and str (the district’s average class size or student-teacher ratio). The STATA command generate testscr_lo = testscr if (str