Reproducible Research Using Stata

0 downloads 0 Views 464KB Size Report
Jul 11, 2005 - Reproducible Research Using Stata. reStructuredText. Examples. Reproducible .... Page 15 ... Two stata commands: stata2doc and s2d do-file.
Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Reproducible Research Using Stata L. Philip Schumm

Ronald A. Thisted

Department of Health Studies University of Chicago

July 11, 2005

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Common practice

data analysis

do-file

writing

cut & paste re-enter by hand

paper/ report

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Common practice

data analysis

do-file

I

writing

cut & paste re-enter by hand

Inefficient and time-consuming

paper/ report

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Common practice

data analysis

do-file

writing

cut & paste re-enter by hand

I

Inefficient and time-consuming

I

Can lead to non-reproducible results

paper/ report

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

A big improvement: Intermediary files

data analysis

writing

individual results do-file

graphs & tables

paper/ report

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

A big improvement: Intermediary files

data analysis

writing

individual results do-file

graphs & tables

I

Not all results automatically transferred

paper/ report

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

A big improvement: Intermediary files

data analysis

writing

individual results do-file

graphs & tables

I

Not all results automatically transferred

I

Can be difficult to manage

paper/ report

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

A big improvement: Intermediary files

data analysis

writing

individual results do-file

paper/ report

graphs & tables

I

Not all results automatically transferred

I

Can be difficult to manage

I

Data analysis and writing still asynchronous

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Reproducible research

What is reproducible research?

I

Emerging literature (e.g., Buckheit and Donoho, 1995; Gentleman and Lang, 2003)

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Examples

Reproducible research

What is reproducible research?

I

Emerging literature (e.g., Buckheit and Donoho, 1995; Gentleman and Lang, 2003)

I

Dynamic document composed of code chunks and text chunks

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Examples

Reproducible research

What is reproducible research?

I

Emerging literature (e.g., Buckheit and Donoho, 1995; Gentleman and Lang, 2003)

I

Dynamic document composed of code chunks and text chunks Literate programming (Knuth, 1992)

I

I I

tangling weaving

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Examples

Reproducible research

What is reproducible research?

I

Emerging literature (e.g., Buckheit and Donoho, 1995; Gentleman and Lang, 2003)

I

Dynamic document composed of code chunks and text chunks Literate programming (Knuth, 1992)

I

I I

I

tangling weaving

R package called Sweave (Leisch, 2002)

Managing Statistical Output

Reproducible Research Using Stata

Dynamic do-files

A “dynamic” do-file

data analysis

do-file

writing

stata2doc.py

reStructuredText

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Dynamic do-files

Comments, commands, and docstrings // Here is an example dynamic do-file. * here is the docstring for the first command sysuse auto * weightsq equals weight squared gen weightsq=weight^2 reg mpg weight weightsq foreign /* As you can see, commands don’t have to have docstrings. */

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

-stata2doc- and -s2d-

Two stata commands: stata2doc and s2d data analysis

writing

do-file

stata2doc.py

stata2doc.ado log file graphs scalars

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

-stata2doc- and -s2d-

Syntax

stata2doc using do-file, [dirname(dirname) linesize(#) as(type) replace override options] s2d [exp list, nodisplay table noisily warn name(name)] :

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Examples

-stata2doc- and -s2d-

Examples of -s2d- usage . s2d w2coef=_b[weightsq] rsq=e(r2): reg mpg weight weightsq foreign . scalar li s2d_rsq = s2d_w2coef =

.69129599 1.591e-06

. s2d two = (1 + 1), noi s2d_two =

2

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Final output

Putting it all together data analysis

writing

do-file

reST document

stata2rst.py

stata2doc.ado log file graphs scalars

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

What is reStructuredText?

I

A plaintext markup syntax and parser system

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

What is reStructuredText?

I

A plaintext markup syntax and parser system

I

Intuitive, readable, and easy-to-use

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

What is reStructuredText?

I

A plaintext markup syntax and parser system

I

Intuitive, readable, and easy-to-use

I

Powerful and extensible

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Examples

What is reStructuredText?

I

A plaintext markup syntax and parser system

I

Intuitive, readable, and easy-to-use

I

Powerful and extensible

I

via Docutils may be translated into a variety of formats (e.g., HTML, LATEX, PDF, Open Office) (see http://docutils.sourceforge.net for more information)

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Simple command

Simple command: do-file /* ---------------A Simple Example ---------------This is a *very* simple example in which I shall demonstrate the following: 1) 2) 3) 4)

a simple command graphs substitution tables

The Venerable Auto Data ----------------------Let’s start by reading them in: */ sysuse auto

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Simple command

Simple command: reStructuredText ---------------A Simple Example ---------------This is a *very* simple example in which I shall demonstrate the following: 1) 2) 3) 4)

a simple command graphs substitution tables

The Venerable Auto Data ----------------------Let’s start by reading them in: :: . sysuse auto (1978 Automobile Data)

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Simple command

Simple command: PDF via LATEX A Simple Example

This is a very simple example in which I shall demonstrate the following: 1) a simple command 2) graphs 3) substitution 4) tables

The Venerable Auto Data Let’s start by reading them in: . sysuse auto (1978 Automobile Data)

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Graphs

Graph: do-file * Now lets look at a boxplot comparing mpg between * domestic and foreign. * Boxplot comparing domestic and foreign. gr box mpg, over(foreign) name(fig1)

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Graphs

Graph: reStructuredText Now lets look at a boxplot comparing mpg between domestic and foreign. .. gr box mpg, over(foreign) name(fig1) .. figure:: fig1.pdf :scale: 33 Boxplot comparing domestic and foreign.

Examples

Managing Statistical Output

Reproducible Research Using Stata

Graphs

Graph: PDF via LATEX

10

20

Mileage (mpg) 30

40

Now lets look at a boxplot comparing mpg between domestic and foreign.

Domestic

Foreign

Figure 1: Boxplot comparing domestic and foreign.

reStructuredText

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Examples

Substitutions

Substitution: do-file * Using a t-test to compare mpg between foreign and domestic * cars yields a p-value of |s2d_ttp|. s2d ttp=(string(r(p),"%05.4f")): ttest mpg, by(foreign)

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Substitutions

Substitution: reStructuredText Using a t-test to compare mpg between foreign and domestic cars yields a p-value of |s2d_ttp|. .. s2d ttp=(string(r(p),"%05.4f")): ttest mpg, by(foreign) .. |s2d_ttp| replace:: 0.0005

Examples

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Examples

Substitutions

Substitution: PDF via LATEX

Using a t-test to compare mpg between foreign and domestic cars yields a p-value of 0.0005.

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Examples

Tables

Table: do-file * Finally, we’ll try regressing ‘‘mpg‘‘ on ‘‘weight‘‘, ‘‘weightsq‘‘, * and ‘‘foreign‘‘. * Regression of mpg on several covariates. s2d, t: reg mpg weight weightsq foreign

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Examples

Tables

Table: reStructuredText Finally, we’ll try regressing ‘‘mpg‘‘ on ‘‘weight‘‘, ‘‘weightsq‘‘, and ‘‘foreign‘‘. .. s2d, t: reg mpg weight weightsq foreign .. stata-table:: Regression of mpg on several covariates. -----------------------------------------------------------------------------mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------weight | -.0165729 .0039692 -4.18 0.000 -.0244892 -.0086567 weightsq | 1.59e-06 6.25e-07 2.55 0.013 3.45e-07 2.84e-06 foreign | -2.2035 1.059246 -2.08 0.041 -4.3161 -.0909002 _cons | 56.53884 6.197383 9.12 0.000 44.17855 68.89913 ------------------------------------------------------------------------------

Managing Statistical Output

Reproducible Research Using Stata

reStructuredText

Examples

Tables

Table: PDF via LATEX

Finally, we’ll try regressing mpg on weight, weightsq, and foreign. Table 1: Regression of mpg on several covariates. mpg weight weightsq foreign cons

Coef. -.0165729 1.59e-06 -2.2035 56.53884

Std. Err. .0039692 6.25e-07 1.059246 6.197383

t -4.18 2.55 -2.08 9.12

P>|t| 0.000 0.013 0.041 0.000

[95% Conf. Interval] -.0244892 3.45e-07 -4.3161 44.17855

-.0086567 2.84e-06 -.0909002 68.89913