Jul 11, 2005 - Reproducible Research Using Stata. reStructuredText. Examples. Reproducible .... Page 15 ... Two stata commands: stata2doc and s2d do-file.
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Reproducible Research Using Stata L. Philip Schumm
Ronald A. Thisted
Department of Health Studies University of Chicago
July 11, 2005
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Common practice
data analysis
do-file
writing
cut & paste re-enter by hand
paper/ report
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Common practice
data analysis
do-file
I
writing
cut & paste re-enter by hand
Inefficient and time-consuming
paper/ report
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Common practice
data analysis
do-file
writing
cut & paste re-enter by hand
I
Inefficient and time-consuming
I
Can lead to non-reproducible results
paper/ report
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
A big improvement: Intermediary files
data analysis
writing
individual results do-file
graphs & tables
paper/ report
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
A big improvement: Intermediary files
data analysis
writing
individual results do-file
graphs & tables
I
Not all results automatically transferred
paper/ report
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
A big improvement: Intermediary files
data analysis
writing
individual results do-file
graphs & tables
I
Not all results automatically transferred
I
Can be difficult to manage
paper/ report
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
A big improvement: Intermediary files
data analysis
writing
individual results do-file
paper/ report
graphs & tables
I
Not all results automatically transferred
I
Can be difficult to manage
I
Data analysis and writing still asynchronous
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Reproducible research
What is reproducible research?
I
Emerging literature (e.g., Buckheit and Donoho, 1995; Gentleman and Lang, 2003)
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Examples
Reproducible research
What is reproducible research?
I
Emerging literature (e.g., Buckheit and Donoho, 1995; Gentleman and Lang, 2003)
I
Dynamic document composed of code chunks and text chunks
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Examples
Reproducible research
What is reproducible research?
I
Emerging literature (e.g., Buckheit and Donoho, 1995; Gentleman and Lang, 2003)
I
Dynamic document composed of code chunks and text chunks Literate programming (Knuth, 1992)
I
I I
tangling weaving
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Examples
Reproducible research
What is reproducible research?
I
Emerging literature (e.g., Buckheit and Donoho, 1995; Gentleman and Lang, 2003)
I
Dynamic document composed of code chunks and text chunks Literate programming (Knuth, 1992)
I
I I
I
tangling weaving
R package called Sweave (Leisch, 2002)
Managing Statistical Output
Reproducible Research Using Stata
Dynamic do-files
A “dynamic” do-file
data analysis
do-file
writing
stata2doc.py
reStructuredText
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Dynamic do-files
Comments, commands, and docstrings // Here is an example dynamic do-file. * here is the docstring for the first command sysuse auto * weightsq equals weight squared gen weightsq=weight^2 reg mpg weight weightsq foreign /* As you can see, commands don’t have to have docstrings. */
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
-stata2doc- and -s2d-
Two stata commands: stata2doc and s2d data analysis
writing
do-file
stata2doc.py
stata2doc.ado log file graphs scalars
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
-stata2doc- and -s2d-
Syntax
stata2doc using do-file, [dirname(dirname) linesize(#) as(type) replace override options] s2d [exp list, nodisplay table noisily warn name(name)] :
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Examples
-stata2doc- and -s2d-
Examples of -s2d- usage . s2d w2coef=_b[weightsq] rsq=e(r2): reg mpg weight weightsq foreign . scalar li s2d_rsq = s2d_w2coef =
.69129599 1.591e-06
. s2d two = (1 + 1), noi s2d_two =
2
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Final output
Putting it all together data analysis
writing
do-file
reST document
stata2rst.py
stata2doc.ado log file graphs scalars
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
What is reStructuredText?
I
A plaintext markup syntax and parser system
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
What is reStructuredText?
I
A plaintext markup syntax and parser system
I
Intuitive, readable, and easy-to-use
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
What is reStructuredText?
I
A plaintext markup syntax and parser system
I
Intuitive, readable, and easy-to-use
I
Powerful and extensible
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Examples
What is reStructuredText?
I
A plaintext markup syntax and parser system
I
Intuitive, readable, and easy-to-use
I
Powerful and extensible
I
via Docutils may be translated into a variety of formats (e.g., HTML, LATEX, PDF, Open Office) (see http://docutils.sourceforge.net for more information)
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Simple command
Simple command: do-file /* ---------------A Simple Example ---------------This is a *very* simple example in which I shall demonstrate the following: 1) 2) 3) 4)
a simple command graphs substitution tables
The Venerable Auto Data ----------------------Let’s start by reading them in: */ sysuse auto
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Simple command
Simple command: reStructuredText ---------------A Simple Example ---------------This is a *very* simple example in which I shall demonstrate the following: 1) 2) 3) 4)
a simple command graphs substitution tables
The Venerable Auto Data ----------------------Let’s start by reading them in: :: . sysuse auto (1978 Automobile Data)
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Simple command
Simple command: PDF via LATEX A Simple Example
This is a very simple example in which I shall demonstrate the following: 1) a simple command 2) graphs 3) substitution 4) tables
The Venerable Auto Data Let’s start by reading them in: . sysuse auto (1978 Automobile Data)
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Graphs
Graph: do-file * Now lets look at a boxplot comparing mpg between * domestic and foreign. * Boxplot comparing domestic and foreign. gr box mpg, over(foreign) name(fig1)
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Graphs
Graph: reStructuredText Now lets look at a boxplot comparing mpg between domestic and foreign. .. gr box mpg, over(foreign) name(fig1) .. figure:: fig1.pdf :scale: 33 Boxplot comparing domestic and foreign.
Examples
Managing Statistical Output
Reproducible Research Using Stata
Graphs
Graph: PDF via LATEX
10
20
Mileage (mpg) 30
40
Now lets look at a boxplot comparing mpg between domestic and foreign.
Domestic
Foreign
Figure 1: Boxplot comparing domestic and foreign.
reStructuredText
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Examples
Substitutions
Substitution: do-file * Using a t-test to compare mpg between foreign and domestic * cars yields a p-value of |s2d_ttp|. s2d ttp=(string(r(p),"%05.4f")): ttest mpg, by(foreign)
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Substitutions
Substitution: reStructuredText Using a t-test to compare mpg between foreign and domestic cars yields a p-value of |s2d_ttp|. .. s2d ttp=(string(r(p),"%05.4f")): ttest mpg, by(foreign) .. |s2d_ttp| replace:: 0.0005
Examples
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Examples
Substitutions
Substitution: PDF via LATEX
Using a t-test to compare mpg between foreign and domestic cars yields a p-value of 0.0005.
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Examples
Tables
Table: do-file * Finally, we’ll try regressing ‘‘mpg‘‘ on ‘‘weight‘‘, ‘‘weightsq‘‘, * and ‘‘foreign‘‘. * Regression of mpg on several covariates. s2d, t: reg mpg weight weightsq foreign
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Examples
Tables
Table: reStructuredText Finally, we’ll try regressing ‘‘mpg‘‘ on ‘‘weight‘‘, ‘‘weightsq‘‘, and ‘‘foreign‘‘. .. s2d, t: reg mpg weight weightsq foreign .. stata-table:: Regression of mpg on several covariates. -----------------------------------------------------------------------------mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------weight | -.0165729 .0039692 -4.18 0.000 -.0244892 -.0086567 weightsq | 1.59e-06 6.25e-07 2.55 0.013 3.45e-07 2.84e-06 foreign | -2.2035 1.059246 -2.08 0.041 -4.3161 -.0909002 _cons | 56.53884 6.197383 9.12 0.000 44.17855 68.89913 ------------------------------------------------------------------------------
Managing Statistical Output
Reproducible Research Using Stata
reStructuredText
Examples
Tables
Table: PDF via LATEX
Finally, we’ll try regressing mpg on weight, weightsq, and foreign. Table 1: Regression of mpg on several covariates. mpg weight weightsq foreign cons
Coef. -.0165729 1.59e-06 -2.2035 56.53884
Std. Err. .0039692 6.25e-07 1.059246 6.197383
t -4.18 2.55 -2.08 9.12
P>|t| 0.000 0.013 0.041 0.000
[95% Conf. Interval] -.0244892 3.45e-07 -4.3161 44.17855
-.0086567 2.84e-06 -.0909002 68.89913