Reproducible graphics with R and ggplot2

Baptiste Auguié
Victoria University of Wellington

May 17, 2012




[Figure: grid of small base-graphics panels, each plotting d[, 2:4] (y axis) against d$wavelength (x axis).]


Histograms

    d = read.table("../data/rods.txt", header = TRUE)
    par(mfrow = c(2, 1), # split
        mar = c(4, 4, 1.5, 0.5), mgp = c(2, 1, 0))
    ## top histogram
    with(d, hist(Major, xlim = c(10, 70)))
    ## bottom histogram
    with(d, hist(Minor, prob = TRUE, xlim = c(10, 70)))
    ## add density estimate
    with(d, lines(density(Minor), col = "red"))

[Figure: two stacked histograms sharing an x range of 10 to 70. Top: "Histogram of Major" (x axis: Major, y axis: Frequency). Bottom: "Histogram of Minor" (x axis: Minor, y axis: Density) with the density estimate overlaid.]

A natural and coherent language to describe graphics

Point-and-click

Yeah but, no but, yeah but, no but, yeah but… I swear *** ***** **** … but yeah.

A grammar of graphics

1. Use data d [x, y, z, t, ...]
2. Plot lines of variable y vs x
3. Colour lines following variable z
4. Split into multiple panels according to variable t
5. Add a layer with new data
6. …
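The steps of the grammar translate almost word for word into ggplot2 calls. The sketch below is one possible reading, not the code behind the slides: it uses R's built-in ChickWeight data set as a stand-in, with Time, weight, Chick and Diet playing the roles of x, y, z and t, and a hypothetical summary table `avg` as the "new data" of step 5. It is written against a current ggplot2 release.

```r
library(ggplot2)

# 1. Use data d [x, y, z, t, ...]; ChickWeight is only a stand-in data set
d <- as.data.frame(ChickWeight)   # columns: weight, Time, Chick, Diet

# "New data" for step 5: mean weight per Time and Diet (hypothetical example)
avg <- aggregate(weight ~ Time + Diet, data = d, FUN = mean)

p <- ggplot(d, aes(x = Time, y = weight)) +
  # 2. lines of y vs x, 3. coloured following z (one line per chick)
  geom_line(aes(colour = Chick), show.legend = FALSE) +
  # 4. split into multiple panels according to t
  facet_wrap(~ Diet) +
  # 5. add a layer drawn from the new data
  geom_line(data = avg, colour = "black")

print(p)
```

Each `+` adds one sentence of the description, so dropping or reordering a step only means editing the matching line.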


ggplot2: A Grammar of Graphics

    Mapping: data ↔ aesthetic
    Layers
    Scales
    Coordinates
    + (Stats, …)

    ## schematic call; later ggplot2 releases replace opts() with theme()
    library(ggplot2)
    ggplot(data = ..., mapping = ...) +
      layer(geom = "point", stat = "identity") +
      layer(geom = "point", data = ...) +
      facet_grid(... ~ ...) +
      coord_polar() +
      scale_colour(...) +
      opts(...)


    ## Reshape to long format
    ## melt() is assumed to come from the reshape2 package (the library call is not on the slide)
    library(reshape2)
    m = melt(d, measure.vars = c("extinction", "scattering", "absorption"))
    head(m, 3)

      wavelength   variable   value
    1      0.400 extinction 0.06616
    2      0.401 extinction 0.06620
    3      0.402 extinction 0.06623

    ggplot(m, aes(x = wavelength, y = value, colour = variable)) +
      geom_path() +
      opts(legend.position = c(0.8, 0.8), legend.direction = "vertical")

[Figure: value (0.00 to 0.10) against wavelength (0.4 to 1.2), one coloured line per variable (extinction, scattering, absorption), legend inside the plot area.]
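To see what the reshaping step does, the following base-R sketch builds the same kind of long table by hand with `stack()`. The numbers are made up for illustration and are not the data from the slides; `melt()` performs the equivalent operation in one call.

```r
# Made-up wide table, for illustration only (not the data from the slides)
d <- data.frame(wavelength = c(0.40, 0.41, 0.42),
                extinction = c(0.30, 0.32, 0.35),
                scattering = c(0.10, 0.11, 0.13),
                absorption = c(0.20, 0.21, 0.22))

measured <- c("extinction", "scattering", "absorption")

# stack() puts the measured columns on top of each other:
# 'values' holds the numbers, 'ind' the name of the source column
s <- stack(d[, measured])

# Repeat the identifier column once per measured column
m <- data.frame(wavelength = rep(d$wavelength, times = length(measured)),
                variable   = s$ind,
                value      = s$values)

print(m)
```

The long table has one row per (wavelength, variable) pair, which is the layout that lets a single `colour = variable` mapping draw all three curves.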

Data exploration

    head(Orange, 10)

       Tree  age circumference
    1     1  118            30
    2     1  484            58
    3     1  664            87
    4     1 1004           115
    5     1 1231           120
    6     1 1372           142
    7     1 1582           145
    8     2  118            33
    9     2  484            69
    10    2  664           111

    p = ggplot(Orange, aes(x = age, y = circumference, colour = Tree)) +
      geom_point(aes(shape = Tree)) +
      layer(geom = "line", stat = "smooth", method = "lm") +
      opts(legend.position = "top", legend.direction = "horizontal")
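The `layer(..., method = "lm")` and `opts()` calls reflect the ggplot2 interface of 2012 and may not run unchanged on current releases. A rough present-day equivalent, offered as a sketch rather than the author's code, swaps in `geom_line(stat = "smooth", ...)` for the raw layer and `theme()` for `opts()`:

```r
library(ggplot2)

p <- ggplot(Orange, aes(x = age, y = circumference, colour = Tree)) +
  geom_point(aes(shape = Tree)) +
  # one least-squares line per tree, drawn with the "line" geom
  geom_line(stat = "smooth", method = "lm", formula = y ~ x) +
  theme(legend.position = "top", legend.direction = "horizontal")

print(p)
```

Because the plot is stored in `p`, later variations (extra layers, panels, themes) can be added to it without repeating the definition.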


    p

[Figure: circumference (0 to 250) against age (500 to 1500), points and fitted lines coloured by Tree, legend "Tree" above the plot.]

    p + facet_grid(Tree ~ .)

[Figure: the same plot split into five stacked panels, one per Tree (3, 1, 5, 2, 4), each with circumference from 0 to 250.]

Small multiples revisited

[Figure: 3 × 3 grid of panels with strip labels 0, 0.001, 0.01 and 1, 1.33, 1.5; x axis: Wavelength / µm (0.4 to 1.2); y axis: σ / µm² (0.00 to 0.15).]

Coordinate transformations

    ggplot(data = d) +
      geom_path(aes(x = x, y = y)) +
      scale_y_log10() +
      coord_polar() +
      annotate(...) + ...

[Figure: polar plot with a logarithmic radial scale (10^-2 to 10^2) and angles marked from -90 to 90; legend "Dipole orientation" with entries parallel and perpendicular; annotations "Air side", "Au film" and "Glass side".]
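A self-contained version of this pattern might look like the sketch below. The angular data are made up for illustration (they are not the dipole data shown on the slide), and the `annotate(...)` calls are left out because their arguments are not given.

```r
library(ggplot2)

# Made-up angular pattern, for illustration only
angle <- seq(-180, 180, by = 1)                          # degrees
d <- data.frame(x = angle,
                y = 0.01 + 100 * cos(angle * pi / 180)^2)  # strictly positive

p <- ggplot(data = d) +
  geom_path(aes(x = x, y = y)) +
  scale_y_log10() +   # scale: y is drawn on a logarithmic axis
  coord_polar()       # coordinates: x becomes the angle, y the radius

print(p)
```

Removing the `coord_polar()` line draws the same data in Cartesian coordinates, which shows that the coordinate system is a separate, swappable component of the plot.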

DRY principle

    p + theme_article(10, 'serif')

    p + theme_presentation(16)

[Figure: the plot p (circumference against age, legend "Tree") rendered twice, once with each theme.]
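The definitions of `theme_article()` and `theme_presentation()` are not shown on the slide. The sketch below is a hypothetical reconstruction of the idea, written for a current ggplot2 release: the styling choices live in one function each, so every plot reuses them instead of repeating them. The argument names and the particular settings are assumptions.

```r
library(ggplot2)

# Hypothetical reconstruction: the slides do not show these definitions
theme_article <- function(base_size = 10, base_family = "serif") {
  theme_bw(base_size = base_size, base_family = base_family) +
    theme(legend.position = "top",
          panel.grid.minor = element_blank())
}

theme_presentation <- function(base_size = 16, base_family = "") {
  theme_grey(base_size = base_size, base_family = base_family) +
    theme(legend.position = "right")
}

p <- ggplot(Orange, aes(x = age, y = circumference, colour = Tree)) +
  geom_point()

p_article <- p + theme_article(10, "serif")   # small serif text for print
p_slides  <- p + theme_presentation(16)       # larger text for projection
```

The plot itself is defined once; only the theme changes between the paper and the talk.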

Take home message

Reproducible research, reproducible graphics

1. Saving time: repeat the analysis instantly with a new data set
2. Readable by others: you can share scripts
3. Safer: anyone can follow the analysis by reading lines of code
4. Visual aspect: better aesthetic choices (think LaTeX vs word processors)

Reproduce this!

    library(knitr); knit2pdf("presentation.rnw")
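For readers who have not seen an .Rnw file, here is a minimal sketch of the workflow. The file names and chunk contents are made up for this illustration. It calls `knit()`, which only produces the .tex file; `knit2pdf()` additionally compiles it and needs a LaTeX installation.

```r
library(knitr)

# Minimal .Rnw source: LaTeX text with one R chunk and one inline result
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<orange-plot, fig.width=4, fig.height=3>>=",
  "plot(circumference ~ age, data = Orange)",
  "@",
  "The data set has \\Sexpr{nrow(Orange)} rows.",
  "\\end{document}"
)

src <- file.path(tempdir(), "minimal.Rnw")
tex <- file.path(tempdir(), "minimal.tex")
writeLines(rnw, src)

# Run the chunks and weave code, figure and text into a .tex file
out <- knit(src, output = tex, quiet = TRUE)
```

Rerunning `knit()` after the data or the code change regenerates every figure and number in the document.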

Getting started: resources

1. Get started, download R and RStudio (IDE)
2. ?ggplot : R's help system
3. Documentation pages http://had.co.nz/ggplot2/, wiki
4. An introduction to R
5. R and ggplot2 mailing lists; Stack Overflow
6. Books: R graphics (Murrell), ggplot2 (Wickham)


Automation

    ## open a pdf file
    pdf(file = "all_plots.pdf", width = 10, height = 6)
    par(mar = c(4, 4, 1.5, 0.5), mgp = c(2, 1, 0))
    ## lf holds the data file names (its definition is not shown on the slide)
    for (file in lf){ # loop over all data files
      d = read.table(paste0("../data/spectra/", file), header = TRUE)
      matplot(d$wavelength, d[ ,2:4], type = "l", lty = 1:3,
              ylim = c(0, 1.1*max(d[ , -1])), yaxs = "i",
              xlab = expression(wavelength/mu*m),
              ylab = expression(sigma/mu*m^2))
      legend("topright", expression(sigma[ext], sigma[sca], sigma[abs]),
             lty = 1:3, bg = "grey95", bty = "o", box.col = NA, inset = 0.05)
      ## extract parameters from filename as plot title
      title(gsub("_|\\.txt", " ", file))
    }
    ## close pdf file
    dev.off()
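The loop depends on data files that are not part of the slides. To try the pattern without them, the sketch below first writes two made-up spectra to a temporary folder and then runs a shortened version of the same loop. The file names, the column values and the use of `list.files()` to build `lf` are assumptions made for this illustration.

```r
# Made-up spectra written to a temporary folder, for illustration only
data_dir <- file.path(tempdir(), "spectra")
dir.create(data_dir, showWarnings = FALSE)
wavelength <- seq(0.4, 1.2, length.out = 50)
for (width in c(0.05, 0.10)) {
  absorption <- dnorm(wavelength, mean = 0.7, sd = width) / 50
  scattering <- absorption / 2
  d <- data.frame(wavelength,
                  extinction = absorption + scattering,
                  scattering, absorption)
  write.table(d, file.path(data_dir, paste0("width_", width, ".txt")),
              row.names = FALSE)
}

# List the data files instead of typing their names
lf <- list.files(data_dir, pattern = "\\.txt$")

out <- file.path(tempdir(), "all_plots.pdf")
pdf(file = out, width = 10, height = 6)
for (file in lf) {                       # one page per data file
  d <- read.table(file.path(data_dir, file), header = TRUE)
  matplot(d$wavelength, d[, 2:4], type = "l", lty = 1:3,
          xlab = "wavelength", ylab = "cross-section")
  title(gsub("_|\\.txt", " ", file))     # file name as plot title
}
dev.off()
```

Adding a third data file to the folder adds a third page to the PDF with no change to the script.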

    model = function(p, x) p[3] / pi * p[2] / ((x - p[1])^2 + p[2]^2) + p[4]
    objective = function(p, d=NULL, x=d$wavenumber, y=d$intensity, ...){
      predicted … 450))
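The body of `objective` is cut off in this copy of the slides. `model` is a Lorentzian peak on a constant background, so one plausible use is a least-squares fit. The sketch below assumes a sum-of-squared-residuals objective minimised with `optim()` and runs on a made-up noisy spectrum. None of this is confirmed by the source beyond the `model` definition and the `objective` signature.

```r
# Lorentzian peak on a constant background, as defined on the slide:
# p = c(position, half-width, area, offset)
model <- function(p, x) p[3] / pi * p[2] / ((x - p[1])^2 + p[2]^2) + p[4]

# Assumed objective: sum of squared residuals (the slide's body is truncated)
objective <- function(p, d = NULL, x = d$wavenumber, y = d$intensity, ...) {
  predicted <- model(p, x)
  sum((y - predicted)^2)
}

# Made-up noisy spectrum, for illustration only
set.seed(1)
truth <- c(500, 10, 1000, 1)
d <- data.frame(wavenumber = seq(400, 600, by = 1))
d$intensity <- model(truth, d$wavenumber) + rnorm(nrow(d), sd = 0.5)

start <- c(495, 8, 800, 0.5)            # rough initial guess
fit <- optim(start, objective, d = d)   # Nelder-Mead by default

# Overlay the data and the fitted curve
plot(d$wavenumber, d$intensity, xlab = "wavenumber", ylab = "intensity")
lines(d$wavenumber, model(fit$par, d$wavenumber), col = "red")
```

Keeping the model, the objective and the plotting code in one script means the fit and its figure can be regenerated together whenever the data change.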