The R Programming Language for Statistical Computing and Graphics
Introduction
R is the lingua franca of data science [1]
R is
• a programming environment
• for statistical computing and graphics.
• An open source platform,
• working on most platforms (GNU/Linux, OS X, Windows) and
• extendable via packages,
• that runs all procedures in the workspace (RAM).
[1] http://blog.revolutionanalytics.com/2013/11/the-rise-of-r-as-the-language-of-analytics.html
“We could have chosen to be commercial, and we would have sold five copies of the software.”
Ross Ihaka, R Developer
New York Times, 9 January 2009
A short history...
1976  John Chambers, Rick Becker and colleagues begin developing “The System” at Bell Labs
1984  First version of S
1993  First “public” appearance of R on the “S news” mailing list
1995  First release of R under the GNU General Public License
1996  The “R paper” by Ihaka & Gentleman (Google Scholar: 8070 citations)
1997  R “core group” established; R version 0.49, April 1997
2000  R version 1.0.0
2009  Article in The New York Times
2014  Most likely, R became the top statistics package used during the summer of this year (Muenchen, cited in Nature 517, 109-110)
Bell Labs/AT&T
Researchers working at Bell Labs are credited with the development of radio astronomy, the transistor, the laser, the charge-coupled device (CCD), information theory, the UNIX operating system, the C programming language, S programming language and the C++ programming language. Eight Nobel Prizes have been awarded for work completed at Bell Laboratories.
Source: https://en.wikipedia.org/wiki/Bell_Labs
Ross Ihaka & Robert Gentleman, New Zealand
We wanted a command driven interface and, since we were both very familiar with S, it seemed natural to use an S-like syntax. This decision, more than anything else, has driven the direction that R development has taken. [...] the adoption of the S syntax for our interpreter produced something which felt remarkably close to S. Having taken this first step we found ourselves adopting more and more features from S.
Ross Ihaka, 1998
The name is partly based on the (first) names of the first two R authors (Robert Gentleman and Ross Ihaka), and partly a play on the name of the Bell Labs language ’S’.
Hornik (2015), “The R FAQ”, https://CRAN.R-project.org/doc/FAQ/R-FAQ.html
The R Core Group
Doug Bates, John Chambers, Peter Dalgaard, Seth Falcon, Robert Gentleman, Kurt Hornik, Stefano Iacus, Ross Ihaka, Friedrich Leisch, Uwe Ligges, Thomas Lumley, Martin Maechler, Duncan Murdoch, Paul Murrell, Martyn Plummer, Brian Ripley, Deepayan Sarkar, Duncan Temple Lang, Luke Tierney, and Simon Urbanek.
Hornik (2015), “The R FAQ”, https://CRAN.R-project.org/doc/FAQ/R-FAQ.html
The initial work [...] produced what looked like a potentially useful piece of software and we began preparing it for use in our teaching laboratory.
Ross Ihaka, 1998
Academic success
• Teaching
  • R is free and works on many platforms
• Methods development
  • Development, implementation and application go hand-in-hand
• Methods access
  • How easy is it to run a simple IRT model in SPSS?
• Reproducible Research
  • Open Data
  • Review scripts, not just manuscripts
Commercial success
Open Source: Can be “task-tailored” (cf. PHP for Facebook)
Base R or a version of it is used at Facebook, Twitter, Foursquare, Microsoft, Google, Ford, Bank of America, NASA, Bayer, Pfizer, Roche, The New York Times and other newspapers...
Developer success
http://spectrum.ieee.org/computing/software/the-2015-top-ten-programming-languages
Institute of Electrical and Electronics Engineers
Extendability: Packages
Functions can be combined into packages. Packages structure R code in a systematic manner. A package can be loaded into the workspace, and its functions are then ready for use.
• Expand R’s base capabilities.
• R provides the common environment/framework; specialists implement specific solutions.
• Tailored functions for routine tasks (a package does not need to be published)
• October 2015: 7395 packages on CRAN
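Loading a package and then using its functions could look roughly like the following sketch. It uses the 'tools' package only because it ships with every R installation; the file name "report.csv" is a made-up example.

```r
# 'tools' ships with every R installation, so nothing needs to be installed.
library(tools)

# After loading, the package's functions can be called directly ...
ext_attached <- file_ext("report.csv")

# ... and a function of an installed package can also be addressed
# without attaching the package, via the '::' operator.
ext_qualified <- tools::file_ext("report.csv")

ext_attached
## [1] "csv"
```

The '::' form is handy when only a single function of a package is needed.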
Comprehensive R Archive Network (CRAN):
• “Master site”: https://cran.r-project.org/
• Hosts the R software, packages etc. for the available platforms, both pre-compiled and as source code.
Task Views
Collections of packages for specific purposes (e.g. Natural Language Processing, Medical Imaging, or Psychometrics)
Examples:
• https://cran.r-project.org/web/views/NaturalLanguageProcessing.html
• https://cran.r-project.org/web/packages/qdap/index.html
• https://cran.r-project.org/web/packages/RQDA/index.html
• https://cran.r-project.org/web/views/MedicalImaging.html
• https://cran.r-project.org/web/views/Psychometrics.html
Packages
lavaan   Structural Equation Models that mimic e.g. Mplus
foreign  Read SAS/SPSS/... data
xlsx     Read/write MS Excel
dplyr    Easier reshaping/aggregation of data
ggplot2  Graphics & plots
ggvis    Interactive, web based graphics
car      Regression tools - incl. useful ’recode’ function
CTT      For Classical Test Theory (cf. also irtoys)
What is a ’good’ package?
In general, R packages are routinely tested under Debian/Linux. Packages must stick to the “CRAN Repository Policy”.
• Who are the authors?
• Is it under continuing revision?
• How old are the last versions?
• How good is the documentation?
• Are there resources - books, tutorials, etc. - using this software?
• Is the package used in scientific articles?
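Some of these questions can be checked from within R for any installed package. The following is a minimal sketch; it queries the 'stats' package only because that one ships with R, so another package name would have to be substituted (after installing it).

```r
# Metadata of an installed package; 'stats' is used because it ships with R.
desc <- utils::packageDescription(
  "stats",
  fields = c("Package", "Version", "Maintainer")
)
desc$Package      # name of the package
desc$Version      # installed version
desc$Maintainer   # who maintains it

# How the authors would like the package to be cited in an article
citation("stats")
```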
http://paulbutler.org/archives/visualizing-facebook-friends/
Graphics
• https://gjabel.wordpress.com/2014/06/05/world-cup-players-representation-by-league-system/
• http://motioninsocial.com/tufte/
• http://flowingdata.com/2014/10/23/moving-past-default-charts/
• http://www.htmlwidgets.org/
The competition...
R vs. Python
R vs. Matlab/STATA/SAS/SPSS/
....Julia?
Where (for me) the effort of learning R paid off...
• Routine reports: For individuals and institutions
• Research: Integration of various data sources (from cleaning to scale analyses)
• R works always and everywhere
• You are often expected to speak R as naturally as English
• Everything is a script
• Never touch your raw data (again)
• Work with batch-scripts and GUIs
• Example: OSCE reports (incl. ID-check and Generalizability ’Study’)
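The "everything is a script, never touch your raw data" idea could be sketched as follows. The data and file names are invented for illustration; the raw file is created in a temporary location only so that the example runs on its own.

```r
# Hypothetical raw file, created here only to keep the sketch self-contained
raw_file <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(10, NA, 30)),
          raw_file, row.names = FALSE)

# The raw file is only ever read, never overwritten
raw <- read.csv(raw_file)

# All cleaning steps live in the script (here: drop rows with a missing score)
clean <- raw[!is.na(raw$score), ]

# Results go to a separate file
clean_file <- tempfile(fileext = ".csv")
write.csv(clean, clean_file, row.names = FALSE)

nrow(clean)
## [1] 2
```

Rerunning the script reproduces the cleaned file from the untouched raw data at any time.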
What R does not do
• It’s a pain to enter data manually in R. Rather use Excel and save as *.csv
• If you WANT point-and-click, other options are better
• Open Source means: Validation and reliability checks of the software are done by “the community”. (Usually stressed by commercial companies)
• R is not “indictable”
• The PC’s working memory is the bottleneck (usually not relevant in “our” scenarios)
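To get a feeling for how much working memory an object takes, something like the following sketch can be used. The vector of one million numbers is an arbitrary example.

```r
# One million double-precision numbers, each taking 8 bytes
x <- numeric(1e6)

# Size of this single object in bytes
size_bytes <- as.numeric(object.size(x))
size_bytes

# The same in a human-readable unit
format(object.size(x), units = "auto")
```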
“We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”
Anne H. Milley, Director of technology product marketing at SAS
New York Times, 9 January 2009
Pro & Con
Pro R is from statisticians for statisticians
Con R is from statisticians for statisticians
Further info and help
• https://cran.r-project.org/faqs.html
• https://www.r-project.org/
• http://stackoverflow.com/questions/tagged/r
• https://stat.ethz.ch/mailman/listinfo/r-help
• https://journal.r-project.org/
• http://www.statmethods.net/
• https://www.rstudio.com/resources/cheatsheets/
Getting started...
CRAN: The Comprehensive R Archive Network
• Search for “cran” in your preferred search engine...
• Download for Linux/MacOS/Windows
• Select “Install R for the first time”
• Select “Download R”
Install R (without admin rights)
1 Create a new folder (e.g. c:/RTest/) & install to this directory
2 Do not install to “Program Files”
3 Customise installation
4 A matter of taste: Separate windows
5 Plain text help is faster
6 Start menu?
7 Important: No Registry entries!
8 Install...
9 Find R as desktop icon or RGui.exe in either the 32bit or the 64bit folder
If this doesn’t work...
• If your (IT’s) security policy is too rigid, you may need admin rights to even start the installer.
• If so, you can run the steps described above on your private PC.
• The folder “RTest” can then be copied (e.g. via a USB drive) to any other PC.
• If the security policy is still too rigid. Well...
If you work on Windows and your connection goes through a proxy server, let R know via:
# Important for installing packages
# Type into script/console
# (applies to older R versions; in recent versions this call is no longer needed)
setInternet2(use=TRUE)
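As an alternative sketch for proxy settings: R's download functions also read the proxy from environment variables (see the "Setting Proxies" section of ?download.file). The proxy address below is purely hypothetical and has to be replaced by the one your IT department provides.

```r
# Hypothetical proxy address; ask your IT department for the real one.
proxy <- "http://proxy.example.org:8080"

# Set the variables for the current R session only
Sys.setenv(http_proxy = proxy, https_proxy = proxy)

# Check what R will use
Sys.getenv(c("http_proxy", "https_proxy"))
```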
R as a calculator
# Addition and subtraction
1 + 1 - 5 + 3
## [1] 0
# Multiplication
2 * 13
## [1] 26
# Square root and exponentiation
sqrt(3)
## [1] 1.732051
sqrt(3)^2
## [1] 3
3^(1/2)
## [1] 1.732051
# Fraction
3/4
## [1] 0.75
# Brackets...
2/(4 + 4) * 2
## [1] 0.5
# ...no brackets
2/4 + 4 * 2
## [1] 8.5
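One caveat about the printed result of sqrt(3)^2: the value is displayed as 3, but internally it is stored with a tiny rounding error. A short sketch of how this shows up and how to compare numbers safely (the variable names are arbitrary):

```r
# Looks like 3 when printed ...
x <- sqrt(3)^2
x
## [1] 3

# ... but the exact comparison fails because of rounding in binary arithmetic
exact <- x == 3
exact
## [1] FALSE

# Compare with a small tolerance instead
close_enough <- isTRUE(all.equal(x, 3))
close_enough
## [1] TRUE
```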
Integrated Development Environments (IDEs)
(For more convenience...)
Some options...
• Stick to base R (Mac has syntax highlighting)
• RStudio (All platforms - Very convenient, but slower than base R)
• Notepad++ & NppToR (Win only but fast)
• Eclipse/Vim/Emacs (Win/Mac/Linux)
• .....
RStudio
Download at: https://www.rstudio.com/products/rstudio/download/
• Use installers, when possible
• In institutional context: ZIP file (no installer - no admin)
You can select the R installation that RStudio works with by hand:
(Tools → Global Options → General → Change..)
Packages! Packages! Packages!
Packages and Repositories
Packages extend R’s capabilities. They add functions or routines for specific, tailored tasks. There are packages for almost every purpose such as psychometrics, machine learning, natural language processing, genetics, medical image processing, or sending emails. Some of the packages used in this work are:
• foreign for handling data from e.g. SPSS
• ggplot2 for plotting data
• TAM for IRT purposes
• lme4 for estimating variance components used in Generalizability Theory
Packages can be retrieved from repositories (i.e. ’online archives’).
Most popular archives:
• CRAN (Comprehensive R Archive Network)
• R-Forge (Often ’Beta’ software)
• Bioconductor (Bioinformatics)
Installing manually or via code:
→ Packages → Install Package(s) → Select Mirror → Select Package
# Example: CRAN packages from the RStudio mirror (the package name goes between the quotes).
install.packages("", repos = "https://cran.rstudio.com/")
# Example: R-Forge
install.packages("", repos = "http://R-Forge.R-project.org")
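In scripts that are run repeatedly, it can be convenient to install a package only when it is missing. The helper below is a hypothetical sketch (the function name install_if_missing is not part of R); it is demonstrated with 'stats', which ships with R, so that nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is not yet available.
# Returns TRUE if an installation was attempted, FALSE otherwise.
install_if_missing <- function(pkg, repos = "https://cran.rstudio.com/") {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  install.packages(pkg, repos = repos)
  TRUE
}

# 'stats' ships with R, so nothing is downloaded here
attempted <- install_if_missing("stats")
attempted
## [1] FALSE

# Afterwards the package is attached as usual
library(stats)
```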
Data Representation in R
Principles
In R, everything is an object (incl. paths, variable names, statistical procedures, operators...)
R is vector-based
Every object is stored and manipulated in the workspace
You can mess up literally everything - but only as long as the session lasts.
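These principles can be tried out directly in the console. A small sketch with arbitrary example values:

```r
# A single number is already a vector of length 1
length(5)
## [1] 1

# Arithmetic works element by element on whole vectors
x <- c(1, 2, 3)
doubled <- x * 2
doubled
## [1] 2 4 6

# Operators are objects too: `+` is a function and can be called like one
class(`+`)
## [1] "function"
three <- `+`(1, 2)
three
## [1] 3

# Objects created in the session are listed in the workspace
ls()
```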
Objects are created with a left arrow (