Using R for Creating Predictive Models - The RP Group

Multiple Measures Assessment Project (MMAP) Phase II Models

Getting Started With R
R is an open source statistical package developed and maintained by the user community. It is free to use and offers many basic and advanced methods, including Classification and Regression Trees (CART); both were primary factors in the decision to use R for this project. R can consume more memory than other statistical software, so computers with at least 8 GB of RAM are recommended.

Download R
Main project page: https://cran.r-project.org/
Download link for Windows users: https://cran.r-project.org/bin/windows/base/
Download link for Mac users: https://cran.r-project.org/bin/macosx/

Download RStudio (optional)
R uses a command line interface. While not necessary, some users prefer to work with R through a graphical user interface (GUI) layered on top of it. The MMAP team uses RStudio, and this tutorial references that interface:
https://www.rstudio.com/products/rstudio/download/
Other common interfaces include:
http://www.rcommander.com/
http://mran.revolutionanalytics.com/download/

Read the download notes to install the correct version of the software for your system (e.g., Windows, Mac, or Linux; 32- or 64-bit operating system). Before installing any software, it is advisable to scan the downloaded package for viruses and malware, using your anti-virus software and other tools such as Malwarebytes, to ensure the installation package has not been tampered with.


Navigating RStudio
The RStudio interface has four panes (see image below):
Source Pane = R code is written, saved, and run from this pane.
Console Pane = Commands and output are shown in this pane. R commands can also be written directly in this pane, and it has all the functionality of the basic R terminal.
Workspace Pane = Shows [...]

[...] The tilde (~) points to your home directory set previously.
Example: write.table(table1,"~/MathTables/MathFirstCCRank.txt",sep="\t")

Two way table of frequencies:
table(MMAPMath$hs_last_course_rank,MMAPMath$cc_first_level_rank)

Bar plot based on table:
First the table will be loaded into the R workspace as an object called "table1" that can be called in subsequent commands:
table1 [...]

[...] Save as Image and an options window will appear. Select the file type (MMAP recommends png), set the directory where the image should be saved, choose any other desired options, and click "Save".
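The table, export, and bar plot steps above can be sketched end to end as follows. This is a minimal illustration, not the project's script: the MMAPMath data frame here is a small made-up stand-in, files are written to tempdir() rather than the "~/MathTables" folder, and the png() call is an assumed code-based alternative to the Save as Image menu.

```r
# Hypothetical stand-in for the MMAPMath data frame; the values are made up.
MMAPMath <- data.frame(
  hs_last_course_rank = c(1, 2, 2, 3, 3, 3, 4, 4),
  cc_first_level_rank = c(1, 1, 2, 2, 3, 3, 3, 4)
)

# Two-way table of frequencies, stored as an object for later commands.
table1 <- table(MMAPMath$hs_last_course_rank, MMAPMath$cc_first_level_rank)
print(table1)

# Write the table to a tab-delimited file. tempdir() is used so the sketch
# runs even when no "~/MathTables" directory exists.
out_file <- file.path(tempdir(), "MathFirstCCRank.txt")
write.table(table1, out_file, sep = "\t")

# Bar plot based on the table, saved as a png from code when the R build
# supports the png device.
if (capabilities("png")) {
  png_file <- file.path(tempdir(), "MathFirstCCRank.png")
  png(png_file, width = 800, height = 600)
  barplot(table1, beside = TRUE, legend.text = rownames(table1),
          xlab = "cc_first_level_rank", ylab = "Number of students")
  dev.off()
}
```

Storing the table as an object first means the same counts feed both the exported file and the plot.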



Interpreting the Tree

The tree depicted above includes all students in the California Community College (CCC) System who had four full years of high school [...]

[...] ,control=ctrl0015)

View Output
printcp(fit.m0.Statistics.DM.gp)
print(fit.m0.Statistics.DM.gp)
rsq.rpart(fit.m0.Statistics.DM.gp)
prp(fit.m0.Statistics.DM.gp,main="Statistics with Grade Points, DM",extra=100,varlen=0,left=FALSE)
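For readers without the project data, the fit-then-inspect pattern can be tried on simulated records. The sketch below is an assumption-laden illustration: the students data frame, its column names, and the ctrl_example settings are hypothetical, and the control values are not the ones behind the source's ctrl0015 object, which is not shown in full.

```r
library(rpart)  # CART implementation; ships with standard R distributions

set.seed(42)
n <- 500

# Hypothetical student records; column names and values are illustrative only.
students <- data.frame(
  hs_gpa = round(runif(n, 1.5, 4.0), 2),
  delay  = sample(0:4, n, replace = TRUE)
)
# Simulated college grade points, clipped to the 0-4 range.
students$cc_grade_points <- pmin(4, pmax(0,
  0.9 * students$hs_gpa - 0.1 * students$delay + rnorm(n, sd = 0.7)))

# Assumed control settings, chosen only for this example.
ctrl_example <- rpart.control(cp = 0.005, minsplit = 20)

# Regression tree (method = "anova") predicting grade points.
fit_example <- rpart(cc_grade_points ~ hs_gpa + delay,
                     data = students, method = "anova",
                     control = ctrl_example)

printcp(fit_example)  # complexity parameter table with cross-validated error
print(fit_example)    # text listing of each split and node mean
```

The prp() function used in the tutorial comes from the rpart.plot package, which must be installed and loaded separately before a fitted tree can be drawn with it.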

Testing Multiple Models with caret
CART models were used for the decision rule sets, but in Phase 1 their predictions were compared against linear regression, support vector machines, and gradient basis models using the caret package. This package lets you keep the same predictive equation and change only the algorithm, so a variety of analytical approaches can be tested with a minimum of coding. Example code using caret for MMAP is shown in the Phase 1 R Scripts document. Resources for the caret package are shown below.

caret packages:
install.packages("caret") # core package for caret
install.packages("e1071") # additional package that fixes errors that arise with some models

Tutorials on caret:
http://www.edii.uclm.es/~useR-2013/Tutorials/kuhn/user_caret_2up.pdf
https://www.youtube.com/watch?v=7Jbb2ItbTC4

List of the methods in the caret package: http://topepo.github.io/caret/modelList.html
caret training site: http://topepo.github.io/caret/training.html
Additional information on training parameters: http://www.inside-r.org/packages/cran/caret/docs/train
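The idea of keeping one formula and swapping the algorithm might look like the following. This is only a sketch on simulated data, not the Phase 1 script: the students data frame and its columns are hypothetical, the two methods ("lm" and "rpart") are chosen because they need no extra packages beyond caret's own dependencies, and 5-fold cross-validation is an assumed setting.

```r
library(caret)  # assumes install.packages("caret") has already been run

set.seed(42)
n <- 300

# Hypothetical student records; column names and values are illustrative only.
students <- data.frame(
  hs_gpa = runif(n, 1.5, 4.0),
  delay  = sample(0:4, n, replace = TRUE)
)
students$cc_grade_points <- 0.9 * students$hs_gpa -
  0.1 * students$delay + rnorm(n, sd = 0.7)

# One formula, reused unchanged for every algorithm.
model_formula <- cc_grade_points ~ hs_gpa + delay

# Assumed resampling scheme: 5-fold cross-validation.
ctrl <- trainControl(method = "cv", number = 5)

# Resetting the seed before each call gives both models the same folds.
set.seed(1)
fit_lm <- train(model_formula, data = students, method = "lm",
                trControl = ctrl)
set.seed(1)
fit_rpart <- train(model_formula, data = students, method = "rpart",
                   trControl = ctrl)

# Side-by-side resampled performance (RMSE, R-squared, MAE).
summary(resamples(list(lm = fit_lm, rpart = fit_rpart)))
```

Only the method argument changes between the two train() calls, which is what makes it cheap to try further algorithms from the caret model list.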


Correlation Matrix
Create a list of variables to use for the correlation matrix. Note we are using up through 12th grade [...]

[...] ) # type can be pearson or spearman

One of several packages to visually display a correlation matrix is corrgram.
install.packages("corrgram")
library(corrgram)
corrgram(m0.Statistics.subset, order=FALSE, lower.panel=panel.shade, upper.panel=panel.pie, text.panel=panel.txt, main="High School Achievement to College Statistics Grades Correlations")
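Because the command that computes the correlations is cut off in this copy, the sketch below substitutes base R's cor() on simulated data. The m0_subset data frame and its columns are hypothetical. The source comment refers to a type argument, which may indicate a function such as rcorr() from the Hmisc package; in cor() the equivalent argument is named method.

```r
set.seed(42)
n <- 200

# Hypothetical subset of variables; names and values are illustrative only.
hs_gpa <- runif(n, 1.5, 4.0)
m0_subset <- data.frame(
  hs_gpa          = hs_gpa,
  hs_math_grade   = hs_gpa + rnorm(n, sd = 0.5),
  cc_grade_points = 0.8 * hs_gpa + rnorm(n, sd = 0.7)
)
# Introduce a few missing values, as real transcript data often has.
m0_subset$hs_math_grade[c(5, 17)] <- NA

# Pairwise-complete correlations, so a missing value in one column does not
# drop the whole row from every other pair.
cor_pearson  <- cor(m0_subset, use = "pairwise.complete.obs",
                    method = "pearson")
cor_spearman <- cor(m0_subset, use = "pairwise.complete.obs",
                    method = "spearman")

round(cor_pearson, 2)
round(cor_spearman, 2)
```

Spearman correlations are rank-based, so they are the safer choice when grades are ordinal or the relationship is monotonic but not linear.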


Logistic Regression
Create a formula. Note we are again using up through 12th grade data with a delay variable.
formulaNDM [...]
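The formula definition is cut off in this copy, so the following is a generic sketch of fitting a logistic regression from a formula with glm(), not a reconstruction of formulaNDM. The data are simulated, and the outcome (success) and predictors (hs_gpa, delay) are hypothetical names chosen for illustration.

```r
set.seed(42)
n <- 500

# Hypothetical student records; column names and values are illustrative only.
students <- data.frame(
  hs_gpa = runif(n, 1.5, 4.0),
  delay  = sample(0:4, n, replace = TRUE)
)
# Simulated binary outcome: 1 = successful course completion, 0 = not.
p_true <- plogis(-4 + 1.6 * students$hs_gpa - 0.3 * students$delay)
students$success <- rbinom(n, size = 1, prob = p_true)

# Create the formula once so it can be reused across models.
formula_example <- success ~ hs_gpa + delay

# Logistic regression: binomial family with the logit link.
fit_logit <- glm(formula_example, data = students,
                 family = binomial(link = "logit"))

summary(fit_logit)    # coefficients on the log-odds scale
exp(coef(fit_logit))  # the same coefficients expressed as odds ratios

# Predicted probability of success for each student.
students$p_hat <- predict(fit_logit, type = "response")
head(students)
```

Keeping the formula in its own object mirrors the tutorial's approach and lets the same specification be passed to other modeling functions.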
