Predictive Modeling with R and the caret Package useR! - R Project

Recommend Documents

gbm in the gbm package ada in ada blackboost in mboost ... The prediction code is fairly standard predict(object, trials

Predictive Modeling with R and the caret Package useR! 2013 Outline

We'll walk through several types of resampling methods for training set samples. ... Determine k based on the highest cr

Species distribution modeling with R - R Project

Jan 8, 2017 - subject: http://biodiversityinformatics.amnh.org/index.php?section_ · id=111, the ... change), and a number of more advanced topics. This is a ...

Package 'farsi' - R project

Mar 27, 2013 ... Description Allow numbers to be presented in an persian language .... The function as.farsi is provided as a corresponding function to as.roman ...

Package 'coppeCosenzaR' - R Project

Carlos Cosenza [aut]. Maintainer Pier Taranti . Repository CRAN. Date/Publication 2017-07-10 13:54:25 UTC. R topics documented:.

Package 'strucchange' - R Project

Jun 6, 2015 - Title Testing, Monitoring, and Dating Structural Changes .... Evaluation of Programs, The Review of Economics and Statistics, 85(3), 550-558. ...... a numeric from interval (0,1) specifying the bandwidth. determines the size of.

Package 'orddom' - R Project

Feb 20, 2015 -

Package 'rtip' - R Project

Apr 12, 2018 - http://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:At-risk-of- ... a number between 0 and 1 which represents the percentage to be used to calcu- .... These datasets and the function to extract the variables are ...

Package 'spTDyn' - R Project

Jun 11, 2015 - The back-end code of this package is built under c language. Main functions ... This function defines the time series in the spatio-temporal data.

Package 'biomod2' - R Project

Mar 1, 2016 - BugReports . Description Functions for species distribution modeling, calibration and evaluation ...

Package 'NLP' - R project

Package 'NLP'. January 25, 2014. Version 0.1-1. Title Natural Language Processing Infrastructure. Description Basic classes and methods for Natural Language ...

Package 'hsdar' - R Project

Dec 9, 2016 - The hsdar package contains classes and functions to manage, analyse ... Data analysis: Supported methods to analyse vegetation spectra are ...

Package 'cosinor2' - R Project

Aug 21, 2017 - where values of K and g depend on the signs of Î² and Î³ and can be ... Statistical Methods for Estimating and Comparing Cosinor Parameters.

Package 'cvAUC' - R Project

Dec 9, 2014 - Title Cross-Validated Area Under the ROC Curve Confidence Intervals ... cvAUC, which report cross-validated AUC and compute confi-.

Package 'RMAWGEN' - R Project

Feb 11, 2017 - The R package RMAWGEN (R Multi-Sites Auto regressive ... RMAWGEN is free software: you can redistribute it and/or modify it under the terms ...

Package 'pROC' - R Project

Package 'pROC'. June 10, 2017. Type Package. Title Display and Analyze ROC Curves. Version 1.10.0. Date 2017-06-10. Encoding UTF-8. Depends R ...

Package 'inflection' - R Project

May 13, 2017 - R-project.org/package=inflection .... r=0.1;y=5+5*tanh(x-5)+rnorm(N,0,0.05);. # .... large data sets it is better using first EDE method, see ede.

Package 'sets' - R Project

Mar 7, 2017 - Software 31(2), 1â27. http://www.jstatsoft.org/v31/i02/ ... The package provides several fuzzy logic families. ..... fuzzy_rule(service %is% good,.

Package 'PoweR' - R Project

Jun 28, 2017 - Package 'PoweR' ..... statistics considered, whereas the rownames should all be set to the ..... For the 2nd Rayner-Best statistic see stat0056.

Package 'CDVineCopulaConditional' - R Project

Mar 2, 2017 - Package 'CDVineCopulaConditional' ... A function is available to select the best vine .... VineCopula: Statistical Inference of Vine Copulas.

Package 'orddom' - R Project

Feb 20, 2015 - Available online from the CRAN website http://cran.r-project.org/. .... A 1-column matrix with optional column name containing all nx values or.

Package 'rgsp' - R Project

Oct 26, 2018 - Type Package. Title Repetitive Group Sampling Plan Based on Cpk. Version 0.2.0. Author Muhammad Yaseen [aut, cre],. Muhammad Aslam ...

Package 'rDEA' - R Project

Description Data Envelopment Analysis for R, estimating robust DEA scores without and with envi- ronmental variables and doing returns-to-scale tests. Imports slam (>= 0.1-9), .... Constant is automatically included in Z. model a string for the ...

Package 'ionr' - R Project

Feb 11, 2016 - TRUE plots the result to a .tiff file in the working directory. Defaults to FALSE subset. Allows exluding certain indicators from the start.

Predictive Modeling with R and the caret Package useR! - R Project

Download PDF

1 downloads 167 Views 6MB Size Report

Comment

then it is important that the instrument and imaging software can correctly segment cells. Given a ...... package). The

Predictive Modeling with R and the caret Package

useR! 2013 Max Kuhn, Ph.D Pfizer Global R&D Groton, CT [email protected]

Outline

Conventions in R Data Splitting and Estimating Performance Data Pre-Processing Over–Fitting and Resampling Training and Tuning Tree Models Training and Tuning A Support Vector Machine Comparing Models Parallel Processing

Max Kuhn (Pfizer)

Predictive Modeling

2 / 126

Predictive Modeling Predictive modeling (aka machine learning)(aka pattern recognition)(. . .) aims to generate the most accurate estimates of some quantity or event. As these models are not generally meant to be descriptive and are usually not well–suited for inference. Good discussions of the contrast between predictive and descriptive/inferential models can be found in Shmueli (2010) and Breiman (2001) Frank Harrell’s Design package is very good for modern approaches to interpretable models, such as Cox’s proportional hazards model or ordinal logistic regression. Hastie et al (2009) is a good reference for theoretical descriptions of these models while Kuhn and Johnson (2013) focus on the practice of predictive modeling (and uses R). Max Kuhn (Pfizer)

Predictive Modeling

Modeling Conventions in R

3 / 126

The Formula Interface There are two main conventions for specifying models in R: the formula interface and the non–formula (or “matrix”) interface. For the former, the predictors are explicitly listed in an R formula that looks like: outcome ⇠ var1 + var2 + .... For example, the formula modelFunction(price ~ numBedrooms + numBaths + acres, data = housingData) would predict the closing price of a house using three quantitative characteristics.

Max Kuhn (Pfizer)

Predictive Modeling

5 / 126

The Formula Interface The shortcut y ⇠ . can be used to indicate that all of the columns in the data set (except y) should be used as a predictor. The formula interface has many conveniences. For example, transformations, such as log(acres) can be specified in–line. It also autmatically converts factor predictors into dummy variables (using a less than full rank encoding). For some R functions (e.g. klaR:::NaiveBayes, rpart:::rpart, C50:::C5.0, . . . ), predictors are kept as factors. Unfortunately, R does not efficiently store the information about the formula. Using this interface with data sets that contain a large number of predictors may unnecessarily slow the computations.

Max Kuhn (Pfizer)

Predictive Modeling

6 / 126

The Matrix or Non–Formula Interface The non–formula interface specifies the predictors for the model using a matrix or data frame (all the predictors in the object are used in the model). The outcome data are usually passed into the model as a vector object. For example: modelFunction(x = housePredictors, y = price) In this case, transformations of data or dummy variables must be created prior to being passed to the function. Note that not all R functions have both interfaces.

Max Kuhn (Pfizer)

Predictive Modeling

7 / 126

Building and Predicting Models

Modeling in R generally follows the same workflow: 1

Create the model using the basic function: fit data(segmentationData) > # get rid of the cell identifier > segmentationData$Cell > > > > >

training > >

trainX