PowerPoint Presentation: An Introduction to R - IASRI

An Introduction to R
A. Dhandapani


Outline

- What is R?
- Installation
- Take a Look
- Data Manipulation
- Some Statistical Analysis
- Web Interface
- Pros and Cons
- Further Details

What is R?

- R is a statistical computing environment
- R is open-source software, i.e. free
- Initial developers: Ross Ihaka and Robert Gentleman
- Developed by scores of volunteers
- The complete source code is also available

What is R? (Continued)

- Available at www.r-project.org
- R is closely based on the S and S-Plus languages
- Ideally suited for statistical computations such as carrying out simulation studies
- Ability to write functions
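As a minimal sketch of writing a function and running a small simulation study, the hypothetical function sim_means below draws repeated normal samples and returns their means. The function name, sample size and number of replicates are illustrative assumptions, not taken from the slides.

```r
# Hypothetical helper: simulate the sampling distribution of the mean
sim_means <- function(nrep, n, mu = 0, sigma = 1) {
  # replicate() evaluates the expression nrep times and collects the results
  replicate(nrep, mean(rnorm(n, mean = mu, sd = sigma)))
}

set.seed(123)                 # make the simulation reproducible
m <- sim_means(nrep = 1000, n = 25, mu = 10, sigma = 2)
mean(m)                       # should be close to mu = 10
sd(m)                         # should be close to sigma / sqrt(n) = 0.4
```

Wrapping the simulation in a function makes it easy to rerun the study with other sample sizes or parameters.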

Installation
Download the latest R release:
- Go to http://cran.r-project.org/mirrors.html
- Select any mirror site
- Navigate to bin/windows/base
- Download R-2.6.0-win32.exe (~30 MB)
For packages, go to bin/windows/contrib, download the packages (in zip format) and install them using “install packages from local zip files” in the R menu.

Take a Look

Data Manipulation
- Interaction is through the command line
- Commands are typed at the > prompt
- If a command is incomplete at the end of a line, R shows the continuation prompt + and waits for the rest
- ENTER sends the command to the interpreter
Eg.
> 4+4
[1] 8

Data types
Basic storage:
> p <- 2
> p #Print contents of p
[1] 2 #Content of p
> strName <- "IASRI"
> strName
[1] "IASRI"
NOTE: R is cAsE sensitive; strName and strname are different
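A minimal sketch of the two basic storage types shown above, using the base functions class() and exists() to inspect them; the check on strname assumes a fresh session in which no object of that name has been created.

```r
p <- 2                 # numeric storage
strName <- "IASRI"     # character storage
class(p)               # "numeric"
class(strName)         # "character"
# R is case sensitive: strname is a different (here undefined) name
exists("strname")      # FALSE unless strname has been created
```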

Vectors
The easiest way to assign a vector is with the function c.
Eg.
> j <- c(1,1,1,1)
> j
[1] 1 1 1 1
> r
[1] 1 1 1 1 1
> strvec <- c("first","second","third")
> strvec
[1] "first" "second" "third"
To get help on any function, type help() with the function name, e.g.
> help(seq)
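The vectors above can be built as follows. The slide does not show how r was created, so using rep() for it is an assumption; seq() (the function whose help page is mentioned) is included for illustration.

```r
j <- c(1, 1, 1, 1)                 # combine values into a vector
r <- rep(1, 5)                     # rep() repeats a value: 1 1 1 1 1
strvec <- c("first", "second", "third")
s <- seq(2, 10, by = 2)            # regular sequence: 2 4 6 8 10
length(strvec)                     # 3
```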

Function c
Function c can be used in many ways:
> a <- c(1:3, 2:1)
> a
[1] 1 2 3 2 1
The : operator does the trick: 1:3 gives 1 2 3 and 2:1 gives 2 1.
The name c is an abbreviated form of concatenate (cf. cat in UNIX).
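A short sketch of the : operator and of c() concatenating whole vectors; the vector b is a hypothetical extension of the slide's example.

```r
a <- c(1:3, 2:1)       # 1:3 gives 1 2 3, 2:1 counts down to 2 1
a                      # [1] 1 2 3 2 1
b <- c(a, 10:12)       # c() also concatenates existing vectors
length(b)              # 8
5:1                    # ':' can count down: 5 4 3 2 1
```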

Matrices
Matrices can be created in many ways in R:
> A = matrix(c(1,4,3,2,1,2), nrow=3, ncol=2,
+ byrow=FALSE)
> A
     [,1] [,2]
[1,]    1    2
[2,]    4    1
[3,]    3    2
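A minimal sketch contrasting byrow = FALSE (fill column by column, as above) with byrow = TRUE (fill row by row); A_row is a hypothetical name introduced for the comparison.

```r
A <- matrix(c(1, 4, 3, 2, 1, 2), nrow = 3, ncol = 2, byrow = FALSE)     # fill by column
A_row <- matrix(c(1, 4, 3, 2, 1, 2), nrow = 3, ncol = 2, byrow = TRUE)  # fill by row
dim(A)        # 3 2
A[, 1]        # first column of A: 1 4 3
A_row[1, ]    # first row of A_row: 1 4
```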

Matrix
Another way (easier?) to create a matrix:
> H2 = rbind(c(1,1), c(1,-1))
> H2 #print H2 (Hadamard matrix of order 2)
     [,1] [,2]
[1,]    1    1
[2,]    1   -1
rbind: row bind; cbind: column bind
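A sketch comparing rbind() and cbind() on the same vectors, and checking the Hadamard property H2 %*% t(H2) = 2I for the matrix above; K is a hypothetical name.

```r
H2 <- rbind(c(1, 1), c(1, -1))   # stack vectors as rows
K  <- cbind(c(1, 1), c(1, -1))   # the same vectors as columns
all(K == t(H2))                  # TRUE: cbind of the rows gives the transpose
H2 %*% t(H2)                     # 2 * identity, the Hadamard property
```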

Matrix Multiplication
> A = matrix(c(1,4,3,2,1,2), nrow=3, ncol=2,
+ byrow=FALSE)
> B = matrix(c(1,0,0,1), nrow=2, ncol=2)
> C = A %*% B
> C
     [,1] [,2]
[1,]    1    2
[2,]    4    1
[3,]    3    2
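Because B above is the 2 x 2 identity, C equals A. The sketch below confirms this and contrasts %*% with the elementwise operator *; diag() and crossprod() are standard base functions added for illustration.

```r
A <- matrix(c(1, 4, 3, 2, 1, 2), nrow = 3, ncol = 2)
B <- diag(2)             # 2 x 2 identity, same as matrix(c(1,0,0,1), 2, 2)
C <- A %*% B             # matrix product: (3 x 2) %*% (2 x 2) gives 3 x 2
all(C == A)              # TRUE, since B is the identity
D <- A * A               # '*' is elementwise, not matrix multiplication
crossprod(A)             # t(A) %*% A, a 2 x 2 matrix
```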

Some more matrices Run

matrix.R

- sink(filename) diverts the output to a file; sink() resumes normal output
- The Kronecker product is obtained with the %x% operator
- The inverse is obtained by solving the equation Ax = B, where B is an identity matrix (solve(A, B); solve(A) alone also returns the inverse)
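Since matrix.R itself is not reproduced here, the following is only a sketch of the three operations listed above. A temporary file stands in for filename, and the 2 x 2 matrix A is a hypothetical example.

```r
out <- tempfile(fileext = ".txt")  # temporary file standing in for 'filename'
sink(out)                          # divert printed output to the file
print(1:3)
sink()                             # restore output to the console
readLines(out)                     # "[1] 1 2 3"

A <- matrix(c(2, 1, 1, 3), nrow = 2)
I2 <- diag(2)
K <- I2 %x% A                      # Kronecker product, a 4 x 4 matrix
Ainv <- solve(A, I2)               # solve A x = I for x, i.e. the inverse
all.equal(A %*% Ainv, I2)          # TRUE
```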

factor
R behaves differently when you make a vector a factor.
Eg.
> x
[1] 1 1 1 2 2 2 3 3 3
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1       1       2       2       3       3

factor (contd)
> xfactor
[1] 1 1 1 2 2 2 3 3 3
Levels: 1 2 3
> summary(xfactor)
1 2 3
3 3 3
The second row gives the frequency of each level.
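The slides do not show how x and xfactor were created. A plausible sketch, assuming rep() for x and factor() for the conversion, is:

```r
x <- rep(1:3, each = 3)       # 1 1 1 2 2 2 3 3 3
summary(x)                    # numeric summary: Min., quartiles, Mean, Max.
xfactor <- factor(x)          # treat the values as categories
levels(xfactor)               # "1" "2" "3"
summary(xfactor)              # counts per level: 3 3 3
table(x)                      # the same frequencies via table()
```

The contrast is the point of the slide: summary() on a numeric vector gives location statistics, while on a factor it counts each level.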

factor
A factor can be used to group another variable with tapply.
Eg. Run factor.R
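factor.R is not included here, so this is only a sketch of tapply with hypothetical data: yields from three varieties with three plots each, grouped by a factor.

```r
# Hypothetical data: yields from three varieties, three plots each
yield   <- c(20, 22, 21, 30, 28, 29, 25, 24, 26)
variety <- factor(rep(c("A", "B", "C"), each = 3))
tapply(yield, variety, mean)   # mean yield per variety: A 21, B 29, C 25
tapply(yield, variety, max)    # any function can be applied per group
```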

Data type - Lists
- An ordered collection of objects
- Think of a list as a specialized vector whose components can be of different types
Eg.
> Lst[1]               # first element of the list
$name
[1] "Fred"
> Lst$wife             # access by name
[1] "Mary"
> Lst$child.ages[1]    # yet another way
[1] 4
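The slide does not show how Lst was defined. The sketch below assumes a list with the components that are printed (name, wife, child.ages); the extra child ages are illustrative. It also shows the difference between [ ] (a sub-list) and [[ ]] (the component itself).

```r
# Hypothetical list, assumed for illustration (the slide's definition is not shown)
Lst <- list(name = "Fred", wife = "Mary", child.ages = c(4, 7, 9))
Lst[1]                  # a sub-list holding the first component ($name)
Lst[[1]]                # the component itself: "Fred"
Lst$wife                # access by name: "Mary"
Lst[["child.ages"]][1]  # 4
length(Lst)             # 3 components of different types and lengths
```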

Data type - time series
Data can be given extra information: a time series attaches a start date and a frequency to the values.
> month_exp =
+ c(1200,2100,2000,4000,2000,2140)
> tsmonth_exp = ts(month_exp, start=c(2002,9),
+ frequency=12)
> tsmonth_exp
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
2002                                         1200 2100 2000 4000
2003 2000 2140
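A short sketch of querying the monthly series above with the base functions start(), end(), frequency() and window(); selecting only the 2003 months is an illustrative choice.

```r
month_exp <- c(1200, 2100, 2000, 4000, 2000, 2140)
tsmonth_exp <- ts(month_exp, start = c(2002, 9), frequency = 12)
start(tsmonth_exp)        # c(2002, 9): September 2002
end(tsmonth_exp)          # c(2003, 2): February 2003
frequency(tsmonth_exp)    # 12 observations per year
window(tsmonth_exp, start = c(2003, 1))   # only Jan and Feb 2003
```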

Data type - frames
- The most commonly used data type
- Data are arranged in a rectangular layout, with columns identifying variables
Eg.

> desig = c("Principal Scientist","Senior Scientist","Scientist-SS","Scientist")
> basicpay = c(16400,12000,10000,8000)
> salary_structure = data.frame(designation=desig,basic=basicpay)

Data type - frames
> salary_structure
          designation basic
1 Principal Scientist 16400
2    Senior Scientist 12000
3        Scientist-SS 10000
4           Scientist  8000
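A minimal sketch of working with the salary_structure data frame built above: columns are accessed like list components, and rows can be selected with a logical condition. The threshold of 10000 is an illustrative choice.

```r
desig <- c("Principal Scientist", "Senior Scientist", "Scientist-SS", "Scientist")
basicpay <- c(16400, 12000, 10000, 8000)
salary_structure <- data.frame(designation = desig, basic = basicpay)
dim(salary_structure)                  # 4 rows, 2 columns
salary_structure$basic                 # a column is accessed like a list component
salary_structure[salary_structure$basic > 10000, ]   # rows with basic pay above 10000
mean(salary_structure$basic)           # 11600
```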

Reading from Files Data can be read easily from files >egframe