User Tools

Site Tools


wiki:online_tutorial

Introduction to R

The object of this document is to help you starting to use the R environment for statistical analysis and graphics.
You can read and follow the the text. At the same time you can copy the commands included, into the frames part of this document and paste them in an interactive R session.
Once you have familiarity with the general functioning of R and of R's objects you can further advance in learning R with online manuals and guides. There is a large variety of documentation available at


In addition to efficacious learning tools, we would recommend that the user experiments with commands by, for example, trying different options to those stated. This experimentation is an important part of learning R using this synthetic document.

Starting R, installing packages, getting help, stopping R

Start R

From a shell window type:

R

In the bash terminal the following text will appear:

R version 2.10.0 (2009-10-26)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale\\
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

the > sign and the following blinking cursor is advising you that you are in the R environment. Enter in administrative mode type sudo R and you will be able to install the packages

stefano@stefano-linux:~\$ sudo R

Install a new package

install.packages(pkgs=“PACKAGE NAME”, lib= “library directories where to install the packages”, dependencies=TRUE)

install.packages(pkgs="randomForest", lib= "/usr/lib/R/site-library", dependencies=TRUE)

to verify which packages you have installed and where the library directories are, type:

library()
class                   Functions for Classification
colorspace              Color Space Manipulation
gdata                   Various R programming tools for data
                        manipulation
gtools                  Various R programming tools
maptools                Tools for reading and handling spatial objects
MASS                    Main Package of Venables and Ripleys MASS
mda                     Mixture and flexible discriminant analysis
nnet                    Feed-forward Neural Networks and Multinomial
                        Log-Linear Models
randomForest            Breiman and Cutlers random forests for
                        classification and regression
rgdal                   Bindings for the Geospatial Data Abstraction
                        Library
sp                      classes and methods for spatial data
spatial                 Functions for Kriging and Point Pattern
                        Analysis
vcd                     Visualizing Categorical Data
xtable                  Export tables to LaTeX or HTML
Packages in library '/usr/lib/R/site-library':
base                    The R Base Package
boot                    Bootstrap R (S-Plus) Functions (Canty)
class                   Functions for Classification
grid                    The Grid Graphics Package
KernSmooth              Functions for kernel smoothing for Wand & Jones
                        (1995)
lattice                 Lattice Graphics
MASS                    Main Package of Venables and Ripleys MASS
Matrix                  Sparse and Dense Matrix Classes and Methods
methods                 Formal Methods and Classes
mgcv                    GAMs with GCV/AIC/REML smoothness estimation
                        and GAMMs by PQL
nlme                    Linear and Nonlinear Mixed Effects Models
nnet                    Feed-forward Neural Networks and Multinomial
                        Log-Linear Models
rpart                   Recursive Partitioning
spatial                 Functions for Kriging and Point Pattern
                        Analysis
splines                 Regression Spline Functions and Classes
stats                   The R Stats Package
stats4                  Statistical Functions using S4 Classes
 survival                Survival analysis, including penalised
                        likelihood.
tcltk                   Tcl/Tk Interface
tools                   Tools for Package Development
utils                   The R Utils Package
(END)

To exit the list of library mode type q

q

Getting help

R gives help on function and commands. Online help also provides useful information. Getting used to R help is key to successful statistical modelling. The online help can be accessed in HTML format by typing:

help.start()

A keyword search is possible using the Search Engine and Keywords link. You can also use the help() or ? functions. For example, if we want to know how to use the matrix() function, the following two commands are equivalents:

help(matrix)
? matrix

Once you are in the help section, you can use “enter” to scroll down the menu and q to go back to the R prompt.

The str(object.name) command is used to display the internal structure of an R object
The summary(object.name) command gives a summary of a a object, usually a statistical summary but it is generic meaning it has different operations for different classes of a

dir() show files in the current directory
ls.str() str() for each variable in the search path
getwd() is asking for the current working directory

Stop R

q()

When you quit, R will ask you if you want to save the workspace (that is, all of the variables you have defined in this session); say “no” in order to avoid clutter.

Should an R command seem to be stuck or take longer than you’re willing to wait, type Control-C.

Calling Linux shell scripting commands

system(“…”) is used to call any Linux scripting commands within the R environment

> system("pwd")

i

s equal to write getwd()

Inputs and outputs

Once you have opened an R session and eventually loaded the relevant library you can start exploring your data

Loading data

load(file.name) function, loads R type datasets written with the save function load(file.name)

Saving data

save(object.name.1, object.name.2, … ) function saves the specified object in XDR platform independent binary format

Reading tables

read.table() Reads a file in table format and creates a data frame from it, with cases corresponding to lines and variables to fields in the file. (Use the help menu for details). For instance the function read.csv() is set to read comma separated files. read.csv(file.name, header = TRUE, sep = “,”, quote=“\”“, dec=”.“, fill =TRUE, comment.char=”“, …)

landuse04=read.csv("~/ost4sem/studycase/Lab_scripts/inputs/2004_landuse.csv", header=TRUE, sep=",", dec=".", na.string=":")

Show variables and data currently existing in your workspace

the list function ls()

ls()
> [1] "landuse04"

Save and remove data or R objects

save(file, …) saves the specified objects (…) in the XDR platform-independent binary format

save(landuse04, file="~/ost4sem/studycase/Lab_scripts/outputs/landuse2004")

save.image(file) saves all objects

save(file="~/ost4sem/studycase/Lab_scripts/outputs/landuse2004_and_more")

rm(file, …) removes objects which you have created or data you have uploaded

rm(landuse04)

No objects are present in the memory now, use ls function to check it

ls()
character(0)

But because you saved the landuse2004 data you can reload it using the load() function and check its structrure using the str function

load("~/ost4sem/studycase/Lab_scripts/outputs/landuse2004")
str(landuse04)

as a result you will see the following information:

'data.frame':   72 obs. of  13 variables:                    
$ geographic.Unit                     : Factor w/ 72 levels "be10 Région de Bruxelles-Capitale/Brussels Hoofdstedelijk Gewest",..: 20 17 16 22 19 18 15 2 1 8 ...                                                                                                                                            
 $ landuse                             : logi  NA NA NA NA NA NA ...                                                                                   
 $ total.Total.area                    : num  NA NA NA NA NA ...                                                                                       
 $ agriarea.Utilized.agricultural.area : num  NA NA NA NA NA ...                                                                                       
 $ arabland.Arable.land                : num  NA NA NA NA NA ...                                                                                       
 $ forest.Wooded.area                  : num  NA NA NA NA NA ...
 $ garden.Private.gardens              : num  NA NA NA NA NA NA 0.2 0 0 0.1 ...
 $ grasland.Permanent.grassland        : num  NA NA NA NA NA ...
 $ greenfod.Green.fodder.on.arable.land: num  NA NA NA NA NA ...
 $ fallow.Fallow                       : num  NA NA NA NA NA NA 23.6 0 0 7.4 ...
 $ olivepl.Olive.plantations           : int  NA NA NA NA NA NA 0 0 0 0 ...
 $ permcrop.Permanent.crops            : num  NA NA NA NA NA NA 21.4 0 0 19.2 ...
 $ vineyard.Vineyards                  : num  NA NA NA NA NA NA 0 0 0 0 ...
>


Variables and calculations

R has an interactive calculations function. The command is executed and results are displayed.
R uses: +, -, /, and ^ for addition, subtraction, multiplication, division and exponentiation, respectively.

2+2 

R will execute and display

> [1] 4

the [1] at the beginning of the line is just R printing an index of element numbers. If you print a result that displays on multiple lines, R will put an index at the beginning of each line.

2*5
[1] 10
10/2
[1] 5
2^3
[1] 8

Variable settings

you can simply create a variable by typing: variable name = function, constant or calculation.
Example:

x =3*2

The results of 3*2 is not displayed. In fact, the x variable value is stored in the memory without printing it. To display the x value you can use:

print(x) 
[1] 6

or

x
[1] 6

Most users apply a similar syntax using the character string instead of the = character.

x <- 3
x
[1] 3

Also remember that R is case sensitive, print(X) or X is different from x. For instance:

a <- 3
a
[1] 3
A
Error: object 'A' not found

Variable names in R must begin with a letter, followed by alphanumeric characters.

3e = 2
Error: unexpected input in "3e "

In long names you can use ”.“ or “_” as in:

very.long.variable.name.X or verylongvariablenameY but you can’t use blank spaces in variable names. Avoid single letter names such us: c, l, q, t, C, D, F, I, and T, which are either built-in R functions or hard to tell apart.

very.long.variable.name.X3 = 3
very.long.variable.name.X3
[1] 3

Interactive calculations

Once you have defined the variables you can use them in interactive calculations :

b = 22 a = 23

a*b
[1] 24

And you can use variables in formulas :

c = 60 /(a+b)
c

[1] 6

typing a;b you can display a and b variables at the same time:

a;b
[1] 6
[1] 4

If you omit to close a parenthesis, R will display a + sign.

c = 60 /(a+b
+

In this case you can either close the parenthesis in the next line or type ctrl + c to go back to a new starting prompt.

Operators order

When using more complex formulas be aware of the importance of the order of operators. Parenthesis have priority on exponentiation, or powers, then comes multiplication and division, and finally addition and subtraction. The following command:

C = ((a + 2 * sqrt(b))/(a + 8 * sqrt(b)))/2
C
[1] 0.2272727

is different from:

C = a + 2 * sqrt(b) / a + 8 * sqrt(b) / 2
C
[1] 14.66667

as well as

100-40/2^4
[1] 97.5

is different from:

(100-40)/2^4 
[1] 3.75

and

  1. 2^4

[1] -16

is different from:

(-2)^4
[1] 16

Logical values

R can perform conditional tests and generate True or False values as results.

x = 60
x > 100
[1] FALSE
x >   3
[1] TRUE
x = 100
[1] TRUE

Logical values can be stored as variables:

x = 60
logical.value =  x >  3
logical.value
[1] TRUE

R objects

The entities that R operates on are technically known as objects. Examples are vectors of numeric (real) or complex values, vectors of logical values and vectors of character strings. These are known as “atomic” structures since their components are all of the same type, or mode (namely numeric, complex, logical, character and raw). R also operates on objects called lists, which are of mode list. These are ordered sequences of objects which individually can be of any mode. Lists are known as “recursive” rather than atomic structures since their components can themselves be lists in their own right.

The other recursive structures are those of mode function and expression. Functions are the objects that form part of the R system along with similar user written functions, which we discuss in some detail later. Expressions as objects form an advanced part of R which will not be discussed in this guide, except indirectly when we discuss formulae used with modeling in R.

By the mode of an object we mean the basic type of its fundamental constituents. This is a special case of a “property” of an object. Another property of every object is its length. The functions mode(object) and length(object) can be used to find out the mode and length of any defined structure 10.

Further properties of an object are usually provided by attributes(object) (see 'Getting and setting attributes'). Because of this, mode and length are also called “intrinsic attributes” of an object. For example, if z is a complex vector of length 100, then in an expression mode(z) is the character string “complex” and length(z) is 100.

Vectors

Vectors are combinations of scalars in a string structure. Vector values must all be of the same mode. Thus any given vector must be unambiguously either logical, numeric, complex, character or raw. (The only apparent exception to this rule is the special “value” listed as NA for quantities which are not available, but in fact there are several types of NA). Note that a vector can be empty and still have a mode. For example the empty character string vector is listed as character(0) and the empty numeric vector as numeric(0).

c(…) is the generic function to combine arguments with the default forming a vector; with RECURSIVE=TRUE descends through lists combining all elements into one vector.
To see details for the generic function c(…) and combine arguments forming a vector:

? c 

As an example we can create a simple vector of seven values by typing:

c(2,3,4,5,10,5,8)
[1]  2  3  4  5 10  5  8

We can generate a sequence using the syntax :

1:10
[1]  1  2  3  4  5  6  7  8  9 10

seq()
We can generate the same sequence of 1:10 command using the seq() function. The syntax will be :

 seq(1,10)
[1]  1  2  3  4  5  6  7  8  9 10

The seq() function “seq(from = number, to = number, by = number)” allows you to create a vector starting from one value to another by a defined increment:

seq(1,10, 0.25)
[1]  1.00  1.25  1.50  1.75  2.00  2.25  2.50  2.75  3.00  3.25  3.50  3.75
[13]  4.00  4.25  4.50  4.75  5.00  5.25  5.50  5.75  6.00  6.25  6.50  6.75
[25]  7.00 7.25  7.50  7.75  8.00  8.25  8.50  8.75  9.00  9.25  9.50  9.75
[37]  10.0
seq(from = 1, to = 10, by =  0.25)
[1]  1.00  1.25  1.50  1.75  2.00  2.25  2.50  2.75  3.00  3.25  3.50  3.75
[13]  4.00  4.25  4.50  4.75  5.00  5.25  5.50  5.75  6.00  6.25  6.50  6.75
[25]  7.00  7.25  7.50  7.75  8.00  8.25  8.50  8.75  9.00  9.25  9.50  9.75
[37] 10.00

rep() The replicate function “rep(x,times)” allows you to replicate a vector several times in a more complex vector. Calculations can also be included to form vectors, and functions can be combined in the same command:

one2tree = 1:3 
rep(one2tree,10) 

[1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3

c(10*0:10)
[1]   0  10  20  30  40  50  60  70  80  90 100
rep(c (5*40:1, 5*1:40, 5, 6,7,8, 3, 2001:2014), 2)
[1]  200  195  190  185  180  175  170  165  160  155  150  145  140  135  130
[16]  125  120  115  110  105  100   95   90   85   80   75   70   65   60   55
[31]   50   45   40   35   30   25   20   15   10    5    5   10   15   20   25
[46]   30   35   40   45   50   55   60   65   70   75   80   85   90   95  100
[61]  105  110  115  120  125  130  135  140  145  150  155  160  165  170  175
[76]  180  185  190  195  200    5    6    7    8    3 2001 2002 2003 2004 2005
[91] 2006 2007 2008 2009 2010 2011 2012 2013 2014  200  195  190  185  180  175
[106]  170  165  160  155  150  145  140  135  130  125  120  115  110  105  100
[121]   95   90   85   80   75   70   65   60   55   50   45   40   35   30   25
[136]   20   15   10    5    5   10   15   20   25   30   35   40   45   50   55
[151]   60   65   70   75   80   85   90   95  100  105  110  115  120  125  130
[166]  135  140  145  150  155  160  165  170  175  180  185  190  195  200    5
[181]    6    7    8    3 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011
[196] 2012 2013 2014
rep(seq(1,3,0.5),3)
[1] 1.0 1.5 2.0 2.5 3.0 1.0 1.5 2.0 2.5 3.0 1.0 1.5 2.0 2.5 3.0

Array

Matrix

Matrices, or more generally arrays, are multi-dimensional generalizations of vectors. In fact, they are vectors that can be indexed by two or more indices and will be printed in special ways. See Arrays and matrices.

  • Factors provide compact ways to handle categorical data. (See Factors).
  • Lists are a general form of vector in which the various elements need not be the same type, and are often themselves vectors or lists. Lists provide a convenient way to return the results of a statistical computation. (See Lists).


Matrix() function creates a matrix from the given set of values. We use the matrix(x, nrow=, ncol=) function to set the matrix cell values, the number of rows and the number of columns. We can use the colnames() and rownames() functions to set the column and row names of the matrix-like object.

matrix(data = NA, nrow = 2, ncol = 3) 
example.matrix = matrix(0,2,3)
example.matrix
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
example.matrix[1,]
[1] 0 0 0
example.matrix[,2]
[1] 0 0
example.matrix[1,] = 1:3
example.matrix[2,] = c(5,10,4)
example.matrix
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    5   10    4

matrix.head = c(“col a”,”col b“,”column c“)

matrix.side = c("first raw","second raw")
str(matrix.side)
chr [1:2] "first raw" "second raw"

When using ” “ we create and refer to a character type “chr” input

numeric.vector = c(rep(c (5*10:1, 5, 6), 2))
character.vector  = as.character(numeric.vector)
str(character.vector)
chr [1:24] "50" "45" "40" "35" "30" "25" "20" "15" "10" ...
colnames(example.matrix) = matrix.head
rownames(example.matrix) = matrix.side
example.matrix
           col a col b column c
first row      1     2        3
second row     5    10        4
str(example.matrix)
num [1:2, 1:3] 1 5 2 10 3 4
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2] "first row" "second row"
  ..$ : chr [1:3] "col a" "col b" "column c"

Data Frames

Data frames are matrix-like structures, in which the columns can be of different types. Think of data frames as `data matrices' with one row per observational unit but with (possibly) both numerical and categorical variables. Many experiments are best described by data frames: the treatments are categorical but the response is numeric.

Functions

Functions are themselves objects in R which can be stored in the project's workspace. This provides a simple and convenient way to extend R. (See 'Writing your own functions').

wiki/online_tutorial.txt · Last modified: 2017/12/05 22:53 (external edit)