User Tools

Site Tools


wiki:online_tutorial

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

wiki:online_tutorial [2017/12/05 22:53] (current)
Line 1: Line 1:
 +====== Introduction to R ======
 +
 +The object of this document is to help you starting to use the R environment for statistical analysis and graphics. \\
 +You can read and follow the the text.  At the same time you can copy the commands included, into the frames part of this document and paste them in an interactive R session. \\
 +Once you have familiarity with the general functioning of R and of R's objects you can further advance in learning R with online manuals and  guides. There is a large variety of documentation available at 
 +  *  www.r-project.org. ​
 +   * http://​www.statmethods.net/​index.html ​
 +   * http://​wiki.r-project.org/​rwiki/​doku.php ​
 +\\
 +In addition to efficacious learning tools, we would recommend that the user experiments with commands by, for example, trying different options to those stated. ​
 +This experimentation is an important part of learning R using this synthetic document.
 +
 +===== Starting R, installing packages, getting help, stopping R =====
 +==== Start R ====
 +From a shell window type: 
 +  R
 +In the bash terminal the following text will appear:
 +  R version 2.10.0 (2009-10-26)
 +  Copyright (C) 2009 The R Foundation for Statistical Computing
 +  ISBN 3-900051-07-0
 +  R is free software and comes with ABSOLUTELY NO WARRANTY.
 +  You are welcome to redistribute it under certain conditions.
 +  Type '​license()'​ or '​licence()'​ for distribution details.
 +  Natural language support but running in an English locale\\
 +  R is a collaborative project with many contributors.
 +  Type '​contributors()'​ for more information and
 +  '​citation()'​ on how to cite R or R packages in publications.
 +  Type '​demo()'​ for some demos, '​help()'​ for on-line help, or
 +  '​help.start()'​ for an HTML browser interface to help.
 +  Type '​q()'​ to quit R.
 +
 +the **>** sign and the following blinking cursor is advising you that you are in the R environment.
 +Enter in __administrative mode__ type **sudo R** and you will be able to install the packages
 +
 +  stefano@stefano-linux:​~\$ sudo R
 +
 +==== Install a new package ====
 +install.packages(pkgs="​PACKAGE NAME", lib= "​library directories where to install the packages",​ dependencies=TRUE)
 +
 +  install.packages(pkgs="​randomForest",​ lib= "/​usr/​lib/​R/​site-library",​ dependencies=TRUE)
 +
 +to verify which packages you have installed and where the library directories are, type:
 +  library()
 +  class                   ​Functions for Classification
 +  colorspace ​             Color Space Manipulation
 +  gdata                   ​Various R programming tools for data
 +                          manipulation
 +  gtools ​                 Various R programming tools
 +  maptools ​               Tools for reading and handling spatial objects
 +  MASS                    Main Package of Venables and Ripleys MASS
 +  mda                     ​Mixture and flexible discriminant analysis
 +  nnet                    Feed-forward Neural Networks and Multinomial
 +                          Log-Linear Models
 +  randomForest ​           Breiman and Cutlers random forests for
 +                          classification and regression
 +  rgdal                   ​Bindings for the Geospatial Data Abstraction
 +                          Library
 +  sp                      classes and methods for spatial data
 +  spatial ​                ​Functions for Kriging and Point Pattern
 +                          Analysis
 +  vcd                     ​Visualizing Categorical Data
 +  xtable ​                 Export tables to LaTeX or HTML
 +  Packages in library '/​usr/​lib/​R/​site-library':​
 +  base                    The R Base Package
 +  boot                    Bootstrap R (S-Plus) Functions (Canty)
 +  class                   ​Functions for Classification
 +  grid                    The Grid Graphics Package
 +  KernSmooth ​             Functions for kernel smoothing for Wand & Jones
 +                          (1995)
 +  lattice ​                ​Lattice Graphics
 +  MASS                    Main Package of Venables and Ripleys MASS
 +  Matrix ​                 Sparse and Dense Matrix Classes and Methods
 +  methods ​                ​Formal Methods and Classes
 +  mgcv                    GAMs with GCV/​AIC/​REML smoothness estimation
 +                          and GAMMs by PQL
 +  nlme                    Linear and Nonlinear Mixed Effects Models
 +  nnet                    Feed-forward Neural Networks and Multinomial
 +                          Log-Linear Models
 +  rpart                   ​Recursive Partitioning
 +  spatial ​                ​Functions for Kriging and Point Pattern
 +                          Analysis
 +  splines ​                ​Regression Spline Functions and Classes
 +  stats                   The R Stats Package
 +  stats4 ​                 Statistical Functions using S4 Classes
 +   ​survival ​               Survival analysis, including penalised
 +                          likelihood.
 +  tcltk                   ​Tcl/​Tk Interface
 +  tools                   Tools for Package Development
 +  utils                   The R Utils Package
 +  (END)
 +
 +To exit the list of library mode type q
 +  q
 +
 +==== Getting help ====
 +R gives help on function and commands. Online help also provides useful information. Getting used to R help is key to successful statistical modelling. The online help can be accessed in HTML format by typing:
 +
 +  help.start()
 +
 +A keyword search is possible using the Search Engine and Keywords link.
 +You can also use the help() or ? functions. For example, if we want to know how to use the matrix() function, the
 +following two commands are equivalents:​
 +
 +  help(matrix)
 +  ? matrix
 +
 +Once you are in the help section, you can use "​enter"​ to scroll down the menu and q to go back to the R prompt.\\
 +
 +The **str(object.name)** command is used to display the internal structure of an R object\\
 +The **summary(object.name)** command gives a //summary// of a **a** object, usually a statistical summary but it is //generic// meaning it has different operations for different classes of **a**\\
 +  ​
 +**dir()** show files in the current directory\\
 +**ls.str()** str() for each variable in the search path\\
 +**getwd()** is asking for the current working directory\\
 +
 +
 +==== Stop R ====
 +
 +  q()
 +
 +When you quit, R will ask you if you want to save the workspace (that is, all of the variables you have defined in this session); say “no” in order to avoid clutter.
 +
 +Should an R command seem to be stuck or take longer than you’re willing to wait, type **Control-C**.\\
 +
 +==== Calling Linux shell scripting commands ====
 +**system("​..."​)** is used to call any Linux scripting commands within the R environment
 +  ​
 +> system("​pwd"​)
 +is equal to write 
 +getwd()
 +
 +
 +===== Inputs and outputs ​ =====
 +Once you have opened an R session and eventually loaded the relevant library you can start exploring your data
 +
 +==== Loading data ====
 +**load(file.name)** function, loads R type datasets written with the save function ​
 + ​load(file.name) ​
 +
 +==== Saving data ====
 +**save(object.name.1,​ object.name.2,​ ... )** function saves the specified object in XDR platform independent binary format
 +
 +==== Reading tables ====
 +**read.table()** Reads a file in table format and creates a data frame from it, with cases corresponding to lines and variables to fields in the file. (Use the help menu for details). For instance the function read.csv() is set to read comma separated files. read.csv(file.name,​ header = TRUE, sep = ",",​ quote="​\"",​ dec="​.",​ fill =TRUE, ​ comment.char="",​ ...)
 +
 +  landuse04=read.csv("​~/​ost4sem/​studycase/​Lab_scripts/​inputs/​2004_landuse.csv",​ header=TRUE,​ sep=",",​ dec="​.",​ na.string=":"​)
 +
 +==== Show variables and data currently existing in your workspace ====
 +the list function ls() 
 +  ls()
 +  > [1] "​landuse04"​
 +
 +==== Save and remove data or R objects ====
 +**save(file,​ ...)** saves the specified objects (...) in the XDR platform-independent binary format
 +
 +  save(landuse04,​ file="​~/​ost4sem/​studycase/​Lab_scripts/​outputs/​landuse2004"​)
 +
 +**save.image(file)** saves all objects
 +  save(file="​~/​ost4sem/​studycase/​Lab_scripts/​outputs/​landuse2004_and_more"​)
 +
 +**rm(file, ...)** removes objects which you have created or data you have uploaded
 +
 +  rm(landuse04)
 +
 +No objects are present in the memory now, use ls function to check it
 +
 +  ls()
 +  character(0)
 +
 +But because you saved the landuse2004 data you can reload it using the **load()** function and check its structrure using the **str** function
 +
 +  load("​~/​ost4sem/​studycase/​Lab_scripts/​outputs/​landuse2004"​)
 +  str(landuse04)
 +
 +as a result you will see the following information:​
 +
 +  '​data.frame': ​  72 obs. of  13 variables: ​                   ​
 +  $ geographic.Unit ​                    : Factor w/ 72 levels "​be10 Région de Bruxelles-Capitale/​Brussels Hoofdstedelijk Gewest",​..:​ 20 17 16 22 19 18 15 2 1 8 ...                                                                                                                                            ​
 +   $ landuse ​                            : logi  NA NA NA NA NA NA ...                                                                                   
 +   $ total.Total.area ​                   : num  NA NA NA NA NA ...                                                                                       
 +   $ agriarea.Utilized.agricultural.area : num  NA NA NA NA NA ...                                                                                       
 +   $ arabland.Arable.land ​               : num  NA NA NA NA NA ...                                                                                       
 +   $ forest.Wooded.area ​                 : num  NA NA NA NA NA ...
 +   $ garden.Private.gardens ​             : num  NA NA NA NA NA NA 0.2 0 0 0.1 ...
 +   $ grasland.Permanent.grassland ​       : num  NA NA NA NA NA ...
 +   $ greenfod.Green.fodder.on.arable.land:​ num  NA NA NA NA NA ...
 +   $ fallow.Fallow ​                      : num  NA NA NA NA NA NA 23.6 0 0 7.4 ...
 +   $ olivepl.Olive.plantations ​          : int  NA NA NA NA NA NA 0 0 0 0 ...
 +   $ permcrop.Permanent.crops ​           : num  NA NA NA NA NA NA 21.4 0 0 19.2 ...
 +   $ vineyard.Vineyards ​                 : num  NA NA NA NA NA NA 0 0 0 0 ...
 +  >
 +
 +\\
 +===== Variables and calculations =====
 +R has an interactive calculations function. The command is executed and results are displayed.\\
 +R uses:  +, -, /, and ^  for addition, subtraction,​ multiplication,​ division and exponentiation,​ respectively.
 +
 +  2+2 
 +R will execute and display
 +  > [1] 4
 +
 +the [1] at the beginning of the line is just R printing an index of element numbers. If you print a result that displays on multiple lines, R will put an index at the beginning of each line.
 +  ​
 +  2*5
 +  [1] 10
 +
 +  10/2
 +  [1] 5
 +
 +  2^3
 +  [1] 8
 +
 +==== Variable settings ====
 +you can simply create a variable by typing:
 +**variable name = function, constant or calculation.** \\
 +Example:  ​
 +
 +  x =3*2
 +
 +The results of 3*2 is not displayed. In fact, the x variable value is stored in the memory without printing it. To display the x value you can use: 
 +  print(x) ​
 +  [1] 6
 +or 
 +  x
 +  [1] 6
 +
 +Most users apply a similar syntax using the **<-** character string instead of the = character.
 +  x <- 3
 +  x
 +  [1] 3
 +
 +# Also remember that R is case sensitive, print(X) or X is different from x. For instance:
 +
 +  a <- 3
 +  a
 +  [1] 3
 +  A
 +  Error: object '​A'​ not found
 +
 +Variable names in R must begin with a letter, followed by alphanumeric characters. ​
 +  3e = 2
 +  Error: unexpected input in "3e "
 +
 +In long names you can use "​."​ or "​_"​ as in: 
 +
 +very.long.variable.name.X or very_long_variable_name_Y but you can’t use blank spaces in variable names. ​
 +Avoid single letter names such us: c, l, q, t, C, D, F, I, and T, which are either built-in R functions or hard to tell apart.
 +
 +  very.long.variable.name.X3 = 3
 +  very.long.variable.name.X3
 +  [1] 3
 +
 +==== Interactive calculations ====
 +
 +Once you have defined the variables you can use them in interactive calculations :\\
 + 
 +  b = 2*2
 +  a = 2*3
 +  a*b
 +  [1] 24
 +
 +
 +And you can use variables in formulas :\\
 +  c = 60 /(a+b)
 +  c
 + [1] 6
 +
 +
 +typing a;b you can display a and b variables at the same time:\\
 +  a;b
 +  [1] 6
 +  [1] 4
 +
 + 
 + 
 +If you omit to close a parenthesis,​ R will display a *+* sign. 
 +  c = 60 /(a+b
 +  +
 + 
 +In this case you can either close the parenthesis in the next line or type ctrl + c to go back to a new starting prompt. ​
 +
 +==== Operators order ====
 +When using more complex formulas be aware of the importance of the order of operators. Parenthesis have priority on exponentiation,​ or powers, then comes multiplication and division, and finally addition and subtraction. The following command:\\
 +
 +  C = ((a + 2 * sqrt(b))/(a + 8 * sqrt(b)))/2
 +  C
 +  [1] 0.2272727
 +
 +is different from:\\
 +  C = a + 2 * sqrt(b) / a + 8 * sqrt(b) / 2
 +  C
 +  [1] 14.66667
 +
 +
 +as well as 
 +  100-40/2^4
 +  [1] 97.5
 +
 +is different from:\\
 +  (100-40)/​2^4 ​
 +  [1] 3.75
 +
 + 
 +and 
 +  -2^4
 +  [1] -16
 +
 +is different from: \\
 +  (-2)^4
 +  [1] 16
 +
 +==== Logical values ====
 +R can perform conditional tests and generate True or False values as results. \\
 +
 +  x = 60
 +  x > 100
 +  [1] FALSE
 +
 +  x >   3
 +  [1] TRUE
 +
 +  x = 100
 +  [1] TRUE
 +
 +Logical values can be stored as variables: ​
 +  x = 60
 +  logical.value =  x >  3
 +  logical.value
 +  [1] TRUE
 +
 +====== R objects ======
 +The entities that R operates on are technically known as objects. Examples are vectors of numeric (real) or complex values, vectors of logical values and vectors of character strings. These are known as “atomic” structures since their components are all of the same type, or mode (namely numeric, complex, logical, character and raw). 
 +R also operates on objects called lists, which are of mode list. These are ordered sequences of objects which individually can be of any mode. Lists are known as “recursive” rather than atomic structures since their components can themselves be lists in their own right.\\
 +
 +The other recursive structures are those of mode function and expression. Functions are the objects that form part of the R system along with similar user written functions, which we discuss in some detail later. Expressions as objects form an advanced part of R which will not be discussed in this guide, except indirectly when we discuss formulae used with modeling in R.\\
 +
 +By the mode of an object we mean the basic type of its fundamental constituents. This is a special case of a “property” of an object. Another property of every object is its length. The functions mode(object) and length(object) can be used to find out the mode and length of any defined structure 10.\\
 +
 +Further properties of an object are usually provided by attributes(object) (see '​Getting and setting attributes'​). Because of this, mode and length are also called “intrinsic attributes” of an object. For example, if z is a complex vector of length 100, then in an expression mode(z) is the character string "​complex"​ and length(z) is 100.\\
 +
 +
 +===== Vectors =====
 +Vectors are combinations of scalars in a string structure. Vector values must all be of the same mode. Thus any given vector must be unambiguously either logical, numeric, complex, character or raw. (The only apparent exception to this rule is the special “value” listed as NA for quantities which are not available, but in fact there are several types of NA). Note that a vector can be empty and still have a mode. For example the empty character string vector is listed as character(0) and the empty numeric vector as numeric(0). \\
 +
 +
 +**c(...)** is the generic function to combine arguments with the default forming a vector; with RECURSIVE=TRUE descends through lists combining all elements into one vector.\\
 +To see details for the generic function c(...) and combine arguments forming a vector: ​
 +  ? c 
 +
 +As an example we can create a simple vector of seven values by typing: \\
 +
 +  c(2,​3,​4,​5,​10,​5,​8)
 +  [1]  2  3  4  5 10  5  8
 +
 +We can generate a sequence using the syntax :\\
 +  1:10
 +  [1]  1  2  3  4  5  6  7  8  9 10
 +
 +
 +**seq()** \\
 +We can generate the same sequence of //1:10// command using the seq() function. The syntax will be :\\
 +   ​seq(1,​10)
 +  [1]  1  2  3  4  5  6  7  8  9 10
 +
 +The seq() function "​seq(from = number, to = number, by = number)"​ allows you to create a vector starting from one value to another by a defined increment:​\\
 +  seq(1,10, 0.25)
 +  [1]  1.00  1.25  1.50  1.75  2.00  2.25  2.50  2.75  3.00  3.25  3.50  3.75
 +  [13]  4.00  4.25  4.50  4.75  5.00  5.25  5.50  5.75  6.00  6.25  6.50  6.75
 +  [25]  7.00 7.25  7.50  7.75  8.00  8.25  8.50  8.75  9.00  9.25  9.50  9.75
 +  [37]  10.0
 +
 +  seq(from = 1, to = 10, by =  0.25)
 +  [1]  1.00  1.25  1.50  1.75  2.00  2.25  2.50  2.75  3.00  3.25  3.50  3.75
 +  [13]  4.00  4.25  4.50  4.75  5.00  5.25  5.50  5.75  6.00  6.25  6.50  6.75
 +  [25]  7.00  7.25  7.50  7.75  8.00  8.25  8.50  8.75  9.00  9.25  9.50  9.75
 +  [37] 10.00
 +
 +**rep()** ​ The replicate function ​ "​rep(x,​times)" ​ allows you to replicate a vector several times in a more complex vector. ​
 +Calculations can also be included to form vectors, and functions can be combined in the same command:\\
 +
 +  one2tree = 1:3 
 +  rep(one2tree,​10) ​
 + [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
 +
 +  c(10*0:10)
 +  [1]   ​0 ​ 10  20  30  40  50  60  70  80  90 100
 +
 +  rep(c (5*40:1, 5*1:40, 5, 6,7,8, 3, 2001:2014), 2)
 +  [1]  200  195  190  185  180  175  170  165  160  155  150  145  140  135  130
 +  [16]  125  120  115  110  105  100   ​95 ​  ​90 ​  ​85 ​  ​80 ​  ​75 ​  ​70 ​  ​65 ​  ​60 ​  55
 +  [31]   ​50 ​  ​45 ​  ​40 ​  ​35 ​  ​30 ​  ​25 ​  ​20 ​  ​15 ​  ​10 ​   5    5   ​10 ​  ​15 ​  ​20 ​  25
 +  [46]   ​30 ​  ​35 ​  ​40 ​  ​45 ​  ​50 ​  ​55 ​  ​60 ​  ​65 ​  ​70 ​  ​75 ​  ​80 ​  ​85 ​  ​90 ​  ​95 ​ 100
 +  [61]  105  110  115  120  125  130  135  140  145  150  155  160  165  170  175
 +  [76]  180  185  190  195  200    5    6    7    8    3 2001 2002 2003 2004 2005
 +  [91] 2006 2007 2008 2009 2010 2011 2012 2013 2014  200  195  190  185  180  175
 +  [106]  170  165  160  155  150  145  140  135  130  125  120  115  110  105  100
 +  [121]   ​95 ​  ​90 ​  ​85 ​  ​80 ​  ​75 ​  ​70 ​  ​65 ​  ​60 ​  ​55 ​  ​50 ​  ​45 ​  ​40 ​  ​35 ​  ​30 ​  25
 +  [136]   ​20 ​  ​15 ​  ​10 ​   5    5   ​10 ​  ​15 ​  ​20 ​  ​25 ​  ​30 ​  ​35 ​  ​40 ​  ​45 ​  ​50 ​  55
 +  [151]   ​60 ​  ​65 ​  ​70 ​  ​75 ​  ​80 ​  ​85 ​  ​90 ​  ​95 ​ 100  105  110  115  120  125  130
 +  [166]  135  140  145  150  155  160  165  170  175  180  185  190  195  200    5
 +  [181]    6    7    8    3 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011
 +  [196] 2012 2013 2014
 +
 +  rep(seq(1,​3,​0.5),​3)
 +  [1] 1.0 1.5 2.0 2.5 3.0 1.0 1.5 2.0 2.5 3.0 1.0 1.5 2.0 2.5 3.0
 +
 +
 +
 +===== Array =====
 +
 +===== Matrix =====
 +Matrices, or more generally arrays, are multi-dimensional generalizations of vectors. In fact, they are vectors that can be indexed by two or more indices and will be printed in special ways. See Arrays and matrices.\\
 +    * Factors provide compact ways to handle categorical data. (See Factors).
 +    * Lists are a general form of vector in which the various elements need not be the same type, and are often themselves vectors or lists. Lists provide a convenient way to return the results of a statistical computation. (See Lists).
 +    ​
 +Matrix() function creates a matrix from the given set of values. We use the matrix(x, nrow=, ncol=) function to set the matrix cell values, the number of rows and the number of columns. We can use the colnames() and rownames() functions to set the column and row names of the matrix-like object.
 +
 +  matrix(data = NA, nrow = 2, ncol = 3) 
 +  example.matrix = matrix(0,​2,​3)
 +  example.matrix
 +       [,1] [,2] [,3]
 +  [1,]    0    0    0
 +  [2,]    0    0    0
 +
 +  example.matrix[1,​]
 +  [1] 0 0 0
 +
 +  example.matrix[,​2]
 +  [1] 0 0
 +
 +  example.matrix[1,​] = 1:3
 +  example.matrix[2,​] = c(5,10,4)
 +  example.matrix
 +       [,1] [,2] [,3]
 +  [1,]    1    2    3
 +  [2,]    5   ​10 ​   4
 +
 + 
 +  matrix.head = c("col a","​col b","​column c")
 +  matrix.side = c("​first raw","​second raw")
 +  str(matrix.side)
 +  chr [1:2] "first raw" "​second raw"
 +
 +When using " " ​ we create and refer to a character type "​chr"​ input
 +
 +  numeric.vector = c(rep(c (5*10:1, 5, 6), 2))
 +  character.vector ​ = as.character(numeric.vector)
 +  str(character.vector)
 +  chr [1:24] "​50"​ "​45"​ "​40"​ "​35"​ "​30"​ "​25"​ "​20"​ "​15"​ "​10"​ ...
 +
 +
 +  colnames(example.matrix) = matrix.head
 +  rownames(example.matrix) = matrix.side
 +  example.matrix
 +             col a col b column c
 +  first row      1     ​2 ​       3
 +  second row     ​5 ​   10        4
 +
 +  str(example.matrix)
 +  num [1:2, 1:3] 1 5 2 10 3 4
 +   - attr(*, "​dimnames"​)=List of 2
 +    ..$ : chr [1:2] "first row" "​second row"
 +    ..$ : chr [1:3] "col a" "col b" "​column c"
 +
 +
 +
 +===== Data Frames =====
 +Data frames are matrix-like structures, in which the columns can be of different types. Think of data frames as `data matrices'​ with one row per observational unit but with (possibly) both numerical and categorical variables. Many experiments are best described by data frames: the treatments are categorical but the response is numeric. ​
 +
 +===== Functions =====
 +Functions are themselves objects in R which can be stored in the project'​s workspace. This provides a simple and convenient way to extend R. (See '​Writing your own functions'​). ​
  
wiki/online_tutorial.txt · Last modified: 2017/12/05 22:53 (external edit)