User Tools

Site Tools


wiki:basicr

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

wiki:basicr [2018/05/10 15:25] (current)
Line 1: Line 1:
 +====== Introduction to R ======
 +
 +The object of this document is to help you starting to use the R environment for statistical analysis and graphics. \\
 +You can read and follow the the text. Meanwhile you can copy the commands included into the frames part of this document, and paste them into an interactive R session. \\
 +Once you are familiar with the general functioning of R and of R's objects you can further advance in learning R with online manuals and guides. There is a great variety of documentation available at: 
 +  *  www.r-project.org. ​
 +   * http://​www.statmethods.net/​index.html ​
 +   * http://​wiki.r-project.org/​rwiki/​doku.php ​
 +\\
 +As well efficacious learning tools we would recommend that the user experiment with commands by, for example, trying different options to those stated. ​
 +This experimentation is an important part of learning R using this synthetic document.\\
 +\\
 +
 +===== Starting R, getting help, stopping R =====
 +==== Start R ====
 +from a shell window type 
 +
 +  R
 +
 +In the bash terminal the following text will appear:
 +  R version 2.10.0 (2009-10-26)
 +  Copyright (C) 2009 The R Foundation for Statistical Computing
 +  ISBN 3-900051-07-0
 +  R is free software and comes with ABSOLUTELY NO WARRANTY.
 +  You are welcome to redistribute it under certain conditions.
 +  Type '​license()'​ or '​licence()'​ for distribution details.
 +  Natural language support but running in an English locale\\
 +  R is a collaborative project with many contributors.
 +  Type '​contributors()'​ for more information and
 +  '​citation()'​ on how to cite R or R packages in publications.
 +  Type '​demo()'​ for some demos, '​help()'​ for on-line help, or
 +  '​help.start()'​ for an HTML browser interface to help.
 +  Type '​q()'​ to quit R.
 +
 +
 +the **>** sign and the following blinking cursor is advising you are in the R environment.
 +If you like, enter in __administrative mode__ type **sudo R** and you will be able to install packages
 +
 +  stefano@stefano-linux:​~\$ sudo R
 +\\
 +\\
 +
 +<code bash| basic_r.r>​
 +# Getting help 
 +
 +# R provides help with function and commands. On-line help gives useful ​
 +# information as well. Getting used to R help is a key to successful ​
 +# statistical modelling. The online help can be accessed in HTML format by 
 +# typing:
 +
 +help.start()
 +
 +
 +# A keyword search is possible using the Search Engine and Keywords link.
 +# You can also use the help() or ? functions. For example, if we want to 
 +# know how to use the matrix() function, the following two commands are 
 +# equivalents:​
 +
 +help(matrix)
 +? matrix
 +
 +# The str(object.name) command is used to display the internal structure ​
 +# of an R object. The summary(object.name) command gives a summary of an 
 +# object, usually a statistical summary but it is generic meaning it has 
 +# different operations for different classes of object.
 +  ​
 +dir() #  show files in the current directory
 +ls.str() # str() for each variable in the search path
 +getwd() #  is asking for the current working directory
 +
 +# When you quit, R will ask you if you want to save the workspace ​
 +# (that is, all of the variables you have defined in this session). ​
 +# Say “no” in order to avoid clutter.
 +
 +# Should an R command seem to be stuck or take longer than you’re willing
 +# to wait, type Control-C.
 +
 +# Calling linux shell scripting commands ​
 +# system("​..."​) is used to call any linux scripting commands within the R
 +# environment.
 +  ​
 +system("​pwd"​)
 +
 +# is equivalent to:
 +
 +getwd()
 +
 +# Inputs and outputs
 +# Once you have opened an R session and eventually loaded the library you need,
 +# you can start exploring your data
 +
 +# Loading data
 +
 +# load(file.name) function, loads R datasets written with the save function ​
 +# load(file.name) ​
 +
 +# Saving data 
 +
 +# save(object.name.1,​ object.name.2,​ ... ) function save the specified object ​
 +# in XDR platform independent binary format
 +
 +# Reading tables
 +
 +# read.table("​filename"​)** Reads a file in table format and creates a data 
 +# frame from it, with cases corresponding to lines and variables to fields ​
 +# within the file. The default separator ​ sep="" ​ is any whitespace. ​
 +# You might need  sep="," ​ or  ";" ​ and so on. 
 +# Use  header=TRUE ​ to read the first line as a header of column names. ​
 +# The **as.is=TRUE** specification is used to prevent character vectors from
 +# being converted to factors. ​
 +
 +# The comment.char=""​ specification is used to prevent "#"​ from being 
 +# interpreted as a comment and use "​skip=n"​ to skip n lines before reading ​
 +# data. For more details:
 +
 +?read.table
 +
 +landuse04=read.csv("​~/​ost4sem/​exercise/​basic_adv_r/​inputs/​2004_landuse.csv", ​
 +                   ​header=TRUE,​ sep=",",​ dec="​.",​ na.string=":"​)
 +
 +# read.csv("​filename"​) ​ is set to read comma separated files. Example usage is:
 +# read.csv(file.name,​ header = TRUE, sep = ",",​ quote="​\"",​ dec="​.", ​
 +# fill =TRUE, ​ comment.char="",​ ...)
 +
 +# read.delim("​filename"​) is used for reading tab-delimited files
 +
 +# read.fwf() reads a table of fixed width formatted data into a ’data.frame’.
 +# Widths is an integer vector, giving the widths of the fixed-width fields
 +
 +# Show variables and data in your workspace ​
 +
 +# The list function ls() outputs a list of existing R objects
 +
 +ls()
 +
 +# The structure function str(object.name) informs you of the structure of a
 +# specific object the summary function summary(object.name) informs you of 
 +# basic statistics of a specific object.
 +
 +# Save and remove data or R objects ​
 +# save(file, ...)  saves the specified objects (...) in the XDR 
 +# platform-independent binary format
 +
 +save(landuse04,​ file="​~/​ost4sem/​exercise/​basic_adv_r/​outputs/​landuse2004"​)
 +
 +# save.image(file) saves all objects
 +
 +save(file="​~/​ost4sem/​exercise/​basic_adv_r/​outputs/​landuse2004_and_more"​)
 +
 +# rm(file, ...) removes the object you created or data you uploaded
 +
 +rm(landuse04)
 +
 +# No objects are present in memory now, use ls function to check it
 +
 +ls()
 +character(0)
 +
 +# But since you saved the landuse2004 data you can reload it using the load() ​
 +# function and check its structrure using the str() function
 +
 +load("​~/​ost4sem/​exercise/​basic_adv_r/​outputs/​landuse2004"​)
 +str(landuse04)
 +
 +#  Variables and calculations
 +
 +# R has an interactive calculations function. The command is executed and 
 +# results are displayed. R uses:  +, -, /, and ^  for addition, subtraction,​
 +# multiplication,​ division and exponentiation,​ respectively
 +
 +2+2 
 +
 +# The [1] at the beginning of the line is just R printing an index of element ​
 +# numbers. If you print a result that appears on multiple lines, R will put 
 +# an index at the beginning of each line.
 +
 +2*5
 +
 +10/2
 +
 +2^3
 +
 +
 +# Variable settings ​
 +
 +# You can simply create a variable by typing: variable name = function, ​
 +# constant or calculation.
 +  ​
 +
 +x =3*2
 +
 +# The results of 3*2 is not displayed. In fact, the x variable value is stored
 +# in the memory without printing it. To display the x value you can use: 
 +
 +print(x) ​
 +
 +# Or 
 +
 +x
 +
 +# Most users apply a similar syntax using the '<​-'​ character string instead ​
 +# of the = character.
 + 
 +x <- 3
 +x
 +
 +# Also remember that R is case sensitive, print(X) or X is different from x. 
 +# For instance:
 +
 +a <- 3
 +a
 +A
 +
 +# Variable names in R must begin with a letter, followed by alphanumeric ​
 +# characters. ​
 +
 +3e = 2
 +
 +# In long names you can use "​."​ or "​_"​ as in 
 +
 +# very.long.variable.name.X or very_long_variable_name_Y but you can’t use 
 +# blank spaces in variable names. Avoid single letter names such us: 
 +# c, l, q, t, C, D, F, I, and T, which are either built-in R functions or hard
 +# to tell apart.
 +
 +very.long.variable.name.X3 = 3
 +very.long.variable.name.X3
 +  ​
 +
 +# Interactive calculations
 +
 +# Once defined, ​ you can use variables in interactive calculations :
 + 
 +b = 2*2
 +a = 2*3
 +a*b
 +
 +# And you can use variables in formulas :
 +
 +c = 60 /(a+b)
 +c
 +
 +# typing a;b you can display a and b variables at the same time:
 + 
 +a;b
 +
 +# If you forget to close a parenthesis,​ R will display a *+* sign. 
 +
 +# c = 60 /(a+b
 +  ​
 +# In this case you can either close the parenthesis in the next line or type 
 +# ctrl + c to go back to a new starting prompt. ​
 +
 +# Order of operations
 +
 +# When using more complex formulas be aware of the importance of the order of 
 +# operators. Parenthesis have priority over exponentiation,​ or powers, then 
 +# comes multiplication and division, finally addition and subtraction. ​
 +
 +# The following command:
 + 
 +C = ((a + 2 * sqrt(b))/(a + 8 * sqrt(b)))/2
 +C
 +
 +# is different from:
 + 
 +C = a + 2 * sqrt(b) / a + 8 * sqrt(b) / 2
 +C
 +
 +# as well as 
 +
 +100-40/2^4
 +  ​
 +# is different from:
 +  ​
 +(100-40)/​2^4 ​
 +  ​
 +# and 
 +    ​
 +-2^4
 +     
 +# is different from: 
 +  ​
 +(-2)^4
 +
 +
 +# Logical values
 +  ​
 +# R can perform conditional tests and generate True or False values as results.
 +# The logical operators are  < ,  <= ,  > ,  >= ,  ==  for exact equality and
 +# != for inequality. ​
 +
 +x = 60
 +x > 100
 +
 +x == 70
 +
 +x >   3
 +
 +x = 100
 +  ​
 +# Logical values can be stored as variables: ​
 +  ​
 +x = 60
 +logical.value =  x >  3
 +logical.value
 +  ​
 +# In addition if c1 and c2 are logical expressions,​ then c1  &  c2 is their 
 +# intersection (“and”),​ c1  |  c2 is their union (“or”), and  !  !c1 is the 
 +# negation of c1. 
 +
 +
 +
 +# R objects
 +  ​
 +# The entities R operates on are technically known as  objects. ​
 +# Examples are "​vectors of numeric (real)"​ or "​complex values",​ "​vectors of 
 +# logical"​ values and "​vectors of character strings"​. ​
 +# These are known as  “atomic” ​ structures since their components are all of 
 +# the same type, or mode, namely numeric, complex, logical, character and raw.
 +# R also operates on objects called "​lists",​ which are of mode list. 
 +# These are ordered sequences of objects which individually can be of any mode. 
 +# Lists are known as  “recursive” ​ rather than atomic structures since their 
 +# components can themselves be lists in their own right.
 +
 +# The other recursive structures are those of mode function and  expression. ​
 +# Functions are the objects that form part of the R system along with similar ​
 +# user written functions, which we discuss in some detail later. Expressions ​
 +# as objects form an advanced part of R which will not be discussed in this 
 +# guide, except indirectly when we discuss formulae used with modeling in R.
 +
 +# By the "​mode"​ of an object we mean the basic type of its fundamental ​
 +# constituents. This is a special case of a  “property” ​ of an object. Another
 +# property of every object is its "​length."​ The functions mode(object) and 
 +# length(object) can be used 
 +# to find out the mode and length of any defined structure 10.
 +
 +# Further properties of an object are usually provided by attributes(object), ​
 +# (see '​Getting and setting attributes'​). Because of this, mode and length are 
 +# also called “intrinsic attributes” of an object. For example, if z is a 
 +# complex vector of length 100, then in an expression mode(z) is the character
 +# string "​complex"​ and length(z) is 100.
 +
 +
 +# Vectors
 +
 +# Vectors are combinations of scalars in a string structure. Vectors must have 
 +# all values of the same mode. Thus any given vector must be unambiguously ​
 +# either logical, numeric, complex, character or raw. (The only apparent ​
 +# exception to this rule is the special “value” listed as NA for quantities not
 +# available, but in fact there are several types of NA). Note that a vector can
 +# be empty and still have a mode. For example the empty character string vector
 +# is listed as character(0) and the empty numeric vector as numeric(0). ​
 +
 +
 +# c(...) is the generic function to combine arguments with the default forming ​
 +# a vector; with RECURSIVE=TRUE descends through lists combining all elements ​
 +# into one vector. To see details for the generic function c(...) and combine ​
 +# arguments forming a vector: ​
 +  ​
 +? c 
 +
 +# As an example we can create a simple vector of seven values typing: ​
 +
 +c(2, 3, 4, 5, 10, 5, 8)
 +
 +# We can generate a sequence using the syntax:
 +  ​
 +1:10
 +  ​
 +# We can generate the same sequence of  1:10  command using the seq() function. ​
 +# The syntax will be :
 +  ​
 +seq(1,10)
 +
 +# The seq() function "​seq(from = number, to = number, by = number)"​ allow to 
 +# create a vector starting from a value to another by a defined increment:
 + 
 +seq(1,10, 0.25)
 +
 +seq(from = 1, to = 10, by =  0.25)
 +  ​
 +# The replicate function ​ "​rep(x,​times)" ​ enables you to replicate a vector ​
 +# several times in a more complex vector. Calculations can be included to 
 +# form vectors as well and functions can be combined in the same command:
 +
 +one2three = 1:3 
 +rep(one2three,​10) ​
 + 
 +c(10*0:10)
 +
 +rep(c (5*40:1, 5*1:40, 5, 6,7,8, 3, 2001:2014), 2)
 +
 +rep(seq(1,​3,​0.5),​3)
 +
 +# Missing Values
 +
 +# In some cases the components of a vector or of an R object more in general, ​
 +# may not be completely known. When an element or value is “not available” ​
 +# or a “missing value” in the statistical sense, a place within a vector may
 +# be reserved for it by assigning it the special value NA. Any operation on 
 +# an NA becomes an NA. 
 +
 +# The function is.na(x) ​ gives a logical vector of the same size as x with 
 +# value TRUE if and only if the corresponding element in x is NA.
 +
 +z <- c(1:3,NA)
 +ind <- is.na(z)
 +ind
 +
 +# There is a second kind of “missing” values which are produced by numerical ​
 +# computation,​ the so-called Not a Number, ​ NaN , values. Examples are 0/0 
 +# or Inf - Inf which both give NaN since the result cannot be defined sensibly.
 +
 +Inf-Inf
 +0/0
 +
 +# In summary, is.na(xx) is TRUE both for NA and NaN values. To differentiate ​
 +# these, is.nan(xx) is only TRUE for NaNs. Missing values are sometimes printed
 +# as <NA> when character vectors are printed without quotes. ​
 +
 +z <- c(1:3,NA)
 +is.not.available <- is.na(z) ​
 +is.not.a.number <​-is.nan(z)
 +
 +is.not.a.number
 +is.not.available ​
 +
 +   
 +# Matrices
 +
 +# Matrices, or more generally arrays, are multi-dimensional generalizations of 
 +# vectors. In fact, they are vectors that can be indexed by two or more indices
 +# and will be printed in a special way. See Arrays and matrices.
 +# Factors provide compact ways to handle categorical data. See Factors.
 +# Lists are a general form of vector in which the various elements need not be 
 +# of the same type, and are often themselves vectors or lists. Lists provide a 
 +# convenient way to return the results of a statistical computation. See Lists.
 +    ​
 +# The matrix() function creates a matrix from the given set of values. We use 
 +# the matrix(x, nrow=, ncol=) function to set the matrix cell values, the 
 +# number of rows and the number of columns. We can use the colnames() and 
 +# rownames() functions to set the column and row names of the matrix-like ​
 +# object.
 +
 +matrix(data = NA, nrow = 2, ncol = 3) 
 +example.matrix = matrix(0,​2,​3)
 +example.matrix
 +
 +example.matrix[1,​]
 +
 +example.matrix[,​2]
 +
 +example.matrix[1,​] = 1:3
 +example.matrix[2,​] = c(5,10,4)
 +example.matrix
 +
 +matrix.head = c("col a","​col b","​column c")
 +matrix.side = c("​first raw","​second raw")
 +str(matrix.side)
 +
 +# When using " " ​ we create and refer to a character type "​chr"​ input
 +
 +numeric.vector = c(rep(c (5*10:1, 5, 6), 2))
 +character.vector ​ = as.character(numeric.vector)
 +str(character.vector)
 +  ​
 +colnames(example.matrix) = matrix.head
 +rownames(example.matrix) = matrix.side
 +example.matrix
 +
 +str(example.matrix)
 +
 +# Array
 +
 +# An array can be considered a multiple subscripted collection of data 
 +# entries, for example numeric. R allows simple facilities for creating ​
 +# and handling arrays, and in particular the special case of matrices. ​
 +
 +# As well as giving a vector structure a dim attribute, arrays can be 
 +# constructed from vectors by the array function, which has the form 
 +# array(data_vector,​ dim_vector)
 +
 +Z <- array(1:24, c(3,4,2))
 +Z
 +   
 +# Data Frames
 +
 +# Data frames are matrix-like structures, in which the columns can be 
 +# of different types. Think of data frames as  data matrices ​ with one row per 
 +# observational unit but with (possibly) both numerical and categorical ​
 +# variables. Many experiments are best described by data frames: the treatments
 +# are categorical but the response is numeric. ​
 +
 +# As a result R dataframes are tightly coupled collections of variables which 
 +# share many of the properties of matrices and of lists. Data frames are used 
 +# as the fundamental data structure by most of R's modeling software.
 +
 +# A data frame is a list with class "​data.frame"​. There are restrictions on 
 +# lists that may be made into data frames, namely :
 +
 +# The components must be vectors (numeric, character, or logical), factors, ​
 +# numeric matrices, lists, or other data frames.
 +# Matrices, lists, and data frames provide as many variables to the new data 
 +# frame as they have columns, elements, or variables, respectively.
 +# Numeric vectors, logicals and factors are included, and character vectors ​
 +# are coerced to be factors, whose levels are the unique values appearing in
 +# the vector.
 +# Vector structures appearing as variables of the data frame must all have the
 +# same length, and matrix structures must all have the same row size. See:
 + 
 +? data.frame
 +
 +# To construct a dataframe:
 +
 +my.data.frame = data.frame(v = 1:4, ch = c("​a",​ "​b",​ "​c",​ "​d"​),​ n = 10)
 +my.data.frame
 +    ​
 +# Or:
 +
 +my.data.frame = data.frame(vector = 1:4,
 +                           ​character = c("​a",​ "​b",​ "​c",​ "​d"​),​
 +                           ​const.vector = 10,
 +                           ​row.names =c("​data1",​ "​data2",​ "​data3",​ "​data4"​))
 +my.data.frame
 +
 +
 +# Data selection and manipulation
 +
 +# You can extract data from dataframes using the    [ [    ] ]  and  $  sign:
 +
 +my.data.frame[["​character"​]]
 +
 +my.data.frame[[2]]
 +
 +# Call the 3rd value of the character vector:
 +
 +my.data.frame[[2]][3]
 +  ​
 +# Or using the $ syntax:
 +  ​
 +my.data.frame$vector
 +  ​
 +my.data.frame$character[2:​3]
 +
 +# You can add single arguments to a data frame, query information,​ select and 
 +# manipulate arguments or single values from a dataframe ​
 +  ​
 +my.data.frame$new
 +
 +my.data.frame$new = c(10,​11,​20,​40)
 +my.data.frame
 +
 +# length(object.name) returns the number of elements in an object such as 
 +# matrix vector or dataframes:
 +  ​
 +length(my.data.frame$new) ​
 +
 +# which(object.name) and which.max(object.name) return the index of a specific
 +# or of the greatest element of an object
 +    ​
 +which.max(my.data.frame$new) ​
 +
 +which(my.data.frame$new == 20) 
 +
 +# max(object.name) returns the value of the greatest element
 +  ​
 +max(my.data.frame$new) ​
 +
 +# sort(object.name) sort from small to big 
 +  ​
 +sort(my.data.frame$new) ​
 +
 +# rev(object.name) sorts from big to small
 +  ​
 +rev(sort(my.data.frame$new)) ​
 +
 +# subset(object.name,​ ...) returns a selection of an R-object with respect to 
 +# criteria (typically comparisons:​ x$V1 < 10). If the R-object is a data frame,
 +# the option select gives the variables to be kept or dropped using a minus 
 +# sign
 +  ​
 +subset(my.data.frame,​ my.data.frame$new == 20)
 +
 +# Sample() allows sampling from a set of values.
 +
 +sample(my.data.frame$new,​ 3)
 +sample(my.data.frame$new,​ 3)
 +sample(my.data.frame$new,​ 3)
 +
 +# More examples
 +
 +# The following R commands give an example of the simple procedure of importing ​
 +# data, cleaning a table by extracting relevant information,​ checking the 
 +# presence of missing data.
 +  ​
 +landuse04=read.csv("​~/​ost4sem/​studycase/​Lab_scripts/​inputs/​2004_landuse.csv", ​
 +                   ​header=TRUE,​ sep=",",​ dec="​.",​ na.string=":"​)
 +
 +forests04 = subset(landuse04,​ landuse04$forest.Wooded.area >= 0 )
 +forests04$landuse = NULL
 +forests04.check=na.fail(forests04)
 +forests04$total.Total.area[1] = NA
 +forests04.check=na.fail(forests04)
 +
 +# The last line above will throw an error.
 +# We can resolve the situation from the beginning with no NA
 +  ​
 +forests04 = subset(landuse04,​ landuse04$forest.Wooded.area >=0 )
 +forests04$landuse = NULL
 +forests04.check=na.fail(forests04)
 +str(forests04)
 +
 +# Do you see something strange? Look at theforests04$geographic.Unit level of
 +# factors and the dataframe number of variables!
 +
 +# Let's fix it now!
 +  ​
 +library(gdata)
 +forests04 = drop.levels(forests04)
 +str(forests04)
 +
 +
 +# Functions
 +
 +# Functions are themselves objects in R which can be stored in the project'​s ​
 +# workspace. This provides a simple and convenient way to extend R.  ​
 +# Usage: in writing your own function you provide one or more arguments or 
 +# names for the function, an expression (or body of the function) and a value 
 +# is produced equal to the output function result.
 +
 +# function(arglist) expr   ​function definition  ​
 +# return(value) ​
 +
 +# Example
 +
 +myfunction <- function(x) x^5
 +myfunction(3)
 +
 +body(myfunction) <- quote(5^x)
 +
 +## or equivalently ​ body(myfunction) <- expression(5^x)
 +
 +myfunction(3) ​
 +
 +body(myfunction)
 +
 +myfunction
 +
 +</​code>​
  
wiki/basicr.txt · Last modified: 2018/05/10 15:25 (external edit)