wiki:basicr


+ | ====== Introduction to R ====== | ||

+ | |||

+ | The aim of this document is to help you start using the R environment for statistical analysis and graphics. \\ | ||

+ | As you read through the text, you can copy the commands from the code frames in this document and paste them into an interactive R session. \\ | ||

+ | Once you are familiar with how R and its objects work, you can continue learning R with online manuals and guides. A great variety of documentation is available at: | ||

+ | * www.r-project.org. | ||

+ | * http://www.statmethods.net/index.html | ||

+ | * http://wiki.r-project.org/rwiki/doku.php | ||

+ | \\ | ||

+ | As an effective way of learning, we recommend that you experiment with the commands, for example by trying options different from those shown. | ||

+ | This experimentation is an important part of learning R with this condensed document.\\ | ||

+ | \\ | ||

+ | |||

+ | ===== Starting R, getting help, stopping R ===== | ||

+ | ==== Start R ==== | ||

+ | From a shell window, type: | ||

+ | |||

+ | R | ||

+ | |||

+ | In the bash terminal the following text will appear: | ||

+ | R version 2.10.0 (2009-10-26) | ||

+ | Copyright (C) 2009 The R Foundation for Statistical Computing | ||

+ | ISBN 3-900051-07-0 | ||

+ | R is free software and comes with ABSOLUTELY NO WARRANTY. | ||

+ | You are welcome to redistribute it under certain conditions. | ||

+ | Type 'license()' or 'licence()' for distribution details. | ||

+ | Natural language support but running in an English locale\\ | ||

+ | R is a collaborative project with many contributors. | ||

+ | Type 'contributors()' for more information and | ||

+ | 'citation()' on how to cite R or R packages in publications. | ||

+ | Type 'demo()' for some demos, 'help()' for on-line help, or | ||

+ | 'help.start()' for an HTML browser interface to help. | ||

+ | Type 'q()' to quit R. | ||

+ | |||

+ | |||

+ | The **>** sign followed by a blinking cursor indicates that you are in the R environment. | ||

+ | If you want to start R in __administrative mode__, type **sudo R**; you will then be able to install packages. | ||

+ | |||

+ | stefano@stefano-linux:~$ sudo R | ||

+ | \\ | ||

+ | \\ | ||

+ | |||

+ | <code bash| basic_r.r> | ||

+ | # Getting help | ||

+ | |||

+ | # R provides help on functions and commands. The online help gives useful | ||

+ | # information as well. Getting used to R help is a key to successful | ||

+ | # statistical modelling. The online help can be accessed in HTML format by | ||

+ | # typing: | ||

+ | |||

+ | help.start() | ||

+ | |||

+ | |||

+ | # A keyword search is possible using the Search Engine and Keywords link. | ||

+ | # You can also use the help() or ? functions. For example, if we want to | ||

+ | # know how to use the matrix() function, the following two commands are | ||

+ | # equivalent: | ||

+ | |||

+ | help(matrix) | ||

+ | ? matrix | ||

+ | |||

+ | # The str(object.name) command is used to display the internal structure | ||

+ | # of an R object. The summary(object.name) command gives a summary of an | ||

+ | # object, usually a statistical summary; it is a generic function, meaning | ||

+ | # it behaves differently for different classes of object. | ||

+ | ||

+ | dir() # list the files in the current working directory | ||

+ | ls.str() # apply str() to each variable in the current environment | ||

+ | getwd() # return the current working directory | ||

+ | |||

+ | # When you quit, R will ask you if you want to save the workspace | ||

+ | # (that is, all of the variables you have defined in this session). | ||

+ | # Say “no” in order to avoid clutter. | ||

+ | |||

+ | # Should an R command seem to be stuck or take longer than you’re willing | ||

+ | # to wait, type Control-C. | ||

+ | |||

+ | # Calling Linux shell commands | ||

+ | # system("...") runs any Linux shell command from within the R | ||

+ | # environment. | ||

+ | ||

+ | system("pwd") | ||

+ | |||

+ | # prints the same directory that is returned by: | ||

+ | |||

+ | getwd() | ||

+ | |||

+ | # Inputs and outputs | ||

+ | # Once you have opened an R session and, if necessary, loaded the libraries | ||

+ | # you need, you can start exploring your data | ||

+ | |||

+ | # Loading data | ||

+ | |||

+ | # The load(file.name) function loads R objects written with the save() function: | ||

+ | # load(file.name) | ||

+ | |||

+ | # Saving data | ||

+ | |||

+ | # The save(object.name.1, object.name.2, ..., file = file.name) function saves | ||

+ | # the specified objects in the platform-independent XDR binary format | ||

+ | |||

+ | # Reading tables | ||

+ | |||

+ | # read.table("filename") reads a file in table format and creates a data | ||

+ | # frame from it, with cases corresponding to lines and variables to fields | ||

+ | # within the file. The default separator sep="" is any whitespace. | ||

+ | # You might need sep="," or sep=";" and so on. | ||

+ | # Use header=TRUE to read the first line as a header of column names. | ||

+ | # The as.is=TRUE specification is used to prevent character vectors from | ||

+ | # being converted to factors. | ||

+ | |||

+ | # The comment.char="" specification prevents "#" from being interpreted | ||

+ | # as a comment character, and skip=n skips n lines before reading the | ||

+ | # data. For more details: | ||

+ | |||

+ | ?read.table | ||
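
+ | |||

+ | # A minimal sketch of the options described above, reading from an inline | ||

+ | # string through the text= argument instead of a file. The names demo.text | ||

+ | # and demo.table and the values are hypothetical, not part of the course data. | ||

+ | |||

+ | demo.text = "site;area\nA;10.5\nB;:" | ||

+ | demo.table = read.table(text=demo.text, header=TRUE, sep=";", | ||

+ |     na.strings=":", as.is=TRUE) | ||

+ | str(demo.table) # site stays character, area is numeric with one NA | ||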

+ | |||

+ | landuse04=read.csv("~/ost4sem/exercise/basic_adv_r/inputs/2004_landuse.csv", | ||

+ |     header=TRUE, sep=",", dec=".", na.strings=":") | ||

+ | |||

+ | # read.csv("filename") is preset to read comma-separated files. Its usage is: | ||

+ | # read.csv(file.name, header = TRUE, sep = ",", quote="\"", dec=".", | ||

+ | #     fill = TRUE, comment.char="", ...) | ||

+ | |||

+ | # read.delim("filename") is used for reading tab-delimited files | ||

+ | |||

+ | # read.fwf() reads a table of fixed width formatted data into a data.frame. | ||

+ | # Its widths argument is an integer vector giving the widths of the fixed-width fields | ||

+ | |||

+ | # Show variables and data in your workspace | ||

+ | |||

+ | # The list function ls() outputs a list of existing R objects | ||

+ | |||

+ | ls() | ||

+ | |||

+ | # The structure function str(object.name) shows the structure of a | ||

+ | # specific object; the summary function summary(object.name) gives | ||

+ | # basic statistics for a specific object. | ||

+ | |||

+ | # Save and remove data or R objects | ||

+ | # save(..., file) saves the specified objects (...) in the XDR | ||

+ | # platform-independent binary format | ||

+ | |||

+ | save(landuse04, file="~/ost4sem/exercise/basic_adv_r/outputs/landuse2004") | ||

+ | |||

+ | # save.image(file) saves all objects in the workspace | ||

+ | |||

+ | save.image(file="~/ost4sem/exercise/basic_adv_r/outputs/landuse2004_and_more") | ||

+ | |||

+ | # rm(...) removes the objects you created or the data you loaded | ||

+ | |||

+ | rm(landuse04) | ||

+ | |||

+ | # If landuse04 was the only object in the workspace, no objects are left in | ||

+ | # memory now. Use the ls() function to check: | ||

+ | |||

+ | ls() | ||

+ | # character(0) | ||

+ | |||

+ | # But since you saved the landuse2004 data you can reload it using the load() | ||

+ | # function and check its structure using the str() function | ||

+ | |||

+ | load("~/ost4sem/exercise/basic_adv_r/outputs/landuse2004") | ||

+ | str(landuse04) | ||

+ | |||

+ | # Variables and calculations | ||

+ | |||

+ | # R can be used as an interactive calculator: each command is executed and | ||

+ | # its result is displayed. R uses +, -, *, / and ^ for addition, subtraction, | ||

+ | # multiplication, division and exponentiation, respectively | ||

+ | |||

+ | 2+2 | ||

+ | |||

+ | # The [1] at the beginning of the line is just R printing an index of element | ||

+ | # numbers. If you print a result that appears on multiple lines, R will put | ||

+ | # an index at the beginning of each line. | ||

+ | |||

+ | 2*5 | ||

+ | |||

+ | 10/2 | ||

+ | |||

+ | 2^3 | ||

+ | |||

+ | |||

+ | # Variable settings | ||

+ | |||

+ | # You can simply create a variable by typing: variable name = function, | ||

+ | # constant or calculation. | ||

+ | | ||

+ | |||

+ | x =3*2 | ||

+ | |||

+ | # The result of 3*2 is not displayed: the value of x is stored in memory | ||

+ | # without being printed. To display the value of x you can use: | ||

+ | |||

+ | print(x) | ||

+ | |||

+ | # Or | ||

+ | |||

+ | x | ||

+ | |||

+ | # Most users write assignments with the '<-' operator instead | ||

+ | # of the = character. | ||

+ | |||

+ | x <- 3 | ||

+ | x | ||

+ | |||

+ | # Also remember that R is case sensitive: print(X) or X is different from x. | ||

+ | # For instance: | ||

+ | |||

+ | a <- 3 | ||

+ | a | ||

+ | A # fails with an error: object 'A' not found | ||

+ | |||

+ | # Variable names in R must begin with a letter (or a dot not followed by a | ||

+ | # digit), followed by letters, digits, "." or "_". | ||

+ | |||

+ | 3e = 2 # syntax error: a name cannot begin with a digit | ||

+ | |||

+ | # In long names you can use "." or "_" as in | ||

+ | |||

+ | # very.long.variable.name.X or very_long_variable_name_Y but you can’t use | ||

+ | # blank spaces in variable names. Avoid single letter names such as: | ||

+ | # c, l, q, t, C, D, F, I, and T, which are either built-in R functions or hard | ||

+ | # to tell apart. | ||

+ | |||

+ | very.long.variable.name.X3 = 3 | ||

+ | very.long.variable.name.X3 | ||

+ | | ||

+ | |||

+ | # Interactive calculations | ||

+ | |||

+ | # Once defined, you can use variables in interactive calculations : | ||

+ | |||

+ | b = 2*2 | ||

+ | a = 2*3 | ||

+ | a*b | ||

+ | |||

+ | # And you can use variables in formulas : | ||

+ | |||

+ | c = 60 /(a+b) | ||

+ | c | ||

+ | |||

+ | # typing a;b you can display a and b variables at the same time: | ||

+ | |||

+ | a;b | ||

+ | |||

+ | # If you forget to close a parenthesis, R will display a + continuation prompt. | ||

+ | |||

+ | # c = 60 /(a+b | ||

+ | ||

+ | # In this case you can either close the parenthesis on the next line or press | ||

+ | # Ctrl+C to return to a fresh prompt. | ||

+ | |||

+ | # Order of operations | ||

+ | |||

+ | # When using more complex formulas, be aware of the order in which operators | ||

+ | # are evaluated. Parentheses come first, then exponentiation (powers), then | ||

+ | # multiplication and division, and finally addition and subtraction. | ||

+ | |||

+ | # The following command: | ||

+ | |||

+ | C = ((a + 2 * sqrt(b))/(a + 8 * sqrt(b)))/2 | ||

+ | C | ||

+ | |||

+ | # is different from: | ||

+ | |||

+ | C = a + 2 * sqrt(b) / a + 8 * sqrt(b) / 2 | ||

+ | C | ||

+ | |||

+ | # as well as | ||

+ | |||

+ | 100-40/2^4 | ||

+ | | ||

+ | # is different from: | ||

+ | | ||

+ | (100-40)/2^4 | ||

+ | | ||

+ | # and | ||

+ | | ||

+ | -2^4 | ||

+ | |||

+ | # is different from: | ||

+ | | ||

+ | (-2)^4 | ||

+ | |||

+ | |||

+ | # Logical values | ||

+ | | ||

+ | # R can perform conditional tests that return TRUE or FALSE as results. | ||

+ | # The comparison operators are < , <= , > , >= , == for exact equality and | ||

+ | # != for inequality. | ||

+ | |||

+ | x = 60 | ||

+ | x > 100 | ||

+ | |||

+ | x == 70 | ||

+ | |||

+ | x > 3 | ||

+ | |||

+ | x = 100 # note: a single = assigns 100 to x; it is not a comparison | ||

+ | | ||

+ | # Logical values can be stored as variables: | ||

+ | | ||

+ | x = 60 | ||

+ | logical.value = x > 3 | ||

+ | logical.value | ||

+ | | ||

+ | # In addition, if c1 and c2 are logical expressions, then c1 & c2 is their | ||

+ | # intersection (“and”), c1 | c2 is their union (“or”), and !c1 is the | ||

+ | # negation of c1. | ||
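
+ | |||

+ | # A short sketch of these operators; demo.c1 and demo.c2 are hypothetical | ||

+ | # names used only for illustration. | ||

+ | |||

+ | demo.c1 = 60 > 3 | ||

+ | demo.c2 = 60 == 70 | ||

+ | demo.c1 & demo.c2 # FALSE: both must be TRUE | ||

+ | demo.c1 | demo.c2 # TRUE: at least one is TRUE | ||

+ | !demo.c1 # FALSE: the negation of TRUE | ||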

+ | |||

+ | |||

+ | |||

+ | # R objects | ||

+ | | ||

+ | # The entities R operates on are technically known as objects. | ||

+ | # Examples are vectors of numeric (real) or complex values, vectors of | ||

+ | # logical values and vectors of character strings. | ||

+ | # These are known as “atomic” structures since their components are all of | ||

+ | # the same type, or mode, namely numeric, complex, logical, character and raw. | ||

+ | # R also operates on objects called "lists", which are of mode list. | ||

+ | # These are ordered sequences of objects which individually can be of any mode. | ||

+ | # Lists are known as “recursive” rather than atomic structures since their | ||

+ | # components can themselves be lists in their own right. | ||

+ | |||

+ | # The other recursive structures are those of mode function and expression. | ||

+ | # Functions are the objects that form part of the R system along with similar | ||

+ | # user written functions, which we discuss in some detail later. Expressions | ||

+ | # as objects form an advanced part of R which will not be discussed in this | ||

+ | # guide, except indirectly when we discuss formulae used with modeling in R. | ||

+ | |||

+ | # By the "mode" of an object we mean the basic type of its fundamental | ||

+ | # constituents. This is a special case of a “property” of an object. Another | ||

+ | # property of every object is its "length". The functions mode(object) and | ||

+ | # length(object) can be used to find out the mode and length of any defined | ||

+ | # structure. | ||

+ | |||

+ | # Further properties of an object are usually provided by attributes(object), | ||

+ | # (see 'Getting and setting attributes'). Because of this, mode and length are | ||

+ | # also called “intrinsic attributes” of an object. For example, if z is a | ||

+ | # complex vector of length 100, then in an expression mode(z) is the character | ||

+ | # string "complex" and length(z) is 100. | ||
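
+ | |||

+ | # A small sketch of the example just described; demo.z is a hypothetical | ||

+ | # name for a complex vector of length 100. | ||

+ | |||

+ | demo.z = complex(real=1:100, imaginary=0) | ||

+ | mode(demo.z) # "complex" | ||

+ | length(demo.z) # 100 | ||

+ | attributes(demo.z) # NULL: a plain vector has no further attributes | ||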

+ | |||

+ | |||

+ | # Vectors | ||

+ | |||

+ | # Vectors are ordered collections of values. Vectors must have | ||

+ | # all values of the same mode. Thus any given vector must be unambiguously | ||

+ | # either logical, numeric, complex, character or raw. (The only apparent | ||

+ | # exception to this rule is the special “value” listed as NA for quantities not | ||

+ | # available, but in fact there are several types of NA). Note that a vector can | ||

+ | # be empty and still have a mode. For example the empty character string vector | ||

+ | # is listed as character(0) and the empty numeric vector as numeric(0). | ||

+ | |||

+ | |||

+ | # c(...) is the generic function that combines its arguments, by default into | ||

+ | # a vector; with recursive=TRUE it descends through lists, combining all their | ||

+ | # elements into one vector. To see the details of c(): | ||

+ | | ||

+ | ? c | ||

+ | |||

+ | # As an example we can create a simple vector of seven values typing: | ||

+ | |||

+ | c(2, 3, 4, 5, 10, 5, 8) | ||

+ | |||

+ | # We can generate a sequence using the syntax: | ||

+ | | ||

+ | 1:10 | ||

+ | | ||

+ | # We can generate the same sequence as 1:10 using the seq() function. | ||

+ | # The syntax is: | ||

+ | | ||

+ | seq(1,10) | ||

+ | |||

+ | # The seq() function "seq(from = number, to = number, by = number)" creates | ||

+ | # a vector running from one value to another by a defined increment: | ||

+ | |||

+ | seq(1,10, 0.25) | ||

+ | |||

+ | seq(from = 1, to = 10, by = 0.25) | ||

+ | | ||

+ | # The rep(x, times) function repeats a vector several times to build a | ||

+ | # longer vector. Calculations can be included to form vectors as well, | ||

+ | # and functions can be combined in the same command: | ||

+ | |||

+ | one2three = 1:3 | ||

+ | rep(one2three,10) | ||

+ | |||

+ | c(10*0:10) | ||

+ | |||

+ | rep(c (5*40:1, 5*1:40, 5, 6,7,8, 3, 2001:2014), 2) | ||

+ | |||

+ | rep(seq(1,3,0.5),3) | ||

+ | |||

+ | # Missing Values | ||

+ | |||

+ | # In some cases the components of a vector, or of an R object more generally, | ||

+ | # may not be completely known. When an element or value is “not available” | ||

+ | # or a “missing value” in the statistical sense, a place within a vector may | ||

+ | # be reserved for it by assigning it the special value NA. In general, any | ||

+ | # operation on an NA yields an NA. | ||

+ | |||

+ | # The function is.na(x) gives a logical vector of the same size as x with | ||

+ | # value TRUE if and only if the corresponding element in x is NA. | ||

+ | |||

+ | z <- c(1:3,NA) | ||

+ | ind <- is.na(z) | ||

+ | ind | ||

+ | |||

+ | # There is a second kind of “missing” values which are produced by numerical | ||

+ | # computation, the so-called Not a Number, NaN , values. Examples are 0/0 | ||

+ | # or Inf - Inf which both give NaN since the result cannot be defined sensibly. | ||

+ | |||

+ | Inf-Inf | ||

+ | 0/0 | ||

+ | |||

+ | # In summary, is.na(xx) is TRUE both for NA and NaN values. To differentiate | ||

+ | # these, is.nan(xx) is only TRUE for NaNs. Missing values are sometimes printed | ||

+ | # as <NA> when character vectors are printed without quotes. | ||

+ | |||

+ | z <- c(1:3,NA) | ||

+ | is.not.available <- is.na(z) | ||

+ | is.not.a.number <-is.nan(z) | ||

+ | |||

+ | is.not.a.number | ||

+ | is.not.available | ||
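
+ | |||

+ | # A sketch of how NA propagates through calculations, and of the na.rm | ||

+ | # argument that many summary functions accept; demo.na is a hypothetical name. | ||

+ | |||

+ | demo.na = c(1:3,NA) | ||

+ | demo.na + 1 # 2 3 4 NA | ||

+ | mean(demo.na) # NA | ||

+ | mean(demo.na, na.rm=TRUE) # 2 | ||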

+ | |||

+ | |||

+ | # Matrices | ||

+ | |||

+ | # Matrices, or more generally arrays, are multi-dimensional generalizations of | ||

+ | # vectors. In fact, they are vectors that can be indexed by two or more indices | ||

+ | # and will be printed in a special way. | ||

+ | # Factors provide compact ways to handle categorical data. | ||

+ | # Lists are a general form of vector in which the various elements need not be | ||

+ | # of the same type, and are often themselves vectors or lists. Lists provide a | ||

+ | # convenient way to return the results of a statistical computation. | ||

+ | | ||

+ | # The matrix() function creates a matrix from the given set of values. We use | ||

+ | # the matrix(x, nrow=, ncol=) function to set the matrix cell values, the | ||

+ | # number of rows and the number of columns. We can use the colnames() and | ||

+ | # rownames() functions to set the column and row names of the matrix-like | ||

+ | # object. | ||

+ | |||

+ | matrix(data = NA, nrow = 2, ncol = 3) | ||

+ | example.matrix = matrix(0,2,3) | ||

+ | example.matrix | ||

+ | |||

+ | example.matrix[1,] | ||

+ | |||

+ | example.matrix[,2] | ||

+ | |||

+ | example.matrix[1,] = 1:3 | ||

+ | example.matrix[2,] = c(5,10,4) | ||

+ | example.matrix | ||

+ | |||

+ | matrix.head = c("col a","col b","column c") | ||

+ | matrix.side = c("first row","second row") | ||

+ | str(matrix.side) | ||

+ | |||

+ | # Text enclosed in quotes " " is of character type, shown as "chr" by str() | ||

+ | |||

+ | numeric.vector = c(rep(c (5*10:1, 5, 6), 2)) | ||

+ | character.vector = as.character(numeric.vector) | ||

+ | str(character.vector) | ||

+ | | ||

+ | colnames(example.matrix) = matrix.head | ||

+ | rownames(example.matrix) = matrix.side | ||

+ | example.matrix | ||

+ | |||

+ | str(example.matrix) | ||

+ | |||

+ | # Array | ||

+ | |||

+ | # An array can be considered a multiply subscripted collection of data | ||

+ | # entries, for example numeric. R provides simple facilities for creating | ||

+ | # and handling arrays, and in particular the special case of matrices. | ||

+ | |||

+ | # As well as giving a vector structure a dim attribute, arrays can be | ||

+ | # constructed from vectors by the array function, which has the form | ||

+ | # array(data_vector, dim_vector) | ||

+ | |||

+ | Z <- array(1:24, c(3,4,2)) | ||

+ | Z | ||
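
+ | |||

+ | # A sketch of indexing such an array, rebuilt here under the hypothetical | ||

+ | # name demo.Z. Values fill the first dimension fastest. | ||

+ | |||

+ | demo.Z = array(1:24, c(3,4,2)) | ||

+ | dim(demo.Z) # 3 4 2 | ||

+ | demo.Z[2,3,1] # 8: row 2, column 3 of the first 3x4 slice | ||

+ | demo.Z[,,2] # the second 3x4 slice, holding 13 to 24 | ||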

+ | |||

+ | # Data Frames | ||

+ | |||

+ | # Data frames are matrix-like structures, in which the columns can be | ||

+ | # of different types. Think of data frames as data matrices with one row per | ||

+ | # observational unit but with (possibly) both numerical and categorical | ||

+ | # variables. Many experiments are best described by data frames: the treatments | ||

+ | # are categorical but the response is numeric. | ||

+ | |||

+ | # As a result R dataframes are tightly coupled collections of variables which | ||

+ | # share many of the properties of matrices and of lists. Data frames are used | ||

+ | # as the fundamental data structure by most of R's modeling software. | ||

+ | |||

+ | # A data frame is a list with class "data.frame". There are restrictions on | ||

+ | # lists that may be made into data frames, namely: | ||

+ | |||

+ | # - The components must be vectors (numeric, character, or logical), factors, | ||

+ | #   numeric matrices, lists, or other data frames. | ||

+ | # - Matrices, lists, and data frames provide as many variables to the new data | ||

+ | #   frame as they have columns, elements, or variables, respectively. | ||

+ | # - Numeric vectors, logicals and factors are included as is. Character vectors | ||

+ | #   are coerced to factors, whose levels are the unique values appearing in | ||

+ | #   the vector, only in R versions before 4.0.0 or with stringsAsFactors=TRUE. | ||

+ | # - Vector structures appearing as variables of the data frame must all have the | ||

+ | #   same length, and matrix structures must all have the same row size. See: | ||

+ | |||

+ | ? data.frame | ||

+ | |||

+ | # To construct a dataframe: | ||

+ | |||

+ | my.data.frame = data.frame(v = 1:4, ch = c("a", "b", "c", "d"), n = 10) | ||

+ | my.data.frame | ||

+ | | ||

+ | # Or: | ||

+ | |||

+ | my.data.frame = data.frame(vector = 1:4, | ||

+ | character = c("a", "b", "c", "d"), | ||

+ | const.vector = 10, | ||

+ | row.names =c("data1", "data2", "data3", "data4")) | ||

+ | my.data.frame | ||

+ | |||

+ | |||

+ | # Data selection and manipulation | ||

+ | |||

+ | # You can extract data from data frames using the [[ ]] and $ operators: | ||

+ | |||

+ | my.data.frame[["character"]] | ||

+ | |||

+ | my.data.frame[[2]] | ||

+ | |||

+ | # Call the 3rd value of the character vector: | ||

+ | |||

+ | my.data.frame[[2]][3] | ||

+ | | ||

+ | # Or using the $ syntax: | ||

+ | | ||

+ | my.data.frame$vector | ||

+ | | ||

+ | my.data.frame$character[2:3] | ||

+ | |||

+ | # You can add single columns to a data frame, query information, and select | ||

+ | # and manipulate columns or single values of a data frame | ||

+ | ||

+ | my.data.frame$new # NULL: the column does not exist yet | ||

+ | |||

+ | my.data.frame$new = c(10,11,20,40) | ||

+ | my.data.frame | ||

+ | |||

+ | # length(object.name) returns the number of elements in an object such as a | ||

+ | # vector or matrix; for a data frame it returns the number of columns: | ||

+ | | ||

+ | length(my.data.frame$new) | ||

+ | |||

+ | # which(condition) returns the indices where a condition is TRUE, and | ||

+ | # which.max(object.name) returns the index of the greatest element of an object | ||

+ | | ||

+ | which.max(my.data.frame$new) | ||

+ | |||

+ | which(my.data.frame$new == 20) | ||

+ | |||

+ | # max(object.name) returns the value of the greatest element | ||

+ | | ||

+ | max(my.data.frame$new) | ||

+ | |||

+ | # sort(object.name) sorts from smallest to largest | ||

+ | ||

+ | sort(my.data.frame$new) | ||

+ | |||

+ | # rev(object.name) reverses the order, so rev(sort(...)) sorts from largest to smallest | ||

+ | | ||

+ | rev(sort(my.data.frame$new)) | ||

+ | |||

+ | # subset(object.name, ...) returns a selection of an R-object with respect to | ||

+ | # criteria (typically comparisons: x$V1 < 10). If the R-object is a data frame, | ||

+ | # the option select gives the variables to be kept or dropped using a minus | ||

+ | # sign | ||

+ | | ||

+ | subset(my.data.frame, my.data.frame$new == 20) | ||
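
+ | |||

+ | # A sketch of the select option described above, on a small hypothetical | ||

+ | # data frame demo.df so that it runs on its own: | ||

+ | |||

+ | demo.df = data.frame(v=1:4, ch=c("a","b","c","d"), new=c(10,11,20,40)) | ||

+ | subset(demo.df, new > 10, select=c(v, new)) # keep columns v and new | ||

+ | subset(demo.df, new > 10, select=-ch) # same result, dropping column ch | ||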

+ | |||

+ | # sample() draws a random sample from a set of values, so each call below may return a different result. | ||

+ | |||

+ | sample(my.data.frame$new, 3) | ||

+ | sample(my.data.frame$new, 3) | ||

+ | sample(my.data.frame$new, 3) | ||

+ | |||

+ | # More examples | ||

+ | |||

+ | # The following R commands give an example of a simple procedure: importing | ||

+ | # data, cleaning a table by extracting the relevant information, and checking | ||

+ | # for missing data. | ||

+ | ||

+ | landuse04=read.csv("~/ost4sem/studycase/Lab_scripts/inputs/2004_landuse.csv", | ||

+ |     header=TRUE, sep=",", dec=".", na.strings=":") | ||

+ | |||

+ | forests04 = subset(landuse04, landuse04$forest.Wooded.area >= 0 ) | ||

+ | forests04$landuse = NULL | ||

+ | forests04.check=na.fail(forests04) | ||

+ | forests04$total.Total.area[1] = NA | ||

+ | forests04.check=na.fail(forests04) | ||

+ | |||

+ | # The last line above throws an error, because na.fail() stops when it finds an NA. | ||

+ | # We can rebuild the table from the beginning, without the NA | ||

+ | | ||

+ | forests04 = subset(landuse04, landuse04$forest.Wooded.area >=0 ) | ||

+ | forests04$landuse = NULL | ||

+ | forests04.check=na.fail(forests04) | ||

+ | str(forests04) | ||

+ | |||

+ | # Do you see something strange? Look at the factor levels of | ||

+ | # forests04$geographic.Unit and at the number of variables of the data frame! | ||

+ | |||

+ | # Let's fix it now! | ||

+ | | ||

+ | library(gdata) | ||

+ | forests04 = drop.levels(forests04) | ||

+ | str(forests04) | ||

+ | |||

+ | |||

+ | # Functions | ||

+ | |||

+ | # Functions are themselves objects in R which can be stored in the project's | ||

+ | # workspace. This provides a simple and convenient way to extend R. | ||

+ | # Usage: when writing your own function you provide one or more argument | ||

+ | # names and an expression (the body of the function); the function returns | ||

+ | # a value equal to the result of that expression. | ||

+ | |||

+ | # function(arglist) expr     function definition | ||

+ | # return(value)              returns value from inside the function | ||

+ | |||

+ | # Example | ||

+ | |||

+ | myfunction <- function(x) x^5 | ||

+ | myfunction(3) | ||

+ | |||

+ | body(myfunction) <- quote(5^x) | ||

+ | |||

+ | ## or equivalently body(myfunction) <- expression(5^x) | ||

+ | |||

+ | myfunction(3) | ||

+ | |||

+ | body(myfunction) | ||

+ | |||

+ | myfunction | ||
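
+ | |||

+ | # A sketch of a function with two arguments, a default value and an explicit | ||

+ | # return(); demo.power is a hypothetical name, not a built-in R function. | ||

+ | |||

+ | demo.power <- function(x, exponent = 2) { | ||

+ |     result <- x^exponent | ||

+ |     return(result) | ||

+ | } | ||

+ | demo.power(3) # 9 | ||

+ | demo.power(3, exponent = 5) # 243 | ||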

+ | |||

+ | </code> | ||

wiki/basicr.txt · Last modified: 2018/05/10 15:25 (external edit)

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International