LAB 3: R Basics

BIO3782: Biologist's Toolkit (Dalhousie University)


In LAB 1, you learned how to identify and tell apart objects like variables, functions, arguments, comments, data and packages. In this lab we will review again all of these objects. However, this time around we will "dive deeper" and learn intrinsic details about each of these objects.


Let's review LAB 1 by making a spectrogram of a call from a Northern Goshawk (Accipiter gentilis), a medium-large raptor that inhabits many of the temperate parts of the Northern Hemisphere (Eurasia and North America).

A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. It is normally created using a Fast Fourier Transform (FFT) to transform audio in the time domain into power in the frequency domain. It is not too complicated to code in R; however, there is an even simpler solution: you can use one of several R packages that make spectrograms. Below we use the phonTools package.

The audio file we'll use here was downloaded from xeno-canto and is a call from a Northern Goshawk, recorded in Lunenburg, Nova Scotia.




  1. Make a new directory on your Desktop, call it lab3
  2. Download the file Mono_XC386310__Northern_Goshawk__Accipiter_gentilis.wav from Brightspace, and place it in your new lab3 directory
  3. Listen to the audio file by double-clicking on Mono_XC386310__Northern_Goshawk__Accipiter_gentilis.wav to open it in the default audio player on your computer. Note that you can also listen to it directly in Brightspace.

  4. Open RStudio and change the working directory to lab3. If you don't remember how to do this, review this section

  5. In the console, type the following to install the phonTools package

  1. In RStudio, make a new R script file and save it in your lab3 directory with the name spectrogram.R.

  2. Copy-paste the code below into spectrogram.R and click the button

All three plots are spectrograms showing the same thing, but using different colormaps. In all plots, you can see 5 brighter "blobs" that represent the 5 calls from the Northern Goshawk, as you can hear in the audio file.




It is time to dissect the code above. If you need a refresher on how to tell apart objects like variables, functions, arguments, etc., take a look at Lab 1, Micro-introduction to coding.



From the code above, in line 1... What is:

phonTools



From the code above, in line 3... What is:

# Load audio file



From the code above, in line 4... What is:

loadsound



From the code above, in line 4... What is:

sound



From the code above, in line 8... What is:

sound



From the code above, in line 8... What is:

spectrogram



From the code above, in line 10... What is:

color = FALSE



From the code above, in line 7... What is:

par

How to get Help

Arguably, programming in any language is mainly knowing "where and how to get help". Even experienced programmers find themselves searching Google many times per day. If you run into a problem, these are the suggested steps:

  1. Use the help() function and/or RStudio's "Help panel" (more on this below)
  2. Google your question followed by R. Example: "How to do functions in R"
    1. Note that Google results from Stack Overflow are usually high-quality answers
  3. Search using https://rseek.org/
  4. Search directly in Stack Overflow within the [r] tag

Other ports of call:

You should also take a look at the page "Getting Help with R" from the R project: https://www.r-project.org/help.html

Comments and statements

Take a look at the sequence of comments and statements below:
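A minimal sketch of such a sequence, using the variable x and the value 5 that the following paragraphs refer to (the comment wording is illustrative):

```r
# This whole line is a comment: R ignores everything after the # symbol
x <- 5     # statement: store the value 5 under the name x
print(x)   # statement: display the value stored in x
```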

Above...

If you copy-paste the code above into RStudio's console and press [Enter], it will print a 5 to the console.

Once x <- 5 has been executed, the value of x will be stored "in memory" until you turn off R, until you delete x, or until you update x with a different value.

"Commenting out" code

The beauty of: Control + Shift + c

In RStudio, you can comment out many lines at once by selecting the lines that you want to "comment out" and then pressing Ctrl + Shift + C. This will insert a # at the beginning of all the selected lines. Note that you can do the reverse (i.e. "uncomment" code) by selecting several lines beginning with a # and pressing Ctrl + Shift + C again. Instead of keyboard shortcuts, you can also click on [Code > Comment/Uncomment Lines].

Note that "commenting out" code is very useful during code development or testing. You want to "disable" a few lines of code while you run tests or diagnostics; then, at a later time, you can "enable" (i.e. uncomment) your code without ever having to delete or re-write your code.




In the file that you created above (i.e. spectrogram.R), select all lines and press Ctrl + Shift + C.

Select all lines again and press Ctrl + Shift + C a second time.




Functions and their arguments

As we said before, most of the "magic" done in R is done through functions!

Functions are "commands" to do things like calculate means, make graphs, do stats, etc. Technically speaking, a function is a "program" that does some manipulations to an "input" and returns an "output", as represented in the diagram below:

Functions can be divided into three groups:

  1. Built-in functions: These functions come included in R. The "programs" that make these functions were developed (i.e. coded or written) by the core developers of the R programming language.
  2. Imported functions: These functions are made available to you when you "load a library". These functions were developed (i.e. coded or written) by "community members" that usually develop them to solve their own problems, but that also don't mind sharing their functions with the rest of the world.
  3. Your own functions: Yes! You can make your own functions (we'll learn how to do that later). This is the essence of "code reuse". If you find yourself re-writing very similar code over and over, you can pack that code in a function so that the next time, you can simply "execute" your function.

Regardless of how you got your functions (i.e. built-in, from a library, or made by you), in this section you will learn how to use, or "execute", these functions or "commands". However, before that, we need to learn the basic syntax or "anatomy" of a function:

To use Functions, you must follow one of the syntaxes below:

  1. Note that parentheses ( ) ALWAYS accompany functions.
  2. Before the parentheses is the Function name, which may or may not have a "dot" in between. Function names often refer to what the function does. For example, the function print prints text to the screen.
  3. Inside the parentheses are arguments, which are explained below.
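The anatomy described in the list above can be sketched with three built-in functions, chosen here only as illustrations:

```r
# FunctionName() : parentheses are present even when there are no arguments
getwd()

# FunctionName(argument) : one argument inside the parentheses
print("Hello")

# Function.Name(argument1, argument2) : a name containing a dot, two arguments
all.equal(0.5, 1 / 2)
```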

Arguments

Arguments are the function's input and other instructions that guide the function on how to do its task properly.

Functions can have zero, one or more arguments. If a Function has more than one argument, they will always be separated by commas , .

If a function has zero arguments (i.e., nothing inside the parentheses), that means that the function will take its input directly from the computer. For example:
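As an illustration (these two built-in functions are just one possible choice), both calls below have empty parentheses and read what they need from the computer:

```r
# No arguments: the current date comes from the computer's clock
today <- Sys.Date()
print(today)

# No arguments: the working directory comes from the current R session
getwd()
```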


If a function has ONE argument, this argument is most likely the function's input. For example:
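A one-argument call of this kind, using the text value mentioned in the sentence that follows:

```r
# One argument: the text to display
print("Hello world!")
```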

where "Hello world!" is the input for the print function above.


If a function has many arguments, the first argument is most likely the input of the function, and the rest of the arguments are instructions to guide the function on how to do their task properly, or fine-tune the output. For example:
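A two-argument call of this kind, using the number and the digits setting mentioned in the sentence that follows:

```r
# First argument: the input. Second argument: how many significant digits to show
print(2.797337826, digits = 3)
```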

where 2.797337826 is the input for the print function above, and digits=3 is an argument that tells the print function to round the output to 3 significant digits.

"Ordered" arguments

These are arguments where the order in which you type them inside the parentheses ( ) is important.

In the example above 2.797337826 is an "Ordered" argument.

"Named" arguments

These are arguments where you must specify the ArgumentName followed by an equal sign =. If you have multiple "named" arguments inside the parentheses ( ) of one function, the order in which you type them is NOT important.

In the example above digits=3 is a "Named" argument.
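The difference between the two kinds of arguments can be sketched with the built-in seq function, chosen here as an illustration. Its argument names from, to and by are part of base R; the variable names are hypothetical.

```r
# Ordered arguments: position decides the meaning (from = 2, to = 10, by = 4)
by_position <- seq(2, 10, 4)

# Named arguments: the same call, with every argument labelled
by_name <- seq(from = 2, to = 10, by = 4)

# Named arguments can be typed in any order without changing the result
shuffled <- seq(by = 4, to = 10, from = 2)

by_position
identical(by_position, by_name)
identical(by_name, shuffled)
```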


Now that you know the general "anatomy" of functions, you just need two things before you start using any specific function:

  1. You need to know the FunctionName of the function you want to use
  2. You need to get the "Function manual", where you can read instructions on how to use the function

In the section below we'll address (1). Two sections below (i.e. Help Files), we'll address (2).

How to find Functions

There are literally thousands of functions in R. Finding the FunctionName of the function that you need can be tricky.

As you find yourself needing to "do things" in R, you will need to find out whether there is an R function that "does that thing". For example: imagine that you need to calculate the "standard deviation" of your data. You quickly arrive at the question "what is the FunctionName of the function to calculate standard deviation in R?"

This is actually a hard question with no easy solution. Below are a few pointers on how to find the function that will solve your problem:

  1. Search in Google or in https://rseek.org/
    For example, you can search in Google "How to calculate a mean in R". The results should point to tutorials or Stack Overflow answers using a function to calculate a mean.

  2. You can use the help.search() function to scan the documentation for packages installed in your library.

  3. You can take a look at the lists below, which include some of the most used built-in functions in R.

General use functions

Math functions

Graphical functions

Statistical functions

Help files (i.e. Function manuals)

Once you find a function that you want to use, a few big questions arise: How do I use this function? What arguments do I need to provide? What kind of input does it take?

Luckily, each function has its "Help File", which is like an "instruction manual" with specific instructions on how to use the function.

If you search in Google for a specific function, you should be able to find the "Help File" of the function and read it directly in your browser. However, most often you will probably use RStudio's Help panel, as shown below.

Regardless of where you read the help file of a function, all help files are broken down into sections:

Different functions might have different sections, but these are the main ones you should be aware of.

From within the function help page, you can highlight code in the Examples and hit Ctrl+Return to run it in RStudio console. This gives you a quick way to get a feel for how a function works.

To display a help file in RStudio's Help panel, you can type in the console the function name preceded by a ?. For example:

...or you can use the help() function. For example:
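Both forms can be sketched as follows, using the built-in mean function as an illustrative choice:

```r
# Shortcut form: a question mark followed by the function name
?mean

# Equivalent call using the help() function
help(mean)
```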

... or you can use RStudio's Help panel by clicking on the Help tab; and typing the name of the function in question in the search bar within the Help panel.




Use the ? or the help() function to obtain the "Help files" of 5 functions from the list above. Can you understand what the required arguments are? If not, that is ok, I just want you to start getting used to reading and using "Help files".




Variables

A Variable is a user-defined label that can be used to name anything. I like to think of "Variables" as stickers that you can glue to anything to give it a name. In the photo above, the "Variable Name" is what you wrote on the sticker (in this case, "Sugar") and the "Variable Value" is whatever you glued your sticker to (in this case, a 1 g sugar cube).

In R, the way to create variables is with <-, -> and =. See examples below:
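A minimal sketch of the three forms, with hypothetical variable names and values:

```r
width <- 1     # left arrow: the value on the right is stored under the name on the left
2 -> height    # right arrow: the value on the left is stored under the name on the right
depth = 3      # equal sign: same direction as the left arrow

width
height
depth
```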

The convention of using the "arrow" symbols (i.e. <- and ->) comes from the precursor of R, the S Programming Language. I personally prefer using = because <- requires you to type two characters (one needing the "shift" key); also, the majority of other languages (e.g. Python, C++, etc.) use =. However, note that = only declares the variable in the current workspace (more on this later).

Either way, by far <- is the most used way to create variables by the R community, so we'll stick with this in this course.

In generic terms, the proper nomenclature to "declare" a variable is as follows:

VariableName <- VariableValue



Let's make a variable:

Type the line below in the console and press [Enter]
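Based on the explanation that follows (a variable x with the value 5), the line in question is presumably:

```r
x <- 5   # create the variable x and assign the value 5 to it
```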

Even though it looks like nothing happened, when you pressed [Enter], R created the new variable x in memory and assigned the value 5 to it. You can see the new variable x, and its value 5, in the Environment panel. Alternatively, you can type the "Variable Name" in the console to get the "Variable Value":

Type the line below in the console and press [Enter]

Let's make another variable:

Copy-paste the code below into the console and press [Enter]:
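One way to write this, consistent with the description that follows (the exact layout is an assumption):

```r
y <- 3000   # step 1: create the variable y with the value 3000
y           # step 2: display the value of y
```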

Above, we bundled two steps in one entry: (1) we create the variable y with the value 3000, and (2) we display the value of y to screen.

Let's make two more:

Copy-paste the code below into the console and press [Enter]:




While you can name your variables anything you want, it is a good practice to choose variable names that describe what is stored within the variable. Temperature is better than col1, number_eggs is better than e, TransectDensity is better than td.


Using "Variables" and "Functions" together

You can use variables to "label" the output from a function. In this case, the following syntax applies:

VariableName <- FunctionName()

For example:
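A minimal sketch of this syntax, using the built-in sqrt function and a hypothetical variable name:

```r
# The output of sqrt(16) is stored under the name root
root <- sqrt(16)
root
```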



Take a look at the code to do the spectrogram of the Northern Goshawk call (below, or the beginning of this lab).

library(phonTools)

#Load audio file
sound <- loadsound("Mono_XC386310__Northern_Goshawk__Accipiter_gentilis.wav")

#Make 3 spectrograms
par(mfrow = c(3,1), mar = c(4,4,1,1))
spectrogram(sound)
spectrogram(sound, color = 'alternate')
spectrogram(sound, color = FALSE)


Which of the following are variables?

Check "Variables" loaded in memory

As we saw in the previous lab, you can use RStudio's Environment panel to see what variables are currently loaded into memory, as well as their values.

You can also see a list of what variables are currently loaded into memory by executing the function ls() on RStudio's console:
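A minimal sketch, assuming two variables x and y have been created first:

```r
x <- 5
y <- 3000

# List the names of all variables currently in memory
ls()
```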




Type the line below in the console and press [Enter]

In the console, you should see the variables currently loaded in the memory of your computer.




Overwriting "Variables" values

When you create a Variable, you assign a Variable Value to it. However, you can change its value at any time simply by assigning a different value to it (i.e. "overwriting" or "updating" its value). See below:
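A minimal sketch of overwriting, with illustrative values:

```r
x <- 5    # x starts with the value 5
x

x <- 12   # assigning again replaces the old value
x
```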

Deleting "Variables"

Deleting ONE variable
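A minimal sketch using the built-in rm() function, assuming a variable x exists:

```r
x <- 5
"x" %in% ls()   # TRUE: x is in memory

rm(x)           # delete the variable x
"x" %in% ls()   # FALSE: x is gone
```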







Deleting ALL variables
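A minimal sketch that combines rm() with ls(), assuming two variables exist. Use it with care, since it removes everything in your environment.

```r
x <- 5
y <- 3000

# ls() supplies the names of all variables, and rm() deletes every one of them
rm(list = ls())
ls()
```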



In R, what function is used to delete one variable?



In R, what statement is used to delete ALL variables in your environment?


Basic data types

I mentioned earlier that R is an object-oriented language, where everything is an object. The "basic data types" are the simplest and most basic kinds of objects. All your data must be represented with one of these data types. Each data type behaves under different rules, thus it is important to always be aware of what data type each statement is working with. Below are some of the most common data types used in R:

You can use the typeof() function to query any variable and see what data type it is made of.
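A minimal sketch of typeof() applied to one illustrative value of each basic data type:

```r
typeof(3.14)      # "double"
typeof(7L)        # "integer"
typeof("hello")   # "character"
typeof(TRUE)      # "logical"
```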

Let's dive a bit deeper into each basic data type.

Double (or numeric)

Double (or numeric) objects are numbers that are allowed to have a decimal point.

Integers

Integers are whole numbers (i.e. no decimal point). Note that you have to add a capital L at the end to specify that the number is an integer. If you do not add an L, R will treat it as a double with .0 as a decimal.

Character

Characters are text (letters, words or sentences, also called "strings"). Note that you need to wrap the content of the string within quotes to tell R that it is a character.
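A minimal sketch with hypothetical variable names, including the quoted 2.3 discussed in the note that follows:

```r
a <- "the sky is blue"
typeof(a)       # "character"

b <- "2.3"      # looks like a number, but the quotes make it text
typeof(b)       # "character"
is.numeric(b)   # FALSE
```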

Note that even though 2.3 is a number, because we declared it with quotes (i.e. "2.3"), it will be treated by R as "letters". Therefore, you won't be able to do math with "2.3".

Logical

Logicals can only have one of two values TRUE or FALSE. They are used to represent "true or false" statements.



What data type is the following:

324.91



What data type is the following:

"the house is green"




What data type is the following:

87572L




If you execute the following line in R...

a = 32

Then, what data type is "a"?



If you execute the following line in R...

a = 32L

Then, what data type is "a"?


R Operators

Now that you know how to make numeric, integer, character and logical elements, you can start doing arithmetic, relational and logical operations. Below is a "cheat-sheet" of all the operators in R:

We won't dive into all of the operators at this point, but here are a few examples:
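A small sketch with one or two operators of each kind, using illustrative values:

```r
# Arithmetic operators
7 + 2          # 9
7 %% 2         # 1 (remainder of 7 divided by 2)

# Relational operators
7 > 2          # TRUE
7 == 2         # FALSE

# Logical operators
TRUE & FALSE   # FALSE (and)
TRUE | FALSE   # TRUE (or)
!TRUE          # FALSE (not)
```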

Data Structures (i.e. "Container" objects)

We mentioned that everything in R is an object, and that the "Basic Data Types" are the simplest type of objects. "Data Structures" are also objects, however a bit more complicated. "Data Structures" are designed to contain groups of other objects... thus, I like to call them "Container" objects.

In the photo above (i.e. jar of sugar cubes), the "Variable Name" is what you wrote on the sticker (in this case, "Sugar"), the "Variable Value" is whatever you glued your sticker to (in this case, a jar with 28 one-gram sugar cubes). The jar is a "Container" object and each of the sugar cubes is a different object, each with its own "Basic Data Type" and value.

The main "Data Structures" that we will discuss here are:

You can use the function class() to query the type of "Data Structure" of a variable, for example:
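A minimal sketch with two hypothetical variables:

```r
v <- c(1.5, 2.5, 3.5)
class(v)    # "numeric"

d <- data.frame(a = 1:3)
class(d)    # "data.frame"
```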

Vector

A vector is a collection of elements of the same "Basic Data Type" (i.e. numeric, integer, character, or logical). You can think of a "vector" as one single column in a spreadsheet. Technically, vectors can be either (1) atomic vectors (i.e. all elements of the same "Basic Data Type"), or (2) lists (i.e. elements can be of different types). However, the term “vector” most commonly refers to "atomic vectors", and "list vectors" are referred to simply as "lists", which we will discuss in a section below.

Vectors are particularly well suited to do matrix algebra.

To make a vector, use the "combine" function, c.
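A minimal sketch with illustrative numeric values:

```r
# Combine three numbers into one vector
x <- c(1.2, 4.5, 8)
x
class(x)   # "numeric"
```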

Note that class(x) returned 'numeric' because ALL the elements within x are of Numeric Basic Data Type.
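The same idea with illustrative text values:

```r
# Combine three pieces of text into one vector
x <- c("a", "b", "c")
x
class(x)   # "character"
```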

Note that class(x) returned 'character' because ALL the elements within x are of Character Basic Data Type.


In the example below, let's insert a 'numeric' element among a bunch of 'character' elements to see what happens:
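A minimal sketch with three illustrative 'character' elements and the number 7:

```r
x <- c("a", "b", 7, "c")
x          # "a" "b" "7" "c"
class(x)   # "character"
```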

Note that R changed the "number" 7 to the "character" '7'. All elements of a vector must share one data type, so when types are mixed R converts every element to the most flexible type present (logical, then integer, then double, then character), regardless of how many elements of each type there are. Here the only type that can represent all four elements is 'character', so the numeric 7 became the character '7'. This behaviour of R is called "coercion". It is handy, but it can also get the programmer into trouble, because it changes the data type of an element without warning.

Matrix

Matrices are an extension of vectors. They are simply a vector with dimensions of "number of rows" by "number of columns". As with vectors, the elements of a matrix must be of the same Data Type.

Just like vectors, matrices are particularly well suited to do matrix algebra.

To make a Matrix, use the function matrix()
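A minimal sketch with illustrative values. Note that matrix() fills the cells column by column unless told otherwise.

```r
# Six values arranged in 2 rows and 3 columns
m <- matrix(1:6, nrow = 2, ncol = 3)
m
dim(m)    # 2 3
```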

List

Lists are like vectors, but without the restriction of having their contents of a single Data Type. You can mix numeric, integer, character, and logical elements within a single List. Lists are sometimes called "generic vectors", because the elements of a list can be of any type of R object, even lists containing further lists. This property makes them fundamentally different from the atomic vectors that we discussed above.

To make a list, use the function list()
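A minimal sketch with one illustrative element of each basic data type:

```r
x <- list(1.5, 2L, "three", TRUE)
x
class(x)    # "list"
length(x)   # 4
```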

Below is an example of a list that contains another list. We'll insert the list created in the code above as the 3rd element of the list below, y:
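A minimal sketch, assuming an illustrative list x like the one described above:

```r
x <- list(1.5, 2L, "three", TRUE)

# The third element of y is itself a list
y <- list("a", 10, x)
length(y)       # 3
class(y[[3]])   # "list"
```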

Data Frame

Data frames are a very important data structure in R. They are pretty much the de facto data structure for most tabular data, and they are what we use in the biology field and in plotting in general. Data frames can be created by hand, but most commonly they are generated when importing spreadsheets from your hard drive or the web.

Remember that a "matrix" is a special type of "vector" with multiple rows and columns? Well, similarly, a Data Frame is a special type of list with multiple rows and columns, where every element of the list has the same length (i.e. a data frame is a “rectangular” list). Same as lists, the elements of a Data Frame can be of any type of R object (i.e. numeric, integer, character, logical, lists and other Data Frames).

To make a data frame, use the data.frame() function:
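A minimal sketch of a hand-made data frame. The column names and values are made up for illustration and are not real measurements.

```r
# Each argument becomes one column; all columns must have the same length
d <- data.frame(
  species = c("goshawk", "osprey", "merlin"),
  count = c(4L, 2L, 7L),
  resident = c(TRUE, FALSE, TRUE)
)
d
class(d)   # "data.frame"
dim(d)     # 3 3
```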

Factors

Factors are used to represent categorical data. Conceptually, factors are variables in R which take on a limited number of different values (e.g. "male" and "female"). One of the most important uses of factors is in statistical modeling; since categorical variables enter into statistical models differently than continuous variables, storing data as factors ensures that the modeling functions will treat such data correctly.

Factors in R are stored as a vector of integer values with a corresponding set of character values to use when the factor is displayed. The factor function is used to create a factor. The only required argument to factor is a vector of values which will be returned as a vector of factor values. Both numeric and character variables can be made into factors, but a factor's levels will always be character values. Factors represent a very efficient way to store character values, because each unique character value is stored only once, and the data itself is stored as a vector of integers.

Below is an example:
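A minimal sketch with illustrative values. The variable name sex is hypothetical.

```r
sex <- factor(c("male", "female", "female", "male"))
sex
class(sex)        # "factor"

# Behind the scenes the data are integer codes (levels are sorted alphabetically)
as.integer(sex)   # 2 1 1 2
```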

You can see the possible levels for a factor through the levels() function:

... and the number of levels using the nlevels() function:
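Both functions can be sketched on an illustrative factor (the variable name sex is hypothetical):

```r
sex <- factor(c("male", "female", "female", "male"))

levels(sex)    # "female" "male"
nlevels(sex)   # 2
```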



Consider the statement below:

x <- data.frame(x = 1:20, y = 21:40)

What class, or Data Structure, is x?




Consider the statement below:

x <- matrix(c(1, 2, 3, 4), nrow=2, ncol=2)

What class, or Data Structure, is x?




Consider the statement below:

x <- list(1, 2, 3, 4)

What class, or Data Structure, is x?




Consider the statement below:

x <- c(1, 2, 3, 4)


What class, or Data Structure, is x?




Consider the statement below:

x <- factor(c(1, 2, 3, 4))


What class, or Data Structure, is x?




Which of the Data Structures below MUST have all of their elements be of the same Data Type?




Which of the Data Structures below CAN have their elements be of any type of R object, including multiple Data Types?


Indexing (slicing)

There are multiple ways to access or replace values inside data structures. The most common approach is to use “indexing”. This is also referred to as “slicing”.

Note that brackets [ ] are used for indexing, whereas parentheses ( ) are used to call a function.

Consider the following vector x and list y:
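A minimal sketch of such a pair. The values are hypothetical, chosen so that the list has seven elements and its fourth element is itself a list.

```r
# A numeric vector with seven elements
x <- c(10, 20, 30, 40, 50, 60, 70)

# A list with seven elements of mixed types; element 4 is a nested list
y <- list(2, "hello", TRUE, list(10, 20, 30), 8, "world", 3.6)

length(x)   # 7
length(y)   # 7
```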

To get the first element of vector x:

If the vector x is graphically represented by the blue grid below... then x[1] would access the red box:

To get the second element of list y:

If the list y is graphically represented by the blue grid below... then y[2] would access the red box:
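Both single-element selections can be sketched as follows, assuming hypothetical values for x and y:

```r
x <- c(10, 20, 30, 40, 50, 60, 70)
y <- list(2, "hello", TRUE, list(10, 20, 30), 8, "world", 3.6)

x[1]          # first element of the vector: 10
y[2]          # second element of the list, returned as a list of length 1
class(y[2])   # "list"
```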

Elements within a list are also lists! To get to the contents inside a "list element" you need to use double brackets [[ ]]:

So, for example, y[1] + y[5] will return an error, because you cannot "add" two lists. To add the two values you need to do:
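A minimal sketch, assuming a hypothetical list y whose 1st and 5th elements are numbers:

```r
y <- list(2, "hello", TRUE, list(10, 20, 30), 8, "world", 3.6)

class(y[1])     # "list": single brackets keep the list wrapper
class(y[[1]])   # "numeric": double brackets return the content itself

# Double brackets extract the numbers, so they can be added
total <- y[[1]] + y[[5]]
total           # 10
```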

To get a range of elements use the :. For example, to get the first 3 elements of vector x:

If the vector x is graphically represented by the blue grid below... then x[1:3] would access the red box:

To get the 5th, 6th and 7th elements of list y:

If the list y is graphically represented by the blue grid below... then y[5:7] would access the red box:
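Both range selections can be sketched as follows, assuming hypothetical values for x and y:

```r
x <- c(10, 20, 30, 40, 50, 60, 70)
y <- list(2, "hello", TRUE, list(10, 20, 30), 8, "world", 3.6)

x[1:3]           # first three elements of the vector: 10 20 30
y[5:7]           # 5th, 6th and 7th elements, returned as a list
length(y[5:7])   # 3
```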

To access a list within list, you have to first dive into the values of the first list, using double brackets [[ ]]:

If the list y is graphically represented by the blue grid below... then y[[4]][3] would access the red box:
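A minimal sketch, assuming a hypothetical list y whose 4th element is itself a list:

```r
y <- list(2, "hello", TRUE, list(10, 20, 30), 8, "world", 3.6)

inner <- y[[4]]   # double brackets: step inside the 4th element
inner[3]          # 3rd element of the inner list, still wrapped in a list

y[[4]][3]         # the same selection written in one step
y[[4]][[3]]       # double brackets again return the bare value: 30
```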

If you need the last element of a vector or list, but don't know how long the vector or list is, use the function length():
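A minimal sketch, assuming hypothetical values for x and y:

```r
x <- c(10, 20, 30, 40, 50, 60, 70)
y <- list(2, "hello", TRUE, list(10, 20, 30), 8, "world", 3.6)

# length() returns the number of elements, which is also the last index
x[length(x)]     # 70
y[[length(y)]]   # 3.6
```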

Logical Indexing is used when you need to retrieve elements according to a logical expression. Numeric comparisons such as > or < are most useful with vectors and lists that contain ONLY numeric or integer data types.
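A minimal sketch on a hypothetical numeric vector:

```r
x <- c(10, 20, 30, 40, 50, 60, 70)

# The comparison returns one TRUE or FALSE per element
x > 30

# Using that result inside [ ] keeps only the elements marked TRUE
big <- x[x > 30]
big   # 40 50 60 70
```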



Given the following vector v:

v <- c(8, 5, 3, 1, 9, 2)

...which is represented visually with the diagram below, How would you access the value highlighted in the red box?




Given the following vector v:

v <- c(8, 5, 3, 1, 9, 2)

...which is represented visually with the diagram below, How would you access the value highlighted in the red box?




Given the following vector v:

v <- c(8, 5, 3, 1, 9, 2)

...which is represented visually with the diagram below, How would you access the values highlighted within the red box?




Given the following vector v:

v <- c(8, 5, 3, 1, 9, 2)

...which is represented visually with the diagram below, How would you access the value highlighted in the red box?




Given the following vector v:

v <- c(8, 5, 3, 1, 9, 2)

...which is represented visually with the diagram below, How would you access the values highlighted within the red box?




Given the following list L:

L <- list(7L, "hello", 3.6, c(7, 5, 2), 9, "world")

...which is represented visually with the diagram below, How would you access the value highlighted in the red box?




Given the following list L:

L <- list(7L, "hello", 3.6, c(7, 5, 2), 9, "world")

...which is represented visually with the diagram below, How would you access the values highlighted in the red box?




Given the following list L:

L <- list(7L, "hello", 3.6, c(7, 5, 2), 9, "world")

...which is represented visually with the diagram below, How would you access the value highlighted in the red box?




Given the following list L:

L <- list(7L, "hello", 3.6, c(7, 5, 2), 9, "world")

...which is represented visually with the diagram below, How would you access the value highlighted in the red box?




Given the following list L:

L <- list(7L, "hello", 3.6, c(7, 5, 2), 9, "world")

...which is represented visually with the diagram below, How would you access the values highlighted in the red box?


Consider the following matrix m:
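A minimal sketch of such a matrix. The values are hypothetical; byrow = TRUE fills the matrix row by row.

```r
# Row 1 holds 1 2 3, row 2 holds 4 5 6, row 3 holds 7 8 9
m <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
m
```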

To get to a single number within the matrix, use a pair of indices where the first index is the row number and the second index is the column number:
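A minimal sketch, assuming a hypothetical 3 by 3 matrix filled row by row with 1 to 9:

```r
m <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

# Row 2, column 3
m[2, 3]   # 6
```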

Rather than using pairs, you can also use a single index. You can think of this index as a “cell number”. Cells are numbered column-wise (i.e., first the rows in the first column, then the second column, etc.). Thus,
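A minimal sketch, assuming a hypothetical 3 by 3 matrix filled row by row with 1 to 9:

```r
m <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

# Cells 1 to 3 are the first column (1, 4, 7), so cell 4 is the top of column 2
m[4]      # 2
m[1, 2]   # the same cell addressed as row 1, column 2
```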

You can also get multiple values at once:
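A minimal sketch, assuming a hypothetical 3 by 3 matrix filled row by row with 1 to 9:

```r
m <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

# Rows 1 to 2 and columns 1 to 2: a smaller 2 by 2 matrix
block <- m[1:2, 1:2]
block
```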

To get a whole column, use a comma before the column index. The statement below reads "return all rows of column 1":

To get a whole row, use a comma after the row index. The statement below reads "return row 1, all columns":

Getting a whole row or column from a matrix returns a vector:
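Whole-column and whole-row selection, and the kind of object they return, can be sketched on a hypothetical 3 by 3 matrix filled row by row with 1 to 9:

```r
m <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

first_col <- m[, 1]    # all rows of column 1: 1 4 7
first_row <- m[1, ]    # row 1, all columns: 1 2 3

# The matrix structure is dropped: both results are plain vectors
is.vector(first_col)   # TRUE
is.vector(first_row)   # TRUE
```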



Given the following matrix m:

m <- matrix(c(2,4,7,3,5,3,9,7,0,6,8,2),nrow=4, ncol=3)

...which is represented visually with the diagram below, How would you access the value highlighted in the red box?




Given the following matrix m:

m <- matrix(c(2,4,7,3,5,3,9,7,0,6,8,2),nrow=4, ncol=3)

...which is represented visually with the diagram below, How would you access the values highlighted in the red box?




Given the following matrix m:

m <- matrix(c(2,4,7,3,5,3,9,7,0,6,8,2),nrow=4, ncol=3)

...which is represented visually with the diagram below, How would you access the values highlighted in the red box?




Given the following matrix m:

m <- matrix(c(2,4,7,3,5,3,9,7,0,6,8,2),nrow=4, ncol=3)

...which is represented visually with the diagram below, If you make a new variable with the values highlighted in the red box (using your answer to the question above)... ...what class or Data Structure will be assigned to your new variable?


Consider the following data frame d:
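A minimal sketch of such a data frame. The column names x, y and z match the names used in the text; the values are hypothetical.

```r
d <- data.frame(x = 1:3, y = c(10, 20, 30), z = c("a", "b", "c"))
d
```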

You can extract a column by column number:

Here is an alternative way to address the column number in a data frame:

Note that whereas [2] would be the second element in a matrix, it refers to the second column in a data.frame. This is because a data.frame is a special kind of list and not a special kind of matrix.
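Both ways of addressing a column by number can be sketched on a hypothetical data frame d:

```r
d <- data.frame(x = 1:3, y = c(10, 20, 30), z = c("a", "b", "c"))

d[, 2]   # all rows of column 2, returned as a vector: 10 20 30
d[2]     # the second column, returned as a one-column data frame
```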

You can also use the column name to get the values of a column:

In addition to d[,"x"] above, you can also use the $ symbol.

Using the $ symbol is a very common practice to slice a column of a data.frame:
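Both name-based forms can be sketched on a hypothetical data frame d:

```r
d <- data.frame(x = 1:3, y = c(10, 20, 30), z = c("a", "b", "c"))

d[, "x"]   # column x selected by name inside [ ]: 1 2 3
d$x        # the same column selected with the $ symbol
```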

Note that both d[,"x"] and d$x return a vector. That is, the complexity of the data.frame structure was dropped. This does not happen when you do d["x"], where the output remains a data.frame. Take a look:
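A minimal sketch on a hypothetical data frame d, using class() to show what each form returns:

```r
d <- data.frame(x = 1:3, y = c(10, 20, 30), z = c("a", "b", "c"))

class(d[, "x"])   # "integer": a plain vector
class(d$x)        # "integer": a plain vector
class(d["x"])     # "data.frame": the structure is kept
```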

Why should you care about this drop business? In many cases R functions want a specific data type, such as a matrix or data.frame and report an error if they get something else. One common situation is that you think you provide data of the right type, such as a data.frame, but that in fact you are providing a vector, because the structure dropped.

Either way, you can use [ ] with all three approaches to get to a specific value within a column:
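A minimal sketch on a hypothetical data frame d, retrieving the second value of column x in each of the three ways:

```r
d <- data.frame(x = 1:3, y = c(10, 20, 30), z = c("a", "b", "c"))

d[, "x"][2]     # vector from d[, "x"], then its 2nd element
d$x[2]          # vector from d$x, then its 2nd element
d["x"][2, 1]    # one-column data frame from d["x"], then row 2, column 1
```

Because d["x"] is still a data frame, it needs a row and a column index, whereas the two vector forms need only one index.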



Given the following Data Frame d:

d <- data.frame(x=2:6, y=3:7, z=4:8)

...which is represented visually with the diagram below, How would you access the values highlighted in the red box?




Given the following Data Frame d:

d <- data.frame(x=2:6, y=3:7, z=4:8)

...which is represented visually with the diagram below, Select all the choices that you could use to access the values highlighted in the red box?




Given the following Data Frame d:

d <- data.frame(x=2:6, y=3:7, z=4:8)

...which is represented visually with the diagram below, How would you access the value highlighted in the red box?




Given the following Data Frame d:

d <- data.frame(x=2:6, y=3:7, z=4:8)

and given the following subset of d, created as follows:

subset_d <- d[,"y"]

What class or Data Structure would be subset_d?




Given the following Data Frame d:

d <- data.frame(x=2:6, y=3:7, z=4:8)

and given the following subset of d, created as follows:

subset_d <- d$y

What class or Data Structure would be subset_d?




Given the following Data Frame d:

d <- data.frame(x=2:6, y=3:7, z=4:8)

and given the following subset of d, created as follows:

subset_d <- d["y"]

What class or Data Structure would be subset_d?

Some parts of this lab were borrowed from:

This is the end of lab

