BIO3782: Biologist's Toolkit (Dalhousie University)
In LAB 1, you learned how to identify and tell apart objects like variables, functions, arguments, comments, data and packages. In this lab we will review all of these objects again. However, this time around we will "dive deeper" and learn intrinsic details about each of these objects.
A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. It is normally created using a Fast Fourier Transform (FFT) to transform audio in the time domain into power in the frequency domain. It is not too complicated to code in R; however, there is an even simpler solution: you can use one of the multiple R packages that make spectrograms. Below we use the phonTools package.
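To make the FFT idea above a bit more concrete, here is a minimal base-R sketch. It is not how phonTools works internally; it assumes a synthetic 440 Hz tone sampled at 8000 Hz and cut into frames of 256 samples, and all variable names are made up for illustration.

```r
# Assumed synthetic signal: 1 second of a 440 Hz tone sampled at 8000 Hz
fs <- 8000
t <- (0:(fs - 1)) / fs
signal <- sin(2 * pi * 440 * t)

# Cut the signal into non-overlapping frames of 256 samples (one frame per column)
frame_len <- 256
n_frames <- floor(length(signal) / frame_len)
frames <- matrix(signal[1:(n_frames * frame_len)], nrow = frame_len)

# Hann window to reduce spectral leakage at the frame edges
hann <- 0.5 - 0.5 * cos(2 * pi * (0:(frame_len - 1)) / (frame_len - 1))

# Power spectrum of each frame, keeping only the first half of the FFT bins
power <- apply(frames * hann, 2, function(f) Mod(fft(f))[1:(frame_len / 2)]^2)

# Frequency (in Hz) of each FFT bin, and the bin with the most power overall
freqs <- (0:(frame_len / 2 - 1)) * fs / frame_len
peak_freq <- freqs[which.max(rowMeans(power))]
print(peak_freq)   # should land within one bin width of 440 Hz
```

Each column of power is one time slice of a spectrogram; plotting that matrix as an image (time on one axis, frequency on the other) gives the kind of picture that the package draws for you.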
The audio file we'll use here was downloaded from xeno-canto and is a call from a Northern Goshawk, recorded in Lunenburg, Nova Scotia.
Listen to the audio file by double-clicking on Mono_XC386310__Northern_Goshawk__Accipiter_gentilis.wav to open it in the default audio player on your computer. Note that you can also listen to it directly in Brightspace.
Open RStudio and change the working directory to lab3. If you don't remember how to do this, review this section
In the Console, type the following to install the phonTools package:
install.packages("phonTools")
In RStudio, make a new R script file and save it in your lab3 directory with the name spectrogram.R.
Copy-paste the code below into spectrogram.R and click the button
library(phonTools)
# Load audio file
sound <- loadsound("Mono_XC386310__Northern_Goshawk__Accipiter_gentilis.wav")
# Make 3 spectrograms
par(mfrow = c(3,1), mar = c(4,4,1,1))
spectrogram(sound)
spectrogram(sound, color = 'alternate')
spectrogram(sound, color = FALSE)
All three plots are spectrograms showing the same thing, but using different colormaps. In all plots, you can see 5 brighter "blobs" that represent the 5 calls from the Northern Goshawk, as you can hear from the audio file.
It is time to dissect the code above. If you need a refresher on how to tell apart objects like variables, functions, arguments, etc., take a look at Lab1 Micro-introduction to coding.
Arguably, programming in any language is mainly knowing "where and how to get help". Even experienced programmers find themselves searching Google many times per day. If you run into a problem, these are the suggested steps:
- Use the help() function and/or RStudio's "Help panel" (more on this below)
- Search Google, adding R to your search terms. Example: "How to do functions in R"
Other ports of call:
You should also take a look at the page "Getting Help with R" from the R project: https://www.r-project.org/help.html
Comments are lines of text meant to be read ONLY by humans (i.e. the computer ignores these lines). Comments usually contain annotations and additional information that make the code easier for the programmer to understand. Comments are preceded by a # (hashtag).
A Statement is a line of text read by the computer. This is "the code". A statement contains instructions for the computer to do a simple task.
Take a look at the sequence of comments and statements below:
# This is a Comment because it is preceded by a # (hashtag)
x <- 5
# Lets print our variables to screen
print(x) # Note that comments can be written after a statement (but not before)
[1] 5
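Above...
- The first line is a comment.
- The second line is a statement that sets x to be equal to 5 (see "Variables" section below for more on this).
- The third line is another comment.
- The fourth line is a statement that prints x to screen. After the statement, there is another comment.
If you copy-paste the code above into RStudio and run it, it will print a 5.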
The beauty of: Control + Shift + c
In RStudio, you can comment many lines at once by selecting the lines that you want to "comment out" and then pressing Ctrl + Shift + c. This will insert a # at the beginning of all the selected lines. Note that you can do the reverse (i.e. "uncomment code") by selecting several lines beginning with a # and pressing Ctrl + Shift + c again. Instead of keyboard shortcuts, you can also click on [Code > Comment/Uncomment Lines].
Note that "commenting out" code is very useful during code development or testing. You want to "disable" a few lines of code while you run tests or diagnostics; then, at a later time, you can "enable" (i.e. uncomment) your code without ever having to delete or re-write your code.
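As a small sketch of what "commenting out" does in practice (the variable name total is just an illustration), the second update below is disabled, so only the first one takes effect:

```r
total <- 0
total <- total + 1        # This statement is executed
# total <- total + 100    # This line is "commented out", so R skips it
print(total)
```

Removing the leading # from the third line would "enable" that statement again without re-typing it.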
In the file that you created above (i.e. spectrogram.R), select all lines and press Ctrl + Shift + c.
Select all lines again and press Ctrl + Shift + c a second time.
As we said before, most of the "magic" done in R is done through functions!
Functions are "commands" to do things like calculate means, make graphs, do stats, etc. Technically speaking, a function is a "program" that does some manipulations to an "input" and returns an "output", as represented in the diagram below:
Functions can be divided in three:
- Built-in functions: These functions come included in R. The "programs" that make these functions were developed (i.e. coded or written) by the core developers of the R programming language.
- Library functions: These functions are made available to you when you "load a library". These functions were developed (i.e. coded or written) by "community members" who usually develop them to solve their own problems, but who also don't mind sharing their functions with the rest of the world.
- Your own functions: Yes! You can make your own functions (we'll learn how to do that later). This is the essence of "code reuse". If you find yourself re-writing very similar code over and over, you can pack that code in a function so that the next time, you can simply "execute" your function.
Regardless of how you got your functions (i.e. built-in, from a library, or made by you), in this section you will learn how to use, or "execute", these functions or "commands". However, before that, we need to learn the basic syntax or "anatomy" of a function:
To use functions, you must follow the syntax rules below:
- Parentheses ( ) ALWAYS accompany functions.
- The FunctionName tells R which function to run. For example, print prints text to the screen.
- Inside the parentheses go the arguments, which are explained below.
Arguments are the function's input and other instructions to guide the function on how to do its task properly.
Functions can have zero, one or more arguments. If a function has more than one argument, they will always be separated by commas.
If a function has zero arguments (i.e., nothing inside the parentheses), that means that the function will take its input directly from the computer. For example:
# This function returns the computer's current date and time
Sys.time()
[1] "2020-12-28 16:47:33 AST"
If a function has ONE argument, this argument is most likely the function's input. For example:
# This function prints-to-screen its input
print("Hello world!")
[1] "Hello world!"
where "Hello world!" is the input for the print function above.
If a function has many arguments, the first argument is most likely the input of the function, and the rest of the arguments are instructions to guide the function on how to do its task properly, or to fine-tune the output. For example:
# This function prints-to-screen its input (i.e. first argument), but also rounds the output to "3" significant digits
print(2.797337826, digits=3)
[1] 2.8
where 2.797337826 is the input for the print function above, and digits=3 is an argument that tells the print function to round the output to 3 significant digits.
"Ordered" arguments are arguments where the order in which you type them inside the parentheses ( ) is important.
In the example above, 2.797337826 is an "Ordered" argument.
"Named" arguments are arguments where you must specify the ArgumentName followed by an equal sign =. If you have multiple "named" arguments inside the parentheses ( ) of one function, the order in which you type them is NOT important.
In the example above, digits=3 is a "Named" argument.
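The difference between ordered and named arguments can be sketched with rep(), which takes the value to repeat (x) and the number of repetitions (times). The variable names below are only for illustration.

```r
# Ordered arguments: swapping them changes the meaning
a <- rep(1, 5)   # repeat the value 1 five times
b <- rep(5, 1)   # repeat the value 5 one time
print(a)
print(b)

# Named arguments: the order in which you type them does not matter
c1 <- rep(x = 1, times = 5)
c2 <- rep(times = 5, x = 1)
print(identical(c1, c2))
```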
Now that you know the general "anatomy" of functions, you just need two things before you start using any specific function:
(1) The FunctionName of the function you want to use
(2) The arguments that the function takes
In the section below we'll address (1). Two sections below (i.e. Help Files), we'll address (2).
There are literally thousands of functions in R. Finding the FunctionName of the function that you need can be tricky.
As you find yourself needing to "do things" in R, you will need to find out if there is an R function that "does that thing". For example: Imagine that you need to calculate the "standard deviation" of your data. You quickly arrive at the question "what is the FunctionName of the function to calculate standard deviation in R?"
This is actually a hard question with no easy solution. Below are a few pointers on how to find the function that will solve your problem:
Search in Google or in https://rseek.org/
For example, you can search in Google "How to calculate a mean in R". The results should point to tutorials or stackoverflow answers using a function to calculate a mean.
You can use the help.search() function to scan the documentation for packages installed in your library.
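As a rough sketch of searching from inside R: the help.search() call is shown as a comment because it is best run interactively, and apropos() (a base R helper not mentioned above) is added here as a quick way to list objects whose names match a pattern.

```r
# Full-text search of the installed help pages (run this one interactively):
# help.search("standard deviation")

# apropos() lists the objects whose names contain a pattern
mean_like <- apropos("mean")
print(head(mean_like))

# An anchored pattern checks whether something with that exact name exists
print(apropos("^sd$"))
```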
You can take a look at the lists below, which include some of the most used built-in functions in R.
builtins() # List all built-in functions
options() # Set options to control how R computes & displays results
# General
print() # Print to screen
?NA # Help page on handling of missing data values
abs(x) # The absolute value of "x"
append() # Add elements to a vector
cat(x) # Prints the arguments
cbind(), rbind() # Combine vectors by column/row (cf. "paste" in Unix)
diff(x) # Returns suitably lagged and iterated differences
gl() # Generate factors with the pattern of their levels
grep() # Pattern matching
identical() # Test if 2 objects are *exactly* equal
jitter() # Add a small amount of noise to a numeric vector
julian() # Return Julian date
length(x) # Return no. of elements in vector x
mat.or.vec() # Create a matrix or vector
paste(x) # Concatenate vectors after converting to character
range(x) # Returns the minimum and maximum of x
rep(1,5) # Repeat the number 1 five times
rev(x) # List the elements of "x" in reverse order
seq(1,10,0.4) # Generate a sequence (1 -> 10, spaced by 0.4)
sequence() # Create a vector of sequences
sign(x) # Returns the signs of the elements of x
sort(x) # Sort the vector x
order(x) # list sorted element numbers of x
tolower(),toupper() # Convert string to lower/upper case letters
unique(x) # Remove duplicate entries from vector
system("cmd") # Execute "cmd" in operating system (outside of R)
floor(x), ceiling(x), round(x), signif(x), trunc(x) # rounding functions
# Container objects
c(x) # Combine values into a vector or List
vector() # Produces a vector of given length and mode
matrix() # Makes a matrix
data.frame() # Makes data frame
# Environment and working directory
ls() # List objects in current environment
getwd() # Return working directory
setwd() # Set working directory
Sys.getenv(x) # Get the value of the environment variable "x"
Sys.setenv(x = "value") # Set the value of the environment variable "x"
Sys.time() # Return system time
Sys.Date() # Return system date
?files # Help on low-level interface to file system
list.files() # List files in a given directory
file.info() # Get information about files
log(x),logb(),log10(),log2(),exp(),expm1(),log1p(),sqrt() # Fairly obvious
cos(),sin(),tan(),acos(),asin(),atan(),atan2() # Usual stuff
cosh(),sinh(),tanh(),acosh(),asinh(),atanh() # Hyperbolic functions
union(),intersect(),setdiff(),setequal() # Set operations
eigen() # Computes eigenvalues and eigenvectors
sqrt() # Square root
sum() # Sum
pi # Pi constant
deriv() # Symbolic and algorithmic derivatives of simple expressions
integrate() # Adaptive quadrature over a finite or infinite interval.
?Control # Help on control flow statements (e.g. if, for, while)
?Extract # Help on operators acting to extract or replace subsets of vectors
?Logic # Help on logical operators
?regex # Help on regular expressions used in R
?Syntax # Help on R syntax and giving the precedence of operators
help(package=graphics) # List all graphics functions
plot() # Generic function for plotting of R objects
par() # Set or query graphical parameters
curve(5*x^3,add=T) # Plot an equation as a curve
points(x,y) # Add another set of points to an existing graph
arrows() # Draw arrows [see errorbar script]
abline() # Adds a straight line to an existing graph
lines() # Join specified points with line segments
segments() # Draw line segments between pairs of points
hist(x) # Plot a histogram of x
pairs() # Plot matrix of scatter plots
matplot() # Plot columns of matrices
?device # Help page on available graphical devices
postscript() # Plot to postscript file
pdf() # Plot to pdf file
png() # Plot to PNG file
jpeg() # Plot to JPEG file
persp() # Draws perspective plot
contour() # Contour plot
image() # Plot an image
help(package=stats) # List all stats functions
?Chisquare # Help on chi-squared distribution functions
?Poisson # Help on Poisson distribution functions
help(package=survival) # Survival analysis
cor.test() # Perform correlation test
cumsum(); cumprod(); cummin(); cummax() # Cumulative functions for vectors
density(x) # Compute kernel density estimates
ks.test() # Performs one or two sample Kolmogorov-Smirnov tests
loess(), lowess() # Scatter plot smoothing
mad() # Calculate median absolute deviation
mean(x), weighted.mean(x), median(x), min(x), max(x), quantile(x)
rnorm(), runif() # Generate random data with Gaussian/uniform distribution
splinefun() # Perform spline interpolation
smooth.spline() # Fits a cubic smoothing spline
sd() # Calculate standard deviation
summary(x) # Returns a summary of x: mean, min, max etc.
t.test() # Student's t-test
var() # Calculate variance
sample() # Random samples & permutations
ecdf() # Empirical Cumulative Distribution Function
qqplot() # quantile-quantile plot
lm() # Fit linear model
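To get a feel for a few of the functions listed above, here is a short sketch that applies some of them to one small vector. The numbers are arbitrary and chosen only for illustration.

```r
x <- c(8, 4, 15, 16, 23, 42, 8)   # arbitrary example values

print(length(x))   # number of elements
print(range(x))    # minimum and maximum
print(sort(x))     # sorted copy of x
print(rev(x))      # x in reverse order
print(unique(x))   # duplicates removed
print(cumsum(x))   # running total
print(mean(x))     # average
print(sd(x))       # standard deviation
```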
Once you find a function that you want to use, some big questions arise: How do I use this function? What arguments do I need to provide? What kind of input does it take?
Luckily, each function has its "Help File", which is like an "instruction manual" with specific instructions on how to use the function.
If you search in Google for a specific function, you should be able to find the "Help File" of the function and read it directly in your browser. However, most often you will probably use RStudio's Help panel as shown below.
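Regardless of where you read the help file of a function, all help files are broken down into sections:
- Description: what the function does
- Usage: how to call the function, including default values of arguments
- Arguments: what each argument means
- Details: additional notes on how the function works
- Value: what the function returns
- Examples: code showing how to use the function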
Different functions might have different sections, but these are the main ones you should be aware of.
From within the function help page, you can highlight code in the Examples and hit Ctrl + Return to run it in the RStudio console. This gives you a quick way to get a feel for how a function works.
To display a help file in RStudio's Help panel, you can type in the Console the function name preceded by a ?. For example:
?print
...or you can use the help() function. For example:
help(print)
...or you can use RStudio's Help panel by clicking on the Help tab and typing the name of the function in question in the search bar within the Help panel.
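If you only want a quick reminder of a function's arguments, a lighter alternative to opening the whole help file is sketched below. args() and formals() are base R functions that were not introduced above; sd is used only as an example.

```r
# args() shows a function's argument list without opening the help file
print(args(sd))

# formals() returns the arguments and their default values as a list
sd_arguments <- formals(sd)
print(names(sd_arguments))

# example(mean) would run the code from the Examples section of ?mean
```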
Use the ? or the help() function to obtain the "Help files" of 5 functions from the list above. Can you understand what the required arguments are? If not, that is ok, I just want you to start getting used to reading and using "Help files".
A Variable is a user-defined label that can be used to name anything. I like to think of "Variables" as stickers that you can glue to anything to give it a name. In the photo above, the "Variable Name" is what you wrote on the sticker (in this case, "Sugar") and the "Variable Value" is whatever you glued your sticker to (in this case, a 1 g sugar cube).
In R, the way to create variables is with <-, -> and =. See examples below:
x <- 5 # Example 1
5 -> x # Example 2
x = 5 # Example 3
The convention of using the "arrow" symbols (i.e. <- and ->) comes from the precursor of R, the S Programming Language. I personally prefer using = because <- requires you to type two characters (one needing the "shift" key); also the majority of other languages (e.g. Python, C++, etc.) use =. However, note that = is not always interchangeable with <-: inside the parentheses of a function call, = names an argument instead of creating a variable (more on this later).
Either way, <- is by far the most used way to create variables in the R community, so we'll stick with it in this course.
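The sketch below shows the three assignment forms side by side, plus one situation where = and <- behave differently. The variable names are only for illustration.

```r
x <- 5
5 -> y
z = 5
print(c(x, y, z))

# Inside a function call, = names an argument; it does not create a variable
m1 <- mean(x = c(1, 2, 3))   # here x is the argument name of mean()
print(x)                     # the variable x still holds 5

# With <- inside the call, a new variable w is created as a side effect
m2 <- mean(w <- c(1, 2, 3))
print(w)
```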
In generic terms, the proper nomenclature to "declare" a variable is as follows:
VariableName <- VariableValue
Let's make a variable:
Type the line below in the Console and press [Enter]
x <- 5
Even though it looks like nothing happened, when you pressed [Enter], R created the new variable x in memory and assigned the value 5 to it. You can see the new variable x, and its value 5, in the Environment panel. Alternatively, you can type the "Variable Name" in the Console to get the "Variable Value":
Type the line below in the Console and press [Enter]
x
Let's make another variable:
Copy-paste the code below into the Console and press [Enter]:
y <- 3000
y
Above, we bundled two steps in one entry: (1) we created the variable y with the value 3000, and (2) we displayed the value of y to screen.
Let's do two more:
Copy-paste the code below into the Console and press [Enter]:
a = 0
a
my_cute_variable <- 35
my_cute_variable
While you can name your variables anything you want, it is good practice to choose variable names that describe what is stored within the variable. Temperature is better than col1, number_eggs is better than e, TransectDensity is better than td.
You can use variables to "label" the output from a function. In this case, the following syntax applies:
VariableName <- FunctionName()
For example:
x <- Sys.time()
print(x)
[1] "2020-12-29 17:35:31 AST"
As we saw in the previous lab, you can use RStudio's Environment panel to see what variables are currently loaded into memory, as well as their values.
You can also see a list of what variables are currently loaded into memory by executing the function ls() in RStudio's console:
Type the line below in the Console and press [Enter]
ls()
In the Console, you should see the variables currently loaded in the memory of your computer.
When you create a Variable, you assign a Variable Value to it. However, you can change its Variable Value at any time simply by assigning a different value to it (kind of "overwriting" or "updating" its value). See below:
x <- 4 # Assign value of 4 to variable "x"
print(x) # Check the value of "x"
# Let's change the value of "x"
x <- 2986 # Assign value of 2986
print(x) # Check again the value of "x"
[1] 4
[1] 2986
a <- 4 # Make variable "a"
ls() # Check which variables exist in memory
rm(a) # Remove variable "a"
ls() # Check again which variables exist in memory
- Create a variable with the Variable Name X and the Variable Value 1000.
- Use the ls() function to double-check that your new variable X is indeed present in memory
- Use the rm() command to remove X
- Check that X is no longer in the Environment Panel
- Use the ls() function to double-check that the variable X is indeed not present in memory
To remove ALL variables at once, use rm(list = ls())
ls() # Check which variables exist in memory
rm(list = ls()) # Remove ALL variables
ls() # Check again which variables exist in memory
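Besides looking at the output of ls(), you can ask R directly whether a given variable is in memory. The sketch below uses exists(), a base R function not introduced above, and a made-up variable name.

```r
tmp_value <- 1000
before <- exists("tmp_value")   # the variable is in memory
rm(tmp_value)
after <- exists("tmp_value")    # the variable has been removed
print(c(before, after))
```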
I mentioned earlier that R is an object-oriented language, where everything is an object. The "basic data types" are the simplest and most basic kinds of objects. All your data must be represented with one of these data types. Each data type behaves under different rules, thus it is important to always be aware of what data type each statement is working with. Below are some of the most common data types used in R:
- Double (or numeric): 2, 15.5 (these are numbers allowed to have decimals)
- Integer: 2L (these are "round" numbers; the L tells R to store this as an integer)
- Character: "a", "2", "hello" (these are letters)
- Logical: TRUE or FALSE
You can use the typeof() function to query any variable to see what data type it is made of.
Let's dive a bit deeper into each different basic data type.
Double (or numeric) objects are numbers that are allowed to have a decimal point.
x <- 4.67
typeof(x)
Integers are "round" numbers (i.e. no decimal point). Note that you have to add a capital L at the end to specify that the number is an integer. If you do not add an L, R will store the number as a double with .0 as its decimal part.
y <- 4L
typeof(y)
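The difference between a double and an integer with the same value can be sketched as follows (variable names are only for illustration):

```r
a <- 4     # no L: stored as a double
b <- 4L    # with L: stored as an integer
print(typeof(a))
print(typeof(b))

print(a == b)            # the values are equal...
print(identical(a, b))   # ...but the objects are not identical (different types)

# Mixing types in arithmetic: the result becomes a double when needed
print(typeof(b + 1L))    # integer + integer
print(typeof(b + 1))     # integer + double
print(typeof(b / 2L))    # division always gives a double
```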
Characters are letters. Note that you need to wrap the content of the string within quotes to tell R that it is a character.
a <- "hello" # Example 1
b <- "2.3" # Example 2
typeof(a)
typeof(b)
Note that even though 2.3 is a number, because we declared it with quotes (i.e. "2.3"), it will be treated by R as "letters". Therefore, you won't be able to do math with "2.3".
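Here is a sketch of what happens when you try to do math with "2.3", and one way around it. as.numeric() is a base R function that was not introduced above; tryCatch() is used only so that the error does not stop the script.

```r
b <- "2.3"

# Adding a number to a character fails; tryCatch() captures the error message
result <- tryCatch(b + 1, error = function(e) conditionMessage(e))
print(result)

# Converting the character to a number makes the arithmetic possible
b_num <- as.numeric(b)
print(typeof(b_num))
print(b_num + 1)
```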
Logicals can only have one of two values: TRUE or FALSE. They are used to represent "true or false" statements.
x <- FALSE
typeof(x)
Now that you know how to make numeric, integer, character and logical elements, you can start doing arithmetic, relational and logical operations. Below is a "cheat-sheet" of all the operators in R:
# Arithmetic Operators -------
+ # Addition
- # Subtraction
* # Multiplication
/ # Division
^ # Exponent
%% # Modulus (Remainder from division)
%/% # Integer Division
# Relational Operators --------
< # Less than
> # Greater than
<= # Less than or equal to
>= # Greater than or equal to
== # Equal to
!= # Not equal to
# Logical Operators -----------
! # Logical NOT
& # Element-wise logical AND
&& # Logical AND (for single values)
| # Element-wise logical OR
|| # Logical OR (for single values)
# Assignment Operators ---------
<-, <<-, = # Leftwards assignment
->, ->> # Rightwards assignment
We won't dive into all of the operators at this point, but here are a few examples:
7 * 2
x <- 8 - 3
x
2^3
7 > 2
9 >= 18
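A few more examples, covering operators from the cheat-sheet that are less familiar from everyday math (the variable names are only for illustration):

```r
remainder <- 17 %% 5     # Modulus: what is left over after dividing 17 by 5
quotient  <- 17 %/% 5    # Integer division: how many whole times 5 fits in 17
print(c(remainder, quotient))

both   <- (7 > 2) & (9 >= 18)   # AND: TRUE only if both sides are TRUE
either <- (7 > 2) | (9 >= 18)   # OR: TRUE if at least one side is TRUE
negate <- !(7 > 2)              # NOT: flips TRUE to FALSE
print(c(both, either, negate))

# The element-wise operators compare vectors position by position
elementwise <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)
print(elementwise)
```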
We mentioned that everything in R is an object, and that the "Basic Data Types" are the simplest type of objects. "Data Structures" are also objects, however a bit more complicated. "Data Structures" are designed to contain groups of other objects... thus, I like to call them "Container" objects.
In the photo above (i.e. jar of sugar cubes), the "Variable Name" is what you wrote on the sticker (in this case, "Sugar"), the "Variable Value" is whatever you glued your sticker to (in this case, a jar with 28 one-gram sugar cubes). The jar is a "Container" object and each of the sugar cubes is a different object, each with its own "Basic Data Type" and value.
The main "Data Structures" that we will discuss here are:
- Vectors
- Matrices
- Lists
- Data frames
- Factors
You can use the function class() to query the type of "Data Structure" of a variable, for example:
- class(x) will return matrix (together with array in R 4.0 and later), if x is a Matrix Data Structure
- class(x) will return list, if x is a List Data Structure
- class(x) will return data.frame, if x is a Data Frame Data Structure
- class(x) will return factor, if x is a Factor Data Structure
- class(x) will return the Basic Data Type of the elements within x, if x is a Vector Data Structure (see the two examples below).
A vector is a collection of elements of the same "Basic Data Type" (i.e. numeric, integer, character, or logical). You can think of a "vector" as one single column in a spreadsheet. Technically, vectors can be either (1) atomic vectors (i.e. all elements of the same "Basic Data Type"), or (2) lists (i.e. elements can be of different types). However, the term “vector” most commonly refers to "atomic vectors", and "list vectors" are referred to simply as "lists", which we will discuss in a section below.
Vectors are particularly well suited to do matrix algebra.
To make a vector, use the "combine" function, c.
x <- c(5, 2, 7, 4)
x
class(x)
Note that class(x) returned 'numeric' because ALL the elements within x are of Numeric Basic Data Type.
x <- c("a", "b", "c", "d")
x
class(x)
Note that class(x) returned 'character' because ALL the elements within x are of Character Basic Data Type.
In the example below, let's insert a 'numeric' element among a bunch of 'character' elements to see what happens:
x <- c("a", "b", 7, "d")
x
class(x)
Note that R changed the "number" 7 for a "character" '7'. An atomic vector can only hold one data type, so when you mix types R converts every element to the most flexible type present (in increasing order: logical, integer, double, character). It does not matter how many elements of each type there are: a single 'character' element is enough to turn the whole vector into 'character'. This behaviour of R is called "coercion". It is handy, but it can also get the programmer into trouble, because it changes the data type of an element without warning.
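The sketch below illustrates coercion with a few mixed vectors. The vector names are made up for illustration; note in particular v1, where a single character element among four numbers is enough to change the whole vector.

```r
v1 <- c(1, 2, 3, 4, "a")     # one character among numbers
v2 <- c(TRUE, 2L)            # logical mixed with integer
v3 <- c(2L, 3.5)             # integer mixed with double
v4 <- c(TRUE, 3.5, "a")      # all three mixed

print(typeof(v1))
print(typeof(v2))
print(typeof(v3))
print(typeof(v4))
print(v1)   # the numbers are now stored as text
```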
Matrices are an extension of vectors. They are simply a vector with dimensions of "number of rows" by "number of columns". As with vectors, the elements of a matrix must be of the same Data Type.
Just like vectors, matrices are particularly well suited to do matrix algebra.
To make a Matrix, use the function matrix()
x <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), nrow=4, ncol=3)
x
class(x)
1 | 5 | 9 |
2 | 6 | 10 |
3 | 7 | 11 |
4 | 8 | 12 |
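As the output above shows, matrix() fills the matrix column by column. The sketch below uses the byrow argument of matrix() to fill row by row instead, and shows a small piece of matrix algebra. The variable names are only for illustration.

```r
x_col <- matrix(1:12, nrow = 4, ncol = 3)                 # filled column by column (default)
x_row <- matrix(1:12, nrow = 4, ncol = 3, byrow = TRUE)   # filled row by row

print(dim(x_col))    # number of rows, number of columns
print(x_col[1, ])    # first row when filling by column
print(x_row[1, ])    # first row when filling by row

# Matrix algebra: transpose with t() and matrix product with %*%
product <- t(x_col) %*% x_col
print(dim(product))
```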
Lists are like vectors, but without the restriction of having their contents of a single Data Type. You can mix numeric, integer, character, and logical elements within a single List. Lists are sometimes called "generic vectors", because the elements of a list can be of any type of R object, even lists containing further lists. This property makes them fundamentally different from the atomic vectors that we discussed above.
To make a list, use the function list()
x <- list(1L, "a", 4, 6.87, TRUE, "hello")
x
class(x)
Below is an example of a list that contains another list. We'll insert the list created in the code above as the 3rd element of the list below, y:
# Example of lists within a list
y <- list(4, "world", x, 3.67)
y
class(y)
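To inspect what is stored inside a list without printing all of it, one option is sketched below. sapply() and str() are base R functions not introduced above; sapply() applies a function to every element of the list.

```r
x <- list(1L, "a", 4, 6.87, TRUE, "hello")
y <- list(4, "world", x, 3.67)

print(length(y))          # number of elements in y
print(sapply(y, class))   # class of each element of y
print(sapply(y, length))  # length of each element (the nested list has 6)
str(y)                    # compact overview of the whole structure
```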
Data frames are a very important data structure in R. They are pretty much the de facto data structure for most tabular data, and what we use in the biology field and in plotting in general. Data frames can be created by hand, but most commonly they are generated when importing spreadsheets from your hard-drive or the web.
Remember that a "matrix" is a special type of "vector" with multiple rows and columns? Well, similarly, a Data Frame is a special type of list with multiple rows and columns, where every element of the list has the same length (i.e. a data frame is a “rectangular” list). As with lists, the elements of a Data Frame can be of any type of R object (i.e. numeric, integer, character, logical, lists and other Data Frames).
To make a data frame, use the data.frame() function:
x <- data.frame(id = 1:10, x = 11:20, y = 21:30)
x
class(x)
id | x | y |
---|---|---|
1 | 11 | 21 |
2 | 12 | 22 |
3 | 13 | 23 |
4 | 14 | 24 |
5 | 15 | 25 |
6 | 16 | 26 |
7 | 17 | 27 |
8 | 18 | 28 |
9 | 19 | 29 |
10 | 20 | 30 |
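The example above only has integer columns. The sketch below builds a data frame whose columns have different data types; the column names and values are made up purely for illustration, and stringsAsFactors = FALSE is set explicitly so that the text column stays as character in any version of R.

```r
# Made-up values, for illustration only
whales <- data.frame(
  species  = c("beluga", "dolphin", "narwhal"),
  length_m = c(4.5, 2.5, 4.7),
  arctic   = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

print(whales)
print(dim(whales))             # number of rows, number of columns
print(sapply(whales, class))   # data type of each column
```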
Factors are used to represent categorical data. Conceptually, factors are variables in R which take on a limited number of different values (e.g. "male" and "female"). One of the most important uses of factors is in statistical modeling; since categorical variables enter into statistical models differently than continuous variables, storing data as factors ensures that the modeling functions will treat such data correctly.
Factors in R are stored as a vector of integer values with a corresponding set of character values to use when the factor is displayed. The factor function is used to create a factor. The only required argument to factor is a vector of values which will be returned as a vector of factor values. Both numeric and character variables can be made into factors, but a factor's levels will always be character values. Factors represent a very efficient way to store character values, because each unique character value is stored only once, and the data itself is stored as a vector of integers.
Below is an example:
data = c("beluga", "dolphin", "narwhal", "dolphin", "dolphin", "narwhal", "beluga", "beluga")
fdata = factor(data)
fdata
You can see the possible levels for a factor through the levels() function:
levels(fdata)
... and the number of levels using the nlevels() function:
nlevels(fdata)
nlevels(fdata)
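To see the "vector of integers" that sits behind a factor, you can convert it with as.integer(). The sketch below repeats the whale example from above and also counts how many times each level occurs with table(); both are base R functions not introduced above.

```r
data <- c("beluga", "dolphin", "narwhal", "dolphin", "dolphin", "narwhal", "beluga", "beluga")
fdata <- factor(data)

codes <- as.integer(fdata)   # the integer code stored for each value
print(codes)
print(levels(fdata)[codes])  # looking the codes up in the levels recovers the text

counts <- table(fdata)       # number of occurrences of each level
print(counts)
```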
There are multiple ways to access or replace values inside data structures. The most common approach is to use “indexing”. This is also referred to as “slicing”.
Note that brackets [ ] are used for indexing, whereas parentheses ( ) are used to call a function.
Consider the following vector x and list y:
x <- c(5, 2, 7, 4)
y <- list(6, "hello", FALSE, x, 3.67, "world", 8L)
To get the first element of vector x:
x[1]
If the vector x is graphically represented by the blue grid below... then x[1] would access the red box:
To get the second element of list y:
y[2]
If the list y is graphically represented by the blue grid below... then y[2] would access the red box:
Using single brackets [ ] on a list returns another (shorter) list, not the value itself. To get to the contents inside a "list element" you need to use double brackets [[ ]]:
y[2]
class(y[2])
y[[2]]
class(y[[2]])
So, for example, y[1] + y[5] will return an error, because you cannot "add" two lists. To add the two values you need to do:
y[[1]] + y[[5]]
To get a range of elements use the colon : operator. For example, to get the first 3 elements of vector x:
x[1:3]
If the vector x is graphically represented by the blue grid below... then x[1:3] would access the red box:
To get the 5th, 6th and 7th elements of list y:
y[5:7]
If the list y is graphically represented by the blue grid below... then y[5:7] would access the red box:
To access an object stored within a list (here, the vector x stored in list y), you first have to get to the contents of the list element, using double brackets [[ ]]:
# Here is element 4 of list 'y' (it holds the vector x)
y[4]
# Here is the 3rd element of the vector stored inside element 4 of list 'y'
y[[4]][3]
If the list y is graphically represented by the blue grid below... then y[[4]][3] would access the red box:
If you need the last element of a vector or list, but don't know how long the vector or list is, use the function length():
# To retrieve the last element of x
x[length(x)]
# To retrieve the last 3 elements of x
x[(length(x)-2):length(x)]
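An alternative worth knowing about is the tail() function, which is not used above but does the same job with less typing. The sketch below also shows negative indices, which drop elements instead of selecting them.

```r
x <- c(5, 2, 7, 4)

last_one   <- tail(x, 1)   # last element
last_three <- tail(x, 3)   # last 3 elements
print(last_one)
print(last_three)

# Same result as the length() approach
print(identical(last_three, x[(length(x) - 2):length(x)]))

# A negative index removes that position: everything except the first element
print(x[-1])
```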
Logical Indexing is used when you need to retrieve elements according to a logical expression. The numeric comparisons below are meant for vectors whose elements are all of numeric or integer data types.
x[x > 3]
x[x <= 4]
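It helps to see that logical indexing happens in two steps: first the expression inside the brackets produces a vector of TRUE/FALSE values, and then only the elements marked TRUE are kept. A minimal sketch (the name mask is only for illustration, and which() is a base R function not introduced above):

```r
x <- c(5, 2, 7, 4)

mask <- x > 3        # step 1: one TRUE/FALSE per element
print(mask)
print(x[mask])       # step 2: keep the elements where mask is TRUE

print(which(mask))   # positions of the TRUE values

# Conditions can be combined with the element-wise operators & and |
print(x[x > 3 & x < 7])
```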
Consider the following matrix m:
m <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), nrow=4, ncol=3)
m
1 | 5 | 9 |
2 | 6 | 10 |
3 | 7 | 11 |
4 | 8 | 12 |
To get to a single number within the matrix, use a pair of indices where the first index is the row number and the second index is the column number:
m[1,2]
Rather than using pairs, you can also use a single index. You can think of this index as a “cell number”. Cells are numbered column-wise (i.e., first the rows in the first column, then the second column, etc.). Thus, the statement below returns the 6th cell (i.e. row 2 of column 2):
m[6]
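The relation between the "cell number" and the row/column pair can be written down explicitly. In the sketch below, r and k are made-up names for the row and column of interest.

```r
m <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), nrow = 4, ncol = 3)

r <- 2   # row of interest
k <- 2   # column of interest

# Cells are counted down each column, so skip (k - 1) full columns, then go down r rows
cell <- (k - 1) * nrow(m) + r
print(cell)
print(m[cell] == m[r, k])   # both ways of indexing reach the same value
```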
You can also get multiple values at once:
m[1:2,2:3]
5 | 9 |
6 | 10 |
To get a whole column, use a comma before the column index. The statement below reads "return all rows of column 1":
m[, 1]
To get a whole row, use a comma after the row index. The statement below reads "return row 1... all columns":
m[1,]
Getting a whole row or column from a matrix returns a vector:
class(m)
class(m[, 1])
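If you need the result to stay a matrix, the [ ] operator accepts an extra drop = FALSE argument. A minimal sketch (variable names are only for illustration):

```r
m <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), nrow = 4, ncol = 3)

col_vec <- m[, 1]                 # the matrix structure is dropped
col_mat <- m[, 1, drop = FALSE]   # the matrix structure is kept

print(is.matrix(col_vec))
print(is.matrix(col_mat))
print(dim(col_mat))               # 4 rows, 1 column
```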
Consider the following data frame d:
d <- data.frame(id = 1:10, x = 11:20, y = 21:30)
d
id | x | y |
---|---|---|
1 | 11 | 21 |
2 | 12 | 22 |
3 | 13 | 23 |
4 | 14 | 24 |
5 | 15 | 25 |
6 | 16 | 26 |
7 | 17 | 27 |
8 | 18 | 28 |
9 | 19 | 29 |
10 | 20 | 30 |
You can extract a column by column number:
d[,2]
Here is an alternative way to address the column number in a data frame:
d[2]
x |
---|
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
Note that whereas [2] would be the second element in a matrix, it refers to the second column in a data.frame. This is because a data.frame is a special kind of list and not a special kind of matrix.
You can also use the column name to get the values of a column:
d[,"x"]
In addition to d[,"x"] above, you can also use the $ symbol.
Using the $ symbol is a very common practice to slice a column of a data.frame:
d$x
d["x"]
x |
---|
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
Note that both d[,"x"] and d$x return a vector. That is, the complexity of the data.frame structure was dropped. This does not happen when you do d["x"], where the output remains a data.frame. Take a look:
class(d[,"x"]) # This drops the data.frame and returns a vector
class(d$x) # This drops the data.frame and returns a vector
class(d["x"]) # This preserves the data.frame
Why should you care about this drop business? In many cases R functions want a specific data type, such as a matrix or data.frame, and report an error if they get something else. One common situation is that you think you are providing data of the right type, such as a data.frame, but in fact you are providing a vector, because the structure was dropped.
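One way to guard against this situation is sketched below: is.data.frame() checks what you actually have, and the drop = FALSE argument of [ ] keeps the data.frame structure when selecting a single column. Neither was introduced above.

```r
d <- data.frame(id = 1:10, x = 11:20, y = 21:30)

print(is.data.frame(d[, "x"]))                 # structure dropped: a vector
print(is.data.frame(d[, "x", drop = FALSE]))   # structure kept
print(is.data.frame(d["x"]))                   # structure kept
```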
Either way, you can use [ ] with all three approaches to get to a specific value within a column:
d[["x"]][2]
d$x[2]
d$x[2:4]
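The indexing tools above can be combined. The sketch below uses logical indexing to keep only some rows of d, and then uses indexing on the left side of <- to replace a value; the threshold 15 and the new value 100 are arbitrary.

```r
d <- data.frame(id = 1:10, x = 11:20, y = 21:30)

# Keep the rows where column x is greater than 15 (all columns)
big_x <- d[d$x > 15, ]
print(big_x)

# Same rows, but only column y
y_of_big_x <- d[d$x > 15, "y"]
print(y_of_big_x)

# Replace the 2nd value of column x
d$x[2] <- 100
print(d$x[1:3])
```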
Some parts of this lab were borrowed from: