LAB 4: Plotting and basic data exploration

BIO3782: Biologist's Toolkit (Dalhousie University)


Setup of workspace

Make sure all required files are in the working directory:

As in previous labs, we'll try to simulate "real-life" coding, where:

  1. Sometimes you want to type or copy-paste code directly into a .r file, using RStudio's Source editor, to build a script file that can be run as a whole to accomplish a series of tasks. This file lets you save your work so that you can review it (or expand it) at a later date. In the Source editor you can run sections as you type them, by highlighting them and clicking Run. We'll tell you when to use the Source editor by displaying the following before a "code" cell:



  2. Some other times you want to quickly do "one-off" queries or inspections that you do not want to be part of your script file. In this case, type or copy-paste code into RStudio's Console and press [enter]. We'll tell you when to use the Console by displaying the following before a "code" cell:


In this lab we will do the initial exploration of a dataset of animal species diversity and weights. The initial exploration involves:

  1. basic data exploration, and
  2. plotting the data

Basic data exploration

When you first get a dataset, there are some basic steps that you should take before engaging in proper data analysis. These steps are:


There are many R functions designed to help you explore and understand your data. Some of the most commonly used are:

Reading the data into R

To get R to load the data from the file into memory we use the read.csv function.




First, make sure you have the data file surveys_complete.csv in your working directory.

Then, read it into a data frame named surveys:
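A minimal sketch of this step. In the lab, the single commented-out read.csv() line is all you need; so that the snippet also runs where surveys_complete.csv is absent, it writes a tiny CSV with made-up values (not the real survey data) to a temporary file and reads that instead.

```r
# In the lab (file in your working directory):
# file.exists("surveys_complete.csv")   # should be TRUE
# surveys <- read.csv("surveys_complete.csv")

# Self-contained stand-in: a tiny CSV with made-up values
csv_path <- tempfile(fileext = ".csv")
writeLines(c("record_id,species_id,genus,weight,hindfoot_length",
             "1,DM,Dipodomys,40,37",
             "2,DO,Dipodomys,52,35",
             "3,PP,Chaetodipus,17,21"),
           csv_path)

# read.csv() returns a data frame; the first row becomes the column names
surveys <- read.csv(csv_path)
class(surveys)   # "data.frame"
```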





View data contents

Head and tail functions

The easiest and fastest way to take a peek at your data is with the head() and tail() functions.

head() returns the first few rows of the data frame or vector.

tail() returns the last few rows of the data frame or vector.

Let's try head() first:
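As a sketch, here are head() and tail() on a small made-up data frame standing in for surveys; with the real data you would simply call head(surveys) and tail(surveys). Both show 6 rows by default, and the n argument changes that.

```r
# Made-up stand-in for surveys: 10 rows
surveys <- data.frame(record_id = 1:10,
                      weight    = seq(10, 100, by = 10))

head(surveys)            # first 6 rows (default n = 6)
tail(surveys)            # last 6 rows
head(surveys, n = 3)     # first 3 rows only
tail(surveys$weight, 2)  # also works on a vector: its last 2 elements
```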





Use the tail() function to take a look at the last few rows of your data.





What is the last record_id of the data?

RStudio's spreadsheet-style data viewer

Another way to look at your data is by displaying it in RStudio. You can do that by:

  1. Using the View() function (e.g. type View(surveys) in RStudio's Console and press [enter])
  2. Going to the Environment panel and double-clicking on the variable you want to view (e.g. surveys)




Using either of the two methods above, take a look at surveys data in RStudio's spreadsheet-style data viewer.





What is the record_id of the 20th data row?

Explore data size and structure

You can use the function names() to take a look at the column "headers".


Note that names() returns a vector of character elements:


The function str() compactly displays the internal structure of an R object.


The function dim() returns both the number of rows and the number of columns:


The function nrow() returns only the number of rows:


The function ncol() returns only the number of columns:
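The five functions above can be tried together on a small made-up data frame (a stand-in for surveys, not the real data):

```r
# Made-up stand-in for surveys
surveys <- data.frame(
  record_id  = 1:4,
  species_id = c("DM", "DO", "PP", "DM"),
  weight     = c(40, 52, 17, 45)
)

names(surveys)   # "record_id" "species_id" "weight"
str(surveys)     # rows and columns, then one line per column (name, type, first values)
dim(surveys)     # 4 3  (rows, columns)
nrow(surveys)    # 4
ncol(surveys)    # 3
```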




Use the ncol() function to figure out how many columns are in your data.





How many columns are in the "surveys" data?

Basic statistics

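The summary() function returns a descriptive summary of each of the columns in a data frame. Note that "numeric" columns are described with statistics like min, max, mean, median, etc.; while "factor" columns are described with counts of their most common levels. Since R 4.0.0, however, read.csv() keeps text columns as character vectors by default (stringsAsFactors = FALSE), and summary() only reports the length, class and mode of a character column.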
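A sketch of both behaviours on made-up data: summary() on a character column, the same column converted to a factor, and table() with sort() as one way (not the only one) to list the most abundant species_id values.

```r
# Made-up stand-in for surveys
surveys <- data.frame(
  species_id = c("DM", "DM", "DO", "PP", "DM", "PP", "DO", "OL"),
  weight     = c(40, 48, 52, 17, 45, 21, 50, 26)
)

summary(surveys)   # species_id is character: only Length, Class and Mode

# Converting to a factor makes summary() count each level
surveys$species_id <- factor(surveys$species_id)
summary(surveys$species_id)

# table() counts each value; sort(decreasing = TRUE) puts the most abundant first
counts <- sort(table(surveys$species_id), decreasing = TRUE)
head(counts, 3)
```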



Which are the 3 most abundant species in the "surveys" data?

The summary function no longer returns counts of the most abundant species_id. This question is invalid. I will fix it for next year. In the meantime, the correct answers are: DM, PP and DO.

Introduction to plotting

The ability to produce clear, informative graphics is among the most important skills a biologist can develop. Graphics are essential not only to communicate findings effectively to your audience, but also to explore your data visually and quickly (i.e. to communicate findings to yourself).

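In programming lingo, plotting can be divided into low-level graphics, where you build a plot piece by piece (e.g. base R), and high-level graphics, where a package such as ggplot2 draws a complete plot from a few commands.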

While high-level packages are a great innovation that help save time and can produce production-quality graphics, low-level skills are ultimately more powerful, putting no limits on what you can produce.

An example:

This is a somewhat complex plot to produce in anything but a base-level plotting package - each point, line, shade, and colour has been custom edited, making automation meaningless.

So if low-level graphics are the way to produce final, custom graphics, why learn anything else? Because final graphics are only one component of the analytical process useful to biologists; what comes first is data exploration: looking at and thinking about complex data to see what the key patterns are, and to consider things you may not have thought of before you designed the study. This figure (stolen from Sean Anderson's website) illustrates these trade-offs nicely:


Here you can see that R base graphics is initially more time consuming than high level graphics in something like ggplot, and that base graphics generally don't scale well (in terms of your time) as data becomes more complex. As Anderson articulately states:

Good graphical displays of data require rapid iteration and lots of exploration. If it takes you hours to code a plot in base graphics, you're unlikely to throw it out and explore other ways of visualizing the data, and you're unlikely to explore all the dimensions of the data.

To get started, we will do some low-level plotting with base R, then move on to ggplot2 to see how to use both approaches.

Base R plotting

The 'base' package in R - that is, the functions that come pre-loaded with every installation - includes all the low-level plotting functions that underpin how R produces graphics. High-level graphics packages manipulate these functions (under the hood) to produce graphics that guess at what 'good' should look like, with a minimum number of commands. But these underlying commands are important if you want to produce a truly good graphic.

NB: some R users advocate for creating your graphics in R and then manipulating them in something like Adobe Photoshop or Illustrator before submission. In some cases this is necessary (essential even) but in my experience of revision after revision, it is far better to do as much as possible in your plot scripts, because every time you revise an image you'll have to re-open your graphics program and do all the tweaks over again. So basic plotting...

Histograms

A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set of continuous data. In R we use the hist() function, where you can use a "slice" of your data frame as the input (review "slicing" - https://diego-ibarra.github.io/biol3782/week3/R_Basics.ipynb#Indexing-(slicing) ). For example, if you want to make a histogram of the weight column of your surveys data frame, you can do:


You can change the appearance of the graph using arguments. Take a look at the help file of the hist() function by typing ?hist in the Console.

Below we made the same graph, but with red bars and custom-made axis labels:


To improve readability, it is common to split the arguments across several lines, particularly when using many arguments (more than fit on one line of code). For example, to make the same graph above, you can do:
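A sketch of the versions described above (default, then customized with one argument per line), using 200 made-up weights drawn from a log-normal distribution in place of surveys$weight. Note that a single number passed to breaks is only a suggestion: R picks "pretty" bin edges close to that number.

```r
set.seed(1)
# Made-up stand-in for surveys (200 positive weights, in grams)
surveys <- data.frame(weight = round(rlnorm(200, meanlog = 3.5, sdlog = 0.5)))

# Quick look: default colours, labels and bins
hist(surveys$weight)

# Same data, red bars and custom labels, one argument per line
h <- hist(surveys$weight,
          col    = "red",          # bar colour
          breaks = 25,             # suggested number of bins
          xlab   = "Weight (gr)",  # x-axis label
          main   = "Survey data")  # plot title

# hist() also returns (invisibly) the bins and counts it drew
head(data.frame(mids = h$mids, counts = h$counts))
```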



What class of object is surveys?


In the example above, in the line...
hist(surveys$weight, col = "red", breaks = 25, xlab="Weight (gr)", main = "Survey data")

What is hist?


In the example above, in the line...
hist(surveys$weight, col = "red", breaks = 25, xlab="Weight (gr)", main = "Survey data")

What is surveys$weight?


In the example above, in the line...
hist(surveys$weight, col = "red", breaks = 25, xlab="Weight (gr)", main = "Survey data")

What is col = "red"?


In the example above, in the line...
hist(surveys$weight, col = "red", breaks = 25, xlab="Weight (gr)", main = "Survey data")

What is breaks = 25?


In the example above, in the line...
hist(surveys$weight, col = "red", breaks = 25, xlab="Weight (gr)", main = "Survey data")

What is xlab="Weight (gr)"?


In the example above, in the line...
hist(surveys$weight, col = "red", breaks = 25, xlab="Weight (gr)", main = "Survey data")

What is main = "Survey data"?


When we just need to take a quick look at the data, we do not need fancy colours or axis labels. However, often we may need to add more bars, to take a "finer" look at the pattern in frequency distribution:


If we want to only see the data of one genus, we can further slice our data using a conditional statement. For example:

surveys$weight[surveys$genus == "Dipodomys"]

This does not return all of the elements in the weight column of the surveys data frame; it only returns the elements for which the statement surveys$genus == "Dipodomys" is TRUE, that is, the rows whose genus is 'Dipodomys'. Take a moment to make sure you understand the example below:
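A sketch of this logical indexing step by step, on a made-up five-row stand-in for surveys: the comparison yields a TRUE/FALSE vector, and indexing with it keeps only the TRUE positions.

```r
# Made-up stand-in for surveys
surveys <- data.frame(
  genus  = c("Dipodomys", "Chaetodipus", "Dipodomys", "Onychomys", "Dipodomys"),
  weight = c(40, 17, 52, 26, 45)
)

is_dipo <- surveys$genus == "Dipodomys"   # TRUE FALSE TRUE FALSE TRUE
dipo_weights <- surveys$weight[is_dipo]   # keeps only the TRUE positions
dipo_weights                              # 40 52 45

hist(dipo_weights, xlim = c(0, 250),
     main = "Dipodomys", xlab = "Weight (gr)")
```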



Take a look at the histogram above. What kind of distribution best describes the frequency distribution of Dipodomys genus?

We can plot another genus and accurately compare it with the plot above, as long as we keep the range of the x axis the same in both plots; that is, both have the argument xlim = c(0,250):
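A sketch of such a comparison with made-up weights for two genera. Beyond the shared xlim, it also passes the same vector of bin edges to breaks, so both panels use identical bins; stacking the two panels with par(mfrow = ...) is an addition for convenience (par() is covered in the Figure dimensions section).

```r
set.seed(2)
# Made-up weights (grams) for two genera, not the real survey data
dipo  <- runif(100, min = 30, max = 70)
chaet <- runif(100, min = 5,  max = 40)

# Shared bin edges and shared x-range make the two histograms comparable
common_breaks <- seq(0, 250, by = 5)

op <- par(mfrow = c(2, 1))   # two panels, one above the other
h_dipo  <- hist(dipo,  breaks = common_breaks, xlim = c(0, 250),
                main = "Dipodomys",   xlab = "Weight (gr)")
h_chaet <- hist(chaet, breaks = common_breaks, xlim = c(0, 250),
                main = "Chaetodipus", xlab = "Weight (gr)")
par(op)                      # restore the previous panel layout
```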



Take a look at the histogram above. What kind of distribution best describes the frequency distribution of Chaetodipus genus?




Plot a histogram displaying the frequency distribution of Onychomys genus.





Take a look at the histogram above. What kind of distribution best describes the frequency distribution of Onychomys genus?

Boxplot

A boxplot — also called a box and whisker plot — displays the five-number summary of a set of data. The five-number summary is the minimum, first quartile, median, third quartile, and maximum. In a boxplot, we draw a box from the first quartile to the third quartile. In base R, the function to make boxplots is boxplot():


We can use the subset() function to make a data.frame that only includes a few species (here DM, DO and PP):
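A sketch with made-up data: subset() keeps the rows whose species_id is DM, DO or PP (the %in% operator tests membership in a set), and boxplot() with a formula draws one box of weight per species_id.

```r
# Made-up stand-in for surveys
surveys <- data.frame(
  species_id = c("DM", "DM", "DO", "PP", "DM", "PP", "DO", "OL", "PP", "DO"),
  weight     = c(40, 48, 52, 17, 45, 21, 50, 26, 19, 55)
)

# Keep only the rows whose species_id is DM, DO or PP
surveys_sub <- subset(surveys, species_id %in% c("DM", "DO", "PP"))

# Formula interface: weight (y axis) grouped by species_id (x axis)
b <- boxplot(weight ~ species_id, data = surveys_sub,
             xlab = "Species ID", ylab = "Weight (gr)")

b$stats   # one column per box: lower whisker, Q1, median, Q3, upper whisker
```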


Take a look at the help file of boxplot() and answer the following question:



After taking a look at the "help file" of boxplot, what does the argument range do?

Scatterplots

A scatterplot is a type of data display that shows the relationship between two numerical variables. Each member of the dataset is plotted as a point whose x-y coordinates correspond to its values for the two variables. To make a scatterplot, use the built-in function plot():


For more complicated scatterplots (actually, this applies to all base R graphs), you can (1) make a plot, then (2) add points, lines, etc. to the plot.

Take a look at the example below. You can see that it takes many lines of code to color-code 3 genera. Remember this when we do a similar plot below using ggplot where a better looking color-coded plot can be done with just 3 lines of code. This example shows the differences between low-level and high-level plotting.
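A sketch of the low-level approach with made-up values for three genera: a basic plot(x, y), then a colour-coded version built by drawing an empty frame, adding each genus with points(), and adding a legend() by hand. The colour names and pch = 16 (filled circles) are arbitrary choices.

```r
# Made-up stand-in for surveys: 4 animals in each of 3 genera
surveys <- data.frame(
  genus           = rep(c("Dipodomys", "Chaetodipus", "Onychomys"), each = 4),
  weight          = c(40, 48, 52, 45, 17, 21, 19, 15, 26, 30, 24, 28),
  hindfoot_length = c(37, 36, 35, 36, 21, 22, 20, 21, 20, 21, 19, 22)
)

# Simplest scatterplot: x first, then y
plot(surveys$weight, surveys$hindfoot_length)

# Colour-coded version, built layer by layer
genera <- c("Dipodomys", "Chaetodipus", "Onychomys")
cols   <- c("darkorange", "steelblue", "forestgreen")

# 1) Empty frame (type = "n") whose axes span all of the data
plot(surveys$weight, surveys$hindfoot_length, type = "n",
     xlab = "Weight (gr)", ylab = "Hindfoot length (mm)")

# 2) One call to points() per genus, each in its own colour
for (i in seq_along(genera)) {
  rows <- surveys$genus == genera[i]
  points(surveys$weight[rows], surveys$hindfoot_length[rows],
         col = cols[i], pch = 16)
}

# 3) The legend is not automatic in base R
legend("topleft", legend = genera, col = cols, pch = 16)
```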





From here on, you will need to make some plots and submit your code. Besides saving your code in your my_plots.r file, you will also need to copy-paste your code to the Brightspace quiz.


Note that, as you add code to make more and more plots in your my_plots.r file, you should also make sure to include any code needed to subset or filter your data before the code for the plot that uses that subset/filtered data.





1) Using the subset function from the boxplot example, create a data.frame that only includes the "DM" species_id
2) Create a scatterplot of weight vs hindfoot_length
3) Make markers "red"
4) Make x and y axis labels that include units (use graph above as example)
5) In RStudio's Plots panel, click on "Export", save your plot as Image, and upload the image to this Brightspace question
6) Copy-paste code: "Copy" the code you used to make the plot, "paste" code to this Brightspace question

Heatmaps and contour plots

Heatmaps are a colored representation of a 2-D matrix of data. They are essential when plotting maps with topography (like below) and any other spatial data.

Our surveys data has no spatial information and thus it is not suited for contour plotting. Luckily R comes with a spatial dataset (called volcano) to use in examples of this nature. Take a look at the plot below:
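volcano ships with R (datasets package): a matrix of elevations in metres for Maunga Whau (Mt Eden), Auckland, on a 10 m grid. A sketch of a heatmap with contour lines on top, plus a filled contour alternative; terrain.colors is just one palette choice.

```r
# volcano: a built-in matrix of elevations (rows x columns of a 10 m grid)
dim(volcano)   # 87 61

# Heatmap: each matrix cell drawn as a coloured rectangle
image(volcano, col = terrain.colors(50),
      main = "Maunga Whau (volcano dataset)")

# Contour lines of equal elevation, added on top of the heatmap
contour(volcano, add = TRUE)

# Alternative: filled contour plot with a built-in colour key
filled.contour(volcano, color.palette = terrain.colors)
```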


Plotting with ggplot2

ggplot2 is a plotting package that makes it simple to create complex plots from data in a data frame. It provides a more programmatic interface for specifying what variables to plot, how they are displayed, and general visual properties. Therefore, we only need minimal changes if the underlying data change or if we decide to change from a bar plot to a scatterplot. This helps in creating high quality plots with minimal amounts of adjustments and tweaking... saving a lot of time!

Here are a few useful links about ggplot2:

Also, take a look at some galleries to see examples and get ideas:

The basics

Before we start, we need the ggplot2 package. You probably already have it installed, but if not, type the following in the Console:


ggplot2 functions like data in the 'long' format, i.e., a column for every dimension and a row for every observation. Well-structured data will save you lots of time when making figures with ggplot2.

ggplot graphics are built step by step by adding new elements. Adding layers in this fashion allows for extensive flexibility and customization of plots.

To build a ggplot, we will use the following basic template that can be used for different types of plots:

ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>() + <FACET_GRID>()

where:

  1. ggplot(): start an object and specify the data

  2. aes(): specifies the "aesthetic" elements; a legend is automatically created

  3. geom_point(): specifies the type of plot we want (e.g. geom_point = scatter plot); a "type" is called a "geom"

  4. facet_grid(): specifies the "faceting" or panel layout

There are also statistics, scales, and annotation options, among others. At a minimum, you must specify the data, some aesthetics, and a geom. Faceting is useful when you want to create the same figure using subsets of your data.


Let's do an example:
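A sketch of that one-liner. It assumes ggplot2 is installed and uses a small made-up data frame in place of the real surveys data.

```r
library(ggplot2)

# Made-up stand-in for surveys
surveys <- data.frame(
  weight          = c(40, 48, 52, 17, 45, 21, 50, 26),
  hindfoot_length = c(37, 36, 35, 21, 36, 22, 34, 20)
)

p <- ggplot(data = surveys, aes(x = weight, y = hindfoot_length)) + geom_point()
p   # printing the ggplot object draws the plot
```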


Nice! One line of code to make a pretty good looking graph.

To better understand how ggplot works, let's break the one line of code into its three main elements:

First, let's ONLY use the ggplot() function and bind the plot to a specific data frame using the data argument. In this example, we deliberately do NOT specify aesthetics or a geom:


As you can see, this makes a plot, but it does not put anything inside it.

Aesthetics

Aesthetics refer to the attributes of the data you want to display. They map the data to an attribute (such as the size or shape of a symbol) and generate an appropriate legend. Aesthetics are specified with the aes() function.

As an example, the aesthetics available for geom_point() are: x, y, alpha, colour, fill, shape, and size. Read the help files to see the aesthetic options for the geom you're using. They're generally self-explanatory. Aesthetics can be specified within the main ggplot() function or within a geom(). If they're specified within the main ggplot() function, then they apply to all geoms you specify.

Note the important difference between specifying characteristics like colour and shape inside or outside the aes() function: those inside aes() are mapped to a variable, so the colour or shape is assigned automatically based on the data (and a legend is created), whereas those outside aes() set one fixed value for every element.

Let's continue with our piece-by-piece example. This time we will initialize the plot and bind it to a specific data frame using the ggplot() function (just as above), but we will also define an aesthetic mapping. Again, we deliberately do NOT specify a geom:


Here you can see that the plot was made, and also the x and y labels were automatically made as well as data ranges and grid lines.

Geoms

geoms in ggplot refer to geometric objects, i.e. what we typically think of as the contents of a plot: 'geoms' specify the graphical representations of the data in the plot (points, lines, bars). ggplot2 offers many different geoms.

Here are some common ones...

Here are others...

To add a geom to the plot, use + operator. Because we have two continuous variables, let's use geom_point() first. Note that here we - finally - initialize the plot, define an aesthetic mapping AND specify a geom:


The + in the ggplot2 package is particularly useful because it allows you to modify existing ggplot objects. This means you can easily set up plot "templates" and conveniently explore different types of plots, so the above plot can also be generated with code like this:
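The stages described above, plus the template idea, can be sketched as follows (made-up stand-in data; assumes ggplot2 is installed). Each stage is stored in a variable so you can print them one at a time.

```r
library(ggplot2)

# Made-up stand-in for surveys
surveys <- data.frame(
  weight          = c(40, 48, 52, 17, 45, 21, 50, 26),
  hindfoot_length = c(37, 36, 35, 21, 36, 22, 34, 20)
)

# Stage 1: data only -> an empty panel
stage1 <- ggplot(data = surveys)

# Stage 2: data + aesthetic mapping -> axes, labels and ranges, but no points
surveys_plot <- ggplot(data = surveys,
                       mapping = aes(x = weight, y = hindfoot_length))

# Stage 3: add a geom with + to draw the points
stage3 <- surveys_plot + geom_point()

# The same template reused with different geom settings
stage3_blue <- surveys_plot + geom_point(color = "blue", size = 3)

# Print each one to see it drawn
stage1
surveys_plot
stage3
stage3_blue
```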


Note that the newly made variable, surveys_plot, is a different kind of R object: it is a ggplot object!


Notes

# This is the correct syntax for adding layers
surveys_plot +
  geom_point()

# This will not add the new layer and will return an error message
surveys_plot
  + geom_point()

In the examples above, in the line...
library(ggplot2)

What is ggplot2?


In the examples above, in the line...
ggplot(data = surveys, aes(x = weight, y = hindfoot_length)) + geom_point()

What is ggplot?


In the examples above, in the line...
ggplot(data = surveys, aes(x = weight, y = hindfoot_length)) + geom_point()

What is data = surveys?


In the examples above, in the line...
ggplot(data = surveys, aes(x = weight, y = hindfoot_length)) + geom_point()

What is geom_point?


In the examples above, in the line...
ggplot(data = surveys, aes(x = weight, y = hindfoot_length)) + geom_point()

What is aes(x = weight, y = hindfoot_length)?


In the examples above, in the line...
surveys_plot = ggplot(data = surveys, mapping = aes(x = weight, y = hindfoot_length))

What is surveys_plot?


In the examples above, in the line...
surveys_plot = ggplot(data = surveys, mapping = aes(x = weight, y = hindfoot_length))

What type of object is surveys_plot?

Building your plots iteratively (scatterplot)

Building plots with ggplot2 is typically an iterative process: you make a plot, look at the result, improve the code, make the plot again, and repeat. We will follow this iterative process of plot-building in the next few sections.

First, we start by defining the dataset we'll use, lay out the axes, and choose a geom:


Transparency

Then, we start modifying this plot to extract more information from it. For instance, we can add transparency (alpha) to avoid overplotting:


Color

We can also add colors for all the points:


Or, to color each species in the plot differently, map a variable to the color aesthetic inside aes(). ggplot2 will assign a different color to each distinct value of that variable. Here is an example where we color by species_id:
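A sketch contrasting a fixed colour with a mapped one (made-up data; assumes ggplot2 is installed). alpha = 0.3 is an arbitrary transparency level; values closer to 0 are more transparent.

```r
library(ggplot2)

# Made-up stand-in for surveys
surveys <- data.frame(
  species_id      = c("DM", "DM", "DO", "PP", "DM", "PP", "DO", "OL"),
  weight          = c(40, 48, 52, 17, 45, 21, 50, 26),
  hindfoot_length = c(37, 36, 35, 21, 36, 22, 34, 20)
)

base_plot <- ggplot(data = surveys, mapping = aes(x = weight, y = hindfoot_length))

# Fixed settings (outside aes()): every point is semi-transparent and blue
p_fixed <- base_plot + geom_point(alpha = 0.3, color = "blue")

# Mapped setting (inside aes()): one colour per species_id, plus a legend
p_mapped <- base_plot + geom_point(alpha = 0.3, aes(color = species_id))

p_fixed
p_mapped
```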


Remember that when we used the base R plotting functions we had to write 16 lines of code to add 3 colors to our plot. Here, in ggplot we only needed 2 lines of code to quickly color-code all the species. This is the magic of high-level plotting.


We can change the color palette used to color the markers. There are many ways to do this; we could spend a whole lab talking about color. Here we will quickly review one way, using the package RColorBrewer.

The palettes available in RColorBrewer are:

In the example below we use the palette Dark2. Note that you may have to install and load the RColorBrewer package.
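A sketch of the colour-by-species plot using ggplot2's scale_color_brewer(), which applies a ColorBrewer palette to a discrete colour mapping (made-up data; assumes ggplot2 is installed). display.brewer.all() from RColorBrewer draws every available palette if you want to pick another one.

```r
library(ggplot2)

# Made-up stand-in for surveys
surveys <- data.frame(
  species_id      = c("DM", "DM", "DO", "PP", "DM", "PP", "DO", "OL"),
  weight          = c(40, 48, 52, 17, 45, 21, 50, 26),
  hindfoot_length = c(37, 36, 35, 21, 36, 22, 34, 20)
)

# To browse all palettes (needs RColorBrewer installed):
# RColorBrewer::display.brewer.all()

p <- ggplot(data = surveys, mapping = aes(x = weight, y = hindfoot_length)) +
  geom_point(alpha = 0.5, aes(color = species_id)) +
  scale_color_brewer(palette = "Dark2")   # ColorBrewer palette for discrete colours
p
```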


Labeling

There are 2 methods to replace the automatically-generated x- and y-axis labels, and/or add a plot title:

  1. You can use the xlab() and ylab() functions for the x- and y-axis labels, and the ggtitle() function for the plot title:


  2. ...or you can set all three labels with one function, labs():
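Both labeling methods, sketched side by side on made-up data (assumes ggplot2 is installed); the two resulting plots look the same.

```r
library(ggplot2)

# Made-up stand-in for surveys
surveys <- data.frame(
  weight          = c(40, 48, 52, 17, 45, 21, 50, 26),
  hindfoot_length = c(37, 36, 35, 21, 36, 22, 34, 20)
)

base_plot <- ggplot(data = surveys, mapping = aes(x = weight, y = hindfoot_length)) +
  geom_point(alpha = 0.5)

# Method 1: one function per label
p1 <- base_plot +
  xlab("Weight (gr)") +
  ylab("Hindfoot length (mm)") +
  ggtitle("Survey data")

# Method 2: all labels in one call
p2 <- base_plot +
  labs(x = "Weight (gr)", y = "Hindfoot length (mm)", title = "Survey data")

p1
p2
```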


Hexagon 2-D histogram

Scatter plots can be useful exploratory tools for small datasets. For data sets with large numbers of observations, such as the surveys_complete data set, overplotting of points can be a limitation of scatter plots. One strategy for handling such settings is to use hexagonal binning of observations. The plot space is tessellated into hexagons. Each hexagon is assigned a color based on the number of observations that fall within its boundaries. To use hexagonal binning with ggplot2, first install the R package hexbin from CRAN:


Then,


Then use the geom_hex() function:


As you can see in the plot above, most of the points in the scatter plot are concentrated in a few spots, mainly around weight = 50 and hindfoot_length = 45.

Smooth trend line

To add a smooth trend line, use geom_smooth(). This will not only add a "smoothed" version of your data, but also confidence intervals around it. Let's do an example only for the "SH" species_id, again using the subset() function from earlier:
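A sketch of this step (assumes ggplot2 is installed). Because the real data are not loaded here, it makes up a small data frame in which SH hindfoot length rises roughly linearly with weight plus noise, an invented relationship for illustration only. With fewer than 1,000 points, geom_smooth() defaults to a loess curve and prints a message saying so; the default se = TRUE draws the confidence band.

```r
library(ggplot2)
set.seed(3)

# Made-up stand-in for surveys: 30 "SH" and 30 "DM" animals
surveys <- data.frame(
  species_id = rep(c("SH", "DM"), each = 30),
  weight     = c(runif(30, 40, 120), runif(30, 30, 60))
)
# Invented relationship: hindfoot length grows with weight, plus noise
surveys$hindfoot_length <- 20 + 0.1 * surveys$weight + rnorm(60, sd = 1.5)

# Keep only the "SH" rows
surveys_sh <- subset(surveys, species_id == "SH")

p <- ggplot(data = surveys_sh, mapping = aes(x = weight, y = hindfoot_length)) +
  geom_point(alpha = 0.5) +
  geom_smooth()   # smoothed curve + shaded confidence band (se = TRUE by default)
p
```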


When answering the question below, you should also save your code in your my_plots.r file.


1) Using the subset function from the example above, create a data.frame that only includes the "DM" species_id
2) Create a ggplot scatterplot of weight vs hindfoot_length
3) Make markers "red"
4) Make x and y axis labels that include units (use graph from "labeling" section as example)
5) Make markers transparent (alpha = 0.5)
6) Include a smoothed trend line with confidence intervals
7) In RStudio's Plots panel, click on "Export", save your plot as Image, and upload the image to this Brightspace question
8) Copy-paste code: "Copy" the code you used to make the plot, "paste" code to this Brightspace question

Boxplot

We can use boxplots to visualize the distribution of weight within each species:


To make a "less cluttered" plot, we can do a boxplot of only the species that we isolated earlier with the subset() function (i.e. DM, DO and PP).


Jitter

By adding points to the boxplot, we can have a better idea of the number of measurements and of their distribution:


Notice how the boxplot layer is behind the jitter layer? What do you need to change in the code to put the boxplot in front of the points such that it's not hidden? ...





...that's correct! You just need to change the order of the geoms: first write the jitter geom, then the boxplot geom:
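A sketch of that layer order on made-up weights for three species (assumes ggplot2 is installed): layers are drawn in the order they are added, so the boxplot added last sits on top, and alpha = 0 leaves its fill fully transparent so the points show through. width = 0.2 limits the horizontal jitter and is an arbitrary choice.

```r
library(ggplot2)

# Made-up stand-in for the DM/DO/PP subset
surveys_sub <- data.frame(
  species_id = rep(c("DM", "DO", "PP"), each = 5),
  weight     = c(40, 48, 45, 43, 50,  52, 50, 55, 47, 49,  17, 21, 19, 15, 22)
)

p <- ggplot(data = surveys_sub, mapping = aes(x = species_id, y = weight)) +
  geom_jitter(alpha = 0.5, color = "tomato", width = 0.2) +  # first layer: at the back
  geom_boxplot(alpha = 0)                                     # last layer: on top
p
```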



In the examples above, in the line...
ggplot(data = surveys, mapping = aes(x = species_id, y = weight)) +
       geom_boxplot(alpha = 0) +
       geom_jitter(alpha = 0.1, color = "tomato")

What is ggplot?


In the examples above, in the line...
ggplot(data = surveys, mapping = aes(x = species_id, y = weight)) +
       geom_boxplot(alpha = 0) +
       geom_jitter(alpha = 0.1, color = "tomato")

What is geom_boxplot?


In the examples above, in the line...
ggplot(data = surveys, mapping = aes(x = species_id, y = weight)) +
       geom_boxplot(alpha = 0) +
       geom_jitter(alpha = 0.1, color = "tomato")

What is geom_jitter?


In the examples above, in the line...
ggplot(data = surveys, mapping = aes(x = species_id, y = weight)) +
       geom_boxplot(alpha = 0) +
       geom_jitter(alpha = 0.1, color = "tomato")

What is mapping = aes(x = species_id, y = weight)?


In the examples above, in the line...
ggplot(data = surveys, mapping = aes(x = species_id, y = weight)) +
       geom_boxplot(alpha = 0) +
       geom_jitter(alpha = 0.1, color = "tomato")

What is alpha = 0?


In the examples above, in the line...
ggplot(data = surveys, mapping = aes(x = species_id, y = weight)) +
       geom_boxplot(alpha = 0) +
       geom_jitter(alpha = 0.1, color = "tomato")

What is color = "tomato"?

Violin plot

Violin plots show the entire distribution of the data. This is of interest, especially when dealing with multimodal data, i.e., a distribution with more than one peak.


As in other plots above, here we can also change the fill of the violins and the color of their borders:


...and we can superimpose many graphs. Here we plot the violin plot and then on top we draw a boxplot:
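A sketch of a violin plot with a narrow boxplot drawn on top, using made-up weights (assumes ggplot2 is installed). The colours and width = 0.15 are arbitrary choices.

```r
library(ggplot2)

# Made-up stand-in for the DM/DO/PP subset
surveys_sub <- data.frame(
  species_id = rep(c("DM", "DO", "PP"), each = 5),
  weight     = c(40, 48, 45, 43, 50,  52, 50, 55, 47, 49,  17, 21, 19, 15, 22)
)

p <- ggplot(data = surveys_sub, mapping = aes(x = species_id, y = weight)) +
  geom_violin(fill = "lightgreen", color = "darkgreen") +  # full distribution shape
  geom_boxplot(width = 0.15)                               # narrow boxplot on top
p
```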


When answering the question below, you should also save your code in your my_plots.r file.


You need to replicate the plot below...
1) Using the subset function from the example above, create a data.frame that only includes the "OL" AND "OT" species_id
2) Create a ggplot violin plot with a boxplot superimposed
3) For the violin plot, make the "trim" FALSE, the "fill" blue, the "color" darkblue, and the transparency 0.4
4) For the boxplot, make the "width" 0.1, the "notch" TRUE, "outlier.colour" red, "outlier.fill" darkred, transparency 0.2, and "outlier.size" 3
5) Make the y axis label "weight (gr)"
6) Make the plot's title "Violin/Scatterplot made by My_Name", where you replace "My_Name" with your own name
7) In RStudio's Plots panel, click on "Export", save your plot as Image, and upload the image to this Brightspace question
8) Copy-paste code: "Copy" the code you used to make the plot, "paste" code to this Brightspace question

Target plot for question above

Plotting time-series data

First, we are going to cheat a bit: we'll do some data manipulation that we won't formally learn until the next lab, so we won't go over any details here. For now, the only thing you need to know is that you need the tidyverse package installed. You probably already have it installed, but if not, type the following in the Console:


The code below counts the number of entries in each year, for every genus. We'll explain more on this next lab.


You should also save the code above in your my_plots.r file. If you run plotting code that uses the yearly_counts variable before the code that creates yearly_counts, R will return an error.

Time-series data can be visualized as a line plot with years on the x-axis and counts on the y-axis:


Unfortunately, this does not work as intended: because we plotted the data for all the genera together, ggplot joins all the points into a single line. We need to tell ggplot to draw a separate line for each genus by modifying the aesthetic mapping to include group = genus:


We will be able to distinguish the genera in the plot if we add colors (mapping color to genus also groups the data automatically):
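The lab's counting step uses the tidyverse, which is covered next lab. As a stand-in that needs only base R (a swapped-in technique, not the lab's code), table() can build the same kind of year-by-genus counts, here from made-up records; the plot then maps color = genus, which also groups the lines. One difference: table() keeps year/genus combinations with zero records, whereas the tidyverse count() drops them.

```r
library(ggplot2)

# Made-up records (one row per animal), not the real survey data
surveys <- data.frame(
  year  = c(1977, 1977, 1977, 1978, 1978, 1979, 1979, 1979, 1979),
  genus = c("Dipodomys", "Dipodomys", "Chaetodipus", "Dipodomys", "Chaetodipus",
            "Dipodomys", "Dipodomys", "Chaetodipus", "Chaetodipus")
)

# Base-R counting: one row per year x genus, with the count in column n
yearly_counts <- as.data.frame(table(year = surveys$year, genus = surveys$genus),
                               responseName = "n")
# table() turns year into a factor; convert it back to a number for the x-axis
yearly_counts$year <- as.numeric(as.character(yearly_counts$year))

# One coloured line per genus (color also sets the grouping)
p <- ggplot(data = yearly_counts, mapping = aes(x = year, y = n, color = genus)) +
  geom_line()
p
```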


To change the line type use the argument linetype. The options are shown below:

In the example below, we use longdash:


When answering the question below, you should also save your code in your my_plots.r file.


You need to replicate the plot below...
Essentially you need to replicate the plot above, but with thicker lines (solid)... and with your name in the plot's title. You need to use the "help files" and Google to figure out how to do it. We have not reviewed in this Lab how to change line width, this is a "real-life" research problem.
1) Make the plot's title "Time-series made by My_Name", where you replace "My_Name" with your own name
2) In RStudio's Plots panel, click on "Export", save your plot as Image, and upload the image to this Brightspace question
3) Copy-paste code: "Copy" the code you used to make the plot, "paste" code to this Brightspace question


Target plot for question above:

Faceting

ggplot has a special technique called faceting that allows the user to split one plot into multiple plots based on a factor included in the dataset. Here we will use the facet_wrap() function to make a time series plot for each genus:


Now we would like to split the line in each plot by the sex of each individual measured. To do that, we need to make counts in the data frame grouped by year, genus, and sex. Again, we will learn the details of how to do this next lab; for now we just need to generate data to use in our plots.


You should also save the code above in your my_plots.r file. If you run plotting code that uses the yearly_sex_counts variable before the code that creates yearly_sex_counts, R will return an error.

We can now make the faceted plot by splitting further by sex using color (within a single plot):


We can use the facet_grid() function to facet both by sex and genus in a "grid" fashion:


You can also organise the panels only by rows (or only by columns):
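A sketch of the faceting layouts described above, on a small made-up yearly_sex_counts table (invented counts; the real one comes from the tidyverse code above). vars() names the faceting variable(s); this assumes ggplot2 3.0.0 or later.

```r
library(ggplot2)

# Made-up counts: every year x genus x sex combination (12 rows)
yearly_sex_counts <- expand.grid(year  = 1977:1979,
                                 genus = c("Dipodomys", "Chaetodipus"),
                                 sex   = c("F", "M"))
yearly_sex_counts$n <- c(5, 7, 6, 2, 3, 4, 6, 8, 7, 1, 2, 3)

base_plot <- ggplot(data = yearly_sex_counts,
                    mapping = aes(x = year, y = n, color = sex)) +
  geom_line()

# One panel per genus, sexes as coloured lines within each panel
p_wrap <- base_plot + facet_wrap(facets = vars(genus))

# Grid: sex in rows, genus in columns
p_grid <- base_plot + facet_grid(rows = vars(sex), cols = vars(genus))

# Panels stacked by genus only (use cols = vars(genus) for side by side)
p_rows <- base_plot + facet_grid(rows = vars(genus))

p_wrap
p_grid
p_rows
```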


When answering the question below, you should also save your code in your my_plots.r file.


You need to replicate the plot below...

1) Make the plot's title "Facet plot made by My_Name", where you replace "My_Name" with your own name
2) In RStudio's Plots panel, click on "Export", save your plot as Image, and upload the image to this Brightspace question
3) Copy-paste code: "Copy" the code you used to make the plot, "paste" code to this Brightspace question


Target plot for question above:

ggplot2 themes

Usually plots with white background look more readable when printed. Every single component of a ggplot graph can be customized using the generic theme() function, as we will see below. However, there are pre-loaded themes available that change the overall appearance of the graph without much effort.

For example, we can change our previous graph to have a simpler white background using the theme_bw() function:


In addition to theme_bw(), which changes the plot background to white, ggplot2 comes with several other themes which can be useful to quickly change the look of your visualization. The complete list of themes is available at https://ggplot2.tidyverse.org/reference/ggtheme.html. theme_minimal() and theme_light() are popular, and theme_void() can be useful as a starting point to create a new hand-crafted theme.

The ggthemes package provides a wide variety of options.

Let's try one more theme, theme_dark():


When answering the question below, you should also save your code in your my_plots.r file.


1) Replicate the plot above but with a different theme that is NOT theme_bw() or theme_dark().
2) Make the plot's title "Themed plot made by My_Name", where you replace "My_Name" with your own name
3) In RStudio's Plots panel, click on "Export", save your plot as Image, and upload the image to this Brightspace question
4) Copy-paste code: "Copy" the code you used to make the plot, "paste" code to this Brightspace question


Customization

Take a look at the ggplot2 cheat sheet, and think of ways you could improve the plot.

Now, let's change names of axes to something more informative than 'year' and 'n' and add a title to the figure:


The axes have more informative names, but their readability can be improved by increasing the font size. This can be done with the generic theme() function:


Note that it is also possible to change the fonts of your plots. If you are on Windows, you may have to install the extrafont package, and follow the instructions included in the README for this package.

After our manipulations, you may notice that the values on the x-axis are still not properly readable. Let's change the orientation of the labels and adjust them vertically and horizontally so they don't overlap. You can use a 90 degree angle, or experiment to find the appropriate angle for diagonally oriented labels. We can also modify the facet label text (strip.text) to italicize the genus names:


If you like the changes you created better than the default theme, you can save them as an object to be able to easily apply them to other plots you may create:
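A sketch of saving theme tweaks in an object and reusing them (made-up yearly counts; assumes ggplot2 is installed). The object name grey_theme and the specific sizes, angle and colours are arbitrary choices.

```r
library(ggplot2)

# Made-up yearly counts for two genera
yearly_counts <- expand.grid(year  = 1977:1981,
                             genus = c("Dipodomys", "Chaetodipus"))
yearly_counts$n <- c(10, 14, 12, 18, 20, 4, 6, 5, 9, 7)

# Reusable set of theme settings
grey_theme <- theme(
  axis.text.x = element_text(colour = "grey20", size = 12, angle = 90,
                             hjust = 0.5, vjust = 0.5),   # rotated year labels
  axis.text.y = element_text(colour = "grey20", size = 12),
  text        = element_text(size = 16),                  # base font size
  strip.text  = element_text(face = "italic")             # italic genus names
)

p <- ggplot(data = yearly_counts, mapping = aes(x = year, y = n)) +
  geom_line() +
  facet_wrap(facets = vars(genus)) +
  labs(x = "Year of observation", y = "Number of individuals") +
  theme_bw() +
  grey_theme   # added last, so it overrides matching theme_bw() settings
p
```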


Arranging plots

Faceting is a great tool for splitting one plot into multiple plots, but sometimes you may want to produce a single figure that contains multiple plots using different variables or even different data frames. The patchwork package allows us to combine separate ggplots into a single figure while keeping everything aligned properly. Like most R packages, we can install patchwork from CRAN, the R package repository:


After you have loaded the patchwork package you can use + to place plots next to each other, / to arrange them vertically, and plot_layout() to determine how much space each plot uses:


You can also use parentheses ( ) to create more complex layouts. There are many useful examples on the patchwork website.

Figure dimensions

Not the most exciting topic, you might say, but having control of the output size of your figures is a huge deal if you're trying to get published in a tabloid journal, where dimensions are VERY SPECIFIC: http://www.sciencemag.org/authors/instructions-preparing-initial-manuscript (BTW, they even have specific LaTeX instructions). In any case, it matters.



Parameter | Value | Description
din, fin, pin | c(width, height) | Dimensions (width and height) of the device, figure and plotting regions, in inches (din is read-only)
fig | c(left, right, bottom, top) | Coordinates of the figure region within the device, expressed as fractions of the device region
mai, mar | c(bottom, left, top, right) | Size of each of the four figure margins, in inches (mai) and in lines of text relative to the current font size (mar)
mfg | c(row, column) | Position of the currently active figure within a grid of figures defined by either mfcol or mfrow
mfcol, mfrow | c(rows, columns) | Number of rows and columns in a multi-figure grid
new | TRUE or FALSE | Whether to treat the current figure region as a new frame, so that the next plot is drawn over the top of the previous one (TRUE), or to let the next high-level plotting function clear the figure region first (FALSE)
oma, omi | c(bottom, left, top, right) | Size of each of the four outer margins, in lines of text relative to the current font size (oma) or in inches (omi)
omd | c(left, right, bottom, top) | Region inside the outer margins, expressed as fractions of the device region
plt | c(left, right, bottom, top) | Coordinates of the plotting region, expressed as fractions of the current figure region
pty | "s" or "m" | Type of plotting region within the figure region: a square ("s") or maximized ("m") to fit the shape of the figure region
usr | c(left, right, bottom, top) | Coordinates of the plotting region corresponding to the axes limits of the plot

NB: the parameters above are set (and queried) with the par() function.
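A sketch of working in inches with base graphics: it opens a PDF device of a fixed size (7 x 3.5 inches, an arbitrary example size) in a temporary file, sets a two-panel layout and margins in inches with par(), draws made-up data, and closes the device. Saving the old settings returned by par() lets you restore them afterwards.

```r
out_file <- tempfile(fileext = ".pdf")

# Device size fixed in inches
pdf(out_file, width = 7, height = 3.5)

# 1 row x 2 columns; margins in inches: c(bottom, left, top, right)
op <- par(mfrow = c(1, 2), mai = c(0.8, 0.8, 0.3, 0.1))
device_inches <- par("din")   # read-only: width and height of the device

set.seed(4)
x <- runif(50)
plot(x, main = "Panel 1")
hist(x, main = "Panel 2")

par(op)      # restore the previous settings
dev.off()    # close the PDF device and write the file
```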

From this table and the figures above you can see which dimensions affect which attributes of figure output, with particular emphasis on the fact that there are options for output in inches and output in relative dimensions. If you're exporting a file for publication, USE INCHES. The reason is illustrated here:

Relative dimensions


Inches

These two figures have the same data, with the same point sizes but the one on top is relative while the one below is in inches. Note that markers in the zoomed panels in the upper figure were resized and look too small in the final product. Deep stuff, but a major deal if you're publishing a paper with one common legend.

Exporting plots

R has several default export options, depending on what kind of file you'd like to create:

After creating your plot, you can save it to a file in your favorite format. The Export tab in the Plot pane in RStudio will save your plots at low resolution, which will not be accepted by many journals and will not scale well for posters. The ggplot2 extensions website provides a list of packages that extend the capabilities of ggplot2, including additional themes.

Instead, use the ggsave() function, which allows you to easily change the dimensions and resolution of your plot by adjusting the appropriate arguments (width, height and dpi).

Below is the code to make a plot (note that we saved the plot to my_plot variable).


Now let's save the my_plot plot:
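A minimal sketch of making and saving my_plot with ggsave() (made-up data; assumes ggplot2 is installed). It writes a PDF to a temporary folder so it runs anywhere; the file type is taken from the extension, and for a raster format such as PNG the dpi argument sets the resolution, as in the commented line.

```r
library(ggplot2)

# Made-up stand-in for surveys
surveys <- data.frame(
  weight          = c(40, 48, 52, 17, 45, 21, 50, 26),
  hindfoot_length = c(37, 36, 35, 21, 36, 22, 34, 20)
)

my_plot <- ggplot(data = surveys, mapping = aes(x = weight, y = hindfoot_length)) +
  geom_point()

# Output size set explicitly, independent of the RStudio Plots panel
out_file <- file.path(tempdir(), "my_plot.pdf")
ggsave(out_file, plot = my_plot, width = 15, height = 10, units = "cm")

# Raster example (dpi matters here):
# ggsave("my_plot.png", plot = my_plot, width = 15, height = 10, units = "cm", dpi = 300)
```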


Here is the saved result, with different size and resolution than the version that gets rendered in RStudio plot panel:

Note: The parameters width and height also determine the font size in the saved plot.

When answering the question below, you should also save your code in your my_plots.r file.


1) Pick any of the plot examples above.
2) Re-draw the plot with some minor changes (e.g. different line width or color or transparency)
3) Make the plot's title "Final plot made by My_Name", where you replace "My_Name" with your own name
4) Use the ggsave() function to save your plot at high resolution (dpi=700), and upload the image to this Brightspace question
5) Copy-paste code: "Copy" the code you used to make the plot, "paste" code to this Brightspace question


This is the end of the lab.

