LAB 11: Scientific reports with Rmarkdown

BIO3782: Biologist's Toolkit (Dalhousie University)


Setup of workspace

Make sure all required files are in the working directory:

As in previous labs, we'll try to simulate "real-life" coding by using the tags below to indicate when to use RStudio's or






Data analysis reports

A "data analysis report" is a document that includes text, graphs, equations, and code. Data analysts tend to write a lot of reports, describing their analyses and results for their collaborators, or to document their work for future reference.

Many new users begin by writing a single R script containing all of their work, and then share the analysis by emailing the script and various graphs as attachments. This can be cumbersome, requiring a lengthy discussion to explain which attachment shows which result.

Writing formal reports with MS Word or LaTeX can simplify this process by incorporating both the analysis report and output graphs into a single document. But tweaking formatting to make figures look correct and fixing obnoxious page breaks can be tedious and lead to a lengthy “whack-a-mole” game of fixing new mistakes resulting from a single formatting change.

Creating a web page (as an .html file) using RMarkdown makes things easier. The report can be one long stream, so tall figures that wouldn't ordinarily fit on one page can be kept at full size and remain easy to read, since the reader can simply keep scrolling. Additionally, the formatting of an RMarkdown document is simple and easy to modify, allowing you to spend more time on your analyses instead of on writing reports.

LaTeX

Computer programmer Leslie Lamport created LaTeX in the 1980s as a typesetting language that, in stark contrast to the WYSIWYG fussing of MS Word, separates the design and layout of documents from the writing itself. Because content is kept separate from presentation, you can write quickly without caring how the document looks until the end. LaTeX is popular among physicists and statisticians. One reason is that LaTeX documents are plain text and therefore ideal for version control (e.g. via Git). Another is that LaTeX renders equations very well and relatively quickly. It does this so well that Word now accepts LaTeX notation in its equation editor, and the same notation is available in R Markdown for typesetting equations within the text body. Check out this LaTeX cheat sheet for some useful commands and syntax.
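Here is a sketch of LaTeX inside an R Markdown file. The formulas are generic examples and are not taken from this lab. Inline math goes between single dollar signs, and displayed math goes between double dollar signs. In HTML output, the equations are rendered by MathJax.

````rmd
The sample mean is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

The sample variance, shown on its own line, is

$$
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
$$
````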

Rmarkdown and Knitr

Analysis reports made with Rmarkdown/Knitr are reproducible documents: If an error is discovered, or if some additional subjects are added to the data, you can just re-compile the report and get the new or corrected results rather than having to reconstruct figures, paste them into a Word document, and hand-edit various detailed results.

The key R package here is knitr. It allows you to create a document that is a mixture of text and chunks of code. When the document is processed by knitr, chunks of code will be executed, and graphs or other results will be inserted into the final document.

knitr allows you to mix basically any type of text with code from different programming languages, but we recommend that you use RMarkdown, which mixes Markdown with R. Markdown is one of many lightweight markup languages for creating web pages. Other lightweight markup languages include reStructuredText (widely used for Python documentation), MediaWiki markup (used in Wikipedia), and many others.

Here is a useful Rmarkdown cheat sheet.

To see the power of Rmarkdown, here is an undergraduate dissertation written using R.




Create an Rmarkdown file

To create a new RMarkdown file (.Rmd), select File -> New File -> R Markdown in RStudio, then choose the type of document you want to create. For now we will focus on an HTML document, which can easily be converted to other file types later. Enter a Title (Lab11) and Author Name (your name), then click OK. Save the file as Lab11.Rmd.

Note: The document title is not the same as the file name.

The newly created .Rmd file comes with basic instructions, but we want to create our own RMarkdown script, so go ahead and delete everything in the example file.




Knitting (i.e. Rendering) a .Rmd file

The .Rmd file that you created contains the instructions to make a beautiful website (i.e. an .html file that can be seen using a web browser). The process of rendering an .html from the instructions contained in the .Rmd is called "knitting". To "knit", click on the Knit icon and then select Knit to HTML to create your html file.

Take a look at the .html file produced by knitr. Your output won't look like much at this point. In the following sections we will add text, code and graphs to your .Rmd file, which you can then knit into an incrementally more complex website.

Note that you can also knit to PDF and to Word files.
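The output format is controlled by the `output` field of the YAML header, which is described in the next section. In the sketch below, the title is a placeholder:

- Changing `html_document` to `word_document` produces a Word file.
- Using `pdf_document` produces a PDF, which also requires a LaTeX installation such as TinyTeX.

Choosing an option from the Knit menu edits this field for you.

````rmd
---
title: "Lab11"
output: word_document   # alternatives: html_document, pdf_document (needs LaTeX)
---
````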

Structure of an Rmarkdown file

There are three parts to an .Rmd file:

  1. Header: The text at the top of the document, written in YAML format.
  2. Markdown sections: Text that describes your workflow written using markdown syntax.
  3. Code chunks: Chunks of R code that can be run and also can be rendered using knitr to an output document.
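Here is a minimal sketch of how the three parts fit together in one .Rmd file.

- The header uses the four fields found in RStudio's default header.
- The title, author, date and chunk contents are placeholders.
- The built-in `cars` data set stands in for real data.

````rmd
---
title: "Lab11"
author: "Your Name"
date: "Your date here"
output: html_document
---

## Introduction

This report summarises the *cars* data set that ships with R.

```{r summary-cars}
# Code in a chunk is run when the document is knitted;
# its printed output is inserted below the code
summary(cars)
```
````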

YAML Header

An R Markdown file always starts with a header written using YAML syntax. This header is sometimes referred to as the front matter.

There are four default elements in the RStudio YAML header: title, author, date, and output.

You can also specify more complicated YAML options for citations and document styles, as in the example below.

Let's edit our header by specifying the BibTeX reference list, setting the font style and size, and telling Rmarkdown to use the current date and time at which the document is knitted. You can check out this guide for more info.
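Here is a sketch of the edited header. It makes a few assumptions:

- A BibTeX file named `references.bib` and a citation style file named `apa.csl` are saved next to Lab11.Rmd. Both file names are hypothetical.
- The R code inside the `date` field is evaluated at knit time, so the date updates every time you knit.
- Font settings are left out because the relevant options depend on the output format.

If `csl` is omitted, Pandoc formats citations in its default Chicago author-date style.

````rmd
---
title: "Lab11"
author: "Your Name"
date: "`r format(Sys.time(), '%d %B %Y, %H:%M')`"
output: html_document
bibliography: references.bib   # hypothetical file name
csl: apa.csl                   # hypothetical citation style file
---
````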




Markdown syntax

Markdown is a human-readable syntax for formatting text documents. Markdown can be used to produce nicely formatted documents including PDFs, web pages and more. When you format text using markdown in a document, it is similar to using the format tools (bold, heading 1, heading 2, etc.) in a word processing tool like Microsoft Word or Google Docs.

An R Markdown file can contain text written using the markdown syntax. Markdown text can be whatever you want. It may describe the data that you are using, how it's being processed and what the outputs are. You may even add some text that interprets or discusses the outputs.

When you render your document to html, this markdown will appear as text on the output html document.

Below we explain the basic markdown syntax. However, it is a good idea to take a look at the following website for a very well crafted overview of basic markdown syntax: https://www.markdownguide.org/basic-syntax/

Markdown is plain text that is styled using special characters, including:

When you type text in a markdown document with no additional syntax, the text will appear as paragraph text. You can add additional syntax to that text to format it in different ways.

For example, if we want to highlight a function or some code within a plain text paragraph, we can put one backtick (`) on each side of the text.

To add emphasis to other text you can use bold or italics.
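Here is a short sketch of this syntax in a .Rmd file:

- Backticks mark inline code.
- Single asterisks give italics.
- Double asterisks give bold.
- Triple asterisks combine bold and italics.

Underscores (`_italics_`, `__bold__`) also work. The sentences are placeholders.

````rmd
This is plain paragraph text.

Use `read.csv()` to import a comma-separated file.

This word is in *italics*, this one is in **bold**, and this one is ***both***.
````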




Your output would look like this in HTML:

Click on the Knit icon to create your html file.

Take a look at the .html file produced by knitr. Your output should look like this after knitting.

You can also add the following:




You can also add in-text citations. When you knit your file, Rmarkdown will automatically generate a references section at the end of your document.
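Citations use Pandoc's citation syntax. The keys must match entries in the bibliography file named in your YAML header. The sketch below uses hypothetical keys `smith2020` and `jones2021`:

- Square brackets give a parenthetical citation.
- A minus sign before the key suppresses the author name, leaving only the year.
- Semicolons separate several references within one citation.

The final `# References` heading is optional. It simply gives the generated reference list a title.

````rmd
This sentence cites one source [@smith2020].

As Smith [-@smith2020] shows, only the year is printed in this citation.

This sentence cites two sources [@smith2020; @jones2021].

# References
````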




Again, click on the Knit icon and re-create your html file. Your html output should look like this:



Section Headings

We create a heading using the pound (#) sign. For headings to render properly there must be a space between the # and the heading text. We can create subheadings by adding more pound signs. For example:
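Here is a sketch of four heading levels. In Pandoc's markdown, a line such as `#Heading` (no space after the pound sign) is not treated as a heading.

````rmd
# Heading 1

## Heading 2

### Heading 3

#### Heading 4
````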




The output (after knitting) should look like this:



Code Chunks

Code chunks in an R Markdown document contain your R code. All code chunks start and end with three backticks (also called grave accents). A code chunk would look like this:
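Here is a minimal sketch of a chunk named `setup`. The R code inside is only an illustration.

````rmd
```{r setup}
# Everything between the opening and closing backticks is run as R code
x <- c(2, 4, 6)
mean(x)
```
````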



The first line, {r setup}, contains the language (r in this case) and the name of the chunk (setup). Specifying the language is mandatory. The chunk name is optional. However, it is good practice to give each chunk a unique name to support more advanced knitting approaches.

You can add new chunks by clicking on the Insert icon.



Code Chunk options

You can add options to each code chunk. These options allow you to customize how or if you want code to be processed or appear on the rendered output (pdf document, html document, etc). Code chunk options are added on the first line of a code chunk after the name, within the curly brackets.

3 Common Chunk Options:

Multiple code chunk options can be used for the same chunk. Below is a table with more code chunk options.
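Here is a sketch that combines several options in one chunk header. The chunk names and code are illustrative.

- `echo=FALSE` hides the code but shows its output.
- `warning=FALSE` and `message=FALSE` suppress warnings and messages.
- `eval=FALSE` shows the code without running it.

Other common options include:

- `include=FALSE`, which runs the code but shows nothing.
- `results='hide'`, which runs and shows the code but hides its text output.

````rmd
```{r quiet-summary, echo=FALSE, warning=FALSE, message=FALSE}
# This code runs, but only its output appears in the knitted document
summary(cars)
```

```{r not-run, eval=FALSE}
# This code is displayed but never executed
install.packages("knitr")
```
````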

Inserting figures

By default, RMarkdown will place graphs by maximising their height, while keeping them within the margins of the page and maintaining aspect ratio. If you have a particularly tall figure, this can mean a really huge graph. In the following example we modify the dimensions of the figure we created above. To manually set the figure dimensions, you can insert an instruction into the curly braces:
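Here is a sketch using the built-in `cars` data set. The `fig.width` and `fig.height` options are given in inches.

````rmd
```{r cars-plot, fig.width=6, fig.height=4}
# A 6 x 4 inch scatterplot of stopping distance against speed
plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
```
````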




Your output should look like this after you knit the document.

Inserting Tables

Standard Rmarkdown

Rmarkdown can print the contents of a data frame easily by enclosing the name of the data frame in a code chunk.
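Here is a sketch using the built-in `mtcars` data set. The data frame is printed as plain, console-style text. For HTML output, the `html_document` format also has a `df_print` option (for example `df_print: paged`) that changes how data frames are printed.

````rmd
```{r print-df}
# Typing the name of a data frame prints it, just as in the console
df <- head(mtcars)
df
```
````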




Although complete, this might not be the best way to display data. Including a formal table requires more effort.

kable() and knitr

The most aesthetically pleasing and simple table formatting function I have found is kable() in the knitr package. The first argument tells kable to make a table out of the object dataframe, and the digits argument rounds numbers to two decimal places. Note that kable passes digits to round(), so it sets decimal places rather than significant figures.
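Here is a sketch using the built-in `mtcars` data set in place of the lab's data frame. The `caption` argument is an optional kable() argument.

````rmd
```{r kable-table}
library(knitr)
# digits = 2 rounds numeric columns (e.g. wt = 2.875 becomes 2.88)
kable(head(mtcars[, c("mpg", "cyl", "wt")]), digits = 2,
      caption = "First rows of the mtcars data set")
```
````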




pander()

If you want a bit more control over the content of your table you can use pander() in the pander package. Imagine we want the 3rd column to appear in italics:
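Here is a sketch using pander's `emphasize.italics.cols` argument on the built-in `mtcars` data. It assumes the pander package is installed. The third column in this example is `wt`.

````rmd
```{r pander-table}
library(pander)
# Italicise every cell in the third column of the table
pander(head(mtcars[, c("mpg", "cyl", "wt")]), emphasize.italics.cols = 3)
```
````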




The output you see inside RStudio will look different from the knitted HTML. The HTML output for the code above is:

Data exploration and analysis exercise

Now that you have a basic handle on Rmarkdown, let's take a look at some biological data from the NBN Gateway.

First, let's create a chunk that will read in our data.
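Here is a sketch of such a chunk. The file name `nbn_data.csv` and the object name `nbn` are hypothetical. Use the data file provided for this lab.

````rmd
```{r read-data}
# Hypothetical file name: replace with the data file provided for this lab
nbn <- read.csv("nbn_data.csv")
# Show the column names and types
str(nbn)
```
````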




Next, let's add some information about the dataset.
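One way to add this information is inline R code: a backtick, the letter r, the code, and a closing backtick. knitr replaces this with the result when knitting.

The sketch below assumes the data frame is called `nbn` and has a `taxonGroup` column. Both names are hypothetical.

````rmd
The data set contains `r nrow(nbn)` records from
`r length(unique(nbn$taxonGroup))` taxonomic groups.
````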




We can also examine species richness across groups.
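Here is a sketch using dplyr. It assumes a data frame called `nbn` with two hypothetical columns: `taxonGroup` (taxonomic group) and `taxonName` (species name). Species richness is taken as the number of distinct species names per group.

````rmd
```{r richness}
library(dplyr)
# Count distinct species names within each taxonomic group
richness <- nbn %>%
  group_by(taxonGroup) %>%
  summarise(Richness = n_distinct(taxonName))
knitr::kable(richness)
```
````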




Your output should look like this:

Let's analyze the data graphically.
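Here is a sketch using ggplot2, assuming the same hypothetical `nbn` data frame and `taxonGroup` column. It draws a bar chart of the number of records in each taxonomic group. The chart is flipped horizontally so that long group names stay readable.

````rmd
```{r records-plot, fig.width=7, fig.height=5}
library(ggplot2)
# One bar per taxonomic group; bar length is the number of records
ggplot(nbn, aes(x = taxonGroup)) +
  geom_bar() +
  coord_flip() +
  labs(x = "Taxonomic group", y = "Number of records")
```
````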




Your output should look like this:

What would the most common species in each taxonomic group be?
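Here is a sketch using dplyr, with the same hypothetical `nbn` data frame and column names. It counts the records for each species within each group and keeps the species with the highest count. slice_max() requires dplyr 1.0.0 or later and keeps ties by default.

````rmd
```{r common-species}
library(dplyr)
# For each group, keep the species with the most records (ties are all kept)
common <- nbn %>%
  count(taxonGroup, taxonName) %>%
  group_by(taxonGroup) %>%
  slice_max(n, n = 1) %>%
  ungroup()
knitr::kable(common)
```
````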




Your output should look like this:

Final report

Once you do a final "knitting" of your Lab11.Rmd file, take a look at the resulting Lab11.html file. It should look somewhat like this: https://diego-ibarra.github.io/biol3782/week11/sample/

Congratulations! You just made your first Rmarkdown report!



What are three parts of a .Rmd file?



What are the default components of a YAML header?



What format would the output argument in the header be to create a word document?



TRUE or FALSE: The title of your markdown document and file name should be the same



What is the output of Markdown text?



What symbol would you use to italicize text?



What symbol would you use to highlight text?



What symbol would you use to change text face to bold?



What language/syntax would you use to add equations?



What symbol would you use to add only the date after an author name in in-text citations?



What symbols would you use to add in-text citations?



What code syntax symbol separates references when creating a multi-reference in-text citation (i.e. (Wickham, 2011, Wickham, 2012))?



What would I use to create a heading?



TRUE or FALSE: An empty space is needed after a pound symbol to specify a new heading level



What does a code chunk contain?



Do I always need a chunk name?



What is the benefit of having a chunk name?



What icon do you use to add a new chunk?



How do we note the beginning and end of a YAML header?



What does eval=FALSE do in a chunk?



What chunk option would I use to stop my code from displaying in the output?



Would results=hide stop my R code from running?



What chunk option specifies figure height?



What chunk option stops error messages from being displayed?



What does the warning chunk option specify?



Will the RStudio output and html output look exactly the same?



Can you knit to formats other than html?



What YAML option specifies the bibliography file?



What YAML option specifies the citation style (i.e. MLA, APA)?



What does kable() do?

This is the end of the lab.

