LAB 7: Flow control and Functions Explained

BIO3782: Biologist's Toolkit (Dalhousie University)


Setup of workspace

Make sure the required files are in the working directory:

As in previous labs, we'll try simulate "real-life" coding, by using the tags below to indicate when to use RStudio's and when to use the :







Flow Control

So far we have written R scripts that are linear; that is, statments are executed in a sequential manner like the diagram below:



However, there are ways to execute statements using "bifurcations" and "go-back" directives that allow the programmer to control the flow of statement execution. Below we explore some of these ways of flow control.

IF statements

IF statements, also known as conditional statements or conditional expressions, allow programming languages to perform different computations or actions depending on whether a programmer-specified Boolean condition evaluates to TRUE or FALSE.



In the example shown in the diagram above, when R evaluates the Condition, if the result is TRUE, then the statement on the left is executed. On the other hand, if the result is FALSE, then the statement on the right is executed.

There are many types of IF statements. Below we review a few:

IF

In the simplest type of IF statement, R will execute a statement (or set of statements) if the evaluated condition is TRUE (as always). However, if the condition result is FALSE, R will simply do nothing. Take a look at the diagram below:



The syntax of this simplest case of IF statement is:

if (condition is true) {
  perform action
}



First let's do an example where the condition is TRUE. Note that first we will create a variable x, and then we will assess its value with the IF statement:


As you can see, the condition (x >= 10) was TRUE, thus the statement print("x is greater than or equal to 10") was executed.

You probably evaluated the condition (x >= 10) in you head and know that the result is TRUE. However, if you have a more complex conditional, you can simply copy-paste the condition into the and let R do the evaluation:


Now let's try give a new value to x where the condition will result FALSE:


As you can see, the condition (x >= 10) was FALSE, therefore R simply did nothing!


You want to see if y is negative. Which of the following will do that?

IF ... ELSE

In the case where you want R to execute different statements depending on the result of the conditional, you can use "IF ... ELSE" block. Take a look how it works:

The syntax of an "IF ... ELSE" block is:



if (condition is true) {
  perform action
} else { # that is, if the condition is false,
  perform alternative action
}



Take a look at the example below:


As you can see, if the condition is FALSE the ELSE statement is executed... instead of simply "do nothing", like in the example above (i.e. IF example).

If the condition is TRUE, the behavior is the same as with the "simple IF statement".


IF ... ELSE IF ... ELSE

If you want to "daisy chain" multiple conditions, you can use one (or more) else if statements. Take a look at the diagram below:





You can see that if the first condition is FALSE, then the flow is diverted to the second condition. If that result is FALSE, then the flow will be diverted to the third condition, and so on. Note that you can have many else if statements chained together as you want. This type of block ALWAYS must start with an if statement. However, it may or may not finish with an else statement depending on whether you want the last action to be "do nothing".

The syntax of an "IF ... ELSE IF ... ELSE" block is:



if (condition is true) {
  perform action
} else if (condition is true) { # that is, if the condition is false,
  perform alternative action
} else { # that is, if the condition is false,
  perform alternative action
}


Take a look at the example below:


In this case (x >= 10) was FALSE, thus the second condition (x > 5)was evaluated and returned TRUE.

Let's try a case where all conditions in the block are FALSE:


Finally, for completeness, let's try an example where the first condition is TRUE:



If you need to round "numeric" variables, but do nothing if the variable ase not numeric, you should use:


If you need to round up "numeric" variables greater than 10, AND round down "numeric" variables smaller or equal to 10, you should use:


If you need to round up "numeric" variables greater than 10, AND round down "numeric" variables smaller than 10, AND do nothing with "numeric" variables equal to 10,you should use:

FOR Loops

If you want to iterate over a set of values, when the order of iteration is important, and perform the same operation on each, a for() loop will do the job. We saw for() loops in the shell lessons earlier. This is the most flexible of looping operations, but therefore also the hardest to use correctly. In general, the advice of many R users would be to learn about for() loops, but to avoid using for() loops unless the order of iteration is important: i.e. the calculation at each iteration depends on the results of previous iterations. If the order of iteration is not important, then you should learn about vectorized alternatives, such as the purr package, as they pay off in computational efficiency.

The basic structure of a for() loop is:

for (iterator in set of values) {
  do a thing
}


For example:


The 1:10 bit creates a vector on the fly; you can iterate over any other vector as well.

We can use a for() loop nested within another for() loop to iterate over two things at once.


We notice in the output that when the first index (i) is set to 1, the second index (j) iterates through its full set of indices. Once the indices of j have been iterated through, then i is incremented. This process continues until the last index has been used for each for() loop.

Rather than printing the results, we could write the loop output to a new object.


This approach can be useful, but ‘growing your results’ (building the result object incrementally) is computationally inefficient, where long FOR loops can slow down to a crawl. To avoid this, pre-make empty objects with the correct dimensions and let the calculations in the FOR loop to "fill up" or update the values in your pre-existing object. Let's demonstrate!

Below we will do examples:

  1. First we will do the same FOR loop as above but with 10000 iterations. In this case output objects are NOT pre-made and instead they will "grown" inside the for loop as they need to accommodate more and more results.

  2. Then, we will do a the same FOR loop with 10000 iterations, but we will pre-made a results matrix before the loop starts.

In both cases we will use a trick with the proc.time() function to time how long it takes each loop to finish.

Let's go to work!


Then...



How much faster was the FOR LOOP with a pre-made output object compared to the FOR LOOP with a WITHOUT pre-made output object?

As you can see the loop where we pre-made the output object ran about 20 times faster that the one where the output object needed to "grow" inside the loop.

WHILE Loops

Sometimes you may need needing to repeat an operation as long as a certain condition is met. You can do this with a while() loop.

while(this condition is true){
  do a thing
}

In the example below, the while() loop will run while the value of x is greater than zero. The value of x is re-calculated in every iteration of the while() loop (also the value of x is printed to screen):


TRUE/FALSE: When executing long FOR LOOPS, you should pre-make empty vectors before running the loop to prevent the loop to dramatically slow down.


If you need to run a loop where you know the number of iterations required in the loop, you will use a:


If you need to run a loop where you do NOT know the number of iterations required in the loop, however you know a condition that needs to be met before the loop can finish (e.g. x > y), then you will use a:


You want to print to screen the numbers 1 to 100. Which of the following will do that?

Real-life example: NPZ model (part 1)

Here we will built a numerical Nutrients-Phytoplankton-Zooplankton ecosystem model (i.e. an NPZ model) with the dynamics show in the diagram below. The "currency" of the model is nitrogen (i.e. $mmol N m^{-3}$) and the state variables are Nutrients ($N$), Phytoplankton ($P$) and Zooplankton ($Z$).

In numerical models the programmer has to specify two things:

  1. Mathematical equations describing ecosystem processes like photosynthesis, mortality, grazing, etc., and
  2. Initial conditions which represent the state of the ecosystem (i.e. concentrations of Phytoplankton, Zooplankton and Nutrients) at the beginning of the simulation.

The essence of these numerical models is a FOR LOOP where the Mathematical equations are used to calculate the future state of the model (i.e. the concentrations of Phytoplankton, Zooplankton and Nutrients) based on the present state. For every iteration of the FOR LOOP, the model estimates the state of the model at very small step in the future (i.e. only a few seconds into the future). In a a typical model run, the FOR LOOP can go around hundreds of thousands of times, effectively modelling years of ecosystem dynamics.

Let's take as an example the concentration of Phytoplankton, $P$, where the present concentration is $P_t$ and the future concentration is $P_{t+1}$.

Generic equations

I teach a whole course on Ecosystem Modelling, where students learn how to design equations to program the model to do what they what it to do. However, in the interest of time, here I will simply give you the equations. As a programmer, your job is to get the equations supplied by the modeller and turn them into R code.

Using the diagram above as a reference, the following generic equations describe the dynamics among the Nutrients ($N$), Phytoplankton ($P$) and Zooplankton ($Z$) state variables:

(1) $$ \frac{\partial P}{\partial t} = photosynthesis - grazing - mortalityP $$

(2) $$ \frac{\partial N}{\partial t} = mortalityP + mortalityZ - photosynthesis $$

(3) $$ \frac{\partial Z}{\partial t} = grazing - mortalityZ $$

Continuous equations

Using the "generic equations" above, we can define actual partial differential equations to describe the different ecosystem processes and flows of biomass among the different state variables (see table below for symbols and units):

(4) $$ \frac{\partial P}{\partial t} = \mu L^N P - \gamma L^P Z - m_P P$$

(5) $$ \frac{\partial N}{\partial t} = m_P P + m_ZZ - \mu L^N P $$

(6) $$ \frac{\partial Z}{\partial t} = \gamma L^P Z -m_ZZ $$

(7) $$L^N = \frac{N}{K+N}$$

(8) $$L^P = 1-e^{-\Lambda P}$$

Note that equations 1, 2 and 3 are equivalent to equations 4, 5, and 6, respectively. We just replaced the "process terms" with the actual mathematical terms representing those processes.

Table of symbols and units:

Symbol                     Units                   Description
$P$ $mmol N m^{-3}$ phytoplankton concentration
$N$ $mmol N m^{-3}$ nutrients concentration
$Z$ $mmol N m^{-3}$ zooplankton concentration
$L^N$ dimensionless Limitation due to (low) nutrients on phytoplankton (see Eq. 7)
$L^P$ dimensionless Limitation due to (low) phytoplankton on zooplankton (see Eq. 8)
$K$ $mmol N m^{-3}$ Half-saturation for nutrient absorption by phytoplankton
$t$ $days$ time
$\Lambda$ $mmol N^{-1} m^3$ Initial slope of ingestion saturation of zooplankton
$m_P$ $d^{-1}$ mortality rate of phytoplankton
$m_Z$ $d^{-1}$ mortality rate of zooplankton
$\mu$ $d^{-1}$ growth rate of phytoplankton
$\gamma$ $d^{-1}$ growth rate of zooplankton

Discrete equations

The model equations above (i.e. Equations 4 to 8) need to be discretized to convert them into a form that can be processed by computers. We won't talk much about how the discretization process works. We only need to understand that, with some mathematical assumtions, Equations 4 to 8 are converted into Equations 9 to 13 below:

(9) $$ P_{t+1} = P_t + \left\{ \left[ \mu L^N_t P_t - \gamma L^P_t Z_t - m_P P_t \right] \Delta t \right\} $$

(10) $$ N_{t+1} = N_t + \left\{ \left[ m_P P_t + m_Z Z_t - \mu L^N_t P_t \right] \Delta t \right\} $$

(11) $$ Z_{t+1} = Z_t + \left\{ \left[ \gamma L^P_t Z_t -m_ZZ_t \right] \Delta t \right\} $$

(12) $$L^N_t = \frac{N_t}{K+N_t}$$

(13) $$L^P_t = 1-e^{-\Lambda P_t}$$


As you can see, this "discrete equations" are what we need to calculate the future using the present. For example, take a look at Equation 9, where we estimate the value of future value of Phytoplankton, $P_{t+1}$, using only the values of the present state (i.e. $P_t$, $Z_t$ and $N_t$).

Transferring equations to computer code

It is time to convert the model equation above into R code. There are some steps that we need to do before we code the actual model equations, and then there is plotting code we need to do after the model equations. By the time we finish our model code should look a bit like the diagram below:



Note that we will NOT code the model linearly from start to end. The coding process involved a lot of jumping around... code a bit from the beginning, then jump and code a bit in the middle, then jump cade a line and the end, then jump back to beginning for a few more lines of code, and so on. In real-life coding, you start with the bones and then proceed to add flesh little by little until the end. In the sections below, we will replicate the process of coding that a programmer would follow. Let's start with an empty file!

In RStudio, create a new R script, name it NPZmodel.r and make sure you save it in Desktop\Lab7. Here on you will be required to copy-paste bits of code into your NPZmodel.r file. We will simulate the normal process that someone may follow in the development of a new model from scratch.

Roughly speaking, the steps that we need to follow are the following:

  1. Create empty vectors for each variable
  2. Create the FOR LOOP where we will include Equations 9 to 13
  3. Graph the results

There are many intermediate steps in between the steps above (see above diagram), however we can figure those out as we go.

Creating the "time" vector

The first step is to create a time vector. This will be a long column that starts at 0 and ends in the number of days that we want our model to run for (e.g. 365 days). Note that length of the vector will depend on the model's "time-step", which is how much forward in time the model will advance in ever iteration of the FOR LOOP. Remember that for the model to work properly the time-step needs to be very small, in the order of a few seconds.

Quickly Googleing "Create vector of equally spaced numbers in R" reveals that we need to use the seq() and a quick look at the help file of the function (i.e. ?seq) shows that we need 3 input arguments:

  1. from =, which is the starting point of the vector, in our case: 0
  2. to =, which is the ending point of the vector. In our case, for example, we could make this 365 if we want to run the model for a year
  3. length =, which the number of steps between the starting and ending points. We need to calculate this using the "time-step"

Type the following in your NPZmodel.r file, and click :


As you can see, in the first couple of line we define days and dt. The modeller (i.e. you) can change these parameters at any time. For example, if you want to run the model for 10 years just change to days <- 365 * 10... or if you want to make the "time-step" smaller, simply change to dt <- 0.001.

It is a good practice to keep at the top of your script all the parameters that are ment to be "adjusted" by the modeller. These first few lines of code are like "knobs" that the modeller can adjust before running the model.

Note that we needed to do NoSTEPS <- as.integer(days / dt) to calculate the number of steps in the run, which was then used as the length = argument when making the time vector.

As you can see, the starting and focal point was to make the time vector via time <- seq(from = 0, to = days, length = NoSTEPS). The rest of the code was just to calculate all the arguments needed by the function seq().

Let's take a look at our newly created time vector:



How many columns are in the time vector?


How many elements are in the time vector?


What is the first number in the time vector?


What is the last number in the time vector?


What are the units of the time vector?

Creating empty vectors

The second step is to create an empty vector for every model variable (i.e. N, P, Z, time and others).

You should remember from the FOR LOOPS section above, that it is very important to pre-make empty vectors to avoid the loop to drastically slow down!!. During the FOR LOOP, the values of the empty vectors will be later filled out.

The function to create empty vectors is numeric(). If you did not know this, you could have Googled "how to create an empty vector in R" as a starting point.

A quick ?numeric to read the functions help file reveals that we just need one argument: length of vector. This is great because we already calculated this (i.e. NoSTEPS).

We need one empty vector for each of the equations 9 to 13. Go ahead and copy-paste the following at the end of the # Chores section of your code in your NPZmodel.r file, and click :

Let's take a look at one our newly created empty vectors:

As you can see, P is just a bunch of zeros.


How many columns are in the N vector?


How many elements are in the Z vector?


What is the last number in the P vector?


What are the units of the P vector?

Let's pay special attention to the first number of our main state variables:


What is the first number in the N vector?


What is the first number in the P vector?


What is the first number in the Z vector?

Using Checkpoints

CHECKPOINTS (like the one in the yellow link below) are text files containing the R code that you should have in your NPZmodel.r file up to this point. As you continue down the lab, the code within CHECKPOINTS grows incrementally.


CHECKPOINT 1

Compare your code against CHECKPOINTS using "Diffchecker"

CHECKPOINTS are great to compare against your own code to see if you did everything correctly. You can use https://www.diffchecker.com/ to compare your code against the CHECKPOINT's code. Simply copy-paste your code on the left panel in Diffchecker and copy-paste the CHECKPOINT's code in the right panel in Diffchecker, then click on the "Find difference" button.

Additionally, if you get a bit lost, you can always click on an earlier "CHECKPOINT" link, use diffchecker to see where you went wrong, and the copy-paste the CHECKPOINT's contents to your file... that will get you all caught up, without having to redo the entire lab.

Basic plotting

The next step is to do some basic plots. This is useful to see how our state variables change as we start adding more and more code to your NPZmodel.r file.

We could just make our plot using the built-in plotting functions in R. However, if we want to use ggplot2, we need to two things:

(1) We import the ggplot2 package by inserting the following at the top of your NPZmodel.r file. By convention, all package imports are done at the top of the file to allow future readers (including the future you) to easily learn what packages are needed to run your code.


(2) We need to "pack" our vectors into a data frame, since ggplot2 likes to use data frames as input. We do this by inserting the following at the end of your NPZmodel.r file. Note that we also added a print statement to let us know when the model is finished.


Now we are ready to use ggplot2 to make our graphs. We do this by inserting the following at the end of your NPZmodel.r file. When you are done, click to run your model.


As you can see, all your state variables are zeros. You can only see the last line plotted in the graph (i.e. the red line, which is Z). The other two lines (P and N) are underneath Z and thus you cannot see them.

The code in your NPZmodel.r file should look like the checkpoint below:

CHECKPOINT 2

Initializing with initial conditions

Here is where we "kickstart" our model by telling what are the state variable concentrations at the start of the simulation. For this we need to:

(1) Insert 3 new parameters near the top of our NPZmodel.r file, in the "User settings" section, right below # Framework. Again, these parameters are designed to be tweaked by the modeller just before running the model, so it makes sense to keep these parameters near the top, where they are easy to find.


(2) Now we need to use our new parameters to actually initialize the model state variables. Insert the following code just below the # Chores section of your NPZmodel.r file. When you are done, click to run your model.


You should see a plot like the one below. Note that the lines are not only zeros!

To actually see what is in the first few rows of each state variable, do the following:


Once again, let's pay special attention to the first number of our main state variables:


What is the value of the FIRST element of the N?


What is the value of the FIRST element of the P?


What is the value of the FIRST element of the Z?

As you can see, your state variables had been "initialized" with the values of Ninitial, Pinitial and Zinitial.

The code in your NPZmodel.r file should look like the checkpoint below:

CHECKPOINT 3

Parameters

Before we code the main model equations, we should add to our code the needed model parameters. Just like the "Initial Conditions" and the "Framework" parameters, we should add the model parameters near the top of our script, in the "knob tweaking" section, so that you can quickly find (and change if needed) the user-configurable settings.

Insert the following code in the "User settings"... between the # Framework and # Initial conditions sections in your sections in your NPZmodel.r file.


Main Model Loop

Ok! Now comes the moment of truth! We need to write the Main Model Loop... where the model runs and replaces all the zeros in our N P Z vectors for the actual modeled values.

Insert the following code below the # Initializing with initial conditions section (right above the # For cleanliness, let's pack everything into a data frame section) in your NPZmodel.r file.


It is essential you take some time to compare the code above to the Discretized Equations below. For your convenience, I copy-pasted the Discretized Equations just below, so it is easy for you to do the comparison. See how the code above is an exact copy (in R) of those equations. Note that in R all brackets to separate equation terms are "round brackets"; also see how to express the t subindices in R, we simply point (using square brackets) to the index t in the arrays.

(9) $$ P_{t+1} = P_t + \left\{ \left[ \mu L^N_t P_t - \gamma L^P_t Z_t - m_P P_t \right] \Delta t \right\} $$

(10) $$ N_{t+1} = N_t + \left\{ \left[ m_P P_t + m_Z Z_t - \mu L^N_t P_t \right] \Delta t \right\} $$

(11) $$ Z_{t+1} = Z_t + \left\{ \left[ \gamma L^P_t Z_t -m_ZZ_t \right] \Delta t \right\} $$

(12) $$L^N_t = \frac{N_t}{K+N_t}$$

(13) $$L^P_t = 1-e^{-\Lambda P_t}$$


When you are done comparing equations to code... and inserting the code above to your NPZmodel.r file, click to run your model. You should see a plot like the one below. Note that the lines are not only zeros anymore!

Note that the concentration of Phytoplankton increased steadily for the first 5 days, until Nutrients ran out. That is, also about day 5 the concentration of N was decreased to near zero. After day 5, the concentration of Phytoplankton started to decline because it was consumed (faster than what they could grow) by the ever growing zooplankton, which continued to grow exponentially until the end of the experiment (day 10).


What was the last concentration of Phytoplankton (in mmolN m^-3) observed in this modelling experiment (i.e. at day 10)?


What was the last concentration of Zooplankton (in mmolN m^-3) observed in this modelling experiment (i.e. at day 10)?


What was the last concentration of Nutrients (in mmolN m^-3) observed in this modelling experiment (i.e. at day 10)?


What was the maximum concentration of Phytoplankton (in mmolN m^-3) observed in this modelling experiment?


What was the minimum concentration of Nutrients (in mmolN m^-3) observed in this modelling experiment?


How many times was the code inside the "MAIN MODEL LOOP" executed?

The code in your NPZmodel.r file should look like the checkpoint below:

CHECKPOINT 4

Adding sunlight

To add more realism to our model, we could create "virtual sunlight" and then add a limitation of phytoplankton as a function of sunlight.

To create our "virtual sunlight" we'll use a "quick and dirty" trick using sinusoidals, a FOR loop and an IF statement. The steps are shown below:

Sunlight can be expressed as the sum of two sinusoidals, one representing daily oscillations of sunlight, and the other representing yearly oscillations of sunlight:

(14) $$ I = I_{MaxDaily} sin(\omega_{day} t) + I_{MaxYearly} sin(\omega_{year} t)$$

where $I$ is solar irradiance ($\mu mol m^{-2} s^{-1}$), $t$ is time ($days$) and $\omega_{day}$ and $\omega_{year}$ are angular frequencies ($radians * day^{-1}$), which are related to the period, $T$ ($days$) as follows:

(15) $$ \omega = \frac{2 \pi}{T}$$

We will use these equations inside a FOR loop to estimate sunlight at the precise discrete times of our simulation. However, note that the sinusoidal method above produces "negative" sunlight at night, so we'll use an IF statement to make any negative sunlight into zero.

First, we have to make an empty vector for sunlight (i.e. I), otherwise the FOR loop will dramatically slow down.

so... insert the following at the bottom of the # Chores section in your NPZmodel.r file:


Time to insert our "sunlight sub-model"!

Insert the following code below the # Chores section in your NPZmodel.r file. When you are done, click to run your model.


Let's do a quick plot to see if it worked:


Woo-hoo!!! We got sunlight!


What is the value of **I** during **night** time?

Now that we created "artificial sunlight", we can use light to limit the growth of phytoplankton. Again, we won't go into the modelling details here. As a programmer, you only need the new equations to be able to modify the model.

First we need a new "limitation" to impose the effect of low light on phytoplankton growth, lets use the following:

(16) $$L^I = 1-e^{-\alpha P}$$

where $L^I$ (dimensionless) is the limitation due to (low) light on phytoplankton and $\alpha$ is the initial slope of the $PI$ curve (i.e. a parameter calculated in the lab with phytoplankton cultures).

...then we can modify equations (9) and (10) to include the new $L^I$ limitation:


(9) $$ P_{t+1} = P_t + \left\{ \left[ \mu L^N_t L^I_t P_t - \gamma L^P_t Z_t - m_P P_t \right] \Delta t \right\} $$

(10) $$ N_{t+1} = N_t + \left\{ \left[ m_P P_t + m_Z Z_t - \mu L^N_t L^I_t P_t \right] \Delta t \right\} $$



Add the $L^I_t$ vector to our code by copy-pasting the following at the bottom of #Chores section:


Then, inside the # MAIN MODEL LOOP section...

(1) Add the following underneath the statement were we created L_P[t] (i.e. Eq. 13):


(2) replace the "old" phytoplankton equation (i.e. Eq. 9) for the new equation that includes the light limitation:


...and

(3) replace the "old" nutrients equation (i.e. Eq. 10) for the new equation that includes the light limitation:


When you are done inserting the code above to your NPZmodel.r file, click to run your model. You should see a plot like the one below. Note that Phytoplankton only grows during the day, thus causing a "steps" pattern in the P and N time-series.

Two-year simulation

In the graph above, the daily sunlight cycle is very apparent. However, the yearly cycle is not. Let's do a two-year simulation...

In the # Framework section, locate the line where we create the days variable, which sets the duration of the simulation. Then change it to 2 years by doing the following:


When you are done adjusting the variable days in your NPZmodel.r file, click to run your model. You should see a plot like the one below. Note the complexity of the ecosystem dynamics reproduced by this numerical model!


In year 2 (i.e days 366 to 731)... Can you see the large "spike" in phytoplankton (~ day 420) followed by another smaller spike in phytoplankton at ~ day 580?

The spikes in P are consistent with the "spring bloom" and the "fall bloom"... which are known plankton dynamics in temperate regions. Our model somewhat mimics real-life dynamics!

You should probably also take a quick look at the two-year "virtual sunlight" make making a quick plot by typing the following in the :


You should have get a plot like the one below.

The is a simple line plot. However, because the line goes down to zero every night, and the days are so close together, the area looks filled with black. This emphasizes the yearly sunlight cycle, but it makes it impossible to see the diurnal cycles.

Note that from the point of view of light, the first summer roughly peaked at around day 120 and the second summer peaked at around day 480. The two winters peaked at around days 280 and 650, respectively.


Take a look at the two-year time-series of P, N and Z (i.e. two plots above). Why do you think nutrients (N) are so high during the winters (e.g. day 300)?


The code in your NPZmodel.r file should look like the checkpoint below:

CHECKPOINT 5

Plotting limiting functions

Limiting functions are small "mini-equations" that are used to impose the effect of an environmental variable (e.g. light, temperature, food availability) on the growth of a species, like phytoplankton.

In this model we quietly made and used 3 limiting functions, however we have neglected to talk about it... until now.

The 3 limiting functions are:

When the value of a limitation is equal to 1, the growth of the species it limits is unconstrained (i.e. species grows as fast as it can). As the value of the limitation decreases, the constraints increases and growth slows down. Finally, when the value of the limitation reaches zero 0, growth of the species completely stops.

Let's plot the time-series of our The 3 limiting functions.

First, we need to add the 3 limiting functions to our output data frame. Replace the # For cleanliness... section in your NPZmodel.r file, for the code below, which includes the additional 3 limiting functions:


Second, add the following code to the very bottom of your NPZmodel.r file (i.e. at the bottom of the # Plotting section):

When you are done inserting the code above in your NPZmodel.r file, click to run your model. You should see a plot like the one below:

The blue line is the limitation on zooplankton growth because of low food (i.e. phytoplankton). The other two lines are limitations on phytoplankton growth, the cyan is due to low light, and the red is due to low nutrients.

What happens when there ate 2 or more limitation constraining the growth of one species? This is the case of, phytoplankton and, in this case, at any given time the limitation with the lowest value drives the bulk of the observed dynamics.


In the plot above, around day 300, what environmental variable that was the most limiting to the growth of phytoplankton?


In the plot above, around day 420, what environmental variable that was the most limiting to the growth of phytoplankton?


The code in your NPZmodel.r file should look like the checkpoint below:

CHECKPOINT 6

Functions explained

By now you are probably very comfortable using R functions like max(), print(), head(), plot(), etc. However, to fully unlock the power of R, you need to know how to make your own functions!

You want to do your own function to:

  1. Avoid repeating yourself (i.e. efficient coding): when you see that you are using a section of your code over and over again, you may be better of getting that section of code into a function. Then, afterwards, you will only need to write one line of code (i.e. call your function), rather than writing the whole section again.

  2. Protect your code: If you want to make sure that you do not modify your code by mistake (like your NPZ model from above), you can put your code in a function (or set of functions) and then you can simply call those functions, ensuring that your original code is left intact. We'll explain more on this in the next section.

Regardless of motive, the syntax to write a function is shown below:

functionName <- function(arguments) {

  # do a thing

  return(output)
}

Let's do an example...

Imagine that you have to do some code where you have to convert multiple temperatures from Fahrenheit to Celsius. Rather than doing the math every time you need to do a conversion, you can make a simple conversion function and use it every time you need to do a conversion.

We can name the conversion function fahr_to_celsius. For now, you can simply copy-paste the following into the and click [enter] to load your new function into memory, so that we can use it later.


Now that your new fahr_to_celsius() function is loaded in memory, we can take it for a spin:


Named arguments and defaults

In the fahr_to_celsius() function above, the argument temp is a ordered argument. If you add a second argument (e.g. decimals to specify the number of decimals in the output), R will understand that the first argument will be always be temp and the second will always be decimals.

If you want to add a named argument instead, you need to use the = operator and include a default.

Below is the same function as above but with named arguments. Note that we included the additional decimals argument. Also note that we chose a default temp of 29.6°F (which is the freezing temperature of seawater @ 24 ppt) and a default decimals of 3:


Now that your updated fahr_to_celsius() function is loaded in memory, we can take it for a spin WITHOUT any arguments (i.e. function will use its defaults):


Try changing the number of decimals in the output:


Changing both, temp and decimals:


Note that we can still use the arguments as ordered arguments by not specifying their argument names:


In the case above, R uses the order in which the arguments were written in the function to interpret the function call. When possible, you should avoid using ordered arguments! When someone else (or your future you) try to read your code, having named arguments makes it so much easier to understand what every argument is doing.

Try to only include named arguments in the functions you make. Also try to use name arguments every time you use any function that allows them.

Loading functions from a different file

You can write a function in one file, and then use it in a different file, with help of the source() function.





In fact, this is a great way to write and use functions. Keeping your functions in a file separate from the code where you use them make it easier to share functions among many projects.

In the section below we'll do an example that uses source() function.

Testing and documenting

It’s important to both test functions and document them: Documentation helps you, and others, understand what the purpose of your function is, and how to use it, and its important to make sure that your function actually does what you think.

When you first start out, your workflow will probably look a lot like this:

  1. Write a function
  2. Comment parts of the function to document its behaviour
  3. Load in the source file
  4. Experiment with it in the console to make sure it behaves as you expect
  5. Make any necessary bug fixes
  6. Rinse and repeat.

Formal documentation for functions, written in separate .Rd files, gets turned into the documentation you see in help files. The roxygen2 package allows R coders to write documentation alongside the function code and then process it into the appropriate .Rd files. You will want to switch to this more formal method of writing documentation when you start writing more complicated R projects.

Formal automated tests can be written using the testthat package. Additionally, Travis CI is an online service that integrates testing with github. Automated tests are designed to run every time someone make a change to the code, to ensure that no part of the code got broken by mistake. While automated testing is beyond the scope of this course, you are encourage to always test your code to ensure that it is doing what it is supposed to do.

Real-life example: NPZ model (part 2)

In this section we will "pack" the NPZ model that we did earlier into a set of functions.

The idea here is to separate "the model" from "the experiment". Until now every time you want to do an experiment you need to go back to your model code and "tweak" the Framework, Parameters and Initial conditions sections. There are many problems arising from running models this way:

  1. Easy to make errors. Imagine you want to run a few experiments changing only one parameter in each run. Now imagine that in one run you forget to turn the previously changed parameter to its original default value. From then on, your model results will be wrong. However, by the time you figure this out, it may be hard (or impossible) to remember how many model runs ago you forgot to return the parameters to the default.

  2. Hard to keep track on model inputs. Imagine you need to do 100 model runs and you need to keep track on what inputs (parameters, initial conditions, etc.) you used in each run. For this you would need to copy paste your inputs in a different file, run the model and save the results elsewhere and then keep tabs on what file corresponds to what results. Any matching error would be hard to detect and hard to fix.

Ideally, you want to open an experiment file, write something like:

output = run_model(days=20, mu=0.6)

...and that is it! You ran you model, get your results (in output) and know exactly what parameters went into the model (in this case, all defaults except days=20 and mu=0.6). Running your model is a lot easier, and your original model code is safe in a file that you did not touch. This last part is essential! You worked really hard developing, debugging and testing your model; once you are done model development, you really want to "seal" your model code in a file and never tinker with it again!, just to make sure you don't accidentally add or remove something that will break the model.

Making a "functions" file

In RStudio, create a new R script, name it my_model.r and make sure you save it in Desktop\Lab7. In this file we will make functions that will contain your model.

We will first add an empty "template" of the functions that we will use:


As you can see, we will have two functions in our file:

  1. The first function will run the model and return model output in a variable that conveniently called output.
  2. The second function will take the model's output and use it to print graphs.

Splitting our code in these two functions allows you to run models without making plots, and also to make plots without having to run the model every time (this assumes you ran the model at least once earlier and save the output).

Inserting your NPZ model inside a function

It is now time to copy-paste the model from your NPZmodel.r file, into your functions file, my_model.r.

Let's start with the Framework, Parameters and other items in the "User settings" section. We need to add all of these as named arguments of our function:

When you finish, your code in my_model.r should look like this:


The next step is to copy-paste the rest of the model from your NPZmodel.r file, into your functions file, my_model.r. This includes from the # Chores section to the end of the # For cleanliness section. At the end, your my_model.r should look like this:


Let's test it!

In your my_model.r, click . This loaded your functions into memory. Now you can run your model by typing in the the following


The code above ran your model and save the output into the myoutput variable. Let's take a look at myoutput:


Inserting your plotting code inside a function

The last part we need to do is to copy-paste the # Plotting section from your NPZmodel.r file, into your plot_run function in your my_model.r functions file. This time around, we'll have to do the following modifications to be able to output our graphs out of the function:

  1. save the graph from the first plot into a variable named plot1
  2. save the graph from the second plot into a variable named plot2
  3. include list(plot1, plot2) in the return of plot_run

Below is the plot_run function after all the above modifications. Note that, for clarity, I am not showing the first part of the code, the run_model function (see the checkpoint below to see the whole code).

Let's test it!

In your my_model.r, click . This loaded your functions into memory. Now you can run your model by typing in the the following


You should have got the two graphs above.

Commenting your function

While we have been commenting our code all along, you should always include an initial set of comments to give readers a quick introduction of what your code does, and information on arguments, inputs and outputs. Also, this is a good place to include credits.

Insert the following comments below the first couple of lines of your run_model function (i.e. below the first {):


Insert the following comments below the first couple of lines of your plot_run function (i.e. below the first {):


That it is! Your model is finished!!

The code in your my_model.r file should look like the checkpoint below:

CHECKPOINT 7


How many functions are coded in your my_model.r file?


How many arguments are in your run_model function?


How many arguments are in your plt_run function?

Modelling experiment 1

Now that we have our model in a function, running modelling experiments is very easy!

Consider the following question?

What will happen to the concentration of phytoplankton in an embayment if the phytoplankton growth rate increases from 0.5 to 0.6?

To solve this, we need to do two model runs, one with mu=0.5 and another run with mu=0.6 (all other parameters set to be identical in both model runs). Then we can plot the results of the two models side by side to see the difference.

Let's do the experiment:

In RStudio, create a new R script, name it model_experiments.r and make sure you save it in Desktop\Lab7. Copy-paste the experiment code below into your new model_experiment.r file, and click :


Overall, you can see how the phytoplankton growing at mu=0.6 (i.e. blue line) grew more that the phytoplankton growing at mu=0.5 (i.e. red line). However, with our model results we can quantify the differences!


What was the maximum phytoplankton concentration, P, in the experiment with mu = 0.5?


What was the maximum phytoplankton concentration, P, in the experiment with mu = 0.6?


What was the average phytoplankton concentration, P, over the whole year in the experiment with mu = 0.5?


What was the average phytoplankton concentration, P, over the whole year in the experiment with mu = 0.6?


How many phytoplankton blooms were observed over the whole year in the experiment with mu = 0.5?


How many phytoplankton blooms were observed over the whole year in the experiment with mu = 0.6?

Modelling experiment 2

Imagine that you fill a pond with natural seawater, then you want to see what happens to the average concentration of phytoplankton over the year. Ponds are notorious for been very sensitive to initial conditions; that is, the initial concentrations of Nutrients, Phytoplankton and Zooplankton when you fill the pond, have a big impact on what is going to happened in the pond over the year.

What do you think would happen to the average concentration of phytoplankton if the initial concentration of zooplankton happens to be very high?

It is easy to think, given the same initial conditions on Phytoplankton and Nutrients, the more zooplankton to you start with, the lower the average concentration of phytoplankton over the year. Right?

We'll as usual with nature, things are often not linear and quite complex. Luckily, we have an NPZ model to help us with this question?

Let's run 10 models with increasing concentration of Zinitial, while leaving all other parameters (including Pinitial and Ninitial) the same. This could take quite a bit of coding, unless you use a FOOR LOOP!



In experiment 2, what happened to the average concentration of phytoplankton (mean_P) with increasing concentrations of Zinitial?

This is the end of lab


Code below is for formatting of this lab. Do not alter!