LAB 10: Mapping and geospatial analysis: Part 2

BIO3782: Biologist's Toolkit (Dalhousie University)


Setup of workspace

Make sure all required files are in the working directory:

As in previous labs, we'll try simulate "real-life" coding, by using the tags below to indicate when to use RStudio's or .







Make sure you check your data, using both summaries (i.e. str(), dim()) and raw data, when you import/load it and after any manipulation.


We will be moving from working with raster data to working with vector data.

What is vector data?

Vector data provide a way to represent real world features within the GIS environment. A feature is anything you can see on the landscape. Imagine you are standing on the top of a hill. Looking down you can see houses, roads, trees, rivers, and so on. Each one of these things would be a feature when we represent them in a GIS Application.

Vector data is comprised of lines or arcs, defined by beginning and end points, which meet at nodes. The locations of these nodes and the topological structure are usually stored explicitly. Features are defined by their boundaries only and curved lines are represented as a series of connecting arcs. Vector storage involves the storage of explicit topology, which raises overheads, however it only stores those points which define a feature and all space outside these features is 'non-existent'.

A vector feature has its shape represented using geometry. The geometry is made up of one or more interconnected vertices. A vertex describes a position in space using an X, Y and optionally z axis. Vector representation can be in one of the following formats:

Now, we will open and plot point, line and polygon vector data stored in shapefile format in R.

Import shapefiles

We will use the sf package to work with vector data in R. We will also use the raster package so we can explore raster and vector spatial metadata using similar commands. Let's install and load the required packages.




Then...


The shapefiles that we will import are:

  1. A polygon shapefile representing our field site boundary,
  2. A line shapefile representing roads, and
  3. A point shapefile representing the location of the Fisher flux tower located at the NEON Harvard Forest field site.

The first shapefile that we will open contains the boundary of our study area (Area Of Interest or AOI). To import shapefiles we use the sf function st_read(). st_read() requires the file path to the shapefile.

Let’s import our AOI:


Shapefile metadata and attributes

When we import the HarClip_UTMZ18 shapefile layer into R, the st_read() function automatically stores information about the data. We are particularly interested in the geospatial metadata, describing the format, CRS, extent, and other components of the vector data, and the attributes which describe properties associated with each individual vector object.

Spatial metadata

Key metadata for all shapefiles include:

We can view shapefile metadata using the st_geometry_type(), st_crs(), and st_bbox() functions. First, let’s view the geometry type for our AOI shapefile:


aoi_boundary_HARV is a polygon object. The 18 levels shown list the possible categories of the geometry type.

Now let’s check what CRS this file data is in:


Our data is in CRS UTM zone 18N. The CRS is critical to interpreting the object’s extent values as it specifies units. To find the extent of our AOI, we can use the st_bbox() function.




What do spatial extents represent?

We can also view all of the metadata and attributes for this shapefile object by printing it to the screen.


Plotting a shape file

Let’s visualize the data in our sf object using the ggplot package. Unlike with raster data, we do not need to convert vector data to a dataframe before plotting with ggplot.

We’re going to customize our boundary plot by setting the size, color, and fill for our plot. When plotting sf objects with ggplot2, you need to use the coord_sf() coordinate system.





Using the steps above, import the HARV_roads and HARVtower_UTM18N layers into R. Assign the object names as follows:
HARV_roads = lines_HARV
HARVtower_UTM18N = point_HARV
Answer the following questions in Brightspace.






What type of R spatial object is created when you import each layer?



What is the xmin for lines_HARV?



What is the ymin for point_HARV?



Does point_HARV contain lines, points or polygons?



Does lines_HARV contain lines, points or polygons?



How many spatial objects are in point_HARV?



How many spatial objects are in lines_HARV?

Query vector feature metadata

We started to explore our point_HARV object in the previous episode. To see a summary of all of the metadata associated with our point_HARV object, we can view the object with View(point_HARV) or print a summary of the object itself to the console.


We can use the ncol function to count the number of attributes associated with a spatial object too.

The geometry is just another column and counts towards the total, resulting in 16 columns instead of 15 (as above).


We can view the individual name of each attribute using the names() function in R


The head() function is also useful for getting a preview of the data.




How many attributes does point_HARV have?



Who owns the site in the point_HARV object?



Which of the following is not an attribute of the point_HARV object?

Latitude
County
Country

Explore values within one attribute

We can explore individual values stored within a particular attribute. Comparing attributes to a spreadsheet or a data frame, this is similar to exploring values in a column. For spatial objects, we can use the syntax objectName$attributeName.

For example, we can see the contents of the TYPE field of our lines feature.


To see only unique values within the TYPE field, we can use the levels() function for extracting the possible values of a categorical variable.


We can also apply functions from dplyr to select a subset of features from a spatial object in R, just like with data frames.

For example, we might be interested only in features that are of TYPE “footpath”. Once we subset out this data, we can use it as input to other code so that code only operates on the footpath lines.


Our subsetting operation reduces the features count to 2. This means that only two feature lines in our spatial object have the attribute TYPE == footpath. Let's take a look at what that means graphically.


We know that there are two features in our footpaths subset but our plot only shows one feature. Let’s adjust the colors used in our plot. If we have 2 features in our vector object, we can plot each using a unique color by assigning a column name to the color aesthetic.




How many features are there in TYPE == "boardwalk"?



How many features are there in TYPE == "stone wall"?

Sometimes, changing some of the ggplot aesthetic features can communicate our data better. Let's play around with the dataset by applying a unique color scheme for each road type.



What type of variable is TYPE?



What happens when you use the levels function on a character vector?

First, let's see how many road types there are in the data. We will have to coerce TYPE into a factor.


Let's pick four colors, one for each level in our vector object then plot the data.




Can spatial object be manipulated using the same functions as a dataframe?



What function allows ggplot to plot spatial objects directly?



TRUE or FALSE: Shape files contain data on extents



TRUE or FALSE: Shape files DO NOT contain CRS data.

Plotting multiple shapefiles

Now, let’s create a plot that combines our tower location (point_HARV), site boundary (aoi_boundary_HARV) and roads (lines_HARV) spatial objects. We will need to build a custom legend as well.

To begin, we will create a plot with the site boundary as the first layer. Then layer the tower location and road data on top using +.


Next, let’s build a custom legend using the symbology (the colors and symbols) that we used to create the plot above. For example, it might be good if the lines were symbolized as lines. If you want the legend to draw lines or points, you need to add an instruction to the geom_sf call - in this case, show.legend = 'line'.


Plotting raster and vector data together

We can plot vector data layered on top of raster data using the + to add a layer in ggplot. We will create a plot that uses the NEON AOI Canopy Height Model CHM_HARV_df as a base layer.

First let's load up the HARV data from NEON-DS-Airborne-Remote-Sensing files and re-create the CHM_HARV_df we used in the previous lab.


Next, let's plot all the data together staring with CHM_HARV_df.




What operator can add multiple layers to ggplot?

Working with spatial data from different sources

We often need to gather spatial datasets from different sources and/or data that cover different spatial extents. These data are often in different Coordinate Reference Systems (CRSs).

Some reasons for data being in different CRSs include:

  1. The data are stored in a particular CRS convention used by the data provider (for example, a government agency).
  2. The data are stored in a particular CRS that is customized to a region. For instance, many states in the US prefer to use a State Plane projection customized for that state.



Notice the differences in shape associated with each different projection. These differences are a direct result of the calculations used to “flatten” the data onto a 2-dimensional map. Often data are stored purposefully in a particular projection that optimizes the relative shape and size of surrounding geographic boundaries (states, counties, countries, etc).

We will use the st_read() function to import the /US-Boundary-Layers/US-State-Boundaries-Census-2014 layer into R. This layer contains the boundaries of all contiguous states in the U.S. Please note that these data have been modified and reprojected from the original data downloaded from the Census website.


Let's take a look at these values in a plot.


We can add a boundary layer of the United States to our map - to make it look nicer. We will import NEON-DS-Site-Layout-Files/US-Boundary-Layers/US-Boundary-Dissolved-States and re-create the previous plot.


Next, let’s add the location of a flux tower where our study area is. As we are adding these layers, take note of the CRS of each object. First let’s look at the CRS of our tower location object.


Our project string for DSM_HARV specifies the UTM projection.

The zone is unique to the UTM projection. Not all CRSs will have a zone.

Let’s check the CRS of our state and country boundary objects.




What projection does state_boundary_US use?



What units of measure does country_boundary_US use?

Next, let’s view the extent or spatial coverage for the point_HARV spatial object compared to the state_boundary_US object.

First we’ll look at the extent for our study site.


And then the extent for the state boundary data.




Why are the extents of state_boundary_US smaller than point_HARV?

Previously when working with raster data in different CRSs, we needed to convert all objects to the same CRS. We can do the same thing with our vector data, however, we don’t need to! When using the ggplot2 package, ggplot automatically converts all objects to the same CRS before plotting. This means we can plot our three data sets together without doing any conversion.




Do we need to convert all CRS to be the same before plotting with ggplot?

Spatial data in text format

The HARV_PlotLocations.csv file contains point locations for study plot where NEON collects data on vegetation and other ecological metrics. We can do the following:

  1. Create a map of these plot locations.
  2. Export the data in a shapefile format to share with our colleagues. This shapefile can be imported into any GIS software.
  3. Create a map showing vegetation height with plot locations layered on top.

Spatial data are sometimes stored in a text file format (.txt or .csv). If the text file has an associated x and y location column, then we can convert it into an sf spatial object. The sf object allows us to store both the x,y values that represent the coordinate location of each point and the associated attribute data - or columns describing each feature in the spatial object.

To begin let’s import a .csv file that contains plot coordinate x, y locations at the NEON Harvard Forest Field Site and look at the structure of that new object.




How many locations (rows) are there in plot_locations_HARV?



How many attributes are there in plot_locations_HARV?

all of our character data was imported into R as factor (categorical) data. Next, let’s explore the dataframe to determine whether it contains columns with coordinate values. If we are lucky, our .csv will contain columns labeled:

Let’s check out the column names of our dataframe.


Our column names include several fields that might contain spatial information. The plot_locations_HARV$easting and plot_locations_HARV$northing columns contain coordinate values. We can confirm this by looking at the first six rows of our data.


We have coordinate values in our data frame. In order to convert our data frame to an sf object, we also need to know the CRS associated with those coordinate values.

There are several ways to figure out the CRS of spatial data in text format.

  1. We can check the file metadata in hopes that the CRS was recorded in the data.
  2. We can explore the file itself to see if CRS information is embedded in the file header or somewhere in the data columns.


It is not typical to store CRS information in a column. But this particular file contains CRS information this way. The geodeticDa and utmZone columns contain the information that helps us determine the CRS:

To create the proj4 associated with UTM Zone 18 WGS84 we can look up the projection on the Spatial Reference website, which contains a list of CRS formats for each projection. From here, we can extract the proj4 string for UTM Zone 18N WGS84.

However, if we have other data in the UTM Zone 18N projection, it’s much easier to use the st_crs() function to extract the CRS in proj4 format from that object and assign it to our new spatial object. We’ve seen this CRS before with our Harvard Forest study site (point_HARV).




What zone is the point_HARV data in?

We can use the CRS from that spatial object to convert our non-spatial dataframe into an sf object.

Next, let’s create a crs object that we can use to define the CRS of our sf object when we create it.




What class of object is utm18nCRS?

Convert csv to sf object

To covert our dataframe into an sf object, we need to specify:

  1. The columns containing X (easting) and Y (northing) coordinate values
  2. The CRS that the column coordinate represent (units are included in the CRS) - stored in our utmCRS object.

We will use the st_as_sf() function to perform the conversion.


It's always good practice to double check the CRS to make sure it is correct.


Now that the data is in the right class and order, let's take a look at the data graphically.


Export a shapefile

We can write an R spatial object to a shapefile using the st_write function in sf. To do this we need the following arguments:

  1. the name of the spatial object (plot_locations_sp_HARV)
  2. the directory where we want to save our shapefile (to use current = getwd() or you can specify a different path)
  3. the name of the new shapefile (PlotLocations_HARV)
  4. the driver which specifies the file format (ESRI Shapefile)

We can now export the spatial object as a shapefile.




What function converts a dataframe to an sf object?



What parameter should we know about our data before converting into a spatial object?

Crop a raster to vector extent

We often work with spatial layers that have different spatial extents. The spatial extent of a shapefile or R spatial object represents the geographic “edge” or location that is the furthest north, south east and west. Thus this represents the overall geographic coverage of the spatial object.

The graphic below illustrates the extent of several of the spatial layers that we have worked with in this lab:

  1. Area of interest (AOI) – blue
  2. Roads and trails – purple
  3. Vegetation plot locations (marked with white dots)– black
  4. A canopy height model (CHM) in GeoTIFF format – green



Frequent use cases of cropping a raster file include reducing file size and creating maps. Sometimes we have a raster file that is much larger than our study area or area of interest. It is often more efficient to crop the raster to the extent of our study area to reduce file sizes as we process our data. Cropping a raster can also be useful when creating pretty maps so that the raster layer matches the extent of the desired vector layers.

Crop a raster using vector extent

We can use the crop() function to crop a raster to the extent of another spatial object. To do this, we need to specify the raster to be cropped and the spatial object that will be used to crop the raster. R will use the extent of the spatial object as the cropping boundary.

To illustrate this, we will crop the Canopy Height Model (CHM) to only include the area of interest (AOI). Let’s start by plotting the full extent of the CHM data and overlay where the AOI falls within it. The boundaries of the AOI will be colored blue, and we use fill = NA to make the area transparent.


Now that we have visualized the area of the CHM we want to subset, we can perform the cropping operation. We are going to crop() function from the raster package to create a new object with only the portion of the CHM data that falls within the boundaries of the AOI.


Now we can plot the cropped CHM data, along with a boundary box showing the full CHM extent. However, remember, since this is raster data, we need to convert to a data frame in order to plot using ggplot. To get the boundary box from CHM, the st_bbox() will extract the 4 corners of the rectangle that encompass all the features contained in this object.


The plot above shows that the full CHM extent (plotted in green) is much larger than the resulting cropped raster. Our new cropped CHM now has the same extent as the aoi_boundary_HARV object that was used as a crop extent (blue border below).


Let's look at the extents of all the other objects for this site.


Our plot location extent is larger than the AOI Boundary. It would be nice to have the vegetation plot locations on top of the Canopy Height Model information. Let's crop the Canopy Height Model to the extent of the study plot locations then plot the vegetation plot location points on top of the Canopy Height Model.

Let's crop the data.


Then plot the data.


Define an extent

So far, we have used a shapefile to crop the extent of a raster dataset. Alternatively, we can also use the extent() function to define an extent to be used as a cropping boundary. This creates a new object of class extent. Here we will provide the extent() function our xmin, xmax, ymin, and ymax (in that order).


Once we have defined our new extent, we can use the crop() function to crop our raster to this extent object then convert it to a dataframe.


Now we can plot the data.


Extract raster pixels values using vector polygons

Often we want to extract values from a raster layer for particular locations - for example, plot locations that we are sampling on the ground. We can extract all pixel values within 20m of our x,y point of interest. These can then be summarized into some value of interest (e.g. mean, maximum, total).

The extract() function can do this and requires:

  1. The raster that we wish to extract values from,
  2. The vector layer containing the polygons that we wish to use as a boundary or boundaries,

We will begin by extracting all canopy height pixel values located within our aoi_boundary_HARV polygon which surrounds the tower located at the NEON Harvard Forest field site.


When we use the extract() function, R extracts the value for each pixel located within the boundary of the polygon being used to perform the extraction - in this case the aoi_boundary_HARV object (a single polygon). Here, the function extracted values from 18,450 pixels.

We can create a histogram of tree height values within the boundary to better understand the structure or height distribution of trees at our site. We will use the column layer from our data frame as our x values, as this column represents the tree heights for each pixel.


The summary() function can be helpful for viewing descriptive statistics including min, max, and mean height values. These values help us better understand vegetation at our field site.


Summarize extracted raster values

We often want to extract summary values from a raster. We can tell R the type of summary statistic we are interested in using the fun = argument. Let’s extract a mean height value for our AOI. Because we are extracting only a single number, we will not use the df = TRUE argument.




What is the the mean height value, extracted from our LiDAR data derived canopy height model (mean_tree_height_AOI)?

Extract data using x,y locations

We can also extract pixel values from a raster by defining a buffer or area surrounding individual point locations using the extract() function. To do this we define the summary argument (fun = mean) and the buffer distance (buffer = 20) which represents the radius of a circular region around each point. By default, the units of the buffer are the same units as the data’s CRS. All pixels that are touched by the buffer region are included in the extract.

Let’s put this into practice by figuring out the mean tree height within a radius of 20m around the tower location (point_HARV). Because we are extracting only a single number, we will not use the df = TRUE argument.




What is the median tree height within a radius of 20 m around the tower location (point_HARV)?



What does the crop() function do?



What function extracts pixels from a raster object that fall within a particular extent boundary?



What function defines an extent?

This is the end of lab


Code below is for formatting of this lab. Do not alter!