# Lab settings - please ingnore
options(repr.plot.width=7, repr.plot.height=4, repr.plot.res=250 ) # Make plots a resonable size
BIO3782: Biologist's Toolkit (Dalhousie University)
Make sure all required files are in the working directory:
Desktop/
| └───Lab9/
| └───raw_data/
| └───NEON-DS-Airborne-Remote-Sensing/
| └───HARV/
| | └───CHM/
| | | HARV_chmCrop.tif
| | └───DSM/
| | | HARV_dsmCrop.tif
| | | HARV_DSMhill.tif
| | └───DTM/
| | | HARV_dtmCrop.tif
| | | HARV_DTMhill.tif
| | └───RBG_Imagery/
| | | HARV_Ortho_wNA.tif
| | | HARV_RGB_metadata.txt
| | HARV_RGB_Ortho.tif
| └───SJER/
| └───DSM/
| | SJER_dsmCrop.tif
| | SJER_dsmCrop.tif.aux.xml
| | SJER_dsmHill.tif
| | SJER_dsmHill.tif.aux.xml
| | SJER_DSMhill_WGS84.tif
| | SJER_DSMhill_WGS84.tif.aux.xml
| └───DTM/
| SJER_dtmCrop.tif
| SJER_dtmCrop.tif.aux.xml
| SJER_dtmHill.tif
| SJER_dtmHill.tif.aux.xml
In RStudio, change the "working directory" to: Desktop\Lab9. Click here if you need a refresher on the working directory
In RStudio, create a new R script, name it lab9.r and make sure you save it in Desktop\Lab9. You will be writting and copy-pasting code to your lab9.r file so that you can keep a record of all you did in this lab.
As in previous labs, we'll try simulate "real-life" coding, by using the tags below to indicate when to use RStudio's or
Make sure you check your data, using both summaries (i.e. str(), dim()) and raw data, when you import/load it and after any manipulation.
In its simplest form, a raster consists of a matrix of cells (or pixels) organized into rows and columns (or a grid) where each cell contains a value representing information (e.g. temperature, land use, elevation, etc). The spatial distribution of the cell values represent the phenomenon portrayed by the raster dataset. Often raster datasets contain many layers, each displaying different information. Each raster layer can be of one of two types (1) a category layer or (2) a magnitude layer. Category layers could be a land-use class such as grassland, forest, or road. A magnitude layer might represent gravity, noise pollution, or percent rainfall, surface temperature, etc. There are two very common magnitude layers that often are refered with their unique names: (a) height and (b) spectral value. Height (distance) represents surface elevation above mean sea level, which can be used to derive slope, aspect, and watershed properties. Spectral values are used in satellite imagery and aerial photography to represent light reflectance and color.
Cell values can be either positive or negative, integer, or floating point. Integer values are best used to represent categorical (discrete/thematic) data and floating-point values to represent continuous surfaces.
The major use of raster data involves storing map information as digital images, in which the cell values relate to the pixel colours of the image. To reproduce the image the computer reads each of these cell values one by one and applies them to the pixels on the screen.
Rasters are stored as an ordered list of cell values, for example, 80, 74, 62, 45, 45, 34, and so on. The area (or surface) represented by each cell consists of the same width and height and is an equal portion of the entire surface represented by the raster. For example, a raster representing elevation (that is, digital elevation model) may cover an area of 100 square kilometers. If there were 100 cells in this raster, each cell would represent 1 square kilometer of equal width and height (that is, 1 km x 1 km).
The dimension of the cells can be as large or as small as needed to represent the surface conveyed by the raster dataset and the features within the surface, such as a square kilometer, square foot, or even square centimeter. The cell size determines how coarse or fine the patterns or features in the raster will appear. The smaller the cell size, the smoother or more detailed the raster will be. However, the greater the number of cells, the longer it will take to process, and it will increase the demand for storage space. If a cell size is too large, information may be lost or subtle patterns may be obscured. For example, if the cell size is larger than the width of a road, the road may not exist within the raster dataset. The location of each cell is defined by the row or column where it is located within the raster matrix. Essentially, the matrix is represented by a Cartesian coordinate system, in which the rows of the matrix are parallel to the x-axis and the columns to the y-axis of the Cartesian plane. Row and column values begin with 0. In the example below, if the raster is in a Universal Transverse Mercator (UTM) projected coordinate system and has a cell size of 100, the cell location at 5,1 would be 300,500 East, 5,900,600 North. In the diagram below, you can see how this simple polygon feature will be represented by a raster dataset at various cell sizes.
While the structure of raster data is simple, it is exceptionally useful for a wide range of applications. Data stored in a raster format represents real-world phenomena:
Thematic and continuous rasters may be displayed as data layers along with other geographic data on your map but are often used as the source data for spatial analysis with the ArcGIS Spatial Analyst extension. Picture rasters are often used as attributes in tables—they can be displayed with your geographic data and are used to convey additional information about map features. You can learn more about thematic and continuous data here.
While the structure of raster data is simple, it is exceptionally useful for a wide range of applications. Within a GIS, the uses of raster data fall under four main categories:
The advantages of storing your data as a raster are as follows:
There are other considerations for storing your data as a raster that may convince you to use a vector-based storage option.
We will continue to work with the dplyr
and ggplot2
packages. We will use two additional packages in this episode to work with raster data - the raster
and rgdal
packages. Let's install the new packages.
install.packages(c('raster', 'rgdal'))
Now let's load all the requried packages.
library(raster)
library(rgdal)
library(ggplot2)
library(dplyr)
We will be working with a series of GeoTIFF files in this lesson. The GeoTIFF format contains a set of embedded tags with metadata about the raster data. We can use the function GDALinfo()
to get information about our raster data before we read that data into R. It is ideal to do this before importing your data.
GDALinfo("raw_data/NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_dsmCrop.tif")
rows 1367 columns 1697 bands 1 lower left origin.x 731453 lower left origin.y 4712471 res.x 1 res.y 1 ysign -1 oblique.x 0 oblique.y 0 driver GTiff projection +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs file raw_data/NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_dsmCrop.tif apparent band summary: GDType hasNoDataValue NoDataValue blockSize1 blockSize2 1 Float64 TRUE -9999 1 1697 apparent band statistics: Bmin Bmax Bmean Bsd 1 305.07 416.07 359.8531 17.83169 Metadata: AREA_OR_POINT=Area
Let's store this information in an object called HARV_dsmCrop_info
. Each line of text that was printed to the console is now stored as an element of the character vector HARV_dsmCrop_info
.
# Load Harvard Forest metadata information
HARV_dsmCrop_info <- capture.output(
GDALinfo("raw_data/NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_dsmCrop.tif")
)
We can use the raster()
function to open a raster in R. To improve code readability, file and object names should be used that make it clear what is in the file. The data were collected from Harvard Forest, so we’ll use a naming convention of datatype_HARV
.
First we will load our raster file into R and view the data structure.
# Load Harvard Forest raster
DSM_HARV <-
raster("raw_data/NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_dsmCrop.tif")
DSM_HARV
class : RasterLayer dimensions : 1367, 1697, 2319799 (nrow, ncol, ncell) resolution : 1, 1 (x, y) extent : 731453, 733150, 4712471, 4713838 (xmin, xmax, ymin, ymax) crs : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs source : C:/Users/Diego/Documents/6.Biologist's Toolkit/25Feb/biol3782/week9/raw_data/NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_dsmCrop.tif names : HARV_dsmCrop values : 305.07, 416.07 (min, max)
The information above includes a report of min and max values, but no other data range statistics. Similar to other R data structures like vectors and data frame columns, descriptive statistics for raster data can be retrieved.
summary(DSM_HARV)
Warning message in .local(object, ...): "summary is an estimate based on a sample of 1e+05 cells (4.31% of all cells) "
HARV_dsmCrop | |
---|---|
Min. | 305.5500 |
1st Qu. | 345.6500 |
Median | 359.6450 |
3rd Qu. | 374.2825 |
Max. | 413.9000 |
NA's | 0.0000 |
Unless you force R to calculate these statistics using every cell in the raster, it will take a random sample of 100,000 cells and calculate from that instead. To force calculation on more, or even all values, you can use the parameter maxsamp
summary(DSM_HARV, maxsamp = ncell(DSM_HARV))
HARV_dsmCrop | |
---|---|
Min. | 305.07 |
1st Qu. | 345.59 |
Median | 359.67 |
3rd Qu. | 374.28 |
Max. | 416.07 |
NA's | 0.00 |
Right now, the data is still in raster format. To be able to manipulate it more easily, let's convert it to a dataframe.
# Store Harvard Forest raster in a data frame
DSM_HARV_df <- as.data.frame(DSM_HARV, xy = TRUE)
When we view the structure of our data, we will see a standard dataframe format.
str(DSM_HARV_df)
'data.frame': 2319799 obs. of 3 variables: $ x : num 731454 731455 731456 731457 731458 ... $ y : num 4713838 4713838 4713838 4713838 4713838 ... $ HARV_dsmCrop: num 409 408 407 407 409 ...
Let's take a quick look at the data. We'll use scale_fill_viridis_c
to fill countour colors and coord_quickmap
to approximate Mercator projection for our plots. This approximation is suitable for small areas that are not too close to the poles.
# Quick map
ggplot() +
geom_raster(data = DSM_HARV_df , aes(x = x, y = y, fill = HARV_dsmCrop)) +
scale_fill_viridis_c() +
coord_quickmap()+
theme_classic()
This map shows the elevation of our study site in Harvard Forest. From the legend, we can see that the maximum elevation is ~400, but we can’t tell whether this is 400 feet or 400 meters because the legend doesn’t show us the units. We can look at the metadata of our object to see what the units are. Much of the metadata that we’re interested in is part of the Coordinate Reference System (CRS).
The Coordinate Reference System or CRS of a spatial object tells R where the raster is located in geographic space. It also tells R what mathematical method should be used to “flatten” or project the raster in geographic space.
Data from the same location but saved in different coordinate references systems will not line up in any GIS or other program. It’s important when working with spatial data to identify the coordinate reference system applied to the data and retain it throughout data processing and analysis.
Now we will see how features of the CRS appear in our data file and what meanings they have. We can view the CRS string associated with our R object using the crs()
function.
The CRS for our data is given to us by R in proj4
format. Let’s break down the pieces of proj4
string. The string contains all of the individual CRS elements that R or another GIS might need. Each element is specified with a +
sign, similar to how a .csv
file is delimited or broken up by a ,
. After each +
we see the CRS element being defined. For example projection (proj=
) and datum (datum=
).
Our projection string for DSM_HARV
is specified as follows:
crs(DSM_HARV)
CRS arguments: +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
The zone is unique to the UTM projection. Not all CRSs will have a zone.
*The UTM zones across the continental United States. Source: Chrismurf, wikimedia.org.
It is useful to know the minimum or maximum values of a raster dataset. In this case, given we are working with elevation data, these values represent the min/max elevation range at our site.
Raster statistics are often calculated and embedded in a GeoTIFF. Let's take a look at the minimum
minValue(DSM_HARV)
and maximum values in our raster data
maxValue(DSM_HARV)
We can also calculate and set them using the setMinMax()
function
# Store mins and max
DSM_HARV <- setMinMax(DSM_HARV)
DSM_HARV
class : RasterLayer dimensions : 1367, 1697, 2319799 (nrow, ncol, ncell) resolution : 1, 1 (x, y) extent : 731453, 733150, 4712471, 4713838 (xmin, xmax, ymin, ymax) crs : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs source : C:/Users/Diego/Documents/6.Biologist's Toolkit/25Feb/biol3782/week9/raw_data/NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_dsmCrop.tif names : HARV_dsmCrop values : 305.07, 416.07 (min, max)
We can see that the elevation at our site ranges from 305.0700073m to 416.0699768m.
The Digital Surface Model object (DSM_HARV) that we’ve been working with is a single band raster. This means that there is only one dataset stored in the raster: surface elevation in meters for one time period.
A raster dataset can contain one or more bands, meaning that one raster file contains data for more than one variable or time period for each cell. We can use the raster()
function to import one single band from a single or multi-band raster. By default the raster(
) function only imports the first band in a raster regardless of whether it has one or more bands. We can also view the number of bands in a raster using the nlayers()
function.
nlayers(DSM_HARV)
Raster data often has a NoDataValue
associated with it. This is a value assigned to pixels where data is missing or no data were collected.
By default the shape of a raster is always rectangular. So if we have a dataset that has a shape that isn’t rectangular, some pixels at the edge of the raster will have NoDataValues
. This may happens when the data were collected by an airplane which only flew over some part of a defined region.
In the image below, the pixels that are black have NoDataValues. The camera did not collect data in these areas.
In the next image, the black edges have been assigned NoDataValue
. R doesn’t render pixels that contain a specified NoDataValue
. R assigns missing data with the NoDataValue
as NA. The difference here shows up as ragged edges on the plot, rather than black spaces where there is no data.
If your raster already has NA values set correctly but you aren’t sure where they are, you can deliberately plot them in a particular colour. This can be useful when checking a dataset’s coverage. For instance, sometimes data can be missing where a sensor could not ‘see’ its target data, and you may wish to locate that missing data and fill it in.
The value that is conventionally used to take note of missing data (the NoDataValue value) varies by the raster data type. For floating-point rasters, the figure -3.4e+38 is a common default, and for integers, -9999 is common. Some disciplines have specific conventions that vary from these common values.
In some cases, other NA values may be more appropriate. An NA value should be a) outside the range of valid values, and b) a value that fits the data type in use. For instance, if your data ranges continuously from -20 to 100, 0 is not an acceptable NA value! Or, for categories that number 1-15, 0 might be fine for NA, but using -.000003 will force you to save the GeoTIFF on disk as a floating point raster, resulting in a bigger file.
If we are lucky, our GeoTIFF file has a tag that tells us what is the NoDataValue
. If we are less lucky, we can find that information in the raster’s metadata. If a NoDataValue
was stored in the GeoTIFF tag, when R opens up the raster, it will assign each instance of the value to NA. Values of NA will be ignored by R as demonstrated above.
Bad data values are values that fall outside of the applicable range of a dataset.
Examples of Bad Data Values:
Sometimes a raster’s metadata will tell us the range of expected values for a raster. Values outside of this range are suspect and we need to consider that when we analyze the data. Sometimes, we need to use some common sense and scientific insight as we examine the data - just as we would for field data to identify questionable values.
Plotting data with appropriate highlighting can help reveal patterns in bad values and may suggest a solution. Below, reclassification is used to highlight elevation values over 400m with a contrasting colour.
We can explore the distribution of values contained within our raster using a histogram. Histograms are often useful in identifying outliers and bad data values in our raster data.
# Historgram
ggplot() +
geom_histogram(data = DSM_HARV_df, aes(HARV_dsmCrop),bins = 40, color="black", fill="steelblue")+
theme_classic()
The distribution of elevation values for our Digital Surface Model (DSM) looks reasonable. It is likely there are no bad data values in this particular raster.
Exploring Raster Metadata
Use GDALinfo() and the file NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_DSMhill.tif to answer the following questions in Brightspace.
We viewed our data using a continuous color ramp. For clarity and visibility of the plot, we may prefer to view the data “symbolized” or colored according to ranges of values. This is comparable to a “classified” map. To do this, we need to tell ggplot
how many groups to break our data into, and where those breaks should be. To make these decisions, it is useful to first explore the distribution of the data using a bar plot. To begin with, we will use dplyr’s mutate()
function combined with cut()
to split the data into 3 bins.
# Clasifying elevation
DSM_HARV_df <- DSM_HARV_df %>%
mutate(fct_elevation = cut(HARV_dsmCrop, breaks = 3))
ggplot() +
geom_bar(data = DSM_HARV_df, aes(fct_elevation))+
theme_classic()
We can get the count and cutt off values in each group using dplyr’s group_by()
and count()
functions.
DSM_HARV_df %>%
group_by(fct_elevation) %>%
count()
fct_elevation | n |
---|---|
<fct> | <int> |
(305,342] | 418891 |
(342,379] | 1530073 |
(379,416] | 370835 |
We can also customize the cutoff values for these groups. Lets round the cutoff values so that we have groups for the ranges of 301–350 m, 351–400 m, and 401–450 m. To implement this we will give mutate()
a numeric vector of break points instead of the number of breaks we want.
Let's create a vector of cut off ranges called custom_bins
then use the cut()
and mutate()
functions to createa new column called fct_elevation_2
.
# Fine tune the clasification of elevation
custom_bins <- c(300, 350, 400, 450)
DSM_HARV_df <- DSM_HARV_df %>%
mutate(fct_elevation_2 = cut(HARV_dsmCrop, breaks = custom_bins))
ggplot() +
geom_bar(data = DSM_HARV_df, aes(fct_elevation_2))+
theme_classic()
When we assign break values a set of 4 values will result in 3 bins of data.The bin intervals are shown using ( to mean exclusive and ] to mean inclusive. For example: (305, 342] means “from 306 through 342”.
We can use those groups to plot our raster data, with each group being a different color.
# Plot clasified elevation
ggplot() +
geom_raster(data = DSM_HARV_df , aes(x = x, y = y, fill = fct_elevation_2)) +
coord_quickmap()+
theme_classic()
R has a built in set of colors for plotting terrain, which are built in to the terrain.colors()
function. Since we have three bins, we can create a 3-color palette.
# Terrain color palette
terrain.colors(3)
The terrain.colors()
function returns hex colors - each of these character strings represents a color. To use these in our map, we pass them across using the scale_fill_manual()
function.
# Map with terrain color palette
ggplot() +
geom_raster(data = DSM_HARV_df , aes(x = x, y = y,
fill = fct_elevation_2)) +
scale_fill_manual(values = terrain.colors(3)) +
coord_quickmap()+
theme_classic()
We can layer a raster on top of a hillshade raster for the same area, and use a transparency factor to create a 3-dimensional shaded effect. A hillshade is a raster that maps the shadows and texture that you would see from above when viewing terrain. We will add a custom color, making the plot grey.
First let's read in our DSM hillshade data and view the structure.
# Load hillshade
DSM_hill_HARV <-
raster("raw_data/NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_DSMhill.tif")
DSM_hill_HARV
class : RasterLayer dimensions : 1367, 1697, 2319799 (nrow, ncol, ncell) resolution : 1, 1 (x, y) extent : 731453, 733150, 4712471, 4713838 (xmin, xmax, ymin, ymax) crs : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs source : C:/Users/Diego/Documents/6.Biologist's Toolkit/25Feb/biol3782/week9/raw_data/NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_DSMhill.tif names : HARV_DSMhill values : -0.7136298, 0.9999997 (min, max)
Let's convert it to a dataframe so we can plot it with ggplot
.
# Convert to data frame
DSM_hill_HARV_df <- as.data.frame(DSM_hill_HARV, xy = TRUE)
str(DSM_hill_HARV_df)
'data.frame': 2319799 obs. of 3 variables: $ x : num 731454 731455 731456 731457 731458 ... $ y : num 4713838 4713838 4713838 4713838 4713838 ... $ HARV_DSMhill: num NA NA NA NA NA NA NA NA NA NA ...
Now we can plot the data.
# Map hillshade
ggplot() +
geom_raster(data = DSM_hill_HARV_df,
aes(x = x, y = y, alpha = HARV_DSMhill)) +
scale_alpha(range = c(0.15, 0.65), guide = "none") +
coord_quickmap()+
theme_classic()
We've created a plot in grey scale with the alpha
arguement controlling transparency (0 being transparent, 1 being opaque).
We can layer another raster on top of our hillshade by adding another call to the geom_raster()
function. Let’s overlay DSM_HARV
on top of the hill_HARV
.
# Elevation with hillshade
ggplot() +
geom_raster(data = DSM_HARV_df ,
aes(x = x, y = y,
fill = HARV_dsmCrop)) +
geom_raster(data = DSM_hill_HARV_df,
aes(x = x, y = y,
alpha = HARV_DSMhill)) +
scale_fill_viridis_c() +
scale_alpha(range = c(0.15, 0.65), guide = "none") +
ggtitle("Elevation with hillshade") +
coord_quickmap()+
theme_classic()
Sometimes we encounter raster datasets that do not “line up” when plotted or analyzed. Rasters that don’t line up are most often in different Coordinate Reference Systems (CRS).
We will be working with the Harvard Forest Digital Terrain Model data. This differs from the surface model data we’ve been working with so far in that the digital surface model (DSM) includes the tops of trees, while the digital terrain model (DTM) shows the ground level.
Here, we will create a map of the Harvard Forest Digital Terrain Model
(DTM_HARV) draped or layered on top of the hillshade
(DTM_hill_HARV). The hillshade layer maps the terrain using light and shadow to create a 3D-looking image, based on a hypothetical illumination of the ground level.
First, let's import the DTM and DTM hillshade data and convert them to dataframes.
# Import the DTM and DTM hillshade data and convert them to dataframes
DTM_HARV <- raster("raw_data/NEON-DS-Airborne-Remote-Sensing/HARV/DTM/HARV_dtmCrop.tif")
DTM_HARV_df <- as.data.frame(DTM_HARV, xy = TRUE)
DTM_hill_HARV <- raster("raw_data/NEON-DS-Airborne-Remote-Sensing/HARV/DTM/HARV_DTMhill_WGS84.tif")
DTM_hill_HARV_df <- as.data.frame(DTM_hill_HARV, xy = TRUE)
Now we can create a map of the DTM layered over the hillshade.
ggplot() +
geom_raster(data = DTM_HARV_df ,
aes(x = x, y = y,
fill = HARV_dtmCrop)) +
geom_raster(data = DTM_hill_HARV_df,
aes(x = x, y = y,
alpha = HARV_DTMhill_WGS84)) +
scale_fill_gradientn(name = "Elevation", colors = terrain.colors(10)) +
coord_quickmap()+
theme_classic()
Notice that nothing got plotted. Let’s try to plot the DTM on its own to make sure there are data there.
ggplot() +
geom_raster(data = DTM_HARV_df,
aes(x = x, y = y,
fill = HARV_dtmCrop)) +
scale_fill_gradientn(name = "Elevation", colors = terrain.colors(10)) +
coord_quickmap()+
theme_classic()
Now, let’s try to plot the DTM Hillside data on its own to make sure there are data there.
ggplot() +
geom_raster(data = DTM_hill_HARV_df,
aes(x = x, y = y,
alpha = HARV_DTMhill_WGS84)) +
coord_quickmap()+
theme_classic()
Notice that the axis values are different. When this is the case, ggplot
won’t render the image or throw an error message to tell you something has gone wrong. We'll have to look at Coordinate Reference Systems (CRSs) of the DTM and the hillshade data to see how they differ.
If they have different projections, we need to reproject (change the projection of) one to match the other.
How do DTM_HARV and DTM_hill_HARV differ?
Use the CRS of each raster object to answer the following questions in Brightspace.
We can use the projectRaster()
function to re-project a raster into a new CRS. Keep in mind that re-projection only works when you first have a defined CRS for the raster object that you want to re-project. It cannot be used if no CRS is defined.
When we re-project a raster, we move it from one “grid” to another. Thus, we are modifying the data! Keep this in mind as we work with raster data.
To use the projectRaster()
function, we need to define two things:
We want the CRS of our hillshade
to match the DTM_HARV
raster. We can thus assign the CRS of our DTM_HARV
to our hillshade
within the projectRaster()
function as follows:
crs = crs(DTM_HARV)
We are using the projectRaster()
function on the raster object, not the data.frame()
we use for plotting with ggplot.
First we will reproject our DTM_hill_HARV
raster data to match the DTM_HARV
raster CRS.
# Reproject DTM_hill_HARV raster data to match the DTM_HARV raster CRS
DTM_hill_UTMZ18N_HARV <- projectRaster(DTM_hill_HARV,
crs = crs(DTM_HARV))
Now let's check to make sure they're both the same.
crs(DTM_hill_UTMZ18N_HARV)
crs(DTM_hill_HARV)
CRS arguments: +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs
CRS arguments: +proj=longlat +datum=WGS84 +no_defs
Now they both use WGS84. Let's also check to see if their extents are the same.
extent(DTM_hill_UTMZ18N_HARV)
extent(DTM_hill_HARV)
class : Extent xmin : 731397.3 xmax : 733205.3 ymin : 4712403 ymax : 4713907
class : Extent xmin : -72.18192 xmax : -72.16061 ymin : 42.52941 ymax : 42.54234
Notice in the output above that the crs(
) of DTM_hill_UTMZ18N_HARV
is now UTM. However, the extent values of DTM_hillUTMZ18N_HARV
are different from DTM_hill_HARV
. This is because the extent for DTM_hill_UTMZ18N_HARV
is in UTMs so the extent is in meters. While the extent for DTM_hill_HARV
is in lat/long so the extent is expressed in decimal degrees.
Proceed with caution when you are reprojecting raster data. Often it’s best to reproject your vector data as reprojecting a raster means that the entire dataset are interpolated and cast into a new grid system. This adds error and uncertainty to your analysis. There are times when you need to reproject your data. However, consider carefully whether you need to do this, before implementimg it in an analysis.
Sometimes you'll have the same data with different resolution. We can tell R to force our newly reprojected raster to be 1m x 1m resolution by adding a line of code res=1
within the projectRaster()
function.
# Adjust resolution
DTM_hill_UTMZ18N_HARV <- projectRaster(DTM_hill_HARV,
crs = crs(DTM_HARV),
res = res(DTM_HARV))
Let's check to see if both rasters have the same resolution.
res(DTM_hill_UTMZ18N_HARV)
res(DTM_HARV)
Now that they're the same, we can create a dataframe from our newly re-projected raster then plot the data.
# Overlay of DTM_HARV_df with its hillshade
DTM_hill_HARV_2_df <- as.data.frame(DTM_hill_UTMZ18N_HARV, xy = TRUE)
ggplot() +
geom_raster(data = DTM_HARV_df ,
aes(x = x, y = y,
fill = HARV_dtmCrop)) +
geom_raster(data = DTM_hill_HARV_2_df,
aes(x = x, y = y,
alpha = HARV_DTMhill_WGS84)) +
scale_fill_gradientn(name = "Elevation", colors = terrain.colors(10)) +
coord_quickmap()+
theme_classic()
As you can see, not the two layer perfectly line up!
We often want to perform calculations on two or more rasters to create a new output raster. For example, if we are interested in mapping the heights of trees across an entire field site, we might want to calculate the difference between the Digital Surface Model (DSM, tops of trees) and the Digital Terrain Model (DTM, ground level). The resulting dataset is referred to as a Canopy Height Model (CHM) and represents the actual height of trees, buildings, etc. with the influence of ground elevation removed.
We can calculate the difference between two rasters in two different ways:
by directly subtracting the two rasters in R using raster math or for more efficient processing - particularly if our rasters are large and/or the calculations we are performing are complex:
using the overlay()
function.
We can perform raster calculations by subtracting (or adding, multiplying, etc) two rasters. In the geospatial world, we call this “raster math”. Let’s subtract the DTM from the DSM to create then plot a Canopy Height Model.
# Calculating Canopy Height Model using raster math
CHM_HARV <- DSM_HARV - DTM_HARV
CHM_HARV_df <- as.data.frame(CHM_HARV, xy = TRUE)
ggplot() +
geom_raster(data = CHM_HARV_df ,
aes(x = x, y = y, fill = layer)) +
scale_fill_gradientn(name = "Canopy Height", colors = terrain.colors(10)) +
coord_quickmap()+
theme_classic()
Let’s inspect the distribution of values in our newly created Canopy Height Model (CHM).
# Histogram of Canopy Height Model
ggplot(CHM_HARV_df) +
geom_histogram(aes(layer),bins = 40, color="black", fill="steelblue")+
theme_classic()
Warning message: "Removed 1 rows containing non-finite values (stat_bin)."
Notice that the range of values for the output CHM is between 0 and 30 meters. Does this make sense?
Check to see if model makes sense
Using the CHM_HARV raster object we created, answer the following questions in Brightspace.
Raster math, like we just did, is an appropriate approach to raster calculations if:
However, raster math is a less efficient approach as computation becomes more complex or as file sizes become large. The overlay()
function is more efficient when:
The overlay()
function takes two or more rasters and applies a function to them using efficient processing methods.
Let’s perform the same subtraction calculation that we calculated above using raster math, using the overlay()
function.
# CHM calculationusing the overlay() function
CHM_ov_HARV <- overlay(DSM_HARV,
DTM_HARV,
fun = function(r1, r2) { return( r1 - r2) })
CHM_ov_HARV_df <- as.data.frame(CHM_ov_HARV, xy = TRUE)
Now that our data is in the correct format, we can plot it.
# Map CHM
ggplot() +
geom_raster(data = CHM_ov_HARV_df,
aes(x = x, y = y, fill = layer)) +
scale_fill_gradientn(name = "Canopy Height", colors = terrain.colors(10)) +
coord_quickmap()+
theme_classic()
How do the plots of the CHM created with manual raster math and the overlay()
function compare?
So far we've worked with single band rasters. When there are multiple bands, every cell location has more than one value associated with it. With multiple bands, each band usually represents a segment of the electromagnetic spectrum collected by a sensor. Bands can represent any portion of the electromagnetic spectrum, including ranges not visible to the eye, such as the infrared or ultraviolet sections. The term band originated from the reference to the color band on the electromagnetic spectrum. A satellite image, for example, commonly has multiple bands representing different wavelengths from the ultraviolet through the visible and infrared portions of the electromagnetic spectrum. Landsat imagery, for example, is data collected from seven different bands of the electromagnetic spectrum. Bands 1–7, including 6, represent data from the visible, near infrared, and midinfrared regions. Band 6 collects data from the thermal infrared region. Another example of a multiband image is a true color orthophoto in which there are three bands, each representing either red, green, or blue light.
The multi-band data that we are working with is imagery collected using the NEON Airborne Observation Platform high resolution camera over the NEON Harvard Forest field site. Each RGB image is a 3-band raster. The same steps would apply to working with a multi-spectral image with 4 or more bands - like Landsat imagery.
If we read a RasterStack object into R using the raster()
function, it only reads in the first band.
# Load RGB data BAND 1
RGB_band1_HARV <- raster("raw_data/NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_Ortho.tif")
Let's convert this data into a dataframe and plot it with ggplot
.
# Convert this data into a dataframe and plot it
RGB_band1_HARV_df <- as.data.frame(RGB_band1_HARV, xy = TRUE)
ggplot() +
geom_raster(data = RGB_band1_HARV_df,
aes(x = x, y = y, alpha = HARV_RGB_Ortho)) +
coord_quickmap()+
theme_classic()
This raster contains values between 0 and 255. These values represent degrees of brightness associated with the image band. In the case of a RGB image (red, green and blue), band 1 is the red band. When we plot the red band, larger numbers (towards 255) represent pixels with more red in them (a strong red reflection). Smaller numbers (towards 0) represent pixels with less red in them (less red was reflected). To plot an RGB image, we mix red + green + blue values into one single color to create a full color image - similar to the color image a digital camera creates.
We can use the raster()
function to import specific bands in our raster object by specifying which band we want with band = N (N represents the band number we want to work with). To import the green band, we would use band = 2.
# Load RGB data BAND 2
RGB_band2_HARV <- raster("raw_data/NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_Ortho.tif", band = 2)
Let's convert this data into a dataframe and plot it with ggplot
.
# Convert this data into a dataframe and plot it
RGB_band2_HARV_df <- as.data.frame(RGB_band2_HARV, xy = TRUE)
ggplot() +
geom_raster(data = RGB_band2_HARV_df,
aes(x = x, y = y, alpha = HARV_RGB_Ortho)) +
coord_equal()+
theme_classic()
Now, we will work with all three image bands (red, green and blue) as an R RasterStack object. We will then plot a 3-band composite, or full color, image.
To bring in all bands of a multi-band raster, we use the stack()
function.
# Load RGB Stack
RGB_stack_HARV <- stack("raw_data/NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_Ortho.tif")
Let’s preview the attributes of our stack object.
RGB_stack_HARV
We can also view the attributes of each band in the stack in a single output.
RGB_stack_HARV@layers
[[1]] class : RasterLayer band : 1 (of 3 bands) dimensions : 2317, 3073, 7120141 (nrow, ncol, ncell) resolution : 0.25, 0.25 (x, y) extent : 731998.5, 732766.8, 4712956, 4713536 (xmin, xmax, ymin, ymax) crs : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs source : C:/Users/Diego/Documents/6.Biologist's Toolkit/25Feb/biol3782/week9/raw_data/NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_Ortho.tif names : HARV_RGB_Ortho.1 values : 0, 255 (min, max) [[2]] class : RasterLayer band : 2 (of 3 bands) dimensions : 2317, 3073, 7120141 (nrow, ncol, ncell) resolution : 0.25, 0.25 (x, y) extent : 731998.5, 732766.8, 4712956, 4713536 (xmin, xmax, ymin, ymax) crs : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs source : C:/Users/Diego/Documents/6.Biologist's Toolkit/25Feb/biol3782/week9/raw_data/NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_Ortho.tif names : HARV_RGB_Ortho.2 values : 0, 255 (min, max) [[3]] class : RasterLayer band : 3 (of 3 bands) dimensions : 2317, 3073, 7120141 (nrow, ncol, ncell) resolution : 0.25, 0.25 (x, y) extent : 731998.5, 732766.8, 4712956, 4713536 (xmin, xmax, ymin, ymax) crs : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs source : C:/Users/Diego/Documents/6.Biologist's Toolkit/25Feb/biol3782/week9/raw_data/NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_Ortho.tif names : HARV_RGB_Ortho.3 values : 0, 255 (min, max)
Or we can index then examine a specific band.
RGB_stack_HARV[[2]]
class : RasterLayer band : 2 (of 3 bands) dimensions : 2317, 3073, 7120141 (nrow, ncol, ncell) resolution : 0.25, 0.25 (x, y) extent : 731998.5, 732766.8, 4712956, 4713536 (xmin, xmax, ymin, ymax) crs : +proj=utm +zone=18 +datum=WGS84 +units=m +no_defs source : C:/Users/Diego/Documents/6.Biologist's Toolkit/25Feb/biol3782/week9/raw_data/NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_Ortho.tif names : HARV_RGB_Ortho.2 values : 0, 255 (min, max)
We can also plot the data in any layer of our RasterStack object by first converting it into a dataframe.
# Convert to data frame
RGB_stack_HARV_df <- as.data.frame(RGB_stack_HARV, xy = TRUE)
Let's take a look at RGB_stack_HARV_df
.
str(RGB_stack_HARV_df)
'data.frame': 7120141 obs. of 5 variables: $ x : num 731999 731999 731999 731999 732000 ... $ y : num 4713535 4713535 4713535 4713535 4713535 ... $ HARV_RGB_Ortho.1: num 0 2 6 0 16 0 0 6 1 5 ... $ HARV_RGB_Ortho.2: num 1 0 9 0 5 0 4 2 1 0 ... $ HARV_RGB_Ortho.3: num 0 10 1 0 17 0 4 0 0 7 ...
Now let's plot the raster data of the first band using a histogram. We will specify the band name, HARV_RGB_Ortho.1
, in the aes
agruement of geom_histogram
.
# Histogram of RGB_stack_HARV_df
ggplot() +
geom_histogram(data = RGB_stack_HARV_df, aes(HARV_RGB_Ortho.1),bins = 40, color = "black", fill = "steelblue")+
theme_classic()
We can create raster plots of specific bands by specifying the band name in the aes
arguement. Let's take a look at how this works by plotting the second band HARV_RGB_Ortho.2
.
# PLot second band
ggplot() +
geom_raster(data = RGB_stack_HARV_df,
aes(x = x, y = y, alpha = HARV_RGB_Ortho.2)) +
coord_quickmap()+
theme_classic()
To render a final three band, colored image in R, we use the plotRGB()
function.
This function allows us to:
plotRGB()
function defaults to a 1=red, 2=green, and 3=blue band order. However, you can define what bands you’d like to plot manually. Manual definition of bands is useful if you have, for example a near-infrared band and want to create a color infrared image.Let’s plot our 3-band image.
We can use the plotRGB()
function directly with our RasterStack object (we don’t need a dataframe as this function isn’t part of the ggplot2 package).
# Plot pseudo-color image
plotRGB(RGB_stack_HARV,
r = 1, g = 2, b = 3)
We can explore whether applying a stretch to the image might improve clarity and contrast using stretch="lin"
or stretch="hist"
.
When the range of pixel brightness values is closer to 0, a darker image is rendered by default. We can stretch the values to extend to the full 0-255 range of potential values to increase the visual contrast of the image.
When the range of pixel brightness values is closer to 255, a lighter image is rendered by default. We can stretch the values to extend to the full 0-255 range of potential values to increase the visual contrast of the image.
Let's plot our data with the stretch="lin"
setting.
# Image using stretch="lin"
plotRGB(RGB_stack_HARV,
r = 1, g = 2, b = 3,
scale = 800,
stretch = "lin")
Now, let's plot our data with the stretch="hist"
setting.
# Image using stretch="hist"
plotRGB(RGB_stack_HARV,
r = 1, g = 2, b = 3,
scale = 800,
stretch = "hist")
What do you notice about each setting?
The R RasterStack and RasterBrick object types can both store multiple bands. However, how they store each band is different. The bands in a RasterStack are stored as links to raster data that is located somewhere on our computer. A RasterBrick contains all of the objects stored within the actual R object. In most cases, we can work with a RasterBrick in the same way we might work with a RasterStack. However a RasterBrick is often more efficient and faster to process - which is important when working with larger files.
We can turn a RasterStack into a RasterBrick in R by using brick(StackName
. Let’s use the object.size()
function to compare RasterStack and RasterBrick objects. First we will check the size of our RasterStack object.
object.size(RGB_stack_HARV)
50176 bytes
Now we will create a RasterBrick object from our RasterStack data and view its size.
RGB_brick_HARV <- brick(RGB_stack_HARV)
object.size(RGB_brick_HARV)
170898632 bytes
With a RasterBrick, all of the bands are stored within the actual object. Thus, the RasterBrick object size is much larger than the RasterStack object. The plots for both should be the same (since they use the same data. They're just stored differently).
plotRGB(RGB_brick_HARV)
Code below is for formatting of this lab. Do not alter!
<span class="important"></span>
<img src="imdbtitle.png">
cssFile <- '../css/custom.css'
IRdisplay::display_html(readChar(cssFile, file.info(cssFile)$size))
IRdisplay::display_html("<style>.Q::before {counter-increment: question_num;
content: 'QUESTION ' counter(question_num) ': '; white-space: pre; }.T::before {counter-increment: task_num;
content: 'Task ' counter(task_num) ': ';</style>")