NHDPlusTools Data Access Overview

Since its original release in 2019, data access in nhdplusTools has evolved and grown considerably. This vignette, prepared for a workshop in Summer 2024, shows the diversity of data available through nhdplusTools in one holistic overview.

The function naming in nhdplusTools related to data access uses three common terms:
1. “download” will generally download complete datasets for use locally.
2. “get” will pull a data subset from a web service.
3. “subset” will pull a subset from a local file and, when available, a web service.
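
For orientation, here is a minimal sketch of the three patterns. This block is not run; the output directory, area of interest, and COMID set are placeholders rather than objects defined in this vignette.


# "download": fetch a complete dataset for local use (placeholder path)
# download_nhdplusv2(outdir = "path/to/nhdplusv2")

# "get": pull a subset from a web service for an area of interest
# flowlines <- get_nhdplus(AOI = area_of_interest, realization = "flowline")

# "subset": pull a subset from a local file or, with nhdplus_data = "download", a web service
# subset_nhdplus(comids = some_comids, output_file = "subset.gpkg", nhdplus_data = "download")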

Dataset function index

The following summarizes the specific data access functions available for the datasets that nhdplusTools supports.

  • NHDPlusV2
    • download_nhdplusv2() - downloads a National seamless geodatabase
    • get_nhdplus() - access NHDPlusV2 from a web service
    • subset_nhdplus() - subsets the National geodatabase OR the web service
    • subset_rpu() - subsets loaded NHDPlusV2 by raster processing unit
    • subset_vpu() - subsets loaded NHDPlusV2 by vector processing unit
    • get_boundaries() - retrieves NHDPlusV2 processing unit boundaries
  • NHDPlusHR
    • download_nhdplushr() - downloads staged geodatabases
    • get_nhdplushr() - assembles nhdplushr datasets from many staged databases
    • get_hr_data() - helps pull data out of staged databases (used by get_nhdplushr())
  • NHD
    • download_nhd() - downloads staged geodatabases
  • 3DHP
    • get_3dhp() - access 3DHP from a web service
  • geoconnex
    • get_geoconnex_reference() - access geoconnex reference features from a web service
  • WBD
    • download_wbd() - download a National WBD geodatabase
    • get_huc() - get a particular Hydrologic Unit Code from a service
  • RF1
    • download_rf1() - download a National RF1 geodatabase

Network Linked Data Index integration

The Network Linked Data Index (NLDI) plays a key role in data discovery with nhdplusTools. It provides an easy-to-use network navigation and basin boundary delineation tool. Behind the scenes, the NLDI is based on the network of the NHDPlusV2, so it has some special functionality related directly to that dataset. Functions that utilize the NLDI include the following (see the short example after this list):

  • subset_nhdplus()
  • plot_nhdplus()
  • map_nhdplus()
  • discover_nhdplus_id()
  • get_nldi_basin()
  • get_nldi_feature()
  • get_nldi_index()
  • navigate_network()
  • navigate_nldi()
  • get_split_catchment()
  • get_raindrop_trace()
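
As one example of that NHDPlusV2-specific functionality, discover_nhdplus_id() asks the NLDI for the NHDPlusV2 COMID at an indexed feature. A minimal sketch using the gage featured in the demo below:


# look up the NHDPlusV2 COMID for an NLDI-indexed feature
discover_nhdplus_id(nldi_feature = list(featureSource = "nwissite",
                                        featureID = "USGS-04074950"))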

In the demo below, we’ll choose a stream gage and show how to access data related to it from all the datasets that nhdplusTools works with, as well as exercise some of the NLDI functionality listed above.

For the sake of demonstration, we’ll look at the Wolf River at Langlade, WI – site ID “04074950”.

nhdplusTools allows us to start from a stream gage to build a subset of NHDPlusV2 data. Below, we’ll do just that and plot the results on a default map.


demo_dir <- nhdplusTools_data_dir()

site_id <- "USGS-04074950"


# use dataRetrieval::get_nldi_sources() to find other nldi sources

site <- list(featureSource = "nwissite",
             featureID = site_id)

site_feature <- get_nldi_feature(site)

upstream_network <- navigate_nldi(site, 
                                  mode = "UT", distance_km = 9999)

demo_data <- file.path(demo_dir, "data_demo.gpkg")

dataset <- subset_nhdplus(as.integer(upstream_network$UT_flowlines$nhdplus_comid), 
                          nhdplus_data = "download", # download from a service
                          output_file = demo_data, # write the data to disk
                          return_data = TRUE, # return the data rather than only writing it to disk
                          flowline_only = FALSE, overwrite = TRUE)

names(dataset)

sapply(dataset, nrow)

old_par <- par(mar = c(0, 0, 0, 0))

plot_nhdplus(outlets = list(featureSource = "nwissite",
                            featureID = "USGS-04074950"), 
             nhdplus_data = demo_data, flowline_only = TRUE)

The above is the original way nhdplusTools supported access to NHDPlusV2 data. A dedicated web service subset utility is available in get_nhdplus(), which is what subset_nhdplus() calls behind the scenes.

Here we grab the basin for our site and request NHDPlus with its geometry as the Area of Interest.


basin <- get_nldi_basin(site)

subset <- get_nhdplus(AOI = basin, realization = "flowline")

par(mar = c(0, 0, 0, 0))

plot(sf::st_geometry(basin))
plot(sf::st_geometry(subset), col = "blue", add = TRUE)

For NHDPlusHR data, which is much denser than NHDPlusV2, nhdplusTools supports downloading staged geodatabases by four-digit Hydrologic Unit Code. The function get_huc() is useful for discovering the code needed here.


wolf_huc <- get_huc(basin, type = 'huc04')

nrow(wolf_huc)

# does the basin straddle multiple HUC04s? Not really, as the plot shows.

par(mar = c(0, 0, 0, 0))
plot(sf::st_geometry(basin), col = "grey")
plot(sf::st_geometry(wolf_huc), add = TRUE)

wolf_huc <- get_huc(site_feature, type = "huc04")

nrow(wolf_huc)

# better!!
par(mar = c(0, 0, 0, 0))
plot(sf::st_geometry(wolf_huc))
plot(sf::st_geometry(basin), col = "grey", add = TRUE)

outdir <- file.path(nhdplusTools_data_dir(), "hr_access_demo")

dir.create(outdir)

download_dir <- download_nhdplushr(outdir, wolf_huc$huc4)

list.files(download_dir)

If we had asked for more HUC4 codes, additional gdb files would be in the directory we specified.
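
For example, a hedged sketch of requesting two HUC4 codes at once; the second code here, "0404", is purely illustrative and is not used elsewhere in this demo.


# not run: each additional HUC4 adds another staged geodatabase to the directory
# multi_dir <- download_nhdplushr(outdir, c(wolf_huc$huc4, "0404"))
# list.files(multi_dir, pattern = ".gdb")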

With this, we can use one of the two functions for accessing NHDPlusHR data to load data from the directory. Here, we use the more complete get_nhdplushr() and set check_terminals=TRUE, which uses make_standalone() to ensure that the nhdplus attributes are complete and self-consistent within the subset of data returned.


nhdplushr <- get_nhdplushr(
  download_dir, 
  layers = c("NHDFlowline", "NHDPlusCatchment", "NHDWaterbody", 
             "NHDArea", "NHDLine", "NHDPlusSink", "NHDPlusWall", 
             "NHDPoint", "NHDPlusBurnWaterbody", "NHDPlusBurnLineEvent",
             "HYDRO_NET_Junctions", "WBDHU2", "WBDHU4","WBDHU6", 
             "WBDHU8", "WBDHU10", "WBDHU12", "WBDLine"), 
  check_terminals = TRUE)

sapply(nhdplushr, nrow)

At a lower level, we can use get_hr_data() to access particular layers. Here, rename=TRUE causes the nhdplushr names to be normalized to nhdplusTools conventions using align_nhdplus_names().

NOTE: “NHDPlusID” from nhdplushr is replaced with the name “COMID”. This attribute is merely a unique integer identifier and should not be assumed to relate to anything outside the context of a given dataset.


nhdplushr <- get_hr_data(list.files(download_dir, pattern = ".gdb", full.names = TRUE),
                         layer = "NHDFlowline", rename = TRUE)

names(nhdplushr)

# Great Lakes coastline features are flowlines -- remove them for visuals
gl_coast <- c(get_DM(nhdplushr, 60002700000331),
              get_DM(nhdplushr, 60002700049157))

plot_data <- dplyr::filter(nhdplushr, FCODE != 56600 & StreamOrde > 2 & !COMID %in% gl_coast)

par(mar = c(0, 0, 0, 0))
plot(sf::st_geometry(wolf_huc))
plot(sf::st_geometry(basin), col = "grey", add = TRUE)
plot(sf::st_geometry(plot_data), lwd = plot_data$StreamOrde / 3, col = "blue", add = TRUE)

par(mar = c(0, 0, 0, 0))
plot(sf::st_geometry(basin), col = "grey")
plot(sf::st_geometry(plot_data), lwd = plot_data$StreamOrde / 2, col = "blue", add = TRUE)

BONUS DEMO: Say we want to know where our stream gage is on the river in question…

get_flowline_index() and disambiguate_flowline_indexes() are your friends.


potential_matches <- get_flowline_index(nhdplushr, 
                                        points = site_feature, 
                                        search_radius = units::as_units(500, "m"),
                                        max_matches = 5)

potential_matches

site_meta <- dataRetrieval::readNWISsite(gsub("USGS-", "", site_feature$identifier))

sqmi_to_sqkm <- 2.59

da_df <- data.frame(id = 1, drainagearea = site_meta$drain_area_va * sqmi_to_sqkm)

# uses nearest drainage area match
disambiguate_flowline_indexes(potential_matches,
                              flowpath = dplyr::select(nhdplushr, COMID, TotDASqKM),
                              hydro_location = da_df)

If what you really need is the base NHD, which is now a static dataset, the pattern is the same as with NHDPlusHR, using download_nhd().


outdir <- file.path(nhdplusTools_data_dir(), "nhd_access_demo")

dir.create(outdir)

download_dir <- download_nhd(outdir, wolf_huc$huc4)

list.files(download_dir)

nhd_gdb <- list.files(download_dir, pattern = ".gdb", full.names = TRUE)

sf::st_layers(nhd_gdb)

nhd_fline <- sf::read_sf(nhd_gdb, "NHDFlowline")

We’ll wait to plot this until after we’ve done some work with 3DHP. As of writing, the 3DHP is more or less the same as the NHD but in a new database schema. get_3dhp() uses a web service to pull data for subsets, much the same as the other get_* functions. See vignette("get_3dhp_data.Rmd") for more on how to work with 3DHP data.


sub_3dhp <- get_3dhp(basin, type = "flowline")

plot(sub_3dhp)

sub_3dhp <- st_compatibalize(sub_3dhp, nhd_fline)

par(mar = c(0, 0, 0, 0))
plot(sf::st_geometry(basin), col = "grey")
plot(sf::st_geometry(sf::st_zm(sub_3dhp)), col = "skyblue", lwd = 2.5, add = TRUE)
plot(sf::st_geometry(sf::st_zm(nhd_fline)), col = "blue", lwd = 0.1, add = TRUE)

A brand new feature in nhdplusTools, added in May 2024, is access to the geoconnex.us reference feature server. The reference feature server provides easy access to representations of datasets that people use to cross reference other data. discover_geoconnex_reference() provides access to a full table of what’s available.


unique(discover_geoconnex_reference()[c("id", "title")])

Each of these sets of “reference features” includes a “uri” attribute, a persistent identifier that will always lead you back to a representation of the feature when you look it up.

For example, let’s get a subset of data for our Wolf River basin.


wolf_gages <- get_geoconnex_reference(basin, type = "gages")

geoconnex_gage <- dplyr::filter(wolf_gages, 
                                provider_id == gsub("USGS-", "", site_feature$identifier))

wolf_mainstems <- get_geoconnex_reference(basin, type = "mainstems")

wolf_mainstem <- dplyr::filter(wolf_mainstems, uri == geoconnex_gage$mainstem_uri)

par(mar = c(0, 0, 0, 0))
plot(sf::st_geometry(basin), col = "grey")
plot(sf::st_geometry(wolf_mainstems), lwd = 0.5, col = "blue", add = TRUE)
plot(sf::st_geometry(wolf_mainstem), lwd = 3, col = "blue", add = TRUE)
plot(sf::st_geometry(wolf_gages), pch = 2, cex = 0.75, add = TRUE)
plot(sf::st_geometry(geoconnex_gage), pch = 17, cex = 1.5, add = TRUE)

The Watershed Boundary Dataset (WBD) contains hydrologic units that are used as cataloging units for the country. It was developed and improved over the last twenty years and is nearing a complete, static state. There are three prominent versions of it available: a snapshot included as part of the NHDPlusV2, a snapshot available from a web service and retrieved with get_huc(), and the latest (soon to be final) snapshot available using download_wbd().


wbd_dir <- file.path(nhdplusTools_data_dir(), "wbd_access_demo")
  
wbd_out <- download_wbd(wbd_dir)

# zip::unzip doesn't always work -- fall back to utils::unzip if nothing was extracted
if(length(wbd_out) == 0) {
  f <- list.files(wbd_dir, pattern = ".zip", full.names = TRUE)  
  
  utils::unzip(f, exdir = wbd_dir)
}

wbd_gdb <- list.files(wbd_dir, pattern = ".gdb", full.names = TRUE)

sf::st_layers(wbd_gdb)
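
From here, layers can be read like any other source sf understands. As a closing sketch, we read one layer and plot our Wolf River basin for context; "WBDHU2" is assumed to be among the layer names listed by st_layers() above.


wbd_hu2 <- sf::read_sf(wbd_gdb, "WBDHU2")

par(mar = c(0, 0, 0, 0))
plot(sf::st_geometry(wbd_hu2))
plot(sf::st_geometry(basin), col = "grey", add = TRUE)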