--- title: "Changes to NWIS QW services" author: Laura A. DeCicco editor_options: chunk_output_type: console output: rmarkdown::html_vignette: toc: true number_sections: true vignette: > %\VignetteIndexEntry{Changes to NWIS QW services} \usepackage[utf8]{inputenc} %\VignetteEngine{knitr::rmarkdown} --- ```{r setup, include=FALSE, message=FALSE} library(knitr) library(dataRetrieval) library(dplyr) options(continue = " ") knitr::opts_chunk$set( echo = TRUE, message = FALSE, fig.height = 7, fig.width = 7 ) ``` **IMPORTANT** These recommendations have been updated using WQX3.0 profiles. # Changes to NWIS water quality services USGS discrete water samples data are undergoing modernization, and NWIS services will no longer be updated with the latest data starting mid-March 2024, with a full decommission expected 6 months later. For the latest news on USGS water quality data, see: [https://doi-usgs.github.io/dataRetrieval/articles/Status.html](https://doi-usgs.github.io/dataRetrieval/articles/Status.html). Learn more about the changes and where to find the new samples data in the [WDFN blog](https://waterdata.usgs.gov/blog/changes-to-sample-data/). What does this mean for `dataRetrieval` users? Eventually water quality data will ONLY be available from the [Water Quality Portal](https://www.waterqualitydata.us/) rather than the NWIS services. There are 3 major `dataRetrieval` functions that will be affected: `readNWISqw`, `whatNWISdata`, and `readNWISdata`. This vignette will describe the most common workflows conversions needed to update existing scripts. # How to find more help This vignette is being provided in advance of any breaking changes, and more information and guidance will be provided. These changes are big, and initially sound overwhelming. But in the end, they are thoughtful changes that will make understanding USGS water data more intuitive. Please reach out for questions and comments to: gs-w-IOW_PO_team@usgs.gov # WQX2 -> WQX3 If you have already converted your workflow from NWIS to WQP, much of the hard work has been done! The final step of the process is to make sure workflows are using the most modern WQX3 formats. WQX stands for Water Quality Exchange. WQX2 is the format that has been historically available on the Water Quality Portal, WQX3 is the more modern format that all the data is being converted to. This is available as of `dataRetrieval` version 2.7.16. Starting with 2.7.16, all WQP functions default to the newer "WQX3" profiles if available. See [WQX Conversions](#conversion) below to get a table of column names in WQX3 vs WQX2. # NWIS -> WQX3 ## `readNWISqw` This function was retired as of Oct. 24, 2024. So...what do you use instead? The function you will need to move to is `readWQPqw`. First, you'll need to convert the numeric USGS site ID's into something that the Water Quality Portal will accept, which requires the agency prefix. For most USGS sites this will mean pasting 'USGS-' before the site number, although it is important to note that there are some USGS sites that begin with a different prefix: it is up to users to determine the agency code.. Here's an example: ```{r eval=FALSE} wqpData <- readWQPqw(paste0("USGS-", site_ids), parameterCd) ``` Let's say we have a data frame that we got from the retired `readNWISqw` function and we saved it as `nwisData`. ```{r echo=FALSE} nwisData <- readRDS("nwisData.rds") wqpData <- readRDS("wqpData.rds") ``` First we compare the number of rows, number of columns, and attributes to each return: ```{r} nrow(nwisData) nrow(wqpData) ``` So, same number of rows returned. That's good, since it's the same data. ```{r} ncol(nwisData) ncol(wqpData) ``` Different columns! ```{r} names(attributes(nwisData)) names(attributes(wqpData)) ``` Slightly different attributes. You can explore the differences of those attributes: ```{r} site_NWIS <- attr(nwisData, "siteInfo") site_WQP <- attr(wqpData, "siteInfo") ``` The next big task is figuring out which columns from the WQP output map to the original columns from the NWIS output. Look at your workflow and determine what columns from the original NWIS output are needed to preserve the integrity of the workflow. Let's use the `dplyr` package to pull out the columns are used in this example workflow, and make sure both NWIS and WQP are ordered in the same way. ```{r} library(dplyr) nwisData_relevant <- nwisData |> select( site_no, startDateTime, parm_cd, remark_cd, result_va ) |> arrange(startDateTime, parm_cd) knitr::kable(head(nwisData_relevant)) ``` If we explore the output from WQP, we can try to find the columns that include the same relevant information: ```{r} wqpData_relevant <- wqpData |> select( site_no = Location_Identifier, startDateTime = Activity_StartDateTime, parm_cd = USGSpcode, remark_cd = Result_ResultDetectionCondition, result_va = Result_Measure, detection_level = DetectionLimit_MeasureA ) |> arrange(startDateTime, parm_cd) knitr::kable(head(wqpData_relevant)) ``` Now we can start looking at the results and trying to decide how future workflows should be setup. Here are some decisions for this example that we can consider: ### Censored values The result_va in the NWIS service came back with a value. However, the data is actually censored, meaning we only know it's below the detection limit. With some lazier coding, it might have been really easy to not realize these values are left-censored. So, while we could substitute the detection levels into the measured values if there's an `NA` in the measured value, this might be a great time to update your workflow to handle censored values more robustly. We are probably interested in maintaining the detection level in another column. For this theoretical workflow, let's think about what we are trying to find out. Let's say that we want to know if a value is "left-censored" or not. Maybe in this case, what would make the most sense is to have a column that is a logical TRUE/FALSE. For this example, there was only the text "Not Detected" in the "ResultDetectionConditionText" column PLEASE NOTE that other data may include different messages about detection conditions, you will need to examine your data carefully. Here's an example from the `EGRET` package on how to decide if a "ResultDetectionConditionText" should be considered a censored value: ```{r} censored_text <- c( "Not Detected", "Non-Detect", "Non Detect", "Detected Not Quantified", "Below Quantification Limit" ) wqpData_relevant <- wqpData |> mutate(left_censored = grepl(paste(censored_text, collapse = "|"), Result_ResultDetectionCondition, ignore.case = TRUE )) |> select( site_no = Location_Identifier, startDateTime = Activity_StartDateTime, parm_cd = USGSpcode, left_censored, result_va = Result_Measure, detection_level = DetectionLimit_MeasureA, dl_units = DetectionLimit_MeasureUnitA ) |> arrange(startDateTime, parm_cd) knitr::kable(head(wqpData_relevant)) ``` ### NWIS codes Another difference that is going to require some thoughtful decisions is how to interpret additional NWIS codes. They will now be descriptive text. Columns such as samp_type_cd and medium_cd will now all be reported with descriptive words rather than single letters or numbers. It will be the responsibility of the user to consider the best way to deal with these changes. If you use the `readNWISqw` function, you WILL need to adjust your workflows, and you may find there are more codes you will need to account for. Hopefully this section helped get you started. It does not include every scenario, so you may find more columns or codes or other conditions you need to account for. # whatNWISdata This function will continue to work for any service EXCEPT "qw" (water quality discrete data). "qw" results will eventually no longer be returned, and are currently showing values that were frozen in March 2024. The function to replace this functionality for discrete water quality data is currently `whatWQPdata`: ```{r whatdata, eval=FALSE} whatNWIS <- whatNWISdata( siteNumber = site_ids, service = "qw" ) ``` ``` WARNING: NWIS does not deliver new discrete water quality data or updates to existing data. For additional details, see: https://doi-usgs.github.io/dataRetrieval/articles/Status.html ``` ```{r whatdatanew, eval=FALSE} whatWQP <- whatWQPdata(siteNumber = paste0("USGS-", site_ids)) ``` There are some major differences in the output. The NWIS services offers back one row per site/parameter code to learn how many samples are available. This is not *currently* available from the Water Quality Portal, however there are new summary services being developed. When those become available, we will include new documentation on how to get this information. # readNWISdata If you get your water quality data from the `readNWISdata` function, no new data will be available and the function will generate a warning message. The other services are working as before. This is not an especially common `dataRetrieval` workflow, so there are not a lot of details here. Please reach out if more information is needed to update your workflows. See `?readWQPdata` to see all the ways to query data in the Water Quality Portal. Use the suggestions above to convert the output of the `readWQPdata` function to convert the WQP output to what is important for your workflow. # WQX Conversion {#conversion} A table is provided on the EPA website that shows the conversions of WQX3 to the WQX2 mappings. This table may change periodically while WQP services are under active development, which is expected to last through Fall 2024. Here is an example for pulling the EPA table and comparing column names. ```{r echo=TRUE, eval=TRUE} schema <- readr::read_csv("https://www.epa.gov/system/files/other-files/2024-07/schema_outbound_wqx3.0.csv") ``` Within that schema table, the column "FieldName3.0" shows the column names for the new WQX3 profile. There are several "FieldName2.0.XXXX" columns that show how the older 2.0 profiles align with the newer columns. For example, the 2.0 "narrow" dataProfile match up with these new WQP3 columns: ```{r echo=TRUE, eval=TRUE} sub_schema <- schema |> select(WQX3 = FieldName3.0, WQX2 = FieldName2.0.Narrow) |> filter(!is.na(WQX2)) knitr::kable(sub_schema) ``` # Disclaimer This information is preliminary and is subject to revision. It is being provided to meet the need for timely best science. The information is provided on the condition that neither the U.S. Geological Survey nor the U.S. Government may be held liable for any damages resulting from the authorized or unauthorized use of the information.