IMPORTANT: These recommendations have been updated to use WQX3.0 profiles.
USGS discrete water samples data are undergoing modernization, and NWIS services will no longer be updated with the latest data starting mid-March 2024, with a full decommission expected 6 months later.
For the latest news on USGS water quality data, see: https://doi-usgs.github.io/dataRetrieval/articles/Status.html. Learn more about the changes and where to find the new samples data in the WDFN blog.
What does this mean for dataRetrieval users? Eventually, water quality data will ONLY be available from the Water Quality Portal rather than the NWIS services. There are 3 major dataRetrieval functions that will be affected: readNWISqw, whatNWISdata, and readNWISdata. This vignette will describe the most common workflow conversions needed to update existing scripts.
This vignette is being provided in advance of any breaking changes, and more information and guidance will be provided. These changes are big and may initially sound overwhelming, but in the end they are thoughtful changes that will make understanding USGS water data more intuitive. Please reach out with questions and comments to: [email protected]
If you have already converted your workflow from NWIS to WQP, much of the hard work has been done! The final step of the process is to make sure workflows are using the most modern WQX3 formats. WQX stands for Water Quality Exchange. WQX2 is the format that has historically been available on the Water Quality Portal; WQX3 is the more modern format that all of the data are being converted to.
This is available as of dataRetrieval version 2.7.16. Starting with 2.7.16, all WQP functions default to the newer "WQX3" profiles if available. See WQX Conversions below to get a table of column names in WQX3 vs WQX2.
readNWISqw
This function was retired as of Oct. 24, 2024.
So…what do you use instead? The function you will need to move to is readWQPqw. First, you'll need to convert the numeric USGS site IDs into something that the Water Quality Portal will accept, which requires an agency prefix. For most USGS sites this means pasting 'USGS-' before the site number, although it is important to note that some USGS sites begin with a different prefix: it is up to users to determine the agency code.
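For example, the prefix can be added with paste0 (the site number below is the one used throughout this example):

# Convert a numeric USGS site number into a Water Quality Portal identifier:
site_no <- "04024000"
wqp_site <- paste0("USGS-", site_no)
wqp_site
## [1] "USGS-04024000"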
Here's an example. Let's say we have a data frame that we got from the retired readNWISqw function and we saved it as nwisData.
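The corresponding data frame from the Water Quality Portal, wqpData, can be retrieved with readWQPqw. Here is a minimal sketch; the site and parameter codes come from the example output shown below, and the date range is only illustrative:

# Pull the same discrete water quality data from the Water Quality Portal.
# The parameter codes match the parm_cd values shown in the tables below;
# the date range is an assumption for illustration only.
wqpData <- readWQPqw(
  siteNumbers = "USGS-04024000",
  parameterCd = c("30234", "32104", "34220", "34247"),
  startDate = "2011-01-01",
  endDate = "2012-12-31"
)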
First, we compare the number of rows, number of columns, and attributes of each returned data frame:
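Starting with the row counts of the two data frames, nwisData and wqpData:

nrow(nwisData)
nrow(wqpData)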
## [1] 208
## [1] 208
So, same number of rows returned. That’s good, since it’s the same data.
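The column counts, on the other hand:

ncol(nwisData)
ncol(wqpData)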
## [1] 36
## [1] 67
Different columns!
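The attributes attached to the NWIS output can be listed by name:

names(attributes(nwisData))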
## [1] "names" "class" "row.names"
## [4] "queryTime" "url" "headerInfo"
## [7] "comment" "siteInfo" "variableInfo"
## [10] "header"
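And the attributes attached to the WQP output:

names(attributes(wqpData))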
## [1] "names" "row.names" "class" "headerInfo"
## [5] "legacy" "siteInfo" "queryTime" "url"
Slightly different attributes. You can explore the differences between those attributes:
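For instance, both outputs carry siteInfo and url attributes (listed above), which can be pulled out and compared side by side:

# Site metadata attached to each data frame:
attr(nwisData, "siteInfo")
attr(wqpData, "siteInfo")

# The web-service request behind each data frame:
attr(nwisData, "url")
attr(wqpData, "url")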
The next big task is figuring out which columns from the WQP output map to the original columns from the NWIS output. Look at your workflow and determine what columns from the original NWIS output are needed to preserve the integrity of the workflow.
Let's use the dplyr package to pull out the columns that are used in this example workflow, and make sure both the NWIS and WQP outputs are ordered in the same way.
library(dplyr)

nwisData_relevant <- nwisData |>
  select(
    site_no, startDateTime, parm_cd,
    remark_cd, result_va
  ) |>
  arrange(startDateTime, parm_cd)

knitr::kable(head(nwisData_relevant))
site_no | startDateTime | parm_cd | remark_cd | result_va |
---|---|---|---|---|
04024000 | 2011-03-15 15:35:00 | 30234 | < | 0.16 |
04024000 | 2011-03-15 15:35:00 | 32104 | < | 0.16 |
04024000 | 2011-03-15 15:35:00 | 34220 | < | 0.02 |
04024000 | 2011-03-15 15:35:00 | 34247 | < | 0.02 |
04024000 | 2011-04-20 15:00:00 | 30234 | < | 0.16 |
04024000 | 2011-04-20 15:00:00 | 32104 | < | 0.16 |
If we explore the output from WQP, we can try to find the columns that include the same relevant information:
wqpData_relevant <- wqpData |>
  select(
    site_no = Location_Identifier,
    startDateTime = Activity_StartDateTime,
    parm_cd = USGSpcode,
    remark_cd = Result_ResultDetectionCondition,
    result_va = Result_Measure,
    detection_level = DetectionLimit_MeasureA
  ) |>
  arrange(startDateTime, parm_cd)

knitr::kable(head(wqpData_relevant))
site_no | startDateTime | parm_cd | remark_cd | result_va | detection_level |
---|---|---|---|---|---|
USGS-04024000 | 2011-03-15 15:35:00 | 30234 | Not Detected | NA | 0.16 |
USGS-04024000 | 2011-03-15 15:35:00 | 32104 | Not Detected | NA | 0.16 |
USGS-04024000 | 2011-03-15 15:35:00 | 34220 | Not Detected | NA | 0.02 |
USGS-04024000 | 2011-03-15 15:35:00 | 34247 | Not Detected | NA | 0.02 |
USGS-04024000 | 2011-04-20 15:00:00 | 30234 | Not Detected | NA | 0.16 |
USGS-04024000 | 2011-04-20 15:00:00 | 32104 | Not Detected | NA | 0.16 |
Now we can start looking at the results and deciding how future workflows should be set up. Here are some decisions to consider for this example:
The result_va in the NWIS service came back with a value. However, the data are actually censored, meaning we only know the result is below the detection limit. With lazier coding, it would have been easy to not realize these values are left-censored. So, while we could substitute the detection levels into the measured values wherever there is an NA in the measured value, this might be a great time to update your workflow to handle censored values more robustly. We are probably interested in maintaining the detection level in another column.
For this theoretical workflow, let's think about what we are trying to find out. Let's say that we want to know whether a value is "left-censored" or not. Maybe in this case, what would make the most sense is a column that is a logical TRUE/FALSE. For this example, there was only the text "Not Detected" in the "ResultDetectionConditionText" column. PLEASE NOTE that other data may include different messages about detection conditions; you will need to examine your data carefully. Here's an example from the EGRET package of how to decide whether a "ResultDetectionConditionText" value should be considered censored:
censored_text <- c(
  "Not Detected",
  "Non-Detect",
  "Non Detect",
  "Detected Not Quantified",
  "Below Quantification Limit"
)

wqpData_relevant <- wqpData |>
  mutate(left_censored = grepl(
    paste(censored_text, collapse = "|"),
    Result_ResultDetectionCondition,
    ignore.case = TRUE
  )) |>
  select(
    site_no = Location_Identifier,
    startDateTime = Activity_StartDateTime,
    parm_cd = USGSpcode,
    left_censored,
    result_va = Result_Measure,
    detection_level = DetectionLimit_MeasureA,
    dl_units = DetectionLimit_MeasureUnitA
  ) |>
  arrange(startDateTime, parm_cd)

knitr::kable(head(wqpData_relevant))
knitr::kable(head(wqpData_relevant))
site_no | startDateTime | parm_cd | left_censored | result_va | detection_level | dl_units |
---|---|---|---|---|---|---|
USGS-04024000 | 2011-03-15 15:35:00 | 30234 | TRUE | NA | 0.16 | ug/L |
USGS-04024000 | 2011-03-15 15:35:00 | 32104 | TRUE | NA | 0.16 | ug/L |
USGS-04024000 | 2011-03-15 15:35:00 | 34220 | TRUE | NA | 0.02 | ug/L |
USGS-04024000 | 2011-03-15 15:35:00 | 34247 | TRUE | NA | 0.02 | ug/L |
USGS-04024000 | 2011-04-20 15:00:00 | 30234 | TRUE | NA | 0.16 | ug/L |
USGS-04024000 | 2011-04-20 15:00:00 | 32104 | TRUE | NA | 0.16 | ug/L |
Another difference that is going to require some thoughtful decisions is how to interpret additional NWIS codes. They will now be descriptive text. Columns such as samp_type_cd and medium_cd will now all be reported with descriptive words rather than single letters or numbers. It will be the responsibility of the user to consider the best way to deal with these changes.
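To see what those descriptive values look like in your own data, one approach is to find the relevant columns by name and list their distinct values. This is only a sketch; the name patterns matched below are assumptions, so adjust them for the profile you request:

# Columns whose names suggest media or sample-type information
# (exact column names depend on the WQP profile requested):
descriptive_cols <- grep("Media|SampleType", names(wqpData), value = TRUE)

# Distinct descriptive values in each of those columns:
lapply(wqpData[descriptive_cols], unique)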
If you use the readNWISqw function, you WILL need to adjust your workflows, and you may find there are more codes you will need to account for. Hopefully this section helped get you started. It does not include every scenario, so you may find more columns or codes or other conditions you need to account for.
whatNWISdata
This function will continue to work for any service EXCEPT "qw" (water quality discrete data). "qw" results will eventually no longer be returned and are currently showing values that were frozen in March 2024.
The function to replace this functionality for discrete water quality data is currently whatWQPdata.
WARNING: NWIS does not deliver new discrete water quality data or updates to existing data. For additional details, see: https://doi-usgs.github.io/dataRetrieval/articles/Status.html
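A minimal sketch of a whatWQPdata call, reusing the site from the example above:

# Ask the Water Quality Portal what data are available for this site:
site_summary <- whatWQPdata(siteid = "USGS-04024000")
names(site_summary)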
There are some major differences in the output. The NWIS service offers one row per site/parameter code, which shows how many samples are available. This is not currently available from the Water Quality Portal; however, there are new summary services being developed. When those become available, we will include new documentation on how to get this information.
readNWISdata
If you get your water quality data from the readNWISdata function, no new data will be available and the function will generate a warning message. The other services are working as before. This is not an especially common dataRetrieval workflow, so there are not a lot of details here. Please reach out if more information is needed to update your workflows.
See ?readWQPdata to see all the ways to query data in the Water Quality Portal. Use the suggestions above to convert the WQP output to what is important for your workflow.
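As a sketch, a flexible readWQPdata query for the same example site might look like the following; the characteristic name is only illustrative, and any Water Quality Portal query parameter can be passed in:

# Arguments are passed through to the Water Quality Portal query.
# "Phosphorus" is an illustrative characteristic name, not taken from the example above.
wqp_flexible <- readWQPdata(
  siteid = "USGS-04024000",
  characteristicName = "Phosphorus"
)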
WQX Conversions
A table provided on the EPA website shows how the WQX3 column names map to the older WQX2 names. This table may change periodically while WQP services are under active development, which is expected to last through Fall 2024.
Here is an example of pulling the EPA table and comparing column names.
schema <- readr::read_csv("https://www.epa.gov/system/files/other-files/2024-07/schema_outbound_wqx3.0.csv")
Within that schema table, the column “FieldName3.0” shows the column names for the new WQX3 profile. There are several “FieldName2.0.XXXX” columns that show how the older 2.0 profiles align with the newer columns.
For example, the columns of the 2.0 "narrow" dataProfile match up with these new WQX3 columns:
sub_schema <- schema |>
  select(
    WQX3 = FieldName3.0,
    WQX2 = FieldName2.0.Narrow
  ) |>
  filter(!is.na(WQX2))

knitr::kable(sub_schema)
WQX3 | WQX2 |
---|---|
Org_Identifier | OrganizationIdentifier |
Org_FormalName | OrganizationFormalName |
ProviderName | ProviderName |
Location_Identifier | MonitoringLocationIdentifier |
Activity_ActivityIdentifier | ActivityIdentifier |
Activity_StartDate | ActivityStartDate |
Activity_StartTime | ActivityStartTime/Time |
Activity_StartTimeZone | ActivityStartTime/TimeZoneCode |
SampleCollectionMethod_Description | MethodDescriptionText |
SamplePrepMethod_Description | MethodDescriptionText |
Result_ResultDetectionCondition | ResultDetectionConditionText |
Result_Characteristic | CharacteristicName |
ResultBiological_Intent | BiologicalIntentName |
ResultBiological_IndividualIdentifier | BiologicalIndividualIdentifier |
ResultBiological_Taxon | SubjectTaxonomicName |
ResultBiological_UnidentifiedSpeciesIdentifier | UnidentifiedSpeciesIdentifier |
ResultBiological_SampleTissueAnatomy | SampleTissueAnatomyName |
Taxonomy_CellForm | CellFormName |
Taxonomy_CellShape | CellShapeName |
Taxonomy_Habit | HabitName |
Taxonomy_PollutionTolerance | TaxonomicPollutionTolerance |
Taxonomy_PollutionToleranceScale | TaxonomicPollutionToleranceScaleText |
Taxonomy_TrophicLevel | TrophicLevelName |
Taxonomy_FunctionalFeedingGroup | FunctionalFeedingGroupName |
TaxonomyCitation_ResourceTitle | TaxonomicDetailsCitation/ResourceTitleName |
TaxonomyCitation_ResourceCreator | TaxonomicDetailsCitation/ResourceCreatorName |
TaxonomyCitation_ResourceSubject | TaxonomicDetailsCitation/ResourceSubjectText |
TaxonomyCitation_ResourcePublisher | TaxonomicDetailsCitation/ResourcePublisherName |
TaxonomyCitation_ResourceDate | TaxonomicDetailsCitation/ResourceDate |
TaxonomyCitation_ResourceIdentifier | TaxonomicDetailsCitation/ResourceIdentifier |
ResultDepthHeight_Measure | ResultDepthHeightMeasure/MeasureValue |
ResultDepthHeight_MeasureUnit | ResultDepthHeightMeasure/MeasureUnitCode |
ResultDepthHeight_AltitudeReferencePoint | ResultDepthAltitudeReferencePointText |
ResultDepthHeight_SamplingPointName | ResultSamplingPointName |
Result_MeasureIdentifier | ResultIdentifier |
Result_Measure | ResultMeasureValue |
Result_MeasureUnit | ResultMeasure/MeasureUnitCode |
Result_MeasureQualifierCode | MeasureQualifierCode |
Result_MeasureStatusIdentifier | ResultStatusIdentifier |
Result_StatisticalBase | StatisticalBaseCode |
Result_MeasureType | ResultValueTypeName |
DataQuality_PrecisionValue | PrecisionValue |
DataQuality_BiasValue | DataQuality/BiasValue |
DataQuality_ConfidenceIntervalValue | ConfidenceIntervalValue |
DataQuality_UpperConfidenceLimitValue | UpperConfidenceLimitValue |
DataQuality_LowerConfidenceLimitValue | LowerConfidenceLimitValue |
DataQuality_ResultComment | ResultCommentText |
DetectionLimit_LabAccreditationIndicator | LaboratoryAccreditationIndicator |
DetectionLimit_LabAccreditationAuthority | LaboratoryAccreditationAuthorityName |
DetectionLimit_TaxonAccreditationIndicator | TaxonomistAccreditationIndicator |
DetectionLimit_TaxonAccreditationAuthority | TaxonomistAccreditationAuthorityName |
ResultAnalyticalMethod_Identifier | ResultAnalyticalMethod/MethodIdentifier |
ResultAnalyticalMethod_IdentifierContext | ResultAnalyticalMethod/MethodIdentifierContext |
ResultAnalyticalMethod_Name | ResultAnalyticalMethod/MethodName |
ResultAnalyticalMethod_QualifierType | ResultAnalyticalMethod/MethodQualifierTypeName |
ResultAnalyticalMethod_Description | MethodDescriptionText |
LabInfo_Name | LaboratoryName |
LabInfo_AnalysisStartDate | AnalysisStartDate |
LabInfo_AnalysisStartTime | AnalysisStartTime/Time |
LabInfo_AnalysisStartTimeZone | AnalysisStartTime/TimeZoneCode |
LabInfo_AnalysisEndDate | AnalysisEndDate |
LabInfo_AnalysisEndTime | AnalysisEndTime/Time |
LabInfo_AnalysisEndTimeZone | AnalysisEndTime/TimeZoneCode |
LabSamplePrepMethod_Description | MethodDescriptionText |
USGSpcode | USGSPCode |
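If an existing script was written against the WQX2 column names, the schema can also be used to translate a WQX3 data frame back to those names. This is only a sketch built on the sub_schema object above; note that a few WQX2 names (such as MethodDescriptionText) map to several WQX3 columns, so only the first match is kept here:

# Build a "new = old" lookup: names are WQX2 columns, values are WQX3 columns.
rename_lookup <- sub_schema |>
  distinct(WQX2, .keep_all = TRUE)

lookup <- setNames(rename_lookup$WQX3, rename_lookup$WQX2)

# Rename whichever WQX3 columns are present back to their WQX2 names:
wqpData_wqx2_names <- wqpData |>
  rename(any_of(lookup))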
This information is preliminary and is subject to revision. It is being provided to meet the need for timely best science. The information is provided on the condition that neither the U.S. Geological Survey nor the U.S. Government may be held liable for any damages resulting from the authorized or unauthorized use of the information.