Changes to NWIS QW services

IMPORTANT: These recommendations have been updated to use WQX 3.0 profiles.

Changes to NWIS water quality services

USGS discrete water sample data are undergoing modernization, and NWIS services will no longer be updated with the latest data starting in mid-March 2024; a full decommission is expected about 6 months later.

For the latest news on USGS water quality data, see: https://doi-usgs.github.io/dataRetrieval/articles/Status.html. Learn more about the changes and where to find the new samples data in the WDFN blog.

What does this mean for dataRetrieval users? Eventually, water quality data will ONLY be available from the Water Quality Portal rather than from the NWIS services. There are 3 major dataRetrieval functions that will be affected: readNWISqw, whatNWISdata, and readNWISdata. This vignette describes the most common workflow conversions needed to update existing scripts.

How to find more help

This vignette is being provided in advance of any breaking changes, and more information and guidance will follow. These changes are big and may initially sound overwhelming, but in the end they are thoughtful changes that will make understanding USGS water data more intuitive. Please reach out with questions and comments to:

WQX2 -> WQX3

If you have already converted your workflow from NWIS to WQP, much of the hard work has been done! The final step is to make sure workflows are using the modern WQX3 formats. WQX stands for Water Quality Exchange. WQX2 is the format that has historically been available on the Water Quality Portal; WQX3 is the more modern format that all the data is being converted to.

This is available as of dataRetrieval version 2.7.16. Starting with 2.7.16, all WQP functions default to the newer "WQX3" profiles when available. See the WQX Conversion section below for a table of column names in WQX3 vs WQX2.
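For a quick check, here is a minimal sketch, assuming dataRetrieval >= 2.7.16 is installed. The site number and parameter code are hypothetical choices for illustration ("00665" is the USGS parameter code for total phosphorus):

library(dataRetrieval)

# Confirm the installed version; 2.7.16 or later defaults to WQX3 profiles:
packageVersion("dataRetrieval")

# A standard call now returns WQX3-style column names,
# for example "Result_Measure" instead of the older "ResultMeasureValue":
phos <- readWQPqw(siteNumbers = "USGS-04024000",  # hypothetical site
                  parameterCd = "00665")          # hypothetical parameter code
names(phos)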

NWIS -> WQX3

readNWISqw

This function was retired as of Oct. 24, 2024.

So…what do you use instead? The function you will need to move to is readWQPqw. First, you'll need to convert the numeric USGS site IDs into something that the Water Quality Portal will accept, which requires the agency prefix. For most USGS sites this means pasting "USGS-" before the site number, although it is important to note that some USGS sites begin with a different prefix: it is up to users to determine the agency code.
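For illustration, here is a sketch of setting up the inputs. The site number and parameter codes below are assumptions, chosen to be consistent with the example output shown later in this vignette:

library(dataRetrieval)

# Hypothetical inputs, consistent with the example output below:
site_ids <- "04024000"
parameterCd <- c("30234", "32104", "34220", "34247")

# Prepend the agency code so the Water Quality Portal recognizes the sites:
paste0("USGS-", site_ids)
## [1] "USGS-04024000"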

Here’s an example:

wqpData <- readWQPqw(paste0("USGS-", site_ids), parameterCd)

Let’s say we have a data frame that we got from the retired readNWISqw function and we saved it as nwisData.

First we compare the number of rows, number of columns, and attributes of each return:

nrow(nwisData)
## [1] 208
nrow(wqpData)
## [1] 208

So, same number of rows returned. That’s good, since it’s the same data.

ncol(nwisData)
## [1] 36
ncol(wqpData)
## [1] 67

Different columns!

names(attributes(nwisData))
##  [1] "names"        "class"        "row.names"   
##  [4] "queryTime"    "url"          "headerInfo"  
##  [7] "comment"      "siteInfo"     "variableInfo"
## [10] "header"
names(attributes(wqpData))
## [1] "names"      "row.names"  "class"      "headerInfo"
## [5] "legacy"     "siteInfo"   "queryTime"  "url"

Slightly different attributes. You can explore the differences between those attributes:

site_NWIS <- attr(nwisData, "siteInfo")
site_WQP <- attr(wqpData, "siteInfo")

The next big task is figuring out which columns from the WQP output map to the original columns from the NWIS output. Look at your workflow and determine what columns from the original NWIS output are needed to preserve the integrity of the workflow.

Let’s use the dplyr package to pull out the columns that are used in this example workflow, and make sure both the NWIS and WQP outputs are ordered in the same way.

library(dplyr)

nwisData_relevant <- nwisData |> 
  select(
    site_no, startDateTime, parm_cd,
    remark_cd, result_va
  ) |> 
  arrange(startDateTime, parm_cd)

knitr::kable(head(nwisData_relevant))
site_no startDateTime parm_cd remark_cd result_va
04024000 2011-03-15 15:35:00 30234 < 0.16
04024000 2011-03-15 15:35:00 32104 < 0.16
04024000 2011-03-15 15:35:00 34220 < 0.02
04024000 2011-03-15 15:35:00 34247 < 0.02
04024000 2011-04-20 15:00:00 30234 < 0.16
04024000 2011-04-20 15:00:00 32104 < 0.16

If we explore the output from WQP, we can try to find the columns that include the same relevant information:

wqpData_relevant <- wqpData |> 
  select(
    site_no = Location_Identifier,
    startDateTime = Activity_StartDateTime,
    parm_cd = USGSpcode,
    remark_cd = Result_ResultDetectionCondition,
    result_va = Result_Measure,
    detection_level = DetectionLimit_MeasureA
  ) |> 
  arrange(startDateTime, parm_cd)
knitr::kable(head(wqpData_relevant))
site_no startDateTime parm_cd remark_cd result_va detection_level
USGS-04024000 2011-03-15 15:35:00 30234 Not Detected NA 0.16
USGS-04024000 2011-03-15 15:35:00 32104 Not Detected NA 0.16
USGS-04024000 2011-03-15 15:35:00 34220 Not Detected NA 0.02
USGS-04024000 2011-03-15 15:35:00 34247 Not Detected NA 0.02
USGS-04024000 2011-04-20 15:00:00 30234 Not Detected NA 0.16
USGS-04024000 2011-04-20 15:00:00 32104 Not Detected NA 0.16

Now we can start looking at the results and deciding how future workflows should be set up. Here are some decisions to consider for this example:

Censored values

The result_va column in the NWIS service came back with a value. However, the data are actually censored, meaning we only know the result is below the detection limit. With careless coding, it would be easy to not realize these values are left-censored. So, while we could substitute the detection level for the measured value whenever the measured value is NA, this might be a great time to update your workflow to handle censored values more robustly. We are probably also interested in keeping the detection level in its own column.
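If you do go with the simple substitution described above, a minimal sketch using the columns already selected into wqpData_relevant might look like this (it assumes result_va and detection_level are both numeric, and it treats every censored result as equal to its detection limit):

wqpData_substituted <- wqpData_relevant |> 
  mutate(
    # fill censored (NA) results with the reported detection level
    result_filled = if_else(is.na(result_va), detection_level, result_va)
  )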

For this theoretical workflow, let’s think about what we are trying to find out. Let’s say that we want to know whether a value is “left-censored” or not. Maybe in this case, what would make the most sense is to have a column that is a logical TRUE/FALSE. For this example, there was only the text “Not Detected” in the “Result_ResultDetectionCondition” column. PLEASE NOTE that other data may include different messages about detection conditions; you will need to examine your data carefully. Here’s an example from the EGRET package on how to decide whether a detection condition value should be considered censored:

censored_text <- c(
  "Not Detected",
  "Non-Detect",
  "Non Detect",
  "Detected Not Quantified",
  "Below Quantification Limit"
)

wqpData_relevant <- wqpData |> 
  mutate(left_censored = grepl(paste(censored_text, collapse = "|"),
    Result_ResultDetectionCondition,
    ignore.case = TRUE
  )) |> 
  select(
    site_no = Location_Identifier,
    startDateTime = Activity_StartDateTime,
    parm_cd = USGSpcode,
    left_censored,
    result_va = Result_Measure,
    detection_level = DetectionLimit_MeasureA,
    dl_units = DetectionLimit_MeasureUnitA
  ) |> 
  arrange(startDateTime, parm_cd)

knitr::kable(head(wqpData_relevant))
site_no startDateTime parm_cd left_censored result_va detection_level dl_units
USGS-04024000 2011-03-15 15:35:00 30234 TRUE NA 0.16 ug/L
USGS-04024000 2011-03-15 15:35:00 32104 TRUE NA 0.16 ug/L
USGS-04024000 2011-03-15 15:35:00 34220 TRUE NA 0.02 ug/L
USGS-04024000 2011-03-15 15:35:00 34247 TRUE NA 0.02 ug/L
USGS-04024000 2011-04-20 15:00:00 30234 TRUE NA 0.16 ug/L
USGS-04024000 2011-04-20 15:00:00 32104 TRUE NA 0.16 ug/L

NWIS codes

Another difference that is going to require some thoughtful decisions is how to interpret additional NWIS codes, which will now be descriptive text. Columns such as samp_type_cd and medium_cd will now be reported with descriptive words rather than single letters or numbers. It will be the responsibility of the user to consider the best way to deal with these changes.
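As a hedged sketch, one way to get oriented is to inspect which descriptive columns and values came back before hard-coding any filters. The column name in the commented filter below is an assumption; check names(wqpData) for the columns in your own pull:

# Find columns that appear to describe the activity type or sample media:
grep("Media|Type", names(wqpData), value = TRUE)

# Then look at the unique values before filtering. For example, if your data
# includes an "Activity_Media" column (an assumption, not guaranteed), you
# might filter on descriptive text rather than the old medium_cd letter codes:
# wqpData |> dplyr::filter(Activity_Media == "Water")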

If you used the readNWISqw function, you WILL need to adjust your workflows. Hopefully this section helped get you started. It does not cover every scenario, so you may find more columns, codes, or other conditions you need to account for.

whatNWISdata

This function will continue to work for any service EXCEPT “qw” (discrete water quality data). “qw” results are currently frozen at values from March 2024 and will eventually no longer be returned.

The function to replace this functionality for discrete water quality data is currently whatWQPdata:

whatNWIS <- whatNWISdata(
  siteNumber = site_ids,
  service = "qw"
)
## WARNING: NWIS does not deliver
## new discrete water quality data or updates to existing data.
## For additional details, see:
## https://doi-usgs.github.io/dataRetrieval/articles/Status.html
whatWQP <- whatWQPdata(siteNumber = paste0("USGS-", site_ids))

There are some major differences in the output. The NWIS service returns one row per site/parameter code, showing how many samples are available. This is not currently available from the Water Quality Portal; however, new summary services are being developed. When those become available, we will include new documentation on how to get this information.
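In the meantime, one possible workaround, sketched below, is to summarize a full data pull yourself using columns from the WQX3 profile (Location_Identifier, USGSpcode, and Activity_StartDate, as listed in the conversion table later in this vignette):

library(dplyr)

wqp_summary <- wqpData |> 
  group_by(Location_Identifier, USGSpcode) |> 
  summarize(
    count = n(),
    begin = min(Activity_StartDate, na.rm = TRUE),
    end = max(Activity_StartDate, na.rm = TRUE),
    .groups = "drop"
  )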

readNWISdata

If you get your water quality data from the readNWISdata function, no new data will be available and the function will generate a warning message. Other NWIS services continue to work as before. This is not an especially common dataRetrieval workflow, so there are not a lot of details here. Please reach out if more information is needed to update your workflows.

See ?readWQPdata for all the ways to query data in the Water Quality Portal. Use the suggestions above to convert the readWQPdata output into what is important for your workflow.
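For example, here is a hedged sketch of a flexible query. The arguments shown (siteid and characteristicName) are standard Water Quality Portal query parameters, but check ?readWQPdata for the full list and any changes; the site and characteristic are hypothetical:

library(dataRetrieval)

# Query by characteristic name instead of a USGS parameter code:
phos_wqp <- readWQPdata(
  siteid = "USGS-04024000",          # hypothetical site
  characteristicName = "Phosphorus"  # hypothetical characteristic
)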

WQX Conversion

A table is provided on the EPA website that shows how the WQX3 column names map to the WQX2 columns. This table may change periodically while WQP services are under active development, which is expected to last through Fall 2024.

Here is an example for pulling the EPA table and comparing column names.

schema <- readr::read_csv("https://www.epa.gov/system/files/other-files/2024-07/schema_outbound_wqx3.0.csv")

Within that schema table, the column “FieldName3.0” shows the column names for the new WQX3 profile. There are several “FieldName2.0.XXXX” columns that show how the older 2.0 profiles align with the newer columns.

For example, the 2.0 “narrow” dataProfile matches up with these new WQX3 columns:

sub_schema <- schema |> 
  select(WQX3 = FieldName3.0,
         WQX2 = FieldName2.0.Narrow) |> 
  filter(!is.na(WQX2))

knitr::kable(sub_schema)
WQX3 WQX2
Org_Identifier OrganizationIdentifier
Org_FormalName OrganizationFormalName
ProviderName ProviderName
Location_Identifier MonitoringLocationIdentifier
Activity_ActivityIdentifier ActivityIdentifier
Activity_StartDate ActivityStartDate
Activity_StartTime ActivityStartTime/Time
Activity_StartTimeZone ActivityStartTime/TimeZoneCode
SampleCollectionMethod_Description MethodDescriptionText
SamplePrepMethod_Description MethodDescriptionText
Result_ResultDetectionCondition ResultDetectionConditionText
Result_Characteristic CharacteristicName
ResultBiological_Intent BiologicalIntentName
ResultBiological_IndividualIdentifier BiologicalIndividualIdentifier
ResultBiological_Taxon SubjectTaxonomicName
ResultBiological_UnidentifiedSpeciesIdentifier UnidentifiedSpeciesIdentifier
ResultBiological_SampleTissueAnatomy SampleTissueAnatomyName
Taxonomy_CellForm CellFormName
Taxonomy_CellShape CellShapeName
Taxonomy_Habit HabitName
Taxonomy_PollutionTolerance TaxonomicPollutionTolerance
Taxonomy_PollutionToleranceScale TaxonomicPollutionToleranceScaleText
Taxonomy_TrophicLevel TrophicLevelName
Taxonomy_FunctionalFeedingGroup FunctionalFeedingGroupName
TaxonomyCitation_ResourceTitle TaxonomicDetailsCitation/ResourceTitleName
TaxonomyCitation_ResourceCreator TaxonomicDetailsCitation/ResourceCreatorName
TaxonomyCitation_ResourceSubject TaxonomicDetailsCitation/ResourceSubjectText
TaxonomyCitation_ResourcePublisher TaxonomicDetailsCitation/ResourcePublisherName
TaxonomyCitation_ResourceDate TaxonomicDetailsCitation/ResourceDate
TaxonomyCitation_ResourceIdentifier TaxonomicDetailsCitation/ResourceIdentifier
ResultDepthHeight_Measure ResultDepthHeightMeasure/MeasureValue
ResultDepthHeight_MeasureUnit ResultDepthHeightMeasure/MeasureUnitCode
ResultDepthHeight_AltitudeReferencePoint ResultDepthAltitudeReferencePointText
ResultDepthHeight_SamplingPointName ResultSamplingPointName
Result_MeasureIdentifier ResultIdentifier
Result_Measure ResultMeasureValue
Result_MeasureUnit ResultMeasure/MeasureUnitCode
Result_MeasureQualifierCode MeasureQualifierCode
Result_MeasureStatusIdentifier ResultStatusIdentifier
Result_StatisticalBase StatisticalBaseCode
Result_MeasureType ResultValueTypeName
DataQuality_PrecisionValue PrecisionValue
DataQuality_BiasValue DataQuality/BiasValue
DataQuality_ConfidenceIntervalValue ConfidenceIntervalValue
DataQuality_UpperConfidenceLimitValue UpperConfidenceLimitValue
DataQuality_LowerConfidenceLimitValue LowerConfidenceLimitValue
DataQuality_ResultComment ResultCommentText
DetectionLimit_LabAccreditationIndicator LaboratoryAccreditationIndicator
DetectionLimit_LabAccreditationAuthority LaboratoryAccreditationAuthorityName
DetectionLimit_TaxonAccreditationIndicator TaxonomistAccreditationIndicator
DetectionLimit_TaxonAccreditationAuthority TaxonomistAccreditationAuthorityName
ResultAnalyticalMethod_Identifier ResultAnalyticalMethod/MethodIdentifier
ResultAnalyticalMethod_IdentifierContext ResultAnalyticalMethod/MethodIdentifierContext
ResultAnalyticalMethod_Name ResultAnalyticalMethod/MethodName
ResultAnalyticalMethod_QualifierType ResultAnalyticalMethod/MethodQualifierTypeName
ResultAnalyticalMethod_Description MethodDescriptionText
LabInfo_Name LaboratoryName
LabInfo_AnalysisStartDate AnalysisStartDate
LabInfo_AnalysisStartTime AnalysisStartTime/Time
LabInfo_AnalysisStartTimeZone AnalysisStartTime/TimeZoneCode
LabInfo_AnalysisEndDate AnalysisEndDate
LabInfo_AnalysisEndTime AnalysisEndTime/Time
LabInfo_AnalysisEndTimeZone AnalysisEndTime/TimeZoneCode
LabSamplePrepMethod_Description MethodDescriptionText
USGSpcode USGSPCode

Disclaimer

This information is preliminary and is subject to revision. It is being provided to meet the need for timely best science. The information is provided on the condition that neither the U.S. Geological Survey nor the U.S. Government may be held liable for any damages resulting from the authorized or unauthorized use of the information.