IMPORTANT: These recommendations have been updated to use WQX3.0 profiles.
USGS discrete water samples data are undergoing modernization, and NWIS services will no longer be updated with the latest data starting mid-March 2024, with a full decommission expected 6 months later.
For the latest news on USGS water quality data, see: https://doi-usgs.github.io/dataRetrieval/articles/Status.html. Learn more about the changes and where to find the new samples data in the WDFN blog.
What does this mean for dataRetrieval users? Eventually, water quality data will ONLY be available from the Water Quality Portal rather than the NWIS services. There are 3 major dataRetrieval functions that will be affected: readNWISqw, whatNWISdata, and readNWISdata. This vignette will describe the most common workflow conversions needed to update existing scripts.
This vignette is being provided in advance of any breaking changes, and more information and guidance will be provided. These changes are big and may initially sound overwhelming, but in the end they are thoughtful changes that will make understanding USGS water data more intuitive. Please reach out with questions and comments to: [email protected]
If you have already converted your workflow from NWIS to WQP, much of the hard work has been done! The final step of the process is to make sure workflows are using the most modern WQX3 formats. WQX stands for Water Quality Exchange. WQX2 is the format that has historically been available on the Water Quality Portal; WQX3 is the more modern format that all of the data are being converted to.
This is available as of dataRetrieval version 2.7.16. Starting with 2.7.16, all WQP functions default to the newer "WQX3" profiles if available. See WQX Conversions below to get a table of column names in WQX3 vs WQX2.
readNWISqw
This function was retired as of Oct. 24, 2024.
So…what do you use instead? The function you will need to move to is readWQPqw. First, you'll need to convert the numeric USGS site IDs into something that the Water Quality Portal will accept, which requires an agency prefix. For most USGS sites this means pasting 'USGS-' before the site number, although it is important to note that some USGS sites begin with a different prefix: it is up to users to determine the agency code.
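For example, the prefix can be added with paste0 (the site number below is the one used throughout this example):

# Convert a numeric USGS site number into a Water Quality Portal identifier:
site_no <- "04024000"
wqp_site <- paste0("USGS-", site_no)
wqp_site
## [1] "USGS-04024000"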
Here's an example. Let's say we have a data frame that we got from the retired readNWISqw function and we saved it as nwisData.
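The corresponding data frame from the Water Quality Portal, wqpData, can be retrieved with readWQPqw. Here is a minimal sketch; the site and parameter codes come from the example output shown below, and the date range is only illustrative:

# Pull the same discrete water quality data from the Water Quality Portal.
# The parameter codes match the parm_cd values shown in the tables below;
# the date range is an assumption for illustration only.
wqpData <- readWQPqw(
  siteNumbers = "USGS-04024000",
  parameterCd = c("30234", "32104", "34220", "34247"),
  startDate = "2011-01-01",
  endDate = "2012-12-31"
)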
First, we compare the number of rows, number of columns, and attributes of each returned data frame:
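Starting with the row counts of the two data frames, nwisData and wqpData:

nrow(nwisData)
nrow(wqpData)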
## [1] 208
## [1] 208
So, same number of rows returned. That’s good, since it’s the same data.
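The column counts, on the other hand:

ncol(nwisData)
ncol(wqpData)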
## [1] 36
## [1] 67
Different columns!
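The attributes attached to the NWIS output can be listed by name:

names(attributes(nwisData))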
## [1] "names" "class" "row.names"
## [4] "queryTime" "url" "headerInfo"
## [7] "comment" "siteInfo" "variableInfo"
## [10] "header"
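And the attributes attached to the WQP output:

names(attributes(wqpData))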
## [1] "names" "row.names" "class" "headerInfo"
## [5] "legacy" "siteInfo" "queryTime" "url"
Slightly different attributes. You can explore the differences between those attributes:
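For instance, both outputs carry siteInfo and url attributes (listed above), which can be pulled out and compared side by side:

# Site metadata attached to each data frame:
attr(nwisData, "siteInfo")
attr(wqpData, "siteInfo")

# The web-service request behind each data frame:
attr(nwisData, "url")
attr(wqpData, "url")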
The next big task is figuring out which columns from the WQP output map to the original columns from the NWIS output. Look at your workflow and determine what columns from the original NWIS output are needed to preserve the integrity of the workflow.
Let's use the dplyr package to pull out the columns that are used in this example workflow, and make sure both the NWIS and WQP outputs are ordered in the same way.
library(dplyr)

nwisData_relevant <- nwisData |>
  select(
    site_no, startDateTime, parm_cd,
    remark_cd, result_va
  ) |>
  arrange(startDateTime, parm_cd)

knitr::kable(head(nwisData_relevant))
site_no | startDateTime | parm_cd | remark_cd | result_va |
---|---|---|---|---|
04024000 | 2011-03-15 15:35:00 | 30234 | < | 0.16 |
04024000 | 2011-03-15 15:35:00 | 32104 | < | 0.16 |
04024000 | 2011-03-15 15:35:00 | 34220 | < | 0.02 |
04024000 | 2011-03-15 15:35:00 | 34247 | < | 0.02 |
04024000 | 2011-04-20 15:00:00 | 30234 | < | 0.16 |
04024000 | 2011-04-20 15:00:00 | 32104 | < | 0.16 |
If we explore the output from WQP, we can try to find the columns that include the same relevant information:
wqpData_relevant <- wqpData |>
  select(
    site_no = Location_Identifier,
    startDateTime = Activity_StartDateTime,
    parm_cd = USGSpcode,
    remark_cd = Result_ResultDetectionCondition,
    result_va = Result_Measure,
    detection_level = DetectionLimit_MeasureA
  ) |>
  arrange(startDateTime, parm_cd)

knitr::kable(head(wqpData_relevant))
site_no | startDateTime | parm_cd | remark_cd | result_va | detection_level |
---|---|---|---|---|---|
USGS-04024000 | 2011-03-15 15:35:00 | 30234 | Not Detected | NA | 0.16 |
USGS-04024000 | 2011-03-15 15:35:00 | 32104 | Not Detected | NA | 0.16 |
USGS-04024000 | 2011-03-15 15:35:00 | 34220 | Not Detected | NA | 0.02 |
USGS-04024000 | 2011-03-15 15:35:00 | 34247 | Not Detected | NA | 0.02 |
USGS-04024000 | 2011-04-20 15:00:00 | 30234 | Not Detected | NA | 0.16 |
USGS-04024000 | 2011-04-20 15:00:00 | 32104 | Not Detected | NA | 0.16 |
Now we can start looking at the results and deciding how future workflows should be set up. Here are some decisions to consider for this example:
The result_va in the NWIS service came back with a value. However, the data are actually censored, meaning we only know the result is below the detection limit. With lazier coding, it would have been easy to not realize these values are left-censored. So, while we could substitute the detection levels into the measured values wherever there is an NA in the measured value, this might be a great time to update your workflow to handle censored values more robustly. We are probably interested in maintaining the detection level in another column.
For this theoretical workflow, let's think about what we are trying to find out. Let's say that we want to know whether a value is "left-censored" or not. Maybe in this case, what would make the most sense is a column that is a logical TRUE/FALSE. For this example, there was only the text "Not Detected" in the "ResultDetectionConditionText" column. PLEASE NOTE that other data may include different messages about detection conditions; you will need to examine your data carefully. Here's an example from the EGRET package of how to decide whether a "ResultDetectionConditionText" value should be considered censored:
censored_text <- c(
  "Not Detected",
  "Non-Detect",
  "Non Detect",
  "Detected Not Quantified",
  "Below Quantification Limit"
)

wqpData_relevant <- wqpData |>
  mutate(left_censored = grepl(
    paste(censored_text, collapse = "|"),
    Result_ResultDetectionCondition,
    ignore.case = TRUE
  )) |>
  select(
    site_no = Location_Identifier,
    startDateTime = Activity_StartDateTime,
    parm_cd = USGSpcode,
    left_censored,
    result_va = Result_Measure,
    detection_level = DetectionLimit_MeasureA,
    dl_units = DetectionLimit_MeasureUnitA
  ) |>
  arrange(startDateTime, parm_cd)

knitr::kable(head(wqpData_relevant))
knitr::kable(head(wqpData_relevant))
site_no | startDateTime | parm_cd | left_censored | result_va | detection_level | dl_units |
---|---|---|---|---|---|---|
USGS-04024000 | 2011-03-15 15:35:00 | 30234 | TRUE | NA | 0.16 | ug/L |
USGS-04024000 | 2011-03-15 15:35:00 | 32104 | TRUE | NA | 0.16 | ug/L |
USGS-04024000 | 2011-03-15 15:35:00 | 34220 | TRUE | NA | 0.02 | ug/L |
USGS-04024000 | 2011-03-15 15:35:00 | 34247 | TRUE | NA | 0.02 | ug/L |
USGS-04024000 | 2011-04-20 15:00:00 | 30234 | TRUE | NA | 0.16 | ug/L |
USGS-04024000 | 2011-04-20 15:00:00 | 32104 | TRUE | NA | 0.16 | ug/L |
Another difference that is going to require some thoughtful decisions is how to interpret additional NWIS codes. They will now be descriptive text. Columns such as samp_type_cd and medium_cd will now all be reported with descriptive words rather than single letters or numbers. It will be the responsibility of the user to consider the best way to deal with these changes.
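To see what those descriptive values look like in your own data, one approach is to find the relevant columns by name and list their distinct values. This is only a sketch; the name patterns matched below are assumptions, so adjust them for the profile you request:

# Columns whose names suggest media or sample-type information
# (exact column names depend on the WQP profile requested):
descriptive_cols <- grep("Media|SampleType", names(wqpData), value = TRUE)

# Distinct descriptive values in each of those columns:
lapply(wqpData[descriptive_cols], unique)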
If you use the readNWISqw function, you WILL need to adjust your workflows, and you may find there are more codes you will need to account for. Hopefully this section helped get you started. It does not include every scenario, so you may find more columns or codes or other conditions you need to account for.
whatNWISdata
This function will continue to work for any service EXCEPT "qw" (water quality discrete data). "qw" results will eventually no longer be returned and are currently showing values that were frozen in March 2024.
The function to replace this functionality for discrete water quality data is currently whatWQPdata.
WARNING: NWIS does not deliver new discrete water quality data or updates to existing data. For additional details, see: https://doi-usgs.github.io/dataRetrieval/articles/Status.html
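A minimal sketch of a whatWQPdata call, reusing the site from the example above:

# Ask the Water Quality Portal what data are available for this site:
site_summary <- whatWQPdata(siteid = "USGS-04024000")
names(site_summary)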
There are some major differences in the output. The NWIS service offers one row per site/parameter code, which shows how many samples are available. This is not currently available from the Water Quality Portal; however, there are new summary services being developed. When those become available, we will include new documentation on how to get this information.
readNWISdata
If you get your water quality data from the readNWISdata function, no new data will be available and the function will generate a warning message. The other services are working as before. This is not an especially common dataRetrieval workflow, so there are not a lot of details here. Please reach out if more information is needed to update your workflows.
See ?readWQPdata to see all the ways to query data in the Water Quality Portal. Use the suggestions above to convert the WQP output to what is important for your workflow.
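As a sketch, a flexible readWQPdata query for the same example site might look like the following; the characteristic name is only illustrative, and any Water Quality Portal query parameter can be passed in:

# Arguments are passed through to the Water Quality Portal query.
# "Phosphorus" is an illustrative characteristic name, not taken from the example above.
wqp_flexible <- readWQPdata(
  siteid = "USGS-04024000",
  characteristicName = "Phosphorus"
)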
WQX Conversions
A table provided on the EPA website shows how the WQX3 column names map to the older WQX2 names. This table may change periodically while WQP services are under active development, which is expected to last through Fall 2024.
Here is an example of pulling the EPA table and comparing column names.
schema <- readr::read_csv("https://www.epa.gov/system/files/other-files/2024-07/schema_outbound_wqx3.0.csv")
Within that schema table, the column “FieldName3.0” shows the column names for the new WQX3 profile. There are several “FieldName2.0.XXXX” columns that show how the older 2.0 profiles align with the newer columns.
For example, the columns of the 2.0 "narrow" dataProfile match up with these new WQX3 columns:
sub_schema <- schema |>
  select(
    WQX3 = FieldName3.0,
    WQX2 = FieldName2.0.Narrow
  ) |>
  filter(!is.na(WQX2))

knitr::kable(sub_schema)
WQX3 | WQX2 |
---|---|
Org_Identifier | OrganizationIdentifier |
Org_FormalName | OrganizationFormalName |
ProviderName | ProviderName |
Location_Identifier | MonitoringLocationIdentifier |
Activity_ActivityIdentifier | ActivityIdentifier |
Activity_StartDate | ActivityStartDate |
Activity_StartTime | ActivityStartTime/Time |
Activity_StartTimeZone | ActivityStartTime/TimeZoneCode |
SampleCollectionMethod_Description | MethodDescriptionText |
SamplePrepMethod_Description | MethodDescriptionText |
Result_ResultDetectionCondition | ResultDetectionConditionText |
Result_Characteristic | CharacteristicName |
ResultBiological_Intent | BiologicalIntentName |
ResultBiological_IndividualIdentifier | BiologicalIndividualIdentifier |
ResultBiological_Taxon | SubjectTaxonomicName |
ResultBiological_UnidentifiedSpeciesIdentifier | UnidentifiedSpeciesIdentifier |
ResultBiological_SampleTissueAnatomy | SampleTissueAnatomyName |
Taxonomy_CellForm | CellFormName |
Taxonomy_CellShape | CellShapeName |
Taxonomy_Habit | HabitName |
Taxonomy_PollutionTolerance | TaxonomicPollutionTolerance |
Taxonomy_PollutionToleranceScale | TaxonomicPollutionToleranceScaleText |
Taxonomy_TrophicLevel | TrophicLevelName |
Taxonomy_FunctionalFeedingGroup | FunctionalFeedingGroupName |
TaxonomyCitation_ResourceTitle | TaxonomicDetailsCitation/ResourceTitleName |
TaxonomyCitation_ResourceCreator | TaxonomicDetailsCitation/ResourceCreatorName |
TaxonomyCitation_ResourceSubject | TaxonomicDetailsCitation/ResourceSubjectText |
TaxonomyCitation_ResourcePublisher | TaxonomicDetailsCitation/ResourcePublisherName |
TaxonomyCitation_ResourceDate | TaxonomicDetailsCitation/ResourceDate |
TaxonomyCitation_ResourceIdentifier | TaxonomicDetailsCitation/ResourceIdentifier |
ResultDepthHeight_Measure | ResultDepthHeightMeasure/MeasureValue |
ResultDepthHeight_MeasureUnit | ResultDepthHeightMeasure/MeasureUnitCode |
ResultDepthHeight_AltitudeReferencePoint | ResultDepthAltitudeReferencePointText |
ResultDepthHeight_SamplingPointName | ResultSamplingPointName |
Result_MeasureIdentifier | ResultIdentifier |
Result_Measure | ResultMeasureValue |
Result_MeasureUnit | ResultMeasure/MeasureUnitCode |
Result_MeasureQualifierCode | MeasureQualifierCode |
Result_MeasureStatusIdentifier | ResultStatusIdentifier |
Result_StatisticalBase | StatisticalBaseCode |
Result_MeasureType | ResultValueTypeName |
DataQuality_PrecisionValue | PrecisionValue |
DataQuality_BiasValue | DataQuality/BiasValue |
DataQuality_ConfidenceIntervalValue | ConfidenceIntervalValue |
DataQuality_UpperConfidenceLimitValue | UpperConfidenceLimitValue |
DataQuality_LowerConfidenceLimitValue | LowerConfidenceLimitValue |
DataQuality_ResultComment | ResultCommentText |
DetectionLimit_LabAccreditationIndicator | LaboratoryAccreditationIndicator |
DetectionLimit_LabAccreditationAuthority | LaboratoryAccreditationAuthorityName |
DetectionLimit_TaxonAccreditationIndicator | TaxonomistAccreditationIndicator |
DetectionLimit_TaxonAccreditationAuthority | TaxonomistAccreditationAuthorityName |
ResultAnalyticalMethod_Identifier | ResultAnalyticalMethod/MethodIdentifier |
ResultAnalyticalMethod_IdentifierContext | ResultAnalyticalMethod/MethodIdentifierContext |
ResultAnalyticalMethod_Name | ResultAnalyticalMethod/MethodName |
ResultAnalyticalMethod_QualifierType | ResultAnalyticalMethod/MethodQualifierTypeName |
ResultAnalyticalMethod_Description | MethodDescriptionText |
LabInfo_Name | LaboratoryName |
LabInfo_AnalysisStartDate | AnalysisStartDate |
LabInfo_AnalysisStartTime | AnalysisStartTime/Time |
LabInfo_AnalysisStartTimeZone | AnalysisStartTime/TimeZoneCode |
LabInfo_AnalysisEndDate | AnalysisEndDate |
LabInfo_AnalysisEndTime | AnalysisEndTime/Time |
LabInfo_AnalysisEndTimeZone | AnalysisEndTime/TimeZoneCode |
LabSamplePrepMethod_Description | MethodDescriptionText |
USGSpcode | USGSPCode |
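If an existing script was written against the WQX2 column names, the schema can also be used to translate a WQX3 data frame back to those names. This is only a sketch built on the sub_schema object above; note that a few WQX2 names (such as MethodDescriptionText) map to several WQX3 columns, so only the first match is kept here:

# Build a "new = old" lookup: names are WQX2 columns, values are WQX3 columns.
rename_lookup <- sub_schema |>
  distinct(WQX2, .keep_all = TRUE)

lookup <- setNames(rename_lookup$WQX3, rename_lookup$WQX2)

# Rename whichever WQX3 columns are present back to their WQX2 names:
wqpData_wqx2_names <- wqpData |>
  rename(any_of(lookup))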
This information is preliminary and is subject to revision. It is being provided to meet the need for timely best science. The information is provided on the condition that neither the U.S. Geological Survey nor the U.S. Government may be held liable for any damages resulting from the authorized or unauthorized use of the information.