Title: | Exploring Biological Relevance of Environmental Chemistry Observations |
---|---|
Description: | Data analysis package for estimating potential biological effects from chemical concentrations in environmental samples. Included are a set of functions to analyze, visualize, and organize measured concentration data as it relates to user-selected chemical-biological interaction benchmark data such as water quality criteria. The intent of these analyses is to develop a better understanding of the potential biological relevance of environmental chemistry data. Results can be used to prioritize which chemicals at which sites may be of greatest concern. These methods are meant to be used as a screening technique to predict potential for biological influence from chemicals that ultimately need to be validated with direct biological assays. A description of the analysis can be found in Blackwell (2017) <doi:10.1021/acs.est.7b01613>. |
Authors: | Laura DeCicco [aut, cre] , Steven Corsi [aut] , Daniel Villeneuve [aut] , Brett Blackwell [aut] , Gerald Ankley [aut] , Alison Appling [rev] (Reviewed for USGS), Dalma Martinovic [rev] (Reviewed for USGS) |
Maintainer: | Laura DeCicco <[email protected]> |
License: | CC0 |
Version: | 1.4.0 |
Built: | 2024-11-09 06:10:46 UTC |
Source: | https://github.com/doi-usgs/toxeval |
A small collection of helper functions for toxEval
as.toxEval(x, ...)
as.toxEval(x, ...)
x |
list or toxEval object |
... |
data frames to override data within the original x list. |
path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) # To over-ride one of the data frames: chem_data <- data.frame( CAS = "21145-77-7", Value = 1, "Sample Date" = as.Date("2012-01-01"), SiteID = "USGS-04249000" ) tox_list_new <- as.toxEval(tox_list, chem_data)
path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) # To over-ride one of the data frames: chem_data <- data.frame( CAS = "21145-77-7", Value = 1, "Sample Date" = as.Date("2012-01-01"), SiteID = "USGS-04249000" ) tox_list_new <- as.toxEval(tox_list, chem_data)
As of ToxCast 4.1, this function only helps clean up abbrieviations found in the end_point_info data frame.
clean_endPoint_info(end_point_info)
clean_endPoint_info(end_point_info)
end_point_info |
Data frame Endpoint information from ToxCast. |
The returned data frame is based on end_point_info, but with some endPoints filtered out and some additional categories in intended_target_family and intended_target_family_sub. The names in intended_target_family are revised to look more appealing in graphs and tables.
end_point_info <- end_point_info nrow(end_point_info) cleaned_ep <- clean_endPoint_info(end_point_info) nrow(cleaned_ep)
end_point_info <- end_point_info nrow(end_point_info) cleaned_ep <- clean_endPoint_info(end_point_info) nrow(cleaned_ep)
This function is used to load a data file for analysis in the form of a single Excel file. The Excel file should include 3 mandatory sheets named "Data", "Chemicals", and "Sites". Additionally there are 2 optional sheets: "Exclude" and "Benchmarks". This function creates a data frame for each sheet, perform basic checks on the data to assure that required columns are included for each sheet
create_toxEval(excel_file_path, ...)
create_toxEval(excel_file_path, ...)
excel_file_path |
Path to Excel file that contains at least 3 sheets: Data, Chemicals, and Sites, and could optionally contain Exclude and Benchmarks. |
... |
data frames to override data within the original x list. |
Required columns in the Data sheet include "CAS", "SiteID", "Value", and
"Sample Date". The "Value" column includes concentration measurements in
units of g/L. "Sample Date" can be either a date or date/time or an integer.
Additional columns may be included for user purposes, but will not be used in
toxEval.
Required columns in the Chemical sheet include "CAS", "Class". "CAS" values in this sheet must exactly match corresponding "CAS" values in the Data sheet. The "Class" designation allows data to be grouped in a user-specified way. For example, in a data set of multiple pesticides, it may be valuable to explore differences and similarities to of insecticides, herbicides and fungicides. Additional columns may be included for user purposes, but will not be used in toxEval.
Required columns in the Sites sheet includes "SiteID", "Short Name", and for the Shiny application "dec_lat","dec_lon". Values in the "SiteID" column in this sheet exactly match corresponding values in the "SiteID" column in the Data sheet. Additional columns may be included for user purposes, but will not be used in toxEval.
When using the optional sheet Exclude, columns required include "CAS" and "endPoint". These are used to exclude particular chemicals (via CAS), ToxCast endpoints (via endPoint), or a unique chemical/endpoint combination. Additional columns may be included for user purposes, but will not be used in toxEval.
When using the optional sheet Benchmarks, columns required include "CAS", "endPoint","ACC_value" and "chnm". This sheet is used to over-ride the functions using endpoints from the ToxCast database, allowing the user to import endpoint information from other sources. It could also be useful for reproducing results in the future (for example, if after ToxCast updates, analysis with an older version of ToxCast may be reproduced by including the selected ToxCast endpoint database in this sheet. Additional columns may be included for user purposes, but will not be used in toxEval.
For more information, see the "Prepare Data" vignette: vignette("PrepareData", package = "toxEval")
.
All remaining toxEval functions use data from via the list that is returned from this function.
The object returned from this function contains a list of between three and five data frames. The minimum data frames returned are chem_data (containing at least the columns: "CAS", "SiteID", "Value", "Sample Date"), chem_info (containing at least the columns: "CAS", "Class"), and chem_site (containing at least the columns: "SiteID", "Short Name". For the Shiny app, "dec_lat" and "dec_lon" are also required). The optional data frames are exclusions (containing at least the columns: "CAS" and "endPoint"), and benchmarks (containing at least the columns: "CAS", "endPoint", "ACC_value" and "chnm")
path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" excel_file_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(excel_file_path)
path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" excel_file_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(excel_file_path)
See vignette("Setting up toxEval package data", package = "toxEval")
for more information on how the data was aggregated.
data frame with 72 columns. The columns and definitions are discussed in the "ToxCast Assay Annotation Version 1.0 Data User Guide (PDF)" (see source). The column "Relevance Category" was included for consideration of grouping/filtering endpoints based on user goals.
doi:10.23645/epacomptox.6062479.v3
U.S. EPA. 2014. ToxCast Assay Annotation Data User Guide.
end_point_info <- end_point_info head(end_point_info[, 1:5])
end_point_info <- end_point_info head(end_point_info[, 1:5])
The endpoint_hits_DT
(data.table (DT) option) and endpoint_hits
(data frame option) functions create tables with one row per endPoint, and
one column per category("Biological", "Chemical", or "Chemical Class"). The
values in the table are the number of sites where the EAR exceeded the
user-defined EAR hit_threshold in that endpoint/category combination. If the
category "Chemical" is chosen, an "info" link is provided to the
chemical information available in the "Comptox Dashboard"
https://comptox.epa.gov/dashboard/.
endpoint_hits_DT( chemical_summary, category = "Biological", mean_logic = FALSE, sum_logic = TRUE, hit_threshold = 0.1, include_links = TRUE ) endpoint_hits( chemical_summary, category = "Biological", mean_logic = FALSE, sum_logic = TRUE, hit_threshold = 0.1 )
endpoint_hits_DT( chemical_summary, category = "Biological", mean_logic = FALSE, sum_logic = TRUE, hit_threshold = 0.1, include_links = TRUE ) endpoint_hits( chemical_summary, category = "Biological", mean_logic = FALSE, sum_logic = TRUE, hit_threshold = 0.1 )
chemical_summary |
Data frame from |
category |
Character. Either "Biological", "Chemical Class", or "Chemical". |
mean_logic |
Logical. |
sum_logic |
Logical. |
hit_threshold |
Numeric. EAR threshold defining a "hit". |
include_links |
Logical. whether or not to include a link to the ToxCast dashboard. Only needed for the "Chemical" category. |
The tables show slightly different results when choosing to explore data from a single site rather than all sites. The value displayed in this instance is the number of samples with hits rather than the number of sites with hits.
data frame with one row per endpoint that had a hit (based on the hit_threshold). The columns are based on the category.
# This is the example workflow: path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) hits_df <- endpoint_hits(chemical_summary, category = "Biological") endpoint_hits_DT(chemical_summary, category = "Biological") endpoint_hits_DT(chemical_summary, category = "Chemical Class") endpoint_hits_DT(chemical_summary, category = "Chemical")
# This is the example workflow: path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) hits_df <- endpoint_hits(chemical_summary, category = "Biological") endpoint_hits_DT(chemical_summary, category = "Biological") endpoint_hits_DT(chemical_summary, category = "Chemical Class") endpoint_hits_DT(chemical_summary, category = "Chemical")
Open an interactive app in a browser. Using this function is a quick and convenient way to explore data. For more customization, the R-code to produce each graph and table is displayed in the app. That is a good starting-point for a custom analysis.
explore_endpoints(browse = TRUE)
explore_endpoints(browse = TRUE)
browse |
Logical. Use browser for running Shiny app. |
This function provides a mechanism to specify 3 levels of information in the
supplied data frame end_point_info
to be used in subsequent analysis steps.
First, the user specifies the ToxCast assay annotation using the 'groupCol'
argument, which is a column header in 'end_point_info'. Second, the user
specifies the families of assays to exclude. Finally, the user can choose to
remove specific group(s) from the category. The default is to remove
'Background Measurement' and 'Undefined'. Choices for this should be
reconsidered based on individual study objectives.
filter_groups( ep, groupCol = "intended_target_family", remove_assays = c("BSK"), remove_groups = c("Background Measurement", "Undefined") )
filter_groups( ep, groupCol = "intended_target_family", remove_assays = c("BSK"), remove_groups = c("Background Measurement", "Undefined") )
ep |
Data frame containing Endpoint information from ToxCast |
groupCol |
Character name of ToxCast annotation column to use as a group category |
remove_assays |
Vector of assays to EXCLUDE in the data analysis. By default, the "BSK" (BioSeek) assay is removed. |
remove_groups |
Vector of groups within the selected 'groupCol' to remove. |
The default category ('groupCol') is 'intended_target_family'. Depending on the study, other categories may be more relevant. The best resource on these groupings is the "ToxCast Assay Annotation Data User Guide". It defines "intended_target_family" as "the target family of the objective target for the assay". Much more detail can be discovered in that documentation.
end_point_info <- end_point_info cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) head(filtered_ep)
end_point_info <- end_point_info cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) head(filtered_ep)
See vignette("Setting up toxEval package data", package = "toxEval")
for more information on how the data was aggregated.
data frame
head(flags)
head(flags)
The get_ACC
function retrieves the activity concentration at cutoff
(ACC) values for specified chemicals.
get_ACC(CAS)
get_ACC(CAS)
CAS |
Vector of CAS. |
The function get_ACC
will
convert the ACC values in the ToxCast database from units of (M)
to units of
g/L, and reformat the data as input to toxEval.
data frame with columns CAS, chnm, flags, endPoint, ACC, MlWt, and ACC_value
CAS <- c("121-00-6", "136-85-6", "80-05-7", "84-65-1", "5436-43-1", "126-73-8") ACC <- get_ACC(CAS) head(ACC)
CAS <- c("121-00-6", "136-85-6", "80-05-7", "84-65-1", "5436-43-1", "126-73-8") ACC <- get_ACC(CAS) head(ACC)
This function computes Exposure:Activity ratios using user-provided measured
concentration data from the output of create_toxEval
,
and joins the data with the activity concentration at cutoff data provided by
ToxCast.Data from ToxCast is included with this package, but alternative
benchmark data can be provided to perform the same "toxEval" analysis.
get_chemical_summary( tox_list, ACC = NULL, filtered_ep = "All", chem_data = NULL, chem_site = NULL, chem_info = NULL, exclusion = NULL )
get_chemical_summary( tox_list, ACC = NULL, filtered_ep = "All", chem_data = NULL, chem_site = NULL, chem_info = NULL, exclusion = NULL )
tox_list |
List with data frames for chem_data, chem_info, chem_site,
and optionally exclusions and benchmarks. Created with |
ACC |
Data frame with columns: CAS, chnm, endPoint, and ACC_value
for specific chemical/endpoint combinations generated using the
|
filtered_ep |
Data frame with columns: endPoints, groupCol. Default is |
chem_data |
Optional data frame with (at least) columns: CAS, SiteID, and Value. Default is |
chem_site |
Optional data frame with (at least) columns: SiteID, and Short Name. Default is |
chem_info |
Optional data frame with (at least) columns: CAS, and class. Default is |
exclusion |
Optional data frame with (at least) columns: CAS and endPoint. Default is |
To use the data provided by the package, a sample workflow is shown below in the examples. The examples include retrieving the ToxCast (ACC) values that are used to calculate EARs, choosing endPoints that should be ignored based on data quality "flags" in the ToxCast database, and removing groups of endPoints that may not be important to the analysis at hand.
a data frame with the columns: CAS, chnm (chemical name as a factor), site, date, EAR, Bio_category, shortName (of site), Class. The output of this function is where you find EAR values for every chemical/endpoint combination.
path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) head(chemical_summary)
path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) head(chemical_summary)
Use this function to create a chemical_summary, but instead
of using any benchmarks, the EAR column is simply
the concentration. The output of this function can be used
in any of the plotting or table functions in the same way
that the output of get_chemical_summary
.
get_concentration_summary( tox_list, chem_data = NULL, chem_site = NULL, chem_info = NULL, tox_names = FALSE )
get_concentration_summary( tox_list, chem_data = NULL, chem_site = NULL, chem_info = NULL, tox_names = FALSE )
tox_list |
List with data frames for chem_data, chem_info, and chem_site.
Created with |
chem_data |
Optional data frame with (at least) columns: CAS, SiteID, and Value. Default is |
chem_site |
Optional data frame with (at least) columns: SiteID, and Short Name. Default is |
chem_info |
Optional data frame with (at least) columns: CAS, and class. Default is |
tox_names |
Logical whether to use the provided chemical names from the ToxCast or not. If there is not a match by CAS, the function will look for a column "Chemical" in the "Chemical" tab. If that column doesn't exist, it will create a (not good!) name. |
a data frame with the columns: CAS, chnm (chemical name as a factor), site, date, EAR (which is just concentration), Bio_category, shortName (of site), Class. The output of this function is where you find EAR values for every chemical/endpoint combination.
path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) chemical_summary_conc <- get_concentration_summary(tox_list) head(chemical_summary_conc) plot_tox_boxplots(chemical_summary_conc, category = "Chemical", x_label = "Concentration [ug/L]" )
path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) chemical_summary_conc <- get_concentration_summary(tox_list) head(chemical_summary_conc) plot_tox_boxplots(chemical_summary_conc, category = "Chemical", x_label = "Concentration [ug/L]" )
A set of functions to prepare the data for boxplots. Often, these functions are used within the plotting functions. They are exported however to allow custom graphs to be created.
graph_chem_data( chemical_summary, ..., manual_remove = NULL, mean_logic = FALSE, sum_logic = TRUE ) tox_boxplot_data( chemical_summary, category = "Biological", manual_remove = NULL, mean_logic = FALSE, sum_logic = TRUE ) side_by_side_data( gd_left, gd_right, left_title = "Left", right_title = "Right" )
graph_chem_data( chemical_summary, ..., manual_remove = NULL, mean_logic = FALSE, sum_logic = TRUE ) tox_boxplot_data( chemical_summary, category = "Biological", manual_remove = NULL, mean_logic = FALSE, sum_logic = TRUE ) side_by_side_data( gd_left, gd_right, left_title = "Left", right_title = "Right" )
chemical_summary |
Data frame from |
... |
Additional group_by arguments. This can be handy for creating facet graphs. |
manual_remove |
Vector of categories to remove. |
mean_logic |
Logical. |
sum_logic |
Logical. |
category |
Character. Either "Biological", "Chemical Class", or "Chemical". |
gd_left |
Data frame that must include the columns chnm, Class, and either EAR or meanEAR. |
gd_right |
Data frame that must include the columns chnm, Class, and either EAR or meanEAR. |
left_title |
Character that will be associated with the "gd_left" data frame in a column named "guide_side". |
right_title |
Character that will be associated with the "gd_right" data frame in a column named "guide_side". |
The function side_by_side_data will combine two data frames,
either the output of get_chemical_summary
or graph_chem_data
,
into a single data frame. The important work here is that the chemicals
and classes factor levels are ordered primarily based on "gd_left", but
include "gd_right" when the contents are mismatched.
path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) # Let's say we want to compare 2 chemical summaries # We'll look at one summing EARs, and with concentrations # First, we need a chemical summary for concentrations: chemical_summary_conc <- get_concentration_summary(tox_list) gd_tox <- graph_chem_data(chemical_summary) gd_conc <- graph_chem_data(chemical_summary_conc) ch_combo <- side_by_side_data(gd_tox, gd_conc, left_title = "ToxCast", right_title = "Concentrations" ) plot_chemical_boxplots(ch_combo, guide_side, x_label = "" ) + ggplot2::facet_grid(. ~ guide_side, scales = "free_x")
path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) # Let's say we want to compare 2 chemical summaries # We'll look at one summing EARs, and with concentrations # First, we need a chemical summary for concentrations: chemical_summary_conc <- get_concentration_summary(tox_list) gd_tox <- graph_chem_data(chemical_summary) gd_conc <- graph_chem_data(chemical_summary_conc) ch_combo <- side_by_side_data(gd_tox, gd_conc, left_title = "ToxCast", right_title = "Concentrations" ) plot_chemical_boxplots(ch_combo, guide_side, x_label = "" ) + ggplot2::facet_grid(. ~ guide_side, scales = "free_x")
The hits_by_groupings_DT
(DT option) and
hits_by_groupings
(data frame option) functions create tables
with one row per category("Biological", "Chemical", or "Chemical Class").
The columns indicate the "Biological" groupings. The values in the table
signify how many sites have samples with EARs that exceeded the hit_threshold
for that particular "Biological"/category combination. If the user chooses
"Biological" as the category, it is a simple 2-column table of "Biological"
groupings and number of sites (nSites).
hits_by_groupings_DT( chemical_summary, category = "Biological", mean_logic = FALSE, sum_logic = TRUE, hit_threshold = 0.1 ) hits_by_groupings( chemical_summary, category, mean_logic = FALSE, sum_logic = TRUE, hit_threshold = 0.1 )
hits_by_groupings_DT( chemical_summary, category = "Biological", mean_logic = FALSE, sum_logic = TRUE, hit_threshold = 0.1 ) hits_by_groupings( chemical_summary, category, mean_logic = FALSE, sum_logic = TRUE, hit_threshold = 0.1 )
chemical_summary |
Data frame from |
category |
Character. Either "Biological", "Chemical Class", or "Chemical". |
mean_logic |
Logical. |
sum_logic |
Logical. |
hit_threshold |
Numeric threshold defining a "hit". |
The tables result in slightly different results for a single site, displaying the number of samples with hits rather than the number of sites.
data frame with one row per category, and one column per Biological grouping.
# This is the example workflow: path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) site_df <- hits_by_groupings(chemical_summary, category = "Biological") hits_by_groupings_DT(chemical_summary, category = "Biological") hits_by_groupings_DT(chemical_summary, category = "Chemical Class") hits_by_groupings_DT(chemical_summary, category = "Chemical")
# This is the example workflow: path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) site_df <- hits_by_groupings(chemical_summary, category = "Biological") hits_by_groupings_DT(chemical_summary, category = "Biological") hits_by_groupings_DT(chemical_summary, category = "Chemical Class") hits_by_groupings_DT(chemical_summary, category = "Chemical")
The hits_summary_DT
(DT option) and hits_summary
(data frame
option) functions create tables information on the number of
hit_threshold
exceedances per site for each individual grouping. The
table has one row per group per site that has hit_threshold
exceedances. For example, if "Biological" is the category, and a
site has EAR levels above the specified hit_threshold
for "DNA
Binding" and "Nuclear Receptors", that site will have 2 rows of data in this
table.
hits_summary_DT( chemical_summary, category = "Biological", sum_logic = TRUE, hit_threshold = 0.1 ) hits_summary(chemical_summary, category, hit_threshold = 0.1, sum_logic = TRUE)
hits_summary_DT( chemical_summary, category = "Biological", sum_logic = TRUE, hit_threshold = 0.1 ) hits_summary(chemical_summary, category, hit_threshold = 0.1, sum_logic = TRUE)
chemical_summary |
Data frame from |
category |
Character. Either "Biological", "Chemical Class", or "Chemical". |
sum_logic |
Logical. |
hit_threshold |
Numeric threshold defining a "hit". |
For each row, there are 4 columns. Site and category (as defined by the category argument) define the row. "Samples with hits" are how many samples exceeded the hit_threshold for the specified category at the specified site. "Number of Samples" indicates how many samples were collected at an individual site based on unique date.
The tables contain slightly different results for evaluation of a single site. There are three columns (the Site column is dropped), and rather than one row per site/category, there is one row per category.
data frame with with one row per unique site/category combination. The columns are site, category, Samples with Hits, and Number of Samples.
data frame with columns "Hits per Sample", "Individual Hits", "nSample", "site", and "category"
# This is the example workflow: path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) stats_group <- hits_summary(chemical_summary, "Biological") hits_summary_DT(chemical_summary, category = "Biological") hits_summary_DT(chemical_summary, category = "Chemical Class") hits_summary_DT(chemical_summary, category = "Chemical")
# This is the example workflow: path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) stats_group <- hits_summary(chemical_summary, "Biological") hits_summary_DT(chemical_summary, category = "Biological") hits_summary_DT(chemical_summary, category = "Chemical Class") hits_summary_DT(chemical_summary, category = "Chemical")
The function make_tox_map
creates a leaflet
map
of the sites. This function places symbols at the location of each site in the data
file that represent the magnitude of EAR (color) and the number of
samples in the data set (size). This is the only function that requires
"dec_lon" and "dec_lat" (decimal longitude and decimal latitude) in the
data frame specified for the chem_site argument.
make_tox_map( chemical_summary, chem_site, category = "Biological", mean_logic = FALSE, sum_logic = TRUE ) map_tox_data( chemical_summary, chem_site, category = "Biological", mean_logic = FALSE, sum_logic = TRUE )
make_tox_map( chemical_summary, chem_site, category = "Biological", mean_logic = FALSE, sum_logic = TRUE ) map_tox_data( chemical_summary, chem_site, category = "Biological", mean_logic = FALSE, sum_logic = TRUE )
chemical_summary |
Data frame from |
chem_site |
Data frame containing the columns SiteID, site_grouping, Short Name, dec_lon, and dec_lat. |
category |
Character. Either "Biological", "Chemical Class", or "Chemical". |
mean_logic |
Logical. |
sum_logic |
Logical. |
The function map_tox_data
calculates the statistics for the map. It
my be useful on it's own.
# This is the example workflow: path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) make_tox_map(chemical_summary, tox_list$chem_site, "Biological") make_tox_map(chemical_summary, tox_list$chem_site, "Chemical Class") make_tox_map(chemical_summary, tox_list$chem_site, "Chemical")
# This is the example workflow: path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) make_tox_map(chemical_summary, tox_list$chem_site, "Biological") make_tox_map(chemical_summary, tox_list$chem_site, "Chemical Class") make_tox_map(chemical_summary, tox_list$chem_site, "Chemical")
The plot_tox_boxplots
function creates a set of boxplots representing EAR
values computed with the get_chemical_summary
function, and
dependent on the choice of several input options. See "Summarizing the data"
in the Introduction vignette: vignette("Introduction", package = "toxEval")
.
for a description of how the EAR values are computed, aggregated,
and summarized. Choosing "Chemical Class" in the category argument
will generate separate boxplots for each unique class. "Chemical" will generate
boxplots for each individual chemical, and "Biological" will generate boxplots
for each group in the selected ToxCast annotation.
plot_chemical_boxplots( chemical_summary, ..., manual_remove = NULL, mean_logic = FALSE, sum_logic = TRUE, plot_ND = TRUE, font_size = NA, title = NA, x_label = NA, palette = NA, hit_threshold = NA, site_counts = "count" ) plot_tox_boxplots( chemical_summary, category = "Biological", manual_remove = NULL, mean_logic = FALSE, sum_logic = TRUE, plot_ND = TRUE, font_size = NA, title = NA, x_label = NA, palette = NA, hit_threshold = NA )
plot_chemical_boxplots( chemical_summary, ..., manual_remove = NULL, mean_logic = FALSE, sum_logic = TRUE, plot_ND = TRUE, font_size = NA, title = NA, x_label = NA, palette = NA, hit_threshold = NA, site_counts = "count" ) plot_tox_boxplots( chemical_summary, category = "Biological", manual_remove = NULL, mean_logic = FALSE, sum_logic = TRUE, plot_ND = TRUE, font_size = NA, title = NA, x_label = NA, palette = NA, hit_threshold = NA )
chemical_summary |
Data frame from |
... |
Additional group_by arguments. This can be handy for creating facet graphs. |
manual_remove |
Vector of categories to remove. |
mean_logic |
Logical. |
sum_logic |
Logical. |
plot_ND |
Logical. Whether or not to plot "Biological" groupings, "Chemical Class" groupings, or "Chemical" that do not have any detections. |
font_size |
Numeric value to adjust the axis font size. |
title |
Character title for plot. Default is NA which produces no title. |
x_label |
Character for x label. Default is NA which produces an automatic label. |
palette |
Vector of color palette for boxplot fill. Can be a named vector to specify specific colors for specific categories. |
hit_threshold |
Numeric threshold defining a "hit". |
site_counts |
This describes how to calculate the left-side counts. Default is "count" which is number of sites with detections. "frequency" would calculate a percentage of detections. This assumes all non-detects have been included as 0 for a value. "none" is the final option which can be to remove those labels. Custom labels could then be constructed after this function is run. |
category |
Character. Either "Biological", "Chemical Class", or "Chemical". |
It is also possible to display a threshold line using the hit_threshold argument. The graph will then include the number of sites with detections, the threshold line, and the number of "hits" indicating how many sites that have EAR values exceeding the hit_threshold.
The graph shows a slightly different result for a single site. For a single site graph, the number of chemicals that were detected and have associated endpoint ACCs represented are displayed.
The functions plot_tox_boxplots
and graph_chem_data
are functions that perform
the statistical calculations to create the plot. graph_chem_data
is specific
to the "Chemical" plot, and plot_tox_boxplots
is for "Biological" and
"Chemical Class".
Box plots are standard Tukey representations. See "Box plot details" in the Basic Workflow vignette:
vignette("basicWorkflow", package = "toxEval")
for more information.
# This is the example workflow: path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) plot_tox_boxplots(chemical_summary, "Biological") plot_tox_boxplots(chemical_summary, "Chemical Class") plot_tox_boxplots(chemical_summary, "Chemical") cbPalette <- c( "#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7" ) graphData <- tox_boxplot_data( chemical_summary = chemical_summary, category = "Biological" ) cbValues <- colorRampPalette(cbPalette)(length(levels(graphData$category))) names(cbValues) <- levels(graphData$category) plot_tox_boxplots(chemical_summary, hit_threshold = 0.1, category = "Biological", palette = cbValues, title = "Maximum EAR per site, grouped by biological activity groupings" ) plot_tox_boxplots(chemical_summary, category = "Chemical", x_label = "EAR" ) single_site <- dplyr::filter(chemical_summary, site == "USGS-04024000") plot_tox_boxplots(single_site, category = "Biological" ) plot_tox_boxplots(single_site, category = "Chemical", hit_threshold = 0.001 )
# This is the example workflow: path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) plot_tox_boxplots(chemical_summary, "Biological") plot_tox_boxplots(chemical_summary, "Chemical Class") plot_tox_boxplots(chemical_summary, "Chemical") cbPalette <- c( "#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7" ) graphData <- tox_boxplot_data( chemical_summary = chemical_summary, category = "Biological" ) cbValues <- colorRampPalette(cbPalette)(length(levels(graphData$category))) names(cbValues) <- levels(graphData$category) plot_tox_boxplots(chemical_summary, hit_threshold = 0.1, category = "Biological", palette = cbValues, title = "Maximum EAR per site, grouped by biological activity groupings" ) plot_tox_boxplots(chemical_summary, category = "Chemical", x_label = "EAR" ) single_site <- dplyr::filter(chemical_summary, site == "USGS-04024000") plot_tox_boxplots(single_site, category = "Biological" ) plot_tox_boxplots(single_site, category = "Chemical", hit_threshold = 0.001 )
The plot_tox_endpoints
function creates a set of boxplots representing EAR
values for each endPoint based on the selected data. A subset of data is first
chosen by specifying a group in the filterBy argument. The
filterBy argument must match one of the unique options in the category.
For example, if the category is "Chemical Class", then the filterBy argument
must be one of the defined "Chemical Class" options such as "Herbicide".
A boxplot is generated for each endPoint. The EAR values that are used to
create the boxplots are the mean or maximum (as defined by mean_logic) for each
site as described in "Summarizing the data"in the Introduction vignette:
vignette("Introduction", package = "toxEval")
.
plot_tox_endpoints( chemical_summary, category = "Biological", filterBy = "All", manual_remove = NULL, hit_threshold = NA, mean_logic = FALSE, sum_logic = TRUE, font_size = NA, title = NA, x_label = NA, palette = NA, top_num = NA )
plot_tox_endpoints( chemical_summary, category = "Biological", filterBy = "All", manual_remove = NULL, hit_threshold = NA, mean_logic = FALSE, sum_logic = TRUE, font_size = NA, title = NA, x_label = NA, palette = NA, top_num = NA )
chemical_summary |
Data frame from |
category |
Either "Biological", "Chemical Class", or "Chemical". |
filterBy |
Character. Either "All" or one of the filtered categories. |
manual_remove |
Vector of categories to remove. |
hit_threshold |
Numeric threshold defining a "hit". |
mean_logic |
Logical. |
sum_logic |
logical. |
font_size |
Numeric to adjust the axis font size. |
title |
Character title for plot. |
x_label |
Character for x label. Default is NA which produces an automatic label. |
palette |
Vector of color palette for fill. Can be a named vector to specify specific color for specific categories. |
top_num |
Integer number of endpoints to include in the graph. If NA, all endpoints will be included. |
Box plots are standard Tukey representations. See "Box plot details" in the Basic Workflow vignette:
vignette("basicWorkflow", package = "toxEval")
for more information.
# This is the example workflow: path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) plot_tox_endpoints(chemical_summary, filterBy = "Cell Cycle", top_num = 10 ) plot_tox_endpoints(chemical_summary, filterBy = "Cell Cycle", top_num = 10, x_label = "EAR" ) plot_tox_endpoints(chemical_summary, category = "Chemical Class", filterBy = "PAHs", top_num = 10, hit_threshold = 0.001 ) plot_tox_endpoints(chemical_summary, category = "Chemical", filterBy = "Atrazine") plot_tox_endpoints(chemical_summary, category = "Chemical", top_num = 10) single_site <- dplyr::filter(chemical_summary, site == "USGS-04024000") plot_tox_endpoints(single_site, category = "Chemical", top_num = 10)
# This is the example workflow: path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) plot_tox_endpoints(chemical_summary, filterBy = "Cell Cycle", top_num = 10 ) plot_tox_endpoints(chemical_summary, filterBy = "Cell Cycle", top_num = 10, x_label = "EAR" ) plot_tox_endpoints(chemical_summary, category = "Chemical Class", filterBy = "PAHs", top_num = 10, hit_threshold = 0.001 ) plot_tox_endpoints(chemical_summary, category = "Chemical", filterBy = "Atrazine") plot_tox_endpoints(chemical_summary, category = "Chemical", top_num = 10) single_site <- dplyr::filter(chemical_summary, site == "USGS-04024000") plot_tox_endpoints(single_site, category = "Chemical", top_num = 10)
The plot_tox_endpoints2
function creates a set of boxplots representing EAR
values for each endPoint based on the selected data. A subset of data is first
chosen by specifying a group in the filterBy
argument. The
filterBy
argument must match one of the unique options in the category.
For example, if the category is "Chemical Class", then the filterBy
argument
must be one of the defined "Chemical Class" options such as "Herbicide".
plot_tox_endpoints2( cs, ..., category = "Chemical", filterBy = "All", manual_remove = NULL, hit_threshold = NA, mean_logic = FALSE, sum_logic = TRUE, font_size = NA, title = NA, x_label = NA, palette = NA, top_num = NA )
plot_tox_endpoints2( cs, ..., category = "Chemical", filterBy = "All", manual_remove = NULL, hit_threshold = NA, mean_logic = FALSE, sum_logic = TRUE, font_size = NA, title = NA, x_label = NA, palette = NA, top_num = NA )
cs |
Data.frame from |
... |
Additional group_by arguments. This can be handy for creating facet graphs. |
category |
Either "Biological", "Chemical Class", or "Chemical". |
filterBy |
Character. Either "All" or one of the filtered categories. |
manual_remove |
Vector of categories to remove. |
hit_threshold |
Numeric threshold defining a "hit". |
mean_logic |
Logical. |
sum_logic |
logical. |
font_size |
Numeric to adjust the axis font size. |
title |
Character title for plot. |
x_label |
Character for x label. Default is NA which produces an automatic label. |
palette |
Vector of color palette for fill. Can be a named vector to specify specific color for specific categories. |
top_num |
Integer number of endpoints to include in the graph. If NA, all endpoints will be included. |
The difference between this function and the
plot_tox_endpoints
is that
the ... arguments allow for customized faceting. To include this in
the original toxEval function, backward compatibility would be broken.
path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) cs <- get_chemical_summary(tox_list, ACC, filtered_ep) cs$guide_side <- "A" cs2 <- cs cs2$guide_side <- "B" cs_double <- rbind(cs, cs2) plot_tox_endpoints2(cs_double, guide_side, top_num = 10 ) + ggplot2::facet_grid(. ~ guide_side, scales = "free_x")
path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) cs <- get_chemical_summary(tox_list, ACC, filtered_ep) cs$guide_side <- "A" cs2 <- cs cs2$guide_side <- "B" cs_double <- rbind(cs, cs2) plot_tox_endpoints2(cs_double, guide_side, top_num = 10 ) + ggplot2::facet_grid(. ~ guide_side, scales = "free_x")
The plot_tox_heatmap
function creates a heat (tile) map with sites on the x-axis,
a specified grouping on the y-axis (defined by the category argument), and color shading
defining the mean or maximum EAR. See "Summarizing the data" in the Introduction vignette:
vignette("Introduction", package = "toxEval")
for a description on how the
EAR values are computed, aggregated, and summarized. The y-axis grouping can be "Biological",
"Chemical Class", or "Chemical". When specifying the "Chemical" option, a secondary y-axis
is automatically included to group chemicals into chemical class. The function computes
default breaks for the color scale to match the spread of the data, but breaks can also
be customized with the breaks argument.
This is a function where it may be ideal to create a custom order to the sites
(for example, west-to-east). See the above section "Custom configuration"
vignette("Introduction", package = "toxEval")
for instructions on how to convert
the character vector sites to a factor with ordered levels.
plot_tox_heatmap( chemical_summary, chem_site, category = "Biological", breaks = c(1e-05, 1e-04, 0.001, 0.01, 0.1, 1, 10), manual_remove = NULL, mean_logic = FALSE, sum_logic = TRUE, plot_ND = TRUE, font_size = NA, title = NA, legend_lab = NA )
plot_tox_heatmap( chemical_summary, chem_site, category = "Biological", breaks = c(1e-05, 1e-04, 0.001, 0.01, 0.1, 1, 10), manual_remove = NULL, mean_logic = FALSE, sum_logic = TRUE, plot_ND = TRUE, font_size = NA, title = NA, legend_lab = NA )
chemical_summary |
Data frame from |
chem_site |
Data frame with columns SiteID, site_grouping, and Short Name. |
category |
Either "Biological", "Chemical Class", or "Chemical". |
breaks |
Numerical vector to define data bins and legend breaks. |
manual_remove |
Vector of categories to remove. |
mean_logic |
Logical. |
sum_logic |
Logical. |
plot_ND |
Logical. Whether or not to plot "Biological" groupings, "Chemical Class" groupings, or "Chemical" that do not have any detections. |
font_size |
Numeric value to adjust the axis font size. |
title |
Character title for plot. |
legend_lab |
Character label for legend. Default is NA which produces an automatic label. |
If there are site/parameters (chemical/chemical class/biological grouping) combinations that don't have data, those areas are represented by an "X". If there are 0 values, they are considered "non-detects", and represented with a distinct color.
path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) # Order the site_groupings: tox_list$chem_site$site_grouping <- factor(tox_list$chem_site$site_grouping, levels = c( "Lake Superior", "Lake Michigan", "Lake Huron", "Lake Erie", "Lake Ontario" ) ) # Order sites: sitesOrdered <- c( "StLouis", "Nemadji", "WhiteWI", "Bad", "Montreal", "PresqueIsle", "Ontonagon", "Sturgeon", "Tahquamenon", "Burns", "IndianaHC", "StJoseph", "PawPaw", "Kalamazoo", "GrandMI", "Milwaukee", "Muskegon", "WhiteMI", "PereMarquette", "Manitowoc", "Manistee", "Fox", "Oconto", "Peshtigo", "Menominee", "Indian", "Cheboygan", "Ford", "Escanaba", "Manistique", "ThunderBay", "AuSable", "Rifle", "Saginaw", "BlackMI", "Clinton", "Rouge", "HuronMI", "Raisin", "Maumee", "Portage", "Sandusky", "HuronOH", "Vermilion", "BlackOH", "Rocky", "Cuyahoga", "GrandOH", "Cattaraugus", "Tonawanda", "Genesee", "Oswego", "BlackNY", "Oswegatchie", "Grass", "Raquette", "StRegis" ) tox_list$chem_site$`Short Name` <- factor(tox_list$chem_site$`Short Name`, levels = sitesOrdered ) plot_tox_heatmap(chemical_summary, tox_list$chem_site, category = "Chemical Class") plot_tox_heatmap(chemical_summary, tox_list$chem_site, category = "Chemical", legend_lab = "EAR" ) single_site <- dplyr::filter(chemical_summary, site == "USGS-04024000") plot_tox_heatmap( chemical_summary = single_site, chem_site = dplyr::filter(tox_list$chem_site, SiteID == "USGS-04024000"), category = "Chemical Class" ) plot_tox_heatmap( chemical_summary = single_site, chem_site = dplyr::filter(tox_list$chem_site, SiteID == "USGS-04024000"), category = "Chemical" )
path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) # Order the site_groupings: tox_list$chem_site$site_grouping <- factor(tox_list$chem_site$site_grouping, levels = c( "Lake Superior", "Lake Michigan", "Lake Huron", "Lake Erie", "Lake Ontario" ) ) # Order sites: sitesOrdered <- c( "StLouis", "Nemadji", "WhiteWI", "Bad", "Montreal", "PresqueIsle", "Ontonagon", "Sturgeon", "Tahquamenon", "Burns", "IndianaHC", "StJoseph", "PawPaw", "Kalamazoo", "GrandMI", "Milwaukee", "Muskegon", "WhiteMI", "PereMarquette", "Manitowoc", "Manistee", "Fox", "Oconto", "Peshtigo", "Menominee", "Indian", "Cheboygan", "Ford", "Escanaba", "Manistique", "ThunderBay", "AuSable", "Rifle", "Saginaw", "BlackMI", "Clinton", "Rouge", "HuronMI", "Raisin", "Maumee", "Portage", "Sandusky", "HuronOH", "Vermilion", "BlackOH", "Rocky", "Cuyahoga", "GrandOH", "Cattaraugus", "Tonawanda", "Genesee", "Oswego", "BlackNY", "Oswegatchie", "Grass", "Raquette", "StRegis" ) tox_list$chem_site$`Short Name` <- factor(tox_list$chem_site$`Short Name`, levels = sitesOrdered ) plot_tox_heatmap(chemical_summary, tox_list$chem_site, category = "Chemical Class") plot_tox_heatmap(chemical_summary, tox_list$chem_site, category = "Chemical", legend_lab = "EAR" ) single_site <- dplyr::filter(chemical_summary, site == "USGS-04024000") plot_tox_heatmap( chemical_summary = single_site, chem_site = dplyr::filter(tox_list$chem_site, SiteID == "USGS-04024000"), category = "Chemical Class" ) plot_tox_heatmap( chemical_summary = single_site, chem_site = dplyr::filter(tox_list$chem_site, SiteID == "USGS-04024000"), category = "Chemical" )
The plot_tox_stacks
function creates a set of boxplots representing EAR
values computed with the get_chemical_summary
function, and
dependent on the choice of several input options. See "Summarizing the data"
in the Introduction vignette: vignette("Introduction", package = "toxEval")
for a description on how the EAR values are computed, aggregated, and summarized.
Choosing "Chemical Class" in the category argument will generate separate stacked
bars for each unique class. "Chemical" will generate stacked bars for each individual
chemical, and "Biological" will generate stacked bars for each group in the selected
ToxCast annotation. The legend can optionally be turned on or off using the
include_legend argument. It may be impractical for instance to show the
legend for "Chemical" if there are hundreds of chemicals.
plot_tox_stacks( chemical_summary, chem_site, category = "Biological", mean_logic = FALSE, sum_logic = TRUE, manual_remove = NULL, include_legend = TRUE, font_size = NA, title = NA, y_label = NA, top_num = NA )
plot_tox_stacks( chemical_summary, chem_site, category = "Biological", mean_logic = FALSE, sum_logic = TRUE, manual_remove = NULL, include_legend = TRUE, font_size = NA, title = NA, y_label = NA, top_num = NA )
chemical_summary |
Data frame from |
chem_site |
Data frame with at least columns SiteID, site_grouping, and Short Name. |
category |
Character. Either "Biological", "Chemical Class", or "Chemical". |
mean_logic |
Logical. |
sum_logic |
Logical. |
manual_remove |
Vector of categories to remove. |
include_legend |
Logical. Used to include legend or not. |
font_size |
Numeric value to adjust the axis font size. |
title |
Character title for plot. |
y_label |
Character for x label. Default is NA which produces an automatic label. |
top_num |
Integer number to include in the graph. If NA, all data will be included. |
The graph displays a slightly different result for a single site. Providing data with only one site displays each individual sample as a stacked bar rather than the mean or maximum for a site.
This is a function where it may be ideal to create a custom order to the sites
(for example, west-to-east). See the above section "Custom configuration"
vignette("Introduction", package = "toxEval")
for instructions on how to convert
the character vector sites to a factor with ordered levels.
# This is the example workflow: path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) plot_tox_stacks(chemical_summary, tox_list$chem_site, "Biological", top_num = 5) plot_tox_stacks(chemical_summary, tox_list$chem_site, "Chemical Class") plot_tox_stacks(chemical_summary, tox_list$chem_site, "Chemical", include_legend = FALSE) plot_tox_stacks(chemical_summary, tox_list$chem_site, "Chemical", top_num = 5, y_label = "EAR") single_site <- dplyr::filter(chemical_summary, site == "USGS-04024000") plot_tox_stacks(single_site, tox_list$chem_site, "Chemical", top_num = 5) plot_tox_stacks(single_site, chem_site = tox_list$chem_site, category = "Chemical", top_num = 5, y_label = "EAR" )
# This is the example workflow: path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) plot_tox_stacks(chemical_summary, tox_list$chem_site, "Biological", top_num = 5) plot_tox_stacks(chemical_summary, tox_list$chem_site, "Chemical Class") plot_tox_stacks(chemical_summary, tox_list$chem_site, "Chemical", include_legend = FALSE) plot_tox_stacks(chemical_summary, tox_list$chem_site, "Chemical", top_num = 5, y_label = "EAR") single_site <- dplyr::filter(chemical_summary, site == "USGS-04024000") plot_tox_stacks(single_site, tox_list$chem_site, "Chemical", top_num = 5) plot_tox_stacks(single_site, chem_site = tox_list$chem_site, category = "Chemical", top_num = 5, y_label = "EAR" )
The rank_sites_DT
(DT option) and rank_sites
(data frame option) functions
create tables with one row per site. Columns represent the maximum or mean EAR
(depending on the mean_logic argument) for each category ("Chemical Class",
"Chemical", or "Biological") and the frequency of the maximum or mean EAR
exceeding a user specified hit_threshold.
rank_sites_DT( chemical_summary, category = "Biological", mean_logic = FALSE, sum_logic = TRUE, hit_threshold = 0.1 ) rank_sites( chemical_summary, category, hit_threshold = 0.1, mean_logic = FALSE, sum_logic = TRUE )
rank_sites_DT( chemical_summary, category = "Biological", mean_logic = FALSE, sum_logic = TRUE, hit_threshold = 0.1 ) rank_sites( chemical_summary, category, hit_threshold = 0.1, mean_logic = FALSE, sum_logic = TRUE )
chemical_summary |
Data frame from |
category |
Character. Either "Biological", "Chemical Class", or "Chemical". |
mean_logic |
Logical. |
sum_logic |
Logical. |
hit_threshold |
Numeric threshold defining a "hit". |
The tables show slightly different results for a single site. Rather than multiple columns for categories, there is now 1 row per category (since the site is known).
data frame with one row per site, and the mas or mean EAR and frequency of hits based on the category.
# This is the example workflow: path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) stats_df <- rank_sites(chemical_summary, "Biological") rank_sites_DT(chemical_summary, category = "Biological") rank_sites_DT(chemical_summary, category = "Chemical Class") rank_sites_DT(chemical_summary, category = "Chemical")
# This is the example workflow: path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" full_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(full_path) ACC <- get_ACC(tox_list$chem_info$CAS) ACC <- remove_flags(ACC) cleaned_ep <- clean_endPoint_info(end_point_info) filtered_ep <- filter_groups(cleaned_ep) chemical_summary <- get_chemical_summary(tox_list, ACC, filtered_ep) stats_df <- rank_sites(chemical_summary, "Biological") rank_sites_DT(chemical_summary, category = "Biological") rank_sites_DT(chemical_summary, category = "Chemical Class") rank_sites_DT(chemical_summary, category = "Chemical")
Through the ToxCast program quality assurance procedures, information
is examined and at times, it is necessary to assign a data quality flag
to a specific chemical/assay result. A toxEval user may want to include
or exclude assay results with certain flags depending on the objectives
of a given study. Assay results with specific data quality flags assigned
to them can be removed based on their designated flag with the
remove_flags
function. The flags included in ToxCast, and the associated
flagsShort value (used in the remove_flags function) are as follows:
flag_id | Full Name |
5* | Model directionality questionable |
6* | Only highest conc above baseline, active |
7 | Only one conc above baseline, active |
8 | Multiple points above baseline, inactive |
9 | Bmd > ac50, indication of high baseline variability |
10 | Noisy data |
11* | Borderline |
15* | Gain AC50 < lowest conc & loss AC50 < mean conc |
17 | Less than 50% efficacy |
18* | AC50 less than lowest concentration tested |
13 | Average number of replicates per conc is less than 2 |
14 | Number of concentrations tested is less than 4 |
19 | Cell viability assay fit with gnls winning model |
Asterisks indicate flags removed in the function as default.
remove_flags(ACC, flag_id = c(5, 6, 11, 15, 18))
remove_flags(ACC, flag_id = c(5, 6, 11, 15, 18))
ACC |
data frame with columns: casn, chnm, endPoint, and ACC_value |
flag_id |
vector of flags to to trigger REMOVAL |
CAS <- c("121-00-6", "136-85-6", "80-05-7", "84-65-1", "5436-43-1", "126-73-8") ACC <- get_ACC(CAS) nrow(ACC) # See available flags and associated ids: flags ACC <- remove_flags(ACC) nrow(ACC)
CAS <- c("121-00-6", "136-85-6", "80-05-7", "84-65-1", "5436-43-1", "126-73-8") ACC <- get_ACC(CAS) nrow(ACC) # See available flags and associated ids: flags ACC <- remove_flags(ACC) nrow(ACC)
A "tox_list" object is created from create_toxEval
. It
is a list of 5 data frames: chem_data, chem_info,
chem_site, exclusions, and benchmarks. This function returns
a message with how many chemicals have ToxCast information,
and returns a vector of which chemicals do not have ToxCast information.
## S3 method for class 'toxEval' summary(object, ...)
## S3 method for class 'toxEval' summary(object, ...)
object |
toxEval object with "chem_info" data frame included. |
... |
additional parameters |
path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" excel_file_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(excel_file_path) summary(tox_list)
path_to_tox <- system.file("extdata", package = "toxEval") file_name <- "OWC_data_fromSup.xlsx" excel_file_path <- file.path(path_to_tox, file_name) tox_list <- create_toxEval(excel_file_path) summary(tox_list)
See vignette("Setting up toxEval package data", package = "toxEval")
for more information on how the data was aggregated.
data frame
head(tox_chemicals)
head(tox_chemicals)
vignette("Setting up toxEval package data", package = "toxEval")
for more information on how the data was aggregated.Downloaded on September 2023 from ToxCast. See also https://www.frontiersin.org/articles/10.3389/ftox.2023.1275980/full.
data frame with columns CAS, chnm (chemical name), flags, endPoint, and ACC (value).
https://www.epa.gov/comptox-tools/exploring-toxcast-data
U.S. EPA. 2023. ToxCast & Tox21 Summary Files. Retrieved from https://www.epa.gov/chemical-research/toxicity-forecaster-toxcasttm-data on September 2023.
head(ToxCast_ACC)
head(ToxCast_ACC)