---
title: "Introduction to toxEval"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
  rmarkdown::html_vignette:
    toc: true
    number_sections: false
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Introduction to toxEval}
  \usepackage[utf8]{inputenc}
---

The `toxEval` R-package includes a set of functions to analyze, visualize, and organize measured concentration data as it relates to [ToxCast data](https://www.epa.gov/comptox-tools/toxicity-forecasting-toxcast) (default) or other user-selected chemical-biological interaction benchmark data such as water quality criteria. The intent of these analyses is to develop a better understanding of the potential biological relevance of environmental chemistry data. Results can be used to prioritize which chemicals at which sites may be of greatest concern. These methods are intended as a screening technique to predict the potential for biological influence from chemicals; such predictions ultimately need to be validated with direct biological assays.

Full documentation of this R package including a tutorial with examples is available here:

The functions within this package allow great flexibility for exploring the potential biological effects of measured chemicals. Also included in the package is a browser-based application made from the `Shiny` R-package (the app). The app is based on functions within the R-package and includes many convenient analysis and visualization options for users to choose from. Calling the R-package functions directly offers flexibility beyond what the app provides and lets the user interact more directly with the data. The overview in this document focuses on the R-package. Documentation for the app is provided [here](https://doi-usgs.github.io/toxEval/articles/shinyApp.html).
This vignette provides a general overview of the concepts within `toxEval`, definitions of common terminology used throughout the package, and links to information to help understand fundamentals of the ToxCast database used within `toxEval`.

## What is ToxCast?

The U.S. EPA's Toxicity Forecaster (ToxCast) includes a database of chemical-biological interactions that contains information from hundreds of assays on thousands of chemicals, providing a means to assess the biological relevance of measured concentrations. The `toxEval` package attempts to simplify the workflow for exploring data as it relates to these assay endPoints (benchmark data). The workflow uses ToxCast as a default for evaluation of chemical:biological interactions, but the user may also define alternative benchmarks for a custom or more traditional approach to biological relevance evaluation. This capability is also useful for efficiently comparing ToxCast evaluation results with those from other toxicity benchmark databases.

When using the ToxCast endPoints for analysis, it is important to have at least a minimal understanding of what ToxCast data is, and which ToxCast data is relevant to any given study. There are many useful resources here. There is also a tool called the Comptox Dashboard that has a wealth of information on ToxCast data.

So what are we doing with the user input data and ToxCast? First, we calculate an Exposure-Activity Ratio (EAR) for each measurement. Then we can explore the EARs based on a wide variety of groupings to view the data in many dimensions.

## Exposure-Activity Ratio

An Exposure-Activity Ratio (EAR) is defined as the ratio of a measured concentration to a concentration that was determined to cause some activity in a specified ToxCast assay (the "endPoint" concentration). An EAR > 1.0 indicates that the measured concentration is greater than the endPoint concentration.
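
The EAR definition can be sketched in a few lines of base R. This is a minimal illustration, not the package's internal code: `ear_sketch` and its arguments are hypothetical names, the input values are invented, and it assumes the measured concentration is in $\mu$g/L and the ACC is given as log10($\mu$M), so a molecular weight (g/mol) is needed to put both on the same scale.

```{r eval=FALSE}
# Hypothetical helper (not a toxEval function): EAR from a measured
# concentration in ug/L, an ACC in log10(uM), and a molecular weight in g/mol.
ear_sketch <- function(conc_ug_L, acc_log_uM, mol_weight) {
  # Back-transform the log10 value to uM, then convert to ug/L:
  # (umol/L) * (g/mol) = ug/L
  acc_ug_L <- 10^acc_log_uM * mol_weight
  conc_ug_L / acc_ug_L
}

# Illustrative values only, not real ToxCast data
ear_sketch(conc_ug_L = 0.5, acc_log_uM = 0, mol_weight = 200)
```

With these made-up inputs the ACC converts to 200 $\mu$g/L, giving an EAR of 0.0025, well below 1.
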
The ToxCast database (as provided in the current version of `toxEval`) provides as many as several hundred endPoints for more than 7000 chemicals. Each endPoint is a single test that was done to detect some form of biological activity.

In order to get appropriate EAR results, it is important to use the correct units. The `toxEval` package assumes all measured concentrations are reported in micrograms per liter ($\mu$g/L). ToxCast data is reported in log($\mu$M), so the `toxEval` package automatically performs the unit conversion.

A secondary option within `toxEval` is for the user to provide a set of "benchmark concentrations" to define custom biological responses to meet specific study objectives (e.g. water quality criteria). In this case, EAR values are replaced with toxicity quotients. Similar to EAR values, toxicity quotients are defined as the ratio of a measured concentration to the benchmark concentration.

EAR is closely related to the bioanalytical equivalent concentration (BEQ). EAR is a ratio, and BEQ is a measure of concentration, but both convey the same information.

## What is an "endPoint"?

ToxCast uses high-throughput assays to create concentration-response curves for each chemical:endPoint combination. An endPoint is "associated with the perturbation of specific biological processes identified for the confirmation or monitoring of predicted site-specific hazards" (Blackwell, 2017). That means a specific biological action was tested, and the concentration at which activity was observed was determined.

Of the several endPoint values provided within the ToxCast database, the activity concentration at cutoff (ACC) was chosen to compute EAR values within the `toxEval` package, consistent with the description in Blackwell, 2017. ACC values from the ToxCast database are provided within the `toxEval` package.

## What is a "group"?

Often, it is valuable to consider aggregations of single endPoints in evaluation efforts.
ToxCast has provided tables that group individual endPoints into generalized categories for functional use. The grouping summary table is included in `toxEval` and can be explored via the `end_point_info` data:

```{r end_point_info, eval=FALSE}
library(toxEval)
end_point_info <- end_point_info
```

See the help file `?end_point_info` for specifics on how the table was downloaded.

Throughout the `toxEval` analysis, there are graphing and table functions that will summarize EARs based on either "Biological" groupings (as defined by a group of endPoints) or "Chemical Class" groupings (as defined by a group of chemicals).

The default grouping of ToxCast endPoints is "intended_target_family", but depending on the analysis, it may be more appropriate to use other grouping categories. To change the default, specify a grouping in the `groupCol` argument of the `filter_groups` function. For example:

```{r eval=FALSE}
filtered_ep <- filter_groups(end_point_info,
  # Column of end_point_info used to define the "Biological" groups
  groupCol = "intended_target_family",
  # Assay sources to keep
  assays = c(
    "ATG", "NVS", "OT", "TOX21", "CEETOX", "APR",
    "CLD", "TANGUAY", "NHEERL_PADILLA", "NCCT_SIMMONS", "ACEA"
  ),
  # Groups to drop from the chosen grouping column
  remove_groups = c("Background Measurement", "Undefined")
)
```

Here, the supplied data frame `end_point_info` is filtered using the "intended_target_family" grouping. Additionally, the BioSeek assay is removed (it is the only possible assay not included in the `assays` list). Finally, of all the groups within intended_target_family, the endPoints designated "Undefined" and "Background Measurement" are removed.

## Summarizing the data {#summarize_data}

The functions in `toxEval` summarize the data as follows:

First, individual EAR values are calculated for each chemical:endPoint combination. Then, the EAR values are summed by sample (a sample is defined as a unique site/date) based on the grouping picked in the "category" argument. Categories include "Biological", "Chemical Class", or "Chemical".
"Biological" refers to the chosen ToxCast annotation as defined in the `groupCol` argument of the `filter_groups` function. "Chemical Class" refers to the groupings of chemicals as defined in the "Class" column of the "Chemicals" sheet of the input file. "Chemical" refers to the individual chemical as defined by a unique CAS value. Finally, the maximum or mean EAR is calculated per site (based on the `mean_logic` option). This ensures that each site is represented equally regardless of how many samples are available per site.

Some functions also include a calculation for a "hit". The user defines a threshold, and if the mean or maximum EAR (calculated as described above) is greater than the threshold, that is considered a "hit".

## Reporting bugs

If you discover what you believe is a bug in the package, or have a question about functionality, please report it on the Issues page:

## Citing toxEval

```{r, eval=TRUE}
citation(package = "toxEval")
```