
US-EPA Comptox Chemicals Dashboard to support mass spectrometry - PowerPoint PPT Presentation



  1. US-EPA CompTox Chemicals Dashboard to support mass spectrometry targeted and non-targeted analysis • Antony Williams (1), Alex Chao (2), Tom Transue (3), Tommy Cathey (3), Elin Ulrich (4) and Jon Sobus (4) • ORCID: https://orcid.org/0000-0002-2668-4821 • 1) National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC • 2) Oak Ridge Institute of Science and Education (ORISE) Research Participant, RTP, NC • 3) General Dynamics Information Technology, RTP, NC • 4) National Exposure Research Laboratory, U.S. Environmental Protection Agency, RTP, NC • The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA • August 2019 ACS Fall Meeting, San Diego

  2. CompTox Chemicals Dashboard https://comptox.epa.gov/dashboard 875k Chemical Substances

  3. Detailed Chemical Pages

  4. Sources of Exposure to Chemicals

  5. Physicochemical properties and environmental fate and transport

  6. CompTox Chemicals Dashboard • Provides access to toxicity, environmental fate and transport, and metabolism data • Individual chemicals can map to degradation products and metabolites • Advanced searches support mass and formula searches

  7. Link farm to public resources

  8. MassBank of North America https://mona.fiehnlab.ucdavis.edu

  9. Toxicity Estimation Software Tool (TEST) Real Time Predictions

  10. Mass & Formula Searching

  11. Advanced Searches Mass Search

  12. Advanced Searches Mass Search

  13. MS-Ready Structures for Formula Search

  14. “MS-Ready Structures” https://doi.org/10.1186/s13321-018-0299-2

  15. (No text on this slide)

  16. MS-Ready Mappings

  17. MS-Ready Mappings Set

  18. MS-Ready Mappings • Exact formula search for C10H16N2O8: 3 hits

  19. MS-Ready Mappings • Same input formula: C10H16N2O8 • MS-Ready formula search: 125 chemicals

  20. MS-Ready Mappings • 125 chemicals returned in total – 8 of the 125 are single-component chemicals – 3 of the 8 are isotope-labeled – 3 are neutral compounds and 2 are charged

  21. Candidate ranking

  22. Data Source Ranking of “known unknowns” • Example query: C14H22N2O3, 266.16304 • The mass and/or formula belongs to a chemical that is unknown to the analyst but contained within a chemical reference database • The most likely candidate chemicals have the most associated data sources, the most associated literature articles, or both • Diagram labels: Chemical Reference Database; Sorted candidate structures
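
The formula and mass quoted on this slide can be cross-checked with a short calculation. The sketch below is not part of the presentation; it assumes the quoted value is the neutral monoisotopic mass, and the function name `monoisotopic_mass` and the four-element table are illustrative choices that only cover simple C/H/N/O formulae.

```python
import re

# Monoisotopic masses (in u) of the most abundant isotope of each element.
# Only the elements needed for this example are listed.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.0030740044,
    "O": 15.9949146196,
}


def monoisotopic_mass(formula: str) -> float:
    """Sum isotope masses for a simple formula such as 'C14H22N2O3'.

    Hypothetical helper: no brackets, charges, or isotope labels are handled.
    """
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


mass = monoisotopic_mass("C14H22N2O3")
print(f"{mass:.5f}")  # prints 266.16304, matching the value on the slide
```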

  23. Is a bigger database better? • ChemSpider contained 26 million chemicals at the time • Much BIGGER today • Is bigger better?

  24. Using Metadata for Ranking • Use available metadata to rank candidates – Associated data sources • Associated lists in the underlying database • Associated data sources in PubChem • Specific types (e.g. water, surfactants, pesticides etc.) – Number of associated literature articles (PubMed) – Chemicals in the environment – the number of products/categories containing the chemical is a very important source of data
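
The ranking idea on this slide can be illustrated with a small sort. This is a minimal sketch, not the Dashboard's scoring method: the slides do not say how the metadata streams are combined, so ordering by data-source count first and PubMed article count second is an assumption, and the candidate names and counts are invented placeholders.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate structure sharing the query mass or formula (hypothetical)."""
    name: str
    data_sources: int      # number of associated data sources
    pubmed_articles: int   # number of associated literature articles


def rank_candidates(candidates):
    # Assumed rule: more data sources rank higher; ties break on article count.
    return sorted(
        candidates,
        key=lambda c: (c.data_sources, c.pubmed_articles),
        reverse=True,
    )


# Invented counts, for illustration only.
candidates = [
    Candidate("candidate A", data_sources=3, pubmed_articles=0),
    Candidate("candidate B", data_sources=41, pubmed_articles=1200),
    Candidate("candidate C", data_sources=12, pubmed_articles=85),
]

ranked = rank_candidates(candidates)
for position, c in enumerate(ranked, start=1):
    print(position, c.name, c.data_sources, c.pubmed_articles)
```

A weighted combination of the streams would be an equally plausible choice; the tuple sort is used here only because it is easy to read.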

  25. Identification ranks for 1783 chemicals using multiple data streams (DS: Data Sources; PC: PubChem; PM: PubMed; STOFF: DB; KEMI: DB)

  26. Comparing Search Performance • Dashboard content was 720k chemicals • Only 3% of ChemSpider size • What was the comparison in performance?

  27. SAME dataset for comparison

  28. How did performance compare? For the same 162 chemicals, Dashboard outperforms ChemSpider

  29. How did performance compare?

  30. Will the correct Microcystin LR Stand Up? ChemSpider Skeleton Search

  31. Comparing ChemSpider Structures

  32. Comparing ChemSpider Structures

  33. Other Searches

  34. Batch Searching • Singleton searches are useful but we work with thousands of masses and formulae! • Typical questions – What is the list of chemicals for the formula CxHyOz? – What is the list of chemicals for a mass +/- error? – Can I get chemical lists in Excel files? In SDF files? – Can I include properties in the download file?
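
The "mass +/- error" question above can be sketched locally. The example below does not call the Dashboard; it assumes a small in-memory reference table (`REFERENCE`, with placeholder names), a tolerance expressed in ppm, and CSV as a stand-in for the Excel download mentioned on the slide.

```python
import csv
import io

# Hypothetical reference table: (name, formula, monoisotopic mass in u).
REFERENCE = [
    ("compound 1", "C14H22N2O3", 266.16304),
    ("compound 2", "C10H16N2O8", 292.09067),
    ("compound 3", "C10H16N2O8", 292.09067),
]


def within_ppm(observed: float, reference: float, ppm: float) -> bool:
    """True if the observed mass lies within +/- ppm of the reference mass."""
    return abs(observed - reference) <= reference * ppm / 1e6


def batch_mass_search(masses, ppm=5.0):
    """Return one row per (query mass, matching reference chemical) pair."""
    rows = []
    for observed in masses:
        for name, formula, ref_mass in REFERENCE:
            if within_ppm(observed, ref_mass, ppm):
                rows.append({"query_mass": observed, "name": name,
                             "formula": formula, "mass": ref_mass})
    return rows


def to_csv(rows) -> str:
    """Serialize hits as CSV text, which spreadsheet software can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["query_mass", "name", "formula", "mass"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


hits = batch_mass_search([266.1635, 292.0905, 300.0], ppm=5.0)
print(to_csv(hits))
```

Extra property columns could be added to each row in the same way; an SDF export would additionally need structure records, which this sketch does not model.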

  35. Batch Searching Formula/Mass

  36. Searching batches using MS-Ready Formula (or mass) searching

  37. Related Searches to Support Mass Spectrometry

  38. Find me “related structures” Formula-Based Search

  39. Select Chemicals of Interest

  40. Find me “related structures” Based on Structure Similarity

  41. Find me “related structures” Based on Structure Similarity

  42. Find me “related structures” Structure Similarity – sort on mass

  43. Chemical lists

  44. Chemical Lists

  45. EPAHFR: Hydraulic Fracturing

  46. List of Opioids – Presence in Lists?

  47. Batch Search Names Excel Download

  48. Batch Search in specific lists

  49. API services and Open Data • Available API and web services • Open Data available for download

  50. Web Services https://actorws.epa.gov/actorws/ • Dozens of web services to provide access to data • Data in UI, JSON and XML format

  51. Example: InChIKey to DTXCIDs https://actorws.epa.gov/actorws/dsstox/v02/msready?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N
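
The example URL on this slide can be assembled programmatically for a list of InChIKeys. The sketch below only builds the request URL from the base path and `identifier` parameter shown on the slide; it sends no request, and the helper name `msready_url` is hypothetical. Whether the service is still available, and what its response looks like, should be checked before relying on it.

```python
from urllib.parse import urlencode

# Base path copied from the slide; current availability is not verified here.
BASE_URL = "https://actorws.epa.gov/actorws/dsstox/v02/msready"


def msready_url(inchikey: str) -> str:
    """Build the MS-Ready lookup URL for one InChIKey (hypothetical helper)."""
    return f"{BASE_URL}?{urlencode({'identifier': inchikey})}"


url = msready_url("UVOFGKIRTCCNKG-UHFFFAOYSA-N")
print(url)
```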

  52. MassBank mapping to Dashboard

  53. Benefits of Open Data

  54. NORMAN Suspect List Exchange https://www.norman-network.com/?q=node/236

  55. Integration to MetFrag in place https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0299-2

  56. In Progress

  57. Work in Progress • Predicted Spectra for candidate ranking – Viewing and Downloading pre-predicted spectra – Search spectra against the database

  58. Predicted Mass Spectra http://cfmid.wishartlab.com/ • MS/MS spectra prediction for ESI+, ESI-, and EI • Predictions generated and stored for >800,000 structures, to be accessible via Dashboard

  59. Search Expt. vs. Predicted Spectra

  60. CFM-ID Predicted Library Available • Predictions generated and stored for >700,000 structures • Python code to score experimental vs predicted spectra • Cosine dot product match score calculation • “Nontargeted screening of wastewater for water reuse using mass spectrometry”, Current Advances in Water Analysis, August 26, 2019
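
A cosine dot product match score between an experimental and a predicted spectrum can be sketched as follows. This is not the authors' Python code: the fixed-width m/z binning, the 0.01 bin width, and the example peak lists are all assumptions made for illustration, and no intensity or mass weighting is applied.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks that fall in the same m/z bin."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(spectrum_a, spectrum_b, bin_width=0.01):
    """Cosine similarity of two binned spectra, ranging from 0.0 to 1.0."""
    a = bin_spectrum(spectrum_a, bin_width)
    b = bin_spectrum(spectrum_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum cannot match anything
    return dot / (norm_a * norm_b)


# Invented peak lists, for illustration only.
experimental = [(91.05, 100.0), (119.05, 40.0), (266.16, 10.0)]
predicted = [(91.05, 80.0), (119.05, 50.0), (150.00, 20.0)]

score = cosine_score(experimental, predicted)
print(f"{score:.3f}")
```

Scoring each candidate's predicted spectrum against the experimental spectrum and sorting by score is one way such a value could feed candidate ranking.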

  61. Prototype Development Structure/substructure search

  62. Conclusion • Dashboard access to data for ~875,000 chemicals • MS-Ready data facilitates structure identification • Related metadata facilitates candidate ranking • Relationship mappings and chemical lists of great utility • Dashboard and contents are one part of the solution • New API and Web Services are in development

  63. Acknowledgements • NCCT IT development team • Tommy Cathey, ACTOR Web Services • Nancy Baker, Abstract Sifter • Todd Martin & Valery Tkachenko, WebTEST • Kathie Dionisio & Kristin Isaacs, CPDat • Thanks to Emma Schymanski, University of Luxembourg, for coordinating all efforts with the NORMAN Network for curation of lists on the Suspect Exchange

  64. Contact • Antony Williams, US EPA Office of Research and Development, National Center for Computational Toxicology • Email: Williams.Antony@epa.gov • ORCID: https://orcid.org/0000-0002-2668-4821
