
Rethinking Streaming System Construction for Next-Generation Collaborative Science
Matthew Wolf, Patrick Widener, Greg Eisenhauer, and a cast of many more


  1. Rethinking Streaming System Construction for Next-Generation Collaborative Science
     Matthew Wolf, Patrick Widener, Greg Eisenhauer, and a cast of many more

  2. Streaming to Support New Science: Big Data's Other 4 V's
     • Historically, a great deal of emphasis has been placed on batch processing of data-at-rest.
     • However, this focus has meant that scientists trying to do interactive or collaborative work have had to work with mismatched tools.
     • In particular, the steering and command-and-control functions in many scenarios get short shrift.
       - Collaboration is more than sharing repositories
       - Discovery, multi-disciplinary viewpoints on data, verification & gatekeeping on data

  3. Streaming at Exascale: The Rise of In Situ
     [Figure: in situ workflow diagram. Legend: Workstation, Data Movement, Orchestrator, Monitoring and Control Messages. Labeled components: Global Orchestrator, Orchestrators, Workstations, Simulation, Analysis, Storage.]
     Codes: GTS, GTC-P, LAMMPS, PIConGPU, Pixie3D, S3D, Einstein Toolkit, …
     Thanks: Jai Dayal, Scott Klasky, Hasan Abbasi, Fang Zheng, Norbert Podhorszki, Karsten Schwan, Manish Parashar, Jay Lofstead…

  4. Zoom-In Analysis (VMWare, Amazon, DOE)
     [Figure: two-stage anomaly detection for cloud-hosted web services (tiers FS1-FS3, AS, DS). DCG 1 aggregates lightweight SLO metrics for end-to-end transaction (response time) anomaly detection. A detected anomaly triggers zoom-in: DCG 2 runs heavyweight tracing and analytics (causal path inference, bottleneck identification), localizing DS3 as the source. Also labeled: "PRN: Network Traffic".]
     • Highly reduced data overhead and focus on the problematic area
     Thanks: Chengwei Wang, Drew Bratcher, Karsten Schwan, and many more.
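The zoom-in pattern on slide 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system shown in the figure: the 200 ms objective, the function names, and the "slowest node" localization rule are all hypothetical stand-ins for the real lightweight and heavyweight stages.

```python
from statistics import mean

SLO_MS = 200.0  # hypothetical response-time objective, not from the slides


def slo_violated(response_times_ms, slo_ms=SLO_MS):
    """Lightweight stage: reduce a window of samples to one aggregate check."""
    return mean(response_times_ms) > slo_ms


def localize(per_node_latency_ms):
    """Heavyweight stage (stand-in): name the node with the highest mean latency."""
    return max(per_node_latency_ms, key=lambda node: mean(per_node_latency_ms[node]))


def zoom_in(window, collect_traces):
    """Always run the cheap check; gather detailed traces only on an anomaly."""
    if not slo_violated(window):
        return None  # no anomaly, so the expensive tracing is never triggered
    return localize(collect_traces())


def example_traces():
    # Hypothetical per-node latencies gathered only after the trigger fires.
    return {"DS1": [20, 22], "DS2": [25, 21], "DS3": [310, 290]}


print(zoom_in([120, 130, 110], example_traces))  # None
print(zoom_in([120, 450, 400], example_traces))  # DS3
```

Passing the trace collector as a callable is the point of the design: detailed data is only produced for the windows that fail the cheap check, which is where the reduced data overhead comes from.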

  5. Software Solution: An Event Processing Toolkit
     • http://evpath.net & http://korvo.gatech.edu/software
     • EVPath is an open-source event processing infrastructure designed for high performance
     • A component of the SDAV SciDAC institute
     • Allows the construction of application-level overlay networks with embedded computation
     • Fully-typed data flows along the path
     • Very low overhead self-describing binary data
     • Dynamic code generation for on-the-fly processing
     • Flexible network infrastructure allows run-time selection and parameterization of network transport
     • Toolkit that supports construction of CDN-like, DHT-like, aggregation-tree-like, asynchronous, p2p, or other steering infrastructures
     Slide footer: Matthew Wolf, MONA, http://korvo.gatech.edu/projects/MON
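To make "overlay networks with embedded computation" and "fully-typed data flows" more concrete, here is a toy sketch in plain Python. It does not use or imitate the EVPath API (EVPath is a C library); `Sample`, `Node`, `link`, and `submit` are hypothetical names chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Sample:
    """The single event type allowed on this path (hypothetical)."""
    sensor: str
    value: float


class Node:
    """One overlay node: run a handler, forward non-None results downstream."""

    def __init__(self, handler: Callable[[Sample], Optional[Sample]]):
        self.handler = handler
        self.targets: List["Node"] = []

    def link(self, target: "Node") -> "Node":
        self.targets.append(target)
        return target  # returning the target lets links be chained

    def submit(self, event: Sample) -> None:
        if not isinstance(event, Sample):
            raise TypeError("only Sample events flow along this path")
        out = self.handler(event)
        if out is None:
            return  # the handler filtered the event out
        for target in self.targets:
            target.submit(out)


received: List[Sample] = []


def collect(event: Sample) -> Sample:
    received.append(event)
    return event


# source -> filter -> transform -> sink
source = Node(lambda e: e)
large_only = Node(lambda e: e if e.value > 100.0 else None)
halve = Node(lambda e: Sample(e.sensor, e.value * 0.5))
sink = Node(collect)
source.link(large_only).link(halve).link(sink)

for sample in (Sample("t1", 50.0), Sample("t2", 150.0)):
    source.submit(sample)

print(received)  # [Sample(sensor='t2', value=75.0)]
```

In this sketch the filter and transform run inside the path rather than at the endpoints, so the sink only ever sees data that is already reduced. A real toolkit would place such nodes on different hosts and choose the transport at run time.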

  6. An Illustrative Example: Experimental Combustion Collaboration
     • Science goal is to understand the complex dynamics of different fuel mixes, speeds, acoustic interactions, and so on
     • Use laser probes and cameras at 10k+ frames per second
     • Inject particles so you can trace fuel, flame, and residue in real time
     • Initial process was driven by disk I/O & storage transport
     Thanks: Tim Lieuwen, Ben Emerson, Vishal Acharya, Jonathan Frank, Akash Gagnil, Drew Bratcher

  7. Stream processing lets us address a number of critical issues:
     - Are the lasers properly aligned? Did someone bump something?
     - Are the particle injectors working correctly?
     - Are there any obvious experimental defects in the data (e.g. chunks of foam)?
     - Does this look approximately right for the input parameters (e.g. did someone leave a wrench in the inlet)?
     - Has the effect we're looking at saturated? Should we change the next parameter test in the campaign?
     - Does this line up with what we know from simulation? Should I adapt the campaign to better probe the difference?
     - Are the Physical Chemists right?
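As one illustration of the first question on slide 7 (laser alignment), the following Python sketch checks each frame as it arrives instead of waiting for a file to land on disk. The intensity-centroid test, the 1-pixel tolerance, and all names here are assumptions made for this example; the slides do not say how alignment is actually checked.

```python
from typing import Iterable, Iterator, List, Tuple

Frame = List[List[float]]  # rows of pixel intensities


def centroid(frame: Frame) -> Tuple[float, float]:
    """Intensity-weighted (row, column) centre of a frame."""
    total = sum(sum(row) for row in frame)
    if total == 0:
        raise ValueError("frame has no signal")
    r = sum(i * sum(row) for i, row in enumerate(frame)) / total
    c = sum(j * v for row in frame for j, v in enumerate(row)) / total
    return r, c


def alignment_alerts(
    frames: Iterable[Frame],
    reference: Tuple[float, float],
    tol: float = 1.0,  # hypothetical tolerance in pixels
) -> Iterator[int]:
    """Yield the index of each frame whose centroid drifts more than tol pixels."""
    for idx, frame in enumerate(frames):
        r, c = centroid(frame)
        drift = ((r - reference[0]) ** 2 + (c - reference[1]) ** 2) ** 0.5
        if drift > tol:
            yield idx


aligned = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]  # spot in the centre
bumped = [[0, 0, 0], [0, 0, 0], [0, 0, 9]]   # spot moved to a corner

print(list(alignment_alerts([aligned, bumped, aligned], reference=(1.0, 1.0))))  # [1]
```

Because `alignment_alerts` is a generator, it can consume a live camera feed and raise an alert within one frame of the bump, which is the kind of feedback a disk-driven workflow delays until after the run.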

  8. SciKhan: An Initial Demonstration
     • The interactions between data-in-motion and data-at-rest (thanks, IBM!) can be complicated.
     • Scientists wanted the stream-based capabilities, but they were used to a file system interface.
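One generic way to give stream data a file-style interface is to wrap the stream in a file-like object. The sketch below uses only Python's standard `io` module. It illustrates the idea on slide 8 and is not a description of how SciKhan is built; `StreamReader` and the sample chunks are hypothetical.

```python
import io
from typing import Iterable, Iterator


class StreamReader(io.RawIOBase):
    """Expose an iterator of byte chunks through the standard file API."""

    def __init__(self, chunks: Iterable[bytes]):
        super().__init__()
        self._chunks: Iterator[bytes] = iter(chunks)
        self._leftover = b""

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        # Pull chunks until there is data to hand out or the stream ends.
        while not self._leftover:
            try:
                self._leftover = next(self._chunks)
            except StopIteration:
                return 0  # end of stream reads as end of file
        n = min(len(buffer), len(self._leftover))
        buffer[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


# Chunk boundaries in the stream need not match record boundaries.
stream = io.BufferedReader(StreamReader([b"frame-1\n", b"frame-", b"2\n"]))
print(stream.readline())  # b'frame-1\n'
print(stream.readline())  # b'frame-2\n'
```

Code written against `readline()` or `read()` then works unchanged whether the bytes come from a stored file or from data still in motion.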

  9. Conclusion
     • The data management problem is beyond just large Volume.
       - Streaming has been treated as a corner case for a long time
       - Critical gap when all 5 V's (volume, velocity, variety, value and veracity) are in play
     • Steering and/or control requires highly specialized designs for each of the users
       - Use a toolkit that allows that customization
     • Human-in-the-loop, delegated control, etc.
     • There is a change management problem
       - The science questions and the way science is conducted can change as the technology shifts
