PDSW 2019 Panel A house divided: Why don’t cloud storage and HPC storage share more technology? Brent Welch (Google) Raghu Raja (Amazon) Evan Burness (Microsoft) 2:00pm to 2:40pm, Monday November 18
Brent Welch • Works at Google in GCP (public cloud platform) • My focus is resource management at scale, not core storage systems • Built Sprite Distributed File System in the 1980’s for PhD at UCB • Helped build PanFS Distributed Filesystem in the 2000’s • Helped with OSD-T10 Object Storage Device (OSD) standard • Helped with NFSv4.{1,2} Parallel NFS standard • Created exmh email user interface, tclhttpd web server • Enjoys gardening, footbag, hiking, juggling, etc.
Because POSIX • Serious Cloud HPC users re-write apps to use cloud native storage • Write-once • Non-POSIX namespace • Highly scalable • User-space services implement various semantics (key-value, (no)sql, files) • POSIX is useful for the long tail • Lots of “dusty deck” applications that (think they) need POSIX and can be served well by a single NFS server • There are various solutions that map POSIX (or NFS) to cloud buckets
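As a rough illustration of the last bullet, the sketch below layers a directory-style view over a flat, write-once object namespace. It is a toy under stated assumptions, not how any particular product works: `WriteOnceBucket` is a hypothetical in-memory stand-in for a cloud bucket, and `PosixishView` is a hypothetical shim that derives directory listings from key prefixes.

```python
class WriteOnceBucket:
    """Hypothetical in-memory stand-in for a cloud bucket.

    Keys are flat strings (no real directories) and objects are write-once.
    """

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        # Write-once: an existing object can never be overwritten in place.
        if key in self._objects:
            raise FileExistsError(f"object {key!r} already exists (write-once)")
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]

    def list(self, prefix=""):
        # Listing is a prefix scan over the flat key space.
        return sorted(k for k in self._objects if k.startswith(prefix))


class PosixishView:
    """Hypothetical shim that presents path-style access over flat keys."""

    def __init__(self, bucket):
        self.bucket = bucket

    @staticmethod
    def _key(path):
        # "/runs/a/out.dat" -> "runs/a/out.dat"
        return path.strip("/")

    def write_file(self, path, data):
        self.bucket.put(self._key(path), data)

    def read_file(self, path):
        return self.bucket.get(self._key(path))

    def listdir(self, path):
        # Directories are implied by key prefixes; return the unique next
        # path component under the requested prefix.
        prefix = self._key(path)
        prefix = prefix + "/" if prefix else ""
        names = set()
        for key in self.bucket.list(prefix):
            rest = key[len(prefix):]
            names.add(rest.split("/", 1)[0])
        return sorted(names)
```

Even this toy shows where the mismatch sits: directories exist only as prefixes, and anything that needs in-place overwrite, rename, or locking has to be emulated above the bucket or left unsupported.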
Raghunath Raja Chandrasekar • AWS : HPC instance software, EFA, FSx for Lustre … • Cray : Object storage for HPC, DataWarp, Lustre … • OSU : MVAPICH MPI, Checkpointing, Networking … • LLNL : SCR, in-memory checkpointing, filesystems …
Observations / Discussion starters • Title of the panel • Two communities coming together • Shoehorning in tech? • When to reuse, when to reinvent • Drawing from HPC+ML • Design patterns and principles
Evan Burness • Microsoft Azure (2017-2019): Principal Program Manager for H-series VMs for High Performance Computing • Cycle Computing (2016-2017): Director for High-Performance Computing • National Center for Supercomputing Applications (Univ. Illinois) (2009-2016): Program Manager, Private Sector Program • Interests: Being a dad to this little guy, Duke basketball, All the HPC things!
What happens first: the first sustained 1-exaflop FP64 app, or 100-exaflop lower-precision apps? If you believe the latter, should the modeling and simulation community intentionally move to align its storage and I/O architectures with what the AI community will do?