
Using a Hybrid Cray Supercomputer to Model Non-Icing Surfaces for Cold-Climate Wind Turbines



  1. DRAFT
     Using a Hybrid Cray Supercomputer to Model Non-Icing Surfaces for Cold-Climate Wind Turbines
     Accelerating Three-Body Potentials using GPUs (NVIDIA Tesla K20X)
     Masako Yamada, GE Global Research

  2. Opportunity in Cold-Climate Wind
     • Wind energy production > 285 GW/year and growing
     • Cold regions favorable
       • Lower human population
       • Good wind conditions
       • 45-50 GW opportunity from 2013-2017, ~$2 million/MW installed
     • Technical need: anti-icing surfaces
       • 3-10% energy losses due to icing
       • Shut-downs
       • Active heating expensive
     Source: VTT Technical Research Centre of Finland, http://www.vtt.fi/news/2013/28052013_wind_energy.jsp?lang=en

  3. ALCC Awards: 40 + 40 million hours
     DOE ASCR Leadership Computing Challenge awards, energy-relevant applications
     1. Non-Icing Surfaces for Cold-Climate Wind Turbines: Jaguar (Cray XK6) at Oak Ridge National Lab
        • Molecular dynamics using LAMMPS
        • 1 million mW water molecule droplets on engineered surfaces
        • Completed >300 simulations
        • Achieved >200x speedup from 2011 to 2013 (>5x from GPU acceleration)
     2. Accelerated Non-Icing Surfaces for Cold-Climate Wind Turbines: Titan (Cray XK7, hybrid) at Oak Ridge National Lab
        • "Time parallelization" via the Parallel Replica method
        • Expected 10-100x faster results

  4. Titan enables leadership-class study
     • Size of simulation ~ 1 million molecules
       • Droplet size >> critical nucleus size
       • Mimic physical dimensions (*somewhat)
     • Duration of simulation ~ 1 microsecond
       • Nucleation is an activated process
       • Freezing rarely observed in MD simulations
     • Number of simulations ~ 100's
       • Study requires "embarrassingly parallel" runs
       • Different surfaces, ambient temperatures, conductivity
       • Multiple replicates required due to stochastic nature
     *A million-molecule droplet is ~50 nm in diameter (a rough sanity check follows below).
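
As a rough sanity check of the ~50 nm figure (my own back-of-the-envelope estimate, not from the deck), a sessile droplet can be approximated as a hemispherical cap of bulk-density water:

# Back-of-the-envelope check (assumption: sessile droplet ~ hemisphere of bulk-density water).
import math

rho = 33.4                                          # molecules per nm^3 for water at ~1 g/cm^3
radius_nm = 25.0                                    # 50 nm diameter
volume_nm3 = (2.0 / 3.0) * math.pi * radius_nm**3   # hemisphere volume
print(round(rho * volume_nm3 / 1e6, 2), "million molecules")   # ~1.1 million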

  5. Personal history with MD
     Year      Software/Language     # of Molecules   Hardware
     1995      Pascal                Few              Desktop Mac
     2000      C, Fortran90          Hundreds         IBM SP, SGI O2K
     2010      NAMD, LAMMPS          1000's           Linux HPC
     Present   GPU-enabled LAMMPS    Millions         Titan

  6. >200x overall speedup since 2011
     1. Switched to the mW water potential (the 3-body model is more expensive/complex than 2-body, but:)
        • Particle reduction: at least 3x
        • Timestep increase: 10x
        • No long-range forces
     2. LAMMPS dynamic load balance: 2-3x
     3. GPU acceleration of the 3-body model: 5x
     2011: 6 femtoseconds / 1024 CPU-seconds (SPC/E)
     2013: 2 picoseconds / 1024 CPU-seconds (mW)
     (A quick consistency check of these throughputs follows below.)
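
A quick consistency check (my own arithmetic, not from the deck): the quoted throughputs by themselves imply a factor of roughly 330x, consistent with the ">200x" headline.

# Consistency check of the quoted throughputs (my arithmetic, not from the deck).
throughput_2011_fs = 6.0       # simulated fs per 1024 CPU-seconds with SPC/E (2011)
throughput_2013_fs = 2000.0    # 2 ps = 2000 fs per 1024 CPU-seconds with mW (2013)
print(throughput_2013_fs / throughput_2011_fs)   # ~333x, consistent with ">200x overall"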

  7. 1. mW water potential
     Stillinger-Weber 3-body; one particle = one water molecule
     • Introduced in 2009, Nature paper in 2011
     • Bulk water properties comparable to or better than existing point-charge models
     • Much faster than point-charge models
       • Exemplary test case by the authors: 180x faster than SPC/E
       • GE production simulation: 40-50x faster than SPC/E (asymmetric million-molecule droplet on an engineered surface, loaded onto 64 nodes)
     [Figure: SPC/E vs mW water models]

  8. 2. LAMMPS dynamic load balance
     • Introduced in 2012
     • Adjusts size of processor sub-domains to equalize the number of particles
     • 2-3x speedup for 1 million molecule droplets on 64 nodes (with user-specified processor mapping)
     [Figure: domain decomposition with no load balancing, default load balancing, and user-specified mapping]
     (A minimal sketch of the idea follows below.)
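
This is not the LAMMPS implementation, just a minimal 1-D sketch of the idea: move sub-domain boundaries so that each process owns roughly the same number of particles. The helper name is hypothetical.

# Minimal 1-D sketch of dynamic load balancing (illustration only, not the LAMMPS algorithm):
# choose slab boundaries so each process owns roughly the same number of particles.
import numpy as np

def rebalance_boundaries(x_coords, n_procs):
    """Return slab edges along x such that each of the n_procs slabs holds ~equal particle counts."""
    return np.quantile(np.asarray(x_coords), np.linspace(0.0, 1.0, n_procs + 1))

# Example: a "droplet" of particles concentrated near the middle of the box.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=5.0, size=100_000)

uniform_edges = np.linspace(x.min(), x.max(), 9)    # 8 equal-width slabs (no balancing)
balanced_edges = rebalance_boundaries(x, 8)         # 8 equal-count slabs

print("particles per slab, equal width:", np.histogram(x, bins=uniform_edges)[0])
print("particles per slab, rebalanced :", np.histogram(x, bins=balanced_edges)[0])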

  9. 3. GPU acceleration of the 3-body potential
     For details, see: W. Michael Brown and Masako Yamada, "Implementing Molecular Dynamics on Hybrid High Performance Computers – Three-Body Potentials," Computer Physics Communications (2013).

  10. Load 1 million molecules on Host/CPU
      • 1 million molecules, 64 nodes
      • Processor sub-domains correspond to a "spatial" partitioning of the droplet
      • 8 MPI tasks/node, 1 core per paired-unit
      [Figure: droplet spatially partitioned across processor sub-domains]
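
The particle counts per node and per MPI task implied by these numbers (my arithmetic, not from the deck):

# Implied particle counts; matches the "~15,000 molecules per node" on the next slide.
molecules = 1_000_000
nodes = 64
tasks_per_node = 8
print(molecules / nodes)                      # ~15,600 molecules per node
print(molecules / (nodes * tasks_per_node))   # ~1,950 molecules per MPI task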

  11. Per node ~ 15,000 molecules
      [Figure: node architecture. Host: AMD Opteron 6274 CPU (16 cores, host memory). Accelerator: NVIDIA Tesla K20X GPU (processors of up to 192 cores each, private/local memory, global memory; work items grouped into work groups, each executing a "kernel").]
      Work item = fundamental unit of activity

  12. Parallelization in LAMMPS
      • Accelerator: 3-body potential, neighbor lists
      • Host: time integration, thermostat/barostat, bond/angle calculations, statistics

  13. Generic 3-body potential
      V = \sum_{j} \sum_{k \neq j} \sum_{l > k} \varphi(\mathbf{q}_j, \mathbf{q}_k, \mathbf{q}_l) if r_{jk} < r_c and r_{jl} < r_c, and 0 otherwise
      (\mathbf{q}_j = position of atom j; r_c = cutoff; r_\beta = neighbor skin)
      Good candidate for GPU acceleration:
      1. Occupies the majority of computational time
      2. Can be decomposed into independent kernels/work-items
      Examples: Stillinger-Weber, MEAM, Tersoff, REBO/AIREBO, bond-order potentials, ...
      [Figure: a triplet of atoms with position vectors from the origin (0,0,0), separations r_{jk} and r_{jl}, the cutoff r_c, and the neighbor skin r_\beta]
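
A serial reference sketch of this triple loop (my own illustration, not the GPU code). The point is that each outer iteration over the central atom j is independent of the others, which is what lets the sum be decomposed into per-atom kernels/work-items.

# Serial reference sketch of the generic 3-body sum (illustration only, not the GPU code).
import numpy as np

def three_body_energy(q, phi, r_cut):
    """q: (N, 3) positions; phi(qj, qk, ql) -> scalar triplet term; r_cut: cutoff distance."""
    n = len(q)
    energy = 0.0
    for j in range(n):                        # each j-iteration could be one work-item
        for k in range(n):
            if k == j:
                continue
            if np.linalg.norm(q[k] - q[j]) >= r_cut:
                continue
            for l in range(k + 1, n):
                if l == j:
                    continue
                if np.linalg.norm(q[l] - q[j]) < r_cut:
                    energy += phi(q[j], q[k], q[l])
    return energy

# Example with a dummy triplet term (simply counts triplets inside the cutoff):
q = np.random.default_rng(1).uniform(0.0, 5.0, size=(20, 3))
print(three_body_energy(q, lambda qj, qk, ql: 1.0, r_cut=2.5))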

  14. Redundant Computation Approach
      Atom-decomposition
      • 1 atom → 1 computational kernel only
      • fewest operations (and effective parallelization), but: shared memory access a bottleneck
      Force-decomposition
      • 1 atom → 3 computational kernels required
      • redundant computations, but: reduced shared memory issues; many work-items = more effective use of cores
      (A toy contrast of the two accumulation patterns follows below.)
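
A toy contrast of the two accumulation patterns (my own sketch, not the production GPU kernels; the triplet force here is a hypothetical placeholder). Atom-decomposition evaluates each triplet once and scatters forces to all three atoms (contended writes); force-decomposition re-evaluates the triplet in up to three work-items, each writing only to its own atom.

# Toy contrast of atom- vs force-decomposition for a 3-body term (illustration only).
import numpy as np

def toy_triplet_force(target, qj, qk, ql):
    """Hypothetical placeholder force on `target` from triplet (j,k,l): pull toward the centroid."""
    return (qj + qk + ql) / 3.0 - target

def atom_decomposition(q, triplets):
    # Evaluate each triplet once and scatter forces to all three atoms.
    # The concurrent scatter into a shared force array is the memory bottleneck on a GPU.
    f = np.zeros_like(q)
    for j, k, l in triplets:
        for a in (j, k, l):
            f[a] += toy_triplet_force(q[a], q[j], q[k], q[l])
    return f

def force_decomposition(q, triplets):
    # One work-item per atom: redundantly re-evaluate every triplet it belongs to,
    # but write only the force on its own atom -- no write conflicts.
    f = np.zeros_like(q)
    for a in range(len(q)):                   # each iteration = an independent work-item
        for j, k, l in triplets:
            if a in (j, k, l):
                f[a] += toy_triplet_force(q[a], q[j], q[k], q[l])
    return f

q = np.random.default_rng(2).uniform(0.0, 3.0, size=(6, 3))
triplets = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
print(np.allclose(atom_decomposition(q, triplets), force_decomposition(q, triplets)))  # True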

  15. Stillinger-Weber Parallelization
      V = \sum_{j} \sum_{k < j} \varphi_2(r_{jk}) + \sum_{j} \sum_{k \neq j} \sum_{l > k} \varphi_3(r_{jk}, r_{jl}, \theta_{kjl})
      3 kernels per atom j:
      • 2-body operations
      • 3-body operations with (r_{jk} < r_\beta) .AND. (r_{jl} < r_\beta) == .TRUE.: no data dependencies, update forces on i only
      • 3-body operations with (r_{jk} < r_\beta) .AND. (r_{jl} < r_\beta) == .FALSE.: neighbor-of-neighbor interactions
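
For reference, the standard Stillinger-Weber two- and three-body terms (the conventional textbook/LAMMPS form, not copied from the deck; parameter names are the usual ones):

\varphi_2(r) = A\,\varepsilon\left[ B\left(\frac{\sigma}{r}\right)^{p} - \left(\frac{\sigma}{r}\right)^{q} \right]\exp\!\left(\frac{\sigma}{r - a\sigma}\right)

\varphi_3(r_{jk}, r_{jl}, \theta_{kjl}) = \lambda\,\varepsilon\left[\cos\theta_{kjl} - \cos\theta_0\right]^{2} \exp\!\left(\frac{\gamma\sigma}{r_{jk} - a\sigma}\right)\exp\!\left(\frac{\gamma\sigma}{r_{jl} - a\sigma}\right)

Both terms go smoothly to zero at the cutoff r_c = a\sigma. The mW water model reuses this functional form with re-fit parameters that tune the strength of the tetrahedral three-body term, so that one particle represents one water molecule.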

  16. Neighbor List
      • The 3-body force-decomposition approach involves neighbor-of-neighbor operations
      • Requires additional overhead:
        • increase in the border size shared by two processes
        • neighbor lists for ghost atoms "straddling" across cores
      • GPU implementation not necessarily faster than CPU, but less time is spent in host-accelerator data transfer (note: neighbor lists are huge)
      (A rough sketch of the enlarged ghost region follows below.)
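
A rough illustration of the border-size increase (my own simplification, with illustrative numbers): neighbor-of-neighbor access needs ghost atoms out to roughly twice the neighbor-list radius rather than one radius.

# Rough illustration of the enlarged ghost/border region (simplified; illustrative values only).
r_cut = 4.3     # force cutoff in Angstroms (illustrative)
skin = 1.0      # neighbor skin (illustrative)

ghost_pairwise = r_cut + skin                       # ghost width for ordinary neighbor builds
ghost_neighbor_of_neighbor = 2.0 * (r_cut + skin)   # roughly doubled for neighbor-of-neighbor access

print(ghost_pairwise, ghost_neighbor_of_neighbor)
# In 3-D the number of communicated ghost atoms grows much faster than the width itself,
# which is the "additional overhead" this slide refers to.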

  17. GPU acceleration benefit
      • >5x speedup achieved in a production run: a water droplet of 1 million molecules on an engineered surface (64 nodes)
      • Not limited to Stillinger-Weber; applicable to MEAM, Tersoff, REBO, AIREBO, bond-order potentials, etc.

  18. Implementation

  19. 6 different surfaces
      Interaction potential developed at GE Global Research

  20. Freezing front propagation
      Visualization of "latent heat" release

  21. Visualizing crystalline regions
      • particle mobility
      • Steinhardt-Nelson order parameter (a minimal sketch follows below)
      [Figure: side view and bottom view of the droplet]
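
A minimal per-particle Steinhardt bond-orientational order parameter q_l (my own sketch; the production analysis may differ in the neighbor definition, normalization, or use of correlated q6·q6 variants):

# Minimal per-particle Steinhardt q_l sketch (illustration only).
import numpy as np
from scipy.special import sph_harm

def steinhardt_ql(positions, i, neighbor_ids, l=6):
    """q_l for particle i, given the indices of its neighbors (e.g. those within a cutoff)."""
    bonds = positions[neighbor_ids] - positions[i]
    r = np.linalg.norm(bonds, axis=1)
    polar = np.arccos(np.clip(bonds[:, 2] / r, -1.0, 1.0))
    azimuth = np.arctan2(bonds[:, 1], bonds[:, 0]) % (2.0 * np.pi)
    # Average Y_lm over the bonds, then combine the m-components into a rotational invariant.
    q_lm = np.array([np.mean(sph_harm(m, l, azimuth, polar)) for m in range(-l, l + 1)])
    return np.sqrt(4.0 * np.pi / (2 * l + 1) * np.sum(np.abs(q_lm) ** 2))

# Example: q6 of a particle with 12 randomly placed neighbors (a disordered environment).
rng = np.random.default_rng(3)
pos = np.vstack([[0.0, 0.0, 0.0], rng.normal(size=(12, 3))])
print(steinhardt_ql(pos, i=0, neighbor_ids=np.arange(1, 13), l=6))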

  22. Advanced visualization
      Mike Matheson, Oak Ridge National Lab
      (Visuals/movies to be included here.)

  23. Next steps
      • Quasi "time parallelization" using the Parallel Replica Method (a schematic driver loop is sketched below)
      • Launch dozens of replicates simultaneously; monitor ensemble behavior
      • Expected outcome: 10-100x faster results
      • Analysis and application of simulation results
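
A schematic, heavily simplified driver loop for the Parallel Replica idea (my own sketch, not the actual workflow; a real implementation also needs dephasing/decorrelation of the replicas): run many independent replicas, stop at the first rare event (here, freezing), and credit the summed simulated time across replicas.

# Schematic Parallel Replica driver (simplified illustration only).
# Because the rare event is approximately a Poisson process, the effective simulated time
# is the SUM of the time accumulated over all replicas, i.e. roughly an N-replica speedup
# in wall-clock time to observe one freezing event.
import math
import random

def run_block(block_ps, rate_per_ps):
    """Stand-in for advancing one replica by `block_ps` of MD; returns True if it froze."""
    return random.random() < 1.0 - math.exp(-rate_per_ps * block_ps)

def parallel_replica(n_replicas=32, block_ps=1.0, rate_per_ps=0.002, seed=0):
    random.seed(seed)
    total_simulated_ps = 0.0
    while True:
        for rep in range(n_replicas):           # in practice these blocks run concurrently
            total_simulated_ps += block_ps
            if run_block(block_ps, rate_per_ps):
                return rep, total_simulated_ps  # wall-clock cost ~ total / n_replicas

rep, t_sum = parallel_replica()
print(f"replica {rep} froze after ~{t_sum:.0f} ps of summed simulated time")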

  24. Credits
      • Mike Brown (ORNL): GPU acceleration
      • Paul Crozier (Sandia): dynamic load balancing
      • Valeria Molinero (Utah): mW potential
      • Aaron Keyes (UMich, Berkeley): Steinhardt-Nelson order parameters
      • Art Voter / Danny Perez (LANL): Parallel Replica method
      • Mike Matheson (ORNL): visualization
      • Jack Wells, Suzy Tichenor (ORNL): general
      • Azar Alizadeh, Branden Moore, Rick Arthur, Margaret Blohm (GE Global Research)
      This research was conducted in part under the auspices of the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy under Contract No. DE-AC05-00OR22725 with UT-Battelle, LLC. This research was also conducted in part under the auspices of the GE Global Research High Performance Computing program.
