Accelerated Deep Learning Discovery in Fusion Energy Science
William M. Tang, Princeton University / Princeton Plasma Physics Laboratory (PPPL)
NVIDIA GPU Technology Conference (GTC-2018), San Jose, CA, March 19, 2018
Co-authors: Julian Kates-Harbeck (Harvard U/PPPL), Alexey Svyatkovskiy (Princeton U), Eliot Feibush (PPPL/Princeton U), Kyle Felker (Princeton U/PPPL), Joe Abbate (Princeton U), Sunny Qin (Princeton U)
CNN's "MOONSHOTS for 21st CENTURY" (hosted by Fareed Zakaria) – five segments (Spring 2015) exploring "exciting futuristic endeavors in science & technology in the 21st Century":
(1) Human Mission to Mars
(2) 3D Printing of a Human Heart
(3) Creating a Star on Earth: Quest for Fusion Energy
(4) Hypersonic Aviation
(5) Mapping the Human Brain
"Creating a Star on Earth" → "takes a fascinating look at how harnessing the energy of nuclear fusion reactions may create a virtually limitless energy source."
Stephen Hawking (BBC interview, 18 Nov. 2016): "I would like nuclear fusion to become a practical power source. It would provide an inexhaustible supply of energy, without pollution or global warming."
APPLICATION FOCUS FOR DEEP LEARNING STUDIES: FUSION ENERGY SCIENCE
Most Critical Problem for Fusion Energy → accurately predict and mitigate large-scale major disruptions in magnetically confined thermonuclear plasmas such as ITER – the $25B international burning-plasma "tokamak".
• Most Effective Approach: use big-data-driven statistical/machine-learning predictions of disruption occurrence in world-leading facilities such as the EUROfusion Joint European Torus (JET) in the UK, DIII-D (US), and other tokamaks worldwide.
• Recent Status: 8 years of R&D results (led by JET) using Support Vector Machine (SVM) machine learning on zero-D time-trace data executed on CPU clusters yield success rates in the mid-80% range for JET at 30 ms before disruptions – BUT ITER requires > 95% accuracy with a false-alarm rate < 5%, delivered at least 30 milliseconds before the disruption actually occurs. Reference: P. DeVries et al. (2015)
CURRENT CHALLENGES FOR DEEP LEARNING/AI STUDIES:
• Disruption Prediction & Avoidance goals include: (i) improve physics fidelity via development of new multi-D, time-dependent ML software including improved classifiers; (ii) develop "portable" (cross-machine) predictive software beyond JET to other devices and eventually ITER; and (iii) enhance accuracy & speed of disruption analysis for very large datasets via HPC.
→ TECHNICAL FOCUS: development & deployment of advanced machine learning software via Deep Learning/AI neural networks
• Both Convolutional & Recurrent Neural Nets are included in Princeton's "Fusion Recurrent Neural Net" (FRNN) software – Julian Kates-Harbeck (chief architect)
CLASSIFICATION
● Binary Classification Problem:
○ Shots are Disruptive or Non-Disruptive (a minimal labeling sketch follows below)
● Supervised ML techniques:
○ Domain fusion physicists combine a knowledge base of observationally validated information with advanced statistical/machine-learning predictive methods.
● Machine Learning Methods Engaged: the shallow-learning SVM approach initiated by the JET team with the "APODIS" software has now led to Princeton's new Deep Learning Fusion Recurrent Neural Net (FRNN) code, which includes both Convolutional & Recurrent NNs.
● Challenge:
→ Multi-D data analysis requires new signal representations;
→ FRNN's Convolutional Neural Nets (CNNs) enable – for the first time – the capability to deal with higher-dimensional (beyond zero-D) data.
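As a minimal illustration of the binary-classification setup described above (not the actual FRNN pipeline), the sketch below labels each shot's time series as disruptive or non-disruptive; the shot records, signal counts, and warning-window length are hypothetical placeholders.

```python
import numpy as np

# Hypothetical per-shot record: a (timesteps x n_signals) array of zero-D
# time traces plus the disruption time index (None for non-disruptive shots).
shots = [
    {"signals": np.random.rand(1000, 7), "t_disrupt": 870},   # disruptive shot
    {"signals": np.random.rand(1200, 7), "t_disrupt": None},  # safe shot
]

def label_shot(shot, warning_window=30):
    """Binary target per time step: 1 inside the pre-disruption warning
    window (e.g. the last 30 steps), 0 otherwise; all zeros for safe shots."""
    n = shot["signals"].shape[0]
    y = np.zeros(n, dtype=np.int8)
    if shot["t_disrupt"] is not None:
        y[max(0, shot["t_disrupt"] - warning_window):] = 1
    return y

labels = [label_shot(s) for s in shots]
print([int(y.sum()) for y in labels])  # number of "alarm" time steps per shot
```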
SVM Approach (W.H. Press, Numerical Recipes: "The Art of Scientific Computing", 2007)
14 feature vectors are extracted from the raw time-series data: 7 signals x 2 representations.
Signals (zero-D time traces):
1. Plasma current [A]
2. Mode lock amplitude [T]
3. Plasma density [m^-3]
4. Radiated power [W]
5. Total input power [W]
6. d/dt Stored Diamagnetic Energy [W]
7. Plasma internal inductance
Representations:
1. Mean
2. Standard deviation of the positive FFT spectrum (excluding the first component)
Feature vectors are remapped to a higher-D space → a "hyper-plane" maximizing the distance between classes of points.
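A minimal sketch of this feature recipe, assuming the mean of each windowed signal plus the standard deviation of its positive FFT spectrum (DC component excluded) are fed into a scikit-learn SVM; the window length and random data are illustrative, not JET settings.

```python
import numpy as np
from sklearn.svm import SVC

def window_features(window):
    """window: (n_samples, n_signals) slice of zero-D time traces.
    Returns 2 features per signal: mean, and std of the positive FFT
    spectrum with the first (DC) component excluded."""
    means = window.mean(axis=0)
    spectra = np.abs(np.fft.rfft(window, axis=0))[1:]  # drop DC term
    stds = spectra.std(axis=0)
    return np.concatenate([means, stds])  # 7 signals -> 14 features

# Illustrative data: 200 windows of 32 samples x 7 signals, with labels.
rng = np.random.default_rng(0)
X = np.stack([window_features(rng.normal(size=(32, 7))) for _ in range(200)])
y = rng.integers(0, 2, size=200)  # 1 = disruptive window, 0 = safe

clf = SVC(kernel="rbf").fit(X, y)  # kernel maps features to a higher-D space
print(clf.score(X, y))
```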
APODIS ("Advanced Predictor of Disruptions"): multi-tiered SVM code → separate SVM models are trained for separate consecutive time intervals preceding the disruption, and applied to incoming real-time data.
Reference: J. Vega et al., Fusion Engineering and Design 88 (2013), and refs. cited therein.
BUT – UNABLE TO DEAL WITH 1D PROFILE SIGNALS!
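A schematic sketch of the multi-tiered idea (one SVM per consecutive pre-disruption interval); the interval boundaries, features, and data are hypothetical, and this is not the APODIS implementation.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical consecutive intervals before the disruption, in ms.
intervals = [(0, 30), (30, 60), (60, 120), (120, 240)]

def train_tiered_svms(features, labels, time_to_disruption):
    """Train one SVM per interval, using disruptive samples whose remaining
    time-to-disruption falls inside that interval plus all safe samples."""
    models = {}
    for lo, hi in intervals:
        mask = ((time_to_disruption >= lo) & (time_to_disruption < hi)) | (labels == 0)
        models[(lo, hi)] = SVC(kernel="rbf").fit(features[mask], labels[mask])
    return models

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 14))          # 14-component feature vectors
y = rng.integers(0, 2, size=500)        # 1 = disruptive, 0 = safe
ttd = rng.uniform(0, 240, size=500)     # time remaining until disruption [ms]
tiers = train_tiered_svms(X, y, ttd)
```

At prediction time, incoming real-time data would be routed to the model whose interval matches the current look-ahead horizon.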
Background/Approach for DL/AI
• Deep Learning Method: distributed data-parallel approach to training deep neural networks → Python framework using the high-level Keras library with the Google TensorFlow backend.
Reference: Deep Learning with Python, François Chollet (Nov. 2017, 384 pages).
*** Major contrast with "shallow learning" approaches – including SVMs, Random Forests, single-layer neural nets, & modern stochastic gradient boosting ("XGBoost") methods – by enabling the move of ML software from clusters to supercomputers:
→ Titan (ORNL), Summit (ORNL), Tsubame-3 (TiTech), Piz Daint (CSCS), ...
Also other architectures, e.g. Intel systems: KNL currently, plus promising new future designs.
– Stochastic gradient descent (SGD) is used for large-scale optimization on supercomputers, with parallelization via mini-batch training to reduce communication costs.
– DL supercomputer challenge: large-scale scaling studies are needed to examine whether the convergence rate saturates with increasing mini-batch size (out to thousands of GPUs).
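A minimal single-device sketch of the Keras/TensorFlow + SGD mini-batch training loop referred to above; the tiny model, synthetic data, and batch size are placeholders, not FRNN settings.

```python
import numpy as np
from tensorflow import keras

# Synthetic stand-in for preprocessed zero-D sequences: 256 windows,
# each 128 time steps x 7 signals, with binary disruptive/safe labels.
X = np.random.rand(256, 128, 7).astype("float32")
y = np.random.randint(0, 2, size=(256, 1))

model = keras.Sequential([
    keras.layers.LSTM(32, input_shape=(128, 7)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss="binary_crossentropy", metrics=["accuracy"])

# Mini-batch SGD: each batch of 32 windows yields one gradient update.
model.fit(X, y, batch_size=32, epochs=2)
```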
Machine Learning Workflow
1. Identify signals
2. Preprocessing, normalization & feature extraction: all data placed on an appropriate numerical scale ~ O(1), e.g. data-based signals divided by their standard deviation; measured sequential data arranged in patches of equal length for training.
3. Train model & tune hyperparameters / classifiers: all available data analyzed; train the LSTM (Long Short-Term Memory network) iteratively; evaluate using ROC (Receiver Operating Characteristic) and cross-validation loss for every epoch (one epoch = one pass over the entire data set).
4. Use model for prediction.
Princeton/PPPL DL software is now advancing predictions to multi-D time-trace signals (beyond zero-D) with all new data.
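A small sketch of the preprocessing step described above (scaling each signal by its standard deviation and cutting sequences into equal-length patches); the patch length and data are illustrative assumptions.

```python
import numpy as np

def normalize(signals):
    """Divide each zero-D signal by its standard deviation so all
    channels sit on a comparable ~O(1) numerical scale."""
    std = signals.std(axis=0, keepdims=True)
    return signals / np.where(std > 0, std, 1.0)

def to_patches(signals, patch_len=128):
    """Arrange a (timesteps x n_signals) sequence into equal-length
    patches for iterative LSTM training; the tail remainder is dropped."""
    n = (signals.shape[0] // patch_len) * patch_len
    return signals[:n].reshape(-1, patch_len, signals.shape[1])

shot = np.random.rand(1000, 7)          # one shot: 1000 steps x 7 signals
patches = to_patches(normalize(shot))   # -> (7, 128, 7) training patches
print(patches.shape)
```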
JET Disruption Data
# Shots               Disruptive   Nondisruptive   Totals
Carbon Wall           324          4029            4353
Beryllium Wall (ILW)  185          1036            1221
Totals                509          5065            5574

JET produces ~ a Terabyte (TB) of data per day; ~55 GB of data is collected from each JET shot.
JET studies → 7 signals of zero-D (scalar) time traces:
Signal                            Data Size (GB)
Plasma Current                    1.8
Mode Lock Amplitude               1.8
Plasma Density                    7.8
Radiated Power                    30.0
Total Input Power                 3.0
d/dt Stored Diamagnetic Energy    2.9
Plasma Internal Inductance        3.0
→ Well over 350 TB in total, with multi-dimensional data just recently being analyzed.
Deep Recurrent Neural Networks (RNNs): Basic Description
● "Deep": learn salient representations of complex, higher-dimensional data
● "Recurrent": output h(t) depends on the input x(t) & the internal state s(t-1); the internal state acts as "memory/context"
Image adapted from: colah.github.io
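A bare-bones sketch of the recurrence stated above, with h(t) depending on x(t) and s(t-1); the tanh cell and dimensions are illustrative (an LSTM adds gating on top of this basic idea).

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_state = 7, 16
W_x = rng.normal(scale=0.1, size=(n_state, n_in))     # input weights
W_s = rng.normal(scale=0.1, size=(n_state, n_state))  # recurrent weights
b = np.zeros(n_state)

def step(x_t, s_prev):
    """One recurrent step: new internal state (and output) from the
    current input x(t) and the previous internal state s(t-1)."""
    s_t = np.tanh(W_x @ x_t + W_s @ s_prev + b)
    return s_t  # h(t) = s(t) in this simple cell

s = np.zeros(n_state)
for x_t in rng.normal(size=(100, n_in)):  # unroll over a 100-step sequence
    s = step(x_t, s)
print(s.shape)
```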
Deep Learning/AI FRNN Software Schematic
FRNN architecture:
• LSTM, 3 layers, 300 cells per layer, with an internal state carried from time step to time step (T = 0, 1, ..., t [ms])
Inputs at each time step:
• 0D signals: plasma current, locked mode amplitude, plasma density, internal inductance, input power, radiated power, internal energy, ...
• 1D profile signals (electron temperature, density), passed through a CNN before entering the recurrent layers
Output at each time step: "Disruption coming?" – an alarm is raised when the output exceeds a threshold.
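A hedged Keras sketch of an FRNN-style architecture as described on the slide (1D-CNN features from profile signals concatenated with 0D signals, three stacked LSTM layers of 300 cells, per-time-step sigmoid output). The sequence length, profile resolution, and CNN layer sizes are assumptions for illustration, not the actual FRNN code.

```python
from tensorflow import keras
from tensorflow.keras import layers

seq_len, n_0d, profile_len = 128, 7, 64   # assumed shapes

# 0D scalar signals per time step.
in_0d = keras.Input(shape=(seq_len, n_0d))
# 1D profile signals (e.g. electron temperature vs. radius) per time step.
in_1d = keras.Input(shape=(seq_len, profile_len, 1))

# CNN applied to each time step's profile, then flattened to a feature vector.
cnn = keras.Sequential([
    keras.Input(shape=(profile_len, 1)),
    layers.Conv1D(8, 5, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Conv1D(8, 3, activation="relu"),
    layers.Flatten(),
])
profile_feats = layers.TimeDistributed(cnn)(in_1d)

# Concatenate 0D signals with CNN profile features, then 3 LSTM layers x 300 cells.
x = layers.Concatenate()([in_0d, profile_feats])
for _ in range(3):
    x = layers.LSTM(300, return_sequences=True)(x)

# Per-time-step "disruption coming?" score; alarm when it crosses a threshold.
out = layers.TimeDistributed(layers.Dense(1, activation="sigmoid"))(x)
model = keras.Model([in_0d, in_1d], out)
model.summary()
```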
FRNN Code Performance: ROC Curves – JET ITER-like Wall cases at 30 ms before disruption
Performance tradeoff: tune True Positives (good: correctly caught disruptions) vs. False Positives (bad: safe shots incorrectly labeled disruptive).
Example operating points: TP 93.5% at FP 7.5%; TP 90.0% at FP 5.0%. ROC area: 0.96.
Data (~50 GB), 0D signals:
• Training: 4100 shots from JET C-Wall campaigns
• Testing: 1200 shots from JET ILW campaigns
• All shots used – no signal filtering or removal of shots.
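A small sketch of how a ROC curve and an operating point like those quoted above could be computed with scikit-learn; the scores and labels are synthetic placeholders, not JET results.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, size=1200)                      # 1 = disruptive shot
scores = y_true * 0.5 + rng.uniform(0, 1, size=1200) * 0.7  # per-shot alarm scores

fpr, tpr, thresholds = roc_curve(y_true, scores)
print("ROC area:", roc_auc_score(y_true, scores))

# Pick the alarm threshold that keeps the false-positive rate under 5%.
ok = fpr <= 0.05
best = np.argmax(tpr[ok])
print("threshold:", thresholds[ok][best], "TP rate:", tpr[ok][best])
```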
RNNs: HPC Innovations Engaged
GPU training:
● Neural networks rely on dense tensor manipulations, making efficient use of GPU FLOPS
● Over 10x speedup relative to multicore-node (CPU) training
Distributed training via MPI – linear scaling (a minimal data-parallel sketch appears after this list):
● Key benchmark is "time to accuracy": a model achieving the same results can be trained nearly N times faster with N GPUs
● Scalable to 100s or > 1000s of GPUs on Leadership Class Facilities
● Handles TBs of data and more
● Example: best-model training time on the full dataset (~40 GB, 4500 shots) of 0D signals:
○ SVM (JET): > 24 hrs
○ RNN (20 GPUs): ~40 min
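A minimal sketch of data-parallel synchronous SGD over MPI: each rank computes gradients on its own mini-batch and the gradients are averaged with an allreduce. This illustrates the general idea only; the simple logistic model and gradient code are schematic and not FRNN's actual communication scheme.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(rank)
w = np.zeros(14)                       # shared model weights (simple linear model)
comm.Bcast(w, root=0)                  # all ranks start from identical weights

for step in range(100):
    # Each rank draws its own local mini-batch (stand-in for its shard of shots).
    X = rng.normal(size=(32, 14))
    y = rng.integers(0, 2, size=32)
    p = 1.0 / (1.0 + np.exp(-X @ w))   # logistic prediction
    grad = X.T @ (p - y) / len(y)      # local gradient of the cross-entropy loss

    # Average gradients across all ranks, then take one synchronous SGD step.
    global_grad = np.empty_like(grad)
    comm.Allreduce(grad, global_grad, op=MPI.SUM)
    w -= 0.1 * (global_grad / size)

if rank == 0:
    print("final weight norm:", np.linalg.norm(w))
```

Such a script would be launched with one rank per GPU, e.g. `mpirun -n 4 python train_sketch.py` (a hypothetical file name).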
Scaling Summary
• Communication: each batch of data requires time for synchronization across GPUs
• Runtime: computation time
• Parallel efficiency (a standard definition is sketched below)
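For reference, a commonly used definition of parallel efficiency in terms of single-GPU and N-GPU time-to-solution; this is an assumed convention, since the slide does not spell out its exact metric.

```latex
\[
  E(N) = \frac{T(1)}{N\,T(N)}, \qquad
  \text{ideal (linear) scaling: } T(N) = \frac{T(1)}{N} \;\Rightarrow\; E(N) = 1 .
\]
```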
FRNN Scaling Results on GPUs
• Tests on the OLCF Titan Cray supercomputer (~18.7K Tesla K20X Kepler GPUs in total), using TensorFlow + MPI
• OLCF Director's Discretionary (DD) Award: enabled scaling studies on Titan, currently up to 6000 GPUs