Deep Learning Acceleration of Progress Toward Delivery of Fusion Energy
William M. Tang, Princeton University / Princeton Plasma Physics Laboratory (PPPL)
GPU TECHNOLOGY CONFERENCE (GTC-2017), San Jose, California, May 10, 2017
Co-authors: Julian Kates-Harbeck, Alexey Svyatkovskiy, Kyle Felker, Eliot Feibush, Michael Churchill
CNN’s “MOONSHOTS for the 21st CENTURY” (hosted by Fareed Zakaria) – five segments (broadcast in Spring 2015 on CNN) exploring “exciting futuristic endeavors in science & technology” in the 21st century:
(1) Human Mission to Mars
(2) 3D Printing of a Human Heart
(3) Creating a Star on Earth: Quest for Fusion Energy
(4) Hypersonic Aviation
(5) Mapping the Human Brain
CNN Moonshots Series: “Creating a Star on Earth” → “takes a fascinating look at how harnessing the energy of nuclear fusion reactions may create a virtually limitless energy source.”
Application Domain: MAGNETIC FUSION ENERGY (MFE)
“Tokamak” device: plasma confined by magnets producing a magnetic field
ITER: ~$25B facility located in France, involving 7 governments representing over half of the world’s population → a dramatic next step for Magnetic Fusion Energy (MFE), producing a sustained burning plasma
-- Today: 10 MW(th) for 1 second with gain ~1
-- ITER: 500 MW(th) for >400 seconds with gain >10
SITUATION ANALYSIS
Most critical problem for MFE: avoid/mitigate large-scale major disruptions
• Approach: big-data-driven statistical/machine-learning (ML) prediction of the occurrence of disruptions in the EUROfusion facility “Joint European Torus (JET)”
• Current Status: ~8 years of R&D results (led by JET) using Support Vector Machine (SVM) ML on zero-D time-trace data executed on CPU clusters, yielding reported success rates in the mid-80% range for JET 30 ms before disruptions; BUT >95% success with a false-alarm rate <3% is actually needed for ITER (Reference: P. de Vries, et al., 2015)
• Princeton Team Goals: (i) improve physics fidelity via development of new multi-D, time-dependent ML software, including better classifiers; (ii) develop “portable” (cross-machine) predictive software beyond JET to other devices and eventually ITER; and (iii) enhance execution speed of disruption analysis for very large datasets → development & deployment of advanced ML software via Deep Learning Recurrent Neural Networks
Plasma Disruption Characteristics
Large-scale macroscopic instabilities:
• Loss of confinement – ends the fusion reaction
• Intense radiation – damaging concentration in small areas
• Current quench – produces high magnetic forces
Time Scale: milliseconds (ms) → need at least 30 ms warning to mitigate → accurate, rapid prediction is necessary
Consequences: more severe with higher volume-to-surface-area ratio → ITER cannot tolerate disruptions at maximum current!
Present-Day Approaches: hypothesis-based first-principles simulations; simple statistical/threshold models with regression analysis; and “shallow machine learning” (e.g., small NNs, SVMs, Random Forests, …)
Challenges & Opportunities
Higher-Dimensional Signals
• At each timestep: arrays instead of scalars
• All as a function of ρ (normalized flux surface coordinate, running from ρ = 0 to ρ = 1)
• Examples:
– 1D current profiles
– 1D electron temperature profiles
– 1D radiation profiles
Mazon, Didier, Christel Fenzi, and Roland Sabot. “As hot as it gets.” Nature Physics 12.1 (2016): 14-17.
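As a toy illustration of the jump from zero-D to 1D signals, each timestep now carries a whole profile over ρ rather than a single scalar. The profile shape and values below are invented for illustration only:

```python
import numpy as np

rho = np.linspace(0.0, 1.0, 5)   # normalized flux surface coordinate, 0 to 1
n_timesteps = 3

# A hypothetical electron-temperature profile per timestep (arbitrary units),
# peaked at rho = 0 and falling to zero at rho = 1.
te_profiles = np.stack(
    [(1.0 + 0.1 * t) * (1.0 - rho**2) for t in range(n_timesteps)]
)
# te_profiles.shape == (3, 5): one 5-point profile per timestep,
# versus shape (3,) for a zero-D scalar time trace
```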
Challenges & Opportunities
Signal Normalization & Outlier Detection
• All signals placed on an appropriate numerical scale ~O(1)
• Rescale signals from different experimental systems (tokamaks) so that the same “meaning” of a signal on the various machines gets mapped to the same numerical value after rescaling
Approaches:
– Physics-based (e.g., density divided by the empirical “Greenwald density limit”)
– Data-based (e.g., all signals divided by their standard deviation)
Challenge: need rapid training time to determine the best approach from these options
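The two normalization approaches above can be sketched in a few lines. The signal values and the Greenwald-limit number here are invented placeholders, not JET data:

```python
import numpy as np

# Hypothetical zero-D time traces (values are illustrative only)
density = np.array([0.8e20, 1.1e20, 1.5e20])        # electron density, m^-3
radiated_power = np.array([2.0e6, 5.0e6, 9.0e6])    # W

def physics_normalize(n_e, greenwald_limit):
    """Physics-based scaling: express density as a fraction of the
    machine-specific Greenwald density limit, so the same fraction
    carries the same meaning on different tokamaks."""
    return n_e / greenwald_limit

def data_normalize(signal):
    """Data-based scaling: divide a signal by its standard deviation
    so all signals land on a comparable O(1) numerical scale."""
    return signal / signal.std()

n_frac = physics_normalize(density, greenwald_limit=2.0e20)
p_scaled = data_normalize(radiated_power)   # std(p_scaled) == 1 by construction
```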
DEEP LEARNING RECURRENT NEURAL NET (RNN) APPROACH
Julian Kates-Harbeck, DOE CSGF Fellow from Harvard U. → rapid development of new GPU-compatible predictive software, with results benchmarked against those from extensive SVM analysis
Most promising approach to analysis of higher-dimensional signals: Deep Learning RNN with rapid training
1D targets: (i) radial temperature profiles; (ii) density profiles; and (iii) radiation profiles
DL RNN Benefits:
-- Captures more physics to improve predictive accuracy
-- Rapid progress toward addressing the challenges of more data and longer training time → modern HPC training (e.g., via GPUs & MPI)
-- Neural networks efficiently extract salient physics features from higher-D data
-- Associated timely improvements in accuracy of ML/DL predictions
CLASSIFICATION
● Binary Classification Problem:
○ Shots are Disruptive (D) or Non-Disruptive (ND)
● Supervised ML techniques:
○ Physics domain scientists combine a knowledge base of observationally validated information with advanced statistical/ML predictive methods. Shots can be labeled D/ND retrospectively.
● Machine Learning (ML) Methods Engaged: basic SVM approach initiated by the JET team, leading to the APODIS software → enabled efficient, rapid progress toward development & deployment at PPPL of new Deep Learning Recurrent Neural Net (stacked LSTM) software
● Approach: (i) examine appropriately normalized data; (ii) use a training set to generate a model; (iii) use the trained model to classify new samples
→ Targeted multi-D data analysis requires new signal representations
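To make the three-step approach concrete (normalized data → training set → classify new samples), here is a deliberately tiny stand-in classifier — nearest class mean, not the SVM or stacked LSTM used in the actual work — with invented two-feature shot descriptors:

```python
import numpy as np

def fit(X, y):
    """Compute one centroid per class from labeled training shots."""
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def classify(model, x):
    """Assign a new sample to the class with the nearest centroid."""
    return min(model, key=lambda label: np.linalg.norm(x - model[label]))

# Hypothetical normalized features for four shots, labeled retrospectively
X_train = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.8]])
y_train = np.array(["ND", "ND", "D", "D"])   # D = disruptive

model = fit(X_train, y_train)
print(classify(model, np.array([0.95, 0.9])))   # → "D"
```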
Machine Learning Workflow
1. Identify signals and classifiers
2. Preprocessing, normalization, and feature extraction: all data placed on an appropriate numerical scale ~O(1) (e.g., data-based, with all signals divided by their standard deviation); measured sequential data arranged in patches of equal length for training
3. Train model and tune hyperparameters: train an LSTM (Long Short-Term Memory network) iteratively; evaluate using ROC (Receiver Operating Characteristic) curves and cross-validation loss for every epoch (one pass through the entire dataset per iteration)
4. Use model for prediction: Princeton/PPPL DL software now advancing predictions to multi-D time-trace signals (beyond zero-D); apply ML/DL predictions to all new data; all available data analyzed
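The preprocessing step of arranging sequential measurements into equal-length patches can be sketched as follows; the patch length and toy trace are invented for illustration:

```python
import numpy as np

def make_patches(signal, patch_len):
    """Arrange one sequential time trace into non-overlapping patches of
    equal length for training, dropping any leftover samples at the end."""
    n = len(signal) // patch_len
    return signal[: n * patch_len].reshape(n, patch_len)

trace = np.arange(10.0)                  # a toy zero-D time trace
patches = make_patches(trace, patch_len=3)
# patches.shape == (3, 3); the trailing sample (9.0) is dropped
```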
JET Disruption Data

# Shots                 Disruptive   Nondisruptive   Totals
Carbon Wall             324          4029            4353
Beryllium Wall (ILW)    185          1036            1221
Totals                  509          5065            5574

JET produces ~1 Terabyte (TB) of data per day; ~55 GB of data collected from each JET shot.

Sample of 7 zero-D time-trace signals:

Signal                              Data Size (GB)
Plasma Current                      1.8
Mode Lock Amplitude                 1.8
Plasma Density                      7.8
Radiated Power                      30.0
Total Input Power                   3.0
d/dt Stored Diamagnetic Energy      2.9
Plasma Internal Inductance          3.0

→ Well over 350 TB total, with multi-dimensional data yet to be analyzed
Deep Recurrent Neural Networks (RNNs): Basic Description
● “Deep”
○ Hierarchical representation of complex data, building up salient features automatically
○ Obviates the need for hand tuning, feature engineering, and feature selection
● “Recurrent”
○ Natural notion of time and memory → i.e., at every timestep, the output depends on:
■ the last internal state s(t-1) (recurrence!)
■ the current input x(t)
○ The internal state (“memory/context”) can act as memory and accumulate information about what has happened in the past
Image adapted from: colah.github.io
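The recurrence described above — new state from last state s(t-1) plus current input x(t) — can be written as a single update rule. This is a generic vanilla-RNN cell in NumPy with random toy weights, not the stacked LSTM used in FRNN:

```python
import numpy as np

def rnn_step(s_prev, x_t, W_s, W_x, b):
    """One recurrent update: the new internal state depends on both the
    last internal state s(t-1) and the current input x(t)."""
    return np.tanh(W_s @ s_prev + W_x @ x_t + b)

rng = np.random.default_rng(0)
W_s = rng.normal(size=(4, 4))   # state-to-state weights (toy values)
W_x = rng.normal(size=(4, 2))   # input-to-state weights (toy values)
b = np.zeros(4)

s = np.zeros(4)                 # initial "memory"
for x_t in [np.ones(2), np.zeros(2), np.ones(2)]:
    s = rnn_step(s, x_t, W_s, W_x, b)   # state accumulates past inputs
```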
FRNN (“Fusion Recurrent Neural Net”) Code Performance (ROC Plot)
Performance tradeoff: tune True Positives (good: correctly caught disruption) vs. False Positives (bad: safe shot incorrectly labeled disruptive).
RNN results (True Positives: 93.5%; False Positives: 7.5%):
● Testing on 1200 shots from JET ILW campaigns (C28-C30)
● All shots used; no signal filtering or removal of shots
JET SVM* results (True Positives: 90.0%; False Positives: 5.0%):
● 990 shots from the same campaigns
● Filtering of signals; ad hoc removal of shots with abnormal signals
● TP 80 to 90%, FP 5%
*Vega, Jesús, et al. “Results of the JET real-time disruption predictor in the ITER-like wall campaigns.” Fusion Engineering and Design 88.6 (2013): 1228-1231.
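The true-positive/false-positive tradeoff behind an ROC plot comes from sweeping an alarm threshold over per-shot risk scores. A minimal sketch with invented scores and labels (not FRNN output):

```python
import numpy as np

# Hypothetical per-shot model scores and ground-truth labels (1 = disruptive)
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2])
labels = np.array([1,   1,   0,   1,   0,   0])

def roc_point(scores, labels, threshold):
    """One point on the ROC curve for a given alarm threshold."""
    alarm = scores >= threshold
    tp_rate = np.mean(alarm[labels == 1])   # disruptions correctly caught
    fp_rate = np.mean(alarm[labels == 0])   # safe shots falsely flagged
    return tp_rate, fp_rate

tp, fp = roc_point(scores, labels, threshold=0.5)
# With these toy numbers: tp = 2/3, fp = 1/3; lowering the threshold
# raises both rates, which is exactly the tradeoff being tuned
```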
RNNs: HPC Innovations Engaged
GPU training:
● Neural networks use dense tensor manipulations, making efficient use of GPU FLOPS
● Over 10x speedup compared with multicore-node (CPU) training
Distributed training via MPI — linear scaling:
● Key benchmark of “time to accuracy”: we can train a model that achieves the same results nearly N times faster with N GPUs
Scalable:
● to 100s or 1000s of GPUs on Leadership Class Facilities
● TBs of data and more
● Example: best model training time on the full dataset (~40 GB, 4500 shots) of 0D signals:
○ SVM (JET): >24 hours
○ RNN (20 GPUs): ~40 minutes
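The distributed-training idea can be sketched in-process: each “worker” (one GPU in the real setup) computes a gradient on its own data shard, and the gradients are averaged every step — the role MPI_Allreduce plays across nodes. The quadratic loss and data here are toy stand-ins, not FRNN’s training loop:

```python
import numpy as np

def local_gradient(w, shard):
    """Gradient of the local loss mean((w - x)^2) on one worker's shard:
    d/dw = 2 * (w - mean(shard))."""
    return 2.0 * (w - shard.mean())

data = np.arange(12.0)           # toy "full dataset"
shards = np.split(data, 4)       # one shard per simulated worker

w = 0.0
for _ in range(200):
    grads = [local_gradient(w, s) for s in shards]   # parallel in reality
    w -= 0.1 * np.mean(grads)    # averaged gradient = one synchronous step
# w converges to the global data mean (5.5), matching single-worker training
```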