GENERALIZABLE AI: A NEW FOUNDATION
Anima Anandkumar
TRINITY OF AI
• Algorithms
• Compute
• Data
IMPRESSIVE GROWTH OF AI
Wide range of domains:
• Deep reinforcement learning beats human champions
• NVIDIA GAN generates photo-realistic images, passes the Turing test
BUT NOTABLE FAILURES
AI is not living up to its hype:
• Safety-critical applications
• Language understanding
How do we fix these gaps in deep learning?
PATH TO GENERALIZABLE AI
INGREDIENTS OF AN AI ALGORITHM
AI Algorithm = Data + Priors + Task
Covers learning, action, and decision making
DEEP LEARNING STATUS QUO
AI Algorithm = Data + Priors + Task
Data hungry:
• Massive datasets
• Expensive human labeling
DEEP LEARNING STATUS QUO
AI Algorithm = Data + Priors + Task
Not robust:
• Easy to fool current models
• Not domain specific
DEEP LEARNING STATUS QUO
AI Algorithm = Data + Priors + Task
Simplistic:
• Fixed tasks
• Limited benchmarks
NEXT FRONTIER IN AI
AI Algorithm = Data + Priors + Task
Unsupervised data:
• Disentanglement learning
• Domain adaptation
Robust priors:
• Recurrent feedback
• Domain knowledge
• Compositionality
Adaptive tasks:
• Multi-task & domains
• Online and continual learning
BRAIN-INSPIRED ARCHITECTURES WITH RECURRENT FEEDBACK
Yujia Huang, Sihui Dai, James Gornet, Tan Nguyen, Doris Tsao, Zhiding Yu
THE HUMAN BRAIN IS HIERARCHICAL
Adapted from Journal of Vision (2013), 13, 10
HUMAN VISION IS ROBUST
THE BRAIN IS BAYESIAN
COMBINING CLASSIFIER AND GENERATOR THROUGH FEEDBACK CONNECTIONS
GENERATIVE VS. DISCRIMINATIVE CLASSIFIER
• Discriminative: p(x, y) → p(y | x), e.g. logistic regression
• Generative with latents: p(x, y, z) → p(y | x, z), e.g. Gaussian mixture, deconvolutional CNN generative model
MESSAGE PASSING NETWORK
[Diagram: a feedforward CNN maps the image through feedforward layers to a soft label; CNN-F adds generative feedback layers that carry latent variables back down]
SELF-CONSISTENCY THROUGH RECURRENT FEEDBACK
CNN-F: initialization, iteration 1, iteration 2, …
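The self-consistency loop can be illustrated with a toy numerical sketch. This is not the actual CNN-F architecture: the `feedforward` and `feedback` maps below are simple linear stand-ins for the CNN and its generative counterpart, chosen only to show the alternating bottom-up/top-down iteration.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 5))  # toy "classifier" weights: 3 classes, 5 features

def feedforward(x):
    """Bottom-up pass: input features -> soft label (softmax over logits)."""
    logits = W @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

def feedback(y):
    """Top-down pass: soft label -> reconstructed features (toy linear generator)."""
    return W.T @ y

def self_consistent_label(x, n_iter=10):
    """Alternate feedforward and feedback passes, re-classifying the
    generated input each round, in the spirit of CNN-F's iterations."""
    y = feedforward(x)
    for _ in range(n_iter):
        x_rec = feedback(y)      # generate an input consistent with the current belief
        y = feedforward(x_rec)   # re-classify the generated input
    return y

y = self_consistent_label(rng.normal(size=5))
```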
CNN-F CAN REPAIR DISTORTED IMAGES WITHOUT SUPERVISION
Corruptions: shot noise, Gaussian noise, dotted lines (corrupted images vs. ground truth)
CNN-F IMPROVES ADVERSARIAL ROBUSTNESS
• Standard training on Fashion-MNIST
• Attack with PGD-40
• CNN-F has higher adversarial robustness than CNN
CNN-F COMBINED WITH ADVERSARIAL TRAINING
• Adversarial training on Fashion-MNIST
• Trained with PGD-40 (eps = 0.3); attacked with PGD-40
• CNN-F augmented with adversarial images achieves high accuracy on both clean and adversarial data
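The PGD attack used in these experiments can be sketched for a toy linear classifier with an analytic gradient. The model, data, and step sizes below are illustrative assumptions, not the Fashion-MNIST setup: each step ascends the loss along the gradient sign, then projects back into the eps-ball.

```python
import numpy as np

def pgd_attack(x0, y, w, eps=0.3, step=0.05, n_steps=40):
    """Projected gradient descent on a binary logistic loss for a linear
    model w (toy sketch). y is the true label in {-1, +1}."""
    x = x0.copy()
    for _ in range(n_steps):
        margin = y * (w @ x)
        # d/dx log(1 + exp(-margin)) = -y * w * sigmoid(-margin)
        grad = -y * w / (1.0 + np.exp(margin))
        x = x + step * np.sign(grad)         # ascend the loss
        x = x0 + np.clip(x - x0, -eps, eps)  # project onto the L-inf eps-ball
    return x

w = np.array([1.0, -2.0, 0.5])    # illustrative classifier weights
x0 = np.array([0.2, -0.1, 0.4])   # clean input
x_adv = pgd_attack(x0, y=1, w=w)
```

With 40 steps of size 0.05 the perturbation saturates the 0.3 ball in every coordinate, which is why eps-bounded projection is essential.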
CNN-F HAS HIGHER BRAIN SCORE
Feedback is biologically more plausible
TAKE-AWAYS
Recurrent generative feedback for robust learning:
• The human brain has feedback pathways for top-down inference
• Internal generative model of the world
• Bayesian brain: bottom-up feedforward + top-down feedback
• Robustness is inherent in CNN-F
• Biological plausibility of CNN-F
NEURO-SYMBOLIC SYSTEMS FOR COMPOSITIONAL REASONING
Forough Arabshahi, Sameer Singh
SYMBOLISTS VS. CONNECTIONISTS
Comparison axes: representation, extraction, explainability, generalization & knowledge coverage, extrapolation
TYPES OF TRAINING EXAMPLES
• Symbolic expressions: sin²(θ) + cos²(θ) = 1
• Function evaluation: sin(−2.5) = −0.6
• Number encoding: decimal tree for 2.5
CONTINUOUS REPRESENTATIONS FOR REASONING
Symbols, numbers, and functions (e.g. θ, sin, cos, ×, 2.45) are represented in a common embedding space, so that expressions such as 3 + 4, 7 × 1.1, 1 − sin²(θ), and cos²(θ) can be compared and composed
TREE-LSTM FOR COMPOSITIONALITY
EQUATION VERIFICATION
[Chart: verification accuracy on generalization and extrapolation for Sympy, LSTM, and Tree-LSTM; accuracies range from 72% to 97% across the three methods and two settings]
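Equation verification can be sanity-checked numerically, mirroring the function-evaluation training examples: sample points and compare both sides of a candidate identity. This is only a toy baseline check, not the learned Tree-LSTM verifier, and `numerically_equal` is an illustrative helper name.

```python
import math
import random

def numerically_equal(lhs, rhs, n_samples=100, tol=1e-9):
    """Test a candidate identity lhs(t) == rhs(t) at random sample points.
    Passing the test suggests (but does not prove) the identity holds."""
    random.seed(0)
    for _ in range(n_samples):
        t = random.uniform(-10, 10)
        if abs(lhs(t) - rhs(t)) > tol:
            return False
    return True

# sin^2(t) + cos^2(t) = 1 is an identity; sin(2t) = 2 sin(t) is not
assert numerically_equal(lambda t: math.sin(t)**2 + math.cos(t)**2, lambda t: 1.0)
assert not numerically_equal(lambda t: math.sin(2 * t), lambda t: 2 * math.sin(t))
```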
AUGMENTING WITH STACK MEMORY
Differentiable memory for extrapolation to harder examples
Train: depth 1–7; Test: depth 8–13
TAKE-AWAYS
Neuro-symbolic systems for compositional learning:
• Math reasoning tasks
• Combine symbolic expressions and numerical data
• Generalizable and composable representation of functions
• Differentiable memory stack for extrapolation to harder examples
AI4SCIENCE: ROLE OF PRIORS
Learning = Data + Priors
How to use structure and domain knowledge to design priors?
Examples of priors:
• Tensors and graphs
• Laws of nature
• Simulations
AUTONOMOUS DYNAMIC ROBOTS AT CAST, CALTECH
LEARNING RESIDUAL DYNAMICS FOR DRONE LANDING
x_{t+1} = f(x_t, u_t) + f̃(x_t, u_t) + ε
where x_t is the current state, u_t the current action (control input), f the nominal dynamics, f̃ the learned dynamics, and ε an unmodeled disturbance.
Our method:
• Provably robust and safe
• Generalizes to higher landing speeds
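As a minimal sketch of this decomposition, assuming a 1-D toy system (not the actual drone model), the residual f̃ can be fit by least squares on the gap between observed transitions and the nominal model:

```python
import numpy as np

def f_nominal(x, u):
    """Nominal dynamics: the known linear model."""
    return 0.9 * x + 0.1 * u

def f_true(x, u):
    """'True' dynamics: nominal plus an unmodeled quadratic drag term."""
    return f_nominal(x, u) - 0.05 * x**2

# Collect transitions from the true system
rng = np.random.default_rng(0)
xs = rng.uniform(-2, 2, size=200)
us = rng.uniform(-1, 1, size=200)
x_next = f_true(xs, us)

# Fit the residual (x_next - f_nominal) on simple polynomial features
features = np.stack([xs, us, xs**2], axis=1)
residuals = x_next - f_nominal(xs, us)
coef, *_ = np.linalg.lstsq(features, residuals, rcond=None)

def f_learned(x, u):
    """Learned residual term f~(x, u)."""
    return coef @ np.array([x, u, x**2])

# One-step prediction error: nominal only vs. nominal + learned residual
x, u = 1.5, 0.3
err_nominal = abs(f_true(x, u) - f_nominal(x, u))
err_residual = abs(f_true(x, u) - (f_nominal(x, u) + f_learned(x, u)))
```

Because the toy residual lies exactly in the feature span, least squares recovers it; the real system instead uses a deep network and gives robustness guarantees on the residual.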
CAST @ CALTECH: LEARNING TO LAND
QUANTUM FEATURES IN CHEMISTRY
MOB-ML features: a universal mapping in chemical space (pair correlation energy vs. MOB feature value)
M. Welborn, Z. Qiao, F. R. Manby, A. Anandkumar, T. F. Miller III, arXiv:2007.08026
ORBNET: MOB + GRAPH NEURAL NETWORKS
M. Welborn, Z. Qiao, F. R. Manby, A. Anandkumar, T. F. Miller III, arXiv:2007.08026
ORBNET: 1000× SIMULATION SPEED-UP
Quantum-mechanical accuracy at semi-empirical cost
[Chart: drug-molecule conformer-stability ranking accuracy (R²) vs. time-to-solution, comparing force fields (UFF, MMFF94, GAFF), semi-empirical methods (GFN0–GFN2, PM7), DFT methods (B97-3c, PBE-D3(BJ), B3LYP-D3(BJ), PBEH-3c, ωB97X-D3), and machine-learning methods (ANI-1x, ANI-1ccx, ANI-2x, OrbNet); OrbNet approaches DFT accuracy at far lower cost]
• Test data: drug-like molecules with 10–50 heavy atoms
• Zero-shot generalization: tested on molecules ~10× larger
M. Welborn, Z. Qiao, F. R. Manby, A. Anandkumar, T. F. Miller III, arXiv:2007.08026
STATE-OF-THE-ART DATA EFFICIENCY
MOB-ML vs. others for water: MAE (kcal/mol) vs. number of training molecules
MOB-ML works across solvents (RMSE in kcal/mol):
• Benzene: 50 training molecules, 0.57
• Carbon tetrachloride: 50, 0.40
• Chloroform: 40, 0.84
• Cyclohexane: 30, 0.44
• Diethyl ether: 40, 0.58
• Hexadecane: 40, 0.57
• Octanol: 50, 0.84
MoleculeNet: Chem. Sci., 2018, 9, 513; MPNN: J. Chem. Inf. Model., 2019, 59, 3370
LEARNING FAMILIES OF PDES
Many problems in science and engineering reduce to PDEs. Operator learning: learn the mapping from PDE parameters to the solution.
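A minimal sketch of operator learning, assuming a toy 1-D Poisson problem and a plain linear least-squares "operator" (not the multipole graph neural operator): learn the mapping from forcing function f to solution u from sampled pairs, then evaluate on an unseen forcing.

```python
import numpy as np

# Ground truth: solve -u'' = f on [0, 1] with u(0) = u(1) = 0 by finite differences
n = 32
h = 1.0 / (n + 1)
A = (np.diag(2 * np.ones(n))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2

def solve(f):
    """Numerical solution operator for the toy PDE."""
    return np.linalg.solve(A, f)

# "Operator learning": fit the mapping f -> u from sampled (f, u) pairs
rng = np.random.default_rng(0)
F = rng.normal(size=(200, n))               # random forcing functions
U = np.stack([solve(f) for f in F])          # corresponding solutions
G, *_ = np.linalg.lstsq(F, U, rcond=None)    # learned linear operator

# Evaluate on a forcing function not seen during fitting
f_new = np.sin(np.pi * np.linspace(h, 1 - h, n))
err = np.abs(f_new @ G - solve(f_new)).max()
```

Here the true operator is exactly linear, so least squares recovers it; neural operators target the nonlinear case and, as on the next slide, transfer across resolutions.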
MULTIPOLE GRAPHS • Multi-scale graphs to capture different ranges of interaction • Linear complexity
EXPERIMENTAL RESULTS
• Graph neural networks for operator learning
• Super-resolution and generalization within a family of PDEs
• Burgers' equation
TAKE-AWAYS
Domain knowledge augments deep learning:
• Black-box deep learning is unsuitable for scientific domains
• Lack of labeled data and robustness
• Domain knowledge can tailor learning to the problem
• What is the right mix of priors + deep learning?
UNSUPERVISED LEARNING
DISENTANGLEMENT LEARNING
Learning latent variables that disentangle data
DISENTANGLED GENERATION
• Semi-supervised learning with very little labeled data (1% and 0.5% labeled)
• Both unsupervised and supervised auxiliary losses help disentanglement
https://sites.google.com/nvidia.com/semi-stylegan
SELF-SUPERVISED LEARNING
Robust measures of confidence:
• Data invariances provide supervision
• Self-training with pseudo-labels
• Need a confidence measure to select pseudo-labels
DOMAIN ADAPTATION THROUGH SELF-TRAINING
Alternate pseudo-label generation and network re-training:
• A deep CNN trained on source labels (GTA5) generates pseudo-labels on target images (Cityscapes)
• The network is re-trained on the pseudo-labels; target predictions improve from the 1st to the 2nd round
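The self-training loop can be sketched with a toy 1-D nearest-centroid classifier (illustrative only, not the GTA5→Cityscapes segmentation pipeline): train on labeled source data, pseudo-label the shifted target data, and re-train on the union.

```python
import numpy as np

# Labeled source data (two 1-D clusters) and unlabeled, shifted target data
src_x = np.array([-2.2, -2.0, -1.8, 1.8, 2.0, 2.2])
src_y = np.array([0, 0, 0, 1, 1, 1])
tgt_x = np.array([-1.2, -1.0, -0.8, 2.8, 3.0, 3.2])  # same classes, shifted by +1
tgt_y = np.array([0, 0, 0, 1, 1, 1])                  # held out, for evaluation only

def centroids(x, y):
    """Per-class means act as the 'model'."""
    return np.array([x[y == 0].mean(), x[y == 1].mean()])

def predict(x, c):
    """Assign each point to its nearest centroid."""
    return (np.abs(x - c[1]) < np.abs(x - c[0])).astype(int)

c = centroids(src_x, src_y)              # round 0: train on source only
for _ in range(2):                       # self-training rounds
    pseudo = predict(tgt_x, c)           # generate pseudo-labels on target
    c = centroids(np.concatenate([src_x, tgt_x]),       # re-train on source labels
                  np.concatenate([src_y, pseudo]))      # + target pseudo-labels

acc = (predict(tgt_x, c) == tgt_y).mean()
```

Re-training pulls the centroids toward the target distribution, which is the mechanism the slide's two-round pipeline exploits at scale.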
ANGULAR MEASURE IMPROVES SELF-TRAINING
Angular measure of hardness for sample selection in self-training
https://sites.google.com/nvidia.com/avh
TASK ADAPTATION AND GENERALIZATION
Hongyu Ren, Yuke Zhu, Animesh Garg
META-REINFORCEMENT LEARNING
• Agents should be versatile: given a new task, they should quickly adapt
• Each task is complex and requires finishing a sequence of sub-tasks
The agent's policy π_θ(a|s) interacts with a distribution of tasks through states, actions, and rewards
META-REINFORCEMENT LEARNING
Task inference is key: what is the current task? Unsupervised learning infers the task as the policy π_θ(a|s) interacts with the distribution of tasks