  1. Monte Carlo Methods and Neural Networks
     Noah Gamboa and Alexander Keller

  2. Neural Networks: Fully connected layers
     • neurons
     [figure: fully connected network with neurons a_{l,i} for layers l = 0, …, L and units i = 0, …, n_l − 1]

  3. Neural Networks: Fully connected layers
     • neurons compute max{0, ∑_j w_j a_{i,j}} (see the sketch below)
     [figure: fully connected network with neurons a_{l,i} for layers l = 0, …, L and units i = 0, …, n_l − 1]
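
     A minimal sketch of this neuron computation in NumPy (illustrative only, not the authors' code; the names relu_layer, W, and a_prev are assumptions):

         import numpy as np

         def relu_layer(a_prev, W):
             """Fully connected layer: neuron i computes max{0, sum_j W[i, j] * a_prev[j]}."""
             return np.maximum(0.0, W @ a_prev)

         # toy usage: a layer with 4 inputs and 3 neurons
         rng = np.random.default_rng(0)
         W = rng.standard_normal((3, 4))
         a_prev = rng.standard_normal(4)
         a_next = relu_layer(a_prev, W)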

  4. Monte Carlo Methods all over Neural Networks: Examples
     • dropout
     • DropConnect
     • stochastic binarization
     • stochastic gradient descent
     • fixed pseudo-random matrices for direct feedback alignment
     • ...

  5. Monte Carlo Methods all over Neural Networks: Observations
     • the brain
       – about 10^11 nerve cells with up to 10^4 connections to others
       – much more energy efficient than a GPU

  6. Monte Carlo Methods all over Neural Networks: Observations
     • the brain
       – about 10^11 nerve cells with up to 10^4 connections to others
       – much more energy efficient than a GPU
     • artificial neural networks
       – rigid layer structure
       – expensive to scale in depth
       – fully connected, yet only partially trained

  7. Monte Carlo Methods all over Neural Networks: Observations
     • the brain
       – about 10^11 nerve cells with up to 10^4 connections to others
       – much more energy efficient than a GPU
     • artificial neural networks
       – rigid layer structure
       – expensive to scale in depth
       – fully connected, yet only partially trained
     • goal: explore algorithms linear in time and space

  8. Partition instead of Dropout

  9. Partition instead of Dropout: Guaranteeing coverage of neural units
     • so far: drop out a neuron if threshold t > ξ
       – ξ by a linear feedback shift register generator (for example)

  10. Partition instead of Dropout: Guaranteeing coverage of neural units
     • so far: drop out a neuron if threshold t > ξ
       – ξ by a linear feedback shift register generator (for example)
     • now: assign each neuron to partition p = ⌊ξ · P⌋ out of P
       – fewer random number generator calls
       – all neurons guaranteed to be considered

  11. Partition instead of Dropout: Guaranteeing coverage of neural units
     • so far: drop out a neuron if threshold t > ξ
       – ξ by a linear feedback shift register generator (for example)
     • now: assign each neuron to partition p = ⌊ξ · P⌋ out of P (see the sketch below)
       – fewer random number generator calls
       – all neurons guaranteed to be considered

     LeNet on MNIST       dropout, averaged over t = 1/2 … 1/9   partitions, averaged over P = 2 … 9
     Mean accuracy        0.6062                                 0.6057
     StdDev accuracy      0.0106                                 0.009
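
     A minimal sketch of the partition assignment (the function name and the cycling over partitions by training step are my assumptions, not the authors' code):

         import numpy as np

         def active_mask(n_neurons, P, step, rng):
             """Assign each neuron to partition p = floor(xi * P) with a single
             uniform draw, then activate partition (step mod P); over P steps
             every neuron is guaranteed to be considered, unlike plain dropout."""
             xi = rng.random(n_neurons)          # one random number per neuron
             p = np.floor(xi * P).astype(int)    # partition index in {0, ..., P-1}
             return p == (step % P)              # boolean mask of active neurons

         rng = np.random.default_rng(1)
         mask = active_mask(100, P=4, step=0, rng=rng)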

  12. Partition instead of Dropout: Training accuracy with LeNet on MNIST
     [plot: accuracy (≈ 0.2–0.6) vs. epoch of training (0–150); 2 dropout partitions vs. 1/2 dropout]

  13. Partition instead of Dropout: Training accuracy with LeNet on MNIST
     [plot: accuracy (≈ 0.2–0.6) vs. epoch of training (0–150); 3 dropout partitions vs. 1/3 dropout]

  14. Simulating Discrete Densities

  15. Simulating Discrete Densities: Stochastic evaluation of scalar product
     • discrete density approximation of the weights
     [figure: weights as a discrete density on the unit interval [0, 1]]

  16. Simulating Discrete Densities: Stochastic evaluation of scalar product
     • discrete density approximation of the weights
     [figure: weights as a discrete density on the unit interval [0, 1]]

  17. Simulating Discrete Densities: Stochastic evaluation of scalar product
     • discrete density approximation of the weights
       – remember to flip the sign accordingly
     [figure: weights as a discrete density on the unit interval [0, 1]]

  18. Simulating Discrete Densities: Stochastic evaluation of scalar product
     • discrete density approximation of the weights
       – remember to flip the sign accordingly
       – transform jittered equidistant samples using the cumulative distribution function of the absolute values of the weights (see the sketch below)
     [figure: weights as a discrete density on the unit interval [0, 1]]
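
     A minimal sketch of this sampling step and of the sign flip (a NumPy illustration under assumed names; the slide only specifies pushing jittered equidistant samples through the CDF):

         import numpy as np

         def sample_indices(w, n, rng):
             """Indices distributed proportionally to |w_j|: transform jittered
             equidistant samples by the CDF of the normalized absolute weights."""
             cdf = np.cumsum(np.abs(w) / np.abs(w).sum())
             u = (np.arange(n) + rng.random(n)) / n      # one jittered sample per stratum
             i = np.searchsorted(cdf, u, side="right")   # invert the CDF
             return np.minimum(i, len(w) - 1)            # guard against round-off

         def stochastic_dot(w, a, n, rng):
             """Monte Carlo estimate of sum_j w_j a_j; the sign flip accounts
             for sampling from |w| instead of w."""
             i = sample_indices(w, n, rng)
             return np.abs(w).sum() * np.mean(np.sign(w[i]) * a[i])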

  19. Simulating Discrete Densities: Stochastic evaluation of scalar product
     • partition of the unit interval by the sums P_k := ∑_{j=1}^{k} |w_j| of the normalized absolute weights, 0 = P_0 < P_1 < ⋯ < P_m = 1

  20. Simulating Discrete Densities: Stochastic evaluation of scalar product
     • partition of the unit interval by the sums P_k := ∑_{j=1}^{k} |w_j| of the normalized absolute weights, 0 = P_0 < P_1 < ⋯ < P_m = 1
       – using a uniform random variable ξ ∈ [0, 1), we select neuron i with P_{i−1} ≤ ξ < P_i, satisfying Prob({P_{i−1} ≤ ξ < P_i}) = |w_i|

  21. Simulating Discrete Densities: Stochastic evaluation of scalar product
     • partition of the unit interval by the sums P_k := ∑_{j=1}^{k} |w_j| of the normalized absolute weights, 0 = P_0 < P_1 < ⋯ < P_m = 1
       – using a uniform random variable ξ ∈ [0, 1), we select neuron i with P_{i−1} ≤ ξ < P_i, satisfying Prob({P_{i−1} ≤ ξ < P_i}) = |w_i|
       – transform jittered equidistant samples using the cumulative distribution function of the absolute values of the weights

  22. Simulating Discrete Densities: Stochastic evaluation of scalar product
     • partition of the unit interval by the sums P_k := ∑_{j=1}^{k} |w_j| of the normalized absolute weights, 0 = P_0 < P_1 < ⋯ < P_m = 1
       – using a uniform random variable ξ ∈ [0, 1), we select neuron i with P_{i−1} ≤ ξ < P_i, satisfying Prob({P_{i−1} ≤ ξ < P_i}) = |w_i|
       – transform jittered equidistant samples using the cumulative distribution function of the absolute values of the weights
     • in fact, a derivation of quantization to weights in {−1, 0, +1} (see the sketch below)
       – integer weights if a neuron is referenced more than once
       – explains why ternary and binary quantization did not work in some articles
       – related to DropConnect and dropout, too
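
     A minimal sketch of how counting sample multiplicities yields quantized integer weights (the names and the returned scale factor are my assumptions):

         import numpy as np

         def quantize_by_sampling(w, n, rng):
             """Count how often each weight is referenced when sampling
             proportionally to |w_j|: the signed counts are integers, and with
             at most one reference per neuron they lie in {-1, 0, +1}."""
             cdf = np.cumsum(np.abs(w) / np.abs(w).sum())
             u = (np.arange(n) + rng.random(n)) / n
             i = np.minimum(np.searchsorted(cdf, u, side="right"), len(w) - 1)
             counts = np.bincount(i, minlength=len(w))
             q = np.sign(w).astype(int) * counts    # integer-quantized weights
             scale = np.abs(w).sum() / n            # common factor keeping the estimate unbiased
             return q, scale

         # usage: scale * (q @ a) approximates w @ a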

  23. Simulating Discrete Densities: Test accuracy for a two-layer ReLU feedforward network on MNIST
     • able to achieve 97% of the accuracy of the full model by sampling the most important 12% of the weights!
     [plot: test accuracy (≈ 0.94–1) vs. number of samples per neuron (0–600)]

  24. Simulating Discrete Densities: Application to convolutional layers
     • sample from the distribution of the filter weights (for example, 128 × 5 × 5 = 3200 weights)
       – less redundant than fully connected layers
     • LeNet architecture on CIFAR-10, best accuracy is 0.6912
     • able to reach 88% of the accuracy of the full model with 50% of the weights sampled

  25. Simulating Discrete Densities: Test accuracy for LeNet on CIFAR-10
     [plot: test accuracy (≈ 0.4–0.6) vs. number of samples per filter (0–3000)]

  26. Neural Networks linear in Time and Space

  27. Neural Networks linear in Time and Space: Number n of neural units
     • for L fully connected layers, n = ∑_{l=1}^{L} n_l, where n_l is the number of neurons in layer l

  28. Neural Networks linear in Time and Space: Number n of neural units
     • for L fully connected layers, n = ∑_{l=1}^{L} n_l, where n_l is the number of neurons in layer l
     • number of weights: n_w = ∑_{l=1}^{L} n_{l−1} · n_l

  29. Neural Networks linear in Time and Space: Number n of neural units
     • for L fully connected layers, n = ∑_{l=1}^{L} n_l, where n_l is the number of neurons in layer l
     • number of weights: n_w = ∑_{l=1}^{L} n_{l−1} · n_l
     • choose the number of weights per neuron such that n is proportional to n_w (see the sketch below)
       – for example, a constant number of weights per neuron
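
     A minimal numeric illustration (the layer widths and the per-neuron budget c are made-up values, not from the talk):

         # hypothetical layer widths n_0, ..., n_L
         widths = [784, 300, 100, 10]

         n = sum(widths[1:])                                             # number of neural units
         n_w_full = sum(a * b for a, b in zip(widths[:-1], widths[1:]))  # fully connected weights

         # a constant number c of weights per neuron makes the weight count
         # proportional to n, i.e. linear in time and space
         c = 32
         n_w_sparse = c * n

         print(n, n_w_full, n_w_sparse)   # 410 266200 13120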

  30. Neural Networks linear in Time and Space: Results
     [plot: test accuracy (0–1) vs. percent of FC layers sampled (0–1); curves for LeNet on MNIST and LeNet on CIFAR-10]

  31. Neural Networks linear in Time and Space: Test accuracy for AlexNet on CIFAR-10
     [plot: test accuracy (0–1) vs. percent of FC layers sampled (0–1)]

  32. Neural Networks linear in Time and Space: Test accuracy for AlexNet on ILSVRC12
     [plot: Top-5 and Top-1 test accuracy (0–1) vs. percent of FC layers sampled (0–1)]

  33. Neural Networks linear in Time and Space: Sampling paths through networks
     • complexity bounded by the number of paths times the depth
     • strong indication of a relation to Markov chains
     • importance sampling by weights (see the sketch below)
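
     A minimal sketch of sampling one path through the network (my own construction from the bullet points; the layer matrices, shapes, and names are assumptions):

         import numpy as np

         def sample_path(layers, rng):
             """Walk from input to output layer, choosing the next neuron
             proportionally to the absolute outgoing weights (importance
             sampling); one path costs O(depth), and each transition acts
             like one step of a Markov chain."""
             path = [rng.integers(layers[0].shape[1])]   # uniform random input neuron
             for W in layers:                            # W has shape (n_l, n_{l-1})
                 out = np.abs(W[:, path[-1]])            # outgoing weights of current neuron
                 path.append(rng.choice(len(out), p=out / out.sum()))
             return path

         rng = np.random.default_rng(2)
         layers = [rng.standard_normal((6, 4)), rng.standard_normal((3, 6))]
         path = sample_path(layers, rng)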

  34. Neural Networks linear in Time and Space: Sampling paths through networks
     • sparse from scratch
     [figure: fully connected network with neurons a_{l,i} for layers l = 0, …, L and units i = 0, …, n_l − 1]

  35. Neural Networks linear in Time and Space: Sampling paths through networks
     • sparse from scratch
     [figure: fully connected network with neurons a_{l,i} for layers l = 0, …, L and units i = 0, …, n_l − 1]
