Neural Networks - II


  1. Neural Networks - II
Henrik I Christensen
Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu

  2. Outline
1. Introduction
2. Mixture Density Networks
3. Bayesian Neural Networks
4. Summary

  3. Introduction
Last lecture: neural networks as a layered regression problem
Feed-forward networks
Linear models with activation functions
Global optimization
This lecture: coverage of multi-modal networks
Bayesian models for neural networks

  4. Outline
1. Introduction
2. Mixture Density Networks
3. Bayesian Neural Networks
4. Summary

  5. Motivation
The models thus far have assumed a Gaussian distribution
How about multi-modal distributions?
How about inverse problems?
Mixture models are one possible solution

  8. Simple Robot Example
[Figure: a two-link planar arm with link lengths L1, L2 and joint angles θ1, θ2; the same end-effector position (x1, x2) is reached by two configurations, "elbow up" and "elbow down".]
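To make the multi-modality of inverse problems concrete, here is a minimal sketch (in Python; the link lengths and target point are illustrative, not from the slides) of the two-link arm: the forward kinematics map joint angles to a unique end-effector position, but inverting that map yields two distinct joint solutions, elbow up and elbow down.

```python
import numpy as np

def two_link_ik(x1, x2, L1=1.0, L2=1.0):
    """Return the two (theta1, theta2) solutions placing the end effector at (x1, x2)."""
    r2 = x1**2 + x2**2
    c2 = (r2 - L1**2 - L2**2) / (2 * L1 * L2)     # cos(theta2) from the law of cosines
    c2 = np.clip(c2, -1.0, 1.0)                   # guard against round-off outside the workspace
    solutions = []
    for s2 in (+np.sqrt(1 - c2**2), -np.sqrt(1 - c2**2)):   # elbow down / elbow up
        theta2 = np.arctan2(s2, c2)
        theta1 = np.arctan2(x2, x1) - np.arctan2(L2 * s2, L1 + L2 * c2)
        solutions.append((theta1, theta2))
    return solutions

# One target point, two joint configurations: the inverse mapping is multi-valued.
print(two_link_ik(1.2, 0.8))
```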

  9. Simple Functional Approximation Example
[Figure: two plots on the unit square (axes from 0 to 1) illustrating a simple functional approximation example and its inverse.]

  10. Basic Formulation
Objective: approximation of p(t | x)
A generic model:
p(t \mid x) = \sum_{k=1}^{K} \pi_k(x) \, \mathcal{N}(t \mid \mu_k(x), \sigma_k^2(x))
Here a Gaussian mixture is used, but any distribution could be the basis
Parameters to be estimated: \pi_k(x), \mu_k(x) and \sigma_k^2(x)
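As a minimal sketch (assumed variable names, not code from the lecture), the mixture density p(t | x) for a scalar target can be evaluated directly from the per-input parameters produced by the network:

```python
import numpy as np

def mixture_density(t, pi, mu, sigma):
    """pi, mu, sigma: length-K arrays produced by the network for one input x."""
    norm = np.exp(-0.5 * ((t - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return np.sum(pi * norm)            # sum_k pi_k N(t | mu_k, sigma_k^2)

# Example: a bimodal conditional density with two components.
print(mixture_density(0.3,
                      pi=np.array([0.5, 0.5]),
                      mu=np.array([0.2, 0.8]),
                      sigma=np.array([0.05, 0.05])))
```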

  11. The Mixture Density Network
[Figure: network diagram with inputs x_1, ..., x_D feeding a feed-forward network whose outputs θ_1, ..., θ_M parameterize the mixture model p(t | x).]

  12. The Model Parameters
Mixing coefficients must satisfy
\sum_{k=1}^{K} \pi_k(x) = 1, \quad 0 \le \pi_k(x) \le 1
achieved using a softmax:
\pi_k(x) = \frac{\exp(a_k^{\pi})}{\sum_{l=1}^{K} \exp(a_l^{\pi})}
The variance must be positive, so a good choice is
\sigma_k(x) = \exp(a_k^{\sigma})
The means can be represented by direct activations:
\mu_{kj}(x) = a_{kj}^{\mu}
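A minimal sketch (assuming the output layer is split into K pi-activations, K sigma-activations and K x L mu-activations) of how the raw activations are mapped to valid mixture parameters:

```python
import numpy as np

def activations_to_params(a_pi, a_sigma, a_mu):
    """a_pi, a_sigma: shape (K,); a_mu: shape (K, L) for L-dimensional targets."""
    pi = np.exp(a_pi - a_pi.max())      # shift for numerical stability
    pi = pi / pi.sum()                  # softmax: non-negative, sums to one
    sigma = np.exp(a_sigma)             # exp keeps the scale positive
    mu = a_mu                           # means taken directly from the activations
    return pi, sigma, mu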

  13. The Energy Equation(s)
The error function is then, as seen before,
E(w) = -\sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k(x_n, w) \, \mathcal{N}(t_n \mid \mu_k(x_n, w), \sigma_k^2(x_n, w)) \right\}
Computing the derivatives, we can minimize E(w)
Let us use \gamma_{nk} = \gamma_k(t_n \mid x_n) = \pi_k N_{nk} / \sum_l \pi_l N_{nl}
The derivatives are then
\partial E_n / \partial a_k^{\pi} = \pi_k - \gamma_{nk}
\partial E_n / \partial a_{kl}^{\mu} = \gamma_{nk} \, (\mu_{kl} - t_{nl}) / \sigma_k^2
\partial E_n / \partial a_k^{\sigma} = \gamma_{nk} \left( L - \| t_n - \mu_k \|^2 / \sigma_k^2 \right)
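A minimal sketch (assumed array shapes, one data point at a time) of the per-sample error E_n and the three activation gradients above, using the responsibilities gamma_nk:

```python
import numpy as np

def mdn_error_and_grads(t_n, pi, sigma, mu):
    """t_n: target vector of length L; pi, sigma: shape (K,); mu: shape (K, L)."""
    K, L = mu.shape
    sq = np.sum((t_n - mu) ** 2, axis=1)                       # ||t_n - mu_k||^2
    N_nk = np.exp(-0.5 * sq / sigma**2) / (2 * np.pi * sigma**2) ** (L / 2)
    gamma = pi * N_nk / np.sum(pi * N_nk)                      # responsibilities gamma_nk
    E_n = -np.log(np.sum(pi * N_nk))                           # negative log-likelihood
    dE_da_pi = pi - gamma                                      # gradient w.r.t. pi-activations
    dE_da_mu = gamma[:, None] * (mu - t_n) / sigma[:, None]**2 # gradient w.r.t. mu-activations
    dE_da_sigma = gamma * (L - sq / sigma**2)                  # gradient w.r.t. sigma-activations
    return E_n, dE_da_pi, dE_da_mu, dE_da_sigma
```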

  14. A Toy Example
[Figure: four panels (a)-(d), each plotted on the unit square, illustrating the mixture density network fit to a toy data set.]

  15. Mixture Density Networks
The net is optimizing a mixture of parameters
Different parts correspond to different components
Each part has its own set of "energy terms" and gradients
This illustrates the flexibility, but also the complications

  16. Outline
1. Introduction
2. Mixture Density Networks
3. Bayesian Neural Networks
4. Summary

  17. Introductory Remarks
What if the output was a probability distribution?
Could we optimize over the posterior distribution p(t | x)?
Assume it is Gaussian to enable processing:
p(t \mid x, w, \beta) = \mathcal{N}(t \mid y(x, w), \beta^{-1})
Let us consider how we can analyze the problem

  18. The Laplace Approximation - I
Sometimes the posterior is no longer Gaussian
This challenges integration; closed-form solutions might not be available
How can we generate an approximation?
Obviously, using a Gaussian approximation would be helpful: the Laplace approximation
Consider for now
p(z) = \frac{f(z)}{\int f(a) \, da}
The denominator is merely for normalization and considered unknown
Assume the mode z_0 has been determined, so that df(z)/dz = 0 at z = z_0

  19. The Laplace Approximation - II
A Taylor expansion of ln f is then
\ln f(z) \approx \ln f(z_0) - \frac{1}{2} A (z - z_0)^2
where
A = - \left. \frac{d^2}{dz^2} \ln f(z) \right|_{z = z_0}
Taking the exponential,
f(z) \approx f(z_0) \exp\left\{ -\frac{A}{2} (z - z_0)^2 \right\}
which can be normalized to give
q(z) = \left( \frac{A}{2\pi} \right)^{1/2} \exp\left\{ -\frac{A}{2} (z - z_0)^2 \right\}
The extension to multivariate distributions is straightforward (see book).
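A minimal sketch of the 1-D Laplace approximation on a hypothetical unnormalized density f(z): locate a mode z_0 numerically, estimate A = -d^2 ln f / dz^2 at z_0 by finite differences, and use N(z | z_0, 1/A) as the approximation q(z).

```python
import numpy as np
from scipy.optimize import minimize_scalar

# A non-Gaussian unnormalized density, chosen only for illustration.
log_f = lambda z: -z**4 + 2 * z**2

# Find a mode z_0 by minimizing -ln f over a bracket around the positive mode.
res = minimize_scalar(lambda z: -log_f(z), bounds=(0.1, 3.0), method="bounded")
z0 = res.x

# Numerical second derivative of ln f at the mode gives the precision A.
h = 1e-4
A = -(log_f(z0 + h) - 2 * log_f(z0) + log_f(z0 - h)) / h**2

# Gaussian approximation q(z) = sqrt(A / 2*pi) * exp(-A/2 (z - z0)^2).
q = lambda z: np.sqrt(A / (2 * np.pi)) * np.exp(-0.5 * A * (z - z0) ** 2)
print(z0, A, q(z0))
```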

  20. Posterior Parameter Distribution
Back to the Bayesian networks
For an i.i.d. dataset with target values t = {t_1, ..., t_N} we have
p(t \mid w, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid y(x_n, w), \beta^{-1})
The posterior is then
p(w \mid t, \alpha, \beta) \propto p(w \mid \alpha) \, p(t \mid w, \beta)
As usual we have
\ln p(w \mid t) = -\frac{\alpha}{2} w^T w - \frac{\beta}{2} \sum_{n=1}^{N} \{ y(x_n, w) - t_n \}^2 + \text{const}
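A minimal sketch (assuming a user-supplied forward function y(x, w) with scalar output) of the unnormalized log-posterior above:

```python
import numpy as np

def log_posterior(w, X, t, y, alpha, beta):
    """X: inputs of shape (N, D); t: targets of shape (N,); y(x, w): network output."""
    prior_term = -0.5 * alpha * np.dot(w, w)                 # Gaussian prior on the weights
    residuals = np.array([y(x, w) for x in X]) - t
    likelihood_term = -0.5 * beta * np.sum(residuals ** 2)   # Gaussian noise model
    return prior_term + likelihood_term                      # up to an additive constant
```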

  21. Posterior Parameter Distribution - II
We can use the Laplace approximation to estimate the distribution
A = -\nabla^2 \ln p(w \mid t, \alpha, \beta) = \alpha I + \beta H
The approximation would be
q(w \mid t) = \mathcal{N}(w \mid w_{MAP}, A^{-1})
In turn we have
p(t \mid x, t, \alpha, \beta) = \mathcal{N}(t \mid y(x, w_{MAP}), \sigma^2)
where
\sigma^2 = \beta^{-1} + g^T A^{-1} g \quad \text{with} \quad g = \left. \nabla_w y(x, w) \right|_{w = w_{MAP}}
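A minimal sketch (assuming the Hessian H of the sum-of-squares error at w_MAP and the output gradient g are already available) of the predictive variance sigma^2 = beta^{-1} + g^T A^{-1} g:

```python
import numpy as np

def predictive_variance(g, H, alpha, beta):
    """g: gradient of y(x, w) at w_MAP, shape (W,); H: error Hessian, shape (W, W)."""
    A = alpha * np.eye(len(g)) + beta * H      # Laplace precision of q(w | t)
    return 1.0 / beta + g @ np.linalg.solve(A, g)
```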

  22. Optimization of Hyper-parameters
How do we estimate \alpha and \beta?
We can consider the marginal likelihood
p(t \mid \alpha, \beta) = \int p(t \mid w, \beta) \, p(w \mid \alpha) \, dw
From linear regression we have the eigenvalue equation
\beta H u_i = \lambda_i u_i
where H is the Hessian of the error E
As with regression we have
\alpha = \frac{\gamma}{w_{MAP}^T w_{MAP}}
where \gamma is the effective rank of the Hessian
Similarly, \beta can be derived to be
\frac{1}{\beta} = \frac{1}{N - \gamma} \sum_{n=1}^{N} \{ y(x_n, w_{MAP}) - t_n \}^2
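A minimal sketch of the re-estimation step. The slide does not spell out gamma; the expression gamma = sum_i lambda_i / (alpha + lambda_i), with lambda_i the eigenvalues of beta*H, is the standard effective-number-of-parameters formula from the linear-regression evidence framework and is assumed here:

```python
import numpy as np

def update_hyperparameters(w_map, residuals, eigvals_betaH, alpha):
    """residuals: y(x_n, w_MAP) - t_n for all n; eigvals_betaH: eigenvalues of beta*H."""
    gamma = np.sum(eigvals_betaH / (alpha + eigvals_betaH))   # effective number of parameters
    alpha_new = gamma / np.dot(w_map, w_map)                  # alpha = gamma / (w_MAP^T w_MAP)
    beta_new = (len(residuals) - gamma) / np.sum(residuals ** 2)
    return alpha_new, beta_new, gamma
```

In practice alpha and beta are re-estimated and w_MAP re-found in alternation until the values stabilize.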

  23. Bayesian Neural Networks
Modelling of the system as a probabilistic generator
Use standard techniques to generate w_{MAP}
We can in addition generate estimates for the precision/variance

  24. Outline
1. Introduction
2. Mixture Density Networks
3. Bayesian Neural Networks
4. Summary

  25. Summary
With neural nets we have a general functional estimator
Can be applied both for regression and discrimination
The basis functions can be a broad set of functions
NNs can also be used for estimation of mixture systems
Estimation of probability distributions is also possible for Gaussians (approximation with w_{MAP}, \beta)
Neural nets are a rich area with a long history.
