Generative Stochastic Networks Trainable by Backprop

Generative Stochastic Networks Trainable by Backprop
Yoshua Bengio, with Eric Laufer, Li Yao, Guillaume Alain & Pascal Vincent
RepLearn Workshop @ AAAI 2013
July 15th 2013, Bellevue, WA, USA

Representation Learning
• Good features essential for successful ML
[Figure: raw input data, represented by chosen features or by learned features, feeding into machine learning]
• Handcrafting features vs learning them
• Good representation: captures posterior belief about explanatory causes, disentangles these underlying factors of variation
• Representation learning: guesses the features / factors / causes = good representation of observed data.

Deep Representation Learning
Learn multiple levels of representation of increasing complexity/abstraction
[Figure: stack of layers x, h1, h2, h3, …]
• potentially exponential gain in expressive power
• brains are deep
• humans organize knowledge in a compositional way
• Better MCMC mixing in space of deeper representations (Bengio et al, ICML 2013)
• They work! SOTA on industrial-scale AI tasks (object recognition, speech recognition, language modeling, music modeling)

Following up on (Bengio et al NIPS'2000)
Neural word embeddings - visualization

Analogical Representations for Free (Mikolov et al, ICLR 2013)
• Semantic relations appear as linear relationships in the space of learned representations
• King – Queen ≈ Man – Woman
• Paris – France + Italy ≈ Rome
[Figure: points labeled France, Italy, Paris, Rome]
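The linear-relationship claim can be illustrated with a minimal sketch. The 4-dimensional vectors below are hypothetical, hand-built toy embeddings (not learned, and not taken from Mikolov et al); they are constructed so that each capital is roughly its country plus a shared offset. The helper names `cosine` and `analogy` are likewise assumptions made for illustration.

```python
import math

# Hypothetical toy embeddings: dims 0-2 encode country identity,
# dim 3 encodes "is a capital". Small perturbations make the
# relation approximate rather than exact.
EMB = {
    "France": (1.0, 0.0, 0.0, 0.0),
    "Italy":  (0.0, 1.0, 0.0, 0.0),
    "Japan":  (0.0, 0.0, 1.0, 0.0),
    "Paris":  (1.0, 0.1, 0.0, 1.0),
    "Rome":   (0.1, 1.0, 0.0, 0.9),
    "Tokyo":  (0.0, 0.1, 1.0, 1.1),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, emb=EMB):
    """Return the word closest to emb[a] - emb[b] + emb[c], excluding a, b, c."""
    query = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(query, emb[w]))

result = analogy("Paris", "France", "Italy")
print(result)
```

Excluding the three query words from the candidates is a common convention in analogy evaluation; with real learned embeddings the nearest neighbour is only approximately the expected word.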

Combining Multiple Sources of Evidence with Shared Representations
• Traditional ML: data = matrix
• Relational learning: multiple sources, different tuples of variables
• Share representations of same types across data sources
• Shared learned representations help propagate information among data sources: e.g., WordNet, XWN, Wikipedia, FreeBase, ImageNet… (Bordes et al AISTATS 2012, ML J. 2013)
• FACTS = DATA
• Deduction = Generalization
[Figure: two data sources sharing representations, P(person,url,event) and P(url,words,history)]

Temporal Coherence and Scales
• Hints from nature about different explanatory factors:
  • Rapidly changing factors (often noise)
  • Slowly changing (generally more abstract)
  • Different factors at different time scales
• Exploit those hints to disentangle better!
• (Becker & Hinton 1993, Wiskott & Sejnowski 2002, Hurri & Hyvarinen 2003, Berkes & Wiskott 2005, Mobahi et al 2009, Bergstra & Bengio 2009)

How do humans generalize from very few examples?
• They transfer knowledge from previous learning:
  • Representations
  • Explanatory factors
• Previous learning from: unlabeled data + labels for other tasks
• Prior: shared underlying explanatory factors, in particular between P(x) and P(Y|x)
• → Need good unsupervised learning of representations

Unsupervised and Transfer Learning Challenge + Transfer Learning Challenge: Deep Learning 1st Place
• NIPS'2011 Transfer Learning Challenge. Paper: ICML'2012
• ICML'2011 workshop on Unsup. & Transfer Learning
[Figure: panels labeled Raw data, 1 layer, 2 layers, 3 layers, 4 layers]

Latent Variables Love-Hate Relationship
• GOOD! Appealing: model explanatory factors h
• BAD! Exact inference? Nope. Just Pain. Too many possible configurations of h
• WORSE! Learning usually requires inference and/or sampling from P(h, x)

Anonymous Latent Variables
• No pre-assigned semantics
• Learning discovers underlying factors, e.g., PCA discovers leading directions of variations
• Increases expressiveness of P(x) = Σ_h P(x, h)
• Universal approximators, e.g. for RBMs (Le Roux & Bengio, Neural Comp. 2008)

Deep Probabilistic Models
• Linear factor models (sparse coding, PCA, ICA) - shallow
• Restricted Boltzmann Machines (RBMs), many variants – shallow
  • Energy(x, h) = - h' W x
• Deep Belief Nets (DBN)
  • P(x, h1, h2, h3) = P(x | h1) P(h1 | h2) P(h2, h3), where P(h2, h3) = RBM, conditionals = sigmoid+affine
• Deep Boltzmann Machines (DBM)
  • Energy(x, h1, h2, …) = - h1' W1 x - h2' W2 h1 - …
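To make the RBM energy concrete, here is a minimal sketch that evaluates Energy(x, h) = - h' W x for a tiny binary RBM and computes P(x) = Σ_h P(x, h) by brute-force enumeration. The weight values and the sizes (3 visible, 2 hidden units) are hypothetical, and bias terms are omitted to match the formula on the slide. Enumeration costs grow as 2^n, which is why this only works at toy scale.

```python
import itertools
import math

# Hypothetical weights: W[j][i] connects hidden unit j to visible unit i.
W = [[1.0, -0.5, 0.2],
     [-0.3, 0.8, 0.5]]
n_hidden, n_visible = len(W), len(W[0])

def energy(x, h):
    """Energy(x, h) = - h' W x (no bias terms)."""
    return -sum(h[j] * W[j][i] * x[i]
                for j in range(n_hidden) for i in range(n_visible))

def binary_vectors(n):
    """All 2**n binary vectors of length n."""
    return list(itertools.product((0, 1), repeat=n))

# Partition function: sum of exp(-Energy) over every joint configuration.
Z = sum(math.exp(-energy(x, h))
        for x in binary_vectors(n_visible)
        for h in binary_vectors(n_hidden))

def marginal(x):
    """P(x) = sum over h of exp(-Energy(x, h)) / Z."""
    return sum(math.exp(-energy(x, h)) for h in binary_vectors(n_hidden)) / Z

p_x = {x: marginal(x) for x in binary_vectors(n_visible)}
print(sum(p_x.values()))
```

For this energy the sum over h factorizes as a product over hidden units of (1 + exp((W x)_j)), so the marginal over h is cheap for a single RBM layer; the partition function Z is the part that remains exponential.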

Stack of RBMs → Deep Boltzmann Machine (Salakhutdinov & Hinton AISTATS 2009)
• Halve the RBM weights because each layer now has inputs from below and from above
• Positive phase: (mean-field) variational inference = recurrent AE
• Negative phase: Gibbs sampling (stochastic units)
• train by SML/PCD
[Figure: stacked RBMs with weights W1, W2, W3 over layers x, h1, h2, h3, combined into a DBM using halved weights ½W1, ½W2, ½W3 and their transposes]
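The Gibbs sampling used in the negative phase can be sketched for the simplest case, a single RBM with energy - h' W x and binary units. This is an illustrative assumption, not the full DBM procedure: in a DBM each intermediate layer would be sampled given both the layer below and the layer above. The weights and function names are hypothetical.

```python
import math
import random

# Hypothetical weights: W[j][i] connects hidden unit j to visible unit i.
W = [[1.0, -0.5, 0.2],
     [-0.3, 0.8, 0.5]]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sample_hidden(x, rng):
    """Sample each h_j ~ Bernoulli(sigmoid((W x)_j))."""
    return [int(rng.random() < sigmoid(sum(w * xi for w, xi in zip(row, x))))
            for row in W]

def sample_visible(h, rng):
    """Sample each x_i ~ Bernoulli(sigmoid((W' h)_i))."""
    return [int(rng.random() < sigmoid(sum(W[j][i] * h[j] for j in range(len(h)))))
            for i in range(len(W[0]))]

def gibbs_chain(x0, n_steps, seed=0):
    """Alternate h | x and x | h; store (new x, the h it was sampled from)."""
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    for _ in range(n_steps):
        h = sample_hidden(x, rng)
        x = sample_visible(h, rng)
        samples.append((tuple(x), tuple(h)))
    return samples

samples = gibbs_chain([0, 0, 0], n_steps=1000)
print(samples[-1])
```

In SML/PCD-style training the chain state would be kept across parameter updates rather than restarted; the fixed seed here only makes the sketch reproducible.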

Approximate Inference
• MAP
  • h* ≅ argmax_h P(h | x)
  • → assume 1 dominant mode
• Variational
  • Look for tractable Q(h) minimizing KL(Q(.) || P(. | x))
  • Q is either factorial or tree-structured
  • → strong assumption
• MCMC
  • Setup Markov chain asymptotically sampling from P(h | x)
  • Approx. marginalization through MC avg over few samples
  • → assume a few dominant modes
• Approximate inference can seriously hurt learning (Kulesza & Pereira NIPS'2007)
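A small sketch of why these assumptions are strong: for a hypothetical two-mode posterior over two binary latent variables (the probabilities below are made up for illustration), MAP keeps a single configuration, and the best factorial Q found by a coarse grid search still leaves a strictly positive KL(Q || P).

```python
import itertools
import math

# Hypothetical posterior P(h | x) with two dominant modes, (0,0) and (1,1).
posterior = {(0, 0): 0.46, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.44}

def factorial_q(q1, q2):
    """Factorial Q(h) = Q1(h1) Q2(h2), with qk = Q(hk = 1)."""
    return {(a, b): (q1 if a else 1 - q1) * (q2 if b else 1 - q2)
            for a, b in itertools.product((0, 1), repeat=2)}

def kl(q, p):
    """KL(Q || P), using the convention 0 * log 0 = 0."""
    return sum(qv * math.log(qv / p[k]) for k, qv in q.items() if qv > 0)

# Variational: grid search over the two parameters of the factorial family.
grid = [i / 100 for i in range(101)]
best_kl, best_q1, best_q2 = min(
    (kl(factorial_q(a, b), posterior), a, b) for a in grid for b in grid)

# MAP: keep only the single most probable configuration.
map_state = max(posterior, key=posterior.get)

print(map_state, best_q1, best_q2, best_kl)
```

The MAP state carries less than half of the posterior mass here, and no factorial Q can represent the correlation between the two latents, so the KL gap cannot be closed within that family.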

Computational Graphs
• Operations for particular task
• Neural nets' structure = computational graph for P(y | x)
• Graphical model's structure ≠ computational graph for inference
• Recurrent nets & graphical models → family of computational graphs sharing parameters
• Could we have a parametrized family of computational graphs defining "the model"?
