

  1. A Stochastic Memoizer for Sequence Data
     Frank Wood, Cedric Archambeau, Jan Gasthaus, Yee Whye Teh (Gatsby Unit, UCL) and Lancelot James (HKUST)

  2. Overview
     • Model
       – Smoothing Markov model of discrete sequences
       – Extension of the hierarchical Pitman-Yor process [Teh 2006]
         • Unbounded depth (context length)
     • Algorithms and estimation
       – Linear-time suffix-tree graphical model identification and construction
       – Standard Chinese restaurant franchise sampler
     • Results
       – Maximum contextual information used during inference
       – Competitive language modelling results
         • Limit of the n-gram language model as n → ∞
       – Same computational cost as a Bayesian interpolating 5-gram language model

  3. The Sequence Memoizer
     • Uses
       – Any situation in which a low-order Markov model of discrete sequences is insufficient
       – Drop-in replacement for a smoothing Markov model
     • Name?
       – "A Stochastic Memoizer for Sequence Data" → Sequence Memoizer (SM)
     • Describes posterior inference [Goodman et al. '08]

  4. Building Markov models from fixed-length contexts
     • Sequence Markov models are usually constructed by treating a sequence as a set of (exchangeable) observations in fixed-length contexts
     • Example: oacac (contexts written most recent symbol first)
       – unigram: o | [],  a | [],  c | [],  a | [],  c | []
       – bigram:  o | [],  a | o,   c | a,   a | c,   c | a
       – trigram: o | [],  a | o,   c | ao,  a | ca,  c | ac
       – 4-gram:  o | [],  a | o,   c | ao,  a | cao, c | aca
     • Increasing context length / order of the Markov model means:
       – Decreasing number of observations per context
       – Increasing number of conditional distributions to estimate (indexed by context)
       – Increasing power of model
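A minimal sketch of this decomposition, assuming only the example sequence "oacac" from the slide; the helper name context_observations is hypothetical, and contexts are kept in temporal order rather than the reversed order shown on the slide.

```python
# Minimal sketch (not from the slides): list the (context, observation)
# pairs a fixed-order Markov model extracts from a sequence.
def context_observations(sequence, order):
    """Return (context, symbol) pairs; order 1 = unigram, 2 = bigram, ..."""
    return [(sequence[max(0, i - (order - 1)):i], sequence[i])
            for i in range(len(sequence))]

if __name__ == "__main__":
    for order, name in [(1, "unigram"), (2, "bigram"), (3, "trigram"), (4, "4-gram")]:
        print(name, context_observations("oacac", order))
```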

  5. n-gram factorization
     • Factorize the joint distribution, then truncate each context to the most recent n-1 symbols:
       P(x_{1:N}) = ∏_{i=1}^{N} P(x_i | x_1, ..., x_{i-1})
                  ≈ ∏_{i=1}^{N} P(x_i | x_{i-n+1}, ..., x_{i-1})
                  = P(x_1) P(x_2 | x_1) P(x_3 | x_2) P(x_4 | x_3) ...   (n = 2)
     • Example (n = 2), writing G_[u] for the conditional distribution over the next symbol given context u:
       P(oacac) ≈ P(o) P(a | o) P(c | a) P(a | c) P(c | a)
                = G_[](o) G_[o](a) G_[a](c) G_[c](a) G_[a](c)
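A minimal sketch of evaluating this factorization; the conditional probabilities in G below are made-up illustrative numbers, not estimates from data.

```python
# Minimal sketch: evaluate the order-n factorization
# P(x_{1:N}) ≈ prod_i P(x_i | previous n-1 symbols), given per-context
# conditional distributions G[context][symbol].
def sequence_probability(sequence, G, order):
    prob = 1.0
    for i, symbol in enumerate(sequence):
        context = sequence[max(0, i - (order - 1)):i]
        prob *= G[context][symbol]
    return prob

G = {
    "":  {"o": 0.2, "a": 0.4, "c": 0.4},   # G_[]
    "o": {"a": 0.7, "c": 0.2, "o": 0.1},   # G_[o]
    "a": {"c": 0.6, "a": 0.2, "o": 0.2},   # G_[a]
    "c": {"a": 0.8, "c": 0.1, "o": 0.1},   # G_[c]
}
# P(oacac) ≈ P(o) P(a|o) P(c|a) P(a|c) P(c|a)
print(sequence_probability("oacac", G, order=2))
```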

  6. Maximum likelihood estimation of a discrete distribution
     • Discrete distribution ↔ vector of parameters: G_[] = [π_1, ..., π_K],  K = |Σ|
     • Counting / maximum likelihood estimation
       – Training sequence x_{1:N}
       – π̂_k = Ĝ_[](X = k) = #{x_i = k} / #{x_i}
     • Predictive inference
       – P(X_{N+1} | x_1 ... x_N) = Ĝ_[](X_{N+1})
     • Example
       – Non-smoothed unigram model (graphical model: G_[] → x_i, i = 1:N)
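A minimal sketch of the non-smoothed unigram estimator above; note the zero probability it assigns to unseen symbols, which is what smoothing addresses.

```python
# Minimal sketch: non-smoothed (maximum likelihood) unigram model.
# pi_hat_k = count of symbol k / N, so unseen symbols get probability zero.
from collections import Counter

def ml_unigram(training_sequence):
    counts = Counter(training_sequence)
    total = len(training_sequence)
    return {k: c / total for k, c in counts.items()}

G_hat = ml_unigram("oacac")
print(G_hat)                 # {'o': 0.2, 'a': 0.4, 'c': 0.4}
print(G_hat.get("b", 0.0))   # unseen symbol -> 0.0
```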

  7. Bayesian estimation
     • Estimation: P(G_[] | x_{1:N}) ∝ P(x_{1:N} | G_[]) P(G_[])
     • Predictive inference: P(X_{N+1} | x_{1:N}) = ∫ P(X_{N+1} | G_[]) P(G_[] | x_{1:N}) dG_[]
     • Priors over distributions: G_[] ∼ Dirichlet(U)  or  G_[] ∼ PY(d, c, U)
     • Net effect
       – Inference is "smoothed" with respect to uncertainty about the unknown distribution
     • Example
       – Smoothed unigram (graphical model: U → G_[] → x_i, i = 1:N)
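A minimal sketch of the Bayesian version, assuming a symmetric Dirichlet(c·U) prior with uniform base U; the slide names both Dirichlet and Pitman-Yor priors, and the closed-form predictive below is specific to the Dirichlet case.

```python
# Minimal sketch: Bayesian unigram under a symmetric Dirichlet(c * U) prior
# with uniform base U over the alphabet. The posterior predictive is
#   P(X_{N+1} = k | x_{1:N}) = (n_k + c / |Sigma|) / (N + c),
# i.e. counts smoothed toward the uniform distribution.
from collections import Counter

def dirichlet_predictive(training_sequence, alphabet, c=1.0):
    counts = Counter(training_sequence)
    N = len(training_sequence)
    return {k: (counts[k] + c / len(alphabet)) / (N + c) for k in alphabet}

print(dirichlet_predictive("oacac", alphabet="oacb", c=1.0))
# 'b' was never observed but still gets non-zero probability
```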

  8. The Pitman-Yor process
     • G_[s] ∼ PY(d, c, G_[σ(s)])  with discount d, concentration c, and base distribution G_[σ(s)]
     • Tool for tying together related distributions in hierarchical models
     • Measure over measures
     • Base measure is the "mean" measure: E[G_[s](dx)] = G_[σ(s)](dx)
     • A distribution drawn from a Pitman-Yor process is related to its base distribution
       – (equal when c = ∞ or d = 1)

  9. Pitman-Yor process posterior predictive
     • Generalization of the Dirichlet process (d = 0)
       – Different (power-law) properties
       – Better for text [Teh, 2006] and images [Sudderth and Jordan, 2009]
     • Posterior predictive distribution (can't actually do this integral this way)
       P(X_{N+1} | x_{1:N}; c, d) ≈ ∫ P(X_{N+1} | G_[s]) P(G_[s] | x_{1:N}; c, d) dG_[s]
         = Σ_{k=1}^{K} [(m_k - d) / (c + N)] 1(φ_k = X_{N+1}) + [(c + dK) / (c + N)] G_[σ(s)](X_{N+1})
       (m_k: customers at table k; φ_k: its dish; K: number of tables in the Chinese restaurant representation)
     • Forms the basis for straightforward, simple samplers
     • Rule for stochastic memoization
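A minimal sketch of this predictive rule in the Chinese restaurant representation, for a single Pitman-Yor process with a uniform base distribution. The class and its incremental seating scheme are a simplified illustration, not the samplers used in the paper.

```python
# Minimal sketch: Chinese-restaurant predictive rule for PY(d, c, G0),
# with tables holding per-table counts m_k and dishes phi_k, and G0
# uniform over the alphabet.
import random

class PYRestaurant:
    def __init__(self, d, c, alphabet):
        self.d, self.c = d, c
        self.alphabet = list(alphabet)
        self.tables = []          # list of [dish, customer_count]
        self.N = 0                # total customers

    def predictive(self, x):
        K = len(self.tables)
        p = (self.c + self.d * K) / (self.c + self.N) / len(self.alphabet)  # base term
        p += sum((m - self.d) / (self.c + self.N)
                 for dish, m in self.tables if dish == x)
        return p

    def add_customer(self, x):
        """Seat a customer eating dish x (simple incremental scheme)."""
        K = len(self.tables)
        weights = [(m - self.d) if dish == x else 0.0 for dish, m in self.tables]
        weights.append((self.c + self.d * K) / len(self.alphabet))  # open a new table
        choice = random.choices(range(K + 1), weights=weights)[0]
        if choice == K:
            self.tables.append([x, 1])
        else:
            self.tables[choice][1] += 1
        self.N += 1

r = PYRestaurant(d=0.5, c=1.0, alphabet="oac")
for sym in "oacac":
    r.add_customer(sym)
print({s: round(r.predictive(s), 3) for s in "oac"})
```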

  10. Hierarchical Bayesian estimation
     • Estimation: Θ = {G_[], G_[s] : s ∈ Σ},  P(Θ | x_{1:N}) ∝ P(x_{1:N} | Θ) P(Θ)
     • Predictive inference: P(X_{N+1} | x_{1:N}) = ∫ P(X_{N+1} | Θ) P(Θ | x_{1:N}) dΘ
     • Naturally related distributions tied together
       – e.g. G_[the United States] ∼ PY(d, c, G_[United States])
     • Net effect
       – Observations in one context affect inference in other contexts
       – Statistical strength is shared between similar contexts
     • Example
       – Smoothing bigram (G_[s] for all s ∈ Σ, each with base G_[]; graphical model U → G_[] → G_[s] → observations)
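A minimal sketch of hierarchical sharing for the smoothing bigram, using the simple incremental "a new table sends a customer to the parent" construction of the Chinese restaurant franchise; this is an illustration, not the collapsed sampler referenced in the slides.

```python
# Minimal sketch of a Chinese restaurant franchise: every context
# restaurant G_[s] uses the shared unigram restaurant G_[] as its base,
# so opening a new table in a context backs off to the parent.
import random

class Restaurant:
    def __init__(self, d, c, parent=None, alphabet=None):
        self.d, self.c, self.parent, self.alphabet = d, c, parent, alphabet
        self.tables, self.N = [], 0      # tables: [dish, customer_count]

    def base(self, x):
        return self.parent.predictive(x) if self.parent else 1.0 / len(self.alphabet)

    def predictive(self, x):
        K = len(self.tables)
        p = (self.c + self.d * K) / (self.c + self.N) * self.base(x)
        p += sum((m - self.d) / (self.c + self.N)
                 for dish, m in self.tables if dish == x)
        return p

    def add_customer(self, x):
        K = len(self.tables)
        weights = [(m - self.d) if dish == x else 0.0 for dish, m in self.tables]
        weights.append((self.c + self.d * K) * self.base(x))
        choice = random.choices(range(K + 1), weights=weights)[0]
        if choice == K:
            self.tables.append([x, 1])
            if self.parent is not None:   # opening a table backs off to the parent
                self.parent.add_customer(x)
        else:
            self.tables[choice][1] += 1
        self.N += 1

alphabet = "oac"
root = Restaurant(d=0.5, c=1.0, alphabet=alphabet)                        # G_[]
contexts = {s: Restaurant(d=0.5, c=1.0, parent=root) for s in alphabet}   # G_[s]

seq = "oacac"
root.add_customer(seq[0])                 # first symbol has the empty context
for prev, sym in zip(seq, seq[1:]):
    contexts[prev].add_customer(sym)

print({s: round(contexts["a"].predictive(s), 3) for s in alphabet})  # P(. | a)
```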

  11. [Figure: HPYP graphical model over the contexts [], o, ao, cao, with panels labelled Observations, Conditional Distributions, and Posterior Predictive Probabilities]

  12. [Figure: the same graphical model with a "CP" node added alongside U]

  13. [Figure: the same graphical model with "CP" nodes added alongside U and G_[]]

  14. HPYP: sharing and back-off
     • Share statistical strength between sequentially related predictive conditional distributions
       – Estimates of highly specific conditional distributions
       – are coupled with others that are related
       – through a single, common, more general shared ancestor
     • Corresponds intuitively to back-off
     [Figure: tree of context distributions G_[s], each connected to the more general context it backs off to]

  15. Hierarchical Pitman-Yor process n-gram language model
     • Model
       G_[] | d_0, U ∼ PY(d_0, 0, U)
       G_[s] | d_|s|, G_[σ(s)] ∼ PY(d_|s|, 0, G_[σ(s)])   for all contexts s with 1 ≤ |s| ≤ n-1
       x_i | x_{i-n+1:i-1} = s ∼ G_[s],   i = 1, ..., T
     • Bayesian generalization of the smoothing n-gram Markov model
     • Language model: outperforms interpolated Kneser-Ney (KN) smoothing
     • Efficient inference algorithms exist [Goldwater et al. '05; Teh '06; Teh, Kurihara, Welling '08]
     • Sharing between contexts that differ in the most distant symbol only
     • Finite depth
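A minimal sketch of the back-off structure this model induces, assuming contexts are stored in temporal order so that σ(s) drops the first (most distant) symbol; the function name is hypothetical.

```python
# Minimal sketch: back-off tree of the HPYP n-gram model. Each context s
# observed in training backs off to sigma(s), the context with its most
# distant (earliest) symbol dropped, down to the empty context.
from collections import defaultdict

def backoff_tree(sequence, n):
    children = defaultdict(set)              # parent context -> child contexts
    for i in range(len(sequence)):
        context = sequence[max(0, i - (n - 1)):i]
        while context:                        # walk s -> sigma(s) -> ... -> []
            children[context[1:]].add(context)
            context = context[1:]
    return children

for parent, kids in sorted(backoff_tree("oacac", n=4).items()):
    print(repr(parent), "->", sorted(kids))
```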

  16. Sequences as single observations in growing contexts
     • A sequence can be characterized by a set of single observations in unique contexts of growing length
       oacac → { o | [],  a | o,  c | ao,  a | cao,  c | acao }
     • Increasing context length; always a single observation
     • Foreshadowing: all suffixes of the string "cacao"
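A minimal sketch of this decomposition; contexts are reversed for display (most recent symbol first, as on the slide), which makes them read as suffixes of the reversed string "cacao".

```python
# Minimal sketch: decompose a sequence into single observations, each in
# a unique context of growing length. Contexts are printed most recent
# symbol first, matching the slide's notation.
def growing_context_observations(sequence):
    return [(sequence[:i][::-1], sequence[i]) for i in range(len(sequence))]

for context, symbol in growing_context_observations("oacac"):
    print(f"{symbol} | {context or '[]'}")
# o | []   a | o   c | ao   a | cao   c | acao
```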

  17. The "∞-gram" model
     • Use the full factorization, without truncating the context:
       P(x_{1:N}) = ∏_{i=1}^{N} P(x_i | x_1, ..., x_{i-1})
                  = P(x_1) P(x_2 | x_1) P(x_3 | x_1, x_2) P(x_4 | x_1, ..., x_3) ...
     • Example
       P(oacac) = P(o) P(a | o) P(c | oa) P(a | oac) P(c | oaca)
     • Smoothing essential
       – Only one observation in each context!
     • Solution
       – Hierarchical sharing à la HPYP
