Bayesian Nonparametrics: Models Based on the Dirichlet Process

  1. Bayesian Nonparametrics: Models Based on the Dirichlet Process. Alessandro Panella, Department of Computer Science, University of Illinois at Chicago. Machine Learning Seminar Series, February 18, 2013.

  2. Sources and Inspirations
Tutorials (slides):
- P. Orbanz and Y.W. Teh, Modern Bayesian Nonparametrics. NIPS 2011.
- M. Jordan, Dirichlet Process, Chinese Restaurant Process, and All That. NIPS 2005.
Articles, etc.:
- E.B. Sudderth, chapter in PhD thesis, 2006.
- E. Fox, chapter in PhD thesis, 2008.
- Y.W. Teh, Dirichlet Processes. Encyclopedia of Machine Learning, Springer, 2010.

  3. Outline
1. Introduction and background: Bayesian learning; nonparametric models
2. Finite mixture models: Bayesian models; clustering with FMMs; inference
3. Dirichlet process mixture models: going nonparametric!; the Dirichlet process; DP mixture models; inference
4. A little more theory...: De Finetti's REDUX; Dirichlet process REDUX
5. The hierarchical Dirichlet process

  5. The meaning of it all: BAYESIAN NONPARAMETRICS

  8. Bayesian statistics
Estimate a parameter $\theta \in \Theta$ after observing data $x$.
Frequentist (maximum likelihood, ML): $\hat{\theta}_{\mathrm{MLE}} = \arg\max_\theta p(x \mid \theta) = \arg\max_\theta L(\theta : x)$.
Bayesian (Bayes' rule): $p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}$.
Bayesian prediction (using the whole posterior, not just one estimator): $p(x_{\mathrm{new}} \mid x) = \int_\Theta p(x_{\mathrm{new}} \mid \theta)\, p(\theta \mid x)\, d\theta$.
Maximum a posteriori (MAP): $\hat{\theta}_{\mathrm{MAP}} = \arg\max_\theta p(x \mid \theta)\, p(\theta)$.
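
As a concrete toy illustration of these quantities (not part of the original slides): the sketch below computes the MLE, the MAP estimate, and the Bayesian posterior predictive for a Bernoulli likelihood on a discretized parameter grid. The coin-toss data and the uniform prior are assumptions made for the example.

```python
import numpy as np

# Toy data: 10 coin tosses (1 = heads), made up for illustration.
x = np.array([1, 1, 0, 1, 0, 1, 1, 1, 0, 1])
n_heads, n = int(x.sum()), len(x)

# Discretize the parameter space Theta = [0, 1].
theta = np.linspace(1e-6, 1 - 1e-6, 1001)
d_theta = theta[1] - theta[0]
prior = np.ones_like(theta)  # uniform prior p(theta), an assumption

# Likelihood p(x | theta) of the whole sequence.
likelihood = theta**n_heads * (1 - theta)**(n - n_heads)

# Posterior p(theta | x) is proportional to p(x | theta) p(theta);
# normalize it on the grid.
posterior = likelihood * prior
posterior /= posterior.sum() * d_theta

theta_mle = theta[np.argmax(likelihood)]          # maximizes p(x | theta)
theta_map = theta[np.argmax(likelihood * prior)]  # maximizes p(x | theta) p(theta)

# Posterior predictive: p(x_new = heads | x) = integral of theta * p(theta | x).
p_heads = (theta * posterior).sum() * d_theta

print(f"MLE: {theta_mle:.3f}  MAP: {theta_map:.3f}  p(heads | x): {p_heads:.3f}")
```

With a uniform prior the MLE and MAP coincide (0.7 here), while the predictive probability is pulled toward 1/2 by averaging over the full posterior: Laplace's rule gives (7+1)/(10+2) ≈ 0.667.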

  11. De Finetti's theorem
A premise. Definition: an infinite sequence of random variables $(x_1, x_2, \ldots)$ is said to be (infinitely) exchangeable if, for every $N$ and every permutation $\pi$ of $(1, \ldots, N)$,
\[ p(x_1, x_2, \ldots, x_N) = p(x_{\pi(1)}, x_{\pi(2)}, \ldots, x_{\pi(N)}). \]
Note: exchangeable does not mean i.i.d.!
Example (Polya urn): an urn contains some red balls and some black balls; an infinite sequence of colors is drawn recursively as follows: draw a ball, mark down its color, then put the ball back in the urn along with an additional ball of the same color.
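
The urn scheme is easy to simulate. Here is a minimal sketch (an illustration, not from the slides): it estimates the probabilities of two color sequences that are permutations of each other, which come out approximately equal, as exchangeability requires, even though successive draws are clearly not independent.

```python
import random
from collections import Counter

def polya_urn(n, red=1, black=1):
    """Draw n colors from a Polya urn starting with `red` red and `black`
    black balls; each drawn ball is replaced together with one more ball
    of the same color."""
    seq = []
    for _ in range(n):
        color = 'R' if random.random() < red / (red + black) else 'B'
        if color == 'R':
            red += 1
        else:
            black += 1
        seq.append(color)
    return tuple(seq)

# 'RBB' and 'BBR' are permutations of each other, so under exchangeability
# they must have the same probability (exactly 1/12 for a 1-red/1-black urn).
counts = Counter(polya_urn(3) for _ in range(200_000))
print(counts[('R', 'B', 'B')] / 200_000)  # approx. 0.083
print(counts[('B', 'B', 'R')] / 200_000)  # approx. 0.083
```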

  13. De Finetti's theorem (cont'd)
Theorem (De Finetti, 1935; a.k.a. the representation theorem): a sequence of random variables $(x_1, x_2, \ldots)$ is infinitely exchangeable if and only if, for all $N$, there exist a random variable $\theta$ and a probability measure $p$ on it such that
\[ p(x_1, x_2, \ldots, x_N) = \int_\Theta p(\theta) \prod_{i=1}^{N} p(x_i \mid \theta) \, d\theta, \]
i.e., there exists a parameter space and a measure on it that make the variables i.i.d.! The representation theorem motivates (and encourages!) the use of Bayesian statistics.
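
A worked instance tying this back to the urn (a standard result, though not spelled out on the slide): for the Polya urn started with one red and one black ball, the de Finetti mixing measure is uniform on $[0, 1]$, i.e. the color sequence is a $\mathrm{Beta}(1, 1)$ mixture of i.i.d. Bernoulli draws.

```latex
% Any urn sequence with n_R reds among N draws has the same probability,
% and it matches the representation theorem with a uniform prior on theta:
\[
  p(x_1, \ldots, x_N)
  = \int_0^1 \theta^{\,n_R} (1 - \theta)^{N - n_R} \, d\theta
  = \frac{n_R! \, (N - n_R)!}{(N + 1)!}.
\]
% Sanity check: for the sequence RBB (n_R = 1, N = 3) this gives
% 1! * 2! / 4! = 1/12, the same value the urn computation yields directly:
% 1/2 * 1/3 * 1/2 = 1/12.
```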

  15. Bayesian learning
Hypothesis space $\mathcal{H}$. Given data $D$, compute
\[ p(h \mid D) = \frac{p(D \mid h)\, p(h)}{p(D)}. \]
Then we typically want to predict some future data $D'$, by either:
- averaging over $\mathcal{H}$, i.e. $p(D' \mid D) = \int_{\mathcal{H}} p(D' \mid h)\, p(h \mid D) \, dh$;
- choosing the MAP $h$ (or computing it directly), i.e. $p(D' \mid D) = p(D' \mid h_{\mathrm{MAP}})$;
- sampling from the posterior; ...
$\mathcal{H}$ can be anything: Bayesian learning is a general learning framework. We will consider the case in which $h$ is a probabilistic model itself, i.e. a parameter vector $\theta$.

  16. A simple example
Infer the bias $\theta \in [0, 1]$ of a coin after observing $N$ tosses. $H = 1$, $T = 0$, $p(H) = \theta$; $h = \theta$, hence $\mathcal{H} = [0, 1]$.
Sequence of Bernoulli trials:
\[ p(x_1, \ldots, x_N \mid \theta) = \theta^{n_H} (1 - \theta)^{N - n_H}, \]
where $n_H$ is the number of heads. With $\theta$ unknown:
\[ p(x_1, \ldots, x_N) = \int_0^1 \theta^{n_H} (1 - \theta)^{N - n_H} \, p(\theta) \, d\theta. \]
We need to find a "good" prior $p(\theta)$... the Beta distribution! (Figure: graphical model with $\theta$ as parent of $x_1, \ldots, x_N$.)

  17. A simple example (cont'd)
Beta distribution: $\theta \sim \mathrm{Beta}(a, b)$,
\[ p(\theta \mid a, b) = \frac{1}{B(a, b)} \, \theta^{a-1} (1 - \theta)^{b-1}. \]
Bayesian learning: $p(h \mid D) \propto p(D \mid h)\, p(h)$; for us:
\[ p(\theta \mid x_1, \ldots, x_N) \propto p(x_1, \ldots, x_N \mid \theta)\, p(\theta) = \theta^{n_H} (1 - \theta)^{n_T} \cdot \frac{1}{B(a, b)} \, \theta^{a-1} (1 - \theta)^{b-1} \propto \theta^{n_H + a - 1} (1 - \theta)^{n_T + b - 1}, \]
i.e. $\theta \mid x_1, \ldots, x_N \sim \mathrm{Beta}(a + n_H, b + n_T)$.
We're lucky! The Beta distribution is a conjugate prior to the binomial distribution. (Figure: Beta(0.1, 0.1), Beta(1, 1), Beta(2, 3), and Beta(10, 10) densities.)
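
A minimal sketch of this conjugate update (the hyperparameters and counts are made up for illustration): the posterior after $n_H$ heads and $n_T$ tails is $\mathrm{Beta}(a + n_H, b + n_T)$, which we can verify against a brute-force grid normalization of likelihood times prior.

```python
import numpy as np
from scipy.stats import beta

a, b = 2.0, 3.0          # Beta(a, b) prior hyperparameters (illustrative)
n_heads, n_tails = 7, 3  # observed toss counts (illustrative)

# Conjugate update: theta | x_1..x_N ~ Beta(a + n_H, b + n_T).
posterior = beta(a + n_heads, b + n_tails)

# Brute-force check: normalize likelihood * prior on a grid.
theta = np.linspace(0.001, 0.999, 999)
unnorm = theta**n_heads * (1 - theta)**n_tails * beta(a, b).pdf(theta)
grid_pdf = unnorm / (unnorm.sum() * (theta[1] - theta[0]))

# The two densities agree up to grid discretization error.
print(np.max(np.abs(grid_pdf - posterior.pdf(theta))))
# Posterior mean: (a + n_H) / (a + b + n_H + n_T) = 9/15 = 0.6.
print(posterior.mean())
```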
