Flexible Priors for Deep Hierarchies
Jacob Steinhardt
Wednesday, November 9, 2011
Hierarchical Modeling
• many data are well-modeled by an underlying tree
[Figure: a tree over 44 Indo-European languages, with leaves labeled by family: Celtic (Irish, Gaelic, Welsh, Cornish, Breton), Iranian (Tajik, Persian, Kurdish, Pashto, Ossetic), Germanic (German, Dutch, English, Icelandic, Swedish, Norwegian, Danish), Romance (French, Spanish, Romanian, Portuguese, Italian, Catalan), Slavic (Bulgarian, Polish, Slovene, Serbian-Croatian, Ukrainian, Russian, Czech), Baltic (Lithuanian, Latvian), Indic (Panjabi, Hindi, Kashmiri, Sinhala, Nepali, Maithili, Marathi, Bengali), plus Greek, Albanian, and Armenian]
Hierarchical Modeling
• advantages of hierarchical modeling:
  • captures both broad and specific trends
  • facilitates transfer learning
• issues:
  • the underlying tree may not be known
  • predictions in deep hierarchies can be strongly influenced by the prior
Learning the Tree
• major approaches for choosing a tree:
  • agglomerative clustering
  • Bayesian methods (place a prior over trees)
  • stochastic branching processes
  • nested random partitions
Agglomerative Clustering
• start with each datum in its own subtree
• iteratively merge subtrees based on a similarity metric (see the sketch below)
• issues:
  • can't add new data incrementally
  • can't form hierarchies over latent parameters
  • difficult to incorporate structured domain knowledge
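A minimal sketch of the merge procedure using scipy's `linkage` (the toy data and the choice of average linkage are illustrative, not from the slides):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# toy data: 10 random 2-D points (illustrative only)
rng = np.random.default_rng(0)
points = rng.normal(size=(10, 2))

# start with each datum in its own subtree, then repeatedly merge the
# closest pair of subtrees under average-linkage distance
merge_tree = linkage(points, method="average")
print(merge_tree)  # each row: indices merged, merge distance, new subtree size
```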
Stochastic Branching Processes
• fully Bayesian model
• data start at the top of the tree and branch off according to an arrival process (Dirichlet diffusion trees)
• can also start at the bottom and merge (Kingman coalescents)
Stochastic Branching Processes
• many nice properties:
  • infinitely exchangeable
  • complexity of the tree grows with the data
• issues:
  • latent parameters must undergo a continuous-time diffusion process
  • unclear how to construct such a process for models over discrete data
Random Partitions
• stick-breaking process: a way to partition the unit interval into countably many masses π_1, π_2, ...
  • draw β_k ~ Beta(1, γ)
  • let π_k = β_k (1 − β_1) ··· (1 − β_{k−1})
• the distribution over the π_k is called a Dirichlet process
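A minimal sketch of a truncated stick-breaking draw (the function name and the finite truncation level are my own; an exact draw has countably many sticks):

```python
import numpy as np

def stick_breaking(gamma, num_sticks, rng=None):
    """Truncated stick-breaking: beta_k ~ Beta(1, gamma) and
    pi_k = beta_k * (1 - beta_1) * ... * (1 - beta_{k-1})."""
    rng = np.random.default_rng() if rng is None else rng
    betas = rng.beta(1.0, gamma, size=num_sticks)
    # mass remaining before stick k is prod_{j<k} (1 - beta_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

pi = stick_breaking(gamma=1.0, num_sticks=1000)
print(pi[:5], pi.sum())  # masses sum to just under 1 at any finite truncation
```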
Random Partitions
• suppose {π_k}, k = 1, ..., ∞, are drawn from a Dirichlet process
• for n = 1, ..., N, let X_n ~ Multinomial({π_k})
• this induces a distribution over partitions of {1, ..., N}
• given a partition of {1, ..., N}, add X_{N+1} to a part of size s with probability s/(N+γ) and to a new part with probability γ/(N+γ)
• this is the Chinese restaurant process
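A sketch of the sequential (Chinese restaurant) view of this partition, with illustrative names:

```python
import numpy as np

def crp(num_customers, gamma, rng=None):
    """Sequential CRP: the (n+1)-st customer joins an existing part of
    size s with probability s/(n + gamma), or starts a new part with
    probability gamma/(n + gamma)."""
    rng = np.random.default_rng() if rng is None else rng
    sizes = []        # sizes of existing parts
    labels = []       # part assignment of each customer
    for n in range(num_customers):
        probs = np.array(sizes + [gamma], dtype=float) / (n + gamma)
        k = rng.choice(len(probs), p=probs)
        if k == len(sizes):
            sizes.append(1)   # start a new part
        else:
            sizes[k] += 1
        labels.append(k)
    return labels

print(crp(20, gamma=1.0))
```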
Nested Random Partitions
• a tree is equivalent to a collection of nested partitions
• random tree <=> nested random partitions
• the partition at each node is given by a Chinese restaurant process
• issue: when to stop recursing? (see the depth-truncated sketch below)
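A depth-truncated sketch of this construction (names are mine; truncation sidesteps rather than answers the stopping question): each datum follows a root-to-leaf path, choosing a branch at every node via an independent CRP over the data that reached that node.

```python
import numpy as np

def ncrp_paths(num_data, max_depth, gamma, rng=None):
    """Depth-truncated nested CRP: each datum gets a root-to-leaf path;
    the partition at each node is an independent CRP."""
    rng = np.random.default_rng() if rng is None else rng
    paths = [[] for _ in range(num_data)]
    groups = [list(range(num_data))]          # data indices per node, current level
    for _ in range(max_depth):
        next_groups = []
        for group in groups:
            children = []                     # member lists, one per branch
            for n, idx in enumerate(group):
                sizes = [len(c) for c in children]
                probs = np.array(sizes + [gamma]) / (n + gamma)
                k = rng.choice(len(probs), p=probs)
                if k == len(children):
                    children.append([])       # open a new branch
                children[k].append(idx)
                paths[idx].append(k)
            next_groups.extend(children)
        groups = next_groups
    return paths

print(ncrp_paths(num_data=8, max_depth=3, gamma=1.0))
```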
Martingale Property
• martingale property: E[f(θ_child) | θ_parent] = f(θ_parent)
• implies E[f(θ_v) | θ_u] = f(θ_u) for any ancestor u of v (by iterating expectations)
• says that learning about a child does not change our beliefs in expectation
Doob's Theorem
• let θ_1, θ_2, ... be a sequence of random variables such that E[f(θ_{n+1}) | θ_n] = f(θ_n) and sup_n E[|f(θ_n)|] < ∞
• then lim_{n→∞} f(θ_n) exists with probability 1
• intuition: each new random variable reveals more information about f(θ) until it is completely determined
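A quick illustration of the theorem outside the tree setting (a standard textbook example, not from the slides): the fraction of red balls in a Pólya urn is a bounded martingale, so Doob's theorem says it converges almost surely to a (random) limit.

```python
import numpy as np

def polya_urn_fractions(steps, rng=None):
    """Polya urn: draw a ball, return it plus one more of the same color.
    The red fraction is a bounded martingale, hence converges a.s."""
    rng = np.random.default_rng() if rng is None else rng
    red, total = 1, 2                 # start with one red, one black ball
    fractions = []
    for _ in range(steps):
        if rng.random() < red / total:
            red += 1
        total += 1
        fractions.append(red / total)
    return fractions

f = polya_urn_fractions(100000)
print(f[99], f[9999], f[-1])          # later values barely move
```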
Doob's Theorem
• use Doob's theorem to build an infinitely deep hierarchy
• data are associated with infinite paths v_1, v_2, ... down the tree
• each datum is drawn from a distribution parameterized by lim_n f(θ_{v_n})
Doob's Theorem
• all data have infinite depth
• can think of the effective depth of a datum as the first point at which it lies in its own unique subtree (computed in the sketch below)
• the effective depth is O(log N)
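A sketch of computing effective depth from sampled paths (`effective_depths` is a hypothetical helper; the paths are hard-coded for illustration):

```python
def effective_depths(paths):
    """Effective depth of each datum: the first level at which its
    path prefix is shared with no other datum."""
    depths = []
    for path in paths:
        depth = None                  # None: never unique within this truncation
        for d in range(1, len(path) + 1):
            prefix = path[:d]
            if sum(1 for p in paths if p[:d] == prefix) == 1:
                depth = d
                break
        depths.append(depth)
    return depths

paths = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
print(effective_depths(paths))        # [3, 3, 2, 1]
```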
Letting the Complexity Grow with the Data
[Figure: maximum depth and average depth as a function of the number of data points (0 to 3000), comparing the nCRP against tree-structured stick-breaking (TSSB) processes with several hyperparameter settings: TSSB-10-0.5, TSSB-20-1.0, TSSB-50-0.5, TSSB-100-0.8]
Hierarchical Beta Processes
• θ_v lies in [0,1]^D
• θ_{v,d} | θ_{p(v),d} ~ Beta(c θ_{p(v),d}, c(1 − θ_{p(v),d}))
• martingale property holds for f(θ_v) = θ_v
• let θ denote the limit along an infinite path
• X_d | θ_d ~ Bernoulli(θ_d)
• note that X_d | θ_{v,d} ~ Bernoulli(θ_{v,d}) as well
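A sketch of sampling θ down a single root-to-leaf path (names are mine; the clipping is purely to keep the Beta parameters numerically valid, foreshadowing the convergence issue on the next slide):

```python
import numpy as np

def hbp_path(theta_root, c, depth, rng=None):
    """theta_child ~ Beta(c * theta_parent, c * (1 - theta_parent)),
    coordinate-wise; E[theta_child | theta_parent] = theta_parent,
    so f(theta) = theta is a martingale down the path."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta_root, dtype=float)
    for _ in range(depth):
        # clip to keep Beta parameters strictly positive despite underflow
        theta = np.clip(theta, 1e-12, 1 - 1e-12)
        theta = rng.beta(c * theta, c * (1.0 - theta))
    return theta

print(hbp_path(theta_root=np.full(5, 0.5), c=2.0, depth=30))
# coordinates are typically pinned near 0 or 1 by depth 30
```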
Priors for Deep Hierarchies
• for the HBP, θ_{v,d} converges to 0 or 1 as depth increases
• rate of convergence: a tower of exponentials, e^(e^(e^(···)))
• numerically problematic, and philosophically troubling
Priors for Deep Hierarchies
• inverse Wishart time series: Σ_{n+1} | Σ_n ~ InvWishart(Σ_n)
• converges to 0 with probability 1
• becomes singular to numerical precision
• rate again given by a tower of exponentials
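A sketch of this chain (an assumption on my part: the scale is set to (ν − p − 1)Σ_n so that E[Σ_{n+1} | Σ_n] = Σ_n, since the slide's InvWishart(Σ_n) leaves the parameterization implicit; requires scipy):

```python
import numpy as np
from scipy.stats import invwishart

def invwishart_chain(sigma0, df, steps, rng=None):
    """Iterate Sigma_{n+1} ~ InvWishart(df, (df - p - 1) * Sigma_n),
    whose mean is Sigma_n, making the chain a matrix martingale."""
    rng = np.random.default_rng() if rng is None else rng
    p = sigma0.shape[0]
    assert df > p + 1, "the inverse Wishart mean exists only for df > p + 1"
    sigma, dets = sigma0, [np.linalg.det(sigma0)]
    for _ in range(steps):
        sigma = invwishart.rvs(df=df, scale=(df - p - 1) * sigma, random_state=rng)
        dets.append(np.linalg.det(sigma))
    return sigma, dets

sigma, dets = invwishart_chain(np.eye(3), df=10, steps=50)
print(dets[::10])  # determinants typically collapse toward 0 down the chain
```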