Truncated Random Measures


Jonathan Huggins, MIT CSAIL and Dept. of EECS
with: T. Campbell, J. How, T. Broderick

What leads to a statistical method being used for science?


  2. Inference in BNP models
 • Option #1: Integrate out the parameter (CRP, IBP, etc.)
   issues: care about the parameters, using approximations (HMC/VB), distributed computation
 • Option #2: use a finite approximation...
   with e.g. variational inference, HMC [Blei 06; Neal 10]
 Problem: Wide variety of priors in BNP with no finite approximation
 [Diagram: within all BNP priors, previously studied priors with finite approx (past work) and priors with finite approx (new)]
 Contributions:
 ● 2 representation forms (7 reps total) that allow finite approximation of (normalized) completely random measures ((N)CRMs)
 ● Approximation error analysis
 ● Computational complexity analysis (not in this talk)

  6. Past work: finite approximations to BNP priors
 Prior     Finite          Approximation   Computational   Citations
           Approximation   Error Bounds    Complexity
 DP        ✓               ✓               ✓               [Sethuraman 94] [Ishwaran 01] [Roychowdhury 15]
 BP        ✓               ✓               ✓               [Teh 07] [Doshi-Velez 09] [Paisley 12] [Paisley 12] [Thibaux 07]
 BPP       ✓                                               [Broderick 14]
 ΓP        ✓               ✓               ✓               [Bondesson 82] [Roychowdhury 15] [Roychowdhury 15]
 (N)CRM    ✓                                               [Broderick 14]
 Sparse results for a few priors in BNP
 No general theory

  11. Truncation Roadmap
 • Tractable models in BNP
 • Two forms for sequential representations
 • Truncation and error analysis

  22. The Standard Model in BNP (By Example)
 [Figure: documents and their word counts per topic. Horizontal axis: topic space (sports, food, politics, …). Vertical axis: frequency space, with topic frequencies 0.7, 0.5, 0.2, …; the highlighted atom is sports with frequency 0.7.]
   Doc 1 (532 words): 343, 189
   Doc 2 (210 words): 210
   Doc 3 (854 words): 854
   Doc 4 (926 words): 342, 584
   …
 Θ is a random discrete measure on the topics

  24. The Standard Model in BNP (By Example)
 [Figure: the same picture with topics generalized to traits. Horizontal axis: trait space, with “traits” ψ_1, ψ_2, ψ_3, …. Vertical axis: rate space, with “rates” θ_1, θ_2, θ_3, ….]
   Obs 1: 343, 189
   Obs 2: 210
   Obs 3: 854
   Obs 4: 342, 584
   …
 Θ is a random discrete measure on the traits (in the example above, the topics)
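
In symbols, a discrete measure that places rate θ_k at trait ψ_k can be written as a weighted sum of point masses. This is the standard notation rather than a formula preserved in the transcript; δ denotes a point mass and A is any set of traits.

```latex
\Theta \;=\; \sum_{k=1}^{\infty} \theta_k \, \delta_{\psi_k},
\qquad
\Theta(A) \;=\; \sum_{k \,:\, \psi_k \in A} \theta_k .
```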

  30. Poisson processes and (N)CRMs
 How do we generate infinitely many trait/rate points (ψ, θ)?
 Poisson point process with measure ν(dθ × dψ) [Kingman 93]
 [Figure: points of Θ in the plane. Horizontal axis: trait space. Vertical axis: rate space.]
 Θ: completely random measure (CRM) (e.g. BP, ΓP)
 Normalize rates: normalized CRM (NCRM) (e.g. DP)
 Captures a large class of useful priors in BNP
 How do we pick a finite subset of the points?

  32. Truncation Roadmap
 • Tractable models in BNP
 • Two forms for sequential representations
 • Truncation and error analysis

  44. Sequential representation & truncation
 We pick a finite subset of atoms (ψ, θ) by:
 1) ordering the atoms (sequential representation)
 2) removing any atoms beyond the K-th (truncation)
 [Figure: atoms of Θ labeled 1, 2, 3, 4, …, K. Horizontal axis: trait space. Vertical axis: rate space.]
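
The two steps above can be sketched in a few lines. This is only an illustration: the atoms are made-up values, the trait labels are hypothetical, and ordering by decreasing rate is just one possible choice of sequential ordering, not necessarily the one used in the talk.

```python
import numpy as np

def order_by_decreasing_rate(traits, rates):
    """Step 1 (one possible ordering, assumed for this sketch): largest rate first."""
    traits = np.asarray(traits)
    rates = np.asarray(rates, dtype=float)
    idx = np.argsort(-rates, kind="stable")
    return traits[idx], rates[idx]

def truncate_atoms(traits, rates, K):
    """Step 2: keep the first K atoms of an already-ordered sequence."""
    return np.asarray(traits)[:K], np.asarray(rates, dtype=float)[:K]

# Hypothetical atoms (trait label, rate); not taken from the slides.
traits = ["a", "b", "c", "d"]
rates = [0.2, 0.7, 0.05, 0.5]

ordered_traits, ordered_rates = order_by_decreasing_rate(traits, rates)
top_traits, top_rates = truncate_atoms(ordered_traits, ordered_rates, K=2)
print(list(top_traits), list(top_rates))
```

The truncated measure keeps only the mass of the retained atoms, so the discarded mass is `sum(rates) - sum(top_rates)`.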

  48. Ordering of (N)CRM atoms
 We describe 2 forms for sequential representations:
 • Series representation: function of a homogeneous Poisson point process (4 versions)
 • Superposition representation: infinite sum of homogeneous CRMs, each with finite # of atoms (3 versions)
 Theorem (H., Campbell, How, Broderick).
 Can generate (N)CRMs using all 7 sequential representations

  51. Sequential representation comparison
 Why so many representations? They’re all useful in different circumstances
 Series Reps: B-Rep, IL-Rep, R-Rep, T-Rep
 Superposition Reps: DB-Rep, PL-Rep, SB-Rep
 [Table: the seven representations compared on four criteria; marks listed in the order they appear in the source]
   Error Bound Decay: ✓ ✓ ✓/✗ ✓ ✓ ✗ ✗, with four entries marked (exp)
   Ease of Analysis: ✓ ✓ ✓ ✗ ✗✗ ✗ ✗
   Generality: ✓ ✓ ✓ ✓ ✓ ✓ ✓
   Known # Atoms: ✓ ✓ ✗ ✗ ✗ ✗ ✗

  58. Sequential representation example
 Given Gamma process:
 Step 1: compute
 Step 2: compute
   Exponential(λ) density!
 Step 3: plug in!
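
The formulas for this example are not preserved in the transcript, so the following is a sketch under stated assumptions rather than the slide's own derivation. It assumes the gamma process has rate measure ν(dθ) = γ λ θ⁻¹ e^(−λθ) dθ and uses a Bondesson-type series, in which the k-th rate is an exponential draw scaled by e^(−Γ_k/c), where Γ_k are the arrival times of a unit-rate Poisson process and c = γλ. The names `gamma`, `lam`, the helper functions, and the uniform base distribution for the traits are all hypothetical choices for illustration.

```python
import numpy as np

def truncated_gamma_process(gamma, lam, K, rng):
    """First K atoms of a Bondesson-type series for a gamma process (assumed form)."""
    c = gamma * lam                                      # assumed: c = lim_{theta -> 0} theta * nu(theta)
    arrivals = np.cumsum(rng.exponential(1.0, size=K))   # Gamma_1 < ... < Gamma_K, unit-rate Poisson process
    v = rng.exponential(1.0 / lam, size=K)               # V_k ~ Exponential(rate lam); NumPy takes scale = 1/rate
    rates = v * np.exp(-arrivals / c)                    # theta_k = V_k * exp(-Gamma_k / c)
    traits = rng.uniform(0.0, 1.0, size=K)               # hypothetical base distribution on [0, 1]
    return traits, rates

def expected_truncated_mass(gamma, lam, K):
    """E[sum of the first K rates] under the assumed series: gamma * (1 - (c / (1 + c))**K)."""
    c = gamma * lam
    return gamma * (1.0 - (c / (1.0 + c)) ** K)

rng = np.random.default_rng(0)
traits, rates = truncated_gamma_process(gamma=2.0, lam=1.0, K=20, rng=rng)
print(rates.sum(), expected_truncated_mass(2.0, 1.0, 20))
```

Under these assumptions the expected mass missed by truncating at level K is γ (c / (1 + c))^K, which shrinks geometrically in K.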

  60. Truncation Roadmap
 • Tractable models in BNP
 • Two forms for sequential representations
 • Truncation and error analysis

  69. Choosing between the seven representations
 How close is our finite approximation?
 Truncation error:
 [Diagram: truncated Θ_K → generated data; full infinite Θ → generated data. Figure label: ε]
 Compare the distribution of the data under full vs. truncated
 Depends on number of observations N and truncation level K
 • As N gets larger, error increases
 • As K gets larger, error decreases
 Cannot evaluate exactly, so we develop new upper bounds

  70. Protobound
 Leads to all the other truncation error bounds in this work
 Lemma (H., Campbell, How, Broderick). The truncation error
 i.e. P(whoops!)
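
The lemma's formula is not preserved in the transcript. One reading of "P(whoops!)" is the probability that at least one observation uses an atom that truncation discarded. The toy sketch below computes that probability for a fixed, already-ordered set of rates, assuming each observation draws an independent Poisson(θ_k) count on every atom. The likelihood, the rates, and the function name `prob_whoops` are assumptions for illustration, and the quantity is conditional on the rates rather than averaged over the prior.

```python
import numpy as np

def prob_whoops(rates, K, N):
    """P(at least one of N observations has a nonzero count on an atom beyond the K-th).

    Assumes independent Poisson(theta_k) counts, so the chance that a single
    observation avoids every discarded atom is exp(-sum of discarded rates).
    """
    tail_mass = float(np.sum(np.asarray(rates, dtype=float)[K:]))
    return 1.0 - np.exp(-N * tail_mass)

# Hypothetical, already-ordered rates theta_k = 2**(-k), k = 1..40.
rates = 0.5 ** np.arange(1, 41)

for K in (5, 10):
    for N in (10, 100):
        print(K, N, prob_whoops(rates, K, N))
```

In this toy setting the probability grows with the number of observations N and shrinks with the truncation level K, matching the qualitative behaviour described for the truncation error.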
