  1. Exploiting compositionality to explore a large space of model structures. R. Grosse, R. Salakhutdinov, W. Freeman, & J. Tenenbaum. Best Student Paper at UAI 2012. Tea talk by Jan Gasthaus, 31 August 2012.

  2. Motivation
     Goal: Given a data set, determine the right model to use for that data set.
     Ideal approach:
     ◮ Implement all models ever published
     ◮ Fit them to the data set
     ◮ Compare them using some model selection criterion and pick the best
     This is mainly a computational problem. Proposed solution:
     ◮ Pick a rich class of models: matrix decomposition models
     ◮ Fit more complex models by re-using computations from simpler ones
     ◮ Use an approximate model selection criterion
     ◮ Explore the space of structures with a greedy heuristic that exploits compositionality

  3. In A Nutshell
     Grammar over generative models for matrix factorization:
     ◮ Express models as algebraic expressions such as MG + G
     ◮ Devise a CFG that generates these expressions, with rules like G → GG + G
     Search over model structures greedily by applying the production rules, scoring candidates with an approximate lower bound on the model score.
     Initialize the sampler in each model using a specialized algorithm for each production rule.
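As an illustration of the expression-rewriting idea (a minimal sketch, not the authors' code; the tuple representation and the second rule below are my own placeholders, only G → GG + G is taken from the slide):

    # Structures are nested tuples over component symbols; production rules
    # rewrite a G leaf into a more structured expression.
    G = ('G',)

    PRODUCTION_RULES = [
        lambda: ('+', ('*', G, G), G),          # low-rank: G -> GG + G (from the slide)
        lambda: ('+', ('*', ('M',), G), G),     # hypothetical clustering-style rule: G -> MG + G
    ]

    def expansions(expr):
        """Yield every structure obtained by rewriting exactly one G leaf."""
        if expr == G:
            for rule in PRODUCTION_RULES:
                yield rule()
            return
        if expr[0] in ('+', '*'):
            op, left, right = expr
            for new_left in expansions(left):
                yield (op, new_left, right)
            for new_right in expansions(right):
                yield (op, left, new_right)

    def show(expr):
        """Render a structure as an algebraic expression string."""
        if len(expr) == 1:
            return expr[0]
        op, a, b = expr
        return show(a) + show(b) if op == '*' else '(' + show(a) + ' + ' + show(b) + ')'

    for child in expansions(G):
        print(show(child))                      # (GG + G), (MG + G)

Applying expansions again to (GG + G) rewrites any of its three G leaves, which is how the space of structures grows compositionally.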

  4. Components

  5. Grammar

  6. Models
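The models table itself is a slide figure and is not reproduced in this transcript. As an illustration of how such an expression denotes a generative model, here is a sketch that samples data from MG + G, reading M as a one-hot cluster-assignment matrix and the two Gs as Gaussian matrices (a mixture-of-Gaussians reading; this interpretation of M is my assumption, consistent with the expression on the In A Nutshell slide):

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_mg_plus_g(n=200, k=4, d=2, noise=0.1):
        """Sample data from the expression M G + G: M assigns each row to a
        cluster, the first G holds cluster centers, the trailing + G is noise."""
        centers = 3.0 * rng.standard_normal((k, d))       # G: cluster centers
        M = np.eye(k)[rng.integers(0, k, size=n)]         # M: one-hot assignments
        return M @ centers + noise * rng.standard_normal((n, d))

    X = sample_mg_plus_g()        # shape (200, 2), four Gaussian clusters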

  7. Inference: Individual Models
     Initialize the state using a one-shot algorithm for each rule application.
     The latent dimensionality is determined during initialization using Bayesian nonparametric (BNP) methods.
     Then run a simple Gibbs sampler (no details provided).
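The slide does not spell out the per-rule algorithms. As an example of the kind of one-shot initialization presumably meant (my assumption, with a fixed latent dimensionality rather than the BNP-based choice), the low-rank rule G → GG + G can be initialized from a truncated SVD of the current matrix, with the residual becoming the new noise term:

    import numpy as np

    def init_low_rank(X, k):
        """One-shot initialization for G -> GG + G: factor X ~ U V with a
        rank-k SVD; the residual initializes the additive Gaussian term."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        U_k = U[:, :k] * np.sqrt(s[:k])          # left factor
        V_k = np.sqrt(s[:k])[:, None] * Vt[:k]   # right factor
        return U_k, V_k, X - U_k @ V_k           # factors and residual

    # X = np.random.randn(100, 50); U, V, E = init_low_rank(X, k=5)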

  8. Initialization

  9. Scoring Candidate Structures
     Criterion used: predictive likelihood of held-out rows and columns.
     ◮ Marginal likelihood is not feasible
     ◮ MSE is not selective enough
     Use a (stochastic) lower bound on the predictive likelihood, computed using a variational approximation combined with annealed importance sampling (this is about as much detail as the paper gives).
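To make the idea of a stochastic lower bound concrete, here is a toy annealed-importance-sampling estimate of a log marginal likelihood for a scalar Gaussian model; by Jensen's inequality the log of the averaged weights is, in expectation, a lower bound. This toy model and all names are illustrative, not the paper's variational scheme:

    import numpy as np

    rng = np.random.default_rng(0)

    def log_prior(z):                 # z ~ N(0, 1)
        return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

    def log_lik(z, x, sigma=0.5):     # x | z ~ N(z, sigma^2)
        return -0.5 * ((x - z) / sigma)**2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

    def ais_log_marginal(x, n_temps=200, n_chains=100, step=0.5):
        """Stochastic lower bound on log p(x) via annealed importance sampling."""
        betas = np.linspace(0.0, 1.0, n_temps)
        z = rng.standard_normal(n_chains)              # start from the prior (beta = 0)
        log_w = np.zeros(n_chains)
        for b_prev, b in zip(betas[:-1], betas[1:]):
            log_w += (b - b_prev) * log_lik(z, x)      # importance-weight update
            # one Metropolis step leaving p(z) p(x|z)^b invariant
            prop = z + step * rng.standard_normal(n_chains)
            log_acc = log_prior(prop) + b * log_lik(prop, x) \
                      - log_prior(z) - b * log_lik(z, x)
            accept = np.log(rng.random(n_chains)) < log_acc
            z = np.where(accept, prop, z)
        m = log_w.max()                                # log-mean-exp of the weights
        return m + np.log(np.mean(np.exp(log_w - m)))

    # exact answer for this toy model: log N(x; 0, 1 + 0.5^2)
    x = 1.3
    print(ais_log_marginal(x),
          -0.5 * x**2 / 1.25 - 0.5 * np.log(2 * np.pi * 1.25))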

  10. Search Over Structures
     Greedy search following the grammar:
     1. Start with G
     2. Expand using all possible rules
     3. Fit & score the models
     4. Keep the top K models
     5. Go to step 2
     This assumes that good simple models lead to good more complex models when refined.
     The assumption seems to be warranted: K = 3 yields the same results as K = 1 in the experiments.
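A minimal sketch of this search loop, assuming an expansions(structure) generator like the one sketched earlier and a fit_and_score(structure, data) routine returning the (approximate) predictive score; both names are placeholders, not the authors' API:

    def greedy_structure_search(data, expansions, fit_and_score, start, K=3, depth=3):
        """Level-wise greedy search: expand every kept structure with all rules,
        fit and score the children, keep the top K, and stop when nothing improves."""
        frontier = [(fit_and_score(start, data), start)]
        best_score, best = frontier[0]
        for _ in range(depth):
            children = [(fit_and_score(c, data), c)
                        for _, s in frontier for c in expansions(s)]
            if not children:
                break
            children.sort(key=lambda sc: sc[0], reverse=True)
            frontier = children[:K]
            if frontier[0][0] <= best_score:
                break                                  # no refinement helped
            best_score, best = frontier[0]
        return best, best_score

With K = 1 this is pure greedy search; per the slide, K = 3 gave the same results as K = 1 in the experiments.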

  11. Results on Synthetic Data

  12-14. Results on Real Data

  15. Computing Predictive Likelihood
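The formula on this slide is not reproduced in the transcript. As an illustration of what a predictive likelihood of a held-out row can look like for the simplest low-rank structure GG + G (this derivation and the function names are mine, not necessarily the paper's scheme): integrating out the new row's latent coefficients with a N(0, I) prior gives x_new ~ N(0, V^T V + sigma^2 I), and the density is averaged over posterior samples of V:

    import numpy as np

    def heldout_row_loglik(x_new, V, sigma):
        """log p(x_new | V) for X = U V + noise with rows of U ~ N(0, I):
        marginally x_new ~ N(0, V^T V + sigma^2 I)."""
        d = x_new.shape[0]
        cov = V.T @ V + sigma**2 * np.eye(d)
        _, logdet = np.linalg.slogdet(cov)
        quad = x_new @ np.linalg.solve(cov, x_new)
        return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)

    def predictive_score(x_new, V_samples, sigma):
        """Average the predictive density over posterior samples of V (log-mean-exp)."""
        logs = np.array([heldout_row_loglik(x_new, V, sigma) for V in V_samples])
        m = logs.max()
        return m + np.log(np.mean(np.exp(logs - m)))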
