

  1. Compositional Methods for Learning and Inference in Deep Probabilistic Programs
     Jan-Willem van de Meent, Eli Sennesh, Sam Stites, Hao Wu, Heiko Zimmermann

  2. Deep Learning Success Stories
     Computer Vision: 14M images (ImageNet)
     Natural Language: very large corpora of text
     Reinforcement Learning: 4.9M games (self-play)
     Annotations available (can self-supervise); clear definition of success.
     Ingredients for success:
     1. Abundance of (labeled) data and compute
     2. A well-defined, general notion of utility

  3. Do we still need models? The Bitter Lesson
     Do we still need models, or just more data and compute?
     Rich Sutton (March 13, 2019): "The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin."
     http://www.incompleteideas.net/IncIdeas/BitterLesson.html
     Max Welling (April 20, 2019): "When you need to generalize to new domains, i.e. extrapolate away from the data, you will need a generative model."
     https://staff.fnwi.uva.nl/m.welling/wp-content/uploads/Model-versus-Data-AI-1.pdf

  4. Do we still need models?

  5. When are models useful?
     Science & Engineering: high-quality models and/or limited data
     Autonomous Vehicles: generalization to long-tail events
     Recommendation: a large collection of small-data problems
     We need inductive biases that
     1. Improve generalization
     2. Safeguard against overconfident predictions

  6. Deep Probabilistic Models
     Deep Learning:
     • High-capacity models
     • Scalable to large datasets
     • Easy to try new models
     • SGD + AutoDiff (very general)
     Probabilistic Programming:
     • Programs as inductive biases
     • Structured, interpretable
     • Also easy to try new models
     • Monte Carlo methods (more model-specific)
     Stochastic Variational Inference: learn proposals using neural networks.
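     For reference, the standard stochastic variational inference objective that combines these ingredients (textbook material, not verbatim from the slide): fit model parameters θ and neural proposal parameters φ by maximizing the evidence lower bound with stochastic gradients,

         \mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z \mid x)}\big[ \log p_\theta(x, z) - \log q_\phi(z \mid x) \big] \le \log p_\theta(x)

     SGD + AutoDiff handles the gradient steps, while Monte Carlo sampling from q_\phi supplies the expectation.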

  7. Structured Variational Autoencoders
     Generative Model (Decoder): p_θ(x | y, z) p(y) p(z)
     Inference Model (Encoder): q_φ(y, z | x) q(x)
     Latent variables: digit y and style z.
     Goal: learn a "disentangled" representation for y and z. Assume independence between digit y and style z. Infer y from pixels x, and z from y and x.
     [Kingma, Mohamed, Jimenez Rezende, Welling, NIPS 2014]
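     Written out, the factorization on this slide (following the semi-supervised VAE of Kingma et al.; the split of q_φ matches "infer y from pixels x, and z from y and x"):

         p_\theta(x, y, z) = p_\theta(x \mid y, z)\, p(y)\, p(z), \qquad q_\phi(y, z \mid x) = q_\phi(y \mid x)\, q_\phi(z \mid y, x)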

  8. Deep Probabilistic Programs

     Generative Model (Decoder):

         class Decoder(torch.nn.Module):
             def __init__(self, x_sz, h_sz, y_sz, z_sz):
                 # initializes layers: h, x_mean, ...
                 ...
             def forward(self, x, q):
                 p = probtorch.Trace()
                 y = p.concrete(self.y_log_weights, 0.66,
                                value=q['y'], name='y')
                 z = p.normal(0.0, 1.0,
                              value=q['z'], name='z')
                 h = self.h(torch.cat([y, z], -1))
                 x = p.loss(self.bce, self.x_mean(h), x,
                            name='x')
                 return p

     Inference Model (Encoder):

         class Encoder(torch.nn.Module):
             def __init__(self, x_sz, h_sz, y_sz, z_sz):
                 # initializes layers: h, y_log_weights, ...
                 ...
             def forward(self, x, y_values=None):
                 q = probtorch.Trace()
                 h = self.h(x)
                 y = q.concrete(self.y_log_weights(h), 0.66,
                                value=y_values, name='y')
                 hy = torch.cat([h, y], -1)
                 z = q.normal(self.z_mean(hy), self.z_std(hy),
                              name='z')
                 return q

     Edward: https://github.com/blei-lab/edward
     Probabilistic Torch: https://github.com/probtorch/probtorch
     Pyro: https://github.com/uber/pyro
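     A sketch of how these two modules might be trained together; the ELBO helper path, the data_loader, and all sizes and the learning rate are assumptions for illustration, not code from the slides:

         import torch
         import probtorch
         from probtorch.objectives.montecarlo import elbo  # assumed helper location

         enc = Encoder(x_sz=784, h_sz=256, y_sz=10, z_sz=2)  # illustrative sizes
         dec = Decoder(x_sz=784, h_sz=256, y_sz=10, z_sz=2)
         opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()),
                                lr=1e-3)

         for x, _ in data_loader:   # assumed to yield flattened MNIST batches
             q = enc(x)             # inference model samples y, z into a trace
             p = dec(x, q)          # generative model scored against q's samples
             loss = -elbo(q, p)     # negative Monte Carlo ELBO estimate
             opt.zero_grad()
             loss.backward()
             opt.step()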

  9. Learned Representations (Unsupervised)
     [Figure: learned style variables (slant, width, height, thickness) and generalization across styles 1-3]
     Inductive bias: style features are uncorrelated with the digit label, as well as with other features.
     [Esmaeili, Wu, Jain, Bozkurt, Siddharth, Paige, Brooks, Dy, van de Meent, AISTATS 2019]
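     One standard way to express this inductive bias as an objective term (a generic device; the cited paper derives a related decomposition of the ELBO) is to penalize the total correlation of the aggregate posterior,

         \mathrm{TC} = \mathrm{KL}\!\left( q(y, z_1, \dots, z_D) \,\middle\|\, q(y) \textstyle\prod_{d=1}^{D} q(z_d) \right)

     which is zero exactly when the digit label and all style features are mutually independent under q.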

  10. Model Composition
      [Figure: recurrent recognition loop, decomposition, and reconstructions]
      Idea: embed the model for individual MNIST digits in a recurrent model for multiple-object detection; in the composed architecture, the encoder applies a recurrent network repeatedly, one object per step (see the sketch below).
      [Siddharth*, Paige*, van de Meent*, Desmaison, Wood, Goodman, Kohli, Torr, NIPS 2017]
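      An illustrative sketch of the composition idea (all names and the architecture here are hypothetical, not the paper's code): a per-digit decoder is reused once per object, compositing each digit onto a shared canvas.

          import torch

          class DigitDecoder(torch.nn.Module):
              """Maps one latent code z to one 28x28 digit image."""
              def __init__(self, z_sz=16):
                  super().__init__()
                  self.net = torch.nn.Sequential(
                      torch.nn.Linear(z_sz, 256), torch.nn.ReLU(),
                      torch.nn.Linear(256, 28 * 28), torch.nn.Sigmoid())

              def forward(self, z):
                  return self.net(z).view(-1, 28, 28)

          class MultiDigitDecoder(torch.nn.Module):
              """Reuses a single-digit decoder inside a loop over objects."""
              def __init__(self, digit_decoder):
                  super().__init__()
                  self.digit_decoder = digit_decoder

              def forward(self, zs):              # zs: (batch, n_objects, z_sz)
                  canvas = torch.zeros(zs.size(0), 28, 28)
                  for k in range(zs.size(1)):     # composite one digit per step
                      canvas = torch.clamp(
                          canvas + self.digit_decoder(zs[:, k]), 0, 1)
                  return canvas

      At inference time, a recurrent recognition network would propose one zs[:, k] per step, mirroring the recurrent recognition loop in the figure.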

  11. Example: Modeling Aspects in Reviews
      [Architecture: an Item Encoder (x_i : V → h_i : H → ρ_i : A⨉K), a User Encoder (x_u : V → h_u : H → ρ_u : A⨉K), and a Sentence Encoder/Decoder over x_{i,u,s} : V with hidden state h_{i,u,s} : H, aspect assignment z_{i,u,s} : A (Concrete distribution), and per-sentence quantities ψ_{i,u,s} : A⨉K and ω_{i,u,s} : A; user and item representations are combined via element-wise and broadcast products into ρ_{i,u} : A⨉K]
      Learn aspect-based representations of users, items, and reviews (fully unsupervised).
      [Esmaeili, Huang, Wallace, van de Meent, AISTATS 2019]
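      One relation that is recoverable from the diagram (a hedged reading of the figure, not a statement of the full model): the item and user aspect embeddings appear to be combined by an element-wise product,

          \rho_{i,u} = \rho_i \odot \rho_u, \qquad \rho_i,\ \rho_u,\ \rho_{i,u} \in \mathbb{R}^{A \times K}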

  12. Example: Modeling Aspects in Reviews
      Data: beer reviews. Example: "Amber brown in color with very little head but a nice ring. Nicely carbonated. Smells like a camp fire; malts have a good sweet character with an abundance of smoke. Taste is quite good, with smokiness being pungent but not overwhelming. A sweet-tasting bock with smokiness coming through around mid-drink with a smooth mellow finish. A good warming smoky beer."
      Aspects: Look, Mouthfeel, Aroma, Taste, Overall
      [Esmaeili, Huang, Wallace, van de Meent, AISTATS 2019]
