Meta-Learning with Shared Amortized Variational Inference
Ekaterina Iakovleva (Inria), Jakob Verbeek (Facebook), Karteek Alahari (Inria)
ICML | 2020, Thirty-seventh International Conference on Machine Learning
Standard classification task pipeline
[Figure: diagram of the standard classification pipeline]
Meta-learning classification task pipeline
[Figure: diagram of the meta-learning pipeline, including meta-training tasks and meta-test data]
Schmidhuber 1999; Ravi & Larochelle, ICLR'17
Overview
This work focuses on the empirical Bayes meta-learning approach.
• We propose a novel scheme for amortized variational inference.
• We demonstrate that earlier work based on Monte Carlo approximation underestimates model variance.
• We show the advantage of our approach on miniImageNet and FC100.
Meta-learning classification task definition
• K-shot, N-way classification task.
• Episodic training: each task $t$ is sampled from a distribution over tasks $p(\tau)$.
• Support data: $D^t = \{(x^t_{n,k}, y^t_{n,k})\}_{n,k=1}^{N,K}$
• Query data: $\hat{D}^t = (\hat{X}^t, \hat{Y}^t) = \{(\hat{x}^t_i, \hat{y}^t_i)\}_{i=1}^{\hat{N}}$
A minimal sketch of episodic task sampling follows below.
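Episodic training repeatedly draws such K-shot N-way tasks from a larger labeled dataset. The following is a minimal sketch of episode sampling, assuming a function `sample_episode` and a dict `images_by_class` mapping each class label to a NumPy array of feature vectors; these names and the default of 15 query examples per class are illustrative choices, not details from the paper.

```python
import random
import numpy as np

def sample_episode(images_by_class, n_way=5, k_shot=1, k_query=15):
    # Sample the N classes that define this episode and relabel them 0..N-1.
    classes = random.sample(list(images_by_class.keys()), n_way)
    support_x, support_y, query_x, query_y = [], [], [], []
    for new_label, cls in enumerate(classes):
        feats = images_by_class[cls]
        idx = np.random.permutation(len(feats))[: k_shot + k_query]
        chosen = feats[idx]
        # First K examples per class form the support set D^t ...
        support_x.append(chosen[:k_shot])
        support_y += [new_label] * k_shot
        # ... the remaining examples form the query set \hat{D}^t.
        query_x.append(chosen[k_shot:])
        query_y += [new_label] * k_query
    return (np.concatenate(support_x), np.array(support_y),
            np.concatenate(query_x), np.array(query_y))
```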
Meta-learning approaches
• Distance-based classifiers
  - A learned metric relies on the distance to individual samples or class prototypes (see the sketch after this list).
  - E.g. Prototypical Networks [1], Matching Nets [2].
• Optimization-based approaches
  - The vanilla SGD update is replaced by a trainable update mechanism.
  - E.g. MAML [3], Meta-LSTM [4].
• Latent variable models
  - The model parameters are treated as latent variables.
  - Their variance is explicitly modeled in a Bayesian framework.
  - E.g. Neural Processes [5], VERSA [6].
[1] Snell et al., NeurIPS'17; [2] Vinyals et al., NeurIPS'16; [3] Finn et al., ICML'17; [4] Ravi & Larochelle, ICLR'17; [5] Garnelo et al., ICML'18; [6] Gordon et al., ICLR'19
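To make the distance-based family concrete, here is a small sketch in the spirit of Prototypical Networks [1]: class prototypes are mean support embeddings, and query logits are negative squared Euclidean distances to them. The function name and tensor shapes are illustrative assumptions, not the exact formulation of [1].

```python
import torch

def prototype_logits(support_emb, support_y, query_emb, n_way):
    # Class prototypes: mean embedding of the support examples of each class.
    protos = torch.stack([support_emb[support_y == c].mean(dim=0)
                          for c in range(n_way)])
    # Squared Euclidean distance of each query embedding to each prototype.
    dists = torch.cdist(query_emb, protos) ** 2   # (num_query, n_way)
    return -dists                                 # larger logit = closer prototype
```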
Multi-task generative model
The multi-task graphical model includes:
• task-agnostic parameters $\theta$
• task-specific latent parameters $\{w^t\}_{t=1}^{T}$
Marginal likelihood of the query labels $\hat{Y} = \{\hat{Y}^t\}_{t=1}^{T}$ given the query samples $\hat{X} = \{\hat{X}^t\}_{t=1}^{T}$ and the support sets $D = \{D^t\}_{t=1}^{T}$, with $D^t = (X^t, Y^t)$:
$$p(\hat{Y} \mid \hat{X}, D, \theta) = \prod_{t=1}^{T} \int p(\hat{Y}^t \mid \hat{X}^t, w^t)\, p(w^t \mid D^t, \theta)\, dw^t$$
The intractable integral requires approximation for training and prediction.
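As a sanity check on the graphical model, the generative process for one task can be sketched as follows: draw the task-specific weights $w^t$ from the prior conditioned on the support set, then draw each query label from the resulting classifier. The diagonal-Gaussian parameterization of the prior and the `classifier(x, w)` interface (returning class logits) are illustrative assumptions.

```python
import torch
from torch.distributions import Normal, Categorical

def generate_query_labels(prior_mu, prior_sigma, query_x, classifier):
    # Task-specific latent parameters w^t ~ p(w^t | D^t, theta),
    # here a diagonal Gaussian produced by the prior/inference network.
    w = Normal(prior_mu, prior_sigma).sample()
    # Class scores for the query samples under the sampled classifier.
    logits = classifier(query_x, w)            # (num_query, n_way)
    # Query labels \hat{y}^t_i ~ p(\hat{y} | \hat{x}, w^t).
    return Categorical(logits=logits).sample()
```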
Monte Carlo approximation
• Monte Carlo approximation of the marginal log-likelihood using samples $w^t_m \sim p(w^t \mid D^t, \theta)$:
$$\log p(\hat{Y} \mid \hat{X}, D, \theta) \approx \sum_{t=1}^{T} \sum_{i=1}^{\hat{N}} \log \frac{1}{M} \sum_{m=1}^{M} p(\hat{y}^t_i \mid \hat{x}^t_i, w^t_m)$$
• This objective function has been used in VERSA [1]; a sketch of the estimator follows below.
• Our experiments show that this approach learns a degenerate prior $p(w^t \mid D^t, \theta)$.
[1] Gordon et al., ICLR'19
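A sketch of the Monte Carlo estimator above for a single task, in the spirit of the VERSA objective. It assumes `prior` is a torch.distributions object over the task weights and `classifier(x, w)` returns per-class log-probabilities; both interfaces are illustrative, not the paper's implementation.

```python
import math
import torch

def mc_log_likelihood(prior, query_x, query_y, classifier, n_samples=10):
    per_sample = []
    for _ in range(n_samples):
        w = prior.rsample()                      # w^t_m ~ p(w^t | D^t, theta)
        logp = classifier(query_x, w)            # (num_query, n_way) log-probabilities
        per_sample.append(logp.gather(1, query_y.unsqueeze(1)).squeeze(1))
    per_sample = torch.stack(per_sample, dim=0)  # (M, num_query)
    # log (1/M) sum_m p(\hat{y}_i | \hat{x}_i, w_m), summed over the query set.
    return (torch.logsumexp(per_sample, dim=0) - math.log(n_samples)).sum()
```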
Amortized variational inference
• Variational evidence lower bound (ELBO) with the amortized approximate posterior [1] parameterized by $\psi$:
$$\log p(\hat{Y}^t \mid \hat{X}^t, D^t, \theta) \ge \underbrace{\mathbb{E}_{q_\psi}\big[\log p(\hat{Y}^t \mid \hat{X}^t, w^t)\big]}_{\text{reconstruction loss}} - \beta\, \underbrace{D_{\mathrm{KL}}\big(q_\psi(w^t \mid \hat{X}^t, \hat{Y}^t, D^t, \theta)\,\|\,p(w^t \mid D^t, \theta)\big)}_{\text{regularization}}$$
• We use the regularization coefficient $\beta$ [2] to weight the KL term (a sketch of this objective follows below).
• Predictions are made via Monte Carlo sampling from the learned prior:
$$p(\hat{y}^t_i \mid \hat{x}^t_i, D^t, \theta) \approx \frac{1}{M} \sum_{m=1}^{M} p(\hat{y}^t_i \mid \hat{x}^t_i, w^t_m), \quad \text{where } w^t_m \sim p(w^t \mid D^t, \theta).$$
[1] Kingma & Welling, ICLR'14; [2] Higgins et al., ICLR'17
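A minimal sketch of the per-task $\beta$-weighted ELBO with a diagonal Gaussian prior and amortized posterior. The Gaussian parameterization, the `classifier(x, w)` interface, and the argument names are assumptions for illustration; prediction reuses the Monte Carlo averaging from the previous sketch, with samples drawn from the prior instead of the posterior.

```python
import torch
from torch.distributions import Normal, kl_divergence

def beta_elbo(prior_mu, prior_sigma, post_mu, post_sigma,
              query_x, query_y, classifier, beta=1.0, n_samples=10):
    prior = Normal(prior_mu, prior_sigma)        # p(w^t | D^t, theta)
    post = Normal(post_mu, post_sigma)           # q_psi(w^t | X-hat^t, Y-hat^t, D^t, theta)
    recon = 0.0
    for _ in range(n_samples):
        w = post.rsample()                       # reparameterized sample from the posterior
        logp = classifier(query_x, w)            # (num_query, n_way) log-probabilities
        recon = recon + logp.gather(1, query_y.unsqueeze(1)).sum() / n_samples
    kl = kl_divergence(post, prior).sum()        # KL(q_psi || p), summed over weight dimensions
    return recon - beta * kl                     # maximize this ELBO during meta-training
```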
Shared amortized variational inference: SAMOVAR
• Both the prior and the posterior are conditioned on labeled sets.
• The inference network can therefore be shared between prior and posterior: the posterior is obtained by applying the same network to the union of the support and query sets.
$$\log p(\hat{Y}^t \mid \hat{X}^t, D^t, \theta) \ge \mathbb{E}_{q}\big[\log p(\hat{Y}^t \mid \hat{X}^t, w^t)\big] - \beta\, D_{\mathrm{KL}}\big(q(w^t \mid \hat{X}^t, \hat{Y}^t, D^t, \theta)\,\|\,p(w^t \mid D^t, \theta)\big)$$
• Sharing reduces the memory footprint and encourages learning a non-degenerate prior. A sketch of the shared-network construction follows below.
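A sketch of the shared amortized inference at the heart of SAMOVAR: a single set-encoder produces the prior from the support set alone and the approximate posterior from the support and query sets together. `infer_net` (returning the mean and scale of a diagonal Gaussian over $w^t$) is an assumed interface; its outputs can be fed directly into the $\beta$-ELBO sketch above.

```python
import torch

def shared_prior_posterior(infer_net, support_x, support_y, query_x, query_y):
    # Prior p(w^t | D^t, theta): inference network applied to the support set only.
    prior_mu, prior_sigma = infer_net(support_x, support_y)
    # Posterior: the *same* network applied to the union of the support and
    # query sets, so no separate posterior network is needed.
    post_mu, post_sigma = infer_net(torch.cat([support_x, query_x], dim=0),
                                    torch.cat([support_y, query_y], dim=0))
    return (prior_mu, prior_sigma), (post_mu, post_sigma)
```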