Meta-Learning of Structured Representation by Proximal Mapping
Mao Li, Yingyi Ma, Xinhua Zhang
University of Illinois at Chicago
Motivation
Goal of meta-learning: extract prior structures from a set of tasks that allow efficient learning of new tasks.
Examples of structural regularities:
• Instance level
  - Input layers: transformations beyond group-based diffeomorphisms
  - Within layers: sparsity, disentanglement, spatial invariance, structured gradients accounting for data covariance, manifold smoothness
  - Between layers: equivariance, contractivity, robustness under dropout and adversarial perturbations of preceding nodes
• Batch/Dataset level
  - multi-view, multi-modality, multi-domain
  - diversity, fairness, privacy, causal structure
Existing Approaches
• Data augmentation (original data → augmented training data)
  √ boosts prediction performance
  × unclear whether the improvement is due to the learned representation or to a better classifier
Existing Approaches
• Auto-encoder (input → encoder → latent representation → decoder → reconstruction, with downstream supervised tasks)
  √ learns the most salient features
  × usually used only as an initialization for the subsequent supervised task
  × not amenable to end-to-end learning
Our goal: learn representations that explicitly encode structural priors in an end-to-end fashion.
Existing Approaches
• Regularization
  √ simple and efficient
  × the regularizer and the supervised objective contend for the same weights, compromising both
Proposed Method
Morph a representation z towards a structured one by a proximal mapping that promotes the desired structure.
  - z: a mini-batch or a single example; a mini-batch plays the role of a task in meta-learning
  - the proximal mapping acts as the task-specific base learner
Embed the proximal mapping as a layer inside deep networks.
Advantages:
  + decouples the regularization from supervised learning
  + extends meta-learning to unsupervised base learners
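Written out, the proximal mapping takes the standard form below (a generic statement; R is a placeholder for whichever structural regularizer is being promoted and λ its strength — the specific choice is per-task):

\[
  P_{\lambda R}(z) \;=\; \operatorname*{argmin}_{x}\; \tfrac{1}{2}\,\lVert x - z\rVert^{2} \;+\; \lambda\, R(x)
\]

so the output stays close to the input features z while acquiring the structure encoded by R.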
Proposed Method
Morph a representation z towards a structured one by a proximal mapping that promotes the desired structure.
[Figure: representation before vs. after the proximal mapping]
L: graph Laplacian (for smoothness on the manifold)
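As a concrete sketch (assuming the mini-batch features are stacked as columns of Z and the smoothness regularizer is \(\lambda\,\operatorname{tr}(X L X^{\top})\); the poster's exact formulation may differ), this proximal map even admits a closed form:

\[
  P(Z) \;=\; \operatorname*{argmin}_{X}\; \tfrac{1}{2}\lVert X - Z\rVert_{F}^{2} \;+\; \lambda\,\operatorname{tr}\!\big(X L X^{\top}\big)
       \;=\; Z\,(I + 2\lambda L)^{-1},
\]

which is differentiable in Z and can therefore be used directly as a network layer.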
MetaProx for Multi-view Learning
In multi-view learning, observations are available as pairs of views: {(x_i, y_i)}.
[Figure 1: training framework of MetaProx — feature extractors f (view x) and g (view y), a proximal layer on the extracted features, and supervised predictors h producing labels]
MetaProx for Multi-view Learning
① feature extraction: apply the feature extractors f and g to views x and y
② proximal mapping: promote high correlation between the two views' features
③ supervised task: the supervised predictor h maps the structured features to labels; optimize end-to-end over the variables highlighted in red in Figure 1
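A minimal end-to-end sketch of this three-step pipeline in PyTorch (illustrative only: the toy extractors, the dimensions, and the `ProxLayer` with the graph-Laplacian regularizer from the earlier slide are assumptions, not the paper's actual multi-view proximal map for correlation):

```python
import torch
import torch.nn as nn

class ProxLayer(nn.Module):
    """Closed-form proximal layer: Z -> argmin_X 0.5*||X - Z||_F^2 + lam*tr(X^T L X).

    Rows of Z are the mini-batch examples; L is a graph Laplacian over the batch.
    The solve is differentiable, so the layer trains end-to-end with the network.
    """
    def __init__(self, lam=0.1):
        super().__init__()
        self.lam = lam

    def forward(self, Z, L):
        n = Z.shape[0]
        A = torch.eye(n, device=Z.device) + 2.0 * self.lam * L
        return torch.linalg.solve(A, Z)   # (I + 2*lam*L)^{-1} Z

# Toy two-view pipeline: extractors f, g -> proximal layer -> supervised predictor h.
f = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 8))   # view x extractor
g = nn.Sequential(nn.Linear(24, 16), nn.ReLU(), nn.Linear(16, 8))   # view y extractor
prox = ProxLayer(lam=0.1)
h = nn.Linear(16, 5)                                                # 5-class predictor

params = list(f.parameters()) + list(g.parameters()) + list(h.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

x, y = torch.randn(64, 32), torch.randn(64, 24)      # a mini-batch of paired views
labels = torch.randint(0, 5, (64,))

opt.zero_grad()
feats = torch.cat([f(x), g(y)], dim=1)                # step 1: feature extraction
W = torch.exp(-torch.cdist(feats, feats))             # similarity graph over the batch
L = torch.diag(W.sum(dim=1)) - W                      # graph Laplacian
z = prox(feats, L.detach())                           # step 2: proximal mapping
loss = nn.functional.cross_entropy(h(z), labels)      # step 3: supervised task
loss.backward()
opt.step()                                            # end-to-end update of f, g, h
```

Detaching L keeps gradients flowing only through the features being morphed; whether to also differentiate through the graph construction is a separate design choice.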
Experiment Results
Multi-view image classification
- Dataset: a subset of Sketchy (20 classes); each example is a (sketch, photo) pair with a class label, e.g. 'butterfly', 'cat'
[Table: test accuracy for image classification]
Experiment Results
Cross-lingual word embedding (English, German)
- Dataset: WS353, SimLex999
- Metric: Spearman's correlation between the rankings of word pairs produced by the model and by humans
[Table 1: Spearman's correlation for word similarities]
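As a reference for the metric (a minimal illustration; the `model_sims` and `human_sims` lists below are hypothetical scores for the same word pairs, not the paper's data):

```python
from scipy.stats import spearmanr

# Similarity scores for the same list of word pairs: one from the learned
# embeddings, one from human annotations (e.g. WS353 / SimLex999 ratings).
model_sims = [0.82, 0.15, 0.64, 0.40, 0.91]
human_sims = [9.1, 1.2, 7.5, 3.3, 8.8]

rho, _ = spearmanr(model_sims, human_sims)   # rank correlation between the two rankings
print(f"Spearman's correlation: {rho:.3f}")
```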
At the poster: more details and discussions. Thanks!
MetaProx ≠ "Efficient Meta Learning via Minibatch Proximal Update" (NeurIPS 2019)
MetaProx ≠ "Meta-Learning with Implicit Gradients" (NeurIPS 2019)
(those works use proximal updates as an optimization tool; MetaProx uses the proximal mapping for modeling)