
Learning Task-Agnostic Embedding of Multiple Black-Box Experts for Multi-Task Model Fusion. Nghia Hoang (MIT-IBM Watson AI Lab), Thanh Lam (National University of Singapore), Bryan Kian Hsiang Low (National University of Singapore), Patrick Jaillet (MIT)


  1. Learning Task-Agnostic Embedding of Multiple Black-Box Experts for Multi-Task Model Fusion Nghia Hoang (MIT-IBM Watson AI Lab), Thanh Lam (National University of Singapore), Bryan Kian Hsiang Low (National University of Singapore), Patrick Jaillet (MIT)

  2. Roadmap ❑ Multi-Task Collective Learning ❑ Related Literature ❑ Model Decomposition via Task-Agnostic Embedding ❑ Model Fusion via PAC-Bayes Adaptation ❑ Empirical Results

  3. Collective Learning: Sharing Information Improves Performance (illustration: Hospital 1, Hospital 2 & Hospital 3 sharing information)

  4. Issue: Raw information (data) is private & cannot be shared. Federated Learning (McMahan, 2016) addresses this when models are homogeneous (illustration: Hospital 1, Hospital 2 & Hospital 3)

  5. Issue: What if models are parameterized differently? The black-box setting happens when: (a) models have different parameterizations / solve different tasks, e.g. to fit different on-board computation capabilities / different (related) tasks; or (b) model parameterizations cannot be released, e.g. to avoid adversarial attacks (Goodfellow, 2014). Heterogeneous models: 1. Deep Neural Network (DNN) (our focus); 2. Gaussian Process (GP); 3. Decision Tree (DT); 4. Human cognitive reasoning; etc.

  6. Idea: Model Fusion using Task-Agnostic Model Embedding Model Fusion: Synthesizing New Model from Observing How Related Models Make Predictions (Without Accessing Local Data) – existing literature will be discussed next!

  7. Roadmap ❑ Multi-Task Collective Learning ❑ Related Literature ❑ Model Decomposition via Task-Agnostic Embedding ❑ Model Fusion via PAC-Bayes Adaptation ❑ Empirical Results

  8. Model-Agnostic Meta-Learning (Finn et al., 2017) Idea: sample tasks & learn a base model which can be adapted to solve any task with little data. Model 1: “cat - bird”; Model 2: “flower - bike”. Caveat: existing meta-learning algorithms assume data can be centralized for learning
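The sample-tasks-then-adapt idea above can be sketched numerically. Below is a minimal first-order MAML sketch on toy 1-D linear-regression tasks (one inner gradient step per task); all names (`sample_task`, `inner_lr`, the task family itself) are illustrative assumptions, not taken from Finn et al.'s method or code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """Sample a toy task: y = a*x with a task-specific slope a."""
    a = rng.uniform(0.5, 2.0)
    x = rng.uniform(-1.0, 1.0, size=10)
    return x, a * x

def loss_and_grad(theta, x, y):
    """Squared loss 0.5*mean((theta*x - y)^2) and its gradient w.r.t. theta."""
    err = theta * x - y
    return 0.5 * np.mean(err ** 2), np.mean(err * x)

theta = 0.0                        # meta-learned initialization
inner_lr, outer_lr = 0.5, 0.05
for _ in range(500):               # outer (meta) loop
    meta_grad = 0.0
    for _ in range(5):             # a batch of sampled tasks
        x, y = sample_task()
        _, g = loss_and_grad(theta, x[:5], y[:5])            # inner step on support set
        theta_adapted = theta - inner_lr * g
        _, gq = loss_and_grad(theta_adapted, x[5:], y[5:])   # gradient on query set
        meta_grad += gq            # first-order approximation of the meta-gradient
    theta -= outer_lr * meta_grad / 5
```

After meta-training, one inner gradient step from `theta` already moves the model toward a freshly sampled task, which is the adaptation behavior the slide describes. Note that each `sample_task()` call reads the raw task data, which is exactly the data-centralization assumption the caveat points out.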

  9. Model Fusion (Hoang et al., 2019) Model Fusion (recap.): Synthesizing a New Model from Observing How Related Models Make Predictions (Without Accessing Local Data). A line of study that emerged from Federated Learning and allows a certain degree of model agnosticity: ❑ Collective Online Learning of Gaussian Processes for Massive Multi-Agent Systems (AAAI-19) (Hoang, Hoang, Low & How): combines different sparse approximations of Gaussian processes ❑ Collective Model Fusion for Multiple Black-Box Experts (ICML-19) (Hoang, Hoang, Low & Kingsford): assembles different black-box models into a product-of-experts (PoE) model ❑ Bayesian Nonparametric Federated Learning of Neural Networks (ICML-19) (Yurochkin, Agrawal, Ghosh, Greenewald, Hoang & Khazaeni): combines neural networks with different numbers of hidden units ❑ Statistical Model Aggregation via Parameter Matching (NeurIPS-19) (Yurochkin, Agrawal, Ghosh, Greenewald & Hoang): generalizes the above to a wider class of models (including GP & DNN) ❑ Learning Task-Agnostic Embedding of Multiple Black-Box Experts for Multi-Task Model Fusion (ICML-20) (Hoang, Lam, Low & Jaillet) TODAY’s FOCUS: a new perspective of model fusion for the multi-task setting

  10. Roadmap ❑ Multi-Task Collective Learning ❑ Related Literature ❑ Model Decomposition via Task-Agnostic Embedding ❑ Model Fusion via PAC-Bayes Adaptation ❑ Empirical Results

  11. Task-Agnostic Embedding Model (diagram: black-box models & unlabelled data feed a task-agnostic embedding model with a task descriptor, a task-dependent latent variable, a task-agnostic latent variable, a label, an input, and prototypes)

  12. Learning Task-Agnostic Embedding (without labeled data) Generative Network Parameterization: task descriptor & learnable parameters; the latent prior encodes domain knowledge ☺ (diagram: task-dependent latent variable, task-agnostic latent variable, label, input, prototype) ❑ Example: MNIST ❑ task descriptor [1, 1, 0, 0, 0, 0, 0, 0, 0, 1] encodes a 0/1/9 classifier ❑ w: stroke weights, orientations, … ❑ z: numeric value

  13. Learning Task-Agnostic Embedding (without labeled data) Generative Network Parameterization: learnable parameters; the latent prior encodes domain knowledge ☺ Inference Network Parameterization: learnable parameters. The inference-network & generative-network parameters can be learned end-to-end by optimizing the model evidence’s lower bound (Kingma et al., 2014) ☺
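The end-to-end objective can be illustrated on a toy case. Below is a hedged sketch of the evidence lower bound (ELBO) with the Gaussian reparameterization trick (Kingma et al., 2014); for brevity it collapses the task-dependent and task-agnostic latent variables into a single 1-D Gaussian latent and uses scalar "networks", so it is not the paper's architecture, only the shape of the objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo(x, enc, dec, n_samples=2000):
    """Monte-Carlo ELBO for a toy 1-D latent-variable model.

    Inference network:  q(z|x) = N(enc[0]*x, exp(enc[1]))
    Generative network: p(x|z) = N(dec*z, 1)
    Latent prior:       p(z)   = N(0, 1)
    """
    mu, log_var = enc[0] * x, enc[1]
    eps = rng.standard_normal(n_samples)
    z = mu + np.exp(0.5 * log_var) * eps                       # reparameterization trick
    log_lik = -0.5 * ((x - dec * z) ** 2 + np.log(2 * np.pi))  # log p(x|z)
    kl = 0.5 * (mu ** 2 + np.exp(log_var) - 1.0 - log_var)     # KL(q(z|x) || p(z)), closed form
    return log_lik.mean() - kl
```

Because both `enc` and `dec` enter this single differentiable scalar, gradient ascent on the ELBO trains the inference and generative parameters jointly, which is the "end-to-end" learning the slide refers to; the ELBO always lower-bounds the log evidence log p(x).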

  14. Task-Agnostic Embedding Model: From Model to Prototype ☺ (diagram: models & unlabelled data mapped through the task-agnostic embedding to prototypes) What does a prototype look like? Visualization later ☺

  15. Roadmap ❑ Multi-Task Collective Learning ❑ Related Literature ❑ Model Decomposition via Task-Agnostic Embedding ❑ Model Fusion via PAC-Bayes Adaptation ❑ Empirical Results

  16. How to Combine Prototypes for a New Task? (diagram: prototypes & a few-shot dataset for the new task fed into the task-agnostic embedding) Caveat: the embedding, learned on unlabelled data, is unaware of the few-shot data

  17. Multi-Task Model Fusion via Deep Generative Embedding + PAC-Bayes Adaptation (diagram: prototypes from the task-agnostic embedding over unlabelled data are combined with the new task’s few-shot dataset via PAC-Bayes adaptation!)

  18. Model Fusion via PAC-Bayes Adaptation ❑ Goal: optimize the prototype distribution for the new task ❑ Leverage the few-shot data ❑ Minimizing only the empirical loss on the few-shot data may overfit  ❑ Add a regularization term ☺ ❑ Minimize the PAC-Bayes bound for adaptation: empirical risk on the few-shot data + complexity term, where the prior is learnt from the embedding and the posterior is obtained after adaptation
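The two ingredients of the adaptation objective, an empirical risk plus a complexity term driven by the KL divergence between the adapted posterior and the embedding-derived prior, can be sketched as below. This uses the classic McAllester form of the PAC-Bayes bound with diagonal-Gaussian prior and posterior; the exact bound variant and parameterization optimized in the paper may differ, and both function names are illustrative.

```python
import numpy as np

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(Q || P) between diagonal Gaussians: posterior Q (after adaptation)
    vs. prior P (e.g. a prototype distribution learnt from the embedding)."""
    mu_q, var_q = np.asarray(mu_q, float), np.asarray(var_q, float)
    mu_p, var_p = np.asarray(mu_p, float), np.asarray(var_p, float)
    return 0.5 * np.sum(np.log(var_p / var_q)
                        + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def pac_bayes_bound(emp_risk, kl, n, delta=0.05):
    """McAllester-style bound: with probability >= 1 - delta over the n
    few-shot examples, risk(Q) <= emp_risk(Q) + sqrt((KL + ln(2*sqrt(n)/delta)) / (2n))."""
    return emp_risk + np.sqrt((kl + np.log(2.0 * np.sqrt(n) / delta)) / (2.0 * n))
```

Minimizing this bound over the posterior trades off fitting the few-shot data (first term) against staying close to the prior (the KL inside the complexity term), which is exactly the regularization the slide motivates: the bound tightens with more few-shot data n and with a posterior that moves less from the prior.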

  19. Roadmap ❑ Multi-Task Collective Learning ❑ Related Literature ❑ Model Decomposition via Task-Agnostic Embedding ❑ Model Fusion via PAC-Bayes Adaptation ❑ Empirical Results

  20. Empirical Results: Task-Agnostic Decomposition ❑ Does the model separate task-dependent and task-agnostic information? ❑ Results: fix an arbitrary value of z and plot the x generated by the generative network over the w-space ❑ Observation: fixing z gives the same digit in different styles

  21. Empirical Results: Prototype Visualization ❑ Prototypes are task-agnostic and are activated differently depending on each input ❑ Results: fix an arbitrary value of w and plot the x generated by the generative network over the z-space

  22. Empirical Results: Multi-Task Model Fusion ❑ Quantitative results on standard meta-learning benchmarks ❑ Comparison baseline: Modified-MAML ❑ Data for different tasks are private; the original MAML requires data centralization ❑ Modified-MAML only samples classes within the same task! ❑ Other baselines: ad-hoc aggregation methods (via + & max) & FS ❑ Datasets: MNIST, nMNIST & miniImageNet

  23. Empirical Results: MNIST & nMNIST (2-way) & miniImageNet (5-way) ❑ Multi-Task Model Fusion ❑ Quantitative results on standard meta-learning benchmarks (1-shot) ❑ Results table (rows/columns: number of black-boxes, dataset name; S: test classes were seen, U: test classes not seen by any black-box)

  24. Thank You Take-Home Messages ☺ ❑ A Model Fusion Perspective for Meta Learning in Private Data Setting (a.k.a. where model fusion meets meta learning ☺ )
