+ Unifying Perspectives on Knowledge Sharing: From Atomic to Parameterised Domains and Tasks
Task-CV @ ECCV 2016
Timothy Hospedales, University of Edinburgh & Queen Mary University of London
With Yongxin Yang, Queen Mary University of London
+ Today’s Topics
- Distributed definitions of tasks/domains, and the different problem settings that arise
- A flexible approach to task/domain transfer:
  - Generalizes existing approaches
  - Generalizes multiple problem settings
  - Covers shallow and deep models
+ Why Transfer Learning?
[Diagram: conventional IID learning trains a separate model per task or domain (Data 1 → Model 1, Data 2 → Model 2, Data 3 → Model 3), whereas lifelong learning shares one body of knowledge across them.]
- But… humans seem to generalize across tasks, e.g. Crawl => Walk => Run => Scooter => Bike => Motorbike => Driving.
+ Taxonomy of Research Issues
- Sharing setting: sequential / one-way, multi-task, lifelong learning
- Transfer across: task transfer, domain transfer
- Feature/label space: homogeneous, heterogeneous
- Labeling assumption: supervised, unsupervised
- Sharing approach: model-based, instance-based, feature-based
- Balancing challenge: positive transfer strength, negative transfer robustness
+ Overview
- A review of some classic methods
- A general framework
- Example problems and settings
- Going deeper
- Open questions
+ Some Classic Methods – 1: Model Adaptation
An example of simple sequential transfer:
- Learn a source task y = f_s(x, w_s):
  \min_{w_s} \sum_i (y_i - w_s^T x_i)^2 + \lambda\, w_s^T w_s
- Learn a new target task y = f_t(x, w):
  \min_{w} \sum_i (y_i - w^T x_i)^2 + \lambda (w - w_s)^T (w - w_s)
- Regularize the new task toward the old task (…rather than toward zero)
[Figure: source and target weight vectors in (w_1, w_2) space; the target solution is pulled toward the source.]
E.g., Yang, ACM MM, 2007
+ Some Classic Methods – 1: Model Adaptation
An example of simple sequential transfer:
- Learn a new target task y = f_t(x, w):
  \min_{w} \sum_i (y_i - w^T x_i)^2 + \lambda (w - w_s)^T (w - w_s)
- Limitations:
  ✘ Assumes the source task is related
  ✘ Only sequential, one-way transfer
E.g., Yang, ACM MM, 2007
(A minimal sketch of this objective follows below.)
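To make the adaptation objective concrete, here is a minimal NumPy sketch assuming a squared loss, so both problems have closed-form ridge-style solutions; the function and variable names are illustrative and not from the cited paper.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Source task: min_w sum_i (y_i - w^T x_i)^2 + lam * w^T w (closed form)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def adapt_from_source(X, y, w_s, lam):
    """Target task: min_w sum_i (y_i - w^T x_i)^2 + lam * (w - w_s)^T (w - w_s).
    Setting the gradient to zero gives (X^T X + lam*I) w = X^T y + lam * w_s,
    i.e. the target solution is pulled toward the source weights w_s."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w_s)

# Toy usage: plentiful source data, only a few target examples.
rng = np.random.default_rng(0)
Xs, ys = rng.normal(size=(200, 5)), rng.normal(size=200)
Xt, yt = rng.normal(size=(10, 5)), rng.normal(size=10)
w_source = fit_ridge(Xs, ys, lam=1.0)
w_target = adapt_from_source(Xt, yt, w_source, lam=10.0)
```

A larger lam keeps the target model closer to the source; lam → 0 recovers independent target learning.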
+ Some Classic Methods – 2: Regularized Multi-Task
An example of simple multi-task transfer:
- Learn a set of tasks { y = f_t(x, w_t) } from data { x_{i,t}, y_{i,t} }:
  \min_{w_0, w_t} \sum_{i,t} (y_{i,t} - w_t^T x_{i,t})^2 + \lambda (w_t - w_0)^T (w_t - w_0),  t = 1..T
- Regularize each task towards the mean of all tasks
[Figure: per-task weight vectors in (w_1, w_2) space clustered around a shared mean.]
E.g., Evgeniou & Pontil, KDD’04; Salakhutdinov, CVPR’11; Khosla, ECCV’12
+ Some Classic Methods – 2: Regularized Multi-Task
An example of simple multi-task transfer:
- Learn a set of tasks { y = f_t(x, w_t) } from data { x_{i,t}, y_{i,t} }:
  \min_{w_0, w_t} \sum_{i,t} (y_{i,t} - w_t^T x_{i,t})^2 + \lambda (w_t - w_0)^T (w_t - w_0),  t = 1..T
  Or, reparameterizing each task’s weights as a shared part plus a task-specific part:
  \min_{w_0, w_t} \sum_{i,t} (y_{i,t} - (w_t + w_0)^T x_{i,t})^2,  t = 1..T
- Summary:
  ✔ Now multi-task
  ✗ Tasks and their mean are inter-dependent: must be jointly optimized
  ✗ Still assumes all tasks are (equally) related
(A minimal sketch of this objective follows below.)
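Here is a minimal sketch of the mean-regularized objective with a squared loss and plain gradient descent; the cited works use other losses and solvers (e.g. an SVM formulation), so treat the names and optimizer here as illustrative assumptions.

```python
import numpy as np

def mtl_mean_regularised(Xs, ys, lam=1.0, lr=0.01, iters=500):
    """Jointly fit per-task weights w_t regularized toward a shared w_0:
    min_{w_0, w_t} sum_{i,t} (y_{i,t} - w_t^T x_{i,t})^2 + lam * ||w_t - w_0||^2.
    Xs, ys: lists with one (n_t, d) design matrix and one (n_t,) target vector per task."""
    d, T = Xs[0].shape[1], len(Xs)
    w0, W = np.zeros(d), np.zeros((T, d))
    for _ in range(iters):
        grad_w0 = np.zeros(d)
        for t, (X, y) in enumerate(zip(Xs, ys)):
            resid = X @ W[t] - y
            grad_t = 2 * X.T @ resid + 2 * lam * (W[t] - w0)  # task loss + pull toward w0
            grad_w0 += -2 * lam * (W[t] - w0)                 # w0 is pulled toward the task mean
            W[t] = W[t] - lr * grad_t
        w0 -= lr * grad_w0
    return w0, W
```

The coupling through w0 is exactly why the tasks must be optimized jointly, as the slide notes.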
+ Some Classic Methods – 3: Task Clustering
Relaxing the relatedness assumption through task clustering:
- Learn a set of tasks { y = f_t(x, w_t) } from data { x_{i,t}, y_{i,t} }
- Assume the tasks form K similar groups
- Regularize each task towards its nearest group:
  \min_{w_k, w_t} \sum_{i,t} (y_{i,t} - w_t^T x_{i,t})^2 + \min_{k'} \lambda (w_t - w_{k'})^T (w_t - w_{k'}),  k = 1..K, t = 1..T
[Figure: task weight vectors in (w_1, w_2) space forming two groups, each regularized toward its group centre.]
E.g., Evgeniou et al, JMLR, 2005; Kang et al, ICML, 2011
+ Some Classic Methods – 3: Task Clustering
Multi-task transfer without assuming all tasks are related:
- Assume the tasks form similar groups:
  \min_{w_k, w_t} \sum_{i,t} (y_{i,t} - w_t^T x_{i,t})^2 + \min_{k'} \lambda (w_t - w_{k'})^T (w_t - w_{k'}),  k = 1..K, t = 1..T
- Summary:
  ✔ Doesn’t require all tasks to be related => more robust to negative transfer
  ✔ Benefits from “more specific” transfer
  ✗ What about task-specific vs. task-independent knowledge?
  ✗ How to determine the number of clusters K?
  ✗ What if tasks share at the level of “parts”?
  ✗ Optimization is hard
(A minimal sketch of this objective follows below.)
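The min over cluster assignments makes this objective non-convex; a simple way to approximate it is a k-means-style alternation between assigning tasks to centres and updating weights. The sketch below assumes a squared loss and this alternating scheme (the cited papers use different formulations), with illustrative names throughout.

```python
import numpy as np

def cluster_regularised_mtl(Xs, ys, K=2, lam=1.0, lr=0.01, outer=20, inner=50):
    """Alternate: (a) assign each task to its nearest cluster centre,
    (b) take gradient steps on each w_t, pulled toward its assigned centre,
    (c) recompute each centre as the mean of its member tasks."""
    d, T = Xs[0].shape[1], len(Xs)
    rng = np.random.default_rng(0)
    W = np.zeros((T, d))
    C = rng.normal(scale=0.1, size=(K, d))                 # cluster centres
    for _ in range(outer):
        assign = np.array([np.argmin(((W[t] - C) ** 2).sum(axis=1)) for t in range(T)])
        for _ in range(inner):
            for t, (X, y) in enumerate(zip(Xs, ys)):
                resid = X @ W[t] - y
                W[t] -= lr * (2 * X.T @ resid + 2 * lam * (W[t] - C[assign[t]]))
        for k in range(K):
            if (assign == k).any():
                C[k] = W[assign == k].mean(axis=0)         # update centre from member tasks
    return W, C, assign
```

The number of clusters K remains a free hyperparameter, matching the open question on the slide.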
+ Some Classic Methods – 4: Task Factoring
- Learn a set of tasks { y = f_t(x, w_t) } from data { x_{i,t}, y_{i,t} }
- Assume they are related by a factor-analysis / latent-task structure
- Notation: inputs are now triples { x_i, y_i, z_i }, where z_i is a binary (1-hot) task indicator vector
- Single-task learning in weight-stacking notation, with the per-task weight vectors stacked as the columns of W:
  y = f_t(x, W) = w_t^T x = (W z)^T x
  \min_W \sum_i (y_i - (W z_i)^T x_i)^2 + \lambda \|W\|^2
- Factor-analysis MTL: factorize W = PQ:
  y = (W z)^T x = (P Q z)^T x
  \min_{P,Q} \sum_i (y_i - (P Q z_i)^T x_i)^2 + \lambda \|P\| + \omega \|Q\|
E.g., Kumar, ICML’12; Passos, ICML’12
+ Some Classic Methods – 4: Task Factoring
- Learn a set of tasks y = f_t(x, W) from triples { x_i, y_i, z_i }
- Assume they are related by a factor-analysis / latent-task structure:
  y = w_t^T x = (W z)^T x = (P Q z)^T x
  \min_{P,Q} \sum_i (y_i - (P Q z_i)^T x_i)^2 + \lambda \|P\| + \omega \|Q\|
- What does it mean?
  - W: DxT matrix of all task parameters
  - P: DxK matrix of basis / latent tasks
  - Q: KxT matrix of low-dimensional task models
  - => Each task is a low-dimensional linear combination of basis tasks.
+ Some Classic Methods – 4: Task Factoring
- Learn a set of tasks y = f_t(x, W) from triples { x_i, y_i, z_i }
- Assume they are related by a factor-analysis / latent-task structure:
  y = w_t^T x = (W z)^T x = (P Q z)^T x
  \min_{P,Q} \sum_i (y_i - (P Q z_i)^T x_i)^2 + \lambda \|P\| + \omega \|Q\|
- What does it mean?
  - z: (1-hot binary) activates a column of Q
  - P: DxK matrix of basis / latent tasks
  - Q: KxT matrix of task models
  - => Tasks lie on a low-dimensional manifold
  - => Knowledge sharing by jointly learning the manifold
  - P specifies the manifold; Q gives each task’s position on it
[Figure: task weight vectors w_1, w_2, w_3 lying on a low-dimensional manifold defined by P, with coordinates given by Q.]
+ Some Classic Methods – 4: Task Factoring
- Summary:
  - Tasks lie on a low-dimensional manifold
  - Each task is a low-dimensional linear combination of basis tasks
  y = w_t^T x = (W z)^T x = (P Q z)^T x
  \min_{P,Q} \sum_i (y_i - (P Q z_i)^T x_i)^2 + \lambda \|P\| + \omega \|Q\|
  ✔ Can flexibly share or not share: similarity between two columns (tasks) of Q
  ✔ Can share piecewise: two columns (tasks) of Q similar in some rows only
  ✔ Can represent globally shared knowledge: a uniform row in Q => all tasks activate the same basis of P
(A minimal sketch of the factorized objective follows below.)
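A minimal sketch of the factorized objective with W = PQ, a squared loss, and plain L2 penalties on both factors; note that GO-MTL itself uses a sparsity penalty on the task codes and a more careful optimizer, so this is only an illustrative, assumption-laden version with made-up names.

```python
import numpy as np

def factored_mtl(X, y, task, T, K=3, lam=1e-3, omega=1e-3, lr=0.01, iters=1000):
    """Factorized MTL in the spirit of GO-MTL: W = P Q, so each task's weight
    vector w_t = P q_t is a K-dimensional combination of latent basis tasks.
    X: (n, d) features, y: (n,) targets, task: (n,) integer task index in [0, T)."""
    d = X.shape[1]
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(d, K))       # basis / latent tasks
    Q = rng.normal(scale=0.1, size=(K, T))       # per-task combination weights
    for _ in range(iters):
        Wt = P @ Q                               # (d, T): all task weight vectors
        pred = np.einsum('nd,dn->n', X, Wt[:, task])
        resid = pred - y
        gW = np.zeros((T, d))                    # grad of the loss w.r.t. each task's weights
        np.add.at(gW, task, 2 * resid[:, None] * X)
        gW = gW.T                                # (d, T), matching W = P Q
        P -= lr * (gW @ Q.T + 2 * lam * P)       # chain rule through W = P Q
        Q -= lr * (P.T @ gW + 2 * omega * Q)
    return P, Q
```

Each column of Q is a K-dimensional code for one task, and P @ Q[:, t] recovers that task's full D-dimensional weight vector.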
+ Overview
- A review of some classic methods
- A general framework
- Example problems and settings
- Going deeper
- Open questions
+ MTL Transfer as a Neural Network
- Consider a two-sided neural network:
  - Left: data input x
  - Right: task indicator z
  - Output unit y: inner product of the two sides’ representations
- Equivalent to task regularization [Evgeniou, KDD’04] if:
  - Q = W is a (trainable) FC layer; P is a (fixed) identity matrix
  - z: 1-hot task encoding plus a bias bit => the shared knowledge
  - Linear activation
  y = (w_t + w_0)^T x
  \min_{w_0, w_t} \sum_{i,t} (y_{i,t} - (w_t + w_0)^T x_{i,t})^2,  t = 1..T
[Yang & Hospedales, ICLR’15]
+ MTL Transfer as a Neural Network
- Consider a two-sided neural network:
  - Left: data input x
  - Right: task indicator z
  - Output unit y: inner product of the representations on each side
- Equivalent to task factor analysis [Kumar, ICML’12, GO-MTL] if:
  - Both FC layers P & Q are trained (constraining the task description/parameters)
  - z: 1-hot task encoding
  - Linear activation
  y = (W z)^T x
  \min_{P,Q} \sum_i (y_i - (P Q z_i)^T x_i)^2 = \min_{P,Q} \sum_i (y_i - (P^T x_i)^T (Q z_i))^2
- Encompasses 5+ classic MTL/MDL approaches!
+ MTL Transfer as a Neural Network: Interesting Things
- Generalizes many existing frameworks…
- Can do regression & classification (choice of activation on y)
- Can do multi-task and multi-domain learning
- As a neural network, the left side x can be any CNN, trained end-to-end
  y = (W z)^T x
  \min_{P,Q} \sum_i (y_i - (P^T x_i)^T (Q z_i))^2   (z: task/domain ID; x: data)
+ MTL Transfer as a Neural Network: Interesting Things
- Non-linear activation on the hidden layers:
  - Representation learning on both the task side and the data side
  - Exploits a non-linear task subspace (cf. GO-MTL’s linear task subspace)
  - The final classifier can be non-linear in feature space
  y = \sigma(P^T x)^T \sigma(Q z)
  \min_{P,Q} \sum_i (y_i - \sigma(P^T x_i)^T \sigma(Q z_i))^2   (z: task/domain ID; x: data)
[Figure: task parameters w_1, w_2, w_3 lying on a non-linear manifold.]
(A minimal two-sided network sketch follows below.)
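A minimal PyTorch sketch of the two-sided network, assuming a one-hot task/domain indicator and a squared loss; the class and variable names are illustrative. With linear activations and trainable P, Q this matches the factorized (GO-MTL-style) special case; fixing the data-side layer to the identity and adding a bias bit to z would recover the mean-regularized case.

```python
import torch
import torch.nn as nn

class TwoSidedNet(nn.Module):
    """Two-sided network: data x is encoded by P, a task/domain indicator z is
    encoded by Q, and the output is the inner product of the two codes."""
    def __init__(self, d, n_tasks, k, nonlinear=True):
        super().__init__()
        self.P = nn.Linear(d, k, bias=False)         # data-side projection
        self.Q = nn.Linear(n_tasks, k, bias=False)   # task-side projection (task embedding)
        self.act = nn.Sigmoid() if nonlinear else nn.Identity()

    def forward(self, x, z):
        # x: (batch, d) features; z: (batch, n_tasks) one-hot task/domain indicator
        return (self.act(self.P(x)) * self.act(self.Q(z))).sum(dim=-1)

# Toy end-to-end training loop; x could equally be the output of a CNN trunk.
d, T, k = 16, 4, 8
net = TwoSidedNet(d, T, k)
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
x = torch.randn(32, d)
z = torch.eye(T)[torch.randint(0, T, (32,))]
y = torch.randn(32)
for _ in range(100):
    opt.zero_grad()
    loss = ((net(x, z) - y) ** 2).mean()
    loss.backward()
    opt.step()
```

Because the task side is just another input, the same model handles multi-task and multi-domain settings simply by changing what z encodes, as the preceding slides note.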
+ Overview
- A review of some classic methods
- A general framework
- Example problems and settings
- Going deeper
- Open questions