Adaptive Activation Network and Functional Regularization for Efficient and Flexible Deep Multi-Task Learning (AAAI-2020)
Reading Group, Dec. 11, 2019
Suman Saha (postdoc), Computer Vision Lab @ ETH Zurich
Motivation: DNN Activation Functions
● Learning activation functions to improve deep neural networks (DNNs) [1]
● Parameters in the linear components (W and b) are learned from data
● while nonlinearities are predefined, e.g. sigmoid, tanh or ReLU
● Assumption – an arbitrarily complex function can be approximated using any of these common nonlinear functions
● In practice, the choice of nonlinearity affects:
→ the learning dynamics
→ network expressive power
[1] Agostinelli, Forest, et al. "Learning activation functions to improve deep neural networks." arXiv preprint arXiv:1412.6830 (2014).
2
Motivation: Choice of Nonlinearity
● Active research area – design activation functions that enable fast training of DNNs
● Vanishing gradient problem: the derivative of the sigmoid function ranges between 0 and 0.25
● Weight update: for DNNs with more layers, the gradients tend to vanish more in the lower layers
3
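The vanishing-gradient claim on this slide can be checked numerically; a minimal NumPy sketch (not from the slides, names illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)), maximized at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-10.0, 10.0, 1001)
print(sigmoid_grad(x).max())  # 0.25, attained at x = 0

# Backprop multiplies one such factor per layer, so through n
# saturating sigmoid layers the gradient scale shrinks like 0.25**n.
for n in (1, 5, 10):
    print(n, 0.25 ** n)
```

This is why the lower layers of a deep sigmoid network see the smallest gradients: they sit behind the longest chain of such factors.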
Motivation: Choice of Nonlinearity
● The rectified linear activation function (ReLU) does not saturate like sigmoidal functions
● helps to overcome the vanishing gradient problem
Other recent activation functions
● Maxout activation (Goodfellow et al., 2013) – computes the maximum of a set of linear functions
● Springenberg & Riedmiller (2013) replaced the max function with a probabilistic variant
● Gulcehre et al. (2014) explored an activation function that replaces the max function with an Lp norm
4
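A maxout unit as described above can be sketched in a few lines of NumPy (shapes and names are illustrative, not from the slides):

```python
import numpy as np

def maxout(x, W, b):
    """Maxout unit (Goodfellow et al., 2013): the maximum of a
    set of k linear functions of the input.

    x: (d,) input vector
    W: (k, d) weights of the k linear pieces
    b: (k,) biases of the k linear pieces
    """
    return np.max(W @ x + b, axis=0)

# With the two pieces y = x and y = -x, maxout computes |x| --
# a non-saturating, learnable shape a single fixed ReLU cannot express.
W = np.array([[1.0], [-1.0]])
b = np.zeros(2)
print(maxout(np.array([-3.0]), W, b))  # 3.0
```

Because the maximum of linear functions never saturates, maxout inherits ReLU's favorable gradient behavior while learning the shape of the nonlinearity.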
Motivation
● The type of activation function can have a significant impact on learning
● One way to explore the space of possible functions is to learn the activation function during training (Agostinelli et al., 2014)
5
Adaptive Piecewise Linear (APL) units
● Activation functions as a sum of hinge-shaped functions, resulting in a piecewise linear activation function:
h_i(x) = \max(0, x) + \sum_{s=1}^{S} a_i^s \max(0, -x + b_i^s)   (1)
● S (the number of hinges) is a hyperparameter set in advance
● a_i^s and b_i^s are the learnable parameters: the a_i^s variables control the slopes of the linear segments, and the b_i^s variables determine the locations of the hinges
6
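A minimal NumPy sketch of an APL unit following the sum-of-hinges definition above (the vectorization and the example parameter values are mine):

```python
import numpy as np

def apl(x, a, b):
    """Adaptive Piecewise Linear unit (Agostinelli et al., 2014):
    h(x) = max(0, x) + sum_s a[s] * max(0, -x + b[s]).

    x: array of pre-activations
    a: length-S list of learned slopes of the hinge segments
    b: length-S list of learned hinge locations
    """
    out = np.maximum(0.0, x)
    for a_s, b_s in zip(a, b):
        out = out + a_s * np.maximum(0.0, -x + b_s)
    return out

# With S = 1, a = [-0.2], b = [0.0], APL reduces to a leaky-ReLU shape:
x = np.array([-2.0, 0.0, 3.0])
print(apl(x, a=[-0.2], b=[0.0]))  # leaky-ReLU-like: [-0.4, 0.0, 3.0]
```

Since a and b are ordinary parameters, they can be trained by backpropagation along with the weights; larger S yields more linear segments and hence more expressive activation shapes.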
Adaptive Piecewise Linear (APL) units
● Fig. 1 shows example APL functions for S = 1
● for large enough S, APL can approximate arbitrarily complex continuous functions
● the first term in Eq. (1) is ReLU
● when x < 0, the derivative of ReLU is 0, resulting in dead neurons
● leaky ReLU addresses the dead-neuron problem, e.g. leaky ReLU may have y = 0.01x when x < 0
Figure 1: Sample activation functions obtained from changing the parameters. Notice that figure b shows that the activation function can also be non-convex.
7
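The dead-neuron point can be seen directly from the gradients; a small illustrative sketch (not from the slides):

```python
import numpy as np

def relu_grad(x):
    # Derivative of max(0, x): 1 for x > 0, else 0.
    return (x > 0).astype(float)

def leaky_relu_grad(x, alpha=0.01):
    # Derivative of leaky ReLU (slope alpha for x < 0): 1 for x > 0, else alpha.
    return np.where(x > 0, 1.0, alpha)

x = np.array([-5.0, -0.1, 2.0])
print(relu_grad(x))        # [0. 0. 1.]  -> no gradient flows for x < 0
print(leaky_relu_grad(x))  # [0.01 0.01 1.]  -> a small gradient always flows
```

A ReLU neuron whose pre-activation stays negative receives zero gradient and can never recover; the nonzero slope of leaky ReLU (and, more generally, the learned slopes of APL) keeps such units trainable.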
TAAN (Task Adaptive Activation Network)
Categories of deep learning MTL: (a) Hard-sharing; (b) Soft-sharing; (c) Task Adaptive Activation Network (proposed model); (d) Inner Structure of Adaptive Activation Layer.
● Proposed approach = hard-sharing + learnable task-specific activation functions
● all tasks can share their weights and biases on the hidden layers
● more scalable than the soft-sharing methods, where the number of network components is proportional to the number of tasks
8
TAAN
● For a task t, given the input x_t^{l-1} from either the previous layer or the data input, the output of the l-th AAL (Adaptive Activation Layer) is defined by
x_t^l = f_t^l(W^l x_t^{l-1} + b^l)
● the weight W^l and bias b^l parameters are shared across tasks
● the task-specific activation function f_t^l for task t and layer l is defined as a linear combination of M basis functions
● Recall from slides 6 and 7: M (the number of hinges) is a hyperparameter set in advance
● the task-specific coefficients denote the coordinates of the basis functions
9
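A rough sketch of one Adaptive Activation Layer under the structure described above: a linear map shared by all tasks, followed by a task-specific linear combination of basis functions. The hinge-style basis and all names here are my illustration under stated assumptions, not the paper's exact parameterization:

```python
import numpy as np

def hinge_basis(z, locations):
    """APL-style basis: ReLU plus one hinge max(0, -z + b_m) per location.
    (Assumed basis -- the paper's exact basis functions may differ.)"""
    feats = [np.maximum(0.0, z)]
    feats += [np.maximum(0.0, -z + b_m) for b_m in locations]
    return np.stack(feats)  # shape (M + 1, units)

def aal(x_prev, W, b, coeffs, locations):
    """One Adaptive Activation Layer: a linear map (W, b) shared by all
    tasks, followed by a task-specific activation built as a linear
    combination of basis functions with task-specific coordinates."""
    z = W @ x_prev + b
    return np.tensordot(coeffs, hinge_basis(z, locations), axes=1)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))                  # shared across tasks
b = np.zeros(4)                              # shared across tasks
locations = [0.0, 1.0]                       # M = 2 hinge locations
coeffs_task_a = np.array([1.0, -0.2, 0.0])   # task-specific coordinates
coeffs_task_b = np.array([1.0, 0.0, 0.3])

x = rng.normal(size=3)
print(aal(x, W, b, coeffs_task_a, locations))  # same shared weights,
print(aal(x, W, b, coeffs_task_b, locations))  # different activations per task
```

Only the small coordinate vectors grow with the number of tasks, which is the scalability argument the slides make against soft-sharing methods.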