Adaptive Activation Network and Functional Regularization for Efficient and Flexible Deep Multi-Task Learning (AAAI-2020)
Reading Group, Dec. 11, 2019
Suman Saha (postdoc), Computer Vision Lab @ ETH Zurich
Motivation: DNN Activation Functions
● Learning activation functions to improve deep neural networks (DNNs) [1]
● Parameters in the linear components (W and b) are learned from data
● while nonlinearities are predefined, e.g. sigmoid, tanh or ReLU
● Assumption – an arbitrarily complex function can be approximated using any of these common nonlinear functions
● In practice, the choice of nonlinearity affects:
→ the learning dynamics
→ network expressive power
[1] Agostinelli, Forest, et al. "Learning activation functions to improve deep neural networks." arXiv preprint arXiv:1412.6830 (2014).
2
Motivation: Choice of Nonlinearity
● Active research area – design activation functions that enable fast training of DNNs
● Vanishing gradient problem: the derivative of the sigmoid function ranges between 0 and 0.25
● Weight update: for DNNs with more layers, the gradients tend to vanish more in the lower layers
3
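The vanishing-gradient claim on this slide can be checked numerically; a minimal NumPy sketch (not from the slides, names illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)), maximized at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-10.0, 10.0, 1001)
print(sigmoid_grad(x).max())  # 0.25, attained at x = 0

# Backprop multiplies one such factor per layer, so through n
# saturating sigmoid layers the gradient scale shrinks like 0.25**n.
for n in (1, 5, 10):
    print(n, 0.25 ** n)
```

This is why the lower layers of a deep sigmoid network see the smallest gradients: they sit behind the longest chain of such factors.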
Motivation: Choice of Nonlinearity
● The rectified linear activation function (ReLU) does not saturate like sigmoidal functions
● helps to overcome the vanishing gradient problem
Other recent activation functions
● Maxout activation (Goodfellow et al., 2013) – computes the maximum of a set of linear functions
● Springenberg & Riedmiller (2013) replaced the max function with a probabilistic variant
● Gulcehre et al. (2014) explored an activation function that replaces the max function with an Lp norm
4
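A maxout unit as described above can be sketched in a few lines of NumPy (shapes and names are illustrative, not from the slides):

```python
import numpy as np

def maxout(x, W, b):
    """Maxout unit (Goodfellow et al., 2013): the maximum of a
    set of k linear functions of the input.

    x: (d,) input vector
    W: (k, d) weights of the k linear pieces
    b: (k,) biases of the k linear pieces
    """
    return np.max(W @ x + b, axis=0)

# With the two pieces y = x and y = -x, maxout computes |x| --
# a non-saturating, learnable shape a single fixed ReLU cannot express.
W = np.array([[1.0], [-1.0]])
b = np.zeros(2)
print(maxout(np.array([-3.0]), W, b))  # 3.0
```

Because the maximum of linear functions never saturates, maxout inherits ReLU's favorable gradient behavior while learning the shape of the nonlinearity.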
Motivation
● The type of activation function can have a significant impact on learning
● One way to explore the space of possible functions is to learn the activation function during training (Agostinelli et al., 2014)
5
Adaptive Piecewise Linear (APL) units
● Activation functions as a sum of hinge-shaped functions, resulting in a piecewise linear activation function:
h_i(x) = \max(0, x) + \sum_{s=1}^{S} a_i^s \max(0, -x + b_i^s)   (1)
● S (the number of hinges) is a hyperparameter set in advance
● a_i^s and b_i^s are the learnable parameters: the a_i^s variables control the slopes of the linear segments, and the b_i^s variables determine the locations of the hinges
6
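A minimal NumPy sketch of an APL unit following the sum-of-hinges definition above (the vectorization and the example parameter values are mine):

```python
import numpy as np

def apl(x, a, b):
    """Adaptive Piecewise Linear unit (Agostinelli et al., 2014):
    h(x) = max(0, x) + sum_s a[s] * max(0, -x + b[s]).

    x: array of pre-activations
    a: length-S list of learned slopes of the hinge segments
    b: length-S list of learned hinge locations
    """
    out = np.maximum(0.0, x)
    for a_s, b_s in zip(a, b):
        out = out + a_s * np.maximum(0.0, -x + b_s)
    return out

# With S = 1, a = [-0.2], b = [0.0], APL reduces to a leaky-ReLU shape:
x = np.array([-2.0, 0.0, 3.0])
print(apl(x, a=[-0.2], b=[0.0]))  # leaky-ReLU-like: [-0.4, 0.0, 3.0]
```

Since a and b are ordinary parameters, they can be trained by backpropagation along with the weights; larger S yields more linear segments and hence more expressive activation shapes.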
Adaptive Piecewise Linear (APL) units
● Fig. 1 shows example APL functions for S = 1
● for large enough S, APL can approximate arbitrarily complex continuous functions
● the first term in Eq. (1) is ReLU
● when x < 0, the derivative of ReLU is 0, resulting in dead neurons
● leaky ReLU addresses the dead-neuron problem, e.g. leaky ReLU may have y = 0.01x when x < 0
Figure 1: Sample activation functions obtained from changing the parameters. Notice that figure b shows that the activation function can also be non-convex.
7
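The dead-neuron point can be seen directly from the gradients; a small illustrative sketch (not from the slides):

```python
import numpy as np

def relu_grad(x):
    # Derivative of max(0, x): 1 for x > 0, else 0.
    return (x > 0).astype(float)

def leaky_relu_grad(x, alpha=0.01):
    # Derivative of leaky ReLU (slope alpha for x < 0): 1 for x > 0, else alpha.
    return np.where(x > 0, 1.0, alpha)

x = np.array([-5.0, -0.1, 2.0])
print(relu_grad(x))        # [0. 0. 1.]  -> no gradient flows for x < 0
print(leaky_relu_grad(x))  # [0.01 0.01 1.]  -> a small gradient always flows
```

A ReLU neuron whose pre-activation stays negative receives zero gradient and can never recover; the nonzero slope of leaky ReLU (and, more generally, the learned slopes of APL) keeps such units trainable.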
TAAN (Task Adaptive Activation Network)
Categories of deep learning MTL: (a) Hard-sharing; (b) Soft-sharing; (c) Task Adaptive Activation Network (proposed model); (d) Inner Structure of Adaptive Activation Layer.
● Proposed approach = hard-sharing + learnable task-specific activation functions
● all tasks can share their weights and biases on the hidden layers
● more scalable than the soft-sharing methods, where the number of network components is proportional to the number of tasks
8
TAAN
● For a task t, given the input x_t^{l-1} from either the previous layer or the data input, the output of the l-th AAL (Adaptive Activation Layer) is defined by
x_t^l = f_t^l(W^l x_t^{l-1} + b^l)
● the weight W^l and bias b^l parameters are shared across tasks
● the task-specific activation function f_t^l for task t and layer l is defined as a linear combination of M basis functions
● Recall from slides 6 and 7: M (the number of hinges) is a hyperparameter set in advance
● the task-specific coefficients denote the coordinates of the basis functions
9
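A rough sketch of one Adaptive Activation Layer under the structure described above: a linear map shared by all tasks, followed by a task-specific linear combination of basis functions. The hinge-style basis and all names here are my illustration under stated assumptions, not the paper's exact parameterization:

```python
import numpy as np

def hinge_basis(z, locations):
    """APL-style basis: ReLU plus one hinge max(0, -z + b_m) per location.
    (Assumed basis -- the paper's exact basis functions may differ.)"""
    feats = [np.maximum(0.0, z)]
    feats += [np.maximum(0.0, -z + b_m) for b_m in locations]
    return np.stack(feats)  # shape (M + 1, units)

def aal(x_prev, W, b, coeffs, locations):
    """One Adaptive Activation Layer: a linear map (W, b) shared by all
    tasks, followed by a task-specific activation built as a linear
    combination of basis functions with task-specific coordinates."""
    z = W @ x_prev + b
    return np.tensordot(coeffs, hinge_basis(z, locations), axes=1)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))                  # shared across tasks
b = np.zeros(4)                              # shared across tasks
locations = [0.0, 1.0]                       # M = 2 hinge locations
coeffs_task_a = np.array([1.0, -0.2, 0.0])   # task-specific coordinates
coeffs_task_b = np.array([1.0, 0.0, 0.3])

x = rng.normal(size=3)
print(aal(x, W, b, coeffs_task_a, locations))  # same shared weights,
print(aal(x, W, b, coeffs_task_b, locations))  # different activations per task
```

Only the small coordinate vectors grow with the number of tasks, which is the scalability argument the slides make against soft-sharing methods.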