ICML 2020

Adaptive Adversarial Multi-task Representation Learning

Yuren Mao (1), Weiwei Liu (2), Xuemin Lin (1)
1. University of New South Wales, Australia. 2. Wuhan University, China.
Overview: Adaptive AMTRL (Adversarial Multi-task Representation Learning) Algorithm

[Poster overview figure: the AMTRL architecture (input, shared layers, task-specific layers for tasks $1, \dots, T$, and a discriminator trained through a gradient reversal layer, with forward and backward propagation paths); the augmented-Lagrangian formulation; the relatedness-based adaptive weighting strategy; panels (a) three 2-d Gaussian distributions, (b) discriminator, (c) relatedness changing curve; and the AMTRL PAC bound

$L_D(h) - L_S(h) \le \frac{c_1 \rho\, G_a(G^{\circ}(X^1))}{\sqrt{n}} + \frac{c_2 Q \sup_{g \in G^{\circ}} \lVert g(X^1) \rVert}{n} + \sqrt{\frac{9 \ln(2/\delta)}{2nT}}$

with annotations "Negligible generalization error" and "The number of tasks does not matter".]
Content
• Adversarial Multi-task Representation Learning (AMTRL)
• Adaptive AMTRL
• PAC Bound and Analysis
• Experiments
Adversarial Multi-task Representation Learning

Adversarial Multi-task Representation Learning (AMTRL) has achieved success in various applications, ranging from sentiment analysis to question answering.

Objective:

$\min_h L(h, \lambda) = L_S(h) + \lambda L_{adv}$

Empirical loss:

$L_S(h) = \frac{1}{nT} \sum_{t=1}^{T} \sum_{i=1}^{n} l_t(f_t(g(x_i^t)), y_i^t)$

Loss of the adversarial module:

$L_{adv} = \max_{\Phi} \frac{1}{nT} \sum_{t=1}^{T} \sum_{i=1}^{n} e_t \Phi(g(x_i^t))$

[Architecture figure: the input feeds shared layers $g$; task-specific layers $f_1, \dots, f_T$ produce per-task predictions; a discriminator $\Phi$ on the shared representation is trained adversarially (MinMax) through a gradient reversal layer, with forward and backward propagation paths shown.]
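The two losses above can be sketched with a toy linear model. This is a minimal illustration, not the paper's implementation: all names ($W_g$, $W_t$, $\Phi$) are made up, the reading of $e_t \Phi(g(x))$ as the discriminator's log-score for the true task id is an assumption, and the max over $\Phi$ is not performed (the losses are simply evaluated at a fixed discriminator).

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy linear model: shared layers g, per-task heads f_t, discriminator Phi.
rng = np.random.default_rng(0)
T, n, d_in, d_rep = 3, 4, 5, 6
W_g = rng.normal(size=(d_in, d_rep))       # shared representation g(x) = x W_g
W_t = rng.normal(size=(T, d_rep, 2))       # task-specific heads (binary tasks)
Phi = rng.normal(size=(d_rep, T))          # discriminator guesses the task id
X = rng.normal(size=(T, n, d_in))          # n examples for each of T tasks
y = rng.integers(0, 2, size=(T, n))

reps = X @ W_g                             # g(x_i^t), shape (T, n, d_rep)

# Empirical loss L_S(h): cross-entropy averaged over all nT examples.
L_S = 0.0
for t in range(T):
    p = softmax(reps[t] @ W_t[t])
    L_S += -np.log(p[np.arange(n), y[t]]).sum()
L_S /= n * T

# Adversarial term: e_t Phi(g(x_i^t)) read as the discriminator's log-score
# for the true task id (an assumption), averaged over all examples.
scores = softmax(reps @ Phi)               # shape (T, n, T)
L_adv = np.mean([np.log(scores[t, :, t]).mean() for t in range(T)])

lam = 1.0
L_total = L_S + lam * L_adv                # L(h, lambda) = L_S(h) + lambda * L_adv
print(float(L_S) > 0, np.isfinite(L_total))
```

In the full method the gradient reversal layer lets a single backward pass both maximize $L_{adv}$ over $\Phi$ and minimize it over $g$.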
Adaptive AMTRL

AMTRL aims to minimize the task-averaged empirical risk while enforcing the representations of all tasks to share an identical distribution. We formulate it as a constrained optimization problem

$\min_h L_S(h) \quad \text{s.t.} \quad L_{adv} - c = 0,$

and propose to solve the problem with an augmented Lagrangian method:

$\min_h L_S(h) + \lambda (L_{adv} - c) + \frac{r}{2} (L_{adv} - c)^2.$

Both $\lambda$ and $r$ are updated during the training process.
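The outer loop of the augmented Lagrangian method can be sketched as below. The multiplier step $\lambda \leftarrow \lambda + r(L_{adv} - c)$ is the standard update for this method; the concrete numbers are illustrative, not values from the paper.

```python
def augmented_lagrangian_step(L_S, L_adv, lam, r, c):
    """One outer update of the augmented Lagrangian method (sketch).

    L_S, L_adv: current empirical and adversarial losses.
    lam, r, c:  multiplier, penalty coefficient, constraint target.
    """
    # Penalized objective evaluated at the current iterate.
    L = L_S + lam * (L_adv - c) + 0.5 * r * (L_adv - c) ** 2
    # Standard multiplier update: move lam toward enforcing L_adv - c = 0.
    lam_new = lam + r * (L_adv - c)
    return L, lam_new

# Illustrative numbers: 0.8 + 0.5*0.3 + 1.0*0.3**2 = 1.04, lam -> 0.5 + 2*0.3 = 1.1.
L, lam = augmented_lagrangian_step(L_S=0.8, L_adv=0.3, lam=0.5, r=2.0, c=0.0)
print(round(L, 2), round(lam, 2))
```

In training, the inner minimization over $h$ would run between these multiplier updates.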
Relatedness for AMTRL

Relatedness between task $i$ and task $j$:

$R_{ij} = \min\left\{ \frac{\sum_{n=1}^{N} e_j \Phi(g(x_n^i)) + e_i \Phi(g(x_n^j))}{\sum_{n=1}^{N} e_i \Phi(g(x_n^i)) + e_j \Phi(g(x_n^j))},\; 1 \right\}$

Relatedness matrix:

$R = \begin{bmatrix} R_{11} & R_{12} & \cdots & R_{1T} \\ R_{21} & R_{22} & \cdots & R_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ R_{T1} & R_{T2} & \cdots & R_{TT} \end{bmatrix}$

[Figure: (a) three 2-d Gaussian distributions; (b) discriminator; (c) relatedness changing curve.]
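The relatedness formula can be sketched directly from discriminator outputs. Reading $e_t \Phi(\cdot)$ as the discriminator's probability for task $t$ is an assumption; the helper name and the toy score matrices are illustrative.

```python
import numpy as np

def relatedness(scores_i, scores_j, i, j):
    """R_ij from discriminator outputs (sketch).

    scores_i[n, t] stands in for e_t Phi(g(x_n^i)): the discriminator's
    probability that example n of task i belongs to task t.
    """
    # Numerator: how often task i's examples look like task j, and vice versa.
    cross = scores_i[:, j].sum() + scores_j[:, i].sum()
    # Denominator: how often each task's examples look like themselves.
    own = scores_i[:, i].sum() + scores_j[:, j].sum()
    return min(cross / own, 1.0)

# A discriminator that cannot tell the tasks apart yields relatedness 1 ...
uniform = np.full((10, 3), 1.0 / 3.0)
print(relatedness(uniform, uniform, 0, 1))  # -> 1.0

# ... while confidently separated tasks yield a small relatedness.
conf_0 = np.tile([0.9, 0.05, 0.05], (10, 1))
conf_1 = np.tile([0.05, 0.9, 0.05], (10, 1))
print(round(relatedness(conf_0, conf_1, 0, 1), 3))
```

This matches the figure's intuition: the more the discriminator confuses two tasks, the closer their representation distributions, and the larger $R_{ij}$.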
Adaptive AMTRL

In multi-task learning, tasks regularize each other, which can improve generalization. The weight of each task influences the effect of this regularization. This paper proposes a weighting strategy for AMTRL based on the proposed task relatedness:

$w = \frac{1}{\mathbf{1} R \mathbf{1}'} \mathbf{1} R,$

where $\mathbf{1}$ is a $1 \times T$ vector of all ones and $R$ is the relatedness matrix. Combining the augmented Lagrangian method with the weighting strategy, the optimization objective of our adaptive AMTRL method is

$\min_h \frac{1}{T} \sum_{t=1}^{T} w_t L_S^t(f_t \circ g) + \lambda (L_{adv} - c) + \frac{r}{2} (L_{adv} - c)^2.$
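The weight formula can be sketched in a few lines. Reading it as $w = \mathbf{1}R / (\mathbf{1}R\mathbf{1}')$, i.e. each task weighted by its total relatedness and normalized to sum to one, is my interpretation of the slide; the example matrix is illustrative.

```python
import numpy as np

def task_weights(R):
    """w = (1 R) / (1 R 1') with 1 a 1 x T row vector of ones (sketch)."""
    ones = np.ones(R.shape[0])
    v = ones @ R                 # total relatedness of each task (column sums)
    return v / (v @ ones)        # normalize by the scalar 1 R 1'

# Illustrative relatedness matrix: tasks 0 and 1 are close, task 2 is apart.
R = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.2],
              [0.2, 0.2, 1.0]])
w = task_weights(R)
print(np.round(w, 3))            # closely related tasks receive larger weight
print(round(float(w.sum()), 6))  # -> 1.0
```

Tasks that are strongly related to the others thus contribute more to the weighted empirical risk.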
PAC Bound and Analysis

Assuming the representations of all tasks share an identical distribution, we have the following generalization error bound:

$L_D(h) - L_S(h) \le \frac{c_1 \rho\, G_a(G^{\circ}(X^1))}{\sqrt{n}} + \frac{c_2 Q \sup_{g \in G^{\circ}} \lVert g(X^1) \rVert}{n} + \sqrt{\frac{9 \ln(2/\delta)}{2nT}}$

The generalization error is negligible, and only the last, small term depends on the number of tasks $T$.

• The generalization error bound for AMTRL is tighter than that for MTRL.
• The number of tasks only slightly influences the generalization bound of AMTRL.
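The $T$-dependence can be checked numerically on the confidence term $\sqrt{9\ln(2/\delta)/(2nT)}$. Note that the exact layout of the bound above is my reconstruction from the garbled slide, so treat this as a sketch of the claimed behaviour, not a statement of the theorem.

```python
import math

def confidence_term(n, T, delta=0.05):
    """The sqrt(9 ln(2/delta) / (2 n T)) term of the (reconstructed) bound."""
    return math.sqrt(9.0 * math.log(2.0 / delta) / (2.0 * n * T))

# With n = 1000 the term is already small at T = 1 and shrinks mildly with T,
# consistent with the slide's claim that the number of tasks matters little.
for T in (1, 10, 100):
    print(T, round(confidence_term(n=1000, T=T), 4))
```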
Experiments - Relatedness Evolution

Sentiment analysis and topic classification. For each task $t$ we track the mean relatedness

$\bar{R}_t = \frac{1}{T} \sum_{k=1}^{T} R_{tk}.$

[Figures: relatedness changing curves for sentiment analysis and for topic classification.]
Experiments - Classification Accuracy

Sentiment analysis and topic classification. [Figures: classification accuracy on the sentiment analysis and topic classification benchmarks.]
Experiments - Influence of the Number of Tasks

Sentiment analysis. Relative error:

$er_{rel} = \frac{er^{MTL}}{\frac{1}{T} \sum_{t=1}^{T} er_t^{STL}}$

[Figure: error rate for the task "appeal".]
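The relative-error metric above compares the multi-task error rate to the average single-task (STL) error rate. A minimal sketch, with purely illustrative numbers (not results from the paper):

```python
def relative_error(er_mtl, er_stl):
    """er_rel = er_MTL / mean_t(er_t^STL); values below 1 mean MTL helps."""
    return er_mtl / (sum(er_stl) / len(er_stl))

# Hypothetical example: MTL error 0.12 vs STL errors averaging 0.15.
print(round(relative_error(0.12, [0.15, 0.18, 0.12]), 3))  # -> 0.8
```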
THANK YOU