Learning What and Where to Transfer
Yunhun Jang* 1,2, Hankook Lee* 1, Sung Ju Hwang 3,4,5, Jinwoo Shin 1,4,5
1 School of Electrical Engineering, KAIST
2 OMNIOUS
3 School of Computing, KAIST
4 Graduate School of AI, KAIST
5 AITRICS
* Equal contribution
Transfer Learning
• DNNs require large labeled datasets to train
• Transfer learning is a popular method to mitigate the lack of samples
  • Improve the performance of a model on a new task
  • By utilizing the knowledge of pre-trained source models
• Limitations of previous methods
  • Require the same architecture between source and target models (e.g., fine-tuning)
  • Require exhaustive hand-crafted tuning (e.g., attention transfer [1], Jacobian matching [2])
[Figure: two panels, "Pre-train and fine-tuning" (ImageNet, new task) and "Attention transfer/Jacobian matching" (true labels)]
[1] Zagoruyko, S. and Komodakis, N. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. In ICLR, 2017.
[2] Srinivas, S. and Fleuret, F. Knowledge transfer with Jacobian matching. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), 2018.
Learning What/Where to Transfer
• Propose two meta-networks: learning what/where to transfer (L2T-ww)
  • Learn the learning rules to transfer the source knowledge
• Where to transfer
  • A meta-network decides useful pairs of source/target layers to transfer
• What to transfer
  • A meta-network 𝑔 decides useful channels to transfer
[Figure: previous methods vs. Learning What and Where to Transfer (L2T-ww)]
L2T-ww: Learning What to Transfer
• Transfer by making target features similar to those of the source [3]
• Feature matching loss:
$$\mathcal{L}^{m,n}_{\mathrm{fm}}(\theta \mid x) = \frac{1}{CHW} \sum_{c,i,j} \left( r_\theta(T^n_\theta(x))_{c,i,j} - S^m(x)_{c,i,j} \right)^2$$
  • $r_\theta$: transformation for channel dimension matching (e.g., 1x1 conv)
[3] Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C., and Bengio, Y. FitNets: Hints for thin deep nets. In ICLR, 2015.
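The feature matching loss on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `feature_matching_loss` and the argument names are hypothetical, the 1x1 convolution is written as a per-pixel linear map over channels, and S^m(x) and T^n_θ(x) are read as the source features at layer m and the target features at layer n.

```python
import numpy as np


def feature_matching_loss(target_feat, source_feat, proj):
    """Mean squared gap between projected target features and source features.

    target_feat: array of shape (C_t, H, W), standing in for T^n_theta(x)
    source_feat: array of shape (C, H, W), standing in for S^m(x)
    proj:        array of shape (C, C_t), weights of a 1x1 convolution
                 standing in for r_theta (hypothetical parameterization)
    """
    # A 1x1 convolution mixes channels independently at every pixel (h, w).
    projected = np.einsum("oc,chw->ohw", proj, target_feat)
    # np.mean over all C*H*W entries gives the 1/(CHW) normalization.
    return float(np.mean((projected - source_feat) ** 2))


# Usage: a 2-channel target layer matched against a 3-channel source layer.
rng = np.random.default_rng(0)
source = rng.normal(size=(3, 4, 4))
target = rng.normal(size=(2, 4, 4))
proj = rng.normal(size=(3, 2))
loss = feature_matching_loss(target, source, proj)
```

The projection is what lets the two networks differ in width: only the spatial size H x W has to agree between the matched layers in this sketch.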
L2T-ww: Learning What to Transfer
• Learn what to transfer
  • Choose important channels for learning a target task
$$\mathcal{L}^{m,n}_{\mathrm{wfm}}(\theta \mid x, w^{m,n}) = \frac{1}{HW} \sum_{c} w^{m,n}_c \sum_{i,j} \left( r_\theta(T^n_\theta(x))_{c,i,j} - S^m(x)_{c,i,j} \right)^2$$
$$w^{m,n}_c \ge 0, \qquad \sum_{c} w^{m,n}_c = 1$$
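The weighted feature matching loss replaces the uniform average over channels with per-channel weights that are non-negative and sum to one. The sketch below assumes the weights come from a softmax over arbitrary scores, which is one simple way to satisfy that constraint; the function names and the score vector are hypothetical, and the target features are assumed to be already projected to the source channel count.

```python
import numpy as np


def softmax(logits):
    """Map arbitrary scores to non-negative weights that sum to one."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def weighted_feature_matching_loss(projected_target, source_feat, channel_weights):
    """Channel-weighted squared feature gap.

    projected_target, source_feat: arrays of shape (C, H, W)
    channel_weights:               array of shape (C,), non-negative, sums to 1
    """
    sq_err = (projected_target - source_feat) ** 2
    # (1/HW) * sum over spatial positions, kept separate per channel.
    per_channel = sq_err.mean(axis=(1, 2))
    # Weighted sum over channels.
    return float(np.dot(channel_weights, per_channel))


# Usage: channel 0 has the smallest gap, channel 2 the largest.
source = np.zeros((3, 2, 2))
projected = np.stack([np.full((2, 2), v) for v in (1.0, 2.0, 3.0)])
weights = softmax(np.array([2.0, 0.0, -2.0]))  # hypothetical channel scores
loss = weighted_feature_matching_loss(projected, source, weights)
```

With uniform weights (1/C for every channel) this reduces to the unweighted feature matching loss, so the weights only redistribute how much each channel contributes.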
L2T-ww: Learning Where to Transfer
• Learn where to transfer: $\lambda^{m,n}$
  • Meta-networks choose important matching pairs to transfer
  • Given all possible candidate matching pairs 𝒞
L2T-ww: Learning Where to Transfer
• Learn where to transfer
$$\mathcal{L}_{\mathrm{wfm}}(\theta \mid x, \phi) = \sum_{(m,n) \in \mathcal{C}} \lambda^{m,n} \, \mathcal{L}^{m,n}_{\mathrm{wfm}}(\theta \mid x, w^{m,n})$$
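The combined loss is a weighted sum of the per-pair losses over the candidate (source layer m, target layer n) pairs. A minimal sketch, assuming the per-pair losses and the pair weights λ have already been computed and are stored in dictionaries keyed by (m, n); the function name and the numbers are illustrative only.

```python
def total_transfer_loss(pair_losses, pair_weights):
    """Sum of lambda^{m,n} * L^{m,n}_wfm over all candidate pairs.

    pair_losses:  dict mapping (m, n) -> weighted feature matching loss
    pair_weights: dict mapping (m, n) -> lambda^{m,n}
    """
    return sum(pair_weights[pair] * loss for pair, loss in pair_losses.items())


# Usage: three candidate pairs; a weight of zero switches a pair off.
pair_losses = {(1, 1): 2.0, (1, 2): 4.0, (2, 2): 1.0}
pair_weights = {(1, 1): 0.5, (1, 2): 0.0, (2, 2): 2.0}
total = total_transfer_loss(pair_losses, pair_weights)  # 0.5*2.0 + 0.0*4.0 + 2.0*1.0 = 3.0
```

A pair whose weight is zero drops out of the sum entirely, which is how the weights express where transfer should happen.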