Test-Time Training with Self-Supervision for Generalization under Distribution Shifts
Yu Sun, Xiaolong Wang, Zhuang Liu, John Miller, Alexei Efros, Moritz Hardt (UC Berkeley), ICML 2020


  1. Test-Time Training with Self-Supervision for Generalization under Distribution Shifts
     Yu Sun, Xiaolong Wang, Zhuang Liu, John Miller, Alexei Efros, Moritz Hardt (UC Berkeley), ICML 2020

  2. Same distribution: P = Q
     • In theory: same distribution for training and testing
     [Diagram: training points (x) and a test point (o) drawn from the same distribution]

  3. Distribution shifts: P ≠ Q
     • In theory: same distribution for training and testing
     • In the real world: distribution shifts are everywhere
     [Diagram: training points (x) and a test point (o) drawn from different distributions]

  4. Distribution shifts: P ≠ Q
     • In theory: same distribution for training and testing
     • In the real world: distribution shifts are everywhere
     [Figure: CIFAR-10 as collected in 2009 vs. CIFAR-10 re-collected in 2019]
     Hendrycks and Dietterich, 2018; Recht, Roelofs, Schmidt and Shankar, 2019

  5. Existing paradigms anticipate the shifts with data or math
     [Diagram: training points (x) and a test point (o) drawn from different distributions P and Q]

  6. Existing paradigms anticipate the shifts with data or math
     • Domain adaptation
       • Data from the test distribution
     A Theory of Learning from Different Domains (Ben-David, Blitzer, Crammer, Kulesza, Pereira and Vaughan, 2009)
     Adversarial Discriminative Domain Adaptation (Tzeng, Hoffman, Saenko and Darrell, 2017)
     Unsupervised Domain Adaptation through Self-Supervision (Sun, Tzeng, Darrell and Efros, 2019)

  7. Existing paradigms anticipate the shifts with data or math
     • Domain adaptation
       • Data from the test distribution (maybe unlabeled)
       • Hard to know the test distribution
     A Theory of Learning from Different Domains (Ben-David, Blitzer, Crammer, Kulesza, Pereira and Vaughan, 2009)
     Adversarial Discriminative Domain Adaptation (Tzeng, Hoffman, Saenko and Darrell, 2017)
     Unsupervised Domain Adaptation through Self-Supervision (Sun, Tzeng, Darrell and Efros, 2019)

  8. Existing paradigms anticipate the shifts with data or math
     • Domain adaptation
       • Data from the test distribution
       • Hard to know the test distribution
     • Domain generalization
       • Data from the meta distribution
     Domain Generalization via Invariant Feature Representation (Muandet, Balduzzi and Scholkopf, 2013)
     Domain Generalization for Object Recognition with Multi-task Autoencoders (Ghifary, Bastiaan, Zhang and Balduzzi, 2015)
     Domain Generalization by Solving Jigsaw Puzzles (Carlucci, D'Innocente, Bucci, Caputo and Tommasi, 2019)

  9. Distribution shifts: existing paradigms anticipate the shifts with data or math
     • Domain adaptation
       • Data from the test distribution
       • Hard to know the test distribution
     • Domain generalization
       • Data from the meta distribution
     [Diagram: the train set X_1, ..., X_n is drawn from P; the test set X is drawn from Q]
     Domain Generalization via Invariant Feature Representation (Muandet, Balduzzi and Scholkopf, 2013)
     Domain Generalization for Object Recognition with Multi-task Autoencoders (Ghifary, Bastiaan, Zhang and Balduzzi, 2015)
     Domain Generalization by Solving Jigsaw Puzzles (Carlucci, D'Innocente, Bucci, Caputo and Tommasi, 2019)

  10. Distribution shifts: existing paradigms anticipate the shifts with data or math
      • Domain adaptation
        • Data from the test distribution
        • Hard to know the test distribution
      • Domain generalization
        • Data from the meta distribution
      [Diagram: a meta distribution M generates the training distributions P_1, ..., P_n and the test distribution Q, which in turn generate the train sets X_1, ..., X_n and the test set X]
      Domain Generalization via Invariant Feature Representation (Muandet, Balduzzi and Scholkopf, 2013)
      Domain Generalization for Object Recognition with Multi-task Autoencoders (Ghifary, Bastiaan, Zhang and Balduzzi, 2015)
      Domain Generalization by Solving Jigsaw Puzzles (Carlucci, D'Innocente, Bucci, Caputo and Tommasi, 2019)

  11. Distribution shifts: existing paradigms anticipate the shifts with data or math
      • Domain adaptation
        • Data from the test distribution
        • Hard to know the test distribution
      • Domain generalization
        • Data from the meta distribution
        • Hard to know the meta distribution
      [Diagram: meta distribution shifts, M_P ≠ M_Q; the training distributions P_1, ..., P_n come from M_P while the test distribution Q comes from a shifted M_Q]
      Domain Generalization via Invariant Feature Representation (Muandet, Balduzzi and Scholkopf, 2013)
      Domain Generalization for Object Recognition with Multi-task Autoencoders (Ghifary, Bastiaan, Zhang and Balduzzi, 2015)
      Domain Generalization by Solving Jigsaw Puzzles (Carlucci, D'Innocente, Bucci, Caputo and Tommasi, 2019)

  12. Existing paradigms anticipate the shifts with data or math
      • Domain adaptation
        • Data from the test distribution
        • Hard to know the test distribution
      • Domain generalization
        • Data from the meta distribution
        • Hard to know the meta distribution
      • Adversarial robustness
        • Topological structure of the test distribution
      Certifying Some Distributional Robustness with Principled Adversarial Training (Sinha, Namkoong and Duchi, 2017)
      Towards Deep Learning Models Resistant to Adversarial Attacks (Madry, Makelov, Schmidt, Tsipras and Vladu, 2017)
      Adversarially Robust Generalization Requires More Data (Schmidt, Santurkar, Tsipras, Talwar and Madry, 2018)

  13. Existing paradigms anticipate the shifts with data or math
      • Domain adaptation
        • Data from the test distribution
        • Hard to know the test distribution
      • Domain generalization
        • Data from the meta distribution
        • Hard to know the meta distribution
      • Adversarial robustness
        • Topological structure of the test distribution
      [Diagram: P as a point in the space of distributions]
      Certifying Some Distributional Robustness with Principled Adversarial Training (Sinha, Namkoong and Duchi, 2017)
      Towards Deep Learning Models Resistant to Adversarial Attacks (Madry, Makelov, Schmidt, Tsipras and Vladu, 2017)
      Adversarially Robust Generalization Requires More Data (Schmidt, Santurkar, Tsipras, Talwar and Madry, 2018)

  14. Existing paradigms anticipate the shifts with data or math
      • Domain adaptation
        • Data from the test distribution
        • Hard to know the test distribution
      • Domain generalization
        • Data from the meta distribution
        • Hard to know the meta distribution
      • Adversarial robustness
        • Topological structure of the test distribution
      [Diagram: Q lies inside a worst-case neighborhood of P in the space of distributions]
      Certifying Some Distributional Robustness with Principled Adversarial Training (Sinha, Namkoong and Duchi, 2017)
      Towards Deep Learning Models Resistant to Adversarial Attacks (Madry, Makelov, Schmidt, Tsipras and Vladu, 2017)
      Adversarially Robust Generalization Requires More Data (Schmidt, Santurkar, Tsipras, Talwar and Madry, 2018)

  15. Existing paradigms anticipate the shifts with data or math
      • Domain adaptation
        • Data from the test distribution
        • Hard to know the test distribution
      • Domain generalization
        • Data from the meta distribution
        • Hard to know the meta distribution
      • Adversarial robustness
        • Topological structure of the test distribution
        • Hard to describe, especially in high dimension
      [Diagram: Q lies inside a worst-case neighborhood of P in the space of distributions]
      Certifying Some Distributional Robustness with Principled Adversarial Training (Sinha, Namkoong and Duchi, 2017)
      Towards Deep Learning Models Resistant to Adversarial Attacks (Madry, Makelov, Schmidt, Tsipras and Vladu, 2017)
      Adversarially Robust Generalization Requires More Data (Schmidt, Santurkar, Tsipras, Talwar and Madry, 2018)

  16. Existing paradigms anticipate the distribution shifts
      • Domain adaptation
        • Data from the test distribution
        • Hard to know the test distribution
      • Domain generalization
        • Data from the meta distribution
        • Hard to know the meta distribution
      • Adversarial robustness
        • Topological structure of the test distribution
        • Hard to describe, especially in high dimension

  17. Test-Time Training (TTT)
      • Does not anticipate the test distribution

  18. Test-Time Training (TTT)
      standard test error = E_Q[ℓ((x, y); θ)]
      • Does not anticipate the test distribution
      • The test sample x gives us a hint about Q

  19. Test-Time Training (TTT)
      standard test error = E_Q[ℓ((x, y); θ)]
      our test error = E_Q[ℓ((x, y); θ(x))]
      • Does not anticipate the test distribution
      • The test sample x gives us a hint about Q
      • No fixed model; instead, adapt at test time

  20. Test-Time Training (TTT)
      standard test error = E_Q[ℓ((x, y); θ)]
      our test error = E_Q[ℓ((x, y); θ(x))]
      • Does not anticipate the test distribution
      • The test sample x gives us a hint about Q
      • No fixed model; instead, adapt at test time
      • A one-sample learning problem
      • No label? Self-supervision!

  21. Rotation prediction as self-supervision (Gidaris et al., 2018)
      • Create labels from unlabeled input x
      Unsupervised Representation Learning by Predicting Image Rotations (Gidaris, Singh and Komodakis, 2018)

  22. Rotation prediction as self-supervision (Gidaris et al., 2018)
      • Create labels from unlabeled input x
      • Rotate the input image by multiples of 90° (0°, 90°, 180°, 270°); the rotation is the self-supervised label y_s
      Unsupervised Representation Learning by Predicting Image Rotations (Gidaris, Singh and Komodakis, 2018)

  23. Rotation prediction as self-supervision (Gidaris et al., 2018)
      • Create labels from unlabeled input x
      • Rotate the input image by multiples of 90° (0°, 90°, 180°, 270°); the rotation is the self-supervised label y_s
      • This produces a four-way classification problem, solved by a CNN with parameters θ
      Unsupervised Representation Learning by Predicting Image Rotations (Gidaris, Singh and Komodakis, 2018)
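The label-creation step on this slide can be sketched directly: from one unlabeled image, build the four rotated copies and their four-way labels. The helper name `make_rotation_batch` is hypothetical; only the rotation scheme comes from the slide.

```python
import numpy as np

def make_rotation_batch(img):
    """Build the four-way rotation-prediction problem from one unlabeled image.

    Each copy is rotated by k * 90 degrees and labeled y_s = k, so a
    classifier can be trained to predict the rotation without any
    human-provided label.
    """
    images = [np.rot90(img, k) for k in range(4)]  # 0°, 90°, 180°, 270°
    labels = list(range(4))                        # y_s = 0, 1, 2, 3
    return images, labels

img = np.arange(16).reshape(4, 4)  # stand-in for a real image
rotated, y_s = make_rotation_batch(img)
```

In the TTT pipeline these (rotated image, rotation label) pairs would be fed to the shared CNN both during training (jointly with the main task) and at test time on the single test sample.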
