Set Transformer: A Framework for Attention-based - PowerPoint PPT Presentation

Set Transformer:   A Framework for Attention-based Permutation-Invariant Neural Networks Juho Lee, Yoonho Lee, Jungtaek Kim , Adam R. Kosiorek, Seungjin Choi, and Yee Whye Teh

Set-input problems and Deep Sets [Zaheer et al., 2017] • Take sets (variable lengths, order does not matter) as inputs • Application includes multiple instance learning, point-cloud classification, few-shot image classification, etc. • Deep Sets: a simple way to construct permutation invariant set-input nerual networks, but does not effectively modeling interactions between elements in sets. f ( X ) = ρ ( ∑ ϕ ( x ) ) . x ∈ X

Attention based set operations • Use multihead self-attention [Vaswani et al., 2017] to encode interactions between elements in a set. X Y y ⊤ x ⊤ 1 1 y ⊤ x ⊤ XW q W ⊤ k Y ⊤ 2 Att( X , Y ) = softmax ( ) YW v . 2 ⋮ ⋮ y ⊤ x ⊤ m d n k ⊤ v ⊤ × 1 1 q ⊤ SelfAtt( X ) = Att( X , X ) . 1 k ⊤ v ⊤ × q ⊤ 2 2 2 ⋮ ⋮ ⋮ q ⊤ k ⊤ v ⊤ n × m m Q = XW q K = YW k V = YW v • Note that a self-attention is permutation equivariant , SelfAtt( π ⋅ X ) = π ⋅ SelfAtt( X )

Set transformer - building blocks • Multihead attention block (MAB): residual connection + multihead QKV attention followed by a feed-forward layer MAB( X , Y ) = FFN( WX + Att( X , Y )) . • Self attention block (SAB): MAB applied in self-attention way, O ( n 2 ) SAB( X ) = MAB( X , X ) . • Induced self-attention block (ISAB): introduce a set of trainable inducing O ( nm ) m points to simulate self-attention, with inducing points. ISAB( X ) = MAB( X , MAB( I , X )) . x ⊤ 1 i ⊤ h ⊤ 1 1 MAB( I , X ) x ⊤ = 2 ⋮ ⋮ o ⊤ ⋮ 1 i ⊤ h ⊤ m m o ⊤ x ⊤ = n 2 ⋮ o ⊤ x ⊤ n MAB( X , MAB( I , X )) 1 x ⊤ 2 ⋮ x ⊤ n

Set transformer - building blocks • Pooling by multihead attention (PMA): instead of a simple sum/max/min aggregation, use multihead attention to aggregate features into a single vector. • Introduce a trainable seed vector , and use it to produce one output vector. o = PMA 1 ( Z ) = MAB( s , Z ) • Use multiple seed vectors and apply self-attention to produce multiple interacting outputs (e.g., explaining away) S = [ s ⊤ 1 , …, s ⊤ O = SelfAtt(PMA k ( Z )) = SelfAtt(MAB( S , Z )) k ] .

Set transformer - architecture • Encoder: a stack of permutation-equivarinat ISABs. Z X z ⊤ x ⊤ 1 1 z ⊤ x ⊤ … ISAB 1 ISAB 2 ISAB L 2 2 ⋮ ⋮ z ⊤ x ⊤ n n • Decoder: PMA followed by self-attention to produce outputs. z ⊤ 1 o ⊤ 1 s ⊤ MAB( S , X ) z ⊤ 1 … SAB 1 ⋮ SAB L 2 ⋮ ⋮ o ⊤ s ⊤ k z ⊤ k n

Experiments • Amortized clustering - learn a mapping from dataset to clustering Deep Sets Set transformer

Experiments • Works well for various tasks such as unique character counting, amortized clustering, point cloud classification, and anomaly detection • Generalize well with small number of inducing points • Attentions both in encoder (ISAB) and decoder (PMA + SAB) are important for the performance.

Conclusion • New set-input neural network architecture • Can efficiently model pairwise/higher order interactions between elements in sets • Demonstrated to work well for various set-input tasks • Code available at https://github.com/juho-lee/set_transformer

References [Qi et al., 2017] Qi, R. C., Su, H., Mo, K., and Guibas, J. L. PointNet: Deep learning on point sets for 3D classification and segmentation. CVPR, 2017.   [Vinyals et al., 2016] Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., and Wierstra, D. Matching networks for one shot learning. NIPS, 2016.   [Zaheer et al., 2017] Zaheer, M., Kottur, S., Ravanbhakhsh, S., Póczos, B., Salakhutdinov, R., and Smola, A. J. Deep sets. NIPS, 2017.   [Wagstaff et al, 2019] Wagstaff, E., Fuchs, F. B., Engelcke, M., Posner, I., and Osborne, M. On the limitations of representing functions on sets. arXiv:1901.09006, 2019.   [Cybenko 1989] Cybenko, G. Approximation by superpositions of sigmoidal functions. Mathematics of Control, Signals, and Systems, 2(4), 303314, 1989.   [Shi et al., 2015] Shi, B., Bai, S., Zhou, Z., and Bai, X. DeepPano: deep panoramic representation for 3-D shape recognition. IEEE Signal Processing Letters, 22(12):2339–2343, 2015.   [Su et al., 2015] Su, H., Maji, S., Kalogerakis, E., and Learned-Miller, E. Multi-view convolutional neural networks for 3d shape recognition. ICCV, 2015.   [Snell et al., 2017] Snell, J., Swersky, K., and Zemel, R. Prototypical networks for few-shot learning. NIPS, 2017.   [Ilse et al., 2018] Ilse, M., Tomczak, J. M., and Welling, M. Attention-based deep multiple instance learning. ICML, 2018.   [Garnelo et al., 2018] Garnelo, M., Rosenbaum, D., Maddison, C. J., Ramalho, T., Saxton, D., Shanahan, M., Teh, Y. W., Rezende, D. J., and Eslami, S. M. A. ICML, 2018.   [Vaswani et al., 2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł ., and Polosukhin, I. Attention is all you need. NIPS, 2017.

Set Transformer: A Framework for Attention-based - PowerPoint PPT Presentation

Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks Juho Lee, Yoonho Lee, Jungtaek Kim , Adam R. Kosiorek, Seungjin Choi, and Yee Whye Teh Set-input problems and Deep Sets [Zaheer et al., 2017] Take

FASTER TRANSFORMER Bo Yang Hsueh, 2019/12/18 AGENDA What is Faster Transformer Introduce the

Attention, Transformer and BERT Prof. Kuan-Ting Lai 2020/6/16 Attention is All You Need! A.

Attention in NLP CS 6956: Deep Learning for NLP Overview What is attention Attention in

West Cape Transformer Replacement Purpose Purchase a 50 MVA Load Tap Changer power

IEEE Transformer Committee PC57.167 Distribution Transformer Monitoring - User Mark Scarborough

Transformer Program Trevor Foster Electrical Engineering Manager Calpine Transformer Program 1

Electronic DC Transformer Pavol Bauer Learning objectives What is an electronic DC

CS480/680 Lecture 19: July 10, 2019 Attention and Transformer Networks [Vaswani et al.,

Attention Eye tracking seminar 2/19/15 Presented by Tatiana Emmanouil Outline What is

Magnetics Design 3.1 Important magnetic equations 3.2 Magnetic losses 3.3 Transformer 3.3.1

Attention is All You Need (Vaswani et. al. 2017) Slides and figures when not cited are from:

GPT-2 Language Models are Unsupervised Multi-Task Learners GPT-2 Fvrier 2019 Transformer XL

Attention! 1. Definitions and behavioral effects 2. Effects on neural firing rates: Spatial

The Attention Economy What is the attention economy? A business model where you (as the

Input. A set of men M , and a set of women W . Input. A set of men M , and a set of women W .

Utility Task Force Corrosion Pictures for IEEE C57.12.24 Fall 2018 & Submersible Transformer

APBA Football Innovation Innovation not previously discussed Ease of use without slowing

Transformer Models CSE545 - Spring 2019 Review: Feed Forward Network Z (full-connected)

Lecture 12: Attention and Transformers Julia Hockenmaier juliahmr@illinois.edu 3324 Siebel

Extended Foster Care Hot Topics Susan Zimny Program and Policy Analyst, Transition Age Youth

A System-Wide Debugging Assistant Powered by Natural Language Processing Karthik Narasimhan

Persistence of Gaussian Stationary Processes: a spectral perspective Naomi Feldheim (Stanford)

Dynamic Fitness Functions for Genetic Improvement in Compilers and Interpreters Kyoto 16.7.2018

o-glassesX: Compiler Provenance Recovery with Attention Mechanism from a Short Code Fragment

Set Transformer: A Framework for Attention-based - PowerPoint PPT Presentation

Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks Juho Lee, Yoonho Lee, Jungtaek Kim , Adam R. Kosiorek, Seungjin Choi, and Yee Whye Teh Set-input problems and Deep Sets [Zaheer et al., 2017] Take

FASTER TRANSFORMER Bo Yang Hsueh, 2019/12/18 AGENDA What is Faster Transformer Introduce the

Attention, Transformer and BERT Prof. Kuan-Ting Lai 2020/6/16 Attention is All You Need! A.

Attention in NLP CS 6956: Deep Learning for NLP Overview What is attention Attention in

West Cape Transformer Replacement Purpose Purchase a 50 MVA Load Tap Changer power

IEEE Transformer Committee PC57.167 Distribution Transformer Monitoring - User Mark Scarborough

Transformer Program Trevor Foster Electrical Engineering Manager Calpine Transformer Program 1

Electronic DC Transformer Pavol Bauer Learning objectives What is an electronic DC

CS480/680 Lecture 19: July 10, 2019 Attention and Transformer Networks [Vaswani et al.,

Attention Eye tracking seminar 2/19/15 Presented by Tatiana Emmanouil Outline What is

Magnetics Design 3.1 Important magnetic equations 3.2 Magnetic losses 3.3 Transformer 3.3.1

Attention is All You Need (Vaswani et. al. 2017) Slides and figures when not cited are from:

GPT-2 Language Models are Unsupervised Multi-Task Learners GPT-2 Fvrier 2019 Transformer XL

Attention! 1. Definitions and behavioral effects 2. Effects on neural firing rates: Spatial

The Attention Economy What is the attention economy? A business model where you (as the

Input. A set of men M , and a set of women W . Input. A set of men M , and a set of women W .

Utility Task Force Corrosion Pictures for IEEE C57.12.24 Fall 2018 &amp; Submersible Transformer

APBA Football Innovation Innovation not previously discussed Ease of use without slowing

Transformer Models CSE545 - Spring 2019 Review: Feed Forward Network Z (full-connected)

Lecture 12: Attention and Transformers Julia Hockenmaier juliahmr@illinois.edu 3324 Siebel

Extended Foster Care Hot Topics Susan Zimny Program and Policy Analyst, Transition Age Youth

A System-Wide Debugging Assistant Powered by Natural Language Processing Karthik Narasimhan

Persistence of Gaussian Stationary Processes: a spectral perspective Naomi Feldheim (Stanford)

Dynamic Fitness Functions for Genetic Improvement in Compilers and Interpreters Kyoto 16.7.2018

o-glassesX: Compiler Provenance Recovery with Attention Mechanism from a Short Code Fragment

Utility Task Force Corrosion Pictures for IEEE C57.12.24 Fall 2018 & Submersible Transformer