Transformation Equivariance vs. Invariance: Unsupervised Learning of Visual Representations
Guo-Jun Qi (guojunq@gmail.com)
Laboratory for MAchine Perception and LEarning (MAPLE) & Futurewei Technologies (Huawei Research USA)
Contents
• TER: Transformation Equivariant Representations
  • Definition, Steerability
• AET: AutoEncoding Transformations
  • Deterministic approach: AET (AutoEncoding Transformations)
  • Probabilistic approach: AVT (Autoencoding Variational Transformations)
  • SAT: (Semi-)supervised Autoencoding Transformations
• Conclusions and Future Work
  • Unifying the Transformation Equivariance and Invariance
Recipe for the Success of CNNs
• CNN = Translation-Equivariant Representation + Fully-Connected Classifier
• Convolution layers (translation-equivariant representation): capture visual structures
• Fully connected classifier: predicts semantic concepts (e.g., Horse, Grass, Tree, …)
Transformation Equivariant Representations
• Beyond translations: equivariant feature maps under transformations
[Figure: various transformations and the corresponding representations]
Generalize CNNs beyond translations
• Transformation Equivariant Representations + Transformation Invariant Classifiers
• Representation (transformation equivariance): captures spatial structures
• FC classifier (transformation invariance): predicts semantic concepts (e.g., Horse, Grass, Tree, …)
Transformation Equivariance
• Definition of transformation equivariance: $E(t(x)) = \rho(t)[E(x)]$
  • $E$: the representation of a sample
  • $t$: a transformation on samples
  • $\rho(t)$: the transformation of the representation that corresponds to $t$
• Transformation invariance is the special case of transformation equivariance in which $\rho(t)$ is the identity.
Steerability property
• Steerability: the representation of a transformed sample $t(x)$ can be computed directly from the representation $E(x)$ of the original sample, with no access to $x$.
• $\rho(t)$ is a function of the transformation $t$ alone, independent of the sample: $E(t(x)) = \rho(t)[E(x)]$
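A minimal sketch of the equivariance condition E(t(x)) = ρ(t)[E(x)], using the translation case familiar from CNNs. Everything here is an illustrative assumption: the signal `x`, the kernel `w`, and the helper names `shift` and `circular_conv` are hypothetical, the "image" is 1-D, and boundaries are circular so the identity holds exactly. The representation E is a circular cross-correlation, the transformation t is a cyclic shift, and ρ(t) is the same shift applied to the feature map. Summing the feature map then gives an invariant quantity, the special case where ρ(t) is the identity.

```python
def shift(signal, k):
    """Transformation t: cyclically shift a 1-D signal right by k positions."""
    n = len(signal)
    return [signal[(i - k) % n] for i in range(n)]


def circular_conv(signal, kernel):
    """Representation E: circular cross-correlation of the signal with a kernel."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]
w = [0.5, -1.0, 0.25]
k = 2

# Equivariance: encoding the shifted signal equals shifting the encoding.
lhs = circular_conv(shift(x, k), w)   # E(t(x))
rhs = shift(circular_conv(x, w), k)   # rho(t)[E(x)], with rho(t) = the same shift
equivariant = all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs))

# Invariance: global sum pooling discards the shift altogether.
invariant = abs(sum(lhs) - sum(circular_conv(x, w))) < 1e-12

print(equivariant, invariant)
```

Note that `rhs` is computed from the features of `x` and the shift `k` only, without revisiting the transformed signal, which is the steerability property in this toy setting.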
Our Goals
• General transformations
  • Not necessarily limited to discrete or spatial transformations
  • Recoloring, contrast changes, etc.
• Nonlinear relations between the representations of transformed and original images
  • Capturing complex visual structures from transformed images
  • $E(t(x)) = \rho(t)[E(x)]$, with nonlinear transformations $\rho(t)$ on representations
A Big Picture: Stack of AET
• AET learns a general representation that can be applied everywhere.
[Figure: stack of Autoencoding Transformations for learning TER. Labels: Autoencoders, CNNs, AED, CNN, Group Equivariant CNN, Capsule Net, Deterministic AET, Probabilistic AVT, Deterministic SAT, Probabilistic SAT, Transformation Equivariant Representations]
• AET: AutoEncoding Transformations
• AVT: Autoencoding Variational Transformations
• SAT: (Semi-)Supervised Autoencoding Transformations
Take a Glance: Autoencoding Transformations rather than Data
• AutoEncoding Data (AED): x → E → E(x) → D → reconstructed x
• AutoEncoding Transformations (AET): x → E → E(x) and t(x) → E → E(t(x)); both representations → D → estimated t
Zhang et al., AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transformations rather than Data, in CVPR 2019.
How does AET work?
• Generative process:
  • An input image x ~ p(x)
  • A random transformation t ~ p(t)
  • The transformed image t(x)
• A representation encoder
  • E : x ⟼ E(x), t(x) ⟼ E(t(x))
• A transformation decoder
  • D : (x, t(x)) ⟼ D(x, t(x)), the estimate of t
Decoding Transformations
• A Siamese network encodes the representation of each image individually
  • E.g., visual structures and spatial relations among objects
• The transformation is decoded by comparing the representations before and after the transformation.
[Figure: x → E → E(x) and t(x) → E → E(t(x)); both representations feed D, which outputs the estimated t]
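The data flow described above (sample x, sample t, encode x and t(x) with one shared encoder, decode t by comparing the two representations) can be sketched as follows. This is a hand-built toy under stated assumptions, not a trained AET: the "image" is a single 2-D point, the transformation is a rotation, and the names `encoder`, `rotate`, and `decoder` are hypothetical. In a real model both E and D would be networks trained to minimize the decoding loss.

```python
import math
import random


def rotate(image, angle):
    """Transformation t: rotate a 2-D point by `angle` radians."""
    a, b = image
    c, s = math.cos(angle), math.sin(angle)
    return (c * a - s * b, s * a + c * b)


def encoder(image):
    """Shared (Siamese) encoder E; a toy fixed map standing in for a network."""
    a, b = image
    return (2.0 * a, 2.0 * b)


def decoder(z_orig, z_trans):
    """Decoder D: estimate the rotation by comparing the two representations."""
    diff = math.atan2(z_trans[1], z_trans[0]) - math.atan2(z_orig[1], z_orig[0])
    return math.atan2(math.sin(diff), math.cos(diff))  # wrap into (-pi, pi]


rng = random.Random(0)
x = (rng.uniform(0.5, 1.0), rng.uniform(0.5, 1.0))   # x ~ p(x)
t = rng.uniform(-math.pi / 2, math.pi / 2)           # t ~ p(t)

z_orig = encoder(x)                # E(x)
z_trans = encoder(rotate(x, t))    # E(t(x)), same encoder weights
t_hat = decoder(z_orig, z_trans)   # estimated transformation

loss = 0.5 * (t - t_hat) ** 2      # squared error on the angle parameter
print(t, t_hat, loss)
```

The decoder never sees the images themselves, only the two representations, so a low loss is possible only if the encoder keeps the information that distinguishes x from t(x).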
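AET loss for training
• Parameterized transformations: $\mathcal{T} = \{t_\theta \mid \theta \sim \Theta\}$
  • E.g., affine or projective: $\ell(t_\theta, t_{\hat\theta}) = \frac{1}{2}\|M(\theta) - M(\hat\theta)\|_2^2$, with $M(\theta)$ the matrix representation of $t_\theta$
• GAN-induced transformations: transformed image $G(x, z)$
  • $\ell(t_z, t_{\hat z}) = \frac{1}{2}\|z - \hat z\|_2^2$
• Non-parametric transformations
  • $\ell(t, \hat t) = \frac{1}{2}\,\mathbb{E}_{x \sim X}\,\mathrm{dist}(t(x), \hat t(x))$
Hats denote the decoder's estimates.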
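As a worked instance of the parameterized loss, the sketch below computes half the squared distance between the matrices of a true and a predicted affine transformation. The parameterization (rotation angle, isotropic scale, translation) and the names `affine_matrix` and `aet_loss` are assumptions made for illustration; the slides do not specify how the matrix is built from its parameters.

```python
import math


def affine_matrix(angle, scale, tx, ty):
    """Hypothetical parameterization: 3x3 homogeneous matrix of a 2-D affine map."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        [scale * c, -scale * s, tx],
        [scale * s, scale * c, ty],
        [0.0, 0.0, 1.0],
    ]


def aet_loss(m_true, m_pred):
    """Half the squared entrywise distance between two transformation matrices."""
    return 0.5 * sum(
        (a - b) ** 2
        for row_true, row_pred in zip(m_true, m_pred)
        for a, b in zip(row_true, row_pred)
    )


identity = affine_matrix(0.0, 1.0, 0.0, 0.0)
shifted = affine_matrix(0.0, 1.0, 1.0, 0.0)   # identity plus a unit x-translation

loss_same = aet_loss(identity, identity)   # perfect prediction
loss_shift = aet_loss(identity, shifted)   # one entry differs by 1

print(loss_same, loss_shift)
```

The GAN-induced case has the same form with the latent code in place of the matrix, so the same function applies to vectors wrapped as single-row matrices.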
Revisit: Steerability of TER
• Obtain the representation $z$ of a transformed sample $t(x)$ from $t$ and $\tilde z$ (the representation of $x$), without accessing $x$: $E(t(x)) = \rho(t)[E(x)]$
• Maximize the mutual information between $z$ and $(\tilde z, t)$
[Figure: x → $\tilde z$ and t(x) → $z$; steerability of $z$ through the transformation $\rho(t)$ and $\tilde z$]
An Information-Theoretical Insight
• Train a TER model $\theta$ by maximizing the mutual information: $\max_\theta I_\theta(z; \tilde z, t)$
• By the chain rule of mutual information, we have
  $I_\theta(z; \tilde z, t) = I_\theta(z; \tilde z, t, x) - I_\theta(z; x \mid \tilde z, t) \le I_\theta(z; \tilde z, t, x)$
• $I_\theta(z; \tilde z, t)$ attains its maximum value $I_\theta(z; \tilde z, t, x)$ (the upper bound) when $I_\theta(z; x \mid \tilde z, t) = 0$
• Steerability: given $(\tilde z, t)$, $x$ contains no more information about $z$.
• Nonlinearity of the transformation $\rho(t)$ in representations.
AVT: Autoencoding Variational Transformations
• The mutual information cannot be maximized directly
  • The posterior $p_\theta(t \mid z, x)$ is intractable to evaluate
• Derive a lower bound by introducing a transformation decoder $q_\phi$:
  $I_\theta(z; \tilde z, t) \ge H(t \mid \tilde z) + \mathbb{E}_{p_\theta(t, z, \tilde z)} \log q_\phi(t \mid z, \tilde z)$
• Unsupervised loss to learn AVT:
  $\max_{\theta, \phi}\; \mathbb{E}_{p_\theta(t, z, \tilde z)} \log q_\phi(t \mid z, \tilde z)$
Qi, Learning Generalized Transformation Equivariant Representations via Autoencoding Transformations, preprint.
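One standard variational route to a lower bound of this form is sketched below. The intermediate steps are a reconstruction and are not taken from the slides. The notation is assumed as follows: $z$ is the representation of $t(x)$, $\tilde z$ the representation of $x$, $p_\theta$ the distribution induced by the encoder, and $q_\phi$ the transformation decoder standing in for the exact conditional $p_\theta(t \mid z, \tilde z)$.

```latex
\begin{align*}
I_\theta(z; \tilde z, t)
  &= I_\theta(z; \tilde z) + I_\theta(z; t \mid \tilde z)
   \;\ge\; I_\theta(z; t \mid \tilde z) \\
  &= H(t \mid \tilde z) - H(t \mid z, \tilde z) \\
  &= H(t \mid \tilde z)
     + \mathbb{E}_{p_\theta(t, z, \tilde z)} \log p_\theta(t \mid z, \tilde z) \\
  &= H(t \mid \tilde z)
     + \mathbb{E}_{p_\theta(t, z, \tilde z)} \log q_\phi(t \mid z, \tilde z)
     + \mathbb{E}_{p_\theta(z, \tilde z)}
       \mathrm{KL}\!\left( p_\theta(t \mid z, \tilde z) \,\|\, q_\phi(t \mid z, \tilde z) \right) \\
  &\ge H(t \mid \tilde z)
     + \mathbb{E}_{p_\theta(t, z, \tilde z)} \log q_\phi(t \mid z, \tilde z).
\end{align*}
```

The first inequality drops the nonnegative term $I_\theta(z; \tilde z)$ and the last drops the nonnegative KL divergence. If $t$ is sampled independently of $x$, and $\tilde z$ depends on $x$ only, then $H(t \mid \tilde z) = H(t)$ does not depend on the model parameters, which would explain why only the expected log-likelihood of the decoder remains in the training objective.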
AVT: Autoencoding Variational Transformations
• Generative process
  • Given an image x sampled from p(x)
  • Sample a transformation t from p(t)
  • Apply t to x, resulting in t(x)
  • Sample a representation $z$ of t(x) from $p_\theta(z \mid x, t)$
  • $\tilde z$ is sampled by setting t to the identity: $p_\theta(\tilde z \mid x, \mathbf{1})$
• Decode the transformation t from $q_\phi(t \mid z, \tilde z)$
[Figure: AVT. x → $p_\theta(\tilde z \mid x, \mathbf{1})$ → $\tilde z$ and t(x) → $p_\theta(z \mid x, t)$ → $z$; both feed $q_\phi(t \mid z, \tilde z)$, which outputs the estimated t]
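The generative process above can be sketched as a Monte Carlo estimate of the expected decoder log-likelihood, the quantity AVT maximizes. This is a toy under loud assumptions, not the author's model: images are 2-D points, transformations are rotations, the encoder is a fixed Gaussian whose mean is the rotated point (a real model would use a network), and the decoder is a Gaussian over the angle. The names `sample_z`, `decoder_log_prob`, and `estimate_objective`, and the noise levels, are hypothetical.

```python
import math
import random


def rotate(point, angle):
    """Transformation t: rotate a 2-D point by `angle` radians."""
    a, b = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * a - s * b, s * a + c * b)


def sample_z(x, angle, sigma, rng):
    """Draw z from a toy p_theta(z | x, t): mean plus sigma times unit noise."""
    mean = rotate(x, angle)  # toy encoder mean
    return tuple(m + sigma * rng.gauss(0.0, 1.0) for m in mean)


def decoder_log_prob(angle, z, z_tilde, sigma_t=0.1):
    """Toy log q_phi(t | z, z_tilde): Gaussian centred on the angle from z_tilde to z."""
    diff = math.atan2(z[1], z[0]) - math.atan2(z_tilde[1], z_tilde[0])
    mean = math.atan2(math.sin(diff), math.cos(diff))  # wrap into (-pi, pi]
    return (-0.5 * ((angle - mean) / sigma_t) ** 2
            - math.log(sigma_t * math.sqrt(2.0 * math.pi)))


def estimate_objective(num_samples, rng, sigma=0.01):
    """Monte Carlo estimate of E[log q_phi(t | z, z_tilde)] under the generative process."""
    total = 0.0
    for _ in range(num_samples):
        x = (rng.uniform(0.5, 1.0), rng.uniform(0.5, 1.0))  # x ~ p(x)
        t = rng.uniform(-1.0, 1.0)                          # t ~ p(t)
        z = sample_z(x, t, sigma, rng)                      # representation of t(x)
        z_tilde = sample_z(x, 0.0, sigma, rng)              # t set to the identity
        total += decoder_log_prob(t, z, z_tilde)
    return total / num_samples


objective = estimate_objective(200, random.Random(0))
print(objective)
```

Writing each sample as mean plus scaled noise mirrors the reparameterization commonly used to backpropagate through sampled representations; here nothing is trained, so it only fixes the sampling recipe.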