General Equivariance
Zhuohui Zhang, Amos Gropp
Department of Computer Science & Applied Math, Weizmann Institute of Science (WIS)
AGMDL, 2019
Outline
1 Definitions
   Compact Group
   What is a CNN
   Compact Groups and Equivariance
   Equivariant MFF-NN and G-CNN
2 How to prove?
   Three ways to look at representations
   The Proof
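What is a compact group?
A (matrix) group is compact if it is closed and bounded. Finite groups are also compact.
A compact group has finite volume $|G|$, so one can integrate (average) over it:
$$f \mapsto \frac{1}{|G|}\int_G f(g)\,dg \qquad\text{or, for finite } G,\qquad f \mapsto \frac{1}{|G|}\sum_{g\in G} f(g).$$
Convolution on $G$:
$$(f * g)(x) = \int_G f(xy^{-1})\,g(y)\,dy.$$
Convolution theorem: the Fourier coefficients of $f * g$ are the products of the Fourier coefficients of $f$ and $g$,
$$\widehat{f * g} = \hat f \odot \hat g,$$
where the product is taken separately at each irreducible representation $\rho$: with $\hat f(\rho) = \int_G f(x)\rho(x)\,dx$ one gets the matrix product $\widehat{f*g}(\rho) = \hat f(\rho)\,\hat g(\rho)$. For abelian $G$ the coefficients are scalars and the product is elementwise.
What is a representation: a (complex) vector space $V$ on which $G$ acts by invertible linear maps (matrices), compatibly with the group multiplication:
$$g \in G \;\mapsto\; g \in GL(V).$$
We allow complex matrix entries.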
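As a sanity check of the convolution theorem in the simplest case, the sketch below takes the finite abelian group $\mathbb Z_n$ (cyclic shifts), uses the normalized counting measure, and assumes the Fourier convention $\hat f(m) = \frac1n\sum_x f(x)e^{-2\pi i m x/n}$. For this abelian group the coefficients are scalars and the product is elementwise. The names `conv` and `fourier` are illustrative, not taken from the talk.

```python
import cmath


def conv(f, g):
    """(f * g)(x) = (1/n) * sum_y f(x - y) g(y) on the cyclic group Z_n.

    In additive notation x y^{-1} becomes x - y; the factor 1/n is the
    normalized counting measure (1/|G|) sum_{g in G}.
    """
    n = len(f)
    return [sum(f[(x - y) % n] * g[y] for y in range(n)) / n for x in range(n)]


def fourier(f):
    """Fourier coefficients hat f(m) = (1/n) sum_x f(x) exp(-2 pi i m x / n)."""
    n = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * m * x / n) for x in range(n)) / n
        for m in range(n)
    ]


f = [1.0, 2.0, 0.0, -1.0, 3.0]
g = [0.5, -1.0, 2.0, 0.0, 1.0]

lhs = fourier(conv(f, g))                              # coefficients of f * g
rhs = [a * b for a, b in zip(fourier(f), fourier(g))]  # elementwise product
err = max(abs(a - b) for a, b in zip(lhs, rhs))
print(err)  # only floating-point round-off remains
```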
Examples
Group                       Representation
Permutation group $S_n$     Permutation matrices
Rotation group $SO(2)$      1-dim reps $e^{2\pi i m\theta}$
Rotation group $SO(3)$      3-dim rotation matrices; quaternions (2-dim)
Torus $[0,1]^n$             1-dim reps $e^{2\pi i(m_1\theta_1 + m_2\theta_2 + \dots + m_n\theta_n)}$
Irreducibility of $V$: there is no subrepresentation, i.e. no subspace $W \subset V$ other than $0$ and $V$ that is closed under the group action.
Reducible $\Longrightarrow$ simultaneous block-diagonalization (for compact $G$): one can choose a basis of $V$ in which the matrices $e, g_1, g_2, \dots \in GL(V)$ of all group elements are block-diagonal of the same shape:
$$g_i = \begin{pmatrix} g_{i,1} & 0 \\ 0 & g_{i,2} \end{pmatrix}.$$
$G$-equivariant maps between representations:
$$\hom_G(V_1, V_2) = \{ M : V_1 \to V_2 \text{ linear} \mid M(gv) = g(M(v)) \text{ for all } g \in G,\ v \in V_1 \}.$$
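A minimal sketch of the definition of $\hom_G(V_1, V_2)$, assuming $V_1 = V_2 = \mathbb C^3$ with the permutation representation of $S_3$: a matrix $M$ is $G$-equivariant exactly when it commutes with every permutation matrix. The helper names are hypothetical; only the standard library is used.

```python
from itertools import permutations


def perm_matrix(p):
    """Permutation matrix sending basis vector e_j to e_{p[j]}."""
    n = len(p)
    return [[1 if p[j] == i else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def is_equivariant(m, group):
    """M is in hom_G(V, V) iff M(g v) = g(M v) for all g, i.e. M g = g M."""
    return all(matmul(m, g) == matmul(g, m) for g in group)


S3 = [perm_matrix(p) for p in permutations(range(3))]  # V = C^3, permutation rep

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
all_ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # v -> (v_1 + v_2 + v_3)(1, 1, 1)
generic = [[1, 2, 0], [0, 1, 0], [0, 0, 3]]

print([is_equivariant(m, S3) for m in (identity, all_ones, generic)])  # [True, True, False]
```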
Functions on $G/H$, $H\backslash G$, $H\backslash G/K$
How should we view functions on the coset spaces $G/H$, $H\backslash G$, $H\backslash G/K$?
First way: as functions on the coset spaces themselves. For $f \in L^2(G)$ there are projection (averaging) operators onto $L^2(G/H)$, $L^2(H\backslash G)$ and $L^2(H\backslash G/K)$:
$$\mathrm{Avg}^{G/H}_H f(x) = \frac{1}{|H|}\int_H f(xh)\,dh,$$
$$\mathrm{Avg}^{H\backslash G}_H f(x) = \frac{1}{|H|}\int_H f(hx)\,dh,$$
$$\mathrm{Avg}^{H\backslash G/K}_{H,K} f(x) = \frac{1}{|H|\,|K|}\int_H\int_K f(hxk)\,dh\,dk.$$
Second way: as functions on $G$ that are invariant under right multiplication by $H$ (for $G/H$), under left multiplication by $H$ (for $H\backslash G$), or under left multiplication by $H$ and right multiplication by $K$ (for $H\backslash G/K$). There are lifting operators from these coset spaces to $G$,
$$L^2(G/H) \to L^2(G),\quad L^2(H\backslash G) \to L^2(G),\quad L^2(H\backslash G/K) \to L^2(G),\qquad f \mapsto \tilde f,$$
e.g. $\tilde f(x) = f(xH)$ for $f$ on $G/H$.
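The two averaging operators can be tried out on a small non-abelian group. The sketch below assumes $G = S_3$ (permutations stored as tuples) and $H = \{e, (0\,1)\}$, and replaces the integrals by normalized sums. It checks that right averaging produces right-$H$-invariant functions, that left averaging produces left-$H$-invariant ones, and that the two differ. The helper names are hypothetical.

```python
from itertools import permutations

G = list(permutations(range(3)))  # S_3 as tuples p, with p[i] the image of i
H = [(0, 1, 2), (1, 0, 2)]        # the subgroup {e, (0 1)}


def mul(p, q):
    """Group product p q, acting as (p q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))


def avg_right(f):
    """Projection onto L^2(G/H): x -> (1/|H|) sum_h f(x h)."""
    return {x: sum(f[mul(x, h)] for h in H) / len(H) for x in G}


def avg_left(f):
    """Projection onto L^2(H\\G): x -> (1/|H|) sum_h f(h x)."""
    return {x: sum(f[mul(h, x)] for h in H) / len(H) for x in G}


f = {x: float(i * i) for i, x in enumerate(G)}  # an arbitrary function on G

fr, fl = avg_right(f), avg_left(f)
# Stored as functions on G, fr and fl already are the lifts of functions on cosets.
right_inv = all(fr[mul(x, h)] == fr[x] for x in G for h in H)
left_inv = all(fl[mul(h, x)] == fl[x] for x in G for h in H)
print(right_inv, left_inv, fr == fl)  # True True False: S_3 is non-abelian
```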
Convolution on $G/H$, $H\backslash G$, $H\backslash G/K$
We can also define convolutions of functions on coset spaces by simply taking the convolution of their lifts, and then descending to the coset space determined by the invariance properties of the result:
$f \in L^2(G)$, $g \in L^2(G/H)$ $\Longrightarrow$ $f * g \in L^2(G/H)$
$f \in L^2(G/H)$, $g \in L^2(H\backslash G)$ $\Longrightarrow$ $f * g \in L^2(G)$
$f \in L^2(G/H)$, $g \in L^2(H\backslash G/K)$ $\Longrightarrow$ $f * g \in L^2(G/K)$
where $f * g := \tilde f * \tilde g$.
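The first rule can be checked by brute force. If $g$ is right-$H$-invariant, substituting $y \mapsto yh$ in $\int_G f(xhy^{-1})g(y)\,dy$ shows that $f*g$ is right-$H$-invariant as well. The sketch below assumes $G = S_3$, $H = \{e, (0\,1)\}$, normalized sums, and arbitrary illustrative function values.

```python
from itertools import permutations

G = list(permutations(range(3)))  # S_3
H = [(0, 1, 2), (1, 0, 2)]        # subgroup {e, (0 1)}


def mul(p, q):
    """Group product p q, acting as (p q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))


def inv(p):
    out = [0] * len(p)
    for i, pi in enumerate(p):
        out[pi] = i
    return tuple(out)


def conv(f, g):
    """(f * g)(x) = (1/|G|) sum_y f(x y^{-1}) g(y), normalized counting measure."""
    return {x: sum(f[mul(x, inv(y))] * g[y] for y in G) / len(G) for x in G}


def avg_right(f):
    """Average over right multiplication by H: the lift of a function on G/H."""
    return {x: sum(f[mul(x, h)] for h in H) / len(H) for x in G}


f = {x: float(i + 1) for i, x in enumerate(G)}                   # arbitrary f in L^2(G)
g = avg_right({x: float((7 * i) % 5) for i, x in enumerate(G)})  # lift of some g in L^2(G/H)

fg = conv(f, g)
right_invariant = all(abs(fg[mul(x, h)] - fg[x]) < 1e-12 for x in G for h in H)
print(right_invariant)  # f * g is again right-H-invariant, i.e. lies in L^2(G/H)
```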
Examples
$G$, $H$ (and $K$)                          $G/H$ and functions on it
$G = S^1 = \{e^{i\theta}\}$, $H = \{\pm 1\}$    $f = \sum a_n \cos(n\theta)$
$G = SO(3)$, $H = SO(2)$                    $G/H = S^2$; spherical harmonics $Y^m_\ell(\theta, \varphi)$ on $S^2$
$G = S_n$, $H = S_k$, $K = S_{n-k}$           Size-$k$ subsets of $\{1, \dots, n\}$, i.e. $S_n/(S_k \times S_{n-k})$
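The last example follows from the orbit-stabilizer theorem: $S_n$ acts transitively on size-$k$ subsets, and the stabilizer of one subset is $S_k \times S_{n-k}$, so the subsets correspond to the cosets of that stabilizer. A small brute-force sketch, assuming $n = 4$, $k = 2$ and 0-based labels (arbitrary choices):

```python
from itertools import combinations, permutations
from math import comb, factorial

n, k = 4, 2
base = frozenset(range(k))  # the subset {0, ..., k-1}


def image(p, subset):
    """Action of a permutation p on a subset: A -> p(A)."""
    return frozenset(p[a] for a in subset)


orbit = {image(p, base) for p in permutations(range(n))}
stabilizer = [p for p in permutations(range(n)) if image(p, base) == base]

transitive = orbit == {frozenset(c) for c in combinations(range(n), k)}
stab_ok = len(stabilizer) == factorial(k) * factorial(n - k)  # |S_k x S_{n-k}|
count_ok = len(orbit) == factorial(n) // len(stabilizer) == comb(n, k)
print(transitive, stab_ok, count_ok)  # True True True
```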
What is a CNN
A layer is viewed as a space of maps
$$L_l := L(X_l, V_l) = \{ f_l : X_l \to V_l \},$$
associating to each node of the index set $X_l$ a vector in $V_l$.
Some group $G$ acts on the index set $X_l$ of each layer. The action transfers to $L(X_l, V_l)$:
$$(g \cdot f)(x) = f(g^{-1}x).$$
$L(X_l, V_l)$ is a vector space with a linear group action; therefore it is a representation of $G$.
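The inverse in $(g\cdot f)(x) = f(g^{-1}x)$ is what makes this a (left) action, $g\cdot(h\cdot f) = (gh)\cdot f$. The sketch below checks this by brute force, assuming $G = S_3$ acting on a three-node index set and a hypothetical layer with values in $V = \mathbb R^2$. The variant without the inverse fails because $S_3$ is non-abelian.

```python
from itertools import permutations

X = range(3)                # index set of the layer
G = list(permutations(X))   # S_3 acting on X


def mul(p, q):
    """Group product p q, acting as (p q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))


def inv(p):
    out = [0] * len(p)
    for i, pi in enumerate(p):
        out[pi] = i
    return tuple(out)


def act(g, f):
    """(g . f)(x) = f(g^{-1} x); a layer f: X -> V is stored as a tuple of vectors."""
    return tuple(f[inv(g)[x]] for x in X)


def act_without_inverse(g, f):
    """(g . f)(x) = f(g x): this composes in the wrong order."""
    return tuple(f[g[x]] for x in X)


f = ((1.0, 0.0), (2.0, -1.0), (0.5, 3.0))  # hypothetical layer, V = R^2

left_action = all(act(g, act(h, f)) == act(mul(g, h), f) for g in G for h in G)
wrong_order = all(act_without_inverse(g, act_without_inverse(h, f))
                  == act_without_inverse(mul(g, h), f) for g in G for h in G)
print(left_action, wrong_order)  # True False
```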
What is a CNN
Between each pair of consecutive layers $L_{l-1}$ and $L_l$ there is
- an (affine) linear map $\phi_l : L_{l-1} \to L_l$,
- a nonlinearity $\sigma_l : V_l \to V_l$, applied at every node.
An MFF-NN (multilayer feed-forward neural network) is a sequence of such maps
$$L_0 \to L_1 \to L_2 \to \cdots,\qquad f_0 \mapsto (\sigma_1\circ\phi_1)(f_0) \mapsto (\sigma_2\circ\phi_2)\big((\sigma_1\circ\phi_1)(f_0)\big) \mapsto \cdots$$
What are the $\phi_l$'s? In general $\phi_l$ can be an arbitrary linear map on $L_{l-1}$ (fully connected):
$$\phi_l : f_{l-1} \mapsto \int_{X_{l-1}} w_l(y, x)\, f_{l-1}(y)\,dy,$$
where the weights $w_l(y, x)$ are a function on $X_{l-1} \times X_l$ learned through back-propagation. In the convolutional case the weights depend only on $y^{-1}x$, i.e. they are a convolution kernel, $w_l(y, x) = \chi_l(y^{-1}x)$, so that $\phi_l(f_{l-1}) = f_{l-1} * \chi_l$.
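A minimal sketch of one MFF-NN step, assuming $X_{l-1} = X_l = \mathbb Z_n$, scalar features $V_l = \mathbb R$, and normalized sums in place of integrals. It shows that the fully connected formula with weights $w(y,x) = \chi((x-y) \bmod n)$ (additive notation for $\chi(y^{-1}x)$) reduces to the convolution $f * \chi$, and then stacks two layers. The filter and weight values are hypothetical.

```python
def layer(f, w, sigma):
    """One MFF-NN step f -> sigma(phi(f)), phi(f)(x) = (1/n) sum_y w(y, x) f(y).

    Sketch assumptions: X_{l-1} = X_l = Z_n, V_l = R, the integral is the
    normalized sum, and sigma is applied pointwise at every node.
    """
    n = len(f)
    return [sigma(sum(w(y, x) * f[y] for y in range(n)) / n) for x in range(n)]


def conv(f, chi):
    """(f * chi)(x) = (1/n) sum_y f(x - y) chi(y) on Z_n."""
    n = len(f)
    return [sum(f[(x - y) % n] * chi[y] for y in range(n)) / n for x in range(n)]


def relu(t):
    return max(t, 0.0)


def identity(t):
    return t


n = 6
f0 = [1.0, -2.0, 0.5, 3.0, 0.0, -1.0]
chi = [0.2, -1.0, 0.7, 0.0, 0.0, 0.4]  # hypothetical filter values


def w_conv(y, x):
    # Convolutional weights depend only on y^{-1} x, i.e. (x - y) mod n on Z_n.
    return chi[(x - y) % n]


def w_full(y, x):
    # Hypothetical generic fully connected weights.
    return 0.1 * (x + 1) - 0.05 * y


# With convolutional weights, the general layer reduces to f * chi.
gap = max(abs(a - b) for a, b in zip(layer(f0, w_conv, identity), conv(f0, chi)))

# A two-layer MFF-NN: f0 -> sigma1(phi1(f0)) -> sigma2(phi2(sigma1(phi1(f0)))).
f2 = layer(layer(f0, w_conv, relu), w_full, relu)
print(gap, f2)
```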
Equivariance and G-CNN
We require each index set to be $X_i = G/H_i$ for some closed subgroup $H_i \subset G$. We also require the maps $\phi_l : L(X_{l-1}, V_{l-1}) \to L(X_l, V_l)$ to be $G$-equivariant:
$$g \circ \phi_l(f) = \phi_l(g \circ f).$$
An MFF-NN is called a $G$-CNN if the index sets are $X_i = G/H_i$ and the linear maps $\phi_l : L(X_{l-1}, V_{l-1}) \to L(X_l, V_l)$ are convolutions
$$\phi_l(f_{l-1}) = f_{l-1} * \chi_l$$
for some filter $\chi_l$ on $H_{l-1}\backslash G/H_l$ with values in $V_{l-1} \times V_l$, or more precisely $V_{l-1}^* \otimes V_l$.
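To see the equivariance requirement in action, the sketch below takes the simplest non-abelian case, $G = S_3$ with trivial $H$ (so $X = G$), and compares a convolution layer $f \mapsto f*\chi$ with a fully connected layer using generic random weights. Only the convolution commutes with the action $(g\cdot f)(x) = f(g^{-1}x)$. The random values and helper names are illustrative assumptions.

```python
import random
from itertools import permutations

G = list(permutations(range(3)))  # S_3; H trivial, so X = G/H = G


def mul(p, q):
    """Group product p q, acting as (p q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))


def inv(p):
    out = [0] * len(p)
    for i, pi in enumerate(p):
        out[pi] = i
    return tuple(out)


def act(g, f):
    """(g . f)(x) = f(g^{-1} x)."""
    return {x: f[mul(inv(g), x)] for x in G}


def conv(f, chi):
    """G-CNN layer: (f * chi)(x) = (1/|G|) sum_y f(x y^{-1}) chi(y)."""
    return {x: sum(f[mul(x, inv(y))] * chi[y] for y in G) / len(G) for x in G}


def dense(f, w):
    """Fully connected layer: x -> (1/|G|) sum_y w(y, x) f(y)."""
    return {x: sum(w[(y, x)] * f[y] for y in G) / len(G) for x in G}


def close(a, b):
    return all(abs(a[x] - b[x]) < 1e-12 for x in G)


rng = random.Random(0)
f = {x: rng.uniform(-1, 1) for x in G}
chi = {x: rng.uniform(-1, 1) for x in G}
w = {(y, x): rng.uniform(-1, 1) for y in G for x in G}

conv_equivariant = all(close(act(g, conv(f, chi)), conv(act(g, f), chi)) for g in G)
dense_equivariant = all(close(act(g, dense(f, w)), dense(act(g, f), w)) for g in G)
print(conv_equivariant, dense_equivariant)  # convolution passes; generic weights fail
```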
Explicitly, set $d_l = \dim V_l$ and write each function in $L_{l-1}$ in coordinates:
$$(a_1, \dots, a_{d_{l-1}}) * \begin{pmatrix} \chi_{1,1} & \cdots & \chi_{1,d_l} \\ \vdots & \ddots & \vdots \\ \chi_{d_{l-1},1} & \cdots & \chi_{d_{l-1},d_l} \end{pmatrix} = \Big(\sum_{i=1}^{d_{l-1}} a_i * \chi_{i,1},\ \dots,\ \sum_{i=1}^{d_{l-1}} a_i * \chi_{i,d_l}\Big),$$
where $a_i : X_{l-1} \to \mathbb C$ and $\chi_{i,j} : H_{l-1}\backslash G/H_l \to \mathbb C$ are the coordinate functions of the layer and of the filter, respectively.
Theorem
An MFF-NN with each layer indexed by $X_l = G/H_l$ is $G$-equivariant if and only if it is a $G$-CNN.
Proving properties of a function $X_l \to V$ can be reduced to proving properties of functions $X_l \to \mathbb C$.
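The coordinate formula says that a vector-valued convolution is a matrix of scalar convolutions. The sketch below implements it on $\mathbb Z_n$ (an assumed stand-in for $G/H$), with $d_{l-1} = 2$ input channels, $d_l = 3$ output channels and hypothetical filter values. It then checks that shift-equivariance holds channel by channel, since each scalar convolution is equivariant.

```python
def conv(a, chi):
    """Scalar convolution on Z_n: (a * chi)(x) = (1/n) sum_y a(x - y) chi(y)."""
    n = len(a)
    return [sum(a[(x - y) % n] * chi[y] for y in range(n)) / n for x in range(n)]


def multichannel_conv(channels, filt):
    """Output channel j is sum_i a_i * chi_{i,j}.

    channels: d_{l-1} input coordinate functions a_i (lists of length n)
    filt:     d_{l-1} x d_l nested list; filt[i][j] is the scalar filter chi_{i,j}
    """
    d_in, d_out = len(filt), len(filt[0])
    n = len(channels[0])
    out = []
    for j in range(d_out):
        acc = [0.0] * n
        for i in range(d_in):
            acc = [s + t for s, t in zip(acc, conv(channels[i], filt[i][j]))]
        out.append(acc)
    return out


def shift(a, s):
    """Translation by s in Z_n: (s . a)(x) = a(x - s)."""
    n = len(a)
    return [a[(x - s) % n] for x in range(n)]


# Hypothetical layer: 2 input channels, 3 output channels, n = 5 nodes.
a = [[1.0, 0.0, 2.0, -1.0, 0.5], [0.0, 1.0, 1.0, 3.0, -2.0]]
chi = [[[0.1 * (i + j + y) for y in range(5)] for j in range(3)] for i in range(2)]

out = multichannel_conv(a, chi)
shifted = multichannel_conv([shift(c, 2) for c in a], chi)
equivariant = all(abs(u - v) < 1e-12
                  for oj, sj in zip(out, shifted)
                  for u, v in zip(shift(oj, 2), sj))
print(len(out), len(out[0]), equivariant)  # 3 5 True
```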
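First Way to Look at Representations
Intuitively, we can view a group representation $V$ as a specification of a matrix for each element $g \in G$.
Permutation group $S_3$. This representation is NOT irreducible: the line spanned by $(1,1,1)$ is fixed by every permutation matrix.
$$\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix},\ \begin{pmatrix}1&0&0\\0&0&1\\0&1&0\end{pmatrix},\ \begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix},$$
$$\begin{pmatrix}0&1&0\\0&0&1\\1&0&0\end{pmatrix},\ \begin{pmatrix}0&0&1\\1&0&0\\0&1&0\end{pmatrix},\ \begin{pmatrix}0&0&1\\0&1&0\\1&0&0\end{pmatrix}$$
Circle $S^1 \cong SO(2)$: the irreducibles are the $1 \times 1$ matrices $\big(e^{im\theta}\big)$ with $m \in \mathbb Z$. Over $\mathbb C$, the $2 \times 2$ rotation matrix
$$\begin{pmatrix} \cos m\theta & \sin m\theta \\ -\sin m\theta & \cos m\theta \end{pmatrix} \sim \begin{pmatrix} e^{im\theta} & 0 \\ 0 & e^{-im\theta} \end{pmatrix}$$
splits as two 1-dim matrices.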
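Both statements on this slide can be checked numerically. The sketch below verifies that every $S_3$ permutation matrix fixes $(1,1,1)$, and that $(1, i)$ and $(1, -i)$ are eigenvectors of the $SO(2)$ rotation matrix with eigenvalues $e^{\pm im\theta}$. The values $m = 3$, $\theta = 0.7$ are arbitrary choices.

```python
import cmath
import math
from itertools import permutations


def perm_matrix(p):
    """Permutation matrix sending basis vector e_j to e_{p[j]}."""
    n = len(p)
    return [[1 if p[j] == i else 0 for j in range(n)] for i in range(n)]


def apply(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


S3 = [perm_matrix(p) for p in permutations(range(3))]

# Every permutation matrix fixes (1, 1, 1), so its span is a 1-dim subrepresentation.
fixes_diagonal = all(apply(m, [1, 1, 1]) == [1, 1, 1] for m in S3)


def splits(m, theta, tol=1e-12):
    """Check that (1, i) and (1, -i) are eigenvectors with eigenvalues e^{+-i m theta}."""
    R = [[math.cos(m * theta), math.sin(m * theta)],
         [-math.sin(m * theta), math.cos(m * theta)]]
    pairs = [([1, 1j], cmath.exp(1j * m * theta)), ([1, -1j], cmath.exp(-1j * m * theta))]
    return all(abs(a - lam * b) < tol
               for v, lam in pairs for a, b in zip(apply(R, v), v))


print(fixes_diagonal, splits(3, 0.7))  # True True
```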
First Way to Look at Representations
Rotation group $SO(3)$ with $(\psi, \theta, \phi)$ Euler-angle coordinates. Up to isomorphism there is one irreducible representation in each dimension. The even-dimensional ones, such as the 2-dim quaternion representation, are really representations of the double cover $SU(2)$ and are defined on $SO(3)$ only up to sign.
$$\begin{pmatrix} \cos\frac{\theta}{2}\,e^{-\frac{i}{2}(\psi+\phi)} & -\sin\frac{\theta}{2}\,e^{\frac{i}{2}(\phi-\psi)} \\ \sin\frac{\theta}{2}\,e^{-\frac{i}{2}(\phi-\psi)} & \cos\frac{\theta}{2}\,e^{\frac{i}{2}(\psi+\phi)} \end{pmatrix}$$
$$\begin{pmatrix} \cos^2\frac{\theta}{2}\,e^{-i(\psi+\phi)} & -\frac{\sin\theta}{\sqrt2}\,e^{-i\psi} & \sin^2\frac{\theta}{2}\,e^{i(\phi-\psi)} \\ \frac{\sin\theta}{\sqrt2}\,e^{-i\phi} & \cos\theta & -\frac{\sin\theta}{\sqrt2}\,e^{i\phi} \\ \sin^2\frac{\theta}{2}\,e^{-i(\phi-\psi)} & \frac{\sin\theta}{\sqrt2}\,e^{i\psi} & \cos^2\frac{\theta}{2}\,e^{i(\psi+\phi)} \end{pmatrix}$$
In dimension 4 the entries are $e^{-im'\psi}\,d_{m'm}(\theta)\,e^{-im\phi}$ with $m', m \in \{\tfrac32, \tfrac12, -\tfrac12, -\tfrac32\}$ (rows and columns in this order), where, writing $c = \cos\frac{\theta}{2}$ and $s = \sin\frac{\theta}{2}$,
$$d(\theta) = \begin{pmatrix} c^3 & -\sqrt3\,c^2 s & \sqrt3\,c s^2 & -s^3 \\ \sqrt3\,c^2 s & \frac12(3\cos\theta - 1)\,c & -\frac12(3\cos\theta + 1)\,s & \sqrt3\,c s^2 \\ \sqrt3\,c s^2 & \frac12(3\cos\theta + 1)\,s & \frac12(3\cos\theta - 1)\,c & -\sqrt3\,c^2 s \\ s^3 & \sqrt3\,c s^2 & \sqrt3\,c^2 s & c^3 \end{pmatrix}.$$
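These matrices can be sanity-checked numerically. The check assumes the factorization $D_{m'm}(\psi,\theta,\phi) = e^{-im'\psi}\,d_{m'm}(\theta)\,e^{-im\phi}$, which is consistent with the phase pattern of the matrices on this slide. Setting $\psi = \phi = 0$ leaves rotations about a single axis, so the real middle factors $d(\theta)$ should be orthogonal and satisfy $d(\theta_1)\,d(\theta_2) = d(\theta_1 + \theta_2)$. The sketch below checks both properties for dimensions 2, 3 and 4; the function names and test angles are illustrative.

```python
import math

SQ3 = math.sqrt(3.0)


def d_half(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s],
            [s, c]]


def d_one(t):
    c, s, r = math.cos(t / 2), math.sin(t / 2), math.sin(t) / math.sqrt(2.0)
    return [[c * c, -r, s * s],
            [r, math.cos(t), -r],
            [s * s, r, c * c]]


def d_three_halves(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c**3, -SQ3 * c * c * s, SQ3 * c * s * s, -s**3],
            [SQ3 * c * c * s, c * (3 * c * c - 2), s * (3 * s * s - 2), SQ3 * c * s * s],
            [SQ3 * c * s * s, s * (2 - 3 * s * s), c * (3 * c * c - 2), -SQ3 * c * c * s],
            [s**3, SQ3 * c * s * s, SQ3 * c * c * s, c**3]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def check(d, t1=0.4, t2=1.3):
    """Return (homomorphism on the theta-subgroup, orthogonality) for d."""
    n = len(d(0.0))
    eye = [[float(i == j) for j in range(n)] for i in range(n)]
    return (close(matmul(d(t1), d(t2)), d(t1 + t2)),
            close(matmul(d(t1), transpose(d(t1))), eye))


for d in (d_half, d_one, d_three_halves):
    print(d.__name__, check(d))  # expected (True, True) for each dimension
```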
Second Way to Look at Representations
We can look at a representation as a vector space $V$ with a $G$-action. Representations $V$, $W$ are said to be isomorphic if one can choose a basis for each space in which every $g \in G$ acts by the same matrix.
Space of $G$-equivariant maps: a linear map $f$ lies in $\hom_G(V, W)$ if
$$f(gv) = g(f(v)) \quad\text{for all } g \in G,\ v \in V.$$
Schur's lemma: if $V$, $W$ are irreducible, then
$$\hom_G(V, W) = \begin{cases} 0 & \text{if } V \not\cong W, \\ \mathbb C\, I & \text{if } V \cong W. \end{cases}$$
In general, if we can break $V$, $W$ into irreducibles, $V = \bigoplus_i V_i$ and $W = \bigoplus_j W_j$, then
$$\hom_G(V, W) = \bigoplus_i \bigoplus_j \hom_G(V_i, W_j).$$
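Schur's lemma together with the decomposition formula predicts $\dim \hom_G(V, V) = 2$ for the permutation representation of $S_3$. Here $V \cong \text{trivial} \oplus \text{(2-dim irreducible)}$, the two summands are non-isomorphic, and so only the two diagonal blocks contribute one dimension each. The sketch below confirms this by averaging $g^{-1} M g$ over the group, which maps every matrix into $\hom_G(V,V)$ and fixes the equivariant ones. It then computes the rank of the images of all elementary matrices with exact rational arithmetic. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def perm_matrix(p):
    n = len(p)
    return [[Fraction(1 if p[j] == i else 0) for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(r) for r in zip(*a)]


def project(m, group):
    """Average g^{-1} M g over the group; the result lies in hom_G(V, V)."""
    n = len(m)
    total = [[Fraction(0)] * n for _ in range(n)]
    for g in group:
        t = matmul(matmul(transpose(g), m), g)  # for permutation matrices g^{-1} = g^T
        total = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(total, t)]
    return [[x / len(group) for x in r] for r in total]


def rank(rows):
    """Rank of a list of vectors via Gaussian elimination over the rationals."""
    rows = [list(r) for r in rows]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


n = 3
S3 = [perm_matrix(p) for p in permutations(range(n))]
# Project every elementary matrix E_ij; the images span hom_G(V, V).
images = []
for i in range(n):
    for j in range(n):
        e = [[Fraction(int((a, b) == (i, j))) for b in range(n)] for a in range(n)]
        images.append([x for row in project(e, S3) for x in row])

print(rank(images))  # 2 = 1 + 1
```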
Why does Schur's lemma make sense?
For the circle group $SO(2)$, the irreducible representations are 1-dimensional, each with an abstract basis vector $v_n$ on which $G$ acts by
$$v_n \mapsto e^{in\theta} v_n.$$
Consider a $G$-equivariant map $\phi_l$ acting on the basis elements:
$$\phi_l(v_n) = \sum_m b_{m,n} v_m.$$
Since $\phi_l$ is $G$-equivariant, we should have
$$\phi_l(e^{in\theta} v_n) = \sum_m b_{m,n}\, e^{im\theta} v_m.$$
But $\phi_l$ is also linear, so
$$\phi_l(e^{in\theta} v_n) = \sum_m b_{m,n}\, e^{in\theta} v_m.$$
Comparing the two expressions and multiplying by $e^{-in\theta}$:
$$\sum_m b_{m,n}\, e^{i(m-n)\theta} v_m = \sum_m b_{m,n} v_m \quad\text{for every } \theta.$$
Hence $b_{m,n}\big(e^{i(m-n)\theta} - 1\big) = 0$ for all $\theta$, so $b_{m,n}$ is required to vanish except for $m = n$.
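The same conclusion can be observed numerically. The sketch below assumes a truncation to the modes $n \in \{-2, \dots, 2\}$ and writes $\rho(\theta) = \mathrm{diag}(e^{in\theta})$ (a name introduced here for illustration). It averages $\rho(\theta)^{-1} B\, \rho(\theta)$ over the angles $\theta_k = 2\pi k/N$, which maps a random matrix $B = (b_{m,n})$ to an equivariant one. Since $N = 8$ does not divide any nonzero $m - n$ in this range, the off-diagonal entries average to zero and only $b_{n,n}$ survives.

```python
import cmath
import random

modes = [-2, -1, 0, 1, 2]  # basis vectors v_n, with theta acting by e^{i n theta}
N = 8                      # sample theta_k = 2 pi k / N; N must not divide any nonzero m - n

rng = random.Random(1)
B = [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in modes] for _ in modes]


def conj_by_rotation(B, theta):
    """Matrix of rho(theta)^{-1} B rho(theta): entry (m, n) is e^{-i m theta} b_{m,n} e^{i n theta}."""
    return [[cmath.exp(-1j * m * theta) * B[a][b] * cmath.exp(1j * n * theta)
             for b, n in enumerate(modes)] for a, m in enumerate(modes)]


# Averaging over the sampled rotations projects B onto the G-equivariant maps.
avg = [[0j] * len(modes) for _ in modes]
for k in range(N):
    C = conj_by_rotation(B, 2 * cmath.pi * k / N)
    avg = [[x + y / N for x, y in zip(r1, r2)] for r1, r2 in zip(avg, C)]

size = len(modes)
off_diag = max(abs(avg[a][b]) for a in range(size) for b in range(size) if a != b)
diag_err = max(abs(avg[a][a] - B[a][a]) for a in range(size))
print(off_diag < 1e-12, diag_err < 1e-12)  # True True: only b_{n,n} survives
```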