Representing and Explaining Novel Concepts with Minimal Supervision
Zeynep Akata
Bilim Akademisi - Bilkent Üniversitesi Yapay Öğrenme Yaz Okulu 2020 (Science Academy - Bilkent University Machine Learning Summer School 2020)
30 June 2020
Outline
- Generalized Low-Shot Learning with Side-Information
- Generating Natural Language Explanations for Visual Decisions
- Summary and Future Work
Data Distribution in Large-Scale Datasets (Akata et al., TPAMI’14)
[Figure: plot with axes "number of images" and "number of classes".]
Learning via Explanation (Lombrozo, TICS’16)
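Attributes as Explanations (Lampert et al., CVPR’09)
[Figure: images are described by attributes, which in turn describe the class.]
- zebra: black-white, has tail, lives on land, small; attribute vector [1 0 1 1 0 1]
- whale: gray, has tail, lives in water, big; attribute vector [0 1 1 0 1 0]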
Generalized Zero-Shot Learning
[Figure: images paired with their attribute lists.]
- black-white, has tail, ..., lives on land, small
- black-white, no tail, ..., lives on land, medium
- gray, has tail, ..., lives in water, big
- white, has tail, lives on land, tiny
Multimodal Embeddings (Akata et al., CVPR’13 & TPAMI’16)
[Figure: images are mapped to image features, and class labels (e.g., zebra, whale) are mapped to class attributes (e.g., black, white).]
Multimodal Embeddings (Akata et al., CVPR’13 & TPAMI’16)
$S = \{(x, y, \varphi(y)) \mid x \in \mathcal{X},\ y \in \mathcal{Y}^s,\ \varphi(y) \in \mathcal{C}\}$ and $U = \{(y, \varphi(y)) \mid y \in \mathcal{Y}^u,\ \varphi(y) \in \mathcal{C}\}$
Learn $f : \mathcal{X} \to \mathcal{Y}$ by minimizing the regularized empirical risk:
$$\frac{1}{N} \sum_{n=1}^{N} L(y_n, f(x_n; W)) + \Omega(W)$$
where $L(\cdot)$ is the loss function and $\Omega(\cdot)$ is the regularization term, using the pairwise ranking loss:
$$L(x_n, y_n, y; W) = \sum_{y \in \mathcal{Y}^s} \left[\Delta(y_n, y) + F(x_n, y; W) - F(x_n, y_n; W)\right]_+$$
with the compatibility function:
$$F(x, y; W) = \theta(x)^\top W \varphi(y)$$
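A minimal NumPy sketch of the compatibility function and the pairwise ranking loss may help make the notation concrete. It is an illustration under stated assumptions, not the authors' implementation: $\Delta$ is taken to be the 0/1 loss, prediction is assumed to pick the class with the highest compatibility, $W$ is fixed to the identity instead of being learned, and the function names are hypothetical. The class embeddings are the zebra and whale attribute vectors from the Attributes as Explanations slide.

```python
import numpy as np


def compatibility(theta_x, W, phi):
    """F(x, y; W) = theta(x)^T W phi(y), evaluated for every class.

    theta_x: image feature, shape (feat_dim,)
    W:       bilinear map, shape (feat_dim, attr_dim)
    phi:     class embeddings as columns, shape (attr_dim, num_classes)
    """
    return theta_x @ W @ phi


def ranking_loss(theta_x, y_true, W, phi):
    """Sum over classes of [Delta(y_n, y) + F(x_n, y) - F(x_n, y_n)]_+.

    Assumption: Delta is the 0/1 loss, so the true class contributes zero.
    """
    scores = compatibility(theta_x, W, phi)
    delta = np.ones_like(scores)
    delta[y_true] = 0.0
    margins = delta + scores - scores[y_true]
    return float(np.maximum(margins, 0.0).sum())


def predict(theta_x, W, phi):
    """Assumed decision rule: the class with the highest compatibility."""
    return int(np.argmax(compatibility(theta_x, W, phi)))


# Toy setup: column 0 is zebra, column 1 is whale.
phi = np.array([[1, 0, 1, 1, 0, 1],
                [0, 1, 1, 0, 1, 0]], dtype=float).T
W = np.eye(6)          # placeholder for a learned W
theta_x = phi[:, 0]    # an "image feature" that matches the zebra attributes

print(predict(theta_x, W, phi))
print(ranking_loss(theta_x, 0, W, phi), ranking_loss(theta_x, 1, W, phi))
```

Because the score depends on a class only through $\varphi(y)$, the same $W$ can score a class from $\mathcal{Y}^u$ by appending its embedding as a new column of `phi`.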
Benchmark Example Datasets
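Benchmark Results (Xian et al., CVPR 2017)

Method                   CUB u   CUB s   CUB H   AWA u   AWA s   AWA H
Supervised Learning      –       82.1    –       –       96.2    –
Multimodal Embeddings    23.7    62.8    34.4    16.8    76.1    27.5

u / s: $\mathrm{acc}_{\mathcal{Y}^{u/s}} = \frac{1}{\|\mathcal{Y}^{u/s}\|} \sum_{c=1}^{\|\mathcal{Y}^{u/s}\|} \frac{\#\,\text{correct in } c}{\#\,\text{samples in } c}$ and $H = \frac{2 \cdot \mathrm{acc}_{\mathcal{Y}^s} \cdot \mathrm{acc}_{\mathcal{Y}^u}}{\mathrm{acc}_{\mathcal{Y}^s} + \mathrm{acc}_{\mathcal{Y}^u}}$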
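The two metrics can be sketched in a few lines of Python. This is an illustrative sketch: the function names are hypothetical, and it assumes every listed class has at least one test sample.

```python
import numpy as np


def per_class_accuracy(y_true, y_pred, classes):
    """Average of the per-class accuracies (# correct in c / # samples in c)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accs = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(accs))


def harmonic_mean(acc_s, acc_u):
    """H = 2 * acc_s * acc_u / (acc_s + acc_u); defined as 0 when both are 0."""
    if acc_s + acc_u == 0:
        return 0.0
    return 2 * acc_s * acc_u / (acc_s + acc_u)


# Reproduce the H column of the Multimodal Embeddings row from u and s.
h_cub = harmonic_mean(62.8, 23.7)
h_awa = harmonic_mean(76.1, 16.8)
print(round(h_cub, 1), round(h_awa, 1))
```

The harmonic mean stays low whenever either accuracy is low, so a model cannot score well on H by performing well on seen classes alone.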
How to Tackle the Missing Data Problem?
Labels are difficult to obtain, and attributes require expert knowledge.
Proposed solution: free text to image synthesis!
Detailed Visual Descriptions as Side Information (Reed et al., CVPR’16)
Bird descriptions:
- "The bird has a white underbelly, black feathers in the wings, a large wingspan, and a white beak."
- "This bird has distinctive-looking brown and white stripes all over its body, and its brown tail sticks up."
- "This swimming bird has a black crown with a large white strip on its head, and yellow eyes."
Flower descriptions:
- "This flower has a central white blossom surrounded by large pointed red petals which are veined and leaflike."
- "Light purple petals with orange and black middle green leaves"
- "This flower is yellow and orange in color, with petals that are ruffled along the edges."
Deep Representations of Text (Reed et al., CVPR’16)
[Figure: sequential encoding and convolutional encoding of the sentence "The beak is yellow and pointed and the wings are blue."]
GAN¹ Conditioned on Text (Reed et al., ICML’16 & NIPS’16)
Generator network: $x := G(z, \varphi(t))$ with $z \sim \mathcal{N}(0, 1)$
Discriminator network: $D(x', \varphi(t))$
Example text $t$: "This flower has small, round violet petals with a dark purple center"
¹ Generative Adversarial Networks [Goodfellow et al. NIPS’14]
Text to Image Synthesis Results
Interpolations between descriptions:
- ‘Blue bird with black beak’ → ‘Red bird with black beak’
- ‘Small blue bird with black wings’ → ‘Small yellow bird with black wings’
- ‘This bird is bright.’ → ‘This bird is dark.’
Single descriptions:
- ‘This bird is completely red with black wings’
- ‘A small sized bird that has a cream belly and a short pointed bill’
- ‘This is a yellow bird. The wings are bright blue’
Generalized Zero-Shot Learning with Synthesized Images

CUB data                 u       s       H
Only real data           23.7    62.8    34.4
With generated images    23.8    48.5    31.9

This is not better than having no images!
f-CLSWGAN for Text to Image Feature Synthesis (Xian et al., CVPR’18)
[Figure: a CNN maps a real image to a real CNN feature, while f-CLSWGAN maps a description ("This is a small bird with a brown head and a yellow belly.") to a synthetic CNN feature. In ResNet feature space, seen-class features $x$ are shown next to features generated for unseen classes by $G(z, a)$, $z \sim \mathcal{N}(0, 1)$, from attributes $a$ (head color: red, back color: black, crown color: red, wing shape: short).]
$S = \{(x, y, \varphi(y)) \mid x \in \mathcal{X},\ y \in \mathcal{Y}^s,\ \varphi(y) \in \mathcal{C}\}$ and $U = \{(\tilde{x}, y, \varphi(y)) \mid \tilde{x} = G(z, \varphi(y)),\ y \in \mathcal{Y}^u,\ \varphi(y) \in \mathcal{C}\}$: combine to train a classifier.
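The "combine to train a classifier" step can be sketched as follows. This is only a data-flow illustration under loud assumptions: `toy_generator` is an untrained stand-in (a random linear map with a ReLU) for a trained $G(z, \varphi(y))$, the seen features are random placeholders for real CNN features, and all names and dimensions are hypothetical. The adversarial training of the generator itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)


def toy_generator(z, phi_y, W_z, W_a):
    """Untrained stand-in for G(z, phi(y)): linear map followed by a ReLU."""
    return np.maximum(z @ W_z + phi_y @ W_a, 0.0)


def synthesize_unseen(phi_unseen, unseen_labels, n_per_class, W_z, W_a, rng):
    """Build U: n_per_class synthetic features for each unseen class."""
    feats, labels = [], []
    for phi_y, y in zip(phi_unseen, unseen_labels):
        z = rng.standard_normal((n_per_class, W_z.shape[0]))  # z ~ N(0, 1)
        phi_rep = np.tile(phi_y, (n_per_class, 1))            # same phi(y) per sample
        feats.append(toy_generator(z, phi_rep, W_z, W_a))
        labels.append(np.full(n_per_class, y))
    return np.concatenate(feats), np.concatenate(labels)


noise_dim, attr_dim, feat_dim = 4, 6, 8
W_z = rng.standard_normal((noise_dim, feat_dim)) * 0.1
W_a = rng.standard_normal((attr_dim, feat_dim))

# Seen set S: placeholder "real" features for seen classes 0 and 1.
x_seen = rng.standard_normal((10, feat_dim))
y_seen = np.repeat([0, 1], 5)

# Unseen classes 2 and 3 are known only through their class embeddings.
phi_unseen = rng.integers(0, 2, size=(2, attr_dim)).astype(float)
x_syn, y_syn = synthesize_unseen(phi_unseen, [2, 3], 5, W_z, W_a, rng)

# Combined training set covering both seen and unseen classes.
x_train = np.concatenate([x_seen, x_syn])
y_train = np.concatenate([y_seen, y_syn])
print(x_train.shape, sorted(set(y_train.tolist())))
```

Any standard supervised classifier can then be fit on `x_train` and `y_train`, since the unseen classes now have training features of their own.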
Generalized Zero-Shot Learning with Synthesized Image Features

CUB data                                  u       s       H
Only real data                            23.7    62.8    34.4
With generated images                     23.8    48.5    31.9
With generated features (f-CLSWGAN)       43.7    57.7    49.7
CADA-VAE for Text to Latent Feature Synthesis (Schönfeld et al., CVPR’19)
[Figure: two encoder/decoder pairs, (E1, D1) and (E2, D2); example attributes: red head, pink belly, brown wings, gray beak. Variants shown: CADA-VAE, DA-VAE, CA-VAE.]
$S = \{(z, y, c) \mid z \in z_1,\ y \in \mathcal{Y}^s,\ c \in \mathcal{C}\}$ and $U = \{(z, y, c) \mid z \in z_2,\ y \in \mathcal{Y}^u,\ c \in \mathcal{C}\}$: combine to train a classifier.
Generalized Zero-Shot Learning with Latent Features

CUB data                                  u       s       H
Only real data                            23.7    62.8    34.4
With generated images                     23.8    48.5    31.9
With generated features (f-CLSWGAN)       43.7    57.7    49.7
With generated features (CADA-VAE)        63.6    51.6    52.4
f-VAEGAN-D2 for Text to Image Feature Synthesis (Xian et al., CVPR’19)
[Figure: model overview with the example class "Cape May Warbler".]
- Seen Feature Reconstruction (f-VAE): Encoder (E), Decoder/Generator (G)
- Novel Feature Generation (f-WGAN): Discriminator1 (D1)
- Transductive Learning (D2): Discriminator2 (D2)