Contrastive Learning for Unpaired Image-to-Image Translation. Taesung Park, Alexei A. Efros, Richard Zhang, Jun-Yan Zhu. UC Berkeley, Adobe Research. ECCV 2020
What is Unpaired Image-to-Image Translation? Training Set | Test-time behavior
Cycle-consistency loss: CycleGAN (Zhu et al., ICCV'17), DiscoGAN (Kim et al., ICML'17), DualGAN (Yi et al., ICCV'17). Also used in MUNIT (Huang et al., ECCV'18), DRIT (Lee et al., ECCV'18)
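As a rough illustration of the cycle-consistency idea used by the methods listed above, the sketch below measures how well an input survives a round trip through a forward and a backward mapping. The flat-list image type and the toy mappings `G`, `F_exact`, and `F_lossy` are hypothetical stand-ins chosen for illustration, not any paper's implementation.

```python
from typing import Callable, List

Image = List[float]  # toy stand-in: an image flattened to a list of pixel values


def cycle_consistency_loss(x: Image,
                           G: Callable[[Image], Image],
                           F: Callable[[Image], Image]) -> float:
    """Mean absolute (L1) error between x and its round trip F(G(x))."""
    reconstructed = F(G(x))
    return sum(abs(a - b) for a, b in zip(x, reconstructed)) / len(x)


# Toy mappings (assumptions for illustration only).
def G(img: Image) -> Image:
    return [p + 0.5 for p in img]      # forward mapping: brighten


def F_exact(img: Image) -> Image:
    return [p - 0.5 for p in img]      # exact inverse of G


def F_lossy(img: Image) -> Image:
    return [0.0 for _ in img]          # throws the content away


x = [0.0, 0.25, 1.0]
loss_exact = cycle_consistency_loss(x, G, F_exact)   # round trip recovers x
loss_lossy = cycle_consistency_loss(x, G, F_lossy)   # round trip loses x
```

The loss is zero only when the backward mapping undoes the forward one, which is why this family of methods has to train two generators.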
interchangeable differentiated
invariant sensitive
What makes for a good output? Input (horse) → G → Output (zebra) ?
Retaining input content: Input (horse) → G → Output (zebra) → Discriminator
Retaining input content: Input (horse) → G → Output (zebra). Invariant | Sensitive. Patch features: z, z+, z1-, z2-, z3-. Corresponding patches should have high similarity
Patch-based Contrastive Loss. Input (horse) → G → Output (zebra)
Cosine similarities are scaled by 1/τ (τ = 0.07) and passed through a softmax, with targets: z·z+/τ → 1; z·z1-/τ → 0; z·z2-/τ → 0; z·z3-/τ → 0
InfoNCE loss (Gutmann et al., AISTATS'10; van den Oord et al., 2018), used in MoCo and SimCLR
• To produce positive pairs:
  • Handcrafted data augmentation (MoCo, SimCLR, etc.)
  • Input and synthesized image (ours)
MoCo: He et al., CVPR'20; SimCLR: Chen et al., ICML'20
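The softmax-over-similarities picture on this slide can be sketched in a few lines. This is a minimal, dependency-free version assuming each patch is already represented by a feature vector; the function names `unit` and `info_nce` and the toy vectors are illustrative choices, and only τ = 0.07 comes from the slide.

```python
import math
from typing import List, Sequence


def unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def info_nce(query: Sequence[float],
             positive: Sequence[float],
             negatives: Sequence[Sequence[float]],
             tau: float = 0.07) -> float:
    """Cross-entropy of a softmax over scaled cosine similarities,
    with the positive patch (index 0) as the target class."""
    q = unit(query)
    candidates = [positive] + list(negatives)
    logits = [sum(a * b for a, b in zip(q, unit(c))) / tau for c in candidates]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


z = [1.0, 0.0]  # toy query feature (output patch)
# Positive is close to the query: the loss is near zero.
loss_matched = info_nce(z, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
# A negative is closer to the query than the positive: the loss is large.
loss_mismatched = info_nce(z, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

With a small τ the softmax is sharp, so the loss strongly penalizes any negative that is more similar to the query than the positive is.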
Patchwise contrastive loss → Multilayer, Patchwise Contrastive Loss: Input → G_enc → G_dec → Output → G_enc
+ No fixed similarity metric (e.g., L1 or perceptual loss)
+ One-sided (no inverse mapping needed)
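A minimal sketch of the multilayer, patchwise idea follows, assuming encoder features are already available as one list of per-location vectors per layer. For each layer it samples a few locations, treats the input feature at the same location as the positive, and uses the other sampled locations of the same input as negatives. The function names, the sampling scheme, and the toy features are assumptions; any learned projection applied to the features before comparison is omitted here.

```python
import math
import random


def _unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def _nce(query, positive, negatives, tau):
    """Softmax cross-entropy with the positive at index 0."""
    q = _unit(query)
    logits = [sum(a * b for a, b in zip(q, _unit(c))) / tau
              for c in [positive] + negatives]
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits)) - logits[0]


def multilayer_patch_nce(feats_in, feats_out, num_patches=4, tau=0.07, seed=0):
    """feats_in / feats_out: one entry per encoder layer; each entry is a list
    of per-location feature vectors for the input / output image."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for layer_in, layer_out in zip(feats_in, feats_out):
        k = min(num_patches, len(layer_in))
        locations = rng.sample(range(len(layer_in)), k)
        for s in locations:
            # Negatives are other locations of the same input image.
            negatives = [layer_in[t] for t in locations if t != s]
            total += _nce(layer_out[s], layer_in[s], negatives, tau)
            count += 1
    return total / count


# Toy features for two layers (3 locations, then 2 locations).
feats_in = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
aligned = feats_in                                         # content kept in place
shuffled = [layer[1:] + layer[:1] for layer in feats_in]   # content moved around

loss_aligned = multilayer_patch_nce(feats_in, aligned)
loss_shuffled = multilayer_patch_nce(feats_in, shuffled)
```

The loss stays low when each output location still corresponds to the same input location and rises when content is displaced, without needing an inverse mapping.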
Internal vs External Patches: Internal Patches | External Patches. MoCo (He et al., CVPR'20) and SimCLR (Chen et al., ICML'20) use a large set of external images as negative samples. External patches make things worse
Power of Internal Patches: Texture Synthesis by Non-parametric Sampling (Efros & Leung, ICCV'99; Efros & Freeman, SIGGRAPH'01); "Zero-Shot" Super-resolution using Deep Internal Learning (Shocher, Cohen & Irani, CVPR'18)
Internal vs External Patches: input | internal patches | external patches (Mode Collapse!)
Identity Loss Regularization. Normally: X → G → G(X), Contrastive Loss between X and G(X). Identity loss regularization: Y → G → G(Y), Contrastive Loss between Y and G(Y). DTN (Taigman et al., ICLR'17), CycleGAN (Zhu et al., ICCV'17)
CUT (Contrastive Unpaired Translation) and FastCUT. Contrastive Loss: λ = 1 (CUT), λ = 10 (FastCUT). Identity Loss Regularization. Conservative, Flexible, Faster than CycleGAN. Even Faster than CUT
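The two configurations can be read as different weightings of the same generator objective: an adversarial term, a contrastive term between X and G(X), and an identity contrastive term between Y and G(Y). The sketch below assumes that reading. The contrastive weights 1 and 10 are the λ values on the slide; the identity weights (1 for CUT, 0 for FastCUT) and all names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    nce: float       # weight on the contrastive loss between X and G(X)
    identity: float  # weight on the contrastive loss between Y and G(Y)


# Contrastive weights from the slide; identity weights are assumed.
CUT = LossWeights(nce=1.0, identity=1.0)
FASTCUT = LossWeights(nce=10.0, identity=0.0)


def generator_objective(gan_loss: float, nce_x: float, nce_y: float,
                        w: LossWeights) -> float:
    """Weighted sum of the adversarial, contrastive, and identity terms."""
    return gan_loss + w.nce * nce_x + w.identity * nce_y


# Placeholder loss values, chosen only to show the arithmetic.
cut_total = generator_objective(0.5, 0.2, 0.3, CUT)
fastcut_total = generator_objective(0.5, 0.2, 0.3, FASTCUT)
```

Dropping the identity term removes one extra pass of G on target-domain images per step, which is consistent with the "even faster" claim on the slide.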
Lighter Footprint. [Bar chart: training time (sec/iter, lower is better) for CycleGAN, GcGAN, CUT, FastCUT, MUNIT, DRIT, DistanceGAN, Self-DistGAN; annotation: < 0.5x]
[Qualitative comparison: Input | CUT | FastCUT | CycleGAN | MUNIT | DRIT | DistanceGAN | GcGAN]
Dealing with Dataset Bias. Detected pixels: source training set, horse 17.9%; target training set, zebra 36.8%. Outputs for the same input: CUT, zebra 30.8%; FastCUT, zebra 25.9%; CycleGAN, zebra 19.1%
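The percentages on this slide are shares of pixels detected as a given class. A minimal sketch of that measurement, assuming a per-pixel label map is already available from some detector or segmenter (the function name and the toy label map are hypothetical):

```python
from typing import List


def class_pixel_percentage(label_map: List[List[str]], target: str) -> float:
    """Percentage of pixels in a 2-D label map assigned to `target`."""
    total = sum(len(row) for row in label_map)
    hits = sum(1 for row in label_map for label in row if label == target)
    return 100.0 * hits / total


# Toy 2x4 label map: 3 of the 8 pixels are labeled "zebra".
toy_labels = [
    ["bg", "zebra", "zebra", "bg"],
    ["bg", "zebra", "bg", "bg"],
]
zebra_share = class_pixel_percentage(toy_labels, "zebra")
```

Averaging this share over a dataset gives a single number that can be compared between the source set, the target set, and each method's outputs.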
Cat → Dog | Yosemite Summer → Winter | Apple → Orange | Paris → Burano | GTA → Cityscapes
FID: evaluating the realism of output images (lower is better). [Bar chart: horse2zebra, cityscapes, cat2dog; methods: CycleGAN, MUNIT, DRIT, DistanceGAN, SelfDistanceGAN, GCGAN, CUT, FastCUT]
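For reference, the Fréchet Inception Distance is commonly defined as the Fréchet distance between two Gaussians fitted to deep features of real and generated images. The symbols below are notation introduced here, not taken from the slide: μ_r, Σ_r are the mean and covariance of features of real images, and μ_g, Σ_g those of generated images.

```latex
\mathrm{FID}
  = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

The value is zero when the two fitted Gaussians coincide, which is why lower is better.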
Segmentation Score: evaluating correspondences. Mean Intersection-over-Union (%), higher is better. [Diagram: I2I Model → Segmenter → % match; bar chart]
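Mean Intersection-over-Union can be sketched as follows, assuming the segmenter's predictions and the reference labels are given as flat lists of integer class ids. The function name, the choice to skip classes absent from both maps, and the toy data are assumptions made for illustration.

```python
from typing import List


def mean_iou(pred: List[int], target: List[int], num_classes: int) -> float:
    """Mean Intersection-over-Union in percent, averaged over the classes
    that appear in the prediction or the target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return 100.0 * sum(ious) / len(ious)


# Toy example: class 0 has IoU 1/2, class 1 has IoU 2/3.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = mean_iou(pred, target, num_classes=2)
```

A translation that moves or distorts content makes the segmenter's output disagree with the labels, which lowers the score.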
Single Image Translation: Claude Monet's painting → G → output; D, with a reference photo. Internal contrastive loss is well-suited for single image translation. Also see InGAN (Shocher et al., ICCV'19), SinGAN (Shaham et al., ICCV'19)
Painting
Reference Photo | Painting
Reference Photo | Painting | Gatys et al., CVPR'16
Reference Photo | Painting | STROTSS (Kolkin et al., CVPR'19)
Reference Photo | Painting | WCT2 (Yoo et al., ICCV'19)
Reference Photo | Painting | Our translation result
Reference Photo | Painting | CycleGAN
Painting
Reference Photo | Painting | Gatys et al., CVPR'16
Reference Photo | Painting | STROTSS (Kolkin et al., CVPR'19)
Reference Photo | Painting | WCT2 (Yoo et al., ICCV'19)
Reference Photo | Painting | Ours
Reference Photo | Painting | CycleGAN
Reference Photo | Painting | Our translation result (four further examples)
Questions or Comments?
Disentanglement? inter-image | intra-image
style | content. MUNIT (Huang, Liu, Belongie, Kautz, ECCV'18)
Structure for each row
Style for each column dark brown, light brown, white white, black uniform spotted striped
Extracting style and structure from an image: E → style code, structure code → G. Co-occurrence Patch-based Discriminator
Reconstruction. Auto-encode: E → structure code, style code → G → D. Swap: E → G → D. Reference patches → Patch co-occurrence discriminator D_patch: Real/fake?
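The auto-encode and swap data flow can be illustrated with a deliberately simple toy. Here the "style code" is just the mean and spread of a 1-D signal and the "structure code" is its normalized shape; in the actual method E and G are learned networks, so `encode`, `generate`, `swap`, and this hand-written split are hypothetical and only show how the two codes are recombined.

```python
from typing import List, Tuple

Style = Tuple[float, float]  # toy style code: (mean, spread)


def encode(signal: List[float]) -> Tuple[List[float], Style]:
    """Toy encoder E: split a signal into a structure code and a style code."""
    mean = sum(signal) / len(signal)
    spread = max(abs(v - mean) for v in signal) or 1.0  # avoid dividing by zero
    structure = [(v - mean) / spread for v in signal]
    return structure, (mean, spread)


def generate(structure: List[float], style: Style) -> List[float]:
    """Toy generator G: rebuild a signal from a structure and a style code."""
    mean, spread = style
    return [mean + spread * s for s in structure]


def swap(a: List[float], b: List[float]) -> List[float]:
    """Combine the structure code of `a` with the style code of `b`."""
    structure_a, _ = encode(a)
    _, style_b = encode(b)
    return generate(structure_a, style_b)


a = [0.0, 1.0, 2.0]   # supplies the structure (a rising ramp)
b = [10.0, 14.0]      # supplies the style (mean 12, spread 2)

reconstructed = generate(*encode(a))  # auto-encode path
hybrid = swap(a, b)                   # swap path
```

The auto-encode path should return the input, while the swap path keeps the ramp shape of `a` at the level and range of `b`.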
style structure
Patch Co-Occurrence Discriminator is a Texture Discriminator. What is Texture? "An image that can be represented by first- and second-order statistics" (conjecture by Bela Julesz, 1962)
Two textures that differ by first-order statistics
Two textures that differ by second-order statistics. [Co-occurrence tables: left adjacent pixel (dark, bright) vs right pixel (dark, bright)]
Two textures that differ by third-order statistics. Modeling joint probability is (almost) enough to capture texture
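The difference between first- and second-order statistics can be made concrete with two tiny binary textures (0 = dark, 1 = bright). The sketch below counts single-pixel values and horizontally adjacent (left, right) pairs; the function names and the two example textures are illustrative choices.

```python
from collections import Counter
from typing import List

Texture = List[List[int]]


def first_order(img: Texture) -> Counter:
    """Histogram of individual pixel values."""
    return Counter(p for row in img for p in row)


def second_order(img: Texture) -> Counter:
    """Counts of (left pixel, right pixel) pairs for horizontal neighbours."""
    return Counter((row[i], row[i + 1])
                   for row in img for i in range(len(row) - 1))


stripes = [[0, 1, 0, 1],
           [0, 1, 0, 1]]   # thin vertical stripes
blocks = [[0, 0, 1, 1],
          [0, 0, 1, 1]]    # wide vertical bands

same_histogram = first_order(stripes) == first_order(blocks)
same_cooccurrence = second_order(stripes) == second_order(blocks)
```

Both textures are half dark and half bright, so their first-order statistics match, yet their neighbour-pair counts differ, which is the kind of cue a discriminator looking at co-occurring patches can pick up.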
Patch Co-Occurrence Discriminator is a Texture Discriminator: D([patch], [patch]) = Different Image; D([patch], [patch]) = Same Image