  1. Conditional Adversarial Networks (or “mapping from A to B”) CS448V — Computational Video Manipulation, May 22nd, 2019

  2. Why? • Cool! Trendy! • Google Scholar: Pix2Pix, CycleGAN, … hundreds of applications and follow-up works …


  4. Enhancing Transitions

  5. Single-Photo Facial Animation

  6. Text-based Editing

  7. Few-Shot Reenactment

  8. Digital Humans

  9. Overview • Convolutional Neural Networks • Generative Modeling • Pix2Pix (“mapping from A to B”)

  10. Convolutional Neural Network Components? • 2D Convolution Layers (Conv2D) • Subsampling Layers (MaxPool, …) • Non-linearity Layers (ReLU, …) • Normalization Layers (BatchNorm, …) • Upsampling Layers (TransposedConv, …) • …
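
Each of these component families has a direct counterpart in common deep learning libraries. A minimal PyTorch sketch (my own illustration; the layer sizes are arbitrary, not from the slides):

```python
import torch.nn as nn

# One toy stack touching each component family listed above.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),          # 2D convolution layer
    nn.BatchNorm2d(16),                                   # normalization layer
    nn.ReLU(),                                            # non-linearity layer
    nn.MaxPool2d(2),                                      # subsampling layer
    nn.ConvTranspose2d(16, 3, kernel_size=2, stride=2),   # upsampling layer
)
```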


  12. Convolution: a 32x32x3 image (height 32, width 32, depth 3).

  13. Convolution: take a 5x5x3 filter and convolve it with the 32x32x3 image, i.e., “slide over the image spatially, computing dot products”.

  14. Convolution: the result at each location is 1 number, the dot product between the filter and a small 5x5x3 chunk of the image plus a bias, i.e., a 5x5x3 = 75-dimensional dot product: $w^T x + b$.
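
To make the “75-dimensional dot product plus bias” concrete, here is a minimal PyTorch sketch (my own, not from the slides) computing one filter response at one location:

```python
import torch

image = torch.randn(3, 32, 32)      # a 32x32x3 image, channels first
w = torch.randn(3, 5, 5)            # one 5x5x3 filter
b = torch.randn(())                 # scalar bias

patch = image[:, 0:5, 0:5]          # a small 5x5x3 chunk of the image
response = (w * patch).sum() + b    # w^T x + b over 75 numbers
```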

  15. Convolution: convolving (sliding) the 5x5x3 filter over all spatial locations of the 32x32x3 image produces a 28x28x1 activation map.


  17. Convolution: what is the activation map invariant to? Rotation? Translation? Scaling?

  18. Convolution: (approximately) invariant to translation, but not to rotation or scaling.

  19. Convolution: the same filter is convolved (slid) over all spatial locations of the 32x32x3 image.

  20. Convolution Layer: applying a bank of filters to the 32x32x3 image produces an activation tensor (one activation map per filter).

  21. Convolutional Neural Network: 32x32x3 image → Convolution (e.g., six 5x5x3 filters) + ReLU → ?

  22. Convolutional Neural Network: 32x32x3 image → Convolution (six 5x5x3 filters) + ReLU → 28x28x6 tensor.

  23. Convolutional Neural Network: 32x32x3 image → Convolution (six 5x5x3 filters) + ReLU → 28x28x6 tensor → Convolution (e.g., ten 5x5x6 filters) + ReLU → ?

  24. Convolutional Neural Network: 32x32x3 image → Convolution (six 5x5x3 filters) + ReLU → 28x28x6 tensor → Convolution (ten 5x5x6 filters) + ReLU → 24x24x10 tensor → …
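
With no padding and stride 1, each 5x5 convolution shrinks the spatial size by 4 (output = input − kernel + 1), which is where 32 → 28 → 24 comes from. A minimal PyTorch check of the shapes above (my own sketch):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)            # 32x32x3 image, NCHW layout
conv1 = nn.Conv2d(3, 6, kernel_size=5)   # six 5x5x3 filters
conv2 = nn.Conv2d(6, 10, kernel_size=5)  # ten 5x5x6 filters

h1 = torch.relu(conv1(x))   # -> (1, 6, 28, 28):  32 - 5 + 1 = 28
h2 = torch.relu(conv2(h1))  # -> (1, 10, 24, 24): 28 - 5 + 1 = 24
print(h1.shape, h2.shape)
```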

  25. Convolutional Neural Networks [LeNet-5, LeCun et al. 1998]

  26. Feature Hierarchy: learn the features from data instead of hand-engineering them! (If enough data is available.)

  27. U-Net: skip connections “propagate low-level features directly, which helps with details”.
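
A minimal sketch of the skip-connection idea (my own toy model, not the actual pix2pix U-Net): downsampled features are upsampled again and concatenated with earlier, high-resolution features, so low-level detail can bypass the bottleneck.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.down = nn.Conv2d(3, 16, 4, stride=2, padding=1)          # 64 -> 32
        self.up = nn.ConvTranspose2d(16, 16, 4, stride=2, padding=1)  # 32 -> 64
        self.out = nn.Conv2d(16 + 3, 3, 3, padding=1)

    def forward(self, x):
        h = torch.relu(self.down(x))
        h = torch.relu(self.up(h))
        h = torch.cat([h, x], dim=1)  # skip connection: reuse low-level features
        return self.out(h)

y = TinyUNet()(torch.randn(1, 3, 64, 64))  # -> (1, 3, 64, 64)
```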

  28. Overview • Convolutional Neural Networks • Generative Modeling • Pix2Pix


  30. Generative Modeling: given training data $\{x_j\}_{j=1}^{N}$, we want to learn a density function $q(X)$ such that we can sample new images $x \sim q(X)$, i.e., “more of the same!”
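
In GAN-style generative models, “sampling from $q(X)$” is implemented by pushing easy-to-sample noise through a network. A hypothetical toy example (my own sketch, not from the slides):

```python
import torch
import torch.nn as nn

# G maps latent noise z ~ N(0, I) to image-shaped samples; after training,
# G's outputs should look like draws from the data distribution q(X).
G = nn.Sequential(
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 32 * 32 * 3), nn.Tanh(),
)

z = torch.randn(16, 64)        # sample latent codes
x = G(z).view(16, 3, 32, 32)   # 16 "new samples" shaped like 32x32x3 images
```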

  31. Generative 2D Face Modeling: training data $\{x_j\}_{j=1}^{N}$, new samples $x \sim q(X)$. The world needs more celebrities … or not … ?

  32. 3.5 Years of Progress on Faces

  33. https://thispersondoesnotexist.com (2018)

  34. StyleGAN - Interpolation

  35. Overview • Convolutional Neural Networks • Generative Modeling • Pix2Pix (“mapping from A to B”)


  37. Image-to-Image Translation




  41. Image-to-Image Translation: train a neural network G to minimize a loss against the ground truth: $\arg\min_G \mathbb{E}_{x,y}[L(G(x), y)]$ [Zhang et al., ECCV 2016]

  42. Image-to-Image Translation: paired training data! $\arg\min_G \mathbb{E}_{x,y}[L(G(x), y)]$ [Zhang et al., ECCV 2016]


  44. Image-to-Image Translation: $\arg\min_G \mathbb{E}_{x,y}[L(G(x), y)]$. The network G: “What should I do?” [Zhang et al., ECCV 2016]

  45. Image-to-Image Translation: $\arg\min_G \mathbb{E}_{x,y}[L(G(x), y)]$. The network G: “What should I do?” The loss L: “How should I do it?” [Zhang et al., ECCV 2016]

  46. Be careful what you wish for! $L(y, \hat{y}) = \|y - \hat{y}\|_2^2$

  47. Regression to the mean! $L(y, \hat{y}) = \|y - \hat{y}\|_2^2$
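
Why the L2 loss averages: for a fixed input with several plausible outputs $y$, the single prediction $\hat{y}$ that minimizes the expected squared error is the mean (a standard one-line derivation, added here for completeness):

$$\arg\min_{\hat{y}} \; \mathbb{E}_y\big[\|\hat{y} - y\|_2^2\big] = \mathbb{E}[y],
\qquad \text{since} \qquad
\mathbb{E}_y\|\hat{y} - y\|_2^2 = \|\hat{y} - \mathbb{E}[y]\|_2^2 + \mathbb{E}_y\|y - \mathbb{E}[y]\|_2^2 .$$

Averaging all plausible outputs pixel-wise is exactly what produces blurry, washed-out results.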

  48. Automate Design of the Loss?


  50. Automate Design of the Loss? Deep learning got rid of handcrafted features. Can we also get rid of handcrafting the loss function?

  51. Automate Design of the Loss? Deep learning got rid of handcrafted features. Can we also get rid of handcrafting the loss function? Universal loss function?


  53. Discriminator as a Loss Function: the discriminator is a classifier that answers “Real or Fake?”

  54. Conditional GAN

  55. Conditional GAN: Input x → Generator (G) → Output G(x).

  56. Conditional GAN: Input x → Generator (G) → Output G(x) → Discriminator (D) → Real or Fake? G tries to synthesize fake images that fool D; D tries to tell real from fake.

  57. Conditional GAN (Discriminator): D tries to identify the fakes: for a generated output G(x) it should predict “1” (fake, e.g., 0.9). $\arg\max_D \mathbb{E}_{x,y}[\log D(G(x)) + \log(1 - D(y))]$

  58. Conditional GAN (Discriminator): D tries to identify the fakes (D(G(x)) → “1”, e.g., 0.9) and the real images (for the ground truth y, D(y) → “0”, e.g., 0.1). $\arg\max_D \mathbb{E}_{x,y}[\log D(G(x)) + \log(1 - D(y))]$
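
As a sketch (my own, with placeholder names G, D, opt_D; not pix2pix’s actual training code, and assuming D ends in a sigmoid), one discriminator update in the slides’ convention (fake → label 1, real → label 0) is an ordinary binary cross-entropy step:

```python
import torch
import torch.nn.functional as F

def discriminator_step(G, D, x, y, opt_D):
    # Maximizing log D(G(x)) + log(1 - D(y)) is the same as minimizing
    # BCE with label 1 for fakes and label 0 for real images.
    d_fake = D(G(x).detach())   # detach: do not update G in this step
    d_real = D(y)
    loss_D = (F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
              + F.binary_cross_entropy(d_real, torch.zeros_like(d_real)))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()
    return loss_D
```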

  59. Conditional GAN (Generator): G tries to synthesize fake images that fool D, i.e., make D label G(x) as real (“0”, e.g., 0.1). $\arg\min_G \mathbb{E}_{x,y}[\log D(G(x)) + \log(1 - D(y))]$


  61. Conditional GAN: G tries to synthesize fake images that fool the best D: $\arg\min_G \max_D \mathbb{E}_{x,y}[\log D(G(x)) + \log(1 - D(y))]$

  62. Conditional GAN: from G’s perspective, D is a loss function. Rather than being hand-designed, it is learned jointly!

  63. Conditional Discriminator: D also sees the input x: $\arg\min_G \max_D \mathbb{E}_{x,y}[\log D(x, G(x)) + \log(1 - D(x, y))]$
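
In practice, “D also sees the input” is often implemented by concatenating input and candidate output along the channel dimension before feeding D; a one-line sketch (my own, hypothetical names):

```python
import torch

def conditional_D(D, x, y_or_generated):
    # D(x, y) from the slide: stack input and candidate output channel-wise.
    return D(torch.cat([x, y_or_generated], dim=1))
```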

  64. Patch Discriminator: “Rather than penalizing if the output image looks fake, penalize if each overlapping patch in the output looks fake.”
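
A minimal fully-convolutional patch discriminator sketch (layer sizes are illustrative, not the paper’s exact 70x70 architecture): because it is convolutional, its output is a grid of real/fake probabilities, one per overlapping patch, and the loss averages over the grid.

```python
import torch
import torch.nn as nn

patch_D = nn.Sequential(
    nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),   # x and y stacked: 6 input channels
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, 4, stride=1, padding=1), nn.Sigmoid(),
)

scores = patch_D(torch.randn(1, 6, 256, 256))  # -> (1, 1, 63, 63) per-patch scores
```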

  65. 1x1 Pixel Discriminator

  66. Image Discriminator

  67. 70x70 Patch Discriminator

  68. Conditional Discriminator: $L_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, G(x)) + \log(1 - D(x, y))]$

  69. Reconstruction Loss: $L_{L1}(G) = \mathbb{E}_{x,y}[\|G(x) - y\|_1]$, which gives “stable training + fast convergence”. Full objective: $G^* = \arg\min_G \max_D L_{cGAN}(G, D) + \lambda \, L_{L1}(G)$ with $\lambda = 100$.
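
Putting the two terms together for the generator update; a sketch under the slides’ conventions (hypothetical names, and using the common non-saturating BCE form of the GAN term rather than the literal minimax expression):

```python
import torch
import torch.nn.functional as F

def generator_loss(G, D, x, y, lam=100.0):
    fake = G(x)
    d_fake = D(torch.cat([x, fake], dim=1))   # conditional discriminator D(x, G(x))
    # In the slides' convention "real" is label 0, so G pushes D(x, G(x)) -> 0.
    gan = F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    rec = F.l1_loss(fake, y)                  # L_L1(G) = E ||G(x) - y||_1
    return gan + lam * rec                    # L_cGAN + lambda * L_L1
```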

  70. Ablation Study?

  71. Ablation Study

  72. Results on the Test Split

  73. Results for Hand Drawings?

  74. Demo: Pix2Pix

  75. Limitations: 1. Paired data is required. 2. Temporally unstable if applied per-frame to a video sequence. 3. Does not generalize to 3D transformations.

  76. CycleGAN

  77. Cycle Consistency
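
Cycle consistency replaces paired supervision: with two generators G: X→Y and F: Y→X, translating there and back should return the original image. A sketch of the loss (my own, hypothetical names; the second generator is called f_yx to avoid clashing with torch.nn.functional):

```python
import torch.nn.functional as F

def cycle_consistency_loss(g_xy, f_yx, x, y):
    # ||F(G(x)) - x||_1  +  ||G(F(y)) - y||_1
    return F.l1_loss(f_yx(g_xy(x)), x) + F.l1_loss(g_xy(f_yx(y)), y)
```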

  78. CycleGAN

  79. Recycle-GAN

  80. Limitations (revisited): 1. Paired data is required. 2. Temporally unstable if applied per-frame to a video sequence. 3. Does not generalize to 3D transformations.


  82. Vid2Vid

