  1. Deep Learning for Visual Manipulation and Synthesis Jun-Yan Zhu 朱俊彦 UC Berkeley 2017/01/11 @ VALSE

  2. What is visual manipulation? Image Editing Program: input photo + user input → result. Desired output: stay close to the input; satisfy the user's constraint. [Schaefer et al. 2006]

  3. What is visual synthesis? Image Generation Program: user input → result. Desired output: satisfy the user's constraint. Sketch2Photo [Tao et al. 2009]

  4. So far so good

  5. Things can get really bad: the lack of “safety wheels”.

  6. Adding the “safety wheels”. Image Editing Program: input photo + user input → output result. A desired output: stay close to the input; satisfy the user's constraint; lie on the natural image manifold.

  7. Prior work: heuristic-based. Gradient [Perez et al. 2003] (“bleeding” artifacts [Tao et al. 2010]); Color [Reinhard et al. 2004]; Color and Texture [Johnson et al. 2011]

  8. Prior work: discriminative learning. Natural Human Motion (34 subjects) [Ren et al. 2005]; Image Compositing (20 images) [Xue et al. 2012]; Image Deblurring (40 images) [Liu et al. 2013]

  9. Our Goal: - Learn the manifold of natural images without direct human annotations. - Improve visual manipulation and synthesis by constraining the result to lie on that learned manifold.

  10. Why Deep Learning Methods? • Impressive results on visual recognition. – Classification, detection, segmentation, 3D vision, videos, etc. • No feature engineering. • Recent development of generative models (e.g. Generative Adversarial Networks).

  11. Deep Learning trends: performance

  12. Deep Learning trends: research AlexNet [Krizhevsky et al.] ImageNet [Jia et al.]

  13. Discriminative model M: {y | P_real(y) = 1}: predict image realism with a CNN model, then improve editing [ICCV ’15]. Generative model M: {y | y = G(z)}: project, edit, transfer; editing UI [SIGGRAPH ’14] [ECCV ’16]

  14. Discriminative model M: {y | P_real(y) = 1}: predict image realism with a CNN model, then improve editing [ICCV ’15]. Image composite I: foreground object F + background B.

  15. Learning Visual Realism. CNN training: classifying 25K natural photos vs. 25K composite images.

  16. How do we get composite images? Target object + object masks with similar shapes → composite images. Object mask: (1) human annotation; (2) object proposal. [Lalonde and Efros 2007]

  17. Ranking of Training Composites Most realistic composites Least realistic composites

  18. Evaluation dataset [Lalonde and Efros 2007]: 360 realistic photos (natural images + realistic composites) and 360 unrealistic photos. Task: binary classification; metric: area under the ROC curve.
Methods without object mask: Lalonde and Efros (no mask) 0.61; AlexNet + SVM 0.73; RealismCNN 0.84; RealismCNN + SVM 0.88; Human 0.91.
Methods using object mask: Reinhard et al. 0.66; Lalonde and Efros (with mask) 0.81.

  19. Visual Realism Ranking, least realistic → most realistic, for Snowy Mountain, Highway, and Ocean scenes. Red: unrealistic composite; Green: realistic composite; Blue: natural image.

  20. Our Pipeline: the Realism CNN model predicts image realism; editing then improves the composites.

  21. Improving Visual Realism. Editing model: a color adjustment g applied to the foreground object F, scored by the Realism CNN. Original composite (realism score: 0.0) → improved composite (realism score: 0.8). Objective: E(g, F) = E_CNN + E_reg, minimized with a quasi-Newton method (L-BFGS).
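The optimization on this slide (an energy combining a realism term and a regularizer that keeps the color adjustment small, minimized with L-BFGS) can be sketched as follows. This is a toy numpy/scipy stand-in: the talk's realism term comes from the trained RealismCNN, which is not reproduced here, so a simple color-statistics match plays its role, and all names and values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
foreground = rng.uniform(0.0, 1.0, size=(16, 16, 3))   # pasted object
background = np.full((16, 16, 3), 0.3)                 # surrounding photo

def energy(g, lam=0.1):
    """Toy E(g) = E_realism + lam * E_reg for per-channel gains g."""
    adjusted = foreground * g.reshape(1, 1, 3)
    # Realism stand-in: match foreground color statistics to the background.
    e_realism = np.sum((adjusted.mean(axis=(0, 1)) - background.mean(axis=(0, 1))) ** 2)
    # Regularizer: keep the adjustment close to the identity (g = 1).
    e_reg = np.sum((g - 1.0) ** 2)
    return e_realism + lam * e_reg

g0 = np.ones(3)                       # start from the unadjusted composite
res = minimize(energy, g0, method="L-BFGS-B")
print(res.x, energy(res.x) < energy(g0))
```

L-BFGS only needs the energy to be (approximately) differentiable in the adjustment parameters, which is why a CNN realism score can be plugged into the same loop.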

  22. Selecting Suitable Objects Best-fitting object selected by RealismCNN Object with most similar shape

  23. Optimizing Color Compatibility Object mask Cut-n-paste Lalonde et al. Xue et al. Ours

  24. Sanity Check: Real Photos Object mask Cut-n-paste Lalonde et al. Xue et al. Ours

  25. Visualizing and Localizing Errors (∂E/∂I_p). Result and gradient map over the number of L-BFGS iterations: E = 50.73 → 9.38 → 5.05 → 3.44 → 3.00.
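This slide localizes errors with the gradient of the energy with respect to each pixel, so unrealistic regions light up in the gradient map. A toy illustration, using a plain squared-error energy as a stand-in for the talk's CNN-based energy (everything here is illustrative):

```python
import numpy as np

# Toy energy E = sum_p (I_p - T_p)^2 against a "realistic" target T,
# so dE/dI_p = 2 * (I_p - T_p) is exactly where the composite deviates.
T = np.zeros((8, 8))                  # stand-in realistic image
I = np.zeros((8, 8))
I[2:5, 2:5] = 1.0                     # an implausible pasted patch

grad_map = np.abs(2.0 * (I - T))      # per-pixel |dE/dI_p|
print(np.unravel_index(grad_map.argmax(), grad_map.shape))
```

With a CNN energy the per-pixel gradient comes from backpropagation instead of this closed form, but the visualization step is the same.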

  26. Discriminative Model {y | P_real(y) = 1} • Pros: – CNN is easy to train. – Graphics programs often produce better images than generative models. – General framework for many tasks (e.g. deblurring, retargeting, etc.). • Cons: – Task-specific: a pre-trained model cannot be applied to other tasks. – Graphics programs are often non-parametric and non-differentiable. – Graphics programs often require a user in the loop, so automatically generating results for CNN training is challenging. • Code: github.com/junyanz/RealismCNN • Data: people.eecs.berkeley.edu/~junyanz/projects/realism/

  27. Discriminative model M: {y | P_real(y) = 1}: predict image realism with a CNN model, then improve editing [ICCV ’15]. Generative model M: {y | y = G(z)}: project, edit, transfer; editing UI [SIGGRAPH ’14] [ECCV ’16]

  28. Learning Natural Image Manifold • Deep generative models: – Generative Adversarial Network (GAN) [Goodfellow et al. 2014] [Radford et al. 2015] [Denton et al. 2015] – Variational Auto-Encoder (VAE) [Kingma and Welling 2013] – DRAW (recurrent neural network) [Gregor et al. 2015] – PixelRNN and PixelCNN [Oord et al. 2016] – …

  29. Image Classification via Neural Network: input image I → “Cat”. Slide credit: Andrew Owens

  30. Can We Generate Images with Neural Networks? Gaussian noise (or another random distribution) → image.

  31. Generative Adversarial Networks (GAN) Generative Model Synthesized image [Goodfellow et al. 2014]

  32. Generative Adversarial Networks (GAN): Generative Model → Discriminative Model → “real” [Goodfellow et al. 2014]

  33. Generative Adversarial Networks (GAN) Generative Model Discriminative Model “fake” [Goodfellow et al. 2014]
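The three GAN slides above describe an adversarial game: the discriminator learns to call real photos “real” and synthesized images “fake”, while the generator learns to fool it. A minimal numpy sketch of the two standard losses (the non-saturating generator loss shown here is a common training variant; the names and data are illustrative, not from the talk):

```python
import numpy as np

def d_loss(p_real, p_fake):
    """Discriminator loss: -[log D(x) + log(1 - D(G(z)))], averaged."""
    return -(np.mean(np.log(p_real)) + np.mean(np.log(1.0 - p_fake)))

def g_loss(p_fake):
    """Non-saturating generator loss: -log D(G(z)), averaged."""
    return -np.mean(np.log(p_fake))

# An undecided discriminator (D = 0.5 everywhere) gives d_loss = 2*log 2.
p = np.full(4, 0.5)
print(d_loss(p, p))   # ≈ 1.386
```

Training alternates gradient steps on these two losses; the generator's loss falls as the discriminator's confidence in its fakes rises.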

  34. Cat Generation (w.r.t. training iterations)

  35. GAN as Manifold Approximation: random image samples from the generator G(z) vs. training images sampled from “Amazon Shirts”. [Radford et al. 2015]

  36. Traverse on the GAN Manifold. Linear interpolation in z space: G(z_0 + t ⋅ (z_1 − z_0)), from G(z_0) to G(z_1). Limitations: • not photo-realistic enough, low resolution • produces images randomly, no user control [Radford et al. 2015]
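The manifold traversal on this slide is just linear interpolation between two latent codes, decoded by the generator at each step. A small numpy sketch (the generator itself is omitted, since decoding the path only requires applying G pointwise):

```python
import numpy as np

def lerp(z0, z1, t):
    """Linear interpolation in latent space: z0 + t * (z1 - z0)."""
    return z0 + t * (z1 - z0)

rng = np.random.default_rng(1)
z0, z1 = rng.normal(size=100), rng.normal(size=100)   # two latent codes
path = [lerp(z0, z1, t) for t in np.linspace(0.0, 1.0, 8)]
# Decoding each point with G would give a smooth visual transition;
# the path starts exactly at z0 and ends exactly at z1.
print(np.allclose(path[0], z0), np.allclose(path[-1], z1))
```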

  37. Overview: Project → Edit → Transfer → Editing UI. Original photo → projection on manifold → different degrees of image manipulation → transition between the original and edited projection.

  38. Overview: Project → Edit → Transfer → Editing UI. Original photo → projection on manifold → different degrees of image manipulation → transition between the original and edited projection.

  39. Projecting an Image onto the Manifold. Input: real image x^R; output: latent vector z. Optimization of the reconstruction loss L against the generative model G(z): losses 0.196, 0.238, 0.332.

  40. Projecting an Image onto the Manifold. Optimization: losses 0.196, 0.238, 0.332. Inverting network z = P(x), an auto-encoder with a fixed decoder G: losses 0.242, 0.336, 0.218.

  41. Projecting an Image onto the Manifold. Optimization: losses 0.196, 0.238, 0.332. Inverting network z = P(x): losses 0.242, 0.336, 0.218. Hybrid method, using the network's prediction as initialization for the optimization problem: losses 0.167, 0.268, 0.153.
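The hybrid projection on these three slides first predicts a latent code with an inverting network, then refines it by directly optimizing the reconstruction loss. A toy numpy sketch with a linear "generator" standing in for G and a deliberately imperfect linear predictor standing in for the inverting network (all names and shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(64, 8))          # toy linear "generator" G(z) = W @ z
G = lambda z: W @ z
x_real = G(rng.normal(size=8)) + 0.05 * rng.normal(size=64)  # image to project

def recon_loss(z):
    return np.sum((G(z) - x_real) ** 2)

# (1) "Inverting network" stand-in: a fixed linear predictor z = P(x).
P = np.linalg.pinv(W) * 0.8           # deliberately imperfect initializer
z = P @ x_real
init_loss = recon_loss(z)

# (2) Hybrid step: refine by gradient descent on the reconstruction loss.
for _ in range(1000):
    grad = 2.0 * W.T @ (G(z) - x_real)
    z -= 0.002 * grad
print(init_loss, recon_loss(z))
```

The network initialization is fast but approximate; the optimization recovers the remaining accuracy, matching the pattern of the loss numbers on the slides (hybrid below both alternatives).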

  42. Overview: Project → Edit → Transfer → Editing UI. Original photo → projection on manifold → different degrees of image manipulation → transition between the original and edited projection.

  43. Manipulating the Latent Vector. Objective: minimize the constraint-violation loss L_g between the user guidance v_g and the generated image G(z), while staying close to the original latent vector z_0.
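The edit objective on this slide balances two terms: satisfy the user's guidance on the generated image, and stay near the latent code of the original photo. A toy numpy sketch, with a linear stand-in generator and whole-image guidance in place of the talk's stroke-based constraints (all names and constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(12, 4)) / 4.0    # toy linear generator G(z) = W @ z
G = lambda z: W @ z

z0 = rng.normal(size=4)               # projection of the original photo
v_g = G(z0) + 0.5                     # user guidance: e.g. "make it brighter"

def objective(z, lam=1.0):
    data = np.sum((G(z) - v_g) ** 2)    # violation of the user constraint
    prox = lam * np.sum((z - z0) ** 2)  # stay close to the original projection
    return data + prox

z = z0.copy()
for _ in range(300):                   # gradient descent on the objective
    grad = 2.0 * W.T @ (G(z) - v_g) + 2.0 * (z - z0)
    z -= 0.05 * grad
print(objective(z0), objective(z))
```

The proximity term is what keeps the edit looking like the original photo rather than an arbitrary point on the manifold that happens to satisfy the constraint.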

  44. Overview: Project → Edit → Transfer → Editing UI. Original photo → projection on manifold → different degrees of image manipulation → transition between the original and edited projection.

  45. Edit Transfer. Motion (u, v) + color (A_{3×4}): estimate per-pixel geometric and color variation between G(z_0) and G(z_1) (linear interpolation in z space). Input.

  46. Edit Transfer. Motion (u, v) + color (A_{3×4}): estimate per-pixel geometric and color variation between G(z_0) and G(z_1) (linear interpolation in z space). Input.

  47. Edit Transfer. Motion (u, v) + color (A_{3×4}): estimate per-pixel geometric and color variation between G(z_0) and G(z_1) (linear interpolation in z space), applied to the input to produce the result.

  48. Image Manipulation Demo

  49. Image Manipulation Demo

  50. Designing Products

  51. Interactive Image Generation

  52. The Simplest Generative Model: Averaging. AverageExplorer: M = {x | x = Σ_n w_n ⋅ warp(I_n)} • Generative model: weighted average of warped images. • Limitation: cannot synthesize novel content. [Zhu et al. 2014]
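The "simplest generative model" on this slide is a weighted average of (warped) example images. A minimal numpy sketch of the averaging step, with the warping omitted (the function name and data are illustrative, not AverageExplorer's actual API):

```python
import numpy as np

def weighted_average(images, weights):
    """Model an image as x = sum_n w_n * I_n with normalized weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                    # normalize the user-driven weights
    # Contract the weight vector against the stack's first (image) axis.
    return np.tensordot(w, np.asarray(images, dtype=float), axes=1)

imgs = [np.zeros((4, 4)), np.ones((4, 4))]
print(weighted_average(imgs, [1, 1]).mean())   # 0.5 blend
```

This makes the slide's limitation concrete: every output pixel is a convex combination of input pixels, so nothing outside the examples' span can ever be synthesized.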

  53. Generative Image Transformation

  54. iGAN (a.k.a. interactive GAN) • Get the code: github.com/junyanz/iGAN • Intelligent drawing tools via GAN. • Debugging tools for understanding and visualizing deep generative networks. • Work in progress: supporting more models (GAN, VAE, theano/tensorflow).
