Deep Learning for Visual Manipulation and Synthesis Jun-Yan Zhu 朱俊彦 UC Berkeley 2017/01/11 @ VALSE
What is visual manipulation? An image editing program takes an input photo and user input and produces a result. Desired output: stay close to the input; satisfy the user’s constraint. [Schaefer et al. 2006]
What is Visual Synthesis? An image generation program takes user input and produces a result. Desired output: satisfy the user’s constraint. Sketch2Photo [Tao et al. 2009]
So far so good
Things can get really bad: the lack of “safety wheels”
Adding the “safety wheels”: an image editing program takes an input photo and user input and produces an output result. A desired output: stay close to the input; satisfy the user’s constraint; lie on the natural image manifold.
Prior work: Heuristic-based • Gradient [Perez et al. 2003] • “Bleeding” artifacts [Tao et al. 2010] • Color [Reinhard et al. 2004] • Color and Texture [Johnson et al. 2011]
Prior work: Discriminative Learning • Natural Human Motion (34 subjects) [Ren et al. 2005] • Image Compositing (20 images) [Xue et al. 2012] • Image Deblurring (40 images) [Liu et al. 2013]
Our Goal: - Learn the manifold of natural images without direct human annotations. - Improve visual manipulation and synthesis by constraining the result to lie on that learned manifold.
Why Deep Learning Methods? • Impressive results on visual recognition. – Classification, detection, segmentation, 3D vision, videos, etc. • No feature engineering. • Recent development of generative models (e.g. Generative Adversarial Networks).
Deep Learning trends: performance
Deep Learning trends: research AlexNet [Krizhevsky et al.] ImageNet [Jia et al.]
Discriminative Model, M: $\{x \mid P_{\text{real}}(x) = 1\}$: Image Editing and Realism CNN Model, predict realism and improve editing [ICCV 15’]. Generative Model, M: $\{x \mid x = G(z)\}$: Project, Edit, Transfer through an Editing UI [SIGGRAPH 14’] [ECCV 16’].
Discriminative Model, M: $\{x \mid P_{\text{real}}(x) = 1\}$: Image Editing and Realism CNN Model, predict realism and improve editing [ICCV 15’]. Notation: image composite $I$, foreground object $F$, background $B$.
Learning Visual Realism. CNN training: classifying natural photos vs. composite images (25K natural photos vs. 25K composite images).
How do we get composite images? Take a target object and its object mask, find object masks with similar shapes, and generate composite images. Object mask: (1) Human Annotation (2) Object Proposal [Lalonde and Efros 2007]
Ranking of Training Composites: most realistic composites vs. least realistic composites
Evaluation. Dataset: • [Lalonde and Efros 2007] • Task: binary classification • 360 realistic photos (natural images + realistic composites) • 360 unrealistic photos
Area under ROC Curve, methods without object mask: • Lalonde and Efros (no mask): 0.61 • AlexNet + SVM: 0.73 • RealismCNN: 0.84 • RealismCNN + SVM: 0.88
Area under ROC Curve, human: 0.91
Area under ROC Curve, methods using object mask: • Reinhard et al.: 0.66 • Lalonde and Efros (with mask): 0.81
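For readers unfamiliar with the metric, here is a minimal sketch of how an area under the ROC curve can be computed for a binary realistic/unrealistic task. It uses the pairwise-ranking formulation (the fraction of realistic/unrealistic pairs that the score orders correctly). The function name `roc_auc` and the toy scores are illustrative assumptions, not values from the talk.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise ranking.

    scores: predicted realism scores (higher means more realistic).
    labels: 1 for realistic photos, 0 for unrealistic ones.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # pair ranked correctly
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): one realistic photo is scored below one unrealistic photo.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)   # 8 of the 9 (positive, negative) pairs are ordered correctly
print(f"AUC = {auc:.3f}")
```

A score of 0.5 corresponds to chance and 1.0 to a perfect ranking, which is how the numbers in the comparison above can be read.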
Visual Realism Ranking, from least realistic to most realistic, for three scenes: Snowy Mountain, Highway, Ocean. Red: unrealistic composite, Green: realistic composite, Blue: natural image
Our Pipeline: Image Editing and Realism CNN Model; predict realism, then improve composites
Improving Visual Realism. Editing model: color adjustment $g$ applied to the foreground object $F$, scored by the Realism CNN. Original composite (realism score: 0.0) vs. improved composite (realism score: 0.8). Objective: $E(g, F) = E_{\text{CNN}} + E_{\text{reg}}$, minimized with Quasi-Newton (L-BFGS).
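The optimization on this slide can be sketched as follows, under heavy assumptions. The real realism term comes from a trained CNN; here it is replaced by a hypothetical stand-in (distance between the mean foreground color and the mean background color) so that the example runs without a network. The per-channel gain/bias parameterization, the regularizer weight `lam`, and all pixel data are also assumptions made for illustration. Only the use of L-BFGS (via SciPy's `L-BFGS-B` method) mirrors the slide.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
background = rng.uniform(0.4, 0.6, size=(50, 3))   # background pixels, RGB in [0, 1]
foreground = rng.uniform(0.0, 0.3, size=(20, 3))   # pasted object, noticeably darker

IDENTITY = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # gain = 1, bias = 0


def adjust(g, pixels):
    """Apply a per-channel gain and bias (hypothetical color adjustment model)."""
    gain, bias = g[:3], g[3:]
    return pixels * gain + bias


def energy(g, lam=0.1):
    """Stand-in realism term plus a regularizer that keeps g near the identity."""
    adjusted = adjust(g, foreground)
    # Stand-in for the CNN realism term: mismatch of mean colors (assumption).
    e_realism = np.sum((adjusted.mean(axis=0) - background.mean(axis=0)) ** 2)
    # Regularizer: penalize large departures from "no adjustment".
    e_reg = lam * np.sum((g - IDENTITY) ** 2)
    return e_realism + e_reg


result = minimize(energy, IDENTITY.copy(), method="L-BFGS-B")
print("energy before:", energy(IDENTITY))
print("energy after: ", result.fun)
```

With a real model, `e_realism` would be computed from the network's output on the adjusted composite, and its gradient would come from backpropagation rather than SciPy's finite differences.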
Selecting Suitable Objects: best-fitting object selected by RealismCNN vs. object with most similar shape
Optimizing Color Compatibility (columns: object mask, cut-n-paste, Lalonde et al., Xue et al., ours)
Sanity Check: Real Photos (columns: object mask, cut-n-paste, Lalonde et al., Xue et al., ours)
Visualizing and Localizing Errors: result and gradient map $\partial E / \partial I_p$ over the number of L-BFGS iterations; $E$ = 50.73, 9.38, 5.05, 3.44, 3.00
Discriminative Model $\{x \mid P_{\text{real}}(x) = 1\}$ • Pros: – CNN is easy to train. – Graphics programs often produce better images than generative models. – General framework for many tasks (e.g. deblurring, retargeting). • Cons: – Task-specific: a pre-trained model cannot be applied to other tasks. – Graphics programs are often non-parametric and non-differentiable. – Graphics programs often require a user in the loop, so automatically generating results for CNN training is challenging. • Code: github.com/junyanz/RealismCNN • Data: people.eecs.berkeley.edu/~junyanz/projects/realism/
Learning Natural Image Manifold • Deep generative models: – Generative Adversarial Network (GAN) [Goodfellow et al. 2014] [Radford et al. 2015] [Denton et al. 2015] – Variational Auto-Encoder (VAE) [Kingma and Welling 2013] – DRAW (Recurrent Neural Network) [Gregor et al. 2015] – Pixel RNN and Pixel CNN [Oord et al. 2016] – …
Image Classification via Neural Network: input image $I$, output label “Cat”. Slides credit: Andrew Owens
Can We Generate Images with Neural Networks? Input: Gaussian noise or a random distribution. Output: image.
Generative Adversarial Networks (GAN): the generative model produces a synthesized image. [Goodfellow et al. 2014]
Generative Adversarial Networks (GAN): generative model and discriminative model; discriminative model output: “real”. [Goodfellow et al. 2014]
Generative Adversarial Networks (GAN): generative model and discriminative model; discriminative model output: “fake”. [Goodfellow et al. 2014]
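The real/fake game on these slides can be written as two binary cross-entropy losses. The sketch below assumes the discriminator outputs a probability that its input is real, and uses the non-saturating generator loss; the function names and the probabilities fed in are illustrative, and no networks are actually trained here.

```python
import numpy as np


def binary_cross_entropy(probs, target):
    """Mean binary cross-entropy between predicted probabilities and a 0/1 target."""
    probs = np.clip(probs, 1e-7, 1.0 - 1e-7)   # avoid log(0)
    return float(-np.mean(target * np.log(probs) + (1.0 - target) * np.log(1.0 - probs)))


def discriminator_loss(d_real, d_fake):
    """D wants to say "real" (1) on training images and "fake" (0) on synthesized ones."""
    return binary_cross_entropy(d_real, 1.0) + binary_cross_entropy(d_fake, 0.0)


def generator_loss(d_fake):
    """Non-saturating variant: G wants D to say "real" (1) on synthesized images."""
    return binary_cross_entropy(d_fake, 1.0)


# Hypothetical discriminator outputs on a small batch.
d_real = np.array([0.9, 0.8, 0.95])   # D(x) for training images
d_fake = np.array([0.1, 0.3, 0.2])    # D(G(z)) for synthesized images

print("discriminator loss:", discriminator_loss(d_real, d_fake))
print("generator loss:    ", generator_loss(d_fake))
```

Training alternates between lowering the discriminator loss and lowering the generator loss, so each model improves against the other.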
Cat Generation (w.r.t. training iterations)
GAN as Manifold Approximation: random image samples from generator $G(z)$ vs. sample training images from “Amazon Shirts” [Radford et al. 2015]
Traverse on the GAN Manifold. Linear interpolation in $z$ space: $G(z_0 + t \cdot (z_1 - z_0))$, going from $G(z_0)$ to $G(z_1)$. Limitations: • not photo-realistic enough, low resolution • produces images randomly, no user control [Radford et al. 2015]
Overview: Project the original photo onto the manifold (projection on manifold); Edit the projection through the editing UI (transition between the original and edited projection); Transfer the edit to the original photo (different degrees of image manipulation).
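A minimal sketch of linear interpolation in z space follows. The generator here is a hypothetical stand-in (a random linear map followed by tanh), used only so the example runs; a trained GAN generator would take its place. The latent size of 100 and the 64×64 output are assumptions.

```python
import numpy as np


def interpolate_latents(z0, z1, num_steps):
    """Return z0 + t * (z1 - z0) for num_steps values of t evenly spaced in [0, 1]."""
    ts = np.linspace(0.0, 1.0, num_steps)
    return np.stack([z0 + t * (z1 - z0) for t in ts])


def toy_generator(z, weights):
    """Stand-in for G(z): NOT a trained GAN, just a fixed random map to pixel space."""
    return np.tanh(z @ weights)


rng = np.random.default_rng(0)
z0 = rng.standard_normal(100)
z1 = rng.standard_normal(100)
weights = rng.standard_normal((100, 64 * 64 * 3)) * 0.1

path = interpolate_latents(z0, z1, num_steps=5)              # shape (5, 100)
frames = toy_generator(path, weights).reshape(5, 64, 64, 3)  # one image per step
print(path.shape, frames.shape)
```

Each row of `path` is a latent vector on the straight line between the two endpoints, and each entry of `frames` is the corresponding generated image.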
Projecting an Image onto the Manifold. Input: real image $x^R$. Output: latent vector $z$. Optimization: minimize the reconstruction loss $L$ between $x^R$ and the generative model output $G(z)$ (reconstruction losses shown: 0.196, 0.238, 0.332).
Projecting an Image onto the Manifold. Inverting Network $z = P(x)$: an auto-encoder with a fixed decoder $G$ (0.242, 0.336, 0.218).
Projecting an Image onto the Manifold. Hybrid Method: use the network as initialization for the optimization problem (0.167, 0.268, 0.153).
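The optimization and hybrid variants can be sketched with a toy generator. Everything below is an assumption for illustration: the generator is a fixed random map with tanh, the loss is a mean squared error, plain gradient descent stands in for whatever optimizer the actual system uses, and the "inverting network" is faked by perturbing the true latent vector rather than by running a trained encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, IMAGE_DIM = 8, 32
W = rng.standard_normal((IMAGE_DIM, LATENT_DIM)) * 0.3   # toy generator weights


def generator(z):
    """Stand-in for G(z): a fixed random linear map followed by tanh."""
    return np.tanh(W @ z)


def reconstruction_loss(z, x_real):
    """Mean squared error between G(z) and the real image."""
    return float(np.mean((generator(z) - x_real) ** 2))


def project(x_real, z_init, steps=500, lr=0.5):
    """Gradient descent on the reconstruction loss, starting from z_init."""
    z = z_init.copy()
    for _ in range(steps):
        out = generator(z)
        grad_out = 2.0 * (out - x_real) / x_real.size   # dL/dG(z)
        grad_z = W.T @ (grad_out * (1.0 - out ** 2))    # chain rule through tanh
        z -= lr * grad_z
    return z


# A "real image" that lies exactly on the toy manifold.
z_true = rng.standard_normal(LATENT_DIM)
x_real = generator(z_true)

# Optimization only: start from a random latent vector.
z_random = rng.standard_normal(LATENT_DIM)
# Hybrid: start from a (pretend) inverting-network prediction close to the answer.
z_from_network = z_true + 0.1 * rng.standard_normal(LATENT_DIM)

loss_opt = reconstruction_loss(project(x_real, z_random), x_real)
loss_hybrid = reconstruction_loss(project(x_real, z_from_network), x_real)
print("optimization only:", loss_opt)
print("hybrid:           ", loss_hybrid)
```

The only difference between the two runs is the starting point, which is the design choice the hybrid method makes: a feed-forward prediction gives the optimizer a better initialization.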
Manipulating the Latent Vector. Objective: starting from $z_0$, update $z$ to reduce the constraint violation loss $L$ between the generated image $G(z)$ and the user guidance image $v$.
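One plausible way to write this objective is given below. The slide only names the loss $L$, the guidance $v$, the generator output $G(z)$, and the starting point $z_0$; the squared-distance term that keeps $z$ near $z_0$ and its weight $\lambda$ are assumptions added here for illustration.

```latex
z^{*} \;=\; \arg\min_{z} \;
  \underbrace{L\big(G(z),\, v\big)}_{\text{constraint violation}}
  \;+\;
  \lambda \,\lVert z - z_0 \rVert_2^{2}
```

Under this reading, the first term pulls the generated image toward the user's guidance, while the second keeps the edit close to the starting projection.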
Edit Transfer. Motion $(u, v)$ + Color ($A_{3 \times 4}$): estimate per-pixel geometric and color variation along the linear interpolation in $z$ space from $G(z_0)$ to $G(z_1)$, then apply it to the input to obtain the result.
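Applying a per-pixel motion field and a per-pixel 3×4 affine color transform to a photo can be sketched as below. How the motion and color fields are estimated is not covered; the nearest-neighbor backward warp, the array layouts, and the function names are assumptions chosen to keep the example short.

```python
import numpy as np


def apply_color_affine(image, A):
    """Apply a per-pixel 3x4 affine color transform.

    image: (H, W, 3) RGB array.  A: (H, W, 3, 4), one matrix per pixel.
    """
    h, w, _ = image.shape
    homogeneous = np.concatenate([image, np.ones((h, w, 1))], axis=-1)  # (H, W, 4)
    return np.einsum("hwij,hwj->hwi", A, homogeneous)


def warp_backward(image, flow_u, flow_v):
    """Nearest-neighbor backward warp: output[y, x] = image[y + v, x + u] (clamped)."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow_u).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow_v).astype(int), 0, h - 1)
    return image[src_y, src_x]


rng = np.random.default_rng(0)
photo = rng.uniform(size=(4, 5, 3))

# Identity color transform at every pixel: [I | 0].
identity = np.tile(np.hstack([np.eye(3), np.zeros((3, 1))]), (4, 5, 1, 1))
# Hypothetical edit: add 0.1 to every channel (the last column is the bias).
brighten = identity.copy()
brighten[..., 3] = 0.1

# Hypothetical motion: every output pixel reads from one pixel to its left.
shifted = warp_backward(photo, flow_u=-np.ones((4, 5)), flow_v=np.zeros((4, 5)))
edited = apply_color_affine(shifted, brighten)
print(edited.shape)
```

Because both fields are defined per pixel, the same changes estimated on low-resolution generated images can be applied to the pixels of the original photo.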
Image Manipulation Demo
Designing Products
Interactive Image Generation
The Simplest Generative Model: Averaging. AverageExplorer: $\{x \mid x = \sum_n w_n \cdot I_n^{\text{warp}}\}$ • Generative model: weighted average of warped images. • Limitations: cannot synthesize novel content. [Zhu et al. 2014]
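The weighted average itself is a one-line computation once the images are warped into alignment. The sketch below assumes the warping has already been done and that weights are normalized to sum to one; the function name and the toy images are illustrative.

```python
import numpy as np


def weighted_average(warped_images, weights):
    """Weighted average of already-warped images.

    warped_images: (N, H, W, 3) array-like.  weights: N non-negative numbers.
    """
    warped_images = np.asarray(warped_images, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                       # normalize (assumption)
    return np.tensordot(weights, warped_images, axes=1)     # sum_n w_n * I_n


# Three constant toy "images" with values 0.0, 0.5 and 1.0.
images = np.stack([np.full((2, 2, 3), value) for value in (0.0, 0.5, 1.0)])
average = weighted_average(images, weights=[1.0, 1.0, 2.0])
print(average[0, 0])   # every pixel is 0.25*0.0 + 0.25*0.5 + 0.5*1.0 = 0.625
```

Every output pixel is a blend of input pixels, which is why this model cannot synthesize content that is absent from the inputs.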
Generative Image Transformation
iGAN (a.k.a. interactive GAN) • Get the code: github.com/junyanz/iGAN • Intelligent drawing tools via GAN. • Debugging tools for understanding and visualizing deep generative networks. • Work in progress: supporting more models (GAN, VAE, Theano/TensorFlow).