Lecture #8 – Image Synthesis
Aykut Erdem // Hacettepe University // Spring 2019
CMP722
ADVANCED COMPUTER VISION
Illustration: StyleGAN trained on Portrait by Yuli-Ban
Previously on CMP722
Image credit: Three Robots (Love, Death & Robots, 2019)
Lecture overview
—Bill Freeman, Antonio Torralba and Phillip Isola’s MIT 6.869 class
…
Image classification: image X → Classifier → label Y (“Fish”)
Image synthesis: label Y (“Fish”) → Generator → image X
In vision, this is usually what we are interested in! Model of high-dimensional structured data
X is high-dimensional!
Image synthesis via generative modeling
Gaussian noise → Generative Model → Synthesized image
“bird” → Conditional Generative Model → Synthesized image
“A yellow bird on a branch” → Conditional Generative Model → Synthesized image
Conditional Generative Model → Synthesized image
Semantic segmentation [Long et al. 2015, …]
Edge detection [Xie et al. 2015, …]
Text-to-photo [Reed et al. 2014, …]: “this small bird has a pink breast and crown…”
Future frame prediction [Mathieu et al. 2016, …]
Data prediction problems (“structured prediction”)
“Sky” “Sky” “Bird” “Bird”
What’s the object class of the center pixel?
Each prediction is done independently!
Independent prediction per pixel
Find a configuration of compatible labels
[“Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials”, Krahenbuhl and Koltun, NIPS 2011]
Input
Define an objective that penalizes bad structure! (e.g., a graphical model)
Structured prediction
All learning objectives we have seen in this class so far had this form!
Per-datapoint least-squares regression:
Per-pixel softmax regression:
Unstructured prediction
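A minimal sketch of what "unstructured" means here, assuming the standard forms of the two losses named above (the slide's own formulas did not survive extraction). Each loss is a sum of terms that look at one output at a time, so the total never depends on how neighboring predictions relate to each other. The function names and the two-pixel toy data are illustrative assumptions.

```python
import math

def least_squares_loss(predictions, targets):
    """Sum of independent squared errors, one term per datapoint."""
    return sum(0.5 * (p - t) ** 2 for p, t in zip(predictions, targets))

def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def per_pixel_cross_entropy(pixel_logits, pixel_labels):
    """Sum of independent softmax cross-entropy terms, one per pixel."""
    loss = 0.0
    for logits, label in zip(pixel_logits, pixel_labels):
        loss += -math.log(softmax(logits)[label])
    return loss

# Toy data (assumed): two pixels, two classes (0 = "Sky", 1 = "Bird").
logits = [[2.0, 0.0], [0.0, 2.0]]
labels = [0, 1]
total = per_pixel_cross_entropy(logits, labels)

# The total is exactly the sum of the single-pixel losses:
# no term couples one pixel's label to another's.
parts = [per_pixel_cross_entropy([l], [y]) for l, y in zip(logits, labels)]
```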
Structured prediction with a CRF
Model joint configuration of all outputs y
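To make "joint configuration" concrete, here is a small sketch of an energy that scores a whole labeling at once. It is a simplification, not the fully connected CRF of Krahenbuhl and Koltun: it assumes a 1-D chain of four pixels with a Potts penalty between neighbors, and the costs below are made-up numbers. Brute-force search stands in for proper inference.

```python
from itertools import product

def energy(labels, unary_costs, pairwise_weight):
    """Energy of a joint labeling on a 1-D chain of pixels.

    unary_costs[i][k] is the cost of giving pixel i label k.
    The pairwise (Potts) term adds pairwise_weight for every
    pair of neighboring pixels whose labels disagree.
    """
    unary = sum(unary_costs[i][k] for i, k in enumerate(labels))
    pairwise = sum(pairwise_weight
                   for a, b in zip(labels, labels[1:]) if a != b)
    return unary + pairwise

def best_labeling(unary_costs, num_labels, pairwise_weight):
    """Exhaustive search over all joint labelings (fine for toy sizes only)."""
    n = len(unary_costs)
    return min(product(range(num_labels), repeat=n),
               key=lambda ls: energy(ls, unary_costs, pairwise_weight))

# Assumed toy costs: label 0 = "Sky", label 1 = "Bird".
# Pixel 1 is noisy and slightly prefers "Bird" on its own.
unary_costs = [[0.0, 2.0], [1.1, 1.0], [0.0, 2.0], [0.0, 2.0]]

independent = best_labeling(unary_costs, 2, pairwise_weight=0.0)
structured = best_labeling(unary_costs, 2, pairwise_weight=1.0)
```

With the pairwise weight set to zero, every pixel is decided on its own and the noisy pixel flips to "Bird"; with the penalty switched on, the isolated flip costs more than it gains and the labeling becomes consistent.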
Structured prediction with a generative model
Challenges in visual prediction
Properties of generative models
possible outputs
Objective function (loss) Neural Network
Training data
… Input Output
Image-to-Image Translation
“What should I do?” “How should I do it?” Input Output
Image-to-Image Translation
Input Output Ground truth
Designing loss functions
$L_2(\hat{Y}, Y) = \frac{1}{2} \sum_{h,w} \lVert Y_{h,w} - \hat{Y}_{h,w} \rVert_2^2$
Color distribution cross-entropy loss with colorfulness enhancing term. Zhang et al. 2016
Input Ground truth
Designing loss functions
Be careful what you wish for!
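One way to see why L2 regression can disappoint: when several outputs are equally plausible, the prediction that minimizes expected squared error is their average. The sketch below assumes a single chroma value per pixel with two equally likely ground-truth colors (the values -1 and +1 are made up for illustration) and searches a grid of candidate predictions.

```python
def expected_l2(prediction, targets):
    """Average of 0.5 * (prediction - target)^2 over equally likely targets."""
    return sum(0.5 * (prediction - t) ** 2 for t in targets) / len(targets)

# Assumed toy setup: the same gray input could be either of two
# saturated colors, encoded as -1.0 and +1.0 on one chroma axis.
targets = [-1.0, 1.0]

# Candidate predictions from -1.0 to 1.0 in steps of 0.1.
candidates = [c / 10 for c in range(-10, 11)]
best = min(candidates, key=lambda c: expected_l2(c, targets))
```

The minimizer lands on 0.0, halfway between the two plausible colors, which in a colorization setting corresponds to a desaturated output that matches neither mode.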
Designing loss functions
Image colorization: L2 regression [Zhang, Isola, Efros, ECCV 2016]
Super-resolution: L2 regression [Johnson, Alahi, Li, ECCV 2016]
Designing loss functions
Image colorization: Cross entropy objective, with colorfulness term [Zhang, Isola, Efros, ECCV 2016]
Super-resolution: Deep feature covariance matching objective [Johnson, Alahi, Li, ECCV 2016]
Designing loss functions
Universal loss?
Generated vs Real (classifier)
[Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio 2014]
“Generative Adversarial Network” (GANs)
Real photos
Generated images
…
…
Generator
G tries to synthesize fake images that fool D. D tries to identify the fakes.
Generator Discriminator
real or fake?
fake (0.9) real (0.1)
G tries to synthesize fake images that fool D: real or fake?
G tries to synthesize fake images that fool the best D: real or fake?
Loss Function
G’s perspective: D is a loss function. Rather than being hand-designed, it is learned.
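The idea that D acts as a learned loss for G can be sketched with the two objectives written out. This assumes D outputs the probability that its input is real, and it uses the non-saturating generator loss discussed by Goodfellow et al. 2014 rather than the raw minimax form; the function names are illustrative.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for D.

    d_real: D's 'real' probabilities on real photos (should be near 1).
    d_fake: D's 'real' probabilities on generated images (should be near 0).
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating loss for G: low when D calls the generated images real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# G's loss is computed entirely from D's outputs, so D is the loss function.
loss_when_fooled = generator_loss([0.9])
loss_when_caught = generator_loss([0.1])
```

Because D keeps training, the loss that G sees keeps adapting to G's current mistakes instead of staying fixed like L2.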
real or fake?
real!
(“Aquarius”)
real or fake pair?
fake pair
real pair
Input Output Input Output Input Output
Data from [Russakovsky et al. 2015]
BW → Color
#edges2cats [Chris Hesse]
Ivy Tsai @ivymyt Vitaly Vidmirov @vvid
Model joint configuration
A GAN, with sufficient capacity, samples from the full joint distribution when perfectly optimized. Most generative models have this property! Give them sufficient capacity and infinite data, and they are the complete solution to prediction problems.
Structured Prediction
1/0 N pixels N pixels
Rather than penalizing if output image looks fake, penalize if each overlapping patch in output looks fake
[Li & Wand 2016] [Shrivastava et al. 2017] [Isola et al. 2017]
Shrinking the capacity: Patch Discriminator
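A rough sketch of the patch idea, under stated assumptions: the image is a plain 2-D list of intensities, patches are cut out explicitly, and `toy_scorer` is a hypothetical stand-in for a trained discriminator. Real implementations get the same effect with a fully convolutional D whose output map has one score per patch.

```python
import math

def extract_patches(image, size, stride):
    """All size x size patches of a 2-D list, stepping by `stride`."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            patches.append([row[left:left + size]
                            for row in image[top:top + size]])
    return patches

def toy_scorer(patch):
    """Hypothetical D: 'real' probability drops as the patch mean leaves 0.5."""
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    return max(1e-6, 1.0 - abs(mean - 0.5))

def patch_fake_loss(image, patch_scorer, size, stride):
    """Average over overlapping patches of the penalty -log(p_real)."""
    scores = [patch_scorer(p) for p in extract_patches(image, size, stride)]
    return sum(-math.log(s) for s in scores) / len(scores)

clean = [[0.5] * 4 for _ in range(4)]
# Same image with one corner pixel pushed to 1.0.
damaged = [row[:] for row in clean]
damaged[0][0] = 1.0

clean_loss = patch_fake_loss(clean, toy_scorer, size=2, stride=1)
damaged_loss = patch_fake_loss(damaged, toy_scorer, size=2, stride=1)
```

Only the patches that overlap the damaged corner contribute to the penalty, so the loss judges local texture rather than global layout.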
Input 1x1 Discriminator
Data from [Tylecek, 2013]
Labels → Facades
Input 16x16 Discriminator
Data from [Tylecek, 2013]
Labels → Facades
Input 70x70 Discriminator
Data from [Tylecek, 2013]
Labels → Facades
Input Full image Discriminator
Data from [Tylecek, 2013]
Labels → Facades
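The discriminator sizes above (1x1, 16x16, 70x70) refer to the receptive field of one output unit. The sketch below computes a receptive field from a list of (kernel, stride) pairs. The layer lists are assumptions, taken from configurations commonly used in pix2pix implementations and not stated on the slides.

```python
def receptive_field(layers):
    """Receptive field of one output unit.

    layers: list of (kernel_size, stride) tuples, ordered input to output.
    Walks backward from the output using rf_in = (rf_out - 1) * stride + kernel.
    """
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf

# Assumed layer configurations (not from the slides).
pixel_d = [(1, 1), (1, 1)]                           # 1x1 convolutions only
small_patch_d = [(4, 2), (4, 1), (4, 1)]             # 4x4 convolutions
large_patch_d = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

sizes = [receptive_field(cfg)
         for cfg in (pixel_d, small_patch_d, large_patch_d)]
```

Under these assumed configurations the three results are 1, 16 and 70, matching the patch sizes named in the examples.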
Patch Discriminator
Properties of generative models
possible outputs
→ Use a deep net, D, to model output!
Gaussian noise Synthesized image
Can we generate images from scratch?
Generator
[Goodfellow et al., 2014]
G tries to synthesize fake images that fool D. D tries to identify the fakes.
Generator Discriminator
real or fake?
[Goodfellow et al., 2014]
GANs are implicit generative models
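"Implicit" can be illustrated with a toy sampler: the model is defined only by a procedure that turns noise into samples, with no function that returns a probability for a given output. The `generator` below is a hypothetical fixed map standing in for a trained network.

```python
import random

def generator(z):
    """Hypothetical stand-in for a trained G: a deterministic map from noise to a sample."""
    return [2.0 * z[0] + 1.0, z[0] * z[1]]

def sample(rng):
    """Draw Gaussian noise and push it through the generator."""
    z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    return generator(z)

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample(rng) for _ in range(1000)]

# Sampling is easy, but nothing here can answer "what is p(x) for this x?":
# the distribution exists only implicitly, through the samples.
```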
Progressive GAN [Karras et al., 2018]
Manipulating Attributes of Natural Scenes via Hallucination [Karacan et al., 2018]
Semantic layout: sky, mountain, ground
Predictions for target attributes: night; sunset; snow; winter; spring and clouds; moist, rain and fog; flowers
Properties of generative models
possible outputs
→ Use a deep net, D, to model output!
→ Generator is stochastic, learns to match data distribution
Paired data
Unpaired data Paired data
Jun-Yan Zhu Taesung Park
real or fake pair?
No input-output pairs!
real or fake?
Usually loss functions check if output matches a target instance.
GAN loss checks if output is part of an admissible set.
Gaussian → Target distribution
Horses → Zebras
Real!
Real too!
Nothing to force output to correspond to input
[Zhu et al. 2017], [Yi et al. 2017], [Kim et al. 2017]
Cycle-Consistent Adversarial Networks
Cycle Consistency Loss
Cycle Consistency Loss
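A minimal sketch of the cycle consistency term, assuming the L1 form used by Zhu et al. 2017: translating to the other domain and back should return the original. G and F here are toy list-to-list functions rather than networks, and the per-element mean is a scaling choice made for this example.

```python
def l1(a, b):
    """Mean absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(x, y, G, F):
    """|| F(G(x)) - x ||_1 + || G(F(y)) - y ||_1 for G: X -> Y and F: Y -> X."""
    return l1(F(G(x)), x) + l1(G(F(y)), y)

# Toy translators (assumed): G doubles every value.
G = lambda v: [2.0 * t for t in v]
F_inverse = lambda v: [t / 2.0 for t in v]   # undoes G exactly
F_ignore = lambda v: [0.0 for _ in v]        # ignores its input

x = [1.0, 2.0, 3.0]
y = [4.0, 6.0]

consistent = cycle_consistency_loss(x, y, G, F_inverse)
inconsistent = cycle_consistency_loss(x, y, G, F_ignore)
```

A translator that discards its input can still produce outputs a discriminator accepts, but it pays a large cycle penalty, which is what ties each output to its input.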
Collection Style Transfer
Photograph @ Alexei Efros Monet Van Gogh Cezanne Ukiyo-e
Cezanne Ukiyo-e Monet Input Van Gogh
Monet's paintings → photos
Failure case