CMP722 ADVANCED COMPUTER VISION Lecture #8 Image Synthesis




slide-1
SLIDE 1

Lecture #8 – Image Synthesis

Aykut Erdem // Hacettepe University // Spring 2019

CMP722

ADVANCED COMPUTER VISION

Illustration: StyleGAN trained on Portrait by Yuli-Ban

slide-2
SLIDE 2
  • imitation learning
  • reinforcement learning
  • why vision?
  • connecting language and vision to actions

  • case study: embodied QA

Previously on CMP722

Image credit: Three Robots (Love, Death & Robots, 2019)

slide-3
SLIDE 3

Lecture overview

  • image synthesis via generative models
  • conditional generative models
  • structured vs unstructured prediction
  • image-to-image translation
  • generative adversarial networks
  • cycle-consistent adversarial networks
  • Disclaimer: Much of the material and slides for this lecture were borrowed from

—Bill Freeman, Antonio Torralba and Phillip Isola’s MIT 6.869 class

3

slide-4
SLIDE 4

Classifier

image X

“Fish”

label Y

Image classification

slide-5
SLIDE 5

Generator

image X

“Fish”

label Y

Image synthesis

slide-6
SLIDE 6

In vision, this is usually what we are interested in! Model of high-dimensional structured data

X is high-dimensional!

Image synthesis via generative modeling

slide-7
SLIDE 7

Gaussian noise Synthesized image

Generative Model
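The noise-to-image mapping above can be sketched as a tiny fully connected generator. This is an illustrative NumPy toy (the `make_generator` helper and all layer sizes are my own choices, not an architecture from the slides):

```python
import numpy as np

def make_generator(z_dim=16, img_hw=8, hidden=32, seed=0):
    """Toy MLP generator: Gaussian noise z -> image with pixels in [0, 1]."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (z_dim, hidden))
    W2 = rng.normal(0.0, 0.1, (hidden, img_hw * img_hw))

    def generate(z):
        h = np.tanh(z @ W1)                  # hidden features
        x = 1.0 / (1.0 + np.exp(-(h @ W2)))  # sigmoid -> pixel intensities
        return x.reshape(-1, img_hw, img_hw)

    return generate

# Sample a batch of "images" from Gaussian noise.
G = make_generator()
z = np.random.default_rng(1).standard_normal((4, 16))
imgs = G(z)
print(imgs.shape)  # (4, 8, 8)
```

Untrained, this only produces noise-like outputs; training the weights is what the rest of the lecture is about.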

slide-8
SLIDE 8

Synthesized image “bird”

Conditional Generative Model
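One common way to condition a generator on a class label, sketched here as a NumPy toy (the `conditional_generate` helper and all dimensions are hypothetical), is to concatenate a label encoding to the noise vector before the first layer:

```python
import numpy as np

def conditional_generate(z, label_onehot, W):
    """Condition the generator by concatenating the class label to the noise."""
    inp = np.concatenate([z, label_onehot], axis=-1)
    return 1.0 / (1.0 + np.exp(-(inp @ W)))  # pixels in [0, 1]

rng = np.random.default_rng(0)
z = rng.standard_normal((1, 16))           # noise vector
y = np.eye(10)[[2]]                        # one-hot class, e.g. "bird"
W = rng.normal(0.0, 0.1, (16 + 10, 64))    # single linear layer -> 8x8 "image"
img = conditional_generate(z, y, W).reshape(8, 8)
print(img.shape)  # (8, 8)
```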

slide-9
SLIDE 9

Synthesized image “A yellow bird on a branch”

Conditional Generative Model

slide-10
SLIDE 10

Synthesized image

Conditional Generative Model

slide-11
SLIDE 11

Semantic segmentation

[Long et al. 2015, …]

Edge detection

[Xie et al. 2015, …] [Reed et al. 2014, …]

Text-to-photo

“this small bird has a pink breast and crown…”

Future frame prediction

[Mathieu et al. 2016, …]

Data prediction problems (“structured prediction”)

slide-12
SLIDE 12

“Sky” “Sky” “Bird” “Bird”

What’s the object class of the center pixel?

Each prediction is done independently!

slide-13
SLIDE 13

Independent prediction per-pixel Find a configuration of compatible labels

[“Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials”, Krahenbuhl and Koltun, NIPS 2011]

Input

slide-14
SLIDE 14

Define an objective that penalizes bad structure! (e.g., a graphical model)

Structured prediction

slide-15
SLIDE 15

All learning objectives we have seen in this class so far have had this form! Per-datapoint least-squares regression; per-pixel softmax regression.

Unstructured prediction
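A minimal sketch of these two unstructured objectives in NumPy (the function names are mine, not from the slides); each prediction is penalized independently, with no interaction between outputs:

```python
import numpy as np

def l2_regression_loss(y_hat, y):
    """Per-datapoint least-squares: each output penalized independently."""
    return 0.5 * np.sum((y - y_hat) ** 2)

def per_pixel_softmax_loss(logits, labels):
    """Per-pixel softmax regression: an independent classification at every
    pixel. logits: (H, W, C) array; labels: (H, W) integer class ids."""
    z = logits - logits.max(axis=-1, keepdims=True)            # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    H, W = labels.shape
    picked = log_probs[np.arange(H)[:, None], np.arange(W)[None, :], labels]
    return -picked.mean()

# Uniform logits over C = 10 classes give a loss of log(10) at every pixel.
logits = np.zeros((4, 4, 10))
labels = np.zeros((4, 4), dtype=int)
print(round(per_pixel_softmax_loss(logits, labels), 4))  # 2.3026
```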

slide-16
SLIDE 16

Structured prediction with a CRF

slide-17
SLIDE 17

Model joint configuration of all outputs y

Structured prediction with a generative model

slide-18
SLIDE 18

Challenges in visual prediction

  • 1. Output is a high-dimensional, structured object
  • 2. Uncertainty in the mapping; many plausible outputs
slide-19
SLIDE 19

Properties of generative models

  • 1. Model high-dimensional, structured output
  • 2. Model uncertainty; a whole distribution of

possible outputs

slide-20
SLIDE 20

Objective function (loss) Neural Network

Training data

… Input Output

Image-to-Image Translation

slide-21
SLIDE 21

“What should I do?” “How should I do it?” Input Output

Image-to-Image Translation

slide-22
SLIDE 22

Input Output Ground truth

Designing loss functions

$L_2(\hat{Y}, Y) = \frac{1}{2} \sum_{h,w} \big\| Y_{h,w} - \hat{Y}_{h,w} \big\|_2^2$
slide-23
SLIDE 23

$L_2(\hat{Y}, Y) = \frac{1}{2} \sum_{h,w} \big\| Y_{h,w} - \hat{Y}_{h,w} \big\|_2^2$
slide-24
SLIDE 24

Color distribution cross-entropy loss with colorfulness enhancing term. Zhang et al. 2016 Input Ground truth

Designing loss functions

slide-25
SLIDE 25
slide-26
SLIDE 26

Be careful what you wish for!

Designing loss functions

slide-27
SLIDE 27

Image colorization L2 regression Super-resolution

[Johnson, Alahi, Li, ECCV 2016]

L2 regression

[Zhang, Isola, Efros, ECCV 2016]

Designing loss functions

slide-28
SLIDE 28

Image colorization Cross entropy objective, with colorfulness term Deep feature covariance matching objective

[Johnson, Alahi, Li, ECCV 2016]

Super-resolution

[Zhang, Isola, Efros, ECCV 2016]

Designing loss functions

slide-29
SLIDE 29

Universal loss?

… …

slide-30
SLIDE 30

Generated vs Real

(classifier)

[Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio 2014]

“Generative Adversarial Network” (GANs)

Real photos

Generated images
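The two-player game can be sketched end to end on a 1-D toy problem. Below is an illustrative NumPy implementation with hand-derived gradients for a linear generator and a logistic-regression discriminator; it is a sketch of the adversarial training idea, not the network setup of Goodfellow et al. 2014:

```python
import numpy as np

rng = np.random.default_rng(0)

# Real data: samples from N(4, 1). Generator: x = a*z + b with z ~ N(0, 1).
a, b = 1.0, 0.0      # generator parameters
w, c = 0.1, 0.0      # discriminator (logistic regression) parameters
lr, batch = 0.05, 64

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

for step in range(2000):
    z = rng.standard_normal(batch)
    x_real = 4.0 + rng.standard_normal(batch)
    x_fake = a * z + b

    # Discriminator step: push D(real) -> 1 and D(fake) -> 0.
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    w -= lr * (np.mean((d_real - 1) * x_real) + np.mean(d_fake * x_fake))
    c -= lr * (np.mean(d_real - 1) + np.mean(d_fake))

    # Generator step (non-saturating loss): push D(fake) -> 1.
    d_fake = sigmoid(w * (a * z + b) + c)
    a -= lr * np.mean((d_fake - 1) * w * z)
    b -= lr * np.mean((d_fake - 1) * w)

fake = a * rng.standard_normal(1000) + b
print(round(float(np.mean(fake)), 2))  # has drifted toward the real mean, 4.0
```

The generator never sees the real data directly; its only training signal flows through the discriminator, which is exactly the "D is a learned loss function" view from a later slide.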

slide-31
SLIDE 31

Generator

slide-32
SLIDE 32

G tries to synthesize fake images that fool D. D tries to identify the fakes.

Generator Discriminator

real or fake?

slide-33
SLIDE 33

fake (0.9) real (0.1)

slide-34
SLIDE 34

G tries to synthesize fake images that fool D: real or fake?

slide-35
SLIDE 35

G tries to synthesize fake images that fool the best D: real or fake?

slide-36
SLIDE 36

Loss Function

G’s perspective: D is a loss function. Rather than being hand-designed, it is learned.

slide-37
SLIDE 37

real or fake?

slide-38
SLIDE 38

real!

(“Aquarius”)

slide-39
SLIDE 39

real or fake pair ?

slide-40
SLIDE 40

real or fake pair ?

slide-41
SLIDE 41

fake pair

slide-42
SLIDE 42

real pair

slide-43
SLIDE 43

real or fake pair ?

slide-44
SLIDE 44

Input Output Input Output Input Output

Data from [Russakovsky et al. 2015]

BW → Color

slide-45
SLIDE 45

#edges2cats [Chris Hesse]

slide-46
SLIDE 46

Ivy Tasi @ivymyt Vitaly Vidmirov @vvid

slide-47
SLIDE 47

Model joint configuration of all pixels

A GAN, with sufficient capacity, samples from the full joint distribution when perfectly optimized. Most generative models have this property! Give them sufficient capacity and infinite data, and they are the complete solution to prediction problems.

Structured Prediction

slide-48
SLIDE 48

1/0 N pixels N pixels

Rather than penalizing if output image looks fake, penalize if each overlapping patch in output looks fake

[Li & Wand 2016] [Shrivastava et al. 2017] [Isola et al. 2017]

Shrinking the capacity: Patch Discriminator
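The patch idea can be sketched as follows; `patch_scores` and the stand-in `score_fn` are hypothetical, the point being that the discriminator emits one real/fake score per overlapping patch rather than a single score for the whole image:

```python
import numpy as np

def patch_scores(img, patch=8, stride=4, score_fn=None):
    """Slide a window over the image and score each overlapping patch
    real/fake independently, instead of scoring the full image once."""
    if score_fn is None:
        # Stand-in discriminator: any function mapping a patch to [0, 1].
        score_fn = lambda p: 1.0 / (1.0 + np.exp(-p.mean()))
    H, W = img.shape
    scores = []
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            scores.append(score_fn(img[i:i + patch, j:j + patch]))
    return np.array(scores)

img = np.zeros((32, 32))
s = patch_scores(img)
print(s.shape)  # one score per overlapping patch: (49,)
```

Because the scoring function only ever sees a fixed-size patch, the same discriminator applies to arbitrarily large images, which is one of the advantages listed on the next slides.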

slide-49
SLIDE 49

Input 1x1 Discriminator

Data from [Tylecek, 2013]

Labels → Facades

slide-50
SLIDE 50

Input 16x16 Discriminator

Data from [Tylecek, 2013]

Labels → Facades

slide-51
SLIDE 51

Input 70x70 Discriminator

Data from [Tylecek, 2013]

Labels → Facades

slide-52
SLIDE 52

Input Full image Discriminator

Data from [Tylecek, 2013]

Labels → Facades

slide-53
SLIDE 53

1/0 N pixels N pixels

Rather than penalizing if output image looks fake, penalize if each overlapping patch in output looks fake

  • Faster, fewer parameters
  • More supervised observations
  • Applies to arbitrarily large images

Patch Discriminator

slide-54
SLIDE 54

Properties of generative models

  • 1. Model high-dimensional, structured output
  • 2. Model uncertainty; a whole distribution of

possible outputs

→ Use a deep net, D, to model output!

slide-55
SLIDE 55

Gaussian noise Synthesized image

Can we generate images from scratch?

slide-56
SLIDE 56

Generator

[Goodfellow et al., 2014]

slide-57
SLIDE 57

G tries to synthesize fake images that fool D. D tries to identify the fakes.

Generator Discriminator

real or fake?

[Goodfellow et al., 2014]

slide-58
SLIDE 58

GANs are implicit generative models

slide-59
SLIDE 59

Progressive GAN [Karras et al., 2018]

slide-60
SLIDE 60

Progressive GAN [Karras et al., 2018]

slide-61
SLIDE 61

61

Semantic layout

sky mountain ground

Manipulating Attributes of Natural Scenes via Hallucination [Karacan et al., 2018]

slide-62
SLIDE 62

62

Manipulating Attributes of Natural Scenes via Hallucination [Karacan et al., 2018]

slide-63
SLIDE 63

63

prediction

night

Manipulating Attributes of Natural Scenes via Hallucination [Karacan et al., 2018]

slide-64
SLIDE 64

64

prediction

sunset

prediction

Manipulating Attributes of Natural Scenes via Hallucination [Karacan et al., 2018]

slide-65
SLIDE 65

65

snow

prediction

Manipulating Attributes of Natural Scenes via Hallucination [Karacan et al., 2018]

slide-66
SLIDE 66

66

winter

prediction

Manipulating Attributes of Natural Scenes via Hallucination [Karacan et al., 2018]

slide-67
SLIDE 67

67

Spring and clouds

prediction

Manipulating Attributes of Natural Scenes via Hallucination [Karacan et al., 2018]

slide-68
SLIDE 68

68

Moist, rain and fog

prediction

Manipulating Attributes of Natural Scenes via Hallucination [Karacan et al., 2018]

slide-69
SLIDE 69

69

flowers

prediction

Manipulating Attributes of Natural Scenes via Hallucination [Karacan et al., 2018]

slide-70
SLIDE 70

Properties of generative models

  • 1. Model high-dimensional, structured output
  • 2. Model uncertainty; a whole distribution of

possible outputs

→ Use a deep net, D, to model output! → Generator is stochastic, learns to match data distribution

slide-71
SLIDE 71

Paired data

slide-72
SLIDE 72

Unpaired data Paired data

Jun-Yan Zhu Taesung Park

slide-73
SLIDE 73

real or fake pair ?

slide-74
SLIDE 74

real or fake pair ?

No input-output pairs!

slide-75
SLIDE 75

real or fake?

Usually loss functions check if the output matches a target instance; the GAN loss checks if the output is part of an admissible set.

slide-76
SLIDE 76

Gaussian Target distribution

slide-77
SLIDE 77

Horses Zebras

slide-78
SLIDE 78

Real!

slide-79
SLIDE 79

Real too!

Nothing to force output to correspond to input

slide-80
SLIDE 80

[Zhu et al. 2017], [Yi et al. 2017], [Kim et al. 2017]

Cycle-Consistent Adversarial Networks

slide-81
SLIDE 81

Cycle-Consistent Adversarial Networks

slide-82
SLIDE 82

Cycle Consistency Loss

slide-83
SLIDE 83

Cycle Consistency Loss
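The cycle-consistency objective is short enough to state directly. Below is a NumPy sketch; the toy mappings `G` and `F` are chosen to invert each other exactly, which is the ideal case the loss encourages:

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """L_cyc = E||F(G(x)) - x||_1 + E||G(F(y)) - y||_1,
    where G maps X -> Y (e.g. horses -> zebras) and F maps Y -> X."""
    return np.mean(np.abs(F(G(x)) - x)) + np.mean(np.abs(G(F(y)) - y))

# Toy mappings that invert each other exactly, so the loss is zero.
G = lambda x: -x
F = lambda y: -y
x = np.random.default_rng(0).standard_normal((4, 8, 8))
y = G(x)
print(cycle_consistency_loss(x, y, G, F))  # 0.0
```

Combined with the two adversarial losses, this term is what forces the output to correspond to the input even though no paired examples exist.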

slide-84
SLIDE 84
slide-85
SLIDE 85
slide-86
SLIDE 86

Collection Style Transfer

Photograph @ Alexei Efros Monet Van Gogh Cezanne Ukiyo-e

slide-87
SLIDE 87

Cezanne Ukiyo-e Monet Input Van Gogh

slide-88
SLIDE 88

Monet's paintings → photos

slide-89
SLIDE 89

Monet's paintings → photos

slide-90
SLIDE 90

Failure case

slide-91
SLIDE 91

Failure case

slide-92
SLIDE 92
  • Image Synthesis in Multi-Contrast MRI [Ul Hassan Dar et al. 2019]
  • Fig. 1. The pGAN method is based on a conditional adversarial network.
  • Fig. 2. The cGAN method is based on a conditional adversarial network.
  • Fig. 3. The proposed approach was demonstrated for synthesis of T-weighted images.
slide-93
SLIDE 93

Next Lecture: Graph Networks

93