
CMP722 ADVANCED COMPUTER VISION Lecture #8 Image Synthesis



  1. Illustration: StyleGAN trained on Portrait by Yuli-Ban CMP722 ADVANCED COMPUTER VISION Lecture #8 – Image Synthesis Aykut Erdem // Hacettepe University // Spring 2019

  2. Image credit: Three Robots (Love, Death & Robots, 2019) Previously on CMP722 • imitation learning • reinforcement learning • why vision? • connecting language and vision to actions • case study: embodied QA

  3. Lecture overview • image synthesis via generative models • conditional generative models • structured vs unstructured prediction • image-to-image translation • generative adversarial networks • cycle-consistent adversarial networks • Disclaimer: Much of the material and slides for this lecture were borrowed from Bill Freeman, Antonio Torralba, and Phillip Isola’s MIT 6.869 class

  4. Image classification Classifier “Fish” … image X label Y

  5. Image synthesis Generator “Fish” label Y image X

  6. Image synthesis via generative modeling X is high-dimensional! Model of high-dimensional structured data In vision, this is usually what we are interested in!

  7. Generative Model Gaussian noise Synthesized image

  8. Conditional Generative Model “bird” Synthesized image

  9. Conditional Generative Model “A yellow bird on a branch” Synthesized image

  10. Conditional Generative Model Synthesized image

  11. Data prediction problems (“structured prediction”) Semantic segmentation Edge detection [Xie et al. 2015, …] [Long et al. 2015, …] Future frame prediction Text-to-photo “ this small bird has a pink breast and crown…” [Reed et al. 2014, …] [Mathieu et al. 2016, …]

  12. What’s the object class of the center pixel? “Bird” “Bird” “Sky” “Sky” Each prediction is done independently!

  13. Independent prediction vs. finding a configuration of compatible per-pixel labels [“Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials”, Krahenbuhl and Koltun, NIPS 2011]

  14. Structured prediction Define an objective that penalizes bad structure! (e.g., a graphical model)

  15. Unstructured prediction All learning objectives we have seen in this class so far had this form! Per-datapoint least-squares regression; per-pixel softmax regression.
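The formulas for the two objectives named on this slide were lost in extraction; in standard notation (symbols assumed here, not taken from the slides), they can be written as:

```latex
% Per-datapoint least-squares regression over N training pairs (x_i, y_i):
\mathcal{L}_{\mathrm{LS}} = \frac{1}{N} \sum_{i=1}^{N} \bigl\| y_i - f(x_i) \bigr\|_2^2
% Per-pixel softmax (cross-entropy) regression over classes c at each pixel (h, w):
\mathcal{L}_{\mathrm{CE}} = - \sum_{h,w} \sum_{c} y_{h,w,c} \, \log \mathrm{softmax}\bigl( f(x)_{h,w} \bigr)_c
```

Both decompose into a sum of independent per-datapoint (or per-pixel) terms, which is exactly what makes them unstructured.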

  16. Structured prediction with a CRF

  17. Structured prediction with a generative model Model joint configuration of all outputs y

  18. Challenges in visual prediction 1. Output is a high-dimensional, structured object 2. Uncertainty in the mapping, many plausible outputs

  19. Properties of generative models 1. Model high-dimensional, structured output 2. Model uncertainty; a whole distribution of possible outputs

  20. Image-to-Image Translation Input Output Training data Objective function Neural Network … (loss)

  21. Image-to-Image Translation Input Output “What should I do?” “How should I do it?”

  22. Designing loss functions Input Output Ground truth

  $$\mathcal{L}_2(Y, \hat{Y}) = \frac{1}{2} \sum_{h,w} \bigl\| Y_{h,w} - \hat{Y}_{h,w} \bigr\|_2^2$$
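The per-pixel L2 objective on this slide can be sketched in NumPy as follows (the function name and toy shapes are illustrative, not from the lecture):

```python
import numpy as np

def l2_loss(y_pred, y_true):
    """Per-pixel L2 loss: 0.5 * sum over (h, w) of squared differences."""
    return 0.5 * np.sum((y_pred - y_true) ** 2)

# Toy 2x2 single-channel "images".
y_true = np.zeros((2, 2))
y_pred = np.ones((2, 2))
print(l2_loss(y_pred, y_true))  # 0.5 * 4 = 2.0
```

Because the loss averages over all plausible outputs, minimizing it tends to produce blurry, desaturated predictions, which motivates the learned losses discussed later.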

  23. $$\mathcal{L}_2(Y, \hat{Y}) = \frac{1}{2} \sum_{h,w} \bigl\| Y_{h,w} - \hat{Y}_{h,w} \bigr\|_2^2$$

  24. Designing loss functions Input Zhang et al. 2016 Ground truth Color-distribution cross-entropy loss with a colorfulness-enhancing term.

  25. Designing loss functions Be careful what you wish for!

  26. Designing loss functions Image colorization L2 regression [Zhang, Isola, Efros, ECCV 2016] Super-resolution L2 regression [Johnson, Alahi, Li, ECCV 2016]

  27. Designing loss functions Image colorization Cross entropy objective, with colorfulness term [Zhang, Isola, Efros, ECCV 2016] Super-resolution Deep feature covariance matching objective [Johnson, Alahi, Li, ECCV 2016]

  28. Universal loss? … …

  29. Generated images “Generative Adversarial Networks” (GANs) Generated … … vs Real (classifier) Real photos [Goodfellow, Pouget-Abadie, Mirza, Xu, … Warde-Farley, Ozair, Courville, Bengio 2014]

  30. Generator

  31. real or fake? Generator Discriminator G tries to synthesize fake images that fool D; D tries to identify the fakes
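The game between G and D described on this slide is the minimax objective from the Goodfellow et al. 2014 paper cited on slide 29:

```latex
\min_G \max_D \;\; \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
\;+\; \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

D maximizes this value by assigning high probability to real images and low probability to generated ones; G minimizes it by making D(G(z)) large.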

  32. fake (0.9) real (0.1)

  33. real or fake? G tries to synthesize fake images that fool D

  34. real or fake? G tries to synthesize fake images that fool the best D

  35. Loss Function From G’s perspective, D is a loss function. Rather than being hand-designed, it is learned.
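The idea on this slide, that D acts as a learned loss for G, can be made concrete with the standard GAN loss terms (a minimal sketch; function names are ours, and the 0.9/0.1 scores below are toy values, not measurements from the lecture):

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator loss: push D(real) toward 1 and D(fake) toward 0."""
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

def g_loss(d_fake):
    """Non-saturating generator loss: push D(fake) toward 1."""
    return -np.mean(np.log(d_fake))

# A well-trained D: scores a real image 0.9 and a fake 0.1.
print(round(d_loss(np.array([0.9]), np.array([0.1])), 3))  # 0.211
print(round(g_loss(np.array([0.1])), 3))                   # 2.303
```

Training alternates: one gradient step minimizing d_loss over D's parameters, then one step minimizing g_loss over G's parameters, so the "loss function" for G keeps improving as D trains.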

  36. real or fake?

  37. real! (“Aquarius”)
