Illustration: StyleGAN trained on Portrait by Yuli-Ban CMP722 ADVANCED COMPUTER VISION Lecture #8 – Image Synthesis Aykut Erdem // Hacettepe University // Spring 2019
Image credit: Three Robots (Love, Death & Robots, 2019) Previously on CMP722 • imitation learning • reinforcement learning • why vision? • connecting language and vision to actions • case study: embodied QA
Lecture overview • image synthesis via generative models • conditional generative models • structured vs unstructured prediction • image-to-image translation • generative adversarial networks • cycle-consistent adversarial networks • Disclaimer: Much of the material and slides for this lecture were borrowed from Bill Freeman, Antonio Torralba and Phillip Isola's MIT 6.869 class
Image classification Classifier “Fish” … image X label Y
Image synthesis Generator “Fish” label Y image X
Image synthesis via generative modeling X is high-dimensional! Model of high-dimensional structured data In vision, this is usually what we are interested in!
Generative Model Gaussian noise Synthesized image
Conditional Generative Model “bird” Synthesized image
Conditional Generative Model “A yellow bird on a branch” Synthesized image
Conditional Generative Model Synthesized image
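In distributional terms, the models on the last few slides differ only in what they condition on: nothing (pure noise), a class label, a caption, or another input such as an image. A schematic summary (notation is illustrative, not taken from the slides):
$$x \sim p_\theta(x \mid z) \quad \text{vs.} \quad x \sim p_\theta(x \mid z, c), \qquad z \sim \mathcal{N}(0, I),$$
where $c$ is the conditioning signal (label, caption, or image).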
Data prediction problems ("structured prediction") • Semantic segmentation [Long et al. 2015, …] • Edge detection [Xie et al. 2015, …] • Future frame prediction [Mathieu et al. 2016, …] • Text-to-photo ("this small bird has a pink breast and crown…") [Reed et al. 2014, …]
What’s the object class of the center pixel? “Bird” “Bird” “Sky” “Sky” Each prediction is done independently!
Independent prediction: Input → per-pixel labels. Better: find a configuration of compatible labels. ["Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials", Krahenbuhl and Koltun, NIPS 2011]
Structured prediction Define an objective that penalizes bad structure! (e.g., a graphical model)
Unstructured prediction All learning objectives we have seen in this class so far had this form! Per-datapoint least-squares regression: Per-pixel softmax regression:
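For reference, the two objectives named on the slide have the standard forms (a reconstruction; symbols are illustrative):
$$\mathcal{L}_{\text{LS}} = \sum_i \tfrac{1}{2}\,\big\| f_\theta(x_i) - y_i \big\|_2^2, \qquad \mathcal{L}_{\text{softmax}} = -\sum_i \sum_{h,w} \log p_\theta\big(y_i^{(h,w)} \mid x_i\big).$$
Both factor into independent per-datapoint (or per-pixel) terms; nothing couples different output dimensions, which is exactly what makes them "unstructured".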
Structured prediction with a CRF
Structured prediction with a generative model Model joint configuration of all outputs y
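For concreteness, a typical CRF objective in the spirit of the dense CRF cited above combines unary scores with pairwise compatibility terms (notation is illustrative, not taken from the slides):
$$E(y \mid x) = \sum_i \psi_u(y_i \mid x) + \sum_{i<j} \psi_p(y_i, y_j \mid x),$$
whereas a generative model instead learns the full joint distribution $p(y \mid x)$ over all output pixels at once.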
Challenges in visual prediction 1. Output is a high-dimensional, structured object 2. Uncertainty in the mapping, many plausible outputs
Properties of generative models 1. Model high-dimensional, structured output 2. Model uncertainty; a whole distribution of possible outputs
Image-to-Image Translation: Input → Neural Network → Output, trained on paired training data with an objective function (loss).
Image-to-Image Translation: Input → Output. "What should I do?" "How should I do it?"
Designing loss functions: Input, Output, Ground truth.
$$\mathcal{L}_2(\hat{Y}, Y) = \frac{1}{2} \sum_{h,w} \big\| \hat{Y}_{h,w} - Y_{h,w} \big\|_2^2$$
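A minimal sketch of this per-pixel L2 objective in PyTorch; the function name and the `generator` used in the usage comment are placeholders, not code from the lecture:

```python
import torch


def l2_pixel_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """1/2 * sum of squared per-pixel differences, averaged over the batch."""
    return 0.5 * ((pred - target) ** 2).flatten(1).sum(dim=1).mean()


# usage (shapes are illustrative):
# pred = generator(x)            # (B, 3, H, W)
# loss = l2_pixel_loss(pred, y)  # scalar
# loss.backward()
```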
Designing loss functions: Input vs. Ground truth [Zhang et al. 2016]. Color distribution cross-entropy loss with colorfulness enhancing term.
Designing loss functions Be careful what you wish for!
Designing loss functions Image colorization L2 regression [Zhang, Isola, Efros, ECCV 2016] Super-resolution L2 regression [Johnson, Alahi, Li, ECCV 2016]
Designing loss functions Image colorization Cross entropy objective, with colorfulness term [Zhang, Isola, Efros, ECCV 2016] Super-resolution Deep feature covariance matching objective [Johnson, Alahi, Li, ECCV 2016]
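The "deep feature covariance matching" objective used for super-resolution matches second-order statistics of pretrained-network features rather than raw pixels. A rough sketch with a torchvision VGG backbone; the layer choice, normalization, and weighting are assumptions, not taken from Johnson et al.:

```python
import torch
import torchvision.models as models

# Frozen VGG16 feature extractor (truncated early; input normalization omitted for brevity).
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)


def gram(feat: torch.Tensor) -> torch.Tensor:
    # feat: (B, C, H, W) -> (B, C, C) channel covariance (Gram) matrix
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)


def feature_covariance_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Match feature covariances of the prediction and the ground truth.
    return ((gram(vgg(pred)) - gram(vgg(target))) ** 2).mean()
```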
Universal loss? … …
"Generative Adversarial Networks" (GANs): a classifier is trained to tell generated images from real photos. [Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio 2014]
Generator
real or fake? Generator vs. Discriminator: G tries to synthesize fake images that fool D; D tries to identify the fakes.
fake (0.9), real (0.1)
real or fake? G tries to synthesize fake images that fool D.
real or fake? G tries to synthesize fake images that fool the best D.
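The adversarial game sketched across these slides is the standard minimax objective of Goodfellow et al. (2014):
$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big].$$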
Loss Function: from G's perspective, D is a loss function. Rather than being hand-designed, it is learned.
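A minimal sketch of the alternating updates this implies, in PyTorch; G, D, the optimizers, and the data tensor are placeholders, not code from the lecture (the generator uses the common non-saturating loss variant):

```python
import torch
import torch.nn.functional as F


def gan_step(G, D, opt_G, opt_D, real: torch.Tensor, z_dim: int = 128):
    """One alternating update. D is assumed to output raw logits."""
    b = real.size(0)
    z = torch.randn(b, z_dim, device=real.device)
    fake = G(z)

    # Discriminator: push real toward 1, fake toward 0 (detach so G is not updated here).
    d_real = D(real)
    d_fake = D(fake.detach())
    d_loss = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator: fool D, i.e. use D as a learned loss on the synthesized images.
    g_out = D(fake)
    g_loss = F.binary_cross_entropy_with_logits(g_out, torch.ones_like(g_out))
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```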
real or fake?
real! (“Aquarius”)