StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
The Problem: Synthesizing photo-realistic images from text descriptions. Existing text-to-image GANs can roughly reflect the meaning of a description, but they fail to produce high-resolution images with convincing detail.
2-Stage Network
● Stage-I
  ○ Generates 64x64 images
  ○ Structural information
  ○ Low detail
● Stage-II
  ○ Requires Stage-I output
  ○ Upsamples to 256x256
  ○ Higher detail, photorealistic
Both stages take in the same conditioned textual input (see the sketch below).
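To make the data flow concrete, here is a purely illustrative sketch, assuming PyTorch; the two generator functions are stand-in stubs with the right shapes, not the authors' code:

```python
import torch

# Stand-in stubs for the two generators; output shapes match the slide, logic does not.
def stage1_generator(c_hat, z):
    return torch.zeros(c_hat.shape[0], 3, 64, 64)    # 64x64 RGB: structure, low detail

def stage2_generator(img64, c_hat):
    return torch.zeros(img64.shape[0], 3, 256, 256)  # 256x256 RGB: photorealistic detail

c_hat = torch.randn(1, 128)  # conditioned text vector (dimension illustrative)
z = torch.randn(1, 100)      # noise from a unit Gaussian
img64 = stage1_generator(c_hat, z)       # Stage-I
img256 = stage2_generator(img64, c_hat)  # Stage-II, conditioned on the same text
```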
Generative Adversarial Networks (GAN)
Composed of two models that are alternately trained to compete with each other.
● The Generator G
  ○ Optimized to generate images that are difficult for the discriminator D to differentiate from real images.
● The Discriminator D
  ○ Optimized to distinguish real images from the synthetic images generated by G.
Loss Functions
Scores from the Discriminator feed the standard minimax objective:
\[ \min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] \]
Then alternate: D is trained to maximize this value while G is trained to minimize it.
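A minimal sketch of one alternating training step, assuming PyTorch; G, D, and the optimizers are placeholders for whatever networks are being trained, and the generator update uses the common non-saturating variant rather than literally minimizing log(1 - D(G(z))):

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_g, opt_d, real_images, z):
    # --- Update D: maximize log D(x) + log(1 - D(G(z))) ---
    opt_d.zero_grad()
    fake_images = G(z).detach()  # block gradients from flowing into G
    real_scores = D(real_images)
    fake_scores = D(fake_images)
    d_loss = F.binary_cross_entropy(real_scores, torch.ones_like(real_scores)) \
           + F.binary_cross_entropy(fake_scores, torch.zeros_like(fake_scores))
    d_loss.backward()
    opt_d.step()

    # --- Update G: fool D (non-saturating surrogate for minimizing log(1 - D(G(z)))) ---
    opt_g.zero_grad()
    scores = D(G(z))
    g_loss = F.binary_cross_entropy(scores, torch.ones_like(scores))
    g_loss.backward()
    opt_g.step()
```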
Stage-I Generator
● c - vector representing the input sentence
● z - noise sampled from a unit Gaussian distribution
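The generator consumes these two vectors concatenated together; a tiny sketch, assuming PyTorch, with all dimensions illustrative:

```python
import torch

batch_size, cond_dim, z_dim = 16, 128, 100
c_hat = torch.randn(batch_size, cond_dim)  # stands in for the conditioned text vector
z = torch.randn(batch_size, z_dim)         # noise from a unit Gaussian
g_input = torch.cat([c_hat, z], dim=1)     # the generator upsamples from this vector
```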
Actually Creating Images
Nice "deconvolution" animations exist, but really the network upsamples the activation maps using nearest-neighbor interpolation, then applies an ordinary convolution.
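One such up-sampling block might look like this, assuming PyTorch; channel counts are illustrative:

```python
import torch.nn as nn

def up_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),         # nearest-neighbor resize
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # ordinary conv, not transposed
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```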
Stage-I Discriminator
Down-Sampling
● Images
  ○ Stride-2 convolutions, Batch Norm., Leaky ReLU
  ○ 64 x 64 x 3 → 4 x 4 x 1024
● Text
  ○ Fully-connected layer: φt → 128
  ○ Spatially replicate to 4 x 4 x 128
● Depth Concatenate
  ○ Total of 4 x 4 x 1152
Score
● 1x1 convolution, followed by 4x4 convolution
  ○ Produces scalar value between 0 and 1
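Putting those pieces together, here is a sketch of the Stage-I discriminator, assuming PyTorch; the channel counts follow the slide (64x64x3 → 4x4x1024 image path, φt → 128 text path), while details like exact normalization placement are assumptions:

```python
import torch
import torch.nn as nn

class StageIDiscriminator(nn.Module):
    def __init__(self, embed_dim=1024):
        super().__init__()
        def down(in_ch, out_ch):  # stride-2 conv, Batch Norm, Leaky ReLU
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 4, stride=2, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.LeakyReLU(0.2, inplace=True),
            )
        self.img_path = nn.Sequential(  # 64x64x3 -> 4x4x1024
            down(3, 128), down(128, 256), down(256, 512), down(512, 1024)
        )
        self.text_fc = nn.Linear(embed_dim, 128)  # phi_t -> 128
        self.score = nn.Sequential(
            nn.Conv2d(1024 + 128, 1024, 1),  # 1x1 conv over the 4x4x1152 stack
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(1024, 1, 4),           # 4x4 conv -> single value
            nn.Sigmoid(),                    # scalar between 0 and 1
        )

    def forward(self, img, phi_t):
        h = self.img_path(img)  # B x 1024 x 4 x 4
        t = self.text_fc(phi_t).view(-1, 128, 1, 1).expand(-1, 128, 4, 4)  # spatial replication
        return self.score(torch.cat([h, t], dim=1)).view(-1)  # one score per image
```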
Stage-II Generator
● Takes in…
  ○ Stage-I's image
  ○ Conditioning Augmentation vector representing the input text
● Downsampling via CNN, Batch Norm, Leaky ReLU
● Residual blocks, similar to ResNet
  ○ To jointly encode image and text features
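A minimal sketch of one such residual block, assuming PyTorch; the channel count is illustrative:

```python
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch=512):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.BatchNorm2d(ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))  # skip connection, as in ResNet
```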
Conditioning Augmentation
Text Encoding
● Uses a "hybrid character-level convolutional recurrent neural network"
● Same as Reed et al.'s "Generative Adversarial Text to Image Synthesis" paper
Augmentation
● Randomly sample latent variables ĉ from the independent Gaussian distribution N(μ(φt), Σ(φt)), where the mean μ and covariance Σ are functions of the text embedding φt
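A minimal sketch of the augmentation step via the reparameterization trick, assuming PyTorch; the class name and dimensions are illustrative, and Σ(φt) is taken to be diagonal:

```python
import torch
import torch.nn as nn

class CondAugment(nn.Module):
    def __init__(self, embed_dim=1024, cond_dim=128):
        super().__init__()
        self.fc = nn.Linear(embed_dim, cond_dim * 2)  # predicts mu(phi_t) and log-variance

    def forward(self, phi_t):
        mu, logvar = self.fc(phi_t).chunk(2, dim=1)
        eps = torch.randn_like(mu)                  # unit-Gaussian noise
        c_hat = mu + eps * torch.exp(0.5 * logvar)  # sample from N(mu(phi_t), Sigma(phi_t))
        return c_hat
```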
Variations due purely to Conditioning Augmentation
The noise vector z and the text embedding φt are fixed for each row. Only the samples ĉ drawn from the distribution N(μ(φt), Σ(φt)) actually change between images.
Stage-II Discriminator
Down-sampling
● Same as Stage-I, but more layers
Loss functions
● Same as before, but now G is "encourage[d] to extract previously ignored information" in order to trick a more perceptive and detail-oriented D.
Evaluation
● State-of-the-art Inception scores: 28.47% and 20.30% improvements (on the CUB and Oxford-102 datasets, respectively)
● Human evaluators seem to like the results, too
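For reference, the Inception score being compared is the standard metric from the literature (not defined on the slide); higher is better, rewarding images that are both confidently classifiable and diverse:

\[ \mathrm{IS} = \exp\!\Big( \mathbb{E}_{x \sim p_G}\, D_{\mathrm{KL}}\big( p(y \mid x) \,\|\, p(y) \big) \Big) \]

where p(y | x) is the Inception network's label distribution for a generated image x and p(y) is its marginal over generated samples.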