ZOOM, ENHANCE, SYNTHESIZE! MAGIC UPSCALING AND MATERIAL SYNTHESIS USING DEEP LEARNING
Tuesday, 9 May 2017
Andrew Edelsten, NVIDIA Developer Technologies
DEEP LEARNING FOR ART
Active R&D but ready now
▪ Style transfer
▪ Generative networks creating images and voxels
▪ Adversarial networks (DCGAN): still early but promising
▪ DL & ML based tools from NVIDIA and partners:
  ▪ NVIDIA
  ▪ Artomatix
  ▪ Allegorithmic
  ▪ Autodesk
STYLE TRANSFER
Something Fun
▪ Doodle a masterpiece! (Content + Style)
▪ Uses a CNN to take the “style” from one image and apply it to another
▪ Sept 2015: A Neural Algorithm of Artistic Style by Gatys et al.
▪ Dec 2015: neural-style (GitHub)
▪ Mar 2016: neural-doodle (GitHub)
▪ Mar 2016: texture-nets (GitHub)
▪ Oct 2016: fast-neural-style (GitHub)
▪ 2 May 2017 (last week!): Deep Image Analogy (arXiv)
▪ Also numerous services: Vinci, Prisma, Artisto, Ostagram
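The style/content split in Gatys et al. can be sketched with a Gram matrix over CNN feature maps. A minimal numpy sketch; the feature map here is random toy data standing in for real CNN activations, not output from any of the implementations listed above:

```python
import numpy as np

def gram_matrix(features):
    """Channel-wise correlation (Gram) matrix of a C x H x W feature map.
    In Gatys et al., style is matched via Gram matrices of CNN activations,
    while content is matched on the raw activations themselves."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

def style_loss(gen_feats, style_feats):
    """Squared distance between Gram matrices of generated and style features."""
    return np.mean((gram_matrix(gen_feats) - gram_matrix(style_feats)) ** 2)

rng = np.random.default_rng(0)
a = rng.random((8, 16, 16))   # toy "feature map": 8 channels, 16x16
print(style_loss(a, a))       # 0.0: identical features have identical style
```

Because the Gram matrix discards spatial layout, two images can match in “style” while differing completely in content, which is what lets the content of one image carry the style of another.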
http://ostagram.ru/static_pages/lenta
STYLE TRANSFER
Something Useful
▪ Game remaster & texture enhancement
▪ Try Neural Style and use a real-world photo for the “style”
▪ For stylized or anime up-rez try https://github.com/nagadomi/waifu2x
▪ Experiment with art styles
▪ Dream or power-up sequences
▪ “Come Swim” by Kristen Stewart: https://arxiv.org/pdf/1701.04928v1.pdf
GAMEWORKS: MATERIALS & TEXTURES
Using DL for Game Development & Content Creation
▪ Set of tools targeting the game industry using machine learning and deep learning
▪ Launched at the Game Developers Conference in March; tools run as a web service
▪ Sign up for the beta at https://gwmt.nvidia.com
▪ Tools in this initial release:
  ▪ Photo to Material: 2Shot
  ▪ Texture Multiplier
  ▪ Super-Resolution
PHOTO TO MATERIAL
The 2Shot Tool
▪ From two photos of a surface, generate a “material”
▪ Based on a SIGGRAPH 2015 paper by NVIDIA Research & Aalto University (Finland): “Two-Shot SVBRDF Capture for Stationary Materials”
  ▪ https://mediatech.aalto.fi/publications/graphics/TwoShotSVBRDF/
▪ Input is pixel-aligned “flash” and “guide” photographs
  ▪ Use a tripod and remote shutter or bracket, or align later
▪ Use for flat surfaces with repeating patterns
MATERIAL SYNTHESIS FROM TWO PHOTOS
(Figure: flash and guide input images alongside the synthesized diffuse albedo, specular, normals, glossiness, and anisotropy maps)
TEXTURE MULTIPLIER
Organic variations of textures
▪ Put simply: texture in, new texture out
▪ Inspired by Gatys, Ecker & Bethge, “Texture Synthesis Using Convolutional Neural Networks”: https://arxiv.org/pdf/1505.07376.pdf
▪ Artomatix has a similar product, “Texture Mutation”: https://artomatix.com/
SUPER RESOLUTION
SUPER RESOLUTION
Zoom… ENHANCE!
“Can you zoom in on the license plate?” “OK!” “Can you enhance that?” “Sure!”
SUPER RESOLUTION
The task at hand
▪ Given a low-resolution image (W × H), construct a high-resolution image (n·W × n·H)
▪ Upscale (magic?)
UPSCALE: CREATE MORE PIXELS
An ill-posed task?
(Figure: the few known pixels of the given image scattered among a grid of unknown “?” pixels of the upscaled image)
TRADITIONAL APPROACH
▪ Interpolation (bicubic, Lanczos, etc.)
▪ Interpolation + sharpening (and other filtering): interpolation followed by filter-based sharpening
▪ Only a rough estimate of the data's behavior: too general
▪ Too many possibilities: an 8×8 grayscale patch already has 256^(8∗8) ≈ 10^154 pixel combinations!
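The interpolation baseline can be sketched in a few lines of numpy. Bilinear interpolation is shown here for brevity; bicubic and Lanczos follow the same pattern with wider kernels:

```python
import numpy as np

def bilinear_upscale(img, n):
    """Upscale a 2-D grayscale image by integer factor n with bilinear interpolation."""
    h, w = img.shape
    # Map each output pixel back to fractional source coordinates.
    ys = np.linspace(0, h - 1, h * n)
    xs = np.linspace(0, w - 1, w * n)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]          # vertical blend weights
    wx = (xs - x0)[None, :]          # horizontal blend weights
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy
```

Every output pixel is a weighted average of its four nearest inputs, which is exactly the “rough estimate” the slide criticizes: no upscaled pixel can contain detail that averaging cannot produce.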
A NEW APPROACH
First: narrow the possible set
▪ Within the set of all possible images sits the much smaller domain of natural images, which includes photos and textures
▪ Focus on the domain of “natural images”
A NEW APPROACH
Second: place the image in the domain, then reconstruct
▪ Data from natural images is sparse: it is compressible in some domain
▪ Then “reconstruct” images (rather than create new ones)
▪ Compress → reconstruct, with prior information and constraints
PATCH-BASED MAPPING: TRAINING
▪ Training images yield (LR, HR) pairs of patches
▪ The mapping's model parameters are trained on these low-resolution/high-resolution patch pairs
PATCH-BASED MAPPING
x_L (LR patch) → Encode → high-level information about the patch → Decode → x_H (HR patch)
PATCH-BASED MAPPING: SPARSE CODING
x_L (LR patch) → Encode → sparse code over “features” → Decode → x_H (HR patch)
PATCH FEATURES & RECONSTRUCTION
▪ An image patch can be reconstructed as a sparse linear combination of features
▪ Features are learned from the dataset over time
x = Dα = α₁d₁ + ⋯ + α_K d_K
where D is the dictionary (with atoms d₁ … d_K as columns), x is the patch, and α is the sparse code
Example: x = 0.8·dᵢ + 0.3·dⱼ + 0.5·dₖ for three active atoms dᵢ, dⱼ, dₖ
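The sparse reconstruction above, as a numpy sketch. The dictionary is random and the atom indices and coefficients are toy values, but the structure (a patch expressed as a few weighted dictionary atoms) is the same:

```python
import numpy as np

# Dictionary D: columns are learned atoms; a patch x is approximated as D @ alpha,
# where the sparse code alpha has only a few non-zero coefficients.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))       # 128 atoms for 8x8 patches (64 pixels)
alpha = np.zeros(128)
alpha[[47, 53, 74]] = [0.8, 0.3, 0.5]    # sparse code: only three active atoms
x = D @ alpha                            # reconstructed patch (flattened 8x8)

# Equivalent to the explicit weighted sum of the three active atoms:
x_sum = 0.8 * D[:, 47] + 0.3 * D[:, 53] + 0.5 * D[:, 74]
```

In a real system D is learned from training patches so that natural patches need only a few active atoms, which is what makes the encoding sparse.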
GENERALIZED PATCH-BASED MAPPING
LR patch → Mapping → high-level representation of the LR patch (“features”) → Mapping in feature space → high-level representation of the HR patch → Mapping → HR patch
GENERALIZED PATCH-BASED MAPPING
LR patch → Mapping (W₁) → Mapping in feature space (W₂) → Mapping (W₃) → HR patch
W₁, W₂, W₃: trainable parameters
MAPPING OF THE WHOLE IMAGE
Using convolutions
▪ The three stages (mapping, mapping in feature space, mapping) become convolutional operators applied to the whole LR image to produce the HR image
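A numpy sketch of the three-stage convolutional mapping (in the spirit of SRCNN-style pipelines). The kernel sizes are common choices but otherwise placeholders, and the weights here are random; a real system learns them from (LR, HR) pairs:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(img, kernel):
    """'Valid' 2-D convolution (really cross-correlation, as in CNNs)."""
    windows = sliding_window_view(img, kernel.shape)
    return np.einsum('ijkl,kl->ij', windows, kernel)

# Three stacked convolutions play the roles of: feature extraction from the
# (pre-interpolated) LR image, the mapping in feature space, and HR reconstruction.
rng = np.random.default_rng(1)
lr = rng.random((32, 32))
k1, k2, k3 = rng.random((5, 5)), rng.random((3, 3)), rng.random((5, 5))
features = np.maximum(conv2d(lr, k1), 0)        # encode + ReLU
mapped   = np.maximum(conv2d(features, k2), 0)  # mapping in feature space
hr_est   = conv2d(mapped, k3)                   # reconstruct
print(hr_est.shape)  # (22, 22): each 'valid' convolution shrinks the image
```

Because convolutions slide over the whole image, the per-patch mapping of the previous slides is applied everywhere at once, with parameters shared across all patch positions.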
AUTO-ENCODERS
input → network → output ≈ input
AUTO-ENCODER
input → Encode → features → Decode → output ≈ input
AUTO-ENCODER
Parameters: W
Inference: y = F_W(x)
Training: W = argmin_W Σᵢ Dist(xᵢ, F_W(xᵢ)), where {xᵢ} is the training set
AUTO-ENCODER
▪ Our encoder is LOSSY by definition: encoding the input discards information
SUPER-RESOLUTION AUTO-ENCODER
Parameters: W
Inference: y = F_W(x)
Training: W = argmin_W Σᵢ Dist(xᵢ, F_W(xᵢ)), where {xᵢ} is the training set
SUPER RESOLUTION AE: TRAINING
Ground-truth HR image x → Downscaling D → LR image → SR AE F_W → reconstructed HR image
W = argmin_W Σᵢ Dist(xᵢ, F_W(D(xᵢ))), where {xᵢ} is the training set
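The training setup (downscale a ground-truth HR image, super-resolve it, compare with the original) can be sketched as follows. The box-filter downscaler, MSE distance, and the nearest-neighbour stand-in for F_W are all toy choices for illustration:

```python
import numpy as np

def downscale(x, n=2):
    """Toy downscaling operator D: average each n x n block (box filter + decimate)."""
    h, w = x.shape
    return x[:h - h % n, :w - w % n].reshape(h // n, n, w // n, n).mean(axis=(1, 3))

def training_loss(hr_images, sr_fn):
    """Sum over the training set of Dist(x_i, F_W(D(x_i))), with MSE as Dist.
    sr_fn stands in for the super-resolution auto-encoder F_W."""
    return sum(np.mean((x - sr_fn(downscale(x))) ** 2) for x in hr_images)

# Nearest-neighbour upscaling as a stand-in F_W: zero loss only on blocky images.
nn_upscale = lambda lr: np.kron(lr, np.ones((2, 2)))
blocky = np.kron(np.arange(16.0).reshape(4, 4), np.ones((2, 2)))  # 8x8, constant 2x2 blocks
print(training_loss([blocky], nn_upscale))  # 0.0: D then F_W reproduces it exactly
```

The key point of the slide is that no separate HR labels are needed: every HR training image generates its own LR input via D, so minimizing this loss over W trains the SR auto-encoder end to end.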
SUPER RESOLUTION AE: INFERENCE
Given LR image x̂ → SR AE F_W → constructed HR image y = F_W(x̂)
SUPER-RESOLUTION: ILL-POSED TASK?
THE LOSS FUNCTION
THE LOSS FUNCTION
Measuring the “distance” from a good result
▪ The distance function is a key element in obtaining good results:
W = argmin_W Σᵢ Dist(xᵢ, F_W(xᵢ))
▪ The choice of loss function is an important decision
LOSS FUNCTION: MSE
▪ MSE (Mean Squared Error): (1/N) ‖x − F(x)‖²
LOSS FUNCTION: PSNR
▪ MSE (Mean Squared Error): (1/N) ‖x − F(x)‖²
▪ PSNR (Peak Signal-to-Noise Ratio): 10 · log₁₀(MAX² / MSE)
LOSS FUNCTION: HFEN
▪ MSE (Mean Squared Error): (1/N) ‖x − F(x)‖²
▪ PSNR (Peak Signal-to-Noise Ratio): 10 · log₁₀(MAX² / MSE)
▪ HFEN (High Frequency Error Norm), a perceptual loss built on a high-pass filter (see Ref A): ‖HP(x − F(x))‖₂
Ref A: http://ieeexplore.ieee.org/document/5617283/
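The three metrics, as a numpy sketch. A plain 3x3 Laplacian stands in for the Laplacian-of-Gaussian high-pass filter of the HFEN paper, and images are assumed normalized to [0, 1] (so MAX = 1.0):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def mse(x, fx):
    """Mean Squared Error between target x and reconstruction fx."""
    return np.mean((x - fx) ** 2)

def psnr(x, fx, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB."""
    return 10.0 * np.log10(max_val ** 2 / mse(x, fx))

def hfen(x, fx):
    """High Frequency Error Norm: L2 norm of a high-pass filtered error image.
    A 3x3 Laplacian stands in for the Laplacian-of-Gaussian in the paper."""
    hp = np.array([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
    err = x - fx
    win = sliding_window_view(err, hp.shape)
    return np.linalg.norm(np.einsum('ijkl,kl->ij', win, hp))

x  = np.zeros((8, 8))
fx = np.full((8, 8), 0.1)   # uniform error of 0.1
print(psnr(x, fx))          # ~20 dB (MSE = 0.01, MAX = 1.0)
print(hfen(x, fx))          # 0.0: a constant error has no high frequencies
```

The example at the end shows why HFEN is called perceptual: a smooth, uniform error is invisible to it, while MSE and PSNR penalize all errors equally regardless of frequency content.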
REGULAR LOSS
(Figure: two 4x upscaling results)
REGULAR LOSS + PERCEPTUAL LOSS
(Figure: two 4x upscaling results)
WARNING… THIS IS EXPERIMENTAL!
SUPER-RESOLUTION: GAN-BASED LOSS
x → Generator F(x) → Discriminator D → real / fake (D is also shown real HR images y)
GAN loss = −log D(F(x))
Total loss = regular (MSE + PSNR + HFEN) loss + GAN loss
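The generator-side GAN term can be sketched as follows; the sigmoid over a raw discriminator logit is an illustrative modeling choice, not part of the slide:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def gan_loss(d_logit_on_fake):
    """Generator-side GAN loss -log D(F(x)): D outputs a real/fake probability,
    modeled here as a sigmoid over a raw discriminator logit."""
    return -np.log(sigmoid(d_logit_on_fake))

# The better the generator fools D (probability of "real" -> 1), the smaller the loss.
print(gan_loss(4.0) < gan_loss(0.0) < gan_loss(-4.0))  # True
```

Because the gradient of this term comes from the discriminator rather than from pixel-wise distances, it pushes the generator toward outputs that look real, which is exactly what the regular MSE/PSNR/HFEN terms cannot express on their own.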
QUESTIONS?
▪ Extended presentation from the Game Developers Conference 2017: https://developer.nvidia.com/deep-learning-games
▪ GameWorks: Materials & Textures: https://gwmt.nvidia.com