Learning Representations for Automatic Colorization
Gustav Larsson, Michael Maire, Greg Shakhnarovich
TTI Chicago / University of Chicago
ECCV 2016
Colorization
Let us first define “colorization”.
Colorization
Definition 1: The inverse of desaturation.
Original → (Desaturate) → Grayscale → (Colorize) → Original
(Underconstrained!)
Colorization
Definition 2: An inverse of desaturation that is plausible and pleasing to a human observer (Our Method).
• Def. 1: Training + Quantitative Evaluation
• Def. 2: Qualitative Evaluation
Manual colorization
I thought I would give it a quick try...
Manual colorization
Low-level features: grass texture
Mid-level features: tree
High-level features: landscape scene
Manual colorization
Grass is green. Sky is blue. Mountains are... brown?
Manual colorization
Manual (≈ 15 s) · Manual (≈ 3 min) · Automatic (< 1 s, Our Method)
Motivation
1. Colorize old B&W photographs
2. Proxy for visual understanding
• Learning representations useful for other tasks
Related work
Scribble-based methods (input + scribbles → output): Levin et al. (2004); Huang et al. (2005); Qu et al. (2006); Luan et al. (2007)
Transfer-based methods (reference + input → output): Welsh et al. (2002); Irony et al. (2005); Charpiat et al. (2008); Morimoto et al. (2009); Chia et al. (2011)
Prediction-based methods (input → output): Deshpande et al. (2015); Cheng et al. (2015) [ICCV]; Iizuka et al. (2016) [SIGGRAPH]; Zhang et al. (2016); Larsson et al. (2016) [ECCV]
Design principles
• Semantic knowledge → Leverage an ImageNet-based classifier (VGG-16-Gray)
• Low-level/high-level features → Zoom-out/Hypercolumn over conv1_1 ... conv5_3, (fc6) conv6, (fc7) conv7, sampled at each pixel p (see the sketch below)
• Colorization not unique → Predict histograms of hue and chroma (h_fc1), decoded by expectation or median
Input: Grayscale Image → predicted hue/chroma + ground-truth lightness → Output: Color Image
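The hypercolumn descriptor concatenates activations from multiple layers at a single pixel. Below is a minimal NumPy sketch of this idea, not the authors' code; the feature-map shapes, layer choices, and sampler are illustrative assumptions.

```python
# A minimal NumPy sketch of hypercolumn extraction, not the authors' code.
# Assumptions: each layer's activations are a (C, H, W) array at its own
# resolution; we bilinearly sample every layer at one pixel's location and
# concatenate the samples into a single descriptor.
import numpy as np

def bilinear_sample(fmap, y, x):
    """Bilinearly sample a (C, H, W) feature map at fractional (y, x)."""
    C, H, W = fmap.shape
    y, x = np.clip(y, 0, H - 1), np.clip(x, 0, W - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * fmap[:, y0, x0]
            + (1 - wy) * wx * fmap[:, y0, x1]
            + wy * (1 - wx) * fmap[:, y1, x0]
            + wy * wx * fmap[:, y1, x1])

def hypercolumn(feature_maps, py, px, image_hw):
    """Concatenate samples from all layers at image pixel (py, px)."""
    H, W = image_hw
    parts = []
    for fmap in feature_maps:  # e.g. conv1_1 ... conv7 activations
        _, h, w = fmap.shape
        # Map the image-space pixel into this layer's coordinate frame.
        parts.append(bilinear_sample(fmap, py * (h - 1) / (H - 1),
                                     px * (w - 1) / (W - 1)))
    return np.concatenate(parts)

# Toy usage: one early (fine) and one late (coarse) layer.
fmaps = [np.random.randn(64, 224, 224), np.random.randn(512, 28, 28)]
print(hypercolumn(fmaps, 120.0, 80.0, (224, 224)).shape)  # (576,)
```

Because deep layers are spatially coarse, each layer is sampled at the pixel's location in its own coordinate frame; this is what lets a single descriptor mix low-level texture with high-level semantics.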
Histogram prediction
The histogram representation is rich and flexible:
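To produce an image, a point estimate must be decoded from each pixel's predicted histogram. Below is a minimal sketch of the decoding options named above (expectation and median), assuming normalized histograms over illustrative bin layouts; for hue, which is circular, a circular mean is the natural analogue of the expectation.

```python
# A minimal sketch of decoding a point estimate from a predicted histogram;
# the bin layouts are illustrative, not the paper's exact binning.
import numpy as np

def expectation(hist, bin_centers):
    """Mean of the distribution; suits linear quantities such as chroma."""
    return float(np.dot(hist, bin_centers))

def median(hist, bin_centers):
    """Bin center where the CDF first reaches 0.5; robust to multimodality."""
    return float(bin_centers[np.searchsorted(np.cumsum(hist), 0.5)])

def circular_expectation(hist, bin_centers):
    """Circular mean, appropriate for hue (an angle in [0, 2*pi))."""
    s, c = np.dot(hist, np.sin(bin_centers)), np.dot(hist, np.cos(bin_centers))
    return float(np.arctan2(s, c) % (2 * np.pi))

# Toy chroma histogram peaked near 0.3:
bins = np.linspace(0.0, 1.0, 32)
hist = np.exp(-((bins - 0.3) ** 2) / 0.01)
hist /= hist.sum()
print(expectation(hist, bins), median(hist, bins))

# Toy hue histogram peaked near 1.0 rad:
hue_bins = np.linspace(0.0, 2 * np.pi, 32, endpoint=False)
hue_hist = np.exp(4.0 * np.cos(hue_bins - 1.0))
hue_hist /= hue_hist.sum()
print(circular_expectation(hue_hist, hue_bins))  # ~1.0
```

The median is useful when a histogram is multimodal (e.g. an object that could plausibly be red or blue), where a plain expectation can average the modes into an implausible in-between color.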
Training
• Start with an ImageNet-pretrained network
• Adapt it to grayscale input (sketched below)
• Fine-tune for colorization with a log-loss on ImageNet, without labels
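A minimal NumPy sketch of two of these steps, under stated assumptions: grayscale adaptation is done here by collapsing the first conv layer's color filters (one common trick; the authors' exact procedure may differ), and the log-loss target is simplified to a one-hot color bin (soft-binned targets are a common alternative).

```python
# A minimal NumPy sketch (not the authors' code) of two training details:
# adapting a pretrained RGB network to grayscale input, and the per-pixel
# log-loss on predicted color-bin histograms.
import numpy as np

def rgb_conv1_to_gray(w_rgb):
    """Collapse (out, 3, kH, kW) conv1 filters to (out, 1, kH, kW).

    Summing over the color axis is exact for gray inputs, since with
    R = G = B = L the original response sum_c w_c * L equals (sum_c w_c) * L.
    """
    return w_rgb.sum(axis=1, keepdims=True)

def log_loss(pred_hist, target_bin):
    """Log-loss for one pixel: negative log-probability of the true bin."""
    return float(-np.log(pred_hist[target_bin] + 1e-12))

w_rgb = np.random.randn(64, 3, 3, 3).astype(np.float32)
print(rgb_conv1_to_gray(w_rgb).shape)  # (64, 1, 3, 3)
```

Because the targets (the original colors) come for free from any color image, the fine-tuning needs no human annotation, which is what "on ImageNet without labels" refers to.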
Sparse Training
Trained as a fully convolutional network with:
• Dense hypercolumns: low-level layers are upsampled; ✗ high memory footprint
• Sparse hypercolumns: direct bilinear sampling; ✓ low memory footprint (see the sketch below)
Source code available for Caffe and TensorFlow
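A rough sketch of the sparse-hypercolumn idea, assuming NumPy feature maps; nearest-neighbor gathering is used here for brevity where the method itself uses direct bilinear sampling.

```python
# A rough NumPy sketch of sparse hypercolumn training: gather descriptors at
# a few random pixels instead of building a dense H x W hypercolumn volume.
import numpy as np

def sample_layer(fmap, ys, xs, image_hw):
    """Gather one (C, h, w) layer at image-space pixels; returns (n, C)."""
    C, h, w = fmap.shape
    H, W = image_hw
    iy = np.clip(np.round(ys * (h - 1) / (H - 1)).astype(int), 0, h - 1)
    ix = np.clip(np.round(xs * (w - 1) / (W - 1)).astype(int), 0, w - 1)
    return fmap[:, iy, ix].T

def sparse_hypercolumns(feature_maps, image_hw, n_samples, rng):
    """Hypercolumns at n_samples random pixels, shape (n_samples, sum C_l).

    Memory: dense hypercolumns cost O(H * W * sum_l C_l) floats per image;
    sparse ones cost O(n_samples * sum_l C_l) with n_samples << H * W.
    """
    H, W = image_hw
    ys = rng.integers(0, H, size=n_samples)
    xs = rng.integers(0, W, size=n_samples)
    cols = np.concatenate(
        [sample_layer(f, ys, xs, image_hw) for f in feature_maps], axis=1)
    return cols, np.stack([ys, xs], axis=1)

rng = np.random.default_rng(0)
fmaps = [np.random.randn(64, 112, 112), np.random.randn(512, 14, 14)]
cols, locs = sparse_hypercolumns(fmaps, (224, 224), n_samples=128, rng=rng)
print(cols.shape)  # (128, 576)
```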
Comparison: Previous work
Significant improvement over state-of-the-art:
[Plot vs. Deshpande et al. (2015): % Pixels vs. RMSE (αβ); curves for No colorization, Welsh et al., Deshpande et al., Ours, Deshpande et al. (GTH), Ours (GTH)]
[Plot vs. Cheng et al. (2015): Frequency vs. PSNR; histograms for Cheng et al. and Our method]
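For reference, a hedged sketch of the two metrics in these plots; exact conventions (color space, channel averaging, pixel range) vary between the cited papers, so treat these helpers as illustrative only.

```python
# Illustrative RMSE and PSNR helpers; conventions differ across papers.
import numpy as np

def rmse(pred, gt):
    """Root-mean-square error over all pixels/channels (inputs in [0, 1])."""
    return float(np.sqrt(np.mean((pred - gt) ** 2)))

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = np.mean((pred - gt) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))

pred = np.random.rand(64, 64, 2)  # e.g. predicted alpha-beta channels
gt = np.random.rand(64, 64, 2)
print(rmse(pred, gt), psnr(pred, gt))
```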
Comparison: Concurrent work

Model         MSE      PSNR
Zhang et al.  270.17   21.58
Baig et al.   194.12   23.72
Ours          154.69   24.80
Source: Baig and Torresani (2016) [arXiv]

Model                 AuC CMF non-rebal (%)   AuC CMF rebal (%)   VGG Top-1 Class. Acc. (%)   Turk Labeled Real (%), mean (std)
Ground Truth          100.00                  100.00              68.32                       50.00 (–)
Zhang et al.          91.57                   65.12               56.56                       25.16 (2.26)
Zhang et al. (rebal)  89.50                   67.29               56.01                       32.25 (2.41)
Ours                  91.70                   65.93               59.36                       27.24 (2.31)
Source: Zhang et al. (2016) [ECCV]
Examples
[Figure: Input | Our Method | Ground-truth comparisons]
Examples
B&W photographs
Examples
Failure modes
Self-supervision (ongoing work)
1. Train colorization from scratch: how much does ImageNet pretraining help colorization?

Initialization        RMSE    PSNR
ImageNet Classifier   0.299   24.45
Random                0.311   24.25

2. Use the network for other tasks, such as semantic segmentation