Learning Representations for Automatic Colorization


  1. Learning Representations for Automatic Colorization Gustav Larsson, Michael Maire, Greg Shakhnarovich TTI Chicago / University of Chicago ECCV 2016

  2. Colorization Let us first define “colorization”

  3. Colorization Definition 1: The inverse of desaturation. Original

  4. Colorization Definition 1: The inverse of desaturation. Desaturate Original Grayscale

  5. Colorization Definition 1: The inverse of desaturation. Grayscale

  6. Colorization Definition 1: The inverse of desaturation. Colorize Original Grayscale

  7. Colorization Definition 1: The inverse of desaturation. (Underconstrained!) Colorize Original Grayscale

  8. Colorization Definition 2: An inverse of desaturation, that... Grayscale

  9. Colorization Definition 2: An inverse of desaturation, that... Colorize Our Method Grayscale ... is plausible and pleasing to a human observer.

  10. Colorization Definition 2: An inverse of desaturation, that... Colorize Our Method Grayscale ... is plausible and pleasing to a human observer. • Def. 1: Training + Quantitative Evaluation • Def. 2: Qualitative Evaluation
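
To make the two definitions concrete: in a luminance/chrominance color space, desaturation keeps only the lightness channel, and colorization must infer the discarded chromatic channels from lightness alone. Below is a minimal sketch of that setup (assuming NumPy and scikit-image; it uses the Lab space for simplicity rather than the hue/chroma parameterization used later in the talk), illustrating why the inverse is underconstrained: any predicted chrominance desaturates back to the same grayscale image.

    import numpy as np
    from skimage import color, data

    # Desaturation: keep only the lightness channel of the Lab representation.
    rgb = data.astronaut() / 255.0           # any RGB image scaled to [0, 1]
    lab = color.rgb2lab(rgb)                 # L in [0, 100]; a, b are the chromatic channels
    L, ab = lab[..., :1], lab[..., 1:]       # grayscale input vs. the channels to predict

    # Colorization must infer ab from L alone. It is underconstrained: here even the
    # trivial all-zero prediction (a neutral gray image) desaturates back to exactly
    # the same lightness channel as the original.
    ab_hat = np.zeros_like(ab)
    recolored = color.lab2rgb(np.concatenate([L, ab_hat], axis=-1))
    print(np.abs(color.rgb2lab(recolored)[..., 0] - L[..., 0]).max())   # ~0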

  11. Manual colorization I thought I would give it a quick try...

  12. Manual colorization Grass texture Low-level features

  13. Manual colorization Tree Grass texture Mid-level features

  14. Manual colorization Landscape scene Tree Grass texture High-level features

  15. Manual colorization Grass is green

  16. Manual colorization Sky is blue

  17. Manual colorization Mountains are... brown?

  18. Manual colorization Manual (≈ 15 s)

  19. Manual colorization Manual (≈ 15 s) Manual (≈ 3 min)

  20. Manual colorization Manual (≈ 15 s) Manual (≈ 3 min) Automatic (< 1 s) Our Method

  21. Motivation 1. Colorize old B&W photographs

  22. Motivation 1. Colorize old B&W photographs 2. Proxy for visual understanding • Learning representations useful for other tasks

  23. Related work Scribble-based methods Levin et al. (2004); Huang et al. (2005); Qu et al. (2006); Luan et al. (2007) Input Output Transfer-based methods Welsh et al. (2002); Irony et al. (2005); Charpiat et al. (2008); Morimoto et al. (2009); Chia et al. (2011) Reference Input Output Prediction-based methods Deshpande et al. (2015); Cheng et al. (2015) Iizuka et al. (2016) Zhang et al. (2016); Larsson et al. (2016) Input Output

  24. Related work Scribble-based methods Levin et al. (2004); Huang et al. (2005); Qu et al. (2006); Luan et al. (2007) Input Output Transfer-based methods Welsh et al. (2002); Irony et al. (2005); Charpiat et al. (2008); Morimoto et al. (2009); Chia et al. (2011) Reference Input Output Prediction-based methods Deshpande et al. (2015); Cheng et al. (2015) ← ICCV Iizuka et al. (2016) Zhang et al. (2016); Larsson et al. (2016) Input Output

  25. Related work Scribble-based methods Levin et al. (2004); Huang et al. (2005); Qu et al. (2006); Luan et al. (2007) Input Output Transfer-based methods Welsh et al. (2002); Irony et al. (2005); Charpiat et al. (2008); Morimoto et al. (2009); Chia et al. (2011) Reference Input Output Prediction-based methods Deshpande et al. (2015); Cheng et al. (2015) Iizuka et al. (2016) ← SIGGRAPH Zhang et al. (2016); Larsson et al. (2016) Input Output

  26. Related work Scribble-based methods Levin et al. (2004); Huang et al. (2005); Qu et al. (2006); Luan et al. (2007) Input Output Transfer-based methods Welsh et al. (2002); Irony et al. (2005); Charpiat et al. (2008); Morimoto et al. (2009); Chia et al. (2011) Reference Input Output Prediction-based methods Deshpande et al. (2015); Cheng et al. (2015) Iizuka et al. (2016) Zhang et al. (2016); Larsson et al. (2016) ← ECCV Input Output

  27. Design principles [Figure: input grayscale image with a pixel p marked]

  28. Design principles • Semantic knowledge [Figure: input grayscale image with a pixel p marked]

  29. Design principles • Semantic knowledge → Leverage ImageNet-based classifier [Figure: VGG-16-Gray applied to the grayscale input; labeled layers conv1_1 … conv5_3, conv6 (fc6), conv7 (fc7); pixel p marked]

  30. Design principles • Semantic knowledge → Leverage ImageNet-based classifier • Low-level/high-level features [Figure: as above]

  31. Design principles • Semantic knowledge → Leverage ImageNet-based classifier • Low-level/high-level features → Zoom-out/Hypercolumn [Figure: hypercolumn at pixel p assembled from VGG-16-Gray layers conv1_1 … conv5_3, conv6 (fc6), conv7 (fc7)]

  32. Design principles • Semantic knowledge → Leverage ImageNet-based classifier • Low-level/high-level features → Zoom-out/Hypercolumn • Colorization not unique [Figure: as above]

  33. Design principles • Semantic knowledge → Leverage ImageNet-based classifier • Low-level/high-level features → Zoom-out/Hypercolumn • Colorization not unique → Predict histograms [Figure: hypercolumn at pixel p fed through h_fc1 to predict hue and chroma histograms]

  34. Design principles • Semantic knowledge → Leverage ImageNet-based classifier • Low-level/high-level features → Zoom-out/Hypercolumn • Colorization not unique → Predict histograms [Figure: as above, with the predicted hue/chroma histograms summarized by their expectation or median]

  35. Design principles • Semantic knowledge → Leverage ImageNet-based classifier • Low-level/high-level features → Zoom-out/Hypercolumn • Colorization not unique → Predict histograms [Figure: predicted hue/chroma combined with the input lightness to form the output color image, shown next to the ground truth]
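
For readers who want the hypercolumn idea spelled out: the sketch below (plain NumPy/SciPy; the function name dense_hypercolumns and the layer shapes are invented for illustration, not taken from the released code) resizes feature maps from several depths to the input resolution and stacks them, so every pixel gets one long descriptor mixing low-level and high-level features.

    import numpy as np
    from scipy.ndimage import zoom

    def dense_hypercolumns(feature_maps, out_hw):
        """Resize each (h, w, c) feature map to out_hw and concatenate along
        channels -> (H, W, sum of c). This is the 'dense' construction."""
        H, W = out_hw
        resized = [zoom(f, (H / f.shape[0], W / f.shape[1], 1), order=1)
                   for f in feature_maps]
        return np.concatenate(resized, axis=-1)

    # Toy stand-ins for activations of a VGG-like network on a 64x64 grayscale
    # input (resolutions and channel counts are illustrative only).
    feats = [
        np.random.rand(64, 64, 64),    # early layer: full resolution, low-level
        np.random.rand(16, 16, 256),   # middle layer
        np.random.rand(4, 4, 512),     # late layer: coarse but semantic
    ]
    print(dense_hypercolumns(feats, (64, 64)).shape)   # (64, 64, 832): one descriptor per pixel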

  36–41. Histogram prediction The histogram representation is rich and flexible: [series of example figures]
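
As one illustration of that flexibility: a single per-pixel histogram can be decoded in several ways, and it also carries uncertainty that a single regressed value would hide. A small sketch (plain NumPy; the 32-bin layout is invented) of the expectation and median decodings mentioned on the architecture slide:

    import numpy as np

    bins = np.linspace(0.0, 1.0, 32)           # bin centers for a chroma-like channel in [0, 1]
    p = np.random.rand(32)
    p /= p.sum()                               # stand-in for a softmax histogram output

    expectation = float(np.sum(p * bins))      # mean of the predicted distribution
    cdf = np.cumsum(p)
    median = float(bins[np.searchsorted(cdf, 0.5)])   # first bin where the CDF reaches 0.5

    # The histogram also exposes how confident / multi-modal the prediction is,
    # e.g. via its entropy; note that hue is circular, so a plain mean would have
    # to be replaced by a circular mean for that channel.
    entropy = float(-np.sum(p * np.log(p + 1e-12)))
    print(expectation, median, entropy)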

  42. Training • Start with an ImageNet pretrained network

  43. Training • Start with an ImageNet pretrained network • Adapt to grayscale input

  44. Training • Start with an ImageNet pretrained network • Adapt to grayscale input • Fine-tune for colorization with log-loss on ImageNet without labels
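
The second and third bullets can be made concrete with a short sketch (plain NumPy; rgb_conv_to_gray and histogram_log_loss are illustrative helpers, and collapsing the first-layer filters over their RGB axis is a common recipe for grayscale adaptation rather than a claim about the authors' exact code). The loss is a per-pixel log-loss between predicted and target color histograms, which needs only the images themselves, not their ImageNet labels.

    import numpy as np

    def rgb_conv_to_gray(w_rgb):
        """Collapse pretrained first-layer filters (k, k, 3, C) to (k, k, 1, C) by
        summing over the RGB axis, so a gray input (r = g = b) yields the same
        response the color filters would have produced on it."""
        return w_rgb.sum(axis=2, keepdims=True)

    def histogram_log_loss(pred_logits, target_hist):
        """Per-pixel log-loss: cross-entropy between (possibly soft) target
        histograms and the softmax of the predicted logits; shapes (N, num_bins)."""
        z = pred_logits - pred_logits.max(axis=1, keepdims=True)
        log_p = z - np.log(np.sum(np.exp(z), axis=1, keepdims=True))
        return float(-np.mean(np.sum(target_hist * log_p, axis=1)))

    w = np.random.randn(3, 3, 3, 64)                  # stand-in for pretrained conv1_1 weights
    print(rgb_conv_to_gray(w).shape)                  # (3, 3, 1, 64)
    logits = np.random.randn(5, 32)
    targets = np.random.rand(5, 32)
    targets /= targets.sum(axis=1, keepdims=True)
    print(histogram_log_loss(logits, targets))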

  45. Sparse Training Trained as a fully convolutional network with:

  46. Sparse Training Trained as a fully convolutional network with: Dense hypercolumns • Low-level layers are upsampled • ✗ High memory footprint

  47. Sparse Training Trained as a fully convolutional network with: Dense hypercolumns Sparse hypercolumns • Low-level layers are upsampled • Direct bilinear sampling • ✗ High memory footprint • ✓ Low memory footprint

  48. Sparse Training Trained as a fully convolutional network with: Dense hypercolumns Sparse hypercolumns • Low-level layers are upsampled • Direct bilinear sampling • ✗ High memory footprint • ✓ Low memory footprint Source code available for Caffe and TensorFlow
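
The memory saving comes from never materializing full-resolution hypercolumns during training: for a sparse set of training pixels, each layer's feature map is read directly at the (fractional) location corresponding to that pixel. A minimal NumPy sketch of that idea (illustrative helper functions, not the released Caffe/TensorFlow implementation):

    import numpy as np

    def bilinear_sample(fmap, y, x):
        """Read a (h, w, c) feature map at fractional coordinates (y, x)."""
        h, w, _ = fmap.shape
        y0, x0 = int(np.floor(y)), int(np.floor(x))
        y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
        wy, wx = y - y0, x - x0
        return ((1 - wy) * (1 - wx) * fmap[y0, x0] + (1 - wy) * wx * fmap[y0, x1]
                + wy * (1 - wx) * fmap[y1, x0] + wy * wx * fmap[y1, x1])

    def sparse_hypercolumn(feature_maps, py, px, in_hw):
        """Hypercolumn for one input pixel (py, px): sample every layer at the
        corresponding (scaled) location and concatenate -- no upsampling needed."""
        H, W = in_hw
        return np.concatenate([
            bilinear_sample(f, py * f.shape[0] / H, px * f.shape[1] / W)
            for f in feature_maps
        ])

    feats = [np.random.rand(64, 64, 64), np.random.rand(16, 16, 256), np.random.rand(4, 4, 512)]
    print(sparse_hypercolumn(feats, 10.0, 37.0, (64, 64)).shape)   # (832,)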

  49. Comparison: Previous work Significant improvement over state-of-the-art: [Plots: left, frequency vs. RMSE (αβ), our method against Cheng et al. (2015); right, % pixels vs. PSNR, comparing no colorization, Welsh et al., Deshpande et al., Ours, and the ground-truth-histogram (GTH) variants, against Deshpande et al. (2015)]
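
For reference, the two metrics behind those plots can be computed along the following lines (a rough sketch; rmse_chroma and psnr are illustrative helpers, and the exact channel normalization used in the paper's evaluation may differ):

    import numpy as np

    def rmse_chroma(pred_ab, true_ab):
        """Root-mean-square error over the two chromatic channels."""
        return float(np.sqrt(np.mean((pred_ab - true_ab) ** 2)))

    def psnr(pred_rgb, true_rgb, peak=1.0):
        """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
        mse = np.mean((pred_rgb - true_rgb) ** 2)
        return float(10.0 * np.log10(peak ** 2 / mse))

    a, b = np.random.rand(32, 32, 2), np.random.rand(32, 32, 2)
    print(rmse_chroma(a, b), psnr(np.random.rand(32, 32, 3), np.random.rand(32, 32, 3)))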

  50. Comparison: Concurrent work

      Model          MSE      PSNR
      Zhang et al.   270.17   21.58
      Baig et al.    194.12   23.72
      Ours           154.69   24.80

      Source: Baig and Torresani (2016) [arXiv]

  51. Comparison: Concurrent work

      Model          MSE      PSNR
      Zhang et al.   270.17   21.58
      Baig et al.    194.12   23.72
      Ours           154.69   24.80

      Source: Baig and Torresani (2016) [arXiv]

      Model                  AuC CMF non-rebal (%)   AuC CMF rebal (%)   VGG Top-1 Classification Accuracy (%)   Turk Labeled Real (%), mean   std
      Ground Truth           100.00                  100.00              68.32                                    50.00                         –
      Zhang et al.            91.57                   65.12              56.56                                    25.16                         2.26
      Zhang et al. (rebal)    89.50                   67.29              56.01                                    32.25                         2.41
      Ours                    91.70                   65.93              59.36                                    27.24                         2.31

      Source: Zhang et al. (2016) [ECCV]

  52. Examples Input Our Method Ground-truth Input Our Method Ground-truth

  53. Examples B&W photographs

  54. Examples Failure modes

  55. Self-supervision (ongoing work) 1. Train colorization from scratch

  56. Self-supervision (ongoing work) 1. Train colorization from scratch

      Initialization        RMSE    PSNR
      ImageNet Classifier   0.299   24.45
      Random                0.311   24.25

      How much does ImageNet pretraining help colorization?

  57. Self-supervision (ongoing work) 1. Train colorization from scratch

      Initialization        RMSE    PSNR
      ImageNet Classifier   0.299   24.45
      Random                0.311   24.25

      How much does ImageNet pretraining help colorization?
      2. Use network for other tasks, such as semantic segmentation:
