Model Architectures and Training Techniques for High-Precision Landmark Localization
Sina Honari, Pavlo Molchanov, Jason Yosinski, Stephen Tyree, Pascal Vincent, Jan Kautz, Christopher Pal
KEYPOINT DETECTION / LANDMARK LOCALIZATION
The problem of localizing important points on images.
Keypoints for a human face can be:
• left/right eye
• nose
• left/right mouth corner
Applications include:
• Face alignment/rectification
• Emotion recognition
• Head pose estimation
• Person identification
MOTIVATION
Conventional ConvNets: alternating convolutional and max-pooling layers.
• Max-pooled features: lose precise spatial information, but gain robustness.
• Networks of only conv layers: keep spatial information, but produce lots of false positives.
Can we take robust pooled features and keep positional information?
SUMMATION-BASED NETWORKS (SUMNETS)
Sum features of different granularity (FCN [1], HyperColumns [2]).
Coarse-to-fine branches: C = convolution, P = pooling, U = upsampling; a branch = a horizontal sequence of C, U layers.
[1] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
[2] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik. Hypercolumns for object segmentation and fine-grained localization. In CVPR, 2015.
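The legend above maps onto a small implementation. Below is a minimal PyTorch sketch (layer counts and channel sizes are mine, not the paper's): each branch produces per-keypoint score maps at its own resolution, coarser maps are upsampled (U) back to full resolution, and the branches are summed before a spatial softmax.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SumNetSketch(nn.Module):
    """Minimal SumNet sketch: branches at several resolutions are
    upsampled to full resolution and summed (sizes are illustrative)."""
    def __init__(self, n_keypoints=5, ch=16):
        super().__init__()
        self.conv1 = nn.Conv2d(3, ch, 3, padding=1)    # full resolution (C)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)   # after 1 pooling (P, C)
        self.conv3 = nn.Conv2d(ch, ch, 3, padding=1)   # after 2 poolings
        # each branch maps its features to per-keypoint score maps
        self.out1 = nn.Conv2d(ch, n_keypoints, 1)
        self.out2 = nn.Conv2d(ch, n_keypoints, 1)
        self.out3 = nn.Conv2d(ch, n_keypoints, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        f1 = F.relu(self.conv1(x))                      # fine branch
        f2 = F.relu(self.conv2(F.max_pool2d(f1, 2)))    # coarser branch
        f3 = F.relu(self.conv3(F.max_pool2d(f2, 2)))    # coarsest branch
        up = lambda t: F.interpolate(t, size=(h, w))    # U layers
        # sum pre-softmax score maps from all granularities
        scores = self.out1(f1) + up(self.out2(f2)) + up(self.out3(f3))
        # softmax over spatial locations, independently per keypoint
        b, k = scores.shape[:2]
        return F.softmax(scores.view(b, k, -1), dim=-1).view(b, k, h, w)
```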
SUMMATION-BASED NETWORKS (SUMNETS)
[Figure: pre-softmax activations and softmax probabilities of the coarse-to-fine branches, and their sum]
Montreal Institute for Learning Algorithms
Recombinator Networks: Learning Coarse-to-Fine Feature Aggregation (CVPR 2016)
Sina Honari, Jason Yosinski, Pascal Vincent, Christopher Pal
Recombinator Networks (RCNs)
The model feeds coarse features into finer layers early in their computation.
Coarse-to-fine branches: C = convolution, P = pooling, U = upsampling, K = concatenation; a branch = a horizontal sequence of C, U layers.
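In contrast to summing final score maps, the RCN injects coarse features into the next finer branch before that branch's convolutions, via concatenation (K). A minimal sketch under the same illustrative assumptions as above (not the authors' implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RCNSketch(nn.Module):
    """Minimal Recombinator Network sketch. The coarse branch is
    upsampled (U) and concatenated (K) into the finer branch *before*
    that branch's convolutions, so fine features are computed with
    coarse context already injected. Channel counts are illustrative."""
    def __init__(self, n_keypoints=5, ch=16):
        super().__init__()
        self.enc1 = nn.Conv2d(3, ch, 3, padding=1)        # fine branch (C)
        self.enc2 = nn.Conv2d(ch, ch, 3, padding=1)       # coarse branch (C)
        self.dec2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.dec1 = nn.Conv2d(2 * ch, ch, 3, padding=1)   # after concat (K)
        self.out = nn.Conv2d(ch, n_keypoints, 1)          # per-keypoint maps

    def forward(self, x):
        f1 = F.relu(self.enc1(x))                         # full resolution
        f2 = F.relu(self.enc2(F.max_pool2d(f1, 2)))       # half resolution (P)
        c2 = F.relu(self.dec2(f2))
        u2 = F.interpolate(c2, size=f1.shape[-2:])        # upsample (U)
        # K: concatenate coarse context with fine features, then convolve
        f1 = F.relu(self.dec1(torch.cat([u2, f1], dim=1)))
        return self.out(f1)                               # pre-softmax maps
```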
Recombinator Networks (RCNs) A Convolutional Encoder-Decoder Network with Skip Connections
SumNets vs. RCNs
[Figure: architecture diagrams of Summation-Based Networks (SumNets) and Recombinator Networks (RCNs) side by side]
SumNets vs. RCNs
[Figure: pre-softmax activations and softmax probabilities compared between SumNet and RCN]
Prediction Samples
[Figure: sample predictions from TCDCN, SumNet, and RCN]
Red is the model prediction; green is the GT keypoints.
Comparison with SOTA
Landmark estimation error (as percent; lower is better) on the 300W and MTFL datasets.
Error = Euclidean distance between GT and predicted keypoints, normalized by the inter-ocular distance.
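Written out (my notation, with $K$ landmarks, predicted points $\hat{p}_k$, ground-truth points $p_k$, and inter-ocular distance $d_{\mathrm{IOD}}$):

```latex
\mathrm{Error} \;=\; \frac{100}{K} \sum_{k=1}^{K}
\frac{\lVert \hat{p}_k - p_k \rVert_2}{d_{\mathrm{IOD}}}
```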
Improving Landmark Localization with Semi-Supervised Learning (CVPR 2018)
Sina Honari, Pavlo Molchanov, Jan Kautz, Stephen Tyree, Pascal Vincent, Christopher Pal
MOTIVATION
Manual landmark localization is a tedious task (when building datasets):
• Landmark labeling: ~60 s per image (very time consuming).
• Attribute labeling, e.g. "smiling" or "looking straight": ~1 s per image (fast).
Can we use an attribute (e.g. head pose) to guide landmark localization?
SEMI-SUPERVISED LEARNING 1
Using CNNs with sequential multitasking:
1. First predict landmarks.
2. Use the predicted landmarks to predict the attribute.
3. Get gradient from the attribute loss back into the landmark localization network.
Forward pass: landmarks help the attribute. Backward pass: the attribute loss refines the landmarks.
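As the sketch below shows, the key design choice is that the attribute head consumes only the predicted landmark coordinates, so the attribute loss can only decrease by improving the landmarks. This is a minimal sketch under assumed shapes; the `backbone` interface and head sizes are hypothetical:

```python
import torch
import torch.nn as nn

class SequentialMultitask(nn.Module):
    """Sketch of sequential multitasking: the attribute is predicted
    *from* the predicted landmarks, so attribute labels supervise the
    landmark network through the backward pass. `backbone` is assumed
    to be any network producing (B, K, 2) landmark coordinates."""
    def __init__(self, backbone, n_keypoints, n_attr_classes):
        super().__init__()
        self.backbone = backbone
        self.attr_head = nn.Sequential(
            nn.Linear(n_keypoints * 2, 64), nn.ReLU(),
            nn.Linear(64, n_attr_classes))

    def forward(self, image):
        landmarks = self.backbone(image)                  # (B, K, 2)
        attr_logits = self.attr_head(landmarks.flatten(1))
        return landmarks, attr_logits
```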
SEMI-SUPERVISED LEARNING 1
Soft-argmax
To train the entire network end-to-end we use soft-argmax to predict landmarks.
Soft-argmax estimates the location of the scaled center of mass:
✓ Continuous, not discrete
✓ Differentiable
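Concretely, soft-argmax takes a spatial softmax of the scaled score map and returns the expected coordinate. A minimal sketch (`beta` is my name for the scaling factor):

```python
import torch
import torch.nn.functional as F

def soft_argmax(score_maps, beta=10.0):
    """score_maps: (B, K, H, W) pre-softmax heatmaps.
    Returns (B, K, 2) expected (x, y) locations: a softmax over
    spatial positions followed by the probability-weighted mean of
    the coordinates, i.e. a differentiable 'center of mass'."""
    b, k, h, w = score_maps.shape
    flat = F.softmax(beta * score_maps.view(b, k, -1), dim=-1)
    probs = flat.view(b, k, h, w)
    xs = torch.arange(w, dtype=probs.dtype, device=probs.device)
    ys = torch.arange(h, dtype=probs.dtype, device=probs.device)
    x = (probs.sum(dim=2) * xs).sum(dim=-1)   # marginal over rows, then E[x]
    y = (probs.sum(dim=3) * ys).sum(dim=-1)   # marginal over cols, then E[y]
    return torch.stack([x, y], dim=-1)
```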
SEMI-SUPERVISED LEARNING 2
Equivariant Landmark Transformation
Ask the model to make equivariant predictions w.r.t. image transformations:
• Predict landmarks L on an image I.
• Apply a transformation T to image I.
• Predict landmarks L′ on the image I′.
• Apply the transformation T to landmarks L.
• Compare L′ with T ∘ L.
• Get gradient from the ELT loss.
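The steps above can be written as a small loss function. A minimal sketch, assuming a differentiable landmark predictor and helper functions `transform_image` / `transform_points` (my names) that apply the same geometric transform T to pixels and to coordinates:

```python
import torch

def elt_loss(model, image, transform_image, transform_points):
    """Equivariant Landmark Transformation loss sketch.
    No ground truth is needed, so this loss also applies to
    unlabeled images."""
    landmarks = model(image)                       # L  on I,  (B, K, 2)
    landmarks_t = model(transform_image(image))    # L' on I' = T(I)
    # equivariance: predicting then transforming should match
    # transforming then predicting
    return torch.mean((landmarks_t - transform_points(landmarks)) ** 2)
```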
LEARNING LANDMARKS
Making use of all data:
• Loss from GT landmarks (on the S landmark-labeled images).
• Loss from attributes (A) using sequential multitasking (on the M attribute-labeled images).
• Loss from the Equivariant Landmark Transformation (ELT) (on all N images).
Here S << M <= N.
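In equation form (my notation; the weights $\lambda_A$ and $\lambda_E$ stand in for whatever balancing of the terms the paper uses):

```latex
\mathcal{L} \;=\;
\sum_{i \in S} \mathcal{L}_{\mathrm{lmk}}\!\left(\hat{L}_i, L_i\right)
\;+\; \lambda_A \sum_{i \in M} \mathcal{L}_{\mathrm{attr}}\!\left(\hat{A}_i, A_i\right)
\;+\; \lambda_E \sum_{i \in N} \big\lVert \hat{L}'_i - T \circ \hat{L}_i \big\rVert^2
```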
RESULTS
Faces
Training data / method:
• 100% of data, just CNN
• 5% of data, just CNN
• 5% of data, semi-supervised CNN
[Figure: sample predictions for each setting]
COMPARISON WITH SOTA
Faces
[Bar chart: landmark error for SOTA methods trained on 100% labeled data vs. RCN+ variants trained with 1%, 5%, and 100% labels; reported errors include 5.43, 4.35, 4.25, 4.05, 3.92, 3.73, 2.88, 2.72, 2.46, 2.17, 2.03, and 1.59]
• AFLW dataset: 19 landmarks, ~25k images
• Head pose attribute: yaw, pitch, roll
COMPARISON WITH SOTA
Faces
[Figure: qualitative predictions of RCN+ (L) trained with 1% labels vs. RCN+ (L+ELT+A) trained with 1% and with 100% labels]
CONCLUSIONS
• Fuse predictions from multiple granularities.
• Let the network learn the fusion method.
• Additional attributes and an equivariance prior improve results with semi-supervised learning.
• Works with landmarks on hands as well.
• Read more in our CVPR 2016 paper: https://arxiv.org/abs/1511.07356 , and our CVPR 2018 paper: https://arxiv.org/abs/1709.01591
THANK YOU!
Pavlo Molchanov, pmolchanov@nvidia.com
Sina Honari, honaris@iro.umontreal.ca
Comparison with SOTA
Loss & Error
Loss: softmax loss on keypoints + L2-norm.
Error: Euclidean distance between GT and predicted keypoints, normalized by the inter-ocular distance.
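A minimal sketch of this combined loss under assumed output shapes (the names and the square-map assumption are mine): a spatial softmax turned into a negative log-likelihood at the ground-truth pixel, plus an L2 term on the predicted coordinates.

```python
import torch
import torch.nn.functional as F

def keypoint_loss(log_probs, coords, gt_coords, l2_weight=1.0):
    """log_probs: (B, K, H*W) log of the spatial softmax per keypoint.
    coords / gt_coords: (B, K, 2) predicted / ground-truth (x, y).
    Softmax loss (NLL of the GT pixel) + L2 on the coordinates."""
    b, k, hw = log_probs.shape
    w = int(hw ** 0.5)  # assume square score maps for this sketch
    gt_idx = (gt_coords[..., 1].round() * w + gt_coords[..., 0].round()).long()
    nll = -log_probs.gather(2, gt_idx.unsqueeze(-1)).mean()
    l2 = F.mse_loss(coords, gt_coords)
    return nll + l2_weight * l2
```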
Masking Branches
Error when masking branches. Mask is ordered coarse → fine; 0 = branch omitted, 1 = branch included.

Mask (coarse → fine) | SumNet AFLW | SumNet AFW | RCN AFLW | RCN AFW
1, 0, 0, 0           | 10.54       | 10.63      | 10.61    | 10.89
0, 1, 0, 0           | 11.28       | 11.43      | 11.56    | 11.87
1, 1, 0, 0           |  9.47       |  9.65      |  9.31    |  9.44
0, 0, 1, 0           | 16.14       | 16.35      | 15.78    | 15.91
0, 0, 0, 1           | 45.39       | 47.97      | 46.87    | 48.61
0, 0, 1, 1           | 13.90       | 14.14      | 12.67    | 13.53
0, 1, 1, 1           |  7.91       |  8.22      |  7.62    |  7.95
1, 0, 0, 1           |  6.91       |  7.51      |  6.79    |  7.27
1, 1, 1, 1           |  6.44       |  6.78      |  6.37    |  6.43
Adding More Branches
[Figure: SumNet and RCN coarse-to-fine architectures extended with additional branches]