CS 103: Representation Learning, Information Theory and Control
Lecture 3, Jan 25, 2019
Seen last time
• What is a nuisance for a task?
• How do we design nuisance-invariant representations?
• Invariance, equivariance, canonization
• A linear transformation is group equivariant if and only if it is a group convolution (no proof)
Today’s program
1. A linear transformation is group equivariant if and only if it is a group convolution
• Building equivariant representations for translations, sets and graphs
2. Image canonization with an equivariant reference frame detector
• Applications to multi-object detection
3. Accurate reference frame detection: the SIFT descriptor
• A sufficient statistic for visual-inertial systems
Canonization
Invariance by canonization
Idea: Instead of finding an invariant representation, apply a transformation that puts the input in a standard form:
I(ξ, ν) ⟼ g_{ν→ν₀} ∘ I(ξ, ν) = I(ξ, ν₀)
Canonization for translations
Suppose we want to canonize the image with respect to translations.
1. Decide on a reference point that is equivariant for translations. Examples: the barycenter of the image, the maximum (assuming it is unique)
2. Find the position of the reference point
3. Center the reference point
[Figure: an image with its reference point (minimum) marked; applying g_{ν′→ν₀} centers it]
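The three steps above can be sketched in a few lines of numpy. This is a minimal sketch, not from the lecture: the function name is chosen here, and it uses cyclic shifts so that translations form a group on the pixel grid.

```python
import numpy as np

def canonize_translation(img):
    """Center the image so its brightest pixel (assumed unique)
    sits at a fixed canonical position: the array center."""
    # Step 2: find the position of the reference point (here, the maximum).
    ref = np.unravel_index(np.argmax(img), img.shape)
    # Step 3: shift so the reference point lands at the canonical position.
    canonical = (img.shape[0] // 2, img.shape[1] // 2)
    shift = (canonical[0] - ref[0], canonical[1] - ref[1])
    return np.roll(img, shift, axis=(0, 1))

# Any (cyclic) translation of the input yields the same canonized output.
x = np.random.rand(8, 8)
assert np.allclose(canonize_translation(x),
                   canonize_translation(np.roll(x, (3, 5), axis=(0, 1))))
```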
Equivariant reference frame detector
A reference frame detector for a group G is any function R: X → G such that
R(g ⋅ x) = g ⋅ R(x)
That is, a reference frame detector is an equivariant function from X to G.
Example: Let G = ℝ² be the group of translations. Then R(x) = “position of the maximum of x” is a reference frame detector, assuming the maximum is unique.
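The equivariance R(g ⋅ x) = g ⋅ R(x) of the argmax detector can be checked numerically. In this sketch the translations act cyclically (via np.roll), so positions compose by addition modulo the grid size; this is an assumption of the example, not part of the lecture.

```python
import numpy as np

def R(x):
    """Reference frame detector: position of the (assumed unique) maximum."""
    return np.array(np.unravel_index(np.argmax(x), x.shape))

x = np.random.rand(6, 6)
g = (2, 4)                                # a translation, acting cyclically
gx = np.roll(x, g, axis=(0, 1))           # g . x
# Equivariance: R(g . x) = g . R(x), with addition modulo the grid size.
assert np.array_equal(R(gx), (R(x) + np.array(g)) % np.array(x.shape))
```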
From equivariant frame detector to invariant representations
Proposition. Let R be a reference frame detector for the group G. Define a representation f(x) as
f(x) = R(x)⁻¹ ⋅ x
Then f(x) is a G-invariant representation.
Proof:
f(g ⋅ x) = R(g ⋅ x)⁻¹ ⋅ (g ⋅ x)
= (g ⋅ R(x))⁻¹ ⋅ g ⋅ x
= R(x)⁻¹ ⋅ g⁻¹ ⋅ g ⋅ x
= R(x)⁻¹ ⋅ x
= f(x)
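The proposition holds for any group. As a sketch, here is a hypothetical frame detector for the cyclic group C4 of rotations by multiples of 90°, with the invariance f(g ⋅ x) = f(x) checked numerically. The quadrant-mass detector is an invention of this example, chosen only because it is easy to verify.

```python
import numpy as np

def R(x):
    """Frame detector for C4 (rotations by multiples of 90 degrees).
    If k is the rotation count that maximizes the top-left quadrant
    mass after applying it, the frame carried by x is (-k) mod 4."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    k = np.argmax([np.rot90(x, j)[:h, :w].sum() for j in range(4)])
    return (-k) % 4          # group element, as a rotation count mod 4

def f(x):
    """f(x) = R(x)^{-1} . x : apply the inverse of the detected frame."""
    return np.rot90(x, -R(x) % 4)

x = np.random.rand(6, 6)
for m in range(4):                                # every g = rot90^m
    assert np.allclose(f(np.rot90(x, m)), f(x))   # f(g . x) = f(x)
```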
The canonization pipeline
Canonization consists of the following steps:
1. Build an equivariant reference frame detector
2. Choose a “canonical” reference frame
3. Find the reference frame of the input image
4. Invert the transformation to make the reference frame canonical
[Figure: R(x)⁻¹ maps the reference frame of the input to the canonical frame]
Some examples of canonization in vision
Document analysis: Find the border of the document and un-warp the image prior to analysis. Also: normalize contrast and illumination.
Image from https://blogs.dropbox.com/tech/2016/08/fast-document-rectification-and-enhancement/
Saccades
Eyes move rapidly while looking at a fixed object.
[Figure: an image and the trace of saccades over it]
Can we consider this a form of translation invariance by canonization?
Video and images from https://en.wikipedia.org/wiki/Saccade
The R-CNN model for multi-object detection
Region proposal: find regions of the image that may contain an interesting object (i.e., a reference frame proposal)
CNN classifier: warp the region to put it in canonical form (invariance) and feed it to a classifier
Region proposal + CNN classifier = R-CNN
Image from Girshick et al., 2014
Region Proposal
Selective Search for Object Recognition, Uijlings et al., 2013
Originally: hand-crafted proposal mechanisms based on saliency, uniformity of texture, scale, and so on.
• Illumination-invariant colorspace (Maddern et al., ICRA 2014)
• Initial region proposal
• Hierarchical clustering, merging regions greedily by the similarity
s(rᵢ, rⱼ) = a₁ s_colour(rᵢ, rⱼ) + a₂ s_texture(rᵢ, rⱼ) + a₃ s_size(rᵢ, rⱼ) + a₄ s_fill(rᵢ, rⱼ)
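A sketch of the combined similarity, assuming regions are stored as dicts with hypothetical keys for size, bounding box, and normalized colour/texture histograms (a real implementation maintains these statistics per region and updates them on merges). As in Uijlings et al., s_colour and s_texture are histogram intersections.

```python
import numpy as np

def hist_intersection(h1, h2):
    # Used for both s_colour and s_texture in selective search.
    return np.minimum(h1, h2).sum()

def merged_bbox(b1, b2):
    return (min(b1[0], b2[0]), min(b1[1], b2[1]),
            max(b1[2], b2[2]), max(b1[3], b2[3]))

def bbox_area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def similarity(ri, rj, im_size, a=(1.0, 1.0, 1.0, 1.0)):
    """s(ri, rj) = a1 s_colour + a2 s_texture + a3 s_size + a4 s_fill."""
    s_col = hist_intersection(ri["colour"], rj["colour"])
    s_tex = hist_intersection(ri["texture"], rj["texture"])
    # s_size: encourages small regions to merge early.
    s_size = 1.0 - (ri["size"] + rj["size"]) / im_size
    # s_fill: how well the two regions fit into each other.
    hole = bbox_area(merged_bbox(ri["bbox"], rj["bbox"])) \
        - ri["size"] - rj["size"]
    s_fill = 1.0 - hole / im_size
    return a[0]*s_col + a[1]*s_tex + a[2]*s_size + a[3]*s_fill
```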
CNN-based region proposal
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, Ren et al., 2016
Nowadays: the same network does both the region proposal and the classification inside each region.
[Figure: a Region Proposal Network slides a window over the conv feature map; a 256-d intermediate layer feeds a cls layer (2k scores) and a reg layer (4k coordinates) for k anchor boxes; proposals then go through RoI pooling to the classifier]
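A sketch of the k anchor boxes placed at each sliding-window position. The parameterization (scales, ratios, base size) is in the spirit of the paper, but the exact values and the helper name are assumptions of this sketch.

```python
import numpy as np

def make_anchors(base=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Generate k = len(scales) * len(ratios) anchor boxes as
    (x1, y1, x2, y2) offsets around a sliding-window center.
    Each anchor keeps area (base * scale)^2 while varying aspect ratio."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = base * s * np.sqrt(r)
            h = base * s / np.sqrt(r)
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return np.array(anchors)

A = make_anchors()
assert A.shape == (9, 4)   # k = 9 anchors -> 2k scores, 4k box coordinates
```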
Spatial Transformer Network
Learning to find and canonize interesting regions of the image. Can we do something more similar to saccades?
The localisation network selects a local reference frame in the image; the transformer resamples the input using that reference frame.
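The transformer step can be sketched in numpy: build a sampling grid from a 2×3 affine matrix (what the localisation network would output) and bilinearly sample the input. Real spatial transformers implement this differentiably inside a deep-learning framework; this sketch, including the function name, only illustrates the resampling.

```python
import numpy as np

def affine_resample(img, theta, out_shape):
    """Resample img on a grid given by the affine map theta (2x3),
    with coordinates normalized to [-1, 1], using bilinear weights."""
    H, W = out_shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W),
                         indexing="ij")
    grid = np.stack([xs, ys, np.ones_like(xs)], axis=-1) @ theta.T  # (H, W, 2)
    # Map normalized coordinates back to input pixel indices.
    src_x = (grid[..., 0] + 1) / 2 * (img.shape[1] - 1)
    src_y = (grid[..., 1] + 1) / 2 * (img.shape[0] - 1)
    x0 = np.clip(np.floor(src_x).astype(int), 0, img.shape[1] - 2)
    y0 = np.clip(np.floor(src_y).astype(int), 0, img.shape[0] - 2)
    wx, wy = src_x - x0, src_y - y0
    return (img[y0, x0] * (1 - wx) * (1 - wy)
            + img[y0, x0 + 1] * wx * (1 - wy)
            + img[y0 + 1, x0] * (1 - wx) * wy
            + img[y0 + 1, x0 + 1] * wx * wy)

# The identity transform reproduces the input.
theta = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
x = np.random.rand(8, 8)
assert np.allclose(affine_resample(x, theta, (8, 8)), x)
```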
When precision matters
The previous methods find a transformation that approximately canonizes an object. But what if we want a very accurate reference frame?
Images from Oxford Buildings Dataset
Problems
• Reference frames need to be unique and robust.
• Due to occlusions, we can only trust local features and need redundancy.
• They need to be robust to all geometric transformations and small deformations.
• They need to be robust to changes of illumination, shadows, …
SIFT: Scale Invariant Feature Transform
Image from http://www.robots.ox.ac.uk/~vgg/practicals/instance-recognition/index.html
SIFT: Finding the scale
Find “interesting points” (i.e., local maxima and minima) at all scales. This is done by constructing the scale space of the image and finding the first scale at which a local maximum (minimum) stops being a local maximum (minimum).
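A minimal difference-of-Gaussians scale space, assuming scipy is available; the function names and the choices of σ₀ and k are placeholders, and a real SIFT implementation adds octaves and subpixel refinement.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(img, sigma0=1.6, k=2 ** 0.5, n=5):
    """Difference-of-Gaussians scale space: blur at geometrically
    increasing sigmas, then subtract adjacent levels."""
    blurs = [gaussian_filter(img.astype(float), sigma0 * k ** i)
             for i in range(n)]
    return np.stack([blurs[i + 1] - blurs[i] for i in range(n - 1)])

def is_scale_space_extremum(dog, s, i, j):
    """A point is interesting if it is the max (or min) of its
    3x3x3 neighbourhood in (scale, row, col)."""
    cube = dog[s - 1:s + 2, i - 1:i + 2, j - 1:j + 2]
    v = dog[s, i, j]
    return v == cube.max() or v == cube.min()

dog = dog_stack(np.random.rand(16, 16))
assert dog.shape == (4, 16, 16)
```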
Harris corner detector
Points along edges are not useful keypoints, as they cannot be localized exactly.
Idea: Compute the Hessian at each interesting point and keep only the points whose eigenvalues are large and of the same magnitude. (SIFT applies this test to the Hessian of the difference-of-Gaussians; the original Harris detector uses the second-moment matrix of the gradients.)
Image from https://docs.opencv.org/3.4.2/dc/d0d/tutorial_py_features_harris.html
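A sketch of the two-eigenvalue test, here using the second-moment (structure tensor) variant: the Harris score det(M) − k·tr(M)² is large exactly when both eigenvalues are large and comparable, and negative for an edge.

```python
import numpy as np

def corner_response(patch, k=0.04):
    """Build the structure tensor from image gradients and compare its
    eigenvalues: edges have one large eigenvalue, corners have two."""
    gy, gx = np.gradient(patch.astype(float))
    M = np.array([[(gx * gx).sum(), (gx * gy).sum()],
                  [(gx * gy).sum(), (gy * gy).sum()]])
    l1, l2 = np.linalg.eigvalsh(M)
    # Harris score det(M) - k * trace(M)^2, written via the eigenvalues.
    return l1 * l2 - k * (l1 + l2) ** 2

# A corner scores higher than a straight edge.
corner = np.zeros((9, 9)); corner[4:, 4:] = 1.0
edge = np.zeros((9, 9)); edge[:, 4:] = 1.0
assert corner_response(corner) > corner_response(edge)
```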
Find corner orientation
Decide the orientation of the corner by plotting the histogram of gradient orientations and picking the most frequent one. If multiple orientations are nearly as frequent (> 0.8 × max), select all of them.
Image from http://aishack.in/tutorials/sift-scale-invariant-feature-transform-keypoint-orientation/
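A sketch of the orientation histogram with the 0.8 × max peak rule; the 36-bin count and the magnitude weighting follow common SIFT practice, but this omits the Gaussian spatial weighting and peak interpolation of the full method.

```python
import numpy as np

def keypoint_orientations(patch, n_bins=36):
    """Histogram of gradient orientations over a patch, weighted by
    gradient magnitude; return every bin center above 0.8 * max
    (SIFT spawns one keypoint per selected orientation)."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 360), weights=mag)
    peaks = np.flatnonzero(hist >= 0.8 * hist.max())
    bin_width = 360.0 / n_bins
    return [(p + 0.5) * bin_width for p in peaks]

# A pure horizontal ramp has all its gradient energy at 0 degrees,
# so the only peak is the center of the first 10-degree bin.
ramp = np.tile(np.arange(9.0), (9, 1))
assert keypoint_orientations(ramp) == [5.0]
```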
Corner descriptor
Gradient orientation is invariant to contrast changes (gradient magnitude is not).
Idea: Describe the local patch around the corner using the orientations of its gradients. Bin gradients together within the patch for robustness to small deformations.
Image from http://aishack.in/tutorials/sift-scale-invariant-feature-transform-keypoint-orientation/
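A sketch of the binned descriptor: 4×4 cells of 8 orientation bins over a 16×16 patch give the familiar 128-d vector, and L2 normalization makes it invariant to multiplicative contrast changes. The full method also rotates the patch to the keypoint orientation and interpolates across bins, omitted here.

```python
import numpy as np

def sift_like_descriptor(patch):
    """128-d descriptor sketch: per-cell orientation histograms,
    weighted by gradient magnitude, then L2-normalized."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0
    desc = []
    for ci in range(0, 16, 4):
        for cj in range(0, 16, 4):
            h, _ = np.histogram(ang[ci:ci + 4, cj:cj + 4], bins=8,
                                range=(0, 360),
                                weights=mag[ci:ci + 4, cj:cj + 4])
            desc.extend(h)
    desc = np.array(desc)
    n = np.linalg.norm(desc)
    return desc / n if n > 0 else desc

# Scaling the patch by a positive constant (a contrast change)
# leaves the normalized descriptor unchanged.
p = np.random.rand(16, 16)
assert sift_like_descriptor(p).shape == (128,)
assert np.allclose(sift_like_descriptor(p), sift_like_descriptor(3.0 * p))
```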
The final algorithm (with refinements)
Image from http://www.cmap.polytechnique.fr/~yu/research/ASIFT/demo.html
Feature matching in Visual-Inertial SLAM systems
Robust Inference for Visual-Inertial Sensor Fusion, K. Tsotsos et al., 2015
Demo video from https://sites.google.com/site/ktsotsos/visual-inertial-sensor-fusion