Computational Linguistics: Language and Vision I Raffaella Bernardi - PowerPoint PPT Presentation


Contents
1 Credits
2 What is (Computer) Vision
  2.1 Interdisciplinary
  2.2 How did it start?
  2.3 What is the goal of Computer Vision?
3 How to represent an image: Pixels
  3.1 How to represent an image: Keep all the pixels
  3.2 How to represent an image: Compute average pixel
  3.3 How to represent an image: Spatial grid of average pixel colors?
  3.4 Image representation challenges: Invariance
4 A CV sample task: Object Classification
  4.1 Object Classification
  4.2 Data Driven
  4.3 The image classification pipeline
  4.4 Nearest Neighbor Classifier
  4.5 Nearest Neighbor examples
  4.6 Image distance
  4.7 Evaluation
  4.8 K-Nearest Neighbor Classifier
  4.9 Validation dataset vs Test dataset
  4.10 First problem: the classifier
  4.11 Second problem: the Raw Pixel representation
5 Representation Problem: From pixel to feature
  5.1 Two methods
  5.2 Bag of Visual Words: Pipeline
  5.3 Low-level feature extraction
  5.4 Characteristics of good low-level features
  5.5 Example visual vocabulary
  5.6 Image Representation
  5.7 Summary: Image representation pipeline
  5.8 From hand-crafted features to feature learning
  5.9 Convolutional Neural Network: transfer
  5.10 Inspiration
  5.11 Hierarchy of features
6 Classifier problem
  6.1 Score and Loss functions: example
  6.2 Score and Loss functions
  6.3 Score function: Linear Classifier
  6.4 Loss Function: Support Vector Machine
  6.5 Linear Classifier: cartoon representation
  6.6 Non-linear problems
7 Applications: CV exploits NLP and vice-versa
8 Computer Vision exploits language
  8.1 Traditional CV task: Object recognition
  8.2 Object recognition: methods
  8.3 Corpora as KB source: Object recognition
  8.4 Corpora as KB source: Action recognition
  8.5 Caption generation
  8.6 Caption generation: biblio
9 Visual Question Answering
10 NLP exploits vision
  10.1 Lexical Preference
  10.2 Translation
  10.3 Co-reference Resolution
  10.4 Co-reference Resolution
11 Summary: CV and NLP
12 Foundational: Grounding
13 Foundational: Reference
14 Data Set
  14.1 CIFAR
  14.2 ImageNet
  14.3 VisA
  14.4 SUN
15 Dataset for sentence-based image description
  15.1 Online Caption?
  15.2 Photo-sharing?
  15.3 Photo-sharing?
  15.4 IAPR-TC12 data set
  15.5 ILLINOIS PASCAL data set
  15.6 Crowdsource
  15.7 Crowdsource results
  15.8 LabelMe
16 Demos TBD
17 Software
18 Language and Vision Research Groups
19 Language and Vision
20 Other Useful Links

1. Credits
Honglak Lee, L. Fei-Fei, Tamara Berg, Angeliki Lazaridou, Elia Bruni, Marco Baroni, Desmond Elliott, Douwe Kiela

2. What is (Computer) Vision

2.1. Interdisciplinary

2.2. How did it start?

2.3. What is the goal of Computer Vision?

3. How to represent an image: Pixels
A raw image is represented by its pixels (a pixel is the smallest element of an image). Each pixel is identified by its coordinates and stored as one or more numbers encoding its color intensity. For instance, a black-and-white (grayscale) image stores a single brightness value per pixel, while a color image is a 3-D array of intensity values, with three values (one per color channel) at each position:

$$f(x, y) = \big(\mathrm{red}(x, y),\ \mathrm{green}(x, y),\ \mathrm{blue}(x, y)\big)$$

where color(x, y) is the intensity of that color channel at position (x, y). Pixel representations are not suitable if we want to retrieve images similar to a given one, recognize the object in an image, or perform similar tasks: for these we need a more abstract representation of the image.
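As a minimal sketch of the definitions above, assuming a tiny image stored as nested Python lists (rows indexed by y, columns by x, intensities in 0-255), a grayscale image holds one number per pixel and a color image holds a (red, green, blue) triple, so f(x, y) is a simple lookup. The names `f` and `to_gray` are illustrative only; `to_gray` uses an unweighted channel mean, which is just one of several possible grayscale conversions.

```python
from typing import List, Tuple

# Grayscale image: one brightness value (0-255) per pixel.
Gray = List[List[int]]
# Color image: a (red, green, blue) triple per pixel, i.e. a 3-D array.
RGB = List[List[Tuple[int, int, int]]]

def f(image: RGB, x: int, y: int) -> Tuple[int, int, int]:
    """Return (red(x, y), green(x, y), blue(x, y)); rows are indexed by y, columns by x."""
    return image[y][x]

def to_gray(image: RGB) -> Gray:
    """Collapse the three channels into one brightness value (unweighted integer mean)."""
    return [[sum(pixel) // 3 for pixel in row] for row in image]

# A tiny 2x2 color image.
img: RGB = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
print(f(img, 1, 0))   # (0, 255, 0)
print(to_gray(img))   # [[85, 85], [85, 255]]
```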

3.1. How to represent an image: Keep all the pixels
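The slide's figure did not survive extraction. Assuming "keep all the pixels" means concatenating every intensity value into one long vector (the raw-pixel representation used in the cs231n notes, where a 32x32 color image becomes 32 x 32 x 3 = 3072 numbers), the step can be sketched as follows; `flatten` is a hypothetical helper name.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255

def flatten(image: List[List[Pixel]]) -> List[int]:
    """Keep all the pixels: list every channel value, row by row, left to right."""
    return [value for row in image for pixel in row for value in pixel]

# A 2x2 color image becomes a vector of 2 * 2 * 3 = 12 numbers.
img = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
vec = flatten(img)
print(len(vec))   # 12
print(vec[:6])    # [255, 0, 0, 0, 255, 0]
```

The vector preserves every detail, but it grows with image size and is sensitive to small changes such as shifts.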

3.2. How to represent an image: Compute average pixel
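At the opposite extreme, the whole image can be summarized by its average pixel. The sketch below assumes the same nested-list color images; `average_pixel` is a hypothetical name. It also shows the weakness of this representation: two different images (one the mirror of the other) collapse to the same single color.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255

def average_pixel(image: List[List[Pixel]]) -> Tuple[float, float, float]:
    """Summarize the whole image by its mean red, green and blue intensity."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean_r = sum(p[0] for p in pixels) / n
    mean_g = sum(p[1] for p in pixels) / n
    mean_b = sum(p[2] for p in pixels) / n
    return (mean_r, mean_g, mean_b)

# Two different 1x2 images (one is the mirror of the other)...
img_a = [[(255, 0, 0), (0, 0, 255)]]
img_b = [[(0, 0, 255), (255, 0, 0)]]
# ...get exactly the same representation.
print(average_pixel(img_a))                          # (127.5, 0.0, 127.5)
print(average_pixel(img_a) == average_pixel(img_b))  # True
```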


3.3. How to represent an image: Spatial grid of average pixel colors?
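A compromise between keeping all pixels and a single average is to split the image into a grid of cells and average each cell, which yields a tiny, coarse version of the image. This is a sketch under stated assumptions: images are nested lists of (red, green, blue) triples, the height and width are multiples of the cell size, and `grid_average` and `mean_color` are hypothetical names.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Color = Tuple[float, float, float]
Image = List[List[Pixel]]

def mean_color(block: List[Pixel]) -> Color:
    """Average red, green and blue over a list of pixels."""
    n = len(block)
    return (sum(p[0] for p in block) / n,
            sum(p[1] for p in block) / n,
            sum(p[2] for p in block) / n)

def grid_average(image: Image, cell: int) -> List[List[Color]]:
    """Replace each cell x cell block by its mean color.

    Assumes the image height and width are multiples of `cell`.
    """
    height, width = len(image), len(image[0])
    grid = []
    for top in range(0, height, cell):
        grid_row = []
        for left in range(0, width, cell):
            block = [image[y][x]
                     for y in range(top, top + cell)
                     for x in range(left, left + cell)]
            grid_row.append(mean_color(block))
        grid.append(grid_row)
    return grid

# A 2x4 image: left half red, right half blue.
red, blue = (255, 0, 0), (0, 0, 255)
img = [[red, red, blue, blue],
       [red, red, blue, blue]]
print(grid_average(img, 2))  # [[(255.0, 0.0, 0.0), (0.0, 0.0, 255.0)]]
```

Unlike a single average pixel, the grid still records which region has which color, so a mirrored image gets a different representation.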

3.4. Image representation challenges: Invariance
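One way to see the invariance problem is to shift an image by a single pixel. In this illustrative sketch (3x3 grayscale images as nested lists; `flatten`, `l1_distance` and `mean` are hypothetical helper names), the shifted "object" looks the same to a person, yet the raw pixel vectors are far apart under the L1 distance (sum of absolute differences). The average pixel is unaffected by the shift, but only because it discards the layout altogether.

```python
from typing import List

def flatten(image: List[List[int]]) -> List[int]:
    """Raw-pixel vector of a grayscale image, row by row."""
    return [v for row in image for v in row]

def l1_distance(a: List[int], b: List[int]) -> int:
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def mean(image: List[List[int]]) -> float:
    """Average brightness of a grayscale image."""
    values = flatten(image)
    return sum(values) / len(values)

# The same bright "object", shifted one pixel to the right.
original = [[0, 0, 0],
            [0, 255, 0],
            [0, 0, 0]]
shifted = [[0, 0, 0],
           [0, 0, 255],
           [0, 0, 0]]

print(l1_distance(flatten(original), flatten(shifted)))  # 510
print(mean(original) == mean(shifted))                   # True
```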

4. A CV sample task: Object Classification
Slides taken from http://cs231n.github.io/classification/

4.1. Object Classification

4.2. Data Driven
The data-driven approach relies on first accumulating a training dataset of labeled images.
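To make the data-driven idea concrete, here is a minimal sketch of the simplest such classifier, the nearest neighbor classifier listed in the contents, written in the train/predict style of the cs231n notes. Assumptions: images are already flattened into equal-length lists of intensities, and the image distance is the L1 distance (sum of absolute differences). The class name and the toy "bright"/"dark" data are illustrative, not a library API or real results.

```python
from typing import List, Sequence

class NearestNeighbor:
    """Memorize labeled training images; label a new image with the label
    of its closest training image under the L1 distance."""

    def train(self, images: List[Sequence[int]], labels: List[str]) -> None:
        # "Training" simply stores the labeled dataset.
        self.images = list(images)
        self.labels = list(labels)

    def predict(self, image: Sequence[int]) -> str:
        # Compare the query against every stored training image.
        distances = [sum(abs(a - b) for a, b in zip(image, train_img))
                     for train_img in self.images]
        # Index of the smallest distance.
        best = min(range(len(distances)), key=distances.__getitem__)
        return self.labels[best]

# Toy dataset: flattened 2x2 grayscale "images".
train_images = [[250, 240, 245, 255], [10, 5, 0, 20]]
train_labels = ["bright", "dark"]

clf = NearestNeighbor()
clf.train(train_images, train_labels)
print(clf.predict([230, 220, 250, 240]))  # bright
```

Training is instant but prediction compares the query with every stored image, so its cost grows with the size of the training set.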
