Contextual Media Retrieval Using Natural Language Queries

IMPRS-CS PhD Application Talk (Master's Thesis)
Sreyasi Nag Chowdhury
Supervisors: Dr. Mario Fritz, Dr. Andreas Bulling
Adviser: M.Sc. Mateusz Malinowski


  1. Question–Answering: a dynamic world, with media as the answer. "What is there in front of MPI-INF?" is an ambiguous query; it is subjective, with multiple correct answers from our Q&A model. Sreyasi Nag Chowdhury | Contextual Media Retrieval Using Natural Language Queries | 23-02-2015

  3. Dynamic-Egocentric Extension
     World(x)
     Static World(x_s):
       cafe('mensa', 49.2560, 7.0454).
       building('mpi_inf', 49.2578, 7.0460).
     Dynamic World(x_d):
       User Metadata(x_du):
         person(49.2578, 7.0460, 'n').
         day(20150220).
       Collective Memory(x_dm):
         image('img_20141111_165828', 20141111, 49.2566, 7.0442, 'november').
         video('vid_20141121_120149', 20141121, 49.2569, 7.0456, 'november').

  10. Qualitative Results (figures)

  17. Outline • Motivation and Overview • Contextual Media Retrieval System • Results and Conclusion

  18. Evaluation: Agreement and Disagreement between users (model tested on 500 test queries)
     • Total agreement: 26.67%
     • Majority agreement: ~40%
     • Disagreement: ~25% (future scope for personalization)

  23. Evaluation: Study of human reference frame resolution (future scope for using other knowledge bases)

  25. Summary. We have:
     • Instantiated a "Collective Memory" of media content
     • Developed a novel architecture for media retrieval with natural language voice queries in a dynamic setting: Xplore-M-Ego
     • Integrated 'egocentrism' into media retrieval
     Thank You

  27. References
     • Photo Tourism: Exploring Photo Collections in 3D. N. Snavely, S. M. Seitz, R. Szeliski
     • Video Collections in Panoramic Contexts. J. Tompkin, F. Pece, R. Shah, S. Izadi, J. Kautz, C. Theobalt
     • Videoscapes: Exploring Sparse, Unstructured Video Collections. J. Tompkin, K. I. Kim, J. Kautz, C. Theobalt
     • PhotoScope: Visualizing Spatiotemporal Coverage of Photos for Construction Management. F. Wu, M. Tory

  28. References
     • Learning Dependency-Based Compositional Semantics. P. Liang, M. I. Jordan, D. Klein
     • A Multi-World Approach to Question Answering about Real-World Scenes Based on Uncertain Input. M. Malinowski, M. Fritz
     • Image Retrieval with Structured Object Queries Using Latent Ranking SVM. T. Lan, W. Yang, Y. Wang, G. Mori
     • Interpretation of Spatial Language in a Map Navigation Task. M. Levit, D. Roy

  29. Extra Material

  30. Contribution
     • Instantiation of a "Collective Memory" of media files
     • Extension of question-answering to a dynamic setting
     • Extension of spatio-temporal exploration of media to a dynamic setting
     • Incorporation of 'egocentrism' into media retrieval
     • Use of natural language voice queries for media retrieval

  31. System Overview. Modules of Xplore-M-Ego:
     • Google Glass: user interface
     • Pre-processing: modification of the query; mapping of a dynamic environment to a static environment
     • Semantic Parser + Denotation: semantic parsing and prediction of the answer
     • Collective Memory: store of media files

  32. Related Work • Spatio-temporal Media Retrieval
     Paper | Author(s) | Overview
     Photo tourism: exploring photo collections in 3D | N. Snavely, S. M. Seitz, R. Szeliski | Exploration of popular world sites by browsing through images
     Video collections in panoramic contexts | J. Tompkin, F. Pece, R. Shah, S. Izadi, J. Kautz, C. Theobalt | Spatio-temporal exploration of videos embedded in a panoramic context

  33. Related Work • Natural Language Question-Answering
     Paper | Author(s) | Overview
     Learning dependency-based compositional semantics | P. Liang, M. I. Jordan, D. Klein | Training of a semantic parser with question-answer pairs; single static-world approach
     A multi-world approach to question answering about real-world scenes based on uncertain input | M. Malinowski, M. Fritz | Question-answering task based on real-world indoor images; static multi-world approach

  34. Related Work • Media Retrieval with Natural Language Queries
     Paper | Author(s) | Overview
     Towards surveillance video search by natural language query | S. Tellex, D. Roy | Retrieval of video frames from surveillance videos with the spatial relations "across" and "along"
     Image retrieval with structured object queries using latent ranking SVM | T. Lan, W. Yang, Y. Wang, G. Mori | Retrieval of images based on scene contents, using short structured phrases as queries

  35. Data Collection 1. Map information: OpenStreetMap. Contains: type of the entity, GPS coordinates, name, address.

  36. Data Collection 2. Collection of media files: the Collective Memory. (Media files were captured with smartphones.)

  37. Data Collection 3. Training and Test Data
     Synthetically-generated data:
     ("What is there in front of MPI-INF?", answer(A, (frontOf(A, 'mpi_inf'))))
     ("What is there behind MPI-INF?", answer(A, (behind(A, 'mpi_inf'))))
     ("What is there on the right of MPI-INF?", answer(A, (rightOf(A, 'mpi_inf'))))
     ("What is there on the left of MPI-INF?", answer(A, (leftOf(A, 'mpi_inf'))))
     Real-world data:
     ("What is there on the left of MPI-INF?", 'img_20141102_123406')
     ("What is on the left of MPI-INF?", 'img_20141113_160930')
     ("What is to the left of MPI-INF?", 'img_20141109_134914')
     ("What is on the left side of MPI-INF?", 'img_20141115_100705')
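The synthetic pairs above follow fixed question templates over map entities. A minimal sketch of how such pairs could be generated; the template list, function name, and entity tuples are illustrative assumptions, not the thesis code:

```python
# Sketch: generate synthetic (question, logical-form) training pairs
# from map entities, following the template pattern shown above.
# TEMPLATES and synthesize_pairs are assumed names for illustration.

TEMPLATES = [
    ("What is there in front of {name}?", "frontOf"),
    ("What is there behind {name}?", "behind"),
    ("What is there on the right of {name}?", "rightOf"),
    ("What is there on the left of {name}?", "leftOf"),
]

def synthesize_pairs(entities):
    """Return one (question, logical_form) pair per entity and relation."""
    pairs = []
    for display_name, entity_id in entities:
        for question, predicate in TEMPLATES:
            q = question.format(name=display_name)
            lf = f"answer(A, ({predicate}(A, '{entity_id}')))"
            pairs.append((q, lf))
    return pairs

for q, lf in synthesize_pairs([("MPI-INF", "mpi_inf")]):
    print(q, "->", lf)
```

With one entity and four relation templates, this yields the four synthetic pairs shown on the slide.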

  38. Data Collection (figure)

  39. Semantic Parser. Dependency-based Compositional Semantics (DCS) by Percy Liang • A DCS tree defines relations between predicates • Denotations are solutions satisfying the relations • city, major, loc, CA are predicates

  40. Semantic Parser
     Example questions:
     • "What is the highest point in Florida?"
     • "Which state has the shortest river?"
     • "What is the capital of Maine?"
     • "What are the populations of states through which the Mississippi river runs?"
     • "Name all the lakes of US."
     World(w):
     state('california','ca','sacramento',23.67e+6,158.0e+3,31,'los angeles','san diego','san francisco','san jose').
     city('alabama','al','birmingham',284413).
     river('arkansas',2333,['colorado','kansas','oklahoma','arkansas']).
     mountain('alaska','ak','mckinley',6194).
     road('86',['massachusetts','connecticut']).
     country('usa',307890000,9826675).

  41. Semantic Parser: Learning in DCS (slide courtesy: Percy Liang)

  42. Semantic Parser • Induction of logical forms • Logical forms (DCS trees) are induced as latent variables according to a probability distribution parametrized by θ • The answer y is evaluated with respect to the world w

  43. Semantic Parser • Induction of logical forms. Requirements: a set of rules/predicates:
     city(cityid(City,St)) :- state(State,St,_,_,_,_,City,_,_,_).
     loc(cityid(City,St),stateid(State)) :- state(State,St,_,_,_,_,City,_,_,_).
     river(riverid(R)) :- river(R,_,_).
     loc(cityid(City,St),stateid(State)) :- city(State,St,City,_).
     traverse(riverid(R),stateid(S)) :- river(R,_,States), member(S,States).
     area(stateid(X),squared_mile(Area)) :- state(X,_,_,_,Area,_,_,_,_,_).
     population(countryid(X),Pop) :- country(X,Pop,_).
     major(X) :- city(X), population(X,moreThan(150000)).

  44. Semantic Parser • Induction of logical forms. Requirements: a set of lexical triggers (L), of the forms <(function word; predicate)> and <([POS tags]; [predicates])>:
     (most, size). (total, sum). (called, nameObj).
     (WRB; loc). (JJ; major).
     ([NN;NNS]; [city,state,country,lake,mountain,river,place])
     ([NN;NNS]; [person,capital,population])
     ([NN;NNS;JJ]; [len,negLen,size,negSize,elevation,negElevation,density,negDensity,area,negArea])
     Augmented lexicon (L+): (long, len). (large, size). (small, negSize). (high, elevation).

  45. Media Retrieval from Denotations
     Example questions:
     • "What is there on the right of MPI-INF?"
     • "What is there in front of postbank?"
     • "What is there on the left of Mensa?"
     • "What is there near Science Park?"
     • "What happened here one day ago?"
     • "What does this place look like in December?"
     World(w):
     image('img_20141111_165828',20141111,49.2566,7.0442,'november').
     video('vid_20141121_120149',20141121,49.2569,7.0456,'november').
     cafe('mensa',49.2560,7.0454).
     building('mpi_inf',49.2578,7.0460).
     bank('postbank',49.2556,7.0449).
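Denotations are resolved against the world facts above by matching GPS coordinates. A minimal sketch of one such relation ("near"), assuming a flat Euclidean approximation over a small campus area; the function name and radius are assumptions, not the thesis implementation:

```python
# Sketch: resolve a parsed relation like near(A, 'mpi_inf') against
# the world facts by GPS proximity. Euclidean distance in degrees is
# an assumed simplification; the default radius is illustrative.
import math

PLACES = {                      # entity -> (lat, lon), from the world facts
    "mensa": (49.2560, 7.0454),
    "mpi_inf": (49.2578, 7.0460),
    "postbank": (49.2556, 7.0449),
}
MEDIA = [                       # (file, lat, lon), from the world facts
    ("img_20141111_165828", 49.2566, 7.0442),
    ("vid_20141121_120149", 49.2569, 7.0456),
]

def near(entity, radius=0.002):
    """Return media files whose GPS position lies within `radius`
    degrees of the named entity."""
    lat, lon = PLACES[entity]
    return [f for f, mlat, mlon in MEDIA
            if math.hypot(mlat - lat, mlon - lon) <= radius]

print(near("mpi_inf"))   # ['vid_20141121_120149']
```

Directional relations such as frontOf or leftOf additionally require a viewing direction, which is exactly what the dynamic-egocentric extension supplies from user metadata.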

  46. Dynamic-Egocentric Extension. Lexical triggers:
     Basic lexicon L (prediction accuracy: 17.9%):
     ([WP,WDT], [image,video]).
     (NN, [atm,building,cafe,highway,parking,research_institution,restaurant,shop,sport,tourism,university]).
     (JJS, [nearest]). ([NN,NNS,VB], [view]). (VBD, [view]).
     Augmented lexicon L+ (prediction accuracy: 47%):
     (front, frontOf). (behind, behind). (right, rightOf). (left, leftOf).

  47. Dynamic-Egocentric Extension (figures)

  49. POS Tags from Penn Treebank • WRB: Wh-adverb • WP: Wh-pronoun • WDT: Wh-determiner • NN: Noun, singular or mass • NNS: Noun, plural • JJ: Adjective • JJS: Adjective, superlative • VB: Verb, base form • VBD: Verb, past tense

  50. Reason behind hard-coding spatial relations • What is there left/VBN of MPI? • What is there on the left/NN of MPI? • What is there in front/NN of MPI? • What is there behind/IN MPI? • What is there right/RB of MPI? • What is there on the right/NN of MPI?

  51. Predicates used in Xplore-M-Ego (figure)

  52. Results and Evaluation • Synthetically generated question-answer pairs used for training and testing • Maximum prediction accuracy: 47%

  53. Results and Evaluation. Performance measures:
     • q_m = number of queries with media retrievals
     • q_r = number of queries with relevant retrievals among q_m
     • q_t = number of queries with textual retrievals and no retrievals
     • average precision = q_r / q_m
     • average recall = q_r / (q_m + q_t)
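The two measures above are simple ratios over query counts. A minimal sketch, with illustrative numbers (the example counts are assumptions, not the thesis results):

```python
# Sketch of the performance measures above: q_m, q_r, q_t are counts
# over the test queries. The example counts below are illustrative.

def average_precision(q_r, q_m):
    """Relevant retrievals over queries that returned media."""
    return q_r / q_m

def average_recall(q_r, q_m, q_t):
    """Relevant retrievals over all queries, including those that
    returned only text or nothing."""
    return q_r / (q_m + q_t)

# Illustrative: 40 relevant among 150 media-returning queries,
# plus 100 queries with textual or no retrievals.
print(average_precision(40, 150))    # ~0.267
print(average_recall(40, 150, 100))  # 0.16
```

Recall is always at most precision here, since the denominator only grows with the queries that retrieved no media.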

  54. Results and Evaluation • "Human-in-the-loop" training of the model o Five different models were trained o Training accuracies ranged from 42.6% to 48.8% o The best model based on training accuracy was used for further evaluations

  55. Results and Evaluation • "Human-in-the-loop" training of the model • A method of training the semantic parser by human users through relevance feedback • "Correct"/"Wrong" decisions are made solely based on the predicted answers • The models are trained with real questions from human users

  56. Results and Evaluation • "Human-in-the-loop" training of the model o Automatic training of the semantic parser with the real data was not possible because: • GPS coordinates of media files showing a particular entity do not match those of the map data • Humans are inconsistent with regard to reference frames • Question-answer pairs did not follow any pattern • Denotations (often more than one answer) never matched the true answers, so an EM-like algorithm failed to learn

  57. Results and Evaluation. Human evaluation of the model trained with real-world data • RealModel: model trained with real-world data • Relevance feedback collected from five users • Overall percentage of relevant retrievals = 26.67%

  58. Results and Evaluation • Recall of SynthModel = 15.88% • Recall of RealModel = 26.67%

  59. Evaluation. Human evaluation of temporal and contextual Q&A • Five hypothetical locations and viewing directions provided to users • Relevance feedback collected for retrievals following a canonical reference frame and a user-centric reference frame

  60. Evaluation. Human evaluation of temporal and contextual Q&A • Canonical vs. user-centric reference frame: "front of", "right of", "left of", "behind". With the user heading East: Original: "What is there in front of MPI-INF?" Altered: "What is there on the right of MPI-INF?"
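The "heading East" example above amounts to rotating an egocentric relation into the canonical, north-facing map frame. A minimal sketch under the assumption that headings are quantized to the four compass directions; the function and table names are illustrative:

```python
# Sketch: remap a spatial relation from the user's egocentric frame
# to the canonical (north-facing) frame, given the user's compass
# heading. Each 90-degree turn shifts the relation one step clockwise.
# Relation names follow the lexicon shown earlier in the deck.

RELATIONS = ["frontOf", "rightOf", "behind", "leftOf"]  # clockwise order
HEADINGS = {"north": 0, "east": 1, "south": 2, "west": 3}

def to_canonical(relation, heading):
    """Rotate an egocentric relation by the user's heading."""
    shift = HEADINGS[heading]
    idx = RELATIONS.index(relation)
    return RELATIONS[(idx + shift) % len(RELATIONS)]

# User heading east: egocentric "in front of" is canonical "right of",
# matching the Original/Altered query pair on the slide.
print(to_canonical("frontOf", "east"))   # rightOf
```

A user facing north needs no remapping (shift 0), which is why the canonical frame can be treated as the north-facing special case.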

  61. Discussion. Problem with matching GPS coordinates: for the query "What is in front of MPI-INF?", the retrieved media shows iCoffee, while the ground-truth media shows the front of MPI-INF.

  62. Discussion
     Challenges:
     • Converting a dynamic world to a static world
     • Integrating 'egocentrism'
     • Handling temporal queries
     • Collection of data
     • Increasing the coverage of the static database
     Limitations:
     • Spatial and temporal references not identified
     • Words tagged with incorrect POS tags
     • Arguments not identified from sentences
     • Scalability
     • Reference resolution is not handled
