Question-Answering
Query: "What is there in front of MPI-INF?"
• Dynamic world
• Media as answer
• Ambiguous query
• Subjective; multiple correct answers
• Our Q&A Model
Sreyasi Nag Chowdhury | Contextual Media Retrieval Using Natural Language Queries 23-02-2015 9

Dynamic-Egocentric Extension
World (w)
• Static World (w_s)
  cafe('mensa', 49.2560, 7.0454).
  building('mpi_inf', 49.2578, 7.0460).
• Dynamic World (w_d)
  o User Metadata (w_du)
    person(49.2578, 7.0460, 'n').
    day(20150220).
  o Collective Memory (w_dm)
    image('img_20141111_165828', 20141111, 49.2566, 7.0442, 'november').
    video('vid_20141121_120149', 20141121, 49.2569, 7.0456, 'november').

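The split of the world into a static part and a dynamic part (user metadata plus collective memory) can be pictured with a small data structure. This is only a sketch: the `World` class, its field names, and the `to_fact` helper are hypothetical and are not taken from the Xplore-M-Ego code; the facts themselves are the ones shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    """Hypothetical container: w = w_s + w_d, with w_d = w_du + w_dm."""
    static: list = field(default_factory=list)             # w_s: map entities
    user_metadata: list = field(default_factory=list)      # w_du: user position, heading, date
    collective_memory: list = field(default_factory=list)  # w_dm: geotagged media

    @property
    def dynamic(self):
        # The dynamic world changes per query; the static world does not.
        return self.user_metadata + self.collective_memory

    def facts(self):
        return self.static + self.dynamic


def to_fact(predicate, *args):
    """Render a tuple as a Prolog-style fact string."""
    rendered = ", ".join(repr(a) if isinstance(a, str) else str(a) for a in args)
    return f"{predicate}({rendered})."


world = World(
    static=[("cafe", "mensa", 49.2560, 7.0454),
            ("building", "mpi_inf", 49.2578, 7.0460)],
    user_metadata=[("person", 49.2578, 7.0460, "n"), ("day", 20150220)],
    collective_memory=[
        ("image", "img_20141111_165828", 20141111, 49.2566, 7.0442, "november"),
        ("video", "vid_20141121_120149", 20141121, 49.2569, 7.0456, "november"),
    ],
)

for fact in world.facts():
    print(to_fact(*fact))
```

Keeping the dynamic facts separate makes it easy to rebuild them for every query while the map data stays fixed.
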
Qualitative Results
Outline
• Motivation and Overview
• Contextual Media Retrieval System
• Results and Conclusion

Evaluation
Agreement and disagreement between users
• Total agreement: 26.67%
• Majority agreement: ~40%
• Disagreement: ~25% (future scope for personalization)
* Model tested on 500 test queries

Evaluation
Study of human reference frame resolution
• Future scope for using other knowledge bases

Summary
We have:
• Instantiated a "Collective Memory" of media content
• Developed a novel architecture for media retrieval with natural language voice queries in a dynamic setting: Xplore-M-Ego
• Integrated "egocentrism" into media retrieval
Thank You

References
• N. Snavely, S. M. Seitz, R. Szeliski. Photo Tourism: Exploring Photo Collections in 3D.
• J. Tompkin, F. Pece, R. Shah, S. Izadi, J. Kautz, C. Theobalt. Video Collections in Panoramic Contexts.
• J. Tompkin, K. In Kim, J. Kautz, C. Theobalt. Videoscapes: Exploring Sparse, Unstructured Video Collections.
• F. Wu, M. Tory. PhotoScope: Visualizing Spatiotemporal Coverage of Photos for Construction Management.
• P. Liang, M. I. Jordan, D. Klein. Learning Dependency-Based Compositional Semantics.
• M. Malinowski, M. Fritz. A Multi-World Approach to Question Answering about Real-World Scenes Based on Uncertain Input.
• T. Lan, W. Yang, Y. Wang, G. Mori. Image Retrieval with Structured Object Queries Using Latent Ranking SVM.
• M. Levit, D. Roy. Interpretation of Spatial Language in a Map Navigation Task.

Extra Material
Contribution
• Instantiation of a "Collective Memory" of media files
• Extension of question-answering to a dynamic setting
• Extension of spatio-temporal exploration of media to a dynamic setting
• Incorporation of "egocentrism" into media retrieval
• Use of natural language voice queries for media retrieval

System Overview
Modules of Xplore-M-Ego
• The Google Glass: user interface
• Pre-processing: modification of the query, mapping of a dynamic environment to a static environment
• Semantic Parser + Denotation: semantic parsing and prediction of the answer
• Collective Memory: store of media files

Related Work
• Spatio-temporal Media Retrieval
Paper | Author(s) | Overview
Photo tourism: exploring photo collections in 3D | N. Snavely, S. M. Seitz, and R. Szeliski | Exploration of popular world sites by browsing through images
Video collections in panoramic contexts | J. Tompkin, F. Pece, R. Shah, S. Izadi, J. Kautz, and C. Theobalt | Spatio-temporal exploration of videos embedded in a panoramic context

Related Work
• Natural Language Question-Answering
Paper | Author(s) | Overview
Learning dependency-based compositional semantics | P. Liang, M. I. Jordan, and D. Klein | Training of a semantic parser with question-answer pairs; single static world approach
A multi-world approach to question answering about real-world scenes based on uncertain input | M. Malinowski and M. Fritz | Question-answering task based on real-world indoor images; static multi-world approach

Related Work
• Media Retrieval with Natural Language Queries
Paper | Author(s) | Overview
Towards surveillance video search by natural language query | S. Tellex and D. Roy | Retrieval of video frames from surveillance videos with the spatial relations "across" and "along"
Image retrieval with structured object queries using latent ranking SVM | T. Lan, W. Yang, Y. Wang, and G. Mori | Retrieval of images based on scene contents using short structured phrases as queries

Data Collection
1. Map information: OpenStreetMap
Contains:
• Type of the entity
• GPS coordinates
• Name
• Address

Data Collection
2. Collection of media files: Collective Memory
** Media files were captured with smartphones

Data Collection
3. Training and test data
• Synthetically generated data
  ("What is there in front of MPI-INF?", answer(A, (frontOf(A, 'mpi_inf'))))
  ("What is there behind MPI-INF?", answer(A, (behind(A, 'mpi_inf'))))
  ("What is there on the right of MPI-INF?", answer(A, (rightOf(A, 'mpi_inf'))))
  ("What is there on the left of MPI-INF?", answer(A, (leftOf(A, 'mpi_inf'))))
• Real-world data
  ("What is there on the left of MPI-INF?", 'img_20141102_123406')
  ("What is on the left of MPI-INF?", 'img_20141113_160930')
  ("What is to the left of MPI-INF?", 'img_20141109_134914')
  ("What is on the left side of MPI-INF?", 'img_20141115_100705')

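The synthetic pairs above follow a fixed template, so they can be produced mechanically. The sketch below is an assumption about how such pairs could be generated, not the generator actually used; `RELATION_PHRASES` and `synthetic_pairs` are hypothetical names, and only the four relations shown on the slide are covered.

```python
# Spatial phrase -> predicate, as they appear in the synthetic examples.
RELATION_PHRASES = {
    "in front of": "frontOf",
    "behind": "behind",
    "on the right of": "rightOf",
    "on the left of": "leftOf",
}


def synthetic_pairs(entities):
    """Build (question, logical form) pairs for every entity and relation.

    `entities` is a list of (display name, Prolog atom) tuples.
    """
    pairs = []
    for display_name, atom in entities:
        for phrase, predicate in RELATION_PHRASES.items():
            question = f"What is there {phrase} {display_name}?"
            logical_form = f"answer(A, ({predicate}(A, '{atom}')))"
            pairs.append((question, logical_form))
    return pairs


pairs = synthetic_pairs([("MPI-INF", "mpi_inf")])
for question, logical_form in pairs:
    print(question, "->", logical_form)
```

Real-world pairs differ in that the answer is a media file rather than a logical form, so they cannot be generated this way.
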
Semantic Parser
Dependency-based Compositional Semantics (DCS) by Percy Liang
• A DCS tree defines relations between predicates
• Denotations are the solutions that satisfy those relations
• In the example, city, major, loc and CA are predicates

Semantic Parser
Example questions:
• "What is the highest point in Florida?"
• "Which state has the shortest river?"
• "What is the capital of Maine?"
• "What are the populations of states through which the Mississippi river run?"
• "Name all the lakes of US?"
World (w):
state('california','ca','sacramento',23.67e+6,158.0e+3,31,'los angeles','san diego','san francisco','san jose').
city('alabama','al','birmingham',284413).
river('arkansas',2333,['colorado','kansas','oklahoma','arkansas']).
mountain('alaska','ak','mckinley',6194).
road('86',['massachusetts','connecticut']).
country('usa',307890000,9826675).

Semantic Parser
Learning in DCS
** Slide courtesy: Percy Liang

Semantic Parser
• Induction of logical forms
• Logical forms (DCS trees) are induced as latent variables according to a probability distribution parametrized by θ
• The answer y is evaluated with respect to the world w

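Written out, the latent-variable view on this slide could look as follows. This is a sketch in the spirit of the DCS paper listed in the references, not a formula taken from the slides: the feature map \phi, the candidate set Z(x) of logical forms for a question x, and the denotation bracket are assumed notation.

```latex
% Log-linear distribution over candidate logical forms z for a question x
p_\theta(z \mid x) =
  \frac{\exp\{\phi(x, z)^\top \theta\}}
       {\sum_{z' \in Z(x)} \exp\{\phi(x, z')^\top \theta\}}

% The answer y is obtained by evaluating z against the world w, so the
% probability of y sums over all logical forms whose denotation is y
p_\theta(y \mid x, w) =
  \sum_{z \in Z(x) \,:\, [\![ z ]\!]_w = y} p_\theta(z \mid x)
```

Because z is never observed, training only needs question-answer pairs; the logical form is marginalized out.
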
Semantic Parser
• Induction of logical forms
Requirements: a set of rules/predicates:
city(cityid(City,St)) :- state(State,St,_,_,_,_,City,_,_,_).
loc(cityid(City,St),stateid(State)) :- state(State,St,_,_,_,_,City,_,_,_).
river(riverid(R)) :- river(R,_,_).
loc(cityid(City,St),stateid(State)) :- city(State,St,City,_).
traverse(riverid(R),stateid(S)) :- river(R,_,States), member(S,States).
area(stateid(X),squared_mile(Area)) :- state(X,_,_,_,Area,_,_,_,_,_).
population(countryid(X),Pop) :- country(X,Pop,_).
major(X) :- city(X), population(X,moreThan(150000)).

Semantic Parser
• Induction of logical forms
Requirements: a set of lexical triggers (L), of the form <(function word; predicate)> or <([POS tags]; [predicates])>:
(most, size).
(total, sum).
(called, nameObj).
(WRB; loc)
([NN;NNS]; [city,state,country,lake,mountain,river,place])
([NN;NNS]; [person,capital,population])
([NN;NNS;JJ]; [len,negLen,size,negSize,elevation])
([NN;NNS;JJ]; [negElevation,density,negDensity,area,negArea])
(JJ; major)
Augmented lexicon (L+):
(long, len).
(large, size).
(small, negSize).
(high, elevation).

Media Retrieval from Denotations
Example questions:
• "What is there on the right of MPI-INF?"
• "What is there in front of postbank?"
• "What is there on the left of Mensa?"
• "What is there near Science Park?"
• "What happened here one day ago?"
• "What does this place look like in December?"
World (w):
image('img_20141111_165828', 20141111, 49.2566, 7.0442, 'november').
video('vid_20141121_120149', 20141121, 49.2569, 7.0456, 'november').
cafe('mensa', 49.2560, 7.0454).
building('mpi_inf', 49.2578, 7.0460).
bank('postbank', 49.2556, 7.0449).

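To make the step from a denotation to media concrete, here is a minimal sketch of a "near" query over the facts on this slide. It is an illustration only: the slides do not say how spatial predicates are evaluated, so the haversine distance, the 150 m default radius, and the names `ENTITIES`, `MEDIA`, `haversine_m` and `near` are all assumptions.

```python
import math

# Coordinates copied from the world facts on the slide.
ENTITIES = {
    "mensa": (49.2560, 7.0454),
    "mpi_inf": (49.2578, 7.0460),
    "postbank": (49.2556, 7.0449),
}
MEDIA = [
    ("img_20141111_165828", 49.2566, 7.0442),
    ("vid_20141121_120149", 49.2569, 7.0456),
]


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS points."""
    radius = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def near(entity, radius_m=150.0):
    """Return media names within radius_m of the entity, nearest first."""
    lat, lon = ENTITIES[entity]
    hits = [(haversine_m(lat, lon, mlat, mlon), name) for name, mlat, mlon in MEDIA]
    return [name for dist, name in sorted(hits) if dist <= radius_m]


print(near("mensa"))
```

Directional relations such as frontOf or leftOf would need a reference frame in addition to distance, which this sketch deliberately leaves out.
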
Dynamic-Egocentric Extension
Lexical triggers:
Basic lexicon L (prediction accuracy: 17.9%)
([WP,WDT], [image,video]).
(NN, [atm,building,cafe,highway,parking,research_institution,restaurant,shop,sport,tourism,university]).
(JJS, [nearest]).
([NN,NNS,VB], [view]).
(VBD, [view]).
Augmented lexicon L+ (prediction accuracy: 47%)
(front, frontOf).
(behind, behind).
(right, rightOf).
(left, leftOf).

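One way to read the two lexica: the basic lexicon maps POS tags to candidate predicates, while the augmented lexicon maps specific words to spatial predicates regardless of their tag. The lookup below is a hypothetical sketch of that idea (the function name and data layout are assumptions, not the DCS implementation); the trigger contents are copied from the slide.

```python
# Basic lexicon L: (set of POS tags, candidate predicates).
BASIC_TRIGGERS = [
    ({"WP", "WDT"}, ["image", "video"]),
    ({"NN"}, ["atm", "building", "cafe", "highway", "parking",
              "research_institution", "restaurant", "shop", "sport",
              "tourism", "university"]),
    ({"JJS"}, ["nearest"]),
    ({"NN", "NNS", "VB"}, ["view"]),
    ({"VBD"}, ["view"]),
]

# Augmented lexicon L+: word -> predicate, independent of the POS tag.
AUGMENTED_TRIGGERS = {
    "front": "frontOf",
    "behind": "behind",
    "right": "rightOf",
    "left": "leftOf",
}


def candidate_predicates(word, pos_tag, use_augmented=True):
    """Collect the predicates that a (word, POS tag) pair can trigger."""
    candidates = []
    for tags, predicates in BASIC_TRIGGERS:
        if pos_tag in tags:
            candidates.extend(predicates)
    if use_augmented and word.lower() in AUGMENTED_TRIGGERS:
        candidates.append(AUGMENTED_TRIGGERS[word.lower()])
    return candidates


# "left" tagged as VBN triggers nothing in L, but leftOf in L+.
print(candidate_predicates("left", "VBN", use_augmented=False))
print(candidate_predicates("left", "VBN"))
```

Keying the spatial words on the word itself sidesteps the tagger, which is consistent with the jump in prediction accuracy reported for L+.
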
POS tags from Penn Treebank
• WRB: Wh-adverb
• WP: Wh-pronoun
• WDT: Wh-determiner
• NN: Noun, singular or mass
• NNS: Noun, plural
• JJ: Adjective
• JJS: Adjective, superlative
• VB: Verb, base form
• VBD: Verb, past tense

Reason behind hard-coding spatial relations
The POS tagger assigns inconsistent tags to the same spatial words:
• What is there left/VBN of MPI?
• What is there on the left/NN of MPI?
• What is there in front/NN of MPI?
• What is there behind/IN MPI?
• What is there right/RB of MPI?
• What is there on the right/NN of MPI?

Predicates used in Xplore-M-Ego
Results and Evaluation
• Synthetically generated question-answer pairs used for training and testing
• Maximum prediction accuracy: 47%

Results and Evaluation
Performance measures:
• $N_m$ = number of queries with media retrievals
• $N_r$ = number of queries with relevant retrievals among $N_m$
• $N_t$ = number of queries with textual retrievals and no retrievals
• $\text{Overall precision} = \frac{N_r}{N_m}$
• $\text{Overall recall} = \frac{N_r}{N_m + N_t}$

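The two measures are simple ratios of query counts. The sketch below assumes the reading in which N_m counts queries with media retrievals, N_r counts the relevant ones among them, and N_t counts queries that returned text or nothing; the function names and the example counts are hypothetical and are not results from the study.

```python
def overall_precision(n_r, n_m):
    """Relevant retrievals as a fraction of queries that returned media."""
    return n_r / n_m if n_m else 0.0


def overall_recall(n_r, n_m, n_t):
    """Relevant retrievals as a fraction of all queries (media, textual or none)."""
    total = n_m + n_t
    return n_r / total if total else 0.0


# Hypothetical counts, chosen only to illustrate the arithmetic.
n_m, n_r, n_t = 40, 30, 10
print(overall_precision(n_r, n_m))
print(overall_recall(n_r, n_m, n_t))
```

Under this reading recall can never exceed precision, because the recall denominator also counts the queries that produced no media.
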
Results and Evaluation
• "Human-in-the-loop" training of the model
  o Five different models were trained
  o Training accuracies ranged from 42.6% to 48.8%
  o The model with the best training accuracy was used for further evaluations

Results and Evaluation
• "Human-in-the-loop" training of the model
• The semantic parser is trained by human users through relevance feedback
• "Correct"/"Wrong" decisions are made solely on the basis of the predicted answers
• The models are trained with real questions from human users

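A relevance-feedback loop of this kind can be sketched in a few lines. Everything here is an assumption made for illustration: the slides do not describe the interface or what happens to answers judged "Wrong", so this sketch simply keeps the pairs judged "Correct" as new training data; `collect_feedback`, `toy_predict` and `toy_judge` are hypothetical.

```python
def collect_feedback(questions, predict, judge):
    """Keep (question, predicted answer) pairs that a user marks as correct.

    predict: maps a question to the model's predicted answer.
    judge:   maps (question, answer) to True ("Correct") or False ("Wrong").
    """
    accepted = []
    for question in questions:
        answer = predict(question)
        if judge(question, answer):  # decision based only on the predicted answer
            accepted.append((question, answer))
    return accepted


# Toy stand-ins for the model and the human user.
def toy_predict(question):
    return "img_left" if "left" in question else "img_other"


def toy_judge(question, answer):
    return answer == "img_left"


training_pairs = collect_feedback(
    ["What is on the left of MPI-INF?", "What is behind MPI-INF?"],
    toy_predict,
    toy_judge,
)
print(training_pairs)
```

The accepted pairs would then be fed back into the parser's training set before the next round of questions.
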
Results and Evaluation
• "Human-in-the-loop" training of the model
  o Automatic training of the semantic parser with the real data was not possible because:
    • GPS coordinates of media files showing a particular entity do not match those of the map data
    • Humans are inconsistent with regard to reference frames
    • Question-answer pairs did not follow any pattern
    • Denotations (often more than one answer) never matched the true answers, so the EM-like algorithm failed to learn

Results and Evaluation
Human evaluation of the model trained with real-world data
• RealModel: model trained with real-world data
• Relevance feedback collected from five users
• Overall percentage of relevant retrievals = 26.67%

Results and Evaluation
• Recall of SynthModel = 15.88%
• Recall of RealModel = 26.67%

Evaluation
Human evaluation of temporal and contextual Q&A
• Five hypothetical locations and viewing directions were provided to users
• Relevance feedback was collected for retrievals following a canonical reference frame and a user-centric reference frame

Evaluation
Human evaluation of temporal and contextual Q&A
• Canonical and user-centric reference frames ("front of", "right of", "behind", "left of"), shown for a user heading East
• Original: What is there in front of MPI-INF?
• Altered: What is there on the right of MPI-INF?

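The East-heading example (in front of becomes on the right of) can be reproduced by rotating the relation according to the user's heading. The sketch assumes that the canonical frame coincides with a user heading North and that other headings rotate the relations in the same clockwise manner; only the East case is confirmed by the slide, and `to_canonical` is a hypothetical name.

```python
# Relations in clockwise order, starting from the front.
RELATIONS = ["frontOf", "rightOf", "behind", "leftOf"]

# Number of clockwise quarter turns from North (assumed canonical heading).
HEADING_STEPS = {"N": 0, "E": 1, "S": 2, "W": 3}


def to_canonical(relation, heading):
    """Rewrite a user-centric relation into the canonical reference frame."""
    idx = RELATIONS.index(relation)
    return RELATIONS[(idx + HEADING_STEPS[heading]) % len(RELATIONS)]


# User heading East: "in front of" is rewritten as "on the right of".
print(to_canonical("frontOf", "E"))
```

With this mapping the dynamic, user-centric query is reduced to a query against the static world, which matches the pre-processing role described for the system.
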
Discussion
Problem with matching GPS coordinates
Query: "What is in front of MPI-INF?"
[Figure: map showing MPI-INF, iCoffee, the retrieved media, the ground-truth media, and the region "front of MPI-INF"]

Discussion
Challenges
• Converting a dynamic world to a static world
• Integrating "egocentrism"
• Handling temporal queries
• Collection of data
• Increasing the coverage of the static database
Limitations
• Spatial and temporal references not identified
• Words tagged with incorrect POS tags
• Arguments not identified from sentences
• Scalability
• Reference resolution is not handled