Learning semantic relationships for better action retrieval in images
Inkyu An
Content
1. Background
2. Motivation
3. Related Work
4. Approach
5. Result
Background | Semantic?
What comes to mind when you see the picture below?
"There are many parked vehicles on either side of the road."
Background | Semantic labeling
http://rodrigob.github.io/are_we_there_yet/build/semantic_labeling_datasets_results.html#4d5352432d3231
Background | Semantic labeling
More complex - A wide variety of classes
Example classes: Collie, Retriever, Great Dane, Labrador Retriever, Pomeranian, Vizsla, Samoyed, Bull Terrier, Poodle, Yorkshire Terrier
Background | More and more complex
"She is stretching her right leg while listening to music."
Motivation | Action retrieval in images
Query image: "Person interacting with panda" -> Image Search -> ???
Motivation | Action retrieval in images
Query image: "Person interacting with panda" -> Image Search
Result of prior work: false positives
Motivation | Action retrieval in images
Query image: "Person interacting with panda"
Result images: "Person holding animals", "Person feeding panda", "Person feeding calf"
Relations shown: Implied-by, Mutual-exclusive, Type-of
Motivation | Action retrieval in images
Three kinds of relations
1. Implied-by
2. Type-of
3. Mutual-exclusive
HEX-graph: Large-scale object classification using label relation graphs [ECCV 2014]
Motivation | Action retrieval in images
"Person interacting with panda" is represented by a weight vector $w_A$
Skip-grams: Distributed Representations of Words and Phrases and their Compositionality [NIPS 2013]
Motivation | Action retrieval in images
They needed a score for the relationship between a pair of sentences.
Neural Tensor Network: Reasoning With Neural Tensor Networks for Knowledge Base Completion [NIPS 2013]
Related Work |
1. HEX-graph - Three kinds of relations
2. Skip-grams - Weight vectors of actions (sentences)
3. Neural Tensor Network - Scores of the relationship between pairs of actions
Related Work | HEX-graph _ Motivation
Classifier labels: Siberian Husky, Poodle, Bulldog, Bengal cat, Russian Blue, Dog, Cat
Related Work | HEX-graph _ Motivation
Classifier labels: Siberian Husky, Puppy, Dog, Cat
Relations between labels: Exclusion, Subsumption -> HEX-graph
Related Work | HEX-graph _ Problem Definition
<HEX-graph>
Nodes $V$: Dog, Cat, Puppy, Husky
Hierarchy edges $E_h$: subsumption
Exclusion edges $E_e$: exclusion
Relations:
- Dog, Puppy: subsumption
- Dog, Cat: exclusion
- Husky, Puppy: overlap
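A minimal sketch of how the hierarchy and exclusion edges above constrain which label sets are allowed. The helper name `is_legal` is hypothetical, and the Dog-Husky hierarchy edge is an assumption (the slide only states the Dog-Puppy subsumption explicitly). This illustrates the two edge types only; it is not the inference procedure of the HEX-graph paper.

```python
# Hierarchy (subsumption) edges as (parent, child): the child label implies the parent.
# ("Dog", "Husky") is an assumed edge; ("Dog", "Puppy") is stated on the slide.
HIERARCHY = [("Dog", "Husky"), ("Dog", "Puppy")]
# Exclusion edges: the two labels can never be active together.
EXCLUSION = [("Dog", "Cat")]


def is_legal(labels):
    """Return True if the set of active labels violates no edge."""
    labels = set(labels)
    for parent, child in HIERARCHY:
        # A child without its parent breaks subsumption.
        if child in labels and parent not in labels:
            return False
    for a, b in EXCLUSION:
        # Both ends of an exclusion edge active at once is illegal.
        if a in labels and b in labels:
            return False
    return True
```

Husky and Puppy share no edge, so `is_legal({"Dog", "Husky", "Puppy"})` holds, which is the "overlap" case listed above.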
Related Work | Skip-grams
- The training objective is to learn word vector representations that are good at predicting the nearby words
- The average log probability -> Training (input sentence, nearby words)
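For reference, the average log probability that skip-gram training maximizes, as given in the cited NIPS 2013 paper. Here $w_1, \dots, w_T$ are the words of the training sequence and $c$ is the size of the context window; these $w_t$ are words, not the action weight vectors used elsewhere in these slides.

```latex
\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\; j \ne 0} \log p\left(w_{t+j} \mid w_t\right)
```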
Related Work | Neural Tensor Networks (NTN)
- The model returns a high score if a pair is in the given relationship and a low one otherwise
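A minimal sketch of the NTN scoring function from the cited paper, $g(e_1, R, e_2) = u^\top \tanh(e_1^\top W^{[1:k]} e_2 + V [e_1; e_2] + b)$, written in plain Python for a single relation. The function name `ntn_score` and the list-based parameter layout are assumptions made for illustration.

```python
import math


def ntn_score(e1, e2, W, V, b, u):
    """Score one pair of vectors under one relation.

    e1, e2: vectors of length d
    W: k slices, each a d x d matrix (bilinear tensor)
    V: k rows of length 2d (linear layer on the concatenation)
    b, u: vectors of length k
    """
    concat = list(e1) + list(e2)
    score = 0.0
    for k in range(len(u)):
        # Bilinear term e1^T W[k] e2 for slice k.
        bilinear = sum(
            e1[i] * W[k][i][j] * e2[j]
            for i in range(len(e1))
            for j in range(len(e2))
        )
        # Standard linear term on the concatenated pair.
        linear = sum(V[k][m] * concat[m] for m in range(len(concat)))
        score += u[k] * math.tanh(bilinear + linear + b[k])
    return score
```

A higher return value means the pair is more likely to be in the relationship.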
Approach | Problem setup
A set of actions $\mathcal{A}$:
- Person riding bike
- Person riding horse
- Person preparing food
- Chef cooking pasta
- Person walking with a horse
Action: Person riding bike -> related images
Two SVO structures:
1. <subject, verb, object>
2. <subject, verb, prepositional object>
Approach | Problem setup _ three kinds of relations
1. Implied-by: Person preparing food / Chef cooking pasta
2. Type-of: Person doing football / Man playing soccer
3. Mutually exclusive: Person riding horse / Man riding camel
Approach | Full model
Full model:
$$C = C_{ac} + \alpha_r C_{rec} + \alpha_n C_{nlp} + \alpha_c C_{cons} + \lambda \lVert W \rVert_2^2$$
- $C_{ac}$: basic action retrieval model [Image + Action]
- $C_{rec}$: visual objective [Image + Action]
- $C_{nlp}$: language prior [only Action]
- $C_{cons}$: consistency objective [only Action]
- $\lVert W \rVert_2^2$: the weights in the model, $W = \{W_{im}, \{w_A\}_{A \in \mathcal{A}}, W_{rl}\}$
Approach | Basic action retrieval model
Action A ("Person riding bike") -> Skip-grams -> action weight vector $w_A$
Image $I$ -> CNN -> image feature $f_I = W_{im}\,\mathrm{CNN}(I)$
Action prediction loss:
$$C_{ac} = \sum_{A \in \mathcal{A}} \sum_{I^+ \in \mathcal{I}_A} \sum_{I^- \in \bar{\mathcal{I}}_A} \max\left(0,\ 1 + w_A^\top \left(f_{I^-} - f_{I^+}\right)\right)$$
$\mathcal{I}_A$: a set of positive images of A
$\bar{\mathcal{I}}_A$: a set of negative images of A
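A minimal sketch of the action prediction loss for a single action A, assuming the image features $f_I$ have already been computed and are passed in as plain lists. The names `dot` and `action_loss` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def action_loss(w_a, pos_feats, neg_feats):
    """Hinge ranking loss for one action.

    w_a: action weight vector
    pos_feats: features of positive images of A
    neg_feats: features of negative images of A
    Every positive image should score at least 1 higher than every negative.
    """
    total = 0.0
    for f_pos in pos_feats:
        for f_neg in neg_feats:
            total += max(0.0, 1.0 + dot(w_a, f_neg) - dot(w_a, f_pos))
    return total
```

Summing `action_loss` over all actions in the action set gives the full term.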
Approach | Relationship prediction
Goal: denote the relationship between actions A and B by a vector
$$r_{AB} = [r_{AB}^{i},\ r_{AB}^{t},\ r_{AB}^{m}] \in [0,1]^3$$
($i$: implied-by, $t$: type-of, $m$: mutually exclusive)
Action A ("Person riding bike") -> Skip-grams -> $w_A$
Action B ("Person riding camel") -> Skip-grams -> $w_B$
$(w_A, w_B)$ -> Neural Tensor Network $W_{rl}^{1:3}$ -> Softmax -> $r_{AB}$
$$r_{AB} = \mathrm{softmax}\left(\beta \left[ w_A \otimes W_{rl}^{1:3} \otimes w_B \right] + b_{rl}\right)$$
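A minimal sketch of the relationship prediction step, assuming the tensor network reduces to one bilinear form $w_A^\top W_k w_B$ per relation, scaled by a scalar `beta`, plus a bias, followed by a softmax. The names `predict_relationship`, `W_rl`, `b_rl` and `beta` are illustrative, and the exact parameterization in the paper may differ.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def predict_relationship(w_a, w_b, W_rl, b_rl, beta=1.0):
    """Return [r_i, r_t, r_m] for the action pair (A, B).

    W_rl: three d x d slices, one per relation (implied-by, type-of, mutually exclusive)
    b_rl: three biases
    """
    scores = []
    for W_k, b_k in zip(W_rl, b_rl):
        bilinear = sum(
            w_a[i] * W_k[i][j] * w_b[j]
            for i in range(len(w_a))
            for j in range(len(w_b))
        )
        scores.append(beta * bilinear + b_k)
    return softmax(scores)
```

The three outputs are non-negative and sum to 1, so each can be read as a soft weight for one relation.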
Approach | Language prior for relationship - NLP prior
1. Implied-by: Person preparing food / Chef cooking pasta
2. Type-of: Man eating fish / Person feeding a fish (wrong)
3. Mutually exclusive: Person riding horse / Man riding camel
Approach | Language prior for relationship
The loss function of the language-based relationship, $C_{nlp}$:
$$C_{nlp} = \sum_{A} \sum_{B \in \mathcal{R}_A} \left\lVert \tilde{r}_{AB} - r_{AB} \right\rVert$$
$\tilde{r}_{AB}$: NLP prior
$r_{AB}$: relationship prediction
- NLP priors are not always accurate
- They treated NLP priors as a noisy prior
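A minimal sketch of the language prior loss, assuming the norm is a squared L2 distance (the slide does not show the exponent) and that priors and predictions are stored in dictionaries keyed by action pair. The name `nlp_prior_loss` is hypothetical.

```python
def nlp_prior_loss(prior, pred):
    """Sum of squared distances between NLP priors and predictions.

    prior, pred: dicts mapping (A, B) -> [r_i, r_t, r_m]
    Only pairs that have a prior contribute to the loss.
    """
    total = 0.0
    for pair, r_tilde in prior.items():
        r = pred[pair]
        total += sum((p - q) ** 2 for p, q in zip(r_tilde, r))
    return total
```

Because the prior is treated as noisy, this term is only one of several in the full objective and can be outweighed by the visual terms.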
Approach | Action retrieval with relationship - Visual objective
A is implied-by B: rank the positive images of B higher than the negatives of A
$$C_{AB}^{i} = \sum_{I_B \in \mathcal{I}_B} \sum_{I^- \in \bar{\mathcal{I}}_A} \max\left(0,\ 1 + w_A^\top \left(f_{I^-} - f_{I_B}\right)\right)$$
$\mathcal{I}_B$: a set of positive images of B
$\bar{\mathcal{I}}_A$: a set of negative images of A
A is type-of B: rank the positive images of A higher than the negatives of B
$$C_{AB}^{t} = \sum_{I_A \in \mathcal{I}_A} \sum_{I^- \in \bar{\mathcal{I}}_B} \max\left(0,\ 1 + w_B^\top \left(f_{I^-} - f_{I_A}\right)\right)$$
$\mathcal{I}_A$: a set of positive images of A
$\bar{\mathcal{I}}_B$: a set of negative images of B
A is mutually exclusive of B: rank the positive images of A higher than the positives of B
$$C_{AB}^{m} = \sum_{I_A \in \mathcal{I}_A} \sum_{I_B \in \mathcal{I}_B} \max\left(0,\ 1 + w_A^\top \left(f_{I_B} - f_{I_A}\right)\right)$$
$\mathcal{I}_A$: a set of positive images of A
$\mathcal{I}_B$: a set of positive images of B
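The three costs above share one hinge ranking form and differ only in which weight vector and which image sets are used. A minimal sketch under that reading, with precomputed image features passed as lists; all function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def hinge_rank(w, high, low):
    """Penalize every `low` feature that is not scored at least 1 below every `high` one."""
    return sum(
        max(0.0, 1.0 + dot(w, f_low) - dot(w, f_high))
        for f_high in high
        for f_low in low
    )


def cost_implied_by(w_a, pos_b, neg_a):
    # A is implied-by B: positives of B above negatives of A, scored with w_A.
    return hinge_rank(w_a, pos_b, neg_a)


def cost_type_of(w_b, pos_a, neg_b):
    # A is type-of B: positives of A above negatives of B, scored with w_B.
    return hinge_rank(w_b, pos_a, neg_b)


def cost_mutually_exclusive(w_a, pos_a, pos_b):
    # A excludes B: positives of A above positives of B, scored with w_A.
    return hinge_rank(w_a, pos_a, pos_b)
```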
Approach | Action retrieval with relationship - Visual objective
$$C_{rec} = \sum_{A \in \mathcal{A}} \sum_{B \in \mathcal{R}_A} \left( r_{AB}^{i} \cdot C_{AB}^{i} + r_{AB}^{t} \cdot C_{AB}^{t} + r_{AB}^{m} \cdot C_{AB}^{m} \right)$$
Relationship prediction: $r_{AB} = [r_{AB}^{i},\ r_{AB}^{t},\ r_{AB}^{m}]$
-> Sums the costs ($C_{AB}^{i}$, $C_{AB}^{t}$, $C_{AB}^{m}$) of each relation; a cost is counted when the corresponding relationship prediction ($r_{AB}^{i}$, $r_{AB}^{t}$, $r_{AB}^{m}$) is "1".
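A minimal sketch of how the visual objective combines the per-relation costs, assuming the relationship predictions and the three costs for each action pair are already available in dictionaries. The name `visual_objective` is hypothetical.

```python
def visual_objective(relations, costs):
    """Weight each relation's cost by its predicted strength and sum over pairs.

    relations: dict mapping (A, B) -> (r_i, r_t, r_m)
    costs:     dict mapping (A, B) -> (C_i, C_t, C_m)
    """
    total = 0.0
    for pair, (r_i, r_t, r_m) in relations.items():
        c_i, c_t, c_m = costs[pair]
        total += r_i * c_i + r_t * c_t + r_m * c_m
    return total
```

With a one-hot prediction only the cost of the predicted relation contributes; with a soft prediction the three costs are blended.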