Knowlywood: Mining Activity Knowledge from Hollywood Narratives - PowerPoint PPT Presentation


  1. Knowlywood: Mining Activity Knowledge from Hollywood Narratives Niket Tandon (MPI Informatics, Saarbruecken) Gerard de Melo (IIIS, Tsinghua Univ) Abir De (IIT Kharagpur) Gerhard Weikum (MPI Informatics, Saarbruecken)

  2. Legs, person, shoe, mountain, rope..

  3. Legs, person, shoe, mountain, rope.. Rock climbing; going up a mountain/hill; going up an elevation; daytime, outdoor activity. What happens next?

  4. Legs, person, shoe, mountain, rope.. Activity classes: rock climbing. Activity groupings: going up a mountain/hill. Activity hierarchy: going up an elevation. Additional information: daytime, outdoor activity. Temporal guidance: what happens next?

  5. {Climb up a mountain, Hike up a hill}. Parent activity: Go up an elevation. Previous activity: Get to village. Next activity: Drink water. Participants: climber, boy, rope. Location: camp, forest, sea shore. Time: daylight, holiday. Visuals.
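
The frame on this slide can be written down as a small record. The following is a minimal sketch, not the authors' data model: the class name `ActivityFrame` and all field names are hypothetical, and the previous/next values follow one reading of the slide layout (Get to village before, Drink water after).

```python
from dataclasses import dataclass, field


@dataclass
class ActivityFrame:
    """Hypothetical container for one activity frame; fields mirror the slide labels."""
    activity_synset: frozenset                     # phrases denoting the same activity
    parents: list = field(default_factory=list)    # parent activities
    previous_activities: list = field(default_factory=list)
    next_activities: list = field(default_factory=list)
    participants: list = field(default_factory=list)
    locations: list = field(default_factory=list)
    times: list = field(default_factory=list)


# Values copied from the slide; the structure itself is an assumption.
climb = ActivityFrame(
    activity_synset=frozenset({"climb up a mountain", "hike up a hill"}),
    parents=["go up an elevation"],
    previous_activities=["get to village"],
    next_activities=["drink water"],
    participants=["climber", "boy", "rope"],
    locations=["camp", "forest", "sea shore"],
    times=["daylight", "holiday"],
)
```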

  6. Activity commonsense: Related work. Event mining. Encyclopedic KBs: factual, e.g. bornOn; entity oriented, e.g. Person; many KBs, e.g. Freebase.

  7. Activity commonsense: Related work. Event mining. Encyclopedic KBs: factual, e.g. bornOn; entity oriented, e.g. Person; many KBs, e.g. Freebase. Commonsense KB. Cyc: manual; limited size; no focus on activities. ConceptNet: crowdsourced; limited size; no semantic activity frames. WebChild: no focus on activities.

  8. Activity commonsense: Related work. Event mining. Encyclopedic KBs: factual, e.g. bornOn; entity oriented, e.g. Person; many KBs, e.g. Freebase. Commonsense KB. Cyc: manual; limited size; no focus on activities. ConceptNet: crowdsourced; limited size; no semantic activity frames. WebChild: no focus on activities. This talk: semantic activity CSK KB construction.

  9. {Climb up a mountain, Hike up a hill}. Parent activity: Go up an elevation. Previous activity: Get to village. Next activity: Drink water. Participants: climber, boy, rope. Location: camp, forest, sea shore. Time: daylight, holiday. Activity commonsense is hard: - People hardly express the obvious: implicit and scarce. - Spread across multiple modalities: text, images, videos. - Non-factual: hence noisy.

  10. Contain events but not activity knowledge. May contain activities, but with varying granularity and no visuals. No clear scene boundaries. Hollywood narratives are easily available and meet the desiderata: align via subtitles with approximate dialogue similarity.

  12. Syntactic and semantic role semantics from VerbNet. State-of-the-art WSD customized for phrases. Example: the man began to shoot a video. Parse: NP VP NP. Candidate WordNet senses: man: man.1, man.2; shoot: shoot.1, shoot.4; video: video.1. VerbNet: shoot.vn.1 (agent: animate, patient: animate); shoot.vn.3 (agent: animate, patient: inanimate).

  13. Syntactic and semantic role semantics from VerbNet. State-of-the-art WSD customized for phrases. Example: the man began to shoot a video. Parse: NP VP NP. Candidate WordNet senses: man: man.1, man.2; shoot: shoot.1, shoot.4; video: video.1. VerbNet: shoot.vn.1 (agent: animate, patient: animate); shoot.vn.3 (agent: animate, patient: inanimate). Output frame: Agent: man.1; Action: shoot.4; Patient: video.1.

  14. Score terms: IMS prior, WN prior, word-VN match score, selectional restriction score. x_ij = binary decision variable for word i mapped to WN sense j. Constraints: one VN sense per verb; WN-VN sense consistency; selectional restriction constraints; binary decision.
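
To make the joint decision concrete, here is a minimal sketch on the "man shoot video" example from the preceding slides. It is not the authors' optimizer: exhaustive enumeration stands in for a proper solver, all prior scores are made up, and the links between VerbNet and WordNet senses (`shoot.vn.1` with `shoot.1`, `shoot.vn.3` with `shoot.4`) as well as the animacy labels are assumptions read off the slide.

```python
from itertools import product

# Candidate WordNet senses with made-up prior scores (illustrative only).
wn_prior = {
    "man": {"man.1": 0.6, "man.2": 0.4},
    "shoot": {"shoot.1": 0.55, "shoot.4": 0.45},
    "video": {"video.1": 1.0},
}
# Candidate VerbNet senses: linked WordNet verb senses and patient restriction (assumed).
vn_senses = {
    "shoot.vn.1": {"wn": {"shoot.1"}, "patient": "animate"},
    "shoot.vn.3": {"wn": {"shoot.4"}, "patient": "inanimate"},
}
# Assumed animacy of the noun senses.
animacy = {"man.1": "animate", "man.2": "animate", "video.1": "inanimate"}


def best_assignment():
    """Pick one sense per word and one VN sense per verb, maximizing the summed priors."""
    best, best_score = None, float("-inf")
    for agent, verb, patient, vn in product(
        wn_prior["man"], wn_prior["shoot"], wn_prior["video"], vn_senses
    ):
        if verb not in vn_senses[vn]["wn"]:                 # WN-VN sense consistency
            continue
        if animacy[agent] != "animate":                     # agent restriction
            continue
        if animacy[patient] != vn_senses[vn]["patient"]:    # patient restriction
            continue
        score = (wn_prior["man"][agent] + wn_prior["shoot"][verb]
                 + wn_prior["video"][patient])
        if score > best_score:
            best = {"agent": agent, "action": verb, "patient": patient, "vn": vn}
            best_score = score
    return best
```

With these toy numbers the patient restriction overrides the higher prior of `shoot.1`, so the constraints, not the priors alone, decide the verb sense.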

  15. Go up an elevation. Climb up a mountain: Participants: climber, rope; Location: camp, forest; Time: holiday. Hike up a hill: Participants: climber; Location: sea shore; Time: daylight. Drink water. Similarity: + attribute overlap. Hypernymy: WordNet hypernymy of v_i, v_j and o_i, o_j + attribute hypernymy. Temporal: Generalized Sequence Pattern mining with gaps, over the statistic #(asynset_1 precedes asynset_2) / (#(asynset_1) #(asynset_2)).
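
The temporal statistic on this slide can be illustrated with a few lines of counting. This is a minimal sketch and not Generalized Sequence Pattern mining itself: it only computes #(a precedes b) / (#(a) #(b)) over toy sequences. The function name `precedence_scores`, the `max_gap` parameter, and the choice to count occurrences rather than sequences are all assumptions.

```python
from collections import Counter


def precedence_scores(scenes, max_gap=0):
    """scenes: lists of activity ids in narrative order.

    max_gap is the number of intervening activities tolerated between a and b.
    """
    single = Counter()   # occurrences of each activity
    pair = Counter()     # times a precedes b within the allowed gap
    for seq in scenes:
        single.update(seq)
        for i, a in enumerate(seq):
            for b in seq[i + 1 : i + 2 + max_gap]:
                if a != b:
                    pair[(a, b)] += 1
    return {(a, b): n / (single[a] * single[b]) for (a, b), n in pair.items()}


toy_scenes = [
    ["get to village", "climb mountain", "drink water"],
    ["climb mountain", "drink water"],
]
strict = precedence_scores(toy_scenes, max_gap=0)
gapped = precedence_scores(toy_scenes, max_gap=1)
```

Allowing a gap of one lets "get to village" count as preceding "drink water" even though "climb mountain" sits between them.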

  16. Probabilistic soft logic: refining Typeof (T), Similar (S) and Prev (P) edges
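
The slide does not list the rules used, so the following is only a sketch of the general mechanism: probabilistic soft logic relaxes truth values to [0, 1] using the Lukasiewicz operators and penalizes a rule by its distance to satisfaction. The transitivity rule over Typeof edges and the edge scores below are hypothetical.

```python
def lukasiewicz_and(a, b):
    """Soft conjunction of two truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def distance_to_satisfaction(body, head):
    """How far the rule body -> head is from being satisfied (0 means satisfied)."""
    return max(0.0, body - head)


# Hypothetical rule: Typeof(a, b) & Typeof(b, c) -> Typeof(a, c)
typeof = {
    ("climb up a mountain", "go up an elevation"): 0.9,
    ("go up an elevation", "move"): 0.8,
    ("climb up a mountain", "move"): 0.4,
}
body = lukasiewicz_and(
    typeof[("climb up a mountain", "go up an elevation")],
    typeof[("go up an elevation", "move")],
)
penalty = distance_to_satisfaction(body, typeof[("climb up a mountain", "move")])
```

Inference would then adjust the soft edge values to reduce the total weighted penalty over all rules; that optimization step is omitted here.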

  17. Go up an elevation. Climb up a mountain: Participating Agent: climber, rope; Location: camp, forest; Time: holiday. Hike up a hill: Participating Agent: climber; Location: sea shore; Time: daylight. Drink water. Tie the activity synsets. Break cycles. Resultant: DAG.
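
The slide does not say how cycles are broken. One simple way to obtain a DAG, shown here purely as an assumed stand-in, is to add edges in decreasing order of confidence and skip any edge that would close a cycle. The function name `break_cycles` and the edge weights are hypothetical.

```python
def break_cycles(weighted_edges):
    """Keep edges in decreasing weight order, skipping any that would close a cycle."""
    graph = {}
    kept = []

    def reaches(src, dst):
        # Iterative depth-first search over the edges kept so far.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph.get(node, ()))
        return False

    for u, v, w in sorted(weighted_edges, key=lambda e: -e[2]):
        if u == v or reaches(v, u):   # adding u -> v would create a cycle
            continue
        graph.setdefault(u, []).append(v)
        kept.append((u, v, w))
    return kept


edges = [
    ("get to village", "climb mountain", 0.9),
    ("climb mountain", "drink water", 0.8),
    ("drink water", "get to village", 0.3),
]
dag_edges = break_cycles(edges)
```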

  18. Recap • Defined a new problem of automatic acquisition of semantically refined frames. • Proposed a joint method that needs no labeled data.

  19. Evaluation. Knowlywood statistics: Scenes: 1,708,782; Activity synsets: 505,788; Accuracy: 0.85 ± 0.01; URL: bit.ly/knowlywood. #Scenes is the aggregated count over movie scripts, TV serials, sitcoms, novels, and kitchen data. Evaluation: accuracy manually assessed on a sample of the activity frames.

  20. Evaluation: Baselines. No direct competitor provides activity frames. KB baseline: our (rule-based) semantic frame structure over the crowdsourced commonsense KB ConceptNet. Methodology baseline: a rule-based frame detector over our data and other data, using the open IE system ReVerb.

  21. KB Baseline. Example: You open your wallet hasNextSubEvent take out money. Normalized domain: concept1 ~ verb [article] noun. Organized and canonicalized the relations as follows (ConceptNet 5's relations -> we map it to): IsA, InheritsFrom -> type; Causes, ReceivesAction, RelatedTo, CapableOf, UsedFor -> agent; HasPrerequisite, HasFirst/LastSubevent, HasSubevent, MotivatedByGoal -> prev/next; SimilarTo, Synonym -> similarTo; AtLocation, LocationOfAction, LocatedNear -> location.
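
The relation mapping on this slide amounts to a lookup table. The sketch below assumes the flattened table pairs each group of ConceptNet relations with the frame slot listed next to it; the names `CONCEPTNET_TO_FRAME` and `map_relation` are hypothetical, and "prev/next" is kept as one label because the slide does not say which relations map to which direction.

```python
# Assumed reading of the slide's table: ConceptNet 5 relation -> frame slot.
CONCEPTNET_TO_FRAME = {
    "IsA": "type",
    "InheritsFrom": "type",
    "Causes": "agent",
    "ReceivesAction": "agent",
    "RelatedTo": "agent",
    "CapableOf": "agent",
    "UsedFor": "agent",
    "HasPrerequisite": "prev/next",
    "HasFirstSubevent": "prev/next",
    "HasLastSubevent": "prev/next",
    "HasSubevent": "prev/next",
    "MotivatedByGoal": "prev/next",
    "SimilarTo": "similarTo",
    "Synonym": "similarTo",
    "AtLocation": "location",
    "LocationOfAction": "location",
    "LocatedNear": "location",
}


def map_relation(conceptnet_relation):
    """Return the frame slot for a ConceptNet relation, or None if it is not mapped."""
    return CONCEPTNET_TO_FRAME.get(conceptnet_relation)
```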

  22. Methodology Baseline. ReVerb, an open IE tool, extracts SVO triples from text. - S and O are only surface forms. - V is not categorized into a relation. We use a Bayesian classifier to estimate the label of V. The estimates come from MovieClips.com, which provides 30K manually tagged popular movie scenes, with tags like action: singing, prop: violin, setting: theater.

  24. [Bar chart, 0 to 1 scale: Knowlywood, ConceptNet based, ReVerb based and ReVerb clueweb compared on Parent, Participant, Prev, Next, Location and Time.] # activities: Knowlywood ~1 M (high accuracy & high coverage); ConceptNet based ~5 K (high accuracy & low coverage); ReVerb based ~0.3 M (low accuracy & high coverage); ReVerb clueweb ~0.8 M (low accuracy & high coverage).

  25. Visual alignments: ~30,000 images from movies and, additionally, >1 million images via Flickr tag matching. Match verb-noun pairs from Knowlywood (e.g. ride bicycle) against Flickr tags (e.g. riding, road, man, bicycle, ..). Knowlywood activity: ride a bicycle; participant: man, boy; location: road. Compare the Flickr vector with the Knowlywood vector (man, road) by dot product.
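
The tag matching on this slide appears to compare two bags of words with a dot product. The sketch below shows that comparison under assumptions: the function name `dot`, the unweighted counts, and the second candidate activity ("drink water" with its attributes) are made up for illustration.

```python
from collections import Counter


def dot(tags_a, tags_b):
    """Dot product of two bag-of-words vectors built from tag lists."""
    a, b = Counter(tags_a), Counter(tags_b)
    return sum(a[t] * b[t] for t in a)   # Counter returns 0 for missing tags


flickr_tags = ["riding", "road", "man", "bicycle"]

# Attribute words per activity: participants plus locations (second entry is made up).
knowlywood = {
    "ride bicycle": ["man", "boy", "road"],
    "drink water": ["climber", "water"],
}

best_activity = max(knowlywood, key=lambda name: dot(flickr_tags, knowlywood[name]))
```

Here the image tagged with road and man shares two attribute words with "ride bicycle" and none with the other candidate.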

  26. External use case 1: Semantic indexing. Given: participant, location and time. Predict: the activity. Ground truth: the activity tag manually specified on MovieClips.com. Success criterion: at least one hit in the top 10 predictions.
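
The "hit in the top 10" criterion can be computed as follows. This is a minimal sketch with hypothetical function names and toy data; it assumes one gold activity tag per scene and a ranked list of predicted activities.

```python
def hit_at_k(ranked_predictions, gold, k=10):
    """True if the gold activity appears among the first k predictions."""
    return gold in ranked_predictions[:k]


def hit_rate(examples, k=10):
    """Fraction of (ranked_predictions, gold) pairs with a hit in the top k."""
    hits = sum(hit_at_k(preds, gold, k) for preds, gold in examples)
    return hits / len(examples)


toy_examples = [
    (["sing", "dance", "play violin"], "play violin"),   # hit
    (["ride bicycle", "drive car"], "climb mountain"),   # miss
]
rate = hit_rate(toy_examples, k=10)
```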

  27. External use case 2: Movie Scene Search. Method: a generative model encoding that a query holistically matches a scene if the participants and activity fit well with the query.

  28. Conclusion Thank you! Browse at bit.ly/webchild
