Linking People in Videos with Their Names Using Coreference Resolution
Vignesh Ramanathan, Armand Joulin, Percy Liang, and Li Fei-Fei, Stanford University
Images from Ramanathan et al. (2014)
Yukun Zhu, CSC2523

Task
Example script: "Missy points to the larger kid. The big kid walks off. Other kids jeer."
- No labelled instances
- The script is the only source of supervision
- Names include nominal expressions and pronouns

Previous Approach
On person naming:
- Multiple instance learning, using proper names from the script
- Treats videos and scripts as bags of face tracks and names
- Unidirectional information flow from text to vision
On coreference resolution:
- One of the core tasks in NLP
- Can operate on language alone
- Not accurate enough

Problem Setup
Input:
- Videos with detected human tracks
- Script roughly aligned with video segments
- Names (including pronouns/nominals) from the script
- Cast names

Problem Setup
Output:
- Name assignment to human tracks in video
- Name assignment to human mentions in text

Proposed Method
$C = \gamma_t C_{track}(Y) + \gamma_m C_{mention}(Z, R) + C_{align}(A, Y, Z)$
- Name-track assignment: $Y \in \{0, 1\}^{T \times P}$
- Name-mention assignment: $Z \in \{0, 1\}^{M \times P}$
- Antecedent matrix: $R \in \{0, 1\}^{M \times M}$
- Alignment matrix: $A \in \{0, 1\}^{T \times M}$
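To make the four variables concrete, here is a minimal sketch that builds them as binary matrices for a toy scene. The scene itself (3 tracks, 4 mentions, 2 names), the helper `one_hot`, and the convention that a row of R marks a mention's antecedent are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def one_hot(indices, num_classes):
    """Return a binary matrix with one row per index; -1 gives an all-zero row."""
    out = np.zeros((len(indices), num_classes), dtype=int)
    for row, idx in enumerate(indices):
        if idx >= 0:
            out[row, idx] = 1
    return out

# Hypothetical toy scene: T = 3 tracks, M = 4 mentions, P = 2 names.
T, M, P = 3, 4, 2
Y = one_hot([0, 1, 0], P)         # track -> name, shape T x P
Z = one_hot([0, 1, 1, 0], P)      # mention -> name, shape M x P
R = one_hot([-1, -1, 1, 0], M)    # mention -> antecedent mention (or none), shape M x M
A = one_hot([0, 1, 1, 2], T).T    # mention -> track, transposed to shape T x M

# With a consistent alignment, A^T Y reproduces the mention names Z.
names_via_tracks = A.T @ Y
```

In this toy case every mention is aligned to a track that carries the same name, so `names_via_tracks` equals `Z`.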

$C_{track}(Y)$
- Cost of assigning names to tracks
- Based on video features only
- Formulated as a regression-based clustering cost:
  $C(Y; X, \lambda) = \min_W \sum_{t \in \tau} \|Y_t - X_t W\|^2 + \lambda \|W\|_F^2 = \mathrm{tr}(Y^\top \Pi(X, \lambda) Y)$
Constraints:
- Each track is assigned to exactly one name
- A speaker should be aligned to at least one track
- A name not mentioned in a scene is not aligned
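The closed form of the clustering cost can be checked numerically. The sketch below assumes that Π(X, λ) is the standard ridge-regression residual matrix I − X(XᵀX + λI)⁻¹Xᵀ, with no bias term; the slide does not spell out Π, so this is an assumption. The random features and the function names are hypothetical.

```python
import numpy as np

def ridge_projection(X, lam):
    """Assumed form of Pi(X, lam): I - X (X^T X + lam I)^{-1} X^T."""
    n, d = X.shape
    return np.eye(n) - X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)

def clustering_cost(Y, X, lam):
    """Closed-form cost tr(Y^T Pi Y)."""
    return float(np.trace(Y.T @ ridge_projection(X, lam) @ Y))

def direct_cost(Y, X, lam):
    """Same cost, computed by solving for the optimal W explicitly."""
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
    # np.linalg.norm on a matrix defaults to the Frobenius norm.
    return float(np.linalg.norm(Y - X @ W) ** 2 + lam * np.linalg.norm(W) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))                  # hypothetical track features
Y = np.eye(2)[rng.integers(0, 2, size=6)]    # hypothetical one-hot name assignment
lam = 0.5

closed_form = clustering_cost(Y, X, lam)
direct = direct_cost(Y, X, lam)
```

Because W is eliminated in closed form, the cost is a quadratic function of Y alone, which is what makes the later quadratic-programming step over Y possible.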

$C_{mention}(Z, R)$
- Depends on text only
- Proper mentions (68%) are trivial to map
- Pronouns/nominals alone are not informative
- Apply regression-based clustering to predict R
Constraints:
- Each mention has at most one antecedent
- Each mention is assigned to one name
- Gender consistency; no self-association of pronouns
- Connection constraint: $R_{m,m'} = 1 \Rightarrow Z_m = Z_{m'}$
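The connection constraint says that a mention inherits the name of its antecedent. A minimal sketch of that idea follows, assuming that R[m, a] = 1 marks mention a as the antecedent of mention m and that antecedents always precede the mention in text order. The mention list, the placeholder name NAME_B, and both helper functions are hypothetical; gender consistency is not modelled here.

```python
import numpy as np

def propagate_names(R, Z_proper):
    """Fill unnamed rows of Z by copying the name of each mention's antecedent."""
    Z = Z_proper.copy()
    for m in range(R.shape[0]):              # mentions in text order
        if Z[m].sum() == 0:                  # no name assigned yet
            antecedents = np.flatnonzero(R[m])
            if antecedents.size == 1:        # at most one antecedent per mention
                Z[m] = Z[antecedents[0]]
    return Z

def satisfies_connection(R, Z):
    """Check R[m, a] = 1 implies Z[m] == Z[a] for every linked pair."""
    ms, ants = np.nonzero(R)
    return all(np.array_equal(Z[m], Z[a]) for m, a in zip(ms, ants))

# Hypothetical mentions: "Missy", "she", "NAME_B", "he", "the kid".
# Names: column 0 = Missy, column 1 = NAME_B (a made-up second cast member).
Z_proper = np.array([[1, 0], [0, 0], [0, 1], [0, 0], [0, 0]])
R = np.zeros((5, 5), dtype=int)
R[1, 0] = 1   # "she"     -> "Missy"
R[3, 2] = 1   # "he"      -> "NAME_B"
R[4, 3] = 1   # "the kid" -> "he" (a chain of two links)

Z = propagate_names(R, Z_proper)
```

The last mention shows why chains matter: "the kid" receives its name through "he", which in turn received it from the proper mention.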

$C_{align}(A, Y, Z)$
Intuition:
- An aligned track and mention should be assigned to the same name
- Tracks and mentions are ordered sequences through time
- Tracks and mentions are roughly aligned in time
Formulation:
- Soft connection penalty: $\min \|A^\top Y - Z\|_F^2$
- Monotonic constraint
- Mention mapping constraint
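The soft connection penalty can be illustrated on a toy example. The sketch below assumes each mention is mapped to exactly one track (one possible reading of the mention mapping constraint) and that "monotonic" means the chosen track index never decreases in mention order. The matrices and helper names are hypothetical.

```python
import numpy as np

def align_cost(A, Y, Z):
    """Soft connection penalty ||A^T Y - Z||_F^2."""
    return float(np.linalg.norm(A.T @ Y - Z) ** 2)

def is_monotonic(A):
    """True if the track chosen for each mention never moves backwards in time.

    Assumes every column of A has exactly one nonzero entry.
    """
    tracks = A.argmax(axis=0)
    return bool(np.all(np.diff(tracks) >= 0))

# Hypothetical scene: 3 tracks, 2 mentions, 2 names.
Y = np.array([[1, 0], [0, 1], [0, 1]])   # track -> name
Z = np.array([[1, 0], [0, 1]])           # mention -> name

A_good = np.array([[1, 0], [0, 0], [0, 1]])   # mention 0 -> track 0, mention 1 -> track 2
A_bad = np.array([[0, 1], [1, 0], [0, 0]])    # mention 0 -> track 1, mention 1 -> track 0

good_cost = align_cost(A_good, Y, Z)
bad_cost = align_cost(A_bad, Y, Z)
```

Here `A_good` pairs every mention with a track of the same name and respects temporal order, while `A_bad` mismatches both names and violates monotonicity.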

Optimization
$\min \; \gamma_t C_{track}(Y) + \gamma_m C_{mention}(Z, R) + C_{align}(A, Y, Z)$ s.t. $Y \in \mathcal{C}_Y$, $(Z, R) \in \mathcal{C}_{Z,R}$, $A \in \mathcal{C}_A$
- Relax Y, R, Z to take values in [0, 1]
- Slack constraints on Y, Z
- Block coordinate descent:
  - Quadratic programming to optimize Y
  - Quadratic programming to optimize Z, R
  - Dynamic time warping to optimize A
- Round Y, Z to integer matrices
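The alignment step, with Y and Z held fixed, can be sketched as a small dynamic program. This is a simplified monotonic alignment in the spirit of dynamic time warping, not the authors' exact procedure: it assumes each mention is mapped to exactly one track and that track indices are non-decreasing in mention order. The function name and the toy matrices are hypothetical.

```python
import numpy as np

def best_monotonic_alignment(Y, Z):
    """Return A (T x M, one track per mention) minimizing ||A^T Y - Z||_F^2
    subject to non-decreasing track indices across mentions."""
    T, M = Y.shape[0], Z.shape[0]
    # cost[t, m] = squared distance between track t's and mention m's name rows
    cost = ((Y[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    D = np.full((M, T), np.inf)          # D[m, t]: best cost with mention m on track t
    back = np.zeros((M, T), dtype=int)   # back[m, t]: track used by mention m - 1
    D[0] = cost[:, 0]
    for m in range(1, M):
        best_prev = 0                    # argmin of D[m - 1, :t + 1]
        for t in range(T):
            if D[m - 1, t] < D[m - 1, best_prev]:
                best_prev = t
            D[m, t] = cost[t, m] + D[m - 1, best_prev]
            back[m, t] = best_prev
    # Backtrack from the cheapest final track.
    A = np.zeros((T, M), dtype=int)
    t = int(np.argmin(D[M - 1]))
    for m in range(M - 1, -1, -1):
        A[t, m] = 1
        t = back[m, t]
    return A

# Hypothetical fixed assignments: 4 tracks, 3 mentions, 2 names.
Y = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
Z = np.array([[1, 0], [1, 0], [0, 1]])
A = best_monotonic_alignment(Y, Z)
penalty = float(np.linalg.norm(A.T @ Y - Z) ** 2)
```

The same routine also accepts relaxed (fractional) Y and Z, since the per-pair cost is just a squared distance between rows.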

Dataset [slide image not preserved]

Quantitative Results
Name assignment to tracks in video.
- Random: randomly picks a name based on crude alignment
- Cour: weakly-supervised method for name assignment
- BOJ: min $C_{track}$ without scene constraint
- OurUnidir: min $C_{track}$ with scene constraint
- OurUnicor: min $C_{track}$ with coreference constraints
- OurUnif: all tracks given equal values in alignment matrix
- OurBidir: full model

Quantitative Results
Name assignment to mentions in text.

Qualitative Results

Errors
- Missing or low-resolution faces
- Errors in coreference resolution

Summary
Contribution:
- Joint person naming and coreference resolution
- New dataset
- State-of-the-art performance on both the visual and textual sides
Future work:
- Actions/attributes for alignment

V. Ramanathan, A. Joulin, P. Liang, and L. Fei-Fei. Linking People in Videos with “Their” Names Using Coreference Resolution. In Computer Vision – ECCV 2014, pages 95–110. Springer International Publishing, Cham, Sept. 2014.
