Learning Perceptual Causality from Video
Amy Fire and Song-Chun Zhu
Center for Vision, Cognition, Learning, and Art, UCLA
Ideally: Learn Causality from Raw Video
Inference Using Learned Causal Structure

[Figure: (a) input video → (b) event parsing → (c) STC-parsing → (d) inference over time. Causal links connect agent actions (flip switch, drink) to fluents over time (light ON/OFF; agent THIRSTY/NOT/UNKNOWN).]

• Answer why events occurred
• Joint STC parsing: infer misdetections and hidden objects/actions
• Infer triggers, goals, and intents
But… OBSERVATION ≠ CAUSALITY (generally)
SO… WHERE ARE WE NOW?
Vision Research and Causal Knowledge
• Use pre-specified causal relationships for action detection
  – E.g., PADS (Albanese et al. 2010)
  – Model Newtonian mechanics (Mann, Jepson, and Siskind 1997)
• Use causal measures to aid action detection
  – E.g., Prabhakar et al. 2010
• Use infant perceptions of motion to learn causality
  – Drawing on cognitive science (Brand 1997)
• Needed: learn causality from video, integrating spatial-temporal learning strategies at the pixel level
Causality and Video Data: Often Disjoint
• Learning Bayesian networks
  – Constraint satisfaction (Pearl 2009)
  – Bayesian formulations (Heckerman 1995)
  – Intractable on vision sensors
• Commonsense reasoning (Mueller 2006)
  – Uses first-order logic
  – Does not allow for ambiguity/probabilistic solutions
• Markov logic networks (MLNs) (Richardson and Domingos 2006)
  – Intractable
  – Used for action detection (Tran and Davis 2008), but the KB formulas are not learned
MOVING FORWARD: OUR PROPOSED SOLUTION
Cognitive Science as a Gateway: Perceptual Causality
• Causal induction from observation in infancy:
  – Agentive actions are causes (Saxe, Tenenbaum, and Carey 2005)
  – Co-occurrence of events and effects (Griffiths and Tenenbaum 2005), measured with a contingency table for each candidate relation cr:

    |          | Action | ¬ Action |
    | Effect   |   c0   |    c1    |
    | ¬ Effect |   c2   |    c3    |

  – Temporal lag between the two is short (Carey 2009)
  – Cause precedes effect (Carey 2009)
• Note: NOT the same as necessary and sufficient causes
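The contingency counting above can be sketched in code. This is a minimal illustration, not the deck's implementation: it assumes discretized frame times for detected actions and effects, and it encodes the two cognitive constraints (cause precedes effect; short temporal lag) by crediting an action only if it occurred within a short window before the current frame. The function name and window parameter are hypothetical.

```python
# Hedged sketch: tally the 2x2 action/effect contingency table from
# per-frame detections. An "action" counts at frame t only if it occurred
# 1..lag frames earlier (cause precedes effect, short temporal lag).
def contingency(action_times, effect_times, horizon, lag=2):
    """Return (c0, c1, c2, c3): action&effect, effect alone, action alone, neither."""
    c0 = c1 = c2 = c3 = 0
    actions, effects = set(action_times), set(effect_times)
    for t in range(horizon):
        effect = t in effects
        action = any((t - d) in actions for d in range(1, lag + 1))
        if action and effect:
            c0 += 1
        elif effect:
            c1 += 1
        elif action:
            c2 += 1
        else:
            c3 += 1
    return c0, c1, c2, c3

print(contingency([1, 5], [2, 9], horizon=10))  # -> (1, 1, 3, 5)
```

Here the effect at frame 2 is credited to the action at frame 1, while the effect at frame 9 has no preceding action within the window and lands in c1.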
MODIFIED GOAL: LEARN AND INFER PERCEPTUAL CAUSALITY
What are the effects? Fluent changes.

[Figure: timelines for a door fluent (OPEN/CLOSED) and a light fluent (ON/OFF), showing change events (door opens/closes, light turns on/off) and inertial segments (door open/closed inertially, light on/off inertially).]

• Fluents are time-varying statuses of objects (Mueller, Commonsense Reasoning, 2006)
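The timeline above distinguishes fluent *changes* (the effects to be explained) from inertial continuations. A minimal sketch of that distinction, assuming a per-frame fluent value sequence (the function and labels are illustrative, not from the deck):

```python
# Hedged sketch: given a per-frame fluent sequence, emit change events
# (candidate effects) and inertial continuations, mirroring the slide's
# door/light timelines.
def fluent_events(values):
    """Return (frame, description) pairs for each frame transition."""
    events = []
    for t in range(1, len(values)):
        prev, cur = values[t - 1], values[t]
        if cur != prev:
            events.append((t, f"turns {cur}"))      # a fluent change: an effect
        else:
            events.append((t, f"{cur} inertially"))  # no effect to explain
    return events

light = ["OFF", "OFF", "ON", "ON", "OFF"]
print(fluent_events(light))
```

Only the "turns ON"/"turns OFF" events need a causal explanation; inertial frames do not.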
What are the causes? Actions.

[Figure: action And-Or graph — And-nodes compose sub-actions into a single cause; Or-nodes give alternative causes. Example alternatives for opening a door: unlock the door, push it open, open it from the other side.]

• Probabilistic graphical representation for causality: the And-Or Graph
Causal AOG

[Figure: causal And-Or graph fragment for the light fluent — Or-nodes over alternative causing actions for each transition (off↔on), And-nodes composing their sub-actions.]
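A causal And-Or graph like the light-fluent fragment above can be represented compactly as nested Or-nodes (alternative causes, with branch probabilities) over And-nodes (composed sub-actions). The structure, action names, and probabilities below are illustrative assumptions, not taken from the deck:

```python
# Hedged sketch of a C-AOG fragment: Or-nodes list alternative causes
# with branch probabilities; And-nodes compose sub-actions.
# All names and numbers are hypothetical.
caog = {
    ("light", "off->on"): ("OR", [
        (0.9, ("AND", ["approach switch", "flip switch"])),
        (0.1, ("AND", ["power restored"])),
    ]),
}

def best_explanation(node):
    """Pick the highest-probability Or-branch; return (prob, sub-actions)."""
    kind, branches = caog[node]
    assert kind == "OR"
    prob, (_, subactions) = max(branches)
    return prob, subactions

print(best_explanation(("light", "off->on")))  # -> (0.9, ['approach switch', 'flip switch'])
```

Inference over such a graph amounts to choosing Or-branches; here a greedy maximum stands in for the full probabilistic parse.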
Connecting Temporal to Causal and Spatial

[Figure: a C-AOG fragment links a T-AOG fragment (action A, with parent(A), children(A), and relations R(A)) to S-AOG fragments (objects O1, O2 with fluents f1, f2); the action terminates, i.e. changes, the fluent values.]
Grounding on Pixels: Connecting S/T/C-AOG

[Figure: an object O in the S-AOG, with parent(O), children(O), and templates T1(o), is grounded on pixels; actions A1–A3 from the T-AOG and fluents f1–f3 from the C-AOG connect through O, with actions terminating fluents.]
PRELIMINARY THEORY: LEARNING PERCEPTUAL CAUSALITY
Principled Approach: Information Projection

• Start from an initial distribution p; let f be the true contingency-table distribution.
• Match the statistics of the contingency table: select the cause/effect relationship cr whose augmented model p1 is closest (KL) to f, preserving the learning history while maximizing information gain. Repeat to obtain p2, p3, …
• Since KL(f ‖ p) = KL(f ‖ p⁺) + KL(p⁺ ‖ p), the information gain satisfies
  max KL(p⁺ ‖ p) = max [ KL(f ‖ p) − KL(f ‖ p⁺) ].

(Della Pietra, Della Pietra, and Lafferty 1997; Zhu, Wu, and Mumford 1997)
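The KL decomposition above is the Pythagorean identity of information projection, and it can be checked numerically. A minimal sketch, assuming a four-cell contingency distribution and an indicator feature on one cell (the distributions are made-up numbers; for an indicator feature the exponential tilt reduces to rescaling the remaining cells):

```python
import math

def kl(a, b):
    """KL divergence between two discrete distributions."""
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

# True cell distribution f and current model p over the four contingency cells.
f = [0.40, 0.10, 0.15, 0.35]
p = [0.25, 0.25, 0.25, 0.25]

# I-projection of p onto {q : q[0] = f[0]} (feature = indicator of cell 0):
# p+(x) = p(x) exp(lambda * cr(x)) / Z, which for an indicator feature
# just sets cell 0 to f[0] and rescales the rest.
p_plus = [f[0]] + [pi * (1 - f[0]) / (1 - p[0]) for pi in p[1:]]

# Pythagorean identity: KL(f||p) = KL(f||p+) + KL(p+||p).
lhs = kl(f, p)
rhs = kl(f, p_plus) + kl(p_plus, p)
print(abs(lhs - rhs) < 1e-9)  # True
```

So greedily maximizing the gain KL(p⁺ ‖ p) is the same as greedily reducing the remaining divergence KL(f ‖ p⁺).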
Learning Pursuit: Add Causal Relations

• Model pursuit (on the ST-AOG): grow a sequence of models p0 → p1 → … → pk toward f, augmenting at each step by a causal relation cr:
  p⁺(pg) = (1/z) · p(pg) · exp⟨λ, cr(pg)⟩
• Proposition 1 (find parameters): the model formed by min KL(p⁺ ‖ p) matches the contingency statistics, E_{p⁺}(cr) = E_f(cr), with
  λ₊,ᵢ = log( (fᵢ · h₀) / (f₀ · hᵢ) ), where hᵢ is the frequency of cell cᵢ under p and fᵢ is the frequency of cᵢ under f.
• Proposition 2 (pursue cr): cr* = argmax_cr KL(p⁺ ‖ p) = argmax_cr KL(f ‖ h).
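Proposition 2 reduces the pursuit step to a simple selection: among candidate causal relations, add the one whose observed contingency frequencies f diverge most from the current model's frequencies h. A minimal sketch with made-up candidates and numbers (the relation names and frequencies are illustrative, not from the deck):

```python
import math

def kl(f, h):
    """KL divergence between observed (f) and model (h) cell frequencies."""
    return sum(fi * math.log(fi / hi) for fi, hi in zip(f, h) if fi > 0)

# Each candidate cr is summarized by (f, h) over the four cells c0..c3.
# A uniform h models "no causal relation yet". Numbers are hypothetical.
candidates = {
    "flip_switch -> light_on": ([0.45, 0.05, 0.05, 0.45], [0.25] * 4),
    "drink -> light_on":       ([0.24, 0.26, 0.26, 0.24], [0.25] * 4),
}

# Proposition 2: pursue the relation maximizing the information gain KL(f||h).
best = max(candidates, key=lambda cr: kl(*candidates[cr]))
print(best)  # flip_switch -> light_on
```

The strongly co-occurring relation wins by a wide margin, while the near-independent one contributes almost no gain and is left out of the model.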
Selection from ST-AOG

[Figure: propagating feature frequencies f through the ST-AOG — an Or-node splits its frequency over branches A1, A2 with probabilities p and 1−p; an And-node passes its frequency to all of its children.]
Performance vs. TE

[Figure: comparison plot.]
Performance vs. Hellinger and χ²

[Figure: comparison plot.]
Increasing Misdetections (Simulation)

[Figure: results at 0%, 10%, and 20% misdetection rates.]
STC-Parsing Demo
Looking Forward
• Finish learning the C-AOG
• Increase reasoning capacity of the C-AOG
• Integrate experiment design