Taking Synchrony Seriously: A Perceptual-Level Model of Infant Synchrony Detection
Christopher G. Prince, George J. Hollich, Nathan A. Helder, Eric J. Mislivec, Anoop Reddy, Sampanna Salunke, & Naveed Memon
Department of Computer Science, University of Minnesota Duluth, Duluth, MN USA
Department of Psychological Sciences, Purdue University, West Lafayette, IN USA
chris@cprince.com
25 August 2004
http://www.cprince.com/PubRes/EpiRob04
Outline of Talk
Types of Synchrony Detection
A Model of Synchrony Detection
Comparison to Infant Behavior
Conclusions
Acknowledgements
Collaborators: Lakshmi Gogate
Students: Soleh Dib, Tyrel Pollak; Tim Colburn’s CS 4531 software engineering class
Colleagues: Rocio Alba-Flores, Kang James
Supported in part by UROP grants and by a donation from Digi-Key.
Types of Audio-Visual Synchrony Detection
Punctuate speech-object synchrony: two-month-olds can detect it (Gogate et al., 2004)
Face-voice synchrony: 10- to 16-week-old infants (Dodd, 1979)
Talker with distractor: e.g., the cocktail-party situation (Hollich et al., in press)
Multiple visual events: e.g., multiple talkers (Pickens et al., 1994; Hollich & Prince, in progress)
Research Question
Can a single general-purpose synchrony detection mechanism, estimating audio-visual synchrony from low-level signal features, account for infant synchrony detection across a broad range of audio-visual speech integration tasks?
Hershey & Movellan (2000)
Computes the mutual information between two sensory channels over a time window of length S
Assumes Gaussian-distributed sensory signals
Synchrony is defined as the mutual information between the sensory channels:
$$M(x,y,t_k) = \frac{1}{2}\log_2\frac{|\Sigma_A(t_k)|\,|\Sigma_V(x,y,t_k)|}{|\Sigma_{A,V}(x,y,t_k)|}\,(\text{audioDampening})$$
For other approaches see: http://www.cprince.com/PubRes/Zurich04
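For the simplest case of one audio feature and one visual feature, the Gaussian assumption reduces the determinant ratio to the variances and covariance of the two signals over the window. The sketch below is a minimal illustration under that assumption, not SenseStream's code: the function name `gaussian_mutual_information` is hypothetical, population statistics are used, and the audio dampening term is left out.

```python
import math
import random


def gaussian_mutual_information(a, v):
    """Mutual information in bits between two scalar signals over one window,
    assuming they are jointly Gaussian: 0.5 * log2(|S_A| |S_V| / |S_AV|)."""
    s = len(a)
    if s != len(v) or s < 2:
        raise ValueError("need two equal-length windows of at least 2 samples")
    mean_a, mean_v = sum(a) / s, sum(v) / s
    var_a = sum((x - mean_a) ** 2 for x in a) / s
    var_v = sum((y - mean_v) ** 2 for y in v) / s
    cov_av = sum((x - mean_a) * (y - mean_v) for x, y in zip(a, v)) / s
    if var_a == 0 or var_v == 0:
        return 0.0  # a constant channel carries no information
    # The 2 x 2 joint covariance matrix has determinant var_a*var_v - cov_av**2.
    joint_det = var_a * var_v - cov_av ** 2
    if joint_det <= 0:
        return math.inf  # perfectly linearly related signals
    return 0.5 * math.log2(var_a * var_v / joint_det)


random.seed(0)
audio = [random.gauss(0, 1) for _ in range(200)]
synced = [x + 0.3 * random.gauss(0, 1) for x in audio]  # follows the audio
unrelated = [random.gauss(0, 1) for _ in range(200)]    # independent of it
print(gaussian_mutual_information(audio, synced))       # comparatively large
print(gaussian_mutual_information(audio, unrelated))    # close to zero
```

A signal that follows the audio yields a much larger value than an independent one, which is the property the mixelgram relies on.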
Synchrony Detection with HM
The HM algorithm generates mixelgrams
Each pixel of a mixelgram is a mixel, a mutual information pixel
SenseStream program: mixels are computed from the mutual information between the audio and visual channels (Mislivec, 2004)
Perceptually relevant mixelgrams typically indicate synchrony between the two input channels (Vuppla, 2004)
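A mixelgram can be sketched by computing one mutual information value per pixel location over the same window of S frames. This is a minimal, hypothetical illustration: SenseStream's actual audio and visual features are not specified here, so it assumes one audio value per frame and one grayscale intensity per pixel (n = m = 1), and uses synthetic frames in which only the center "mouth" pixel moves with the audio. The names `gaussian_mi` and `mixelgram` are ours.

```python
import math


def gaussian_mi(a, v):
    """Scalar Gaussian mutual information in bits (0 for a constant signal)."""
    s = len(a)
    mean_a, mean_v = sum(a) / s, sum(v) / s
    var_a = sum((x - mean_a) ** 2 for x in a) / s
    var_v = sum((y - mean_v) ** 2 for y in v) / s
    if var_a == 0 or var_v == 0:
        return 0.0
    cov = sum((x - mean_a) * (y - mean_v) for x, y in zip(a, v)) / s
    joint_det = var_a * var_v - cov ** 2
    return math.inf if joint_det <= 0 else 0.5 * math.log2(var_a * var_v / joint_det)


def mixelgram(audio_window, frame_window):
    """audio_window: S audio feature values; frame_window: S images, each h x w.
    Returns an h x w grid whose (y, x) entry is the mixel for that pixel."""
    h, w = len(frame_window[0]), len(frame_window[0][0])
    return [[gaussian_mi(audio_window, [frame[y][x] for frame in frame_window])
             for x in range(w)]
            for y in range(h)]


audio = [0.0, 1.0, 0.2, 0.9, 0.1, 1.0, 0.0, 0.8]  # e.g. per-frame loudness
wiggle = [0.05, -0.05] * 4                         # small non-audio variation
frames = []
for a, n in zip(audio, wiggle):
    frame = [[5.0] * 3 for _ in range(3)]          # static background
    frame[1][1] = 10 * a + n                       # "mouth" pixel tracks the audio
    frames.append(frame)
for row in mixelgram(audio, frames):
    print([round(m, 2) for m in row])
```

Only the pixel that co-varies with the audio receives a large mixel; the static background stays at zero.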
Calculation for Each Mixel
Frames from one channel consist of single n-element vectors, here processed audio features
Frames from the other channel consist of h × w m-element vectors, here visual image features
[Figure: S frames from each channel; audio frames are n-element vectors, visual frames are h × w grids of m-element vectors]
Audio covariance matrix (n × n): $\Sigma_A(t_k)$
Visual covariance matrix (m × m): $\Sigma_V(x,y,t_k)$
Joint covariance matrix ((n+m) × (n+m)): $\Sigma_{A,V}(x,y,t_k)$
$$M(x,y,t_k) = \frac{1}{2}\log_2\frac{|\Sigma_A(t_k)|\,|\Sigma_V(x,y,t_k)|}{|\Sigma_{A,V}(x,y,t_k)|}$$
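With vector-valued features the same formula needs determinants of the n × n, m × m and (n+m) × (n+m) covariance matrices. The sketch below computes one mixel from S audio vectors and the S visual vectors at a single (x, y), in plain Python so that it has no dependencies. It is an illustration under stated assumptions (population covariance, determinant by Gaussian elimination, synthetic random features); the names `covariance`, `determinant` and `mixel` are hypothetical.

```python
import math
import random


def covariance(vectors):
    """Population covariance matrix of S equal-length sample vectors."""
    s, d = len(vectors), len(vectors[0])
    means = [sum(vec[i] for vec in vectors) / s for i in range(d)]
    return [[sum((vec[i] - means[i]) * (vec[j] - means[j]) for vec in vectors) / s
             for j in range(d)]
            for i in range(d)]


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [list(row) for row in matrix]
    size = len(m)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


def mixel(audio_frames, visual_frames):
    """audio_frames: S n-element audio vectors; visual_frames: S m-element
    visual vectors at one (x, y). Returns 0.5 * log2(|A| |V| / |A,V|)."""
    det_a = determinant(covariance(audio_frames))
    det_v = determinant(covariance(visual_frames))
    joint = [list(a) + list(v) for a, v in zip(audio_frames, visual_frames)]
    det_av = determinant(covariance(joint))  # (n + m) x (n + m) matrix
    if det_a <= 0 or det_v <= 0:
        return 0.0  # degenerate marginal covariance (e.g. too few frames)
    if det_av <= 0:
        return math.inf
    return 0.5 * math.log2(det_a * det_v / det_av)


random.seed(1)
S = 40  # window length; S should exceed n + m for a nonsingular joint matrix
audio_frames = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(S)]  # n = 2
visual_frames = [[a[0] + 0.5 * random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)]
                 for a in audio_frames]                                       # m = 3
print(mixel(audio_frames, visual_frames))
```

For n = m = 1 this reduces to the scalar variance/covariance form, which is a convenient check on the determinant code.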
Audio Dampening
We add a term to the HM equation to dampen mutual information outputs when the audio is “sub-audible”:
$$M(x,y,t_k) = \frac{1}{2}\log_2\frac{|\Sigma_A(t_k)|\,|\Sigma_V(x,y,t_k)|}{|\Sigma_{A,V}(x,y,t_k)|}\left(1 - \frac{1}{2^{r/\theta}}\right)$$
r = maximum RMS audio value over the S-frame interval
θ = 50 is a fixed threshold
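The dampening can be sketched as a multiplicative factor on each mixel that is near 0 for near-silent windows and approaches 1 for clearly audible ones. This is a minimal sketch assuming the factor has the form 1 − 2^(−r/θ) with θ = 50, and that r is the largest per-frame RMS of raw sample values within the S-frame window; the frame size and sample scaling are not given on the slide, and the function names are hypothetical.

```python
import math


def frame_rms(samples):
    """Root-mean-square amplitude of one audio frame's samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def audio_dampening(audio_frames, theta=50.0):
    """Assumed dampening factor 1 - 2**(-r / theta), where r is the largest
    per-frame RMS over the S-frame window: 0 for silence, 0.5 at r = theta,
    approaching 1 for loud audio."""
    r = max(frame_rms(frame) for frame in audio_frames)
    return 1.0 - 2.0 ** (-r / theta)


def dampened_mixel(raw_mixel, audio_frames, theta=50.0):
    """Scale an undampened HM mixel value by the audio dampening factor."""
    return audio_dampening(audio_frames, theta) * raw_mixel


quiet = [[0, 1, -1, 0]] * 3          # sub-audible window: RMS about 0.7
loud = [[900, -900, 900, -900]] * 3  # RMS 900
print(dampened_mixel(2.0, quiet), dampened_mixel(2.0, loud))
```

Because the factor multiplies the whole mixelgram, it suppresses spurious mutual information that arises from tiny audio fluctuations without changing the relative pattern of mixels within an audible window.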
SenseStream Program Running
[Video panels: Original Video; SenseStream Running on Video]
Quantitative Analysis of Synchrony
The HM algorithm outputs mixelgrams
Qualitative: mixelgrams depict synchrony graphically
It is also useful to reduce each mixelgram to a scalar, for quantitative synchrony analysis
Idea: Connected Regions
[Video panels: Original Video; SenseStream Running on Video]
Connected Region Analysis
Compute the variance in the sizes of the connected regions in each mixelgram. Nonzero mixels i and j are connected when j is one of the eight neighbors of i (edge mixels have fewer neighbors) and
$$\max\left(\frac{M(i)}{M(j)},\ \frac{M(j)}{M(i)}\right) \le \text{Threshold}$$
where M(mixel) is the value of the mixel and Threshold = 1.125, i.e., the two neighboring values differ by at most a factor of 1.125.
Connected regions are the spatial extents formed by chains of connected mixel pairs.
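The analysis can be sketched as a breadth-first search over the eight-neighbor graph of mixels, followed by the variance of the region sizes. This is a minimal sketch under several assumptions not stated on the slide: mixel values are nonnegative (so "nonzero" means positive), regions are the connected components of the pairwise relation, a mixel with no connected neighbor forms no region, and "variance" is the population variance. All names are hypothetical.

```python
from collections import deque
from statistics import pvariance

# The eight neighbors of a mixel; edge mixels simply have fewer valid ones.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def connected(a, b, threshold=1.125):
    """Two mixel values connect when both are positive (assumed meaning of
    'nonzero') and max(a/b, b/a) <= threshold."""
    return a > 0 and b > 0 and max(a / b, b / a) <= threshold


def region_sizes(mixelgram, threshold=1.125):
    """Sizes of the connected regions, found by breadth-first search."""
    h, w = len(mixelgram), len(mixelgram[0])
    seen = [[False] * w for _ in range(h)]
    sizes = []
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0] or mixelgram[y0][x0] <= 0:
                continue
            seen[y0][x0] = True
            queue, size = deque([(y0, x0)]), 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in NEIGHBORS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and connected(mixelgram[y][x], mixelgram[ny][nx], threshold)):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size > 1:  # assumption: a mixel with no connected neighbor is not a region
                sizes.append(size)
    return sizes


def region_size_variance(mixelgram, threshold=1.125):
    """Population variance of region sizes (0.0 when there are no regions)."""
    sizes = region_sizes(mixelgram, threshold)
    return float(pvariance(sizes)) if sizes else 0.0


grid = [
    [1.0, 1.1, 0.0, 0.0],
    [0.0, 1.05, 0.0, 3.0],
    [0.0, 0.0, 0.0, 3.2],
    [5.0, 0.0, 0.0, 9.0],
]
print(region_sizes(grid), region_size_variance(grid))
```

In this toy grid the 1.0/1.1/1.05 cluster and the 3.0/3.2 pair form regions of sizes 3 and 2, while 5.0 and 9.0 have no neighbor within the 1.125 ratio and are ignored.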
Edge Detection Method
Another synchrony estimation method uses general-purpose image processing
It relies on an observation similar to that behind connected region analysis
With an h × w mixelgram M, the synchrony estimate is
$$\sum_{i=1}^{hw}\left[\mathrm{Sobel}_{3\times 3}\left(\mathrm{Gaussian}_{15\times 15}(M)\right)\right]_i$$
Generally gives better results than connected region analysis
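The edge-based estimate can be sketched as: smooth the mixelgram with a 15 × 15 Gaussian, apply the 3 × 3 Sobel operator, and sum the result over all h·w positions. The slide gives only the kernel sizes, so the following are assumptions: σ follows the rule OpenCV's `getGaussianKernel` uses when no σ is given (2.6 for size 15), borders replicate edge values, and "Sobel" means the gradient magnitude from the horizontal and vertical kernels. Function names are hypothetical; this is plain Python rather than any particular image library.

```python
import math


def gaussian_kernel_1d(size=15, sigma=None):
    """Normalized 1-D Gaussian weights; default sigma follows OpenCV's
    getGaussianKernel rule for an unspecified sigma."""
    if sigma is None:
        sigma = 0.3 * ((size - 1) * 0.5 - 1) + 0.8
    half = size // 2
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(weights)
    return [wt / total for wt in weights]


def filter_separable(img, kx, ky):
    """Correlate img with kx along rows, then ky along columns, replicating
    edge values at the borders."""
    h, w = len(img), len(img[0])
    rx, ry = len(kx) // 2, len(ky) // 2
    rows = [[sum(kx[k] * img[y][min(max(x + k - rx, 0), w - 1)] for k in range(len(kx)))
             for x in range(w)] for y in range(h)]
    return [[sum(ky[k] * rows[min(max(y + k - ry, 0), h - 1)][x] for k in range(len(ky)))
             for x in range(w)] for y in range(h)]


def edge_synchrony_score(mixelgram):
    """Sum over all h*w positions of the 3x3 Sobel gradient magnitude of the
    15x15 Gaussian-smoothed mixelgram."""
    g = gaussian_kernel_1d(15)
    smoothed = filter_separable(mixelgram, g, g)
    gx = filter_separable(smoothed, [-1, 0, 1], [1, 2, 1])  # horizontal Sobel
    gy = filter_separable(smoothed, [1, 2, 1], [-1, 0, 1])  # vertical Sobel
    return sum(math.hypot(gx[y][x], gy[y][x])
               for y in range(len(mixelgram)) for x in range(len(mixelgram[0])))


flat = [[1.0] * 20 for _ in range(20)]
blob = [[1.0 if 7 <= y < 12 and 7 <= x < 12 else 0.0 for x in range(20)] for y in range(20)]
print(edge_synchrony_score(flat), edge_synchrony_score(blob))
```

Both filters are separable, so the sketch applies each as two 1-D passes; a uniform mixelgram scores zero, and a localized high-value region produces edges and a positive score.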