Narrative Theme Navigation for Sitcoms Supported by Fan-generated Scripts Gerald Friedland, Luke Gottlieb, Adam Janin International Computer Science Institute (ICSI) Presented by: Katya Gonina
What? Novel method to generate indexing information for the navigation of TV content
Why? Lots of different ways to watch videos: DVD, Blu-ray, on-demand, Internet. Lots of videos out there! We need better ways to navigate content: show a particular scene, show where a favorite actor talks, support random seek into videos.
Example: Sitcoms, specifically “Seinfeld”. Strict set of rules: every scene transition is marked by music, every punchline is marked by artificial laughter. Video: http://www.youtube.com/watch?v=PaPxSsK6ZQA
Outline: 1. Original Joke-O-Mat (2009): system setup, evaluation, limitations. 2. Enhanced version (2010): system setup, evaluation. 3. Future work.
Joke-O-Mat: Original system (2008-2009). Ability to navigate basic narrative elements: scenes, punchlines, dialog segments, per-actor filter. Ability to skip certain parts and “surf” the episode. “Using Artistic Markers and Speaker Identification for Narrative-Theme Navigation of Seinfeld Episodes,” G. Friedland, L. Gottlieb, and A. Janin, Proceedings of the 11th IEEE International Symposium on Multimedia (ISM2009), San Diego, California, pp. 511-516.
Joke-O-Mat: Two main elements: 1. Pre-processing and analysis step; 2. Online video browser.
Acoustic Event & Speaker Identification. Goal: train GMMs for the different audio classes: Jerry, Kramer, Elaine, George, male & female supporting actors, laughter, music, and non-speech (i.e., other noises). Use a 1-minute audio sample per class, compute 19-dim MFCCs, and train 20-component GMMs.
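As an illustration only, a minimal sketch of this per-class GMM training, assuming one labeled WAV clip per class and the librosa/scikit-learn tooling (file paths and the helper function are hypothetical, not the authors' code):

```python
# Sketch: train one 20-component GMM per audio class on 19-dim MFCCs.
# Assumes ~1 minute of labeled audio per class in WAV files (hypothetical paths).
import librosa
from sklearn.mixture import GaussianMixture

def extract_mfcc(wav_path, n_mfcc=19):
    # 10 ms hop so one feature vector corresponds to one 10 ms frame
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                hop_length=int(0.010 * sr))
    return mfcc.T  # shape: (num_frames, 19)

classes = {
    "jerry": "train/jerry_1min.wav",
    "kramer": "train/kramer_1min.wav",
    "elaine": "train/elaine_1min.wav",
    "george": "train/george_1min.wav",
    "laughter": "train/laughter_1min.wav",
    "music": "train/music_1min.wav",
    "nonspeech": "train/nonspeech_1min.wav",
}

models = {}
for name, path in classes.items():
    feats = extract_mfcc(path)
    gmm = GaussianMixture(n_components=20, covariance_type="diag")
    models[name] = gmm.fit(feats)
```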
Audio Segmentation. Given the trained GMMs, process the episode in 2.5 s windows (250 frames at 10 ms per frame). Compute the likelihood of each frame's features under each GMM, then use a majority vote over the window to classify it as one of the speakers or as laughter/music/non-speech.
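A minimal sketch of the per-frame scoring and majority vote, reusing the hypothetical `models` dict and `extract_mfcc` helper from the previous sketch:

```python
# Sketch: classify each 2.5 s window (250 frames of 10 ms) by majority vote
# over per-frame GMM decisions.
import numpy as np
from collections import Counter

def segment_audio(features, models, window=250):
    # features: (num_frames, 19) MFCCs for the whole episode
    names = list(models.keys())
    # per-frame log-likelihood under every model, best model per frame
    scores = np.stack([models[n].score_samples(features) for n in names])
    frame_best = scores.argmax(axis=0)
    labels = []
    for start in range(0, len(features), window):
        votes = Counter(frame_best[start:start + window])
        labels.append(names[votes.most_common(1)[0][0]])
    return labels  # one class label per 2.5 s window
```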
Narrative Theme Analysis. Transforms the acoustic event segmentation and speaker detection into narrative theme segments. Rule-based system: dialog = a single contiguous speech segment; punchline = a dialog segment followed by laughter; top-5 punchlines = the 5 punchlines followed by the longest laughter; scene = a segment of at least 10 s between two music events.
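These rules could be expressed roughly as follows; the window-label input and segment tuples are assumptions for illustration, not the authors' implementation:

```python
# Sketch: turn per-window class labels into narrative-theme segments.
SPEAKERS = {"jerry", "kramer", "elaine", "george"}

def runs(labels, window_sec=2.5):
    """Collapse consecutive identical labels into (label, start_s, end_s) runs."""
    out, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            out.append((labels[start], start * window_sec, i * window_sec))
            start = i
    return out

def narrative_themes(labels):
    segs = runs(labels)
    # dialog = single contiguous speech segment
    dialogs = [s for s in segs if s[0] in SPEAKERS]
    # punchline = dialog immediately followed by laughter, ranked by laughter length
    punchlines = [(seg, nxt[2] - nxt[1]) for seg, nxt in zip(segs, segs[1:])
                  if seg[0] in SPEAKERS and nxt[0] == "laughter"]
    top5 = sorted(punchlines, key=lambda p: -p[1])[:5]
    # scene = stretch of at least 10 s between two music events
    music = [s for s in segs if s[0] == "music"]
    scenes = [(a[2], b[1]) for a, b in zip(music, music[1:]) if b[1] - a[2] >= 10]
    return dialogs, [p[0] for p in punchlines], [p[0] for p in top5], scenes
```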
Narrative Theme Analysis: creates icons for the GUI. Sitcom rule: an actor has to be shown on screen once a certain speaking time is exceeded, so the icon is the median frame of the longest speech segment for each actor. (A visual approach could be used here as well.) The median frame is also used for the other events (scenes, punchlines, dialog).
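A small sketch of extracting the temporally median frame of a segment as the icon, assuming OpenCV and a segment given as (start, end) times in seconds:

```python
# Sketch: grab the middle (median-time) frame of a segment as the GUI icon.
import cv2

def icon_frame(video_path, start_s, end_s):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    mid_frame = int(((start_s + end_s) / 2.0) * fps)
    cap.set(cv2.CAP_PROP_POS_FRAMES, mid_frame)
    ok, frame = cap.read()
    cap.release()
    return frame if ok else None
```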
Online Video Browser. Shows the video; allows play/pause and seeking to random positions. A navigational panel allows browsing directly to: scene, punchline, top-5 punchlines, dialog element; actors can be selected/deselected. http://www.icsi.berkeley.edu/jokeomat/HD/auto/index.html
Evaluation. Performance for a 25 min episode, by phase: Training: 30% real-time (2.7 min); Classification: 10% real-time (2.5 min); Narrative Theme Analysis: 10% real-time (2.5 min); Total: 7.7 min. Diarization Error Rate (DER) = 46% (5% per class). Winner of the ACM Multimedia Grand Challenge 2009.
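For reference (not stated on the slide), DER is the standard NIST diarization metric: DER = (false alarm time + missed speech time + speaker confusion time) / total reference speech time.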
Limitations of the original Joke-O-Mat Requires manual training of speaker models Requires 60 seconds of training data for each speaker Cannot support actors with minor roles Does not take into account what was said
Outline: 1. Original Joke-O-Mat (2009): system setup, evaluation, limitations. 2. Enhanced version (2010): system setup, evaluation. 3. Future work.
Extended System: Enhanced Joke-O-Mat (2010) adds speech recognition and keyword search, plus automatic alignment of speaker ID and ASR with fan-generated scripts and closed captions. Significantly reduces manual intervention.
New Joke-O-Mat System
Context-Augmentation: producing transcripts can be costly. Luckily, we have the Internet! Scripts and closed captions are produced by fans.
Fan-generated data. Fan-sourced scripts tend to be very accurate but don't contain any time information. Closed captions contain time information but no speaker attribution; they are less accurate and often intentionally altered. Normalize and merge them together…
Fan-generated data. Normalize the scripts and the closed captions, then use minimum edit distance to align the two sources, so that start and end words in the script correspond to start and end words in the captions. Use timing from the closed caption and the speaker from the script. If one speaker, the result is a single-speaker segment; if multiple speakers, a multi-speaker segment (37.3%).
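A rough sketch of this alignment using Python's difflib; the word-list data structures are assumptions, not the authors' format:

```python
# Sketch: align normalized script words with normalized caption words,
# then carry caption timing onto the speaker-attributed script words.
from difflib import SequenceMatcher

def align(script_words, caption_words):
    # script_words: [(speaker, word)], caption_words: [(start_s, end_s, word)]
    sm = SequenceMatcher(a=[w for _, w in script_words],
                         b=[w for *_, w in caption_words])
    timed = []
    for a0, b0, size in sm.get_matching_blocks():
        for k in range(size):
            speaker, word = script_words[a0 + k]
            start, end, _ = caption_words[b0 + k]
            timed.append((speaker, word, start, end))
    return timed  # words with both speaker attribution and timing
```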
Forced Alignment: Transcript + Audio = Alignment. Generates detailed timing information for each word. Performs all the steps of a speech recognizer on the audio, but instead of using a language model, uses only the transcript's sequence of words. Also does speaker adaptation over segments, so it is more accurate on speaker-homogeneous segments.
Forced Alignment: run forced alignment on each segment. For the 10 episodes tested, 90% of the segments aligned at the first step, yielding the start and end time of each word plus speaker attribution.
Forced Alignment: pool the segments for each speaker and train speaker models, plus a garbage model trained on the audio that falls between the segments, which is assumed to contain only laughter, music, and other non-speech.
Forced Alignment: for the failed single-speaker segments, still use the segment start and end time; there is no way to index the exact temporal location of each word. For each failed multi-speaker segment, generate an HMM alternating between speaker states and garbage states.
Forced Alignment: at each time step, advance along an arc and collect probability (e.g., moving across the “Patrice” arc invokes the “Patrice” speaker model at that time step). The segmentation is the most probable path through the HMM. The garbage model allows for arbitrary noise between speakers, and a minimum duration is imposed for each speaker (in reality, the system was not sensitive to this duration).
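A toy Viterbi sketch of decoding such an alternating speaker/garbage HMM from per-frame GMM log-likelihoods; this is a simplification without the minimum-duration constraint, not the actual decoder:

```python
# Sketch: Viterbi decoding over an HMM that alternates speaker and garbage
# states for one multi-speaker segment.
import numpy as np

def decode_segment(frame_loglik, state_names, trans_logprob):
    """frame_loglik: (T, S) log-likelihood of each frame under each state's GMM.
    trans_logprob: (S, S) log transition probabilities (speaker <-> garbage).
    Returns the most probable state sequence, i.e. the segmentation."""
    T, S = frame_loglik.shape
    delta = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    delta[0] = frame_loglik[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + trans_logprob  # (from_state, to_state)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + frame_loglik[t]
    # backtrack from the best final state
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [state_names[s] for s in reversed(path)]
```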
Forced Alignment Multi-speaker segments => many single-speaker segments Run the forced alignment with ASR again
Music & Laughter Segmentation: laughter is decoded using the Shout speech/non-speech decoder; music models are trained separately (same as in the original Joke-O-Mat).
Putting it all together: http://www.icsi.berkeley.edu/jokeomat/HD/auto/index.html
Evaluation: compare to expert-annotated ground truth. 1. DER: false alarms come from closed captions spanning multiple dialog segments; missed speech comes from truncation of words in forced alignment.
Evaluation: compare to expert-annotated ground truth. 2. User study: 25 participants were randomly shown expert- and fan-annotated episodes and asked to state a preference.
Outline: 1. Original Joke-O-Mat (2009): system setup, evaluation, limitations. 2. Enhanced version (2010): system setup, evaluation. 3. Future work.
Limitations & Future Work: laughter and scene-transition music models are still manually trained. The system requires scripts and closed captions (available from show producers). How to handle failed single-speaker segments? Retrain the speaker models, or run an HMM over the whole episode. Look at other genres (dramas, soap operas, lectures?), which would need new rules. Add visual data.
Thanks!