M4 Brno, video processing



  1. M4 Brno, video processing
     Pavel Zemčík, Stanislav Sumec, Igor Potúček, Kteish Abu Ibrahim, Martin Dobsik (currently Helsinki)
     MSc. students

  2. Goals
     ● To develop an (automatic) video editing tool
     ● To provide a video tagging tool (currently XML)
     ● To track and interpret the participants' motion
     ● To extract features to improve audio processing
     ● To capture and interpret speakers' gestures
     ● To create a simple (scaled-down) version of the audio/video meeting recording equipment

  3. Video editing tool
     ● Simple "video class" that allows simple software-controlled editing of video based, e.g., on the XML tags: ready and documented (Windows only so far, but the interface is portable)
     ● Research on aesthetic metrics for video, to allow automatic editing, is starting in cooperation with MU Brno and FAVU VUT Brno (the work follows T. Staudek's Ph.D. work on static image aesthetics, published at SIGGRAPH)
     ● Future extension for camera navigation (no HW)
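As an illustration of software-controlled editing driven by XML tags, the sketch below turns a tag file into a list of cuts. It is not the "video class" itself: the XML layout, the `camera` attribute, and the function name `build_cut_list` are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag file: each <tag> marks the time at which the person seen
# by the given camera starts speaking. The schema is assumed, not the tool's.
EXAMPLE_XML = """
<tags>
  <tag name="speaking" time="0.0" camera="1"/>
  <tag name="speaking" time="8.5" camera="2"/>
  <tag name="speaking" time="15.0" camera="1"/>
</tags>
"""


def build_cut_list(xml_text, end_time):
    """Return (start, end, camera) segments: show a camera until the next tag."""
    tags = sorted(
        (float(t.get("time")), int(t.get("camera")))
        for t in ET.fromstring(xml_text).iter("tag")
    )
    cuts = []
    for i, (start, camera) in enumerate(tags):
        # The last segment runs to the end of the recording.
        end = tags[i + 1][0] if i + 1 < len(tags) else end_time
        if end > start:
            cuts.append((start, end, camera))
    return cuts


for start, end, camera in build_cut_list(EXAMPLE_XML, end_time=20.0):
    print(f"{start:6.1f} - {end:6.1f}  camera {camera}")
```

A real editor would pass such a cut list to whatever component copies frames from the selected source; that part depends on the video library and is left out here.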

  4. Video tagging tool
     ● Video tagging tool available and documented (currently uses XML directly but can be adapted for a database)
     ● Tag definitions
     ● Tag assignment by hotkeys
     ● Additional text attached to tags
     ● XML output
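The workflow listed above (tag definitions, assignment by hotkey, additional text, XML output) can be sketched roughly as follows. The hotkeys, tag names, and XML element and attribute names are hypothetical and do not describe the actual tool's format.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag definitions: hotkey -> tag name (not the tool's real set).
TAG_DEFINITIONS = {"s": "speaking", "v": "voting", "g": "gesture"}


def assign_tag(events, hotkey, time_s, text=""):
    """Append one tag event for a pressed hotkey at the given video time."""
    name = TAG_DEFINITIONS[hotkey]  # raises KeyError for an undefined hotkey
    events.append({"name": name, "time": time_s, "text": text})


def to_xml(events):
    """Serialise the tag events, ordered by time, to an XML string."""
    root = ET.Element("tags")
    for ev in sorted(events, key=lambda e: e["time"]):
        node = ET.SubElement(root, "tag", name=ev["name"], time=f"{ev['time']:.2f}")
        if ev["text"]:
            node.text = ev["text"]  # optional free text attached to the tag
    return ET.tostring(root, encoding="unicode")


events = []
assign_tag(events, "s", 12.5, "participant A starts talking")
assign_tag(events, "v", 3.0)
print(to_xml(events))
```

Keeping the events in a plain list until export is what would make a later switch from XML to a database straightforward: only `to_xml` would need a counterpart.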

  5. Participants' tracking
     ● Robust identification of the "starting point": colour based; problems are changing camera setup, bad exposure, and changing colour statistics
     ● Tracking of a body part once detected: not finished; needs more modelling of body part motion and better identification of body parts
     ● With omnidirectional images we should finally be able to reliably count participants and identify their body parts
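A minimal sketch of a colour-based "starting point" detector is shown below. It classifies pixels by normalised chromaticity, which is one common way to reduce sensitivity to exposure. Whether the project used this colour space is not stated in the slides, and the thresholds are illustrative assumptions rather than calibrated values.

```python
def is_skin(r, g, b, r_range=(0.36, 0.55), g_range=(0.26, 0.36)):
    """Classify one RGB pixel by its normalised chromaticity.

    The ranges are assumed values for illustration; a real system would
    estimate them from the recordings, since colour statistics change.
    """
    total = r + g + b
    if total < 60:  # too dark to judge colour reliably (also avoids /0)
        return False
    rn, gn = r / total, g / total
    return r_range[0] <= rn <= r_range[1] and g_range[0] <= gn <= g_range[1]


def skin_mask(image):
    """Return a binary mask (rows of 0/1) for an image given as rows of (r, g, b)."""
    return [[1 if is_skin(*px) else 0 for px in row] for row in image]


frame = [
    [(200, 140, 110), (30, 60, 200)],
    [(100, 70, 55), (128, 128, 128)],
]
print(skin_mask(frame))
```

Because only the ratios of the channels are used, a pixel and its half-exposed copy get the same label; changes in the colour of the lighting are not handled by this simple rule.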

  6. Features to improve audio processing
     ● Simple but most likely useful features:
        ○ identification and position of body parts (currently we know the position; identification is manual)
        ○ motion of the body parts (centre, bounding rect)
        ○ mouth/face movement under development; lip tracking hardly possible because of the resolution (needs robust and precise face positioning)
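The centre and bounding rect features can be computed from a binary mask of a detected body part, for example as in the sketch below. The mask representation (rows of 0/1) and the function names are assumptions for this example.

```python
def describe_region(mask):
    """Return (centre, bounding_rect) of the non-zero pixels in a binary mask.

    centre is (x, y) as floats; bounding_rect is (x_min, y_min, width, height).
    Returns None when the mask contains no non-zero pixel.
    """
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    centre = (sum(xs) / len(xs), sum(ys) / len(ys))
    rect = (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
    return centre, rect


def motion(prev_centre, centre):
    """Frame-to-frame displacement of the centre, in pixels."""
    return (centre[0] - prev_centre[0], centre[1] - prev_centre[1])


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
print(describe_region(mask))
```

Calling `motion` on the centres from two consecutive frames gives a per-frame displacement that could be passed to the audio processing alongside the position.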

  7. Speakers' gestures
     ● Models of articulated structures and gesture interpretation partly done (previous projects)
     ● The implementation requires robust feature extraction (not yet done)
     ● Applications will be "voting detection and recognition", "recording equipment control", and possibly "emotion detection"

  8. Simplified video/audio capture
     ● Scaled-down recording equipment built from off-the-shelf goods (simple and feasible); the motivation is also to capture some "English with accent" data
     ● A single camera with a hyperbolic mirror probably suits the needs, at least with HDTV resolution
     ● The set should consist of a single notebook, a camera (DV or HDTV) with lenses and hyperbolic mirror, and two stereo audio devices (two USB Sound Blasters)

  9. Hyperbolic mirror
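An image taken through such a mirror is circular, so a common first processing step is to unwrap it into a rectangular panorama. The sketch below does this with a plain polar mapping and nearest-neighbour sampling. It assumes a linear relation between radius and panorama row, which ignores the actual mirror profile; a calibrated setup would replace that mapping. All names and parameters are hypothetical.

```python
import math


def unwrap_panorama(image, centre, r_inner, r_outer, out_width, out_height):
    """Map a ring of an omnidirectional image to a rectangular panorama.

    image is a list of rows; centre is (cx, cy) of the mirror in the image.
    The linear radius mapping is an assumption; a hyperbolic mirror model
    would use a non-linear one.
    """
    cx, cy = centre
    height, width = len(image), len(image[0])
    panorama = []
    for v in range(out_height):
        # Row 0 of the panorama samples the outer radius of the ring.
        radius = r_outer - (r_outer - r_inner) * v / max(out_height - 1, 1)
        row = []
        for u in range(out_width):
            angle = 2.0 * math.pi * u / out_width
            x = int(round(cx + radius * math.cos(angle)))
            y = int(round(cy + radius * math.sin(angle)))
            inside = 0 <= x < width and 0 <= y < height
            row.append(image[y][x] if inside else 0)  # 0 outside the source
        panorama.append(row)
    return panorama


# Tiny synthetic image whose pixel value encodes its position (10 * y + x).
image = [[10 * y + x for x in range(5)] for y in range(5)]
for row in unwrap_panorama(image, (2, 2), 1, 2, 4, 2):
    print(row)
```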
