M4 Brno, video processing
Pavel Zemčík, Stanislav Sumec, Igor Potúček, Kteish Abu Ibrahim, Martin Dobsik (currently Helsinki), MSc. students
Goals
● To develop an (automatic) video editing tool
● To provide a video tagging tool (currently XML)
● To track and interpret the participants' motion
● To extract features to improve audio processing
● To capture and interpret speakers' gestures
● To create a simple (scaled-down) version of the audio/video meeting recording equipment
Video editing tool
● Simple "video class" that allows software-controlled editing of video based e.g. on the XML tags (see the sketch below) – ready and documented (Windows only so far, but the interface is portable)
● Research on aesthetic metrics for video, to allow automatic editing, is starting in conjunction with MU Brno and FAVU VUT Brno (the work follows T. Staudek's Ph.D. work on static image aesthetics published at SIGGRAPH)
● Future extension for camera navigation (no HW)
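As a rough illustration of what "editing based on the XML tags" could look like, the sketch below turns a file of timed tags into a sorted cut list that the video class could then render. The tag and attribute names ("tag", "start", "end", "camera") are hypothetical and do not reflect the actual M4 Brno tag schema.

```python
# Hypothetical sketch: turn an XML tag file into a simple cut list.
# Element/attribute names are illustrative only, not the real schema.
import xml.etree.ElementTree as ET

def cuts_from_tags(xml_path):
    """Read tagged events and return (start_s, end_s, camera) cut entries."""
    root = ET.parse(xml_path).getroot()
    cuts = []
    for tag in root.iter("tag"):
        start = float(tag.get("start"))
        end = float(tag.get("end"))
        camera = tag.get("camera", "overview")  # fall back to an overview shot
        cuts.append((start, end, camera))
    # Sort by start time so the edit plays back in order.
    return sorted(cuts)

if __name__ == "__main__":
    for start, end, camera in cuts_from_tags("meeting_tags.xml"):
        print(f"{start:8.2f}-{end:8.2f}  {camera}")
```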
Video tagging tool
● Video tagging tool available and documented (currently uses XML directly but can be adapted for a database)
● Tag definitions
● Tag assignment by hotkeys
● Additional text attached to tags
● XML output (an illustrative sketch follows below)
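A minimal sketch of the kind of XML a tagging session might emit when tags are assigned by hotkeys; the element and attribute names here are assumptions chosen for illustration, not the tool's documented format.

```python
# Illustrative sketch of a possible XML output for a tagging session;
# element/attribute names are assumptions, not the documented tool format.
import xml.etree.ElementTree as ET

def write_session(events, path):
    """events: list of (time_s, tag_name, free_text) collected from hotkey presses."""
    root = ET.Element("tagging_session")
    for time_s, name, text in events:
        e = ET.SubElement(root, "tag", name=name, time=f"{time_s:.2f}")
        if text:
            ET.SubElement(e, "note").text = text  # optional additional text
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

write_session([(12.4, "speaker_change", "chair takes over"),
               (57.9, "whiteboard", "")], "session.xml")
```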
Participants' tracking
● Robust identification of the "starting point" – colour based (see the sketch below); problems – changing camera setup, bad exposure, changing colour statistics
● Tracking of a once-detected body part – not finished; needs more modelling of body-part motion and better identification of body parts
● With omnidirectional images we should finally be able to reliably count participants and identify their body parts
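To make the colour-based starting-point idea concrete, here is a deliberately naive sketch (NumPy/SciPy) that thresholds skin-like pixels in normalised RGB and returns the centroid of the largest blob; the fixed thresholds are exactly the kind of thing that breaks under a changing camera setup, bad exposure, and shifting colour statistics, hence the need for adaptive colour models.

```python
# Minimal colour-based "starting point" sketch: threshold skin-like pixels
# in normalised RGB and return the largest blob's centroid. The thresholds
# are illustrative constants, not the project's actual detector.
import numpy as np
from scipy import ndimage

def skin_seed(rgb):
    """rgb: HxWx3 uint8 frame -> (row, col) centroid of the largest skin blob, or None."""
    img = rgb.astype(np.float32) + 1e-6
    s = img.sum(axis=2)
    r, g = img[..., 0] / s, img[..., 1] / s
    mask = (r > 0.36) & (r < 0.47) & (g > 0.28) & (g < 0.36)  # crude skin locus
    labels, n = ndimage.label(mask)
    if n == 0:
        return None
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    largest = int(np.argmax(sizes)) + 1
    return ndimage.center_of_mass(mask, labels, largest)
```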
Features to improve audio processing
● Simple but potentially very useful features (see the sketch below):
  – identification and position of body parts (currently we know the position and identity only manually)
  – motion of the body parts (centre, bounding rect)
  – mouth/face movement under development; lip tracking is hardly possible because of the resolution (it needs robust and precise face positioning)
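The centre and bounding-rect features above are simple to compute once a body-part mask is available. The following sketch assumes per-frame binary masks (an assumption for illustration) and shows one possible form of the features that could be handed to the audio side, with frame-to-frame centre displacement as a crude motion measure.

```python
# Sketch of per-frame "centre + bounding rect" features for one body part,
# plus a simple motion feature. Purely illustrative, assumes binary masks.
import numpy as np

def part_features(mask):
    """mask: HxW bool array -> (cy, cx, top, left, bottom, right), or None if empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    cy, cx = ys.mean(), xs.mean()                           # centre of mass
    return cy, cx, ys.min(), xs.min(), ys.max(), xs.max()   # bounding rectangle

def motion(prev, curr):
    """Frame-to-frame displacement of the centre (a crude activity measure)."""
    if prev is None or curr is None:
        return 0.0
    return float(np.hypot(curr[0] - prev[0], curr[1] - prev[1]))
```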
Speakers' gestures
● Models of articulated structures and gesture interpretation partly done (previous projects)
● The implementation requires robust feature extraction (not yet done)
● Applications will be "voting detection and recognition", "recording equipment control", and possibly "emotion detection"
Simplified video/audio capture
● Scaled-down recording equipment built from off-the-shelf goods (simple and feasible); the motivation is also to capture some "English with accent" data
● A single camera with a hyperbolic mirror probably suits the needs, at least at HDTV resolution
● The set should consist of a single notebook, a camera (DV or HDTV) with lenses and a hyperbolic mirror, and two stereo audio devices (two USB Sound Blasters)
Hyperbolic mirror
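For completeness, a minimal sketch of unwrapping the circular mirror image into a panorama by polar-to-rectangular resampling; the centre, radii, and the linear radial mapping are assumptions, and a calibrated hyperbolic mirror model would replace them in practice.

```python
# Sketch: unwrap a circular omnidirectional image into a panorama by
# nearest-neighbour polar-to-rectangular resampling. The linear radial
# mapping is an assumption; a real hyperbolic mirror needs calibration.
import numpy as np

def unwrap(img, cx, cy, r_in, r_out, out_w=1024, out_h=256):
    """img: HxWx3 array; returns an out_h x out_w panorama."""
    theta = np.linspace(0, 2 * np.pi, out_w, endpoint=False)   # columns = azimuth
    radius = np.linspace(r_out, r_in, out_h)                   # rows = elevation (outer ring on top)
    tt, rr = np.meshgrid(theta, radius)
    xs = np.clip((cx + rr * np.cos(tt)).astype(int), 0, img.shape[1] - 1)
    ys = np.clip((cy + rr * np.sin(tt)).astype(int), 0, img.shape[0] - 1)
    return img[ys, xs]
```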