Visual Language Perception from Videos
Mohit Gupta
Advisor: Amitabha Mukerjee

Introduction and Motivation
- Humans process and store what they perceive in a highly abstracted, condensed format
  - For example, …
- Computers, on the other hand, are much less efficient at this
- Possibilities if computers could condense perception:
  - A significant reduction in information size (lower memory requirements)
  - ‘Show me who the villain in this movie is and when he enters’ would become a valid question for a computer
- Novel: to our knowledge, no similar work has been done

Methodology
- Scene segmentation
  - Using the histogram-change method
  - Heuristic for scene start or end: a speech-silence boundary
  - Strong heuristic: a change in speaker
- Sound segmentation
  - Classifying voice, silence and miscellaneous sounds (music, audience laughter, etc.)
  - Thresholding signal energy and zero-crossing rate; pitch detection with the YIN algorithm
  - Diarization of voices (separating the voices of different speakers)
    - Voice features such as MFCCs are the most significant for speaker recognition
- Associating faces with speech
  - Detect faces in frames containing speech using Haar-based features
  - Tag each face with a speaker's speech stream using a majority-first approach
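The histogram-change method for scene segmentation can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation used in this work: frames are assumed to be grayscale images flattened to lists of 0-255 intensities, and the bin count (16) and cut threshold (0.5) are hypothetical values. A cut is reported wherever the L1 distance between consecutive normalized histograms exceeds the threshold.

```python
from typing import List, Optional, Sequence


def gray_histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized intensity histogram of a grayscale frame (values 0-255)."""
    counts = [0] * bins
    for value in frame:
        # Map an intensity in 0..255 onto a bin index in 0..bins-1.
        counts[value * bins // 256] += 1
    return [c / len(frame) for c in counts]


def histogram_cuts(frames: Sequence[Sequence[int]], bins: int = 16,
                   threshold: float = 0.5) -> List[int]:
    """Indices i where a cut is detected between frame i-1 and frame i.

    The L1 distance between two normalized histograms lies in [0, 2];
    the (hypothetical) threshold is applied to that distance.
    """
    cuts: List[int] = []
    prev: Optional[List[float]] = None
    for i, frame in enumerate(frames):
        hist = gray_histogram(frame, bins)
        if prev is not None:
            distance = sum(abs(a - b) for a, b in zip(hist, prev))
            if distance > threshold:
                cuts.append(i)
        prev = hist
    return cuts


if __name__ == "__main__":
    dark, bright = [20] * 100, [230] * 100
    print(histogram_cuts([dark, dark, bright, bright, dark]))  # [2, 4]
```

In the pipeline outlined above, such candidate cuts would still be checked against the speech-silence and speaker-change heuristics; the sketch covers only the visual cue.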

Methodology
- Sound segmentation
  - Classifying voice, silence and miscellaneous sounds (music, audience laughter, etc.)
  - Thresholding signal energy and zero-crossing rate; pitch detection
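A minimal sketch of a frame-level voice/silence/miscellaneous classifier built from these three cues, assuming mono samples as floats in [-1, 1]. The thresholds (energy 1e-4, zero-crossing rate 0.25, YIN threshold 0.1) and the 60-400 Hz voice pitch band are hypothetical values, not the ones used in this work. The pitch estimate follows only the core steps of YIN (difference function, cumulative mean normalized difference, absolute threshold) and omits its parabolic-interpolation and best-local-estimate refinements.

```python
import math
from typing import Optional, Sequence, Tuple


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def yin_pitch(frame: Sequence[float], sample_rate: int,
              threshold: float = 0.1) -> Optional[float]:
    """Basic YIN f0 estimate in Hz, or None if no lag passes the threshold."""
    max_lag = len(frame) // 2
    # Step 1: difference function d(tau) over a window of max_lag samples.
    diff = [0.0] * max_lag
    for tau in range(1, max_lag):
        diff[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(max_lag))
    # Step 2: cumulative mean normalized difference d'(tau), with d'(0) = 1.
    cmnd = [1.0] * max_lag
    running_sum = 0.0
    for tau in range(1, max_lag):
        running_sum += diff[tau]
        cmnd[tau] = diff[tau] * tau / running_sum if running_sum > 0 else 1.0
    # Step 3: first lag below the threshold, then walk to the bottom of that dip.
    for tau in range(1, max_lag):
        if cmnd[tau] < threshold:
            while tau + 1 < max_lag and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau
    return None


def classify_frame(frame: Sequence[float], sample_rate: int,
                   energy_threshold: float = 1e-4, zcr_threshold: float = 0.25,
                   pitch_range: Tuple[float, float] = (60.0, 400.0)) -> str:
    """Label a frame 'silence', 'voice' or 'misc' using hypothetical thresholds."""
    if short_time_energy(frame) < energy_threshold:
        return "silence"
    pitch = yin_pitch(frame, sample_rate)
    low, high = pitch_range
    if (pitch is not None and low <= pitch <= high
            and zero_crossing_rate(frame) < zcr_threshold):
        return "voice"
    return "misc"


if __name__ == "__main__":
    sr = 8000
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
    print(yin_pitch(tone, sr), classify_frame(tone, sr))
```

A practical system would run this over short overlapping frames and smooth the resulting labels; the sketch classifies a single frame in isolation.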

Methodology
- Associating faces with speech
  - Detect faces in frames containing speech
  - Use the acquired speech boundaries and detect faces in each segment
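A minimal sketch of how diarization and majority-first face tagging could fit together, under several assumptions: each speech segment already carries a mean MFCC vector and the IDs of faces detected in its frames (for example by an OpenCV Haar cascade plus a face-tracking step, neither shown). The greedy cosine clustering, its 0.9 threshold, the `SpeechSegment` type and all function names are hypothetical. "Majority-first" is interpreted here as: speakers with the most speaking time choose first, each taking the face seen most often in its segments that no other speaker has claimed. That reading is an assumption, not necessarily the rule used in this work.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Set


@dataclass
class SpeechSegment:
    start: float                  # seconds, from the speech boundaries
    end: float                    # seconds
    mfcc_mean: List[float]        # assumed precomputed mean MFCC vector
    face_ids: List[str] = field(default_factory=list)  # faces detected in its frames


def segment_frames(seg: SpeechSegment, fps: float) -> range:
    """Video frame indices covered by a speech segment (where to run face detection)."""
    return range(math.floor(seg.start * fps), math.ceil(seg.end * fps))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def diarize(segments: Sequence[SpeechSegment], threshold: float = 0.9) -> List[int]:
    """Greedy speaker clustering; each speaker is represented by its first segment."""
    representatives: List[List[float]] = []
    labels: List[int] = []
    for seg in segments:
        scores = [cosine(seg.mfcc_mean, r) for r in representatives]
        if scores and max(scores) >= threshold:
            labels.append(scores.index(max(scores)))
        else:
            representatives.append(list(seg.mfcc_mean))
            labels.append(len(representatives) - 1)
    return labels


def tag_faces(segments: Sequence[SpeechSegment], labels: Sequence[int]) -> Dict[int, str]:
    """Speakers with the most speaking time pick their most frequent unclaimed face first."""
    talk_time: Counter = Counter()
    votes: Dict[int, Counter] = {}
    for seg, label in zip(segments, labels):
        talk_time[label] += seg.end - seg.start
        votes.setdefault(label, Counter()).update(seg.face_ids)
    assignment: Dict[int, str] = {}
    taken: Set[str] = set()
    for label, _ in talk_time.most_common():
        for face, _count in votes[label].most_common():
            if face not in taken:
                assignment[label] = face
                taken.add(face)
                break
    return assignment
```

Letting the longest-speaking speaker choose first means a frequently on-screen main character is less likely to be claimed by a minor speaker who happens to share frames with them.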

Subtitles and speech
- The pitch plot also separates words, with high recall but low precision
- Subtitle alignment within a small-error domain was achieved by maximizing the number of common pitch and subtitle boundaries
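A minimal sketch of alignment by maximizing common boundaries, assuming the pitch-derived word boundaries and the subtitle boundaries are both lists of times in seconds and that the misalignment is a single constant offset (one possible reading of the small-error domain). The ±2 s search range, 10 ms step and 50 ms tolerance are hypothetical. Scoring by how many subtitle boundaries find a nearby pitch boundary, rather than the reverse, suits high-recall, low-precision pitch boundaries: spurious pitch boundaries cost nothing.

```python
import bisect
from typing import List, Sequence, Tuple


def match_score(pitch_sorted: Sequence[float], subtitle_bounds: Sequence[float],
                offset: float, tolerance: float) -> Tuple[int, float]:
    """(matched boundaries, total gap) for subtitles shifted by `offset` seconds."""
    matches, total_gap = 0, 0.0
    for t in subtitle_bounds:
        shifted = t + offset
        i = bisect.bisect_left(pitch_sorted, shifted)
        # The nearest pitch boundary is just before or just after the insertion point.
        neighbours = [pitch_sorted[j] for j in (i - 1, i) if 0 <= j < len(pitch_sorted)]
        if neighbours:
            gap = min(abs(p - shifted) for p in neighbours)
            if gap <= tolerance:
                matches += 1
                total_gap += gap
    return matches, total_gap


def best_offset(pitch_bounds: Sequence[float], subtitle_bounds: Sequence[float],
                max_shift: float = 2.0, step: float = 0.01,
                tolerance: float = 0.05) -> Tuple[float, int]:
    """Grid-search the offset with the most common boundaries (ties: smaller total gap)."""
    pitch_sorted: List[float] = sorted(pitch_bounds)
    n_steps = round(max_shift / step)
    best_key, best = (-1, 0.0), (0.0, 0)
    for k in range(-n_steps, n_steps + 1):
        offset = k * step
        matches, total_gap = match_score(pitch_sorted, subtitle_bounds, offset, tolerance)
        if (matches, -total_gap) > best_key:
            best_key, best = (matches, -total_gap), (offset, matches)
    return best
```

The total-gap tie-break matters because, with a nonzero tolerance, several neighbouring offsets usually match the same number of boundaries.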

Applications
- Surround sound effects
  - Using knowledge of who is speaking in a frame and where their face is located
  - Separating background sounds from speech and attenuating them to emphasize the vocals
- Information abstraction and retrieval
  - Efficient memory usage
  - Modelling voice, face and scene, then using text to produce speech and video on the fly
  - Asking the computer to seek the video to the moment the villain is first seen
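A minimal sketch of the surround-sound idea, assuming speech and background have already been separated into two mono tracks and that the speaker's face position is given as a horizontal coordinate normalized to [0, 1] (0 = left edge of the frame). It uses constant-power stereo panning for the speech and a fixed background gain; the panning law, the 0.5 gain, the two-channel output and the function names are illustrative choices, not the method described in this work.

```python
import math
from typing import List, Sequence, Tuple


def pan_gains(face_x: float) -> Tuple[float, float]:
    """Constant-power pan: left**2 + right**2 == 1 for any face_x in [0, 1]."""
    x = min(max(face_x, 0.0), 1.0)   # clamp: 0 = left edge of the frame, 1 = right edge
    angle = x * math.pi / 2
    return math.cos(angle), math.sin(angle)


def remix(speech: Sequence[float], background: Sequence[float], face_x: float,
          background_gain: float = 0.5) -> List[Tuple[float, float]]:
    """Stereo mix: speech placed at the speaker's face, background attenuated in both channels."""
    left_gain, right_gain = pan_gains(face_x)
    stereo = []
    for s, b in zip(speech, background):
        bg = background_gain * b
        stereo.append((left_gain * s + bg, right_gain * s + bg))
    return stereo
```

Constant-power panning keeps the perceived loudness of the voice roughly steady as the face moves across the frame, which a linear pan would not.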
