First experiments in audio/video features for phoneme recognition
Petr Motlíček, FIT VUT Brno, motlicek@fit.vutbr.cz
M4 meeting in Prague, January 22nd - 23rd 2004
Introduction
• Data: M4 - IDIAP, 41 min. of audio-video data (training, testing).
• Labels: 47 phoneme categories, obtained by forced alignment (models trained on ICSI data, adapted on M4 data).
• Audio: beam-formed recordings, 16kHz.
• Video: cut-out head regions.
Audio preprocessing
• Fs = 16kHz, frame rate 100Hz, 20ms frames of MFB log energies.
Video preprocessing
• Frame rate 25Hz, RGB frames of 70x70 pixels.
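The acoustic front end above (mel filter bank log energies, 20ms frames at a 100Hz frame rate) can be sketched as follows. This is a minimal numpy sketch under stated assumptions: the slides do not give the window type, FFT size, or filter count, so a Hamming window, a 512-point FFT, and 23 filters (matching the 23-dim. acoustic features) are assumed here.

```python
import numpy as np

def mel(f):
    # Hz -> mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def inv_mel(m):
    # mel -> Hz
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=23, n_fft=512, fs=16000):
    # Triangular filters with centers evenly spaced on the mel scale
    edges = inv_mel(np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def mfb_log_energies(x, frame_len=320, hop=160, n_fft=512, fs=16000):
    # 20 ms frames (320 samples at 16 kHz) every 10 ms -> 100 Hz frame rate
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames *= np.hamming(frame_len)
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    fb = mel_filterbank(23, n_fft, fs)
    return np.log(spec @ fb.T + 1e-10)   # shape: (n_frames, 23)
```

One second of 16kHz audio yields 99 frames of 23 log energies each, matching the acoustic feature stream in the system diagram.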
Bimodal speech recognition system
[Block diagram]
• Audio path: Audio signal (16kHz) → Acoustic parameterization → Acoustic features (23 dim., 100Hz).
• Visual path: Visual signal (25Hz) → Visual parameterization → Visual features (16 dim., 25Hz) → Interpolation to the audio frame rate.
• Fusion: Feature fusion → Acoustic-visual features (39 dim., 100Hz) → Neural Net → Recognition results.
• Visual parameterization: Grayscalization → Edge calculation → 2D cross-correlation → Maximum → Square cropping → Resize → 2D-DCT → LPF.
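The visual path and the fusion step can be sketched in numpy. This is a simplified, hedged sketch: the slides do not specify which 16 DCT coefficients are kept (the top-left 4x4 block is assumed here, skipping the edge/cross-correlation mouth-localization stages), and plain frame repetition stands in for the unspecified 25Hz → 100Hz interpolation.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    D[0] /= np.sqrt(2.0)
    return D

def visual_features(frame_rgb, block=4):
    # RGB -> grayscale, 2D-DCT, keep the low-frequency 4x4 block
    # (16 coefficients); assumes square frames, e.g. 70x70
    gray = frame_rgb @ np.array([0.299, 0.587, 0.114])
    D = dct_matrix(gray.shape[0])
    coef = D @ gray @ D.T
    return coef[:block, :block].ravel()

def upsample_and_fuse(acoustic, visual):
    # Repeat each 25 Hz visual vector 4x to reach the 100 Hz audio frame
    # rate, then concatenate: 23 acoustic + 16 visual = 39-dim features
    visual_100 = np.repeat(visual, 4, axis=0)[:len(acoustic)]
    return np.hstack([acoustic, visual_100])
```

Fusing one second of features (100 acoustic frames, 25 visual frames) yields the 39-dim., 100Hz acoustic-visual stream fed to the neural net.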
Recognition results - Accuracy

             Acoustic [%]   Visual [%]   Acoustic-Visual [%]
Phonemes        31.05         12.15          31.33
VAD             94.04         83.79          94.12
 - 0 (83%)      96.86         99.62          96.89
 - 1 (17%)      79.32          1.44          79.71
Problems & Current focus
• More data for acoustic-visual experiments.
• Incorporation of a robust mouth detection algorithm.
• Compensation algorithms to reduce lighting variations, rotation, etc.
• LDA to reduce dimensionality and improve discrimination among the speech classes.
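The planned LDA step above can be sketched as a classical Fisher discriminant projection. This is a minimal numpy illustration, not the authors' implementation; the small diagonal ridge added to the within-class scatter is an assumption to keep it invertible.

```python
import numpy as np

def lda_projection(X, y, n_out):
    # Fisher LDA: find directions maximizing between-class scatter
    # relative to within-class scatter
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))      # within-class scatter
    Sb = np.zeros((d, d))      # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    # Solve the generalized eigenproblem Sb v = lambda Sw v
    # (ridge term keeps Sw numerically invertible)
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb))
    order = np.argsort(vals.real)[::-1]
    return vecs[:, order[:n_out]].real   # columns = discriminant directions

# Usage: X_lda = X @ lda_projection(X, labels, n_out)
```

Projecting the 39-dim. fused features through such a transform would both reduce dimensionality and sharpen the separation between the 47 phoneme classes.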