LipNet: End-to-End Sentence-level Lipreading
Yannis Assael, Brendan Shillingford, Shimon Whiteson, Nando de Freitas
NVIDIA GTC, San Jose, 2017
Outline
1. Introduction
2. Background
3. LipNet
4. Analysis
1. Introduction
How easy do you think lipreading is?
• McGurk effect (McGurk & MacDonald, 1976)
• Phonemes and visemes (Fisher, 1968)
• Human lipreading performance is poor
We can improve it…
1. Introduction
https://goo.gl/hyFBVQ
1. Introduction
Why is lipreading important? Among others:
• Improved hearing aids
• Speech recognition in noisy environments (e.g. cars)
• Silent dictation in public spaces
• Security
• Biometric identification
• Silent-movie processing
1. Introduction
https://goo.gl/RTXh9Q
1. Introduction
Automated lipreading
• Most existing work does not employ deep learning
• Heavy preprocessing
• Open problems:
  • generalisation across speakers
  • extraction of motion features
2. Background
End-to-end supervised learning using NNs
1. Hierarchical, expressive, differentiable function
[Figure: input → Layer 1 → Layer 2 → … → Layer L → predictive distribution]
2. Adjust parameters to maximise the probability of the data with gradient descent
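To make this concrete, here is a minimal sketch of such a training loop, assuming PyTorch; the layer sizes, the 10-class output, and the random stand-in batch are hypothetical illustration, not the talk's actual setup.

```python
# Minimal end-to-end supervised learning sketch (assumes PyTorch).
import torch
import torch.nn as nn

model = nn.Sequential(             # hierarchical, differentiable function
    nn.Linear(32, 64), nn.ReLU(),  # Layer 1
    nn.Linear(64, 64), nn.ReLU(),  # Layer 2
    nn.Linear(64, 10),             # Layer L: logits of the predictive distribution
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()    # negative log-likelihood of the labels

x = torch.randn(8, 32)             # hypothetical input batch
y = torch.randint(0, 10, (8,))     # hypothetical class labels
for step in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)    # -log p(y | x)
    loss.backward()                # gradients by backpropagation
    opt.step()                     # adjust parameters by gradient descent
```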
2. Background
Convolutional Neural Networks
• Model: deep stacks of local operations [image: deeplearning.net]
• Good for: relationships over space (2D)
• Also good for time (1D)
• Or, in our case, space & time (3D): every layer can model either or both, letting the optimisation decide what's best
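For instance, a single spatiotemporal (3D) convolution over a video tensor might look like the following sketch; PyTorch and all sizes here (75 frames, 50×100 mouth crops) are assumptions for illustration.

```python
# Spatiotemporal (3D) convolution sketch (assumes PyTorch).
# Input layout: (batch, channels, time, height, width).
import torch
import torch.nn as nn

video = torch.randn(1, 3, 75, 50, 100)     # e.g. 75 RGB mouth-region frames
stconv = nn.Conv3d(in_channels=3, out_channels=32,
                   kernel_size=(3, 5, 5),  # (time, height, width)
                   stride=(1, 2, 2),
                   padding=(1, 2, 2))
features = stconv(video)                   # mixes space and time in one layer
print(features.shape)                      # torch.Size([1, 32, 75, 25, 50])
```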
2. Background
Recurrent Neural Networks
• Model: carry information over time using a state
• Good for: sequences
• Often used to predict classes at each timestep
• But what if inputs/outputs are of unequal length, or aren't aligned?
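A per-timestep recurrent classifier, in sketch form (assuming PyTorch; the feature size and the 28-symbol vocabulary are hypothetical):

```python
# Recurrent layer carrying state over time (assumes PyTorch).
import torch
import torch.nn as nn

seq = torch.randn(75, 1, 32)       # (time, batch, features), hypothetical
rnn = nn.GRU(input_size=32, hidden_size=64)
outputs, state = rnn(seq)          # one output per timestep + final state
head = nn.Linear(64, 28)           # 28 = letters + space + blank, assumed
per_step_logits = head(outputs)    # class prediction at every timestep
```

This works when outputs are aligned one-to-one with inputs; the unaligned case is what CTC addresses next.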
2. Background
Recurrent Neural Networks
• If inputs/outputs aren't aligned, CTC (Graves et al., 2006) efficiently marginalises over all alignments
• To do this, let the RNN output blanks (_) or duplicates
• Sum over every way to output the same sequence:
  p(am) = p(aam) + p(amm) + p(_am) + p(a_m) + p(am_)
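The sum above can be checked by brute force. This toy snippet enumerates every length-3 alignment over {a, m, _} whose collapse is "am"; the per-timestep probabilities are made-up stand-ins for RNN outputs.

```python
# Brute-force CTC marginalisation over a toy 3-timestep output.
from itertools import groupby, product

def collapse(path):
    # merge consecutive duplicates, then drop blanks: aam -> am, _am -> am
    return "".join(c for c, _ in groupby(path) if c != "_")

probs = [{"a": 0.6, "m": 0.3, "_": 0.1}] * 3   # stand-in per-step outputs
p_am = sum(
    probs[0][p[0]] * probs[1][p[1]] * probs[2][p[2]]
    for p in product("am_", repeat=3)
    if collapse(p) == "am"
)
# Exactly the five paths from the slide survive: aam, amm, _am, a_m, am_
print(p_am)
```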
3. LipNet
LipNet
• Monosyllabic vs. compound words (Easton & Basala, 1982)
• Spatiotemporal features
• End-to-end, sentence-level
• GRID corpus: 33,000 sentences
3. LipNet
GRID corpus
3. LipNet
Preprocessing
• Facial landmarks
• Crop the mouth
• Affine-transform the frames
• Smooth using a Kalman filter
• Temporal augmentation
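A hedged sketch of the landmark-and-crop steps, assuming dlib's 68-point landmark model (mouth points are indices 48-67); the model file path and crop size are hypothetical, and the affine transform, Kalman smoothing, and augmentation are omitted.

```python
# Mouth-region cropping via facial landmarks (assumes dlib + numpy).
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Hypothetical path to dlib's standard 68-point landmark model:
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def crop_mouth(frame, size=(100, 50)):
    """frame: uint8 image array; returns a fixed-size mouth crop."""
    face = detector(frame)[0]              # assume one face per frame
    landmarks = predictor(frame, face)
    mouth = np.array([(landmarks.part(i).x, landmarks.part(i).y)
                      for i in range(48, 68)])
    cx, cy = mouth.mean(axis=0).astype(int)  # mouth centre
    w, h = size[0] // 2, size[1] // 2
    return frame[cy - h:cy + h, cx - w:cx + w]
```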
3. LipNet
Model Architecture
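The architecture figure doesn't survive in text, so here is a hedged PyTorch sketch of the pipeline the paper describes: three spatiotemporal convolution + pooling stages, two bidirectional GRUs, and a linear layer whose per-timestep logits feed a CTC loss. Kernel, stride, and pooling choices below are simplified approximations, not the exact published hyperparameters.

```python
# LipNet-style pipeline sketch: STCNN -> Bi-GRU -> linear -> CTC (assumes PyTorch).
import torch
import torch.nn as nn

class LipNetSketch(nn.Module):
    def __init__(self, vocab=28):          # characters + CTC blank, assumed
        super().__init__()
        self.stcnn = nn.Sequential(
            nn.Conv3d(3, 32, (3, 5, 5), (1, 2, 2), (1, 2, 2)), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(32, 64, (3, 5, 5), (1, 1, 1), (1, 2, 2)), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(64, 96, (3, 3, 3), (1, 1, 1), (1, 1, 1)), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
        )
        self.gru = nn.GRU(96 * 3 * 6, 256, num_layers=2, bidirectional=True)
        self.fc = nn.Linear(512, vocab)

    def forward(self, video):              # (batch, 3, T, 50, 100)
        feats = self.stcnn(video)          # (batch, 96, T, 3, 6)
        feats = feats.permute(2, 0, 1, 3, 4).flatten(2)  # (T, batch, 96*3*6)
        out, _ = self.gru(feats)           # (T, batch, 512)
        return self.fc(out)   # per-timestep logits; train with nn.CTCLoss
```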
3. LipNet
Baselines
• Hearing-impaired people: 3 students from the Oxford Students' Disability Community
• Baseline-LSTM: replicates the previous state-of-the-art architecture of Wand et al. (2016)
• Baseline-2D: spatial-only convolutions
• Baseline-NoLM: language model disabled
3. LipNet
Lipreading Performance

                   Unseen Speakers     Overlapped Speakers
                   CER      WER        CER      WER
Hearing-Impaired    -       47.7%       -        -
Baseline-LSTM      38.4%    52.8%      15.2%    26.3%
Baseline-2D        16.2%    26.7%       4.3%    11.6%
Baseline-NoLM       6.7%    13.6%       2.0%     5.6%
LipNet              6.4%    11.4%       1.9%     4.8%
4. Analysis
Learned Representations
4. Analysis
Viseme Confusions
Thank you!
Thank you, NVIDIA, for the DGX-1!