Discriminative Keyword Spotting

Joseph Keshet, The Hebrew University
David Grangier, IDIAP Research Institute
Samy Bengio, Google Inc.

Outline
• Problem Definition
• Keyword Spotting with HMMs
• Discriminative Keyword Spotting
  – derivation
  – analysis
  – feature functions
• Experimental Results

Problem Definition
Goal: find a keyword in a speech signal.
Example: the utterance "he's bought it", shown with its phoneme labels h iy z bcl b ao t ix tcl.

Problem Definition
Notation:
• keyword: $k$ (e.g. bought)
• keyword phoneme sequence: $\bar{p}$ (e.g. bcl b ao t)
• alignment sequence: $\bar{s} = (s_1, s_2, s_3, s_4, e_4)$
• acoustic feature vectors: $\bar{x} = (x_1, x_2, x_3, \ldots, x_T)$

Problem Definition
The spotter $f(\bar{x}, \bar{p})$ takes the speech signal $\bar{x}$ and the keyword as a phoneme sequence (e.g. $\bar{p}$ = /b ao t/) and outputs:
• a predicted decision: keyword detection (yes/no)
• a predicted alignment $\bar{s}'$

Fat is Good
The performance of a keyword spotting system is measured by a Receiver Operating Characteristic (ROC) curve, which plots the true positive rate against the false positive rate:

true positive rate = (detected utterances with keywords) / (total utterances with keywords)
false positive rate = (detected utterances without keywords) / (total utterances without keywords)

The area under the curve is denoted $A$; a larger ("fatter") area is better, with $A = 1$ the ideal case.
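To make the ROC quantities concrete, here is a minimal sketch that computes the two rates at one threshold and the area $A$. It is an illustration only: the function names `rates` and `auc`, the rule "detected means score above the threshold", and the sample scores are assumptions, not taken from the slides. The area is computed as the fraction of (keyword, non-keyword) utterance pairs that the scores rank correctly, which equals the area under the ROC curve.

```python
def rates(pos_scores, neg_scores, threshold):
    """True/false positive rates, assuming 'detected' means score > threshold."""
    tpr = sum(s > threshold for s in pos_scores) / len(pos_scores)
    fpr = sum(s > threshold for s in neg_scores) / len(neg_scores)
    return tpr, fpr


def auc(pos_scores, neg_scores):
    """Area under the ROC curve as the fraction of correctly ranked pairs.

    Ties between a positive and a negative score count as half a pair.
    """
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


# Hypothetical spotter scores for utterances with / without the keyword.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]

print(rates(pos, neg, 0.5))
print(auc(pos, neg))
```

With these toy scores, eight of the nine pairs are ranked correctly; only the keyword utterance scored 0.4 falls below the non-keyword utterance scored 0.7.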

HMM-based Keyword Spotting

HMM-based Keyword Spotting
Whole Word Modeling [Rahim et al, 1997; Rohlicek et al, 1989]
[Figure: a whole-word HMM for the keyword "bought" and a garbage model; state sequence $\bar{q}$ over acoustic frames $\bar{x}$, with a 10 ms frame scale.]

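HMM-based Keyword Spotting
Phoneme-Based [Bourlard et al, 1994; Manos & Zue, 1997; Rohlicek et al, 1993]
[Figure: the phoneme sequence h iy b ao t ih t segmented as garbage, bought, garbage; phoneme sequence $\bar{p}$ and state sequence $\bar{q}$ over acoustic frames $\bar{x}$, with a 10 ms frame scale.]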

HMM-based Keyword Spotting
Large Vocabulary Based (Cardillo et al, 2002; Rose & Paul, 1990; Szoke et al, 2005; Weintraub, 1995)
• Linguistic constraints on the garbage model
• Does a human listener need to have a large vocabulary in order to recognize one word?

HMM Approaches to Keyword Spotting
• Do not specifically address the goal of maximizing the area under the ROC curve for the task of keyword spotting

Discriminative Approach

Learning Paradigm
Discriminative learning from examples:

$$S = \{(\bar{p}_1, \bar{x}_1^+, \bar{x}_1^-, \bar{s}_1), \ldots, (\bar{p}_m, \bar{x}_m^+, \bar{x}_m^-, \bar{s}_m)\}$$

• $\bar{p}_i$: keyword (phoneme sequence)
• $\bar{x}_i^+$: utterance in which the keyword is uttered
• $\bar{x}_i^-$: utterance in which the keyword is not uttered
• $\bar{s}_i$: alignment of the keyword and the utterance with keyword

Discriminative keyword spotting uses $S$ to pick a keyword spotter $f(\bar{x}, \bar{p})$ from the class $\mathcal{F}_w$ of all keyword spotting functions:

$$f(\bar{x}, \bar{p}) = \max_{\bar{s}} \; w \cdot \phi(\bar{x}, \bar{p}, \bar{s}), \qquad w \in \mathbb{R}^n$$
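The form $f(\bar{x}, \bar{p}) = \max_{\bar{s}} w \cdot \phi(\bar{x}, \bar{p}, \bar{s})$ can be illustrated with a brute-force sketch that scores every candidate alignment and keeps the best one. Everything here is an assumption made for illustration: the names `spot` and `toy_phi`, the representation of an alignment as $|\bar{p}|$ start frames plus one exclusive end frame, and the two toy features (matching and mismatching frame labels), which are not the feature functions of the talk. Enumeration is exponential and only workable on toy inputs.

```python
from itertools import combinations


def spot(x, p, w, phi):
    """Return (score, alignment) maximizing w . phi(x, p, s) over alignments s.

    An alignment is assumed to be len(p) start frames plus a final
    exclusive end frame, strictly increasing, all within 0..len(x).
    """
    best_score, best_s = float("-inf"), None
    for s in combinations(range(len(x) + 1), len(p) + 1):
        score = sum(wi * fi for wi, fi in zip(w, phi(x, p, s)))
        if score > best_score:
            best_score, best_s = score, s
    return best_score, best_s


def toy_phi(x, p, s):
    """Hypothetical 2-dim feature map: [matching frames, mismatching frames]."""
    match = mismatch = 0
    for i, phoneme in enumerate(p):
        for t in range(s[i], s[i + 1]):
            if x[t] == phoneme:
                match += 1
            else:
                mismatch += 1
    return [float(match), float(mismatch)]


# Toy "utterance": each frame is already a phoneme label (an assumption).
x = ["h", "b", "b", "ao", "t", "t", "ih"]
p = ["b", "ao", "t"]
w = [1.0, -1.0]  # reward matches, penalize mismatches

score, alignment = spot(x, p, w, toy_phi)
print(score, alignment)
```

A yes/no decision would then follow from comparing the returned score with a threshold, and the maximizing alignment plays the role of the predicted alignment.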

Feature Functions
We define 7 feature functions of the form

$$\phi_j(\bar{x}, \bar{p}, \bar{s}) \in \mathbb{R}$$

Each takes the sequence of acoustic features $\bar{x}$, the keyword (phoneme sequence) $\bar{p}$, and a suggested alignment $\bar{s}$, and returns the confidence in the keyword and suggested alignment.

Feature Functions I
Cumulative spectral change around the boundaries:

$$\phi_j(\bar{x}, \bar{p}, \bar{s}) = \sum_{i=2}^{|\bar{p}|-1} d(x_{-j+s_i}, x_{j+s_i}), \qquad j \in \{1, 2, 3, 4\}$$

[Figure: frames $-j+s_i$ and $j+s_i$ on either side of the boundary $s_i$.]

Feature Functions II
Cumulative confidence in the phoneme sequence:

$$\phi_5(\bar{x}, \bar{p}, \bar{s}) = \sum_{i=1}^{|\bar{p}|} \sum_{t=s_i}^{s_{i+1}-1} g(x_t, p_i)$$

We build a static frame-based phoneme classifier $g : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$; $g(x_t, p_i)$ is the confidence that phoneme $p_i$ was uttered at frame $x_t$ [Dekel, Keshet, Singer, '04].

[Figure: consecutive segments labeled $p_{i-1}$ = t and $p_i$ = eh, with boundaries $s_{i-1}$, $s_i$, $s_{i+1}$.]
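The double sum in $\phi_5$ can be sketched as two nested loops over phonemes and the frames assigned to them. The classifier $g$ is replaced here by a hypothetical lookup into per-frame score dictionaries; the names `phi_confidence` and `toy_g`, the scores, and the convention that the alignment ends with an exclusive end frame are all assumptions for illustration.

```python
def phi_confidence(x, p, s, g):
    """Sum of g(x_t, p_i) over every frame t in phoneme i's segment.

    s is assumed to hold len(p) start frames plus a final exclusive end frame.
    """
    total = 0.0
    for i, phoneme in enumerate(p):
        for t in range(s[i], s[i + 1]):
            total += g(x[t], phoneme)
    return total


def toy_g(frame, phoneme):
    """Stand-in for a frame-based phoneme classifier: a plain score lookup."""
    return frame[phoneme]


# Hypothetical per-frame confidences for two phonemes.
x = [
    {"b": 0.9, "ao": 0.1},
    {"b": 0.2, "ao": 0.7},
    {"b": 0.1, "ao": 0.8},
]
p = ["b", "ao"]

good = phi_confidence(x, p, (0, 1, 3), toy_g)  # b on frame 0, ao on frames 1-2
bad = phi_confidence(x, p, (0, 2, 3), toy_g)   # b on frames 0-1, ao on frame 2
print(good, bad)
```

The alignment that assigns each frame to its most confident phoneme receives the larger value, which is what lets this feature discriminate between candidate alignments.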

Feature Functions III
Phoneme duration model:

$$\phi_6(\bar{x}, \bar{p}, \bar{s}) = \sum_{i=1}^{|\bar{p}|} \log \mathcal{N}(s_{i+1} - s_i; \hat{\mu}_{p_i}, \hat{\sigma}_{p_i})$$

Statistics of phoneme $p_i$:
• $\hat{\mu}_{p_i}$: average length of phoneme $p_i$
• $\hat{\sigma}_{p_i}$: standard deviation of the length of phoneme $p_i$

[Figure: segment durations $s_i - s_{i-1}$ and $s_{i+1} - s_i$.]
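As a sketch of the duration feature, the code below sums Gaussian log-densities of the segment lengths. The duration statistics in `stats` are made-up numbers, the names `log_normal_pdf` and `phi_duration` are hypothetical, and the alignment is again assumed to carry an exclusive end frame after the last start frame.

```python
import math


def log_normal_pdf(v, mu, sigma):
    """Log-density of a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (v - mu) ** 2 / (2 * sigma ** 2)


def phi_duration(p, s, stats):
    """Sum over phonemes of log N(s_{i+1} - s_i; mu_{p_i}, sigma_{p_i})."""
    total = 0.0
    for i, phoneme in enumerate(p):
        mu, sigma = stats[phoneme]
        total += log_normal_pdf(s[i + 1] - s[i], mu, sigma)
    return total


# Hypothetical (mean, standard deviation) of phoneme lengths, in frames.
stats = {"b": (2.0, 1.0), "ao": (4.0, 1.0)}
p = ["b", "ao"]

typical = phi_duration(p, (0, 2, 6), stats)  # durations 2 and 4 match the means
skewed = phi_duration(p, (0, 3, 6), stats)   # durations 3 and 3 are each one off
print(typical, skewed)
```

Alignments whose segment lengths are close to the typical length of each phoneme score higher, so this feature penalizes implausibly short or long phonemes.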
