Efficient Voice Activity Detection via Binarized Neural Networks
Jong Hwan Ko · Josh Fromm · Matthai Philipose · Shuayb Zarar · Ivan Tashev
Microsoft · Georgia Tech · University of Washington
Voice Activity Detection (VAD)
• Task: label each audio frame as voice (1) or noise (0) (toy example below)
• Needs to run on a fraction of a CPU
• Traditional approach (pre-2016): based on Gaussian Mixture Models
• Google WebRTC, the prior state of the art:
  • 20.5% per-frame error
  • 17 ms latency
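To make the task concrete, here is a toy energy-threshold detector that emits one 0/1 label per frame. It is purely illustrative (the function name, frame layout, and threshold rule are assumptions); it is not the GMM-based WebRTC detector described above.

    /* Toy VAD: one 0/1 decision per frame based on average frame energy.
     * Illustrative only; not the GMM-based WebRTC detector. */
    #include <stddef.h>

    void vad_energy(const float *samples, size_t n_samples,
                    size_t frame_len, float threshold, int *labels)
    {
        size_t n_frames = n_samples / frame_len;
        for (size_t f = 0; f < n_frames; f++) {
            float energy = 0.0f;
            for (size_t i = 0; i < frame_len; i++) {
                float s = samples[f * frame_len + i];
                energy += s * s;
            }
            labels[f] = (energy / frame_len > threshold) ? 1 : 0;
        }
    }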
VAD with DNNs
• Simple DNN on the audio spectrogram †
  • Training data: noisy features with per-frame ground-truth labels
  • Input: 7-frame window of spectral features, 256 × 7 = 1792 values
  • Hidden layers: 512, 512, 512
  • Output: 257 predicted labels per frame
  (forward-pass sketch below)
• Results:
  • ☺ 5.6% error (down from 20.5%)
  • 152 ms latency (up from 17 ms)
• Idea: quantize the DNN to very low (1-3 bit) bitwidths

† I. Tashev and S. Mirsamadi, ITA 2016
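A minimal C sketch of the forward pass with the layer sizes listed above (1792 → 512 → 512 → 512 → 257). The ReLU/sigmoid activations and all identifiers are assumptions for illustration, not the authors' exact implementation.

    /* Feed-forward pass for the slide's architecture; a sketch, assuming
     * ReLU hidden layers and a sigmoid output layer. */
    #include <math.h>

    #define IN_DIM   (256 * 7)   /* 7-frame window of 256 spectral features */
    #define HID_DIM  512
    #define OUT_DIM  257

    /* y = act(W x + b), W stored row-major as [out_dim][in_dim] */
    static void dense(const float *W, const float *b, const float *x,
                      float *y, int in_dim, int out_dim, int relu)
    {
        for (int o = 0; o < out_dim; o++) {
            float acc = b[o];
            for (int i = 0; i < in_dim; i++)
                acc += W[o * in_dim + i] * x[i];
            y[o] = relu ? (acc > 0.0f ? acc : 0.0f)
                        : 1.0f / (1.0f + expf(-acc));   /* sigmoid output */
        }
    }

    void vad_forward(const float *x,                  /* [IN_DIM] input window */
                     const float *W1, const float *b1,
                     const float *W2, const float *b2,
                     const float *W3, const float *b3,
                     const float *W4, const float *b4,
                     float *out)                      /* [OUT_DIM] labels */
    {
        float h1[HID_DIM], h2[HID_DIM], h3[HID_DIM];
        dense(W1, b1, x,  h1, IN_DIM,  HID_DIM, 1);
        dense(W2, b2, h1, h2, HID_DIM, HID_DIM, 1);
        dense(W3, b3, h2, h3, HID_DIM, HID_DIM, 1);
        dense(W4, b4, h3, out, HID_DIM, OUT_DIM, 0);
    }

The inner multiply-accumulate loops of dense() are exactly the cost that the binarization on the next slides attacks.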
Implementing Binarized Arithmetic
• Quantize floats to ±1, e.g. 1.122 × (−3.112) becomes 1 × (−1)
• Sign products: 1 × 1 = 1, 1 × (−1) = −1, (−1) × 1 = −1, (−1) × (−1) = 1
• Replacing −1 with 0, this product is just XNOR
• Notice: 64 floats become a single 64-bit word (e.g. 0b110100…1)
• Dot product of one packed block: A[:64] · W[:64] = 2 · popc(A_b XNOR W_b) − 64 (worked example below)
• Retrain the model to convergence after binarization
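A small self-contained check of the 64-element formula above, assuming bit 1 encodes +1 and bit 0 encodes −1; the function names (pack_signs, bin_dot64) are illustrative, and popcount uses the GCC/Clang builtin.

    /* Verify: dot product of two 64-element +/-1 vectors equals
     * 2*popcount(XNOR) - 64 when signs are packed one bit per element. */
    #include <stdint.h>
    #include <stdio.h>

    /* Pack 64 signs (+1/-1) into one 64-bit word; bit 1 encodes +1. */
    static uint64_t pack_signs(const float *v)
    {
        uint64_t bits = 0;
        for (int i = 0; i < 64; i++)
            if (v[i] >= 0.0f)
                bits |= (uint64_t)1 << i;
        return bits;
    }

    /* matches minus mismatches = 2*popcount(xnor) - 64 */
    static int bin_dot64(uint64_t a, uint64_t w)
    {
        uint64_t xnor = ~(a ^ w);
        return 2 * __builtin_popcountll(xnor) - 64;
    }

    int main(void)
    {
        float a[64], w[64];
        for (int i = 0; i < 64; i++) {      /* arbitrary +/-1 test pattern */
            a[i] = (i % 3 == 0) ? -1.0f : 1.0f;
            w[i] = (i % 5 == 0) ? 1.0f : -1.0f;
        }

        int ref = 0;                        /* reference +/-1 dot product */
        for (int i = 0; i < 64; i++)
            ref += (a[i] >= 0 ? 1 : -1) * (w[i] >= 0 ? 1 : -1);

        printf("xnor/popcount: %d, reference: %d\n",
               bin_dot64(pack_signs(a), pack_signs(w)), ref);
        return 0;
    }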
Cost/Benefit of Binarized Arithmetic

Floating-point inner loop (~2N ops):

    float x[], y[], w[];
    ...
    for i in 1…N:
        y[j] += x[i] * w[i];

Binarized inner loop (~3N/64 ops: ~40× fewer ops, 32× smaller weights):

    unsigned long x_b[], w_b[];
    float y[];
    ...
    for i in 1…N/64:
        y[j] += 2*popc(not(x_b[i] xor w_b[i])) - 64;   // XNOR + popcount

• Problem: the optimized model is slower when measured!
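Scaled up to a whole layer, the binarized inner loop becomes a bit-packed matrix-vector product. The sketch below is an assumption-laden illustration (memory layout, the name bin_gemv, and the GCC/Clang __builtin_popcountll intrinsic), not the paper's optimized kernel.

    /* Binarized matrix-vector product (one dense layer) built from the
     * XNOR/popcount inner loop above; weights packed 64 per word, row-major. */
    #include <stdint.h>

    void bin_gemv(const uint64_t *W_b,   /* [out_dim][in_dim/64] packed weights */
                  const uint64_t *x_b,   /* [in_dim/64] packed activations      */
                  float *y,              /* [out_dim] output accumulators       */
                  int in_dim, int out_dim)
    {
        int words = in_dim / 64;          /* in_dim assumed a multiple of 64 */
        for (int j = 0; j < out_dim; j++) {
            int acc = 0;
            const uint64_t *w_row = W_b + (long)j * words;
            for (int i = 0; i < words; i++)
                acc += 2 * __builtin_popcountll(~(x_b[i] ^ w_row[i])) - 64;
            y[j] = (float)acc;
        }
    }

Written naively like this, the kernel is easy to get wrong performance-wise (memory layout, vectorization, popcount throughput), which is the slowdown the next slide addresses with a custom GEMM.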
Try Again, With a Custom GEMM Operation (Kang et al., ICASSP 2018)

Per-frame error in % (WebRTC = 20.46%); rows: weight quantization bits (W), columns: feature quantization bits (N):

            N32     N8      N4      N2      N1
    W32     5.55
    W8              6.25    6.45    7.23    13.87
    W4              6.16    6.47    7.32    14.11
    W2              6.63    7.06    7.92    13.88
    W1              7.91    8.47    8.97    14.95

• Sweet spot: ☺ ~5 ms latency (30.2× faster), at the cost of an additional 2.4% accuracy loss
• Takeaway: compilers (à la TVM/Halide) are essential for new ops.
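For the intermediate bitwidths in the table (W8/W4/W2 and N8/N4/N2), one common realization is symmetric uniform quantization to k bits. The sketch below is an assumption for illustration; the paper's exact quantizer may differ.

    /* Symmetric uniform quantization of a buffer to k bits; k = 1 keeps
     * only the sign (binarization), as on the previous slides. */
    #include <math.h>

    void quantize_kbit(float *v, int n, int k)
    {
        float max_abs = 0.0f;
        for (int i = 0; i < n; i++)
            if (fabsf(v[i]) > max_abs)
                max_abs = fabsf(v[i]);
        if (max_abs == 0.0f)
            return;

        if (k == 1) {                           /* binary: keep only the sign */
            for (int i = 0; i < n; i++)
                v[i] = (v[i] >= 0.0f) ? max_abs : -max_abs;
            return;
        }

        int   levels = (1 << (k - 1)) - 1;      /* e.g. k=2 -> 1, k=4 -> 7 */
        float step   = max_abs / (float)levels;
        for (int i = 0; i < n; i++)
            v[i] = roundf(v[i] / step) * step;  /* round to nearest level */
    }

Sweeping k independently for weights (W) and features (N), then retraining, produces the kind of error grid shown above.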