
GPU-Accelerated Large Vocabulary Continuous Speech Recognition for Scalable Distributed Speech Recognition
Jungsuk Kim, Ian Lane
Electrical and Computer Engineering, Carnegie Mellon University
March 20, 2015 @GTC2015

Overview
• Introduction
• Background
  • Weighted Finite State Transducers in Speech Recognition
• Proposed Approach
  • GPU-Accelerated Scalable DSR
• Evaluation
• Conclusion

Introduction
• Voice interfaces are a core technology for user interaction:
  • Mobile devices, smart TVs, in-vehicle systems, …
• For a captivating user experience, a voice UI must be:
  • Robust
    • Acoustic robustness → large acoustic models
    • Linguistic robustness → large-vocabulary recognition
  • Responsive
    • Low latency → faster-than-real-time search
  • Adaptive
    • User and task adaptation

• Large models are critical for accurate speech recognition:
  • Large acoustic models → tens of millions of parameters
  • Large vocabulary → millions of words
  • Large language model → billions of n-gram entries (≥ 20 GB)
• Examples include:
  • Acoustic modeling for telephony [Mass 2014] or YouTube [Bacchiani 2014]
    • ~200M-parameter Deep Neural Networks
  • Language model rescoring for Voice Search [Schalkwyk 2010]
    • 1.2M vocabulary, 5-gram LM, 12.7B n-gram entries

• Speech recognition contains many highly parallel tasks.
• Graphics Processing Units (SIMT, ~3000 cores, < 24 GB memory) are optimized for parallel computing.
• Speech recognition + GPUs = an ASR engine designed specifically for GPUs → large models → more accurate recognition.

• 1-million-word vocabulary (3-gram LM)
• 30-million-parameter Deep Neural Network

Platform    Architecture  Cores  RTF   xRT   Time to decode 1 hour of audio
Titan X     Maxwell       3072   0.02  50X   72 s
Tesla K40   Kepler        2880   0.01  100X  36 s
Tegra K1    Kepler        192    0.17  6X    612 s
Tegra X1    Maxwell       256    0.14  7X    504 s

RTF: real-time factor (decoding time ÷ audio duration). xRT: speed relative to real time (1/RTF).
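The performance rows are linked by simple arithmetic: RTF is decoding time divided by audio duration, the xRT speed-up is 1/RTF, and the time to decode one hour is RTF × 3600 s. A small Python check of the reported figures; the dictionary and helper names are illustrative only:

```python
AUDIO_SECONDS = 3600  # one hour of audio

# platform: (RTF, reported xRT, reported seconds to decode one hour)
reported = {
    "Titan X": (0.02, 50, 72),
    "Tesla K40": (0.01, 100, 36),
    "Tegra K1": (0.17, 6, 612),
    "Tegra X1": (0.14, 7, 504),
}

def decode_seconds(rtf, audio_seconds=AUDIO_SECONDS):
    """Wall-clock decoding time = RTF x audio duration."""
    return rtf * audio_seconds

def speedup(rtf):
    """Speed relative to real time (xRT) = 1 / RTF."""
    return 1.0 / rtf

for name, (rtf, xrt, secs) in reported.items():
    print(f"{name:10s} RTF={rtf:.2f} ~{speedup(rtf):.1f}xRT (reported {xrt}X) "
          f"{decode_seconds(rtf):.0f} s/hour (reported {secs} s)")
```

Rounding 1/RTF reproduces the xRT column, for example 1/0.17 ≈ 5.9, reported as 6X.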

Background: Weighted Finite State Transducers (WFSTs) in Speech Recognition

WFST in Speech Recognition
[Figure: example WFST (states 0 to 17, final state 17) encoding the competing word sequences “RECOGNIZE SPEECH” and “WRECK A NICE BEACH”. Word-emitting arcs are labeled input:output/weight, e.g. k:RECOGNIZE/w[RECOGNIZE], s:SPEECH/w[SPEECH], k:WRECK/w[WRECK], a:A/w[A], n:NICE/w[NICE], b:BEACH/w[BEACH]; the remaining arcs carry single phones (r, eh, ax, g, n, ay, z, p, iy, ch, s, …) or ε.]
• “Recognize speech” vs. “Wreck a nice beach”: acoustically similar word sequences compete in the same network.
• Search is performed in 3 phases:
  • Phase 0: Active Set Preparation
  • Phase 1: Acoustic Score Computation
  • Phase 2: WFST Search
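A minimal Python sketch of what such a transducer encodes: each arc carries an input phone, an output word or ε, and a weight; a path's output is its non-ε labels and its cost is the sum of its weights (tropical semiring, lower is better). The class, the helper names, the one-chain-per-word topology, and the weights are hypothetical simplifications of the figure, and exhaustive path enumeration stands in for the beam search described in the following slides.

```python
from collections import defaultdict

EPS = "<eps>"  # empty output label (epsilon)

class Wfst:
    """Minimal acyclic WFST: arcs[state] = [(ilabel, olabel, weight, next_state), ...]."""

    def __init__(self):
        self.arcs = defaultdict(list)
        self.num_states = 1   # state 0 is the start state
        self.finals = {}      # final state -> final weight

    def new_state(self):
        self.num_states += 1
        return self.num_states - 1

    def add_word(self, src, phones, word, weight, dst=None):
        """Add a chain of phone arcs from src; the word and its weight sit on the first arc."""
        state = src
        for i, phone in enumerate(phones):
            is_last = i == len(phones) - 1
            nxt = dst if (is_last and dst is not None) else self.new_state()
            olabel, w = (word, weight) if i == 0 else (EPS, 0.0)
            self.arcs[state].append((phone, olabel, w, nxt))
            state = nxt
        return state

def complete_paths(fst, state=0, words=(), cost=0.0):
    """Yield (output words, path cost) for every path that reaches a final state.
    Weights add along a path (tropical semiring); lower cost = more likely."""
    if state in fst.finals:
        yield list(words), cost + fst.finals[state]
    for _ilabel, olabel, w, nxt in fst.arcs[state]:
        out = words if olabel == EPS else words + (olabel,)
        yield from complete_paths(fst, nxt, out, cost + w)

# Hypothetical toy graph for the two competing phrases (made-up weights).
fst = Wfst()
end = fst.new_state()
fst.finals[end] = 0.0
s = fst.add_word(0, ["r", "eh", "k", "ax", "g", "n", "ay", "z"], "RECOGNIZE", 1.2)
fst.add_word(s, ["s", "p", "iy", "ch"], "SPEECH", 0.8, dst=end)
s = fst.add_word(0, ["r", "eh", "k"], "WRECK", 2.0)
s = fst.add_word(s, ["ax"], "A", 0.5)
s = fst.add_word(s, ["n", "ay", "s"], "NICE", 1.0)
fst.add_word(s, ["b", "iy", "ch"], "BEACH", 1.5, dst=end)

for words, cost in complete_paths(fst):
    print(" ".join(words), cost)
best_words, best_cost = min(complete_paths(fst), key=lambda p: p[1])
```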

Phase 0: Active Set Preparation
• Collect the active hypotheses from the previous frame.
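One common way to build an active set on a GPU is stream compaction, so that each thread receives one surviving hypothesis. The sequential Python sketch below mimics that: it flags states within a beam of the best cost, assigns output slots with an exclusive prefix sum, and scatters the survivors. The flat per-state cost array (inf for inactive states) and the beam threshold are assumptions; the slides only say that active hypotheses are collected from the previous frame.

```python
import math
from itertools import accumulate

def prepare_active_set(costs, beam):
    """Phase 0 sketch: compact the previous frame's surviving hypotheses.

    costs[s] is the accumulated cost of WFST state s after the previous frame
    (math.inf if s is inactive); lower cost = more likely.
    Returns a dense list of (state, cost) pairs, one per surviving hypothesis.
    """
    if not costs or min(costs) == math.inf:
        return []
    best = min(costs)
    # 1. Flag states whose cost lies within the beam of the best one.
    keep = [1 if c <= best + beam else 0 for c in costs]
    # 2. Exclusive prefix sum: each survivor's position in the output
    #    (a GPU would compute this scan in parallel).
    slots = [0] + list(accumulate(keep))[:-1]
    # 3. Scatter survivors into a dense array (one GPU thread per state).
    active = [None] * sum(keep)
    for state, (flag, slot) in enumerate(zip(keep, slots)):
        if flag:
            active[slot] = (state, costs[state])
    return active

prev_costs = [math.inf, 3.0, 5.5, math.inf, 2.0, 12.0]
active = prepare_active_set(prev_costs, beam=4.0)
```

With a beam of 4.0 around the best cost 2.0, states 1, 2, and 4 survive and are packed into slots 0, 1, and 2.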

Phase 1: Acoustic Score Computation
• Compute the acoustic similarity between the input speech and the phonetic models using a Deep Neural Network.
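A pure-Python sketch of Phase 1 for a single frame, assuming a feed-forward DNN with ReLU hidden layers and a softmax output over phonetic states. Converting posteriors to scaled likelihoods by subtracting log priors follows the usual hybrid DNN-HMM convention and is assumed here, not taken from the slides; a GPU implementation would batch many frames into one matrix and use matrix-matrix products instead of these loops. Weights and layer sizes are made up.

```python
import math

def matvec(W, x, b):
    """Affine map y = W x + b, with W given as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def log_softmax(z):
    """Numerically stable log-softmax."""
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]

def acoustic_costs(frame, layers, log_priors):
    """Phase 1 sketch: forward-propagate one feature frame through a DNN.

    layers: list of (W, b) pairs; ReLU after every layer except the last.
    Returns one cost per output state: -(log p(state | frame) - log p(state)),
    a negative scaled log-likelihood, so lower cost = better acoustic match.
    """
    h = frame
    for i, (W, b) in enumerate(layers):
        h = matvec(W, h, b)
        if i < len(layers) - 1:
            h = [max(0.0, v) for v in h]  # ReLU on hidden layers
    log_post = log_softmax(h)
    return [-(lp - lpr) for lp, lpr in zip(log_post, log_priors)]

# Toy 2-input network with one hidden layer and 3 output states (made-up weights).
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),
]
costs = acoustic_costs([1.0, -1.0], layers, log_priors=[0.0, 0.0, 0.0])  # zeros: no prior correction
```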

Phase 2: WFST Search
• Perform frame-synchronous Viterbi beam search on the WFST network.
• If multiple transitions lead to the same next state s, only the most likely (minimum-score) hypothesis is retained (e.g. states 12, 14, 15, …).
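A Python sketch of one Phase 2 step as token passing: every active hypothesis is extended along its outgoing arcs, the arc weight and this frame's acoustic cost are added, and when several transitions reach the same next state only the minimum cost survives. The arc tuple layout, the convention that input label 0 marks an ε arc, and the final beam pruning are assumptions for illustration; backpointers are omitted to keep it short.

```python
import math

EPS = 0  # assumed convention: input label 0 marks an epsilon (non-emitting) arc

def viterbi_step(active, arcs, am_cost, beam):
    """Phase 2 sketch: one frame of Viterbi beam search on a WFST.

    active:  list of (state, cost) pairs from Phase 0.
    arcs:    dict state -> list of (ilabel, olabel, weight, next_state).
    am_cost: dict input label -> acoustic cost for this frame (from Phase 1).
    Returns dict next_state -> cost after recombination and beam pruning.
    """
    new = {}
    # Emitting arcs consume the current frame: add arc weight + acoustic cost.
    for state, cost in active:
        for ilabel, _olabel, w, nxt in arcs.get(state, ()):
            if ilabel != EPS:
                c = cost + w + am_cost[ilabel]
                if c < new.get(nxt, math.inf):  # recombination: keep the minimum
                    new[nxt] = c
    # Epsilon arcs: propagate without consuming a frame until no cost improves
    # (terminates for non-negative weights).
    pending = list(new)
    while pending:
        state = pending.pop()
        for ilabel, _olabel, w, nxt in arcs.get(state, ()):
            if ilabel == EPS and new[state] + w < new.get(nxt, math.inf):
                new[nxt] = new[state] + w
                pending.append(nxt)
    # Beam pruning around the best hypothesis of this frame.
    if new:
        best = min(new.values())
        new = {s: c for s, c in new.items() if c <= best + beam}
    return new

arcs = {
    0: [(1, "x", 1.0, 1), (2, "y", 0.5, 1), (1, "z", 0.2, 2)],
    1: [(EPS, "<eps>", 0.1, 3)],
}
tokens = viterbi_step([(0, 0.0)], arcs, am_cost={1: 2.0, 2: 1.0}, beam=0.5)
```

In the toy call, two arcs reach state 1 with costs 3.0 and 1.5 and only 1.5 is kept; the ε arc then gives state 3 a cost of 1.6, and state 2 (cost 2.2) is pruned by the beam.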

• Iterate these 3 phases (Phase 0 → Phase 1 → Phase 2 → Phase 0 → …) until the input audio ends.
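The overall control flow can be sketched as one loop over frames in Python. To keep it self-contained, Phase 1 is replaced by a hypothetical precomputed table of per-frame acoustic costs, ε arcs and output labels are omitted, and `decode` and its arguments are illustrative names, not the authors' API.

```python
import math

def decode(arcs, start, finals, frame_costs, beam):
    """Frame-synchronous loop sketch: repeat Phase 0 -> 1 -> 2 for every frame.

    arcs:        dict state -> list of (ilabel, weight, next_state); all arcs emitting.
    finals:      dict final state -> final weight.
    frame_costs: one dict per frame mapping input label -> acoustic cost,
                 standing in for the DNN output of Phase 1.
    Returns (best total cost, final state), or (inf, None) if no path survives.
    """
    tokens = {start: 0.0}
    for am_cost in frame_costs:  # until the input audio ends
        # Phase 0: active set = hypotheses within the beam of the best one.
        best = min(tokens.values())
        active = [(s, c) for s, c in tokens.items() if c <= best + beam]
        # Phase 1: acoustic scores for this frame (precomputed in this sketch).
        # Phase 2: expand arcs; keep the minimum-cost hypothesis per next state.
        tokens = {}
        for state, cost in active:
            for ilabel, w, nxt in arcs.get(state, ()):
                new_cost = cost + w + am_cost[ilabel]
                if new_cost < tokens.get(nxt, math.inf):
                    tokens[nxt] = new_cost
        if not tokens:
            return math.inf, None  # every hypothesis died
    ends = [(c + finals[s], s) for s, c in tokens.items() if s in finals]
    return min(ends, default=(math.inf, None))

arcs = {
    0: [("a", 0.0, 1), ("b", 0.0, 2)],
    1: [("a", 0.0, 1), ("c", 0.0, 3)],
    2: [("c", 0.0, 3)],
}
frames = [{"a": 1.0, "b": 3.0, "c": 5.0}, {"a": 4.0, "b": 4.0, "c": 0.5}]
result = decode(arcs, start=0, finals={3: 0.0}, frame_costs=frames, beam=10.0)
```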


[Figure: the example WFST with the best path highlighted: k:RECOGNIZE/w[RECOGNIZE] followed by s:SPEECH/w[SPEECH].]
• The recognition result is the output symbol sequence along the best path.
• Result: “RECOGNIZE SPEECH”
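Recovering the result amounts to following backpointers from the best final state back to the start and keeping the non-ε output labels. A minimal Python sketch, assuming the search stored one backpointer per surviving state per frame; this layout and the sample data are hypothetical, since the slides do not say how the decoder records its history.

```python
EPS = "<eps>"  # empty output label

def backtrace(backpointers, final_state):
    """Return the output words on the best path ending in `final_state`.

    backpointers[t][s] = (prev_state, olabel): the transition that produced the
    surviving hypothesis in state s at frame t. Walk from the last frame to the
    first and keep the non-epsilon output labels.
    """
    words = []
    state = final_state
    for frame in reversed(backpointers):
        prev_state, olabel = frame[state]
        if olabel != EPS:
            words.append(olabel)
        state = prev_state
    words.reverse()
    return words

# Hypothetical backpointers for 4 frames with two competing hypotheses.
backpointers = [
    {1: (0, "RECOGNIZE"), 7: (0, "WRECK")},
    {2: (1, EPS), 8: (7, "A")},
    {3: (2, "SPEECH"), 9: (8, "NICE")},
    {4: (3, EPS), 10: (9, "BEACH")},
]
print(" ".join(backtrace(backpointers, 4)))  # prints "RECOGNIZE SPEECH"
```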

Proposed Approach: GPU-Accelerated Scalable DSR

Distributed Speech Recognition (DSR)
[Figure: DSR server pipeline. Incoming audio streams pass through Iteration Control, Feature Extraction, Acoustic Score Computation, Graph Search, and Post Processing, with hypotheses updated each iteration.]
(0) Iteration control, data preparation, result handling.
(1) Extract features from the active audio streams into a stacked feature vector.
(2) Stack incoming frames from the active audio streams and compute likelihoods.
(3) Conduct Viterbi beam search over the WFST with on-the-fly rescoring.
(4) Send results back over TCP/IP; data collection.
• Iteration control
  • Allocate or deallocate data structures.
  • Terminate the decoding task.
• Feature extraction
  • Receive audio and extract features for the current iteration (batch).
  • Speaker-dependent adaptation.
• Acoustic score computation
  • Deep Neural Network (forward propagation).
• Graph search
  • Conduct frame-synchronous WFST search.
  • End-of-utterance detection.
• Post processing
  • Output (lattice) processing.
  • Send results back to the client.
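Step (2), stacking frames from all active streams so that a single DNN forward pass serves every client, could look roughly like this in Python. The per-stream buffers, the `max_frames` limit, and the function name are hypothetical; the slides do not describe the server's actual data structures.

```python
def stack_active_streams(streams, max_frames):
    """Stack pending feature frames from all active streams into one batch.

    streams: dict stream_id -> list of feature frames (each a list of floats)
             waiting to be decoded; consumed frames are removed from the buffer.
    Returns (batch, index): batch holds the rows of one feature matrix for a
    single forward pass, and index maps stream_id -> (first_row, num_rows) so
    the resulting acoustic scores can be routed back to each stream's search.
    """
    batch, index = [], {}
    for sid, frames in streams.items():
        take = frames[:max_frames]
        if not take:
            continue  # no new audio for this stream in this iteration
        index[sid] = (len(batch), len(take))
        batch.extend(take)
        del frames[:len(take)]  # consume the frames from the stream's buffer
    return batch, index

streams = {"a": [[1.0], [2.0], [3.0]], "b": [], "c": [[9.0]]}
batch, index = stack_active_streams(streams, max_frames=2)
```

Keeping a row index per stream is what lets one batched computation be split back into per-stream results afterwards.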

Producer/Consumer Design Pattern
• Master/slave pattern.
• Decouples processes that produce and consume data at different rates.
• Advantages:
  • Enhanced data sharing.
  • Processes can run at different speeds.
  • Buffered communication between processes.
[Figure: Producer-Consumer multi-threaded model]
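A minimal producer/consumer sketch using Python's standard `queue.Queue` and `threading`: a bounded queue buffers items between a producer and a consumer that run at different speeds. The function name and the `process` callback (standing in for decoding) are illustrative; the slides do not specify how the server's threads are implemented.

```python
import queue
import threading

def run_pipeline(chunks, process, maxsize=4):
    """Producer/consumer sketch: a bounded queue decouples two threads.

    The producer pushes chunks (e.g. audio received from clients); the consumer
    pulls them and applies `process`. The bounded buffer absorbs bursts and
    blocks the producer when the consumer falls behind.
    """
    buf = queue.Queue(maxsize=maxsize)
    results = []
    done = object()  # sentinel marking the end of input

    def producer():
        for chunk in chunks:
            buf.put(chunk)  # blocks while the buffer is full
        buf.put(done)

    def consumer():
        while True:
            item = buf.get()  # blocks while the buffer is empty
            if item is done:
                break
            results.append(process(item))

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

squares = run_pipeline(range(10), lambda x: x * x)
```

With a single consumer, results come back in FIFO order; with several consumers, ordering would have to be restored explicitly.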
