Effective Open Source Speech Recognition in Your Application

Sep 25, 2023


Effective Open Source Speech Recognition in Your Application
  Peter Grasch, peter@grasch.net
  #kde-speech
The Basics
● Decoder
● Speech model
– Acoustic model: sounds
– Language model: vocabulary, grammar
Open Source Speech Recognition
                Decoder   Trainer   UI
  CMU SPHINX    ✓         ✓                  (PocketSphinx, SphinxTrain)
  Julius        ✓
  KALDI         ✓         ✓
  Simon         ✓         ✓         ✓
Standard Architecture
  [Diagram labels: Your application, ?, Simon, Simond, Commands, Acoustic model, Language model]
Standard Architecture
  [Diagram labels: Your application, Scenario (three times), Simon, Simond, Commands, Acoustic model, Language model]
Headless Architecture
  [Diagram labels: Your application, Simon, Simond, Commands, Acoustic model, Language model]
Embedded Architecture
  [Diagram labels: Your application, Simon, Simond, Commands, Acoustic model, Language model, Decoder]
Writing your Scenario
● Lay out the commands you want to support
● Create:
– Vocabulary
– Grammar
– Commands
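The three parts of a scenario can be pictured with a small data model. The following is only a sketch under assumptions: the `Word` and `Scenario` classes, the category names, the action names and the phoneme strings are all hypothetical and are not Simon's scenario format or API. It shows how vocabulary (words with a pronunciation and a category), grammar (allowed sequences of categories) and commands (trigger phrase to action) depend on each other, and how to spot a command the grammar can never produce.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass(frozen=True)
class Word:
    text: str       # written form
    phonemes: str   # pronunciation (illustrative transcription)
    category: str   # grammar category the word belongs to


@dataclass
class Scenario:
    vocabulary: list[Word]
    grammar: list[list[str]]  # each rule is a sequence of categories
    commands: dict[str, str] = field(default_factory=dict)  # phrase -> action name

    def sentences(self) -> set[str]:
        """Expand every grammar rule into the sentences it allows."""
        by_category: dict[str, list[str]] = {}
        for word in self.vocabulary:
            by_category.setdefault(word.category, []).append(word.text)
        result = set()
        for rule in self.grammar:
            choices = [by_category.get(category, []) for category in rule]
            for combo in product(*choices):
                result.add(" ".join(combo))
        return result

    def unreachable_commands(self) -> list[str]:
        """Commands whose trigger phrase the grammar can never produce."""
        allowed = self.sentences()
        return sorted(phrase for phrase in self.commands if phrase not in allowed)


# Hypothetical example scenario for a small file manager.
scenario = Scenario(
    vocabulary=[
        Word("open", "OW P AH N", "VERB"),
        Word("close", "K L OW Z", "VERB"),
        Word("file", "F AY L", "NOUN"),
        Word("window", "W IH N D OW", "NOUN"),
    ],
    grammar=[["VERB", "NOUN"]],
    commands={
        "open file": "file_open",
        "close window": "window_close",
        "save file": "file_save",  # "save" is missing from the vocabulary
    },
)
```

Calling `scenario.unreachable_commands()` flags "save file", because no word "save" exists in the vocabulary. Laying out the commands first, as the slide suggests, makes this kind of gap easy to find before recording any training data.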
Writing your Scenario: Demonstration
Tighter Integration: Write a Custom Command Plug-In
● Full, programmatic control of the scenario
● Meta information of recognition results:
– Phonetic transcriptions
– Confidence scores*
– Alternative results*
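One way an application might use this meta information is to accept a result only when the recognizer is confident enough. The sketch below is an assumption-laden illustration, not Simon's plug-in interface: `RecognitionResult`, `choose`, the 0.7 threshold and the example scores are all hypothetical. It rates each alternative by its least confident word and returns the best alternative, or nothing if even the best one falls below the threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecognitionResult:
    sentence: str
    phonemes: str                        # phonetic transcription of the sentence
    word_confidences: tuple[float, ...]  # one score in [0, 1] per word

    def confidence(self) -> float:
        """Rate a sentence by its least confident word."""
        return min(self.word_confidences, default=0.0)


def choose(results: list[RecognitionResult],
           threshold: float = 0.7) -> Optional[RecognitionResult]:
    """Return the most confident alternative, or None if none is good enough."""
    if not results:
        return None
    best = max(results, key=lambda r: r.confidence())
    return best if best.confidence() >= threshold else None


# Hypothetical alternatives for one utterance.
alternatives = [
    RecognitionResult("open pile", "OW P AH N P AY L", (0.91, 0.35)),
    RecognitionResult("open file", "OW P AH N F AY L", (0.91, 0.84)),
]

accepted = choose(alternatives)                 # picks "open file"
rejected = choose(alternatives, threshold=0.9)  # None: weakest word scores 0.84
```

Rating by the weakest word is a deliberately cautious choice for command and control, where acting on a misheard word is usually worse than asking the user to repeat. An average over the words would be a more lenient alternative.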
Tighter Integration: Write a Custom Command Plug-In: Demonstration
Q & A #kde-speech Peter Grasch peter@grasch.net
Thank you for your attention
