Speech Processing 15-492/18-492 Speech Recognition Signal Processing

Nov 26, 2023


Speech Processing 15-492/18-492 Speech Recognition Signal Processing
Analog to Digital
• Speech (sound) is analog
• Computers are digital
• We need to convert
• Sample from A-D converter
  • N times a second
  • How many times a second?
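The sampling step on this slide can be sketched as follows. This is a minimal illustration, not the course's code: the function names, the 440 Hz tone, the 16000 samples-per-second rate, and the 16-bit quantization are all assumptions chosen for the example. As a rule of thumb not stated on the slide (the Nyquist criterion), the sample rate needs to be more than twice the highest frequency you want to capture.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s):
    """Sample a sine wave of freq_hz at sample_rate_hz samples per second."""
    n_samples = int(round(sample_rate_hz * duration_s))
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

def quantize_16bit(samples):
    """Map floats in [-1.0, 1.0] to signed 16-bit integers (assumed bit depth)."""
    return [max(-32768, min(32767, int(round(s * 32767)))) for s in samples]

# 10 ms of a 440 Hz tone sampled N = 16000 times a second (illustrative values)
digital = quantize_16bit(sample_sine(440.0, 16000, 0.010))
```

Here `digital` holds one integer per sample, which is the form a computer can store and process.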
Goals of Signal Processing
• Distinguish between phonetic types
• Be invariant to channel/room conditions
• Be invariant to speaker characteristics
• Computational efficiency
Time vs Frequency Domain
• Human ear distinguishes frequencies
• Initial ASR used time domain features
  • Power
  • Zero crossings (sort of frequency)
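The two time domain features named above can be computed directly from the samples. The sketch below is one plausible reading: power is taken as the mean squared amplitude and a zero crossing is counted whenever adjacent samples differ in sign. Both definitions and the function names are assumptions, since the slide does not define them.

```python
def frame_power(frame):
    """Mean squared amplitude of the samples in one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossings(frame):
    """Count adjacent sample pairs whose signs differ.

    More crossings in a fixed-length frame suggest higher-frequency
    content, which is why the count is "sort of frequency".
    """
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))

fast = [1.0, -1.0, 1.0, -1.0]   # changes sign at every step
flat = [0.5, 0.5, 0.5, 0.5]     # never changes sign
```

For these toy frames, `fast` has three zero crossings and `flat` has none, even though neither feature required a frequency transform.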
Source Filter Model
[Diagram labels: Pitch, Voiced, Pulse, Unvoiced, Noise, Filter, Vocal Tract Model]
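A toy version of the source filter idea: a pulse train at the pitch period stands in for the voiced source, random noise for the unvoiced source, and a single two-pole resonator for the vocal tract filter. This is only a sketch under those assumptions. Every name, the resonator design, and the frequencies used are illustrative and do not come from the slides; a realistic vocal tract model would use several resonances.

```python
import math
import random

def pulse_train(n_samples, sample_rate_hz, pitch_hz):
    """Voiced source: one unit pulse per pitch period."""
    period = int(round(sample_rate_hz / pitch_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def noise(n_samples, seed=0):
    """Unvoiced source: uniform random noise (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

def resonator(source, sample_rate_hz, centre_hz, r=0.95):
    """Two-pole filter: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    a1 = 2 * r * math.cos(2 * math.pi * centre_hz / sample_rate_hz)
    a2 = -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in source:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

# 100 ms at 8000 samples per second (illustrative values)
voiced = resonator(pulse_train(800, 8000, 100), 8000, 500)
unvoiced = resonator(noise(800), 8000, 2500)
```

The same filter code serves both branches; only the source changes, which is the point of the model.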
Time domain Signal
Waveform Representation
Speech Spectrogram
/iy/ vs /ae/
• “beat” /b iy t/ and “bat” /b ae t/
Frequency Domain
• “pencils” /p eh n s ih l z/
Frequency Domain
• “beats pits” /b iy t s p ih t s/
Speech Analysis
Standard Parameterization
• Split waveform into “frames”
  • Advance every 10ms
  • Size around 25ms (overlapping frames)
  • Window them
  • Perform FFT/Mel Cepstral analysis
  • Find Deltas (difference from previous)
  • Find Delta Deltas (difference in delta)
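The framing pipeline above can be sketched in a few functions. Several choices here are assumptions rather than anything the slides specify: the Hamming window, the naive DFT used in place of a real FFT, and the test tone. The Mel cepstral step is omitted, so deltas are taken over plain magnitude spectra as a stand-in for cepstral features.

```python
import cmath
import math

def split_frames(signal, sample_rate_hz, frame_ms=25, step_ms=10):
    """Overlapping frames: about 25 ms long, advancing every 10 ms."""
    size = sample_rate_hz * frame_ms // 1000
    step = sample_rate_hz * step_ms // 1000
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def hamming(frame):
    """Taper the frame edges with a Hamming window (assumed window type)."""
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2; a real system would use an FFT."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def deltas(features):
    """Difference of each frame's feature vector from the previous frame's."""
    return [[c - p for c, p in zip(cur, prev)]
            for prev, cur in zip(features, features[1:])]

# 100 ms of a 1000 Hz tone at 8000 samples per second (illustrative values)
rate = 8000
signal = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(800)]
frames = split_frames(signal, rate)
spectra = [magnitude_spectrum(hamming(f)) for f in frames]
d = deltas(spectra)      # deltas
dd = deltas(d)           # delta deltas
```

Because the tone does not change over time, the deltas come out close to zero; real speech would show larger deltas where the spectrum is moving, such as at phone boundaries.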
Summary
• Time domain vs Frequency domain
• Parameterization of speech
  • Frequency domain
  • Short term FFTs
  • FFT vs MEL Cepstrum

