Improved Noise Weighting in CELP Coding of Speech - Applying the Vorbis Psychoacoustic Model to Speex
By: Jean-Marc Valin, Christopher Montgomery
22/5/2006
CeNTIE is supported by the Australian Government through the Advanced Networks Program (ANP) of the Department of Communications, Information Technology and the Arts and the CSIRO ICT Centre

Introduction
www.ict.csiro.au
• Goal: improve perceptual weighting of the noise in an existing CELP codec (Speex)
• Proposed solution: adapt and apply the Vorbis psychoacoustic model to the Speex codec
• Outline
  • Overview of Speex
  • Overview of Vorbis and psychoacoustic model
  • Application to Speex
  • Evaluation & results
  • Complexity
  • Conclusion

Overview of Speex
• Speech codec based on CELP
• Sampling rates, bitrates:
  • Narrowband (8 kHz): 2.15 kbps to 24.6 kbps
  • Wideband (16 kHz): 3.95 kbps to 42.2 kbps
• Features:
  • Open-source (BSD-licensed): http://www.speex.org/
  • Source-controlled variable bitrate (VBR)
  • Embedded wideband coding
  • Variable encoder complexity
  • Optimised for VoIP
  • Bit-stream finalized in March 2003

Speex Encoder Structure
• CELP variant with
  • 20 ms frames (5 ms sub-frames)
  • No inter-frame coding other than LPC and pitch prediction
  • 3-tap pitch predictor
  • Sub-vector quantization of innovation
  • “Global” excitation gain
• Default noise weighting is LPC-derived: $W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}$, with $\gamma_1 = 0.9$, $\gamma_2 = 0.6$
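
The default LPC-derived weighting can be illustrated with bandwidth expansion: substituting z/γ for z in A(z) scales the k-th coefficient by γ^k. The following is a minimal sketch in Python, assuming the convention A(z) = 1 + Σ a_k z^(-k). The function names and the second-order LPC vector are hypothetical, chosen only for illustration, and are not taken from the Speex source.

```python
def bandwidth_expand(lpc, gamma):
    """Return the coefficients of A(z/gamma) given those of A(z).

    lpc holds [1, a_1, ..., a_p] for A(z) = 1 + sum_k a_k z^-k.
    Substituting z -> z/gamma multiplies a_k by gamma**k.
    """
    return [a * gamma ** k for k, a in enumerate(lpc)]


def default_weighting(lpc, gamma1=0.9, gamma2=0.6):
    """Numerator and denominator of W(z) = A(z/gamma1) / A(z/gamma2)."""
    return bandwidth_expand(lpc, gamma1), bandwidth_expand(lpc, gamma2)


def pole_zero_filter(num, den, x):
    """Filter x through num(z)/den(z) in direct form I (den[0] must be 1)."""
    y = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n - k >= 0)
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc)
    return y


# Hypothetical LPC vector, for illustration only (not real speech data).
a = [1.0, -1.6, 0.8]
w_num, w_den = default_weighting(a)

# First samples of the impulse response of W(z).
impulse = [1.0] + [0.0] * 7
w_impulse_response = pole_zero_filter(w_num, w_den, impulse)
```

Because γ1 > γ2, the numerator stays closer to A(z) than the denominator does, so W(z) de-emphasises the formant regions and lets more noise sit under the spectral peaks.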

Vorbis Psychoacoustic Model
• Vorbis is an open-source, MDCT-based audio codec
• Psychoacoustic model shapes noise according to:
  • Tone masking
  • Noise masking
  • Noise normalization
  • Impulse analysis
• Noise shaping approximates the masking threshold
  • Good for transparent audio
  • Bad for lossy speech

Application to Noise Weighting in Speex
• Vorbis “floor” curve interpreted as the inverse of the optimal perceptual weighting filter
  • Amplitude companding required
• Compute curve for each frame and interpolate on sub-frames
• Convert to pole-zero model: $\frac{1}{W(z)} = \frac{W_n(z)}{W_d(z)}$
  • Denominator:
    • Curve to auto-correlation (IFFT)
    • Auto-correlation to LPC (Levinson-Durbin)
  • Numerator:
    • Remove denominator contribution (1/FFT of denominator)
    • Convert inverse to LPC (IFFT and Levinson-Durbin)
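
The two conversion steps can be sketched as follows. This is a minimal Python illustration, not the Speex implementation. It assumes the curve is a strictly positive power spectrum sampled on m uniform frequencies around the full unit circle (so it is symmetric), that both polynomials are monic and share the same order, and that gain is ignored. All function names are hypothetical.

```python
import math


def curve_to_autocorr(power, max_lag):
    """Inverse DFT of a real, symmetric power spectrum (lags 0..max_lag)."""
    m = len(power)
    return [
        sum(p * math.cos(2.0 * math.pi * k * lag / m) for k, p in enumerate(power)) / m
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns ([1, a_1..a_p], residual energy)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def power_response(coeffs, m):
    """Squared magnitude of a polynomial in z^-1 on m uniform frequencies."""
    out = []
    for k in range(m):
        w = 2.0 * math.pi * k / m
        re = sum(c * math.cos(w * j) for j, c in enumerate(coeffs))
        im = -sum(c * math.sin(w * j) for j, c in enumerate(coeffs))
        out.append(re * re + im * im)
    return out


def curve_to_pole_zero(curve, order):
    """Fit curve ~ |W_n|^2 / |W_d|^2 and return (w_n, w_d)."""
    m = len(curve)
    # Denominator: curve -> auto-correlation -> LPC.
    w_d, _ = levinson_durbin(curve_to_autocorr(curve, order), order)
    # Numerator: curve * |W_d|^2 ~ |W_n|^2, so LPC is run on its inverse.
    den = power_response(w_d, m)
    inverse_residual = [1.0 / (c * d) for c, d in zip(curve, den)]
    w_n, _ = levinson_durbin(curve_to_autocorr(inverse_residual, order), order)
    return w_n, w_d


# Synthetic all-pole curve built from a known (hypothetical) denominator.
m = 256
true_w_d = [1.0, -0.9]
curve = [1.0 / p for p in power_response(true_w_d, m)]
w_n, w_d = curve_to_pole_zero(curve, order=1)
```

On this synthetic curve the denominator comes out close to the polynomial used to build it, and the numerator stays close to 1 because nothing is left to model once the denominator is removed. A direct cosine sum stands in for the IFFT here to keep the sketch dependency-free; a real encoder would use an FFT.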

Curves
[Figure not reproduced]

Evaluation
• Objective listening quality: PESQ MOS-LQO (P.862.x)
• Tested on NTT multilingual speech database
  • 354 files
  • 177 speakers
  • 20 languages
• Reference: Speex version 1.2-beta1 (pre-release)

Results (narrowband)
[Figure not reproduced]

Results (wideband)
[Figure not reproduced]

Complexity Reduction
Three strategies:
1) Use all-pole model: $\frac{1}{W(z)} = \frac{1}{W_d(z)}$
2) Force $W_d(z) = A(z)$
  • Synthesis+weighting filter simplifies to $\frac{W(z)}{A(z)} = \frac{1}{W_n(z)}$
  • Reduces complexity of the filtering
3) Apply 2) and make $W_n(z)$ constant for a whole frame
  • Only one conversion per frame
None of 1), 2) or 3) causes significant degradation
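
The simplification in strategy 2) can be written out step by step. This is a short derivation under two assumptions: the floor curve is modelled as $1/W(z) = W_n(z)/W_d(z)$, and the synthesis filter is $1/A(z)$.

```latex
\begin{align*}
W(z) &= \frac{W_d(z)}{W_n(z)}
  && \text{(weighting filter, inverse of the curve model)} \\
\frac{W(z)}{A(z)} &= \frac{W_d(z)}{A(z)\,W_n(z)}
  && \text{(synthesis followed by weighting)} \\
\frac{W(z)}{A(z)} &= \frac{1}{W_n(z)}
  && \text{(after forcing } W_d(z) = A(z)\text{)}
\end{align*}
```

If all three polynomials had order p, the general cascade would need roughly 3p multiply-adds per sample, while the forced form is a single all-pole filter needing about p.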

Conclusion
• Proposed an improved noise weighting for the Speex codec
  • Noise weighting is based on the Vorbis psychoacoustic model
  • Up to 20% (equivalent) improvement at high bitrate
  • Little or no improvement at low bitrate
• A case for more research to be done in noise weighting for CELP
• A subjective MOS test is desirable
• Future work
  • Investigate efficient approximations for $W_n(z)$
  • Derive CELP-specific masking models

Questions?
