Perceptual Audio Coding

Sources:
Kahrs, Brandenburg (Editors). (1998). "Applications of digital signal processing to audio and acoustics". Kluwer Academic.
Bernd Edler. (1997). "Low bit rate audio tools". MPEG meeting.

Contents:
• Introduction
• Requirements for audio codecs
• Perceptual coding vs. source coding
• Measuring audio quality
• Facts from psychoacoustics
• Overview of perceptual audio coding
• Description of coding tools
• Filterbanks
• Perceptual models
• Quantization and coding
• Stereo coding
• Real coding systems

1 Introduction

• Transmission bandwidth increases continuously, but the demand increases even more → need for compression technology
• Applications of audio coding
  – audio streaming and transmission over the internet
  – mobile music players
  – digital broadcasting
  – soundtracks of digital video (e.g. digital television and DVD)

Requirements for audio coding systems

• Compression efficiency: sound quality vs. bit-rate
• Absolute achievable quality
  – often required: given sufficiently high bit-rate, no audible difference compared to CD-quality original audio
• Complexity
  – computational complexity: main factor for general purpose computers
  – storage requirements: main factor for dedicated silicon chips
  – encoder vs. decoder complexity
    • the encoder is usually much more complex than the decoder
    • encoding can be done off-line in some applications

Requirements (cont.)

• Algorithmic delay
  – depending on the application, the delay is or is not an important criterion
  – very important in two-way communication (~ 20 ms OK)
  – not important in storage applications
  – somewhat important in digital TV/radio broadcasting (~ 100 ms)
• Editability
  – a certain point in the audio signal can be accessed from the coded bitstream
  – requires that the decoding can start at (almost) any point of the bitstream
• Error resilience
  – susceptibility to single or burst errors in the transmission channel
  – usually combined with error correction codes, but that costs bits

Source coding vs. perceptual coding

• Usually signals have to be transmitted with a given fidelity, but not necessarily perfectly identical to the original signal
• Compression can be achieved by removing
  – redundant information that can be reconstructed at the receiver
  – irrelevant information that is not important for the listener
• Source coding: emphasis on redundancy removal
  – speech coding: a model of the vocal tract defines the possible signals, parameters of the model are transmitted
  – works poorly in generic audio coding: any kind of signals are possible, and can even be called music
• Perceptual coding: emphasis on the removal of perceptually irrelevant information
  – minimize the audibility of distortions

Source coding vs. perceptual coding (cont.)

• Speech and non-speech audio are quite different
  – In the coding context, the word "audio" usually refers to non-speech audio
• For audio signals (as compared to speech), typically
  – Sampling rate is higher
  – Dynamic range is wider
  – Power spectrum varies more
  – High quality is more crucial than in the case of speech signals
  – Stereo and multichannel coding can be considered
• The bitrate required for speech signals is much lower than that required for audio/music

Lossless coding vs. lossy coding

• Lossless or noiseless coding
  – able to reconstruct perfectly the original samples
  – compression ratios approximately 2:1
  – can only utilize redundancy reduction
• Lossy coding
  – not able to reconstruct perfectly the original samples
  – compression ratios around 10:1 or 20:1 for perceptual coding
  – based on perceptual irrelevancy and statistical redundancy removal

Measuring audio quality

• Lossy coding of audio causes inevitable distortion to the original signal
• The amount of distortion can be measured using
  – subjective listening tests, for example using mean opinion score (MOS): the most reliable way of measuring audio quality
  – simple objective criteria such as signal-to-noise ratio between the original and reconstructed signal (quite non-informative from the perceptual quality viewpoint)
  – complex criteria such as objective perceptual similarity metrics that take into account the known properties of the auditory system (for example the masking phenomenon)
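
The simple objective criterion mentioned above, the signal-to-noise ratio between the original and the reconstructed signal, can be sketched as follows. This is a minimal illustration and not part of the original slides: the function name `snr_db`, the 1 kHz test tone, and the uniform quantizer standing in for a "lossy coder" are all assumptions made for the example.

```python
import numpy as np

def snr_db(original, reconstructed):
    """Signal-to-noise ratio in dB between an original and a reconstructed signal."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    noise = original - reconstructed          # coding error
    signal_power = np.mean(original ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        return float("inf")                   # perfect (lossless) reconstruction
    return 10.0 * np.log10(signal_power / noise_power)

# Hypothetical test signal: one second of a 1 kHz sine sampled at 44.1 kHz.
fs = 44100
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t)

# Stand-in for a lossy coder: plain uniform quantization (an assumption).
step = 2.0 ** -7
y = step * np.round(x / step)

print(f"SNR: {snr_db(x, y):.1f} dB")
```

Note that the number says nothing about where the noise lies relative to the masked threshold, which is why the slides call this criterion quite non-informative from the perceptual viewpoint.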

Measuring audio quality (cont.)

• MOS
  – test subjects rate the encoded audio using an N-step scale
  – MOS is defined as the average of the subjects' ratings
• MOS is widely used but also has drawbacks
  – results vary across time and test subjects
  – results vary depending on the chosen test signals (typical audio material vs. critical test signals)
• Figure: example scale for rating the disturbance of coding artefacts

2 Some facts from psychoacoustics
(Recap from Hearing lecture)

• Main question in perceptual coding:
  – How much noise (distortion, quantization noise) can be introduced into a signal without it being audible?
• The answer can be found in psychoacoustics
  – Psychoacoustics studies the relationship between acoustic events and the corresponding auditory sensations
• Most important keyword in audio coding is "masking"
• Masking describes the situation where a weaker but clearly audible signal (maskee) becomes inaudible in the presence of a louder signal (masker)
  – masking depends both on the spectral composition of the maskee and masker, and their variation over time

2.1 Masking in frequency domain

• Model of the frequency analysis in the auditory system
  – subdivision of the frequency axis into critical bands
  – frequency components within the same critical band mask each other easily
  – Bark scale: frequency scale that is derived by mapping frequencies to critical band numbers
• Narrowband noise masks a tone (sinusoid) more easily than a tone masks noise
• Masked threshold refers to the raised threshold of audibility caused by the masker
  – sounds with a level below the masked threshold are inaudible
  – masked threshold in quiet = threshold of hearing in quiet

Masking in frequency domain (cont.)

• Figure: masked thresholds [Herre95]
  – masker: narrowband noise around 250 Hz, 1 kHz, 4 kHz
  – spreading function: the effect of masking extends to the spectral vicinity of the masker (spreads more towards high frequencies)
• Additivity of masking: joint masked threshold is approximately (but slightly more than) the sum of the components
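
The Bark scale mentioned in section 2.1 maps a frequency in Hz to a critical-band number. The slides do not give a formula; the sketch below assumes one widely used closed-form approximation (attributed to Zwicker and Terhardt), and the function name `hz_to_bark` is hypothetical. It is evaluated at the three masker frequencies named in the masked-threshold figure.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz.

    Assumed approximation: z = 13*atan(0.00076 f) + 3.5*atan((f/7500)^2).
    """
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

# Masker centre frequencies from the masked-threshold figure.
for f in (250.0, 1000.0, 4000.0):
    print(f"{f:7.0f} Hz -> {float(hz_to_bark(f)):5.2f} Bark")
```

On this scale, equal distances correspond roughly to equal numbers of critical bands, so a masking model can be expressed per band rather than per Hz.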

2.2 Masking in time domain

• Forward masking (= post-masking)
  – masking effect extends to times after the masker is switched off
• Backward masking (= pre-masking)
  – masking extends to times before the masker has been switched on
• Figure [Sporer98]:
  → forward/backward masking does not extend far in time
  → simultaneous masking is the more important phenomenon

Pre-echo

• Pre-echo: if coder-generated artifacts (distortions) are spread in time to precede the signal itself, the resulting audible artifact is called "pre-echo"
  – common problem, since filter banks used in coders cause temporal spreading
• Figure: example of pre-echo
  – lower curve (noise signal) reveals the shape of the analysis window

2.3 Variability between listeners

• An underlying assumption of perceptual audio coding is that there are no great differences in individuals' hearing
• More or less true
  – absolute threshold of hearing: varies even for one listener over time, so perceptual coders have to assume very good hearing
  – masked threshold: variations are quite small
  – masking in time domain: large variations, a listener can be trained to hear pre-echos

2.4 Conclusion

• Research on hearing is by no means a closed topic
  – simple models can be built rather easily and can lead to reasonably good coding results
  – when designing more advanced coders (perceptual models), the limits of psychoacoustic knowledge are soon reached

3 Overview of perceptual audio coding

• Basic idea is to hide quantization noise below the signal-dependent threshold of hearing (masked threshold)
• Modeling the masking effect
  – most important masking effects are described in the frequency domain
  – on the other hand, effects of masking extend only up to about 15 ms distance in time (see "masking in time domain" above)
• Consequence:
  – perceptual audio coding is best done in time-frequency domain → common basic structure of perceptual coders
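
The pre-echo effect described above can be reproduced with a small numerical experiment. This is only a sketch under stated assumptions: a plain orthonormal block DCT stands in for the filter bank of a real coder, a uniform quantizer stands in for the quantization stage, and the names `dct_matrix` and `pre_echo_demo` as well as the block length, onset position and step size are hypothetical. The block is silent until a sudden attack, yet after quantizing in the transform domain the error is spread over the whole block, including the part before the attack.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; rows are the basis functions."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    m[0, :] /= np.sqrt(2.0)
    return m

def pre_echo_demo(n=256, onset=192, step=0.5, seed=0):
    """Quantize one block in the transform domain and return x, y and the error."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    x[onset:] = rng.standard_normal(n - onset)   # silence, then a sudden attack
    c = dct_matrix(n)
    coeffs = c @ x                               # analysis transform
    quantized = step * np.round(coeffs / step)   # uniform quantization
    y = c.T @ quantized                          # synthesis (inverse) transform
    return x, y, y - x

onset = 192
x, y, err = pre_echo_demo(onset=onset)
print("signal energy before the attack:", np.sum(x[:onset] ** 2))
print("error energy before the attack: ", np.sum(err[:onset] ** 2))
```

Because backward masking reaches only a short time before the masker, noise that precedes the attack by a large part of the block is not hidden by it, which is what makes the artifact audible.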