GCT535: Sound Technology for Multimedia
Digital Audio
Graduate School of Culture Technology, KAIST
Juhan Nam
Digital Representations … 0 1 1 0 1 1 0 … Sound … 1 0 0 1 1 0 1 … Image … 0 0 1 1 0 1 1 … Text
Digital Representations
§ Sampling and Quantization
  – Sound (samples)
  – Image (pixels)
§ Trade-off
  – Resolution (quality) and data size
Digital Representations
§ Encoding and Decoding (compression and decompression)
  – Lossless: redundancy removal (e.g. zip); reduces data (i.e. bits) with no loss of information
  – Lossy: drop bits (quantize the data more coarsely) in ways that are perceptually not noticeable
Outline
§ Digital Representation of Sound
§ Sampling
§ Quantization
§ Compression
Digital Audio Chain
[Figure: digital audio chain, with the signal represented as bits (… 0 0 1 0 1 0 …)]
Transducers
§ Microphones
  – Air vibration to electrical signal
  – Dynamic / condenser microphones
  – The signal is very weak: use of a pre-amp
§ Speakers
  – Electrical signal to air vibration
  – Generate some distortion (by the diaphragm)
  – Crossover networks: woofer / tweeter
Sampling
§ Convert a continuous-time signal to a discrete-time signal by periodically picking up the instantaneous values
  – Represented as a sequence of numbers; pulse code modulation (PCM)
  – Sampling period ($T_s$): the amount of time between samples
  – Sampling rate ($f_s = 1/T_s$)
§ Signal notation: $x(t) \rightarrow x(nT_s)$
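The mapping $x(t) \rightarrow x(nT_s)$ can be sketched in a few lines. This is a minimal illustration, assuming a 440 Hz sine as the continuous-time signal and an 8 kHz sampling rate; both values and the helper name `sample` are illustrative choices, not from the slides.

```python
import math

def sample(x, fs, num_samples):
    """Evaluate the continuous-time function x(t) at t = n * Ts, where Ts = 1 / fs."""
    ts = 1.0 / fs  # sampling period in seconds
    return [x(n * ts) for n in range(num_samples)]

# Assumed test signal: a 440 Hz sine, sampled at an assumed rate of 8 kHz.
f0 = 440.0
fs = 8000.0
x = lambda t: math.sin(2 * math.pi * f0 * t)

samples = sample(x, fs, 16)  # the PCM sequence x(0), x(Ts), x(2 Ts), ...
```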
Sampling Theorem
§ What is an appropriate sampling rate?
  – Too high: increases the data rate
  – Too low: the original signal becomes hard to reconstruct
§ Sampling Theorem
  – For a band-limited signal to be reconstructed fully, the sampling rate must be greater than twice the maximum frequency in the signal: $f_s > 2 f_m$
  – Half the sampling rate is called the Nyquist frequency ($f_s/2$)
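Sampling in Frequency Domain
§ Sampling in time creates images (copies) of the original spectrum at every multiple of $f_s$
[Figure: spectrum before sampling ($-f_m$ to $f_m$, within the audible range) and after sampling, with images around $\pm f_s$ (from $f_s - f_m$ to $f_s + f_m$) and the Nyquist frequency marked]
§ Why? For a sinusoid $x(t) = \sin(2\pi f_m t)$ and any integer $k$:
  $x(n) = \sin(2\pi f_m n T_s) = \sin(2\pi f_m n / f_s)$
  $= \sin(2\pi f_m n / f_s \pm 2\pi k n) = \sin(2\pi n (f_m \pm k f_s) / f_s)$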
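The identity can be checked numerically: a sine at $f_m$ and sines at $f_m \pm k f_s$ produce the same sample values. A minimal sketch, assuming $f_s$ = 8 kHz and $f_m$ = 1 kHz (illustrative values); `sampled_sine` is a hypothetical helper.

```python
import math

fs = 8000.0  # assumed sampling rate (Hz)
fm = 1000.0  # assumed signal frequency (Hz)

def sampled_sine(f, fs, n):
    """n-th sample of a unit-amplitude sine of frequency f sampled at fs."""
    return math.sin(2 * math.pi * f * n / fs)

# Largest sample-wise difference between fm and its images fm + k * fs.
max_diff = max(
    abs(sampled_sine(fm, fs, n) - sampled_sine(fm + k * fs, fs, n))
    for k in (-2, -1, 1, 2)
    for n in range(64)
)
```

`max_diff` stays at the level of floating-point rounding error, so the sampled sequences are indistinguishable.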
Aliasing
§ If the sampling rate is less than twice the maximum frequency, the high-frequency content is folded over to the lower frequency range
[Figure: plot illustrating aliasing]
Aliasing in Frequency Domain
§ Sampling in time creates images (copies) of the original spectrum at every multiple of $f_s$
[Figure: original spectrum ($-f_m$ to $f_m$) and its images around $\pm f_s$ (from $f_s - f_m$ to $f_s + f_m$), with the audible range marked]
§ The frequency that we hear is $f_s - f_m$
§ To avoid aliasing: $f_m < f_s - f_m$
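The folding rule can be written as a small function. This is a sketch for a single sinusoid; `alias_frequency` is a hypothetical helper, and the 44.1 kHz rate in the usage lines is just an example value.

```python
def alias_frequency(f, fs):
    """Frequency in [0, fs/2] at which a sinusoid of frequency f is heard after sampling at fs."""
    f = abs(f) % fs                        # images repeat every fs
    return fs - f if f > fs / 2 else f     # fold the upper half back down

# A 30 kHz tone sampled at 44.1 kHz is heard at fs - fm = 14.1 kHz.
heard = alias_frequency(30000.0, 44100.0)
```

A tone below the Nyquist frequency is returned unchanged, which matches the condition $f_m < f_s - f_m$.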
Aliasing in Frequency Domain
§ For general signals, high-frequency content is folded over to the lower frequency range
[Figure: spectrum of a general signal and its image around $f_s$, with the audible range marked]
Avoid Aliasing
§ Increase the sampling rate: $f_s > 2 f_m$
§ Use a lowpass filter before sampling
[Figure: spectrum before and after a lowpass filter that limits the signal to the range $-f_s/2$ to $f_s/2$]
Examples of Aliasing
[Figure: magnitude (dB) vs. frequency (kHz): bandlimited sawtooth wave spectrum and trivial sawtooth wave spectrum]
[Figure: frequency (Hz) vs. time (s): frequency sweep of the trivial sawtooth wave]
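The two sawtooth variants can be generated as follows. This is a minimal sketch, assuming "trivial" means a naive ramp with no band-limiting and "bandlimited" means additive synthesis that stops below the Nyquist frequency; the function names are hypothetical.

```python
import math

def trivial_saw(f0, fs, n):
    """Naive sawtooth: a ramp from -1 to 1 every period, with no band-limiting."""
    phase = (f0 * n / fs) % 1.0
    return 2.0 * phase - 1.0

def num_harmonics_below_nyquist(f0, fs):
    """Count the harmonics k * f0 that lie strictly below fs / 2."""
    k = int((fs / 2) // f0)
    return k - 1 if k * f0 >= fs / 2 else k

def bandlimited_saw(f0, fs, n):
    """Additive sawtooth built only from harmonics below the Nyquist frequency."""
    t = n / fs
    partials = sum(
        math.sin(2 * math.pi * k * f0 * t) / k
        for k in range(1, num_harmonics_below_nyquist(f0, fs) + 1)
    )
    return -(2.0 / math.pi) * partials
```

The trivial version contains harmonics above $f_s/2$, which fold back into the audible range; the additive version leaves them out.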
Examples of Aliasing
§ Aliasing in Video
  – https://www.youtube.com/watch?v=QOqtdl2sJk0
  – https://www.youtube.com/watch?v=jHS9JGkEOmA
  (Note that the video frame rate corresponds to the sampling rate)
Sampling Rates
§ Determined by the bandwidth of signals or hearing limits
  – Consumer audio products: 44.1 kHz (CD)
  – Professional audio gear: 48/96/192 kHz
  – Speech communication: 8/16 kHz
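The sampling rate directly sets the uncompressed data rate. A minimal sketch, assuming 16-bit stereo for CD audio and 16-bit mono for the speech case (the speech bit depth and channel count are assumptions for illustration); `pcm_data_rate` is a hypothetical helper.

```python
def pcm_data_rate(fs, bits, channels):
    """Uncompressed PCM data rate in bits per second."""
    return fs * bits * channels

cd_rate = pcm_data_rate(44100, 16, 2)      # CD audio: 44.1 kHz, 16 bits, stereo
speech_rate = pcm_data_rate(8000, 16, 1)   # assumed: 8 kHz, 16 bits, mono
```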
Reconstruction in Frequency Domain
§ Viewed in the frequency domain, the sampled signal can be reconstructed by applying a low-pass filter
[Figure: low-pass filter up to $f_s/2$ applied to the sampled spectrum, keeping the original band ($-f_m$ to $f_m$)]
Reconstruction in Time Domain
§ The reconstruction corresponds to convolution with a sinc function in the time domain
  – The ideal low-pass filter corresponds to the sinc function: $\mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}$
  – In practice, DACs are composed of sample-and-hold and low-pass filtering circuitry
[Figure: frequency-domain and time-domain views before sampling, after sampling, and after reconstruction (a sum of sinc functions)]
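Convolution with a sinc function can be sketched as a sum of shifted sinc functions, one per sample. This is an idealized illustration with a finite number of samples, so values between samples are only approximate; `reconstruct` and the 440 Hz / 8 kHz test signal are assumptions for the example.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, fs, t):
    """Approximate x(t) from samples x[n] = x(n / fs) with a finite sum of sinc functions."""
    return sum(s * sinc(t * fs - n) for n, s in enumerate(samples))

# Assumed test signal: 200 samples of a 440 Hz sine at fs = 8 kHz.
fs = 8000.0
xs = [math.sin(2 * math.pi * 440.0 * n / fs) for n in range(200)]
midpoint = reconstruct(xs, fs, 100.5 / fs)  # value halfway between two samples
```

At the sample instants the sum returns the stored samples; between them it interpolates, with a small error caused by truncating the sum to 200 terms.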
Quantization
§ Discretizing the amplitude of real-valued signals
  – Round the amplitude to the nearest discrete step
  – The discrete steps are determined by the number of bits
    • $B$ bits: $-2^{B-1}$ ~ $2^{B-1} - 1$
    • Audio CD: 16 bits ($-2^{15}$ ~ $2^{15} - 1$)
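Rounding to $B$-bit signed levels can be sketched as below. This assumes the input is normalized to roughly [-1, 1) and that out-of-range values are saturated; `quantize` is a hypothetical helper, and Python's `round` resolves exact ties to the nearest even integer.

```python
def quantize(x, bits):
    """Round x (nominally in [-1.0, 1.0)) to a signed integer level of a B-bit quantizer."""
    scale = 2 ** (bits - 1)
    q = round(x * scale)                    # nearest discrete step
    return max(-scale, min(scale - 1, q))   # saturate to [-2^(B-1), 2^(B-1) - 1]

level = quantize(0.5, 8)  # half of full scale on an 8-bit grid
```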
Quick Review: Number Representations on Computer
§ Fixed-point number
  – Unsigned: 0 ~ $2^B - 1$
    • 8 bits: 0 (0b00000000) ~ 255 (0b11111111)
  – Signed: $-2^{B-1}$ ~ $2^{B-1} - 1$
    • 8 bits: -128 (0b10000000) ~ 127 (0b01111111)
    • Audio signals are usually represented with signed numbers
  – 8 or 16 bits are popular choices
  – WAV file format
§ Floating-point number
  – Composed of sign ($s$), exponent ($e$) and mantissa ($m$)
  – The represented number is $(-1)^s \times m \times 2^e$ (base 2) or $(-1)^s \times m \times 10^e$ (base 10)
  – Examples
    • 1.653 → $1653 \times 10^{-3}$ ($s = 0$, $e = -3$, $m = 1653$)
    • -1329.6 → $(-1)^1 \times 13296 \times 10^{-1}$ ($s = 1$, $e = -1$, $m = 13296$)
  – Floating point can represent a much wider range of numbers
  – 32 or 64 bits are popular choices
  – Internal processing in DAWs
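Both representations can be explored from Python. This is a sketch: `fixed_point_range` is a hypothetical helper, and `math.frexp` uses its own normalization (mantissa magnitude in [0.5, 1) with the sign folded into the mantissa), which differs from the integer-mantissa examples above.

```python
import math

def fixed_point_range(bits, signed=True):
    """Smallest and largest integers representable with the given number of bits."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

# Base-2 decomposition: x = m * 2**e, with 0.5 <= |m| < 1.
m, e = math.frexp(-1329.6)
rebuilt = math.ldexp(m, e)  # recombines mantissa and exponent
```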
Quantization Error
§ Quantization causes noise
  – Average power of quantization noise: obtained from the probability density function (PDF) of the error, $p(e)$, which is uniform over $[-1/2, 1/2]$
  – Root mean square (RMS) of the noise: $N_{rms} = \sqrt{\int_{-1/2}^{1/2} e^2 \, p(e) \, de} = \frac{1}{\sqrt{12}}$
§ Signal-to-Noise Ratio (SNR)
  – Based on RMS, where $S_{rms} = 2^{B-1}/\sqrt{2}$ is the RMS of a full-scale sine wave:
    $20\log_{10}\frac{S_{rms}}{N_{rms}} = 20\log_{10}\frac{2^{B-1}/\sqrt{2}}{1/\sqrt{12}} = 6.02B + 1.76$ dB (with 16 bits, SNR = 98.08 dB)
  – Based on the max levels:
    $20\log_{10}\frac{S_{max}}{N_{max}} = 20\log_{10}\frac{2^{B-1}}{1/2} = 6.02B$ dB (with 16 bits, SNR = 96.32 dB)
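The two SNR formulas can be evaluated directly from their definitions. A minimal sketch with hypothetical function names; levels are expressed in quantization steps, as in the formulas above.

```python
import math

def snr_rms_db(bits):
    """SNR from the RMS of a full-scale sine and the RMS of uniform quantization noise."""
    signal_rms = 2 ** (bits - 1) / math.sqrt(2)
    noise_rms = 1 / math.sqrt(12)  # uniform error over [-1/2, 1/2]
    return 20 * math.log10(signal_rms / noise_rms)

def snr_max_db(bits):
    """SNR from the maximum signal level and the maximum error of 1/2 step."""
    return 20 * math.log10(2 ** (bits - 1) / 0.5)

snr16_rms = snr_rms_db(16)
snr16_max = snr_max_db(16)
```

For 16 bits these evaluate to about 98.09 dB and 96.33 dB; the quoted 98.08 dB and 96.32 dB follow from the rounded coefficient 6.02.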
Dynamic Range
§ Dynamic range
  – The ratio between the loudest and softest levels
  – Again based on the RMS of a sine wave, for both the loudest and softest levels:
    $20\log_{10}\frac{S_{rms,max}}{S_{rms,min}} = 20\log_{10}\frac{2^{B-1}/\sqrt{2}}{1/\sqrt{2}} \approx 6.02B - 6$ dB (with 16 bits, DR = 90.31 dB)
§ Human ear's dynamic range
  – Depends on the frequency band
[Figure: equal loudness curve]
Clipping and Headroom
§ Clipping
  – Non-linear distortion that occurs when a signal is above the max level
§ Headroom
  – Margin between the peak level and the max level
§ In digital audio, 0 dB is regarded as the maximum level
[Figure: level diagram for B = 16 bits: clipping above 0 dB (max level), headroom below it, min level at -90.31 dB, noise floor (by quantization) at -98.08 dB]
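Clipping and headroom can be expressed with levels relative to the maximum. A minimal sketch, assuming samples normalized so that the max level is 1.0; the helper names are hypothetical.

```python
import math

def clip(x, max_level=1.0):
    """Hard clipping: values beyond the max level are flattened to it."""
    return max(-max_level, min(max_level, x))

def level_db(level, max_level=1.0):
    """Level in dB relative to the max level, so the max level itself is 0 dB."""
    return 20 * math.log10(abs(level) / max_level)

def headroom_db(peak, max_level=1.0):
    """Margin in dB between the signal's peak level and the max level."""
    return -level_db(peak, max_level)

margin = headroom_db(0.5)  # a peak at half of full scale
```

A peak at half of full scale leaves about 6 dB of headroom.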
Dithering
§ Note that the SNR for the quantization noise depends on the signal level
  – As the signal level goes down, the SNR decreases
  – Low-level signals can have colored noise
§ Dithering
  – Adding a small amount of white noise to the signal before quantization (or high-to-low bit-depth conversion): $\tilde{x}(t) = x(t) + n_{dithering}(t)$
  – This adds white noise, but coloration is prevented
  – The amount is on the order of 3 dB
[Figure: spectra $X(\omega)$ without dithering and $\tilde{X}(\omega)$ with dithering. The added white noise is visible, but it is less annoying than the colored noise caused by quantization]
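The effect on a low-level signal can be sketched with a sine whose amplitude is below half a quantization step. The dither here is uniform white noise one step wide, which is an assumption made for this example (the slides do not specify the noise distribution), and the seed, length, and frequency are arbitrary.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
N = 4000

# Low-level sine with amplitude 0.4 steps: smaller than half a quantization step.
x = [0.4 * math.sin(2 * math.pi * 50 * n / N) for n in range(N)]

plain = [round(v) for v in x]                               # quantize without dither
dithered = [round(v + rng.uniform(-0.5, 0.5)) for v in x]   # add dither, then quantize

def correlation_with_input(y):
    """Projection of y onto the input signal; 1.0 means the signal is fully preserved."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

kept_plain = correlation_with_input(plain)
kept_dithered = correlation_with_input(dithered)
```

Without dither every sample rounds to zero and the signal is lost; with dither the output is noisy, but on average it still follows the input.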
Compression
§ Lossy compression
  – Perceptual audio coding: leverage human perception of tones
  – E.g. MP3 (.mp3), AAC (.mp4, .m4a, ...), AC3 (Dolby DVD, ...)
§ Lossless compression
  – Redundancy reduction: Huffman coding, arithmetic coding
  – E.g. FLAC
Perceptual Coding
§ Leverage the auditory masking phenomenon
  – Decrease the dynamic range in the cochlea
  – The masked threshold depends on the tone frequency and critical bands
  – Allocate bits according to the signal-to-masking ratio
§ Basis of MPEG Audio
[Figure: intensity (dB) vs. log frequency, showing a masking tone, the absolute threshold and the masked threshold. Borrowed from D. Ellis' E4896 slides]
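Huffman Coding
§ Assigning bits according to the statistics of each source symbol

  Symbol   Probability   Code
  0        0.4           0
  1        0.35          10
  2        0.2           110
  3        0.05          111

  (Merged node probabilities in the tree: 0.05 + 0.2 = 0.25, then 0.25 + 0.35 = 0.6)
§ Average code length: 1 × 0.4 + 2 × 0.35 + 3 × 0.2 + 3 × 0.05 = 1.85 bits → saves 0.15 bits per symbol compared with a 2-bit fixed-length code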
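The code lengths in this example can be reproduced with a short Huffman construction. This is a minimal sketch that computes only the code lengths (not the bit patterns) using the standard library's `heapq`; the function name is hypothetical.

```python
import heapq

def huffman_code_lengths(probs):
    """Return {symbol: code length in bits} for a dict of symbol probabilities."""
    # Heap items: (probability, tie-breaker, symbols contained in this subtree).
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees.
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1  # every symbol in the merged subtree moves one level deeper
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

probs = {0: 0.4, 1: 0.35, 2: 0.2, 3: 0.05}
lengths = huffman_code_lengths(probs)
average_bits = sum(probs[s] * lengths[s] for s in probs)
```

For these probabilities the lengths are 1, 2, 3 and 3 bits, giving the 1.85-bit average.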