CTP431- Music and Audio Computing: Fundamentals of Sound and Digital Audio

CTP431- Music and Audio Computing
Fundamentals of Sound and Digital Audio
Graduate School of Culture Technology, KAIST
Juhan Nam

Outlines
§ What is Sound?
§ Sound Properties
– Loudness
– Pitch
– Timbre
§ Digital Representation of Sound
– Sampling
– Quantization

What Is Sound?
§ Vibration of air that you can hear
– Compression and rarefaction of air pressure
§ Three stages
– Production: vibration on materials (e.g. string, pipe, membrane)
– Propagation: traveling via the air
– Perception: sensation of the air vibration through ears
§ Production and propagation are physical; perception is psychological

Physical Sound
§ Governed by “Newton’s law” and “Wave” properties
§ Sound production and propagation in musical instruments
1. Drive force on a sound object
2. Vibration by restoration force
3. Propagation
4. Reflection
5. Superposition
6. Standing Wave (modes): generate a tone
7. Radiation from the object
8. Propagation through air
§ Demos
– http://www.acs.psu.edu/drussell/demos.html
– https://www.youtube.com/watch?v=_X72on6CSL0

Psychological Sound
§ Governed by ears (physiological sense) and brain (cognitive sense): the human auditory system
§ Ears
– A series of highly sensitive transducers
– Transform sound into subband signals
§ Brain
– Segregate and organize the auditory stimulus
– Recognize loudness, pitch and timbre
[Figure: transduction stages of the ear, labeled Air, Mechanical, Fluid, Electric (Cook, 1999)]
Auditory Transduction Video: http://www.youtube.com/watch?v=PeTriGTENoc

Sound Properties
Physical → Psychological
– Amplitude → Loudness
– Frequency → Pitch
– Waveshape, Time Envelope (ADSR), Spectral Envelope (Modes), … → Timbre

Loudness
§ Perceptual correlate of sound intensity
§ Sound Pressure Level (SPL)
– Objective measure of sound intensity
– Log scale: $20 \log_{10}(P / P_0)$, where $P_0 = 20\ \mu\mathrm{Pa}$ is the threshold of human hearing
– Loudness increases with SPL, but the two are not exactly proportional
§ Equal-Loudness Curve (also called Fletcher-Munson Curve)
– Most sensitive to 2-5 kHz tones
– Threshold of hearing
[Figure: Equal-Loudness Curve]
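The SPL formula on this slide can be tried numerically. The sketch below is only an illustration: the function name `spl_db` and the constant `P0_PA` are hypothetical, and the input is assumed to be an RMS pressure in pascals.

```python
import math

# Reference pressure from the slide: 20 µPa, the threshold of human hearing.
P0_PA = 20e-6


def spl_db(pressure_pa):
    """Sound pressure level in dB for a pressure given in pascals (assumed RMS)."""
    return 20.0 * math.log10(pressure_pa / P0_PA)


if __name__ == "__main__":
    for p in (20e-6, 2e-3, 1.0):
        print(f"{p:g} Pa -> {spl_db(p):.2f} dB SPL")
```

Because the scale is logarithmic, doubling the pressure adds about 6 dB regardless of the starting level.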

Pitch
§ Perceptual correlate of fundamental frequency (F0)
§ Pitch Scale
– Human ears are sensitive to frequency changes in a log scale
• Ex) Piano note scale
§ Frequency Range of Hearing
– 20 Hz to 20 kHz
[Figure: Chromatic Scale of Piano notes (Linear Frequency); frequency (Hz) vs. time (seconds)]
[Figure: Chromatic Scale of Piano notes (Log Frequency); MIDI note number vs. time (seconds)]
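The two piano-scale plots differ only in the frequency axis: MIDI note numbers are a log-frequency scale. The slide does not give the mapping, so the sketch below assumes the common equal-temperament convention (MIDI note 69 = A4 = 440 Hz, 12 notes per octave); the function names are hypothetical.

```python
import math

# Assumed tuning convention (not stated on the slide): MIDI note 69 is A4 at 440 Hz.
A4_MIDI = 69
A4_HZ = 440.0


def midi_to_hz(note):
    """Frequency in Hz of a MIDI note number under 12-tone equal temperament."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def hz_to_midi(freq_hz):
    """Fractional MIDI note number of a frequency in Hz (inverse of midi_to_hz)."""
    return A4_MIDI + 12.0 * math.log2(freq_hz / A4_HZ)


if __name__ == "__main__":
    # An 88-key piano spans MIDI notes 21 to 108.
    for note in (21, 60, 69, 108):
        print(note, round(midi_to_hz(note), 2))
```

Each step of one MIDI note multiplies the frequency by the same ratio, which is why the chromatic scale is a straight line on the log-frequency plot and a curve on the linear one.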

Timbre
§ Related to identifying a particular sound object
– Musical instruments, human voices, …
§ Determined by multiple physical attributes
– Time envelope (ADSR)
– Spectral envelope
– Changes of spectral envelope and fundamental frequency
– Harmonicity: ratio between tonal and noise-like characteristics
– The onset of a sound differing notably from the sustained vibration
[Figure: ADSR]
[Figure: Changes of spectral envelope]
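The time envelope (ADSR: attack, decay, sustain, release) can be sketched as a piecewise-linear amplitude curve. This is only one simple way to model it; the function name, the parameters, and the choice of linear segments are assumptions made for illustration, not something the slide specifies.

```python
def adsr_envelope(attack, decay, sustain_level, sustain, release):
    """Piecewise-linear ADSR envelope.

    attack, decay, sustain, release are segment lengths in samples;
    sustain_level is the amplitude (0..1) held during the sustain segment.
    """
    env = []
    # Attack: rise linearly from 0 toward the peak value 1.
    env += [n / attack for n in range(attack)]
    # Decay: fall linearly from 1 toward the sustain level.
    env += [1.0 - (1.0 - sustain_level) * n / decay for n in range(decay)]
    # Sustain: hold the sustain level.
    env += [sustain_level] * sustain
    # Release: fall linearly from the sustain level toward 0.
    env += [sustain_level * (1.0 - n / release) for n in range(release)]
    return env


if __name__ == "__main__":
    print(adsr_envelope(attack=4, decay=4, sustain_level=0.5, sustain=4, release=4))
```

Multiplying a steady tone sample by sample with such an envelope changes its perceived timbre even though the pitch and spectrum of the underlying tone stay the same.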

Timbre
§ Determined by multiple parameters
– Perspective of sound synthesis
Source: http://www.matrixsynth.com/2011/05/kid-with-buchla.html

Digital Audio Chain
[Figure: digital audio chain, with the digital signal shown as a bit stream …0 0 1 0 1 0…]

Microphones / Speakers
§ Microphones
– Air vibration to electrical signal
– Dynamic / condenser microphones
– The signal is very weak: use of pre-amp
§ Speakers
– Electrical signal to air vibration
– Generate some distortion (by diaphragm)
– Crossover networks: woofer / tweeter

Sampling
§ Convert continuous-time signal to discrete-time signal by periodically picking up the instantaneous values
– Represented as a sequence of numbers; pulse code modulation (PCM)
– Sampling period ($T_s$): the amount of time between samples
– Sampling rate ($f_s = 1 / T_s$)
§ Signal notation: $x(t) \rightarrow x(nT_s)$
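As a minimal sketch of the notation x(t) → x(nT_s), the code below evaluates a continuous-time sine wave only at the instants t = nT_s. The sine test signal and the function name `sample_sine` are assumptions chosen for illustration.

```python
import math


def sample_sine(freq_hz, fs_hz, n_samples, amplitude=1.0):
    """Return x(n * Ts) for x(t) = A * sin(2*pi*f*t), with Ts = 1 / fs."""
    ts = 1.0 / fs_hz  # sampling period Ts
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * n * ts)
        for n in range(n_samples)
    ]


if __name__ == "__main__":
    # A 1 Hz sine sampled at 8 Hz: eight samples cover exactly one period.
    for n, value in enumerate(sample_sine(1.0, 8.0, 8)):
        print(n, round(value, 4))
```

The resulting list of numbers is the PCM representation mentioned above; the time axis is implicit in the sample index n and the sampling rate.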

Sampling Theorem
§ What is an appropriate sampling rate?
– Too high: increase data rate
– Too low: become hard to reconstruct the original signal
§ Sampling Theorem
– In order for a band-limited signal to be reconstructed fully, the sampling rate must be greater than twice the maximum frequency in the signal: $f_s > 2 \cdot f_m$
– Half the sampling rate is called Nyquist frequency ($f_s / 2$)

Sampling in Frequency Domain
§ Sampling in time creates images (replicas) of the original spectrum at every multiple of $f_s$
[Figure: original spectrum spanning $-f_m$ to $f_m$; sampled spectrum with images centered at $\pm f_s$, with edges at $f_s - f_m$ and $f_s + f_m$]
§ To avoid overlap: $f_m < f_s - f_m$
§ Why? Let $f_2 = f_1 \pm m f_s$ with integer $m$, and sample at $t = n / f_s$:
$$x_1(t) = A \sin(\omega_1 t) = A \sin(2 \pi f_1 n / f_s)$$
$$x_2(t) = A \sin(\omega_2 t) = A \sin(2 \pi f_2 n / f_s) = A \sin(2 \pi (f_1 \pm m f_s) n / f_s) = A \sin(2 \pi f_1 n / f_s \pm 2 \pi m n) = A \sin(2 \pi f_1 n / f_s) = x_1(t)$$
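The identity on this slide can be checked numerically: once sampled, a sine at f1 and a sine at f1 + m·fs produce the same sequence. The values fs = 8000 Hz, f1 = 1000 Hz and m = 1 below are arbitrary choices for illustration, and the function name is hypothetical.

```python
import math


def sampled_sine(freq_hz, fs_hz, n_samples):
    """Samples of sin(2*pi*f*n/fs) for n = 0 .. n_samples - 1."""
    return [math.sin(2.0 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]


fs = 8000.0        # sampling rate (illustrative value)
f1 = 1000.0        # original frequency (illustrative value)
m = 1              # any integer works
f2 = f1 + m * fs   # 9000 Hz: one of the frequencies indistinguishable from f1

x1 = sampled_sine(f1, fs, 64)
x2 = sampled_sine(f2, fs, 64)

# The two sequences agree up to floating-point rounding error.
max_diff = max(abs(a - b) for a, b in zip(x1, x2))
print("largest difference between the two sample sequences:", max_diff)
```

Since the samples cannot tell f1 and f2 apart, the spectrum of a sampled signal necessarily repeats every fs.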

Aliasing
§ If the sampling rate is less than twice the maximum frequency, the high-frequency content is folded over to lower frequency range
[Figure: aliasing example plot]
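To see where a given frequency lands after folding, the sketch below first reduces the frequency modulo f_s (images repeat every f_s) and then mirrors anything above the Nyquist frequency. The function name `aliased_frequency` is hypothetical, and the helper handles a single real sinusoid only.

```python
def aliased_frequency(freq_hz, fs_hz):
    """Frequency in [0, fs/2] at which a tone at freq_hz appears after sampling at fs_hz."""
    f = freq_hz % fs_hz      # spectral images repeat every fs
    if f > fs_hz / 2:
        f = fs_hz - f        # fold around the Nyquist frequency fs/2
    return f


if __name__ == "__main__":
    fs = 44100
    for f in (1000, 22050, 30000, 43100):
        print(f, "Hz ->", aliased_frequency(f, fs), "Hz")
```

Frequencies below the Nyquist frequency are returned unchanged, which matches the sampling theorem: only content above f_s/2 is folded.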

Aliasing in Frequency Domain
§ The high-frequency content is folded over to lower frequency range from the replicated images
[Figure: overlapping spectral images; axis marks at $-f_s$, $-f_m$, $f_s - f_m$, $f_m$, $f_s$, $f_s + f_m$]
§ A low-pass filter is applied before sampling to avoid the aliasing noise
[Figure: low-pass filtered spectrum; axis marks at $-f_s$, $-f_s/2$, $f_s/2$, $f_s$]

Example of Aliasing
[Figure: Bandlimited sawtooth wave spectrum; magnitude (dB) vs. frequency (kHz)]
[Figure: Trivial sawtooth wave spectrum; magnitude (dB) vs. frequency (kHz)]
[Figure: Frequency sweep of the trivial sawtooth wave; frequency (Hz) vs. time (s)]
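The two sawtooth spectra compared on this slide can be reproduced with a short sketch. It assumes that "trivial" means a naive ramp computed directly per sample, and that the band-limited version is built by additive synthesis from the Fourier series of the sawtooth, keeping only harmonics below the Nyquist frequency. The slides do not say how the band-limited wave was generated, so this is one possible construction, and all names are hypothetical.

```python
import math


def num_harmonics(f0_hz, fs_hz):
    """Count harmonics k * f0 that lie strictly below the Nyquist frequency fs / 2."""
    k = 0
    while (k + 1) * f0_hz < fs_hz / 2.0:
        k += 1
    return k


def trivial_sawtooth(f0_hz, fs_hz, n_samples):
    """Naive ramp from -1 to 1; its infinite harmonic series aliases when sampled."""
    return [2.0 * ((f0_hz * n / fs_hz + 0.5) % 1.0) - 1.0 for n in range(n_samples)]


def bandlimited_sawtooth(f0_hz, fs_hz, n_samples):
    """Truncated Fourier series of the same ramp: harmonics below Nyquist only."""
    k_max = num_harmonics(f0_hz, fs_hz)
    out = []
    for n in range(n_samples):
        phase = 2.0 * math.pi * f0_hz * n / fs_hz
        partial_sum = sum(
            (-1) ** (k + 1) * math.sin(k * phase) / k for k in range(1, k_max + 1)
        )
        out.append(2.0 / math.pi * partial_sum)
    return out


if __name__ == "__main__":
    f0, fs = 1000.0, 44100.0
    print("harmonics kept:", num_harmonics(f0, fs))
    print(trivial_sawtooth(f0, fs, 5))
    print(bandlimited_sawtooth(f0, fs, 5))
```

The two waveforms look similar away from the discontinuity, but only the trivial one contains energy above the Nyquist frequency that folds back into the audible band.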

Example of Aliasing
§ Aliasing in Video
– https://www.youtube.com/watch?v=QOqtdl2sJk0
– https://www.youtube.com/watch?v=jHS9JGkEOmA
(Note that video frame rate corresponds to the sampling rate)

Sampling Rates
§ Determined by the bandwidth of signals or hearing limits
– Consumer audio product: 44.1 kHz (CD)
– Professional audio gear: 48/96/192 kHz
– Speech communication: 8/16 kHz

Quantization
§ Discretizing the amplitude of real-valued signals
– Round the amplitude to the nearest discrete steps
– The discrete steps are determined by the number of bits
• Audio CD: 16 bits ($-2^{15}$ ~ $2^{15} - 1$) ← B bits ($-2^{B-1}$ ~ $2^{B-1} - 1$)
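A minimal sketch of rounding to the nearest step with B bits follows. It assumes the input amplitude is normalized to [-1, 1] and scaled by 2^(B-1) before rounding, with clipping to the integer range given on the slide; that normalization and the function name are illustrative assumptions.

```python
import math


def quantize(x, bits):
    """Quantize x (assumed in [-1.0, 1.0]) to a signed integer code with the given bit depth."""
    scale = 2 ** (bits - 1)
    code = math.floor(x * scale + 0.5)          # round to the nearest discrete step
    return max(-scale, min(scale - 1, code))    # clip to [-2^(B-1), 2^(B-1) - 1]


if __name__ == "__main__":
    for value in (-1.0, -0.25, 0.0, 0.3, 1.0):
        print(value, "->", quantize(value, 16), "(16 bits),", quantize(value, 3), "(3 bits)")
```

The positive full-scale input clips to 2^(B-1) - 1 because the two's-complement range has one more negative code than positive codes.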

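Quantization Error
§ Quantization causes noise
– Average power of quantization noise: obtained from the probability density function (PDF) of the error, $p(e)$, which is uniform ($p(e) = 1$) over $-1/2 \le e \le 1/2$
– Root mean square (RMS) of noise: $N_{rms} = \sqrt{\int_{-1/2}^{1/2} e^2\, p(e)\, de} = \sqrt{1/12}$
§ Signal to Noise Ratio (SNR)
– Based on average power, with the RMS of a full-scale sine wave $S_{rms} = 2^{B-1} / \sqrt{2}$ (with 16 bits, SNR = 98.08 dB):
$$20 \log_{10} \frac{S_{rms}}{N_{rms}} = 20 \log_{10} \frac{2^{B-1} / \sqrt{2}}{\sqrt{1/12}} = 6.02B + 1.76\ \mathrm{dB}$$
– Based on the max levels, with $S_{max} = 2^{B-1}$ and $N_{max} = 1/2$ (with 16 bits, SNR = 96.32 dB):
$$20 \log_{10} \frac{S_{max}}{N_{max}} = 20 \log_{10} \frac{2^{B-1}}{1/2} = 6.02B\ \mathrm{dB}$$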
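Both SNR figures can be recomputed directly from the quantities on this slide: a full-scale sine of amplitude 2^(B-1), and a quantization error uniformly distributed over [-1/2, 1/2] in units of one step. The function and constant names below are hypothetical.

```python
import math

# Quantization error in units of one step, assumed uniform on [-1/2, 1/2].
NOISE_RMS = math.sqrt(1.0 / 12.0)
NOISE_MAX = 0.5


def snr_rms_db(bits):
    """SNR from average power: RMS of a full-scale sine over RMS of the error."""
    signal_rms = 2 ** (bits - 1) / math.sqrt(2.0)
    return 20.0 * math.log10(signal_rms / NOISE_RMS)


def snr_max_db(bits):
    """SNR from max levels: peak of a full-scale sine over the maximum error."""
    signal_max = 2 ** (bits - 1)
    return 20.0 * math.log10(signal_max / NOISE_MAX)


if __name__ == "__main__":
    for bits in (8, 16, 24):
        print(bits, "bits:", round(snr_rms_db(bits), 2), "dB (RMS),",
              round(snr_max_db(bits), 2), "dB (max)")
```

Without rounding the coefficient to 6.02, the 16-bit values come out near 98.09 dB and 96.33 dB; the 98.08 dB and 96.32 dB on the slide follow from using the rounded 6.02 dB per bit.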
