Voice Capture and Analysis Cody Narber Computer and Information Science Department Kansas State University

Frequency Frequency is a measure of repeating events per unit time. In audio it is the number of air-pressure pulses per second. The main unit of measurement is the Hertz (Hz); a frequency in Hz equals 1/t, where t is the period of the wave in seconds (shown below). Every signal of practical interest can be expressed as a sum of sine and cosine terms. This is known as the Fourier Theorem and is the basis for the Fourier Transform, which decomposes a signal into these parts. Efficient algorithms exist to compute this decomposition for sampled signals (namely the FFT, the Fast Fourier Transform). Thus we can apply the FFT to an audio signal to extract the frequency terms that comprise the signal.
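
As a minimal sketch of that last step, the Python below implements a textbook recursive radix-2 FFT and uses it to find the strongest frequency in a sampled tone. The function names, the 8192 Hz sample rate, and the 440 Hz test tone are assumptions chosen for illustration and do not come from the slides; a real application would more likely call an existing FFT library.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    if n == 0 or n % 2:
        raise ValueError("length must be a non-zero power of two")
    even = fft(samples[0::2])  # transform of the even-indexed samples
    odd = fft(samples[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest bin below the Nyquist limit."""
    half = len(samples) // 2
    magnitudes = [abs(c) for c in fft(samples)[:half]]
    peak_bin = max(range(1, half), key=lambda k: magnitudes[k])  # skip the DC bin
    return peak_bin * sample_rate / len(samples)


# Illustrative input (assumed values): 1024 samples of a pure 440 Hz tone.
SAMPLE_RATE = 8192  # Hz, so each of the 1024 bins spans 8 Hz
N = 1024
TONE_HZ = 440.0
signal = [math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE) for i in range(N)]
peak_hz = dominant_frequency(signal, SAMPLE_RATE)
```

With these values the tone falls exactly on bin 55 (55 x 8 Hz), so `peak_hz` comes out as 440.0. A tone that falls between bins would be reported at the nearest bin centre.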

Spectrum The frequency spectrum is a plot of the frequencies present in the signal against their corresponding amplitudes. The amplitude is the height of the peaks in the sinusoidal waves that compose the signal, that is, the strength of that frequency in the signal. A spectrogram is a plot of the frequency spectrum at each moment in time (the y-axis is frequency, the x-axis is time, and darker areas are higher amplitudes).
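
One way to picture the spectrogram is as a list of spectra, one per short frame of the signal. The sketch below assumes non-overlapping frames and a direct (slow) DFT for brevity; the function names, sample rate, frame size, and test tones are hypothetical. Practical spectrograms usually use overlapping, windowed frames and an FFT.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Amplitude of each frequency bin below the Nyquist limit, via a direct DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2):
        total = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        # Scale so a sinusoid of amplitude A centred on a bin k > 0 reads as A.
        spectrum.append(abs(total) * 2 / n)
    return spectrum


def spectrogram(samples, frame_size):
    """One amplitude spectrum per non-overlapping frame (rows = time, columns = frequency)."""
    return [
        magnitude_spectrum(samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, frame_size)
    ]


# Illustrative input (assumed values): a loud 128 Hz tone followed by a quieter 256 Hz tone.
SAMPLE_RATE = 1024  # Hz
FRAME_SIZE = 64     # samples per frame, so each bin spans 16 Hz
loud_low = [1.0 * math.sin(2 * math.pi * 128 * i / SAMPLE_RATE) for i in range(FRAME_SIZE)]
quiet_high = [0.5 * math.sin(2 * math.pi * 256 * i / SAMPLE_RATE) for i in range(FRAME_SIZE)]

frames = spectrogram(loud_low + quiet_high, FRAME_SIZE)
peak_bins = [max(range(len(f)), key=lambda k: f[k]) for f in frames]
peak_hz = [b * SAMPLE_RATE / FRAME_SIZE for b in peak_bins]
```

Here `peak_hz` is `[128.0, 256.0]`: the strongest frequency moves from 128 Hz in the first frame to 256 Hz in the second, and the amplitudes at those peaks are about 1.0 and 0.5. Drawn as an image, each row of `frames` would become one column of the spectrogram.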

Formants Formants are peaks in the frequency spectrum, that is, the frequencies that are most prevalent in the signal. Several formants exist in spoken samples and are used for voice recognition (the table below shows the average frequencies that are associated with vowels). These peaks correspond to resonances in sound sources such as musical instruments, or anything with sound chambers (for humans these are the nasal and oral cavities). The fundamental frequency (F0) is the pitch that humans detect; strictly, it is set by the rate of vocal-cord vibration rather than by a resonance, so the formants proper are numbered from F1 upward. Vowel formant data from Peterson and Barney, 1952
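
Since formants are described here as peaks in the spectrum, a rough sketch of peak picking is given below. It is only an illustration: the function name, the 100 Hz bin width, and the envelope values are made up and are not real vowel data, and practical formant trackers typically smooth the spectrum first (for example with linear predictive coding) rather than picking raw peaks.

```python
def spectral_peaks(magnitudes, bin_width_hz, min_height=0.0):
    """Return (frequency_hz, magnitude) for each strict local maximum at or above min_height."""
    peaks = []
    for k in range(1, len(magnitudes) - 1):
        here = magnitudes[k]
        if here >= min_height and here > magnitudes[k - 1] and here > magnitudes[k + 1]:
            peaks.append((k * bin_width_hz, here))
    return peaks


# Made-up spectral envelope (NOT measured data), one value per 100 Hz bin.
toy_envelope = [0.1, 0.3, 0.9, 0.4, 0.2, 0.2, 0.5, 0.8, 0.3, 0.1, 0.2, 0.6, 0.2]
peaks = spectral_peaks(toy_envelope, bin_width_hz=100.0)
strong_peaks = spectral_peaks(toy_envelope, bin_width_hz=100.0, min_height=0.7)
```

For this toy envelope `peaks` is `[(200.0, 0.9), (700.0, 0.8), (1100.0, 0.6)]`, and raising `min_height` to 0.7 keeps only the first two.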

Special Frequencies There are certain frequencies of sounds that are of special note. The hearing statistics are for healthy young adults; as people age, their ability to hear the far-end sounds decreases.
Average Human Hearing Frequency
  Lower    High
  20 Hz    20,000 Hz
Average Human Spoken Frequency
  Male     Female
  120 Hz   210 Hz
Musical Notes using Equal-Tempered tuning [A4 = 440 Hz] (all values in Hz)
  Note    Octave=1  Octave=2  Octave=3  Octave=4  Octave=5  Octave=6
  A           55       110       220       440       880     1,760
  A#/Bb       58       117       233       466       932     1,865
  B           62       123       247       494       988     1,976
  C           65       131       262       523     1,047     2,093
  C#/Db       69       139       277       554     1,109     2,217
  D           73       147       294       587     1,175     2,349
  D#/Eb       78       156       311       622     1,245     2,489
  E           82       165       330       659     1,319     2,637
  F           87       175       349       698     1,397     2,794
  F#/Gb       92       185       370       740     1,480     2,960
  G           98       196       392       784     1,568     3,136
  G#/Ab      104       208       415       831     1,661     3,322
  A          110       220       440       880     1,760     3,520
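
The note table follows from the standard equal-temperament rule that each semitone multiplies the frequency by the twelfth root of two. The sketch below applies that rule with A4 = 440 Hz as the reference; the function and variable names are illustrative only.

```python
A4_HZ = 440.0  # reference pitch given in the table heading


def equal_tempered_frequency(semitones_from_a4):
    """Frequency in Hz of the note a given number of semitones above (+) or below (-) A4."""
    return A4_HZ * 2 ** (semitones_from_a4 / 12)


# Rebuild the column that starts at A = 440 Hz: A, A#/Bb, B, C, ... G#/Ab, A.
column_from_a440 = [round(equal_tempered_frequency(s)) for s in range(13)]

# Moving 12 semitones (one octave) halves or doubles the frequency.
a_one_octave_down = equal_tempered_frequency(-12)
a_three_octaves_down = equal_tempered_frequency(-36)
```

`column_from_a440` evaluates to `[440, 466, 494, 523, 554, 587, 622, 659, 698, 740, 784, 831, 880]`, and the two octave shifts give 220.0 and 55.0, matching the table entries for A.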

Voice, Hearing, and Microphones When speaking, the vocal cords vibrate, repeatedly closing and opening the airway, which stops and starts the air flow. The air then resonates in the oral and nasal cavities. It is this stopping and starting of airflow that creates what are known as voiced sounds (ones that use the vocal cords, most notably vowels). Longitudinal waves are created by this stopping and starting of airflow. The faster the cords vibrate, the closer together the waves are, and thus higher-frequency sounds are produced. Our eardrums pick up these compression/decompression waves by moving back and forth, triggering neurons that send impulses to be deciphered by our brain. Dynamic microphones work in the same way: a plate (the diaphragm) moves in and out, carrying a coil of wire along a magnet. This movement of the wire along the magnet creates electrical impulses, which are what is saved in the computer. image from http://www.mediacollege.com/audio/microphones/dynamic.html
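
The remark that faster vibration puts the waves closer together can be made concrete with the relation wavelength = speed of sound / frequency. The sketch below assumes a speed of sound of about 343 m/s (dry air near 20 °C), a figure that is not in the slides, and uses the average male and female speaking frequencies from the Special Frequencies slide as inputs.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees Celsius


def wavelength_m(frequency_hz):
    """Distance in metres between successive compressions of a sound wave."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return SPEED_OF_SOUND_M_PER_S / frequency_hz


male_wavelength = wavelength_m(120.0)    # average male speaking frequency
female_wavelength = wavelength_m(210.0)  # average female speaking frequency
```

Under that assumption the 120 Hz wave is roughly 2.86 m long and the 210 Hz wave roughly 1.63 m, so the higher-frequency voice produces compressions that are closer together.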

Applications The purpose of studying voice and its constituent parts (frequency, energy, formants, etc.) is the variety of applications that can be explored. Some of these topics have not had much research done on them, and they are gaining a lot of interest as the technology improves.
● Voice Recognition (has improved a lot in the past couple of years)
● Voice Synthesis (using emotion and inflections to make it more realistic)
● Voice Emotional Analysis (clinical and wellness applications)
● Voice Stress Detection (lie detection, and operator state)
● Etc.
Voice analysis is becoming more and more popular because of its non-invasive data capture (much like that of vision analysis of facial expression).
