


  1. 9/4/19. Speech. Hynek Hermansky, Electrical and Computer Engineering, Hackerman 324F.
     • Message => sound => message: P(wolf|sound) ≈ P(sound|wolf) x P(wolf)

  2. P(sound|wolf): "wolf" versus "no wolf", plotted over loudness and timbre (sound “color”).
     • More dimensions of the sound: better chance to recognize it.
     • Diagram labels: environment, hearing, (communication), (survival).
     • Evolution of hearing: 200 000 000 years (Pristerodon). Evolution of speech: 200 000 years (Homo sapiens).
     • We hear to survive. We speak to hear.
     • “…. sensory neurons are adapted to the statistical properties of the signals to which they are exposed.” (Simoncelli and Olshausen)
     • “We speak in order to be heard and need to be heard in order to be understood.” (Jakobson and Waugh, p. 95)
     • Human speech evolved to fit properties of human hearing.

  3. Human vocal tract: a means for generating many different sounds (many dimensions).
     • Diagram labels: nasal cavity, velum, tongue, mouth, teeth, lips, larynx, lungs. Functions: breathing, eating, biting, speaking?
     • Message => word => message: P(wolf|word) ≈ P(word|wolf) x P(wolf)
     • When more than one signal is available (e.g., audio and visual): P(o|x_1,x_2) = P(x_1|o) P(x_2|o) P(o) / (P(x_1) P(x_2))
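
The relations on this slide can be tried out numerically. The following is a minimal sketch, not part of the original slides: all probabilities are hypothetical values chosen for illustration, and the function name `posterior` is an assumption. Note one deliberate difference: the slide divides by P(x_1) P(x_2), whereas the sketch normalizes by summing over the hypotheses, which gives a proper probability when the cues are conditionally independent given o.

```python
def posterior(prior, likelihoods):
    """Posterior P(o | x_1, ..., x_k) for cues assumed conditionally independent given o.

    prior:       dict mapping hypothesis o -> P(o)
    likelihoods: list of dicts, each mapping hypothesis o -> P(x_i | o)
    """
    unnormalized = {}
    for hypothesis, p in prior.items():
        score = p
        for likelihood in likelihoods:
            score *= likelihood[hypothesis]  # multiply in P(x_i | o)
        unnormalized[hypothesis] = score
    total = sum(unnormalized.values())  # normalizing constant over all hypotheses
    return {h: s / total for h, s in unnormalized.items()}


# Hypothetical numbers, for illustration only.
prior = {"wolf": 0.1, "no wolf": 0.9}    # P(o)
audio = {"wolf": 0.8, "no wolf": 0.2}    # P(x_1 | o), e.g. a howl is heard
visual = {"wolf": 0.7, "no wolf": 0.1}   # P(x_2 | o), e.g. a grey shape is seen

audio_only = posterior(prior, [audio])
both = posterior(prior, [audio, visual])

print(f"P(wolf | audio)         = {audio_only['wolf']:.3f}")
print(f"P(wolf | audio, visual) = {both['wolf']:.3f}")
```

With these made-up numbers the second cue raises the posterior for "wolf", in line with the earlier remark that more dimensions of the signal give a better chance to recognize it.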

  4. McGurk effect: acoustic /ba/ combined with visual /ga/ yields /da/ or /tha/.
     • Section title: HEARING

  5. Physiology of Hearing
     • Diagram labels: outer ear, middle ear, inner ear; eardrum, hammer, anvil, stirrup, oval window, round window; basilar membrane, tectorial membrane, hairs; to higher processing levels.

  6. Basilar membrane as a mechanical frequency analyzer
     • Figure labels: pliable apical end, stiff basal end; 0.05 mm, 0.5 mm; 500 Hz, 100 Hz.
     • basilar membrane movements => bending of hair cells => electrical pulses
     • Diagram labels: outer ear, middle ear, inner ear; tectorial membrane, organ of Corti, tunnel of Corti, basilar membrane; inner hair cells (~ 40 hairs/cell), outer hair cells (~ 140 hairs/cell); auditory nerve fiber.
     • inner hair cells: firmly connected only to the basilar membrane; carry information
     • outer hair cells: firmly connected to both the tectorial and the basilar membranes; govern cochlear mechanics (cochlear amplifier, positive feedback)

  7. message? who? where from?
     • Cortex: inter-spike interval ~100 ms; number of spiking neurons ~100,000,000 (up to 10,000,000 active in a given task).
     • Sensory organ: inter-spike interval ~1 ms; number of spiking neurons ~100,000.
     • Diagram labels: speech signal, bottom-up connections, top-down connections.
     • massive increase in number of neurons from lower processing levels to cortex
     • decrease in average spiking rates from periphery to cortex
     • spikes in cortex are sparse (< 5% of cortical neurons active at any moment) (Hromadka et al., PLOS Biology 2008)
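
The figures on this slide invite a quick order-of-magnitude check. The sketch below is an illustration added here, not a calculation from the slides: it assumes that every counted neuron fires at the average rate implied by its inter-spike interval (rate = 1 / interval), and the variable names are made up.

```python
def firing_rate_hz(inter_spike_interval_s):
    """Average firing rate implied by an average inter-spike interval."""
    return 1.0 / inter_spike_interval_s


# Values read off the slide (all approximate).
periphery_neurons = 100_000          # spiking neurons at the sensory organ
periphery_interval_s = 1e-3          # ~1 ms between spikes
cortex_active_neurons = 10_000_000   # upper bound on neurons active in a given task
cortex_interval_s = 100e-3           # ~100 ms between spikes

periphery_rate = firing_rate_hz(periphery_interval_s)
cortex_rate = firing_rate_hz(cortex_interval_s)

# Total spikes per second, assuming every neuron fires at the average rate.
periphery_total = periphery_neurons * periphery_rate
cortex_total = cortex_active_neurons * cortex_rate

print(f"periphery: {periphery_rate:.0f} Hz per neuron, {periphery_total:.0e} spikes/s")
print(f"cortex:    {cortex_rate:.0f} Hz per neuron, {cortex_total:.0e} spikes/s")
```

Under these assumptions the two totals come out at the same order of magnitude: many slowly firing cortical neurons replace fewer fast-firing peripheral ones.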

  8. TONOTOPY
     • Different frequencies excite different parts of the cochlea (apex: low frequencies; base: high frequencies).
     • Different frequencies excite different parts of the cortex.
     • Diagram labels: APEX, BASE, processing stages.
     • Next slide title: Sensitivity of hearing

  9. Simultaneous masking
     • Figure labels: masker, target, threshold, masked threshold.
     • Next slide title: Frequency selectivity of hearing (critical bands of hearing)

  10. SPEAKING
     • Diagram labels: nasal cavity, velum, tongue, mouth, teeth, lips, larynx, lungs. Functions: breathing, eating, biting, speaking?

  11. INFORMATION ABOUT TRACT SHAPES DISTRIBUTED IN FREQUENCY
     • motor control => critical elements (tongue, lips, velum) => shape of the whole vocal tract => spectrum of speech signal (redundant contributions of movements of critical elements in different frequency bands)
     INFORMATION ABOUT TRACT SHAPES DISTRIBUTED IN TIME
     • intended speech sounds => sluggishness of vocal organs => produced speech sounds
     • movements of vocal organs are rather sluggish (from Sri Narajanan)

  12. Carrier nature of speech (Dudley 1940)
     • modulator: message in movements of vocal tract
     • carrier: voiced or unvoiced, to make the tract movements audible
     • message + carrier => modulated carrier
     Linear model of speech production (Chiba and Kajiyama 1942)
     • source => filter => filtered source signal
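
The source => filter chain can be sketched in a few lines. This is an illustrative toy, not the model from the cited works: the "vocal tract" is a single two-pole resonator, the voiced carrier is a pulse train, the unvoiced carrier is uniform noise, and every number (sample rate, pitch, resonance frequency, bandwidth) is a hypothetical choice.

```python
import math
import random


def pulse_train(n_samples, period):
    """Voiced carrier: one unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def white_noise(n_samples, seed=0):
    """Unvoiced carrier: uniform noise in [-1, 1] from a seeded generator."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def resonator(source, center_hz, bandwidth_hz, sample_rate_hz):
    """Linear filter with one resonance: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)  # pole radius, < 1 so stable
    a1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / sample_rate_hz)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0  # previous two output samples
    for x in source:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


SAMPLE_RATE_HZ = 8000  # hypothetical
voiced_source = pulse_train(800, period=80)  # 8000 / 80 = 100 pulses per second
unvoiced_source = white_noise(800)

# The same filter (tract shape) applied to two different carriers.
voiced = resonator(voiced_source, 500.0, 100.0, SAMPLE_RATE_HZ)
unvoiced = resonator(unvoiced_source, 500.0, 100.0, SAMPLE_RATE_HZ)
```

The point of the toy is the separation of roles: changing the filter parameters changes the "message", while swapping the pulse train for noise changes only the carrier.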

  13. vocal source signal => vocal tract => speech signal
     • Figure labels: vocal tract shape contributions, vocal source contributions.
     • Production diagram labels: brain, nasal cavity, velum, tongue, mouth, teeth, lips, larynx, lungs.
     • Hearing diagram labels: ear (APEX, BASE), auditory nerve, cochlear nucleus, superior olive, inferior colliculus, medial geniculate body.
     Redundant spread of information
     • every change of the tract shape shows at all frequencies of speech spectrum
     • hearing is frequency selective (about 20 bands)
     • tract shape changes do not happen very fast
     • hearing is sluggish (tenths of seconds)

  14. PRODUCTION => TRANSMISSION => PERCEPTION
     • message (< 50 bps) => coding: introduce redundancies in frequency and in time => speech signal (> 50 kbps), plus noise => decoding: use redundancies for reliable extraction of the message => message (< 50 bps)
     redundancy in frequency
     • production: tract acoustics distributes the information to all frequencies of the speech spectrum
     • perception: hearing selectivity allows for decoding the information in separate frequency bands
     redundancy in time
     • production: tract sluggishness (coarticulation) distributes information about each speech sound in time
     • perception: temporal sluggishness of hearing collects the information distributed in time
     PRODUCTION
     • intended sound: representation of speech through sounds in sequence
     • vocal tract physiology: redundancies in time, through sluggishness of a vocal tract => movements of vocal tract
     • vocal tract acoustics: redundancies in frequency and in time, through effect of tract movements on speech spectrum => speech signal
     PERCEPTION
     • corrupted speech signal => formation of spectral streams (frequency filters) => representations of sound sequences in individual streams (cortical time-frequency filters) => fusion of multiple streams => perceived sound sequence
     • periphery: ~100 000 active neurons, ~1000 Hz firing rates
     • higher perceptual levels: ~10 000 000 active neurons, ~10 Hz firing rates
     • metacognitive performance monitoring
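
The two bit rates on this slide give a feel for how much redundancy the speech signal carries. The arithmetic below is a rough sketch added for illustration: it treats the slide's bounds (< 50 bps for the message, > 50 kbps for the signal) as if they were exact values, so the result is only an order-of-magnitude estimate.

```python
def redundancy_factor(signal_bps, message_bps):
    """How many signal bits are spent per message bit."""
    return signal_bps / message_bps


MESSAGE_BPS = 50      # slide: message rate is below this
SIGNAL_BPS = 50_000   # slide: speech signal rate is above this (50 kbps)

factor = redundancy_factor(SIGNAL_BPS, MESSAGE_BPS)
redundant_fraction = 1.0 - MESSAGE_BPS / SIGNAL_BPS

print(f"signal bits per message bit: about {factor:.0f}")
print(f"fraction of the signal rate not needed for the message: about {redundant_fraction:.1%}")
```

Because the message rate is an upper bound and the signal rate a lower bound, the true factor is at least this large.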
