perceiving prosody in sinewave speech



  1. Perceiving Prosody in Sinewave Speech: A Sine of the Times. Yasmine Sukola and Lissette Vizcarrondo. Mentor: J. Nissenbaum, Ph.D. NSF Grant No. 1659607

  2. Formants: What are they and why are they so important? Formants are vocal tract resonances that characterize the phonetic quality of a vowel. Each formant is identified by a formant number. We will use the three main formants: F1, F2, and F3. Image: www.pomaspace.com

  3. Harmonics In speech, harmonics come from the vocal folds. ● The lowest harmonic (the fundamental) is what we usually perceive as pitch. ● Nearly every sound in nature contains multiple harmonics; a single harmonic on its own (a sine wave) is typically produced by a computer.

  4. Fundamental Frequency ● The human voice is a complex tone, which means it is composed of many frequencies. ● Fundamental frequency is what we perceive as pitch. ○ For example, a male voice is generally perceived as lower than a female voice because it has a lower fundamental frequency. ● We change our fundamental frequency for many reasons: ○ When singing a musical scale ○ When asking a question ○ When a word is given stress for emphasis
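The relationship between harmonics, the fundamental, and a lone sine wave can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the function names and the 110 Hz and 220 Hz values are assumptions chosen only to contrast a lower and a higher voice.

```python
import math

def harmonic_frequencies(f0, n_harmonics):
    """Return the first n harmonics of f0; the first entry is the fundamental."""
    return [f0 * k for k in range(1, n_harmonics + 1)]

def complex_tone(f0, n_harmonics, duration=0.1, sample_rate=16000):
    """Sum equal-amplitude sine waves at each harmonic of f0 (a simplification)."""
    freqs = harmonic_frequencies(f0, n_harmonics)
    n_samples = int(round(duration * sample_rate))
    return [
        sum(math.sin(2 * math.pi * f * n / sample_rate) for f in freqs) / n_harmonics
        for n in range(n_samples)
    ]

# Hypothetical values: a lower fundamental is heard as a lower pitch.
low_voice = complex_tone(f0=110.0, n_harmonics=10)
high_voice = complex_tone(f0=220.0, n_harmonics=10)
# A single harmonic on its own is just a sine wave.
sine_wave = complex_tone(f0=220.0, n_harmonics=1)
```

Real voices do not have equal-amplitude harmonics; the vocal tract shapes them, which is where formants come in.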

  5. Sine Wave Speech ● Sine wave speech (SWS) is a form of computer-generated sound designed to be a highly abstract representation of speech. ● SWS can be described as sounding like “whistles” or “sci-fi” in nature. ● SWS can be perceived as speech, even in the absence of ordinary, natural acoustic cues such as broadband formants.
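A rough sketch of how SWS is commonly built: each formant track is replaced by one sinusoid whose frequency follows that formant over time. The code below assumes the formant tracks have already been measured; the function name, frame length, and the toy F1/F2/F3 values are all hypothetical and do not come from the presentation.

```python
import math

def sinewave_speech(formant_tracks, frame_dur=0.01, sample_rate=16000):
    """Sum one sinusoid per formant track; each follows its track's frequency."""
    n_frames = len(formant_tracks[0])
    samples_per_frame = int(round(frame_dur * sample_rate))
    phases = [0.0] * len(formant_tracks)  # running phase keeps each tone continuous
    out = []
    for frame in range(n_frames):
        for _ in range(samples_per_frame):
            value = 0.0
            for i, track in enumerate(formant_tracks):
                phases[i] += 2 * math.pi * track[frame] / sample_rate
                value += math.sin(phases[i])
            out.append(value / len(formant_tracks))
    return out

# Made-up formant values in Hz for five 10 ms frames (illustration only).
f1 = [300.0, 350.0, 400.0, 450.0, 500.0]
f2 = [2200.0, 2100.0, 2000.0, 1900.0, 1800.0]
f3 = [3000.0, 2950.0, 2900.0, 2850.0, 2800.0]
sws = sinewave_speech([f1, f2, f3])
```

Because the three sinusoids follow formants rather than multiples of a common fundamental, the result carries no harmonic structure for the ear to read pitch from.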

  6. Why study Sine Wave speech? ● Because listeners are able to understand SWS as speech, it has proven useful as a tool for investigating perceptual primitives of speech. ● SWS is important because it shows the bare minimum the human brain needs in order to detect an acoustic signal as a speech utterance.

  7. SWS and Pitch Perception ● While sometimes intelligible, SWS contains none of the information relevant for pitch perception, making it unsuitable for investigating prosody in English. ● So, in SWS you may have no trouble distinguishing what was said, but you cannot tell how it was said.

  8. Intonation and Focus ● In English, intonation patterns are how speakers adjust the pitch of their voice to convey meaning. ● Remember, SWS is harmonically independent: there is no fundamental frequency, so prosodic features are not heard in SWS! ● Let’s take the phrase: “After I told you not to” ○ Same sentence, but different meanings because of the intonation pattern ○ Intonation can distinguish between a statement and a question.

  9. Broad Research Aim ● Is it possible to overcome this limitation of SWS? ○ Specifically, we want to minimally change the way SWS is produced, in a way that both ■ preserves the highly abstract character of SWS (useful for studying speech perception), and ■ provides a perceptual cue for pitch.

  10. Modification: Shepard-Risset Tone ● Psychoacoustic illusion made from multiple sine waves that rise or fall in pitch simultaneously. Each sine wave (in turn) drops an octave, and then continues to rise. When played on a continuous loop, listeners perceive an infinitely ascending or descending harmonic tone.
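A minimal sketch of a rising Shepard-Risset tone, assuming octave-spaced sine components and a bell-shaped loudness weighting over log frequency. In this version a component that reaches the top of the range has faded to silence and re-enters at the bottom; that wrap-around, the parameter values, and the function names are assumptions for illustration, not necessarily the method used in the study.

```python
import math

def component_positions(t, n_octaves=6, octaves_per_second=0.5):
    """Position of each component, in octaves above f_min, at time t."""
    return [(i + octaves_per_second * t) % n_octaves for i in range(n_octaves)]

def shepard_risset(duration=2.0, sample_rate=8000, f_min=55.0,
                   n_octaves=6, octaves_per_second=0.5):
    """Sum octave-spaced sine waves that glide upward together."""
    phases = [0.0] * n_octaves
    out = []
    for n in range(int(round(duration * sample_rate))):
        t = n / sample_rate
        value = 0.0
        for i, pos in enumerate(component_positions(t, n_octaves, octaves_per_second)):
            freq = f_min * 2.0 ** pos
            # Raised cosine: silent at both edges of the range, loudest mid-range.
            amp = 0.5 - 0.5 * math.cos(2 * math.pi * pos / n_octaves)
            phases[i] += 2 * math.pi * freq / sample_rate
            value += amp * math.sin(phases[i])
        out.append(value / n_octaves)
    return out

tone = shepard_risset(duration=0.5)
```

With these defaults the set of component positions repeats every 2 seconds (one octave at 0.5 octaves per second), so a looped signal keeps sounding as if it rises. A negative `octaves_per_second` gives the descending version.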

  11. Procedure: Step One

  12. Procedure: Step Two

  13. Procedure: Step Three

  14. Procedure: Step Four

  15. Our Experiment What is Question Answer Congruence (QAC)? ● In response to a Wh- question, an appropriate answer will have focus on the corresponding constituent.

  16. Question-Answer Congruence Example Question: Who is doing their homework? Answer 1: Eric is doing his [homework]F. (Incongruent: answer is inappropriate to the question) Answer 2: [Eric]F is doing his homework. (Congruent: answer is appropriate to the question)

  17. Eric is doing his homework. [Figure labels: Step 1, Step 2, Step 3, Step 4]

  18. Modifying Pitch Contours [Figure panels: “Eric” modified; “homework” modified]

  19. Who is doing their homework? (cont.) “Eric is doing his homework,” modified: Eric is doing his [homework]F. [Eric]F is doing his homework.

  20. Our Experiment: Question-Answer Congruence ● Participants will be presented with three different types of stimulus blocks: natural speech, modified SWS, and unmodified SWS. ● Test questions have three possible answers. In one, focus falls on the appropriate word for the answer; in the second, focus falls on a word that makes the answer sound unnatural and inappropriate; the third is a completely unrelated answer to the question.

  21. Our Experiment: Question-Answer Congruence Continued ● After the test question and one of the possible answers are presented to the participant as an auditory stimulus, the participant is shown a written question: “Is the answer appropriate?” ● The available responses will be “Appropriate” or “Inappropriate.”
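The trial structure described on the last two slides can be sketched as a small list builder. This assumes every answer type is crossed with every block type; the slides do not say how trials are ordered or counterbalanced, and the names `BLOCKS`, `ANSWER_TYPES`, and `build_trials` are hypothetical.

```python
from itertools import product

BLOCKS = ["natural speech", "modified SWS", "unmodified SWS"]
ANSWER_TYPES = ["appropriate focus", "inappropriate focus", "unrelated answer"]
PROMPT = "Is the answer appropriate?"
RESPONSES = ("Appropriate", "Inappropriate")

def build_trials(question):
    """Cross each stimulus block with each answer type for one test question."""
    return [
        {
            "block": block,
            "question": question,
            "answer_type": answer_type,
            "prompt": PROMPT,
            "options": RESPONSES,
        }
        for block, answer_type in product(BLOCKS, ANSWER_TYPES)
    ]

trials = build_trials("Who is doing their homework?")
```

Under this assumption each test question yields nine trials, three per block.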

  22. Predicted Results: Question-Answer Congruence ● For the recordings of natural speech, it is expected that listeners will consistently choose “appropriate” or “inappropriate” according to the word that is focused in the answer sentence, signaled by a pitch peak. ● For the unmodified SWS condition, listeners are expected to answer “appropriate” 100% of the time. ● If the method for creating modified SWS is successful, then participants’ perception of focus will be sensitive to the location of intended prosodic prominence.

  23. Broader Impact and Future Studies The signal delivered by cochlear implants resembles noise-vocoded speech, which can be described as whispered but extremely distorted speech. While SWS is an abstract form of speech, we hope our research can be applied to cochlear implants to support pitch perception.

  24. References 1. Remez, Robert, and Philip Rubin. 1990. On the perception of speech from time-varying acoustic information: contributions of amplitude variation. Perception and Psychophysics 48.4: 313–325. 2. Remez, Robert, and Philip Rubin. 1984. On the perception of intonation from sinusoidal sentences. Perception and Psychophysics 35.5: 429–440. 3. Remez, Robert, and Philip Rubin. 1993. On the intonation of sinusoidal sentences. Journal of the Acoustical Society of America SR-113, 33–40. 4. Risset, Jean-Claude. 1971. Paradoxes de hauteur: Le concept de hauteur sonore n'est pas le même pour tout le monde. In Proceedings of the Seventh International Congress on Acoustics, S10, 613–616. 5. Krifka, M. 2006. Association with focus phrases. In Molnar, V. & S. Winkler, eds. The Architecture of Focus, Mouton de Gruyter, Berlin. 105–136.
