Understanding Dyadic Human Spoken Interactions Using Speech Processing Techniques: Case Studies in Autism Spectrum Disorder (ASD) and Behavioral Couple Therapy
Jeremy 李祈均
*Materials in this presentation partially come from Daniel Bone, Dr. Matt Black, Prof. Panos Georgiou, Prof. Shri Narayanan
Picture credit to the USC SAIL lab: http://sail.usc.edu
What is BSP? Employ and advance signal processing and machine learning to sense human behavior
• Aid in, and transform, traditional observational methods
• Focus on mental health research and practice
• Many benefits: speedup, parallel observation capability, large-scale trends, etc.
• Significance: in the USA, ~10 million people receive psychotherapy every year, and that number is increasing
• The state of the art has not changed for decades
Mental health: traditional observational study
* Picture credit: S. Narayanan and P. G. Georgiou, "Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language," Proceedings of the IEEE, vol. 101, no. 5, pp. 1203-1233, 2013.
Mental health: putting BSP in the loop
* Picture credit: S. Narayanan and P. G. Georgiou, "Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language," Proceedings of the IEEE, vol. 101, no. 5, pp. 1203-1233, 2013.
Case Study I
Domain: behavioral couple therapy
Specifics: problem-solving interactions as part of IBCT
Engineering task: interaction modeling (vocal synchrony quantification)
Chi-Chun Lee, Athanasios Katsamanis, Matthew Black, Brian Baucom, Andrew Christensen, Panayiotis G. Georgiou, and Shrikanth S. Narayanan, "Computing Vocal Entrainment: A Signal-Derived PCA-Based Quantification Scheme with Application to Affect Analysis in Married Couple Interactions," Computer Speech and Language, 28(2): 518-539, doi:10.1016/j.csl.2012.06.006
Couple therapy: Integrative Behavioral Couple Therapy (IBCT)
Couple therapy database
• Collaborative work between UCLA and UW
• 134 seriously and chronically distressed real couples
• 10-minute problem-solving spoken interactions
• Audio-video recordings (far-field microphone, varying noise conditions)
• 33 global ratings of behavioral codes for each spouse (SSIRS, CIRS)
• 372 sessions, 90 hours of data
• Manual transcripts available
• Enables study of this large amount of spontaneous interaction data
Automatic pre-processing: automatic speaker segmentation
• Segment the sessions into meaningful regions
  – Recursive automatic speech-text alignment technique [Moreno 1998]
  – Session split into regions: wife/husband/unknown (a simple region-merging sketch follows below)
  – Segmented >60% of sessions' words into wife/husband regions for 293/574 sessions
• Example aligned text: "… that she's known for five months and didn't tell me …"
• Abbreviations: MFCC = Mel-Frequency Cepstral Coefficients; AM = Acoustic Model; ASR = Automatic Speech Recognition; LM = Language Model; HYP = ASR Hypothesized Transcript; Dict = Dictionary
*slide content credit to Dr. Matthew P. Black
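To make the last step concrete, here is a minimal Python sketch (the `AlignedWord` structure, function names, and the gap threshold are illustrative assumptions, not the actual pipeline) of merging speaker-tagged, time-aligned words into contiguous wife/husband/unknown regions.

```python
from dataclasses import dataclass

@dataclass
class AlignedWord:
    word: str
    start: float    # start time in seconds (from speech-text alignment)
    end: float      # end time in seconds
    speaker: str    # "wife", "husband", or "unknown" (from the transcript)

def merge_into_regions(words, max_gap=1.0):
    """Merge consecutive same-speaker aligned words into regions.

    `words` is assumed to be time-ordered. A new region starts when the
    speaker changes or the silence gap exceeds `max_gap` seconds.
    """
    regions = []
    for w in words:
        same_speaker = regions and regions[-1]["speaker"] == w.speaker
        close_enough = regions and (w.start - regions[-1]["end"]) <= max_gap
        if same_speaker and close_enough:
            regions[-1]["end"] = w.end
            regions[-1]["n_words"] += 1
        else:
            regions.append({"speaker": w.speaker, "start": w.start,
                            "end": w.end, "n_words": 1})
    return regions

# Toy usage with made-up alignment output
words = [AlignedWord("that", 1.2, 1.4, "wife"),
         AlignedWord("she's", 1.4, 1.7, "wife"),
         AlignedWord("known", 1.7, 2.0, "wife"),
         AlignedWord("well", 3.5, 3.8, "husband")]
print(merge_into_regions(words))
```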
Automatic acoustic feature extraction: LLD computation
• Acoustic features shown to be relevant (e.g., [Gottman 1977, Yildirim et al. 2010])
• 11 low-level descriptors (LLDs) extracted every 10 ms with a 25 ms window
  – Voice activity detection (VAD), speaking rate, pitch, energy, harmonics-to-noise ratio, voice quality, 13 MFCCs, 26 MFBs, magnitude of spectral centroid, spectral flux
• Each session split into 3 "domains": wife, husband, speaker-independent
• 13 statistics (mean, std. dev., …) across each domain for each LLD
  – 2000 features capture the global acoustic properties for each spouse (a minimal extraction sketch follows below)
*slide content credit to Dr. Matthew P. Black
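The LLD-plus-functionals recipe maps onto standard toolkits. Below is a minimal illustrative sketch using librosa (assumed available); it extracts only a subset of the listed LLDs (pitch, energy, 13 MFCCs) at a 10 ms shift and computes two of the thirteen statistics, so it is not the exact feature set used in the study.

```python
import numpy as np
import librosa

def extract_llds(wav_path, sr=16000, win=0.025, hop=0.010):
    """Frame-level LLDs: pitch (F0), RMS energy, 13 MFCCs at a 10 ms shift."""
    y, sr = librosa.load(wav_path, sr=sr)
    n_fft = int(win * sr)          # 25 ms window -> 400 samples at 16 kHz
    hop_length = int(hop * sr)     # 10 ms shift  -> 160 samples
    # Pitch uses a longer analysis window for a reliable F0 estimate.
    f0 = librosa.yin(y, fmin=80, fmax=400, sr=sr,
                     frame_length=1024, hop_length=hop_length)
    rms = librosa.feature.rms(y=y, frame_length=n_fft,
                              hop_length=hop_length)[0]
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=n_fft, hop_length=hop_length)
    n = min(len(f0), len(rms), mfcc.shape[1])   # align frame counts
    return np.vstack([f0[:n], rms[:n], mfcc[:, :n]]).T   # (n_frames, 15)

def functionals(llds):
    """Two of the 13 slide-level statistics (mean, std) over a domain."""
    return np.concatenate([llds.mean(axis=0), llds.std(axis=0)])
```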
What is vocal synchrony?
• Definition: naturally spontaneous behavioral matching between partners in dyadic social interactions
• Purpose in human interactions
  – Achieving communication efficiency* (unintentional effort)
  – Communicating interest and engagement* (conscious effort)
• Psychological significance in theory and practice
  – Learning and memory in child-parent interactions
  – Regulating emotion processes*
  – Precursor to empathy
  – Mirror neurons
• No quantification method exists! Can we quantify it even when it is not perceptible to humans (no ground truth)?
Unsupervised signal-derived method
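As a rough illustration of the general idea only (not the exact PCA-based scheme from the cited paper), the sketch below scores how much of one speaker's frame-level vocal variability is captured by the principal subspace learned from the other speaker's preceding turn; all names and the choice of 3 components are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def directional_entrainment(frames_a, frames_b, n_components=3):
    """Toy PCA-based, signal-derived similarity between consecutive turns.

    frames_a, frames_b: (n_frames, n_features) arrays of frame-level vocal
    features (e.g., pitch, energy, MFCCs) for speaker A's turn and the
    following turn by speaker B. Returns the fraction of B's variance
    captured by the principal subspace learned from A's turn, a rough
    proxy for how closely B's vocal behavior follows A's.
    """
    pca = PCA(n_components=n_components).fit(frames_a)
    b_centered = frames_b - frames_b.mean(axis=0)
    projected = b_centered @ pca.components_.T @ pca.components_
    residual = b_centered - projected
    return 1.0 - np.sum(residual ** 2) / np.sum(b_centered ** 2)

# Toy usage with random stand-in features
rng = np.random.default_rng(0)
turn_a = rng.normal(size=(200, 15))
turn_b = rng.normal(size=(180, 15))
print(directional_entrainment(turn_a, turn_b))
```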
Verification
Study of behavioral codes and vocal synchrony
Utilization as features for an affect recognition application
Utilization as quantitative metrics for clinical analysis via MLM analysis
[Figure: multilevel-model (MLM) path diagrams relating husband-to-wife and wife-to-husband entrainment, both within-partner and between-partner, to wife-demander/husband-withdrawer polarization; significance levels shown include p < 0.001, p < 0.01, and not significant. A modeling sketch follows below.]
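A hedged sketch of how such a multilevel model could be set up with statsmodels is shown below; the column names, CSV file, and exact model specification are illustrative assumptions, not the study's actual analysis.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative columns: couple_id, direction ("husband_to_wife" /
# "wife_to_husband"), entrainment (signal-derived measure), and the
# clinically rated wife-demand/husband-withdraw polarization score.
df = pd.read_csv("entrainment_sessions.csv")

# Random intercept per couple; fixed effects for entrainment,
# entrainment direction, and their interaction.
model = smf.mixedlm("polarization ~ entrainment * direction",
                    data=df, groups=df["couple_id"])
result = model.fit()
print(result.summary())
```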
Case Study II
Domain: autism spectrum disorder
Specifics: ADOS Module 3 interview sessions
Engineering task: interaction modeling (atypical prosody quantification)
Daniel Bone, Chi-Chun Lee, Matthew Black, Marian Williams, Sungbok Lee, Pat Levitt, and Shrikanth S. Narayanan, "The Psychologist as an Interlocutor in Autism Spectrum Disorder Assessment: Insights From a Study of Spontaneous Prosody," Journal of Speech, Language, and Hearing Research, 2014, doi:10.1044/2014_JSLHR-S-13-0062
*slide content credit to Daniel Bone
Autism Spectrum Disorder: ADOS session
ADOS – Module 3: behavioral codes
• ADOS: a semi-structured assessment framework
• Used to help psychologists diagnose autism (one popular tool)
• Subject interacts with a psychologist for ~30-45 minutes
• Constrained, developmentally appropriate tasks
• 4 modules, chosen by expressive language level and age
  – Module 1 (less than phrase speech): free play, response to joint attention
  – Module 2 (some phrase speech): joint interactive play, bubble play
  – Module 3 (verbally fluent): make-believe play, telling a story from a book
  – Module 4 (verbally fluent adolescents/adults): more interview style
• The psychologist rates the child's socio-communicative skills
  – e.g., speech abnormalities (intonation/volume/rhythm/rate)
  – e.g., reciprocal social interaction (unusual eye contact)
• Scores on sub-assessments are added, and the total score is used to diagnose ASD (a toy scoring sketch follows below)
• Psychologists are trained to administer the ADOS using a stringent training protocol
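To make the scoring step concrete, here is a minimal sketch of summing item codes into totals and comparing against a cutoff; the item names and the cutoff value are placeholders, not the actual ADOS algorithm items or thresholds.

```python
# Hypothetical item codes (0-2 each) from one administration; names are
# generic placeholders, not real ADOS item labels.
communication_items = {"comm_item_1": 1, "comm_item_2": 0, "comm_item_3": 1}
social_items = {"social_item_1": 2, "social_item_2": 1, "social_item_3": 0}

comm_total = sum(communication_items.values())
social_total = sum(social_items.values())
combined_total = comm_total + social_total

CUTOFF = 7  # placeholder value, not a real ADOS cutoff

print(f"Communication = {comm_total}, Social = {social_total}, "
      f"Combined = {combined_total}")
print("Meets placeholder cutoff:", combined_total >= CUTOFF)
```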
Atypical Prosody
• Prosody refers to the way in which something is said (e.g., rhythm)
  – Intonation, volume, rate, and voice quality
• Plays a critical role in expressivity and social-affective reciprocity
• Variety of abnormalities (simple descriptors for some of these are sketched below)
  – Monotonous speech
  – Atypical lexical stress and pragmatic prosody
  – Speaking rate
  – "Bizarre" quality to speech
• Qualitative descriptions are general and contrasting, e.g., "bizarre"; "slow, rapid, jerky and irregular in rhythm, odd intonation or inappropriate pitch and stress, markedly flat and toneless, or consistently abnormal volume" [Lord et al. 2003]
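Such abnormalities are often operationalized with simple prosodic descriptors. Below is a minimal, illustrative sketch (assuming an F0 contour in Hz and word timings as inputs; not the feature set from the cited paper) of three such descriptors: pitch level, intonation variability, and a speaking-rate proxy.

```python
import numpy as np

def prosodic_descriptors(f0_hz, word_times):
    """Coarse utterance-level prosody descriptors.

    f0_hz: 1-D array of frame-level F0 values (0 for unvoiced frames).
    word_times: list of (start_sec, end_sec) tuples, one per spoken word.
    """
    voiced = f0_hz[f0_hz > 0]
    log_f0 = np.log2(voiced)                 # perceptually motivated scale
    duration = word_times[-1][1] - word_times[0][0]
    return {
        "median_pitch_hz": float(np.median(voiced)),
        "pitch_variability_oct": float(np.std(log_f0)),  # low ~ monotone
        "speaking_rate_wps": len(word_times) / duration,  # words per second
    }

# Toy example with synthetic values
f0 = np.array([0, 210, 215, 220, 0, 0, 198, 205, 0])
words = [(0.0, 0.4), (0.4, 0.9), (1.1, 1.5)]
print(prosodic_descriptors(f0, words))
```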
USC CARE Corpus
• Child-psychologist ADOS interactions
• ADOS: Autism Diagnostic Observation Schedule [Lord et al., 2000]
• Multimodal: 2 HD video cameras and 2 far-field microphones (ecological validity)
Experimental Setup: Subject Sample
• Analysis focused on subjects administered ADOS Module 3
  – Verbally fluent children and young adults
  – 30 sessions total, 28 appropriate for analysis
• Manual transcription and segmentation
  – Transcription: spoken words, non-verbal communication, and vocalizations
  – Segmentation: single-speaker utterances with temporal markings
• Psychologists
  – Three trained clinical psychologists conducted the ADOS sessions
  – Each psychologist administered ~9 sessions
Experimental Setup: Labels
• Coding
  – ~60-minute session, 14 subtasks
  – 28 codes scored by the psychologist who interacted with the child
  – Not all codes were used
• Code of interest: Speech Abnormalities Associated with Autism
  – Scored on an integer scale from '0' (appropriate) to '2' (clearly abnormal)
• Codes of interest: ADOS totals
  – ADOS totals relate to the 'severity' of autism spectrum disorder
  – Three total codes: Communication, Social Interaction, and Communication + Social Interaction (C.+S.I.)
  – Higher resolution (min. 0, max. 8-22)
• Spearman's ρ = 0.74 (p < 10^-6) between the Speech Abnormality code and the C.+S.I. total (a correlation sketch follows below)
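A rank correlation like the one reported can be computed with scipy on per-child code and total values; the arrays below are synthetic stand-ins for illustration, not the CARE Corpus data.

```python
from scipy.stats import spearmanr

# Synthetic stand-ins: one value per child.
speech_abnormality = [0, 1, 2, 1, 0, 2, 1, 0, 2, 1]      # code, 0-2
severity_total = [4, 9, 18, 11, 5, 20, 10, 6, 17, 12]    # C.+S.I. total

rho, pval = spearmanr(speech_abnormality, severity_total)
print(f"Spearman rho = {rho:.2f}, p = {pval:.2e}")
```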