  1. Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media
     Svitlana Volkova (1), Theresa Wilson (2) and David Yarowsky (1, 2)
     (1) Center for Language and Speech Processing, Johns Hopkins University
     (2) Human Language Technology Center of Excellence

  2. Motivation S. Volkova, T. Wilson, D. Yarowsky (JHU) Demographic Language Variations in Twitter 2 / 35

  9. Motivation
     Demographic language variations (DLV) have been studied by sociolinguists for decades (Picard, 1997; Gefen & Ridings, 2005; Holmes & Meyerhoff, 2004; Macaulay, 2006; Tagliamonte, 2006).
     DLV have recently been explored in personal email communication, blog posts, and public discussions (Boneva et al., 2001; Mohammad & Yang, 2011; Eisenstein et al., 2010; Bamman et al., 2012).
     We propose to study differences in subjective language in social media to support commercial applications:
     - personalized recommendation systems and targeted online advertising (Fan & Chang, 2009),
     - detecting helpful product reviews (Ott et al., 2011),
     - tracking sentiment in real time (Resnik, 2013),
     - large-scale, low-cost, passive polling (O’Connor et al., 2010).

  11. Motivation
      Male (♂) and female (♀) Twitter users use subjective terms differently:
      ♀ (+): “Chocolate is my weakness”
      ♂ (−): “Clearly they know our weakness. Argggg....”

  20. Goal
      I. Explore gender bias in the use of subjective language in Twitter:
         - investigate multilingual subjective lexical variations;
         - cross-cultural emoticon and hashtag usage.
      II. Incorporate gender bias into models to improve sentiment analysis for English, Spanish, and Russian:
         - demonstrate that simple, binary features representing author gender are insufficient for gender-dependent sentiment analysis.
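
The contrast in Goal II can be illustrated with a small sketch. The slides do not specify the feature design, so everything below is an assumption made for illustration: the function names, the `__author_is_female__` indicator, and the `F::`/`M::` token prefixes are hypothetical. The first function adds a single binary gender feature to a bag of words. The second pairs each token with the author's gender, so that a classifier could learn a different weight for the same word (for example "weakness") depending on who wrote it.

```python
from collections import Counter


def binary_gender_features(tokens, gender):
    """Bag-of-words counts plus one binary indicator for author gender.

    The indicator name is a hypothetical choice, not taken from the slides.
    """
    feats = Counter(tokens)
    feats["__author_is_female__"] = 1 if gender == "F" else 0
    return dict(feats)


def gender_conjoined_features(tokens, gender):
    """Bag-of-words counts plus a copy of each token prefixed with the gender.

    The prefixed copies let a linear model assign gender-specific weights
    to the same surface word. This is an assumed design, for illustration.
    """
    feats = Counter(tokens)
    feats.update(f"{gender}::{tok}" for tok in tokens)
    return dict(feats)


# Example tweets from the Motivation slide, lowercased and whitespace-tokenized.
female_tokens = "chocolate is my weakness".split()
male_tokens = "clearly they know our weakness".split()

print(binary_gender_features(female_tokens, "F"))
print(gender_conjoined_features(male_tokens, "M"))
```

With the binary indicator, "weakness" is the same feature for both authors, so a linear model can only shift the overall score by gender. With the conjoined features, "F::weakness" and "M::weakness" are distinct and can carry opposite polarity.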

  27. Data
      Automatic gender label prediction using user first name morphology (precision is above 0.98 across languages).
      Sentiment labels from Mechanical Turk (5 annotations per tweet):
      Positive: Как же приятно просто лечь в постель после тяжелого дня... (It is a great pleasure to go to bed after a long day at work...)
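
The slide states that each tweet received five Mechanical Turk annotations but does not say how they were combined into one label. A common approach is majority voting, sketched below as an assumption rather than as the authors' procedure. The function name `aggregate_labels`, the `min_agreement` threshold, and the label strings other than "positive" are hypothetical.

```python
from collections import Counter


def aggregate_labels(annotations, min_agreement=3):
    """Return the most frequent label if enough annotators agree, else None.

    With five annotations, min_agreement=3 requires a strict majority.
    Both the voting rule and the threshold are assumptions for illustration.
    """
    if not annotations:
        return None
    label, votes = Counter(annotations).most_common(1)[0]
    return label if votes >= min_agreement else None


# Three of five annotators agree, so the tweet is labeled "positive".
clear_case = ["positive", "positive", "positive", "neutral", "negative"]
# No label reaches three votes, so the tweet is left unlabeled (None).
split_case = ["positive", "positive", "negative", "negative", "neutral"]

print(aggregate_labels(clear_case))
print(aggregate_labels(split_case))
```

Returning `None` for split votes makes it easy to filter out tweets without a reliable label before training.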
