Interpreting Social Media


  1. Interpreting Social Media
     Elijah Mayfield, School of Computer Science, Carnegie Mellon University, elijah@cmu.edu
     (many slides borrowed with permission from Diyi Yang, CMU → Google AI → GaTech)

  2. Lecture Goals
     1. Understand what it looks like to apply NLP on real-world data
        ○ What’s different about online data compared to cleaner problems like newswire text?
        ○ What questions are you going to have to answer as part of working with online data?
     2. What does a research project on social media data look like?
        ○ How are the projects designed, and what are their goals?
        ○ What kind of findings do we come up with using NLP today?

  3. About Me

  4. About Me
     ● Language Technologies Institute: Ph.D. Student
     ● Project Olympus / Swartz Center: Entrepreneur-in-Residence

  5. Lecture Goals
     1. Understand what it looks like to apply NLP on real-world data
        ○ What’s different about online data compared to cleaner problems like newswire text?
        ○ What questions are you going to have to answer as part of working with online data?
     2. What does a research project on social media data look like?
        ○ How are the projects designed, and what are their goals?
        ○ What kind of findings do we come up with using NLP today?

  6. Social Media generates BIG UNSTRUCTURED NATURAL LANGUAGE DATA

  7. Social Media generates BIG UNSTRUCTURED NATURAL LANGUAGE DATA
     ● Volume: 2 billion monthly active Facebook users
     ● Velocity: 2 Wikipedia revisions per second
     ● Variety: tweets, articles, discussions, news

  8. What’s different about online data?
     ● NLP researchers love benchmark corpora and standardized tasks
        ○ Preprocessing takes forever
        ○ Easy to measure improvement compared to prior approaches
        ○ Collection, transcription, and annotation are unbelievably expensive
     ● (Computer vision believes all of these things even more than NLP does)

  9. What’s different about online data?
     ● NLP researchers love benchmark corpora and standardized tasks
       “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group. [...]”

  10. What’s so different about online data?
      ● NLP researchers love benchmark corpora
         ○ (computer vision researchers love them even more)
      ● But for most applied work, you are going to be taking in unknown / weird text

  11. What’s so different about online data?
      ● NLP researchers love benchmark corpora
         ○ (computer vision researchers love them even more)
      ● But for most applied work, you are going to be taking in unknown / weird text

  12. Formality online (and elsewhere) is a continuum
      ● Language varies based on who you’re talking to and what you’re doing.
      ● People are really good at “reading the room” and switching styles!
      ● NLP mostly does not have this ability on the fly yet; it needs to be trained.

  13. Group Exercise: Spot the Difference

  14. Group Exercise: Spot the Difference
      ● What differences are easy to spot?
         ○ [answers go here]
      ● What differences are less obvious?
         ○ [answers go here]

  15. Existing NLP for Social Media is… not good yet?
      ➢ Machine Translation
         ○ Works for EN-FR in parliamentary documents
         ○ Not so great for translating posts from Urdu Facebook
      ➢ Part-of-Speech Tagging
         ○ Very nearly perfect for Wall Street Journal news text
         ○ Still plenty of work to do for Black Twitter
      ➢ Sentiment Classification
         ○ Works for thumbs-up/down movie reviews
         ○ Pretty bad at complex emotions, short chats, topical humor
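      As a rough illustration of that gap, here is a minimal sketch that runs an off-the-shelf POS tagger on a newswire-style sentence and on an invented tweet-style one. It assumes NLTK with its default English tagger (trained largely on edited text); the tweet string and the expected failure points are illustrative assumptions, not results from the lecture.

         # Minimal sketch: off-the-shelf POS tagging on clean vs. noisy text.
         # Assumes NLTK is installed; the tweet-style sentence is made up.
         import nltk

         nltk.download("punkt", quiet=True)
         nltk.download("averaged_perceptron_tagger", quiet=True)

         newswire = ("Pierre Vinken, 61 years old, will join the board "
                     "as a nonexecutive director Nov. 29.")
         tweet = "omg @user the new update is liiit, gonna wait 4 the fix tho #nope"

         for text in (newswire, tweet):
             tokens = nltk.word_tokenize(text)
             print(nltk.pos_tag(tokens))
             # Tags for the newswire sentence are usually reasonable; tokens like
             # "liiit", "4" (meaning "for"), "@user", and "#nope" are where a
             # WSJ-trained tagger tends to stumble.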

  16. Lecture Goals
      1. Understand what it looks like to apply NLP on real-world data
         ○ What’s different about online data compared to cleaner problems like newswire text?
         ○ What questions are you going to have to answer as part of working with online data?
      2. What does a research project on social media data look like?
         ○ How are the projects designed, and what are their goals?
         ○ What kind of findings do we come up with using NLP today?

  17. What are common tasks in social media?
      ➢ Unsupervised Tasks
         ○ Trending Topic Clustering / Detection
         ○ Friend / Article Recommendation
      ➢ Classification Tasks
         ○ Sentiment Analysis
         ○ “Fake News” Identification
         ○ Hateful Content / Cyberbullying Detection
      ➢ Structured Tasks
         ○ Text generation (Article Summarization)
         ○ Knowledge base population (Information Extraction)
         ○ Learning to Rank (Information Retrieval / Search Engines)
         ○ New member dynamics (Longitudinal / Survival analysis)
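      To make the classification side concrete, here is a minimal, hypothetical sketch of a supervised sentiment classifier for short posts using scikit-learn. The handful of labeled posts stands in for a real annotated corpus, and the character n-gram choice is just one common way to cope with noisy spelling.

         # Minimal sketch of a supervised sentiment classifier for short posts.
         # Assumes scikit-learn; the toy labels stand in for a real annotated corpus.
         from sklearn.feature_extraction.text import TfidfVectorizer
         from sklearn.linear_model import LogisticRegression
         from sklearn.pipeline import make_pipeline

         posts = [
             "love this, best update ever",
             "absolutely hate the new layout",
             "this app keeps crashing, so annoying",
             "great thread, learned a lot today",
         ]
         labels = ["pos", "neg", "neg", "pos"]

         # Character n-grams are fairly robust to spelling variation ("loove", "h8").
         model = make_pipeline(
             TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
             LogisticRegression(),
         )
         model.fit(posts, labels)
         print(model.predict(["ugh, worst release yet", "loove the dark mode"]))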

  18. Each task is composed of a pipeline of subtasks
      (same task list as slide 17, with example subtasks called out)
         ○ Overlapping geographic locations, events
         ○ Identifying shared habits, mutual interests
         ○ Moods and mental health (e.g., depression)
         ○ Demographic attributes (gender, race, language)
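      As a hedged illustration of the unsupervised side, a trending-topic detector can be sketched as TF-IDF vectors plus k-means. The posts and the choice of two clusters below are made up, and a real pipeline would layer on subtasks like time windows, geographic overlap, and deduplication.

         # Minimal sketch: group short posts into rough "topics" with TF-IDF + k-means.
         # Assumes scikit-learn; the posts and n_clusters=2 are illustrative.
         from sklearn.cluster import KMeans
         from sklearn.feature_extraction.text import TfidfVectorizer

         posts = [
             "huge earthquake felt downtown, everyone okay?",
             "buildings shaking, is this an earthquake??",
             "final score 3-1, what a match tonight",
             "that last-minute goal was unbelievable",
         ]

         vectors = TfidfVectorizer(stop_words="english").fit_transform(posts)
         clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
         for post, cluster in zip(posts, clusters):
             print(cluster, post)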

  19. Each task is composed of a pipeline of subtasks
      (same task list as slide 17, with example subtasks called out)
         ○ Factoid Extraction / Stance Classification
         ○ Formality / Politeness / Discourse Analysis
         ○ Source Reputation Ranking
         ○ Virality / Graph analytics

  20. Each task is composed of a pipeline of subtasks
      (same task list as slide 17, with example subtasks called out)
         ○ Linguistic accommodation
         ○ Behaviors tied to retention
         ○ Homogeneity of population
         ○ Social roles / leadership
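      For the new-member-dynamics item, a survival-style retention analysis can be sketched by hand. The durations and dropout flags below are fabricated for illustration; real work would use observed posting histories and usually a dedicated survival-analysis library.

         # Minimal sketch of a Kaplan-Meier-style retention curve for new members.
         # The durations (days active before going quiet) and censoring flags are made up.
         import numpy as np

         durations = np.array([3, 10, 10, 25, 40, 40, 60, 90])  # days each member was observed
         dropped_out = np.array([1, 1, 0, 1, 1, 0, 1, 0])        # 1 = stopped posting, 0 = still active (censored)

         survival = 1.0
         for t in np.unique(durations[dropped_out == 1]):
             at_risk = np.sum(durations >= t)                    # members still around just before day t
             events = np.sum((durations == t) & (dropped_out == 1))
             survival *= 1 - events / at_risk                    # product-limit (Kaplan-Meier) step
             print(f"day {t:>3}: estimated retention = {survival:.2f}")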

  21. Why do universities work on social media?
      ➢ It’s incredibly convenient.
         ○ Data collection is expensive! Crawled/open data is free, relatively fast.
         ○ IRB approval for human subjects research is slow; public social media data (Twitter, Wikipedia, IMDB) is typically exempt or expedited.
      ➢ It acts as a “model organism.”
         ○ Looks more like real language in use than WSJ.
         ○ Fairly rapid transition to industry interventions.
         ○ Multilingual by nature in some cases.

  22. Why do companies fund the work?
      (same task list as slide 17)
      Some tasks improve a site’s engagement: companies get a direct, measurable outcome.

  23. Why do companies fund the work?
      (same task list as slide 17)
      Some tasks are about profiling your user demographics and their intent. Knowing who your users are, and what they want, lets you make your site more relevant.

  24. Why do companies fund the work?
      (same task list as slide 17)
      Some tasks are about preserving reputation: if your site is toxic and unmanaged, your community of users will abandon you for alternatives.

  25. What’s not guaranteed?
      Motives so far:
      ➢ University motives
         ○ Convenient
         ○ Authentic
         ○ Generalizable
      ➢ Industry motives
         ○ Engagement
         ○ Profiles
         ○ Reputation
      Not guaranteed:
      ➢ User perceived value
      ➢ Legal accountability
      ➢ Answers from the class
         ○ [go here]
         ○ [and here]
