Big Social Data: Analyzing and Extracting Knowledge from Social Data in Web
Prof. Jonice Oliveira
UFRJ – Federal University of Rio de Janeiro
DCC – Computer Science Department
CORES - Social Computing and Social Network Analysis Laboratory
Social Networks are NOT…
SOCIAL Data: from the crowd
• Social media: events, opinions, social networks, ...
• Mobile: location, routes, interactions, emotions, velocity, ...
• Sensors: movement, noise, …
• Web logs: access and updates
• Public cameras: images
SOCIAL Data: about the crowd
• Official agencies
  – Demography
  – Health
  – Transportation
  – Entertainment/Sports/Public Events
  – Violence
  – …
Big Social Data
• Volume: data size
• Velocity: propagation, speed of change
• Variety: sources and data
• Veracity: uncertainty of data
What do we research?
• People interaction
• People's role in a group
• Understanding and prediction of events
• Recommendation of 'things'/resources
  – Documents
  – Routes
  – Groups
  – ...
What do we research?
• Urban Centers
• Science | Academia
[Figure: layered architecture]
• User Interface Level: Sociogram Visualization, Dynamic Analysis, Reports, …
• Analysis Level: Contextual Identification, Propagation Analysis, Trend Prediction, Influence and Relevance Detection, Identification of Reliability Information
• Mining Level: Behavioral Pattern Identification, Linking Mining, Opinion Mining
• ETL (Extraction, Transformation and Load)
• Data Level: Historical Information, Social Media, Scientific Sources (Patents, Curricula, Publications, CF Proposal, Projects, …), Social Scorecard
Traffic Conditions Based on Twitter
Static analysis
• Tweets from the last 60 minutes
• Remove interrogative sentences
• Sentiment analysis: positive, negative or neutral
  – "Problems in Linha Vermelha"
  – "Without problems in Linha Vermelha"
  – "Fast and easy traffic in Linha Vermelha #sqn" (irony; "#sqn" abbreviates the Portuguese "só que não", roughly "not!")
• Main streets: dynamic analysis
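The static-analysis steps above could be sketched roughly as follows. This is a minimal illustration, not the TweeTraffic implementation: `classify` is a hypothetical stand-in for whatever sentiment classifier is used, splitting sentences on punctuation is a simplification, and flipping the polarity when `#sqn` is present is an assumption about how the irony marker might be handled.

```python
import re
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=60)   # "tweets in the last 60 minutes"
FLIP = {"positive": "negative", "negative": "positive", "neutral": "neutral"}


def drop_questions(text):
    """Remove interrogative sentences (naively: sentences ending in '?')."""
    sentences = re.findall(r"[^.!?]+[.!?]*", text)
    kept = [s.strip() for s in sentences if not s.strip().endswith("?")]
    return " ".join(s for s in kept if s)


def static_labels(tweets, now, classify):
    """Label each recent tweet as positive, negative or neutral.

    tweets   -- iterable of (posted_at, text) pairs
    classify -- hypothetical callable: text -> "positive"/"negative"/"neutral"
    """
    labels = []
    for posted_at, text in tweets:
        if not timedelta(0) <= now - posted_at <= WINDOW:
            continue                  # outside the 60-minute window
        statement = drop_questions(text)
        if not statement:
            continue                  # the tweet contained only questions
        label = classify(statement)
        if "#sqn" in text.lower():    # assumed handling of the irony tag
            label = FLIP[label]
        labels.append(label)
    return labels
```

Passing the classifier in as a parameter keeps the windowing, question filtering and irony handling independent of the sentiment model.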
Traffic Conditions Based on Twitter
Dynamic analysis
• No tweets in the last 60 minutes: "We do not have enough information"
• Different opinions: check the interval between the most recent conflicting tweets
  – > 15 minutes: use the last tweet
  – ≤ 15 minutes: compare #positive tweets with #negative tweets
  – #negative > #positive tweets: "Probably you are in a traffic jam"
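The dynamic-analysis rule above can be read as a small decision procedure. The sketch below is one possible reading, under stated assumptions: the `Tweet` type, the `FLOWING` message, the treatment of neutral-only windows as "not enough information", and the outcome when positive tweets are at least as numerous as negative ones are all hypothetical, since the slide only spells out the negative-majority case.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=60)        # only tweets this recent are considered
CONFLICT_GAP = timedelta(minutes=15)  # threshold between conflicting tweets

NO_INFO = "We do not have enough information"
JAM = "Probably you are in a traffic jam"
FLOWING = "Traffic is probably flowing"   # hypothetical wording, not on the slide


@dataclass
class Tweet:
    posted_at: datetime
    sentiment: str   # "positive", "negative" or "neutral"


def traffic_status(tweets, now):
    recent = [t for t in tweets if timedelta(0) <= now - t.posted_at <= WINDOW]
    positive = [t for t in recent if t.sentiment == "positive"]
    negative = [t for t in recent if t.sentiment == "negative"]
    if not positive and not negative:
        return NO_INFO                # also covers neutral-only windows (assumption)
    if not negative:
        return FLOWING                # no conflict: all opinions are positive
    if not positive:
        return JAM                    # no conflict: all opinions are negative
    # Conflicting opinions: compare the most recent tweet of each polarity.
    last_pos = max(t.posted_at for t in positive)
    last_neg = max(t.posted_at for t in negative)
    if abs(last_pos - last_neg) > CONFLICT_GAP:
        return JAM if last_neg > last_pos else FLOWING   # trust the last tweet
    # Tweets are close in time: fall back to a majority count.
    return JAM if len(negative) > len(positive) else FLOWING
```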
Traffic Conditions Based on Twitter

Average by day    Reliable users    Common users    All users
Precision         0.4175            0.25            0.2925
Recall            0.75              0.375           0.625
Accuracy          0.542             0.225           0.275

LAUAND, B.; OLIVEIRA, J. TweeTraffic: ferramenta de análise das condições de trânsito baseado nas informações do Twitter. In: II Brazilian Workshop on Social Network Analysis and Mining (BraSNAM), 2013 (in Portuguese).
Protests in Brazil (2013)
• Started in June, triggered by rises in bus fares
• Biggest street demonstrations in 20 years; back then, citizens took to the streets to demand the impeachment of their president on corruption charges
• Social media played an important role:
  – Organization
  – Police brutality
Protests in Brazil (2013)
• Supervised approach
• Tweets categorized as positive, negative or neutral
• Naive Bayes classifier
• 70% training, 30% test
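To make the setup above concrete, here is a minimal sketch of a Naive Bayes sentiment classifier with a 70/30 split, written with the standard library only. It is not the authors' pipeline: whitespace tokenization, add-one smoothing, the random seed and the class and function names are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def split_70_30(samples, seed=0):
    """Shuffle labelled samples and split them into 70% training, 30% test."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, samples):
        # samples: list of (text, label) pairs
        self.class_counts = Counter(label for _, label in samples)
        self.word_counts = defaultdict(Counter)
        for text, label in samples:
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total)        # log prior
            for word in text.lower().split():
                if word in self.vocab:              # ignore unseen words
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In use, one would fit the model on the training part returned by `split_70_30` and compare `predict` against the held-out labels to obtain accuracy, precision and recall.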
Protests in Brazil (2013)
Accuracy (A), Variance (V), Standard Deviation (SD), Precision (P%), Recall (R%), Macro-Averaged (Ma-A) and F-score (F%)

Corpus             A(%)    V         SD        P%     R%     Ma-A    F%
Positive tweets    90%     0.0325    0.1803    79%    87%    1.18    83%
Negative tweets    72%     0.0325    0.1803    85%    77%    1.05    81%

FRANCA, T.; OLIVEIRA, J. Análise de Sentimento de Tweets Relacionados aos Protestos que ocorreram no Brasil entre Junho e Agosto de 2013. In: III Brazilian Workshop on Social Network Analysis and Mining (BraSNAM), 2014 (in Portuguese).
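As a quick consistency check on the results above, the reported F-scores can be recomputed from the reported precision and recall. The sketch assumes that F% is the balanced F-score (the harmonic mean of precision and recall). The slide does not state this, but the rounded values agree with it.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for each corpus (in percent).
positive_f = f_score(79, 87)
negative_f = f_score(85, 77)
print(round(positive_f), round(negative_f))   # 83 81
```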
• Retweet network
• Users with a high number of followers are not necessarily influencers. Ex.: Paulo Coelho
• 20 graphs (time window = 2 days)
• Network evolution measured by diameter and number of nodes
THEODORO, I. et al. Análise dos Influenciadores dos Protestos Brasileiros de 2013 via Twitter. In: III Brazilian Workshop on Social Network Analysis and Mining (BraSNAM), 2014 (in Portuguese).
Protests in Brazil (2013)
• Tweets: June 23 to August 2
• Hashtags used in the search:
Protests in Brazil (2013)
• 'Prestige' as defined by Wasserman and Faust [1994]
  – Degree Prestige: average of out-degree
  – Proximity Prestige: eigenvector centrality
  – Status or Rank Prestige (in, out): PageRank
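Two of the prestige measures listed above can be sketched on a toy retweet network. This is an illustrative sketch only, not the analysis from the cited study: the user names are hypothetical, and an edge `(u, v)` is assumed to mean "u retweeted v", so being retweeted shows up as in-degree (with the opposite edge convention the same count appears as an out-degree). PageRank is computed by plain power iteration with an assumed damping factor of 0.85.

```python
from collections import defaultdict


def degree_prestige(edges):
    """In-degree of each node divided by (n - 1); needs at least two nodes."""
    nodes = {n for edge in edges for n in edge}
    indegree = defaultdict(int)
    for _, target in set(edges):          # count each distinct edge once
        indegree[target] += 1
    return {n: indegree[n] / (len(nodes) - 1) for n in nodes}


def pagerank(edges, damping=0.85, iterations=100):
    """PageRank by power iteration; dangling nodes spread rank uniformly."""
    nodes = sorted({n for edge in edges for n in edge})
    out = defaultdict(set)
    for source, target in edges:
        out[source].add(target)
    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iterations):
        dangling = sum(rank[n] for n in nodes if not out[n])
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new = {n: base for n in nodes}
        for source in nodes:
            for target in out[source]:
                new[target] += damping * rank[source] / len(out[source])
        rank = new
    return rank


# Hypothetical retweet edges: (retweeter, original author).
edges = [("ana", "carla"), ("bruno", "carla"), ("davi", "carla"), ("carla", "bruno")]
prestige = degree_prestige(edges)
ranks = pagerank(edges)
```

In this toy graph `carla` is retweeted by every other user, so she has the highest degree prestige and the highest PageRank, regardless of how many followers any account has.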
Influence and Relevance Detection
• VRABL, S. et al. #twintera!: A social matching environment based on microblogging. In: 15th International Conference on Computer Supported Cooperative Work in Design (CSCWD), 2011.
• ZUDIO, P.; MENDONCA, L.; OLIVEIRA, J. Um método para recomendação de relacionamentos em redes sociais científicas heterogêneas. In: XI Simpósio Brasileiro de Sistemas Colaborativos (SBSC), 2014 (in Portuguese).