

Big Social Data: Analyzing and Extracting Knowledge from Social Data in Web. Prof. Jonice Oliveira, UFRJ (Federal University of Rio de Janeiro), DCC (Computer Science Department), CORES - Social Computing and Social Network Analysis Laboratory


  1. Big Social Data: Analyzing and Extracting Knowledge from Social Data in Web. Prof. Jonice Oliveira, UFRJ (Federal University of Rio de Janeiro), DCC (Computer Science Department), CORES - Social Computing and Social Network Analysis Laboratory

  2. Social Networks are NOT…

  3. [Slide with no extracted text]

  4. SOCIAL Data from the crowd: social media (events, opinions, social networks, ...); mobile (location, routes, interactions, emotions, velocity, ...); sensors (movement, noise, …); web logs (access and updates); public cameras (images).

  5. SOCIAL Data about the crowd: official agencies (demography; health; transportation; entertainment, sports and public events; violence; …).

  6. Big Social Data: Volume (data size); Velocity (propagation and speed of change); Variety (sources and data); Veracity (uncertainty of data).

  7. What do we research? People interaction; people's role in a group; understanding and prediction of events; recommendation of 'things'/resources (documents, routes, groups, ...).

  8. What do we research? Urban Centers; Science | Academia.

  9. [Architecture diagram] ETL (Extraction, Transformation and Load) with four levels: User Interface Level (Sociogram, Dynamic Analysis, Reports, Visualization, …), Analysis Level, Mining Level and Data Level. Data sources shown: Historical Information, Social Media, Patents, Curricula, Publications, CF Proposal, Projects, … Component labels include Trend Prediction, Influence and Relevance Detection, Opinion Mining, Scientific Sources Linking and Social Scorecard.

  10. [Architecture diagram, repeated] Data sources shown: Historical Information, Social Media, Patents, Curricula.

  11. [Architecture diagram, repeated] Data sources shown: Historical Information, Social Media, Publications.

  12. [Architecture diagram, repeated] Data sources shown: Historical Information, Social Media, Patents, Curricula, Publications, CF Proposal, Projects, …

  13. [Architecture diagram, repeated] Same content as slide 12.

  14. Traffic Conditions Based on Twitter. Static analysis: take the tweets from the last 60 minutes; remove interrogative sentences; run sentiment analysis (positive, negative or neutral). Example tweets: "Problems in Linha Vermelha"; "Without problems in Linha Vermelha"; "Fast and Easy Traffic in Linha Vermelha #sqn" (irony). Main streets: dynamic analysis.

  15. Traffic Conditions Based on Twitter. Dynamic analysis: if there are no tweets in the last 60 minutes, the answer is "We do not have enough information". When opinions differ, look at the interval between the most recent conflicting tweets: if it is > 15 minutes, use the last tweet; if it is ≤ 15 minutes, compare #positive tweets with #negative tweets, and if #negative > #positive the answer is "Probably you are in a traffic jam".
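The decision rule on the traffic slides can be sketched in a few lines. This is a minimal illustration, not the TweeTraffic implementation: the `Tweet` type, the function names and the "flowing" message are hypothetical, tweets are assumed to arrive already labeled with a sentiment, a trailing "?" stands in for interrogative-sentence detection, and ties or a positive majority are assumed to mean no traffic jam (the slides do not say).

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple

WINDOW = timedelta(minutes=60)        # "tweets in last 60 minutes" (from the slides)
CONFLICT_GAP = timedelta(minutes=15)  # 15-minute threshold (from the slides)

MESSAGES = {
    "none": "We do not have enough information",
    "negative": "Probably you are in a traffic jam",
    "positive": "Traffic seems to be flowing",  # hypothetical wording
}


class Tweet(NamedTuple):
    time: datetime
    text: str
    sentiment: str  # "positive", "negative" or "neutral" (assumed pre-classified)


def recent_statements(tweets: List[Tweet], now: datetime) -> List[Tweet]:
    """Static step: keep tweets from the last 60 minutes that are not questions."""
    return [t for t in tweets
            if timedelta(0) <= now - t.time <= WINDOW
            and not t.text.strip().endswith("?")]


def traffic_status(tweets: List[Tweet], now: datetime) -> str:
    """Dynamic step: resolve conflicting opinions about one street."""
    opinions = sorted((t for t in recent_statements(tweets, now)
                       if t.sentiment != "neutral"), key=lambda t: t.time)
    if not opinions:  # assumption: neutral-only windows count as "no information"
        return MESSAGES["none"]
    last = opinions[-1]
    # Most recent tweet whose polarity disagrees with the last one, if any.
    conflicting = next((t for t in reversed(opinions)
                        if t.sentiment != last.sentiment), None)
    if conflicting is None or last.time - conflicting.time > CONFLICT_GAP:
        verdict = last.sentiment          # trust the last tweet
    else:
        positives = sum(t.sentiment == "positive" for t in opinions)
        negatives = len(opinions) - positives
        verdict = "negative" if negatives > positives else "positive"
    return MESSAGES[verdict]
```

Calling `traffic_status(tweets, now)` with the tweets collected for one street returns one of the three messages; irony such as "#sqn" would have to be handled by the sentiment step, which is outside this sketch.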


  17. Traffic Conditions Based on Twitter. Average by day:

      | Metric    | Reliable users | Common users | All users |
      |-----------|----------------|--------------|-----------|
      | Precision | 0.4175         | 0.25         | 0.2925    |
      | Recall    | 0.75           | 0.375        | 0.625     |
      | Accuracy  | 0.542          | 0.225        | 0.275     |

      Lauand, B.; Oliveira, J. TweeTraffic: ferramenta de análise das condições de trânsito baseado nas informações do Twitter. In: II Brazilian Workshop on Social Network Analysis and Mining (BraSNAM), 2013 (in Portuguese).

  18. Protests in Brazil (2013). Started in June with increases in bus fares. The biggest street demonstrations since those of 20 years earlier, when citizens took to the streets to demand the impeachment of their president on corruption charges. Social media has played an important role: organization; police brutality.

  19. Protests in Brazil (2013). Supervised approach: tweets categorized as positive, negative or neutral; Naive Bayes classifier; 70% of the data for training and 30% for testing.
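The supervised setup on this slide (three sentiment classes, a Naive Bayes classifier, a 70/30 split) can be illustrated with a small pure-Python sketch. It assumes a multinomial model over whitespace tokens with add-one smoothing; the slide does not specify the features or the smoothing, and `CORPUS`, `split_70_30` and `NaiveBayes` are hypothetical names with invented toy sentences, not the authors' data or code.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus; the real study used tweets about the 2013 protests.
CORPUS = [
    ("peaceful protest was beautiful", "positive"),
    ("proud of the people today", "positive"),
    ("beautiful day for democracy", "positive"),
    ("police violence was terrible", "negative"),
    ("terrible chaos on the streets", "negative"),
    ("vandalism and violence again", "negative"),
    ("protest scheduled for monday", "neutral"),
    ("march starts at the square", "neutral"),
    ("bus fares debate on monday", "neutral"),
    ("people gather at the square", "neutral"),
]


def split_70_30(samples, seed=0):
    """Shuffle and split into 70% training and 30% test samples."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        self.class_counts = Counter(label for _, label in samples)
        self.word_counts = defaultdict(Counter)
        for text, label in samples:
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for word in text.lower().split():
                if word in self.vocab:   # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(model.predict(t) == y for t, y in samples) / len(samples)
```

A typical run would be `train, test = split_70_30(CORPUS)` followed by `accuracy(NaiveBayes().fit(train), test)`; with a corpus this small the number itself is not meaningful.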

  20. Protests in Brazil (2013). Accuracy (A), Variance (V), Standard Deviation (DP), Precision (P%), Recall (R%), Macro-Averaged (Ma-A) and F-score (F%):

      | Corpus          | A(%) | V      | DP     | P%  | R%  | Ma-A | F%  |
      |-----------------|------|--------|--------|-----|-----|------|-----|
      | Positive Tweets | 90%  | 0.0325 | 0.1803 | 79% | 87% | 1.18 | 83% |
      | Negative Tweets | 72%  | 0.0325 | 0.1803 | 85% | 77% | 1.05 | 81% |

      Franca, T.; Oliveira, J. Análise de Sentimento de Tweets Relacionados aos Protestos que ocorreram no Brasil entre Junho e Agosto de 2013. In: III Brazilian Workshop on Social Network Analysis and Mining (BraSNAM), 2014 (in Portuguese).
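The F% values on this slide are consistent with the usual F-score, the harmonic mean of precision and recall. The slide does not state the formula, so the definition below is an assumption, and `f_score` is a hypothetical helper name.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the balanced F1 score)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the two corpora on the slide.
positive_f = f_score(0.79, 0.87)
negative_f = f_score(0.85, 0.77)
```

Rounded to two decimals, these give 0.83 and 0.81, matching the 83% and 81% reported for the positive and negative tweets.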

  21. [Architecture diagram] ETL (Extraction, Transformation and Load) with the User Interface, Analysis, Mining and Data levels (same diagram as slide 9).

  22. Retweet network. Users with a high number of followers are not necessarily influencers (e.g., Paulo Coelho). 20 graphs (timestamp = 2 days). Network evolution = diameter and quantity of nodes. Theodoro, I. et al. Análise dos Influenciadores dos Protestos Brasileiros de 2013 via Twitter. In: III Brazilian Workshop on Social Network Analysis and Mining (BraSNAM), 2014 (in Portuguese).
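One way to picture the "20 graphs (timestamp = 2 days)" idea is to bucket retweets into 2-day windows and track the node count and diameter of each window's graph. The sketch below is an illustration under assumptions: retweet links are treated as undirected when measuring diameter, the diameter is taken over connected pairs only, and `window_graphs` and `diameter` are hypothetical names rather than the authors' code.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def window_graphs(retweets, start, days=2):
    """Group (time, retweeter, author) records into one undirected
    adjacency map per window of `days` days, keyed by window index."""
    graphs = defaultdict(lambda: defaultdict(set))
    for when, retweeter, author in retweets:
        index = (when - start) // timedelta(days=days)
        graphs[index][retweeter].add(author)
        graphs[index][author].add(retweeter)
    return graphs


def diameter(graph):
    """Longest shortest path, in hops, between any two connected nodes."""
    longest = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        longest = max(longest, max(dist.values()))
    return longest
```

For each window, `len(graph)` gives the quantity of nodes and `diameter(graph)` the diameter, so plotting both across windows shows how the network evolves.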

  23. Protests in Brazil (2013). Tweets: June 23 to August 2. Hashtags used in the search:

  24. Protests in Brazil (2013). 'Prestige' by Wasserman and Faust [1994]: Degree Prestige (average of out-degree); Proximity Prestige (eigenvector centrality); Status or Rank Prestige, in/out (PageRank).
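Since the slide pairs status or rank prestige with PageRank, a short power-iteration sketch may help make that measure concrete. It is a generic textbook formulation, not the authors' implementation: the damping factor of 0.85, the iteration count, the retweeter-to-author edge direction and the `pagerank` function name are all assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges.

    Here an edge (retweeter, author) is assumed to pass prestige from
    the retweeting user to the user being retweeted.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1 - damping) / len(nodes) for node in nodes}
        for node in nodes:
            # A node with no outgoing links spreads its rank over all nodes.
            targets = out_links[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

For example, `pagerank([("b", "a"), ("c", "a"), ("d", "a"), ("a", "b")])` ranks user `a`, who is retweeted by three others, highest, regardless of how many followers anyone has.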

  25. Influence and Relevance Detection. Vrabl, S. et al. #twintera!: A social matching environment based on microblogging. In: 15th International Conference on Computer Supported Cooperative Work in Design (CSCWD), 2011. Zudio, P.; Mendonca, L.; Oliveira, J. Um método para recomendação de relacionamentos em redes sociais científicas heterogêneas. In: XI Simpósio Brasileiro de Sistemas Colaborativos (SBSC), 2014 (in Portuguese).

  26. [Architecture diagram] ETL (Extraction, Transformation and Load) with the User Interface, Analysis, Mining and Data levels, including the Social Scorecard (same diagram as slide 9).
