Data-driven Approaches for Detection of Antisocial Behavior


  1. Data-driven Approaches for Detection of Antisocial Behavior
     Veronika Žatková, Ivan Srba, Róbert Móro (FIIT STU)
     PyData Bratislava, 22nd May 2019

  2. WHO ARE WE?
     Ivan and Róbert - Researchers @FIIT STU
     Veronika - Master student @FIIT STU
     Our topics of interest:
     ▪ Data science
     ▪ Machine learning
     ▪ Data mining
     ▪ Computational social science
     ▪ Social computing

  3. Source: https://kinsta.com/blog/wordpress-social-media-plugins/

  4. Source: https://www.wsj.com/articles/scholars-get-the-real-scoop-on-fake-news-1515360315

  5. Source: https://www.poynter.org/fact-checking/2019/is-expert-crowdsourcing-the-solution-to-health-misinformation/

  6. Source: https://www.edutopia.org/blog/how-respond-when-students-use-hate-speech-richard-curwin

  7. How can DATA SCIENCE help to characterize, detect and mitigate such antisocial behavior?

  8. WHAT ARE WE WORKING ON?
     Two research projects:
     ▪ Antisocial behavior in general
     ▪ Medical misinformation
     https://rebelion.fiit.stuba.sk/
     Cooperation:

  9. Antisocial behaviour: data science perspective

  10. ANTISOCIAL BEHAVIOR

  11. TASKS
      Characterization
      ▪ What characterizes or distinguishes, e.g., fake news from true news? How does it spread and who shares it?
      Detection
      ▪ How can we automatically detect fake news, hate speech, etc.?
      Mitigation
      ▪ How can we stop, e.g., the spread of fake news in a transparent, trustworthy, ethical way?

  12. TECHNIQUES
      ▪ Machine learning
      ▪ Data mining
      ▪ Natural language processing
      ▪ Neural networks and deep learning

  13. OPEN PROBLEMS
      Exploiting content, user and context data
      ▪ Multisource approaches
      ▪ Multimodal approaches
      ▪ Multilingual approaches
      ▪ Extended context

  14. OPEN PROBLEMS (continued)
      Addressing unlabelled and dynamic data
      ▪ Unsupervised, semi-supervised and ensemble models (e.g. multiview learning)
      ▪ Active learning

  15. OPEN PROBLEMS (continued)
      Investigating new mitigation approaches
      ▪ Early warning system
      ▪ On-site warning system
      ▪ Education and training

  16. OPEN PROBLEMS (continued)
      ▪ No suitable content-rich and benchmark datasets
      ▪ No suitable applications and platforms to deploy solutions

  17. Monant platform: a platform for monitoring antisocial behavior

  19. IMPLEMENTATION
      Primary implementation language: Python
      DevOps
      ▪ Docker
      ▪ Travis CI

  20. CENTRAL DATA STORAGE
      Mediates data transfer between all platform modules
      Three layers
      ▪ Evidence layer
      ▪ Inference and prediction layer
      ▪ Platform management layer

  21. CENTRAL DATA STORAGE
      Implementation
      ▪ Flask (http://flask.pocoo.org/)
      ▪ PostgreSQL
      ▪ REST APIs + Apistrap + Schematics (https://github.com/Cognexa/apistrap, https://schematics.readthedocs.io/en/latest)
      ▪ Swagger (https://swagger.io/)

  22. WEB MONITORING
      Crawls and parses data from various data sources by means of data providers
      Data sources
      ▪ News sites
      ▪ Fact-checking sites
      ▪ Social networks
      ▪ Existing datasets
      Event-based architecture
      Supports scheduling

  23. WEB MONITORING
      Data providers
      ▪ Site-specific crawlers and parsers
      ▪ RSS feeds
      ▪ News site generic crawler and parser
      ▪ News API (https://newsapi.org/)
      Chaining of data providers: RSS feed → Site-specific parser
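
The chaining of data providers mentioned on this slide can be illustrated with a minimal sketch. It is not Monant's code: the function names `rss_provider`, `site_specific_parser` and `fake_fetch`, the sample feed and the record layout are all assumptions made for illustration, and the standard-library XML parser stands in for the Feedparser library named in the talk.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, Iterator

# Hypothetical RSS snippet standing in for a downloaded feed.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>First article</title><link>https://example.org/a1</link></item>
<item><title>Second article</title><link>https://example.org/a2</link></item>
</channel></rss>"""


def rss_provider(rss_xml: str) -> Iterator[dict]:
    """First provider in the chain: yield one record per RSS item."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        yield {"title": item.findtext("title"), "url": item.findtext("link")}


def site_specific_parser(records: Iterable[dict],
                         fetch: Callable[[str], str]) -> Iterator[dict]:
    """Second provider: enrich each record with the content of the linked page."""
    for record in records:
        html = fetch(record["url"])
        # A real parser would apply site-specific extraction rules here.
        yield {**record, "body": html.strip()}


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP request so the sketch runs offline."""
    return f"  content of {url}  "


# Chain the two providers: the output of the RSS provider feeds the parser.
articles = list(site_specific_parser(rss_provider(SAMPLE_RSS), fake_fetch))
```

Because both providers are generators, records flow through the chain one at a time, so a further provider could be appended without changing the existing ones.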

  24. WEB MONITORING
      Implementation
      ▪ Scrapy library (https://scrapy.org/)
      ▪ Beautiful Soup library (https://www.crummy.com/software/BeautifulSoup/)
      ▪ Newspaper library (https://github.com/codelucas/newspaper/tree/master/newspape)
      ▪ Feedparser library (https://github.com/kurtmckee/feedparser)
      ▪ Celery + RabbitMQ + Flower
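
To make the scheduling support more concrete, here is a rough sketch of periodic crawl jobs. The talk lists Celery + RabbitMQ for this purpose; the sketch does not use Celery at all and substitutes the standard-library `sched` module with a simulated clock, so it runs instantly. The source names, intervals and the `crawl` function are hypothetical.

```python
import sched


class FakeClock:
    """Simulated clock: 'sleeping' just advances the current time."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


clock = FakeClock()
scheduler = sched.scheduler(clock.time, clock.sleep)
log = []  # (simulated time in seconds, source name) for every crawl run


def crawl(source, interval, remaining):
    """Record one crawl run and re-schedule itself while runs remain."""
    log.append((clock.time(), source))
    if remaining > 1:
        scheduler.enter(interval, 1, crawl, (source, interval, remaining - 1))


# Two hypothetical monitors with different crawl intervals.
scheduler.enter(0, 1, crawl, ("rss-feed", 60, 3))
scheduler.enter(0, 2, crawl, ("news-api", 90, 2))
scheduler.run()
```

In a Celery-based deployment the re-scheduling would typically be delegated to the task queue rather than done by the job itself; the sketch only shows the idea of interleaved periodic jobs.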

  25. PLATFORM MANAGEMENT
      Manages the data flows between all platform modules
      Web monitoring management
      ▪ Monitors (e.g. “Monitoring of health misinformation in Europe”)
      Data storage management
      ▪ Access control to central data storage

  26. PLATFORM MANAGEMENT
      Implementation
      ▪ Django (https://www.djangoproject.com/)
      ▪ Flask-JWT (not implemented yet) (https://flask-jwt-extended.readthedocs.io/en/latest/)

  28. AI CORE
      Makes it easy to extend the platform with a wide variety of data-driven methods
      User and domain modeling methods
      ▪ Derive and maintain user and content characteristics
      ▪ Sources and their trust, authors’ credibility, ...
      Prediction methods
      ▪ Characterize and detect antisocial behavior

  29. AI CORE
      Implementation
      ▪ Independent of the platform
      ▪ Central storage allows easy data exchange between methods
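
One way to picture methods that are independent of the platform and exchange data only through the central storage is a small plug-in registry. This is an illustrative sketch, not the Monant design: the `register` decorator, the `METHODS` registry, the in-memory `storage` dictionary and the toy `keyword_flagger` method are all hypothetical.

```python
from typing import Callable, Dict

# In-memory stand-in for the central data storage (hypothetical layout).
storage: Dict[str, dict] = {
    "articles": {
        1: {"source": "site-a", "text": "miracle cure for cancer"},
        2: {"source": "site-b", "text": "new screening guidelines"},
    },
    "predictions": {},
}

# Registry of pluggable methods, keyed by name.
METHODS: Dict[str, Callable[[Dict[str, dict]], None]] = {}


def register(name: str):
    """Decorator that adds a method to the registry under the given name."""
    def decorator(func):
        METHODS[name] = func
        return func
    return decorator


@register("keyword_flagger")
def keyword_flagger(store):
    """Toy prediction method: flag articles that mention 'miracle cure'."""
    for article_id, article in store["articles"].items():
        store["predictions"][article_id] = "miracle cure" in article["text"]


def run_all(store):
    """Run every registered method; each reads from and writes to the store."""
    for method in METHODS.values():
        method(store)


run_all(storage)
```

A new method only needs the `@register` decorator and access to the store, so it can be added without touching the code that runs the others.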

  30. END-USER SERVICES
      Serve as an interface for experts (e.g., journalists) and the general public
      Examples
      ▪ Real-time monitoring and visualization tool
      ▪ URL and user history verifier
      ▪ Education and training tool

  31. The first prototype of Monant was developed by a team of our students

  32. Source: https://patientengagementhit.com/news/patient-access-to-preventive-care-key-for-cancer-care-equity

  33. NATURAL NEWS NETWORK
      Source: https://www.cancer.news/2019-04-24-green-coffee-blueberries-tomatoes-strawberries-have-chlorogenic-acid.html

  34. CASE STUDY - HEALTHCARE MISINFORMATION
      Task: To characterize the number of misinformative articles containing false claims related to cancer treatment
      Data providers
      ▪ Custom crawlers and parsers of Natural News network
      ▪ Additional data providers to be used
        ▪ badatel.net
        ▪ RSS parser
        ▪ Newspaper crawler and parser
        ▪ News API

  35. CASE STUDY - HEALTHCARE MISINFORMATION
      Articles: 40,198 news articles from 23 sites

  36. CASE STUDY - HEALTHCARE MISINFORMATION (continued)
      Claims: 139 cancer "treatments"
      Source of claims: https://docs.google.com/spreadsheets/d/1EyhHFv2WswRNrFZ-O6SjF5m_9EhnV6zCZ0RdSX5TtFM/edit#gid=0

  37. CASE STUDY - HEALTHCARE MISINFORMATION (continued)
      Mapping: 6,222 news articles (15.5%) contain at least one cancer “treatment” claim
      ▪ The average number of claims per article is 1.93
      ▪ The maximum number of claims per article was 9

  38. CASE STUDY - HEALTHCARE MISINFORMATION (continued)
      The most frequent claims
      ▪ Antioxidants (2,459 articles)
      ▪ Herbalism (1,715 articles)
      ▪ Poly-MVA (Lipoic Acid Mineral Complex, 723 articles)
      ▪ Superfood (609 articles)
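
The slides do not say how claims were mapped to articles. As one possible reading, the sketch below uses plain keyword matching, which is an assumption, on a handful of made-up articles, and computes the same kind of statistics as reported above. The keyword lists and article texts are hypothetical; only the figures 6,222 and 40,198 come from the talk.

```python
import re
from statistics import mean

# Hypothetical keyword lists; the claim list used in the talk has 139 entries.
CLAIM_KEYWORDS = {
    "Antioxidants": ["antioxidant"],
    "Herbalism": ["herbal"],
    "Superfood": ["superfood"],
}

# Made-up article texts for illustration only.
ARTICLES = [
    "Antioxidants in this superfood are said to fight tumours.",
    "Herbal tea praised as a remedy.",
    "City council approves new budget.",
    "Why antioxidant supplements are popular.",
]


def claims_in(text):
    """Return the set of claims whose keywords occur at a word start in the text."""
    lowered = text.lower()
    return {
        claim
        for claim, keywords in CLAIM_KEYWORDS.items()
        if any(re.search(r"\b" + re.escape(k), lowered) for k in keywords)
    }


mapping = [claims_in(article) for article in ARTICLES]
with_claims = [claims for claims in mapping if claims]

share = len(with_claims) / len(ARTICLES)           # share of articles with a claim
avg_claims = mean(len(c) for c in with_claims)     # average over matched articles
max_claims = max(len(c) for c in with_claims)      # largest number in one article

# Sanity check of the percentage reported on the slide.
reported_share = round(100 * 6222 / 40198, 1)
```

Keyword matching of this kind is cheap but over-matches, for example on articles that debunk a claim, which fits the conclusion that labelled data is still missing.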

  39. CONCLUSIONS
      Monant addresses a lack of datasets and suitable platforms. There is still a problem of missing labelled data.

  40. CONCLUSIONS (continued)
      More interesting problems (e.g., automatic detection) lie ahead of us. We have some first results in fake news detection that we plan to deploy to the platform.

  41. CONCLUSIONS (continued)
      Interested in more info? https://rebelion.fiit.stuba.sk/
