Computational methods for text analysis



  1. Computational methods for text analysis. BA program “Sociology and Social Informatics”. Kirill Maslinsky, 2018. Higher School of Economics, Saint Petersburg

  2. Why do you need it?

  3. Just to learn to make those pictures* (*Just kidding)

  4. Scale up • population studied: “all social media users of a town” • time spans: “all of post-Soviet history” • geographical scope: “all educational migration in Russia”

  5. Course goals • provide a basic understanding of how to properly use collections of texts as quantitative evidence, • and to make this knowledge practical

  6. Course content

  7. Bread and butter: Topic modeling

  8. Killer feature: Word embeddings

  9. The icing on the cake: Sentiment analysis

  12. Course topics • Basic word statistics: lexical statistics (word frequency distributions), distributional semantics (word co-occurrence patterns), vector representation of text. • Methods for supervised and unsupervised modeling: dictionary methods, document classification and clustering, topic modeling, word embeddings, sequence modeling. • Applied tasks: automating content analysis (extracting theme and topic), sentiment analysis, information extraction from unstructured text. • This is a really boring slide, isn’t it?
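Two of the basic word statistics named on this slide, word frequency distributions and word co-occurrence patterns, can be illustrated with a minimal sketch. It is written in Python purely for illustration (the slides mention R scripts for the actual coursework); the toy corpus, the `tokenize` helper, and the choice to count co-occurrence per document are all assumptions made for this example, not material from the course.

```python
import re
from collections import Counter
from itertools import combinations

# Toy corpus: invented sentences, for illustration only.
documents = [
    "The course covers topic modeling and word embeddings.",
    "Topic modeling finds themes in a collection of texts.",
    "Word embeddings represent each word as a vector.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (hypothetical helper)."""
    return re.findall(r"[a-z]+", text.lower())


tokenized = [tokenize(doc) for doc in documents]

# Lexical statistics: how often each word occurs in the whole corpus.
word_freq = Counter(token for tokens in tokenized for token in tokens)

# Co-occurrence: the number of documents in which both words of a pair appear.
# Pairs are stored in alphabetical order, e.g. ("modeling", "topic").
cooccurrence = Counter()
for tokens in tokenized:
    for pair in combinations(sorted(set(tokens)), 2):
        cooccurrence[pair] += 1

print(word_freq.most_common(5))
print(cooccurrence[("modeling", "topic")])
```

Counting co-occurrence per document is only one possible choice; a sliding window of a few words is a common alternative, and real collections would also need language-specific tokenization and normalization.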

  13. What to expect

  14. How coursework will be organized • An interesting recent article • with an explanation of the necessary concepts and methods during the lecture • followed by a detailed analysis of the method in class • concluded by a task to reproduce the method with your own data

  15. Expectations: practical work with real texts in class and at home. • command line • mining your own text collection • R scripts • bugs in scripts, googling, bugs in scripts again • seeking and getting help from your peers and course instructor • happy ending

  16. Work in groups

  17. What you can learn • State of the art in natural language processing: solved problems; topical issues and unsolved problems • Terms: a minimal vocabulary of necessary linguistic terms (with meanings! :)); appropriate keywords to search for current research and tools • Where to apply methods for computational text analysis and how to interpret their results • Tools: existing software for text analysis (for Russian and English); existing linguistic resources such as dictionaries, corpora, and pre-trained models (for Russian and English)
