Computational Methods for Text Analysis



  1. COMPUTATIONAL METHODS FOR TEXT ANALYSIS. BA PROGRAM “SOCIOLOGY AND SOCIAL INFORMATICS”. Kirill Maslinsky, 2020. Higher School of Economics, Saint Petersburg.

  2. WHY DO YOU NEED IT

  3. THE TIME WE LIVE IN

  4. JUST TO LEARN TO MAKE THOSE PICTURES (Just kidding.)

  5. IMAGINED AUDIENCE: ALLEGED GOALS
      • Practical type: learn to make those pictures
      • Sentimental type: make the machine understand those texts for me
      • Philosopher type: why on earth does it work at all?

  8. COURSE GOALS
      • provide a basic understanding of how to properly use collections of texts as quantitative evidence,
      • and to make this knowledge practical

  9. COURSE CONTENT

  10. BREAD AND BUTTER: TOPIC MODELING

  11. KILLER FEATURE: WORD EMBEDDINGS

  12. THE ICING ON THE CAKE: SENTIMENT ANALYSIS

  15. COURSE TOPICS
      • Basic word statistics:
          • lexical statistics (word frequency distributions),
          • distributional semantics (word co-occurrence patterns),
          • vector representation of text.
      • Methods for supervised and unsupervised modeling:
          • dictionary methods,
          • document classification and clustering,
          • topic modeling,
          • word embeddings,
          • language models.
      • Applied tasks:
          • automating content analysis (extracting theme and topic),
          • sentiment analysis,
          • information extraction from unstructured text.
      • this is a really very boring slide, isn’t it?

  16. WHAT TO EXPECT

  17. HOW COURSEWORK WILL BE ORGANIZED
      Format:
      • OFFLINE: lectures, discussions, student presentations
      • ONLINE: practical work (programming exercises), tech support
      Content:
      • NLP basics
      • discussion of several recent articles (understanding methodology, reproducing parts of it)
      • practicing analysis of textual data (with R)

  18. EXPECTATIONS
      Practical work with real texts in class and at home.
      • command line
      • mining your own text collection
      • R scripts
      • bugs in scripts, googling, bugs in scripts again
      • seeking and getting help from your peers and course instructor
      • happy end

  19. WORK IN GROUPS

  20. WHAT YOU CAN LEARN
      • State of the art of natural language processing:
          • solved problems
          • topical issues and unsolved problems
      • Terms:
          • a minimal vocabulary of necessary linguistic terms (with meanings! :))
          • appropriate keywords to search for current research and tools
      • Tools:
          • where to apply methods for computational text analysis and how to interpret their results
          • existing software for text analysis (for Russian and English)
          • existing linguistic resources: dictionaries, corpora, pre-trained models (for Russian and English)

  21. GRADING
      • 25% student presentations/paper summaries
      • 10% practical exercises
      • 45% lab assignments (3)
      • 20% final project (?)
