COMPUTATIONAL METHODS FOR TEXT ANALYSIS
BA PROGRAM “SOCIOLOGY AND SOCIAL INFORMATICS”
Kirill Maslinsky
2020
Higher School of Economics, Saint Petersburg
WHY DO YOU NEED IT
THE TIME WE LIVE IN
JUST TO LEARN TO MAKE THOSE PICTURES (just kidding)
IMAGINED AUDIENCE: ALLEGED GOALS
• Practical type: learn to make those pictures
• Sentimental type: make the machine understand those texts for me
• Philosopher type: why on earth does it work at all?
COURSE GOALS
• provide a basic understanding of how to properly use collections of texts as quantitative evidence,
• and make this knowledge practical
COURSE CONTENT
BREAD AND BUTTER: TOPIC MODELING
KILLER FEATURE: WORD EMBEDDINGS
THE ICING ON THE CAKE: SENTIMENT ANALYSIS
COURSE TOPICS
• Basic word statistics:
  • lexical statistics (word frequency distributions),
  • distributional semantics (word co-occurrence patterns),
  • vector representation of text.
• Methods for supervised and unsupervised modeling:
  • dictionary methods,
  • document classification and clustering,
  • topic modeling,
  • word embeddings,
  • language models.
• Applied tasks:
  • automating content analysis (extracting theme and topic),
  • sentiment analysis,
  • information extraction from unstructured text.
• this is a really very boring slide, isn’t it?
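To make the first group of topics concrete, here is a minimal sketch of lexical statistics (word frequencies) and word co-occurrence counts. It is written in Python with the standard library only, purely for illustration; the course's practical work is in R. The toy documents, the function names (tokenize, word_frequencies, cooccurrences), the crude ASCII-only tokenizer, and the window size of 2 are all assumptions made for this example, not course material.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(docs):
    """Count how often each word occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def cooccurrences(docs, window=2):
    """Count unordered pairs of distinct words that occur within `window` tokens of each other."""
    pairs = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i, left in enumerate(tokens):
            # Look only ahead, so each pair of positions is counted once.
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    pairs[tuple(sorted((left, right)))] += 1
    return pairs


# Hypothetical two-document "collection", invented for this example.
docs = ["The cat sat on the mat.", "The dog sat on the log."]

freqs = word_frequencies(docs)
pairs = cooccurrences(docs)

print(freqs.most_common(3))   # the most frequent words
print(pairs[("on", "sat")])   # how often "sat" and "on" appear close together
```

A real text collection would need proper tokenization, and usually lemmatization for Russian, but the counting logic stays the same.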
WHAT TO EXPECT
HOW COURSEWORK WILL BE ORGANIZED
Format:
• OFFLINE: lectures, discussions, student presentations
• ONLINE: practical work (programming exercises), tech support
Content:
• NLP basics
• discussion of several recent articles (understanding the methodology, reproducing parts of it)
• practicing analysis of textual data (with R)
EXPECTATIONS
Practical work with real texts in class and at home.
• command line
• mining your own text collection
• R scripts
• bugs in scripts, googling, bugs in scripts again
• seeking and getting help from your peers and course instructor
• happy end
WORK IN GROUPS
WHAT YOU CAN LEARN
• State of the art in natural language processing:
  • solved problems
  • topical issues and unsolved problems
• Terms:
  • a minimal vocabulary of necessary linguistic terms (with meanings! :))
  • appropriate keywords to search for current research and tools
• Tools:
  • where to apply methods for computational text analysis and how to interpret their results
  • existing software for text analysis (for Russian and English)
  • existing linguistic resources: dictionaries, corpora, pre-trained models (for Russian and English)
GRADING
• 25% student presentations/paper summaries
• 10% practical exercises
• 45% lab works (3)
• 20% final project (?)