  1. Recap
  LING572: Advanced Statistical Methods for NLP
  January 23, 2020

  2. Outline
  ● Summary of the material so far
  ● Reading materials
  ● Math formulas

  3. So far
  ● Introduction:
    – Course overview
    – Information theory
    – Overview of the classification task
  ● Basic classification algorithms:
    – Decision tree
    – Naïve Bayes
    – kNN
  ● Feature selection, the chi-square test, and recap
  ● Hw1-Hw3

  4. Main steps for solving a classification task
  ● Prepare the data:
    – Reformulate the task into a learning problem
    – Define features
    – Feature selection
    – Form feature vectors
  ● Train a classifier with the training data
  ● Run the classifier on the test data
  ● Evaluation
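These steps chain together end to end. Below is a minimal sketch of the pipeline using scikit-learn, which the slides do not prescribe; the toy documents, labels, and the choice of keeping k=2 features are all invented for illustration.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Toy data standing in for a real corpus: each example is a bag-of-words dict.
train_docs = [{"fast": 2, "red": 1}, {"slow": 1, "red": 1}, {"fast": 1}]
train_labels = ["pos", "neg", "pos"]
test_docs = [{"fast": 1, "red": 2}]
test_labels = ["pos"]

# Define features / form feature vectors.
vec = DictVectorizer()
X_train = vec.fit_transform(train_docs)
X_test = vec.transform(test_docs)

# Feature selection with the chi-square test (keep the 2 best features).
selector = SelectKBest(chi2, k=2)
X_train = selector.fit_transform(X_train, train_labels)
X_test = selector.transform(X_test)

# Train a classifier on the training data, run it on the test data, evaluate.
clf = MultinomialNB()
clf.fit(X_train, train_labels)
print(accuracy_score(test_labels, clf.predict(X_test)))
```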

  5. Comparison of 3 Learners

                   kNN                     Decision Tree                           Naïve Bayes
  Modeling         Vote by your neighbors  Vote by your group                      Choose the c that max P(c|x)
  Training         None                    Build a decision tree                   Learn P(c) and P(f|c)
  Decoding         Find neighbors          Traverse the tree                       Calculate P(c)P(x|c)
  Hyperparameters  K, similarity fn        Max depth, split function, thresholds   Delta for smoothing

  6. Implementation issues
  ● Taking the log:
    $\log \big( P(c) \prod_i P(f_i \mid c) \big) = \log P(c) + \sum_i \log P(f_i \mid c)$
  ● Ignoring some constants:
    $P(d_i \mid c) = P(|d_i|) \, |d_i|! \prod_{k=1}^{|V|} \frac{P(w_k \mid c)^{N_{ik}}}{N_{ik}!}$
  ● Increasing small numbers before dividing:
    $\log P(x, c_1) = -200, \quad \log P(x, c_2) = -201$
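The last bullet refers to shifting very negative log scores before exponentiating and normalizing. Here is a minimal sketch of that trick, assuming natural logs; the function name and the -2000 figure in the comment are illustrative, not from the slides.

```python
import math

def posterior_from_log_joints(log_joints):
    """Turn joint log probabilities log P(x, c) into posteriors P(c | x).

    Subtracting the maximum from every log score before exponentiating
    leaves the ratios unchanged but keeps math.exp from underflowing.
    """
    m = max(log_joints)
    exps = [math.exp(s - m) for s in log_joints]  # largest term becomes exp(0) = 1
    total = sum(exps)
    return [e / total for e in exps]

# The slide's example: log P(x, c1) = -200, log P(x, c2) = -201.
# With far more negative scores (say, -2000), naive exponentiation would
# underflow to 0/0; the shifted version still returns ~[0.731, 0.269].
print(posterior_from_log_joints([-200.0, -201.0]))
```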

  7. Implementation issues (cont)
  ● Reformulate the formulas:
    $P(d_i, c) = P(c) \prod_{w_k \in d_i} P(w_k \mid c) \prod_{w_k \notin d_i} (1 - P(w_k \mid c))$
    $= P(c) \Big[ \prod_{w_k \in d_i} \frac{P(w_k \mid c)}{1 - P(w_k \mid c)} \Big] \prod_{w_k} (1 - P(w_k \mid c))$
  ● Store useful intermediate results: $\prod_{w_k} (1 - P(w_k \mid c))$
  ● Vectorize! (e.g., entropy)
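To see why the reformulation pays off: the trailing product over the whole vocabulary depends only on the class, so it can be computed once at training time, and decoding then only touches the words that actually occur in the document. A minimal sketch for Bernoulli NB; the classes, words, and probabilities below are made-up toy values.

```python
import math

# Invented toy Bernoulli NB parameters: P(c) and P(w_k | c).
priors = {"c1": 0.6, "c2": 0.4}
p_word = {  # P(w_k | c): probability that word w_k appears in a class-c document
    "c1": {"fast": 0.8, "slow": 0.1, "red": 0.5},
    "c2": {"fast": 0.2, "slow": 0.7, "red": 0.5},
}

# Precompute per class: log P(c) + sum_k log(1 - P(w_k | c)),
# the document-independent part of the reformulated formula.
base = {c: math.log(priors[c]) + sum(math.log(1 - p) for p in p_word[c].values())
        for c in priors}
# Precompute log( P(w_k | c) / (1 - P(w_k | c)) ) for every word.
log_odds = {c: {w: math.log(p / (1 - p)) for w, p in p_word[c].items()}
            for c in priors}

def log_joint(doc_words, c):
    # log P(d, c): the per-class constant plus the odds of the words present.
    return base[c] + sum(log_odds[c][w] for w in doc_words)

print(max(priors, key=lambda c: log_joint({"fast", "red"}, c)))  # -> c1
```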

  8. Lessons learned
  ● Don't follow the formulas blindly. Vectorize when possible.
  ● Ex1: multinomial NB:
    $P(c) \prod_{k=1}^{|V|} P(w_k \mid c)^{N_{ik}}$
  ● Ex2: the cosine function for kNN:
    $\cos(d_i, d_j) = \frac{\sum_k d_{i,k} \, d_{j,k}}{\sqrt{\sum_k d_{i,k}^2} \, \sqrt{\sum_k d_{j,k}^2}}$
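As an illustration of the vectorization lesson for Ex2, this sketch scores every training document against a test document with one matrix-vector product instead of a Python loop over documents; the matrix X and vector q are invented toy counts.

```python
import numpy as np

# X: hypothetical (num_docs x vocab) count matrix; q: the test document's vector.
X = np.array([[2., 0., 1.],
              [0., 3., 1.],
              [1., 1., 0.]])
q = np.array([1., 0., 2.])

# cos(d_i, q) = (d_i . q) / (||d_i|| ||q||), computed for all rows of X at once.
sims = (X @ q) / (np.linalg.norm(X, axis=1) * np.linalg.norm(q))

k = 2
neighbors = np.argsort(-sims)[:k]  # indices of the k nearest neighbors
print(neighbors, sims[neighbors])
```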

  9. Next
  ● Next unit (2.5 weeks): two more advanced methods:
    – MaxEnt (aka multinomial logistic regression)
    – CRF (Conditional Random Fields)
  ● Focus:
    – Main intuition, final formulas used for training and testing
    – Mathematical foundation
    – Implementation issues

  10. Reading material

  11. The purpose of having reading material
  ● Something to rely on besides the slides
  ● Reading before class could be beneficial
  ● Papers (not textbooks; some blog posts) could be the main source of information in the future

  12. Problems with the reading material
  ● The authors assume that you know the algorithm already:
    – Little background info
    – Page limit
    – Style
  ● The notation problem
  ➔ It could take a long time to understand everything

  13. Some tips
  ● Look at several papers and slides at the same time
  ● Skim through the papers first to get the main idea
  ● Go to class and understand the slides
  ● Then go back to the papers (if you have time)
  ● Focus on the main ideas. It's OK if you don't understand all the details in the paper.

  14. Math formulas

  15. The goal of LING572
  ● Understand ML algorithms:
    – The core of the algorithms
    – Implementation: e.g., efficiency issues
  ● Learn how to use the algorithms:
    – Reformulate a task into a learning problem
    – Select features
    – Write pre- and post-processing modules

  16. Understanding ML methods
  ● 1: never heard of it
  ● 2: know very little
  ● 3: know the basics
  ● 4: understand the algorithm (modeling, training, testing)
  ● 5: have implemented the algorithm
  ● 6: know how to modify/extend the algorithm
  ➔ Our goal:
    – kNN, DT, NB: 5
    – MaxEnt, CRF, SVM, NN: 3-4
  Math is important for levels 4-6, especially for 6.

  17. Why are math formulas hard?
  ● Notation, notation, notation.
    – Same meaning, different notation: $f_k$, $w_k$, $t_k$
  ● Calculus, probability, statistics, optimization theory, linear programming, …
  ● People often have typos in their formulas.
  ● A lot of formulas to digest in a short period of time.

  18. Some tips
  ● No need to memorize the formulas
  ● Determine which part of the formulas matters:
    $P(d_i \mid c) = P(|d_i|) \, |d_i|! \prod_{k=1}^{|V|} \frac{P(w_k \mid c)^{N_{ik}}}{N_{ik}!}$
    $\mathrm{classify}(d_i) = \arg\max_c P(c) \, P(d_i \mid c)$
    $\mathrm{classify}(d_i) = \arg\max_c P(c) \prod_{k=1}^{|V|} P(w_k \mid c)^{N_{ik}}$
    (The factor $P(|d_i|) \, |d_i|! / \prod_k N_{ik}!$ does not depend on $c$, so it drops out of the arg max.)
  ● It's normal if you do not understand it the 1st/2nd time around.

  19. Understanding a formula
  With add-one smoothing:
    $P(w_t \mid c_j) = \frac{1 + \sum_{i=1}^{|D|} N_{it} \, P(c_j \mid d_i)}{|V| + \sum_{s=1}^{|V|} \sum_{i=1}^{|D|} N_{is} \, P(c_j \mid d_i)}$
  Without smoothing:
    $P(w_t \mid c_j) = \frac{\sum_{i=1}^{|D|} N_{it} \, P(c_j \mid d_i)}{\sum_{s=1}^{|V|} \sum_{i=1}^{|D|} N_{is} \, P(c_j \mid d_i)} = \frac{\sum_{i=1}^{|D|} N_{it} \, P(c_j \mid d_i)}{Z(c_j)}$
  With hard labels, i.e., $P(c_j \mid d_i) \in \{0, 1\}$:
    $P(w_t \mid c_j) = \frac{\sum_{d_i \in D(c_j)} N_{it}}{Z(c_j)}$
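A minimal numpy sketch of the hard-label, add-one-smoothed case of this estimate; the count matrix N, the labels, and the number of classes are invented toy values.

```python
import numpy as np

# N[i, t] = count of word t in document i (3 documents, |V| = 3);
# labels[i] = the class of document i (hard labels: P(c_j | d_i) is 0 or 1).
N = np.array([[2, 0, 1],
              [0, 3, 1],
              [1, 1, 0]])
labels = np.array([0, 1, 0])
num_classes = 2

V = N.shape[1]
P_w_given_c = np.empty((num_classes, V))
for j in range(num_classes):
    class_counts = N[labels == j].sum(axis=0)  # sum of N_it over d_i in D(c_j)
    # Add-one smoothing: numerator 1 + count, denominator |V| + Z(c_j).
    P_w_given_c[j] = (1 + class_counts) / (V + class_counts.sum())

print(P_w_given_c)  # each row sums to 1
```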

  20. Next Week
  ● On to MaxEnt! Don't forget: reading assignment due Tuesday at 11 AM!
