Text Classification Contd + Document Representations

  1. Text Classification Contd + Document Representations
     Prof. Sameer Singh
     CS 295: STATISTICAL NLP, WINTER 2017
     January 17, 2017
     Based on slides from Nathan Schneider, Noah Smith, Dan Klein and everyone else they copied from.

  2. Outline
     • Logistic Regression
     • Brief Intro to Neural Networks
     • Document Representations
     CS 295: STATISTICAL NLP (WINTER 2017)

  3. Outline
     • Logistic Regression
     • Brief Intro to Neural Networks
     • Document Representations

  4. Text Classification
     Paper Title: Human machine interface for ABC computer applications
     CS Area:
     • Human Computer Interaction
     • Theory
     • Artificial Intelligence
     • Systems

  5. Linear Models
     Human machine interface for ABC computer applications

  6. Matrix/Neural View

  7. Naïve Bayes as a Linear Model

  8. Joint vs Conditional Likelihood

  9. Logistic Regression Model

  10. Logistic Regression: 2 classes

  11. Estimating the parameters

  12. Gradient Descent
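
The bodies of the logistic regression slides (model, parameter estimation, gradient descent) did not survive extraction, so the following is only a minimal sketch of the standard technique those titles name: two-class logistic regression fit by batch gradient descent on the average negative conditional log-likelihood. The function names, the learning rate, the number of epochs, and the toy features are all assumptions made for illustration, not taken from the slides.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    # P(y = 1 | x) under the two-class logistic regression model.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logreg(X, y, lr=0.5, epochs=500):
    # Batch gradient descent; lr and epochs are illustrative choices.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(loss)/d(score)
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical toy features: [count("graph"), count("user")]; label 1 = theory.
X = [[2, 0], [1, 0], [0, 1], [0, 2]]
y = [1, 1, 0, 0]
w, b = train_logreg(X, y)
```

After training on this toy data, the weight on the first feature is positive and the weight on the second is negative, so `predict_proba(w, b, [2, 0])` is above 0.5 and `predict_proba(w, b, [0, 2])` is below it.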

  13. Tips and Tricks: TF-IDF
      Sparsity of Words
      • Remember Zipf’s Law? Lots of rare words
      • For classification, they can be more informative!

  14. Tips and Tricks: TF-IDF
      Why use log(proportion)?
      • It works…
      • Importance is not a linear function
      • IDF is an additive function
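
As a rough illustration of the TF-IDF idea, the sketch below weights each term by its count times the log of the inverse proportion of documents containing it. The exact formula on the slide was not preserved, so `idf(t) = log(N / df(t))` is an assumed (common) variant, and the three documents are borrowed from the example titles used later in the lecture.

```python
import math
from collections import Counter

docs = [
    "a survey of user opinion of computer system response time",
    "system and human system engineering testing of eps",
    "relation of user perceived response time to error measurement",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: number of documents that contain each term.
df = Counter(term for doc in tokenized for term in set(doc))

def idf(term):
    # Assumed variant: log of the inverse document proportion.
    return math.log(n_docs / df[term])

def tfidf(doc):
    # Term frequency (raw count) times inverse document frequency.
    return {t: c * idf(t) for t, c in Counter(doc).items()}

weights = tfidf(tokenized[0])
```

Here "of" occurs in every document and gets weight 0, while "opinion" (one document) outweighs "user" (two documents). One reading of "IDF is an additive function" is that the logs of two terms' proportions add up to the log of their product, so `idf("opinion") + idf("user")` equals `-log((1/3) * (2/3))`.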

  15. Tips and Tricks: Regularization
      Overfitting
      • Training data is finite, so it has spurious correlations
      • Rare words that occur with one label!
      • Or don’t occur often enough
      • The curse of Zipf’s Law continues…
        For a word that occurs 10 times… there are many that occur ~10 times!

  16. Tips and Tricks: Regularization
      Fixing Overfitting
      • Ignore rare words (opposite of TF-IDF)
      • Penalize really high weights…
      [Plot: Accuracy vs. Regularization Strength]
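
To make "penalize really high weights" concrete, here is a small sketch that assumes an L2 penalty (the slide does not say which penalty is meant). It fits a single weight for a hypothetical rare word that appears in exactly one training example, always with the positive label. The function name `fit_weight`, the step count, and the penalty strength are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_weight(l2, steps=2000, lr=0.5):
    # One feature (a rare word) seen once, with label 1.
    # Loss: -log sigmoid(w) + (l2 / 2) * w**2
    w = 0.0
    for _ in range(steps):
        grad = (sigmoid(w) - 1.0) + l2 * w
        w -= lr * grad
    return w

w_plain = fit_weight(l2=0.0)  # keeps growing the longer we train
w_reg = fit_weight(l2=0.1)    # settles at a small finite value
```

Without the penalty, the weight keeps increasing with more training steps because the data never contradicts the spurious correlation; with the penalty it converges to a modest value.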

  17. Tips and Tricks: Featurizing

  18. Outline
      • Logistic Regression
      • Brief Intro to Neural Networks
      • Document Representations

  19. Neural View of Log. Regression

  20. Linear vs Non-linear Model

  21. Introducing a Hidden Layer
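
The slide content on hidden layers was not preserved. As a hedged illustration of why a hidden layer makes the model non-linear, the sketch below uses XOR, a textbook example that is not from the slides: no single linear model can compute it, but a network with one hidden layer of two ReLU units can. The weights are set by hand for illustration rather than learned.

```python
def relu(z):
    return max(0.0, z)

def hidden_layer_net(x1, x2):
    # Hidden layer: two ReLU units with hand-picked weights (assumed values).
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    # Output layer: a linear combination of the hidden units.
    return h1 - 2.0 * h2

outputs = {(a, b): hidden_layer_net(a, b) for a in (0, 1) for b in (0, 1)}
```

The output is 1 when exactly one input is 1 and 0 otherwise, which a model that is linear in `x1` and `x2` cannot reproduce.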

  22. What is Deep Learning?
      • Many hidden layers
      • In NLP, utilize unlabeled data to learn representations… (next lecture)

  23. Outline
      • Logistic Regression
      • Brief Intro to Neural Networks
      • Document Representations

  24. Document Similarity
      • A survey of user opinion of computer system response time
      • Relation of user perceived response time to error measurement
      • The generation of random, binary, ordered trees

  25. Cosine Distance
      Advantages
      • Between -1 and 1 (0 means no overlap)
      • If all entries are non-negative, it is between 0 and 1
      • The size (magnitude) of the vectors doesn’t matter
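
The properties listed on this slide can be checked with a few lines of code. This is a minimal sketch of the cosine of the angle between two vectors; the function name and the convention of returning 0 for an all-zero vector are assumptions.

```python
import math

def cosine(u, v):
    # Dot product divided by the product of the vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for empty documents
    return dot / (norm_u * norm_v)

same_direction = cosine([1, 2, 0], [2, 4, 0])  # scaling a vector changes nothing
no_overlap = cosine([1, 0], [0, 1])            # no shared non-zero entries
opposite = cosine([1, 0], [-1, 0])             # only possible with negative entries
```

Doubling a count vector leaves the value unchanged, vectors with no shared terms score 0, and negative values require negative entries, which raw word counts never have.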

  26. Term Document Matrix

  27. Local and Global Weighting
      Local Weighting
      • Binary
      • Term Freq
      • Log
      Global Weighting
      • Binary
      • Normal
      • IDF
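
The formulas for these weighting schemes were lost in extraction. The sketch below fills them in with commonly used definitions, which are assumptions rather than the slide's own: local weights binary, raw term frequency, and log(1 + tf); global weights binary (constant 1), normal (inverse Euclidean length of the term's row), and IDF (log of N over document frequency). Each cell's final weight is taken to be local times global.

```python
import math

def local_weight(tf, scheme):
    # Weight of a term within a single document.
    if scheme == "binary":
        return 1.0 if tf > 0 else 0.0
    if scheme == "tf":
        return float(tf)
    if scheme == "log":
        return math.log(1 + tf)
    raise ValueError(f"unknown local scheme: {scheme}")

def global_weight(row, scheme):
    # Weight of a term across the collection; row holds its count per document.
    if scheme == "binary":
        return 1.0
    if scheme == "normal":
        length = math.sqrt(sum(c * c for c in row))
        return 1.0 / length if length > 0 else 0.0
    if scheme == "idf":
        df = sum(1 for c in row if c > 0)
        return math.log(len(row) / df) if df > 0 else 0.0
    raise ValueError(f"unknown global scheme: {scheme}")

def weight_row(row, local="log", glob="idf"):
    g = global_weight(row, glob)
    return [local_weight(tf, local) * g for tf in row]

# Counts of "system" in documents c1..c5, m1..m4 of the lecture's example.
system_row = [0, 1, 1, 2, 0, 0, 0, 0, 0]
weighted = weight_row(system_row, local="tf", glob="idf")
```

With term frequency and IDF, the entry for c4 (where "system" occurs twice in three of nine documents) comes out as 2 · log(9/3).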

  28. Example: Documents
      c1: Human machine interface for ABC computer applications
      c2: A survey of user opinion of computer system response time
      c3: The EPS user interface management system
      c4: System and human system engineering testing of EPS
      c5: Relation of user perceived response time to error measurement
      m1: The generation of random, binary, ordered trees
      m2: The intersection graph of paths in trees
      m3: Graph minors IV: Widths of trees and well-quasi-ordering
      m4: Graph minors: A survey
      From http://lsa.colorado.edu/papers/dp1.LSAintro.pdf

  29. Example: Term-Doc Matrix
      Columns (documents): c1 c2 c3 c4 c5 m1 m2 m3 m4
      Rows (terms): human, interface, computer, user, system, response, time, EPS, survey, trees, graph, minors
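
The cell values of the matrix were not preserved, but they can be recomputed from the nine example titles and the twelve row terms. The sketch below does this with raw counts; the lowercasing tokenizer and the variable names are assumptions, and the slide's matrix may have used a different weighting.

```python
import re
from collections import Counter

docs = {
    "c1": "Human machine interface for ABC computer applications",
    "c2": "A survey of user opinion of computer system response time",
    "c3": "The EPS user interface management system",
    "c4": "System and human system engineering testing of EPS",
    "c5": "Relation of user perceived response time to error measurement",
    "m1": "The generation of random, binary, ordered trees",
    "m2": "The intersection graph of paths in trees",
    "m3": "Graph minors IV: Widths of trees and well-quasi-ordering",
    "m4": "Graph minors: A survey",
}
terms = ["human", "interface", "computer", "user", "system", "response",
         "time", "eps", "survey", "trees", "graph", "minors"]

def tokenize(text):
    # Lowercase and keep alphabetic runs only (assumed preprocessing).
    return re.findall(r"[a-z]+", text.lower())

counts = {name: Counter(tokenize(text)) for name, text in docs.items()}

# One row per term, one column per document (in the order c1..c5, m1..m4).
matrix = {t: [counts[name][t] for name in docs] for t in terms}

print(f"{'':<10}", " ".join(docs))
for t in terms:
    print(f"{t:<10}", "  ".join(str(c) for c in matrix[t]))
```

Most cells are 0, and the only count above 1 is "system" in c4, which is what makes these vectors sparse.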

  30. Example: Distance Matrix
      Rows and columns: c1 c2 c3 c4 c5 m1 m2 m3 m4

  31. Problems with Sparse Vectors
      c2: A survey of user opinion of computer system response time
      c1: Human machine interface for ABC computer applications
      m4: Graph minors: A survey
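
One way to read this slide, sketched below under stated assumptions: with raw counts over the twelve index terms (human, interface, computer, user, system, response, time, EPS, survey, trees, graph, minors), c2 shares exactly one term with c1 ("computer") and exactly one with m4 ("survey"), so cosine rates it as equally similar to a computing title and a graph theory title. The vectors are hand-written from the three titles; the slide's own numbers were not preserved.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

# Term order: human, interface, computer, user, system, response,
#             time, EPS, survey, trees, graph, minors
c1 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
c2 = [0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
m4 = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1]

sim_c2_c1 = cosine(c2, c1)  # overlap: "computer"
sim_c2_m4 = cosine(c2, m4)  # overlap: "survey"
sim_c1_m4 = cosine(c1, m4)  # no shared terms
```

Both similarities involving c2 come out identical, since each pair overlaps in a single term and c1 and m4 each contain three index terms.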

  32. Example: Distance Matrix
      Rows and columns: c1 c2 c3 c4 c5 m1 m2 m3 m4

  33. Option 1: Clustering

  34. Example: Clustering
      Documents: c1 c2 c3 c4 c5 m1 m2 m3 m4

  35. Upcoming…
      Homework
      • Homework 1 is up! No more material will be covered
      • Due: January 26, 2017
      Project
      • Project pitch is due January 23, 2017!
      • Start assembling teams now
      • Tons of datasets on the “projects” page on the website
