nlp part 2

NLP!!! (Part 2) April 9, 2020 Data Science CSCI 1951A Brown University



  1. NLP!!! (Part 2) • April 9, 2020 • Data Science CSCI 1951A • Brown University • Instructor: Ellie Pavlick • HTAs: Josh Levin, Diane Mutako, Sol Zitter

  2. Announcements • Viz Lab tomorrow afternoon (4pm? Check Piazza) • Project Grades/Pitches/Presentations

  3. Today • More NLP! • Ngrams • Topic Models • Word Embeddings

  5. N-Grams • N-length sequence of words (unigrams, bigrams, trigrams, 4-grams, …) • Provides some context (differentiating “cute dog” from “hot dog”) • Blows up size of vocabulary, increases sparsity • Usually vocab size cutoffs/min count thresholds apply to ngrams too
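
The vocabulary blow-up and the min count threshold mentioned on this slide can be illustrated with a minimal sketch. The two-sentence corpus and the threshold value `MIN_COUNT = 2` are assumptions made up for illustration; they are not from the slides.

```python
from collections import Counter

# Toy corpus (hypothetical; not from the slides), already tokenized.
docs = [
    "the cute dog ate the hot dog".split(),
    "a hot dog is not a cute dog".split(),
]

# Count unigrams and bigrams over the whole corpus.
unigram_counts = Counter(tok for doc in docs for tok in doc)
bigram_counts = Counter(
    f"{a} {b}" for doc in docs for a, b in zip(doc, doc[1:])
)

# Keep only bigrams seen at least MIN_COUNT times (assumed threshold).
MIN_COUNT = 2
bigram_vocab = {bg for bg, c in bigram_counts.items() if c >= MIN_COUNT}

print(len(unigram_counts), "unigram types")
print(len(bigram_counts), "bigram types before thresholding")
print(sorted(bigram_vocab))
```

Even on this tiny corpus there are more distinct bigrams than unigrams, and most bigrams occur only once, which is the sparsity the threshold is meant to control.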

  6. N-Grams • Example text: “html does work . all webdev is awesome.” • 1gms: [‘html’, ‘does’, ‘work’, ‘.’, ‘all’, …] • 2gms: [‘html does’, ‘does work’, ‘work .’, ‘. all’, …] • 3gms: [‘html does work’, ‘does work .’, ‘work . all’, …]

  7. N-Grams • Example text: “html does work . all webdev is awesome.” • 1gms: [‘html’, ‘does’, ‘work’, ‘.’, ‘all’, …] • 2gms: [‘html does’, ‘does work’, ‘work .’, ‘. all’, …] • 3gms: [‘html does work’, ‘does work .’, ‘work . all’, …] • skip-1gms: [‘html does’, ‘html work’, ‘does html’, ‘does work’, …]
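
The lists on this slide can be reproduced with a short sketch. The token list assumes punctuation is split into its own token, and `skip_pairs` is a hypothetical helper that reads "skip-1gms" as ordered word pairs at most one intervening word apart, in either direction. That reading is inferred from the four examples shown, not stated on the slide.

```python
def ngrams(tokens, n):
    """Return every contiguous run of n tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_pairs(tokens, k):
    """Return ordered pairs (word, neighbor) with at most k tokens skipped.

    Neighbors are taken on both sides, so 'does html' appears as well
    as 'html does'. This interpretation is an assumption.
    """
    pairs = []
    for i, word in enumerate(tokens):
        lo = max(0, i - k - 1)
        hi = min(len(tokens), i + k + 2)
        for j in range(lo, hi):
            if j != i:
                pairs.append(f"{word} {tokens[j]}")
    return pairs


# Assumed tokenization of the slide's example text.
tokens = ["html", "does", "work", ".", "all", "webdev", "is", "awesome", "."]

print(ngrams(tokens, 1)[:5])
print(ngrams(tokens, 2)[:4])
print(ngrams(tokens, 3)[:3])
print(skip_pairs(tokens, 1)[:4])
```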

  8. Tagging • Parts of Speech: “fly” the noun or “fly” the verb? • Word Sense Disambiguation: “fly” as in “take an airplane” or “fly” as in “go fast”? • Named Entity Recognition: “Washington” the place or “Washington” the person?

  9. Syntactic Relations • “Dependency Parsing” • Example: “today, despite the lockdown, i will get groceries” • https://explosion.ai/demos/displacy

  11. Syntactic Relations • “Constituency Parsing” • Example: “all webdev is awesome.” • https://demo.allennlp.org/constituency-parsing

  13. Today • More NLP! • Ngrams • Topic Models • Word Embeddings

  14. Topic Models • “When I try to display dots from part 2 on my mac (tried chrome, firefox, and safari), the elements do not appear in the html.” • “Can you elaborate on exactly what the directions are in part 2 step 3, the stencil code does not quite imply what we are supposed to do…” • “Changes I make to the nations.js file do not affect any of the html in after I load the nations.html file”

  15. Topic Models • (same three posts as slide 14) • instructions: stencil, instructions, part, step, rubric, handin… • UI: html, javascript, debug, display, elements… • systems: mac, windows, linux, chrome, firefox, os… • fillers: I, you, when, the, and, a

  16. Topic Models • Where do documents come from? “The generative story” • instructions: stencil, instructions, part, step, rubric, handin… • UI: html, javascript, debug, display, elements… • systems: mac, windows, linux, chrome, firefox, os… • fillers: I, you, when, the, and, a

  17. Topic Models • “The generative story” (same four topics as slide 16) • Step 1: Sample a topic

  18. Topic Models • “The generative story” (same four topics as slide 16) • Step 2: Sample a word from that topic • Document so far: You

  19. Topic Models • “The generative story” (same four topics as slide 16) • Document so far: You • Step 1: Sample a topic

  20. Topic Models • “The generative story” (same four topics as slide 16) • Step 2: Sample a word from that topic • Document so far: You javascript

  21. Topic Models • “The generative story” (same four topics as slide 16) • Document so far: You javascript • Step 1: Sample a topic

  22. Topic Models • “The generative story” (same four topics as slide 16) • Step 2: Sample a word from that topic • Document so far: You javascript handin
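
The two-step generative story (sample a topic, then sample a word from that topic) can be sketched as follows. The topic word lists are the ones shown on the slides. The topic weights, the uniform choice of words within a topic, and the function name `generate_document` are assumptions for illustration only.

```python
import random

# Topic word lists as shown on the slides.
topics = {
    "instructions": ["stencil", "instructions", "part", "step", "rubric", "handin"],
    "UI": ["html", "javascript", "debug", "display", "elements"],
    "systems": ["mac", "windows", "linux", "chrome", "firefox", "os"],
    "fillers": ["I", "you", "when", "the", "and", "a"],
}


def generate_document(topic_weights, length, rng):
    """Generate `length` words by repeating the two sampling steps."""
    names = list(topic_weights)
    weights = [topic_weights[name] for name in names]
    words = []
    for _ in range(length):
        # 1. Sample a topic according to the document's topic weights.
        topic = rng.choices(names, weights=weights, k=1)[0]
        # 2. Sample a word from that topic (uniformly here; an assumption).
        words.append(rng.choice(topics[topic]))
    return words


# Hypothetical topic weights for one document.
doc_weights = {"instructions": 0.3, "UI": 0.3, "systems": 0.1, "fillers": 0.3}
doc = generate_document(doc_weights, length=6, rng=random.Random(0))
print(" ".join(doc))
```

Each word is drawn independently of the others once its topic is chosen, which is why the output reads as a bag of words rather than a sentence.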

  23. Topic Models

  24. Topic Models • “latent” variable (not observed)

  25. Topic Models • words are determined by topic (and are conditionally independent of each other)

  26. Topic Models • documents are a distribution over topics

  27. Topic Models • set parameters to maximize probability of observations

  28. Topic Models • Document: “part 2 html does not work”

  29. Topic Models • Document: “part 2 html does not work” • [Bar chart over Topic1, Topic2, Topic3, Topic4; y-axis 0 to 60]

  30. Topic Models • Document: “part 2 html does not work” • [Bar chart over Topic1, Topic2, Topic3, Topic4; y-axis 0 to 60] • [Two bar charts over the words html, javascript, work, handin, part, stencil; x-axes 0 to 40 and 0 to 30]

  31. Clicker Question!

  32. Clicker Question! • Which is the best parameter setting for the observed data? • Observed document: “part <NUM> html does not work” • [Charts: word probabilities under Topic 1 and Topic 2 for part, <NUM>, html, does, not, work (x-axis 0 to 0.4); topic proportions for option (a): Topic1 50, Topic2 50; option (b): Topic1 33, Topic2 67]

  33. Clicker Question! • Which is the best parameter setting for the observed data? • [Same charts as slide 32] • a: (0.3+0.2+0+0.1+0.1+0.2)×0.5 + (0+0.3+0.4+0.1+0.2)×0.5 = 0.45 + 0.5 = 0.95

  36. Clicker Question! • Which is the best parameter setting for the observed data? • [Same charts as the previous Clicker Question slides] • b: (0.3+0.2+0+0.1+0.1+0.2)×0.33 + (0+0.3+0.4+0.1+0.2)×0.67 = 0.297 + 0.67 = 0.967
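
The arithmetic for options (a) and (b) can be checked with a few lines. This sketch only mirrors the computation printed on the slides (sum the word probabilities under each topic, weight by the topic proportion, and add). The probability lists are copied as they appear there, including the five values listed for Topic 2, and the names `slide_score`, `topic1_word_probs` and `topic2_word_probs` are hypothetical.

```python
# Word probabilities as listed in the slide's arithmetic.
topic1_word_probs = [0.3, 0.2, 0.0, 0.1, 0.1, 0.2]
topic2_word_probs = [0.0, 0.3, 0.4, 0.1, 0.2]


def slide_score(topic1_weight, topic2_weight):
    """Topic-weighted sum of word probabilities, as computed on the slides."""
    return (sum(topic1_word_probs) * topic1_weight
            + sum(topic2_word_probs) * topic2_weight)


score_a = slide_score(0.5, 0.5)    # option (a): 50 / 50
score_b = slide_score(0.33, 0.67)  # option (b): 33 / 67

print(f"a: {score_a:.3f}")
print(f"b: {score_b:.3f}")
print("better setting:", "b" if score_b > score_a else "a")
```

Option (b) scores higher because it puts more weight on the topic whose word probabilities cover more of the observed document.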

  37. Topic Models

  38. Topic Models • LDA: Latent Dirichlet Allocation (latent = not directly observed; Dirichlet = prior follows a Dirichlet distribution) • Generative Model • Set parameters using EM or MCMC

  39. Topic Models • LDA: Latent Dirichlet Allocation (latent = not directly observed; Dirichlet = prior follows a Dirichlet distribution) • Generative Model • Set parameters using EM or MCMC • LSA: Latent Semantic Analysis • Discriminative Model • Set parameters by factorizing the term-document matrix
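
The "set parameters using MCMC" route for LDA can be sketched with collapsed Gibbs sampling, one common MCMC scheme. The slides do not say which sampler is meant, so this choice is an assumption, as are the symmetric priors `alpha` and `beta`, the toy corpus, and the function name `lda_gibbs`.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on documents given as lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                              # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights, k=1)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Toy corpus (hypothetical): word ids index into vocab.
vocab = ["html", "javascript", "display", "mac", "chrome", "firefox"]
docs = [[0, 1, 2, 0], [3, 4, 5, 3], [0, 2, 1], [4, 5, 3]]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2, vocab_size=len(vocab))

for d, counts in enumerate(doc_topic):
    print(f"doc{d + 1} topic counts:", counts)
```

The returned counts are the latent quantities from the earlier slides: `doc_topic` estimates each document's distribution over topics, and `topic_word` estimates each topic's distribution over words.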

  40. Topic Models • Term-document matrix and its factorization into U, D, V

      Term-document matrix:

              the    US    UK    congress  parliament
      doc1     1      1     1       1          0
      doc2     1      0     1       0          1
      doc3     1      1     0       1          0
      doc4     1      0     1       0          1

      U:

      d1    -0.60  -0.39   0.70   0.00
      d2    -0.48   0.50  -0.12  -0.71
      d3    -0.43  -0.58  -0.69   0.00
      d4    -0.48   0.50  -0.12   0.71

      D:

             3.06   0.00   0.00   0.00   0.00
             0.00   1.81   0.00   0.00   0.00
             0.00   0.00   0.57   0.00   0.00
             0.00   0.00   0.00   0.00   0.00

      V (columns: the, US, UK, congress, parliament):

            -0.65  -0.34  -0.51  -0.34  -0.31
             0.02  -0.54   0.34  -0.54   0.56
            -0.42   0.02   0.79   0.02  -0.44
            -0.63   0.27   0.00   0.37   0.63
            -0.04   0.73   0.00  -0.68   0.04
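
As a sanity check on the factorization, multiplying the three matrices back together should recover the term-document matrix up to rounding. The sketch below uses only the parts of U and V that meet a nonzero singular value. The column order (the, US, UK, congress, parliament) is inferred from the slide layout and is an assumption, as is the helper name `reconstruct`.

```python
# Values read off the slide; only the first three singular values are nonzero.
U = [
    [-0.60, -0.39, 0.70],   # d1
    [-0.48, 0.50, -0.12],   # d2
    [-0.43, -0.58, -0.69],  # d3
    [-0.48, 0.50, -0.12],   # d4
]
singular_values = [3.06, 1.81, 0.57]
Vt = [  # columns assumed to be: the, US, UK, congress, parliament
    [-0.65, -0.34, -0.51, -0.34, -0.31],
    [0.02, -0.54, 0.34, -0.54, 0.56],
    [-0.42, 0.02, 0.79, 0.02, -0.44],
]

term_document = [
    [1, 1, 1, 1, 0],  # doc1
    [1, 0, 1, 0, 1],  # doc2
    [1, 1, 0, 1, 0],  # doc3
    [1, 0, 1, 0, 1],  # doc4
]


def reconstruct(rank):
    """Multiply U, D and V back together using the top `rank` factors."""
    return [
        [
            sum(U[d][k] * singular_values[k] * Vt[k][w] for k in range(rank))
            for w in range(len(Vt[0]))
        ]
        for d in range(len(U))
    ]


full = reconstruct(rank=3)
for row in full:
    print([round(x, 2) for x in row])

# Keeping fewer factors gives a smoothed, lower-dimensional approximation.
low_rank = reconstruct(rank=2)
```

Truncating to the top factors is what turns the factorization into a topic-like representation: each document and each term gets a short vector of factor weights.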
