
Introduction to the Course



  1. Introduction to the Course
     Prof. Sameer Singh
     CS 295: STATISTICAL NLP WINTER 2017
     January 10, 2017
     Based on slides from Nathan Schneider, Mohit Bansal, Sebastian Riedel, Yejin Choi, and everyone else they copied from.

  2. About Me
     Academic Positions
     • New Assistant Professor at UC Irvine! (2016 -)
     • Postdoc at University of Washington (2013 -)
     • PhD from University of Massachusetts, Amherst (2014)
     Research Interests
     • Natural Language Processing: information extraction, relation extraction, entity linking and disambiguation, joint modeling
     • Machine Learning: interpretable ML, semi-supervised learning, matrix/tensor factorization, probabilistic graphical models
     http://sameersingh.org
     sameer@uci.edu
     CS 295: STATISTICAL NLP (WINTER 2017)

  3. Natural Language Processing
     • Introduction to NLP
     • Course Information
     • Upcoming deadlines

  4. Natural Language Processing
     • Introduction to NLP
     • Course Information
     • Upcoming deadlines

  5. Knowledge Representation
     Unstructured (text), turned by NLP into structured “Knowledge”
     Unstructured
     • Ambiguous
     • Lots and lots of it!
     • Humans can read them, but … very slowly … can’t remember all … can’t answer questions
     Structured
     • Precise, Actionable
     • Specific to the task
     • Computers can use … quickly answer questions … memory is not a problem … don’t get tired

  6. “Deep” understanding

  7. Lots of Existing Applications

  8. But a long long way to go…

  9. Future Applications

  10. Future Applications
     • Law, by reading past cases for you
     • Computational Social Sciences
     • Question Answering (instead of search)
     • Digital Humanities (historical texts)
     • Healthcare, by organizing records
     • Science, by reading papers for you
     • Assistive Technologies (dialog systems)
     • News Summarization

  11. Turing’s test for Artificial Intelligence
     Human or Computer?

  12. Challenges in NLP
     WHY ISN’T NLP SOLVED YET?

  13. Three main challenges
     • Ambiguity
     • Sparsity
     • Variation

  14. Three main challenges
     • Ambiguity
     • Sparsity
     • Variation

  15. Language is Ambiguous
     “One tries to be as informative as one possibly can, and gives as much information as is needed, and no more.”
     - Grice’s Maxim of Quantity
     Corollary: The more you know, the less you need. Computers “know” very little.

  16. Words have many meanings
     Hershey’s Bars Protest

  17. Words have many meanings
     He knows you like your mother.

  18. Attachment Ambiguities
     Stolen painting found by tree.

  19. Attachment Ambiguities
     One morning I shot an elephant in my pajamas. How he got into my pajamas I'll never know.
     - Groucho Marx

  20. Attachment Ambiguities
     She saw the man with the telescope.
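
The telescope sentence has two readings: the prepositional phrase can attach to the verb (she used the telescope) or to the noun (the man has the telescope). A minimal sketch of how a parser sees this, assuming a tiny hand-written grammar and lexicon that are not from the slides: a CKY-style chart counts how many distinct parse trees the grammar licenses.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]

# Hypothetical lexicon: word -> possible categories.
LEXICON = {
    "she": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "with": ["P"],
    "telescope": ["N"],
}


def count_parses(words, root="S"):
    """Count the parse trees rooted in `root` that cover all of `words`."""
    n = len(words)
    chart = defaultdict(int)  # (start, end, label) -> number of trees
    for i, word in enumerate(words):
        for tag in LEXICON[word.lower()]:
            chart[i, i + 1, tag] += 1
    # Fill longer spans from shorter ones.
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[start, end, parent] += (
                        chart[start, mid, left] * chart[mid, end, right]
                    )
    return chart[0, n, root]


if __name__ == "__main__":
    print(count_parses("She saw the man with the telescope".split()))
    print(count_parses("She saw the man".split()))
```

Under this toy grammar the full sentence gets two trees and the shorter one gets a single tree; each extra prepositional phrase would multiply the number of readings further.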

  21. And so on…
     ◦ Enraged Cow Injures Farmer with Ax
     ◦ Ban on Nude Dancing on Governor’s Desk
     ◦ Teacher Strikes Idle Kids
     ◦ Hospitals Are Sued by 7 Foot Doctors
     ◦ Iraqi Head Seeks Arms
     ◦ Kids Make Nutritious Snacks
     ◦ Local HS Dropouts Cut in Half

  22. Coreference Ambiguities
     My girlfriend and I met my lawyer for a drink, but she became ill and had to leave.

  23. Coreference Ambiguities
     The city councilmen refused the demonstrators a permit because they feared violence.
     The city councilmen refused the demonstrators a permit because they advocated violence.
     “Context” is important
     Winograd Schema: An Open Challenge for AI

  24. Coreference Ambiguities

  25. Entity Types and Identities
     Types
     • Washington, Georgia, Clinton, Adams
     • John Deere, Williams, Dow Jones, Thomas Cook
     • Princeton, Amazon, Kingston
     Identities
     • Same Name: Kevin Smith, Jamaica, Springfield
     • Multiple “Names”: President, Obama, Chief, Bambam,…
     “Context” is important

  26. Entity Types and Identities
     Not easy even for humans

  27. Three main challenges
     • Ambiguity
     • Sparsity
     • Variation

  28. Sparsity of Words
     Frequent words: the, of, to, and
     Rare words: cornflakes, mathematicians, fuzziness, jumbling, pseudo-rapporteur, lobby-ridden, perfunctorily, Lycketoft, UNCITRAL, H-0695, policyfor, Commissioneris
     >1/3 occur only once

  29. Sparsity of Words

  30. Rescaling the Axes
     Zipf’s Law
     Regardless of the size of the data, there will be many rare words.
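
For reference, the usual statement of Zipf's law, written with symbols that are not on the slide (f(r) is the frequency of the word at rank r, and C and s are constants fitted to a corpus, with s typically close to 1). Taking logarithms shows why the curve becomes roughly a straight line once both axes are rescaled to log scale:

```latex
f(r) \approx \frac{C}{r^{s}}, \qquad s \approx 1
\quad\Longrightarrow\quad
\log f(r) \approx \log C - s \,\log r
```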

  31. Not unique to English
     In a document in which each character has been chosen randomly from a uniform distribution of all letters (plus a space character), the "words" follow the general trend of Zipf's law. (Try it at home!)
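
One way to try this at home is sketched below, assuming a 26-letter lowercase alphabet plus a space and a fixed random seed (both choices are illustrative, not from the slide). It generates random characters, splits them into "words" at spaces, and reports the frequency at a few ranks along with the fraction of word types that occur only once.

```python
import random
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # 26 letters plus the space character


def random_typing(n_chars, seed=0):
    """Return n_chars characters drawn uniformly at random from ALPHABET."""
    rng = random.Random(seed)
    return "".join(rng.choice(ALPHABET) for _ in range(n_chars))


def rank_frequencies(text):
    """Return word frequencies sorted from most to least frequent."""
    counts = Counter(text.split())
    return sorted(counts.values(), reverse=True)


def hapax_fraction(freqs):
    """Fraction of distinct words that occur exactly once."""
    return sum(1 for f in freqs if f == 1) / len(freqs)


if __name__ == "__main__":
    freqs = rank_frequencies(random_typing(200_000))
    for rank in (1, 10, 100, 1000):
        if rank <= len(freqs):
            print(f"rank {rank:>4}: frequency {freqs[rank - 1]}")
    print(f"distinct words: {len(freqs)}")
    print(f"fraction occurring once: {hapax_fraction(freqs):.2f}")
```

Plotting rank against frequency on log-log axes (for example with a plotting library of your choice) should show the familiar downward trend, with a long tail of "words" that appear only once.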

  32. Three main challenges
     • Ambiguity
     • Sparsity
     • Variation

  33. Many ways to say something
     • She gave the book to Tom vs. She gave Tom the book
     • Some kids popped by vs. A few children visited
     • Is that window still open? vs. Please close the window

  34. Variations in Domains
     • Its vanished trees, the trees that had made way for Gatsby’s house, had once pandered in whispers to the last and greatest of all human dreams; for a transitory enchanted moment man must have held his breath in the presence of this continent, compelled into an aesthetic contemplation he neither understood nor desired, face to face for the last time in history with something commensurate to his capacity for wonder.
     • ikr smh he asked fir yo last name so he can add u on fb lolololtw

  35. Tools & Methods
     HOW CAN WE GET COMPUTERS TO SOLVE THIS PROBLEM?

  36. [Figure: levels of NLP analysis, illustrated on text about Bernie Sanders]
     Corpus: Entity resolution, Entity linking, Relation extraction…
     • Entities: Bernie Sanders, Eli Sanders, Dorothy Sanders, Brooklyn
     • Relations: childOf, spouse, birthplace
     Document: Discourse analysis, Coreference, Sentiment analysis...
     • Mentions: Bernie Sanders... he .. his father .. his mother .. the city .. Mrs. Sanders.. Bernie.. Eli
     Sentence: Dependency Parsing, Part of speech tagging, Named entity recognition…
     • Sanders was born in Brooklyn, to Dorothy and Eli Sanders.
     • Entity types: Person (Sanders), Location (Brooklyn), Person (Dorothy), Person (Eli Sanders)
     • POS tags: NNP VBD VBD IN NNP TO NNP CC NNP NNP

  37. Two Different Approaches
     DIRECTLY USE LINGUISTICS
     • Expensive, time-consuming...
     • … but also, incomplete!
     MACHINE LEARNING!
     • Automatically learn from data!
     • … if the right data exists
     “Every time I fire a linguist, my accuracy goes up.”
     - Frederick Jelinek

  38. Example: Machine Translation
     From https://medium.com/@ageitgey/machine-learning-is-fun-part-5-language-translation-with-deep-learning-and-the-magic-of-sequences-2ace0acca0aa

  39. Example: Machine Translation
     Step 1: Break into Chunks

  40. Example: Machine Translation
     Step 2: Translations for each chunk

  41. Example: Machine Translation
     Step 3: Generate all possible sequences
     • In same order
     • In different order
     Step 4: Find the most human-sounding one
     • I want to go to the prettiest beach.
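
The four steps above can be sketched as a toy program. Everything concrete here is an assumption made for illustration: the Spanish source sentence, the phrase table, and the tiny set of "seen" English bigrams standing in for a real language model. Real statistical translation systems learn these tables from large parallel corpora and score candidates with probabilities rather than counts.

```python
from itertools import permutations, product

# Hypothetical phrase table: source chunk -> candidate English translations.
PHRASE_TABLE = {
    "quiero": ["I want", "I love"],
    "ir a": ["to go to", "to leave for"],
    "la": ["the"],
    "playa": ["beach", "seaside"],
    "más bonita": ["prettiest", "more pretty"],
}

# Hypothetical stand-in for a language model: word pairs "seen" in English text.
SEEN_BIGRAMS = {
    ("<s>", "I"), ("I", "want"), ("want", "to"), ("to", "go"), ("go", "to"),
    ("to", "the"), ("the", "prettiest"), ("prettiest", "beach"), ("beach", "</s>"),
}


def split_into_chunks(sentence, table=PHRASE_TABLE, max_len=3):
    """Step 1: greedily cut the sentence into the longest known chunks."""
    words = sentence.lower().rstrip(".").split()
    chunks, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in table:
                chunks.append(phrase)
                i += n
                break
        else:
            raise KeyError(f"no phrase-table entry covering {words[i]!r}")
    return chunks


def candidate_translations(chunks, table=PHRASE_TABLE):
    """Steps 2 and 3: pick a translation per chunk, then try every ordering."""
    for options in product(*(table[c] for c in chunks)):
        for ordering in permutations(options):
            yield " ".join(ordering)


def fluency(sentence, seen=SEEN_BIGRAMS):
    """Step 4 scoring: count how many adjacent word pairs look familiar."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum((a, b) in seen for a, b in zip(tokens, tokens[1:]))


def translate(sentence):
    """Return the candidate with the highest fluency score."""
    chunks = split_into_chunks(sentence)
    return max(candidate_translations(chunks), key=fluency)


if __name__ == "__main__":
    print(translate("Quiero ir a la playa más bonita."))
```

Enumerating every ordering grows factorially with the number of chunks, which is one reason practical systems restrict reordering and search approximately instead of exhaustively.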

  42. In summary…
     Language to Knowledge
     • Lots of applications…
     • Made a lot of progress, but not done
     It’s quite difficult
     • Varied, sparse, and lots of ambiguities
     • Context really matters
     Machine Learning!
     • With enough data and math, we can do it
     • The future looks really exciting for NLP

  43. Natural Language Processing
     • Introduction to NLP
     • Course Information
     • Upcoming deadlines

  44. Course Logistics
     Meetings
     • Room: ICS 180
     • Tues/Thursday 9:30-10:50
     • No holidays this quarter (Yay!)
     Reader
     • Zhengli Zhao, PhD student
     • Email: zhengliz@uci.edu
     • But, contact us only on Piazza
     Office Hours
     • Room: DBH 4204
     • Tuesdays 1pm - 5pm (by appt only)
     • https://calendly.com/sameersingh/office-hours
     Course webpage: http://sameersingh.org/courses/statnlp/wi17/
