
Natural Language Processing, Lecture 1 (1/13/2015), CSCI 5832, Susan W. Brown



  1. Natural Language Processing
     Lecture 1, 1/13/2015
     CSCI 5832
     Susan W. Brown

     Natural Language Processing
     We’re going to study what goes into getting computers to perform useful and interesting tasks involving human language.
     Speech and Language Processing - Jurafsky and Martin, 1/14/15

  2. Natural Language Processing
     More specifically, it’s about the structure of human languages, the algorithms that exploit that structure to process language, and the formal basis for those algorithms.

     Why Should You Care?
     Three trends:
     1. An enormous amount of information is now available in machine-readable form as natural language text (newspapers, web pages, medical records, financial filings, etc.).
     2. Conversational agents are becoming an important form of human-computer communication.
     3. Much of human-human interaction is now mediated by computers via social media.

  3. Applications
     • Let’s take a quick look at three important application areas:
       – Text analytics
       – Question answering
       – Machine translation

     Text Analytics
     • Data mining of weblogs, microblogs, discussion forums, message boards, user groups, and other forms of user-generated media:
       – Product marketing information
       – Political opinion tracking
       – Social network analysis
       – Buzz analysis (what’s hot, what topics people are talking about right now)

  4. Text Analytics
     [figure]

     Text Analytics
     [figure]

  5. Question Answering
     • Traditional information retrieval returns documents or other resources that contain what users need to satisfy their information needs.
     • Question answering, on the other hand, directly provides an answer to an information need posed as a question.

     Web Q/A
     [figure]

  6. Watson
     [figure]

     Machine Translation
     The automatic translation of texts between languages is one of the oldest non-numerical applications in computer science. In the past 10 years or so, MT has gone from a niche academic curiosity to a robust commercial industry.

  7. Google Translate
     [figure]

     Google Translate
     [figure]

  8. How?
     All of these applications operate by exploiting underlying regularities inherent in human languages, sometimes in complex ways and sometimes in fairly trivial ones.
     Language structure → Formal models → Practical applications

     Major Class Topics
     1. Words
     2. Syntax
     3. Meaning
     4. Texts
     5. Applications exploiting each

  9. Applications
     First, what makes an application a language processing application (as opposed to any other piece of software)?
     – An application that requires the use of knowledge about the structure of human language
     – Example: Is Unix wc (word count) an example of a language processing application?

     Applications
     • Word count?
       – When it counts words: Yes
         • To count words you need to know what a word is. That’s knowledge of language.
         • Note that the definition of “word” embodied in wc doesn’t work for Chinese or other languages that don’t delimit words with spaces.
       – When it counts lines and bytes: No
         • Lines and bytes are computer artifacts, not linguistic entities.
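As a rough sketch of the wc point, the Python below mimics its line, word, and byte counts, assuming that a "word" is a maximal run of non-whitespace characters (approximately what wc -w does; the function names are hypothetical). Whitespace splitting gives a sensible word count for English but treats an unsegmented Chinese sentence as a single word, while the line and byte counts never needed any knowledge of language at all.

```python
def wc_words(text: str) -> int:
    """Approximate `wc -w`: count maximal runs of non-whitespace characters."""
    return len(text.split())


def wc_lines(text: str) -> int:
    """Approximate `wc -l`: count newline characters (a computer artifact)."""
    return text.count("\n")


def wc_bytes(text: str) -> int:
    """Approximate `wc -c` for UTF-8 text: count encoded bytes."""
    return len(text.encode("utf-8"))


english = "We are going to study human language.\n"
# "I like natural language processing", written without spaces between words.
chinese = "我喜欢自然语言处理。\n"

for label, text in [("English", english), ("Chinese", chinese)]:
    print(label, wc_lines(text), wc_words(text), wc_bytes(text))
```

This reports 7 words for the English line but only 1 for the Chinese line, even though a reader would segment the latter into several words (for example 我 / 喜欢 / 自然 / 语言 / 处理). Getting that right requires a word segmenter, i.e. knowledge of the language.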

  10. Caveat
      NLP has a distinct AI aspect to it:
      – We’re often dealing with ill-defined problems.
      – We don’t often come up with exact solutions or algorithms.
      – That is, we’re dealing with algorithms that don’t work.
      – To make progress we need concrete metrics that tell us how well we’re doing, or at least whether our systems are improving.

      Administrative Stuff
      • Waitlist
      • Web page
        – verbs.colorado.edu/~mpalmer/csci5832/
      • Reasonable preparation
      • Requirements
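One common way to make "how well we’re doing" concrete is to compare a system’s output against hand-labeled gold answers using precision, recall, and F1. The Python below is a minimal sketch of those standard formulas; the document IDs and the two system versions are invented purely for illustration.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple:
    """Score a set of system outputs against a set of gold-standard answers."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1


# Hypothetical retrieval task: which documents are relevant to a query?
gold = {"d1", "d3", "d4"}
system_v1 = {"d1", "d2", "d3", "d5"}
system_v2 = {"d1", "d3", "d4", "d6"}

for name, output in [("v1", system_v1), ("v2", system_v2)]:
    p, r, f = precision_recall_f1(output, gold)
    print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Even when neither version is perfect, a fixed metric on a fixed test set tells us that v2 (F1 about 0.86) improves on v1 (F1 about 0.57).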

  11. Web Page
      The course web page can be found at verbs.colorado.edu/~mpalmer/csci5832/
      It will have the syllabus, lecture notes, assignments, announcements, etc. You should check the News tab periodically for new stuff. I’ll be using this in preference to email.

      Mailing List
      • There is an automatically generated mailing list.
      • Mail goes to your colorado.edu email address.
        – I can’t alter it, so don’t ask me to send your mail to Gmail/Yahoo/work or whatever.
        – You can set up a forward yourself.

  12. Preparation
      • Some exposure to logic
      • Ability to program
      • Basic algorithm and data structure analysis
      • Exposure to basic concepts in probability
      • Familiarity with linguistics
      • Ability to write well in English

      Requirements
      • Readings:
        – Speech and Language Processing by Jurafsky and Martin, 2nd ed., Prentice Hall, 2009
        – A few conference or journal papers
      • 3 programming assignments
      • Problem sets (about 10)
      • 2 midterms
      • Final report and presentation

  13. Programming
      • Most of the programming will be done in Python.
        – It’s free and works on Windows, Macs, and Linux.
        – It’s easy to install.
        – It’s easy to learn.

      Programming
      • Go to www.python.org to get started.
      • The default installation comes with an editor called IDLE. It’s a serviceable development environment.
      • Python mode in Emacs is pretty good. It’s what I use, but I’m a dinosaur.
      • If you like Eclipse, use that.

  14. Grading
      • Programming assignments – 30%
      • Problem sets – 18%
      • Midterms – 28%
      • Final report – 14%
      • Participation – 10%

      Questions?
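The five weights sum to 100%. Assuming the final grade is a straight weighted average of component scores on a 0–100 scale (the slide gives only the weights, so that averaging rule and the student scores below are hypothetical), the arithmetic looks like this:

```python
# Weights from the Grading slide, in percent.
WEIGHTS = {
    "programming_assignments": 30,
    "problem_sets": 18,
    "midterms": 28,
    "final_report": 14,
    "participation": 10,
}
assert sum(WEIGHTS.values()) == 100  # the five components cover the whole grade


def weighted_grade(scores: dict) -> float:
    """Combine component scores (assumed 0-100) using the slide's weights."""
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS) / 100


# Hypothetical student, for illustration only.
example_scores = {
    "programming_assignments": 90,
    "problem_sets": 80,
    "midterms": 85,
    "final_report": 88,
    "participation": 100,
}
print(weighted_grade(example_scores))
```

Here (30·90 + 18·80 + 28·85 + 14·88 + 10·100) / 100 = 8752 / 100 = 87.52.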

  15. Course Material
      • We’ll be intermingling discussions of:
        – Linguistic topics: morphology, syntax, semantics, discourse
        – Formal systems: regular languages, context-free grammars, probabilistic models
        – Applications: question answering, machine translation, information extraction

      Course Material
      • We won’t be doing speech recognition or synthesis.

  16. Topics: Linguistics
      • Word-level processing
      • Syntactic processing
      • Lexical and compositional semantics

      Topics: Techniques
      • Finite-state methods
      • Context-free methods
      • Supervised machine learning methods
      • Probabilistic models

  17. Categories of Knowledge
      • Phonology
      • Morphology
      • Syntax
      • Semantics
      • Pragmatics
      • Discourse
      Each kind of knowledge has associated with it an encapsulated set of processes that make use of it. Interfaces are defined that allow the various levels to communicate. This often leads to a pipeline architecture:
      Morphological Processing → Syntactic Analysis → Semantic Interpretation → Context

      Ambiguity
      • Ambiguity is a fundamental problem in computational linguistics.
      • Hence, resolving, or managing, ambiguity is a recurrent theme.
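The pipeline idea can be sketched in Python as a list of stage functions that share one interface (a dict in, the same dict enriched and returned) and are applied in order. Every stage here is a deliberately trivial, hypothetical placeholder (naive plural stripping, a flat "parse", a three-word subject-verb-object reading); the sketch shows only the architecture, not real morphological, syntactic, or semantic analysis.

```python
from functools import reduce

# Every stage shares one interface: it takes the analysis dict built so far,
# adds its own layer, and returns the dict for the next stage.


def morphological_processing(analysis: dict) -> dict:
    # Placeholder: whitespace tokenization plus naive plural-"s" stripping.
    words = analysis["text"].split()
    analysis["lemmas"] = [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words]
    return analysis


def syntactic_analysis(analysis: dict) -> dict:
    # Placeholder: a flat "parse" that brackets all lemmas under one S node.
    analysis["parse"] = ("S", analysis["lemmas"])
    return analysis


def semantic_interpretation(analysis: dict) -> dict:
    # Placeholder: assume a three-word subject-verb-object sentence.
    _, constituents = analysis["parse"]
    if len(constituents) == 3:
        subj, verb, obj = constituents
        analysis["meaning"] = {"pred": verb, "args": [subj, obj]}
    else:
        analysis["meaning"] = None  # the toy rule covers nothing else
    return analysis


def contextual_interpretation(analysis: dict) -> dict:
    # Placeholder: attach earlier utterances so references could be resolved.
    analysis["context"] = list(analysis.get("history", []))
    return analysis


PIPELINE = [
    morphological_processing,
    syntactic_analysis,
    semantic_interpretation,
    contextual_interpretation,
]


def run_pipeline(text: str, history=None) -> dict:
    """Feed the output of each stage into the next, in order."""
    initial = {"text": text, "history": history or []}
    return reduce(lambda analysis, stage: stage(analysis), PIPELINE, initial)


result = run_pipeline("dogs chase cats")
print(result["lemmas"])
print(result["meaning"])
```

Because each stage depends only on the shared dict, any placeholder could be swapped for a real component without touching the others. The flip side of a strict pipeline is that an early mistake (say, a wrong lemma) propagates to every later stage.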

  18. Ambiguity
      • Find at least 5 meanings of this sentence:
        – I made her duck
      • I cooked waterfowl for her benefit (to eat)
      • I cooked waterfowl belonging to her
      • I created the (ceramic?) duck she owns
      • I caused her to quickly lower her upper body
      • I waved my magic wand and turned her into undifferentiated waterfowl
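Part of this ambiguity is lexical: "her" can be an object pronoun or a possessive, and "duck" can be a noun or a verb, so the possibilities multiply. The Python below is a minimal sketch of that multiplication using a tiny hand-written lexicon with Penn-Treebank-style tags; the lexicon and the single filtering rule are assumptions made for illustration, not part of any real tagger.

```python
from itertools import product

# Toy lexicon (hypothetical): each word maps to the tags it can take.
LEXICON = {
    "I": ["PRP"],            # personal pronoun
    "made": ["VBD"],         # past-tense verb; its sense (cook, create, cause) is still open
    "her": ["PRP", "PRP$"],  # object pronoun or possessive determiner
    "duck": ["NN", "VB"],    # noun (the bird) or verb (to lower one's head)
}


def tag_sequences(words):
    """Every combination of per-word tags that the lexicon allows."""
    return list(product(*(LEXICON[w] for w in words)))


def is_plausible(tags):
    """Hypothetical one-rule filter: a possessive (PRP$) must precede a noun (NN)."""
    return all(not (a == "PRP$" and b != "NN") for a, b in zip(tags, tags[1:]))


sentence = "I made her duck".split()
all_seqs = tag_sequences(sentence)
kept = [tags for tags in all_seqs if is_plausible(tags)]
for tags in kept:
    print(" ".join(f"{w}/{t}" for w, t in zip(sentence, tags)))
```

Two binary choices give four tag sequences, and the toy rule rules out one of them, leaving three. That is still fewer than the five readings listed above: the remaining distinctions come from the different senses of "made" and from different syntactic structures, which is why ambiguity has to be managed at every level of the pipeline.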
