
Natural Language Processing Zhiyuan Liu THUNLP - PowerPoint PPT Presentation



  1. Natural Language Processing • Zhiyuan Liu, THUNLP, liuzy@tsinghua.edu.cn

  2. What is Natural Language Processing? • Input: 救援队正组织力量接应灾民下山 ("The rescue team is organizing its forces to help the disaster victims down the mountain") • Structure prediction • Output: semantic structure and syntactic structure • The nature of NLP is structure prediction!

  3. Complexity of NLP • The search space of possible syntactic trees of a sentence grows exponentially with sentence length (Church and Patil, 1982): a sentence of n words has $C_{n-1}$ binary-branching trees, where $C_k = \frac{(2k)!}{(k+1)!\,k!}$ is the k-th Catalan number • This resembles the search problem in board games such as chess and Go
    Sentence length → number of binary trees:
    1 → 1; 2 → 1; 3 → 2; 4 → 5; 5 → 14; 6 → 42; 7 → 132; 8 → 429; 9 → 1,430; 10 → 4,862
    11 → 16,796; 12 → 58,786; 13 → 208,012; 14 → 742,900; 15 → 2,674,440; 16 → 9,694,845
    17 → 35,357,670; 18 → 129,644,790; 19 → 477,638,700; 20 → 1,767,263,190
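The counts on this slide are Catalan numbers. They can be reproduced in two ways: from the closed form, or from the split-point recurrence a chart parser implicitly follows when it builds a span out of a left and a right sub-span. The following Python sketch is for illustration only, and the function names are hypothetical.

```python
from math import comb


def binary_tree_count(n_words: int) -> int:
    """Closed form: a sentence of n words has C_{n-1} binary trees."""
    if n_words < 1:
        raise ValueError("sentence length must be at least 1")
    k = n_words - 1
    return comb(2 * k, k) // (k + 1)  # Catalan number C_k


def binary_tree_count_dp(n_words: int) -> int:
    """Same count by dynamic programming over split points."""
    # trees[i] = number of binary trees over a span of i words
    trees = [0] * (n_words + 1)
    trees[1] = 1
    for length in range(2, n_words + 1):
        # Choose how many words go to the left child; the rest go right.
        trees[length] = sum(trees[left] * trees[length - left]
                            for left in range(1, length))
    return trees[n_words]


for n in (1, 2, 3, 10, 16, 20):
    print(n, binary_tree_count(n), binary_tree_count_dp(n))
```

Both functions agree. The last line printed is `20 1767263190 1767263190`, which matches the final entry in the table. The recurrence shows why exhaustive enumeration is hopeless even though the count of trees can be computed cheaply: the number of trees multiplies at every split.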

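  4. Complexity of NLP • Solution: find the optimal structure, regularized by prior syntactic and semantic knowledge • This regularized search is difficult because of: – Variety – Recursion – Ambiguity – …

  5. Complexity of NLP: Variety • Examples:
    亲,看帖要回帖哦! (internet slang: "Dear, remember to reply when you read a post!")
    走召弓虽 (超强, "super strong", written by splitting each character into its components)
    1314 (一生一世, "for a whole lifetime", by similarity of sound)
    菌男霉女 (a pun on 俊男美女 "handsome men and beautiful women", used to mean unattractive men and women)
    屌丝 (self-deprecating slang for an ordinary, unsuccessful person)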

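  6. Complexity of NLP: Recursion
    他觉得 (He thinks)
    他觉得自己丑 (He thinks he is ugly)
    他觉得她认为自己丑 (He thinks she thinks herself ugly)
    他觉得昨天下午的聚会她认为自己丑 (He thinks that at yesterday afternoon's party she thought herself ugly)
    他觉得他记得昨天下午的聚会她认为自己丑 (He thinks he remembers that at yesterday afternoon's party she thought herself ugly)
    他觉得就在刚才他记得昨天下午的聚会她认为自己丑 (He thinks that just now he remembered that at yesterday afternoon's party she thought herself ugly)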
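A simple way to see the recursion in these sentences is a single toy rule, S → Frame S. Here a frame is a clause-taking prefix such as 他觉得 or 她认为. The Python sketch below is a hypothetical illustration, not a grammar from the slides. It applies the rule recursively and returns both the surface string and a bracketed structure, so each added frame adds one level of embedding. Treating 昨天下午的聚会 as part of a frame is a deliberate simplification.

```python
def embed(core: str, frames: list[str]) -> str:
    """Apply the toy rule S -> Frame S recursively, outermost frame first."""
    if not frames:
        return core  # base case: the innermost clause
    return frames[0] + embed(core, frames[1:])


def bracket(core: str, frames: list[str]) -> str:
    """Same recursion, but keep the nesting visible as labeled brackets."""
    if not frames:
        return f"[S {core}]"
    return f"[S {frames[0]} {bracket(core, frames[1:])}]"


frames = ["他觉得", "他记得昨天下午的聚会", "她认为"]
print(embed("自己丑", frames))    # the fifth sentence on the slide
print(bracket("自己丑", frames))  # its nested structure
```

Because the rule calls itself, nothing in the grammar bounds the depth of embedding. Only memory and processing limits do.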

  7. Complexity of NLP: Recursion

  8. Complexity of NLP: Recursion • Noam Chomsky: "We hypothesize that FLN only includes recursion and is the only uniquely human component of the faculty of language." (Hauser, Chomsky and Fitch, 2002; FLN is the faculty of language in the narrow sense)

  9. Complexity of NLP: Ambiguity

  10. Complexity of NLP: Ambiguity

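  11. Complexity of NLP: Ambiguity
    领导:你这是什么意思? (Boss: What do you mean by this?)
    小明:没什么意思。意思意思。 (Xiaoming: Nothing in particular. Just a small token of appreciation.)
    领导:你这就不够意思了。 (Boss: Now that's not very nice of you.)
    小明:小意思,小意思。 (Xiaoming: It's just a little something.)
    领导:你这人真有意思。 (Boss: You really are an interesting person.)
    小明:其实也没有别的意思。 (Xiaoming: Actually, I don't mean anything else by it.)
    领导:那我就不好意思了。 (Boss: Then I'll feel awkward accepting it.)
    小明:是我不好意思。 (Xiaoming: No, I'm the one who should feel awkward.)
    【问:以上对话中的“意思”分别是什么意思?】 (Question: What does each occurrence of 意思 mean in the dialogue above?)

  12. Complexity of NLP: Ambiguity
    1、冬天:能穿多少穿多少; 夏天:能穿多少穿多少。 (Winter: wear as much as you can; summer: wear as little as you can.)
    2、剩女产生的原因有两个:一是谁都看不上;二是谁都看不上。 (There are two reasons women end up "left over": first, they look down on everyone; second, nobody takes a liking to them.)
    3、地铁里听到一个女孩大概是给男朋友打电话:“我已经到西直门了,你快出来往地铁站走。如果你到了,我还没到,你就等着吧。如果我到了,你还没到,你就等着吧。” (Overheard on the subway, a girl apparently phoning her boyfriend: "I'm already at Xizhimen, hurry up and walk to the subway station. If you get there before me, just wait for me. If I get there before you, you'll be sorry.")
    【问:请写出以上语句的区别】 (Question: Explain how the identically worded phrases above differ.)

  13. Complexity of NLP: Ambiguity

  14. Complexity of NLP: Ambiguity (one reading; 对 alternates between "to check against the answer key", "correct", and "yes, right")
    W:小明,那些题你对了吗? (W: Xiaoming, did you check those answers?)
    M:对了,但有些题没有对。 (M: Yes, I checked, but some answers weren't correct.)
    W:看你这样似乎很多题都没有对。 (W: From the look on your face, it seems many of them weren't correct.)
    M:对呀。见那么多题不对,我都不敢继续对下去了。 (M: Right. Seeing so many wrong answers, I didn't dare keep checking.)
    W:这么说,你后面的题都没对了? (W: So you didn't check the rest of them?)
    M:对。 (M: Right.)
    【问:请写出这段对话的意思是什么?】 (Question: What does this dialogue mean?)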

  15. Scientific Impact of NLP • Turing Test: a test of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human

  16. Scientific Impact of NLP • Original version: the Imitation Game

  17. Scientific Impact of NLP • Original version: the Imitation Game

  18. Scientific Impact of NLP • In 2011, IBM Watson's DeepQA system competed on Jeopardy! and took first place • A new AI milestone after Deep Blue defeated the world chess champion in 1997 • Example: Q: Who was presidentially pardoned on September 8, 1974? A: Nixon.

  19. Application Impact of NLP • Nature (2011): natural-language question answering will be the next-generation search engine • Gartner Hype Cycle 2012

  20. Application Impact of NLP • IT giants have launched NLP products: Apple Siri, Skype Translator, Sogou Input, Google Knowledge Graph

  21. Application Impact of NLP • The US government and military fund many NLP research programs:
    Machine Reading: released 2007, started 2008, $67.4 million
    Deep Exploration and Filtering of Text: released 2012, started 2013, $25.0 million

  22. Impact of Chinese NLP • The US government regards Chinese as a key language • Many institutes make Chinese NLP one of their research areas

  23. Impact of Chinese NLP • Chart: "International Evaluation on Chinese NLP Tasks", comparing World Best and China Best performance (0% to 100%) on Chinese word segmentation, Chinese dependency parsing, Chinese semantic role labeling, Chinese semantic dependency parsing, Chinese coreference resolution, Chinese IR4QA, and Chinese-English machine translation

  24. TYPICAL APPLICATIONS OF NLP

  25. Search Engines

  26. Online Advertisement

  27. Content-based Recommendation

  28. Personal Assistant

  29. Machine Translation

  30. Document Summarization

  31. Sentiment Analysis and Opinion Mining

  32. Key-phrase Extraction

  33. Computational Social Sciences • Culturomics: http://www.culturomics.org • Harvard researchers used word frequencies in Google Books (5 million books from 1800 to 2000) to study the evolution of human culture • Google Books Ngram Viewer: https://books.google.com/ngrams

  34. Computational Social Sciences • Evolution of irregular verbs in English
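The culturomics studies rest on a simple measurement: how often a word occurs in each year's books, normalized by that year's total volume. The Python sketch below shows this measurement on a tiny invented corpus. It is not Google Books data, and the function names are hypothetical. Following the irregular-verb example, `regular_share` compares a regular past-tense form (burned) with its irregular rival (burnt).

```python
from collections import Counter

# A tiny invented corpus (NOT Google Books data): year -> tokenized sentences.
toy_corpus = {
    1800: [["the", "candle", "burnt", "out"], ["he", "burnt", "the", "letter"]],
    1900: [["the", "candle", "burned", "out"], ["he", "burnt", "the", "letter"]],
    2000: [["the", "candle", "burned", "out"], ["he", "burned", "the", "letter"]],
}


def yearly_counts(corpus):
    """Count word occurrences separately for each year."""
    return {year: Counter(tok for sent in sents for tok in sent)
            for year, sents in corpus.items()}


def relative_frequency(counts, word):
    """Occurrences of `word` per year divided by that year's total token count."""
    return {year: c[word] / sum(c.values()) for year, c in sorted(counts.items())}


def regular_share(counts, regular, irregular):
    """Fraction of uses taking the regular form, per year (None if neither occurs)."""
    shares = {}
    for year, c in sorted(counts.items()):
        total = c[regular] + c[irregular]
        shares[year] = c[regular] / total if total else None
    return shares


counts = yearly_counts(toy_corpus)
print(relative_frequency(counts, "burnt"))
print(regular_share(counts, "burned", "burnt"))
```

On this toy data the regular share rises from 0.0 in 1800 to 1.0 in 2000. Normalizing by each year's total matters because the number of books published per year changes a great deal over time.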

  35. Computational Social Sciences

  36. Computational Social Sciences

  37. Computational Social Sciences • Famous persons: birth location → death location (data: Winckelmann Corpus, Freebase)

  38. Computational Social Sciences • Using language to study human psychology • Cristian Danescu-Niculescu-Mizil, with Dan Jurafsky, Jure Leskovec, and Christopher Potts: "No Country for Old Members: User Lifecycle and Linguistic Change in Online Communities", WWW 2013, Best Paper Award

  39. TYPICAL TASKS IN NLP

  40. Advances in Natural Language Processing • Julia Hirschberg, Columbia University: AAAI and ACL Fellow • Christopher Manning, Stanford University: ACM, AAAI, and ACL Fellow; more than 50,000 Google Scholar citations

  41. NLP Tasks

  42. Two Drivers of Big-Data NLP • Annotated language resources, e.g., the LDC (Linguistic Data Consortium): founded in 1992, about 700 datasets, covering speech, syntactic, translation, and semantic data

  43. Two Drivers of Big-Data NLP • Public evaluations, e.g., the CoNLL Shared Tasks

  44. Key Factors for NLP Developments • Computation power, linguistic resources, machine learning, and language theories (diagram labels include CPU/GPU, distributed computation, LDC, semantic KB, syntactic, PGM, and DL)

  45. Deep Learning • Learning deep structure from big data • Geoffrey Hinton • Judea Pearl, Turing Award winner

  46. Deep Learning • Deep learning has achieved great success in speech recognition and image annotation • Speech recognition: error rate reduced by more than 30% • Google Brain

  47. Deep Learning • DL has not yet brought comparably large improvements to NLP, but it can avoid the feature engineering that conventional methods require • Brain-inspired methods for language learning

  48. Typical NLP Tasks • Machine Translation • Speech Dialog Systems and Chat-bots • Machine Reading • Sentiment Analysis and Opinion Mining

  49. Machine Translation • Timeline: rule-based (from 1960), statistics-based and phrase-based (from the 1990s), neural-based (2015)

  50. Machine Translation • Use more discourse information to make translations more fluent, e.g., "Feature Weight Optimization for Discourse-Level SMT" (DiscoMT 2013)

  51. Machine Translation • Computer-assisted translation, e.g., "Predictive Translation Memory: A Mixed-Initiative System for Human Language Translation" (2014)

  52. Speech Dialog Systems and Chat-bots • Speech Recognition (ASR) • Dialog Management (DM) • Action • Text-to-Speech Synthesis (TTS)
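The four components listed on this slide form a pipeline:
- ASR turns speech into text.
- Dialog management decides what to do.
- An action is executed.
- TTS speaks the reply.

The Python sketch below wires up that pipeline with hypothetical stub components. Bytes stand in for audio, and the keyword rule and default city are invented for illustration. A real system would replace each stub with an actual recognizer, dialog policy, backend service, and synthesizer.

```python
from dataclasses import dataclass, field

# All four stages are hypothetical stubs: bytes stand in for audio,
# and the keyword rule and default city are invented for illustration.


def asr(audio: bytes) -> str:
    """Speech recognition stub: treat the 'audio' as UTF-8 text."""
    return audio.decode("utf-8")


@dataclass
class DialogManager:
    """Keeps dialog state and chooses the next action (toy keyword policy)."""
    state: dict = field(default_factory=dict)

    def decide(self, text: str) -> tuple[str, dict]:
        if "weather" in text.lower():
            return "get_weather", {"city": self.state.get("city", "Beijing")}
        return "chitchat", {"text": text}


def act(action: str, args: dict) -> str:
    """Execute the chosen action and return a textual reply."""
    if action == "get_weather":
        return f"Looking up the weather in {args['city']}."
    return "Tell me more."


def tts(text: str) -> bytes:
    """Text-to-speech stub: return the reply's bytes instead of audio."""
    return text.encode("utf-8")


def dialog_turn(audio: bytes, dm: DialogManager) -> bytes:
    """One turn of the pipeline: ASR -> DM -> Action -> TTS."""
    text = asr(audio)
    action, args = dm.decide(text)
    reply = act(action, args)
    return tts(reply)


print(dialog_turn(b"What's the weather today?", DialogManager()))
print(dialog_turn(b"Hello", DialogManager(state={"city": "Shanghai"})))
```

Keeping the dialog state inside the manager lets later turns reuse earlier information, such as a city mentioned before, without changing the other stages.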

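  53. Machine Reading • Timeline of knowledge resources: Cyc, WordNet, 知网 (HowNet), and Wikipedia (dates shown: 1985, 1990, 2005-2010)

  54. Machine Reading • Information detection, knowledge linking, information extraction, and knowledge fusion (figure: an example graph of car-fault events such as "Fuel Pump", "Pump Shorts", "Cold weather", "Relay", "Headlight Fails", "Running hot", and "Engine Stalls at low speeds")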

  55. Knowledge Graphs

  56. Construction of Knowledge Graphs

  57. Application of Knowledge Graphs
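A knowledge graph is commonly stored as a set of (head, relation, tail) triples. The Python sketch below is a minimal in-memory version for illustration only. The class, the relation names, and the two sample facts are assumptions, not taken from the slides. It supports pattern queries with wildcards and following a chain of relations, the kind of multi-hop lookup that applications such as question answering rely on.

```python
from collections import defaultdict


class TripleStore:
    """Minimal in-memory knowledge graph of (head, relation, tail) triples."""

    def __init__(self) -> None:
        self.triples: set[tuple[str, str, str]] = set()
        # Index outgoing edges by head entity for fast traversal.
        self.out_edges: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add(self, head: str, relation: str, tail: str) -> None:
        self.triples.add((head, relation, tail))
        self.out_edges[head].add((relation, tail))

    def query(self, head=None, relation=None, tail=None):
        """Return matching triples, sorted; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (head is None or t[0] == head)
            and (relation is None or t[1] == relation)
            and (tail is None or t[2] == tail)
        )

    def follow(self, head: str, *relations: str) -> set[str]:
        """Multi-hop lookup: follow each relation in turn starting from head."""
        frontier = {head}
        for rel in relations:
            frontier = {t for h in frontier
                        for (r, t) in self.out_edges.get(h, ()) if r == rel}
        return frontier


kg = TripleStore()
kg.add("Tsinghua University", "located_in", "Beijing")  # hypothetical relation names
kg.add("Beijing", "capital_of", "China")

print(kg.query(relation="located_in"))
print(kg.follow("Tsinghua University", "located_in", "capital_of"))
```

Construction fills such a store from extracted triples. Applications then query it, for example to answer "Which country's capital is the city where Tsinghua University is located?" by following two relations.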

  58. Sentiment Analysis and Opinion Mining • Infer personal states from text or speech – including opinions, emotions, … • Detect opinion holders and targets
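A common baseline for inferring opinions from text is a lexicon-based polarity scorer, which sums the prior scores of sentiment words. The Python sketch below is a toy illustration, not the method of these slides. The lexicon and negator list are invented, and negation handling is deliberately naive. It also does not attempt the harder step of detecting opinion holders and targets.

```python
# Toy lexicon and negators, invented for illustration only.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}
NEGATORS = {"not", "never", "no"}


def polarity(tokens: list[str]) -> int:
    """Sum lexicon scores; a negator flips the sign of the next sentiment word."""
    score, negate = 0, False
    for tok in (t.lower() for t in tokens):
        if tok in NEGATORS:
            negate = True
        elif tok in LEXICON:
            score += -LEXICON[tok] if negate else LEXICON[tok]
            negate = False
    return score


def label(tokens: list[str]) -> str:
    """Map the summed score to a coarse sentiment label."""
    s = polarity(tokens)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


print(label("I love this great camera".split()))  # positive
print(label("the screen is not good".split()))    # negative
```

Lexicon methods are transparent but brittle: sarcasm, domain-specific words, and long-range negation all break them. This is why statistical and neural classifiers are usually preferred in practice.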

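  59. RECOMMENDED READINGS

  60. NLP Books • 信息检索导论 (Chinese translation of Introduction to Information Retrieval), by Christopher D. Manning, Hinrich Schütze, and Prabhakar Raghavan; translator: 王斌 (Wang Bin); publisher: 人民邮电出版社 (Posts & Telecom Press) • 统计自然语言处理基础 (Chinese translation of Foundations of Statistical Natural Language Processing), by Chris Manning and Hinrich Schütze; translators: 苑春法 (Yuan Chunfa), 李伟 (Li Wei), 李庆中 (Li Qingzhong); publisher: 电子工业出版社 (Publishing House of Electronics Industry); published 2005-01-01; 432 pages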
