Introduction to CL Session 1: 7/08/2011


  1. Introduction to CL Session 1: 7/08/2011

  2. What is computational linguistics? • Processing natural language text by computers – for practical applications – ... or for linguistic research • Among practical applications: – Sometimes the computer only needs to classify or transform the text – ... but sometimes it needs to “understand” it – Ex: Watson, winner of “Jeopardy!” • CL vs. NLP (natural language processing)

  3. NLP applications • Automatic speech recognition (ASR): speech → text • Machine translation (MT): L1 → L2 • Information retrieval (IR): query + documents → a subset of the documents • Information extraction (IE): document → “database”
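
The IR mapping above (query + documents → a subset of the documents) can be sketched as a minimal Boolean retrieval function. This is only an illustration, not a method from the slides: the function name `retrieve`, the document ids, and the document texts are all hypothetical, and real IR systems rank results rather than requiring every query term to match.

```python
def retrieve(query, documents):
    """Return the ids of documents that contain every query term."""
    terms = set(query.lower().split())
    return [
        doc_id
        for doc_id, text in documents.items()
        # Keep a document only if its word set covers all query terms.
        if terms <= set(text.lower().split())
    ]


# Hypothetical toy collection, invented for this sketch.
DOCS = {
    "d1": "machine translation maps one language to another",
    "d2": "speech recognition turns speech into text",
    "d3": "statistical machine translation needs parallel text",
}

print(retrieve("machine translation", DOCS))
```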

  4. NLP applications (cont) • Question answering (QA): question + documents → answer • Summarization: documents → summary • Natural language generation (NLG): representation → text

  5. Other applications • Call center • Spam filter • Spell checker • Sentiment analysis: product reviews • Bio-NLP: processing clinical data • …

  6. Basic NLP tasks: Shallow processing • Tokenization: – He visited New York in 2003. • Morphological analysis: – visited → visit + -ed • Part-of-speech tagging – He/Pron visited/V New/?? York/N in/Prep 2003/CD • Named-entity tagging – He visited [LOCATION New York] in [YEAR 2003] • Chunking – [NP He] [V visited] [NP New York] in [NP 2003]
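
The first two shallow-processing steps can be sketched on the slide's example sentence. This is a rough sketch under stated assumptions: the regular expression and the suffix rule are simplifications chosen for illustration, and the names `tokenize` and `strip_past_suffix` are hypothetical. Note that the tokenizer leaves "New" and "York" as separate tokens; grouping them is the job of the later named-entity or chunking step.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    # \w+ matches a run of letters or digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


def strip_past_suffix(token):
    """Toy morphological analysis: split off a regular past-tense '-ed' suffix."""
    # The length check is a crude guard against short words such as "red".
    if token.endswith("ed") and len(token) > 4:
        return token[:-2], "-ed"
    return token, None


tokens = tokenize("He visited New York in 2003.")
print(tokens)
print(strip_past_suffix("visited"))
```

A suffix rule this simple mishandles irregular verbs and words such as "hundred", which is one reason real morphological analyzers rely on a lexicon.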

  7. Basic NLP tasks: Deep processing • Parsing – (S (NP (PRON he)) (VP (V visited) ….) • Semantic analysis – Semantic tagging: [AGENT He] visited [DEST New York] … – Meaning: visit (he, New-York) • Discourse – Coreference: “He” refers to “John” – Discourse structure • Dialogue • Generation

  8. Ambiguity • Phonological ambiguity: (ASR) – “too”, “two”, “to” – “ice cream” vs. “I scream” – “ta” in Mandarin: he, she, or it • Morphological ambiguity: (morphological analysis) – unlockable: [[un-lock]-able] vs. [un-[lock-able]] • Syntactic ambiguity: (parsing) – John saw a man with a telescope. – Time flies like an arrow.
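
The syntactic ambiguity of "John saw a man with a telescope" can be made concrete by counting parse trees with a CKY-style chart. The grammar and lexicon below are a hypothetical toy fragment written for this sketch (they are not from the slides), and the function name `count_parses` is likewise an assumption.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # the PP attaches to the verb phrase
    ("NP", "NP", "PP"),  # the PP attaches to the noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "John": ["NP"], "saw": ["V"], "a": ["Det"],
    "man": ["N"], "with": ["P"], "telescope": ["N"],
}


def count_parses(words, root="S"):
    """Count the distinct parse trees rooted in `root` that span all words."""
    n = len(words)
    # chart[(i, j)][label] = number of trees with that label over words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for label in LEXICON.get(word, []):
            chart[(i, i + 1)][label] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY_RULES:
                    chart[(i, j)][parent] += chart[(i, k)][left] * chart[(k, j)][right]
    return chart[(0, n)][root]


print(count_parses("John saw a man with a telescope".split()))
```

Under this grammar the sentence receives two trees, one for each attachment site of "with a telescope", while "John saw a man" receives one.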

  9. Ambiguity (cont) • Lexical ambiguity: (WSD) – Ex: “bank”, “saw”, “run” • Semantic ambiguity: (semantic representation) – Ex: every boy loves his mother – Ex: John and Mary bought a house • Discourse ambiguity: – Susan called Mary. She was sick. (coreference resolution) – It is pretty hot here. (intention resolution) • Machine translation: – “brother”, “cousin”, “uncle”, etc.

  10. Ambiguity resolution • Rule-based or knowledge-based: – Parsing: • I saw a man with a hat • I saw a man with a telescope (in my hand) – WSD: • “bank” – MT: • “brother”, “cousin”, “uncle” • Statistical approach: – Require training data – Build a statistical model – Knowledge and rules can be incorporated into the model as features etc.
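
A knowledge-based approach to WSD for “bank” can be sketched as hand-written cue words per sense, scored by overlap with the sentence. The sense labels, the cue lists, and the function name `disambiguate_bank` are all hypothetical and chosen only for illustration.

```python
# Hypothetical hand-written knowledge: cue words that suggest each sense of "bank".
SENSE_CUES = {
    "financial_institution": {"money", "loan", "deposit", "account", "cash"},
    "river_side": {"river", "water", "fishing", "shore", "mud"},
}


def disambiguate_bank(sentence):
    """Pick the sense whose cue words overlap most with the sentence, or None."""
    context = set(sentence.lower().replace(".", " ").split())
    scores = {sense: len(cues & context) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    # With no cue word present, the rules have nothing to say.
    return best if scores[best] > 0 else None


print(disambiguate_bank("She opened an account at the bank."))
print(disambiguate_bank("They went fishing on the river bank."))
```

The rules are silent whenever no cue word appears, which illustrates why the statistical approach instead learns such evidence from training data.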

  11. Major approaches to NLP • Rule-based approach • Statistical approach – Supervised learning – Semi-supervised learning – Unsupervised learning

  12. Supervised learning algorithms • Hidden Markov Model (HMM) • Decision tree • Decision list • Naïve Bayes • Transformation-based Learning (TBL) • Maximum Entropy (MaxEnt) • Support Vector Machine (SVM) • Conditional Random Field (CRF) • …
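
As one example from this list, a Naïve Bayes classifier with add-one smoothing can be sketched in a few lines. The four training sentences and the labels "finance" and "river" are invented for this sketch, and the function names are hypothetical; a real system would be trained on an annotated corpus.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (list_of_words, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in examples:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def classify(model, words):
    """Return the label with the highest log posterior under add-one smoothing."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in words:
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, invented for this sketch.
TRAIN = [
    ("deposit money in the bank".split(), "finance"),
    ("the bank approved the loan".split(), "finance"),
    ("fishing on the river bank".split(), "river"),
    ("the muddy bank of the river".split(), "river"),
]
model = train_naive_bayes(TRAIN)
print(classify(model, "a loan from the bank".split()))
print(classify(model, "the river bank was muddy".split()))
```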

  13. Data • Raw text: – Monolingual: English/Chinese/Arabic Gigaword – Parallel data: UN data, Europarl • Treebank: – Syntactic treebanks: a set of parse trees – Proposition Bank – Discourse Treebank • Dictionaries • WordNet • FrameNet • …

  14. [Diagram: Applications; tasks Task1, Task2, …, Task_i; ML methods ML1, ML2, …, ML_m; data sets D1, D2, …, D_n]

  15. The role of linguistic knowledge in NLP • An NLP system is language-independent. • Good or bad? – Good: it can be ported to many languages without any changes. – Bad: it cannot take advantage of properties of certain languages. • How to incorporate (linguistic) knowledge in statistical systems? – the design of models – as features – as filters – … → Building a treebank is an effective way.
