For Thursday • No new reading • Homework: – Chapter 23, exercise 15

Homework Instructions 1. Pick a machine translation system. 2. Write (or find) 5 sentences of varying complexity in English. 3. Pick a language (A). 4. For each sentence from step 2, translate it into language A and back to English. Then run that result through language A and back to English a second time. 5. Pick a second, very different, language (B). 6. Redo step 4 with language B. 7. Turn in all 5 English versions of each sentence (25 “sentences” total), the names of the two languages, and a discussion of the results.
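If the MT system you pick has a programmatic interface, the bookkeeping in steps 4 to 7 can be scripted. This is only a sketch, assuming a hypothetical `translate(text, source_lang, target_lang)` function that wraps whatever system you chose; the language codes and the stand-in translator are placeholders.

```python
from typing import Callable, List

# Hypothetical interface: translate(text, source_lang, target_lang) -> str.
# Wrap whatever machine translation system you picked in step 1.
Translator = Callable[[str, str, str], str]


def round_trips(sentence: str, lang: str, translate: Translator, trips: int = 2) -> List[str]:
    """English text after each English -> lang -> English round trip (step 4)."""
    results = []
    text = sentence
    for _ in range(trips):
        text = translate(translate(text, "en", lang), lang, "en")
        results.append(text)
    return results


def homework_versions(sentences: List[str], lang_a: str, lang_b: str,
                      translate: Translator) -> List[List[str]]:
    """Per sentence: original, A once, A twice, B once, B twice (5 versions)."""
    return [[s, *round_trips(s, lang_a, translate), *round_trips(s, lang_b, translate)]
            for s in sentences]


# Stand-in "translator" that just records each hop, so no MT service is needed.
def fake_translate(text: str, src: str, tgt: str) -> str:
    return f"{text}|{src}>{tgt}"


rows = homework_versions(["s1", "s2", "s3", "s4", "s5"], "de", "ja", fake_translate)
print(sum(len(r) for r in rows))  # 25
```

Swapping `fake_translate` for a real wrapper is the only change needed; the table shape (5 rows of 5 versions) matches step 7.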

Program 5

Syntactic Parsing • Given a string of words, determine if it is grammatical, i.e., if it can be derived from a particular grammar. • The derivation itself may also be of interest. • Normally we want to determine all possible parse trees and then use semantics and pragmatics to eliminate spurious parses and build a semantic representation.

Parsing Complexity • Problem: Many sentences have many parses. • An English sentence with n prepositional phrases at the end has at least 2^n parses. I saw the man on the hill with a telescope on Tuesday in Austin... • The actual number of parses is given by the Catalan numbers: 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796...
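The slide's sequence can be reproduced under a simplified model (our assumption, not the slide's): with k trailing PPs, each PP attaches either to the verb or to some earlier noun, and no two attachments cross. The sketch counts those attachments by brute force and compares them with the closed form C_n = (2n choose n) / (n + 1). With k PPs the count comes out as C_{k+1}, which is never below 2^k.

```python
from itertools import product
from math import comb


def catalan(n: int) -> int:
    """n-th Catalan number: C_n = (2n choose n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def count_pp_attachments(k: int) -> int:
    """Brute-force count of readings for 'V NP PP_1 ... PP_k'.

    Positions: 0 = verb, 1 = object noun, j + 1 = noun inside PP_j.
    PP_j may attach to any earlier head (positions 0..j), and two
    attachments may not cross (no h1 < h2 < d1 < d2).
    """
    count = 0
    head_choices = [range(j + 1) for j in range(1, k + 1)]
    for heads in product(*head_choices):
        arcs = [(h, j + 1) for j, h in enumerate(heads, start=1)]
        crossing = any(h1 < h2 < d1 < d2
                       for h1, d1 in arcs for h2, d2 in arcs)
        if not crossing:
            count += 1
    return count


counts = [count_pp_attachments(k) for k in range(6)]
print(counts)  # [1, 2, 5, 14, 42, 132]
assert counts == [catalan(k + 1) for k in range(6)]
assert all(c >= 2 ** k for k, c in enumerate(counts))
```

Brute force grows factorially, so it is only a check; the closed form gives the count directly.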

Parsing Algorithms • Top Down: Search the space of possible derivations of S (e.g., depth-first) for one that matches the input sentence.
Example: I saw the man.
S -> NP VP
NP -> Det Adj* N
Det -> the (fails)
Det -> a (fails)
Det -> an (fails)
NP -> ProN
ProN -> I
VP -> V NP
V -> hit (fails)
V -> took (fails)
V -> saw
NP -> Det Adj* N
Det -> the
Adj* -> ε
N -> man
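The top-down trace corresponds to a depth-first, backtracking recursive-descent recognizer. A minimal sketch, assuming the grammar consists only of the rules on the slide (so Adj* has only its empty expansion); Python generators supply the backtracking.

```python
# Toy grammar from the slide; "Adj*" only has the empty expansion here.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Adj*", "N"], ["ProN"]],
    "VP": [["V", "NP"]],
    "Adj*": [[]],
    "V": [["hit"], ["took"], ["saw"]],
    "Det": [["the"], ["a"], ["an"]],
    "N": [["man"]],
    "ProN": [["I"]],
}


def expand(symbol, words, i):
    """Yield (tree, j) for each way `symbol` can derive words[i:j], depth-first."""
    if symbol not in GRAMMAR:  # terminal: must match the next input word
        if i < len(words) and words[i] == symbol:
            yield symbol, i + 1
        return
    for rhs in GRAMMAR[symbol]:  # try alternatives in order; failure means backtrack
        for children, j in expand_seq(rhs, words, i):
            yield (symbol, children), j


def expand_seq(symbols, words, i):
    """Yield (trees, j) for each way the sequence `symbols` derives words[i:j]."""
    if not symbols:
        yield [], i
        return
    for tree, j in expand(symbols[0], words, i):
        for rest, k in expand_seq(symbols[1:], words, j):
            yield [tree] + rest, k


def top_down_parse(words, start="S"):
    """All parse trees rooted in `start` that cover the whole input."""
    return [tree for tree, j in expand(start, words, 0) if j == len(words)]


print(top_down_parse("I saw the man".split()))
```

One limitation worth knowing: a left-recursive rule such as NP -> NP PP would make `expand` call itself on the same symbol at the same position without consuming input, so this sketch would recurse until Python's recursion limit.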

Parsing Algorithms (cont.) • Bottom Up: Search upward from words, finding larger and larger phrases until a sentence is found.
Example (rule applied at each step in parentheses):
I saw the man.
ProN saw the man (ProN -> I)
NP saw the man (NP -> ProN)
NP N the man (N -> saw; dead end)
NP V the man (V -> saw)
NP V Det man (Det -> the)
NP V Det Adj* man (Adj* -> ε)
NP V Det Adj* N (N -> man)
NP V NP (NP -> Det Adj* N)
NP VP (VP -> V NP)
S (S -> NP VP)

Bottom-up Parsing Algorithm
function BOTTOM-UP-PARSE(words, grammar) returns a parse tree
    forest ← words
    loop do
        if LENGTH(forest) = 1 and CATEGORY(forest[1]) = START(grammar) then
            return forest[1]
        else
            i ← choose from {1 ... LENGTH(forest)}
            rule ← choose from RULES(grammar)
            n ← LENGTH(RULE-RHS(rule))
            subsequence ← SUBSEQUENCE(forest, i, i + n - 1)
            if MATCH(subsequence, RULE-RHS(rule)) then
                forest[i ... i + n - 1] ← [MAKE-NODE(RULE-LHS(rule), subsequence)]
            else fail
    end
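A runnable sketch of BOTTOM-UP-PARSE under stated assumptions: `choose` is read as nondeterministic choice and implemented as depth-first search with backtracking, and the grammar is the toy grammar from the two trace slides. Because the ε-rule Adj* -> ε matches an empty subsequence anywhere, the sketch caps the forest length at twice the number of words (an arbitrary bound we chose) and never re-expands a category sequence it has already tried.

```python
# Toy grammar from the slides as (lhs, rhs) pairs; "Adj*" -> () is the epsilon rule.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "Adj*", "N")),
    ("NP", ("ProN",)),
    ("VP", ("V", "NP")),
    ("Adj*", ()),
    ("V", ("hit",)), ("V", ("took",)), ("V", ("saw",)),
    ("N", ("saw",)), ("N", ("man",)),
    ("Det", ("the",)), ("Det", ("a",)), ("Det", ("an",)),
    ("ProN", ("I",)),
]


def category(tree):
    """A word is its own category; an interior node is (category, children)."""
    return tree if isinstance(tree, str) else tree[0]


def bottom_up_parse(words, start="S", max_len=None):
    """Depth-first BOTTOM-UP-PARSE: each 'choose' becomes a loop, 'fail' backtracks."""
    # Assumption: cap the forest length so the epsilon rule cannot insert forever.
    limit = max_len if max_len is not None else 2 * len(words)
    # Success depends only on the category sequence, so expanding one twice cannot help.
    visited = set()

    def search(forest):
        cats = tuple(category(t) for t in forest)
        if len(forest) == 1 and cats[0] == start:
            return forest[0]
        if len(forest) > limit or cats in visited:
            return None
        visited.add(cats)
        for i in range(len(forest)):          # i <- choose from {1 ... LENGTH(forest)}
            for lhs, rhs in RULES:            # rule <- choose from RULES(grammar)
                n = len(rhs)
                if cats[i:i + n] == rhs:      # MATCH(subsequence, RULE-RHS(rule))
                    node = (lhs, list(forest[i:i + n]))
                    found = search(forest[:i] + [node] + forest[i + n:])
                    if found is not None:
                        return found
        return None

    return search(list(words))


print(bottom_up_parse("I saw the man".split()))
```

Returning `None` plays the role of `fail`; a real implementation would more likely use a chart than this exhaustive search.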

Chart Parsers

Augmented Grammars • Simple CFGs generally insufficient: “The dogs bites the girl.” • Could deal with this by adding rules. – What’s the problem with that approach? • Could also “augment” the rules: add constraints to the rules that say number and person must match.
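A minimal sketch of the "augment the rules" option, assuming a tiny hypothetical lexicon and only sentences of the shape Det N V Det N. Instead of duplicating rules for every number/person combination (S -> NP-sg VP-sg, S -> NP-pl VP-pl, ...), each word carries the (number, person) values it allows, and S -> NP VP gets one extra constraint: subject and verb must share a value.

```python
# Agreement values as (number, person) pairs; a word lists the values it allows.
THIRD_SG = {("sg", 3)}
NOT_THIRD_SG = {(n, p) for n in ("sg", "pl") for p in (1, 2, 3)} - THIRD_SG

# Hypothetical mini-lexicon: word -> (category, agreement values or None).
LEXICON = {
    "the": ("Det", None),
    "dog": ("N", {("sg", 3)}),
    "dogs": ("N", {("pl", 3)}),
    "girl": ("N", {("sg", 3)}),
    "bites": ("V", THIRD_SG),
    "bite": ("V", NOT_THIRD_SG),
}


def np_agreement(det, noun):
    """NP -> Det N; the NP inherits the noun's agreement values (None if no match)."""
    (det_cat, _), (noun_cat, agr) = LEXICON[det], LEXICON[noun]
    return agr if (det_cat, noun_cat) == ("Det", "N") else None


def grammatical(sentence):
    """S -> NP VP, VP -> V NP, plus the augmentation: subject NP agrees with V."""
    words = sentence.lower().rstrip(".").split()
    if len(words) != 5 or any(w not in LEXICON for w in words):
        return False
    subj = np_agreement(words[0], words[1])
    verb_cat, verb_agr = LEXICON[words[2]]
    obj = np_agreement(words[3], words[4])
    if subj is None or obj is None or verb_cat != "V":
        return False
    return bool(subj & verb_agr)  # the added constraint: a shared (number, person)


print(grammatical("The dogs bites the girl."))  # False
print(grammatical("The dogs bite the girl."))   # True
```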

Verb Subcategorization

Semantics • Need a semantic representation • Need a way to translate a sentence into that representation. • Issues: – Knowledge representation still a somewhat open question – Composition “He kicked the bucket.” – Effect of syntax on semantics

Dealing with Ambiguity • Types: – Lexical – Syntactic ambiguity – Modifier meanings – Figures of speech • Metonymy • Metaphor

Resolving Ambiguity • Use what you know about the world, the current situation, and language to determine the most likely parse, using techniques for uncertain reasoning.

Discourse • More text = more issues • Reference resolution • Ellipsis • Coherence/focus

Survey of Some Natural Language Processing Research

Speech Recognition • Two major approaches – Neural Networks – Hidden Markov Models • A statistical technique • Tries to determine the probability of a certain string of words producing a certain string of sounds • Choose the most probable string of words • Both approaches are “learning” approaches
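Written out, the HMM recipe on this slide is usually framed as a noisy-channel argmax. Here W (a candidate word string) and A (the observed sounds) are our notation, not the slide's. P(A | W) is the "string of words producing a string of sounds" probability the slide describes; P(W), a language-model prior, is the standard extra factor that Bayes' rule brings in. P(A) is the same for every candidate W, so it drops out of the argmax.

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid A)
        \;=\; \arg\max_{W} \frac{P(A \mid W)\, P(W)}{P(A)}
        \;=\; \arg\max_{W} P(A \mid W)\, P(W)
```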

Syntax • Both hand-constructed approaches and data-driven or learning approaches • Multiple levels of processing and goals of processing • Most active area of work in NLP (maybe the easiest because we understand syntax much better than we understand semantics and pragmatics)

POS Tagging • Statistical approaches: based on the probability of sequences of tags and of words having particular tags • Symbolic learning approaches – One of these, the transformation-based learning tagger developed by Eric Brill, is perhaps the best-known tagger • Both kinds of approaches are data-driven
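For the statistical approach, a common model (assumed here; the slide does not name one) is a hidden Markov model: P(tag | previous tag) covers sequences of tags and P(word | tag) covers words having particular tags. The Viterbi algorithm finds the most probable tag sequence by dynamic programming. All probabilities below are invented for illustration.

```python
# Hypothetical toy HMM; the probabilities are made up for illustration.
TAGS = ["PRO", "V", "DET", "N"]
START = {"PRO": 0.5, "DET": 0.4, "N": 0.05, "V": 0.05}
TRANS = {  # P(next tag | previous tag)
    "PRO": {"V": 0.8, "N": 0.1, "DET": 0.05, "PRO": 0.05},
    "V": {"DET": 0.6, "PRO": 0.2, "N": 0.1, "V": 0.1},
    "DET": {"N": 0.9, "V": 0.05, "DET": 0.03, "PRO": 0.02},
    "N": {"V": 0.5, "N": 0.2, "DET": 0.2, "PRO": 0.1},
}
EMIT = {  # P(word | tag); "saw" is ambiguous between V and N
    "PRO": {"I": 0.9},
    "V": {"saw": 0.3},
    "DET": {"the": 0.7},
    "N": {"saw": 0.1, "man": 0.4},
}


def viterbi(words, tags=TAGS, start=START, trans=TRANS, emit=EMIT):
    """Most probable tag sequence (plain products; use log space for long input)."""
    # best[t] = (probability, path) of the best tag path ending in tag t
    best = {t: (start[t] * emit[t].get(words[0], 0.0), [t]) for t in tags}
    for w in words[1:]:
        new_best = {}
        for t in tags:
            prev = max(tags, key=lambda s: best[s][0] * trans[s][t])
            p = best[prev][0] * trans[prev][t] * emit[t].get(w, 0.0)
            new_best[t] = (p, best[prev][1] + [t])
        best = new_best
    return max(best.values(), key=lambda entry: entry[0])[1]


print(viterbi("I saw the man".split()))  # ['PRO', 'V', 'DET', 'N']
```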

Developing Parsers • Hand-crafted grammars • Usually some variation on CFG • Definite Clause Grammars (DCG) – A variation on CFGs that allows extensions like agreement checking – Built-in handling of these in most Prologs • Hand-crafted grammars follow the different types of grammars popular in linguistics • Since linguistics hasn’t produced a perfect grammar, we can’t code one

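Efficient Parsing • Top down and bottom up both have issues: top-down search builds trees that cannot match the words, bottom-up search builds phrases that cannot fit into any sentence, and both redo work after backtracking • Also common is chart parsing – Basic idea is we’re going to locate and store info about every string that matches a grammar rule, so no phrase is built twice • One area of research is producing more efficient parsing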
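CKY is one standard chart-parsing algorithm. This sketch assumes a grammar in Chomsky Normal Form (the small PP-attachment grammar is hypothetical, not from the slides) and stores, for every span of the input, the number of ways each category can cover it. Because each cell is computed once and reused, the work is roughly n^3 times the number of rules, even though the number of parses grows like the Catalan numbers.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky Normal Form (A -> B C, or A -> word).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]
LEXICAL = {
    "i": {"NP"}, "saw": {"V"}, "the": {"Det"}, "a": {"Det"},
    "man": {"N"}, "hill": {"N"}, "telescope": {"N"}, "park": {"N"},
    "on": {"P"}, "with": {"P"}, "in": {"P"},
}


def cky_count(words, start="S"):
    """Number of parse trees for `words` (0 means ungrammatical)."""
    n = len(words)
    # chart[i][j][A] = number of ways category A derives words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat in LEXICAL.get(w.lower(), ()):
            chart[i][i + 1][cat] += 1
    for length in range(2, n + 1):           # shorter spans are always filled first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):         # split point between the two children
                for lhs, left, right in BINARY:
                    ways = chart[i][k][left] * chart[k][j][right]
                    if ways:
                        chart[i][j][lhs] += ways
    return chart[0][n][start]


print(cky_count("I saw the man on the hill with a telescope".split()))  # 5
```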

Data-Driven Parsing • PCFG - Probabilistic Context Free Grammars • Constructed from data • Parse by determining all parses (or many parses) and selecting the most probable • Fairly successful, but requires a LOT of work to create the data
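A PCFG scores a tree as the product of the probabilities of the rules it uses, and the parser keeps the highest-scoring tree. The sketch assumes hypothetical rule probabilities (in practice they would be estimated from annotated data) and scores the two attachments of "I saw the man with a telescope", written out by hand; a real parser would generate the candidates itself, for example with a probabilistic chart parser.

```python
from math import prod

# Hypothetical rule probabilities; for each left-hand side they sum to 1.
PROBS = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("Det", "N")): 0.5,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("I",)): 0.3,
    ("PP", ("P", "NP")): 1.0,
    ("V", ("saw",)): 1.0,
    ("Det", ("the",)): 0.6,
    ("Det", ("a",)): 0.4,
    ("N", ("man",)): 0.5,
    ("N", ("telescope",)): 0.5,
    ("P", ("with",)): 1.0,
}


def rules_of(tree):
    """Yield (lhs, rhs) for every rule used in a tree written as nested tuples."""
    label, *children = tree
    yield label, tuple(c if isinstance(c, str) else c[0] for c in children)
    for child in children:
        if not isinstance(child, str):
            yield from rules_of(child)


def tree_prob(tree, probs=PROBS):
    """PCFG probability of a tree: the product of the probabilities of its rules."""
    return prod(probs[rule] for rule in rules_of(tree))


# The two parses of "I saw the man with a telescope".
the_man = ("NP", ("Det", "the"), ("N", "man"))
pp = ("PP", ("P", "with"), ("NP", ("Det", "a"), ("N", "telescope")))
vp_attach = ("S", ("NP", "I"), ("VP", ("VP", ("V", "saw"), the_man), pp))
np_attach = ("S", ("NP", "I"), ("VP", ("V", "saw"), ("NP", the_man, pp)))

best = max([vp_attach, np_attach], key=tree_prob)
print(best is vp_attach)  # True
```

The two trees share every rule except one (VP -> VP PP at 0.3 versus NP -> NP PP at 0.2), so with these made-up numbers the verb attachment wins.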

Applying Learning to Parsing • Basic problem is the lack of negative examples • Also, mapping a complete string to a parse does not seem to be the right approach • Instead, look at the operations of the parse and learn rules for the operations, not for the complete parse at once

Syntax Demos • http://www2.lingsoft.fi/cgi-bin/engcg • http://nlp.stanford.edu:8080/parser/index.jsp • http://teemapoint.fi/nlpdemo/servlet/ParserServlet • http://www.link.cs.cmu.edu/link/submit-sentence-4.html

Language Identification • http://rali.iro.umontreal.ca/

Semantics • Most work is probably on hand-constructed systems • Some researchers are more interested in developing the semantics than the mappings • Basic question: what constitutes a semantic representation? • The answer may depend on the application

Possible Semantic Representations • Logical representation • Database query • Case grammar

Distinguishing Word Senses • Use context to determine which sense of a word is meant • Probabilistic approaches • Rules • Issues – Obtaining sense-tagged corpora – What senses do we want to distinguish?
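The slide lists probabilistic and rule-based approaches without naming one. As one concrete, well-known option (our choice, not the slide's), the simplified Lesk algorithm picks the sense whose dictionary gloss shares the most words with the context. The sense labels, glosses, and stopword list are made up for illustration; a real system would take glosses from a dictionary.

```python
STOPWORDS = {"a", "an", "the", "and", "of", "at", "on", "in", "to", "i", "we",
             "that", "such", "as", "from"}

# Hypothetical two-sense inventory for "bank", with made-up glosses.
SENSES = {
    "bank/finance": "a financial institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a body of water such as a river",
}


def content_words(text):
    """Lowercased whitespace tokens minus stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def simplified_lesk(context, senses=SENSES):
    """Pick the sense whose gloss shares the most content words with the context."""
    ctx = content_words(context)
    return max(senses, key=lambda sense: len(ctx & content_words(senses[sense])))


print(simplified_lesk("I deposited money at the bank"))  # bank/finance
print(simplified_lesk("we fished from the river bank"))  # bank/river
```

Exact word matching is brittle ("deposited" does not match "deposits"), which is one reason the slide's issues about sense-tagged corpora and probabilistic methods matter.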

Semantic Demos • http://www.cs.utexas.edu/users/ml/geo.html • http://www.ling.gu.se/~lager/Mutbl/demo.html

Information Retrieval • Take a query and a set of documents. • Select the subset of documents (or parts of documents) that match the query • Statistical approaches – Look at things like word frequency • More knowledge-based approaches are interesting, but may not help much
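A minimal sketch of the statistical approach: weight each word by its frequency times its inverse document frequency (tf-idf, our choice of weighting) and rank documents by cosine similarity to the query. The three documents are invented examples; whitespace tokenization and the log(N / df) idf variant are simplifying assumptions.

```python
import math
from collections import Counter


def tfidf(tokens, idf):
    """Weight each term by its raw count times its inverse document frequency."""
    return {w: c * idf.get(w, 0.0) for w, c in Counter(tokens).items()}


def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return document indices, best match first."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(len(docs) / n) for w, n in df.items()}
    q = tfidf(query.lower().split(), idf)
    scores = [cosine(q, tfidf(toks, idf)) for toks in tokenized]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


DOCS = [
    "the parser builds a parse tree",
    "speech recognition with hidden markov models",
    "a chart parser stores partial parses",
]
print(rank("chart parser", DOCS))  # [2, 0, 1]
```

Rare words like "chart" get a high idf and dominate the score, while a word that appears in every document gets idf 0 and is effectively ignored.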

Information Extraction • From a set of documents, extract “interesting” pieces of data • Hand-built systems • Learning pieces of the system • Learning the entire task (for certain versions of the task) • Wrapper Induction
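A hand-built extractor is often just a set of patterns that fill the slots of a template. This sketch assumes a hypothetical seminar-announcement template with `speaker` and `time` slots; the regular expressions are illustrative and would miss many real formats, which is part of why learning the patterns (or wrappers) is attractive.

```python
import re

# Hypothetical hand-written patterns for seminar announcements.
SPEAKER = re.compile(r"Speaker:\s*([A-Z][a-z]+(?: [A-Z][a-z]+)*)")
TIME = re.compile(r"\b(\d{1,2}:\d{2}\s*[ap]m)\b", re.IGNORECASE)


def extract(text):
    """Fill a small template with whatever the patterns find (None if absent)."""
    speaker = SPEAKER.search(text)
    time = TIME.search(text)
    return {
        "speaker": speaker.group(1) if speaker else None,
        "time": time.group(1) if time else None,
    }


print(extract("Speaker: Ada Lovelace\nTalk at 3:30 pm in Room 101"))
# {'speaker': 'Ada Lovelace', 'time': '3:30 pm'}
```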

IE Demos • http://services.gate.ac.uk/annie/
