Semantic Parsing with Combinatory Categorial Grammars, Yoav Artzi

Abridged: Semantic Parsing with Combinatory Categorial Grammars. Yoav Artzi, University of Washington. Based on the ACL 2013 Tutorial, with Nicholas FitzGerald and Luke Zettlemoyer. Original tutorial slides available at


  1. CCG Categories. Example: ADJ : λx.fun(x) • Basic building block • Capture syntactic and semantic information jointly

  2. CCG Categories. In ADJ : λx.fun(x), ADJ is the syntax and λx.fun(x) is the semantics • Basic building block • Capture syntactic and semantic information jointly

  3. CCG Categories: Syntax. Examples: ADJ : λx.fun(x); (S\NP)/ADJ : λf.λx.f(x); NP : CCG • Primitive symbols: N, S, NP, ADJ and PP • Syntactic combination operators (/, \) • Slashes specify argument order and direction

  4. CCG Categories: Semantics. Examples: ADJ : λx.fun(x); (S\NP)/ADJ : λf.λx.f(x); NP : CCG • λ-calculus expression • Syntactic type maps to semantic type

  5. CCG Lexical Entries. fun ⊢ ADJ : λx.fun(x) • Pair words and phrases with meaning • Meaning captured by a CCG category

  6. CCG Lexical Entries. fun ⊢ ADJ : λx.fun(x), where "fun" is the natural language side and ADJ : λx.fun(x) is the CCG category • Pair words and phrases with meaning • Meaning captured by a CCG category

  7. CCG Lexicons. fun ⊢ ADJ : λx.fun(x); is ⊢ (S\NP)/ADJ : λf.λx.f(x); CCG ⊢ NP : CCG • Pair words and phrases with meaning • Meaning captured by a CCG category

  8. Between CCGs and CFGs

                              CFGs            CCGs
     Combination operations   Many            Few
     Parse tree nodes         Non-terminals   Categories
     Syntactic symbols        Few dozen       Handful, but can combine
     Paired with words        POS tags        Categories

  9. Parsing with CCGs. Use lexicon to match words and phrases with their categories. Derivation for "CCG is fun":
     CCG ⊢ NP : CCG
     is ⊢ (S\NP)/ADJ : λf.λx.f(x)
     fun ⊢ ADJ : λx.fun(x)
     (>) is fun ⇒ S\NP : λx.fun(x)
     (<) CCG is fun ⇒ S : fun(CCG)

  10. CCG Operations • Small set of operators • Input: 1-2 CCG categories • Output: a single CCG category • Operate on syntax and semantics together • Mirror natural logic operations

  11. CCG Operations: Application. B : g  A\B : f ⇒ A : f(g) (<) and A/B : f  B : g ⇒ A : f(g) (>) • Equivalent to function application • Two directions, forward and backward, determined by slash direction

  12. CCG Operations: Application. In both rules, B : g is the argument, the slashed category (A\B : f or A/B : f) is the function, and A : f(g) is the result. B : g  A\B : f ⇒ A : f(g) (<) and A/B : f  B : g ⇒ A : f(g) (>) • Equivalent to function application • Two directions, forward and backward, determined by slash direction
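
The two application rules can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's implementation: the `Slash` and `Category` classes, the function names, and the choice to build logical forms as nested tuples are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass(frozen=True)
class Slash:
    """Complex syntactic type: result/arg (forward) or result\\arg (backward)."""
    result: "Syntax"
    direction: str  # "/" or "\\"
    arg: "Syntax"


Syntax = Union[str, Slash]


@dataclass(frozen=True)
class Category:
    """A CCG category: syntax paired with semantics (hypothetical helper)."""
    syntax: Syntax
    semantics: Any


def forward_apply(left: Category, right: Category) -> Optional[Category]:
    """A/B : f   B : g  =>  A : f(g)   (>)"""
    s = left.syntax
    if isinstance(s, Slash) and s.direction == "/" and s.arg == right.syntax:
        return Category(s.result, left.semantics(right.semantics))
    return None


def backward_apply(left: Category, right: Category) -> Optional[Category]:
    """B : g   A\\B : f  =>  A : f(g)   (<)"""
    s = right.syntax
    if isinstance(s, Slash) and s.direction == "\\" and s.arg == left.syntax:
        return Category(s.result, right.semantics(left.semantics))
    return None


# Lexicon from the slides; logical forms are built as nested tuples.
ccg = Category("NP", "CCG")
is_ = Category(Slash(Slash("S", "\\", "NP"), "/", "ADJ"), lambda f: lambda x: f(x))
fun = Category("ADJ", lambda x: ("fun", x))

verb_phrase = forward_apply(is_, fun)        # S\NP : λx.fun(x)
sentence = backward_apply(ccg, verb_phrase)  # S : fun(CCG)
print(sentence.syntax, sentence.semantics)
```

Each rule checks the slash direction and the argument type before applying the function, and returns `None` when the categories do not fit, which is how a parser would reject an invalid combination.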

  14. Parsing with CCGs (derivation of "CCG is fun" from slide 9). Combine categories using operators: A/B : f  B : g ⇒ A : f(g) (>)

  15. Parsing with CCGs (derivation of "CCG is fun" from slide 9). Combine categories using operators: B : g  A\B : f ⇒ A : f(g) (<)

  16. Parsing with CCGs. Example phrase: "square blue or round yellow pillow" • Composed adjectives • Non-standard coordination

  17. CCG Operations: Composition. A/B : f  B/C : g ⇒ A/C : λx.f(g(x)) (>B) and B\C : g  A\B : f ⇒ A\C : λx.f(g(x)) (<B) • Equivalent to function composition* • Two directions: forward and backward. * Formal definition of logical composition in supplementary slides

  18. CCG Operations: Composition. The two inputs carry f and g, and the output carries f ∘ g. A/B : f  B/C : g ⇒ A/C : λx.f(g(x)) (>B) and B\C : g  A\B : f ⇒ A\C : λx.f(g(x)) (<B) • Equivalent to function composition* • Two directions: forward and backward. * Formal definition of logical composition in supplementary slides
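
Forward composition (>B) can be illustrated with a small Python sketch. The `Fwd` tuple, the atomic type names "A", "B", "C", and the integer-valued toy semantics are assumptions chosen only to make the rule concrete; they are not part of the tutorial.

```python
from typing import Callable, NamedTuple, Optional


class Fwd(NamedTuple):
    """A forward-slash category result/arg with its semantics (hypothetical helper)."""
    result: str
    arg: str
    sem: Callable


def forward_compose(left: Fwd, right: Fwd) -> Optional[Fwd]:
    """A/B : f   B/C : g  =>  A/C : λx.f(g(x))   (>B)"""
    if left.arg != right.result:
        return None  # the inner type B must match on both sides
    f, g = left.sem, right.sem
    return Fwd(left.result, right.arg, lambda x: f(g(x)))


# Toy semantics over integers, just to observe the order of composition.
double = Fwd("A", "B", lambda n: 2 * n)  # A/B : f
inc = Fwd("B", "C", lambda n: n + 1)     # B/C : g
composed = forward_compose(double, inc)  # A/C : λx.f(g(x))
print(composed.result, composed.arg, composed.sem(3))
```

The backward rule (<B) is the mirror image: the function on the right consumes the result of the function on the left, and the same type check applies to the shared type B.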

  19. CCG Operations: Type Shifting. ADJ : λx.g(x) ⇒ N/N : λf.λx.f(x) ∧ g(x); PP : λx.g(x) ⇒ N\N : λf.λx.f(x) ∧ g(x); AP : λe.g(e) ⇒ S\S : λf.λe.f(e) ∧ g(e); AP : λe.g(e) ⇒ S/S : λf.λe.f(e) ∧ g(e) • Category-specific unary operations • Modify category type to take an argument • Helps in keeping a compact lexicon

  20. CCG Operations: Type Shifting. Each rule maps an input category (left of ⇒) to an output category (right of ⇒). ADJ : λx.g(x) ⇒ N/N : λf.λx.f(x) ∧ g(x); PP : λx.g(x) ⇒ N\N : λf.λx.f(x) ∧ g(x); AP : λe.g(e) ⇒ S\S : λf.λe.f(e) ∧ g(e); AP : λe.g(e) ⇒ S/S : λf.λe.f(e) ∧ g(e) • Category-specific unary operations • Modify category type to take an argument • Helps in keeping a compact lexicon

  21. CCG Operations: Type Shifting. Each rule maps an input category (left of ⇒) to an output category (right of ⇒). ADJ : λx.g(x) ⇒ N/N : λf.λx.f(x) ∧ g(x); PP : λx.g(x) ⇒ N\N : λf.λx.f(x) ∧ g(x); AP : λe.g(e) ⇒ S\S : λf.λe.f(e) ∧ g(e); AP : λe.g(e) ⇒ S/S : λf.λe.f(e) ∧ g(e) (Topicalization) • Category-specific unary operations • Modify category type to take an argument • Helps in keeping a compact lexicon

  22. CCG Operations: Coordination. and ⊢ C : conj; or ⊢ C : disj • Coordination is special cased: specific rules perform coordination, and coordinating operators are marked with special lexical entries

  23. Parsing with CCGs. Derivation for "square blue or round yellow pillow":
      square ⊢ ADJ : λx.square(x)
      blue ⊢ ADJ : λx.blue(x)
      or ⊢ C : disj
      round ⊢ ADJ : λx.round(x)
      yellow ⊢ ADJ : λx.yellow(x)
      pillow ⊢ N : λx.pillow(x)
      (type shift) square ⇒ N/N : λf.λx.f(x) ∧ square(x)
      (type shift) blue ⇒ N/N : λf.λx.f(x) ∧ blue(x)
      (type shift) round ⇒ N/N : λf.λx.f(x) ∧ round(x)
      (type shift) yellow ⇒ N/N : λf.λx.f(x) ∧ yellow(x)
      (>B) square blue ⇒ N/N : λf.λx.f(x) ∧ square(x) ∧ blue(x)
      (>B) round yellow ⇒ N/N : λf.λx.f(x) ∧ round(x) ∧ yellow(x)
      (<Φ>) square blue or round yellow ⇒ N/N : λf.λx.f(x) ∧ ((square(x) ∧ blue(x)) ∨ (round(x) ∧ yellow(x)))
      (>) square blue or round yellow pillow ⇒ N : λx.pillow(x) ∧ ((square(x) ∧ blue(x)) ∨ (round(x) ∧ yellow(x)))

  24. Parsing with CCGs (same derivation as slide 23). Use lexicon to match words and phrases with their categories

  25. Parsing with CCGs (same derivation as slide 23). Shift adjectives to combine: ADJ : λx.g(x) ⇒ N/N : λf.λx.f(x) ∧ g(x)

  27. Parsing with CCGs (same derivation as slide 23). Compose pairs of adjectives: A/B : f  B/C : g ⇒ A/C : λx.f(g(x)) (>B)

  28. Parsing with CCGs (same derivation as slide 23). Coordinate composed adjectives

  29. Parsing with CCGs (same derivation as slide 23). Apply coordinated adjectives to noun: A/B : f  B : g ⇒ A : f(g) (>)
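
The semantic side of the "square blue or round yellow pillow" derivation can be traced with Python closures. This is a sketch under assumptions: objects are modeled as dicts with hypothetical `kind`, `shape` and `color` keys, and the coordination step is written in the logically equivalent form (f ∧ A) ∨ (f ∧ B) rather than f ∧ (A ∨ B).

```python
def shift(g):
    """Type shifting: ADJ : λx.g(x)  =>  N/N : λf.λx.f(x) ∧ g(x)"""
    return lambda f: lambda x: f(x) and g(x)


def compose(f, g):
    """Forward composition (>B): λx.f(g(x))"""
    return lambda x: f(g(x))


def coordinate_disj(p, q):
    """Disjunctive coordination of two N/N modifiers.

    (f ∧ A) ∨ (f ∧ B) is logically equivalent to f ∧ (A ∨ B).
    """
    return lambda f: lambda x: p(f)(x) or q(f)(x)


# Hypothetical predicates over dict-shaped objects.
square = lambda x: x["shape"] == "square"
round_ = lambda x: x["shape"] == "round"
blue = lambda x: x["color"] == "blue"
yellow = lambda x: x["color"] == "yellow"
pillow = lambda x: x["kind"] == "pillow"

square_blue = compose(shift(square), shift(blue))      # (>B)
round_yellow = compose(shift(round_), shift(yellow))   # (>B)
modifier = coordinate_disj(square_blue, round_yellow)  # (<Φ>)
meaning = modifier(pillow)                             # (>)

objects = [
    {"id": "a", "kind": "pillow", "shape": "square", "color": "blue"},
    {"id": "b", "kind": "pillow", "shape": "round", "color": "yellow"},
    {"id": "c", "kind": "pillow", "shape": "square", "color": "yellow"},
    {"id": "d", "kind": "chair", "shape": "round", "color": "yellow"},
]
matches = [o["id"] for o in objects if meaning(o)]
print(matches)
```

Object `c` is rejected because it mixes one adjective from each conjunct, which is exactly the reading the coordinated logical form rules out.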

  30. Parsing with CCGs. In the derivation of "CCG is fun" (slide 9), x labels the sentence, y the derivation, and z the resulting logical form, S : fun(CCG). Lexical ambiguity + many parsing decisions: many potential trees and LFs

  31. Weighted Linear CCGs • Given a weighted linear model: CCG lexicon Λ; feature function f : X × Y → ℝ^m; weights w ∈ ℝ^m • The best parse is y* = argmax_y w · f(x, y) • We consider all possible parses y for sentence x given the lexicon Λ
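
The argmax over parses can be sketched as follows. The sparse feature function (counts of lexical entries used in a parse), the candidate parses, the `to:N/NP` entry and the weight values are all hypothetical; a real parser would enumerate parses from the lexicon Λ rather than take them as a list.

```python
from collections import Counter


def features(x, y):
    """Hypothetical f(x, y): counts of the lexical entries used in parse y."""
    return Counter(y)


def score(w, feats):
    """Dot product w · f(x, y) over sparse features."""
    return sum(w.get(name, 0.0) * value for name, value in feats.items())


def best_parse(x, parses, w):
    """y* = argmax_y w · f(x, y), over an explicit list of candidate parses."""
    return max(parses, key=lambda y: score(w, features(x, y)))


x = "flights to Boston"
parses = [
    ("flights:N", "to:PP/NP", "Boston:NP"),
    ("flights:N", "to:N/NP", "Boston:NP"),  # made-up competing entry for "to"
]
w = {"to:PP/NP": 1.0, "to:N/NP": -0.5, "flights:N": 0.2}
y_star = best_parse(x, parses, w)
print(y_star)
```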

  32. Parsing Algorithms • Syntax-only CCG parsing has polynomial-time CKY-style algorithms • Parsing with semantics requires the entire category as chart signature, e.g., ADJ : λx.fun(x) • In practice, prune to top-N for each span: approximate, but polynomial time
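
A CKY-style chart with top-N pruning per span might look like the sketch below. It is deliberately simplified: categories are plain strings (syntax only), the binary rules come from a hypothetical lookup table instead of real combinators, and the lexical scores are made up. With semantics, each chart item would carry the full category as its signature.

```python
import heapq

# Hypothetical rule table standing in for the CCG combinators.
RULES = {
    ("(S\\NP)/ADJ", "ADJ"): ["S\\NP"],
    ("NP", "S\\NP"): ["S"],
}


def combine(left, right):
    """Return every category derivable from two adjacent categories."""
    return RULES.get((left, right), [])


def cky_beam(words, lexicon, beam=2):
    """Fill a CKY chart, keeping only the top `beam` (score, category) items per span."""
    n = len(words)
    chart = {}
    for i, word in enumerate(words):
        chart[i, i + 1] = heapq.nlargest(beam, lexicon.get(word, []), key=lambda it: it[0])
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            cell = []
            for j in range(i + 1, k):  # split point
                for left_score, left_cat in chart[i, j]:
                    for right_score, right_cat in chart[j, k]:
                        for cat in combine(left_cat, right_cat):
                            cell.append((left_score + right_score, cat))
            chart[i, k] = heapq.nlargest(beam, cell, key=lambda it: it[0])
    return chart[0, n]


lexicon = {
    "CCG": [(0.0, "NP")],
    "is": [(0.0, "(S\\NP)/ADJ")],
    "fun": [(1.0, "ADJ"), (0.5, "N")],  # ambiguous entry; scores are made up
}
result = cky_beam(["CCG", "is", "fun"], lexicon, beam=1)
print(result)
```

Pruning makes the search approximate: an item dropped from a small span can never contribute to a larger one, even if it would have led to the best full parse.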

  33. More on CCGs • Generalized type-raising operations • Cross composition operations for cross-serial dependencies • Compositional approaches to English intonation • and a lot more ... even Jazz [Steedman 1996; 2000; 2011; Granroth and Steedman 2012]

  34. Parsing, Learning, Modeling • Lambda calculus • Parsing with Combinatory Categorial Grammars • Linear CCGs • Factored lexicons Online

  35. Learning. Data → Learning Algorithm → CCG • What kind of data/supervision can we use? • What do we need to learn?

  36. Parsing as Structure Prediction. Derivation for "show me flights to Boston":
      show me ⊢ S/N : λf.f
      flights ⊢ N : λx.flight(x)
      to ⊢ PP/NP : λy.λx.to(x, y)
      Boston ⊢ NP : BOSTON
      (>) to Boston ⇒ PP : λx.to(x, BOSTON)
      (type shift) to Boston ⇒ N\N : λf.λx.f(x) ∧ to(x, BOSTON)
      (<) flights to Boston ⇒ N : λx.flight(x) ∧ to(x, BOSTON)
      (>) show me flights to Boston ⇒ S : λx.flight(x) ∧ to(x, BOSTON)

  37. Learning CCG (same derivation as slide 36). Labeled components: Lexicon • w • Combinators (predefined)

  38. Supervised Data (same derivation as slide 36)

  39. Supervised Data (same derivation as slide 36). The derivation is marked "Latent"

  40. Supervised Data. Supervised learning is done from pairs of sentences and logical forms [Zettlemoyer and Collins 2005; 2007]:
      "Show me flights to Boston" ↦ λx.flight(x) ∧ to(x, BOSTON)
      "I need a flight from baltimore to seattle" ↦ λx.flight(x) ∧ from(x, BALTIMORE) ∧ to(x, SEATTLE)
      "what ground transportation is available in san francisco" ↦ λx.ground_transport(x) ∧ to_city(x, SF)

  41. Weak Supervision • Logical form is latent • "Labeling" requires less expertise • Labels don't uniquely determine correct logical forms • Learning requires executing logical forms within a system and evaluating the result

  42. Weak Supervision: Learning from Query Answers. Question: "What is the largest state that borders Texas?" Answer: New Mexico [Clarke et al. 2010; Liang et al. 2011]

  43. Weak Supervision: Learning from Query Answers. Question: "What is the largest state that borders Texas?" Answer: New Mexico. Candidate logical forms: argmax(λx.state(x) ∧ border(x, TX), λy.size(y)) and argmax(λx.river(x) ∧ in(x, TX), λy.size(y)) [Clarke et al. 2010; Liang et al. 2011]

  44. Weak Supervision: Learning from Query Answers. Question: "What is the largest state that borders Texas?" Answer: New Mexico. Candidate logical forms and their execution results: argmax(λx.state(x) ∧ border(x, TX), λy.size(y)) ⇒ New Mexico; argmax(λx.river(x) ∧ in(x, TX), λy.size(y)) ⇒ Rio Grande [Clarke et al. 2010; Liang et al. 2011]
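
The idea of executing candidate logical forms and checking them against the labeled answer can be sketched with a toy knowledge base. Everything in `ENTITIES` is made up for illustration (the sizes are arbitrary ranks, not real measurements), and the candidate names `state_lf` and `river_lf` are hypothetical labels for the two logical forms on the slide.

```python
# Toy knowledge base; sizes are arbitrary ranks, not real data.
ENTITIES = [
    {"name": "New Mexico", "type": "state", "size": 3, "borders": {"TX"}, "in": set()},
    {"name": "Oklahoma", "type": "state", "size": 2, "borders": {"TX"}, "in": set()},
    {"name": "Rio Grande", "type": "river", "size": 5, "borders": set(), "in": {"TX"}},
    {"name": "Pecos", "type": "river", "size": 1, "borders": set(), "in": {"TX"}},
]


def argmax(pred, key):
    """argmax(λx.pred(x), λy.key(y)) over the toy knowledge base."""
    return max((e for e in ENTITIES if pred(e)), key=key)


# The two candidate logical forms, wrapped as executable thunks.
CANDIDATES = {
    "state_lf": lambda: argmax(
        lambda x: x["type"] == "state" and "TX" in x["borders"],
        lambda y: y["size"],
    ),
    "river_lf": lambda: argmax(
        lambda x: x["type"] == "river" and "TX" in x["in"],
        lambda y: y["size"],
    ),
}


def consistent_with(answer):
    """Names of candidate logical forms whose execution result matches the answer."""
    return [name for name, lf in CANDIDATES.items() if lf()["name"] == answer]


print(consistent_with("New Mexico"))
```

Only candidates whose execution matches the answer can serve as positive signal, and several different logical forms may match, which is why the label does not uniquely determine the correct one.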

  46. Weak Supervision: Learning from Demonstrations. Instruction: "at the chair, move forward three steps past the sofa" [Chen and Mooney 2011; Kim and Mooney 2012; Artzi and Zettlemoyer 2013b]

  47. Weak Supervision: Learning from Demonstrations. Instruction: "at the chair, move forward three steps past the sofa". Some examples from other domains: • Sentences and labeled game states [Goldwasser and Roth 2011] • Sentences and sets of physical objects [Matuszek et al. 2012] [Chen and Mooney 2011; Kim and Mooney 2012; Artzi and Zettlemoyer 2013b]

  48. Weak Supervision: Learning from Conversation Logs [Artzi and Zettlemoyer 2011]
      SYSTEM: how can I help you? (OPEN_TASK)
      USER: i'd like to fly to new york
      SYSTEM: flying to new york. (CONFIRM: from(fl, ATL)) leaving what city? (ASK: λx.from(fl, x))
      USER: from boston on june seven with american airlines
      SYSTEM: flying to new york. (CONFIRM: to(fl, NYC)) what date would you like to depart boston? (ASK: λx.date(fl, x) ∧ to(fl, BOS))
      USER: june seventh
      [CONVERSATION CONTINUES]

  49. Parsing, Learning, Modeling • Structured perceptron • A unified learning algorithm • Supervised learning • Weak supervision Online

  50. Structured Perceptron • Simple additive updates: only requires efficient decoding (argmax); closely related to MaxEnt and other feature-rich models; provably finds a linear separator in a finite number of updates, if one exists • Challenge: learning with hidden variables

  51. Structured Perceptron • Simple additive updates: only requires efficient decoding (argmax); closely related to MaxEnt and other feature-rich models; provably finds a linear separator in a finite number of updates, if one exists • Challenge: learning with hidden variables. Derivations in the complete tutorial
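
The additive update can be sketched as a standard structured perceptron: decode with the current weights and, on a mistake, add the gold features and subtract the predicted ones. The indicator feature function, the candidate categories and the single training example are hypothetical, and the full derivations are in the complete tutorial rather than here.

```python
from collections import Counter


def features(x, y):
    """Hypothetical f(x, y): a single indicator pairing input x with output y."""
    return Counter({f"{x}:{y}": 1})


def score(w, feats):
    return sum(w[name] * value for name, value in feats.items())


def predict(w, x, candidates):
    """Decoding step: argmax over an explicit candidate list."""
    return max(candidates, key=lambda y: score(w, features(x, y)))


def train(examples, epochs=3):
    """Structured perceptron: w += f(x, gold) - f(x, predicted) on each mistake."""
    w = Counter()
    for _ in range(epochs):
        for x, candidates, gold in examples:
            pred = predict(w, x, candidates)
            if pred != gold:
                w.update(features(x, gold))
                w.subtract(features(x, pred))
    return w


# One toy example: choose the category for the word "to".
examples = [("to", ["N/NP", "PP/NP"], "PP/NP")]
w = train(examples)
print(predict(w, "to", ["N/NP", "PP/NP"]), dict(w))
```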

  52. Hidden Variable Perceptron • No known convergence guarantees (log-linear version is non-convex) • Simple and easy to implement (works well with careful initialization) • Modifications for semantic parsing: lots of different hidden information; can add a margin constraint, do a probabilistic version, etc.
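
One common way to handle the hidden derivation is to update toward the highest-scoring derivation that yields the correct logical form. The sketch below assumes that formulation; the slide does not spell out the update, so this should not be read as the tutorial's exact algorithm. The candidate derivations, their lexical-entry features and the `wrong_lf` placeholder are made up.

```python
from collections import Counter


def features(x, y):
    """Hypothetical features: one count per lexical entry or rule in derivation y."""
    return Counter(y)


def score(w, x, y):
    return sum(w[name] * value for name, value in features(x, y).items())


def latent_update(w, x, z_gold, candidates):
    """One hidden-variable perceptron step.

    candidates is a list of (derivation y, logical form z) pairs for sentence x.
    Returns True if the weights were changed.
    """
    y_pred, z_pred = max(candidates, key=lambda c: score(w, x, c[0]))
    if z_pred == z_gold:
        return False  # current best derivation already yields the right meaning
    correct = [c for c in candidates if c[1] == z_gold]
    if not correct:
        return False  # no derivation reaches the gold logical form
    y_good, _ = max(correct, key=lambda c: score(w, x, c[0]))
    w.update(features(x, y_good))
    w.subtract(features(x, y_pred))
    return True


x = "flights to Boston"
z_gold = "λx.flight(x) ∧ to(x, BOSTON)"
candidates = [
    (("to:N/NP",), "wrong_lf"),
    (("to:PP/NP", "shift:PP"), z_gold),
    (("to:(N\\N)/NP",), z_gold),  # a second derivation with the same meaning
]
w = Counter()
updated_first = latent_update(w, x, z_gold, candidates)
updated_second = latent_update(w, x, z_gold, candidates)
print(updated_first, updated_second, dict(w))
```

Because the target derivation is itself chosen by the current weights, the outcome depends on initialization, which fits the slide's remark that careful initialization matters.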

  53. Unified Learning Algorithm • Handle various learning signals • Estimate parsing parameters • Induce lexicon structure • Related to loss-sensitive structured perceptron [Singh-Miller and Collins 2007]
