  1. Neural Symbolic Machines: Semantic Parsing on Freebase with Weak Supervision. Chen Liang, Jonathan Berant, Quoc Le, Kenneth Forbus, Ni Lao

  2. Overview ● Motivation: Semantic Parsing and Program Induction ● Neural Symbolic Machines ○ Key-Variable Memory ○ Code Assistance ○ Augmented REINFORCE ● Experiments and analysis

  3. Semantic Parsing: Language to Programs
● Goal: map a natural language question/instruction to a program / logical form that executes to the answer.
● Full supervision (annotated logical forms) is hard to collect; weak supervision (question-answer pairs) is easy to collect.
[Figure: natural language question/instruction -> program / logical form -> answer]
[Berant, et al 2013; Liang 2013]

  4. Question Answering with Knowledge Base
● Example: "Largest city in US?" -> (Hop V1 CityIn) (Argmax V2 Population) -> NYC
● Knowledge bases: Freebase, DBpedia, YAGO, NELL
● Challenges: 1. Compositionality 2. Large search space (Freebase: 23K predicates, 82M entities, 417M triplets)
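
To make the compositionality concrete, here is a minimal sketch of what the two-step program computes over a hand-made toy graph. The toy_kb contents and the hop/argmax helpers below are assumptions for illustration, not the paper's Freebase interface; !CityIn is used as the inverse relation country -> its cities, as on the later slides.

toy_kb = {
    ("m.USA", "!CityIn"): ["m.NYC", "m.LA", "m.Chicago"],
    ("m.NYC", "Population"): [8_400_000],
    ("m.LA", "Population"): [3_900_000],
    ("m.Chicago", "Population"): [2_700_000],
}

def hop(entities, relation):
    # Follow `relation` from every entity in the set and union the results.
    out = []
    for e in entities:
        out.extend(toy_kb.get((e, relation), []))
    return out

def argmax(entities, relation):
    # Keep the entity whose `relation` value is largest.
    return [max(entities, key=lambda e: toy_kb[(e, relation)][0])]

v1 = ["m.USA"]                  # resolved entity
v2 = hop(v1, "!CityIn")         # (Hop V1 !CityIn)       -> all US cities
v3 = argmax(v2, "Population")   # (Argmax V2 Population) -> largest city
print(v3)                       # ['m.NYC']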

  5. WebQuestionsSP Dataset
● 5,810 questions, collected with the Google Suggest API & Amazon MTurk
● Invalid QA pairs removed; 3,098 training examples and 1,639 testing examples remain
● Open-domain, and contains grammatical errors
● Multiple entities as answer => macro-averaged F1
● Examples (showing multiple-entity answers and grammatical errors):
  • What do Michelle Obama do for a living? writer, lawyer
  • What character did Natalie Portman play in Star Wars? Padme Amidala
  • What currency do you use in Costa Rica? Costa Rican colon
  • What did Obama study in school? political science
  • What killed Sammy Davis Jr? throat cancer
[Berant et al, 2013; Yih et al, 2016]

  6. (Scalable) Neural Program Induction
● Impressive works show that neural networks can learn addition and sorting, but the learned operations are not as scalable and precise. [Reed & Freitas 2015; Zaremba & Sutskever 2016]
● Why not use existing modules that are scalable, precise and interpretable?

  7. Overview ● Motivation: Semantic Parsing and Program Induction ● Neural Symbolic Machines ○ Key-Variable Memory ○ Code Assistance ○ Augmented REINFORCE ● Experiments and analysis

  8. Neural Symbolic Machines
● Manager: provides weak supervision (the question and the answer, no program).
● Programmer (neural): maps the question to a program.
● Computer (symbolic): executes the program against the knowledge base using predefined functions and returns the output; abstract, scalable, precise, but non-differentiable.
[Figure: Manager -> question -> Programmer -> program -> Computer -> output -> answer]
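
Since the Manager supplies only question-answer pairs, the training signal reduces to comparing the Computer's output with the gold answer set. A minimal sketch of that comparison, using the set-level F1 mentioned on the dataset slide (the function below is an illustrative assumption, not the exact evaluation script):

def f1_reward(predicted, gold):
    # Weak-supervision reward: overlap between the executed program's output
    # and the annotated answer set.
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# e.g. predicting only "lawyer" for the gold answers {"writer", "lawyer"}:
print(f1_reward(["lawyer"], ["writer", "lawyer"]))   # 0.666...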

  9. Simple Seq2Seq Model Is Not Enough
● Challenges: 1. Compositionality 2. Large search space (23K predicates, 82M entities, 417M triplets)
● Solutions: 1. Key-Variable Memory 2. Code Assistance 3. Augmented REINFORCE
[Figure: seq2seq decoder producing the program tokens ( Hop R0 !CityIn ) ( Argmax R1 Population ) Return for "Largest city in US"]

  10. Overview ● Motivation: Semantic Parsing and Program Induction ● Neural Symbolic Machines ○ Key-Variable Memory ○ Code Assistance ○ Augmented REINFORCE ● Experiments and analysis

  11. Key-Variable Memory for Compositionality
● A linearised bottom-up derivation of the recursive program.
● Entity resolver: v1 -> R1 (m.USA)
● Execute ( Hop R1 !CityIn ): v2 -> R2 (list of US cities)
● Execute ( Argmax R2 Population ): v3 -> R3 (m.NYC)
● Return
[Figure: the key-variable memory grows after each Execute step while decoding "Largest city in US"]

  12. Key-Variable Memory: Save Intermediate Value
● Memory: Key (Embedding) / Variable (Symbol) / Value (Data in Computer)
  V0 / R0 / m.USA
  V1 / R1 / [m.SF, m.NYC, ...]
● When an expression is finished, e.g. ( Hop R0 !CityIn ), the computer executes it and the result is saved as the new variable R1.

  13. Key-Variable Memory: Reuse Intermediate Value
● Memory: Key (Embedding) / Variable (Symbol) / Value (Data in Computer)
  V0 / R0 / m.USA
  V1 / R1 / [m.SF, m.NYC, ...]
● The neural decoder selects a stored variable through a softmax over the key embeddings, so a later expression such as ( Argmax R1 ... ) can reuse the symbolic value held by the computer.
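
A minimal sketch of the key-variable memory in code. The class name, the use of an arbitrary vector as the key, and the greedy selection below are assumptions for illustration; the point is that executed values live on the symbolic side while only their key embeddings are visible to the neural decoder.

import numpy as np

class KeyVariableMemory:
    def __init__(self):
        self.keys = []        # key embeddings seen by the decoder
        self.variables = []   # parallel variable symbols, e.g. "R1"
        self.values = {}      # symbol -> actual data held by the computer

    def save(self, key_embedding, value):
        # Called when an expression finishes: store the execution result
        # under a fresh variable symbol with its key embedding.
        symbol = "R%d" % len(self.variables)
        self.keys.append(key_embedding)
        self.variables.append(symbol)
        self.values[symbol] = value
        return symbol

    def select(self, query):
        # Softmax over key embeddings to pick which variable to reuse.
        scores = np.array([k @ query for k in self.keys])
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        return self.variables[int(np.argmax(probs))], probs

# Usage: store the entity and the result of ( Hop R0 !CityIn ), then reuse one.
memory = KeyVariableMemory()
memory.save(np.random.randn(4), ["m.USA"])            # R0
memory.save(np.random.randn(4), ["m.SF", "m.NYC"])    # R1
symbol, probs = memory.select(query=np.random.randn(4))
print(symbol, memory.values[symbol], probs)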

  14. Overview ● Motivation: Semantic Parsing and Program Induction ● Neural Symbolic Machines ○ Key-Variable Memory ○ Code Assistance ○ Augmented REINFORCE ● Experiments and analysis

  15. Code Assistance: Prune Search Space
● Analogy: writing code with pen and paper vs. in an IDE that only suggests valid completions.

  16. Code Assistance: Syntactic Constraint
● Decoder vocabulary: variables (V0 -> R0, V1 -> R1, ...): <10; functions (E0 -> Hop, E1 -> Argmax, ...): <10; predicates (P0 -> CityIn, P1 -> BornIn, ...): 23K
[Figure: decoder softmax over the full vocabulary after emitting "GO ("]

  17. Code Assistance: Syntactic Constraint
● Last token is '(', so the decoder has to output a function name next; only the <10 function tokens remain valid.
[Figure: decoder softmax restricted to function names after "GO ("]

  18. Code Assistance: Semantic Constraint
[Figure: decoder vocabulary (variables <10, functions <10, predicates 23K) with the partial program "GO ( Hop R0" decoded so far]

  19. Code Assistance: Semantic Constraint
● Given the definition of Hop, the decoder needs to output a predicate that is connected to R0 (m.USA) in the knowledge base.
● Valid predicates: <100, instead of all 23K.
[Figure: decoder softmax restricted to predicates reachable from R0 after "GO ( Hop R0"]
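
A minimal sketch of how the syntactic and semantic constraints shrink the decoder's choices. The grammar rule, token sets, and kb_adjacency lookup are simplified assumptions; the real system derives them from the function definitions and from Freebase.

FUNCTIONS = {"Hop", "Argmax", "Filter"}

def valid_next_tokens(partial_tokens, variables, kb_adjacency):
    # Return the set of tokens the decoder is allowed to emit next.
    last = partial_tokens[-1]
    # Syntactic constraint: right after "(" a function name must follow.
    if last == "(":
        return set(FUNCTIONS)
    # Semantic constraint: "( Hop R0" must be followed by a predicate that is
    # actually connected to the entities stored in R0 in the knowledge base.
    if len(partial_tokens) >= 3 and partial_tokens[-2] == "Hop":
        entities = variables[last]            # e.g. "R0" -> ["m.USA"]
        valid = set()
        for e in entities:
            valid |= kb_adjacency.get(e, set())
        return valid
    # Otherwise: variables or ")" (other grammar rules omitted in this sketch).
    return set(variables) | {")"}

# Usage: only the few predicates leaving m.USA survive, instead of all 23K.
variables = {"R0": ["m.USA"]}
kb_adjacency = {"m.USA": {"!CityIn", "CapitalOf"}}
print(valid_next_tokens(["GO", "(", "Hop", "R0"], variables, kb_adjacency))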

  20. Overview ● Motivation: Semantic Parsing and Program Induction ● Neural Symbolic Machines ○ Key-Variable Memory ○ Code Assistance ○ Augmented REINFORCE ● Experiments and analysis

  21. REINFORCE Training
● Sample programs from the model, then apply a policy gradient update on the samples.
● 1. High variance: requires a lot of (expensive) samples.
● 2. Cold start problem: without supervised pretraining, the gradients at the beginning are small.
[Figure: sampling -> samples -> policy gradient update -> updated model]
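
A rough sketch of the REINFORCE objective in this setting (the policy.sample interface and the surrounding code are assumptions, not the paper's implementation): each sampled program's log-probability is weighted by the F1 reward of its execution result. This makes the two problems above visible: with an untrained model almost every sample earns reward 0, so the gradients are tiny and many expensive samples are needed.

def reinforce_loss(policy, question, gold_answer, execute, f1_reward, n_samples=32):
    # Monte-Carlo policy-gradient surrogate: E[ R(program) * log p(program | question) ].
    # `policy.sample` is an assumed interface returning (program, log_prob).
    total = 0.0
    for _ in range(n_samples):
        program, log_prob = policy.sample(question)
        reward = f1_reward(execute(program), gold_answer)
        total += -reward * log_prob    # minimize the negative reward-weighted log-likelihood
    return total / n_samples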

  22. Iterative Maximum Likelihood Training (Hard EM)
● Beam search finds approximate gold programs; maximum likelihood update on them.
● 1. Spurious programs: e.g., mistake PlaceOfBirth for PlaceOfDeath.
● 2. Lack of negative examples: e.g., mistake SiblingsOf for ParentsOf.
[Figure: beam search -> approximate gold programs -> maximum likelihood update -> updated model]
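
A minimal sketch of the iterative ML / hard-EM loop (policy.beam_search and policy.train_ml are assumed interfaces for illustration): keep, for every question, the highest-reward program found by beam search so far, and retrain on those pseudo-gold programs with plain maximum likelihood.

def iterative_ml(policy, dataset, execute, f1_reward, iterations=10, beam_size=50):
    best = {}   # question -> (reward, program) found so far
    for _ in range(iterations):
        for question, gold in dataset:
            for program in policy.beam_search(question, beam_size):
                reward = f1_reward(execute(program), gold)
                if reward > best.get(question, (0.0, None))[0]:
                    best[question] = (reward, program)
        # Train with maximum likelihood on the approximate gold programs.
        pseudo_gold = [(q, prog) for q, (r, prog) in best.items()]
        policy.train_ml(pseudo_gold)
    return best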

  23. Augmented REINFORCE
● Policy gradient update over a mixture: the top k programs in the beam get total weight (1 − α) and the approximate gold programs get weight α.
● 1. Replacing sampling with the beam reduces variance at the cost of bias.
● 2. Mixing in approximate gold programs bootstraps and stabilizes training.
[Figure: beam search -> top k in beam (1 − α) + approximate gold programs (α) -> policy gradient update -> updated model]
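
A minimal sketch of the (1 − α) / α mixture. It assumes the approximate gold program is one of the beam entries, identified by pseudo_gold_index; the exact bookkeeping in the paper may differ. The returned weights multiply each program's reward-scaled log-likelihood gradient.

import math

def augmented_reinforce_weights(beam_log_probs, pseudo_gold_index, alpha=0.1):
    # Weight w_i attached to each beam program in the update
    #   sum_i w_i * R_i * grad log p(program_i | question).
    probs = [math.exp(lp) for lp in beam_log_probs]
    z = sum(probs)
    weights = [(1 - alpha) * p / z for p in probs]   # beam shares 1 - alpha
    weights[pseudo_gold_index] += alpha              # pseudo-gold gets alpha
    return weights

# Usage: three programs in the beam, the second one is the approximate gold.
print(augmented_reinforce_weights([-0.5, -1.0, -2.0], pseudo_gold_index=1))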

  24. Overview ● Motivation: Semantic Parsing and Program Induction ● Neural Symbolic Machines ○ Key-Variable Memory ○ Code Assistance ○ Augmented REINFORCE ● Experiments and analysis

  25. Distributed Architecture
● 200 actors, 1 learner, 50 Knowledge Graph servers.
● Each actor receives QA pairs, queries the KG servers, and sends solutions back to the learner; the learner sends updated model checkpoints to the actors.
[Figure: actors 1..n exchange queries with KG servers 1..m; actors -> solutions -> learner; learner -> model checkpoint -> actors]

  26. Generated Programs
● Question: "what college did russell wilson go to?"
● Generated program:
  (hop v1 /people/person/education)
  (hop v2 /education/education/institution)
  (filter v3 v0 /common/topic/notable_types)
  <EOP>
  in which v0 = "College/University" (m.01y2hnl), v1 = "Russell Wilson" (m.05c10yf)
● [Figure: distribution of the length of generated programs]
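
For concreteness, a toy sketch of what executing this program means over a handful of made-up Freebase-style triples. The triples, the education CVT node, and the hop/filter helpers are illustrative assumptions; only the question, the predicates, and the MIDs m.01y2hnl / m.05c10yf come from the slide.

# Toy (subject, predicate, object) triples; everything except the two MIDs
# above is invented for illustration.
triples = {
    ("m.05c10yf", "/people/person/education", "edu_cvt_1"),
    ("edu_cvt_1", "/education/education/institution", "m.toy_wisconsin"),
    ("m.toy_wisconsin", "/common/topic/notable_types", "m.01y2hnl"),
}

def hop(entities, predicate):
    # All objects reachable from `entities` via `predicate`.
    return [o for (s, p, o) in triples if s in entities and p == predicate]

def filter_by_type(entities, type_entity, predicate):
    # Keep only entities linked to `type_entity` via `predicate`.
    return [e for e in entities if (e, predicate, type_entity) in triples]

v0 = ["m.01y2hnl"]   # "College/University"
v1 = ["m.05c10yf"]   # "Russell Wilson"
v2 = hop(v1, "/people/person/education")                            # (hop v1 ...)
v3 = hop(v2, "/education/education/institution")                    # (hop v2 ...)
answer = filter_by_type(v3, v0[0], "/common/topic/notable_types")   # (filter v3 v0 ...)
print(answer)        # ['m.toy_wisconsin']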

  27. New State-of-the-Art on WebQuestionsSP ● First end-to-end neural network to achieve SOTA on semantic parsing with weak supervision over a large knowledge base ● Its performance approaches the SOTA obtained with full supervision

  28. Augmented REINFORCE ● REINFORCE gets stuck at local maxima ● Iterative ML training does not directly optimize the F1 score ● Augmented REINFORCE obtains the best performance

  29. Recap: the Manager provides weak supervision (questions and answers), the neural Programmer generates programs, and the symbolic Computer executes them against the knowledge base with predefined functions and returns the outputs. Key components: Key-Variable Memory, Code Assistance, Augmented REINFORCE. Thanks!

  30. Backup Slides

  31. Semantic Parsing as Program Induction
● From learning classifiers to learning programs.
● Semantic parsing: learning to write programs given natural language instructions/questions.
[Graves et al, 2016; Silicon Valley, Season 4]

  32. Related Topic: Neural Program Induction
● From learning classifiers to learning programs.
● Semantic parsing: learning to write programs given natural language instructions/questions.
[Graves et al, 2016; Silicon Valley, Season 4]

  33. Iterative Maximum Likelihood Training
● Reward-augmented beam search finds approximate gold programs; maximum likelihood update on them.
● 1. Spurious programs: mistake PlaceOfBirth for PlaceOfDeath.
● 2. Lack of negative examples: mistake SiblingsOf for ParentsOf.

  34. Key-Variable Memory: Reuse Intermediate Value
● Memory: Key (Embedding) / Variable (Symbol) / Value (Data in Computer)
  V0 / R0 / m.USA
  V1 / R1 / [m.SF, m.NYC, ...]
● A softmax over the key embeddings selects which stored variable to reuse in the next expression, e.g. ( Argmax R1 ... ).

  35. Generated Programs
● Question: "what college did russell wilson go to?"
● Generated program:
  (hop v1 /people/person/education)
  (hop v2 /education/education/institution)
  (filter v3 v0 /common/topic/notable_types)
  <EOP>
  in which v0 = "College/University" (m.01y2hnl), v1 = "Russell Wilson" (m.05c10yf)
● [Figure: distribution of the length of generated programs]

  36. REINFORCE
● Actor samples programs; learner applies the policy gradient update; repeat.
● 1. High variance: requires a lot of (expensive) samples.
● 2. Bootstrap problem: small gradients at the beginning.

  37. Iterative Maximum Likelihood Training
● Actor runs reward-augmented beam search to find approximate gold programs; learner applies maximum likelihood updates; repeat.
● 1. Spurious programs: mistake PlaceOfBirth for PlaceOfDeath.
● 2. Lack of negative examples: mistake SiblingsOf for ParentsOf.
