Connecting Language to the World
Dan Roth, Department of Computer Science, University of Illinois at Urbana-Champaign
ALIHT-2011, IJCAI, Barcelona, Spain, July 2011
With thanks to collaborators: Ming-Wei Chang, James Clarke, Dan Goldwasser, Michael Connor, Vivek Srikumar, and many others.
Funding: NSF (ITR IIS-0085836, SoD-HCER-0613885), DHS, NIH, DARPA (Bootstrap Learning and Machine Reading programs), DASH Optimization (Xpress-MP).

Learning from Natural Instructions
- Example interaction: "Can I get a coffee with sugar and no milk?" -> Semantic Parser -> MAKE(COFFEE, SUGAR=YES, MILK=NO); the teacher responds "Great!" or "Arggg".
- Can we rely on this interaction to provide supervision?
- Coding instructions, like traditional "example-based" ML, requires deep understanding of the system and carries a heavy annotation burden.
- Instructable computing: natural communication between teacher and agent.

Scenario I: Understanding Instructions [IJCAI'11]
- Understanding games' instructions, e.g., "A top card can be moved to the tableau if it has a different color than the color of the top tableau card, and the cards have successive values."
- Allow a human teacher to interact with an automated learner using natural instructions.
- Agnostic of the agent's internal representations; contrasts with traditional "example-based" ML.
- Semantic parsing into some logical representation is a necessary intermediate step.
- Learn how to semantically parse from task-level feedback; evaluate at the task level rather than the representation level.

What to Learn from Natural Instructions?
Two conceptual ways to think about learning from instructions:
(i) Learn directly to play the game [EMNLP'09; Barzilay et al. '10, '11]: consult the natural language instructions and use them to improve the feature-based representation.
(ii) Learn to interpret a natural language lesson [IJCAI'11], and (jointly) learn how to use this interpretation to do well on the final task. Will this help generalize to other games? A sketch of the resulting supervision protocol follows.
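To make the protocol in (ii) concrete, here is a minimal sketch of a response-based training loop. It is not the talk's system: the objects and method names (parse, execute, promote, demote) are hypothetical placeholders for a semantic parser, a task simulator, and structured perceptron-style updates.

```python
# A minimal sketch of the response-based supervision loop described above.
# All method names below are hypothetical placeholders, not the talk's code.

def response_driven_training(learner, instructions, world, n_rounds=10):
    """Train a semantic parser using only task-level (binary) feedback."""
    for _ in range(n_rounds):
        for text in instructions:
            # Current best interpretation under the learner's model, e.g.
            # "Can I get a coffee with sugar and no milk?" ->
            # MAKE(COFFEE, SUGAR=YES, MILK=NO)
            logical_form = learner.parse(text)
            # Execute it; the teacher's response ("Great!" / "Arggg") is the
            # only supervision signal.
            if world.execute(logical_form):
                learner.promote(text, logical_form)   # reinforce this structure
            else:
                learner.demote(text, logical_form)    # move away from it
    return learner
```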
Scenario I': Semantic Parsing [CoNLL'10, ACL'11, ...]
- X: "What is the largest state that borders New York and Maryland?"
- Y: largest(state(next_to(state(NY)) AND next_to(state(MD))))
- Successful interpretation involves multiple decisions: Which entities appear in the interpretation? Does "New York" refer to a state or a city? How are fragments composed together, e.g., state(next_to()) vs. next_to(state())?
- Question: how to learn to semantically parse from "task-level" feedback (no intermediate representation).

Scenario II: The Language-World Mapping Problem [IJCAI'11, ACL'10, ...]
- How do we acquire language? The figure pairs "the world" (a depicted scene) with "the language" (an utterance such as "Topid rivvo den marplox.").
- Is it possible to learn the meaning of verbs from natural, behavior-level feedback?

Outline
- Background: NL structure with Integer Linear Programming -- global inference with expressive structural constraints in NLP.
- Constraints Driven Learning with indirect supervision -- training paradigms for latent structure: indirect supervision training with latent structure (NAACL'10); training structure predictors by inventing binary labels (ICML'10).
- Response-based learning -- driving the supervision signal from the world's response (CoNLL'10, IJCAI'11): semantic parsing, playing Freecell, language acquisition.

Interpret Language Into an Executable Representation
- X: "What is the largest state that borders New York and Maryland?"
- Y: largest(state(next_to(state(NY)) AND next_to(state(MD))))
- Successful interpretation involves multiple decisions: which entities appear in the interpretation ("New York" as a state or a city?) and how fragments are composed (state(next_to()) vs. next_to(state())).
- Question: how to learn to semantically parse from "task-level" feedback. A toy illustration of executing such a representation appears below.
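To show what "executable representation" means here, the following toy sketch (not from the talk) evaluates the logical form above against a small geography database. The database contents (areas approximate) and the helper functions state, next_to, and largest are invented for the example.

```python
# Toy geography "world"; contents are illustrative, areas approximate (sq. miles).
STATES = {
    "NY": {"name": "New York", "area": 54556},
    "MD": {"name": "Maryland", "area": 12407},
    "PA": {"name": "Pennsylvania", "area": 46054},
    "NJ": {"name": "New Jersey", "area": 8723},
}
BORDERS = {("NY", "PA"), ("NY", "NJ"), ("MD", "PA"), ("MD", "NJ")}

def state(abbrs):
    """Interpret a set of state identifiers as a set of states."""
    return set(abbrs)

def next_to(states_):
    """All states that border some state in `states_`."""
    sym = BORDERS | {(b, a) for (a, b) in BORDERS}   # make the relation symmetric
    return {a for (a, b) in sym if b in states_}

def largest(states_):
    """The state with the largest area."""
    return max(states_, key=lambda s: STATES[s]["area"])

# Y: largest(state(next_to(state(NY)) AND next_to(state(MD))))
answer = largest(state(next_to(state({"NY"})) & next_to(state({"MD"}))))
print(STATES[answer]["name"])   # -> Pennsylvania
```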
Learning and Inference in NLP
- Natural language decisions are structured: global decisions in which several local decisions play a role, with mutual dependencies among their outcomes.
- It is essential to make coherent decisions in a way that takes the interdependencies into account: joint, global inference.
- But: learning structured models requires annotating structures.
- Interdependencies among decision variables should be exploited in decision making (inference) and in learning. Decouple? Decompose? How can the structure be exploited to minimize supervision?
- Goal: learn from minimal, indirect supervision and amplify it using the interdependencies among variables.

Constrained Conditional Models (aka ILP Inference)
- Score an assignment y with "local" models plus a (soft) constraints component:
  y* = argmax_y Σ_i w_i φ_i(x, y) − Σ_k ρ_k d(y, 1_{C_k(x)})
  where w is the weight vector for the local models (features and classifiers: log-linear models such as HMM/CRF, or a combination), ρ_k is the penalty for violating constraint C_k, and d(y, 1_{C_k(x)}) measures how far y is from a "legal" assignment.
- How to solve? This is an Integer Linear Program. Solving with ILP packages gives an exact solution; cutting planes, dual decomposition, and other search techniques are also possible.
- How to train? Training is learning the objective function.

Three Ideas
- Idea 1 (Modeling): Separate modeling and problem formulation from algorithms -- similar to the philosophy of probabilistic modeling.
- Idea 2 (Inference): Keep the model simple and make expressive decisions via constraints -- unlike probabilistic modeling, where the models themselves become more expressive.
- Idea 3 (Learning): Expressive structured decisions can be supervised indirectly via related simple binary decisions; global inference can be used to amplify the minimal supervision.

Examples: CCM Formulations (aka ILP for NLP)
CCMs can be viewed as a general interface for easily combining declarative domain knowledge with data-driven statistical models. Formulate NLP problems as ILP problems (inference may also be done otherwise):
1. Sequence tagging (HMM/CRF + global constraints). Objective: argmax Σ_{ij} λ_{ij} x_{ij}. Linguistic constraint: cannot have both A states and B states in an output sequence.
2. Sentence compression/summarization (language model + global constraints). Objective: argmax Σ_{ijk} λ_{ijk} x_{ijk}. Linguistic constraints: if a modifier is chosen, include its head; if a verb is chosen, include its arguments. A small ILP sketch of this idea follows.
3. SRL (independent classifiers + global constraints).
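As an illustration of item 2 above, this hedged sketch encodes a toy compression problem as an ILP with the PuLP package: a binary variable x_i says "keep word i", the objective scores the kept words, and the linear constraint x_mod <= x_head enforces "if a modifier is chosen, include its head". The words, scores, and dependency arcs are invented; this is not the talk's system.

```python
# Toy "language model + global constraints" compression as an ILP (PuLP).
import pulp

words = ["the", "big", "dog", "barked", "loudly"]
score = [0.2, 0.4, 1.0, 1.2, 0.3]      # per-word retention scores (e.g., from an LM)
head = {0: 2, 1: 2, 4: 3}              # modifier index -> head index (toy parse)

prob = pulp.LpProblem("compression", pulp.LpMaximize)
x = [pulp.LpVariable(f"x_{i}", cat="Binary") for i in range(len(words))]

# Objective: total score of the words we keep.
prob += pulp.lpSum(score[i] * x[i] for i in range(len(words)))

# Global constraint: if a modifier is chosen, include its head.
for mod, hd in head.items():
    prob += x[mod] <= x[hd]

# Length budget for the compressed sentence.
prob += pulp.lpSum(x) <= 3

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([w for i, w in enumerate(words) if x[i].value() == 1])   # ['big', 'dog', 'barked']
```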
Example: Sequence Tagging
- Example sentence: "the man saw the dog", with candidate tags D, N, A, V for each of the five words.
- HMM/CRF decoding:
  y* = argmax_{y ∈ Y} P(y_0) P(x_0 | y_0) ∏_{i=1}^{n−1} P(y_i | y_{i−1}) P(x_i | y_i)
- As an ILP, with indicator variables over tag assignments and transitions:
  Discrete predictions:  Σ_{y ∈ Y} 1{y_0 = y} = 1
  Output consistency:  ∀y: 1{y_0 = y} = Σ_{y' ∈ Y} 1{y_0 = y ∧ y_1 = y'}
  and  ∀y, i > 1: Σ_{y' ∈ Y} 1{y_{i−1} = y' ∧ y_i = y} = Σ_{y'' ∈ Y} 1{y_i = y ∧ y_{i+1} = y''}
  Other constraints (e.g., the sequence must contain a verb):
  1{y_0 = "V"} + Σ_{i=1}^{n−1} Σ_{y ∈ Y} 1{y_{i−1} = y ∧ y_i = "V"} ≥ 1
- Any Boolean rule can be encoded as a (collection of) linear constraints. LBJ allows a developer to encode constraints in FOL, which are compiled into linear inequalities automatically.

Information Extraction without Prior Knowledge
- Citation: Lars Ole Andersen. Program analysis and specialization for the C programming language. PhD thesis. DIKU, University of Copenhagen, May 1994.
- Prediction of a trained HMM:
  [AUTHOR] Lars Ole Andersen . Program analysis and
  [TITLE] specialization for the
  [EDITOR] C
  [BOOKTITLE] Programming language
  [TECH-REPORT] . PhD thesis .
  [INSTITUTION] DIKU , University of Copenhagen , May
  [DATE] 1994 .
- This violates lots of natural constraints!

Examples of Constraints
- Each field must be a consecutive list of words and can appear at most once in a citation.
- State transitions must occur on punctuation marks.
- The citation can only start with AUTHOR or EDITOR.
- The words "pp." and "pages" correspond to PAGE.
- Four-digit numbers starting with 20xx or 19xx are DATE.
- Quotations can appear only in TITLE.
- ...
- Such constraints make it easy to express pieces of "knowledge"; they are non-propositional and may use quantifiers.

Strategies for Improving the Results
- (Pure) machine learning approaches: a higher-order HMM/CRF? A larger window? Many new features? All of these increase the model complexity and require a lot of labeled examples. What if we only have a few labeled examples?
- Other options: constrain the output to make sense; push the (simple) model in a direction that makes sense. Can we keep the learned model simple and still make expressive decisions? A sketch of layering such constraints on top of the model scores follows.
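The same encoding pattern applies to the citation example: keep the per-token scores of a simple trained model and add some of the constraints listed above as linear inequalities. This is a hedged sketch, not the talk's system; the tokens come from the citation example, the scores are left at zero and would come from an HMM/CRF in practice, and the label set is truncated.

```python
# Constrained citation-field prediction as an ILP (PuLP); illustrative only.
import pulp

tokens = ["Lars", "Ole", "Andersen", ".", "Program", "analysis", ",",
          "DIKU", ",", "May", "1994", "."]
labels = ["AUTHOR", "EDITOR", "TITLE", "INSTITUTION", "DATE"]
score = {(i, l): 0.0 for i in range(len(tokens)) for l in labels}  # fill from the model

prob = pulp.LpProblem("citation_fields", pulp.LpMaximize)
y = pulp.LpVariable.dicts("y", (range(len(tokens)), labels), cat="Binary")

# Objective: the trained model's (soft) preferences.
prob += pulp.lpSum(score[i, l] * y[i][l] for i in range(len(tokens)) for l in labels)

# Exactly one label per token.
for i in range(len(tokens)):
    prob += pulp.lpSum(y[i][l] for l in labels) == 1

# Constraint: the citation can only start with AUTHOR or EDITOR.
prob += y[0]["AUTHOR"] + y[0]["EDITOR"] == 1

# Constraint: four-digit tokens starting with 19xx or 20xx are DATE.
for i, tok in enumerate(tokens):
    if len(tok) == 4 and tok.isdigit() and tok[:2] in ("19", "20"):
        prob += y[i]["DATE"] == 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([next(l for l in labels if y[i][l].value() == 1) for i in range(len(tokens))])
```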