

  1. Wrapup: IE, QA, and Dialog (Mausam)

  2. Grading • project: 40% (revised from 50%) • final exam: 20% • regular reviews: 20% (revised from 15%) • midterm survey: 10% (revised from 15%) • presentation: 10% • Extra credit: participation

  3. Plan (1st half of the course) • Classical papers/problems in IE: Bootstrapping, NELL, Open IE • Important techniques for IE: CRFs, tree kernels, distant supervision, joint inference, deep learning, reinforcement learning • IE++: coreference, paraphrases, inference • Plan (2nd half of the course) • QA • Conversational agents

  4. Plan (1st half++ of the course) • Classical papers/problems in IE: Bootstrapping, NELL, Open IE • Important techniques for IE: Semi-CRFs, tree kernels, distant supervision, joint inference, topic models, deep learning (CNNs), reinforcement learning • IE++: coreference, paraphrases • Inference: random walks, neural models • Plan (2nd half of the course) • QA: open QA, semantic parsing, LSTM, attention, more attention, Recursive NN, deep feature fusion network • Conversational agents: Gen. Hierarchical nets, GANs, MemNets

  5. NLP (or any application course) • Techniques/Models: Bootstrapping; (coupled) Semi-SSL; PGMs (semi-CRF, MultiR, LDA); Tree Kernels; Multi-task learning; Multi-instance learning; Random walks over graphs; Reinforcement learning; CNN, LSTM, Bi-LSTM, Recursive NN; Attention, MemNets; GANs • Problems: NER; Entity/Rel/Event Extraction; Open Rel/Event Extraction; KB inference; Open QA; Machine comprehension; Task-oriented dialog w/ KB; General dialog

  6. How much data? • Large supervised dataset: supervised learning • Trick to construct a large supervised dataset w/o noise: Semi-CRF, Twit-NER/POS, QuizBowl, SQuAD QA, CNN QA, Movies, Ubuntu, OQA, random walks… (negative data can be artificial) • Small supervised dataset: semi-supervised learning (Bootstrapping, co-training, Graph-based SSL) • No supervised dataset: unsupervised learning/rules (TwitIE, ReVerb) • Trick to construct a large supervised dataset with noise: distant supervision (MultiR, PCNNs)

  7. Non-deep Learning Ideas: Semi-supervised • Bootstrapping: (in a loop) automatic generation of training data by matching known facts • Multi-view / multi-task co-training: constraints between tasks; agreement between multiple classifiers for the same concept • Graph-based SSL: agreement between nodes of the graph
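To make the bootstrapping loop concrete, here is a minimal sketch with a hypothetical seed set, a toy corpus, and crude string-based pattern matching (illustrative only, not the procedure of any specific system):

```python
# Minimal bootstrapping sketch: alternately harvest patterns from sentences
# that match known facts, then harvest new facts from sentences that match
# the patterns. All data below is hypothetical.
seeds = {("Paris", "France"), ("Rome", "Italy")}      # seed (city, country) pairs
corpus = [
    "Paris is the capital of France .",
    "Rome is the capital of Italy .",
    "Madrid is the capital of Spain .",
]

facts, patterns = set(seeds), set()
for _ in range(2):                                    # a couple of bootstrapping rounds
    # 1) known facts -> patterns (the text between the two arguments)
    for sent in corpus:
        for (x, y) in facts:
            if x in sent and y in sent:
                patterns.add(sent.split(x, 1)[1].split(y, 1)[0].strip())
    # 2) patterns -> new candidate facts (this step is where semantic drift creeps in)
    for sent in corpus:
        for pat in patterns:
            if pat and pat in sent:
                left, right = sent.split(pat, 1)
                facts.add((left.split()[-1], right.split()[0]))

print(facts)   # now also contains ("Madrid", "Spain")
```

Real systems add confidence scores for each pattern and fact, plus fuzzy matching, precisely to limit the semantic drift noted in the bootstrapping slide later in the deck.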

  8. Non-deep Learning Ideas: Distant Supervision • KB of facts: known; extraction supervision: unknown • Bootstrap a training dataset by matching sentences with facts • Hypothesis 1: all such sentences are positive training examples for a fact: NOISY • Hypothesis 2: all such sentences form a bag; each bag must have a unique relation: BETTER • Hypothesis 3: each bag can have multiple labels: EVEN BETTER • Multi-Instance Learning • Noisy OR in PGMs • maximize the max probability in the bag
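A minimal numeric sketch of the two bag-level objectives named above, "maximize the max probability in the bag" and Noisy-OR, using hypothetical per-sentence probabilities rather than any particular model's outputs:

```python
import numpy as np

# Bag of 4 sentences that all mention the same entity pair; per-sentence
# probabilities that each expresses relation r (hypothetical model outputs).
p_sentence = np.array([0.10, 0.85, 0.30, 0.05])

# Multi-instance assumption: the bag expresses r if at least one sentence does,
# so train on the most confident sentence in the bag...
bag_loss_max = -np.log(p_sentence.max())

# ...or combine sentences with a Noisy-OR: P(bag) = 1 - prod(1 - p_i).
bag_prob_noisy_or = 1.0 - np.prod(1.0 - p_sentence)
bag_loss_noisy_or = -np.log(bag_prob_noisy_or)

print(bag_loss_max, bag_loss_noisy_or)
```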

  9. Non-deep Learning Ideas: No Intermediate Supervision • QA tasks: (Question, Answer) pairs known; inference chain unknown • Distant supervision: KB fact known; which sentence to extract from unknown • OQA (which proof is better is not known) • Random walk inference (which path is better is not known) • MultiR (which sentence in the corpus supports the fact is not known) • Approach: create a model for scoring each path/proof using weights on properties of each constituent; train using the known supervision (perceptron-style updates) • Differences: OQA scores each edge separately, PRA scores the whole path; MultiR uses multi-instance learning
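A toy sketch of that shared recipe: score latent derivations as a weighted sum of their properties and make a perceptron-style update when the top-scoring derivation yields the wrong answer. The candidate paths, features, and answers below are invented for illustration:

```python
import numpy as np

# Hypothetical candidate derivations for one training question, each
# represented by a feature vector over its constituent steps/edges.
features = {
    "path_A": np.array([1.0, 0.0, 1.0]),
    "path_B": np.array([0.0, 1.0, 1.0]),
}
answers = {"path_A": "Lyon", "path_B": "Paris"}
gold_answer = "Paris"

w = np.zeros(3)                                      # weights over path properties
for _ in range(5):                                   # a few perceptron epochs
    predicted = max(features, key=lambda p: w @ features[p])
    if answers[predicted] != gold_answer:
        # pick the best-scoring candidate that does yield the gold answer
        good = max((p for p in features if answers[p] == gold_answer),
                   key=lambda p: w @ features[p])
        w += features[good] - features[predicted]    # perceptron-style update

print(w)   # weights now prefer the derivation that reaches the gold answer
```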

  10. Non-deep Learning Ideas: Sparsity • Tree Kernels: two features (paths) are similar if one shares many constituent elements with the other; similarity is down-weighted by a penalty for non-matching elements • Paraphrase dataset for QA • Open relations as supplements in KB inference
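As a rough, toy stand-in for the kernel idea (not an actual tree or subsequence kernel), here is a similarity that credits shared path elements and applies a multiplicative penalty for mismatches:

```python
# Toy path similarity in the spirit of the sparsity discussion above:
# two dependency paths get credit for shared elements and a multiplicative
# penalty for elements that do not match (not a real tree-kernel formula).
def path_similarity(path1, path2, penalty=0.5):
    shared = set(path1) & set(path2)
    mismatched = (set(path1) | set(path2)) - shared
    return len(shared) * (penalty ** len(mismatched))

p1 = ["nsubj", "acquire", "dobj"]          # hypothetical dependency paths
p2 = ["nsubj", "buy", "dobj"]
print(path_similarity(p1, p2))             # 2 shared elements, 2 mismatches -> 0.5
```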

  11. Deep Learning Models • Convolutional NNs • Handle fixed length contexts • Recurrent NNs • Handle small variable length histories • LSTMs/GRUs • Handle larger variable length histories • Bi-LSTMs • Handle larger variable length histories and futures • Recursive NNs • Handle variable length partially ordered histories
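For concreteness, a minimal Bi-LSTM over a toy batch in PyTorch (sizes are hypothetical): each token's output concatenates a forward state summarizing its history with a backward state summarizing its future, which is the distinction drawn in the list above.

```python
import torch
import torch.nn as nn

# Toy batch: 2 sentences, 7 tokens each, 50-dim embeddings (hypothetical sizes).
x = torch.randn(2, 7, 50)
bilstm = nn.LSTM(input_size=50, hidden_size=32, batch_first=True, bidirectional=True)
outputs, (h_n, c_n) = bilstm(x)
# Each token now has a 64-dim state: 32 dims summarizing its left context
# (history) and 32 dims summarizing its right context (future).
print(outputs.shape)   # torch.Size([2, 7, 64])
```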

  12. Deep Learning Models (contd) • Hierarchical Recurrent NNs • RNN over RNNs • Attention models • attach non-uniform importance to histories based on evidence (question) • Co-attention models • attach non-uniform importances to histories in two different NNs • MemNets • add an external storage with explicit read, write, updates • Generative Adversarial Nets • a better training procedure using actor-critic architecture
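A minimal sketch of the attention idea in its generic dot-product form (plain numpy, random vectors; not the exact formulation of any of the papers discussed): importance weights over history states are computed from their match with a query, such as the encoded question.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

H = np.random.randn(7, 32)        # 7 history states (e.g., encoded tokens), 32-dim
q = np.random.randn(32)           # query vector (e.g., the encoded question)

scores = H @ q                    # how well each history state matches the query
alpha = softmax(scores)           # non-uniform importance over histories
context = alpha @ H               # attention-weighted summary of the history

print(alpha.round(2), context.shape)
```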

  13. Hierarchical Models • Semi-CRFs: joint segmentation and labeling • A sentence is a sequence of segments, each of which is a sequence of words • Allows segment-level features to be added • HRED: LSTM over LSTM • A document is a sequence of sentences, each of which is a sequence of words • A conversation is a sequence of utterances, each of which is a sequence of words
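To show only the two-level structure, here is a toy sketch in which mean pooling and a crude running average stand in for HRED's actual word-level and utterance-level LSTMs (shapes are hypothetical):

```python
import numpy as np

# Two-level ("hierarchical") view of a conversation: a sequence of utterances,
# each of which is a sequence of word vectors.
conversation = [np.random.randn(5, 16),    # utterance 1: 5 words, 16-dim embeddings
                np.random.randn(8, 16),    # utterance 2: 8 words
                np.random.randn(3, 16)]    # utterance 3: 3 words

utterance_vecs = [words.mean(axis=0) for words in conversation]   # lower-level encoder
dialog_state = np.zeros(16)
for u in utterance_vecs:                                          # upper-level encoder
    dialog_state = 0.5 * dialog_state + 0.5 * u                   # crude recurrent update

print(dialog_state.shape)
```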

  14. RL for Text • Two uses • Use 1: search the Web to find easy documents for IE • Use 2: Policy gradient algorithm for updating weights for generator in GANs.
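A minimal policy-gradient (REINFORCE-style) update, the mechanism behind Use 2, on a toy two-action policy with made-up rewards rather than a GAN discriminator's feedback:

```python
import numpy as np

# REINFORCE-style update for a toy 2-action policy parameterized by logits.
theta = np.zeros(2)                      # policy parameters (logits)
rewards = {0: 0.2, 1: 1.0}               # hypothetical reward for each action

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
for _ in range(200):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)           # sample an action from the policy
    r = rewards[a]
    grad_logp = -probs                   # d log pi(a) / d theta for a softmax policy
    grad_logp[a] += 1.0
    theta += 0.1 * r * grad_logp         # ascend the expected reward

print(softmax(theta))                    # probability mass shifts toward action 1
```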

  15. Bootstrapping • [Akshay] Fuzzy matching between seed tuples and text • [Shantanu] Named entity tags in patterns • [Gagan, Barun] Confidence level for each pattern and fact • Semantic drift

  16. NELL • Never-ending/lifelong learning • Human supervision to guide the learning • [many] multi-view multi-task co-training • [many] coupling constraints for high precision. • [Dinesh] ontology to define the constraints

  17. Open IE • [many] ontology-free, scalability • [Surag] data-driven research through extensive error analysis • [Dinesh] reusing datasets from one task to another • [Partha] open relations as supplementary knowledge to reduce sparsity

  18. Tree Kernels • [Shantanu] major info about the relation lies on the shortest path in the dependency parse

  19. Semi-CRFs • [many] segment-level features in CRF • [Dinesh] joint segmentation and labeling? • Order-L CRFs vs Semi-CRFs

  20. MultiR • [Rishab] Use of KB to create a training set • [Surag] multi-instance learning in PGMs • [Akshay] relationship between sentence-level and aggregate extractions • [Gagan] Viterbi approximation (replace expectation with max)

  21. PCNNs • [Haroun] Max pooling to make layers independent of sentence size • [Akshay] Piecewise max pooling to capture arg1, rel, arg2 • [Akshay] Multi-instance learning in neural nets • Positional embeddings
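A minimal sketch of piecewise max pooling as described above: split the convolutional feature map at the two argument positions and max-pool each of the three pieces separately (plain numpy, hypothetical shapes and positions):

```python
import numpy as np

# Feature map from a convolution layer: one row per token position, one
# column per filter (hypothetical sizes).
feature_map = np.random.randn(12, 4)       # 12 positions, 4 filters
arg1_pos, arg2_pos = 3, 8                  # positions of the two entity mentions

pieces = [feature_map[:arg1_pos + 1],                # ... arg1
          feature_map[arg1_pos + 1:arg2_pos + 1],    # between arg1 and arg2
          feature_map[arg2_pos + 1:]]                # after arg2 ...

pooled = np.concatenate([p.max(axis=0) for p in pieces])   # 3 pieces x 4 filters
print(pooled.shape)                                        # (12,)
```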

  22. TwitIE • [Haroun] tweets are challenging, but redundancy is good • [Dinesh] G² test for ranking entities for a given date • [Shantanu] event type discovery using topic models

  23. RL for IE • [many] active querying for gathering external evidence

  24. PRA for KB inference • [Haroun, Akshay] low variance sampling • [Arindam] learning non-functional relations • [Nupur] paths as features in a learning model
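A small sketch of "paths as features": each relation path found by random walks becomes an indicator feature for a standard linear model over entity pairs. The paths, labels, and target relation below are invented for illustration:

```python
import numpy as np

# Hypothetical relation paths discovered by random walks between entity pairs.
paths = ["bornIn->cityOf", "worksFor->basedIn", "marriedTo->bornIn"]
path_index = {p: i for i, p in enumerate(paths)}

def featurize(entity_pair_paths):
    """Binary indicator vector: which paths connect this entity pair."""
    x = np.zeros(len(paths))
    for p in entity_pair_paths:
        x[path_index[p]] = 1.0
    return x

# Training pairs for a target relation such as nationality(person, country).
X = np.stack([featurize(["bornIn->cityOf"]),
              featurize(["worksFor->basedIn"]),
              featurize(["bornIn->cityOf", "marriedTo->bornIn"])])
y = np.array([1, 0, 1])                     # 1 = relation holds (hypothetical labels)

# Any linear model can now learn per-path weights, e.g. a few steps of
# logistic-regression gradient ascent on the log-likelihood:
w = np.zeros(len(paths))
for _ in range(100):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w += 0.5 * X.T @ (y - p)
print(dict(zip(paths, w.round(2))))         # "bornIn->cityOf" gets a high weight
```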

  25. Joint MF-TF • [Akshay, Shantanu] OOV handling • [Nupur] loss function in joint modeling

  26. Open QA • [Surag] structured perceptron in a pipeline model • [Akshay] paraphrase corpus for question rewriting • [Shantanu] mining paraphrase operators from corpus • [Arindam] decomposition of scoring over derivation steps

  27. LSTMs • [Haroun] attention > depth • [Akshay] cool way to construct the dataset • [Dinesh] two types of readers

  28. Co-attention • [many] iterative refinement of answer span selection

  29. HRED • [Akshay] pretraining dialog model with a QA dataset • [Arindam] passing intermediate context improves coherence? • [Barun] split of local dialog generator and global state tracker

  30. MSQU • [many] partially annotated data • [many] natural language -> SQL

  31. GANs • [many] teacher forcing • [Akshay] interesting heuristics • [Arindam] discriminator feedback can be backpropagated despite being non-differentiable

  32. MemNets • [Surag] typed OOVs • [Haroun] hops • [Shantanu, Gagan] subtask-styled evaluation

  33. Open/Next Issues • IE: mature? • Event extraction • Temporal extraction • Rapid retargetability • KB Inference • Long way to go • Combining DL and path-based models

  34. Open/Next Issues • QA systems • Dataset-driven research: [MC] SQuAD – tremendous progress • Answering in the wild: not clear (large answer spaces?) • Deep learning for large-scale QA • Conversational agents • [Task driven] how to get a DL model to issue a variety of queries • [General] how to get the system to say something interesting? • DL: what are the systems really capturing!?

  35. Conclusions • Learn key historical developments in IE • Learn (some) state of the art in IE, inference, QA and dialog • Learn how to critique strengths and weaknesses of a paper • Learn how to brainstorm next steps and future directions • Learn how to summarize an advanced area of research • Learn to do research at the cutting edge

  36. Exam • Bring a laptop (Internet enabled, PDFLatex enabled) • Bring a mobile (for taking a picture) • Extension cords • It is ok even if you have not deeply understood every paper

  37. Project Presentations • Motivation & Problem definition • 1 Slide of Contribution • Background • Technical Approach • Experiments • Analysis • Conclusions • Future Work
