

  1. Wrapup: IE, QA, and Dialog (Mausam)

  2. Grading • project: 40% (revised from 50%) • final exam: 20% • regular reviews: 20% (revised from 15%) • midterm survey: 10% (revised from 15%) • presentation: 10% • Extra credit: participation

  3. Plan (1st half of the course) • Classical papers/problems in IE: Bootstrapping, NELL, Open IE • Important techniques for IE: CRFs, tree kernels, distant supervision, joint inference, deep learning, reinforcement learning • IE++: coreference, paraphrases, inference • Plan (2nd half of the course) • QA • Conversational agents

  4. Plan (1st half++ of the course) • Classical papers/problems in IE: Bootstrapping, NELL, Open IE • Important techniques for IE: Semi-CRFs, tree kernels, distant supervision, joint inference, topic models, deep learning (CNNs), reinforcement learning • IE++: coreference, paraphrases • Inference: random walks, neural models • Plan (2nd half of the course) • QA: open QA, semantic parsing, LSTM, attention, more attention, Recursive NN, deep feature fusion network • Conversational agents: Gen. Hierarchical nets, GANs, MemNets

  5. NLP (or any application course) • Techniques/Models: Bootstrapping; (coupled) Semi-SSL; PGMs (semi-CRF, MultiR, LDA); Tree Kernels; Multi-task learning; Multi-instance learning; Random walks over graphs; Reinforcement learning; CNN, LSTM, Bi-LSTM, Recursive NN; Attention, MemNets; GANs • Problems: NER; Entity/Rel/Event Extraction; Open Rel/Event Extraction; KB inference; Open QA; Machine comprehension; Task-oriented dialog w/ KB; General dialog

  6. How much data? • Large supervised dataset: supervised learning • Trick to construct a large supervised dataset w/o noise: Semi-CRF, Twit-NER/POS, QuizBowl, SQuAD QA, CNN QA, Movies, Ubuntu, OQA, random walks… (negative data can be artificial) • Small supervised dataset: semi-supervised learning (Bootstrapping, co-training, Graph-based SSL) • No supervised dataset: unsupervised learning/rules (TwitIE, ReVerb) • Trick to construct a large supervised dataset with noise: distant supervision (MultiR, PCNNs)

  7. Non-deep Learning Ideas: Semi-supervised • Bootstrapping: (in a loop) automatic generation of training data by matching known facts • Multi-view / multi-task co-training: constraints between tasks; agreement between multiple classifiers for the same concept • Graph-based SSL: agreement between nodes of the graph
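To make the bootstrapping loop concrete, here is a minimal sketch with a hypothetical seed set, a toy corpus, and crude string-based pattern matching (illustrative only, not the procedure of any specific system):

```python
# Minimal bootstrapping sketch: alternately harvest patterns from sentences
# that match known facts, then harvest new facts from sentences that match
# the patterns. All data below is hypothetical.
seeds = {("Paris", "France"), ("Rome", "Italy")}      # seed (city, country) pairs
corpus = [
    "Paris is the capital of France .",
    "Rome is the capital of Italy .",
    "Madrid is the capital of Spain .",
]

facts, patterns = set(seeds), set()
for _ in range(2):                                    # a couple of bootstrapping rounds
    # 1) known facts -> patterns (the text between the two arguments)
    for sent in corpus:
        for (x, y) in facts:
            if x in sent and y in sent:
                patterns.add(sent.split(x, 1)[1].split(y, 1)[0].strip())
    # 2) patterns -> new candidate facts (this step is where semantic drift creeps in)
    for sent in corpus:
        for pat in patterns:
            if pat and pat in sent:
                left, right = sent.split(pat, 1)
                facts.add((left.split()[-1], right.split()[0]))

print(facts)   # now also contains ("Madrid", "Spain")
```

Real systems add confidence scores for each pattern and fact, plus fuzzy matching, precisely to limit the semantic drift noted in the bootstrapping slide later in the deck.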

  8. Non-deep Learning Ideas: Distant Supervision • KB of facts: known; extraction supervision: unknown • Bootstrap a training dataset by matching sentences with facts • Hypothesis 1: all such sentences are positive training examples for a fact: NOISY • Hypothesis 2: all such sentences form a bag; each bag must have a unique relation: BETTER • Hypothesis 3: each bag can have multiple labels: EVEN BETTER • Multi-Instance Learning • Noisy OR in PGMs • maximize the max probability in the bag
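A minimal numeric sketch of the two bag-level objectives named above, "maximize the max probability in the bag" and Noisy-OR, using hypothetical per-sentence probabilities rather than any particular model's outputs:

```python
import numpy as np

# Bag of 4 sentences that all mention the same entity pair; per-sentence
# probabilities that each expresses relation r (hypothetical model outputs).
p_sentence = np.array([0.10, 0.85, 0.30, 0.05])

# Multi-instance assumption: the bag expresses r if at least one sentence does,
# so train on the most confident sentence in the bag...
bag_loss_max = -np.log(p_sentence.max())

# ...or combine sentences with a Noisy-OR: P(bag) = 1 - prod(1 - p_i).
bag_prob_noisy_or = 1.0 - np.prod(1.0 - p_sentence)
bag_loss_noisy_or = -np.log(bag_prob_noisy_or)

print(bag_loss_max, bag_loss_noisy_or)
```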

  9. Non-deep Learning Ideas: No Intermediate Supervision • QA tasks: (Question, Answer) pairs known; inference chain unknown • Distant supervision: KB fact known; which sentence to extract from unknown • OQA (which proof is better is not known) • Random walk inference (which path is better is not known) • MultiR (which sentence in the corpus supports the fact is not known) • Approach: create a model for scoring each path/proof using weights on properties of each constituent; train using the known supervision (perceptron-style updates) • Differences: OQA scores each edge separately, PRA scores the whole path; MultiR uses multi-instance learning
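A toy sketch of that shared recipe: score latent derivations as a weighted sum of their properties and make a perceptron-style update when the top-scoring derivation yields the wrong answer. The candidate paths, features, and answers below are invented for illustration:

```python
import numpy as np

# Hypothetical candidate derivations for one training question, each
# represented by a feature vector over its constituent steps/edges.
features = {
    "path_A": np.array([1.0, 0.0, 1.0]),
    "path_B": np.array([0.0, 1.0, 1.0]),
}
answers = {"path_A": "Lyon", "path_B": "Paris"}
gold_answer = "Paris"

w = np.zeros(3)                                      # weights over path properties
for _ in range(5):                                   # a few perceptron epochs
    predicted = max(features, key=lambda p: w @ features[p])
    if answers[predicted] != gold_answer:
        # pick the best-scoring candidate that does yield the gold answer
        good = max((p for p in features if answers[p] == gold_answer),
                   key=lambda p: w @ features[p])
        w += features[good] - features[predicted]    # perceptron-style update

print(w)   # weights now prefer the derivation that reaches the gold answer
```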

  10. Non-deep Learning Ideas: Sparsity • Tree Kernels: two features (paths) are similar if one shares many constituent elements with the other; similarity is down-weighted by a penalty for non-matching elements • Paraphrase dataset for QA • Open relations as supplements in KB inference
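As a rough, toy stand-in for the kernel idea (not an actual tree or subsequence kernel), here is a similarity that credits shared path elements and applies a multiplicative penalty for mismatches:

```python
# Toy path similarity in the spirit of the sparsity discussion above:
# two dependency paths get credit for shared elements and a multiplicative
# penalty for elements that do not match (not a real tree-kernel formula).
def path_similarity(path1, path2, penalty=0.5):
    shared = set(path1) & set(path2)
    mismatched = (set(path1) | set(path2)) - shared
    return len(shared) * (penalty ** len(mismatched))

p1 = ["nsubj", "acquire", "dobj"]          # hypothetical dependency paths
p2 = ["nsubj", "buy", "dobj"]
print(path_similarity(p1, p2))             # 2 shared elements, 2 mismatches -> 0.5
```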

  11. Deep Learning Models • Convolutional NNs • Handle fixed length contexts • Recurrent NNs • Handle small variable length histories • LSTMs/GRUs • Handle larger variable length histories • Bi-LSTMs • Handle larger variable length histories and futures • Recursive NNs • Handle variable length partially ordered histories
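For concreteness, a minimal Bi-LSTM over a toy batch in PyTorch (sizes are hypothetical): each token's output concatenates a forward state summarizing its history with a backward state summarizing its future, which is the distinction drawn in the list above.

```python
import torch
import torch.nn as nn

# Toy batch: 2 sentences, 7 tokens each, 50-dim embeddings (hypothetical sizes).
x = torch.randn(2, 7, 50)
bilstm = nn.LSTM(input_size=50, hidden_size=32, batch_first=True, bidirectional=True)
outputs, (h_n, c_n) = bilstm(x)
# Each token now has a 64-dim state: 32 dims summarizing its left context
# (history) and 32 dims summarizing its right context (future).
print(outputs.shape)   # torch.Size([2, 7, 64])
```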

  12. Deep Learning Models (contd) • Hierarchical Recurrent NNs • RNN over RNNs • Attention models • attach non-uniform importance to histories based on evidence (question) • Co-attention models • attach non-uniform importances to histories in two different NNs • MemNets • add an external storage with explicit read, write, updates • Generative Adversarial Nets • a better training procedure using actor-critic architecture
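A minimal sketch of the attention idea in its generic dot-product form (plain numpy, random vectors; not the exact formulation of any of the papers discussed): importance weights over history states are computed from their match with a query, such as the encoded question.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

H = np.random.randn(7, 32)        # 7 history states (e.g., encoded tokens), 32-dim
q = np.random.randn(32)           # query vector (e.g., the encoded question)

scores = H @ q                    # how well each history state matches the query
alpha = softmax(scores)           # non-uniform importance over histories
context = alpha @ H               # attention-weighted summary of the history

print(alpha.round(2), context.shape)
```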

  13. Hierarchical Models • Semi-CRFs: joint segmentation and labeling • A sentence is a sequence of segments, each of which is a sequence of words • Allows segment-level features to be added • HRED: LSTM over LSTM • A document is a sequence of sentences, each of which is a sequence of words • A conversation is a sequence of utterances, each of which is a sequence of words
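To show only the two-level structure, here is a toy sketch in which mean pooling and a crude running average stand in for HRED's actual word-level and utterance-level LSTMs (shapes are hypothetical):

```python
import numpy as np

# Two-level ("hierarchical") view of a conversation: a sequence of utterances,
# each of which is a sequence of word vectors.
conversation = [np.random.randn(5, 16),    # utterance 1: 5 words, 16-dim embeddings
                np.random.randn(8, 16),    # utterance 2: 8 words
                np.random.randn(3, 16)]    # utterance 3: 3 words

utterance_vecs = [words.mean(axis=0) for words in conversation]   # lower-level encoder
dialog_state = np.zeros(16)
for u in utterance_vecs:                                          # upper-level encoder
    dialog_state = 0.5 * dialog_state + 0.5 * u                   # crude recurrent update

print(dialog_state.shape)
```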

  14. RL for Text • Two uses • Use 1: search the Web to find easy documents for IE • Use 2: Policy gradient algorithm for updating weights for generator in GANs.
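A minimal policy-gradient (REINFORCE-style) update, the mechanism behind Use 2, on a toy two-action policy with made-up rewards rather than a GAN discriminator's feedback:

```python
import numpy as np

# REINFORCE-style update for a toy 2-action policy parameterized by logits.
theta = np.zeros(2)                      # policy parameters (logits)
rewards = {0: 0.2, 1: 1.0}               # hypothetical reward for each action

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
for _ in range(200):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)           # sample an action from the policy
    r = rewards[a]
    grad_logp = -probs                   # d log pi(a) / d theta for a softmax policy
    grad_logp[a] += 1.0
    theta += 0.1 * r * grad_logp         # ascend the expected reward

print(softmax(theta))                    # probability mass shifts toward action 1
```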

  15. Bootstrapping • [Akshay] Fuzzy matching between seed tuples and text • [Shantanu] Named entity tags in patterns • [Gagan, Barun] Confidence level for each pattern and fact • Semantic drift

  16. NELL • Never-ending/lifelong learning • Human supervision to guide the learning • [many] multi-view multi-task co-training • [many] coupling constraints for high precision. • [Dinesh] ontology to define the constraints

  17. Open IE • [many] ontology-free, scalability • [Surag] data-driven research through extensive error analysis • [Dinesh] reusing datasets from one task to another • [Partha] open relations as supplementary knowledge to reduce sparsity

  18. Tree Kernels • [Shantanu] major info about the relation lies on the shortest path in the dependency parse

  19. Semi-CRFs • [many] segment-level features in CRF • [Dinesh] joint segmentation and labeling? • Order-L CRFs vs Semi-CRFs

  20. MultiR • [Rishab] Use of KB to create a training set • [Surag] multi-instance learning in PGMs • [Akshay] relationship between sentence-level and aggregate extractions • [Gagan] Viterbi approximation (replace expectation with max)

  21. PCNNs • [Haroun] Max pooling to make layers independent of sentence size • [Akshay] Piecewise max pooling to capture arg1, rel, arg2 • [Akshay] Multi-instance learning in neural nets • Positional embeddings
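A minimal sketch of piecewise max pooling as described above: split the convolutional feature map at the two argument positions and max-pool each of the three pieces separately (plain numpy, hypothetical shapes and positions):

```python
import numpy as np

# Feature map from a convolution layer: one row per token position, one
# column per filter (hypothetical sizes).
feature_map = np.random.randn(12, 4)       # 12 positions, 4 filters
arg1_pos, arg2_pos = 3, 8                  # positions of the two entity mentions

pieces = [feature_map[:arg1_pos + 1],                # ... arg1
          feature_map[arg1_pos + 1:arg2_pos + 1],    # between arg1 and arg2
          feature_map[arg2_pos + 1:]]                # after arg2 ...

pooled = np.concatenate([p.max(axis=0) for p in pieces])   # 3 pieces x 4 filters
print(pooled.shape)                                        # (12,)
```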

  22. TwitIE • [Haroun] tweets are challenging, but redundancy is good • [Dinesh] G² test for ranking entities for a given date • [Shantanu] event type discovery using topic models

  23. RL for IE • [many] active querying for gathering external evidence

  24. PRA for KB inference • [Haroun, Akshay] low variance sampling • [Arindam] learning non-functional relations • [Nupur] paths as features in a learning model
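A small sketch of "paths as features": each relation path found by random walks becomes an indicator feature for a standard linear model over entity pairs. The paths, labels, and target relation below are invented for illustration:

```python
import numpy as np

# Hypothetical relation paths discovered by random walks between entity pairs.
paths = ["bornIn->cityOf", "worksFor->basedIn", "marriedTo->bornIn"]
path_index = {p: i for i, p in enumerate(paths)}

def featurize(entity_pair_paths):
    """Binary indicator vector: which paths connect this entity pair."""
    x = np.zeros(len(paths))
    for p in entity_pair_paths:
        x[path_index[p]] = 1.0
    return x

# Training pairs for a target relation such as nationality(person, country).
X = np.stack([featurize(["bornIn->cityOf"]),
              featurize(["worksFor->basedIn"]),
              featurize(["bornIn->cityOf", "marriedTo->bornIn"])])
y = np.array([1, 0, 1])                     # 1 = relation holds (hypothetical labels)

# Any linear model can now learn per-path weights, e.g. a few steps of
# logistic-regression gradient ascent on the log-likelihood:
w = np.zeros(len(paths))
for _ in range(100):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w += 0.5 * X.T @ (y - p)
print(dict(zip(paths, w.round(2))))         # "bornIn->cityOf" gets a high weight
```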

  25. Joint MF-TF • [Akshay, Shantanu] OOV handling • [Nupur] loss function in joint modeling

  26. Open QA • [Surag] structured perceptron in a pipeline model • [Akshay] paraphrase corpus for question rewriting • [Shantanu] mining paraphrase operators from corpus • [Arindam] decomposition of scoring over derivation steps

  27. LSTMs • [Haroun] attention > depth • [Akshay] cool way to construct the dataset • [Dinesh] two types of readers

  28. Co-attention • [many] iterative refinement of answer span selection

  29. HRED • [Akshay] pretraining dialog model with a QA dataset • [Arindam] passing intermediate context improves coherence? • [Barun] split of local dialog generator and global state tracker

  30. MSQU • [many] partially annotated data • [many] natural language -> SQL

  31. GANs • [many] teacher forcing • [Akshay] interesting heuristics • [Arindam] discriminator feedback can be backpropagated despite being non-differentiable

  32. MemNets • [Surag] typed OOVs • [Haroun] hops • [Shantanu, Gagan] subtask-styled evaluation

  33. Open/Next Issues • IE: mature? • Event extraction • Temporal extraction • Rapid retargetability • KB Inference • Long way to go • Combining DL and path-based models

  34. Open/Next Issues • QA systems • Dataset-driven research: [MC] SQuAD – tremendous progress • Answering in the wild: not clear (large answer spaces?) • Deep learning for large-scale QA • Conversational agents • [Task driven] how to get a DL model to issue a variety of queries • [General] how to get the system to say something interesting? • DL: what are the systems really capturing!?

  35. Conclusions • Learn key historical developments in IE • Learn (some) state of the art in IE, inference, QA and dialog • Learn how to critique strengths and weaknesses of a paper • Learn how to brainstorm next steps and future directions • Learn how to summarize an advanced area of research • Learn to do research at the cutting edge

  36. Exam • Bring a laptop (Internet enabled, PDFLatex enabled) • Bring a mobile (for taking a picture) • Extension cords • It is ok even if you have not deeply understood every paper

  37. Project Presentations • Motivation & Problem definition • 1 Slide of Contribution • Background • Technical Approach • Experiments • Analysis • Conclusions • Future Work
