Structured Prediction: Final words (CS 6355: Structured Prediction)



SLIDE 1

CS 6355: Structured Prediction

Structured Prediction

Final words

SLIDE 2

A look back

  • What is a structure?
  • The machine learning of interdependent variables

SLIDE 3

Recall: A working definition of a structure

A structure is a concept that can be applied to any complex thing, whether it be a bicycle, a commercial company, or a carbon molecule. By complex, we mean: 1. It is divisible into parts, 2. There are different kinds of parts, 3. The parts are arranged in a specifiable way, and, 4. Each part has a specifiable function in the structure of the thing as a whole


From the book Analysing Sentences: An Introduction to English Syntax by Noel Burton-Roberts, 1986.

SLIDE 4

An example task: Semantic Parsing

Find the largest state in the US

[Figure: database tables US_STATES(name, size, population, capital) and US_CITIES(name, state, population), plus query templates: SELECT expression FROM table WHERE condition; SELECT expression FROM table; MAX(numeric list); ORDERBY predicate; DELETE FROM table WHERE condition; Expression 1 = Expression 2]
SLIDE 5

A plausible strategy to build the query

Find the largest state in the US

[Figure: the same tables and query templates; no template chosen yet]
SLIDE 6

A plausible strategy to build the query

Find the largest state in the US

[Figure: template chosen: SELECT expression FROM table WHERE condition]
SLIDE 7

A plausible strategy to build the query

Find the largest state in the US

[Figure: partial query: SELECT expression FROM US_STATES WHERE condition]
SLIDE 8

A plausible strategy to build the query

Find the largest state in the US

[Figure: partial query: SELECT name FROM US_STATES WHERE condition]
SLIDE 9

A plausible strategy to build the query

Find the largest state in the US

[Figure: partial query: SELECT name FROM US_STATES WHERE expression = (SELECT expression FROM table)]
SLIDE 10

A plausible strategy to build the query

Find the largest state in the US

[Figure: partial query: SELECT name FROM US_STATES WHERE expression = (SELECT MAX(numeric list) FROM table)]
SLIDE 11

A plausible strategy to build the query

Find the largest state in the US

[Figure: partial query: SELECT name FROM US_STATES WHERE expression = (SELECT MAX(numeric list) FROM US_STATES)]
SLIDE 12

A plausible strategy to build the query

Find the largest state in the US

[Figure: completed query: SELECT name FROM US_STATES WHERE size = (SELECT MAX(size) FROM US_STATES)]

Or perhaps population?

SLIDE 13

A plausible strategy to build the query

Find the largest state in the US

[Figure: completed query: SELECT name FROM US_STATES WHERE size = (SELECT MAX(size) FROM US_STATES)]

Or perhaps population?

  • At each step, many, many decisions to make
  • Some decisions are simply not allowed

– A query has to be well formed!

  • Even so, many possible options

– Why does “Find” map to SELECT?
– Largest by size, population, or population of the capital?
SLIDE 14

Standard classification tools can’t predict structures

X: “Find the largest state in the US.”
Y: the SQL query shown below

  • Classification is about making one decision

– Spam or not spam, or predict one label, etc

We need to make multiple decisions

– Each part needs a label

  • Should “US” be mapped to us_states or us_cities?
  • Should “Find” be mapped to SELECT or DELETE?

– The decisions interact with each other

  • If the outer FROM clause talks about the table us_states, then the inner FROM clause should not talk about utah_counties

– How to compose the fragments together to create the whole structure?

  • Should the output consist of a WHERE clause? What should go in it?


SELECT name FROM us_states WHERE size = (SELECT MAX(size) FROM us_states)

SLIDE 15

How did we get here?


Binary classification

  • Learning algorithms
  • Prediction is easy – Threshold
  • Features (???)

Multiclass classification

  • Different strategies
  • One-vs-all, all-vs-all
  • Global learning algorithms
  • One feature vector per outcome
  • Each outcome scored
  • Prediction = highest scoring outcome

Structured classification

  • Global models or local models
  • Each outcome scored
  • Prediction = highest scoring outcome
  • Inference is no longer easy!
  • Makes all the difference
SLIDE 16

Structured output is…

  • A graph, possibly labeled and/or directed

– Possibly from a restricted family, such as chains, trees, etc.
– A discrete representation of the input
– E.g. a table, the SRL frame output, a sequence of labels, etc.

  • A collection of inter-dependent decisions

– Eg: The sequence of decisions used to construct the output

  • The result of a combinatorial optimization problem

– argmax_{y ∈ all outputs} score(x, y)

(The three bullets above view a structure as a representation, procedurally, and formally.)
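The formal view above can be made concrete with a tiny brute-force sketch. Everything here (the label set, the scoring function with its per-position and agreement terms) is invented for illustration; real output spaces are far too large to enumerate:

```python
from itertools import product

# Toy structured prediction: outputs are label sequences of length 3.
LABELS = ["A", "B"]

def score(x, y):
    # Hypothetical score: reward labels that match the input, plus
    # a bonus for adjacent labels that agree (an interdependency).
    unary = sum(1.0 for xi, yi in zip(x, y) if xi == yi)
    pairwise = sum(0.5 for a, b in zip(y, y[1:]) if a == b)
    return unary + pairwise

def predict(x):
    # argmax_{y in all outputs} score(x, y): feasible only because
    # the output space here is tiny (2^3 = 8 sequences).
    return max(product(LABELS, repeat=len(x)), key=lambda y: score(x, y))

print(predict(("A", "B", "A")))
```

Note how the agreement bonus can pull the prediction away from the per-position best labels: the decisions interact.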

SLIDE 17

Challenges with structured output

  • Two challenges

1. We cannot train a separate weight vector for each possible inference outcome

  • For multiclass, we could train one weight vector for each label

2. We cannot enumerate all possible structures for inference

  • Inference for binary/multiclass is easy
  • Solution

– Decompose the output into parts that are labeled
– Define:

  • how the parts interact with each other
  • how labels are scored for each part
  • an inference algorithm to assign labels to all the parts
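The decomposition above can be sketched in a few lines: one shared weight vector scores every part, so there is no weight vector per whole output, and because the parts here do not interact, inference reduces to an independent argmax per part. The feature map and weights are hypothetical:

```python
import numpy as np

# One shared weight vector; the score of a structure is the sum of
# part scores, so we never need a weight vector per whole output.
rng = np.random.default_rng(0)
w = rng.normal(size=4)

def part_features(x, i, label):
    # Hypothetical per-part features: an indicator of (token, label).
    f = np.zeros(4)
    f[2 * (x[i] == "a") + (label == 1)] = 1.0
    return f

def structure_score(x, y):
    # score(x, y) = sum over parts of w . phi(x, part)
    return sum(w @ part_features(x, i, yi) for i, yi in enumerate(y))

def infer(x):
    # With no interactions between parts, inference decomposes:
    # pick the best label for each part independently.
    return [max((0, 1), key=lambda lab: w @ part_features(x, i, lab))
            for i in range(len(x))]

y_hat = infer(["a", "b", "a"])
print(y_hat, structure_score(["a", "b", "a"], y_hat))
```

Once parts interact (pairwise factors, constraints), the per-part argmax is no longer valid and a real inference algorithm is needed.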


SLIDE 18

Multiclass as a structured output

  • A structure is…

– A graph (in general, a hypergraph), possibly labeled and/or directed
– A collection of inter-dependent decisions
– The output of a combinatorial optimization problem

argmax_{y ∈ all outputs} score(x, y)

  • Multiclass

– A graph with one node and no edges

  • Node label is the output

– Can be composed via multiple decisions
– Winner-take-all: argmax_i w^T φ(x, i)
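The winner-take-all rule can be written out directly; the weights and the block-structured joint feature map below are illustrative assumptions, not a prescribed design:

```python
import numpy as np

# Winner-take-all multiclass: y_hat = argmax_i  w . phi(x, i)
NUM_CLASSES, NUM_FEATS = 3, 2
w = np.array([0.5, -1.0, 2.0, 0.1, -0.3, 1.5])  # one block per class

def phi(x, i):
    # Joint feature map: place the input features in class i's block,
    # so a single weight vector scores every (input, class) pair.
    f = np.zeros(NUM_CLASSES * NUM_FEATS)
    f[i * NUM_FEATS:(i + 1) * NUM_FEATS] = x
    return f

def predict(x):
    return max(range(NUM_CLASSES), key=lambda i: w @ phi(x, i))

x = np.array([1.0, 2.0])
print(predict(x))
```

This is the one-node, no-edge "structure": enumerating the outputs is trivial, which is exactly what stops being true for general structures.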


SLIDE 19

Multiclass is a structure: Implications

1. A lot of the ideas from multiclass may be generalized to structures

– Not always trivial, but useful to keep in mind

2. Broad statements about structured learning must apply to multiclass classification

– Useful for sanity check, also for understanding

3. Binary classification is the most “trivial” form of structured classification

– Multiclass with two classes


SLIDE 20

Structured Prediction

The machine learning of interdependent variables


SLIDE 21

Computational issues


  • Model definition: What are the parts of the output? What are the inter-dependencies?
  • How to train the model?
  • How to do inference?
  • Data annotation difficulty
  • Background knowledge about the domain
  • Semi-supervised / indirectly supervised?

SLIDE 22

Computational issues

(Roadmap repeated; focus: model definition.)

SLIDE 23

What does it mean to define the model?


[Figure: output variables y1, y2, y3, y4]

Say we want to predict four output variables from some input x.

SLIDE 24

What does it mean to define the model?


[Figure: y1, y2, y3, y4, each with its own factor]

Say we want to predict four output variables from some input x. Option 1: Score each decision separately.

Recall: Each factor is a local expert about all the random variables connected to it, i.e. a factor can assign a score to assignments of the variables connected to it.

Pro: Prediction is easy; each y is independent.
Con: No consideration of interactions.

SLIDE 25

What does it mean to define the model?


[Figure: y1, y2, y3, y4 with pairwise factors between pairs of variables]

Say we want to predict four output variables from some input x. Option 2: Add pairwise factors.


Pro: Accounts for pairwise dependencies.
Con: Makes prediction harder; ignores third- and higher-order dependencies.
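Option 2 can be sketched as code: unary scores per variable plus pairwise scores on adjacent pairs, with brute-force MAP inference over a deliberately tiny output space. All scores below are invented; a real model would learn them:

```python
from itertools import product

# Pairwise factor model over three binary variables.
LABELS = [0, 1]
unary = [{0: 1.0, 1: 0.2}, {0: 0.1, 1: 0.9}, {0: 0.4, 1: 0.5}]
pairwise = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

def score(y):
    # Sum of unary factor scores plus adjacent-pair factor scores.
    s = sum(unary[i][yi] for i, yi in enumerate(y))
    s += sum(pairwise[a, b] for a, b in zip(y, y[1:]))
    return s

# Prediction is now a joint optimization: here, brute force over the
# 2^3 assignments (for chains, Viterbi would do this efficiently).
y_best = max(product(LABELS, repeat=3), key=score)
print(y_best, score(y_best))
```

Note how the agreement-favoring pairwise factors can flip an individually best label, which is exactly the cost and the benefit of Option 2 over Option 1.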

SLIDE 26

What does it mean to define the model?


[Figure: y1, y2, y3, y4 with factors over triples of variables]

Say we want to predict four output variables from some input x. Option 3: Use only order-3 factors.


Pro: Accounts for order-3 dependencies.
Con: Prediction is even harder; inference must now consider all triples of labels.

SLIDE 27

What does it mean to define the model?


[Figure: y1, y2, y3, y4 with a single factor over all four variables]

Say we want to predict four output variables from some input x. Option 4: Use order-4 factors.


Pro: Accounts for order-4 dependencies.
Con: Basically no decomposition over the labels!

SLIDE 28

What does it mean to define the model?


[Figure: the factorization options over y1, y2, y3, y4 side by side]

Say we want to predict four output variables from some input x.


How do we decide what to do?

SLIDE 29

Some aspects to consider

  • Availability of supervision

– Supervised algorithms are well studied; supervision is hard (or expensive) to obtain

  • Complexity of model

– More complex models encode complex dependencies between parts; complex models make learning and inference harder

  • Features

– Most of the time we will assume that we have a good feature set to model our problem. But do we?

  • Domain knowledge

– Incorporating background knowledge into learning and inference in a mathematically sound way


SLIDE 30

Computational issues

(Roadmap repeated; focus: how to train the model?)

SLIDE 31

Training structured models

  • Inference in training makes all the difference from multiclass/binary classification

  • Empirical risk minimization principle

– Minimize loss over the training data
– Regularize the parameters to prevent overfitting

  • We have seen different training strategies falling under this umbrella

– Conditional Random Fields
– Structural Support Vector Machines
– Structured Perceptron (doesn’t have regularization)

  • Different algorithms exist

– We saw stochastic gradient descent in some detail
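As one concrete instance, the structured perceptron fits in a few lines: predict with the current weights, and on a mistake move the weights toward the gold structure's features. The feature map, the toy data, and the enumeration-based inference are illustrative assumptions:

```python
import numpy as np
from itertools import product

LABELS = [0, 1]

def phi(x, y):
    # Hypothetical joint features: emission-style and transition-style.
    f = np.zeros(4)
    for xi, yi in zip(x, y):
        f[yi] += xi                      # input mass routed to label yi
    for a, b in zip(y, y[1:]):
        f[2 + (a == b)] += 1.0           # adjacent (un)equal-pair counts
    return f

def predict(w, x):
    # Inference by enumeration; Viterbi would replace this for chains.
    return max(product(LABELS, repeat=len(x)), key=lambda y: w @ phi(x, y))

def perceptron_epoch(w, data):
    for x, y_gold in data:
        y_hat = predict(w, x)
        if y_hat != y_gold:
            w = w + phi(x, y_gold) - phi(x, y_hat)  # no regularization
    return w

data = [([1.0, -1.0, 1.0], (1, 0, 1)), ([-1.0, -1.0, 1.0], (0, 0, 1))]
w = np.zeros(4)
for _ in range(10):
    w = perceptron_epoch(w, data)
print([predict(w, x) for x, _ in data])
```

The update calls inference on every example, which is why inference cost dominates structured training in a way it never does for binary classification.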


SLIDE 32

Training considerations

  • Train globally vs train locally


[Figure: one global model over y1, y2, y3, y4 and x]

Global: Train according to your final model.
Pro: Learning uses all the available information.
Con: Computationally expensive.

SLIDE 33

Training considerations

  • Train globally vs train locally


[Figure: the model over y1, y2, y3, y4 and x decomposed into smaller models over pairs: (y1, y2), (y2, y3), (y3, y4), (y1, y4), (y2, y4), (y1, y3)]

Local: Decompose your model into smaller ones and train each one separately. The full model is still used at prediction time.
Pro: Easier to train.
Con: May not capture global dependencies.

SLIDE 34

Training considerations

  • Local vs global

– Local learning

  • Learn parameters for individual components independently
  • Learning algorithm not aware of the full structure

– Global learning

  • Learn parameters for the full structure
  • Learning algorithm “knows” about the full structure

– Depends on inference complexity
– Jury is still out on which one is better
– Also depends on the size of the available data


How do we choose?

SLIDE 35

Computational issues

(Roadmap repeated; focus: how to do inference?)

SLIDE 36

Inference

  • What is inference? The prediction step

– More broadly, an aggregation operation on the space of outputs for an example: max, expectation, sample, sum
– Different flavors: MAP, marginal, loss-augmented

  • Many algorithms, solution strategies

– Combinatorial optimization; one size doesn’t fit all
– Graph algorithms, integer linear programming, heuristics, Monte Carlo methods, …

  • Some tradeoffs

– Programming effort
– Exact vs. inexact
– Is the problem solvable with a known algorithm?
– Do we care about the exact answer?


How do we choose?
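As an example of a problem solvable with a known algorithm: MAP inference for a chain model with unary (emission) and pairwise (transition) scores is solved exactly by Viterbi. A minimal sketch with invented scores:

```python
import numpy as np

def viterbi(unary, trans):
    # unary[i, j]: score of label j at position i; trans[a, b]: score
    # of label b following label a. Returns the highest-scoring path.
    n, k = unary.shape
    best = unary[0].copy()              # best score ending in each label
    back = np.zeros((n, k), dtype=int)  # backpointers
    for i in range(1, n):
        cand = best[:, None] + trans            # cand[a, b]
        back[i] = np.argmax(cand, axis=0)       # best predecessor per label
        best = cand[back[i], np.arange(k)] + unary[i]
    # Follow backpointers from the best final label.
    y = [int(np.argmax(best))]
    for i in range(n - 1, 0, -1):
        y.append(int(back[i][y[-1]]))
    return y[::-1]

unary = np.array([[1.0, 0.2], [0.1, 0.9], [0.4, 0.5]])
trans = np.array([[0.5, 0.0], [0.0, 0.5]])
print(viterbi(unary, trans))
```

This is O(n k^2) instead of O(k^n): exact, cheap, but only because the chain structure was assumed. For general factor graphs, that tradeoff list above is the real decision.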

SLIDE 37

Computational issues

(Roadmap repeated; focus: background knowledge about the domain.)

SLIDE 38

How does background knowledge affect your choices?

  • Background knowledge biases your predictor in several ways

– What is the model?

  • Maybe third-order factors are not needed, etc.

– Your choices of learning and inference algorithms
– Feature functions
– Constraints that prohibit certain inference outcomes


SLIDE 39

Computational issues

(Roadmap repeated; focus: data annotation difficulty and semi-supervised/indirectly supervised learning.)

SLIDE 40

Data and how it influences your model

  • Annotated data is a precious resource

– Takes specialized expertise to generate
– Or very clever tricks (like online games that produce data as a side effect)

  • Important directions

– Learning with latent representations, indirect supervision, partial supervision
– In all these cases:

  • Learning is rarely a convex problem
  • Modeling choices become very important! A bad model will hurt


SLIDE 41

Looking ahead

  • Big questions (a very limited and biased set)

– Representations

  • Can we learn the factorization?
  • Can we learn feature functions?

– Dealing with the data problem for new applications

  • Clever tricks to get data
  • Taming latent variable learning

– Applications

  • How does structured prediction help you?
  • Gathering importance as computer programs have to deal with uncertain, noisy inputs and make complex decisions
