Learning to search: General setting

Predicting an output y as a sequence of decisions.

General data structures:
– State: a partial assignment to (y_1, y_2, …, y_n)
– Initial state: the empty assignment (-, -, …, -)
– Actions: pick a component y_i and assign a label to it
– Transition model: move from one partial structure to another
– Goal test: whether all y components are assigned
  • A goal state does not need to be optimal
– Path cost/score function: w^T Φ(x, node), or more generally a neural network that depends on x and the node
  • A node contains the current state and a back pointer to trace back the search path
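To make these data structures concrete, here is a minimal Python sketch. The names (SearchNode, initial_node, is_goal, successors) and the fixed three-label set are illustrative assumptions, not from the original slides:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LABELS = ["A", "B", "C"]                  # possible values for each y_i (assumed)

@dataclass(frozen=True)
class SearchNode:
    state: Tuple[Optional[str], ...]       # partial assignment; None = unassigned
    parent: Optional["SearchNode"] = None  # back pointer to trace the search path

def initial_node(n: int) -> SearchNode:
    return SearchNode(state=(None,) * n)   # empty assignment (-, -, ..., -)

def is_goal(node: SearchNode) -> bool:
    return all(y is not None for y in node.state)  # every component assigned

def successors(node: SearchNode):
    """Transition model: fill in the first unassigned component each possible way."""
    i = node.state.index(None)
    for label in LABELS:
        new_state = node.state[:i] + (label,) + node.state[i + 1:]
        yield SearchNode(state=new_state, parent=node)
```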
Example

Suppose each y can be one of A, B, or C, with inputs (x_1, x_2, x_3) and outputs (y_1, y_2, y_3).

• State: triples (y_1, y_2, y_3), each possibly unknown
  – e.g. (A, -, -), (-, A, A), (-, -, -), …
• Start state: (-, -, -)
• Transition: fill in one of the unknowns
• End state: all three y's are assigned

[Figure: the search tree rooted at (-, -, -), branching to (A, -, -), (B, -, -), (C, -, -), then to states such as (A, A, -), …, (C, C, -), and finally to complete assignments (A, A, A), …, (C, C, C).]
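Using the sketch above, expanding the start state reproduces the first layer of this tree:

```python
root = initial_node(3)           # (-, -, -)
for child in successors(root):
    print(child.state)           # ('A', None, None), ('B', None, None), ('C', None, None)
```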
1st Framework: LaSO: Learning as Search Optimization [Hal Daumé III and Daniel Marcu, ICML 2005]
The enqueue function in LaSO

• The goal of learning is to produce an enqueue function that
  – places good hypotheses high on the queue
  – places bad hypotheses low on the queue
• LaSO assumes enqueue scores nodes with two components, g + h
  – g: path component (g = w^T Φ(x, node))
  – h: heuristic component (h is given)
• This recovers familiar algorithms: A* if h is admissible, heuristic search if h is not admissible, best-first search if h = 0, and beam search if the queue size is limited.

The goal is to learn w. How?
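A possible rendering of the scored queue in Python, reusing SearchNode from the earlier sketch. Here phi is a stand-in feature map (hashed position-label indicators, stable within a run), and the enqueue signature is an assumption:

```python
import heapq
import itertools
import numpy as np

_D = 64                       # feature dimension (arbitrary for the sketch)
_tie = itertools.count()      # tie-breaker so the heap never compares nodes

def phi(x, node) -> np.ndarray:
    """Toy feature map Φ(x, node): hashed (position, label) indicators.
    Purely illustrative -- real features would look at x as well."""
    v = np.zeros(_D)
    for i, y in enumerate(node.state):
        if y is not None:
            v[hash((i, y)) % _D] += 1.0
    return v

def enqueue(x, queue, children, w, h=lambda n: 0.0, beam=None):
    """LaSO-style enqueue: score each child by g + h, with g = w^T Φ(x, node).
    h = 0 gives best-first search; truncating to `beam` gives beam search."""
    for child in children:
        g = float(w @ phi(x, child))
        heapq.heappush(queue, (-(g + h(child)), next(_tie), child))
    if beam is not None:
        queue[:] = heapq.nsmallest(beam, queue)  # keep the `beam` best entries
        heapq.heapify(queue)
    return queue
```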
"y-good" node

Assumption: for any given node s and a gold output y, we can tell whether s can or cannot lead to y.

Definition: a node s is y-good if s can lead to y.

Suppose each y can be one of A, B, or C, and the true label is y = (y_1 = A, y_2 = B, y_3 = C).

[Figure: in the search tree from (-, -, -), nodes such as (A, -, -) and (-, B, -) are y-good, while (C, -, -), (A, A, -), …, (C, C, -), and complete outputs other than (A, B, C) are not.]
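In the slot-filling example this test is a one-liner; is_y_good is our name for it, not the paper's:

```python
def is_y_good(node: SearchNode, y: Tuple[str, ...]) -> bool:
    """A node is y-good iff its partial assignment agrees with the gold y
    everywhere it is filled in, so some completion can still reach y."""
    return all(s is None or s == g for s, g in zip(node.state, y))

# e.g. with gold y = ("A", "B", "C"): (A, -, -) is y-good, (C, -, -) is not.
```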
Learning in LaSO

• Search as if in the prediction phase, but when an error is made:
  – update w
  – clear the queue and insert all the correct moves
• Two kinds of errors:
  – Error type 1: nothing on the queue is y-good
  – Error type 2: the goal state reached is not y-good
Learning Algorithm in LaSO (skeleton)

Algo Learn(problem, initial, enqueue, w, x, y)
    nodes = MakeQueue(MakeNode(problem, initial))
    while nodes is not empty:
        node = Pop(nodes)
        if error:
            step 1: update w
            step 2: refresh queue
        else if GoalTest(node):
            return w
        else:
            next = Result(node, Actions(node))
            nodes = enqueue(problem, nodes, next, w)
What should learning do?

[Figure: a search path through y-good nodes 1, 2, and 3; at the last step, node 4 (y-good) and the current node are siblings.]

Say we detect an error (of either type) at the current node. We should have chosen node 4 instead of the current node: node 4 is the y-good sibling of the current node.
Learning Algorithm in LaSO (full)

Algo Learn(problem, initial, enqueue, w, x, y)
    nodes = MakeQueue(MakeNode(problem, initial))
    while nodes is not empty:
        node = Pop(nodes)
        if none of (node + nodes) is y-good,
           or GoalTest(node) and node is not y-good:
            sibs = siblings(node, y)
            w = update(w, x, sibs, {node, nodes})
            nodes = MakeQueue(sibs)
        else if GoalTest(node):
            return w
        else:
            next = Result(node, Actions(node))
            nodes = enqueue(problem, nodes, next, w)
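Putting the pieces together, a hedged Python sketch of this loop for one training example, reusing enqueue, is_y_good, and successors from earlier. y_good_siblings is one reading of "insert all the correct moves", and perceptron_update is sketched after the next slide; none of these names are the authors':

```python
def y_good_siblings(node, y):
    """The y-good siblings of `node` (children of its parent that can
    still reach y); for the root we fall back to the root itself."""
    if node.parent is None:
        return [node]
    return [c for c in successors(node.parent) if is_y_good(c, y)]

def laso_learn_one(x, y, w, beam=5):
    """One pass of the LaSO learning loop over a single example (x, y)."""
    queue = []
    enqueue(x, queue, [initial_node(len(y))], w, beam=beam)
    while queue:
        _, _, node = heapq.heappop(queue)
        rest = [entry[2] for entry in queue]
        error = (not any(is_y_good(n, y) for n in [node] + rest)) or \
                (is_goal(node) and not is_y_good(node, y))
        if error:
            sibs = y_good_siblings(node, y)
            w = perceptron_update(w, x, sibs, [node] + rest)
            queue = []
            enqueue(x, queue, sibs, w, beam=beam)    # refresh the queue
        elif is_goal(node):
            return w
        else:
            enqueue(x, queue, list(successors(node)), w, beam=beam)
    return w
```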
Parameter Updates

We need to specify w = update(w, x, sibs, nodes).

A simple perceptron-style update rule:

    w ← w + Δ,  where
    Δ = (1/|sibs|) Σ_{n ∈ sibs} Φ(x, n) − (1/|nodes|) Σ_{n ∈ nodes} Φ(x, n)

It comes with the usual perceptron-style mistake bound and generalization bound (see references).
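In code the update is a few lines; this sketch matches the rule above, with phi from the earlier block:

```python
def perceptron_update(w, x, sibs, nodes):
    """Move w toward the average features of the y-good siblings and
    away from the average features of what was on the queue."""
    good = np.mean([phi(x, n) for n in sibs], axis=0)
    bad = np.mean([phi(x, n) for n in nodes], axis=0)
    return w + (good - bad)
```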
2nd Framework: SEARN: Search and Learning [Hal Daumé III, John Langford, and Daniel Marcu, 2007]
Policy

• A policy is a mapping from a state to an action
  – For a given node, the policy tells us what action should be taken
• A policy gives a search path in the search space
  – Different policies mean different search paths
  – A policy can be thought of as the "driver" in the search space
• A policy may be deterministic, or may contain some randomness (more on this later)
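As a type, a policy in this setting might look like the following; the signature, taking the input x and the partial state and returning the label for the next unfilled slot, is an assumption we reuse in the later sketches:

```python
from typing import Any, Callable, Optional, Tuple

# A policy maps (input, partial state) to an action: the next label to assign.
Policy = Callable[[Any, Tuple[Optional[str], ...]], str]
```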
Reference Policy and Learned Policy

• We assume we already have a good reference policy π^ref for the training data (x, c)
  – i.e., examples associated with costs for outputs
• Goal: learn a good policy for test data, where we do not have access to the cost vector c (imitation learning)

For example, if we use Hamming distance for the cost vector c, the reference policy is trivial to compute. Why? Just make the right decision at every step: suppose the gold output is (A, B, C, A) and we are at state (A, C, -, -); the reference policy says the next action is to assign C to the third slot.
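A sketch of that trivial reference policy for the slot-filling setting (make_reference_policy is our name, and the returned policy follows the assumed (x, state) signature):

```python
def make_reference_policy(gold):
    """Reference policy under Hamming cost: the optimal action at any state
    is simply the gold label of the next unfilled slot."""
    def pi_ref(x, state):
        return gold[state.index(None)]
    return pi_ref

# e.g. gold = ("A", "B", "C", "A"): at state ("A", "C", None, None),
# make_reference_policy(gold)(x, state) returns "C" for the third slot.
```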
Cost-Sensitive Classification

Suppose we want to learn a classifier h that maps examples to one of K labels.

Standard multiclass classification
• Training data: examples paired with labels, (x, y) ∈ X × [K]
• Learning goal: find a classifier with low error, h = argmin_h Pr[h(x) ≠ y]

Cost-sensitive classification
• Training data: examples paired with cost vectors that list the cost of predicting each label, (x, c) ∈ X × [0, ∞)^K
• Learning goal: find a classifier with low expected cost, h = argmin_h E_{(x,c)}[c_{h(x)}]

Exercise: how would you design a cost-sensitive learner?

SEARN uses a cost-sensitive learner to learn a policy.
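One answer to the exercise, given as a hedged sketch: reduce cost-sensitive classification to K regression problems and predict the argmin-cost label. The class name and the choice of linear regression are assumptions, not SEARN's prescribed learner:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

class CostSensitiveClassifier:
    """Reduction sketch: regress the cost of each of the K labels,
    then predict the label with the lowest predicted cost."""
    def fit(self, X, C):   # X: (n, d) features, C: (n, K) cost vectors
        self.regs = [LinearRegression().fit(X, C[:, k]) for k in range(C.shape[1])]
        return self

    def predict(self, X):
        costs = np.stack([r.predict(X) for r in self.regs], axis=1)
        return costs.argmin(axis=1)   # label with the lowest predicted cost
```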
SEARN at test time

We have already learned a policy. We use it to construct the sequence of decisions y and obtain the final structured output:
1. Use the learned policy on the initial state (-, …, -) to compute y_1
2. Use the learned policy on the state (y_1, -, …, -) to compute y_2
3. Keep going until we get y = (y_1, …, y_n)
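These three steps amount to greedy decoding; a sketch under the assumed policy signature (searn_predict is our name):

```python
def searn_predict(x, policy, n):
    """Greedy decoding with a learned policy: repeatedly ask the policy
    for the next decision until the structure is complete."""
    state = (None,) * n                       # initial state (-, ..., -)
    for i in range(n):
        label = policy(x, state)              # decide the next component
        state = state[:i] + (label,) + state[i + 1:]
    return state
```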
SEARN at training time

• The core idea in training is to notice that at each decision step we are actually doing cost-sensitive classification
• Construct cost-sensitive classification examples (s, c) with state s and cost vector c (see the sketch below)
• Learn a cost-sensitive classifier (this is nothing but a policy)
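A sketch of how those examples might be collected along a single roll-in. features() is an assumed feature map over (x, state), LABELS is the action set from the first sketch, and rollout_cost is sketched after the next slide:

```python
import numpy as np

def searn_collect_examples(x, gold, policy, n):
    """One SEARN roll-in: follow `policy` through the n decisions and, at each
    visited state, record a cost-sensitive example (state features, cost vector)."""
    examples = []
    state = (None,) * n
    for i in range(n):
        # Cost of each possible action at this state, estimated by roll-out.
        costs = [rollout_cost(x, gold, state, i, a, policy, n) for a in LABELS]
        examples.append((features(x, state), np.array(costs)))   # features(): assumed
        state = state[:i] + (policy(x, state),) + state[i + 1:]  # roll in one step
    return examples
```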
Roll-in, Roll-out

• Roll-in: at each state, use some policy to move to a new state, tracing out a search path
• The question for learning: what is the cost of deviating from the policy at this step?
• Roll-out: after deviating with one action, let a policy complete the structure, so that the cost of the deviation can be measured