Learning Fast-Mixing Models for Structured Prediction
Jacob Steinhardt, Percy Liang
Stanford University
{jsteinhardt,pliang}@cs.stanford.edu
July 8, 2015

Structured Prediction Task

[Figure: keyboard layout]
  q w e r t y u i o p
  a s d f g h j k l
  z x c v b n m

x: b d s a d b n n n f a a s s j j j
z: b # # a # # n-n-n # a-a # # n-n a
y: b a n a n a

Goal: fit a maximum likelihood model $p_\theta(z \mid x)$. Two routes:
- Use a simple model $u$ with exact inference.
- Use an expressive model with Gibbs sampling (transition kernel $A$).

Can we get the best of both worlds?

Strong Doeblin Chains

Definition (Doeblin, 1940). A chain $\tilde{A}$ is strong Doeblin with parameter $\epsilon$ if
$\tilde{A}(z_t \mid z_{t-1}) = \epsilon\, u(z_t) + (1 - \epsilon)\, A(z_t \mid z_{t-1})$
for some $u$, $A$.

[Figure: chain diagram with steps labeled $u$ and $A$.]

All Doeblin chains mix quickly:

Proposition. If $\tilde{A}$ is $\epsilon$ strong Doeblin, then its mixing time is at most $1/\epsilon$. Moreover, the stationary distribution is $A^T u$, where $T \sim \mathrm{Geometric}(\epsilon)$.
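
The proposition can be checked numerically on a small example. The sketch below is illustrative only: it assumes a finite state space, a column-stochastic convention (`A[i][j]` is the probability of moving from state `j` to state `i`), and a geometric variable that counts transitions starting from zero, so that the stationary distribution is the average $\epsilon \sum_{t \ge 0} (1 - \epsilon)^t A^t u$. The function names and the 3-state example are hypothetical, not taken from the slides.

```python
def mat_vec(A, v):
    """Apply a column-stochastic matrix to a distribution: (A v)[i] = sum_j A[i][j] * v[j]."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


def doeblin_stationary(u, A, eps, tol=1e-12):
    """Accumulate eps * (1 - eps)^t * A^t u for t = 0, 1, 2, ... until the
    untouched tail mass (1 - eps)^t drops below tol. Assumes 0 < eps <= 1."""
    pi = [0.0] * len(u)
    v = list(u)          # holds A^t u
    remaining = 1.0      # holds (1 - eps)^t
    while remaining > tol:
        pi = [p + eps * remaining * x for p, x in zip(pi, v)]
        v = mat_vec(A, v)
        remaining *= 1.0 - eps
    return pi


def doeblin_step(pi, u, A, eps):
    """Apply one step of A~ = eps * u + (1 - eps) * A to the distribution pi."""
    moved = mat_vec(A, pi)
    return [eps * ui + (1.0 - eps) * m for ui, m in zip(u, moved)]


# Hypothetical 3-state example (each column of A sums to 1).
u = [0.5, 0.3, 0.2]
A = [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.1, 0.9]]
eps = 0.2

pi_tilde = doeblin_stationary(u, A, eps)
# pi_tilde should be (numerically) unchanged by one more step of A~.
residual = max(abs(a - b) for a, b in zip(doeblin_step(pi_tilde, u, A, eps), pi_tilde))
```

The series converges geometrically at rate $1 - \epsilon$, which is one way to see why a larger $\epsilon$ gives faster mixing.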
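
A Strong Doeblin Family

Let $\theta$ parameterize a distribution $u_\theta$ and transition matrix $A_\theta$, and define
$\tilde{A}_\theta = \epsilon\, u_\theta + (1 - \epsilon)\, A_\theta$.
Write $\pi_\theta$ for the stationary distribution of $A_\theta$ and $\tilde{\pi}_\theta$ for that of $\tilde{A}_\theta$.

Three model families:
- $\mathcal{F}_0 = \{u_\theta\}_{\theta \in \Theta}$
- $\mathcal{F} = \{\pi_\theta\}_{\theta \in \Theta}$
- $\tilde{\mathcal{F}} = \{\tilde{\pi}_\theta\}_{\theta \in \Theta}$

$\tilde{\mathcal{F}}$ parameterizes computationally tractable distributions!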
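
One way to read "computationally tractable" is that exact samples from $\tilde{\pi}_\theta$ are cheap: draw a start state from $u_\theta$, then run a geometrically distributed number of $A_\theta$ transitions. The sketch below assumes a finite state space, the column convention `A[i][j] = P(next = i | current = j)`, and a geometric variable counting transitions from zero (so the expected number of transitions is $(1 - \epsilon)/\epsilon$). All names and the example values are hypothetical.

```python
import random


def sample_geometric(eps, rng):
    """Number of failures before the first success, with success probability eps."""
    t = 0
    while rng.random() >= eps:
        t += 1
    return t


def sample_categorical(probs, rng):
    """Draw an index i with probability probs[i]."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def sample_doeblin(u, A, eps, rng):
    """Exact draw from the stationary distribution of eps * u + (1 - eps) * A:
    start from u, then apply T ~ Geometric(eps) transitions of A."""
    z = sample_categorical(u, rng)
    for _ in range(sample_geometric(eps, rng)):
        column = [A[i][z] for i in range(len(A))]  # distribution of the next state
        z = sample_categorical(column, rng)
    return z


# Hypothetical 3-state example (each column of A sums to 1).
u = [0.5, 0.3, 0.2]
A = [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.1, 0.9]]
eps = 0.2
rng = random.Random(0)
draws = [sample_doeblin(u, A, eps, rng) for _ in range(20000)]
freqs = [draws.count(s) / len(draws) for s in range(3)]
```

No burn-in is needed in this sketch, since each draw is an exact sample rather than the end of a long chain; the cost is that a small $\epsilon$ means more transitions per draw.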

Strategy

- Parameterize strong Doeblin distributions $\tilde{\pi}_\theta$.
- Maximize the log-likelihood: $L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \log \tilde{\pi}_\theta(z^{(i)})$.
- Issue: $\nabla L(\theta)$ is hard to compute.
- Insight: interpret the Markov chain as a latent variable model:
  $p_\theta$: $z_1 \to z_2 \to \cdots \to z_T$, with $z_1 \sim u_\theta$ and each later state drawn from $A_\theta$.
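
The latent variable view can be written out as follows. This is a sketch under an assumed indexing, not a transcription of the slides: here $T$ counts states, so $T \in \{1, 2, \ldots\}$ with $T - 1$ transitions of $A_\theta$, and $\epsilon$ is treated as fixed rather than learned. The last line is the standard gradient identity for latent variable models, which holds under the usual conditions for exchanging differentiation and summation.

```latex
% Joint model over the chain length T and the path z_{1:T}
% (assumed: P(T = k) = \epsilon (1 - \epsilon)^{k-1} for k = 1, 2, \ldots)
p_\theta(T, z_{1:T}) = \epsilon (1 - \epsilon)^{T-1} \, u_\theta(z_1) \prod_{t=2}^{T} A_\theta(z_t \mid z_{t-1})

% Marginalizing out T and the intermediate states, with z_T = z fixed
\tilde{\pi}_\theta(z) = \sum_{T \ge 1} \; \sum_{z_{1:T-1}} p_\theta(T, z_{1:T-1}, z_T = z)

% Gradient of the log-marginal as a posterior expectation over the latent path
\nabla_\theta \log \tilde{\pi}_\theta(z)
  = \mathbb{E}_{p_\theta}\!\left[ \nabla_\theta \log p_\theta(T, z_{1:T}) \,\middle|\, z_T = z \right]
```

Since $\log p_\theta(T, z_{1:T})$ splits into a $\log u_\theta(z_1)$ term and a sum of $\log A_\theta(z_t \mid z_{t-1})$ terms, the gradient inside the expectation involves only the simple pieces $u_\theta$ and $A_\theta$; the remaining difficulty is sampling paths from the posterior given $z_T = z$.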