Amortised learning by wake-sleep


  1. Amortised learning by wake-sleep Li Kevin Wenliang, Ted Moskovitz, Heishiro Kanagawa, Maneesh Sahani Gatsby Unit, University College London

  2. Three routes to fitting a latent-variable model: the direct maximum-likelihood update of θ (consistent; simple and direct; but intractable), the update in a VAE (an approximation; biased), and amortised learning (consistent; simple and direct; agnostic to the model structure and the type of z; gives better-trained models).

  3. Least-squares regression gives the conditional expectation
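
A minimal numerical check of this fact, not from the slides: in an assumed toy model (z ~ N(0, 1), x = z + noise), a least-squares fit of z² onto simple features of x approximates the analytic conditional expectation E[z² | x]. The variable names and the use of JAX are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

# Toy model (assumption for illustration): z ~ N(0, 1), x = z + sigma * noise.
sigma = 0.5
key_z, key_eps = jax.random.split(jax.random.PRNGKey(0))
z = jax.random.normal(key_z, (20000,))
x = z + sigma * jax.random.normal(key_eps, (20000,))
t = z ** 2                                   # target whose conditional mean we want

def feats(u):
    # simple polynomial features of the observation
    return jnp.stack([jnp.ones_like(u), u, u ** 2], axis=1)

# Least-squares fit of the target onto features of x.
w, *_ = jnp.linalg.lstsq(feats(x), t)

# Analytic conditional expectation for this toy model:
# z | x ~ N(x / (1 + sigma^2), sigma^2 / (1 + sigma^2)), so
# E[z^2 | x] = (x / (1 + sigma^2))^2 + sigma^2 / (1 + sigma^2).
x_test = jnp.linspace(-2.0, 2.0, 5)
pred = feats(x_test) @ w
exact = (x_test / (1 + sigma ** 2)) ** 2 + sigma ** 2 / (1 + sigma ** 2)
print(pred)    # close to `exact`, up to Monte-Carlo error
print(exact)
```

The quadratic features happen to contain the exact conditional mean for this toy model, so the fit converges to E[z² | x] as the sample size grows; with a rich enough function class the same least-squares argument applies to general targets, which is the property the following slides rely on.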

  4. How to estimate the gradient? Define g_θ(x) = E_{p_θ(z|x)}[∇_θ log p_θ(x, z)]; then ∇_θ log p_θ(x) = g_θ(x). In practice, draw sleep samples and solve the regression. Algorithm: 1. z_n, x_n ∼ p_θ (sleep); 2. find ĝ by regressing ∇_θ log p_θ(x_n, z_n) on x_n (sleep); 3. x_m ∼ D (wake); 4. update θ with ĝ(x_m) (wake). Issues: ĝ is high dimensional (one output per model parameter), and computing ∇_θ log p_θ(x_n, z_n) for all sleep samples can be slow.
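
A hedged sketch of this naive estimator on an assumed toy Gaussian model (z ~ N(0, 1), x | z ~ N(z + mu, exp(log_s)²)); the function names and the feature-based ridge regression are illustrative choices, not the authors' code. It also makes the two issues above concrete: the regression target has one output per parameter, and a per-sample joint-likelihood gradient is computed for every sleep draw.

```python
import jax
import jax.numpy as jnp

def log_joint(theta, x, z):
    # assumed toy model: z ~ N(0, 1), x | z ~ N(z + mu, exp(log_s)^2)
    mu, log_s = theta
    return -0.5 * z ** 2 - 0.5 * ((x - z - mu) / jnp.exp(log_s)) ** 2 - log_s

def feats(u):
    return jnp.stack([jnp.ones_like(u), u, u ** 2], axis=1)

def naive_gradient_estimate(theta, key, x_wake, n_sleep=2000, lam=1e-3):
    kz, ke = jax.random.split(key)
    z = jax.random.normal(kz, (n_sleep,))                            # sleep: z_n ~ p(z)
    x = z + theta[0] + jnp.exp(theta[1]) * jax.random.normal(ke, (n_sleep,))
    # per-sample gradients of log p_theta(x_n, z_n): shape (n_sleep, n_params);
    # this is the expensive, high-dimensional regression target the slide warns about
    g = jax.vmap(jax.grad(log_joint), in_axes=(None, 0, 0))(theta, x, z)
    # ridge regression of every gradient coordinate onto features of x
    F = feats(x)
    W = jnp.linalg.solve(F.T @ F + lam * jnp.eye(F.shape[1]), F.T @ g)
    return feats(x_wake) @ W          # g_hat evaluated at the wake points

theta = jnp.array([0.0, 0.0])
x_wake = jnp.array([1.5, -0.3, 0.7])  # stand-in for real observations
update = naive_gradient_estimate(theta, jax.random.PRNGKey(1), x_wake).mean(axis=0)
print(update)                         # estimate of the average log-likelihood gradient
```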

  5. How to estimate g more efficiently? Define f_{θ′}(x) = E_{p_θ(z|x)}[log p_{θ′}(x, z)], so that g_θ(x) = ∇_{θ′} f_{θ′}(x) evaluated at θ′ = θ. Suppose we estimate f_{θ′} with kernel ridge regression of log p_{θ′}(x_n, z_n) on x_n; then auto-diff of the fitted f̂_{θ′} with respect to θ′ is an estimator of g_θ. Theorem: if the true gradient is in L²(p_θ) and the kernel is rich, then this is a consistent estimator of ∇_θ log p_θ(x).
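
A hedged JAX sketch of this trick on the same assumed toy model: the scalar targets log p_{θ′}(x_n, z_n) are kernel-ridge-regressed onto x_n, and auto-diff of the fitted prediction with respect to θ′ (evaluated at θ′ = θ) gives the gradient estimate. The closed-form marginal of the toy model is used only as a sanity check; names and hyperparameters (RBF bandwidth, ridge penalty) are assumptions.

```python
import jax
import jax.numpy as jnp

def log_joint(theta, x, z):
    # assumed toy model: z ~ N(0, 1), x | z ~ N(z + mu, exp(log_s)^2)
    mu, log_s = theta
    return -0.5 * z ** 2 - 0.5 * ((x - z - mu) / jnp.exp(log_s)) ** 2 - log_s

def rbf_gram(xa, xb, bandwidth=1.0):
    return jnp.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / bandwidth ** 2)

def krr_surrogate(theta_prime, x_sleep, z_sleep, x_wake, lam=1e-3):
    # the scalar targets depend on theta'; the kernel weights depend only on x
    y = jax.vmap(log_joint, in_axes=(None, 0, 0))(theta_prime, x_sleep, z_sleep)
    alpha = jnp.linalg.solve(rbf_gram(x_sleep, x_sleep) + lam * jnp.eye(x_sleep.shape[0]), y)
    return jnp.mean(rbf_gram(x_wake, x_sleep) @ alpha)    # mean fitted f over the wake batch

theta = jnp.array([0.0, 0.0])
kz, ke = jax.random.split(jax.random.PRNGKey(0))
z_sleep = jax.random.normal(kz, (500,))
x_sleep = z_sleep + theta[0] + jnp.exp(theta[1]) * jax.random.normal(ke, (500,))
x_wake = jnp.array([1.5, -0.3, 0.7])                      # stand-in for real observations

# auto-diff of the fitted surrogate with respect to theta', evaluated at theta' = theta
grad_est = jax.grad(krr_surrogate)(theta, x_sleep, z_sleep, x_wake)

def log_marginal(theta, x):
    # closed-form marginal of the toy model, x ~ N(mu, 1 + exp(2 * log_s)), for checking
    mu, log_s = theta
    v = 1.0 + jnp.exp(2.0 * log_s)
    return -0.5 * jnp.log(2.0 * jnp.pi * v) - 0.5 * (x - mu) ** 2 / v

exact = jax.vmap(jax.grad(log_marginal), in_axes=(None, 0))(theta, x_wake).mean(axis=0)
print(grad_est, exact)   # should roughly agree, up to regression and Monte-Carlo error
```

The key efficiency point is that only the scalar targets depend on θ′, so the kernel solve is done once and differentiation flows only through the targets.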

  6. Amortised learning by wake-sleep (consistent! simple, direct!): 1. z_n, x_n ∼ p_θ (sleep); 2. fit f̂ by kernel ridge regression (sleep); 3. x_m ∼ D (wake); 4. update θ with ĝ(x_m) = ∇_θ f̂_θ(x_m) (wake). Assumptions: it is easy to sample from p_θ; ∇_θ log p_θ(x, z) exists; the true gradient is in L²(p_θ). Non-assumptions: the posterior, the structure of p_θ, and the type of z.
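
Putting the four steps together, here is a minimal self-contained wake-sleep loop on the same assumed toy model; it is a sketch of the idea under stated assumptions, not the authors' implementation.

```python
import jax
import jax.numpy as jnp

def log_joint(theta, x, z):
    # assumed toy model: z ~ N(0, 1), x | z ~ N(z + mu, exp(log_s)^2)
    mu, log_s = theta
    return -0.5 * z ** 2 - 0.5 * ((x - z - mu) / jnp.exp(log_s)) ** 2 - log_s

def rbf_gram(xa, xb, bandwidth=1.0):
    return jnp.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / bandwidth ** 2)

def surrogate(theta_prime, x_sleep, z_sleep, x_wake, lam=1e-3):
    # kernel ridge regression of the scalar log joint onto x, averaged over wake points
    y = jax.vmap(log_joint, in_axes=(None, 0, 0))(theta_prime, x_sleep, z_sleep)
    alpha = jnp.linalg.solve(rbf_gram(x_sleep, x_sleep) + lam * jnp.eye(x_sleep.shape[0]), y)
    return jnp.mean(rbf_gram(x_wake, x_sleep) @ alpha)

data = 2.0 + jax.random.normal(jax.random.PRNGKey(42), (256,))    # pretend dataset
theta = jnp.array([0.0, 0.0])
lr, key = 0.1, jax.random.PRNGKey(0)
for step in range(200):
    key, kz, ke, kw = jax.random.split(key, 4)
    z_sleep = jax.random.normal(kz, (300,))                                   # 1. sleep
    x_sleep = z_sleep + theta[0] + jnp.exp(theta[1]) * jax.random.normal(ke, (300,))
    x_wake = jax.random.choice(kw, data, (64,))                               # 3. wake
    g = jax.grad(surrogate)(theta, x_sleep, z_sleep, x_wake)                  # 2. + 4.
    theta = theta + lr * g        # gradient ascent on the estimated log-likelihood
print(theta)   # theta[0] should drift toward the data mean (around 2.0)
```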

  7. Experiments: log-likelihood gradient estimation; non-Euclidean latents; dynamical models; image generation; non-negative matrix factorisation; hierarchical models; independent component analysis; neural processes.

  8. Experiment I: gradient estimation

  9. Experiment II: prior on the unit circle (z ∈ S¹)

  10. Experiment III: dynamical model

  11. Experiment IV: sample quality

  12. Experiment IV: downstream tasks

  13. Amortised learning: consistent! simple, direct! Thank you!
