Amortised learning by wake-sleep
Li Kevin Wenliang, Ted Moskovitz, Heishiro Kanagawa, Maneesh Sahani
Gatsby Unit, University College London
Direct maximum likelihood: consistent, but intractable.
Update in VAE: simple and direct, but biased and approximate.
Amortised learning: consistent; simple and direct; agnostic to model structure and to the type of Z; gives better-trained models.
Least-squares regression gives the conditional expectation: the minimiser over functions f of E[(f(y) − g(z))²], under the joint distribution of (z, y), is f*(y) = E[g(z) | y].
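A minimal numeric check of this fact, using a toy Gaussian pair of our own choosing (not from the poster): with z ~ N(0, 1) and y = z + noise, the conditional expectation is E[z | y] = y/2, so least-squares regression of z on y should recover slope 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Joint samples: z ~ N(0, 1), y = z + eps, eps ~ N(0, 1).
z = rng.standard_normal(n)
y = z + rng.standard_normal(n)

# Least-squares regression of z on y, design matrix [1, y].
X = np.column_stack([np.ones(n), y])
coef, *_ = np.linalg.lstsq(X, z, rcond=None)

# Analytically E[z | y] = y / 2, so the fitted slope should be ~0.5.
print(coef)  # intercept near 0, slope near 0.5
```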
How to estimate ∇_θ log p_θ(y)?
• define g_θ(y, z) = ∇_θ log p_θ(y, z)
• then ∇_θ log p_θ(y) = E[g_θ(y, z) | y], the conditional expectation under p_θ(z | y)
• in practice, draw sleep samples from the model and solve a least-squares regression

Algorithm:
1. (z_i, y_i) ∼ p_θ                          } sleep
2. find ĝ by regressing g_θ(y_i, z_i) on y_i  }
3. y_j ∼ D                                    } wake
4. update θ by ĝ(y_j)                         }

Issues:
• the sleep regression target ∇_θ log p_θ(y, z) is high-dimensional
• computing it for all sleep samples can be slow
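The four steps above can be sketched on a hypothetical toy model of our own (not from the poster): z ~ N(0, 1), y | z ~ N(θ + z, 1). Here the per-sample gradient is ∇_θ log p_θ(y, z) = y − θ − z, and the exact marginal gradient is E[y − θ − z | y] = (y − θ)/2, so we can check the regressed gradient against the truth.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 1.5            # current model parameter (illustrative value)
n_sleep = 100_000

# Step 1 (sleep): sample (z_i, y_i) from the toy model
#   z ~ N(0, 1), y | z ~ N(theta + z, 1).
z = rng.standard_normal(n_sleep)
y = theta + z + rng.standard_normal(n_sleep)

# Step 2: regress the per-sample gradient y - theta - z on y.
X = np.column_stack([np.ones(n_sleep), y])
coef, *_ = np.linalg.lstsq(X, y - theta - z, rcond=None)

# Steps 3-4 (wake): evaluate the fitted regressor on observed data
# and compare with the exact gradient (y - theta) / 2.
y_wake = np.array([0.0, 1.0, 3.0])
grad_hat = np.column_stack([np.ones_like(y_wake), y_wake]) @ coef
grad_true = (y_wake - theta) / 2
print(grad_hat, grad_true)
```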
How to estimate ∇_θ log p_θ(y) more efficiently?
• define L_{θ'}(y) = E_{p_θ(z|y)}[log p_{θ'}(y, z)], so that ∇_θ log p_θ(y) = ∇_{θ'} L_{θ'}(y) |_{θ'=θ}
• suppose we estimate L_{θ'} by kernel ridge regression of the scalar target log p_{θ'}(y_i, z_i) on y_i; then auto-differentiating the fit with respect to θ' is an estimator of ∇_θ log p_θ(y)

Theorem: if ∇_θ log p_θ(y, z) exists and the kernel is sufficiently rich, then this is a consistent estimator of ∇_θ log p_θ(y).
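A sketch of the kernel-ridge-regression estimator on the same hypothetical toy model (z ~ N(0, 1), y | z ~ N(θ + z, 1); the model, kernel bandwidth, and `rbf` helper are our illustrative choices). Because KRR is linear in its regression targets, differentiating the fitted scalar function with respect to θ' amounts to running KRR on the per-sample gradients, which at θ' = θ equal y_i − θ − z_i here.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, lam = 1.5, 2000, 1e-3

# Sleep samples from the toy model: z ~ N(0,1), y | z ~ N(theta + z, 1).
z = rng.standard_normal(n)
y = theta + z + rng.standard_normal(n)

def rbf(a, b, ell=1.0):
    # Gaussian (RBF) kernel matrix between 1-d sample vectors a and b.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

# KRR weights are linear in the targets, so the derivative of the fit
# w.r.t. theta' is KRR applied to the per-sample gradient targets.
K = rbf(y, y)
alpha = np.linalg.solve(K + lam * n * np.eye(n), y - theta - z)

# Compare the estimated gradient with the exact one, (y - theta) / 2.
y_test = np.linspace(0.0, 3.0, 7)
grad_hat = rbf(y_test, y) @ alpha
grad_true = (y_test - theta) / 2
print(np.max(np.abs(grad_hat - grad_true)))
```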
Amortised learning by wake-sleep:
1. (z_i, y_i) ∼ p_θ                                       } sleep
2. fit L̂_{θ'} by kernel ridge regression of log p_{θ'}(y_i, z_i) on y_i
3. y_j ∼ D                                                 } wake
4. update θ by ∇_{θ'} (1/J) Σ_j L̂_{θ'}(y_j)

Assumptions:
• easy to sample from p_θ
• ∇_θ log p_θ(y, z) exists

Non-assumptions:
• a tractable posterior
• the structure of p_θ
• the type of z
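Putting the wake-sleep loop together on the hypothetical toy model used above (z ~ N(0, 1), y | z ~ N(θ + z, 1), our example rather than the poster's), where the maximum-likelihood estimate of θ is simply mean(y). For brevity this sketch uses plain least squares in place of kernel ridge regression; the loop structure is the same.

```python
import numpy as np

rng = np.random.default_rng(3)

# Observed data from the toy model with true theta = 2:
#   y = 2 + z + eps, so the MLE of theta is mean(y_data).
y_data = 2.0 + rng.standard_normal(5000) + rng.standard_normal(5000)

theta, lr, n_sleep = 0.0, 0.5, 5000
for _ in range(200):
    # Sleep: sample (z_i, y_i) from the current model.
    z = rng.standard_normal(n_sleep)
    y = theta + z + rng.standard_normal(n_sleep)
    # Regress grad_theta log p_theta(y, z) = y - theta - z on y.
    X = np.column_stack([np.ones(n_sleep), y])
    coef, *_ = np.linalg.lstsq(X, y - theta - z, rcond=None)
    # Wake: average the fitted gradient over the data, gradient ascent.
    grad = np.mean(np.column_stack([np.ones_like(y_data), y_data]) @ coef)
    theta += lr * grad

print(theta, y_data.mean())  # theta approaches the MLE, mean(y_data)
```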
Experiments
• Log-likelihood gradient estimation
• Non-Euclidean latents
• Dynamical models
• Image generation
• Non-negative matrix factorisation
• Hierarchical models
• Independent component analysis
• Neural processes
Experiment 1: gradient estimation
Experiment II: prior on the unit circle
Experiment III: dynamical model
Experiment IV: sample quality
Experiment IV: downstream tasks
Amortised learning: consistent; simple and direct. Thank you!