
  1. Function Space Priors in Bayesian Deep Learning Roger Grosse

  2. Motivation • Today Bayesian deep learning is most often evaluated on • regularization (Bayesian Occam's Razor, description length regularization) • smoothing the predictions • calibration and confidence intervals • novelty and out-of-distribution detection • noise to encourage exploration in RL • But all of these have competitive non-Bayesian approaches

  3. The Three X's: Explanation, Exploration, Extrapolation

  4. Compositional GP Kernels • Gaussian processes are distributions over functions, specified by kernels. • Primitive kernels: SE, Per, Lin, RQ • Composite kernels: Lin × Lin, SE × Per, Lin + Per, Lin × Per
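The kernel algebra on this slide can be sketched with scikit-learn, whose kernel objects overload `+` and `*`; the names below mirror the slide's primitives (SE, Per, Lin, RQ), and all hyperparameters are left at illustrative defaults rather than taken from the talk.

```python
# Sketch: composing GP kernels from primitives using scikit-learn's
# kernel algebra. Sums and products of valid kernels are again valid kernels.
import numpy as np
from sklearn.gaussian_process.kernels import (
    RBF,                 # SE (squared exponential)
    ExpSineSquared,      # Per (periodic)
    DotProduct,          # Lin (linear)
    RationalQuadratic,   # RQ
)

SE, Per, Lin, RQ = RBF(), ExpSineSquared(), DotProduct(), RationalQuadratic()

# Composite kernels from the slide.
composites = {
    "Lin x Lin": Lin * Lin,   # quadratic trends
    "SE x Per":  SE * Per,    # locally periodic structure
    "Lin + Per": Lin + Per,   # periodicity plus a linear trend
    "Lin x Per": Lin * Per,   # periodicity with growing amplitude
}

X = np.linspace(0, 1, 5).reshape(-1, 1)
for name, k in composites.items():
    K = k(X)  # 5x5 Gram matrix; positive semidefinite by construction
    print(name, K.shape)
```

Each composite is itself a valid kernel, which is the closure property the Automatic Statistician's search (next slides) exploits.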

  5. Automatic Statistician - Duvenaud et al., 2013, “Structure discovery in nonparametric regression through compositional kernel search”

  7. - Lloyd et al., 2014, “Automatic construction and natural-language description of nonparametric regression models”

  8. Structured Priors and Deep Learning • This demonstrates the power and flexibility of function space priors. • Problems • Requires a discrete search over the space of kernel structures (tries thousands of candidate structures to analyze a single dataset) • Need to re-fit the kernel hyperparameters for each candidate structure • Can Bayesian deep learning discover and exploit structured function space priors? • Discover: the Neural Kernel Network learns compositional kernels • Exploit: the functional variational BNN performs variational inference in function space • Caveat: we haven't yet figured out how to do both simultaneously

  9. Differentiable Compositional Kernel Learning for Gaussian Processes Shengyang Sun, Guodong Zhang, Chaoqi Wang, Wenyuan Zeng, Jiaman Li ICML 2018

  10. Neural Kernel Network • Neural Kernel Network: a neural net architecture that takes two input locations and computes the kernel between them • Layers are defined using the composition rules, so every unit corresponds to a valid kernel. • Good at representing the same compositional structures as the Automatic Statistician, but is end-to-end differentiable
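The architecture on this slide can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: the primitive kernels, layer sizes, and weights below are all made up. The key property it demonstrates is that nonnegative-weighted sums and pairwise products of kernels are themselves kernels, so every unit computes a valid kernel.

```python
# Minimal NKN-style sketch (illustrative, not the paper's code): one
# "Linear" layer (nonnegative sums of kernels) followed by one "Product"
# layer (pairwise products), so every unit is a valid kernel.
import numpy as np

def se(x, y, ell=1.0):
    return np.exp(-0.5 * (x - y) ** 2 / ell ** 2)

def per(x, y, p=1.0, ell=1.0):
    return np.exp(-2.0 * np.sin(np.pi * np.abs(x - y) / p) ** 2 / ell ** 2)

def lin(x, y):
    return x * y

def nkn_kernel(x, y, W):
    """W: nonnegative (2m, 3) matrix mixing the three primitives."""
    prims = np.array([se(x, y), per(x, y), lin(x, y)])
    hidden = W @ prims                      # nonnegative sums of kernels
    products = hidden[0::2] * hidden[1::2]  # products of kernels (Schur)
    return products.sum()                   # final sum is again a kernel

rng = np.random.default_rng(0)
W = rng.uniform(0.0, 1.0, size=(4, 3))      # nonnegativity preserves validity
xs = np.linspace(0, 1, 5)
K = np.array([[nkn_kernel(a, b, W) for b in xs] for a in xs])
print(K.shape)  # 5x5 Gram matrix, positive semidefinite by construction
```

Because every operation is differentiable in `W` (and in the primitives' hyperparameters), the whole kernel can be trained by gradient descent, which is what replaces the Automatic Statistician's discrete search.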

  11. Learning Flexible GP Kernels • Extrapolates time series datasets similarly to the Automatic Statistician • Runs in minutes rather than hours. Runtime in seconds:

     Method                   Airline   Mauna   Solar
     Automatic Statistician   6147      51065   37716
     NKN                      201       576     962

  12. Learning Flexible GP Kernels • Extrapolating 2-D patterns Ground truth Observation NKN Spectral Mixture (10 components) NKN prediction

  13. Structured Kernels for Bayes Opt • Structured kernels can help BayesOpt search much faster. • E.g., if a function is additive, i.e. f(x_1, ..., x_N) = f_1(x_1) + ... + f_N(x_N), then the search is linear rather than exponential. (e.g. Kandasamy et al., 2015) • BayesOpt with an NKN kernel can learn to make use of additive structure when it exists.
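The linear-vs-exponential claim above can be illustrated with a toy grid search (this is plain coordinate-wise search, not BayesOpt; the test function is made up). For an additive f, each coordinate's optimum is independent of the others, so N*G evaluations suffice where a joint grid search needs G**N.

```python
# Toy illustration of why additivity helps: for an additive function,
# coordinate-wise search (N*G evaluations) finds the joint grid optimum
# that a naive search would need G**N evaluations to reach.
import numpy as np

def f(x):  # additive test function: a sum of per-coordinate terms
    return sum(np.sin(3 * xi) + (xi - 0.5) ** 2 for xi in x)

N, G = 4, 50
grid = np.linspace(0, 1, G)

x_star = np.zeros(N)
for i in range(N):
    # Optimize coordinate i with the others held fixed; additivity makes
    # the result independent of what the other coordinates are.
    vals = [f(np.where(np.arange(N) == i, g, x_star)) for g in grid]
    x_star[i] = grid[int(np.argmin(vals))]

print(x_star, f(x_star))  # matches the joint grid optimum
```

A GP with a learned additive kernel gives BayesOpt the same advantage statistically: the posterior over each additive component is informed by every observation.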

  14. Structured Kernels for Bayes Opt • Note: Bayesian neural nets don't achieve this by default. • Even though they're good at representing additive functions, they don't seem to have the corresponding inductive bias. [Figure: BayesOpt comparison on an additive objective; function value vs. cost for several methods]

  15. Functional Variational BNNs Guodong Zhang, Jiaxin Shi, Shengyang Sun ICLR 2019

  16. Functional variational BNNs • Define a stochastic process prior (e.g. a GP) • Goal: train a generator network to produce functions as close as possible to the stochastic process posterior • The stochastic weights and units are shared between all input locations. Hence, even the stochastic units represent epistemic, not aleatoric, uncertainty.
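The "shared stochastic weights" point can be sketched with a toy random network (illustrative sizes and architecture, not the paper's generator): sampling the weights once and evaluating at every input yields one coherent function draw, so the spread across draws is uncertainty over functions, not per-point noise.

```python
# Sketch of the function-space view: one draw of the shared stochastic
# weights defines one function evaluated coherently at all inputs, so the
# randomness is epistemic (over functions), not aleatoric (per-point).
import numpy as np

rng = np.random.default_rng(0)
H = 64                                    # hidden units (illustrative)
x = np.linspace(-1, 1, 100)[:, None]      # all input locations at once

def sample_function(x):
    # One weight sample -> one smooth function over the whole input range.
    W1 = rng.normal(size=(1, H)); b1 = rng.normal(size=H)
    W2 = rng.normal(size=(H, 1)) / np.sqrt(H)
    return (np.tanh(x @ W1 + b1) @ W2).ravel()

draws = np.stack([sample_function(x) for _ in range(200)])
mean, std = draws.mean(0), draws.std(0)   # epistemic uncertainty band
print(mean.shape, std.shape)  # (100,) (100,)
```

Functional variational inference trains such a generator so that these function draws match the stochastic process posterior, rather than matching a weight-space posterior.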
