Probability Functional Descent: A Unifying Perspective on GANs, VI, and RL
Casey Chu <caseychu@stanford.edu>, Jose Blanchet, Peter Glynn
Deep generative models
Variational inference
Deep reinforcement learning
Probability functional J : P(X) → ℝ
Its “gradient” ∇J is the von Mises influence function Ψ : X → ℝ
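For reference, the standard von Mises / Gâteaux-derivative definition of the influence function (regularity conditions omitted; see the paper for the precise statement):

```latex
% Von Mises / Gateaux-derivative definition of the influence function of J at mu
\Psi_\mu(x) \;=\; \left.\frac{d}{d\varepsilon}\, J\bigl((1-\varepsilon)\,\mu + \varepsilon\,\delta_x\bigr)\right|_{\varepsilon = 0^+},
\qquad\text{so that, to first order,}\qquad
J(\nu) \;\approx\; J(\mu) + \mathbb{E}_{x\sim\nu}[\Psi_\mu(x)] - \mathbb{E}_{x\sim\mu}[\Psi_\mu(x)].
```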
Gradient descent on f : ℝⁿ → ℝ
0. Initialize x ∈ ℝⁿ arbitrarily
1. Compute the gradient g = ∇f(x)
2. Choose x′ such that x′ · g < x · g (usually, we set x′ = x − αg)

Probability functional descent on J : P(X) → ℝ
0. Initialize a distribution μ ∈ P(X) arbitrarily
1. Compute the influence function Ψ of J at μ
2. Choose μ′ such that 𝔼_{x∼μ′}[Ψ(x)] < 𝔼_{x∼μ}[Ψ(x)]
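To make the abstract recipe concrete, here is a minimal sketch of PFD on a toy problem not taken from the poster: minimizing J(μ) = KL(μ ‖ ν) over distributions on a finite set, where the influence function has a closed form and the μ′-update is an exponentiated-gradient step (both choices are ours, for illustration only).

```python
import numpy as np

# Toy illustration of probability functional descent (PFD), not from the poster:
# minimize J(mu) = KL(mu || nu) over distributions on {0, ..., K-1}.
# For this J, the influence function is Psi(x) = log(mu(x) / nu(x)) up to an
# additive constant (constants do not affect the descent step).

rng = np.random.default_rng(0)
K = 10
nu = rng.random(K); nu /= nu.sum()      # fixed target distribution
mu = np.full(K, 1.0 / K)                # step 0: initialize mu arbitrarily (uniform)

alpha = 0.5                             # step size
for t in range(50):
    # Step 1: compute the influence function of J at the current mu.
    psi = np.log(mu / nu)

    # Step 2: choose mu' with smaller expected influence, E_{mu'}[Psi] < E_{mu}[Psi].
    # One simple choice is a mirror-descent / exponentiated-gradient update.
    mu = mu * np.exp(-alpha * psi)
    mu /= mu.sum()

kl = np.sum(mu * np.log(mu / nu))
print(f"KL(mu || nu) after descent: {kl:.6f}")   # should be close to 0
```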
Generative modeling
J_G(μ) = D(μ || ν₀), where D is e.g. Jensen–Shannon, Wasserstein

Probability functional descent becomes:
1. Optimize the discriminator, which approximates the influence function Ψ of J_G at μ
2. Update the generator μ so that 𝔼_{x∼μ′}[Ψ(x)] < 𝔼_{x∼μ}[Ψ(x)] (sketched below)

PFD recovers:
● Minimax GAN
● Non-saturating GAN
● Wasserstein GAN
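A rough sketch of how a standard GAN training loop lines up with the two PFD steps. The 1-D data distribution, network sizes, and the use of log(1 − D(x)) as a stand-in for the influence function (which is how the Jensen–Shannon case works out up to constants, per the paper) are illustrative assumptions, not the poster's exact construction.

```python
import torch
import torch.nn as nn

# Minimal GAN-as-PFD sketch (illustrative setup, not the poster's):
# J_G(mu) = JS(mu || nu0), with mu the generator distribution and nu0 the data.
torch.manual_seed(0)
data = lambda n: torch.randn(n, 1) * 0.5 + 2.0                      # stand-in data distribution nu0
G = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))    # generator: mu = law of G(z)
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))    # discriminator (logits)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    # Step 1: optimize the discriminator; the trained D induces an estimate of Psi.
    x_real, x_fake = data(64), G(torch.randn(64, 4)).detach()
    loss_d = bce(D(x_real), torch.ones(64, 1)) + bce(D(x_fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Step 2: update the generator mu to decrease E_{x ~ mu}[Psi(x)].
    # With the Jensen-Shannon choice of J_G, log(1 - D(x)) plays the role of Psi
    # up to constants, which gives the usual minimax generator loss.
    x_fake = G(torch.randn(64, 4))
    psi = torch.log(1.0 - torch.sigmoid(D(x_fake)) + 1e-8)
    loss_g = psi.mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```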
Variational inference
J_VI(q) = KL(q(θ) || p(θ | x))

Probability functional descent becomes:
1. Compute log(q(θ)/p(x, θ)), the influence function for J_VI; its expectation under q is the negative ELBO
2. Update the approximate posterior q (sketched below)

PFD recovers:
● Black-box variational inference
● Adversarial variational Bayes
● Approximate posterior distillation
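A minimal black-box-VI sketch in the same spirit, using a made-up conjugate model (Gaussian prior and likelihood) so the result can be checked against the exact posterior; the model, data, and hyperparameters are assumptions for illustration, not from the poster.

```python
import torch
import torch.distributions as dist

# Black-box VI as PFD: J_VI(q) = KL(q(theta) || p(theta | x)).
# The integrand log(q(theta)/p(x, theta)) acts as the influence function (up to
# constants); its expectation under q is the negative ELBO, so updating q to
# decrease it is one PFD step.
torch.manual_seed(0)
x = torch.tensor([1.8, 2.1, 2.4])              # observed data (made up)
m = torch.zeros(1, requires_grad=True)         # variational mean
log_s = torch.zeros(1, requires_grad=True)     # variational log-std
opt = torch.optim.Adam([m, log_s], lr=0.05)

for step in range(1000):
    q = dist.Normal(m, log_s.exp())
    theta = q.rsample((64,))                   # reparameterized samples from q

    # Influence function at the samples: log q(theta) - log p(x, theta),
    # with prior p(theta) = N(0, 1) and likelihood p(x | theta) = N(theta, 1).
    log_q = q.log_prob(theta).sum(-1)
    log_p = dist.Normal(0.0, 1.0).log_prob(theta).sum(-1) \
          + dist.Normal(theta, 1.0).log_prob(x).sum(-1)
    psi = log_q - log_p

    # PFD step: move q so that E_q[psi] (the negative ELBO) decreases.
    loss = psi.mean()
    opt.zero_grad(); loss.backward(); opt.step()

# For this conjugate model the exact posterior is N(mean ~ 1.575, std ~ 0.5),
# which the fitted q should approach.
print(m.item(), log_s.exp().item())
```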
Reinforcement learning
J_RL(π) = 𝔼_π[∑_t γ^t R_t]

Probability functional descent becomes:
1. Approximate the advantage Q^π(s, a) − V^π(s), the influence function for J_RL
2. Update the policy π (sketched below)

PFD recovers:
● Policy gradient
● Actor-critic
● Dual actor-critic
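A small tabular actor-critic sketch on a made-up chain MDP, showing the critic estimating the advantage (the influence function of J_RL) and the policy being updated in the ascent direction, since J_RL is maximized rather than minimized. The environment and step sizes are illustrative assumptions.

```python
import numpy as np

# Actor-critic as PFD on a toy 5-state chain MDP (not from the poster).
# The TD error approximates the advantage A(s, a) = Q(s, a) - V(s), which the
# poster identifies with the influence function of J_RL; the policy moves toward
# actions with higher estimated advantage.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.9
logits = np.zeros((n_states, n_actions))   # tabular softmax policy
V = np.zeros(n_states)                     # tabular critic
lr_actor, lr_critic = 0.1, 0.1

def env_step(s, a):
    """Chain dynamics: action 0 moves left, action 1 moves right; reward 1 at the end."""
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r, s2 == n_states - 1

for episode in range(500):
    s = 0
    for t in range(50):
        probs = np.exp(logits[s]); probs /= probs.sum()
        a = rng.choice(n_actions, p=probs)
        s2, r, done = env_step(s, a)

        # Step 1: approximate the advantage Q(s, a) - V(s) with the TD error.
        target = r + (0.0 if done else gamma * V[s2])
        advantage = target - V[s]
        V[s] += lr_critic * advantage

        # Step 2: policy-gradient update (ascent, because J_RL is maximized).
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0
        logits[s] += lr_actor * advantage * grad_log_pi

        s = s2
        if done:
            break
```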
Probability functional descent is a unifying perspective that enables the easy development of new algorithms.