Relaxing Bijectivity Constraints with Continuously Indexed Normalising Flows. ICML 2020. Rob Cornish, Anthony Caterini, George Deligiannidis, Arnaud Doucet. University of Oxford. July 12-18, 2020.


  1. Relaxing Bijectivity Constraints with Continuously Indexed Normalising Flows. ICML 2020. Rob Cornish, Anthony Caterini, George Deligiannidis, Arnaud Doucet. University of Oxford. July 12-18, 2020.

  2. Motivation. The following densities were learned using a Gaussian prior with a 10-layer Residual Flow [Chen et al., 2019] (0.5M parameters) trained to convergence. Figure 1: Darker regions indicate lower density. Data shown in black.

  3. Why Does This Occur? Normalising flows (NFs) define the following process: $Z \sim P_Z$, $X := f(Z)$, where $f$ is a diffeomorphism. Hence the support of $X$ shares the topological properties of the support of $Z$, such as the number of connected components, the number of “holes”, and how they are “knotted”.
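
The process on this slide can be sketched in a few lines. This is a minimal illustration rather than anything from the talk: it assumes a hypothetical one-dimensional affine flow `f(z) = A * z + B` and a standard Gaussian prior, and evaluates the density of X with the standard change-of-variables formula.

```python
import math
import random


def standard_normal_logpdf(z):
    """Log-density of N(0, 1) at z."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


# Hypothetical flow parameters (assumption, not from the slides); A must be non-zero.
A, B = 2.0, 1.0


def f(z):
    """Toy diffeomorphism of the real line."""
    return A * z + B


def f_inverse(x):
    return (x - B) / A


def sample_x(rng):
    """Z ~ P_Z, X := f(Z)."""
    z = rng.gauss(0.0, 1.0)
    return f(z)


def log_density_x(x):
    """Change of variables: log p_X(x) = log p_Z(f^{-1}(x)) - log |f'(f^{-1}(x))|."""
    return standard_normal_logpdf(f_inverse(x)) - math.log(abs(A))
```

Because `f` is a bijection of the real line, the support of X here is again the whole real line, the same (connected) support as the prior.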

  5. Problem. This suggests a problem when the support of the prior $P_Z$ is simple (e.g. a Gaussian): we usually can’t then reproduce the target exactly. Moreover, to approximate the target closely, our flow must approach non-invertibility.

  7. Our Proposal: Continuously Indexed Flows. Continuously indexed flows (CIFs) instead use the process $Z \sim P_Z$, $U \mid Z \sim P_{U|Z}(\cdot \mid Z)$, $X := F(Z; U)$, where $U$ is a continuous index variable, and each $F(\cdot\,; u)$ is a normalising flow. Any existing normalising flow can be used to construct $F$. A continuous index means the density of $X$ is no longer tractable, but the model can be trained via a natural ELBO objective instead.

  10. Benefits. Intuitively, CIFs can “clean up” mass that would otherwise be misplaced by a single bijection. Figure 2: 10-layer Residual Flow (top) and Continuously-Indexed Residual Flow (bottom). Both use 0.5M parameters.

  11. Going Deeper. What happens when we model a complicated target using a normalising flow? Theorem: If the prior $Z$ has non-homeomorphic support to a target $X^\star$, then a sequence of flows $f_n(Z) \to X^\star$ in distribution only if $\max\{\operatorname{Lip} f_n, \operatorname{Lip} f_n^{-1}\} \to \infty$.

  13. Implications for Residual Flows. For residual flows [Chen et al., 2019], $\max\{\operatorname{Lip} f_n, \operatorname{Lip} f_n^{-1}\} \le \max\{1 + \kappa, (1 - \kappa)^{-1}\}^L < \infty$, where $\kappa \in (0, 1)$ is fixed and $L$ is the number of layers. Hence, when the supports of $Z$ and $X^\star$ are not homeomorphic, the previous theorem guarantees we cannot have $f_n(Z) \to X^\star$ in distribution, regardless of training time, neural network size, etc.
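
To make the bound on this slide concrete, the small helper below evaluates $\max\{1 + \kappa, (1 - \kappa)^{-1}\}^L$. The function name and the values of `kappa` and `num_layers` used afterwards are illustrative assumptions, not settings reported in the talk.

```python
def residual_flow_lipschitz_bound(kappa, num_layers):
    """Upper bound max{1 + kappa, 1 / (1 - kappa)} ** L from the slide."""
    if not 0.0 < kappa < 1.0:
        raise ValueError("kappa must lie strictly between 0 and 1")
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    per_layer = max(1.0 + kappa, 1.0 / (1.0 - kappa))
    return per_layer ** num_layers
```

For instance, with the illustrative values `kappa = 0.5` and `num_layers = 10`, the bound is 2^10 = 1024: large, but finite, so the Lipschitz constants cannot diverge as the theorem requires. Since (1 + κ)(1 − κ) = 1 − κ² < 1, the term 1 / (1 − κ) is always the larger of the two.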

  14. Implications for Other Flows. For most other flows, $\max\{\operatorname{Lip} f_n, \operatorname{Lip} f_n^{-1}\}$ is unconstrained [Behrmann et al., 2020]. However, we can still only have $f_n(Z) = X^\star$ exactly if the supports of $Z$ and $X^\star$ are homeomorphic. It seems reasonable to hope for better performance if we can generalise our model class so that $f_n(Z) = X^\star$ is at least possible.

  16. Continuously Indexed Flows. Recap: Continuously indexed flows (CIFs) use the process $Z \sim P_Z$, $U \mid Z \sim P_{U|Z}(\cdot \mid Z)$, $X := F(Z; U)$, where $U$ is a continuous index variable, and each $F(\cdot\,; u)$ is a normalising flow. This is compatible with all existing normalising flows: take $F(z; u) = f\left(e^{-s(u)} \odot z - t(u)\right)$, where $f$ is a standard flow.
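
The construction $F(z; u) = f(e^{-s(u)} \odot z - t(u))$ can be sketched as follows. Everything concrete here is an assumption made for illustration: `s` and `t` are toy stand-ins for what would normally be neural networks, the base flow `f` is an elementwise `sinh`, and `u` is taken to have the same dimension as `z`. The sketch only shows that, for each fixed `u`, the map is invertible in `z`.

```python
import math


def s(u):
    """Hypothetical log-scale map (a neural network in practice)."""
    return [math.tanh(ui) for ui in u]


def t(u):
    """Hypothetical shift map (a neural network in practice)."""
    return [0.5 * ui for ui in u]


def f(y):
    """Hypothetical base flow: elementwise sinh, a bijection of the real line."""
    return [math.sinh(yi) for yi in y]


def f_inverse(x):
    return [math.asinh(xi) for xi in x]


def F(z, u):
    """F(z; u) = f(exp(-s(u)) * z - t(u)), with elementwise products."""
    return f([math.exp(-si) * zi - ti for si, zi, ti in zip(s(u), z, t(u))])


def F_inverse(x, u):
    """Inverse of F(.; u) for fixed u: z = exp(s(u)) * (f^{-1}(x) + t(u))."""
    return [math.exp(si) * (yi + ti) for si, yi, ti in zip(s(u), f_inverse(x), t(u))]
```

For any fixed index, `F_inverse(F(z, u), u)` recovers `z` up to floating-point error, which is the property that lets each $F(\cdot\,; u)$ act as a normalising flow.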

  17. Multi-layer CIFs. An $L$-layer CIF is obtained by $Z_0 \sim P_{Z_0}$, $U_1 \sim P_{U_1|Z_0}(\cdot \mid Z_0)$, $Z_1 = F_1(Z_0; U_1)$, $\cdots$, $U_L \sim P_{U_L|Z_{L-1}}(\cdot \mid Z_{L-1})$, $X = F_L(Z_{L-1}; U_L)$. Figure 3: Graphical multi-layer CIF generative model.
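
Ancestral sampling from the multi-layer process above can be sketched as a simple loop. This is a one-dimensional toy under stated assumptions: the standard Gaussian prior, the conditional index distribution in `sample_index`, the maps inside `layer`, and the coefficients in `SHIFT_COEFFS` are all hypothetical choices, not the architecture used in the talk.

```python
import math
import random

# Hypothetical per-layer shift coefficients; a real model would learn s_l and t_l.
SHIFT_COEFFS = [0.5, -0.3, 0.8]


def base_flow(y):
    """Hypothetical base flow f(y) = y + 0.5 * tanh(y), a bijection of the real line."""
    return y + 0.5 * math.tanh(y)


def layer(z, u, shift_coeff):
    """F_l(z; u) = f(exp(-s(u)) * z - t(u)) with toy s(u) = tanh(u), t(u) = shift_coeff * u."""
    return base_flow(math.exp(-math.tanh(u)) * z - shift_coeff * u)


def sample_index(z_prev, rng):
    """U_l | Z_{l-1} ~ N(tanh(z_prev), 1): a hypothetical conditional index distribution."""
    return rng.gauss(math.tanh(z_prev), 1.0)


def sample_multilayer_cif(rng, shift_coeffs=SHIFT_COEFFS):
    """Draw X and the indices U_1, ..., U_L by following the generative process."""
    z = rng.gauss(0.0, 1.0)  # Z_0 ~ P_{Z_0}, assumed standard Gaussian
    indices = []
    for coeff in shift_coeffs:
        u = sample_index(z, rng)  # U_l ~ P_{U_l | Z_{l-1}}(. | Z_{l-1})
        z = layer(z, u, coeff)    # Z_l = F_l(Z_{l-1}; U_l)
        indices.append(u)
    return z, indices  # X = Z_L
```

Calling `sample_multilayer_cif(random.Random(0))` returns one sample of X together with its three index variables; passing a seeded generator keeps the draw reproducible.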

  18. Training and inference. The marginal $p_X$ is intractable, but the joint $p_{X, U_{1:L}}$ has a closed form. Given an inference model $q_{U_{1:L}|X}$, we can use the ELBO for training: $\log p_X(x) \ge \mathbb{E}_{u_{1:L} \sim q_{U_{1:L}|X}(\cdot \mid x)}\left[\log \frac{p_{X, U_{1:L}}(x, u_{1:L})}{q_{U_{1:L}|X}(u_{1:L} \mid x)}\right]$. At test time, we can estimate $\log p_X(x)$ to arbitrary precision using an $m$-sample IWAE estimate with $m \gg 1$.
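
The estimator on this slide can be illustrated on a toy single-layer model where the answer is known. The model is an assumption chosen for checkability, not one from the talk: Z ~ N(0, 1), U | Z ~ N(0, 1), and F(z; u) = z − u, so that X ~ N(0, 2) and the exact posterior of U given X = x is N(−x/2, 1/2). The inference model is a Gaussian centred at −x/2 with a free standard deviation `q_std`.

```python
import math
import random


def normal_logpdf(value, mean, std):
    """Log-density of N(mean, std**2) at value."""
    return -0.5 * ((value - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def log_joint(x, u):
    """log p_{X,U}(x, u) for the toy model; closed form via change of variables."""
    z = x + u  # F^{-1}(x; u); |dz/dx| = 1, so there is no Jacobian term
    return normal_logpdf(z, 0.0, 1.0) + normal_logpdf(u, 0.0, 1.0)


def iwae_log_px(x, m, q_std, rng):
    """m-sample IWAE estimate of log p_X(x); m = 1 is a single-sample ELBO estimate."""
    q_mean = -0.5 * x
    log_weights = []
    for _ in range(m):
        u = rng.gauss(q_mean, q_std)  # u ~ q_{U|X}(. | x)
        log_weights.append(log_joint(x, u) - normal_logpdf(u, q_mean, q_std))
    top = max(log_weights)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(lw - top) for lw in log_weights) / m)
```

With `q_std = math.sqrt(0.5)` the inference model equals the exact posterior and every importance weight equals p_X(x), so the estimate matches `normal_logpdf(x, 0.0, math.sqrt(2.0))` for any `m`. With a mismatched `q_std`, the estimate is a stochastic lower bound in expectation that tightens as `m` grows.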
