Latent Normalizing Flows for Discrete Sequences


Zachary M. Ziegler, Alexander M. Rush
School of Engineering and Applied Sciences, Harvard University
Poster #3 @ Pacific Ballroom


  1. Latent Normalizing Flows for Discrete Sequences
     Zachary M. Ziegler, Alexander M. Rush
     School of Engineering and Applied Sciences, Harvard University
     Poster #3 @ Pacific Ballroom

  2. Motivation: Normalizing flows
     For invertible $f_\theta : \epsilon \to Z$ and base density $p_\epsilon(\epsilon)$,

     $$p_Z(z) = p_\epsilon\left(f_\theta^{-1}(z)\right) \left| \det \frac{\partial f_\theta^{-1}(z)}{\partial z} \right|$$

     Flows generalize autoregressive models for continuous data, allowing increased model flexibility and non-autoregressive generation.
     Kingma and Dhariwal 2018, van den Oord et al. 2017, Rezende and Mohamed 2015
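The change-of-variables formula on this slide can be checked numerically on the simplest possible case. The sketch below assumes a one-dimensional affine flow with a standard normal base density; the function names and the affine choice are illustrative and do not come from the slides.

```python
import math

def base_log_density(eps):
    """Log density of the assumed standard normal base distribution p_eps."""
    return -0.5 * (eps * eps + math.log(2.0 * math.pi))

def flow_log_density(z, shift, scale):
    """log p_Z(z) for the illustrative affine flow z = f(eps) = shift + scale * eps."""
    eps = (z - shift) / scale            # f^{-1}(z)
    log_abs_det = -math.log(abs(scale))  # log |d f^{-1}(z) / d z|
    return base_log_density(eps) + log_abs_det

def normal_log_density(z, mean, std):
    """Closed-form log density of N(mean, std^2), used only as a cross-check."""
    return -0.5 * ((z - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)

# Pushing N(0, 1) through z = 0.5 + 2 * eps should give N(0.5, 2^2).
via_flow = flow_log_density(1.3, shift=0.5, scale=2.0)
closed_form = normal_log_density(1.3, mean=0.5, std=2.0)
```

Both values agree because an affine map of a Gaussian is again Gaussian; richer flows keep the same two-term structure (base log density plus log-determinant) with a more expressive $f_\theta$.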

  3. Goal: Flows for discrete data
     For discrete sequences, MLE autoregressive (AR) models are ubiquitous.
     Can flows go beyond AR models for discrete sequences?
     [Figure: OpenNMT]

  4. Challenges and approach
     (1) Discrete change of variables poses theoretical and practical challenges compared to continuous change of variables.
     Latent variable model with $x \in V^T$ and $z \in \mathbb{R}^{T \times H}$, where the prior $p(z_{1:T})$ captures the dynamics of the discrete data over time.
     Key: weak, conditionally independent emission model.
     VAE for inference, optimize ELBO.
     [Figure: $\epsilon_{1:T}$ → flow prior → $z_{1:T}$ → $x_1, x_2, \ldots, x_T$]
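A minimal sketch of how the pieces on this slide fit together, under loud assumptions: the emission is a hypothetical per-step linear softmax layer, the approximate posterior is a diagonal Gaussian, and a standard normal stands in for the flow prior $p(z_{1:T})$. None of these specific choices is stated on the slides. The point is only that $\log p(x \mid z) = \sum_t \log p(x_t \mid z_t)$ factorizes over time, so all temporal structure has to come from the prior.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def emission_log_prob(x, z, W, b):
    """log p(x | z) = sum_t log p(x_t | z_t): each token sees only its own latent."""
    log_probs = log_softmax(z @ W + b)             # shape (T, V)
    return log_probs[np.arange(len(x)), x].sum()

def diag_gaussian_log_prob(z, mean, log_std):
    """Log density of a diagonal Gaussian, summed over all entries of z."""
    return np.sum(-0.5 * ((z - mean) / np.exp(log_std)) ** 2
                  - log_std - 0.5 * np.log(2.0 * np.pi))

def elbo_estimate(x, W, b, q_mean, q_log_std, prior_log_prob, rng):
    """Single-sample ELBO estimate: reconstruction term minus a sampled KL term."""
    eps = rng.standard_normal(q_mean.shape)
    z = q_mean + np.exp(q_log_std) * eps           # reparameterized sample from q(z | x)
    reconstruction = emission_log_prob(x, z, W, b)
    kl_sample = diag_gaussian_log_prob(z, q_mean, q_log_std) - prior_log_prob(z)
    return reconstruction - kl_sample, reconstruction, kl_sample

# Toy sizes: T = 4 time steps, H = 3 latent dimensions, vocabulary size V = 5.
rng = np.random.default_rng(0)
T, H, V = 4, 3, 5
x = np.array([1, 0, 4, 2])
W, b = rng.standard_normal((H, V)), np.zeros(V)
q_mean, q_log_std = rng.standard_normal((T, H)), np.full((T, H), -1.0)

def stand_in_prior(z):
    """Placeholder for the flow prior: a standard normal over all latents."""
    return diag_gaussian_log_prob(z, 0.0, 0.0)

elbo, reconstruction, kl_sample = elbo_estimate(x, W, b, q_mean, q_log_std, stand_in_prior, rng)
```

In the model described on the slide, `stand_in_prior` would be replaced by the log density of a flow over $z_{1:T}$; the emission and the ELBO bookkeeping would keep this shape.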

  5. Challenges and approach
     (1) Discrete change of variables poses theoretical and practical challenges compared to continuous.
     (2) Discrete data is inherently highly multimodal.
     Specialized flows for multimodal sequences: model dependencies across dimension and across time.
     [Figure: three flow layouts mapping $\epsilon_1, \epsilon_2, \epsilon_3$ to $z_1, z_2, z_3$: Autoregressive (←), Autoregressive in time (←), Non-autoregressive (→)]
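To make the autoregressive-in-time layout concrete, here is a rough sketch of an affine flow whose shift and scale at step $t$ depend only on $z_{t-1}$. The `conditioner` is a hypothetical fixed function standing in for a learned network, and the affine form is a simplification chosen for illustration. The sketch shows the usual trade-off for this direction: generating $z$ from $\epsilon$ is sequential, while recovering $\epsilon$ from $z$ (needed for density evaluation) handles all time steps at once.

```python
import numpy as np

def conditioner(z_prev):
    """Hypothetical stand-in for a learned network: z_{t-1} -> (shift, log_scale)."""
    return 0.5 * z_prev, 0.1 * np.tanh(z_prev)

def sample(eps):
    """eps -> z, one step at a time: z_t = shift_t + exp(log_scale_t) * eps_t."""
    z = np.zeros_like(eps)
    z_prev = np.zeros(eps.shape[1])          # z_0 is fixed to zero
    for t in range(eps.shape[0]):
        shift, log_scale = conditioner(z_prev)
        z[t] = shift + np.exp(log_scale) * eps[t]
        z_prev = z[t]
    return z

def invert(z):
    """z -> eps for all time steps at once, plus log |det d eps / d z|."""
    z_prev = np.vstack([np.zeros((1, z.shape[1])), z[:-1]])   # shift z by one step
    shift, log_scale = conditioner(z_prev)
    eps = (z - shift) * np.exp(-log_scale)
    return eps, -log_scale.sum()             # Jacobian is triangular in time

rng = np.random.default_rng(0)
eps = rng.standard_normal((6, 2))            # T = 6 steps, H = 2 latent dimensions
z = sample(eps)
eps_recovered, log_det = invert(z)
```

A non-autoregressive variant would instead condition the transform on $\epsilon$ rather than on $z$, which makes generation parallel and inversion sequential.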

  6. Challenges and approach
     (1) Discrete change of variables poses theoretical and practical challenges compared to continuous.
     (2) Discrete data is inherently highly multimodal.
     Specialized flows for multimodal sequences: model dependencies across dimension and across time.
     Replace underlying affine transformation with non-linear transformation.
     [Figure: three panels, "Learned Transform", "Initial and Final Densities", "Example Distribution"]

  7. Experiments: Character-level LM, PTB

     Model                          Test NLL   Reconst.   KL
     LSTM                           1.41       -          -
     Independent-across-time flow   2.90       0.15       2.77
     Autoregressive (←)             1.42       0.10       1.37
     Autoregressive in time (←)     1.46       0.10       1.43
     Non-autoregressive (→)         1.63       0.21       1.55

     The KL term always makes up > 90% of the loss, indicating that the continuous flow models the vast majority of the uncertainty.
     Additional experiments on polyphonic music generation.

  8. Conclusions
     A latent variable model for discrete sequences that models the discrete dynamics in a continuous latent space with continuous flows.
     See the poster (Poster #3 @ Pacific Ballroom) for details of the approach, more experimental results, and a generation speed comparison.
