Sorting Out Lipschitz Function Approximation

Sorting Out Lipschitz Function Approximation. Cem Anil*, James Lucas*, Roger Grosse. Pacific Ballroom, Poster #15 (6:30 – 9:00 PM). *Equal contribution. Goal: Train neural networks subject to a strict Lipschitz constraint while maintaining expressive power.


  1. Sorting Out Lipschitz Function Approximation. Cem Anil*, James Lucas*, Roger Grosse. Pacific Ballroom, Poster #15 (6:30 – 9:00 PM). *Equal contribution

  2. Goal: Train neural networks subject to a strict Lipschitz constraint while maintaining expressive power.

  3. Goal: Train neural networks subject to a strict Lipschitz constraint while maintaining expressive power. [Equation labels: Norm of Output Change ≤ Lipschitz Constant × Norm of Input Change.]

  4. Goal: Train neural networks subject to a strict Lipschitz constraint while maintaining expressive power. [Equation labels: Norm of Output Change ≤ Lipschitz Constant × Norm of Input Change; Gradient Norm ≤ Lipschitz Constant.]
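
The two labelled inequalities on this slide correspond to the standard definition of a K-Lipschitz function, written out here as a reminder. The symbols f, x_1, x_2 and K, and the use of the dual norm for the gradient, are notation assumed for illustration rather than taken from the slides.

```latex
% Lipschitz condition: output change is bounded by K times input change.
\[
  \| f(x_1) - f(x_2) \| \;\le\; K \, \| x_1 - x_2 \|
  \qquad \text{for all } x_1, x_2 .
\]
% Equivalent gradient form for scalar-valued f, wherever f is differentiable
% (the gradient is measured in the dual norm \|\cdot\|_* of the input norm).
\[
  \| \nabla f(x) \|_{*} \;\le\; K .
\]
```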

  6. Why Care? • Provable Adversarial Robustness (Cisse et al., 2018) • Wasserstein Distance Estimation (Arjovsky et al., 2017) • Training Generative Models (Arjovsky et al., 2017; Behrmann et al., 2019) • Computing Generalization Bounds (Bartlett et al., 1998, 2017) • Stabilizing Neural Net Training (Xiao et al., 2018; Odena et al., 2018) • ...

  7. Lipschitz via Architectural Constraints. Design an architecture that is: • Expressive Enough: can approximate any K-Lipschitz function (universality). • Constrained Enough: never violates a prescribed K-Lipschitz constraint. Together these give Universal Lipschitz Function Approximation.

  8. Lipschitz via Architectural Constraints. Design an architecture that is: • Expressive Enough: can approximate any K-Lipschitz function (universality). • Constrained Enough: never violates a prescribed K-Lipschitz constraint. Together these give Universal Lipschitz Function Approximation. Main Contributions: Propose an expressive Lipschitz-constrained architecture that • Overcomes a previously unidentified limitation in prior art. • Can recover Universal Lipschitz function approximation.

  9. Lipschitz via Architectural Constraints. Design an architecture that is: • Expressive Enough: can approximate any K-Lipschitz function (universality). • Constrained Enough: never violates a prescribed K-Lipschitz constraint. Together these give Universal Lipschitz Function Approximation. Main Contributions: Propose an expressive Lipschitz-constrained architecture that • Overcomes a previously unidentified limitation in prior art. • Can recover Universal Lipschitz function approximation. Apply this architecture to • Train classifiers provably robust to adversarial perturbations. • Obtain tight estimates of Wasserstein distance.

  10. Lipschitz via Architectural Constraints • Compose Lipschitz linear layers and Lipschitz activations. [Diagram: Lipschitz Network mapping x to y through alternating Lipschitz Linear and Lipschitz Activation layers.]

  11. Lipschitz via Architectural Constraints • Compose Lipschitz linear layers and Lipschitz activations. [Diagram: 1-Lipschitz Network: x → 1-Lipschitz Linear → 1-Lipschitz Activation → 1-Lipschitz Linear → 1-Lipschitz Activation → … → 1-Lipschitz Activation → 1-Lipschitz Linear → y.]
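
The composition idea on this slide can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the construction from the paper: the layers are rescaled so that their induced L∞ norm (largest absolute row sum) is at most 1, ReLU stands in for a generic 1-Lipschitz activation, and all function names and weight values are hypothetical.

```python
def linf_operator_norm(W):
    """Induced L-infinity norm of a matrix: the largest absolute row sum."""
    return max(sum(abs(w) for w in row) for row in W)

def rescale_to_unit_norm(W):
    """Hypothetical helper: shrink W so its induced L-infinity norm is at most 1."""
    scale = max(1.0, linf_operator_norm(W))
    return [[w / scale for w in row] for row in W]

def linear(W, x):
    """Matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(x):
    """Elementwise ReLU, which is 1-Lipschitz."""
    return [max(0.0, v) for v in x]

def network(layers, x):
    """Alternate linear layers and ReLU; no activation after the last layer."""
    for i, W in enumerate(layers):
        x = linear(W, x)
        if i < len(layers) - 1:
            x = relu(x)
    return x

def linf_dist(a, b):
    """L-infinity distance between two vectors."""
    return max(abs(u - v) for u, v in zip(a, b))

# Arbitrary illustrative weights (assumed, not from the slides).
raw_layers = [
    [[2.0, -1.0], [0.5, 3.0], [1.0, 1.0]],
    [[1.0, -2.0, 0.5], [0.0, 1.0, 1.0]],
    [[1.5, -0.5]],
]
layers = [rescale_to_unit_norm(W) for W in raw_layers]

x1, x2 = [0.3, -1.2], [1.1, 0.4]
ratio = linf_dist(network(layers, x1), network(layers, x2)) / linf_dist(x1, x2)
print(ratio)  # output change / input change; at most 1 for any pair of inputs
```

Because every layer is 1-Lipschitz in the same norm, the Lipschitz constants multiply to at most 1, so the bound holds for the whole network regardless of depth.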

  12. Lipschitz via Architectural Constraints. First thing to try: approximate the absolute value function. [Diagram: 1-Lipschitz Network mapping x to y: three 1-Lipschitz Linear layers separated by two 1-Lipschitz Activations.]

  13. Lipschitz via Architectural Constraints. First thing to try: approximate the absolute value function. [Diagram: 1-Lipschitz Network mapping x to y: three 1-Lipschitz Linear layers separated by two tanh activations.]

  15. Lipschitz via Architectural Constraints. First thing to try: approximate the absolute value function. [Diagram: 1-Lipschitz Network mapping x to y: three 1-Lipschitz Linear layers separated by two ReLU activations.]

  16. Lipschitz via Architectural Constraints. What went wrong? [Diagram: 1-Lipschitz Network mapping x to y: three 1-Lipschitz Linear layers separated by two ReLU activations.]

  17. Lipschitz via Architectural Constraints • Diagnosing the issue: Inspect gradient norms! [Diagram: 1-Lipschitz Network mapping x to y: three 1-Lipschitz Linear layers separated by two ReLU activations.] [Plot: Gradient Norms of Output wrt. Activations; y-axis: Norm of Gradients (reference level 1); x-axis: positions in the network (input, after W1, after W2, after each ReLU, output).]

  20. Lipschitz via Architectural Constraints • Diagnosing the issue: Inspect gradient norms! [Diagram: 1-Lipschitz Network mapping x to y: three 1-Lipschitz Linear layers separated by two ReLU activations.] [Plot: Gradient Norms of Output wrt. Activations; y-axis: Norm of Gradients (reference level 1); x-axis: positions in the network (input, after W1, after W2, after each ReLU, output).] Problem: Architecture is losing gradient norm!
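
The loss of gradient norm can be seen on a toy case. The sketch below is an illustration under stated assumptions, not the experiment from the slides: it takes the textbook two-unit ReLU construction |x| = relu(x) + relu(-x), rescales each layer to unit Euclidean (spectral) norm, and measures the resulting slope. All names are hypothetical, and this shows only what happens to one natural construction; it is not a proof about every ReLU network.

```python
import math

def relu(v):
    return max(0.0, v)

def vector_norm(w):
    """For a single row or column, the spectral norm equals the Euclidean norm."""
    return math.sqrt(sum(c * c for c in w))

def relu_net(x, w1, w2):
    """Two-unit ReLU network on a scalar input: y = w2 . relu(w1 * x)."""
    return sum(b * relu(a * x) for a, b in zip(w1, w2))

# Unconstrained construction of |x|: both layers have norm sqrt(2) > 1.
w1, w2 = [1.0, -1.0], [1.0, 1.0]
print(vector_norm(w1), vector_norm(w2))

# Enforce the 1-Lipschitz constraint by rescaling each layer to unit norm.
w1_c = [c / vector_norm(w1) for c in w1]
w2_c = [c / vector_norm(w2) for c in w2]

def constrained(x):
    return relu_net(x, w1_c, w2_c)

def slope(f, x, h=1e-6):
    """Central finite-difference estimate of the derivative."""
    return (f(x + h) - f(x - h)) / (2 * h)

print(constrained(2.0))              # approximately 1.0, i.e. |x| / 2
print(abs(slope(constrained, 2.0)))  # approximately 0.5, not 1
```

At any nonzero input only one of the two ReLU units is active, so half of the available gradient norm is discarded and the constrained network computes |x|/2 instead of |x|.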

  21. Solution: Gradient Norm Preservation

  22. Solution: Gradient Norm Preservation • Activation: GroupSort

  23. Solution: Gradient Norm Preservation • Activation: GroupSort • Nonlinear, continuous and differentiable almost everywhere. • Gradient Norm Preserving

  24. Solution: Gradient Norm Preservation • Activation: GroupSort • Nonlinear, continuous and differentiable almost everywhere. • Gradient Norm Preserving • Linear Transformation: Described in the paper.
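
A minimal sketch of the GroupSort activation, assuming it splits the pre-activations into consecutive groups and sorts each group. The descending sort order, the function names, and the absolute-value example with unit-norm weights are choices made here for illustration; the linear layers used in the paper are not reproduced.

```python
import math

def group_sort(x, group_size):
    """Sort entries within consecutive groups of `group_size` (descending)."""
    if len(x) % group_size != 0:
        raise ValueError("length of x must be divisible by group_size")
    out = []
    for i in range(0, len(x), group_size):
        out.extend(sorted(x[i:i + group_size], reverse=True))
    return out

def l2_norm(x):
    return math.sqrt(sum(v * v for v in x))

def groupsort_abs(x):
    """|x| built from two unit-norm linear maps around a group of size 2."""
    s = 1.0 / math.sqrt(2.0)
    pre = [s * x, -s * x]        # first layer: column (s, -s) has Euclidean norm 1
    h = group_sort(pre, 2)       # (max, min) = (s|x|, -s|x|)
    return s * h[0] - s * h[1]   # second layer: row (s, -s) has Euclidean norm 1

v = [1.0, 3.0, 2.0, 5.0]
print(group_sort(v, 2))                     # [3.0, 1.0, 5.0, 2.0]
print(l2_norm(v), l2_norm(group_sort(v, 2)))  # sorting only permutes, so norms match
print(groupsort_abs(-3.0))                  # approximately 3.0
```

Sorting only permutes the entries of each group, so away from ties its Jacobian is a permutation matrix and backpropagated gradients keep their norm. That is why the unit-norm network above reaches slope 1 on the absolute value function.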

  25. Gradient Norm Preservation => Expressive Power

  29. Universal Lipschitz Function Approximation • Norm-constrained GroupSort architectures can recover Universal Lipschitz Function Approximation! Subtleties and details are in the paper/poster.

  30. Wasserstein Distance Estimation • Much tighter estimates of Wasserstein distance • Training Wasserstein GANs (Arjovsky et al., 2017)
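
Why a Lipschitz constraint matters here can be sketched with the Kantorovich-Rubinstein dual form, in which the Wasserstein-1 distance is the largest value of E_P[f] - E_Q[f] over 1-Lipschitz critics f, so any 1-Lipschitz critic yields a lower bound. The example below is a toy illustration under stated assumptions: one-dimensional samples of equal size, hand-picked linear critics instead of trained networks, and hypothetical function names and data.

```python
def wasserstein1_1d(xs, ys):
    """Exact W1 between two equal-size 1-D empirical distributions."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def dual_estimate(critic, xs, ys):
    """Lower bound on W1 given a 1-Lipschitz critic: E_P[f] - E_Q[f]."""
    return sum(map(critic, xs)) / len(xs) - sum(map(critic, ys)) / len(ys)

# Toy data (assumed): q is p shifted right by 1.5.
p = [0.0, 1.0, 2.0, 3.0]
q = [1.5, 2.5, 3.5, 4.5]

exact = wasserstein1_1d(p, q)
weak = dual_estimate(lambda x: -0.5 * x, p, q)   # critic with slope magnitude 0.5
tight = dual_estimate(lambda x: -x, p, q)        # critic with slope magnitude 1

print(exact, weak, tight)  # 1.5 0.75 1.5
```

The critic with slope magnitude 0.5 satisfies the constraint but underestimates the distance by half, while the critic that uses its full gradient norm attains the exact value on this example.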

  31. Provable Adversarial Robustness • L∞-constrained GroupSort networks trained with a multi-class hinge loss give provable adversarial robustness with little loss in accuracy.
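
One way a Lipschitz bound turns into a robustness certificate is through the classification margin. The sketch below is a generic margin argument, not necessarily the exact procedure used in the paper, and it assumes each logit is K-Lipschitz with respect to the L∞ norm of the input: a perturbation of size ε then moves each logit by at most Kε, so the gap between two logits shrinks by at most 2Kε. Function names and the example logits are hypothetical.

```python
def certified_radius(logits, lipschitz_k):
    """Largest L-infinity radius within which the predicted class cannot change.

    Assumes every logit is lipschitz_k-Lipschitz w.r.t. the L-infinity input norm.
    """
    if len(logits) < 2:
        raise ValueError("need at least two classes")
    ordered = sorted(logits, reverse=True)
    margin = ordered[0] - ordered[1]       # gap between top class and runner-up
    return margin / (2.0 * lipschitz_k)

def is_certified(logits, lipschitz_k, eps):
    """True if no perturbation with L-infinity norm <= eps can flip the prediction."""
    return certified_radius(logits, lipschitz_k) > eps

example_logits = [2.0, 0.5, -1.0]          # illustrative values only
print(certified_radius(example_logits, 1.0))   # 0.75
print(is_certified(example_logits, 1.0, 0.1))  # True
print(is_certified(example_logits, 1.0, 1.0))  # False
```

A hinge-style loss that pushes the margin above a chosen threshold during training therefore directly enlarges the certified radius.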

  32. Main Contributions: Propose Lipschitz GroupSort Networks that • Buy us expressivity via gradient norm preservation. • Can recover Universal Lipschitz function approximation. Apply GroupSort Networks to • Train classifiers provably robust to adversarial perturbations. • Obtain tight estimates of Wasserstein distance.
