Sorting Out Lipschitz Function Approximation
Cem Anil*, James Lucas*, Roger Grosse
Pacific Ballroom Poster #15 (6:30 – 9:00 PM)
*Equal contribution
Goal
Train neural networks subject to a strict Lipschitz constraint while maintaining expressive power.
A function f is K-Lipschitz if, for all inputs x1, x2: ||f(x1) - f(x2)|| <= K ||x1 - x2|| (norm of output change <= Lipschitz constant × norm of input change). Equivalently, the gradient norm is bounded everywhere by the Lipschitz constant K.
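The Lipschitz condition can be checked empirically on sampled input pairs. A minimal sketch in pure Python for scalar inputs (the test function and sampling range are illustrative assumptions):

```python
import random

def lipschitz_ratio(f, x1, x2):
    # Empirical ratio |f(x1) - f(x2)| / |x1 - x2| for scalar inputs.
    return abs(f(x1) - f(x2)) / abs(x1 - x2)

f = abs  # the absolute value function is exactly 1-Lipschitz
random.seed(0)
ratios = [lipschitz_ratio(f, random.uniform(-1, 1), random.uniform(-1, 1))
          for _ in range(1000)]
assert max(ratios) <= 1.0  # the ratio never exceeds K = 1
```

For a 1-Lipschitz function the ratio approaches 1 only where the function attains its maximal slope; this observation is what makes the gradient-norm diagnosis later in the talk possible.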
Why Care?
• Provable adversarial robustness (Cisse et al., 2018)
• Wasserstein distance estimation (Arjovsky et al., 2017)
• Training generative models (Arjovsky et al., 2017; Behrmann et al., 2019)
• Computing generalization bounds (Bartlett et al., 1998, 2017)
• Stabilizing neural net training (Xiao et al., 2018; Odena et al., 2018)
• ...
Lipschitz via Architectural Constraints
Design an architecture that is:
• Expressive enough: approximates any K-Lipschitz function (universality).
• Constrained enough: never violates the prescribed K-Lipschitz constraint.
Together, these give universal Lipschitz function approximation.

Main Contributions
Propose an expressive Lipschitz-constrained architecture that
• Overcomes a previously unidentified limitation in prior art.
• Recovers universal Lipschitz function approximation.
Apply this architecture to
• Train classifiers provably robust to adversarial perturbations.
• Obtain tight estimates of Wasserstein distance.
Lipschitz via Architectural Constraints
• Compose 1-Lipschitz linear layers and 1-Lipschitz activations:
x → [1-Lipschitz Linear] → [1-Lipschitz Activation] → … → [1-Lipschitz Linear] → y
The composition is itself a 1-Lipschitz network.
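Why composition works: Lipschitz constants multiply under composition, so stacking 1-Lipschitz maps keeps the whole network 1-Lipschitz. A 1-D sketch (the scalar weights are illustrative assumptions; a real network would constrain matrix norms instead):

```python
import random

def relu(x):
    return max(0.0, x)

def net(x, weights=(0.8, -0.5, 0.9)):
    # Each |w| <= 1 makes the scalar linear map 1-Lipschitz; ReLU is
    # 1-Lipschitz too, so the whole composition is 1-Lipschitz.
    for w in weights[:-1]:
        x = relu(w * x)
    return weights[-1] * x

random.seed(0)
for _ in range(100):
    a, b = random.uniform(-2, 2), random.uniform(-2, 2)
    assert abs(net(a) - net(b)) <= abs(a - b) + 1e-12
```

The same argument works layer-by-layer in higher dimensions with norm-constrained weight matrices.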
Lipschitz via Architectural Constraints
First thing to try: approximate the absolute value function with a 1-Lipschitz network (1-Lipschitz linear layers with tanh or ReLU activations). Both variants fail to fit |x|.
Lipschitz via Architectural Constraints
What went wrong? Diagnosing the issue: inspect gradient norms!
[Figure: norm of the gradient of the output w.r.t. the activations at each stage (input, after W1, after ReLU, after W2, after ReLU, output). The norm starts at 1 at the output and shrinks stage by stage.]
Problem: the architecture is losing gradient norm!
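The loss of gradient norm can be reproduced by hand. A tiny sketch with 2-D orthogonal linear layers (orthogonal matrices are 1-Lipschitz and exactly norm-preserving), backpropagating a unit gradient and recording its norm after each stage; the rotation angles and input are illustrative assumptions:

```python
import math

def rot(theta):
    # 2x2 rotation matrix: orthogonal, so exactly norm-preserving.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matvec(W, v):
    return [W[0][0] * v[0] + W[0][1] * v[1],
            W[1][0] * v[0] + W[1][1] * v[1]]

def transpose(W):
    return [[W[0][0], W[1][0]], [W[0][1], W[1][1]]]

def grad_norms(x, thetas=(0.7, 1.3)):
    """Backprop a unit gradient and record its 2-norm after each stage."""
    h = matvec(rot(thetas[0]), x)                # forward through W1
    mask = [1.0 if hi > 0 else 0.0 for hi in h]  # ReLU derivative at h
    g = [1.0, 0.0]                               # unit gradient at the output
    norms = [math.hypot(*g)]                     # 1.0 at the output
    g = matvec(transpose(rot(thetas[1])), g)     # through W2: norm preserved
    norms.append(math.hypot(*g))
    g = [gi * mi for gi, mi in zip(g, mask)]     # through ReLU: can only shrink
    norms.append(math.hypot(*g))
    return norms

n_out, n_w2, n_relu = grad_norms([1.0, -2.0])
assert abs(n_out - 1.0) < 1e-9 and abs(n_w2 - 1.0) < 1e-9
assert n_relu < 1.0  # ReLU masked a coordinate: gradient norm is lost
```

Whenever a ReLU zeroes a coordinate carrying gradient, the norm drops below 1 and can never be recovered by the 1-Lipschitz layers above it, so the network cannot reach slope 1 where it needs to.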
Solution: Gradient Norm Preservation
• Activation: GroupSort
  • Nonlinear, continuous, and differentiable almost everywhere.
  • Gradient norm preserving.
• Linear transformation: described in the paper.
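GroupSort splits the pre-activations into groups and sorts within each group (group size 2 is often called MaxMin). A minimal sketch in pure Python (grouping by contiguous slices is an illustrative layout choice):

```python
def groupsort(x, group_size=2):
    # Sort each contiguous group. Sorting is a (data-dependent) permutation
    # of the coordinates, so it is nonlinear yet preserves the gradient
    # norm exactly wherever it is differentiable.
    assert len(x) % group_size == 0
    out = []
    for i in range(0, len(x), group_size):
        out.extend(sorted(x[i:i + group_size]))
    return out

print(groupsort([3.0, -1.0, 0.5, 2.0]))  # -> [-1.0, 3.0, 0.5, 2.0]

# The failed experiment from before becomes trivial: |x| = max(x, -x)
# is one linear layer followed by GroupSort.
def abs_via_groupsort(x):
    return groupsort([x, -x])[-1]

assert abs_via_groupsort(-3.5) == 3.5
```

Because sorting only permutes coordinates, the Jacobian is a permutation matrix almost everywhere, so backpropagated gradient norms pass through unchanged.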
Gradient Norm Preservation => Expressive Power
Universal Lipschitz Function Approximation
• Norm-constrained GroupSort architectures recover universal Lipschitz function approximation! Subtleties and details in the paper/poster.
Wasserstein Distance Estimation
• Much tighter estimates of Wasserstein distance.
• Useful for training Wasserstein GANs (Arjovsky et al., 2017).
Provable Adversarial Robustness
• ℓ∞-constrained GroupSort networks + multi-class hinge loss yield provable adversarial robustness with little loss in accuracy.
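Why a Lipschitz constraint plus a hinge loss yields certificates: each logit of a K-Lipschitz network moves by at most K·‖δ‖ under a perturbation δ, so the top-two logit gap closes at rate at most 2K. A sketch of the resulting certified radius (the logits and K are illustrative; this is the generic margin argument, not necessarily the paper's exact procedure):

```python
def certified_radius(logits, K=1.0):
    # A perturbation of norm r changes each logit by at most K * r, so the
    # prediction cannot flip while 2 * K * r < (top logit - runner-up).
    s = sorted(logits, reverse=True)
    return (s[0] - s[1]) / (2 * K)

assert certified_radius([3.0, 1.0, 0.5]) == 1.0
```

The hinge loss directly encourages a large top-two margin, which is exactly the quantity this certificate depends on.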
Main Contributions
Propose Lipschitz GroupSort networks that
• Gain expressivity via gradient norm preservation.
• Recover universal Lipschitz function approximation.
Apply GroupSort networks to
• Train classifiers provably robust to adversarial perturbations.
• Obtain tight estimates of Wasserstein distance.