
BRIDGEABLE ANR Chair in AI, Jean-Christophe Pesquet, Center for Visual Computing (PowerPoint PPT presentation)



  1. BRIDGEABLE ANR Chair in AI. Jean-Christophe Pesquet, Center for Visual Computing, OPIS Inria group, CentraleSupélec, University Paris-Saclay. DATAIA, September 2020.

  2. Motivation: BRIDinG thE gAp Between iterative proximaL methods and nEural networks.
[Portraits: Frank Rosenblatt (1928–1971) and Jean-Jacques Moreau (1923–2014)]

  3. Gradient descent
✓ Basic optimization problem: $\underset{x \in C}{\text{minimize}}\; \tfrac{1}{2}\|Hx - y\|^2$, where $C$ is a nonempty closed convex subset of $\mathbb{R}^N$, $y \in \mathbb{R}^M$, and $H \in \mathbb{R}^{M \times N}$.
✓ Projected gradient algorithm: $(\forall n \in \mathbb{N}\setminus\{0\})\quad x_n = \mathrm{proj}_C\big(x_{n-1} - \gamma_n H^\top (H x_{n-1} - y)\big)$, where $\gamma_n > 0$ is the step-size.

  4. Gradient descent
✓ Projected gradient algorithm: $(\forall n \in \mathbb{N}\setminus\{0\})\quad x_n = \mathrm{proj}_C\big(x_{n-1} - \gamma_n H^\top (H x_{n-1} - y)\big) = \mathrm{proj}_C\big(W_n x_{n-1} + \gamma_n H^\top y\big)$, where $\gamma_n > 0$ is the step-size and $W_n = \mathrm{Id} - \gamma_n H^\top H$.
[Diagram: the unrolled iterations $x_0 \to W_1\cdot + \gamma_1 H^\top y \to \mathrm{proj}_C \to \cdots \to W_m\cdot + \gamma_m H^\top y \to \mathrm{proj}_C \to x_m$, drawn as a layered network]
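
A minimal numerical sketch of the projected gradient iteration above (not from the slides; $H$, $y$, the step-size and the choice $C = [0,+\infty[^N$, whose projection is a simple clipping, are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    M, N = 20, 10
    H = rng.standard_normal((M, N))
    y = rng.standard_normal(M)

    # Illustrative constraint set C: the nonnegative orthant, so proj_C is clipping at 0.
    proj_C = lambda z: np.maximum(z, 0.0)

    gamma = 1.0 / np.linalg.norm(H, 2) ** 2     # constant step-size in ]0, 2/||H||^2[
    W = np.eye(N) - gamma * H.T @ H             # the "weight matrix" W_n = Id - gamma_n H^T H

    x = np.zeros(N)
    for n in range(200):
        # Two equivalent writings of the same update:
        # x = proj_C(x - gamma * H.T @ (H @ x - y))
        x = proj_C(W @ x + gamma * H.T @ y)     # affine step followed by proj_C, as in the diagram

    print("final value of 0.5*||Hx - y||^2:", 0.5 * np.linalg.norm(H @ x - y) ** 2)

Written this way, each iteration is literally an affine layer followed by the nonlinearity $\mathrm{proj}_C$, which is the correspondence the next slide exploits.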

  5. Feedforward NNs
[Diagram: $x \to W_1\cdot + b_1 \to R_1 \to \cdots \to W_m\cdot + b_m \to R_m \to Tx$]
NEURAL NETWORK MODEL: $T = T_m \circ \cdots \circ T_1$, with $T_i \colon \mathbb{R}^{N_{i-1}} \to \mathbb{R}^{N_i} \colon x \mapsto R_i(W_i x + b_i)$, where, for every $i \in \{1, \ldots, m\}$, $W_i \in \mathbb{R}^{N_i \times N_{i-1}}$ is a weight matrix, $b_i$ is a bias vector in $\mathbb{R}^{N_i}$, and $R_i \colon \mathbb{R}^{N_i} \to \mathbb{R}^{N_i}$ is an activation operator.
REMARK: $(W_i)_{1 \le i \le m}$ can be convolutive operators.
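
The same composition $T = T_m \circ \cdots \circ T_1$ spelled out in code; the widths, the ReLU activation and the random weights are illustrative assumptions rather than any specific network from the project:

    import numpy as np

    rng = np.random.default_rng(1)
    relu = lambda z: np.maximum(z, 0.0)          # one possible activation operator R_i

    def layer(W, b, R):
        """One layer T_i : x -> R_i(W_i x + b_i)."""
        return lambda x: R(W @ x + b)

    widths = [10, 32, 32, 5]                     # illustrative N_0, ..., N_m
    layers = [layer(rng.standard_normal((n_out, n_in)) / np.sqrt(n_in),   # W_i in R^{N_i x N_{i-1}}
                    rng.standard_normal(n_out),                           # b_i in R^{N_i}
                    relu)
              for n_in, n_out in zip(widths[:-1], widths[1:])]

    def T(x):
        """T = T_m o ... o T_1."""
        for T_i in layers:
            x = T_i(x)
        return x

    print(T(rng.standard_normal(widths[0])).shape)   # -> (5,)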

  6. Link
✓ Proximity operator [Moreau, 1962]: let $f \colon \mathbb{R}^N \to \,]-\infty, +\infty]$ be a lower-semicontinuous convex function. For every $x \in \mathbb{R}^N$, $\mathrm{prox}_f(x) = \underset{z \in \mathbb{R}^N}{\mathrm{argmin}}\; \tfrac{1}{2}\|z - x\|^2 + f(z)$. If $f$ is the indicator function of $C$, then $\mathrm{prox}_f = \mathrm{proj}_C$.
✓ Projected gradient algorithm → proximal gradient algorithm
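
To make the generalization concrete, here is a hedged sketch of the proximal gradient (forward-backward) iteration obtained by replacing $\mathrm{proj}_C$ with a general $\mathrm{prox}_f$; the choice $f = \lambda\|\cdot\|_1$, whose prox is the soft-thresholding operator, and the problem sizes are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(2)
    M, N = 20, 50
    H = rng.standard_normal((M, N))
    y = H @ (rng.standard_normal(N) * (rng.random(N) < 0.1))   # data generated from a sparse vector

    lam = 0.1
    # prox of f = lam*||.||_1 is soft-thresholding (classical closed form)
    prox_f = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t * lam, 0.0)

    gamma = 1.0 / np.linalg.norm(H, 2) ** 2
    x = np.zeros(N)
    for n in range(500):
        # gradient step on 0.5*||Hx - y||^2, then prox of gamma*f (instead of proj_C)
        x = prox_f(x - gamma * H.T @ (H @ x - y), gamma)

    print("nonzero entries in the recovered x:", int(np.count_nonzero(x)))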

  7. Link (continued)
✓ Most of the activation operators are proximity operators
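
Two classical instances of that statement, checked numerically below (the test points and the grid are arbitrary): ReLU is the proximity operator of the indicator of $[0,+\infty[$, i.e. the projection onto the nonnegative half-line, and soft-thresholding is the proximity operator of $\lambda|\cdot|$.

    import numpy as np

    def prox_1d(f, x, grid=np.linspace(-10.0, 10.0, 200001)):
        """Brute-force prox_f(x) = argmin_z 0.5*(z - x)^2 + f(z), scalar case, on a fine grid."""
        return grid[np.argmin(0.5 * (grid - x) ** 2 + f(grid))]

    lam = 0.7
    indicator_pos = lambda z: np.where(z >= 0, 0.0, np.inf)   # indicator of [0, +inf[
    l1_pen = lambda z: lam * np.abs(z)                        # f = lam * |.|

    for x in (-2.0, -0.3, 0.5, 3.0):
        relu = max(x, 0.0)
        soft = np.sign(x) * max(abs(x) - lam, 0.0)
        assert abs(prox_1d(indicator_pos, x) - relu) < 1e-3   # ReLU = prox of indicator = projection
        assert abs(prox_1d(l1_pen, x) - soft) < 1e-3          # soft-thresholding = prox of lam*|.|
    print("ReLU and soft-thresholding match their prox characterizations (numerically).")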

  8. Link (continued)
✓ Example of the squashing function used in capsnets:
$(\forall x \in \mathbb{R}^N)\quad Rx = \frac{\mu\,\|x\|}{1+\|x\|^2}\,x = \mathrm{prox}_{\varphi\circ\|\cdot\|}(x), \qquad \mu = \frac{8}{3\sqrt{3}},$
where
$\varphi \colon \xi \mapsto \begin{cases} \mu\arctan\dfrac{\sqrt{|\xi|(\mu-|\xi|)}}{\mu-|\xi|} - \sqrt{|\xi|(\mu-|\xi|)} - \dfrac{\xi^2}{2}, & \text{if } |\xi| < \mu;\\[4pt] \dfrac{\mu(\pi-\mu)}{2}, & \text{if } |\xi| = \mu;\\[4pt] +\infty, & \text{otherwise.} \end{cases}$
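
A quick numerical sanity check of this example (the test vector and the grid are arbitrary; the constant is the one on the slide): the rescaled squashing map keeps its output norm below $\mu$ and has radial slope at most $1$, a necessary condition for being a proximity operator; $\mu = 8/(3\sqrt{3})$ is exactly the scaling for which that maximal slope equals $1$.

    import numpy as np

    mu = 8.0 / (3.0 * np.sqrt(3.0))

    def R(x):
        """Rescaled capsnet squashing: R(x) = mu * ||x|| / (1 + ||x||^2) * x."""
        n = np.linalg.norm(x)
        return mu * n / (1.0 + n ** 2) * x

    print("||R(x)|| for x = (3, 4):", np.linalg.norm(R(np.array([3.0, 4.0]))), "< mu =", mu)

    # radial profile rho(r) = ||R(x)|| for r = ||x||: rho(r) = mu * r^2 / (1 + r^2)
    r = np.linspace(0.0, 20.0, 200001)
    rho = mu * r ** 2 / (1.0 + r ** 2)
    print("max radial slope:", np.gradient(rho, r).max())   # about 1, attained near r = 1/sqrt(3)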

  9. Link (continued)
✓ Difficulty

  10. Objective
BETTER UNDERSTANDING OF NEURAL NETWORKS
EXPLAINABILITY: under some assumptions, NNs are shown to solve variational inequalities [Combettes, Pesquet, 2020]

  11. Objective (continued)
ROBUSTNESS: sensitivity to adversarial perturbations [Szegedy et al., 2013]

  12. Robustness issues
✓ Certifiability requirement for NNs in safety-critical environments
✓ Deriving sharp Lipschitz constant estimates
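
For orientation only (illustrative layer sizes; this is the textbook baseline, not the sharp estimates the chair targets): with $1$-Lipschitz activations, $\mathrm{Lip}(T) \le \prod_i \|W_i\|_2$, and this product bound is usually very pessimistic, which is what motivates sharper certificates such as [Combettes, Pesquet, 2020].

    import numpy as np

    rng = np.random.default_rng(3)
    # illustrative weight matrices W_1, W_2, W_3 of a small network
    weights = [rng.standard_normal((64, 32)),
               rng.standard_normal((64, 64)),
               rng.standard_normal((1, 64))]

    # crude bound: Lip(T) <= prod_i ||W_i||_2 when every activation R_i is 1-Lipschitz
    crude = np.prod([np.linalg.norm(W, 2) for W in weights])
    print("product-of-spectral-norms bound:", crude)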

  13. Robustness issues
Example of a NN for Air Traffic Management developed by Thales (CIFRE PhD thesis of K. Gupta).
[Figure: "Lipschitz star" plot]

  14. Robustness issues
Example of Automatic Gesture Recognition based on surface Electromyographic signals (PhD thesis of A. Neacsu, in collaboration with the Politehnica University of Bucharest).
✓ standard training: accuracy = 99.78%, but Lipschitz constant > $10^{12}$

  15. Robustness issues (continued)
✓ proximal algorithm for training the network subject to a Lipschitz bound constraint:

   Accuracy              75%    80%    85%    90%    95%
   Lipschitz constant    0.36   0.46   0.82   2.68   3.38
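
The slide's method is a proximal training algorithm under a Lipschitz bound constraint; the sketch below is not that algorithm but a simpler, commonly used surrogate (all names and sizes are assumptions) that conveys the idea: rescale each weight matrix after every update so that the product of spectral norms, and hence the crude Lipschitz bound, stays below a prescribed target.

    import numpy as np

    def clip_spectral_norm(W, bound):
        """Rescale W so that ||W||_2 <= bound (simple surrogate for a spectral-norm constraint)."""
        s = np.linalg.norm(W, 2)
        return W if s <= bound else W * (bound / s)

    L_target, n_layers = 2.0, 3
    per_layer = L_target ** (1.0 / n_layers)     # enforce ||W_i||_2 <= L_target^(1/m) per layer

    rng = np.random.default_rng(4)
    # e.g. weights right after a gradient step, then projected back onto the constraint
    weights = [clip_spectral_norm(rng.standard_normal((16, 16)), per_layer) for _ in range(n_layers)]
    print("certified product bound:", np.prod([np.linalg.norm(W, 2) for W in weights]))   # <= 2.0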

  16. Workplan
✓ WP1: Design of robust networks (generalization of existing results, constrained training, ...)
✓ WP2: Proposal of new fixed-point strategies (link with plug-and-play methods, fixed-point training, ...)
✓ WP3: Proximal view of Deep Dictionary Learning (change of metrics, theoretical analysis, ...)

  17. Workplan (continued)
✓ September 2020 → August 2024

  18. Partners
✓ Industrial
• Schneider Electric (WP1)
• GE Healthcare (WP2)
• IFPEN (WP3)
• Additional collaborations with Thales and Essilor

  19. Partners (continued)
✓ Academic
• P. Combettes, NCSU (WP1)
• A. Repetti and Y. Wiaux, Heriot-Watt University (WP2)
• H. Krim, NCSU (WP3)
• M. Kaaniche, Univ. Sorbonne Paris Nord (WP3)

  20. Some references
• P. L. Combettes and J.-C. Pesquet, "Proximal splitting methods in signal processing," in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, H. H. Bauschke, R. Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, editors, Springer-Verlag, New York, pp. 185–212, 2011.
• C. Bertocchi, E. Chouzenoux, M.-C. Corbineau, J.-C. Pesquet, and M. Prato, "Deep unfolding of a proximal interior point method for image restoration," Inverse Problems, vol. 36, no. 3, art. 034005, Feb. 2020.
• P. L. Combettes and J.-C. Pesquet, "Lipschitz certificates for layered network structures driven by averaged activation operators," SIAM Journal on Mathematics of Data Science, vol. 2, no. 2, pp. 529–557, June 2020.
• P. L. Combettes and J.-C. Pesquet, "Deep neural network structures solving variational inequalities," Set-Valued and Variational Analysis, vol. 28, pp. 491–518, Sept. 2020.
• P. L. Combettes and J.-C. Pesquet, "Fixed point strategies in data science," https://arxiv.org/abs/2008.02260, 2020.
