Fast differentiable sorting and ranking
M. Blondel, O. Teboul, Q. Berthet, J. Djolonga
March 12th, 2020

Background Proposed method Experimental results

DL as Differentiable Programming
• Deep learning is increasingly synonymous with differentiable programming.
• “People are now building a new kind of software by assembling networks of parameterized functional blocks and by training them from examples using some form of gradient-based optimization. An increasingly large number of people are defining the networks procedurally in a data-dependent way (with loops and conditionals), allowing them to change dynamically as a function of the input data fed to them.” Yann LeCun, 2018
• Many computer programming operations remain poorly differentiable.
• In this work, we focus on sorting and ranking.

Sorting as subroutine in ML
Ranking / Sorting, $O(n \log n)$, is used in:
• Trimmed regression: ignore large errors
• k-NN: (1) select neighbours, (2) majority vote
• Classifiers: select top-k activations
• MoM estimators
• Learning to rank: NDCG loss and others
• Descriptive statistics (data viewed as ranks): empirical distribution function, rank-based statistics, quantile normalization
Slide credit: Marco Cuturi

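Sorting
[Figure: four values $\theta_1, \theta_2, \theta_3, \theta_4$ placed on a line]
Argsort (descending): $\sigma(\theta) = (2, 4, 3, 1)$
Sort (descending): $s(\theta) \triangleq \theta_{\sigma(\theta)} = (\theta_2, \theta_4, \theta_3, \theta_1)$
$s(\theta)$ is piecewise linear and induces non-convexity.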
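To make the notation on this slide concrete, here is a minimal Python sketch of descending argsort and sort. The numeric values of `theta` are an assumption chosen only so that the ordering matches the slide; the function names are illustrative, not from the talk.

```python
def argsort_desc(theta):
    """Return the 1-based indices that sort theta in descending order."""
    order = sorted(range(len(theta)), key=lambda i: -theta[i])
    return [i + 1 for i in order]


def sort_desc(theta):
    """Return theta permuted by its descending argsort: s = theta[sigma]."""
    return [theta[i - 1] for i in argsort_desc(theta)]


# Assumed values: theta_2 > theta_4 > theta_3 > theta_1, as on the slide.
theta = [0.0, 3.0, 1.0, 2.0]
sigma = argsort_desc(theta)  # [2, 4, 3, 1]
s = sort_desc(theta)         # [3.0, 2.0, 1.0, 0.0]
print(sigma, s)
```

As long as a perturbation of `theta` does not change `sigma`, `s` is just a fixed permutation of the input, which is the sense in which sorting is piecewise linear.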

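Ranking
[Figure: the same four values $\theta_1, \theta_2, \theta_3, \theta_4$ placed on a line]
Ranks: $r(\theta) \triangleq \sigma^{-1}(\theta) = (4, 1, 3, 2)$
$r(\theta)$ is discontinuous and piecewise constant.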
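The following sketch computes ranks as the inverse of the descending argsort permutation and illustrates the discontinuity. The numeric values are assumptions chosen to reproduce the ordering on the slide.

```python
def ranks(theta):
    """Rank 1 goes to the largest entry; ranks are the inverse of argsort."""
    order = sorted(range(len(theta)), key=lambda i: -theta[i])
    r = [0] * len(theta)
    for position, index in enumerate(order):
        r[index] = position + 1
    return r


print(ranks([0.0, 3.0, 1.0, 2.0]))   # [4, 1, 3, 2]

# Moving theta_3 just past theta_4 makes two ranks jump by one:
print(ranks([0.0, 3.0, 1.99, 2.0]))  # [4, 1, 3, 2]
print(ranks([0.0, 3.0, 2.01, 2.0]))  # [4, 1, 2, 3]
```

Between such jumps the output does not change at all, so the gradient of the ranks with respect to `theta` is zero wherever it is defined.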

Related work on soft ranks
Soft ranks: differentiable proxies to “hard” ranks
● Random perturbation technique to compute expected ranks in $O(n^3)$ time [Taylor et al., 2008]
● Using pairwise comparisons in $O(n^2)$ time [Qin et al., 2010]: $r_i(\theta) \triangleq 1 + \sum_{j \neq i} \mathbb{1}[\theta_i < \theta_j]$
● Regularized optimal transport approach and Sinkhorn in $O(T n^2)$ time [Cuturi et al., 2019]
None of these works achieves $O(n \log n)$ complexity.
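The pairwise formula can be evaluated directly, and one common way to smooth it is to replace the indicator with a logistic sigmoid. The sketch below does both; the sigmoid choice and the temperature parameter `tau` are assumptions for illustration, not details taken from the slide.

```python
import math


def hard_ranks_pairwise(theta):
    """r_i = 1 + number of j != i with theta_i < theta_j, in O(n^2) time."""
    n = len(theta)
    return [1 + sum(1 for j in range(n) if j != i and theta[i] < theta[j])
            for i in range(n)]


def sigmoid(x):
    # Numerically stable logistic function written with tanh.
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def soft_ranks_pairwise(theta, tau=1.0):
    """Smooth proxy: indicator 1[theta_i < theta_j] -> sigmoid((theta_j - theta_i) / tau)."""
    n = len(theta)
    return [1.0 + sum(sigmoid((theta[j] - theta[i]) / tau)
                      for j in range(n) if j != i)
            for i in range(n)]


theta = [0.0, 3.0, 1.0, 2.0]
print(hard_ranks_pairwise(theta))            # [4, 1, 3, 2]
print(soft_ranks_pairwise(theta, tau=1.0))   # smooth values between 1 and 4
print(soft_ranks_pairwise(theta, tau=0.01))  # close to the hard ranks
```

Both functions loop over all pairs, which is where the quadratic cost comes from.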

Background Proposed method Experimental results

Our proposal
• Differentiable (soft) relaxations of $s(\theta)$ and $r(\theta)$
• Two formulations: L2 and entropy regularised
• “Convexification” effect
• Exact computation in $O(n \log n)$ time (forward pass)
• Exact multiplication with the Jacobian in $O(n)$ time, without unrolling (backward pass)

Strategy outline
1. Express $s(\theta)$ and $r(\theta)$ as linear programs (LP) over convex polytopes
   → Turn algorithmic function into an optimization problem
2. Introduce regularization in the LP
   → Turn LP into a projection onto convex polytopes
3. Derive algorithm for computing the projection
   → Ideally, the projection should be computable at the same cost as the original function…
4. Derive algorithm for differentiating the projection
   → Could be challenging (argmin differentiation problem)
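Step 1 can be checked by brute force on a small input. A linear objective over a polytope is maximized at a vertex, and the vertices of the permutahedron of a vector `w` are the permutations of `w`. The sketch below assumes the formulation $s(\theta) = \arg\max_{y \in \mathcal{P}(\theta)} \langle y, \rho \rangle$ and $r(\theta) = \arg\max_{y \in \mathcal{P}(\rho)} \langle y, -\theta \rangle$ with $\rho = (n, n-1, \dots, 1)$; these formulas are not spelled out on the slide, and the helper name is hypothetical.

```python
from itertools import permutations


def argmax_over_vertices(w, direction):
    """Maximize <y, direction> over all permutations y of w.

    Enumerating n! vertices is only meant as a sanity check for tiny n.
    """
    return max(permutations(w),
               key=lambda y: sum(a * b for a, b in zip(y, direction)))


theta = [0.0, 3.0, 1.0, 2.0]  # assumed example values
n = len(theta)
rho = list(range(n, 0, -1))   # (n, n-1, ..., 1)

sorted_desc = argmax_over_vertices(theta, rho)
rank = argmax_over_vertices(rho, [-t for t in theta])
print(sorted_desc)  # (3.0, 2.0, 1.0, 0.0)
print(rank)         # (4, 1, 3, 2)
```

The enumeration is exponential and only confirms that the LP view agrees with ordinary sorting and ranking on this input; the point of steps 2 to 4 is to avoid any such enumeration.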

Strategy outline
Step               | Cuturi et al. [2019]                         | This work
1. LP              | Birkhoff polytope, ℬ ⊂ ℝ^{n×n}               | Permutahedron, 𝒬 ⊂ ℝ^n
2. Regularization  | Entropy                                      | L2 or Entropy
3. Computation     | Sinkhorn                                     | Pool Adjacent Violators (PAV)
4. Differentiation | Backprop through Sinkhorn iterates           | Differentiate PAV solution
[Figure: for n = 3, the permutahedron has the six permutations of (1, 2, 3) as vertices; the Birkhoff polytope has the corresponding images ϕ((1, 2, 3)), …, ϕ((3, 2, 1)) as vertices.]
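A minimal sketch of the "This work" column for the L2 case follows. It assumes the soft rank is the Euclidean projection of $-\theta/\varepsilon$ onto the permutahedron of $\rho = (n, \dots, 1)$, and that this projection reduces to a non-increasing isotonic regression solved by PAV after one sort. The function names and the parameter `eps` are illustrative, and this is not the authors' implementation.

```python
def isotonic_nonincreasing(y):
    """Pool Adjacent Violators: least-squares fit v_1 >= ... >= v_n to y."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge while the previous block's mean is below the current block's mean.
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] < blocks[-1][0] * blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def soft_rank_l2(theta, eps=1.0):
    """L2-regularised soft ranks: one sort plus one linear-time PAV pass."""
    n = len(theta)
    z = [-t / eps for t in theta]
    order = sorted(range(n), key=lambda i: -z[i])  # descending argsort of z
    s = [z[i] for i in order]
    w = list(range(n, 0, -1))                      # rho = (n, ..., 1)
    v = isotonic_nonincreasing([si - wi for si, wi in zip(s, w)])
    out = [0.0] * n
    for k, i in enumerate(order):
        out[i] = s[k] - v[k]                       # undo the sorting permutation
    return out


theta = [0.0, 3.0, 1.0, 2.0]
print(soft_rank_l2(theta, eps=0.1))    # close to the hard ranks (4, 1, 3, 2)
print(soft_rank_l2(theta, eps=100.0))  # all values close to the mean rank 2.5
```

Small `eps` recovers the hard ranks, while large `eps` pulls every entry toward the mean rank. The sort dominates the cost, which is consistent with the $O(n \log n)$ forward pass claimed in the talk.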
