Randomized methods for machine learning

  1. Randomized methods for machine learning. David Lopez-Paz, FAIR. May 17, 2016. http://tinyurl.com/randomized-practical

  2. Some random examples: Building atomic bombs, Truncated SVD, Dimensionality reduction, Kernel methods for big data, Nonlinear component analysis, Dependence measurement, Low-dimensional kernel mean embeddings.

  3. It all starts with a big bang...

  4. ... By some smart people.

  5. The problem: $\int f(x)\,dx$

  6. The problem, simplified: $\int p(x) f(x)\,dx$

  7. The solution: $\int p(x) f(x)\,dx \approx \frac{1}{m} \sum_{i=1}^{m} f(x_i), \quad x_i \sim p$
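A minimal sketch of this estimator in NumPy; the integrand and the sampling distribution below are arbitrary choices for illustration, not taken from the slides.

```python
import numpy as np

def mc_estimate(f, sample_p, m):
    """Approximate E_{x ~ p}[f(x)] = int p(x) f(x) dx with m samples from p."""
    x = sample_p(m)
    return np.mean(f(x))

# Example: int N(x; 0, 1) x^2 dx = 1 (the second moment of a standard Gaussian).
est = mc_estimate(lambda x: x ** 2, lambda m: np.random.randn(m), m=100_000)
print(est)  # close to 1, with O(m^{-1/2}) error
```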

  8. From [Eck87].

  9. Exercise

  10. Example: computing $\pi$. $O(m^{-1/2})$ convergence regardless of the dimensionality of $x$! Why?
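The slides do not spell out the estimator; a common construction, sketched below, draws points uniformly in the unit square and counts how many fall inside the quarter circle.

```python
import numpy as np

m = 1_000_000
x = np.random.rand(m, 2)              # x_i ~ Uniform([0, 1]^2)
inside = (x ** 2).sum(axis=1) <= 1.0  # f(x_i) = 1{ ||x_i|| <= 1 }
print(4 * inside.mean())              # ~3.14; the error shrinks like O(m^{-1/2})
```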

  14. Monte Carlo model selection. Still cross-validating over grids? From [BB12], a.k.a. "the rule of 59" [NLB16]: $P(F_\mu(\min(x_1, \ldots, x_T)) \leq \alpha) = 1 - (1 - \alpha)^T$.
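A quick numeric check of this formula and a sketch of the resulting search procedure; the hyperparameter ranges below are made-up placeholders, and `validation_error` stands for whatever score you would normally cross-validate.

```python
import random

# With alpha = 0.05, T = 59 random draws already give 1 - 0.95^59 ~ 0.95: the "rule of 59".
alpha, T = 0.05, 59
print(1 - (1 - alpha) ** T)

# Monte Carlo model selection: sample T configurations at random instead of sweeping a grid.
def sample_config():
    return {"lr": 10 ** random.uniform(-5, -1), "l2": 10 ** random.uniform(-6, -2)}

configs = [sample_config() for _ in range(T)]
# best = min(configs, key=validation_error)  # validation_error: your own scoring function
```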

  15. Some random examples: Building atomic bombs, Truncated SVD, Dimensionality reduction, Kernel methods for big data, Nonlinear component analysis, Dependence measurement, Low-dimensional kernel mean embeddings.

  16. Truncated SVD. From research.facebook.com/blog/fast-randomized-svd/. Complexity of $O(mn^2)$! [GVL12]

  17. Randomized SVD [HMT11]. Computation of the $r$-rank SVD of $A \in \mathbb{R}^{m \times n}$:
      1. Compute a column-orthonormal $Q \in \mathbb{R}^{m \times (r+p)}$ such that $A \approx Q Q^\top A$.
      2. Construct $B = Q^\top A$; now $B \in \mathbb{R}^{(r+p) \times n}$.
      3. Compute the SVD of $B = S \Sigma V^\top$, at cost $O((r+p) n^2)$.
      4. Note that $A \approx Q Q^\top A = Q B = Q (S \Sigma V^\top)$.
      5. Taking $U = Q S$, return the SVD $A \approx U \Sigma V^\top$.

  20. Randomized SVD [HMT11]. Hey, but how do I compute $Q$? At random! :)
      1. Take $Y = A \Omega$, where $\Omega_{ij} \sim \mathcal{N}(0, 1)$.
      2. $\Omega \in \mathbb{R}^{n \times (r+p)}$, but it allows efficient multiplication.
      3. Compute the QR factorization $Y = Q R$, at cost $O(m (r+p)^2)$.
      4. Return $Q \in \mathbb{R}^{m \times (r+p)}$.
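A minimal NumPy sketch of slides 17 and 20 put together; the function name, the test matrix, and the oversampling choice $p = 10$ are mine, for illustration only.

```python
import numpy as np

def randomized_svd(A, r, p=10):
    m, n = A.shape
    Omega = np.random.randn(n, r + p)   # random test matrix, Omega_ij ~ N(0, 1)
    Y = A @ Omega                        # sketch of the range of A
    Q, _ = np.linalg.qr(Y)               # column-orthonormal Q with A ~= Q Q^T A
    B = Q.T @ A                          # small (r+p) x n matrix
    S, Sigma, Vt = np.linalg.svd(B, full_matrices=False)  # B = S Sigma V^T
    U = Q @ S                            # lift the left singular vectors back to R^m
    return U[:, :r], Sigma[:r], Vt[:r]   # truncate to rank r

# Quick check on a matrix of exact rank 20: the rank-20 randomized SVD recovers it.
A = np.random.randn(2000, 20) @ np.random.randn(20, 300)
U, Sigma, Vt = randomized_svd(A, r=20)
print(np.linalg.norm(A - (U * Sigma) @ Vt) / np.linalg.norm(A))  # ~1e-13
```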

  21. Exercise

  22. Some random examples: Building atomic bombs, Truncated SVD, Dimensionality reduction, Kernel methods for big data, Nonlinear component analysis, Dependence measurement, Low-dimensional kernel mean embeddings.

  23. Dimensionality reduction. Random projections offer fast and efficient dimensionality reduction. [Figure: a random projection $w \in \mathbb{R}^{40500 \times 100}$ maps $x_1, x_2 \in \mathbb{R}^{40500}$ to $y_1, y_2 \in \mathbb{R}^{100}$, distorting their distance $\delta$ by at most a factor of $(1 \pm \epsilon)$.] $(1 - \epsilon) \|x_1 - x_2\|^2 \leq \|y_1 - y_2\|^2 \leq (1 + \epsilon) \|x_1 - x_2\|^2$. This result is formalized in the Johnson-Lindenstrauss Lemma.
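A minimal sketch of the picture above: project two points with a scaled Gaussian matrix and compare their squared distance before and after. The dimensions 40500 → 100 follow the slide; the $1/\sqrt{k}$ scaling follows the lemmas that come next.

```python
import numpy as np

N, k = 40_500, 100
x1, x2 = np.random.randn(N), np.random.randn(N)

W = np.random.randn(k, N) / np.sqrt(k)  # scaled Gaussian projection, as in the lemmas below
y1, y2 = W @ x1, W @ x2

ratio = np.sum((y1 - y2) ** 2) / np.sum((x1 - x2) ** 2)
print(ratio)  # close to 1: squared distances are preserved up to a (1 +- eps) factor
```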

  24. The Johnson-Lindenstrauss Lemma. The proof is one example of Erdős' probabilistic method (1947). [Pictured: Paul Erdős (1913-1996), Joram Lindenstrauss (1936-2012), William Johnson (1944-).] See §12.5 of Foundations of Machine Learning (Mohri et al., 2012).

  25. Auxiliary Lemma 1. Let $Q$ be a random variable following a $\chi^2$ distribution with $k$ degrees of freedom. Then, for any $0 < \epsilon < 1/2$: $\Pr[(1 - \epsilon) k \leq Q \leq (1 + \epsilon) k] \geq 1 - 2 e^{-(\epsilon^2 - \epsilon^3) k / 4}$.

  29. Auxiliary Lemma 1 (proof). Start with Markov's inequality, $\Pr[X \geq a] \leq \frac{E[X]}{a}$:
      $\Pr[Q \geq (1 + \epsilon) k] = \Pr[e^{\lambda Q} \geq e^{\lambda (1 + \epsilon) k}] \leq \frac{E[e^{\lambda Q}]}{e^{\lambda (1 + \epsilon) k}} = \frac{(1 - 2\lambda)^{-k/2}}{e^{\lambda (1 + \epsilon) k}},$
      where $E[e^{\lambda Q}] = (1 - 2\lambda)^{-k/2}$ is the mgf of a $\chi^2$ distribution, valid for $\lambda < \frac{1}{2}$.
      To tighten the bound we minimize the right-hand side with $\lambda = \frac{\epsilon}{2(1 + \epsilon)}$:
      $\Pr[Q \geq (1 + \epsilon) k] \leq \frac{\left(1 - \frac{\epsilon}{1 + \epsilon}\right)^{-k/2}}{e^{\epsilon k / 2}} = \frac{(1 + \epsilon)^{k/2}}{(e^{\epsilon})^{k/2}} = \left(\frac{1 + \epsilon}{e^{\epsilon}}\right)^{k/2}.$

  30. Auxiliary Lemma 1 (proof, continued). Using $1 + \epsilon \leq e^{\epsilon - (\epsilon^2 - \epsilon^3)/2}$ yields $\Pr[Q \geq (1 + \epsilon) k] \leq \left(\frac{1 + \epsilon}{e^{\epsilon}}\right)^{k/2} \leq \left(\frac{e^{\epsilon - \frac{\epsilon^2 - \epsilon^3}{2}}}{e^{\epsilon}}\right)^{k/2} = e^{-\frac{k}{4}(\epsilon^2 - \epsilon^3)}$. $\Pr[Q \leq (1 - \epsilon) k]$ is bounded similarly, and the lemma follows by the union bound.

  31. Auxiliary Lemma 2. Let $x \in \mathbb{R}^N$, $k < N$ and $A \in \mathbb{R}^{k \times N}$ with $A_{ij} \sim \mathcal{N}(0, 1)$. Then, for any $0 \leq \epsilon \leq 1/2$: $\Pr\left[(1 - \epsilon)\|x\|^2 \leq \left\|\tfrac{1}{\sqrt{k}} A x\right\|^2 \leq (1 + \epsilon)\|x\|^2\right] \geq 1 - 2 e^{-(\epsilon^2 - \epsilon^3) k / 4}$. Proof: let $\hat{x} = A x$. Then $E[\hat{x}_j] = 0$, and $E[\hat{x}_j^2] = E\left[\left(\sum_{i=1}^{N} A_{ji} x_i\right)^2\right] = E\left[\sum_{i=1}^{N} A_{ji}^2 x_i^2\right] = \sum_{i=1}^{N} x_i^2 = \|x\|^2$. Note that $T_j = \hat{x}_j / \|x\| \sim \mathcal{N}(0, 1)$. Then, $Q = \sum_{j=1}^{k} T_j^2 \sim \chi^2_k$. Remember the previous lemma?

  32. Auxiliary Lemma 2 (proof, continued). Remember: $\hat{x} = A x$, $T_j = \hat{x}_j / \|x\| \sim \mathcal{N}(0, 1)$, $Q = \sum_{j=1}^{k} T_j^2 \sim \chi^2_k$. Then:
      $\Pr\left[(1 - \epsilon)\|x\|^2 \leq \left\|\tfrac{1}{\sqrt{k}} A x\right\|^2 \leq (1 + \epsilon)\|x\|^2\right]
      = \Pr\left[(1 - \epsilon)\|x\|^2 \leq \tfrac{\|\hat{x}\|^2}{k} \leq (1 + \epsilon)\|x\|^2\right]
      = \Pr\left[(1 - \epsilon) k \leq \tfrac{\|\hat{x}\|^2}{\|x\|^2} \leq (1 + \epsilon) k\right]
      = \Pr\left[(1 - \epsilon) k \leq \sum_{j=1}^{k} T_j^2 \leq (1 + \epsilon) k\right]
      = \Pr\left[(1 - \epsilon) k \leq Q \leq (1 + \epsilon) k\right] \geq 1 - 2 e^{-(\epsilon^2 - \epsilon^3) k / 4}.$
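A small empirical check of Auxiliary Lemma 2; the values of $N$, $k$, $\epsilon$ and the number of trials are arbitrary choices for illustration.

```python
import numpy as np

N, k, eps, trials = 500, 100, 0.25, 1000
x = np.random.randn(N)
norm_x2 = np.sum(x ** 2)

hits = 0
for _ in range(trials):
    A = np.random.randn(k, N)
    y2 = np.sum((A @ x) ** 2) / k  # ||Ax / sqrt(k)||^2
    hits += (1 - eps) * norm_x2 <= y2 <= (1 + eps) * norm_x2

bound = 1 - 2 * np.exp(-(eps ** 2 - eps ** 3) * k / 4)
print(f"empirical: {hits / trials:.3f}  lemma bound: {bound:.3f}")  # empirical >= bound
```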

  33. The Johnson-Lindenstrauss Lemma. For any $0 < \epsilon < 1/2$ and any integer $m > 4$, let $k = \frac{20 \log m}{\epsilon^2}$. Then, for any set $V$ of $m$ points in $\mathbb{R}^N$, there exists $f : \mathbb{R}^N \to \mathbb{R}^k$ such that for all $u, v \in V$: $(1 - \epsilon)\|u - v\|^2 \leq \|f(u) - f(v)\|^2 \leq (1 + \epsilon)\|u - v\|^2$. Proof: let $f = \frac{1}{\sqrt{k}} A$, with $A \in \mathbb{R}^{k \times N}$, $k < N$ and $A_{ij} \sim \mathcal{N}(0, 1)$.
      - Apply the previous lemma with $x = u - v$ to lower bound the success probability for one pair by $1 - 2 e^{-(\epsilon^2 - \epsilon^3) k / 4}$.
      - Union bound over the $m^2$ pairs in $V$ with $k = \frac{20 \log m}{\epsilon^2}$ and $\epsilon < 1/2$ to obtain: $\Pr[\text{success}] \geq 1 - 2 m^2 e^{-(\epsilon^2 - \epsilon^3) k / 4} = 1 - 2 m^{5\epsilon - 3} > 1 - 2 m^{-1/2} > 0$.
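A tiny numeric example of the target dimension (the values of $m$ and $\epsilon$ are made up): note that $k$ depends only on the number of points, not on the original dimension $N$.

```python
import math

m, eps = 10 ** 6, 0.1
k = 20 * math.log(m) / eps ** 2  # the k from the lemma; independent of N
print(round(k))                  # ~27631 dimensions suffice for one million points
```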

  34. Exercise

  35. Some random examples: Building atomic bombs, Truncated SVD, Dimensionality reduction, Kernel methods for big data, Nonlinear component analysis, Dependence measurement, Low-dimensional kernel mean embeddings.

  37. The kernel trick? $k(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}$, $f(x) \approx \sum_{i=1}^{n} \alpha_i k(x, x_i)$.

  38. The kernel trap! To compute $\{\alpha_i\}_{i=1}^{n}$, construct the $n \times n$ monster $K$, with entries $K_{ij} = k(x_i, x_j)$.
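A back-of-the-envelope sketch of why $K$ becomes a monster; the dataset size and the RBF kernel below are placeholder choices, not taken from the slides.

```python
n = 100_000  # number of training points

# Storing K alone takes n^2 * 8 bytes of float64:
print(f"memory for K: {n * n * 8 / 1e9:.0f} GB")  # 80 GB

# Filling it takes O(n^2 d) work, e.g. with the RBF kernel
#   K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma**2))
# and solving for the alphas (e.g. in kernel ridge regression) costs O(n^3).
```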

  39. Mercer's theorem. Theorem (Mercer's condition [Mer09]): under mild technical assumptions, $k$ admits a representation $k(x, x') = \sum_{j=1}^{\infty} \lambda_j \phi_{\lambda_j}(x) \phi_{\lambda_j}(x')$. If $\|\lambda\|_1 := \sum_j |\lambda_j| < \infty$, we can cast the previous as $k(x, x') = \|\lambda\|_1 \, E_{\lambda \sim p(\lambda)}\left[\phi_\lambda(x) \phi_\lambda(x')\right]$. Any ideas? :)
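The slides stop at the question; the Monte Carlo estimator from slide 7 suggests sampling the expectation, which is the random-features idea. Below is a minimal sketch using random Fourier features for the Gaussian kernel; this specific instantiation follows Rahimi and Recht's random features and is my illustration, not spelled out on the slide.

```python
import numpy as np

def random_fourier_features(X, m=500, sigma=1.0, seed=0):
    """Map X (n x d) to Z (n x m) so that Z @ Z.T ~= exp(-||x - x'||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=1.0 / sigma, size=(d, m))  # random frequencies w ~ N(0, sigma^-2 I)
    b = rng.uniform(0, 2 * np.pi, size=m)           # random phases
    return np.sqrt(2.0 / m) * np.cos(X @ W + b)

# Quick check: the feature inner products approximate the exact Gaussian kernel.
X = np.random.randn(5, 10)
Z = random_fourier_features(X, m=20_000)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
exact = np.exp(-sq_dists / 2.0)
print(np.abs(Z @ Z.T - exact).max())  # small, shrinking like O(m^{-1/2})
```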
