Sequential complexities and uniform martingale laws of large numbers
Ambuj Tewari (based on joint work with Alexander Rakhlin and Karthik Sridharan)
Department of Statistics and Department of EECS, University of Michigan, Ann Arbor
November 15, 2014
Some Prediction Problems
- Will a friendship relation form between two Facebook users?
- Which ads should Google show me when I search for "flights to Mexico"?
- 507,000 webpages match "game-theoretic probability": in which order should Google show them to me?
- Should Gmail put the email with subject "FREE ONLINE COURSES!!!" in the spam folder?
Mathematical Formulation of Prediction Problems
- Input space X (vectors, matrices, text, graphs)
- Label space Y:
  - classification: Y = {±1}
  - regression: Y = [−1, +1]
  - ranking: Y = S_k, the group of k-permutations
- Want to learn a prediction function f : X → Y
- Loss function: how bad is prediction f(x) if the "truth" is y
Predictions and Losses
- Learner/Statistician/Decision Maker chooses a prediction function f : X → Y
- Adversary/Nature/Environment produces examples (x, y) ∈ X × Y
- Learner's loss: ℓ(f(x), y)
- Assume ℓ is bounded
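As an illustration (not part of the original slides), here is a minimal Python sketch of two bounded losses of the kind assumed above: the zero-one loss for classification and a clipped squared loss for regression. The function names are made up for this example.

```python
def zero_one_loss(prediction, y):
    """Zero-one loss for classification with labels in {-1, +1}; bounded in [0, 1]."""
    return 0.0 if prediction == y else 1.0

def clipped_squared_loss(prediction, y):
    """Squared loss for regression with predictions and labels in [-1, +1].

    Since |prediction - y| <= 2, the loss is at most 4; we rescale it to [0, 1].
    """
    return min((prediction - y) ** 2, 4.0) / 4.0
```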
Probabilistic Approach
- (x_t, y_t) are drawn from a stochastic process
- For instance, (x_t, y_t) i.i.d. from some distribution P
- Parametric case: P = P_θ with θ ∈ Θ ⊆ R^p
- Distribution-free or "agnostic" case: P arbitrary
- Goal: choose $\hat f$ based on the sample $((x_t, y_t))_{t=1}^n$ to have small expected loss
  $\mathbb{E}_{x_{1:n},\, y_{1:n},\, (x,y)\sim P}\big[\ell(\hat f(x), y)\big]$
Empirical Risk Minimization
- Risk and empirical risk:
  $L(f) = \mathbb{E}_{(x,y)\sim P}\big[\ell(f(x), y)\big], \qquad \hat L(f) = \frac{1}{n}\sum_{t=1}^n \ell(f(x_t), y_t)$
- Risk minimizer: $f^\star = \operatorname{argmin}_{f\in\mathcal F} L(f)$
- Empirical risk minimizer (ERM): $\hat f = \operatorname{argmin}_{f\in\mathcal F} \hat L(f)$
- Excess risk: $L(\hat f) - L(f^\star)$
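To make these definitions concrete, here is a small Python sketch (an illustration, not the slides' own example) of ERM over a finite class of threshold classifiers on i.i.d. data; the class, the data distribution, and all names are made up for this sketch.

```python
import random

# Finite class of threshold classifiers f_c(x) = sign(x - c), a toy stand-in for F.
thresholds = [i / 10 for i in range(-10, 11)]

def predict(c, x):
    return 1 if x - c >= 0 else -1

def zero_one_loss(prediction, y):
    return 0.0 if prediction == y else 1.0

def sample(n, rng):
    """Draw n i.i.d. examples: x uniform on [-1, 1], label sign(x - 0.3) flipped w.p. 0.1."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        y = 1 if x >= 0.3 else -1
        if rng.random() < 0.1:
            y = -y
        data.append((x, y))
    return data

def empirical_risk(c, data):
    return sum(zero_one_loss(predict(c, x), y) for x, y in data) / len(data)

rng = random.Random(0)
train = sample(200, rng)
erm = min(thresholds, key=lambda c: empirical_risk(c, train))   # the ERM \hat f

# Estimate the risk of the ERM on a large fresh sample.
test = sample(100_000, rng)
print("ERM threshold:", erm, " estimated risk:", empirical_risk(erm, test))
```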
Game-Theoretic Approach
FOR t = 1 to n:
  Adversary plays x_t ∈ X
  Learner plays f_t ∈ F
  Adversary plays y_t ∈ Y
  Learner suffers ℓ(f_t(x_t), y_t)
ENDFOR
- No assumption on the data-generating mechanism
- Want to "do well" on every sequence (x_1, y_1), ..., (x_n, y_n)
- The goal is tricky to define
Regret
- Measure the learner's loss relative to some benchmark computed in hindsight
- (External) Regret:
  $\sum_{t=1}^n \ell(f_t(x_t), y_t) \;-\; \min_{f\in\mathcal F} \sum_{t=1}^n \ell(f(x_t), y_t)$
- The benchmark here is the best fixed decision in hindsight
- Many variants exist (switching regret, Φ-regret)
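Here is a minimal Python sketch (not from the slides) of the protocol above together with the external regret it incurs, using a made-up finite class of constant predictors and a learner that simply plays a fixed function; all names and data are illustrative only.

```python
import random

# Toy finite class F: constant predictors that always output -1 or always +1.
F = [-1, +1]

def loss(prediction, y):
    return 0.0 if prediction == y else 1.0    # bounded zero-one loss

def play(n, learner, adversary, rng):
    """Run the online protocol for n rounds and return the learner's external regret."""
    total_loss = 0.0
    cumulative = {f: 0.0 for f in F}          # loss of each fixed f, for the hindsight benchmark
    history = []
    for t in range(n):
        x = rng.uniform(-1, 1)                # adversary plays x_t (random here)
        f_t = learner(history)                # learner plays f_t in F (a constant predictor)
        y = adversary(x, rng)                 # adversary plays y_t
        total_loss += loss(f_t, y)            # learner suffers loss of f_t(x_t) against y_t
        for f in F:
            cumulative[f] += loss(f, y)
        history.append((x, y))
    return total_loss - min(cumulative.values())

# A naive learner that always plays +1, against a slightly biased random adversary.
always_plus = lambda history: +1
adversary = lambda x, rng: +1 if rng.random() < 0.6 else -1

rng = random.Random(0)
print("regret over 1000 rounds:", play(1000, always_plus, adversary, rng))
```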
Why Study Regret?
- Lets us proceed with no assumptions on the data-generating process
- Regret-minimizing algorithms perform well if the data are i.i.d.
- Yields simple one-pass algorithms
- If players in a game follow regret-minimizing algorithms, the empirical distribution of play converges to an equilibrium
- Long history in Computer Science, Finance, Game Theory, Information Theory, and Statistics
Two pioneers
James Hannan (1922-2010) and David Blackwell (1919-2010)
Simplest Case: Finite Class of Functions
- |F| = K
- Hannan's theorem: there is a (randomized) learner strategy for which (expected) regret = o(n)
- "No-regret learning" or "Hannan consistency": regret = o(n)
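The slides do not specify an algorithm; one standard strategy that achieves o(n) expected regret for a finite class with bounded losses is exponential weights (Hedge). The following Python sketch is illustrative only, with a made-up two-expert example; the learning rate used is one common choice, not necessarily Hannan's original scheme.

```python
import math
import random

def exponential_weights(n, experts, get_losses, rng):
    """Randomized learner for a finite class ("experts") with losses in [0, 1].

    get_losses(t) returns one loss per expert, revealed after the learner's random choice.
    Returns the realized regret against the best fixed expert.
    """
    K = len(experts)
    eta = math.sqrt(8 * math.log(K) / n)          # a standard learning rate choice
    weights = [1.0] * K
    total_loss = 0.0
    cumulative = [0.0] * K
    for t in range(n):
        total = sum(weights)
        probs = [w / total for w in weights]
        i = rng.choices(range(K), weights=probs)[0]   # randomized play
        losses = get_losses(t)
        total_loss += losses[i]
        for k in range(K):
            cumulative[k] += losses[k]
            weights[k] *= math.exp(-eta * losses[k])
    return total_loss - min(cumulative)

# Toy example: two constant predictors, labels are mostly +1 (fixed in advance).
data_rng = random.Random(0)
labels = [+1 if data_rng.random() < 0.7 else -1 for _ in range(5000)]
experts = [-1, +1]
get_losses = lambda t: [0.0 if e == labels[t] else 1.0 for e in experts]
print("regret:", exponential_weights(5000, experts, get_losses, random.Random(1)))
```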
Multiple Discovery
- Originally proved by Hannan (1956)
- Blackwell (1956) showed how it follows from his approachability theorem
- The result has been proven many times since then: Banos (1968), Cover (1991), Foster & Vohra (1993), Vovk (1993)
Rest of the Talk
- Rademacher complexity and its sequential analog
- Fat-shattering dimension and its sequential analog
- Uniform martingale law of large numbers
Rademacher Complexity
- Recall the ERM $\hat f$ and the risk minimizer $f^\star$:
  $f^\star = \operatorname{argmin}_{f\in\mathcal F} L(f), \qquad \hat f = \operatorname{argmin}_{f\in\mathcal F} \hat L(f)$
- Easy to show:
  $\mathbb{E}\big[L(\hat f) - L(f^\star)\big] \;\le\; \mathbb{E}\Big[\sup_{f\in\mathcal F} \big(L(f) - \hat L(f)\big)\Big]$
- Symmetrization (the $\epsilon_t$'s are Rademacher, i.e. symmetric Bernoulli):
  $\mathbb{E}\Big[\sup_{f\in\mathcal F} \big(L(f) - \hat L(f)\big)\Big] \;\le\; 2\,\mathbb{E}_{\epsilon_{1:n},\, x_{1:n},\, y_{1:n}}\Big[\sup_{f\in\mathcal F} \frac{1}{n}\sum_{t=1}^n \epsilon_t\, \ell(f(x_t), y_t)\Big]$
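A small Python sketch (illustrative only, not from the slides) that estimates the symmetrized quantity above, i.e. the empirical Rademacher complexity of the loss class, by Monte Carlo over the Rademacher signs; the toy threshold class and the data are made up.

```python
import random

thresholds = [i / 10 for i in range(-10, 11)]           # toy finite class F

def loss(c, x, y):
    prediction = 1 if x - c >= 0 else -1
    return 0.0 if prediction == y else 1.0              # zero-one loss

def empirical_rademacher(data, num_draws, rng):
    """Monte Carlo estimate of E_eps sup_f (1/n) sum_t eps_t * loss(f, x_t, y_t)."""
    n = len(data)
    total = 0.0
    for _ in range(num_draws):
        eps = [rng.choice((-1, 1)) for _ in range(n)]   # fresh Rademacher signs
        best = max(
            sum(e * loss(c, x, y) for e, (x, y) in zip(eps, data)) / n
            for c in thresholds
        )
        total += best
    return total / num_draws

rng = random.Random(0)
data = [(rng.uniform(-1, 1), rng.choice((-1, 1))) for _ in range(100)]
print("estimated Rademacher complexity:", empirical_rademacher(data, 500, rng))
```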
Which Algorithm Should We Analyze?
- The obvious analogue of ERM is "follow-the-leader" or "fictitious play":
  $f_{t+1} = \operatorname{argmin}_{f\in\mathcal F} \sum_{s=1}^t \ell(f(x_s), y_s)$
- It does not enjoy a good regret bound
- The lack of a generic regret-minimizing strategy is a problem
- Instead, directly attack the minimax regret
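To illustrate why follow-the-leader can fail, here is a minimal Python sketch (not from the slides) of FTL on a two-action problem together with the standard alternating loss sequence on which its regret grows linearly; the example and names are made up here.

```python
def follow_the_leader(loss_vectors):
    """FTL over a finite set of actions: at each round play the action with the
    smallest cumulative loss so far (ties broken by index); return its regret."""
    K = len(loss_vectors[0])
    cumulative = [0.0] * K
    ftl_loss = 0.0
    for losses in loss_vectors:
        action = min(range(K), key=lambda k: cumulative[k])
        ftl_loss += losses[action]
        for k in range(K):
            cumulative[k] += losses[k]
    return ftl_loss - min(cumulative)   # regret against the best fixed action

# Classic two-action sequence on which FTL keeps switching to the wrong action:
# round 1 has losses (0.5, 0); afterwards the losses alternate (0, 1), (1, 0), ...
# FTL then loses 1 every round, while the best fixed action loses about n/2.
n = 1000
loss_vectors = [(0.5, 0.0)] + [((0.0, 1.0) if t % 2 == 1 else (1.0, 0.0)) for t in range(1, n)]
print("FTL regret over", n, "rounds:", follow_the_leader(loss_vectors))
```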
Minimax Regret
- Minimax regret:
  $V_n := \min_{\text{learner strategies}} \; \max_{\text{adversary strategies}} \; \mathbb{E}\Big[\sum_{t=1}^n \ell(f_t(x_t), y_t) - \min_{f\in\mathcal F} \sum_{t=1}^n \ell(f(x_t), y_t)\Big]$
- Theorem (Rakhlin, Sridharan, Tewari (2010)): $V_n \le 2\,\mathcal{R}^{seq}_n$
- Important precursor: Abernethy et al. (2009)
Sequential Rademacher Complexity
$\mathcal{R}^{seq}_n := \sup_{\mathbf{x},\mathbf{y}}\; \mathbb{E}_{\epsilon_{1:n}}\Big[\sup_{f\in\mathcal F} \sum_{t=1}^n \epsilon_t\, \ell\big(f(\mathbf{x}(\epsilon_{1:t-1})),\, \mathbf{y}(\epsilon_{1:t-1})\big)\Big]$
[Figure: a complete binary tree (x, y) of depth 3 with node labels (x_1, y_1), ..., (x_7, y_7). A node at level t is indexed by the sign prefix ε_{1:t−1}: x(∅) = x_1, x(−1) = x_2, x(−1, +1) = x_5, and so on. The slides step through the sample path ε = (−1, +1, +1), which traverses x_1, x_2, x_5.]
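To make the tree picture concrete, here is a small Python sketch (illustrative only) that evaluates the inner quantity, the expectation over sign paths of the supremum over a tiny finite class, exactly for one fixed, made-up tree by enumerating all sign sequences; the outer supremum over trees in the definition is not attempted here, and the class of [-1, +1]-valued functions below stands in for the composed class ℓ∘F.

```python
import itertools

n = 3   # depth of the tree

# A fixed X-valued binary tree of depth 3: the node at level t is indexed by the
# sign prefix eps_{1:t-1}.  Values are arbitrary, made up for this example.
x_tree = {
    (): 0.1,
    (-1,): -0.4, (+1,): 0.6,
    (-1, -1): -0.9, (-1, +1): 0.2, (+1, -1): 0.3, (+1, +1): 0.8,
}

# A tiny finite class standing in for the composed class loss∘F:
# [-1, +1]-valued threshold functions g_c(x) = sign(x - c).
def g(c, x):
    return 1.0 if x - c >= 0 else -1.0

thresholds = [-0.5, 0.0, 0.5]

def tree_value(x_tree, n):
    """E_eps sup_g sum_t eps_t * g(x(eps_{1:t-1})), averaged over all 2^n sign paths."""
    total = 0.0
    for eps in itertools.product((-1, +1), repeat=n):
        best = max(
            sum(eps[t] * g(c, x_tree[eps[:t]]) for t in range(n))
            for c in thresholds
        )
        total += best
    return total / 2 ** n

print("value of this fixed tree:", tree_value(x_tree, n))
```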
Rademacher Complexity: Classical vs. Sequential
$\mathcal{R}_n(\ell\circ\mathcal F) := \mathbb{E}_{\epsilon_{1:n},\, x_{1:n},\, y_{1:n}}\Big[\sup_{f\in\mathcal F} \sum_{t=1}^n \epsilon_t\, \ell(f(x_t), y_t)\Big]$
$\mathcal{R}^{seq}_n(\ell\circ\mathcal F) := \sup_{\mathbf{x},\mathbf{y}}\; \mathbb{E}_{\epsilon_{1:n}}\Big[\sup_{f\in\mathcal F} \sum_{t=1}^n \epsilon_t\, \ell\big(f(\mathbf{x}(\epsilon_{1:t-1})),\, \mathbf{y}(\epsilon_{1:t-1})\big)\Big]$
- The sequences x_{1:n}, y_{1:n} are replaced by a tree (x, y)
- The expectation over sequences x_{1:n}, y_{1:n} is replaced by a supremum over trees (x, y)
Seq. Rademacher Complexity: Properties
- (inclusion) If $\mathcal F \subseteq \mathcal F'$ then $\mathcal{R}^{seq}_n(\ell\circ\mathcal F) \le \mathcal{R}^{seq}_n(\ell\circ\mathcal F')$
- (scaling) If $c \in \mathbb{R}$ then $\mathcal{R}^{seq}_n(c\,\ell\circ\mathcal F) = |c|\cdot \mathcal{R}^{seq}_n(\ell\circ\mathcal F)$
- (translation) If $\ell' = \ell + h$ then $\mathcal{R}^{seq}_n(\ell'\circ\mathcal F) = \mathcal{R}^{seq}_n(\ell\circ\mathcal F)$
- Using these and other properties, it is possible to bound the sequential Rademacher complexity of decision trees, neural networks, etc.