Minimax Rates for Memory-Constrained Sparse Linear Regression



  1. Minimax Rates for Memory-Constrained Sparse Linear Regression
     Jacob Steinhardt, John Duchi
     Stanford University, {jsteinha, jduchi}@stanford.edu
     July 6, 2015

  2. Resource-Constrained Learning
     How do we solve statistical problems with limited resources?
     - computation (Natarajan, 1995; Berthet & Rigollet, 2013; Zhang et al., 2014; Foster et al., 2015)
     - privacy (Kasiviswanathan et al., 2011; Duchi et al., 2013)
     - communication / memory (Zhang et al., 2013; Shamir, 2014; Garg et al., 2014; Braverman et al., 2015)

  3. Setting
     Sparse linear regression in $\mathbb{R}^d$:
         $Y^{(i)} = \langle w^*, X^{(i)} \rangle + \varepsilon^{(i)}, \qquad \|w^*\|_0 = k, \quad k \ll d$
     Memory constraint (a minimal simulation of this protocol is sketched below):
     - $(X^{(i)}, Y^{(i)})$ observed as a read-only stream
     - only $b$ bits of state $Z^{(i)}$ kept between successive observations
     [Diagram: $w^*$ generates the stream $(X^{(1)}, Y^{(1)}), (X^{(2)}, Y^{(2)}), (X^{(3)}, Y^{(3)}), \ldots$; the learner carries only a $b$-bit state $Z^{(1)} \to Z^{(2)} \to Z^{(3)} \to \cdots$ between successive observations.]
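To make the streaming protocol concrete, here is a minimal Python simulation of the setting above. It is an illustrative sketch only: the names `generate_stream` and `BoundedStateLearner`, and the decision to model the $b$-bit state $Z^{(i)}$ as a fixed-size byte buffer, are assumptions introduced here, not part of the paper.

```python
import numpy as np

def generate_stream(n, d, k, noise_std=1.0, seed=0):
    """Yield (X^(i), Y^(i)) pairs from a k-sparse linear model Y = <w*, X> + eps."""
    rng = np.random.default_rng(seed)
    w_star = np.zeros(d)
    support = rng.choice(d, size=k, replace=False)    # k nonzero coordinates of w*
    w_star[support] = rng.choice([-1.0, 1.0], size=k)
    for _ in range(n):
        x = rng.standard_normal(d)
        y = x @ w_star + noise_std * rng.standard_normal()
        yield x, y                                    # the learner never sees w_star

class BoundedStateLearner:
    """Abstract b-bit learner: the only memory carried between observations
    is the byte buffer `state` (Z^(i) in the slide's notation)."""
    def __init__(self, b_bits):
        self.b_bits = b_bits
        self.state = bytearray(max(b_bits // 8, 1))

    def update(self, x, y):
        # A concrete algorithm must compress (x, y, old state) into a new b-bit state.
        raise NotImplementedError

    def estimate(self):
        # Decode a weight estimate w-hat from the final state.
        raise NotImplementedError
```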

  4. Motivating Question
     If we have enough memory to represent the answer, can we also efficiently learn the answer?

  5. Problem Statement
     How much data $n$ is needed to obtain an estimator $\hat{w}$ with $\mathbb{E}\big[\|\hat{w} - w^*\|_2^2\big] \le \varepsilon$?
     Classical case (no memory constraint):
       Theorem (Wainwright, 2009). $\frac{k}{\varepsilon}\log(d) \lesssim n \lesssim \frac{k}{\varepsilon}\log(d)$
       Achievable with $\tilde{O}(d)$ memory (Agarwal et al., 2012; S., Wager, & Liang, 2015).
     With a memory constraint of $b$ bits:
       Theorem (S. & Duchi, 2015). $\frac{kd}{\varepsilon b} \lesssim n \lesssim \frac{kd}{\varepsilon^2 b}$
     Exponential increase in the dependence on $d$ if $b \ll d$! (A numerical comparison of the two rates follows below.)
     [Note: up to log factors; assumes $k \log(d) \ll b \le d$.]
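For a rough sense of the gap between the two theorems, the short computation below plugs illustrative values into the stated scalings, dropping constants and log factors as the slide's note allows. The particular choices of d, k, eps, and b are arbitrary.

```python
import math

d, k, eps = 10**6, 10, 0.1   # illustrative dimension, sparsity, target squared error

# Unconstrained rate (Wainwright, 2009): n on the order of (k / eps) * log d
n_classical = (k / eps) * math.log(d)

# Memory-constrained upper bound (S. & Duchi, 2015): n on the order of k*d / (eps^2 * b)
for b in (10**3, 10**4, 10**5, 10**6):      # memory budgets in bits
    n_constrained = k * d / (eps**2 * b)
    print(f"b = {b:>7} bits: n ~ {n_constrained:.1e}   (classical n ~ {n_classical:.1e})")
```

The takeaway matches the slide: when $b \ll d$, the dependence on $d$ jumps from $\log d$ in the classical rate to $d$ in the memory-constrained rate.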

  6. Proof Overview
     Lower bound:
     - information-theoretic
     - strong data-processing inequality
       [Diagram: Markov chain $W^* \to (X, Y) \to Z$; the $b$-bit message $Z$ retains only about a $b/d$ fraction of the information that $(X, Y)$ carries about $W^*$.]
     - main challenge: dependence between $X$ and $Y$
     Upper bound:
     - count-min sketch + $\ell_1$-regularized dual averaging (an illustrative code sketch follows below)
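For intuition about how the upper bound can operate within $b$ bits, here is a minimal, self-contained sketch that pairs a sketching data structure with an $\ell_1$-regularized dual-averaging (RDA) update. This is a reconstruction of the general idea, not the authors' algorithm or analysis: the function name `sketched_l1_rda`, the step-size rule, and the use of a signed count-sketch (rather than the count-min sketch named on the slide) are assumptions made for illustration. A genuinely memory-bounded implementation would also evaluate its hash functions on the fly rather than storing them as arrays.

```python
import numpy as np

class CountSketch:
    """Signed count-sketch holding an approximate d-dimensional vector in
    depth x width cells. (The hash tables below use O(d) memory for simplicity;
    a truly memory-bounded version would hash coordinates on the fly.)"""
    def __init__(self, d, depth, width, seed=0):
        rng = np.random.default_rng(seed)
        self.table = np.zeros((depth, width))
        self.buckets = rng.integers(0, width, size=(depth, d))  # bucket hash h_r(j)
        self.signs = rng.choice([-1.0, 1.0], size=(depth, d))   # sign hash s_r(j)

    def add(self, vec):
        """Add a dense d-dimensional vector into every row of the sketch."""
        for r in range(self.table.shape[0]):
            np.add.at(self.table[r], self.buckets[r], self.signs[r] * vec)

    def query(self):
        """Estimate the stored vector; the median over rows de-biases collisions."""
        rows = np.arange(self.table.shape[0])[:, None]
        return np.median(self.signs * self.table[rows, self.buckets], axis=0)

def sketched_l1_rda(stream, d, depth=5, width=2000, lam=0.1, gamma=1.0):
    """Accumulate squared-loss gradients in a sketch and form each iterate by the
    standard l1-RDA soft-thresholding step applied to the (approximate) average
    gradient. Purely illustrative; no claim to match the paper's guarantees."""
    sketch = CountSketch(d, depth, width)
    t = 0
    for x, y in stream:
        t += 1
        g_bar = sketch.query() / max(t - 1, 1)   # approximate average gradient so far
        w = -np.sqrt(t) / gamma * np.sign(g_bar) * np.maximum(np.abs(g_bar) - lam, 0.0)
        grad = (x @ w - y) * x                   # gradient of 0.5 * (<w, x> - y)^2
        sketch.add(grad)                         # the sketch is the only retained statistic
    g_bar = sketch.query() / max(t, 1)
    return -np.sqrt(t) / gamma * np.sign(g_bar) * np.maximum(np.abs(g_bar) - lam, 0.0)
```

The `width` parameter is the knob corresponding to the memory budget $b$: with 64-bit floats the sketch table occupies roughly depth * width * 64 bits, and the stream can be supplied by a generator such as the one sketched after the Setting slide.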
