Active Regression via Linear-Sample Sparsification
Xue Chen, Eric Price (UT Austin)

Agnostic learning


Agnostic learning of linear spaces: results

[Figure: degree-5 polynomial, σ = 1, x ∈ [−1, 1].]

(Matrix) Chernoff bound depends on
    K := sup_x sup_{f ∈ F, ‖f‖_D = 1} f(x)²

O(K log d + K/ε) samples suffice for agnostic learning [Cohen-Davenport-Leviatan '13, Hsu-Sabato '14]
◮ Mean zero noise: ‖f̂ − f*‖²_D ≤ ε ‖f* − y‖²_D
◮ Generic noise: ‖f̂ − f*‖²_D ≤ (1 + ε) ‖f* − y‖²_D

Also necessary (coupon collector).

How can we avoid the dependence on K?
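To make K and κ concrete for the running example (degree-5 polynomials, D = Uniform[−1, 1]), here is a minimal numerical sketch; it is not from the talk, and the Legendre-based basis, grid resolution, and names are my own choices. For this space K = sup_x Σ_j φ_j(x)² = d² = 36 while κ = d = 6, which is why the O(K log d + K/ε) bound pays much more than d.

```python
# A minimal sketch (not from the talk) computing K and kappa for
# F = {degree-5 polynomials}, D = Uniform[-1, 1].  The orthonormal basis is
# phi_j = sqrt(2j+1) * P_j (Legendre); the grid size is an arbitrary choice.
import numpy as np
from numpy.polynomial import legendre

d = 6  # dimension of the space of degree-<=5 polynomials

def basis(x):
    """Orthonormal basis functions w.r.t. D, evaluated at x; shape (d, len(x))."""
    x = np.atleast_1d(x)
    vals = np.stack([legendre.legval(x, np.eye(d)[j]) for j in range(d)])
    return np.sqrt(2 * np.arange(d) + 1)[:, None] * vals

grid = np.linspace(-1, 1, 100_001)
lev = (basis(grid) ** 2).sum(axis=0)   # sup_{||f||_D = 1} f(x)^2 at each grid point

print("K     ~", lev.max())            # worst case, = d^2 = 36 (attained at x = +-1)
print("kappa ~", lev.mean())           # average over D, = d = 6
```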

Our result: avoid K with more powerful access patterns

With more powerful access models, can replace
    K := sup_x sup_{f ∈ F, ‖f‖_D = 1} f(x)²
with
    κ := E_x sup_{f ∈ F, ‖f‖_D = 1} f(x)².

For linear spaces of functions, κ = d.

Query model:
◮ Can pick x_i of our choice, see y_i ∼ (Y | X = x_i).
◮ Know D (which just defines ‖f − f̂‖_D).

Active learning model (illustrated in the sketch below):
◮ Receive x_1, …, x_m ∼ D
◮ Pick S ⊂ [m] of size s
◮ See y_i for i ∈ S.

Some results for non-linear spaces.
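The two access models are easy to state as code. The sketch below is a hypothetical interface of my own; the oracle class, the target sin(3x), the noise level, and the uniform placeholder choice of S are illustrative, not the talk's algorithm. It only pins down what each model is allowed to see.

```python
# Hypothetical interface for the two access models (illustration only).
import numpy as np

class LabelOracle:
    """Returns noisy labels y ~ (Y | X = x); the learner never sees f_star."""
    def __init__(self, f_star, sigma, rng):
        self.f_star, self.sigma, self.rng = f_star, sigma, rng
    def label(self, x):
        x = np.atleast_1d(x)
        return self.f_star(x) + self.sigma * self.rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
oracle = LabelOracle(f_star=lambda x: np.sin(3 * x), sigma=1.0, rng=rng)

# Query model: D is known and we may query any x of our choice.
y_queried = oracle.label(np.array([-1.0, 0.0, 1.0]))

# Active learning model: m unlabeled draws from D; label only a subset S of size s.
m, s = 1000, 10
x_unlabeled = rng.uniform(-1, 1, m)            # x_1, ..., x_m ~ D
S = rng.choice(m, size=s, replace=False)       # placeholder choice of S (uniform)
y_S = oracle.label(x_unlabeled[S])             # labels seen only for i in S
```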

Query model: basic approach

ERM needs the empirical norm ‖f‖_S to approximate ‖f‖_D for all f ∈ F.

This takes O(K log d) samples from D.

Improved by biasing samples towards high-variance points:
    D′(x) = (1/κ) · sup_{f ∈ F, ‖f‖_D = 1} f(x)² · D(x)

Estimate the norm via
    ‖f‖²_{S,D′} := (1/m) Σ_{i=1}^m [D(x_i)/D′(x_i)] f(x_i)²

Still equals ‖f‖²_D in expectation, but now the max contribution is κ.
◮ This gives O(κ log d) sample complexity by matrix Chernoff.
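Concretely, the biased-sampling estimator can be run end to end for the degree-5 example. The sketch below is one plausible instantiation rather than the talk's exact procedure: the Legendre basis, the rejection-based sampler for D′, the toy labels sin(3x) + noise, and all constants are my own choices; only the reweighting by D(x)/D′(x) = κ / sup_f f(x)² follows the slide.

```python
# Sketch of the query-model approach (one plausible instantiation, not the
# authors' code): sample x_i ~ D', query labels, minimize the reweighted
# empirical norm (1/m) sum_i [D(x_i)/D'(x_i)] (f(x_i) - y_i)^2 over f in F.
# F = degree-5 polynomials, D = Uniform[-1, 1], phi_j = sqrt(2j+1) * P_j.
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)
d = 6

def phi(x):                                    # (len(x), d) orthonormal basis values
    x = np.atleast_1d(x)
    cols = [np.sqrt(2 * j + 1) * legendre.legval(x, np.eye(d)[j]) for j in range(d)]
    return np.stack(cols, axis=1)

def lev(x):                                    # sup_{||f||_D=1} f(x)^2 = sum_j phi_j(x)^2
    return (phi(x) ** 2).sum(axis=1)

def sample_Dprime(n, K=d * d):                 # D'(x) ∝ lev(x) D(x), via rejection from D
    out = []
    while len(out) < n:
        cand = rng.uniform(-1, 1, 4 * n)
        out.extend(cand[rng.uniform(0, 1, 4 * n) < lev(cand) / K])
    return np.array(out[:n])

m = 200
xs = sample_Dprime(m)
ys = np.sin(3 * xs) + rng.standard_normal(m)   # toy labels y_i ~ (Y | X = x_i)
w = d / lev(xs)                                # importance weights D(x_i)/D'(x_i) = kappa/lev
coef, *_ = np.linalg.lstsq(np.sqrt(w)[:, None] * phi(xs), np.sqrt(w) * ys, rcond=None)
f_hat = lambda x: phi(x) @ coef                # ERM fit under the reweighted empirical norm
```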

Bounding κ for linear function spaces

    κ = E_x sup_{f ∈ F, ‖f‖_D = 1} f(x)²

Express f ∈ F via an orthonormal basis:
    f(x) = Σ_j α_j φ_j(x).

Then
    sup_{‖f‖_D = 1} f(x)² = sup_{‖α‖_2 = 1} ⟨α, {φ_j(x)}_{j=1}^d⟩² = Σ_{j=1}^d φ_j(x)².

Hence
    κ = E_x Σ_{j=1}^d φ_j(x)² = d.
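The identity κ = d holds for any d-dimensional linear space and any D, since E_x Σ_j φ_j(x)² is the trace of the (identity) Gram matrix of the orthonormal basis. A quick numerical check, with an arbitrarily chosen distribution and basis (standard normal and monomials here, purely for illustration):

```python
# Numerical check (a sketch) that kappa = d for an arbitrary linear space:
# take any d feature functions, orthonormalize them against samples from D,
# and verify that E_x sum_j phi_j(x)^2 = d.
import numpy as np

rng = np.random.default_rng(0)
d, n = 6, 200_000
x = rng.standard_normal(n)                        # D = standard normal (arbitrary choice)
A = np.vander(x, d, increasing=True)              # raw monomial features 1, x, ..., x^5
G = (A.T @ A) / n                                 # empirical Gram matrix E[a(x) a(x)^T]
Phi = A @ np.linalg.inv(np.linalg.cholesky(G)).T  # orthonormalized features phi_j(x_i)
print((Phi ** 2).sum(axis=1).mean())              # = d = 6 (a trace identity)
```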

Query model: so far

Upsampling x proportional to sup_f f(x)² gets O(d log d) sample complexity.
◮ Essentially the same as leverage score sampling (see the matrix-form sketch below).
◮ Also analogous to Spielman-Srivastava graph sparsification.

Can we bring this down to O(d)?
◮ Not with independent sampling (coupon collector).
◮ Analogous to Batson-Spielman-Srivastava linear-size sparsification.
◮ Yes – using Lee-Sun sparsification.

Mean zero noise: E[(f̂(x) − f(x))²] ≤ ε E[(y − f(x))²].
Generic noise: E[(f̂(x) − f(x))²] ≤ (1 + ε) E[(y − f(x))²].
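For readers who prefer the matrix view, the leverage score sampling bullet corresponds to the standard subsampled-regression recipe below (my own illustration; the matrix sizes and subsample size are arbitrary): row i of a tall design matrix A has leverage ℓ_i = a_iᵀ(AᵀA)⁻¹a_i, the leverages sum to d, and sampling about d log d / ε² rows with probability proportional to ℓ_i (with reweighting) preserves ‖Ac‖² for every c, which is exactly what ERM on the subsample needs.

```python
# Sketch of leverage score sampling in matrix form (illustration, arbitrary sizes).
import numpy as np

rng = np.random.default_rng(0)
n, d = 20_000, 6
A = np.vander(rng.uniform(-1, 1, n), d, increasing=True)     # tall design matrix

lev = np.einsum('ij,jk,ik->i', A, np.linalg.inv(A.T @ A), A) # leverage scores, sum = d
p = lev / lev.sum()
s = 400                                                      # ~ d log d / eps^2
idx = rng.choice(n, size=s, p=p)
S = A[idx] / np.sqrt(s * p[idx])[:, None]                    # reweighted subsample

c = rng.standard_normal(d)
print(np.linalg.norm(S @ c) / np.linalg.norm(A @ c))         # ~ 1 for any c
```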

Active learning

Query model supposes we know D and can query any point.

Active learning:
◮ Get x_1, …, x_m ∼ D.
◮ Pick S ⊆ [m] of size s.
◮ Learn y_i for i ∈ S.

Minimize s:
◮ m → ∞  ⇒  learn D and query any point  ⇒  query model.
◮ Hence s = Θ(d) optimal.

Minimize m:
◮ Label every point  ⇒  agnostic learning.
◮ Hence m = Θ(K log d + K/ε) optimal.

Our result: both at the same time.
◮ In this talk: mostly the s = O(d log d) version.
◮ Prior work: s = O((d log d)^{5/4}) [Sabato-Munos '14]; s = O(d log d) via "volume sampling" [Derezinski-Warmuth-Hsu '18].

Active learning

Warmup: suppose we know D.

Can simulate the query algorithm via rejection sampling:
    Pr[Label x_i] = (1/K) sup_{f ∈ F, ‖f‖_D = 1} f(x_i)².

Just needs s = O(d log d).

Chance each sample gets labeled is
    E_x[Pr[Label x_i]] = κ/K = d/K.

Gives m = O(K log d) unlabeled samples, s = O(d log d) labeled samples.
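A self-contained sketch of this rejection rule for the degree-5 example (the stream length, the toy labels, and the basis construction are my own illustrative choices): each incoming x_i is labeled with probability lev(x_i)/K, where lev(x) = sup_{‖f‖_D=1} f(x)², so in expectation a κ/K = d/K fraction of the stream gets labeled.

```python
# Sketch of the warmup rejection-sampling rule when D is known (illustration).
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)
d = 6

def lev(x):   # sup_{||f||_D=1} f(x)^2 = sum_j (2j+1) P_j(x)^2 for degree-5 polys, D = U[-1,1]
    x = np.atleast_1d(x)
    return sum((2 * j + 1) * legendre.legval(x, np.eye(d)[j]) ** 2 for j in range(d))

K = d * d                                      # sup_x lev(x), attained at x = +-1
m = 2000                                       # roughly K log d unlabeled samples
x_stream = rng.uniform(-1, 1, m)               # x_1, ..., x_m ~ D
S = np.flatnonzero(rng.uniform(0, 1, m) < lev(x_stream) / K)
y_S = np.sin(3 * x_stream[S]) + rng.standard_normal(S.size)   # query labels only on S
print(S.size, "of", m, "points labeled")       # expected m * d / K = m / 6
```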

Active learning without knowing D

Want to perform rejection sampling
    Pr[Label x_i] = (1/K) sup_{f ∈ F, ‖f‖_D = 1} f(x_i)²,
but don't know D.

Just need to estimate ‖f‖_D for all f ∈ F.

Matrix Chernoff gets this with m = O(K log d) unlabeled samples.

Gives m = O(K log d) unlabeled samples, s = O(d log d) labeled samples.

Can improve to m = O(K log d), s = O(d).
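The fix when D is unknown only needs the unlabeled samples: the rejection probabilities depend on D through ‖f‖_D, which can be estimated from an empirical Gram matrix. The sketch below illustrates that idea (the monomial features, sample sizes, and the cap at probability 1 are my own choices); note the leverage-type score aᵀĜ⁻¹a does not require the features to be orthonormal.

```python
# Sketch: estimate ||f||_D from unlabeled samples, then rejection-sample labels.
import numpy as np

rng = np.random.default_rng(0)
d = 6
feat = lambda x: np.vander(np.atleast_1d(x), d, increasing=True)  # any fixed basis of F

x_unlabeled = rng.uniform(-1, 1, 5000)          # ~ K log d unlabeled draws, no labels used
F0 = feat(x_unlabeled)
G_hat = F0.T @ F0 / x_unlabeled.size            # empirical Gram matrix ~ E_D[a(x) a(x)^T]

def lev_hat(x):                                 # estimate of sup_{||f||_D=1} f(x)^2
    A = feat(x)
    return np.einsum('ij,jk,ik->i', A, np.linalg.inv(G_hat), A)

x_stream = rng.uniform(-1, 1, 2000)
p_label = np.minimum(1.0, lev_hat(x_stream) / (d * d))           # ~ lev(x)/K
S = np.flatnonzero(rng.uniform(0, 1, x_stream.size) < p_label)
print(S.size, "points selected for labeling")
```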

Getting to s = O(d)

Based on Lee-Sun '15.

O(d log d) comes from coupon collector.

Change to non-independent sampling:
◮ x_i ∼ D_i where D_i depends on x_1, …, x_{i−1}.
◮ D_1 = D′, D_2 avoids points near x_1, etc.
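The toy below is not the Lee-Sun algorithm (which needs a more careful barrier/potential argument to reach O(d) with the (1 + ε) guarantee); it only illustrates what "D_i depends on x_1, …, x_{i−1}" can look like: after each pick, directions already covered are downweighted, so the next draw avoids points similar to earlier ones. All sizes and constants here are my own.

```python
# Toy illustration of non-independent sampling (NOT Lee-Sun; sketch only).
import numpy as np

rng = np.random.default_rng(1)
m, d, s = 5000, 6, 12
X = rng.uniform(-1, 1, m)
A = np.vander(X, d, increasing=True)          # candidate feature vectors a_i

M = 1e-6 * np.eye(d)                          # directions covered by picks so far
chosen = []
for _ in range(s):
    # score_i = a_i^T M^{-1} a_i: large when a_i points in an uncovered direction
    scores = np.einsum('ij,jk,ik->i', A, np.linalg.inv(M), A)
    scores[chosen] = 0.0                      # never re-pick the same point
    i = rng.choice(m, p=scores / scores.sum())
    chosen.append(i)
    M += np.outer(A[i], A[i])
print(np.sort(np.round(X[chosen], 2)))        # inspect which points were picked
```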
