
Scalable Non-Parametric Statistical Estimation, Aymeric DIEULEVEUT

Aymeric DIEULEVEUT, ENS Paris, INRIA. February 6, 2017.


  1. Scalable Non-Parametric Statistical Estimation Aymeric DIEULEVEUT ENS Paris, INRIA February 6, 2017

  2. Statistics: statistical model; performance measure; estimator. Convergence: F(#obs).

  3. Statistics: statistical model; performance measure; estimator. Convergence: F(#obs). Optimization: minimize a given function; algorithm focused; scales with dimension and observations. Convergence: F(#iter).

  4. Statistics: statistical model; performance measure; estimator. Convergence: F(#obs). Optimization: minimize a given function; algorithm focused; scales with dimension and observations. Convergence: F(#iter). Accurate & Efficient: scalable estimators with optimal statistical properties.

  5. Statistics: statistical model; performance measure; estimator. Convergence: F(#obs). Optimization: minimize a given function; algorithm focused; scales with dimension and observations. Convergence: F(#iter). Accurate & Efficient: scalable estimators with optimal statistical properties. Non-parametric regression: square loss; Tikhonov regularization.

  6. Statistics: statistical model; performance measure; estimator. Convergence: F(#obs). Optimization: minimize a given function; algorithm focused; scales with dimension and observations. Convergence: F(#iter). Accurate & Efficient: scalable estimators with optimal statistical properties. Non-parametric regression: square loss; Tikhonov regularization. Stochastic algorithms: first order methods; few passes on the data.

  7. Statistics: statistical model; performance measure; estimator. Convergence: F(#obs). Optimization: minimize a given function; algorithm focused; scales with dimension and observations. Convergence: F(#iter). Accurate & Efficient: scalable estimators with optimal statistical properties. Non-parametric regression: square loss; Tikhonov regularization. Stochastic algorithms: first order methods; few passes on the data. Non-parametric Stochastic Approximation, AOS, 2015.

  8. Non-parametric Stochastic Approximation with large step sizes 1/2. Aymeric Dieuleveut & Francis Bach, in the Annals of Statistics, 2015. Random design least-squares regression.

  9. Non-parametric Stochastic Approximation with large step sizes 1/2. Aymeric Dieuleveut & Francis Bach, in the Annals of Statistics, 2015. Random design least-squares regression: $\varepsilon(f) := \mathbb{E}_{(X,Y)}\left[(f(X) - Y)^2\right]$.

  10. Non-parametric Stochastic Approximation with large step sizes 1/2. Aymeric Dieuleveut & Francis Bach, in the Annals of Statistics, 2015. Random design least-squares regression: $\varepsilon(f) := \mathbb{E}_{(X,Y)}\left[(f(X) - Y)^2\right]$. Within a reproducing kernel Hilbert space $\mathcal{H}$: $\min_{f \in \mathcal{H}} \varepsilon(f)$. $(x_i, y_i)$ i.i.d. observations.

  11. Non-parametric Stochastic Approximation with large step sizes 1/2. Aymeric Dieuleveut & Francis Bach, in the Annals of Statistics, 2015. Random design least-squares regression: $\varepsilon(f) := \mathbb{E}_{(X,Y)}\left[(f(X) - Y)^2\right]$. Within a reproducing kernel Hilbert space $\mathcal{H}$: $\min_{f \in \mathcal{H}} \varepsilon(f)$. $(x_i, y_i)$ i.i.d. observations. Sequence of estimators $f_t \in \mathcal{H}$.

  12. Non-parametric Stochastic Approximation with large step sizes 1/2. Aymeric Dieuleveut & Francis Bach, in the Annals of Statistics, 2015. Random design least-squares regression: $\varepsilon(f) := \mathbb{E}_{(X,Y)}\left[(f(X) - Y)^2\right]$. Within a reproducing kernel Hilbert space $\mathcal{H}$: $\min_{f \in \mathcal{H}} \varepsilon(f)$. $(x_i, y_i)$ i.i.d. observations. Sequence of estimators $f_t \in \mathcal{H}$. Update after each observation.

  13. Non-parametric Stochastic Approximation with large step sizes 1/2. Aymeric Dieuleveut & Francis Bach, in the Annals of Statistics, 2015. Random design least-squares regression: $\varepsilon(f) := \mathbb{E}_{(X,Y)}\left[(f(X) - Y)^2\right]$. Within a reproducing kernel Hilbert space $\mathcal{H}$: $\min_{f \in \mathcal{H}} \varepsilon(f)$. $(x_i, y_i)$ i.i.d. observations. Sequence of estimators $f_t \in \mathcal{H}$. Update after each observation. Using unbiased gradients of the loss function:

  14. Non-parametric Stochastic Approximation with large step sizes 1/2. Aymeric Dieuleveut & Francis Bach, in the Annals of Statistics, 2015. Random design least-squares regression: $\varepsilon(f) := \mathbb{E}_{(X,Y)}\left[(f(X) - Y)^2\right]$. Within a reproducing kernel Hilbert space $\mathcal{H}$: $\min_{f \in \mathcal{H}} \varepsilon(f)$. $(x_i, y_i)$ i.i.d. observations. Sequence of estimators $f_t \in \mathcal{H}$. Update after each observation. Using unbiased gradients of the loss function: $f_{t+1} = f_t - \gamma_t \left(f_t(x_t) - y_t\right) K_{x_t}$, where $K$ is the kernel and $K_x = K(x, \cdot)$.

  15. Non-parametric Stochastic Approximation with large step sizes 1/2. Aymeric Dieuleveut & Francis Bach, in the Annals of Statistics, 2015. Random design least-squares regression: $\varepsilon(f) := \mathbb{E}_{(X,Y)}\left[(f(X) - Y)^2\right]$. Within a reproducing kernel Hilbert space $\mathcal{H}$: $\min_{f \in \mathcal{H}} \varepsilon(f)$. $(x_i, y_i)$ i.i.d. observations. Sequence of estimators $f_t \in \mathcal{H}$. Update after each observation. Using unbiased gradients of the loss function: $f_{t+1} = f_t - \gamma_t \left(f_t(x_t) - y_t\right) K_{x_t}$, where $K$ is the kernel and $K_x = K(x, \cdot)$. Stochastic Approximation.

  16. Non-parametric Stochastic Approximation with large step sizes 1/2. Aymeric Dieuleveut & Francis Bach, in the Annals of Statistics, 2015. Random design least-squares regression: $\varepsilon(f) := \mathbb{E}_{(X,Y)}\left[(f(X) - Y)^2\right]$. Within a reproducing kernel Hilbert space $\mathcal{H}$: $\min_{f \in \mathcal{H}} \varepsilon(f)$. $(x_i, y_i)$ i.i.d. observations. Sequence of estimators $f_t \in \mathcal{H}$. Update after each observation. Using unbiased gradients of the loss function: $f_{t+1} = f_t - \gamma_t \left(f_t(x_t) - y_t\right) K_{x_t}$, where $K$ is the kernel and $K_x = K(x, \cdot)$. Stochastic Approximation. Depending on assumptions on: ◮ the Gaussian complexity of the unit ball of the kernel space, ◮ the smoothness in $\mathcal{H}$ of the optimal predictor $f_*(X) = \mathbb{E}[Y \mid X]$.

  17. Non-parametric Stochastic Approximation with large step sizes 2/2. Aymeric Dieuleveut & Francis Bach, in the Annals of Statistics, 2015. [Figure, one panel: $f_*$ marked relative to $\mathcal{H}$ and $L^2_{\rho_X}$.]

  18. Non-parametric Stochastic Approximation with large step sizes 2/2. Aymeric Dieuleveut & Francis Bach, in the Annals of Statistics, 2015. [Figure, two panels: $f_*$ marked relative to $\mathcal{H}$ and $L^2_{\rho_X}$.]

  19. Non-parametric Stochastic Approximation with large step sizes 2/2. Aymeric Dieuleveut & Francis Bach, in the Annals of Statistics, 2015. [Figure, three panels: $f_*$ marked relative to $\mathcal{H}$ and $L^2_{\rho_X}$.]

  20. Non-parametric Stochastic Approximation with large step sizes 2/2. Aymeric Dieuleveut & Francis Bach, in the Annals of Statistics, 2015. [Figure, three panels: $f_*$ marked relative to $\mathcal{H}$ and $L^2_{\rho_X}$.] Theorem: the averaged, unregularized least mean squares algorithm, with large step sizes, attains the statistically optimal rate of convergence.

  21. Non-parametric Stochastic Approximation with large step sizes 2/2. Aymeric Dieuleveut & Francis Bach, in the Annals of Statistics, 2015. [Figure, three panels: $f_*$ marked relative to $\mathcal{H}$ and $L^2_{\rho_X}$.] Theorem: the averaged, unregularized least mean squares algorithm, with large step sizes, attains the statistically optimal rate of convergence. Recovers the finite-dimensional situation with rate $O\left(\sigma^2 d / n\right)$. Optimal rates in the well-specified regime and in some mis-specified situations.


  24. Statistics: statistical model; performance measure; estimator. Convergence: F(#obs). Optimization: minimize a given function; algorithm focused; scales with dimension and observations. Convergence: F(#iter). Accurate & Efficient: scalable estimators with optimal statistical properties. Non-parametric regression: square loss; Tikhonov regularization. Stochastic algorithms: first order methods; few passes on the data. Non-parametric Stochastic Approximation, AOS, 2015.

  25. Statistics: statistical model; performance measure; estimator. Convergence: F(#obs). Optimization: minimize a given function; algorithm focused; scales with dimension and observations. Convergence: F(#iter). Accurate & Efficient: scalable estimators with optimal statistical properties. Non-parametric regression: square loss; Tikhonov regularization. Stochastic algorithms: first order methods; few passes on the data. Non-parametric Stochastic Approximation, AOS, 2015. Faster Rates for Least-Squares Regression, Tech. report, 2016.
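
The recursion on slides 14 to 16 and the averaging named in the theorem on slide 20 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the Gaussian kernel and its bandwidth, the constant step size 0.25, the synthetic sine data, and the names `gaussian_kernel`, `averaged_kernel_lms` and `predict` are all hypothetical choices made for the example. The sketch relies on the fact that, starting from f_0 = 0, every iterate is a finite sum of kernel functions centred at the observed points, so it is enough to store one coefficient per observation.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=0.2):
    """K(a, b) = exp(-(a - b)^2 / (2 * bandwidth^2)); an assumed kernel choice."""
    return np.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def averaged_kernel_lms(xs, ys, step_size=0.25, kernel=gaussian_kernel):
    """Single pass of f_{t+1} = f_t - gamma (f_t(x_t) - y_t) K_{x_t}, with f_0 = 0.

    Each iterate is stored as coefficients: f = sum_i coef[i] * K(xs[i], .).
    Returns the coefficients of the last iterate and of the average of all
    iterates produced during the pass.
    """
    n = len(xs)
    coef = np.zeros(n)
    avg_coef = np.zeros(n)
    for t in range(n):
        # Evaluate the current iterate at the new observation x_t
        # (an empty dot product gives 0.0 when t == 0).
        pred = coef[:t] @ kernel(xs[:t], xs[t])
        # Unregularized stochastic gradient step: adds the function K_{x_t}.
        coef[t] = -step_size * (pred - ys[t])
        # Running average of the iterates computed so far.
        avg_coef += (coef - avg_coef) / (t + 1)
    return coef, avg_coef


def predict(coef, centers, x_new, kernel=gaussian_kernel):
    """Evaluate sum_i coef[i] * K(centers[i], x) at each point of x_new."""
    return kernel(x_new[:, None], centers[None, :]) @ coef


# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(0)
n = 500
xs = rng.uniform(0.0, 1.0, size=n)
ys = np.sin(2 * np.pi * xs) + 0.1 * rng.standard_normal(n)

last, averaged = averaged_kernel_lms(xs, ys)

grid = np.linspace(0.0, 1.0, 200)
target = np.sin(2 * np.pi * grid)
mse_avg = np.mean((predict(averaged, xs, grid) - target) ** 2)
mse_zero = np.mean(target ** 2)  # error of the trivial predictor f = 0
print(f"averaged iterate: {mse_avg:.4f}, zero predictor: {mse_zero:.4f}")
```

Each step evaluates the kernel against all earlier observations, so a single pass over n points costs on the order of n^2 kernel evaluations in this naive form. No regularization term appears in the update; in this sketch the only tuning parameters are the step size and the kernel bandwidth.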
