Scalable Non-Parametric Statistical Estimation
Aymeric DIEULEVEUT, ENS Paris, INRIA
February 6, 2017
Optimization: minimize a given function; algorithm focused; scales with dimension and observations; convergence: F(#iter).
Statistics: statistical model; performance measure; estimator; convergence: F(#obs).
Accurate & Efficient: scalable estimators with optimal statistical properties.
Statistics side: non-parametric regression, square loss, Tikhonov regularization.
Optimization side: stochastic algorithms, first order methods, few passes on the data.
Combined: Non-parametric Stochastic Approximation, AOS, 2015.
Non-parametric Stochastic Approximation with large step sizes (1/2)
Aymeric Dieuleveut & Francis Bach, in the Annals of Statistics, 2015.

Random design least-squares regression:
$\varepsilon(f) := \mathbb{E}_{(X,Y)}\left[ (f(X) - Y)^2 \right]$.
Within a reproducing kernel Hilbert space $\mathcal{H}$: $\min_{f \in \mathcal{H}} \varepsilon(f)$.
$(x_i, y_i)$ i.i.d. observations.

Sequence of estimators $f_t \in \mathcal{H}$.
Update after each observation.
Using unbiased gradients of the loss function:
$f_{t+1} = f_t - \gamma_t \, (f_t(x_t) - y_t) \, K_{x_t}$,
where $K$ is the kernel and $K_x = K(x, \cdot)$.
Stochastic Approximation.

Depending on assumptions on:
◮ the Gaussian complexity of the unit ball of the kernel space,
◮ the smoothness in $\mathcal{H}$ of the optimal predictor $f_*(X) = \mathbb{E}[Y \mid X]$.
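The update $f_{t+1} = f_t - \gamma_t (f_t(x_t) - y_t) K_{x_t}$ can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the Gaussian kernel, its bandwidth, the constant step size, the starting point $f_0 = 0$, and the names `gaussian_kernel`, `kernel_lms` and `predict` are all assumptions made for the example. Each iterate is stored through its coefficients on the kernel functions $K_{x_i}$ of the points seen so far.

```python
import numpy as np

def gaussian_kernel(x, z, bandwidth=0.5):
    """Gaussian kernel K(x, z); kernel choice and bandwidth are illustrative assumptions."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * bandwidth ** 2))

def kernel_lms(xs, ys, step_size, kernel=gaussian_kernel):
    """One pass of f_{t+1} = f_t - gamma (f_t(x_t) - y_t) K_{x_t}, starting from f_0 = 0.

    A constant step size is assumed here. The iterate is represented as
    f = sum_i coeffs[i] * K_{x_i}; the coefficients of the last iterate are returned.
    """
    n = len(xs)
    coeffs = np.zeros(n)
    for t in range(n):
        # Evaluate the current iterate f_t at the new point x_t.
        pred = sum(coeffs[i] * kernel(xs[i], xs[t]) for i in range(t))
        # The gradient step adds a single new term: -gamma * residual * K_{x_t}.
        coeffs[t] = -step_size * (pred - ys[t])
    return coeffs

def predict(coeffs, xs, x, kernel=gaussian_kernel):
    """Evaluate f(x) = sum_i coeffs[i] * K(x_i, x)."""
    return sum(a * kernel(xi, x) for a, xi in zip(coeffs, xs))
```

Because step t evaluates the kernel against all t earlier points, one pass over n observations costs on the order of n^2 kernel evaluations in this naive representation.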
Non-parametric Stochastic Approximation with large step sizes (2/2)

[Figure: three panels, each labeled with $f_*$, $\mathcal{H}$ and $L^2_{\rho_X}$.]

Theorem: the averaged, unregularized least mean squares algorithm with large step sizes achieves the statistically optimal rate of convergence.
◮ Recovers the finite-dimensional situation with rate $O\left( \sigma^2 d / n \right)$.
◮ Optimal rates in both the well-specified regime and some situations of the mis-specified regime.
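For the finite-dimensional case mentioned in the theorem, averaging can be sketched as below. This is an illustration under stated assumptions rather than the procedure analyzed in the paper: it uses a linear model in R^d, a constant step size, a zero starting point, and a running mean of the iterates after each update (whether the initial iterate is included in the average is a convention chosen here). The name `averaged_lms` is hypothetical.

```python
import numpy as np

def averaged_lms(X, y, step_size):
    """Single-pass least mean squares with iterate averaging, no regularization.

    X has shape (n, d), y has shape (n,). Returns the average of the
    iterates theta_1, ..., theta_n (an assumed averaging convention).
    """
    n, d = X.shape
    theta = np.zeros(d)       # current iterate, starting from zero
    theta_bar = np.zeros(d)   # running average of the iterates
    for t in range(n):
        residual = X[t] @ theta - y[t]
        # Unbiased gradient step on the square loss for observation t.
        theta = theta - step_size * residual * X[t]
        # Online update of the mean of the iterates seen so far.
        theta_bar += (theta - theta_bar) / (t + 1)
    return theta_bar
```

The estimator returned is the averaged iterate, not the last one; the running-mean update avoids storing the whole trajectory.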
Optimization and Statistics, combined into accurate and efficient scalable estimators with optimal statistical properties:
Non-parametric Stochastic Approximation, AOS, 2015.
Faster Rates for Least-Squares Regression, Tech. report, 2016.