robust regression with coarse data

  1. Robust Regression with Coarse Data. Marco Cattaneo and Andrea Wiencierz, Department of Statistics, LMU Munich. Statistische Woche 2011, Leipzig, Germany, 21 September 2011.

  6. coarse data
     [Figure: unobserved precise data vs. observed coarse data]
     ◮ in the literature, two kinds of general approaches to regression with coarse data:
       ◮ represent the observed coarse data by a few precise values (e.g., intervals by center and width), and apply standard regression methods to those values: see, for instance, Domingues et al. (2010)
       ◮ apply standard regression methods to all possible precise data compatible with the observed coarse data, and consider the range of outcomes as the imprecise result: see, for example, Ferson et al. (2007)
     ◮ LIR (Likelihood-based Imprecise Regression): a new regression method directly applicable to coarse data (Cattaneo and Wiencierz, 2011)
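The two approaches from the literature can be illustrated with a minimal Python sketch. It is not the procedure of either cited paper: the data, the helper `ols`, and the restriction to precise x with interval-valued y are assumptions made for illustration only. The first approach fits ordinary least squares once to the interval centers; the second collects the slopes obtained from all vertex choices of the y intervals, which is enough here because the OLS slope is linear in the y values.

```python
from itertools import product

def ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data: precise x, interval-valued y given as (lower, upper).
xs = [0.0, 1.0, 2.0, 3.0]
y_intervals = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]

# Approach 1: replace each interval by its center and fit once.
centers = [(lo + hi) / 2 for lo, hi in y_intervals]
center_fit = ols(xs, centers)

# Approach 2: fit every vertex of the box of compatible y vectors.
# The OLS slope is linear in y, so its extremes over the box occur at vertices.
slopes = [ols(xs, ys)[1] for ys in product(*y_intervals)]
slope_range = (min(slopes), max(slopes))
```

The first approach returns a single line and ignores the interval widths, while the second returns a range of slopes that widens as the data become coarser.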

  12. nonparametric likelihood
      ◮ precise data (unobserved): random variables $V_i = (X_i, Y_i) \in \mathcal{X} \times \mathbb{R}$
      ◮ coarse data (observed): random sets $V_i^* \subseteq \mathcal{X} \times \mathbb{R}$
      ◮ nonparametric model: $\mathcal{P}$ is the set of all probability measures such that
        ◮ $(V_1, V_1^*), \ldots, (V_n, V_n^*)$ are i.i.d.
        ◮ $P(V_i \in V_i^*) \geq 1 - \varepsilon$ (where $\varepsilon \in [0, 1]$ is fixed)
      ◮ the observed (coarse) data $V_1^* = A_1, \ldots, V_n^* = A_n$ induce the (normalized) likelihood function $\mathrm{lik} : \mathcal{P} \to [0, 1]$ with
        $$\mathrm{lik}(P) = \frac{P(V_1^* = A_1, \ldots, V_n^* = A_n)}{\max_{P' \in \mathcal{P}} P'(V_1^* = A_1, \ldots, V_n^* = A_n)}$$
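The data model on this slide can be made concrete with a small simulation. The sketch below is only an illustration under assumed settings: the regression line, the noise, the grid width, and the function names `coarsen`, `simulate`, and `coverage` are all hypothetical. Each precise y is replaced by the grid cell containing it, and a fraction of at most ε of the observations is deliberately coarsened wrongly, so that the coarse set contains the precise value with relative frequency at least 1 − ε.

```python
import math
import random

def coarsen(y, width):
    """Return the grid cell [lo, lo + width) of the given width that contains y."""
    lo = math.floor(y / width) * width
    return lo, lo + width

def simulate(n, eps, width=0.5, seed=0):
    """Generate precise pairs (x, y) and coarse pairs (x, (lo, hi))."""
    rng = random.Random(seed)
    n_bad = math.floor(eps * n)  # this many observations are coarsened wrongly
    precise, coarse = [], []
    for i in range(n):
        x = rng.uniform(0.0, 10.0)
        y = 1.0 + 0.5 * x + rng.gauss(0.0, 1.0)  # hypothetical true relation
        lo, hi = coarsen(y, width)
        if i < n_bad:
            # Shift the cell upward so that it no longer contains y.
            lo, hi = lo + 3 * width, hi + 3 * width
        precise.append((x, y))
        coarse.append((x, (lo, hi)))
    return precise, coarse

def coverage(precise, coarse):
    """Fraction of observations whose coarse interval contains the precise y."""
    hits = sum(lo <= y < hi for (_, y), (_, (lo, hi)) in zip(precise, coarse))
    return hits / len(precise)

precise, coarse = simulate(n=200, eps=0.25)
```

In practice only `coarse` would be observed; `precise` is kept here solely to check the coverage condition of the model.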

  16. regression problem
      ◮ regression functions: $\mathcal{F}$ is a certain set of functions $f : \mathcal{X} \to \mathbb{R}$
      ◮ absolute residuals: $R_{f,i} = |Y_i - f(X_i)|$
      ◮ for each function $f \in \mathcal{F}$, the quantiles of the distribution of the absolute residuals $R_{f,i}$ can be estimated even under the nonparametric model $\mathcal{P}$
      ◮ the regression problem can be interpreted as the minimization of the $p$-quantile of the distribution of the absolute residuals $R_{f,i}$ (where $p \in (0, 1)$ is fixed)
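For precise data, the quantile criterion can be sketched in a few lines of Python. This is an illustration only: the data set, the two candidate functions, and the helper names are hypothetical, and the empirical quantile stands in for the quantile of the unknown residual distribution. The candidate with the smallest p-quantile of absolute residuals is selected.

```python
import math

def abs_residuals(f, data):
    """Sorted absolute residuals |y - f(x)| over the data."""
    return sorted(abs(y - f(x)) for x, y in data)

def empirical_quantile(sorted_values, p):
    """Smallest value q such that at least a fraction p of the values are <= q."""
    k = max(1, math.ceil(p * len(sorted_values)))
    return sorted_values[k - 1]

# Hypothetical precise data; the last point is an outlier.
data = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 40.0)]

# Hypothetical candidate set F with two functions.
candidates = {
    "identity": lambda x: x,
    "steep": lambda x: 10.0 * x,
}

p = 0.5
scores = {
    name: empirical_quantile(abs_residuals(f, data), p)
    for name, f in candidates.items()
}
best = min(scores, key=scores.get)
```

The outlier has no influence on the median absolute residual of the identity line, which is what makes a quantile criterion of this kind robust.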

  20. generalized LQS regression
      ◮ likelihood-based confidence interval for the $p$-quantile of the distribution of the absolute residuals $R_{f,i}$ (where $Q_f(P)$ is the interval of all $p$-quantiles of $R_{f,i}$ under $P$, and $\beta \in (0, 1)$ is fixed):
        $$C_f = \bigcup_{P \in \mathcal{P} \,:\, \mathrm{lik}(P) > \beta} Q_f(P)$$
      ◮ point estimate: $f_{LRM}$ is the function in $\mathcal{F}$ minimizing $\sup C_f$ (Likelihood-based Region Minimax: see Cattaneo, 2007)
      ◮ $f_{LRM}$ has a simple geometrical interpretation: $B_{f_{LRM}, q_{LRM}}$ is the thinnest band of the form $B_{f,q} = \{(x, y) \in \mathcal{X} \times \mathbb{R} : |y - f(x)| \leq q\}$ containing at least $k$ coarse data (where $k > (p + \varepsilon) n$ depends on $n$, $\varepsilon$, $p$, $\beta$), among all $f \in \mathcal{F}$ and all $q \in [0, +\infty)$
      ◮ when the observed data are in fact precise, $f_{LRM}$ corresponds to the LQS (Least Quantile of Squares) estimate with quantile $k/n$
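The geometrical interpretation suggests a brute-force search, sketched below in Python under several assumptions that are not taken from the slides: the coarse data are axis-parallel rectangles, the candidate functions are lines f(x) = a + b·x on a finite grid, and k is supplied by the user (the slides state only that k > (p + ε)n depends on n, ε, p, β). All function names are hypothetical. For a fixed line, the smallest band containing a rectangle is determined by its corners, and the smallest band containing at least k rectangles is given by the k-th smallest of these half-widths.

```python
from itertools import product

def required_halfwidth(a, b, rect):
    """Smallest q such that rect lies inside the band |y - (a + b*x)| <= q."""
    (x_lo, x_hi), (y_lo, y_hi) = rect
    # |y - (a + b*x)| is convex in (x, y), so its maximum over a
    # rectangle is attained at one of the four corners.
    return max(
        abs(y - (a + b * x))
        for x, y in product((x_lo, x_hi), (y_lo, y_hi))
    )

def band_halfwidth(a, b, rects, k):
    """Smallest q such that at least k rectangles lie inside the band."""
    return sorted(required_halfwidth(a, b, r) for r in rects)[k - 1]

def thinnest_band(rects, k, intercepts, slopes):
    """Grid search over lines; returns (q, a, b) with the smallest q."""
    return min(
        (band_halfwidth(a, b, rects, k), a, b)
        for a, b in product(intercepts, slopes)
    )

# Hypothetical interval data ((x_lo, x_hi), (y_lo, y_hi)); the last is an outlier.
rects = [
    ((0.0, 1.0), (0.0, 1.0)),
    ((1.0, 2.0), (1.0, 2.0)),
    ((2.0, 3.0), (2.0, 3.0)),
    ((3.0, 4.0), (3.0, 4.0)),
    ((4.0, 5.0), (20.0, 21.0)),
]
q, a, b = thinnest_band(rects, k=4, intercepts=[-1.0, 0.0, 1.0], slopes=[0.0, 1.0, 2.0])
```

When every rectangle degenerates to a single point, `required_halfwidth` is just the absolute residual, and the search reduces to a grid version of the quantile criterion for precise data.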
