Regularization prescriptions and convex duality: density estimation and Rényi entropies


  1. Regularization prescriptions and convex duality: density estimation and Rényi entropies
     Ivan Mizera, University of Alberta
     Department of Mathematical and Statistical Sciences, Edmonton, Alberta, Canada
     Linz, October 2008
     Joint work with Roger Koenker (University of Illinois at Urbana-Champaign)
     Gratefully acknowledging the support of the Natural Sciences and Engineering Research Council of Canada

  2. Density estimation (say)
     A useful heuristic: maximum likelihood
     Given the data points $X_1, X_2, \ldots, X_n$, solve
     $\prod_{i=1}^{n} f(X_i) \to \max_f !$
     or equivalently
     $\sum_{i=1}^{n} -\log f(X_i) \to \min_f !$
     under the side conditions $f \geq 0$, $\int f = 1$

  3. Note that useful...
     [Figure: the fitted estimate over the data range, horizontal axis roughly 0 to 25, vertical axis 0 to 1.6]

  4. Dirac catastrophe!
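
A quick numerical illustration of the catastrophe (an added toy example, not from the slides): fit, by hand, a two-component normal mixture with one component of width $s$ centered at a data point; the log-likelihood grows without bound as $s \to 0$, so the unconstrained "optimum" is a sum of Dirac spikes at the observations.

    # Toy demonstration (assumed setup): the likelihood diverges as the spike narrows.
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=50)

    def loglik(s, broad=5.0):
        """Log-likelihood of a mixture: half a spike of width s at x[0], half a broad normal."""
        phi = lambda z, sd: np.exp(-0.5 * (z / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        f = 0.5 * phi(x - x[0], s) + 0.5 * phi(x, broad)
        return np.sum(np.log(f))

    for s in [1.0, 1e-2, 1e-4, 1e-8]:
        print(s, loglik(s))    # grows roughly like -log(s): unbounded as s -> 0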

  5. Preventing the disaster for the general case
     • Sieves (...)

  6. Preventing the disaster for the general case
     • Sieves (...)
     • Regularization
     $\sum_{i=1}^{n} -\log f(X_i) \to \min_f !$   $f \geq 0$, $\int f = 1$

  7. Preventing the disaster for the general case
     • Sieves (...)
     • Regularization
     $\sum_{i=1}^{n} -\log f(X_i) \to \min_f !$   $J(f) \leq \Lambda$, $f \geq 0$, $\int f = 1$

  8. Preventing the disaster for the general case
     • Sieves (...)
     • Regularization
     $\sum_{i=1}^{n} -\log f(X_i) + \lambda J(f) \to \min_f !$   $f \geq 0$, $\int f = 1$

  9. Preventing the disaster for the general case
     • Sieves (...)
     • Regularization
     $\sum_{i=1}^{n} -\log f(X_i) + \lambda J(f) \to \min_f !$   $f \geq 0$, $\int f = 1$
     $J(\cdot)$: a penalty (penalizing complexity, lack of smoothness, etc.), for instance
     $J(f) = \int |(\log f)''| = \mathrm{TV}((\log f)')$, or also $J(f) = \int |(\log f)'''| = \mathrm{TV}((\log f)'')$
     Good (1971), Good and Gaskins (1971), Silverman (1982), Leonard (1978), Gu (2002), Wahba, Lin, and Leng (2002)
     See also: Eggermont and LaRiccia (2001), Ramsay and Silverman (2006), Hartigan (2000), Hartigan and Hartigan (1985), Davies and Kovac (2004)
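
The following is a minimal numerical sketch of this TV-penalized prescription, not the authors' implementation. It works in terms of $g = -\log f$ on a grid and borrows the device of slides 22-24, absorbing the normalization constraint into the objective through $\int e^{-g}$ (with the $1/n$ weighting of the primal (P) on slide 26, so that the fitted density integrates to one). The toy data, the grid, the value of $\lambda$, and the nearest-grid-point placement of the observations are all illustrative assumptions.

    # Illustrative sketch only: discretized TV-penalized maximum likelihood,
    # formulated in g = -log f and solved as a convex program with cvxpy.
    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    x = np.sort(rng.normal(size=200))                  # toy data (assumption)
    lo, hi, m = x.min() - 1.0, x.max() + 1.0, 400
    grid = np.linspace(lo, hi, m)
    dx = grid[1] - grid[0]
    idx = np.clip(np.searchsorted(grid, x), 0, m - 1)  # nearest grid cell for each datum
    sel = np.zeros((len(x), m))
    sel[np.arange(len(x)), idx] = 1.0                  # selects g(X_i) from the grid values

    g = cp.Variable(m)                                 # values of g = -log f on the grid
    D2 = np.diff(np.eye(m), n=2, axis=0) / dx**2       # discrete second derivative
    lam = 1.0                                          # smoothing parameter (illustrative)

    fit = cp.sum(sel @ g) / len(x)                     # (1/n) * sum_i g(X_i)
    mass = dx * cp.sum(cp.exp(-g))                     # integral of e^{-g}: enforces total mass 1
    penalty = lam * dx * cp.sum(cp.abs(D2 @ g))        # lambda * TV((log f)') = lambda * int |(log f)''|
    cp.Problem(cp.Minimize(fit + mass + penalty)).solve()

    f_hat = np.exp(-g.value)                           # estimated density on the grid

As with one-dimensional total variation penalties generally, this choice of $J$ tends to produce piecewise linear fitted log-densities.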

  10. See also in particular
     Roger Koenker and Ivan Mizera (2007), Density estimation by total variation regularization
     Roger Koenker and Ivan Mizera (2006), The alter egos of the regularized maximum likelihood density estimators: deregularized maximum-entropy, Shannon, Rényi, Simpson, Gini, and stretched strings
     Roger Koenker, Ivan Mizera, and Jungmo Yoon (200?), What do kernel density estimators optimize?
     Roger Koenker and Ivan Mizera (2008), Primal and dual formulations relevant for the numerical estimation of a probability density via regularization
     Roger Koenker and Ivan Mizera (200?), Quasi-concave density estimation
     http://www.stat.ualberta.ca/~mizera/
     http://www.econ.uiuc.edu/~roger/

  11. Preventing the disaster for special cases
     • Shape constraint: monotonicity
     $\sum_{i=1}^{n} -\log f(X_i) \to \min_f !$   $f \geq 0$, $\int f = 1$

  12. Preventing the disaster for special cases
     • Shape constraint: monotonicity
     $\sum_{i=1}^{n} -\log f(X_i) \to \min_f !$   $f$ decreasing, $f \geq 0$, $\int f = 1$
     Grenander (1956), Jongbloed (1998), Groeneboom, Jongbloed, and Wellner (2001), ...

  13. Preventing the disaster for special cases
     • Shape constraint: monotonicity
     $\sum_{i=1}^{n} -\log f(X_i) \to \min_f !$   $f$ decreasing, $f \geq 0$, $\int f = 1$
     Grenander (1956), Jongbloed (1998), Groeneboom, Jongbloed, and Wellner (2001), ...
     • Shape constraint: (strong) unimodality
     $\sum_{i=1}^{n} -\log f(X_i) \to \min_f !$   $f \geq 0$, $\int f = 1$

  14. Preventing the disaster for special cases
     • Shape constraint: monotonicity
     $\sum_{i=1}^{n} -\log f(X_i) \to \min_f !$   $f$ decreasing, $f \geq 0$, $\int f = 1$
     Grenander (1956), Jongbloed (1998), Groeneboom, Jongbloed, and Wellner (2001), ...
     • Shape constraint: (strong) unimodality
     $\sum_{i=1}^{n} -\log f(X_i) \to \min_f !$   $-\log f$ convex, $f \geq 0$, $\int f = 1$
     Eggermont and LaRiccia (2000), Walther (2000), Rufibach and Dümbgen (2006), Pal, Woodroofe, and Meyer (2006)

  15. Note
     Shape constraint: no regularization parameter to be set...
     ... but of course, we need to believe that the shape is plausible

  16. Note
     Shape constraint: no regularization parameter to be set...
     ... but of course, we need to believe that the shape is plausible
     Regularization via TV penalty...
     ... vs. log-concavity shape constraint: the differential operator is the same, only the constraint is somewhat different
     $\int |(\log f)''| \leq \Lambda$, in the dual $|(\log f)''| \leq \Lambda$
     Log-concavity: $(\log f)'' \leq 0$

  17. Note
     Shape constraint: no regularization parameter to be set...
     ... but of course, we need to believe that the shape is plausible
     Regularization via TV penalty...
     ... vs. log-concavity shape constraint: the differential operator is the same, only the constraint is somewhat different
     $\int |(\log f)''| \leq \Lambda$, in the dual $|(\log f)''| \leq \Lambda$
     Log-concavity: $(\log f)'' \leq 0$
     Only the functional analysis may be a bit more difficult...
     ... so let us do the shape-constrained case first

  18. The hidden charm of log-concave distributions
     A density $f$ is called log-concave if $-\log f$ is convex.
     (Usual conventions: $-\log 0 = \infty$, convex where finite, ...)

  19. The hidden charm of log-concave distributions
     A density $f$ is called log-concave if $-\log f$ is convex.
     (Usual conventions: $-\log 0 = \infty$, convex where finite, ...)
     Schoenberg 1940's, Karlin 1950's (monotone likelihood ratio)
     Karlin (1968) - a monograph about their mathematics
     Barlow and Proschan (1975) - reliability
     Flinn and Heckman (1975) - social choice
     Caplin and Nalebuff (1991a,b) - voting theory
     Devroye (1984) - how to simulate from them
     Mizera (1994) - M-estimators

  20. The hidden charm of log-concave distributions
     A density $f$ is called log-concave if $-\log f$ is convex.
     (Usual conventions: $-\log 0 = \infty$, convex where finite, ...)
     Schoenberg 1940's, Karlin 1950's (monotone likelihood ratio)
     Karlin (1968) - a monograph about their mathematics
     Barlow and Proschan (1975) - reliability
     Flinn and Heckman (1975) - social choice
     Caplin and Nalebuff (1991a,b) - voting theory
     Devroye (1984) - how to simulate from them
     Mizera (1994) - M-estimators
     Uniform, Normal, Exponential, Logistic, Weibull, Gamma... - all log-concave
     If $f$ is log-concave, then
     - it is unimodal ("strongly")
     - the convolution with any unimodal density is unimodal
     - the convolution with any log-concave density is log-concave
     - $f = e^{-g}$, with $g$ convex...

  21. The hidden charm of log-concave distributions
     A density $f$ is called log-concave if $-\log f$ is convex.
     (Usual conventions: $-\log 0 = \infty$, convex where finite, ...)
     Schoenberg 1940's, Karlin 1950's (monotone likelihood ratio)
     Karlin (1968) - a monograph about their mathematics
     Barlow and Proschan (1975) - reliability
     Flinn and Heckman (1975) - social choice
     Caplin and Nalebuff (1991a,b) - voting theory
     Devroye (1984) - how to simulate from them
     Mizera (1994) - M-estimators
     Uniform, Normal, Exponential, Logistic, Weibull, Gamma... - all log-concave
     If $f$ is log-concave, then
     - it is unimodal ("strongly")
     - the convolution with any unimodal density is unimodal
     - the convolution with any log-concave density is log-concave
     - $f = e^{-g}$, with $g$ convex...
     No heavy tails! $t$-distributions (finance!): not log-concave (!!)
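
A short check of that last claim, added here as a worked calculation: for the Student $t$ density with $\nu$ degrees of freedom,

    \[
      \log f(x) = c_\nu - \frac{\nu+1}{2}\,\log\!\left(1 + \frac{x^2}{\nu}\right),
      \qquad
      (\log f)''(x) = -(\nu+1)\,\frac{\nu - x^2}{(\nu + x^2)^2},
    \]

which is positive for $|x| > \sqrt{\nu}$: the log-density is convex in the tails, so no $t$-distribution is log-concave, and such heavy-tailed models are excluded by the shape constraint.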

  22. A convex problem
     Let $g = -\log f$; let $K$ be the cone of convex functions. The original problem is transformed:
     $\sum_{i=1}^{n} g(X_i) \to \min !$   $g \in K$, $\int e^{-g} = 1$

  23. A convex problem
     Let $g = -\log f$; let $K$ be the cone of convex functions. The original problem is transformed:
     $\sum_{i=1}^{n} g(X_i) + \int e^{-g} \to \min !$   $g \in K$
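
A minimal sketch of this convex program on a grid (an illustrative discretization, not the authors' code, using the same kind of toy data and nearest-grid placement as the sketch after slide 9): the cone $K$ becomes nonnegativity of second differences, and the data term is weighted by $1/n$ as in the primal (P) of slide 26 so that $\int e^{-g} = 1$ at the optimum.

    # Illustrative discretized log-concave MLE:
    # minimize (1/n) sum_i g(X_i) + integral e^{-g}  subject to  g convex.
    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(2)
    x = np.sort(rng.normal(size=200))                  # toy data (assumption)
    lo, hi, m = x.min() - 1.0, x.max() + 1.0, 300
    grid = np.linspace(lo, hi, m)
    dx = grid[1] - grid[0]
    idx = np.clip(np.searchsorted(grid, x), 0, m - 1)  # nearest grid cell for each datum
    sel = np.zeros((len(x), m))
    sel[np.arange(len(x)), idx] = 1.0                  # selects g(X_i) from the grid values

    g = cp.Variable(m)                                 # g = -log f on the grid
    D2 = np.diff(np.eye(m), n=2, axis=0)               # second differences
    objective = cp.sum(sel @ g) / len(x) + dx * cp.sum(cp.exp(-g))
    cp.Problem(cp.Minimize(objective), [D2 @ g >= 0]).solve()   # g convex <=> f = e^{-g} log-concave

    f_hat = np.exp(-g.value)                           # fitted log-concave density

Replacing cp.exp(-g) by a discretized $\psi(g)$ gives the generalized version that appears two slides below.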

  24. A convex problem
     Let $g = -\log f$; let $K$ be the cone of convex functions. The original problem is transformed:
     $\sum_{i=1}^{n} g(X_i) + \int e^{-g} \to \min !$   $g \in K$
     and generalized: let $\psi$ be convex and nonincreasing (like $e^{-x}$):
     $\sum_{i=1}^{n} g(X_i) + \int e^{-g} \to \min !$   $g \in K$

  25. A convex problem
     Let $g = -\log f$; let $K$ be the cone of convex functions. The original problem is transformed:
     $\sum_{i=1}^{n} g(X_i) + \int e^{-g} \to \min !$   $g \in K$
     and generalized: let $\psi$ be convex and nonincreasing (like $e^{-x}$):
     $\sum_{i=1}^{n} g(X_i) + \int \psi(g) \to \min !$   $g \in K$

  26. Primal and dual
     Recall: $K$ is the cone of convex functions; $\psi$ is convex and nonincreasing
     The strong Fenchel dual of
     (P)   $\frac{1}{n} \sum_{i=1}^{n} g(X_i) + \int \psi(g)\,dx \to \min_{g \in K} !$
     is
     (D)   $-\int \psi^*(-f)\,dx \to \max_f !$,   $f = \dfrac{d(P_n - G)}{dx}$, $G \in K^*$
     Extremal relation: $f = -\psi'(g)$
     For penalized estimation, in a discretized setting: Koenker and Mizera (2007b)
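
A worked special case, added here for clarity (it is the connection named in the title of Koenker and Mizera, 2006): for $\psi(x) = e^{-x}$,

    \[
      \psi^*(y) = \sup_x \,(xy - e^{-x}) = -y\log(-y) + y \quad (y < 0),
      \qquad\text{so}\qquad
      -\int \psi^*(-f)\,dx = \int \big(f - f\log f\big)\,dx .
    \]

On densities ($\int f = 1$) the dual thus maximizes the Shannon entropy $-\int f \log f$ subject to $f = d(P_n - G)/dx$ with $G \in K^*$; other choices of $\psi$ lead in the same way to Rényi-type entropies, as in the talk's title.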

  27. Remarks
     $\psi^*(y) = \sup_{x \in \mathrm{dom}\,\psi} (xy - \psi(x))$ is the conjugate of $\psi$
     If primal solutions $g$ are sought in some space, then dual solutions $G$ are sought in a dual space:
     for instance, if $g \in C(X)$ and $X$ is compact, then $G \in C(X)^*$, the space of (signed) Radon measures on $X$
     The equality $f = \dfrac{d(P_n - G)}{dx}$ is thus a feasibility constraint (for other $G$, the dual objective is $-\infty$)
     $K^*$ is the dual cone to $K$ - a collection of (signed) Radon measures such that $\int g\,dG \geq 0$ for any convex $g$
     Dual: good for computation...
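
One consequence worth spelling out (my reading of the constraint, not stated on the slide): combining the feasibility equality with $G \in K^*$ gives

    \[
      \int g\,f\,dx \;=\; \int g\,dP_n - \int g\,dG \;\le\; \int g\,dP_n
      \qquad\text{for every convex } g,
    \]

that is, the dual searches over densities dominated by the empirical measure in the convex order.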

  28. Dual: good not only for computation
     Couldn't we have heavy-tailed distributions here too? ...possibly going beyond log-concavity?
     Recall: the strong Fenchel dual of
     (P)   $\frac{1}{n} \sum_{i=1}^{n} g(X_i) + \int \psi(g)\,dx \to \min_{g \in K} !$
     is
     (D)   $-\int \psi^*(-f)\,dx \to \max_f !$,   $f = \dfrac{d(P_n - G)}{dx}$, $G \in K^*$
     Extremal relation: $f = -\psi'(g)$
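
An example of how the dual viewpoint admits heavy tails (a worked illustration consistent with the talk's title and with Koenker and Mizera's quasi-concave estimation papers; the specific choice of $\psi$ here is mine): take $\psi(x) = 1/x$ for $x > 0$, which is convex and nonincreasing. Then

    \[
      \psi^*(-f) = \sup_{x>0}\,(-fx - 1/x) = -2\sqrt{f},
      \qquad
      -\int \psi^*(-f)\,dx = \int 2\sqrt{f}\,dx,
    \]

so the dual maximizes a Rényi entropy of order 1/2 (Hellinger), and the extremal relation becomes $f = -\psi'(g) = 1/g^2$: a convex, piecewise linear $g$ now produces Pareto-type tails $f(x) \sim x^{-2}$, so heavy-tailed (merely quasi-concave) estimates are no longer ruled out.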
