
Regularizing objective functionals in semi-supervised learning



1. Regularizing objective functionals in semi-supervised learning
Dejan Slepčev, Carnegie Mellon University, February 9, 2018.

2. References
S. and Thorpe, Analysis of p-Laplacian regularization in semi-supervised learning, arXiv:1707.06213.
Dunlop, S., Stuart, and Thorpe, Large-data and zero-noise limits of graph-based semi-supervised learning algorithms, in preparation.
García Trillos, Gerlach, Hein, and S., Error estimates for spectral convergence of the graph Laplacian on random geometric graphs towards the Laplace–Beltrami operator, arXiv:1801.10108.
García Trillos and S., Continuum limit of total variation on point clouds, Arch. Ration. Mech. Anal., 220 no. 1 (2016), 193-241.
García Trillos, S., J. von Brecht, T. Laurent, and X. Bresson, Consistency of Cheeger and ratio graph cuts, J. Mach. Learn. Res. 17 (2016), 1-46.
García Trillos and S., A variational approach to the consistency of spectral clustering, published online, Applied and Computational Harmonic Analysis.
García Trillos and S., On the rate of convergence of empirical measures in ∞-transportation distance, Canad. J. Math. 67 (2015), 1358-1383.

3. Semi-supervised learning
Colors denote real-valued labels. Task: assign real-valued labels to all of the data points.

4. Semi-supervised learning
A graph is used to represent the geometry of the data set.

5. Semi-supervised learning
Consider graph-based objective functions which reward the regularity of the estimator and impose agreement with preassigned labels.

6. From point clouds to graphs
Let $V = \{x_1, \ldots, x_n\}$ be a point cloud in $\mathbb{R}^d$. Connect nearby vertices $x_i$, $x_j$ with edge weights $W_{i,j}$.

7. Graph Constructions
Proximity-based graphs: $W_{i,j} = \eta(x_i - x_j)$ for a kernel $\eta$.
kNN graphs: connect each vertex with its $k$ nearest neighbors.
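To make the two constructions concrete, here is a minimal NumPy sketch; the function names, the Gaussian profile, and the hard truncation at distance $\varepsilon$ are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def proximity_weights(X, eps, eta=lambda t: np.exp(-t**2)):
    # Proximity graph: W[i, j] = eta(|x_i - x_j| / eps), truncated so that
    # only vertices within distance eps of each other are connected.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    W = eta(D / eps) * (D <= eps)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def knn_weights(X, k):
    # kNN graph: connect each vertex to its k nearest neighbors, symmetrized.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(D[i])[1:k + 1]] = 1.0  # skip self at position 0
    return np.maximum(W, W.T)
```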

8. p-Dirichlet energy
Let $V_n = \{x_1, \ldots, x_n\}$ with weight matrix $W$, $W_{ij} := \eta(|x_i - x_j|)$. The p-Dirichlet energy of $f_n : V_n \to \mathbb{R}$ is
$$E(f_n) = \frac{1}{2} \sum_{i,j} W_{ij}\, |f_n(x_i) - f_n(x_j)|^p.$$
For $p = 2$ the associated operator is the (unnormalized) graph Laplacian $L = D - W$, where $D = \mathrm{diag}(d_1, \ldots, d_n)$ and $d_i = \sum_j W_{i,j}$.
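A short sketch of the energy and of $L = D - W$ (helper names are mine):

```python
import numpy as np

def p_dirichlet_energy(W, f, p):
    # E(f) = (1/2) * sum_{i,j} W_ij |f(x_i) - f(x_j)|^p
    return 0.5 * np.sum(W * np.abs(f[:, None] - f[None, :])**p)

def graph_laplacian(W):
    # Unnormalized graph Laplacian L = D - W, D = diag of the degrees d_i.
    return np.diag(W.sum(axis=1)) - W
```

For $p = 2$ one can check the standard identity $E(f) = f^\top L f$, e.g. `np.allclose(p_dirichlet_energy(W, f, 2), f @ graph_laplacian(W) @ f)` for a random `f`.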

9. p-Laplacian semi-supervised learning
Assume we are given $k$ labeled points $(x_1, y_1), \ldots, (x_k, y_k)$ and unlabeled points $x_{k+1}, \ldots, x_n$. Question: how to label the rest of the points?

p-Laplacian SSL: minimize
$$E(f_n) = \frac{1}{2} \sum_{i,j} W_{ij}\, |f_n(x_i) - f_n(x_j)|^p$$
subject to the constraint $f(x_i) = y_i$ for $i = 1, \ldots, k$.

Zhu, Ghahramani, and Lafferty '03 introduced the approach with $p = 2$. Zhou and Schölkopf '05 consider general $p$.
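For $p = 2$ the constrained minimizer is harmonic on the unlabeled vertices, so it is the solution of one linear system. A minimal sketch, assuming the graph is connected so that the unlabeled block of $L$ is invertible (the function name is mine):

```python
import numpy as np

def harmonic_ssl(W, labeled_idx, y_labeled):
    # Zhu-Ghahramani-Lafferty '03 (p = 2): the minimizer satisfies (L f)_u = 0
    # on the unlabeled set u, which rearranges to  L_uu f_u = W_ul y_l.
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    y = np.asarray(y_labeled, dtype=float)
    u = np.setdiff1d(np.arange(n), labeled_idx)  # unlabeled indices
    L = np.diag(W.sum(axis=1)) - W
    f = np.zeros(n)
    f[labeled_idx] = y
    f[u] = np.linalg.solve(L[np.ix_(u, u)], W[np.ix_(u, labeled_idx)] @ y)
    return f
```

Run on the square example of the later slides (two labeled points among $n$ i.i.d. samples), this is exactly the estimator whose spikes are discussed below.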

10. p-Laplacian semi-supervised learning: asymptotics
p-Laplacian SSL: minimize
$$E(f_n) = \frac{1}{2} \sum_{i,j} W_{ij}\, |f_n(x_i) - f_n(x_j)|^p$$
subject to the constraint $f(x_i) = y_i$ for $i = 1, \ldots, k$.

Questions: What happens as $n \to \infty$? Do minimizers $f_n$ converge to a solution of a limiting problem? In what topology should the question be considered?

Remark: we would like to localize $\eta$ as $n \to \infty$.

11. p-Laplacian semi-supervised learning: asymptotics
p-Laplacian SSL: minimize
$$E_n(f_n) = \frac{1}{n^2 \varepsilon^p} \sum_{i,j} \eta_\varepsilon(x_i - x_j)\, |f_n(x_i) - f_n(x_j)|^p$$
subject to the constraint $f_n(x_i) = y_i$ for $i = 1, \ldots, k$, where $\eta_\varepsilon(\cdot) = \frac{1}{\varepsilon^d}\, \eta\!\left(\frac{\cdot}{\varepsilon}\right)$.

Questions: Do minimizers $f_n$ converge to a solution of the limiting problem? In what topology should the question be considered? How should $\varepsilon_n$ scale with $n$ for the convergence to hold?
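For $p > 1$ the energy is convex, so even plain projected gradient descent (labeled values held fixed) approaches the minimizer. A crude sketch under those assumptions; the step size and iteration count are arbitrary placeholders to tune per graph, and the $1/(n^2 \varepsilon^p)$ normalization is omitted since it only rescales the gradient:

```python
import numpy as np

def p_laplacian_ssl(W, labeled_idx, y_labeled, p=4.0, lr=1e-2, iters=5000):
    # Projected gradient descent on E(f) = (1/2) sum_{i,j} W_ij |f_i - f_j|^p,
    # keeping f fixed at the labels. For symmetric W,
    #   dE/df_i = p * sum_j W_ij |f_i - f_j|^{p-1} sign(f_i - f_j).
    n = W.shape[0]
    f = np.zeros(n)
    f[labeled_idx] = y_labeled
    free = np.setdiff1d(np.arange(n), labeled_idx)
    for _ in range(iters):
        diff = f[:, None] - f[None, :]
        grad = p * np.sum(W * np.abs(diff)**(p - 1) * np.sign(diff), axis=1)
        f[free] -= lr * grad[free]  # labeled entries stay pinned
    return f
```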

12. Ground Truth Assumption
We assume the points $x_1, x_2, \ldots$ are drawn i.i.d. from the measure $d\nu = \rho\, dx$. We also assume $\rho$ is supported on a Lipschitz domain $\Omega$ and is bounded above and below by positive constants.

13. Ground Truth Assumption: Manifold version
Assume the points $x_1, x_2, \ldots$ are drawn i.i.d. from the measure $d\nu = \rho \, d\mathrm{Vol}_M$, where $M$ is a compact manifold without boundary and $0 < \rho < C$ is continuous.

[Figure: a sample from a parametrized surface embedded in $\mathbb{R}^3$.]

14. Harmonic semi-supervised learning
Nadler, Srebro, and Zhou '09 observed that for $p = 2$ the minimizers are spiky as $n \to \infty$. [Also see Wahba '90.]

[Figure: graph of the minimizer for $p = 2$, $n = 1280$, i.i.d. data on the square; training points $(0.5, 0.2)$ with label 0 and $(0.5, 0.8)$ with label 1.]

15. p-Laplacian semi-supervised learning
El Alaoui, Cheng, Ramdas, Wainwright, and Jordan '16 show that spikes can occur for all $p \leq d$ and propose using $p > d$.

Heuristics:
$$E_n^{(p)}(f) = \frac{1}{n^2 \varepsilon^p} \sum_{i,j=1}^{n} \eta_\varepsilon(x_i - x_j)\, |f(x_i) - f(x_j)|^p \;\overset{n \to \infty}{\approx}\; \iint \eta_\varepsilon(x - y) \left|\frac{f(x) - f(y)}{\varepsilon}\right|^p \rho(x)\, \rho(y)\, dx\, dy \;\overset{\varepsilon \to 0}{\approx}\; \sigma_\eta \int |\nabla f(x)|^p\, \rho(x)^2\, dx$$

The Sobolev space $W^{1,p}(\Omega)$ embeds into continuous functions iff $p > d$.
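The $\varepsilon \to 0$ step is the standard nonlocal-to-local computation; here is a sketch, assuming $f$ smooth and $\eta$ radial. Substituting $y = x + \varepsilon z$, so that $\eta_\varepsilon(x - y)\, dy = \eta(z)\, dz$ and $\frac{f(x) - f(y)}{\varepsilon} = -\nabla f(x) \cdot z + o(1)$,
$$\int \eta_\varepsilon(x - y)\, \left|\frac{f(x) - f(y)}{\varepsilon}\right|^p \rho(y)\, dy \;\longrightarrow\; \rho(x) \int \eta(z)\, |\nabla f(x) \cdot z|^p\, dz \;=\; \sigma_\eta\, |\nabla f(x)|^p\, \rho(x),$$
where $\sigma_\eta = \int \eta(z)\, |z_1|^p\, dz$ by rotational invariance. Integrating against $\rho(x)\, dx$ gives the $\varepsilon \to 0$ limit above.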

16. Continuum p-Laplacian semi-supervised learning
Let $\mu$ be the measure with density $\rho$, positive on $\Omega$.

Continuum p-Laplacian SSL: minimize
$$E_\infty(f) = \int_\Omega |\nabla f(x)|^p\, \rho(x)^2 \, dx$$
subject to the constraints $f(x_i) = y_i$ for all $i = 1, \ldots, k$.

The functional is convex. The problem has a unique minimizer iff $p > d$. The minimizer lies in $W^{1,p}(\Omega)$.

17. p-Laplacian semi-supervised learning
Here: $d = 1$ and $p = 1.5$. For $\varepsilon > 0.02$ the minimizers lack the expected regularity.

[Figure: (a) the error $\mathrm{err}_n^{(1.5)}(f_n)$ and the percentage of connected graphs as functions of $\varepsilon$, for $p = 1.5$ and $d = 1$; (b) minimizers for $\varepsilon = 0.023$, $n = 1280$, ten realizations. Labeled points are $(0, 0)$ and $(1, 1)$.]

18. p-Laplacian semi-supervised learning
Theorem (Thorpe and S. '17). Let $p > 1$. Let $f_n$ be a sequence of minimizers of $E_n^{(p)}$ satisfying the constraints, and let $f$ be a minimizer of $E_\infty^{(p)}$ satisfying the constraints.

(i) If $d \geq 3$ and $\left(\frac{1}{n}\right)^{1/p} \gg \varepsilon_n \gg \left(\frac{\log n}{n}\right)^{1/d}$, then $p > d$, $f$ is continuous, and $f_n$ converges locally uniformly to $f$, meaning that for any $\Omega' \subset\subset \Omega$
$$\lim_{n \to \infty} \; \max_{\{k \leq n \,:\, x_k \in \Omega'\}} |f(x_k) - f_n(x_k)| = 0.$$

(ii) If $1 \gg \varepsilon_n \gg \left(\frac{1}{n}\right)^{1/p}$, then there exists a sequence of real numbers $c_n$ such that $f_n - c_n$ converges to zero locally uniformly.

Note that in case (ii) all information about the labels is lost in the limit; the discrete minimizers exhibit spikes.

19. p-Laplacian semi-supervised learning
[Figure: (a) discrete minimizer, (b) continuum minimizer.]

Minimizer for $p = 4$, $n = 1280$, $\varepsilon = 0.058$, i.i.d. data on the square, with training points $(0.2, 0.5)$ and $(0.8, 0.5)$ and labels 0 and 1, respectively.

20. p-Laplacian semi-supervised learning
[Figure: minimizers for (a) $\varepsilon = 0.058$, (b) $\varepsilon = 0.09$, (c) $\varepsilon = 0.2$.] Here $p = 4$, which in 2D is in the well-posed regime.

21. Improved p-Laplacian semi-supervised learning
Let $p > d$, with labeled points $\{(x_i, y_i) : i = 1, \ldots, k\}$.

p-Laplacian SSL: minimize
$$E_n(f_n) = \frac{1}{n^2 \varepsilon^p} \sum_{i,j} \eta_\varepsilon(x_i - x_j)\, |f_n(x_i) - f_n(x_j)|^p$$
subject to the constraint $f_n(x_m) = y_i$ whenever $|x_m - x_i| < 2\varepsilon$, for all $i = 1, \ldots, k$, where $\eta_\varepsilon(\cdot) = \frac{1}{\varepsilon^d}\, \eta\!\left(\frac{\cdot}{\varepsilon}\right)$.
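The only change from the original model is that each label is imposed on the whole $2\varepsilon$-ball around its labeled point. A minimal sketch of this constraint dilation, reusable with either solver sketched earlier (the helper name is mine):

```python
import numpy as np

def dilate_labels(X, labeled_idx, y_labeled, eps):
    # Improved constraint: pin f to y_i on every sample x_m with
    # |x_m - x_i| < 2*eps, not just at x_i itself.
    idx, vals = [], []
    for i, yi in zip(labeled_idx, y_labeled):
        near = np.where(np.linalg.norm(X - X[i], axis=1) < 2 * eps)[0]
        idx.append(near)
        vals.append(np.full(len(near), float(yi)))
    return np.concatenate(idx), np.concatenate(vals)
```

If two balls overlapped with different labels the constraints would conflict; in the regime of interest $\varepsilon \to 0$ with the labeled points fixed, so the balls are eventually disjoint.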

22. Asymptotics of improved p-Laplacian SSL
Theorem (Thorpe and S. '17). Let $p > d$. Let $f_n$ be a sequence of minimizers of the improved p-Laplacian SSL on an $n$-point sample, and let $f$ be a minimizer of $E_\infty^{(p)}$ satisfying the constraints. Since $p > d$ we know $f$ is continuous.

If $d \geq 3$ and $1 \gg \varepsilon_n \gg \left(\frac{\log n}{n}\right)^{1/d}$, then $f_n$ converges locally uniformly to $f$, meaning that for any $\Omega' \subset\subset \Omega$
$$\lim_{n \to \infty} \; \max_{\{k \leq n \,:\, x_k \in \Omega'\}} |f(x_k) - f_n(x_k)| = 0.$$

23. Comparing the original and improved model
Here: $d = 1$, $p = 2$, and $n = 1280$. Labeled points are $(0, 0)$ and $(1, 1)$.

[Figure: the error $\mathrm{err}_n^{(2)}(f_n)$ and the percentage of connected graphs as functions of $\varepsilon$ for (a) the original model and (b) the improved model.]

Note that the axes on the error plots for the two models are not the same.

24. Techniques
The general approach was developed with García Trillos (ARMA '16).

Γ-convergence: a notion and set of techniques from the calculus of variations for studying asymptotics of functionals (here, from random discrete to continuum).

$TL^p$ space: a topology based on optimal transportation which allows one to compare functions defined on different spaces (here $f_n \in L^p(\mu_n)$ and $f \in L^p(\mu)$).

We also need nonlocal operators and their asymptotics. In SSL, for the constraint to be satisfied we need uniform convergence; this in turn requires discrete regularity and finer compactness results.
