
Regularizing objective functionals in semi-supervised learning



1. Regularizing objective functionals in semi-supervised learning
Dejan Slepčev, Carnegie Mellon University, February 9, 2018.

2. References
S. and Thorpe, Analysis of p-Laplacian regularization in semi-supervised learning, arXiv:1707.06213.
Dunlop, S., Stuart, and Thorpe, Large-data and zero-noise limits of graph-based semi-supervised learning algorithms, in preparation.
García Trillos, Gerlach, Hein, and S., Error estimates for spectral convergence of the graph Laplacian on random geometric graphs towards the Laplace–Beltrami operator, arXiv:1801.10108.
García Trillos and S., Continuum limit of total variation on point clouds, Arch. Ration. Mech. Anal., 220 no. 1 (2016), 193-241.
García Trillos, S., J. von Brecht, T. Laurent, and X. Bresson, Consistency of Cheeger and ratio graph cuts, J. Mach. Learn. Res. 17 (2016), 1-46.
García Trillos and S., A variational approach to the consistency of spectral clustering, published online, Applied and Computational Harmonic Analysis.
García Trillos and S., On the rate of convergence of empirical measures in ∞-transportation distance, Canad. J. Math. 67 (2015), 1358-1383.

3. Semi-supervised learning
Colors denote real-valued labels. Task: assign real-valued labels to all of the data points.

4. Semi-supervised learning
A graph is used to represent the geometry of the data set.

5. Semi-supervised learning
Consider graph-based objective functions which reward the regularity of the estimator and impose agreement with preassigned labels.

6. From point clouds to graphs
Let $V = \{x_1, \ldots, x_n\}$ be a point cloud in $\mathbb{R}^d$. Connect nearby vertices $x_i$, $x_j$ with edge weights $W_{i,j}$.

7. Graph Constructions
Proximity-based graphs: $W_{i,j} = \eta(x_i - x_j)$ for a kernel $\eta$.
kNN graphs: connect each vertex with its $k$ nearest neighbors.
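To make the two constructions concrete, here is a minimal NumPy sketch; the function names, the Gaussian profile, and the hard truncation at distance $\varepsilon$ are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def proximity_weights(X, eps, eta=lambda t: np.exp(-t**2)):
    # Proximity graph: W[i, j] = eta(|x_i - x_j| / eps), truncated so that
    # only vertices within distance eps of each other are connected.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    W = eta(D / eps) * (D <= eps)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def knn_weights(X, k):
    # kNN graph: connect each vertex to its k nearest neighbors, symmetrized.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(D[i])[1:k + 1]] = 1.0  # skip self at position 0
    return np.maximum(W, W.T)
```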

8. p-Dirichlet energy
Let $V_n = \{x_1, \ldots, x_n\}$ with weight matrix $W$, $W_{ij} := \eta(|x_i - x_j|)$. The p-Dirichlet energy of $f_n : V_n \to \mathbb{R}$ is
$$E(f_n) = \frac{1}{2} \sum_{i,j} W_{ij}\, |f_n(x_i) - f_n(x_j)|^p.$$
For $p = 2$ the associated operator is the (unnormalized) graph Laplacian $L = D - W$, where $D = \mathrm{diag}(d_1, \ldots, d_n)$ and $d_i = \sum_j W_{i,j}$.
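A short sketch of the energy and of $L = D - W$ (helper names are mine):

```python
import numpy as np

def p_dirichlet_energy(W, f, p):
    # E(f) = (1/2) * sum_{i,j} W_ij |f(x_i) - f(x_j)|^p
    return 0.5 * np.sum(W * np.abs(f[:, None] - f[None, :])**p)

def graph_laplacian(W):
    # Unnormalized graph Laplacian L = D - W, D = diag of the degrees d_i.
    return np.diag(W.sum(axis=1)) - W
```

For $p = 2$ one can check the standard identity $E(f) = f^\top L f$, e.g. `np.allclose(p_dirichlet_energy(W, f, 2), f @ graph_laplacian(W) @ f)` for a random `f`.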

9. p-Laplacian semi-supervised learning
Assume we are given $k$ labeled points $(x_1, y_1), \ldots, (x_k, y_k)$ and unlabeled points $x_{k+1}, \ldots, x_n$. Question: how to label the rest of the points?

p-Laplacian SSL: minimize
$$E(f_n) = \frac{1}{2} \sum_{i,j} W_{ij}\, |f_n(x_i) - f_n(x_j)|^p$$
subject to the constraint $f(x_i) = y_i$ for $i = 1, \ldots, k$.

Zhu, Ghahramani, and Lafferty '03 introduced the approach with $p = 2$. Zhou and Schölkopf '05 consider general $p$.
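For $p = 2$ the constrained minimizer is harmonic on the unlabeled vertices, so it is the solution of one linear system. A minimal sketch, assuming the graph is connected so that the unlabeled block of $L$ is invertible (the function name is mine):

```python
import numpy as np

def harmonic_ssl(W, labeled_idx, y_labeled):
    # Zhu-Ghahramani-Lafferty '03 (p = 2): the minimizer satisfies (L f)_u = 0
    # on the unlabeled set u, which rearranges to  L_uu f_u = W_ul y_l.
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    y = np.asarray(y_labeled, dtype=float)
    u = np.setdiff1d(np.arange(n), labeled_idx)  # unlabeled indices
    L = np.diag(W.sum(axis=1)) - W
    f = np.zeros(n)
    f[labeled_idx] = y
    f[u] = np.linalg.solve(L[np.ix_(u, u)], W[np.ix_(u, labeled_idx)] @ y)
    return f
```

Run on the square example of the later slides (two labeled points among $n$ i.i.d. samples), this is exactly the estimator whose spikes are discussed below.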

10. p-Laplacian semi-supervised learning: asymptotics
p-Laplacian SSL: minimize
$$E(f_n) = \frac{1}{2} \sum_{i,j} W_{ij}\, |f_n(x_i) - f_n(x_j)|^p$$
subject to the constraint $f(x_i) = y_i$ for $i = 1, \ldots, k$.

Questions: What happens as $n \to \infty$? Do minimizers $f_n$ converge to a solution of a limiting problem? In what topology should the question be considered?

Remark: we would like to localize $\eta$ as $n \to \infty$.

11. p-Laplacian semi-supervised learning: asymptotics
p-Laplacian SSL: minimize
$$E_n(f_n) = \frac{1}{n^2 \varepsilon^p} \sum_{i,j} \eta_\varepsilon(x_i - x_j)\, |f_n(x_i) - f_n(x_j)|^p$$
subject to the constraint $f_n(x_i) = y_i$ for $i = 1, \ldots, k$, where $\eta_\varepsilon(\cdot) = \frac{1}{\varepsilon^d}\, \eta\!\left(\frac{\cdot}{\varepsilon}\right)$.

Questions: Do minimizers $f_n$ converge to a solution of the limiting problem? In what topology should the question be considered? How should $\varepsilon_n$ scale with $n$ for the convergence to hold?
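For $p > 1$ the energy is convex, so even plain projected gradient descent (labeled values held fixed) approaches the minimizer. A crude sketch under those assumptions; the step size and iteration count are arbitrary placeholders to tune per graph, and the $1/(n^2 \varepsilon^p)$ normalization is omitted since it only rescales the gradient:

```python
import numpy as np

def p_laplacian_ssl(W, labeled_idx, y_labeled, p=4.0, lr=1e-2, iters=5000):
    # Projected gradient descent on E(f) = (1/2) sum_{i,j} W_ij |f_i - f_j|^p,
    # keeping f fixed at the labels. For symmetric W,
    #   dE/df_i = p * sum_j W_ij |f_i - f_j|^{p-1} sign(f_i - f_j).
    n = W.shape[0]
    f = np.zeros(n)
    f[labeled_idx] = y_labeled
    free = np.setdiff1d(np.arange(n), labeled_idx)
    for _ in range(iters):
        diff = f[:, None] - f[None, :]
        grad = p * np.sum(W * np.abs(diff)**(p - 1) * np.sign(diff), axis=1)
        f[free] -= lr * grad[free]  # labeled entries stay pinned
    return f
```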

12. Ground Truth Assumption
We assume the points $x_1, x_2, \ldots$ are drawn i.i.d. from the measure $d\nu = \rho\, dx$. We also assume $\rho$ is supported on a Lipschitz domain $\Omega$ and is bounded above and below by positive constants.

13. Ground Truth Assumption: Manifold version
Assume the points $x_1, x_2, \ldots$ are drawn i.i.d. from the measure $d\nu = \rho \, d\mathrm{Vol}_M$, where $M$ is a compact manifold without boundary and $0 < \rho < C$ is continuous.

[Figure: a sample from a parametrized surface embedded in $\mathbb{R}^3$.]

14. Harmonic semi-supervised learning
Nadler, Srebro, and Zhou '09 observed that for $p = 2$ the minimizers are spiky as $n \to \infty$. [Also see Wahba '90.]

[Figure: graph of the minimizer for $p = 2$, $n = 1280$, i.i.d. data on the square; training points $(0.5, 0.2)$ with label 0 and $(0.5, 0.8)$ with label 1.]

15. p-Laplacian semi-supervised learning
El Alaoui, Cheng, Ramdas, Wainwright, and Jordan '16 show that spikes can occur for all $p \leq d$ and propose using $p > d$.

Heuristics:
$$E_n^{(p)}(f) = \frac{1}{n^2 \varepsilon^p} \sum_{i,j=1}^{n} \eta_\varepsilon(x_i - x_j)\, |f(x_i) - f(x_j)|^p \;\overset{n \to \infty}{\approx}\; \iint \eta_\varepsilon(x - y) \left|\frac{f(x) - f(y)}{\varepsilon}\right|^p \rho(x)\, \rho(y)\, dx\, dy \;\overset{\varepsilon \to 0}{\approx}\; \sigma_\eta \int |\nabla f(x)|^p\, \rho(x)^2\, dx$$

The Sobolev space $W^{1,p}(\Omega)$ embeds into continuous functions iff $p > d$.
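The $\varepsilon \to 0$ step is the standard nonlocal-to-local computation; here is a sketch, assuming $f$ smooth and $\eta$ radial. Substituting $y = x + \varepsilon z$, so that $\eta_\varepsilon(x - y)\, dy = \eta(z)\, dz$ and $\frac{f(x) - f(y)}{\varepsilon} = -\nabla f(x) \cdot z + o(1)$,
$$\int \eta_\varepsilon(x - y)\, \left|\frac{f(x) - f(y)}{\varepsilon}\right|^p \rho(y)\, dy \;\longrightarrow\; \rho(x) \int \eta(z)\, |\nabla f(x) \cdot z|^p\, dz \;=\; \sigma_\eta\, |\nabla f(x)|^p\, \rho(x),$$
where $\sigma_\eta = \int \eta(z)\, |z_1|^p\, dz$ by rotational invariance. Integrating against $\rho(x)\, dx$ gives the $\varepsilon \to 0$ limit above.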

16. Continuum p-Laplacian semi-supervised learning
Let $\mu$ be the measure with density $\rho$, positive on $\Omega$.

Continuum p-Laplacian SSL: minimize
$$E_\infty(f) = \int_\Omega |\nabla f(x)|^p\, \rho(x)^2 \, dx$$
subject to the constraints $f(x_i) = y_i$ for all $i = 1, \ldots, k$.

The functional is convex. The problem has a unique minimizer iff $p > d$. The minimizer lies in $W^{1,p}(\Omega)$.

17. p-Laplacian semi-supervised learning
Here: $d = 1$ and $p = 1.5$. For $\varepsilon > 0.02$ the minimizers lack the expected regularity.

[Figure: (a) the error $\mathrm{err}_n^{(1.5)}(f_n)$ and the percentage of connected graphs as functions of $\varepsilon$, for $p = 1.5$ and $d = 1$; (b) minimizers for $\varepsilon = 0.023$, $n = 1280$, ten realizations. Labeled points are $(0, 0)$ and $(1, 1)$.]

18. p-Laplacian semi-supervised learning
Theorem (Thorpe and S. '17). Let $p > 1$. Let $f_n$ be a sequence of minimizers of $E_n^{(p)}$ satisfying the constraints, and let $f$ be a minimizer of $E_\infty^{(p)}$ satisfying the constraints.

(i) If $d \geq 3$ and $\left(\frac{1}{n}\right)^{1/p} \gg \varepsilon_n \gg \left(\frac{\log n}{n}\right)^{1/d}$, then $p > d$, $f$ is continuous, and $f_n$ converges locally uniformly to $f$, meaning that for any $\Omega' \subset\subset \Omega$
$$\lim_{n \to \infty} \; \max_{\{k \leq n \,:\, x_k \in \Omega'\}} |f(x_k) - f_n(x_k)| = 0.$$

(ii) If $1 \gg \varepsilon_n \gg \left(\frac{1}{n}\right)^{1/p}$, then there exists a sequence of real numbers $c_n$ such that $f_n - c_n$ converges to zero locally uniformly.

Note that in case (ii) all information about the labels is lost in the limit; the discrete minimizers exhibit spikes.

19. p-Laplacian semi-supervised learning
[Figure: (a) discrete minimizer, (b) continuum minimizer.]

Minimizer for $p = 4$, $n = 1280$, $\varepsilon = 0.058$, i.i.d. data on the square, with training points $(0.2, 0.5)$ and $(0.8, 0.5)$ and labels 0 and 1, respectively.

20. p-Laplacian semi-supervised learning
[Figure: minimizers for (a) $\varepsilon = 0.058$, (b) $\varepsilon = 0.09$, (c) $\varepsilon = 0.2$.] Here $p = 4$, which in 2D is in the well-posed regime.

21. Improved p-Laplacian semi-supervised learning
Let $p > d$, with labeled points $\{(x_i, y_i) : i = 1, \ldots, k\}$.

p-Laplacian SSL: minimize
$$E_n(f_n) = \frac{1}{n^2 \varepsilon^p} \sum_{i,j} \eta_\varepsilon(x_i - x_j)\, |f_n(x_i) - f_n(x_j)|^p$$
subject to the constraint $f_n(x_m) = y_i$ whenever $|x_m - x_i| < 2\varepsilon$, for all $i = 1, \ldots, k$, where $\eta_\varepsilon(\cdot) = \frac{1}{\varepsilon^d}\, \eta\!\left(\frac{\cdot}{\varepsilon}\right)$.
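The only change from the original model is that each label is imposed on the whole $2\varepsilon$-ball around its labeled point. A minimal sketch of this constraint dilation, reusable with either solver sketched earlier (the helper name is mine):

```python
import numpy as np

def dilate_labels(X, labeled_idx, y_labeled, eps):
    # Improved constraint: pin f to y_i on every sample x_m with
    # |x_m - x_i| < 2*eps, not just at x_i itself.
    idx, vals = [], []
    for i, yi in zip(labeled_idx, y_labeled):
        near = np.where(np.linalg.norm(X - X[i], axis=1) < 2 * eps)[0]
        idx.append(near)
        vals.append(np.full(len(near), float(yi)))
    return np.concatenate(idx), np.concatenate(vals)
```

If two balls overlapped with different labels the constraints would conflict; in the regime of interest $\varepsilon \to 0$ with the labeled points fixed, so the balls are eventually disjoint.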

22. Asymptotics of improved p-Laplacian SSL
Theorem (Thorpe and S. '17). Let $p > d$. Let $f_n$ be a sequence of minimizers of the improved p-Laplacian SSL on an $n$-point sample, and let $f$ be a minimizer of $E_\infty^{(p)}$ satisfying the constraints. Since $p > d$ we know $f$ is continuous.

If $d \geq 3$ and $1 \gg \varepsilon_n \gg \left(\frac{\log n}{n}\right)^{1/d}$, then $f_n$ converges locally uniformly to $f$, meaning that for any $\Omega' \subset\subset \Omega$
$$\lim_{n \to \infty} \; \max_{\{k \leq n \,:\, x_k \in \Omega'\}} |f(x_k) - f_n(x_k)| = 0.$$

23. Comparing the original and improved model
Here: $d = 1$, $p = 2$, and $n = 1280$. Labeled points are $(0, 0)$ and $(1, 1)$.

[Figure: the error $\mathrm{err}_n^{(2)}(f_n)$ and the percentage of connected graphs as functions of $\varepsilon$ for (a) the original model and (b) the improved model.]

Note that the axes on the error plots for the two models are not the same.

24. Techniques
The general approach was developed with García Trillos (ARMA '16).

Γ-convergence: a notion and set of techniques from the calculus of variations for studying asymptotics of functionals (here, from random discrete to continuum).

$TL^p$ space: a topology based on optimal transportation which allows one to compare functions defined on different spaces (here $f_n \in L^p(\mu_n)$ and $f \in L^p(\mu)$).

We also need nonlocal operators and their asymptotics. In SSL, for the constraint to be satisfied we need uniform convergence; this in turn requires discrete regularity and finer compactness results.
