Large scale graph learning from smooth signals
Vassilis Kalofolias, Nathanael Perraudin
13 November 2019
Graph learning
Given a data matrix X whose rows are the objects x_i, x_j, ..., learn the weighted adjacency matrix W of a graph G.
Dimensionality - manifolds
Interesting problems: (# nodes) ≫ (# features). Example: MNIST with n = 60K nodes and m = 784 features.
No structure? X ⇒ full W: ✘ ill-posed.
Low-dimensional manifold? ⇒ local dependencies: ✔ sparse W.
[Figure: MNIST digits "1" lying on a low-dimensional manifold parametrized by thickness and angle]
Smoothness
Data is smooth on the graph
Data lives on a low-dimensional manifold
Dirichlet energy is small:
∥∇_G X∥²_F = tr(X⊤LX) = (1/2) Σ_{i,j} W_{i,j} ∥x_i − x_j∥² = (1/2) ∥W ∘ Z∥₁,₁
with Z_{ij} = ∥x_i − x_j∥² and L = ∇_G⊤∇_G = D − W.
In ∥W ∘ Z∥₁,₁ the graph (W) multiplies the data distances (Z): a small value means few edges (sparsity) placed between close points (smoothness).
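To make the identity above concrete, here is a small numeric check, an illustrative sketch assuming the standard combinatorial Laplacian L = D − W (not code from the talk):

import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3
X = rng.standard_normal((n, m))                      # rows: objects x_i
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0)                               # symmetric, nonnegative, zero diagonal

L = np.diag(W.sum(axis=1)) - W                       # L = D - W
Z = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # Z_ij = ||x_i - x_j||^2

lhs = np.trace(X.T @ L @ X)                          # tr(X^T L X)
rhs = 0.5 * np.sum(W * Z)                            # (1/2) ||W o Z||_{1,1}
assert np.allclose(lhs, rhs)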
Graphs from smooth signals
min ∥LX∥²_F   s.t. W1 ≥ 1                              [Daitch et al. 2009]
min ∥LX∥²_F   s.t. 1⊤ max(0, 1 − W1)² ≤ αn             [Daitch et al. 2009]
min tr(X⊤LX) − log|L + αI| + β∥W∥₁,₁                   [Lake & Tenenbaum 2010]
min tr(X⊤LX) + α∥L∥²_F   s.t. tr(L) = n                [Hu et al. 2015, Dong et al. 2016]
min tr(X⊤LX) − α1⊤ log(W1) + (β/2)∥W∥²_F               [Kalofolias 2016]
The log-degrees model
min_{W ∈ 𝒲_m}  ∥W ∘ Z∥₁,₁ − α1⊤ log(W1) + (β/2)∥W∥²_F
(𝒲_m is the set of valid adjacency matrices: symmetric, nonnegative, zero diagonal)
• First algorithm: cost O(n²)
• Best results among "scalable" models
Goal: scale it further!
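As a minimal illustration of why the dense model costs O(n²), here is a numpy sketch that evaluates the log-degrees objective on a full weight matrix (an assumption for illustration, not the authors' solver); every term touches all n² candidate edges:

import numpy as np

def log_degrees_objective(W, Z, alpha, beta):
    # W: n x n nonnegative symmetric weights with zero diagonal
    # Z: n x n matrix of pairwise squared distances
    degrees = W.sum(axis=1)                          # W 1
    smoothness = np.sum(W * Z)                       # ||W o Z||_{1,1}
    log_barrier = -alpha * np.sum(np.log(degrees))   # -alpha 1^T log(W 1)
    frobenius = 0.5 * beta * np.sum(W ** 2)          # (beta/2) ||W||_F^2
    return smoothness + log_barrier + frobenius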
How to scale it?
1. Reduce the number of variables from O(n²)
2. Eliminate grid search: automatic parameter selection
Optimization
The objective can be split into 3 functions:
min_{W ∈ 𝒲_m}  ∥W ∘ Z∥₁,₁ − α1⊤ log(W1) + (β/2)∥W∥²_F
Sketch of algorithm: approximately minimize each function in turn.
1. Shrink edges according to distance: O(n²)
2. Enhance edges of badly connected nodes: O(n²)
3. Shrink large edges: O(n²)
Optimization
Restricting the problem to a set of allowed edges E_allowed with mask M, the same split applies:
min_{W ∈ 𝒲_m}  ∥M ∘ W ∘ Z∥₁,₁ − α1⊤ log((M ∘ W)1) + (β/2)∥M ∘ W∥²_F
1. Shrink edges according to distance: O(|E_allowed|)
2. Enhance edges of badly connected nodes: O(|E_allowed|)
3. Shrink large edges: O(|E_allowed|)
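Why every step now costs O(|E_allowed|): only the allowed edges need to be stored, as flat arrays rather than a dense n x n matrix. A minimal sketch under that assumption (names are illustrative, each undirected edge is stored once per direction):

import numpy as np

def objective_on_edges(w, z, rows, n, alpha, beta):
    # w, z: arrays of length |E_allowed| (weight and squared distance per stored edge)
    # rows: first endpoint of each stored edge
    degrees = np.bincount(rows, weights=w, minlength=n)   # d = (M o W) 1, O(|E|)
    smoothness = np.dot(w, z)                             # ||M o W o Z||_{1,1}, O(|E|)
    log_barrier = -alpha * np.sum(np.log(degrees))        # -alpha 1^T log(d)
    frobenius = 0.5 * beta * np.dot(w, w)                 # (beta/2) ||M o W||_F^2
    return smoothness + log_barrier + frobenius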
Reducing the allowed edge set
How do we choose a restricted edge set?
• Prior: structure imposed by the application, e.g. geometric constraints
What if no structure is known?
• Approximate Nearest Neighbours (ANN)
Using ANN to reduce cost
"I want a graph with k = 10 edges per node on average."
1. Compute an approximate 3k = 30 NN graph (binary): |E_allowed| ≈ 3kn = 30n. Cost: O(n log(n) m)
2. Learn weights only for the allowed edges. Cost: O(nk)
3. Some edges are deleted (W_ij = 0): the final graph is approximately a 10-NN graph.
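A hedged sketch of this pipeline for k = 10. Here scikit-learn's exact k-NN stands in for an ANN library, and learn_weights_on_edges is a hypothetical placeholder for the edge-restricted solver sketched earlier; neither is the authors' implementation:

import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_edge_candidates(X, k=10):
    # Build the allowed edge set: roughly 3k candidate neighbours per node.
    n = X.shape[0]
    nn = NearestNeighbors(n_neighbors=3 * k + 1).fit(X)
    dist, idx = nn.kneighbors(X)                 # first neighbour is the point itself
    rows = np.repeat(np.arange(n), 3 * k)
    cols = idx[:, 1:].ravel()                    # drop the self-neighbour
    z = dist[:, 1:].ravel() ** 2                 # Z_ij = ||x_i - x_j||^2
    return rows, cols, z                         # |E_allowed| ~ 3kn

# rows, cols, z = knn_edge_candidates(X, k=10)
# w = learn_weights_on_edges(rows, cols, z, alpha, beta)   # hypothetical solver call
# keep = w > 0                                             # many weights end up exactly 0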
How to scale it?
1. Reduce the number of variables from O(n²)
2. Eliminate grid search: automatic parameter selection