
Learning graphs from data: A signal processing perspective
Xiaowen Dong, MIT Media Lab
Graph Signal Processing Workshop, Pittsburgh, PA, May 2017

Introduction: What is the problem of graph learning?


  1. A (partial) historical overview. Timeline: covariance selection (Dempster, 1972); ℓ1-regularized neighborhood regression (Meinshausen & Buhlmann, 2006); ℓ1-regularized log-determinant (Banerjee, 2008; Friedman, 2008); quadratic approximation of the Gaussian negative log-likelihood (Hsieh, 2011). Estimation of a sparse precision matrix: the graphical LASSO maximizes the penalized log-likelihood of the precision matrix Θ given the sample covariance S:
     max_Θ  log det Θ − tr(SΘ) − ρ ||Θ||_1
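  As a point of reference, here is a minimal sketch of sparse precision estimation with scikit-learn's GraphicalLasso; the toy data and the regularization value are illustrative, and alpha plays the role of ρ in the objective above.

  ```python
  import numpy as np
  from sklearn.covariance import GraphicalLasso

  rng = np.random.default_rng(0)
  X = rng.standard_normal((200, 10))        # 200 observations of a 10-variable signal (toy data)

  model = GraphicalLasso(alpha=0.1).fit(X)  # alpha corresponds to rho in the slide's objective
  Theta = model.precision_                  # sparse estimate of the precision matrix
  print(np.count_nonzero(np.abs(Theta) > 1e-6))
  ```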

  2-4. A (partial) historical overview. The timeline adds ℓ1-regularized logistic regression (Ravikumar, 2010). Neighborhood learning for discrete variables: the neighbors of node v1 are estimated by regressing its observations X_1^m on the observations X_\1^m at all other nodes with an ℓ1-penalized logistic regression:
     max_{β_1}  log P_{β_1}(X_1^m | X_\1^m) − λ ||β_1||_1
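  A hedged sketch of this neighborhood-selection idea for binary variables, using scikit-learn's ℓ1-penalized logistic regression of one node on all the others; the toy data and regularization strength are illustrative.

  ```python
  import numpy as np
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(0)
  X = (rng.random((500, 5)) > 0.5).astype(int)   # toy binary observations at 5 nodes

  # Regress node v1 (column 0) on the remaining nodes with an l1 penalty;
  # nonzero coefficients indicate the estimated neighbors of v1.
  clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
  clf.fit(np.delete(X, 0, axis=1), X[:, 0])
  neighbors = np.flatnonzero(np.abs(clf.coef_[0]) > 1e-6)
  ```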

  5-8. A (partial) historical overview
     • Simple and intuitive methods: sample correlation; similarity functions (e.g., Gaussian RBF).
     • Learning graphical models: classical learning approaches lead to both positive and negative relations; what about learning a graph topology with non-negative weights?
     • Learning topologies with non-negative weights: M-matrices (symmetric, positive definite, non-positive off-diagonal) have been used as precision matrices, leading to attractive GMRFs (Slawski and Hein, 2015). The combinatorial graph Laplacian L = Deg − W belongs to the M-matrices and is equivalent to the graph topology. From an arbitrary precision matrix to a graph Laplacian!
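  For concreteness, a small NumPy sketch (function name illustrative) that builds L = Deg − W from a symmetric non-negative weight matrix and checks the M-matrix properties mentioned above:

  ```python
  import numpy as np

  def combinatorial_laplacian(W):
      """L = Deg - W for a symmetric, non-negative weight matrix W (no self-loops)."""
      return np.diag(W.sum(axis=1)) - W

  W = np.array([[0., 1., 0.],
                [1., 0., 2.],
                [0., 2., 0.]])
  L = combinatorial_laplacian(W)
  assert np.all(L - np.diag(np.diag(L)) <= 0)      # non-positive off-diagonals
  assert np.all(np.linalg.eigvalsh(L) >= -1e-10)   # positive semidefinite (and singular)
  ```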

  9-12. A (partial) historical overview. The graph Laplacian L can serve as the precision matrix, BUT it is singular. Lake (2010) therefore applies the ℓ1-regularized log-determinant objective to a generalized Laplacian:
     max_Θ  log det Θ − tr(SΘ) − ρ ||Θ||_1   s.t.   Θ = L + (1/σ²) I
     (The slides contrast the precision matrix learned by the graphical LASSO with the Laplacian learned by Lake et al.; see also Slawski and Hein, 2015.)

  13. A (partial) historical overview. The timeline extends with methods based on quadratic forms of (powers of) L: Daitch (2009) minimizes ||LX||_F^2 = tr(X^T L^2 X), related to locally linear embedding [Roweis00]; Lake (2010) uses the ℓ1-regularized log-determinant on a generalized L; Hu (2013) minimizes tr(X^T L^s X) − β ||W||_F. See also Slawski and Hein (2015).

  14-15. A (partial) historical overview. Consider a graph signal x defined on the vertex set V, i.e., a vector in R^N with one value per node. The timeline extends to 2015-2016 with methods that take a graph signal processing (GSP) perspective: Dong, Kalofolias, Egilmez, Chepuri, Pasdeloup, Segarra, Thanou, Mei, Baingana.

  16-17. A signal processing perspective
     • Existing approaches have limitations: simple correlation or similarity functions are not enough; most classical methods for learning graphical models do not directly lead to topologies with non-negative weights; and there is no strong emphasis on the signal/graph interaction with a spectral/frequency-domain interpretation.
     • Opportunity and challenge for graph signal processing: GSP tools such as frequency analysis and filtering can contribute to the graph learning problem, and filtering-based approaches can provide generative models for signals with complex non-Gaussian behavior.

  18. A signal processing perspective. Signal processing is about the representation D c = x: a signal x is a dictionary D applied to coefficients c.

  19. A signal processing perspective. Graph signal processing is about D(G) c = x: the dictionary now depends on the graph G.

  20. A signal processing perspective. Forward problem: given G and x, design D(G) to study c. Examples: graph Fourier/wavelet atoms and the corresponding coefficients [Coifman06, Narang09, Hammond11, Shuman13, Sandryhaila13]; trained graph dictionary atoms and coefficients [Zhang12, Thanou14].

  21-23. A signal processing perspective. Backward problem (graph learning): given x, design D and c to infer G. The key is the signal/graph model behind D, which is designed around graph operators (adjacency/Laplacian matrices, shift operators); the choice of, or assumption on, c often determines the signal characteristics.

  24-26. Model 1: Global smoothness
     • Signal values vary smoothly between all pairs of nodes that are connected.
     • Example: temperature at different locations in a flat geographical region.
     • Usually quantified by the Laplacian quadratic form
          x^T L x = (1/2) Σ_{i,j} W_ij (x(i) − x(j))^2
       On a 9-node toy graph, a smooth signal gives x^T L x = 1 while a non-smooth one gives x^T L x = 21.
     • Similar to previous approaches: Lake (2010): max_Θ log det Θ − (1/M) tr(XX^T Θ) − ρ ||Θ||_1 s.t. Θ = L + (1/σ²) I; Daitch (2009): min_L ||LX||_F^2 = tr(X^T L^2 X); Hu (2013): min_L tr(X^T L^s X) − β ||W||_F.
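  A quick numerical check, using nothing beyond the definitions above, that the two expressions for the smoothness measure coincide:

  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  W = np.triu(rng.random((6, 6)), k=1)
  W = W + W.T                                   # random symmetric weights, zero diagonal
  L = np.diag(W.sum(axis=1)) - W
  x = rng.standard_normal(6)

  quad = x @ L @ x                              # x^T L x
  pairwise = 0.5 * sum(W[i, j] * (x[i] - x[j]) ** 2
                       for i in range(6) for j in range(6))
  assert np.isclose(quad, pairwise)
  ```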

  27-31. Model 1: Global smoothness. Dong et al. (2015) & Kalofolias (2016):
     - D(G) = χ, the eigenvector matrix of L.
     - Gaussian assumption on c: c ~ N(0, Λ).
     - Maximum a posteriori (MAP) estimation of c, i.e. min_c ||x − χc||_2^2 − log P_c(c), leads to minimization of the Laplacian quadratic form, and hence to the graph learning problem
          min_{L,Y}  ||X − Y||_F^2 + α tr(Y^T L Y) + β ||L||_F^2
       (data fidelity + smoothness on the denoised signals Y + regularization).
     - Learning enforces the signal property (global smoothness)!
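  A hedged sketch of the resulting alternating minimization, with CVXPY used for the Laplacian step; the trace-normalization constraint, initialization, and parameter values are illustrative choices rather than the exact settings of the papers.

  ```python
  import numpy as np
  import cvxpy as cp

  def learn_graph_global_smoothness(X, alpha=1.0, beta=1.0, n_iter=10):
      """Alternate between denoised signals Y and a valid combinatorial Laplacian L.
      X has one observed graph signal per column."""
      n = X.shape[0]
      L_val = np.eye(n) - np.ones((n, n)) / n          # any valid Laplacian as initialization
      for _ in range(n_iter):
          # Y-step: for fixed L the objective is quadratic, so Y = (I + alpha L)^{-1} X.
          Y = np.linalg.solve(np.eye(n) + alpha * L_val, X)
          # L-step: convex program over the set of valid combinatorial Laplacians.
          L = cp.Variable((n, n), symmetric=True)
          objective = alpha * cp.trace(Y.T @ L @ Y) + beta * cp.sum_squares(L)
          constraints = [cp.trace(L) == n,                     # fix the scale of L
                         L @ np.ones(n) == 0,                  # rows sum to zero
                         L - cp.diag(cp.diag(L)) <= 0]         # non-positive off-diagonals
          cp.Problem(cp.Minimize(objective), constraints).solve()
          L_val = L.value
      return L_val, Y
  ```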

  32-36. Model 1: Global smoothness. Egilmez et al. (2016):
     - min_Θ tr(ΘK) − log det Θ, with K = S − (α/2)(11^T − I).
     - Solve for Θ restricted to three different families of graph Laplacians: the generalized Laplacian Θ = L + V = Deg − W + V; the diagonally dominant generalized Laplacian Θ = L + V = Deg − W + V with V ≥ 0; and the combinatorial Laplacian Θ = L = Deg − W.
     - Generalizes the graphical LASSO and Lake; adding priors on the edge weights leads to a MAP-estimation interpretation.
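  A minimal CVXPY sketch of the generalized-Laplacian variant of this problem, assuming the K defined above and treating alpha as an illustrative regularization weight:

  ```python
  import numpy as np
  import cvxpy as cp

  def learn_generalized_laplacian(S, alpha=0.1):
      """min_Theta tr(Theta K) - log det Theta over positive semidefinite matrices
      with non-positive off-diagonal entries (one of the three Laplacian sets above)."""
      n = S.shape[0]
      K = S - (alpha / 2.0) * (np.ones((n, n)) - np.eye(n))
      Theta = cp.Variable((n, n), PSD=True)
      off_diag = Theta - cp.diag(cp.diag(Theta))
      problem = cp.Problem(cp.Minimize(cp.trace(Theta @ K) - cp.log_det(Theta)),
                           [off_diag <= 0])
      problem.solve()
      return Theta.value
  ```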

  37-41. Model 1: Global smoothness. Chepuri et al. (2016):
     - An edge selection mechanism based on the same smoothness measure: edges are selected so that the resulting graph makes the observed signals smooth (a sketch of the idea follows below).
     - Similar in spirit to Dempster; good for learning unweighted graphs; an explicit handle on the selected edges is desirable in some applications.
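  A simplified, greedy reading of the edge-selection idea (not the authors' actual algorithm): score every candidate edge by the smoothness cost it would contribute across the observed signals and keep the k cheapest edges of an unweighted graph.

  ```python
  import numpy as np

  def select_edges(X, k):
      """Keep the k node pairs whose signals are most similar across observations;
      a greedy illustration of smoothness-based edge selection, not Chepuri et al.'s solver.
      X has one observed graph signal per column."""
      n = X.shape[0]
      pairs = []
      for i in range(n):
          for j in range(i + 1, n):
              cost = np.sum((X[i, :] - X[j, :]) ** 2)   # smoothness cost of edge (i, j)
              pairs.append((cost, i, j))
      pairs.sort()
      A = np.zeros((n, n), dtype=int)
      for _, i, j in pairs[:k]:
          A[i, j] = A[j, i] = 1                         # unweighted, undirected edge
      return A
  ```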

  42-45. Model 2: Diffusion process
     • Signals are the outcome of diffusion processes on the graph (local smoothness rather than global smoothness!).
     • Example: movement of people/vehicles in a transportation network.
     • Characterized by diffusion operators: an observation is obtained from an initial state either by heat diffusion or by a general graph shift operator (e.g., A).
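  To make the generative view concrete, a small sketch that produces an observation by diffusing an initial signal, either with a heat kernel or with powers of a general shift operator; the diffusion time and the number of shifts are illustrative.

  ```python
  import numpy as np
  from scipy.linalg import expm

  def heat_diffusion(L, x0, tau=1.0):
      """Observation x = exp(-tau L) x0 produced by heat diffusion of the initial state x0."""
      return expm(-tau * L) @ x0

  def shift_diffusion(A, x0, k=3):
      """Observation produced by k applications of a general graph shift operator (e.g., A)."""
      return np.linalg.matrix_power(A, k) @ x0
  ```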

  46-49. Model 2: Diffusion process. Pasdeloup et al. (2015, 2016):
     - D(G) = T_k^(m) = W_norm^{k(m)}: each observation is a diffusion of c_m by a power of the normalized adjacency matrix.
     - The {c_m} are i.i.d. samples with independent entries.
     - Two-step approach: (1) estimate the eigenvector matrix from the covariance (the sample covariance if the true one is unknown),
          Σ = E[X^(m) X^(m)T] = Σ_m W_norm^{2k(m)},
       a polynomial of W_norm, which therefore shares its eigenvectors; (2) optimize for the eigenvalues given the constraints on W_norm (mainly non-negativity of the off-diagonal entries and the eigenvalue range) and some priors (e.g., sparsity).
     - More of a "graph-centric" learning framework: the cost function is on graph components rather than on signals.
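  A sketch of the first step only, under the stated assumption that the covariance is a polynomial of W_norm and therefore shares its eigenvectors:

  ```python
  import numpy as np

  def covariance_eigenvectors(X):
      """X holds one observed signal per column; returns eigenvectors of the sample covariance,
      which (asymptotically) are also eigenvectors of W_norm under the diffusion model above."""
      cov = X @ X.T / X.shape[1]
      _, eigvecs = np.linalg.eigh(cov)
      return eigvecs   # step 2 then fits the eigenvalues under constraints/priors on W_norm
  ```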

  50-53. Model 2: Diffusion process. Segarra et al. (2016):
     - D(G) = H(S_G) = Σ_{l=0}^{L−1} h_l S_G^l, a diffusion (graph filter) defined by a graph shift operator S_G that can be arbitrary but in practice is W or L.
     - c is a white signal.
     - Two-step approach: (1) estimate the eigenvector matrix from Σ = H H^T; (2) select eigenvalues that satisfy the constraints on S_G:
          min_{S_G, λ}  ||S_G||_0   s.t.   S_G = Σ_{n=1}^N λ_n v_n v_n^T,
       where the v_n are the "spectral templates" (eigenvectors).
     - Similar in spirit to Pasdeloup: the same stationarity assumption, but a different inference framework due to a different D. Can handle noisy or incomplete information about the spectral templates.
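  A hedged sketch of the second step using the usual ℓ1 relaxation of the ℓ0 objective, assuming noiseless eigenvectors V and adjacency-style constraints on the shift; the normalization constraint is an illustrative way to rule out the trivial solution.

  ```python
  import numpy as np
  import cvxpy as cp

  def shift_from_templates(V):
      """Recover a sparse shift operator S sharing the given eigenvectors ('spectral templates')."""
      n = V.shape[0]
      lam = cp.Variable(n)
      S = V @ cp.diag(lam) @ V.T
      constraints = [S - cp.diag(cp.diag(S)) >= 0,   # non-negative off-diagonal entries
                     cp.diag(S) == 0,                # no self-loops
                     cp.sum(S[0, :]) >= 1]           # rule out the all-zero solution
      problem = cp.Problem(cp.Minimize(cp.sum(cp.abs(S))), constraints)
      problem.solve()
      return S.value
  ```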

  54-58. Model 2: Diffusion process. Thanou et al. (2016):
     - D(G) = e^{−τL}, the heat kernel (localization in the vertex domain).
     - Sparsity assumption on c: each signal is a combination of a few heat diffusion processes at times τ.
     - Optimization problem:
          min_{L,C,τ}  ||X − D(L) C||_F^2 + α Σ_{m=1}^M ||c_m||_1 + β ||L||_F^2   s.t.   D = [e^{−τ_1 L}, ..., e^{−τ_S L}]
       (data fidelity + sparsity on c + regularization).
     - Still a diffusion-based model, but more "signal-centric": no assumption on eigenvectors/stationarity, but on the signal structure and sparsity. Can be extended to the general polynomial case (Maretic et al. 2017).
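  A partial sketch: building the multi-scale heat dictionary and running only the sparse-coding step for fixed L and τ (the full method alternates updates of L, C, and τ; the Lasso penalty value is illustrative).

  ```python
  import numpy as np
  from scipy.linalg import expm
  from sklearn.linear_model import Lasso

  def heat_dictionary(L, taus):
      """D = [exp(-tau_1 L), ..., exp(-tau_S L)]: one localized atom per node and diffusion scale."""
      return np.hstack([expm(-tau * L) for tau in taus])

  def sparse_codes(X, D, alpha=0.1):
      """Sparse-coding step: each column of X is approximated by a few diffusion atoms."""
      lasso = Lasso(alpha=alpha, fit_intercept=False)
      lasso.fit(D, X)
      return lasso.coef_.T        # one sparse coefficient vector per observed signal
  ```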

  59. Model 3: Time-varying observations
     • Signals are time-varying observations that are a causal outcome of current or past values (mixed degrees of smoothness, depending on the previous states).
     • Example: evolution of individual behavior due to the influence of different friends at different timestamps.
     • Characterized by an autoregressive model or a structural equation model (SEM).

  60-64. Model 3: Time-varying observations. Mei and Moura (2015):
     - D_s(G) = P_s(W), a polynomial of W of degree s, and the role of c_s is played by the past observation x[t − s], so that x[t] = Σ_{s=1}^S P_s(W) x[t − s].
     - Estimation problem:
          min_{W,a}  (1/2) Σ_{k=S+1}^K ||x[k] − Σ_{s=1}^S P_s(W) x[k − s]||_2^2 + λ_1 ||vec(W)||_1 + λ_2 ||a||_1
       (data fidelity + sparsity on W + sparsity on the polynomial coefficients a).
     - The polynomial design is similar in spirit to Pasdeloup and Segarra; good for inferring causal relations between signals. Kernelized (nonlinear) version: Shen et al. (2016).
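  A small sketch of the data-fidelity term: one-step prediction of x[t] from past values via graph-polynomial filters; the shape of the coefficient array a is an illustrative choice.

  ```python
  import numpy as np

  def predict_next(X_past, W, a):
      """x_hat[t] = sum_s P_s(W) x[t - s], with P_s(W) = sum_j a[s, j] W^j.
      X_past[:, s] holds x[t - (s + 1)]; a has one row of polynomial coefficients per lag."""
      n, S = X_past.shape
      x_hat = np.zeros(n)
      for s in range(S):
          P_s = sum(a[s, j] * np.linalg.matrix_power(W, j) for j in range(a.shape[1]))
          x_hat += P_s @ X_past[:, s]
      return x_hat
  ```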

  65-68. Model 3: Time-varying observations. Baingana and Giannakis (2016):
     - D(G) = W_{s(t)}, the graph at time t (the topology switches at each time among S discrete states), and the role of c is played by x itself plus an external input, giving the structural equation model
          x[t] = W_{s(t)} x[t] + B_{s(t)} y[t]
       (internal/neighbor influence + external influence).
     - Solve for all states of W:
          min_{ {W_{s(t)}, B_{s(t)}} }  (1/2) Σ_{t=1}^T ||x[t] − W_{s(t)} x[t] − B_{s(t)} y[t]||_F^2 + Σ_{s=1}^S λ_s ||W_{s(t)}||_1
       (data fidelity + sparsity on W).
     - Good for inferring causal relations between signals as well as dynamic topologies.
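  A hedged, single-state sketch of the SEM fit with CVXPY; the real method tracks S switching states over time, and B is left unconstrained here purely for simplicity.

  ```python
  import numpy as np
  import cvxpy as cp

  def fit_sem_single_state(X, Y, lam=0.1):
      """Fit x[t] ~ W x[t] + B y[t] over all t for one topology state.
      Columns of X are the observed signals x[t]; columns of Y are the external inputs y[t]."""
      n = X.shape[0]
      W = cp.Variable((n, n))
      B = cp.Variable((n, n))
      residual = X - W @ X - B @ Y
      problem = cp.Problem(
          cp.Minimize(0.5 * cp.sum_squares(residual) + lam * cp.sum(cp.abs(W))),
          [cp.diag(W) == 0])      # no self-loops in the learned topology
      problem.solve()
      return W.value, B.value
  ```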

  69. Comparison of different methods (signal model; assumption; learning output; edge direction; inference):
     - Dong (2015): global smoothness; Gaussian; Laplacian; undirected; signal-centric
     - Kalofolias (2016): global smoothness; Gaussian; adjacency; undirected; signal-centric
     - Egilmez (2016): global smoothness; Gaussian; generalized Laplacian; undirected; signal-centric
     - Chepuri (2016): global smoothness; Gaussian; adjacency; undirected; graph-centric
     - Pasdeloup (2015): diffusion by adjacency; stationary; normalized adjacency/Laplacian; undirected; graph-centric
     - Segarra (2016): diffusion by graph shift operator; stationary; graph shift operator; undirected; graph-centric
     - Thanou (2016): heat diffusion; sparsity; Laplacian; undirected; signal-centric
     - Mei (2015): time-varying; dependent on previous states; adjacency; directed; signal-centric
     - Baingana (2016): time-varying; dependent on current internal/external information; time-varying adjacency; directed; signal-centric
