
Hidden Relations in Data: A Signal Processing on Graphs Approach



  1. Carnegie Mellon. Hidden Relations in Data: A Signal Processing on Graphs Approach. José M. F. Moura, Phillip L. and Marsha Dowd University Professor, moura@cmu.edu, www.ece.cmu.edu/~moura. Acknowledgements: AFOSR grant FA95501210087. Work with: Kar, Sandryhaila, Deri, Mei.

  2. Data. Big Data: Variety, Volume, Velocity, Veracity, Variability, Value, Visualization. Unstructured, distributed, … By 2020, all digital data created, replicated, and consumed in a year (IDC, Dec 2012): 44 ZB ≈ 170 M US LoC = 44,000 EB = 44,000,000 PB = 44,000,000,000 TB = 44,000,000,000,000 GB.

  3. Data: Traditional Signals. Time signals: speech, radar signal, time series. Images, video. Figure sources: Forbes, 03/05/2013; Ku-band SAR image, Sandia Nat Lab; Mendrelic, Time Lapse Nice, http://vimeo.com/18554749

  4. Data: Variety (Social, Web, Companies, …). Social networks; wireless service providers; Web: hyperlinked blogs (Adamic, Glance); LinkedIn contacts; sensor networks; Internet; friendship networks (http://vidi.cs.ucdavis.edu/projects/AggressionNetworks/).

  5. Data: The Old and the New. Time series; company data.

  6. Analytics of Data Science: DSP on Graphs. (Linear) DSP for social, biological, and physical graph data: graph signal model, filters, filtering and convolution, impulse response, z- and Fourier transforms, spectrum, frequency response, … Sandryhaila & Moura, “DSP on graphs,” IEEE Tr-SP, 2013.

  7. Data Science: Graph Supports Associations. Graphical models: Markov random fields. Machine learning approaches (Jordan, Willsky, …). Data transforms: diffusion wavelets (Coifman, 2004), regression analysis, wavelets on irregular sensor networks (Baraniuk, 2005), filterbanks (Vandergheynst, 2011), separable wavelet filterbanks (Ortega, 2012). Graph Laplacian (assumed undirected, non-negative weights) (Vandergheynst, Barbarossa, Ortega, …). Algebraic Signal Processing (ASP): Pueschel and Moura (SIAM 2003, T-SP 2008, May, August).

  8. Data: The Old and the New. Time series; company data.

  9. DSP: Time Signals. Time signal.

  10. DSP: Time Signals – Shift. Time signal. Shift operator: z^{-1}. Shift matrix.

  11. DSP: Time Signals – Graph. Shift matrix. Graph: nodes 0, 1, 2, …, n-1.
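
The shift matrix and graph drawn on slides 10 and 11 did not survive in this transcript. As a minimal sketch, assuming the usual finite-length DSP convention of a periodic (cyclic) shift, the shift matrix can be read as the adjacency matrix of a directed cycle on nodes 0, …, n-1. The helper name `cyclic_shift` and the example signal are illustrative and not taken from the slides.

```python
import numpy as np

def cyclic_shift(n: int) -> np.ndarray:
    """Return the n x n cyclic shift matrix C with (C s)[k] = s[(k - 1) mod n]."""
    # Rolling the identity's rows down by one puts a 1 at (k, k-1) and at (0, n-1).
    return np.roll(np.eye(n), 1, axis=0)

C = cyclic_shift(4)
s = np.array([1.0, 2.0, 3.0, 4.0])

# Applying the shift delays the signal by one sample, with wrap-around.
shifted = C @ s

# Read C as an adjacency matrix: C[k, j] = 1 is taken here as an edge j -> k.
edges = [(int(j), int(k)) for k, j in zip(*np.nonzero(C))]
```

Under this reading, `edges` traces the directed cycle 3 -> 0 -> 1 -> 2 -> 3, which is the sense in which a time signal is a signal on a graph.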

  12. DSP_G: Graph Signals. (Graph signal, shift, filters, filtering/convolution, impulse response, z- and Fourier transforms, spectrum, frequency response.) Time signals; cosine signal, k = 0, …, 5; average temperature in US cities; website topics in hyperlinked blogs; average # tweets.

  13. DSP_G: Graph Shift. Shift: adjacency matrix. Graph shift: a local operation that replaces the signal value at a node by a weighted linear combination of the values at its neighbors. 1st-order interpolation, weighted averaging, regression on graphs.
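
The graph shift described on slide 13 amounts to a matrix-vector product with the adjacency matrix. The sketch below uses a made-up 4-node weighted graph; the convention that `A[i, j]` weights the contribution of node `j` to node `i` is an assumption made for this example.

```python
import numpy as np

# Hypothetical weighted graph: A[i, j] is the weight with which node j feeds node i.
A = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])

# Graph signal: one value per node.
s = np.array([2.0, 4.0, 6.0, 8.0])

def graph_shift(A: np.ndarray, s: np.ndarray) -> np.ndarray:
    """Replace each node's value by the weighted sum of its neighbors' values."""
    return A @ s

shifted = graph_shift(A, s)
# Node 0 receives 0.5 * 4 + 0.5 * 6 = 5, node 1 receives 1.0 * 2 = 2, and so on.
```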

  14. DSP_G: Graph Filtering. Graph signal. Graph filtering.

  15. DSP_G: Graph Filtering. Graph signal. Graph filtering. Shift invariance: H and A commute; H and A have the same eigenvectors; a graph filter is a polynomial in the shift A.
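
The statement on slide 15 that a graph filter is a polynomial in the shift can be checked numerically. The following is a small sketch with an arbitrary 3-node path graph and arbitrary taps `h`, both chosen only for illustration: it builds H = h_0 I + h_1 A + h_2 A^2 and confirms that filtering and shifting commute.

```python
import numpy as np

def graph_filter(A: np.ndarray, h) -> np.ndarray:
    """Build H = h[0] I + h[1] A + ... + h[L] A^L."""
    H = np.zeros_like(A, dtype=float)
    power = np.eye(A.shape[0])  # current power of A, starting at A^0
    for coeff in h:
        H += coeff * power
        power = power @ A
    return H

# Illustrative undirected path graph on 3 nodes.
A = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])
h = [0.5, 0.25, 0.125]  # hypothetical filter taps
H = graph_filter(A, h)

s = np.array([1.0, 0.0, -1.0])
filter_then_shift = A @ (H @ s)
shift_then_filter = H @ (A @ s)
```

Because H is built only from powers of A, the two orders of operation give the same output for any input signal.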

  16. DSP_G: Fourier Transform. Discrete Fourier Transform (DFT). Fourier Tr.

  17. DSP_G: Fourier Transform. Discrete Fourier Transform (DFT). Fourier Tr.

  18. DSP_G: Graph Shift & FT. Diagonalization of the shift. DFT and the shift.
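
The link between the DFT and the shift named on slide 18 can be illustrated numerically. This sketch assumes the cyclic shift of the time-signal slides and the unitary DFT matrix; the size `n = 8` is arbitrary.

```python
import numpy as np

n = 8
# Cyclic shift: (C s)[k] = s[(k - 1) mod n].
C = np.roll(np.eye(n), 1, axis=0)

# Unitary DFT matrix F[m, k] = exp(-2*pi*i*m*k/n) / sqrt(n).
k = np.arange(n)
F = np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)

# Conjugating the shift by the DFT yields a diagonal matrix.
D = F @ C @ F.conj().T

# Its diagonal holds the n-th roots of unity exp(-2*pi*i*m/n).
expected_eigs = np.exp(-2j * np.pi * k / n)
```

In other words, the DFT basis vectors are the eigenvectors of the cyclic shift, which is the property the graph Fourier transform generalizes.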

  19. DSP_G: Graph Fourier Tr. For simplicity: A is diagonalizable. Graph Fourier Tr. Inverse graph Fourier Tr.
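
The formulas on slide 19 are missing from the transcript. A common way to write the construction, assumed here, is A = V Λ V^{-1}, with the graph Fourier transform ŝ = V^{-1} s and its inverse s = V ŝ. The sketch below applies this to a made-up weighted directed graph; the function names `gft` and `igft` are illustrative.

```python
import numpy as np

# Hypothetical weighted directed graph on 3 nodes (values chosen only for illustration).
A = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.5, 0.0],
])

# Eigendecomposition A = V diag(eigvals) V^{-1}; this A has distinct eigenvalues,
# so it is diagonalizable.
eigvals, V = np.linalg.eig(A)

def gft(V: np.ndarray, s: np.ndarray) -> np.ndarray:
    """Graph Fourier transform: solve V s_hat = s, i.e. s_hat = V^{-1} s."""
    return np.linalg.solve(V, s)

def igft(V: np.ndarray, s_hat: np.ndarray) -> np.ndarray:
    """Inverse graph Fourier transform: s = V s_hat."""
    return V @ s_hat

s = np.array([1.0, -2.0, 3.0])
s_hat = gft(V, s)        # spectrum of s (complex in general for directed graphs)
s_back = igft(V, s_hat)  # recovers s up to rounding
```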

  20. Frequency. Time frequencies.

  21. DSP_G: Frequency Response. Graph filter. Graph filter frequency response.

  22. DSP_G: Convolution Thm. Filtering/convolution theorem.
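
Slides 21 and 22 name the frequency response and the convolution theorem without the formulas. Under the assumption that the filter is a polynomial H = Σ h_l A^l and that A = V Λ V^{-1}, the frequency response is h(λ_k) = Σ h_l λ_k^l, and filtering becomes pointwise multiplication in the graph Fourier domain. The sketch below checks this on a small undirected path graph with arbitrary taps.

```python
import numpy as np

# Illustrative undirected path graph on 4 nodes (symmetric, so V is orthogonal).
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
lam, V = np.linalg.eigh(A)  # A = V diag(lam) V^T

h = [1.0, 0.5, 0.25]  # hypothetical filter taps h_0, h_1, h_2
H = sum(c * np.linalg.matrix_power(A, l) for l, c in enumerate(h))

# Frequency response: the filter polynomial evaluated at each graph frequency.
# np.polyval expects the highest-degree coefficient first, hence the reversal.
freq_response = np.polyval(h[::-1], lam)

s = np.array([1.0, -2.0, 0.5, 3.0])
spectrum_of_output = V.T @ (H @ s)             # GFT of the filtered signal
product_of_spectra = freq_response * (V.T @ s)  # response times GFT of the input
```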

  23. DSP_G: Graph Low and High Pass Signals. Ordering frequencies: total variation. For s a graph frequency component.
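
The total variation formula on slide 23 is not preserved. One definition used in the DSP on graphs literature, assumed here, is TV(s) = ||s - A s / |λ_max| ||_1, which for an eigenvector v with eigenvalue λ and unit 1-norm gives TV(v) = |1 - λ / |λ_max||. The sketch below orders the frequencies of a small path graph this way; the graph is illustrative.

```python
import numpy as np

def total_variation(A: np.ndarray, s: np.ndarray) -> float:
    """TV(s) = || s - A_norm s ||_1 with A_norm = A / |lambda_max| (assumed definition)."""
    lam_max = np.max(np.abs(np.linalg.eigvals(A)))
    return float(np.sum(np.abs(s - (A / lam_max) @ s)))

# Illustrative undirected path graph on 4 nodes.
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
lam, V = np.linalg.eigh(A)  # eigenvalues in ascending order

# Total variation of each frequency component, scaled to unit 1-norm.
tv = np.array([
    total_variation(A, V[:, k] / np.sum(np.abs(V[:, k])))
    for k in range(len(lam))
])

# Sorting by total variation orders the components from low to high frequency.
order = np.argsort(tv)
```

With this definition the eigenvector of the largest eigenvalue varies least across edges and plays the role of the lowest frequency.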

  24. DSP_G: Graph Low and High Pass Signals

  25. DSP_G: Graph Low and High Pass Signals

  26. DSP_G: Political Blogs. Data: 1224 conservative & liberal political blogs (Adamic, Glance). Graph: 1224 nodes (blogs), edges (hyperlinks). Graph signal: blue 1, red -1. Sandryhaila, Moura, DSP on Graphs, IEEE Tr-SP, 2013.

  27. DSP_G: Blogosphere Low- vs High Pass. 1224 political blogs, hyperlinked: conservative & liberal (Adamic, Glance). Adjacency matrix A given by hyperlinks. Graph signal. Fourier transform. Frequency representation.

  28. DSP_G: Blogosphere Low- vs High Pass. 1224 political blogs, hyperlinked: conservative & liberal (Adamic, Glance). Adjacency matrix A given by hyperlinks. Graph signal. Frequency representation.

  29. DSP_G: Classification – Political Blogosphere. 1224 political blogs, hyperlinked: conservative & liberal (Adamic, Glance). Adjacency matrix A given by hyperlinks. Semisupervised classifier (filter + threshold). Filter design.
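
Slide 29 describes the classifier only as "filter + threshold" and omits the filter design formula. The sketch below is one plausible reading, not the authors' exact procedure: known labels (+1 / -1) are placed on a few nodes, a polynomial graph filter is fitted by least squares so that the filtered signal reproduces the known labels, and the sign of the filtered signal labels the remaining nodes. The toy graph (two 4-node cliques joined by one edge), the filter order, and the name `fit_filter` are all assumptions made for illustration.

```python
import numpy as np

def fit_filter(A, s_known, known, order):
    """Least-squares fit of taps h so that (sum_p h[p] A^p s_known) matches labels on `known`."""
    cols = [s_known]
    for _ in range(order):
        cols.append(A @ cols[-1])      # next column is A^p s_known
    B = np.stack(cols, axis=1)         # shape: (nodes, order + 1)
    h, *_ = np.linalg.lstsq(B[known], s_known[known], rcond=None)
    return h, B

# Toy graph: nodes 0-3 form one clique, nodes 4-7 another, linked by edge 3-4.
n = 8
A = np.zeros((n, n))
for group in (range(0, 4), range(4, 8)):
    for i in group:
        for j in group:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0
A /= np.max(np.abs(np.linalg.eigvalsh(A)))  # normalize by the largest |eigenvalue|

# Only two labels are known: node 0 is +1 and node 7 is -1; all others start at 0.
known = np.array([0, 7])
s_known = np.zeros(n)
s_known[known] = [1.0, -1.0]

h, B = fit_filter(A, s_known, known, order=2)
predicted = np.sign(B @ h)  # threshold the filtered signal at zero
```

On this toy graph the thresholded output assigns +1 to the first clique and -1 to the second, spreading the two known labels along the edges.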

  30. DSP_G: Classification – Political Blogosphere. Political blogs (Adamic, Glance); figure labels: Most connected, Random. Classifier, P = 10. AGF: GlobalSIP, 2013: Chen, Sandryhaila, Moura, Kovacevic. DF: Diffusion Functions, 2008, J. Mach. Learn. Res., Szlam, Coifman, Maggioni.

  31. DSP_G: Service Provider – Predict Customer Behavior. 3.7 million customers: 3.6 M non-churners, 100K churners. Adjacency matrix. 10-month log: learn from a few churners in month A who will churn in month A+1.

  32. DSP_G: Service Provider – Predict Customer Behavior. Classifier. Deri & Moura, ICASSP, May 2014.

  33. DSP_G: Classification. Classification by regularization. News articles dataset (Belkin, Matveeva, Niyogi, 2004): 18,000 news articles, 20 topics. Graph: randomly select 500 articles from each class; each article is a vector of the 6000 most common keywords; use cosine distance between keyword vectors. Regularization. GlobalSIP, 2013: Sandryhaila, Moura.

  34. DSP_G: News Article Dataset – Classification. News articles dataset (Belkin, Matveeva, Niyogi, 2004): 18,000 news articles, 20 topics. Graph: randomly select 500 articles from each class; each article is a vector of the 6000 most common keywords; use cosine distance between keyword vectors. GlobalSIP, 2013: Sandryhaila, Moura.

  35. DSP_G: NIST Digits Database Classification. NIST database: 70,000 grayscale images of handwritten digits from 0 to 9. Randomly select 3,000 images of each digit to construct a testing dataset of 30,000 images. The representation graph for this dataset is constructed by viewing each image as a point in a 28^2 = 784-dimensional vector space, computing Euclidean distances between all images, and connecting each image with its six nearest neighbors.
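
The graph construction on slide 35 (Euclidean distances, six nearest neighbors) can be sketched as follows. Random vectors stand in for the digit images, and the edges are left unweighted and directed; both choices are assumptions of this example, since the slide does not state edge weights or direction. The name `knn_graph` is illustrative.

```python
import numpy as np

def knn_graph(X: np.ndarray, k: int) -> np.ndarray:
    """Adjacency matrix with A[i, j] = 1 when j is one of the k nearest neighbors of i."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all rows of X.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return A

# Stand-in data: 50 random points in a 28 * 28 = 784-dimensional space.
rng = np.random.default_rng(0)
X = rng.random((50, 28 * 28))
A = knn_graph(X, k=6)
```

The dense distance tensor used here is only practical for small samples; for the 30,000 images mentioned on the slide a neighbor-search structure or blocked computation would be needed.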

  36. Big Data. Product graphs: Kronecker graphs; Cartesian products; strong product.
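
The product formulas on slide 36 are missing from the transcript. With the standard definitions, assumed here, the adjacency matrices of the three products of graphs with adjacency matrices A1 (n1 nodes) and A2 (n2 nodes) are A1 ⊗ A2 (Kronecker), A1 ⊗ I + I ⊗ A2 (Cartesian), and their sum (strong). The sketch below builds them with `np.kron`; the helper names and the path-graph factors are illustrative.

```python
import numpy as np

def path_graph(n: int) -> np.ndarray:
    """Adjacency matrix of an undirected path on n nodes."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return A

def kronecker_product(A1, A2):
    return np.kron(A1, A2)

def cartesian_product(A1, A2):
    n1, n2 = A1.shape[0], A2.shape[0]
    return np.kron(A1, np.eye(n2)) + np.kron(np.eye(n1), A2)

def strong_product(A1, A2):
    return kronecker_product(A1, A2) + cartesian_product(A1, A2)

A1, A2 = path_graph(2), path_graph(3)
A_kron = kronecker_product(A1, A2)
A_cart = cartesian_product(A1, A2)  # a 2 x 3 grid graph
A_strong = strong_product(A1, A2)   # the grid plus its diagonal edges
```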

  37. Big Data

  38. Filtering. Filtering: Cartesian graphs. Filtering: Kronecker graphs. Filtering: strong graphs.

  39. Big Data. Parallel constructs; vectorizable constructs.

  40. Big Data: Fourier Transform. Cartesian graphs; Kronecker graphs and strong graphs.
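
The formulas on slide 40 are not preserved. A standard property of product graphs, assumed to be what the slide refers to, is that their eigenvectors are Kronecker products of the factors' eigenvectors, so the large Fourier basis never needs its own eigendecomposition. The eigenvalues combine as λ1 + λ2 (Cartesian), λ1 λ2 (Kronecker), and λ1 λ2 + λ1 + λ2 (strong). The sketch below verifies this on two small path graphs chosen only for illustration.

```python
import numpy as np

def path_graph(n: int) -> np.ndarray:
    """Adjacency matrix of an undirected path on n nodes."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return A

A1, A2 = path_graph(3), path_graph(4)
n1, n2 = A1.shape[0], A2.shape[0]

# Small eigendecompositions of the two factor graphs.
lam1, V1 = np.linalg.eigh(A1)
lam2, V2 = np.linalg.eigh(A2)

# Shared Fourier basis of all three product graphs.
V = np.kron(V1, V2)

# Product-graph eigenvalues, ordered to match the columns of V.
lam_kron = np.kron(lam1, lam2)
lam_cart = np.kron(lam1, np.ones(n2)) + np.kron(np.ones(n1), lam2)
lam_strong = lam_kron + lam_cart

# Product-graph adjacency matrices, built only to check the claim.
A_kron = np.kron(A1, A2)
A_cart = np.kron(A1, np.eye(n2)) + np.kron(np.eye(n1), A2)
A_strong = A_kron + A_cart
```

Here two eigendecompositions of sizes 3 and 4 replace one of size 12, and the saving grows quickly with the size of the factors.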
