Carnegie Mellon
Hidden Relations in Data – A Signal Processing on Graphs Approach
José M. F. Moura, Phillip L. and Marsha Dowd University Professor
moura@cmu.edu, www.ece.cmu.edu/~moura
Acknowledgements: AFOSR grant FA95501210087
Work with: Kar, Sandryhaila, Deri, Mei
Big Data: Variety, Volume, Velocity, Veracity, Variability, Value, Visualization. Unstructured, distributed, … By 2020, all digital data created, replicated, and consumed in a year (IDC, Dec 2012): 44 ZB ≈ 170 M US LoC = 44,000 EB = 44,000,000 PB = 44,000,000,000 TB = 44,000,000,000,000 GB.
Data: Traditional Signals. Time signals: speech, radar signal. Time series. Images, video. Figure sources: Forbes, 03/05/2013; KU Band SAR image, Sandia Nat Lab; Mendrelic: Time Lapse Nice, http://vimeo.com/18554749
Data: Variety (Social, Web, Companies, …). Social networks; wireless service providers; Web: hyperlinked blogs (Adamic, Glance); LinkedIn contacts; sensor networks; Internet; friendship networks (http://vidi.cs.ucdavis.edu/projects/AggressionNetworks/).
Data: The Old and the New. Time series; company data.
Analytics of Data Science: DSP on Graphs. (Linear) DSP for social, biological, and physical graph data: graph signal model, filters, filtering and convolution, impulse response, z- and Fourier transforms, spectrum, frequency response, … Sandryhaila & Moura, “DSP on graphs,” IEEE Tr-SP, 2013.
Data Science: Graph Supports Associations.
Graphical models: Markov random fields; machine learning approaches (Jordan, Willsky, …).
Data transforms: diffusion wavelets (Coifman, 2004); regression analysis, wavelets on irregular sensor networks (Baraniuk, 2005); filterbanks (Vandergheynst, 2011); separable wavelet filterbanks (Ortega, 2012).
Graph Laplacian (assumed undirected, non-negative weights) (Vandergheynst, Barbarossa, Ortega, …).
Algebraic Signal Processing (ASP): Pueschel and Moura (SIAM 2003; T-SP 2008, May and August).
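DSP: Time Signals – Shift. Time signal: s = (s_0, s_1, …, s_{n-1}). Shift operator: z^{-1}, which delays the signal by one sample. Shift matrix: the n × n cyclic shift matrix, with ones on the first subdiagonal and in the top-right corner.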
DSP: Time Signals – Graph. The shift matrix, with rows and columns indexed 0, 1, 2, …, n-1, is the adjacency matrix of a graph on nodes 0, 1, 2, …, n-1: the directed cycle.
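A minimal sketch of the time shift as a matrix, assuming the finite periodic time model in which node n-1 wraps around to node 0. The helper name `cyclic_shift_matrix` and the test signal are illustrative choices, not taken from the slides.

```python
import numpy as np

def cyclic_shift_matrix(n: int) -> np.ndarray:
    """n x n cyclic shift: ones on the first subdiagonal and in the top-right corner."""
    # Rolling the rows of the identity down by one puts a 1 at C[i, i-1] (mod n).
    return np.roll(np.eye(n, dtype=int), 1, axis=0)

n = 5
C = cyclic_shift_matrix(n)

# Read as an adjacency matrix, C[i, j] = 1 means an edge from node j to node i,
# so C describes the directed cycle 0 -> 1 -> ... -> n-1 -> 0.
edges = [(int(j), int(i)) for i, j in zip(*np.nonzero(C))]

s = np.arange(n)      # time signal s = (0, 1, 2, 3, 4)
shifted = C @ s       # each sample moves one step forward, the last wraps to the front
print(shifted)
```

Each node has exactly one incoming and one outgoing edge, which is what makes the time shift a special case of a graph shift.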
DSP_G: Graph Signals. Outline: graph signal, shift, filters, filtering/convolution, impulse response, A- and Fourier transforms, spectrum, frequency response. Examples of graph signals: time signals (cosine signal, k = 0, …, 5); average temperature in US cities; website topics in hyperlinked blogs; average number of tweets.
DSP_G: Graph Shift. Shift: the adjacency matrix A. The graph shift is a local operation: it replaces the signal value at each node by a weighted linear combination of the values at its neighbors (first-order interpolation, weighted averaging, regression on graphs).
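The graph shift can be sketched as a single matrix-vector product. The 4-node weighted graph and the signal values below are hypothetical, and the convention that A[i, j] is the weight of the edge from node j into node i is an assumption of this sketch.

```python
import numpy as np

# Hypothetical weighted directed graph: A[i, j] is the weight of the edge j -> i.
A = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.7],
    [0.0, 0.0, 1.0, 0.0],
])
s = np.array([10.0, 20.0, 30.0, 40.0])   # one signal value per node

def graph_shift(A: np.ndarray, s: np.ndarray) -> np.ndarray:
    """Replace each node's value by the weighted sum of its in-neighbors' values."""
    return A @ s

shifted = graph_shift(A, s)
# Node 0 receives 0.5*20 + 0.5*30, node 2 receives 0.3*20 + 0.7*40, and so on.
print(shifted)
```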
DSP_G: Graph Filtering. A graph filter H maps a graph signal s to the output H s. Shift invariance: H and A commute; H and A have the same eigenvectors; the graph filter is a polynomial in the shift A.
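A short sketch of a filter built as a polynomial in the shift, with a numerical check that it commutes with A. The graph, the taps, and the helper `polynomial_filter` are illustrative assumptions.

```python
import numpy as np

def polynomial_filter(A: np.ndarray, taps) -> np.ndarray:
    """Return H = taps[0]*I + taps[1]*A + ... + taps[L]*A^L."""
    n = A.shape[0]
    H = np.zeros((n, n))
    power = np.eye(n)            # A^0
    for h in taps:
        H += h * power
        power = power @ A        # next power of the shift
    return H

# Hypothetical directed graph and filter taps.
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
])
taps = [0.5, 0.3, 0.2]
H = polynomial_filter(A, taps)

s = np.array([1.0, 2.0, 3.0, 4.0])
shift_then_filter = H @ (A @ s)
filter_then_shift = A @ (H @ s)
print(np.allclose(shift_then_filter, filter_then_shift))
```

Because every power of A commutes with A, any such polynomial is shift invariant by construction.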
DSP_G: Fourier Transform. Discrete Fourier transform (DFT): the Fourier transform of a time signal s is the product of the DFT matrix with s.
DSP_G: Graph Shift & FT. Diagonalization of the shift: the DFT matrix diagonalizes the cyclic time shift, so the DFT basis vectors are the eigenvectors of the shift.
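The relation between the DFT and the time shift can be checked numerically. This sketch assumes the unitary DFT normalization (1/sqrt(n)) and the cyclic shift with ones on the first subdiagonal; both are conventions chosen here for illustration.

```python
import numpy as np

n = 8
k = np.arange(n)

# Cyclic shift: (C s)[i] = s[i-1], with wrap-around.
C = np.roll(np.eye(n), 1, axis=0)

# Unitary DFT matrix, F[k, m] = exp(-2j*pi*k*m/n) / sqrt(n).
F = np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)

# Conjugating the shift by the DFT should leave a diagonal matrix.
Lam = F @ C @ F.conj().T
eigenvalues = np.diag(Lam)

print(np.allclose(Lam, np.diag(eigenvalues)))              # off-diagonal entries vanish
print(np.allclose(eigenvalues, np.exp(-2j * np.pi * k / n)))
```

The diagonal entries are the n-th roots of unity, which play the role of the time frequencies.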
DSP_G: Graph Fourier Transform. For simplicity, assume A is diagonalizable: A = V Λ V^{-1}. Graph Fourier transform: ŝ = F s, with F = V^{-1}. Inverse graph Fourier transform: s = F^{-1} ŝ = V ŝ.
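A minimal sketch of the graph Fourier transform pair, assuming A is diagonalizable as A = V Λ V^{-1} and taking the transform matrix to be V^{-1}. The weighted path graph below is a hypothetical example chosen because its eigenvalues are distinct.

```python
import numpy as np

# Hypothetical weighted undirected path graph on 4 nodes.
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 2.0, 0.0, 3.0],
    [0.0, 0.0, 3.0, 0.0],
])
s = np.array([1.0, -2.0, 0.5, 3.0])

# Eigendecomposition of the shift: columns of V are the graph Fourier basis.
eigvals, V = np.linalg.eig(A)
F = np.linalg.inv(V)          # graph Fourier transform matrix (assumed F = V^{-1})

s_hat = F @ s                 # forward transform: spectrum of s
s_rec = V @ s_hat             # inverse transform: back to the node domain

print(np.allclose(s_rec, s))
```

For large graphs one would avoid forming the explicit inverse; it is kept here only to mirror the definition.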
Frequency: time frequencies.
DSP_G: Frequency Response. Graph filter: H = h(A), a polynomial in A. Graph filter frequency response: h(λ), the same polynomial evaluated at the eigenvalues λ of A.
DSP_G: Convolution Theorem. Filtering/convolution theorem: filtering a graph signal multiplies its spectrum by the frequency response of the filter.
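The convolution theorem can be verified on a small example. This sketch assumes a polynomial filter h(A) and the transform F = V^{-1} from the eigendecomposition of A; the graph, signal, and taps are hypothetical.

```python
import numpy as np

# Hypothetical weighted path graph, signal, and filter taps h_0, h_1, h_2.
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 2.0, 0.0, 3.0],
    [0.0, 0.0, 3.0, 0.0],
])
s = np.array([1.0, -2.0, 0.5, 3.0])
taps = [1.0, -0.5, 0.25]

eigvals, V = np.linalg.eig(A)
F = np.linalg.inv(V)                      # graph Fourier transform matrix

# Filter in the node domain: H = h_0 I + h_1 A + h_2 A^2.
H = sum(h * np.linalg.matrix_power(A, k) for k, h in enumerate(taps))

# Frequency response: the same polynomial evaluated at each eigenvalue.
# np.polyval expects the highest-degree coefficient first, hence the reversal.
response = np.polyval(taps[::-1], eigvals)

filtered_spectrum = F @ (H @ s)           # transform of the filtered signal
product_spectrum = response * (F @ s)     # pointwise product in the frequency domain

print(np.allclose(filtered_spectrum, product_spectrum))
```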
DSP_G: Graph Low- and High-Pass Signals. Ordering frequencies: by total variation, evaluated for s a graph frequency component.
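A hedged sketch of frequency ordering by total variation. It assumes the total variation of a signal is the l1 norm of the difference between the signal and its shifted version, with the shift normalized by the largest eigenvalue magnitude; the slide's own formula is not reproduced here, so treat this definition as an assumption. The directed cycle is used so the result can be compared with ordinary time frequencies.

```python
import numpy as np

def total_variation(A: np.ndarray, s: np.ndarray) -> float:
    """Assumed definition: || s - A s / |lambda_max| ||_1."""
    lam_max = np.max(np.abs(np.linalg.eigvals(A)))
    return float(np.sum(np.abs(s - (A @ s) / lam_max)))

n = 8
m = np.arange(n)
C = np.roll(np.eye(n), 1, axis=0)        # directed cycle (time shift)

# Frequency components of the cycle: complex exponentials, scaled to unit l1 norm.
tv = np.array([
    total_variation(C, np.exp(2j * np.pi * k * m / n) / n)
    for k in range(n)
])

order = np.argsort(tv)                   # components from lowest to highest variation
print(np.round(tv, 3))
print(order)
```

On the cycle this gives 2|sin(πk/n)| for component k, so the constant signal has the lowest variation and the alternating signal the highest, matching the usual notion of low and high frequency.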
DSP_G: Political Blogs. Data: 1224 conservative and liberal political blogs (Adamic, Glance). Graph: 1224 nodes (blogs), edges (hyperlinks). Graph signal: blue = 1, red = -1. Sandryhaila, Moura, DSP on Graphs, IEEE Tr-SP, 2013.
DSP_G: Blogosphere Low- vs High-Pass. 1224 hyperlinked political blogs, conservative and liberal (Adamic, Glance). Adjacency matrix A given by the hyperlinks. Shown: the graph signal, its Fourier transform, and its frequency representation.
DSP_G: Classification – Political Blogosphere. 1224 hyperlinked political blogs, conservative and liberal (Adamic, Glance). Adjacency matrix A given by the hyperlinks. Semi-supervised classifier: a designed graph filter followed by a threshold.
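A toy sketch of the filter-plus-threshold idea, not the filter design used in the talk. The graph (two 4-node cliques joined by one edge), the single known label per group, and the filter taps are all hypothetical; in practice the taps would be designed from the labeled data.

```python
import numpy as np

def two_cliques(size: int) -> np.ndarray:
    """Adjacency matrix of two cliques of `size` nodes joined by a single edge."""
    n = 2 * size
    A = np.zeros((n, n))
    A[:size, :size] = 1.0
    A[size:, size:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[size - 1, size] = A[size, size - 1] = 1.0   # bridge between the groups
    return A

A = two_cliques(4)

# Known labels: +1 at node 0, -1 at node 7, 0 where the label is unknown.
known = np.zeros(8)
known[0], known[7] = 1.0, -1.0

# Hypothetical low-order filter h(A) = I + A + A^2 that spreads labels to neighbors.
taps = [1.0, 1.0, 1.0]
H = sum(h * np.linalg.matrix_power(A, k) for k, h in enumerate(taps))

scores = H @ known             # filtered label signal
predicted = np.sign(scores)    # threshold at zero
print(predicted)
```

On this toy graph every node receives the label of its own clique; real graphs such as the blog network are far less cleanly separated.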
DSP_G: Classification – Political Blogosphere. Political blogs (Adamic, Glance). [Figure: classifier results, P = 10; panels: most connected, random.] AGF: Chen, Sandryhaila, Moura, Kovacevic, GlobalSIP, 2013. DF: Diffusion Functions, Szlam, Coifman, Maggioni, J. Mach. Learn. Res., 2008.
DSP_G: Service Provider – Predict Customer Behavior. 3.7 million customers: 3.6 M non-churners, 100K churners. Data: adjacency matrix, 10-month log. Goal: learn from a few churners in month A who will churn in month A+1.
DSP_G: Service Provider – Predict Customer Behavior. Classifier. Deri & Moura, ICASSP, May 2014.
DSP_G: Classification. Classification by regularization. News articles dataset (Belkin, Matveeva, Niyogi, 2004): 18,000 news articles, 20 topics. Graph: randomly select 500 articles from each class; represent each article as a vector over the 6000 most common keywords; weight edges by the cosine distance between keyword vectors. Sandryhaila, Moura, GlobalSIP, 2013.
DSP_G: News Article Dataset – Classification. Same news articles dataset and graph construction (Belkin, Matveeva, Niyogi, 2004). Sandryhaila, Moura, GlobalSIP, 2013.
DSP_G: NIST Digits Database Classification. NIST database: 70,000 grayscale images of handwritten digits from 0 to 9. Randomly select 3000 images of each digit to construct a testing dataset of 30,000 images. The representation graph for this dataset is constructed by viewing each image as a point in a 28² = 784-dimensional vector space, computing Euclidean distances between all images, and connecting each image with its six nearest neighbors.
Big Data. Product graphs: Kronecker product; Cartesian product; strong product.
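The three products can be sketched from the standard graph-theoretic definitions, which are assumed here since the slide's formulas are not available: Kronecker A1⊗A2, Cartesian A1⊗I + I⊗A2, and strong as the sum of the two. The function names and the two-node factor graphs are illustrative.

```python
import numpy as np

def kronecker_product(A1: np.ndarray, A2: np.ndarray) -> np.ndarray:
    """Adjacency matrix A1 (x) A2 of the Kronecker product graph."""
    return np.kron(A1, A2)

def cartesian_product(A1: np.ndarray, A2: np.ndarray) -> np.ndarray:
    """Adjacency matrix A1 (x) I + I (x) A2 of the Cartesian product graph."""
    n1, n2 = A1.shape[0], A2.shape[0]
    return np.kron(A1, np.eye(n2)) + np.kron(np.eye(n1), A2)

def strong_product(A1: np.ndarray, A2: np.ndarray) -> np.ndarray:
    """Strong product: union of the Kronecker and Cartesian edge sets."""
    return kronecker_product(A1, A2) + cartesian_product(A1, A2)

# Hypothetical factors: two copies of a single undirected edge.
edge = np.array([[0.0, 1.0], [1.0, 0.0]])

A_kron = kronecker_product(edge, edge)   # two disjoint edges
A_cart = cartesian_product(edge, edge)   # a 4-cycle
A_strong = strong_product(edge, edge)    # the complete graph on 4 nodes

print(A_kron.sum(axis=1), A_cart.sum(axis=1), A_strong.sum(axis=1))
```

A product graph with n1·n2 nodes is thus described by two much smaller factor matrices, which is the property that matters for large data sets.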
Filtering. Filtering on Cartesian product graphs; on Kronecker product graphs; on strong product graphs.
Big Data. Parallel constructs; vectorizable constructs.
Big Data: Fourier Transform. Cartesian product graphs; Kronecker and strong product graphs.
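A sketch of why product graphs help with the Fourier transform, under the standard product definitions (Kronecker A1⊗A2, Cartesian A1⊗I + I⊗A2, strong as their sum), which are assumed here. For diagonalizable factors, all three products share the eigenvectors V1⊗V2, so the large transform is the Kronecker product of the two small ones. The factor graphs are hypothetical.

```python
import numpy as np

# Hypothetical factors: a 3-node path and a single edge.
A1 = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
A2 = np.array([[0.0, 1.0], [1.0, 0.0]])
n1, n2 = A1.shape[0], A2.shape[0]

lam1, V1 = np.linalg.eig(A1)
lam2, V2 = np.linalg.eig(A2)

# Shared eigenvectors of all three products; column i*n2 + j is v1_i (x) v2_j.
V = np.kron(V1, V2)

# Eigenvalues in the same column order.
cart_eigs = np.add.outer(lam1, lam2).ravel()         # lam1[i] + lam2[j]
kron_eigs = np.multiply.outer(lam1, lam2).ravel()    # lam1[i] * lam2[j]
strong_eigs = kron_eigs + cart_eigs

A_kron = np.kron(A1, A2)
A_cart = np.kron(A1, np.eye(n2)) + np.kron(np.eye(n1), A2)
A_strong = A_kron + A_cart

# Product-graph Fourier transform assembled from the two small transforms.
F = np.kron(np.linalg.inv(V1), np.linalg.inv(V2))

print(np.allclose(A_cart @ V, V * cart_eigs))
print(np.allclose(F @ V, np.eye(n1 * n2)))
```

Only the n1 × n1 and n2 × n2 eigendecompositions are ever computed; the n1·n2-dimensional transform follows from them.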