Center for Science of Information
Structural Information: Progress Report
Wojciech Szpankowski, Purdue University
(jointly with A. Grama, A. Magner, and J. Sreedharan)
Institutions: Bryn Mawr, Howard, MIT, Princeton, Purdue, Stanford, Texas A&M, UC Berkeley, UC San Diego, UIUC, University of Hawaii
National Science Foundation/Science & Technology Centers Program
Outline
1. Science of Information
2. TIMES: Temporal Information Maximally Extracted from Structure
3. Structural Compression
4. TIMES: Recovering Partial Order
5. Experimental Results
   - Synthetic Network
   - Real network
   - Functional Brain Network
What is Science of Information?
▪ Claude Shannon laid the foundation of information theory, demonstrating that problems of data transmission and compression (i.e., reliably reproducing data) can be precisely modeled, formulated, and analyzed.
▪ SCIENCE OF INFORMATION builds on Shannon’s principles to address key challenges in understanding information that nowadays is not only communicated but also acquired, curated, organized, aggregated, managed, processed, suitably abstracted and represented, analyzed, inferred, valued, secured, and used in various scientific, engineering, and socio-economic processes.
CSoI MISSION: Advance science and technology through a new quantitative understanding of the representation, communication, and processing of information in biological, physical, social, and engineering systems.
Center’s Goals
▪ Extend Information Theory to meet new challenges in biology, economics, data & social sciences, and physical distributed systems.
▪ Understand new aspects of information (embedded) in structure, time, space, semantics, dynamic information, limited resources, complexity, representation invariant information, and cooperation & dependency.
Motivation: Infection Spread
Infection network: nodes are patients; edges form among friends and family.
▪ Only structural information is available.
▪ Patients are not admitted in the order in which they were infected.
Further Motivation
▪ Financial transaction networks: flow of capital
▪ Spread of infectious diseases: origin and initial carriers
▪ Social networks: spread of information
▪ Network of biochemical reactions (protein-protein interaction network): study of the phylogenetic tree
Cancer proteins tend to be ancient proteins [Srivastava et al., Nature 2010]
Ebola spread network [Saey, ScienceNews Dec 2015]
Formulation
[Figure: two drawings of a graph, each with nodes labeled 1 to 12.]
Minimax Risk
A node age recovery problem is a tuple of:
▪ a distortion measure between permutations
▪ a random graph model
▪ a random adversary function
A node age estimator is a function. [Formula not recovered.]
Minimax risk for a random graph model and a distortion measure. [Formula not recovered; its annotated terms are the distortion measure, relabeled graph, adversary distribution, random graph model, and estimator.]
Structural Quantities on Graphs
Sets of permutations associated with a random graph model
▪ Set of feasible permutations
▪ Set of admissible graphs
[Figure: small example graphs with node labels 1 to 4; annotation: "(unlabeled structures)".]
Lower Bounds on Minimax Risk
▪ Exact recovery
▪ Approximate recovery (Kendall tau distance)
Theorem: [Statement not recovered; its annotated terms are the set of admissible graphs and the set of feasible permutations.]
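The Kendall tau distance named above counts the node pairs that two orderings rank in opposite order. The following is a minimal sketch of that standard definition; the function name `kendall_tau_distance` and the list-based representation of an ordering are illustrative assumptions, not taken from the slides.

```python
from itertools import combinations


def kendall_tau_distance(sigma, tau):
    """Count pairs of items that sigma and tau place in opposite order.

    sigma and tau are sequences listing the same items, oldest first
    (an assumed representation chosen for this sketch).
    """
    if len(sigma) != len(tau) or set(sigma) != set(tau):
        raise ValueError("both orderings must list the same items")
    pos_s = {v: i for i, v in enumerate(sigma)}
    pos_t = {v: i for i, v in enumerate(tau)}
    discordant = 0
    for u, v in combinations(sigma, 2):
        # The pair is discordant when the two orderings disagree on it.
        if (pos_s[u] - pos_s[v]) * (pos_t[u] - pos_t[v]) < 0:
            discordant += 1
    return discordant
```

For an estimated ordering and the true arrival order on n nodes, the distance ranges from 0 (identical) to n(n-1)/2 (fully reversed).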
Erdős–Rényi & Preferential Attachment Models
Preferential Attachment model
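As a rough illustration of the preferential attachment model, the sketch below grows a multigraph in which each arriving node attaches m edges to existing nodes chosen with probability proportional to their current degree. The initial condition (node 1 joined to node 0 by m parallel edges), the function name, and the parameters `n`, `m`, and `seed` are assumptions made for this example; the slides do not specify a convention.

```python
import random


def preferential_attachment_graph(n, m, seed=None):
    """Return edges (new_node, old_node) of a preferential attachment multigraph.

    Nodes are labeled 0..n-1 in arrival order, so the label is the true age rank.
    Assumed start: node 1 is joined to node 0 by m parallel edges.
    """
    if n < 2 or m < 1:
        raise ValueError("need n >= 2 and m >= 1")
    rng = random.Random(seed)
    edges = [(1, 0)] * m
    # Each node appears in `endpoints` once per incident edge end, so a uniform
    # draw from this list selects a node with probability proportional to degree.
    endpoints = [1, 0] * m
    for t in range(2, n):
        # Draw all m targets before adding node t, so no self-loops occur.
        targets = [rng.choice(endpoints) for _ in range(m)]
        for v in targets:
            edges.append((t, v))
            endpoints.extend((t, v))
    return edges
```

Shuffling the node labels of the returned edge list gives the kind of input a node age estimator would see: structure only, with arrival order hidden.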
Bad News!
Inapproximability result for Erdős–Rényi and preferential attachment graphs
[Formula not recovered; its annotated terms are the adversary and the estimator.]
Further Bad News!!
ML estimation is not a good approach.
The maximum likelihood estimation solution set satisfies [formula not recovered] with high probability.
Compression of Graphs & Structures
Theorem (Structural entropy for a broad class of graph models)
For a broad class of random graph models, [formulas not recovered].
Asymmetry of Preferential Attachment Case
Structural Entropy for PAG
Theorem: Entropy of preferential attachment graphs
Theorem: Structural entropy of preferential attachment graphs
Results for the node age recovery problem so far are pessimistic.
Can we do better?
Partial Orders and Binning
Look for partial orders instead of total orders.
[Figure: a 12-node graph and a grouping of its nodes into Bin 1 through Bin 5.]
Precision and Recall
▪ Recall: How much are we able to recover?
▪ Precision: How good are the guessed pairs?
▪ Density: [definition not recovered]
Quantities annotated in the definitions: # of correct pairs; # of pairs ordered by bins (excluding those inside bins).
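The three measures can be sketched for a binning as follows. The slide formulas are not recoverable from the text, so the denominators here are assumptions: precision is taken as correct pairs over pairs ordered by the bins, recall as correct pairs over all pairs, and density as ordered pairs over all pairs. The function name and argument layout are also illustrative.

```python
from itertools import combinations


def bin_order_metrics(bins, true_rank):
    """Return (precision, recall, density) of a binning against true ages.

    bins: list of lists of nodes, oldest bin first.
    true_rank: dict node -> arrival rank (smaller means older).
    The denominators used below are assumptions of this sketch.
    """
    bin_of = {v: i for i, group in enumerate(bins) for v in group}
    nodes = list(bin_of)
    total_pairs = len(nodes) * (len(nodes) - 1) // 2
    ordered = 0
    correct = 0
    for u, v in combinations(nodes, 2):
        if bin_of[u] == bin_of[v]:
            continue  # pairs inside one bin are left unordered
        ordered += 1
        if (bin_of[u] < bin_of[v]) == (true_rank[u] < true_rank[v]):
            correct += 1
    precision = correct / ordered if ordered else 0.0
    recall = correct / total_pairs if total_pairs else 0.0
    density = ordered / total_pairs if total_pairs else 0.0
    return precision, recall, density
```

Under these assumed definitions, recall equals precision times density, which makes the trade-off explicit: coarser bins order fewer pairs (lower density) but can order them more reliably (higher precision).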
Constrained Optimization Problem
Different approach: phrase as an integer program.
Maximize precision over the set of partial orders, subject to [constraints not recovered].
Approximating via Peeling algorithm
[Figure: a 12-node graph and the bins (Bin 1 through Bin 5) produced by peeling.]
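The slide shows peeling only as a figure, so the sketch below is one plausible reading rather than the authors' exact procedure: repeatedly strip all nodes of current minimum degree, treat the first layer removed as the youngest bin, and report bins oldest first. The function name and the edge-list input are assumptions.

```python
def peel_into_bins(edges):
    """Group nodes into bins by repeatedly removing minimum-degree nodes.

    edges: iterable of (u, v) pairs; parallel edges count toward degree.
    Returns a list of bins, oldest first (last layer peeled = oldest).
    This is an assumed interpretation of "peeling", not the slides' definition.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    remaining = set(adj)
    layers = []
    while remaining:
        d_min = min(degree[v] for v in remaining)
        layer = sorted(v for v in remaining if degree[v] == d_min)
        remaining.difference_update(layer)
        # Removing the layer lowers the degree of its surviving neighbors.
        for v in layer:
            for w in adj[v]:
                if w in remaining:
                    degree[w] -= 1
        layers.append(layer)
    layers.reverse()
    return layers
```

On a star graph the leaves are peeled first and the hub last, so the hub lands in the oldest bin, matching the intuition that high-degree nodes in a preferential attachment graph tend to be early arrivals.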
Numerical Results
LP relaxation gives an upper bound.
Theoretical Results
Perfect pair
Theorem: Typical number of perfect pairs
Theoretical Results
Conjecture: Number of descendants of any given vertex
Experiments: Synthetic Graphs
▪ Ranking with bins given by the Peeling algorithm
▪ How robust is the algorithm? Uniform Attachment model
Experiments: Real-World Networks
▪ Perfect pairs only (nodes with a directed path between them)
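Taking the slide's description of perfect pairs as nodes with a directed path between them, they can be enumerated with a reachability search. The sketch assumes a directed acyclic graph whose edges point from the later-arriving node to the earlier one; that edge direction and the function name are assumptions for illustration.

```python
def perfect_pairs(dag_edges):
    """Return all (younger, older) pairs joined by a directed path.

    dag_edges: iterable of (u, v) with an edge from u to v; this sketch
    assumes u arrived after v and that the graph has no directed cycles.
    """
    succ = {}
    for u, v in dag_edges:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
    pairs = set()
    for source in succ:
        # Depth-first search for every node reachable from `source`.
        seen = set()
        stack = list(succ[source])
        while stack:
            w = stack.pop()
            if w in seen:
                continue
            seen.add(w)
            stack.extend(succ[w])
        pairs.update((source, w) for w in seen if w != source)
    return pairs
```

For the edges (1, 0), (2, 1), (3, 0), the pairs (2, 3) and (1, 3) are not returned because no directed path joins them, so their relative age cannot be read off the structure.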
Experiments: Brain Networks
Find age orderings of regions of two different species, based on fMRI images of the same activity.
Conjecture: There exists a high correlation between these orderings for species evolved from the same genetic parent.
Network formation
● The network has 46 nodes, each of which represents a region in the brain.
● An initial network is formed from fMRI images of a human brain in resting state.
● Each node of the initial network is a voxel, and there are 243,648 voxels.
● Each voxel has a time series of data for ~350 s.
● The Pearson correlation coefficient is computed between the time series data of each pair of voxels. If the correlation > 0.8, we form an edge between the voxels.
● To form a network of regions, we take the logical OR of the rows and columns in the adjacency matrix of the voxel network corresponding to each region.
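The network formation steps above can be sketched in plain Python on toy data. The 0.8 threshold and the logical OR aggregation come from the slide; the function names, the `region_of` mapping from voxel index to region index, and the choice to ignore edges inside a single region are assumptions of this example. A real pipeline over 243,648 voxels would need a vectorized implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def voxel_adjacency(series, threshold=0.8):
    """Link two voxels when their time series correlate above the threshold."""
    n = len(series)
    adj = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if pearson(series[i], series[j]) > threshold:
                adj[i][j] = adj[j][i] = True
    return adj


def region_adjacency(adj, region_of, n_regions):
    """OR together the voxel rows and columns that belong to each region.

    Two regions are linked if any voxel of one is linked to any voxel of
    the other. Links inside a region are ignored (an assumption here).
    """
    out = [[False] * n_regions for _ in range(n_regions)]
    for i, row in enumerate(adj):
        for j, linked in enumerate(row):
            if linked and region_of[i] != region_of[j]:
                out[region_of[i]][region_of[j]] = True
    return out
```

For example, `region_adjacency(voxel_adjacency(series), region_of, 46)` would produce a 46-node region network from voxel time series, given a voxel-to-region assignment.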