

1. Prediction for Processes on Network Graphs

Gonzalo Mateos
Dept. of ECE and Goergen Institute for Data Science
University of Rochester
gmateosb@ece.rochester.edu
http://www.ece.rochester.edu/~gmateosb/
April 18, 2019

2. Nearest neighbors

◮ Nearest-neighbor prediction
◮ Markov random fields
◮ Kernel regression on graphs
◮ Case study: Predicting protein function

3. Processes on network graphs

◮ Motivation: study complex systems of elements and their interactions
◮ So far we have studied network graphs as representations of these systems
◮ Often some quantity associated with each of the elements is of interest
◮ Quantities may be influenced by the interactions among elements
  1) Behaviors and beliefs influenced by social interactions
  2) Functional roles of proteins influenced by their sequence similarity
  3) Spread of epidemics influenced by proximity of individuals
◮ Can think of these quantities as random processes defined on graphs
◮ Static processes $\{X_i\}_{i \in V}$ and dynamic processes $\{X_i(t)\}_{i \in V}$ for $t \in \mathbb{N}$ or $\mathbb{R}_+$

4. Nearest-neighbor prediction

◮ Consider prediction of a static process $X := \{X_i\}_{i \in V}$ on a graph
⇒ Process may be truly static, or a snapshot of a dynamic process

Static network process prediction: predict $X_i$, given observations of the adjacency matrix $\mathbf{Y} = \mathbf{y}$ and of all attributes except $X_i$, i.e., $X_{(-i)} = x_{(-i)}$.

◮ Idea: exploit the network graph structure in $\mathbf{y}$ for prediction
◮ For binary $X_i \in \{0, 1\}$, say, a simple nearest-neighbor method predicts
$$\hat{X}_i = \mathbb{I}\left( \frac{\sum_{j \in \mathcal{N}_i} x_j}{|\mathcal{N}_i|} > \tau \right)$$
⇒ Average of the observed process in the neighborhood of $i$
⇒ Called the 'guilt-by-association' or graph-smoothing method (a code sketch follows)
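A minimal sketch of this rule in Python, assuming a networkx graph and a dict of observed binary attributes; the helper name nn_predict and the keyword tau are illustrative, not from the slides:

import networkx as nx
import numpy as np

def nn_predict(G, x, i, tau=0.5):
    """Predict binary X_i by thresholding the average of x over i's neighbors."""
    neighbors = list(G.neighbors(i))
    if not neighbors:                          # isolated node: rule is undefined
        return None
    frac = np.mean([x[j] for j in neighbors])  # fraction of active neighbors
    return int(frac > tau)

# Toy usage: a 5-node path whose attributes cluster by location
G = nx.path_graph(5)
x = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
print(nn_predict(G, x, 2))   # neighbors {1, 3} average to 0.5, not > 0.5 -> 0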

5. Example: predicting law practice

◮ Network $G^{obs}$ of working relationships among lawyers [Lazega'01]
◮ Nodes are $N_v = 36$ partners, edges indicate partners worked together

[Figure: drawing of the 36-lawyer collaboration network, nodes labeled 1-36]

◮ Data includes various node-level attributes $\{X_i\}_{i \in V}$, including
⇒ Type of practice, i.e., litigation (red) and corporate (cyan)
◮ Suspect lawyers collaborate more with peers in the same legal practice
⇒ Knowledge of collaboration useful in predicting type of practice

6. Example: predicting law practice (cont.)

◮ Q: In predicting practice $X_i$, how useful is the value of one neighbor?
⇒ Breakdown of the 115 edges based on the practice of the incident lawyers

             Litigation  Corporate
Litigation       29         43
Corporate        43         43

◮ Looking at the rows in this table
◮ Litigation lawyers' collaborators are 40% litigation, 60% corporate
◮ Collaborations of corporate lawyers are evenly split
⇒ Suggests using a single neighbor has little predictive power
◮ But roughly 60% (29 + 43 = 72 of 115) of edges join lawyers with a common practice
⇒ Suggests that, in aggregate, knowledge of collaboration is informative (quick check below)
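The quoted percentages follow from the table by simple arithmetic; a quick check, with the edge counts hard-coded from the table above:

# Edge counts: litigation-litigation, litigation-corporate, corporate-corporate
ll, lc, cc = 29, 43, 43
total = ll + lc + cc               # 115 edges in all
print(ll / (ll + lc))              # ~0.40: litigation lawyers' collaborators in litigation
print(lc / (ll + lc))              # ~0.60: litigation lawyers' collaborators in corporate
print((ll + cc) / total)           # ~0.63: edges joining lawyers with a common practice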

7. Example: predicting law practice (cont.)

◮ Incorporate information of all collaborators as in nearest neighbors
◮ Let $X_i = 0$ if lawyer $i$ practices litigation, and $X_i = 1$ for corporate

[Figure: histograms of the fraction of corporate neighbors, among litigation lawyers (left) and among corporate lawyers (right)]

◮ Nearest-neighbor prediction rule
$$\hat{X}_i = \mathbb{I}\left( \frac{\sum_{j \in \mathcal{N}_i} x_j}{|\mathcal{N}_i|} > 0.5 \right)$$
⇒ Infers correctly 13 of the 16 corporate lawyers (i.e., 81%)
⇒ Infers correctly 16 of the 18 litigation lawyers (i.e., 89%)
⇒ Overall error rate is just under 15% (see the evaluation sketch below)
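A sketch of how such accuracies could be reproduced, reusing the hypothetical nn_predict helper from the earlier sketch; each node is predicted from its neighbors' true labels and compared against its own, and nodes with no neighbors are skipped since the rule is undefined for them:

def nn_error_rate(G, x, tau=0.5):
    """Leave-one-out style error rate of the nearest-neighbor rule."""
    preds = {i: nn_predict(G, x, i, tau) for i in G.nodes}
    scored = {i: p for i, p in preds.items() if p is not None}  # drop isolated nodes
    n_errors = sum(p != x[i] for i, p in scored.items())
    return n_errors / len(scored)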

8. Modeling static network processes

◮ Nearest-neighbor methods may seem rather informal and simple
⇒ But competitive with more formal, model-based approaches
◮ Still, model-based methods have certain potential advantages:
  a) Probabilistically rigorous predictive statements;
  b) Formal inference for model parameters; and
  c) Natural mechanisms for handling missing data
◮ Model the process $X := \{X_i\}_{i \in V}$ given an observed graph $\mathbf{Y} = \mathbf{y}$
⇒ Markov random field (MRF) models
⇒ Kernel-regression models using graph kernels

9. Markov random fields

◮ Nearest-neighbor prediction
◮ Markov random fields
◮ Kernel regression on graphs
◮ Case study: Predicting protein function

10. Markov random field models

◮ Consider a graph $G(V, E)$ with given adjacency matrix $\mathbf{A}$
⇒ Collection of discrete RVs $X = [X_1, \ldots, X_{N_v}]^\top$ defined on $V$
◮ Def: process $X$ is a Markov random field (MRF) on $G$ if
$$P(X_i = x_i \mid X_{(-i)} = x_{(-i)}) = P(X_i = x_i \mid X_{\mathcal{N}_i} = x_{\mathcal{N}_i}), \quad i \in V$$
◮ $X_i$ conditionally independent of all other $X_k$, given its neighbors' values
◮ 'Spatial' Markov property, generalizing Markov chains in time; e.g., on the path graph $1 - 2 - 3$ it reduces to the familiar $P(X_3 = x_3 \mid X_1 = x_1, X_2 = x_2) = P(X_3 = x_3 \mid X_2 = x_2)$
◮ $G$ defines the neighborhoods $\mathcal{N}_i$, hence the dependencies
◮ Roots in statistical mechanics, Ising model of ferromagnetism [Ising'25]
⇒ MRFs used extensively in spatial statistics and image analysis
◮ Definition requires a technical condition: $P(X = x) > 0$ for all $x$

11. MRFs and Gibbs random fields

◮ MRFs are equivalent to Gibbs random fields $X$, having joint distribution
$$P(X = x) = \frac{1}{\kappa} \exp\{U(x)\}$$
⇒ Energy function $U(\cdot)$, partition function $\kappa = \sum_{x} \exp\{U(x)\}$
⇒ Equivalence follows from the Hammersley-Clifford theorem
◮ Energy function decomposable over the maximal cliques in $G$
$$U(x) = \sum_{c \in \mathcal{C}} U_c(x)$$
⇒ Defined via clique potentials $U_c(\cdot)$, set $\mathcal{C}$ of maximal cliques in $G$
◮ Can show $P(X_i = x_i \mid X_{(-i)} = x_{(-i)})$ depends only on cliques involving vertex $i$ (toy example below)
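A brute-force sketch of the Gibbs form on a toy graph; the triangle, the β value, and the pairwise energy (an auto-model choice, see the next slide) are all illustrative. Exhaustive enumeration is feasible only for very small $N_v$, since κ sums over $2^{N_v}$ configurations:

from itertools import product
import math

edges = [(0, 1), (1, 2), (0, 2)]     # triangle graph: cliques involve all pairs
beta = 0.8                           # illustrative interaction strength

def U(x):
    """Pairwise energy U(x) = beta * sum over edges of x_i * x_j."""
    return beta * sum(x[i] * x[j] for i, j in edges)

configs = list(product([0, 1], repeat=3))
kappa = sum(math.exp(U(x)) for x in configs)        # partition function
P = {x: math.exp(U(x)) / kappa for x in configs}    # Gibbs probabilities
print(sum(P.values()))                              # sanity check: 1.0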

12. Example: auto-logistic MRFs

◮ May specify MRFs through the choice of clique potentials $U_c(\cdot)$
◮ Ex: the class of auto models is defined through the constraints:
  (i) Only cliques $c \in \mathcal{C}$ of size one and two have $U_c \neq 0$
  (ii) Conditional probabilities $P(X_i \mid X_{\mathcal{N}_i})$ have an exponential family form
◮ For binary RVs $X_i \in \{0, 1\}$, the energy function takes the form
$$U(x) = \sum_{i \in V} \alpha_i x_i + \sum_{(i,j) \in E} \beta_{ij} x_i x_j$$
◮ Resulting MRF is known as the auto-logistic model, because
$$P(X_i = 1 \mid X_{\mathcal{N}_i} = x_{\mathcal{N}_i}) = \frac{\exp\{\alpha_i + \sum_{j \in \mathcal{N}_i} \beta_{ij} x_j\}}{1 + \exp\{\alpha_i + \sum_{j \in \mathcal{N}_i} \beta_{ij} x_j\}}$$
⇒ Logistic regression of $x_i$ on its neighboring $x_j$'s (see the sketch below)
⇒ Ising model a special case, when $G$ is a regular lattice
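A one-function sketch of the auto-logistic conditional, under the homogeneous parametrization $\alpha_i = \alpha$, $\beta_{ij} = \beta$ introduced on the next slide; the function name is illustrative:

import math

def autologistic_conditional(x_neighbors, alpha, beta):
    """P(X_i = 1 | X_Ni = x_Ni) = sigmoid(alpha + beta * sum_j x_j)."""
    s = alpha + beta * sum(x_neighbors)
    return 1.0 / (1.0 + math.exp(-s))   # logistic form of the displayed ratio

# Two of three neighbors active, alpha = -1, beta = 0.5 -> s = 0 -> prob 0.5
print(autologistic_conditional([1, 1, 0], alpha=-1.0, beta=0.5))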

13. Homogeneity assumptions

◮ Typical to assume that parameters $\alpha_i$ and $\beta_{ij}$ are homogeneous
◮ Ex: Specifying $\alpha_i = \alpha$ and $\beta_{ij} = \beta$ yields the conditional log-odds
$$\log \frac{P(X_i = 1 \mid X_{\mathcal{N}_i} = x_{\mathcal{N}_i})}{P(X_i = 0 \mid X_{\mathcal{N}_i} = x_{\mathcal{N}_i})} = \alpha + \beta \sum_{j \in \mathcal{N}_i} x_j$$
⇒ Linear in the number of neighbors $j$ of $i$ with $X_j = 1$
◮ Ex: Specifying $\alpha_i = \alpha + |\mathcal{N}_i| \beta_2$ and $\beta_{ij} = \beta_1 - \beta_2$ yields
$$\log \frac{P(X_i = 1 \mid X_{\mathcal{N}_i} = x_{\mathcal{N}_i})}{P(X_i = 0 \mid X_{\mathcal{N}_i} = x_{\mathcal{N}_i})} = \alpha + \beta_1 \sum_{j \in \mathcal{N}_i} x_j + \beta_2 \sum_{j \in \mathcal{N}_i} (1 - x_j)$$
⇒ Linear also in the number of neighbors $j$ of $i$ with $X_j = 0$ (derivation below)
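Where the second expression comes from: substituting $\alpha_i = \alpha + |\mathcal{N}_i|\beta_2$ and $\beta_{ij} = \beta_1 - \beta_2$ into the auto-logistic log-odds $\alpha_i + \sum_{j \in \mathcal{N}_i} \beta_{ij} x_j$ from the previous slide gives
\begin{align*}
\alpha_i + \sum_{j \in \mathcal{N}_i} \beta_{ij} x_j
  &= \alpha + |\mathcal{N}_i|\beta_2 + (\beta_1 - \beta_2) \sum_{j \in \mathcal{N}_i} x_j \\
  &= \alpha + \beta_1 \sum_{j \in \mathcal{N}_i} x_j
     + \beta_2 \Big( |\mathcal{N}_i| - \sum_{j \in \mathcal{N}_i} x_j \Big) \\
  &= \alpha + \beta_1 \sum_{j \in \mathcal{N}_i} x_j
     + \beta_2 \sum_{j \in \mathcal{N}_i} (1 - x_j).
\end{align*}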

14. MRFs for continuous random variables

◮ MRFs with continuous RVs: replace PMFs/sums with pdfs/integrals
⇒ Gaussian distribution common for analytical tractability
◮ Ex: the auto-Gaussian model specifies Gaussian $X_i \mid X_{\mathcal{N}_i} = x_{\mathcal{N}_i}$, with
$$\mathbb{E}[X_i \mid X_{\mathcal{N}_i} = x_{\mathcal{N}_i}] = \alpha_i + \sum_{j \in \mathcal{N}_i} \beta_{ij}(x_j - \alpha_j), \qquad \text{var}[X_i \mid X_{\mathcal{N}_i} = x_{\mathcal{N}_i}] = \sigma^2$$
⇒ Values $X_i$ modeled as weighted combinations of $i$'s neighbors
◮ Let $\mu = [\alpha_1, \ldots, \alpha_{N_v}]^\top$ and $\Sigma = \sigma^2 (\mathbf{I} - \mathbf{B})^{-1}$, where $\mathbf{B} = [\beta_{ij}]$
⇒ Under $\beta_{ii} = 0$ and $\beta_{ij} = \beta_{ji}$, it follows that $X \sim \mathcal{N}(\mu, \Sigma)$
◮ Homogeneity assumptions can be imposed, simplifying expressions
⇒ Further setting $\alpha_i = \alpha$ and $\beta_{ij} = \beta$ yields $X \sim \mathcal{N}(\alpha \mathbf{1}, \sigma^2 (\mathbf{I} - \beta \mathbf{A})^{-1})$ (sampling sketch below)
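A sampling sketch for the homogeneous auto-Gaussian model; the graph, α, β, and σ² are illustrative, and β must keep $\mathbf{I} - \beta\mathbf{A}$ positive definite (e.g., $0 < \beta < 1/\lambda_{\max}(\mathbf{A})$):

import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # path graph on 3 nodes
alpha, beta, sigma2 = 0.0, 0.3, 1.0      # beta = 0.3 < 1/sqrt(2) = 1/lambda_max

# Covariance of the joint Gaussian implied by the auto-Gaussian conditionals
Sigma = sigma2 * np.linalg.inv(np.eye(3) - beta * A)
rng = np.random.default_rng(0)
X = rng.multivariate_normal(alpha * np.ones(3), Sigma)   # one draw of the process
print(X)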

15. Inference and prediction for MRFs

◮ In studying the process $X = \{X_i\}_{i \in V}$, of interest to predict some or all of $X$
◮ The MRF models we have seen for this purpose are of the form
$$P_\theta(X = x) = \frac{1}{\kappa(\theta)} \exp\{U(x; \theta)\}$$
⇒ Parameter $\theta$ is low-dimensional, e.g., $\theta = [\alpha, \beta]$ in auto models
◮ Predictions can be generated based on the distribution $P_\theta(\cdot)$
⇒ Knowledge of $\theta$ is necessary, and typically $\theta$ is unknown
◮ Unlike nearest-neighbor prediction, MRFs require inference of $\theta$ first (see the estimation sketch below)
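The slides do not specify an estimator here, but a standard workaround for the intractable $\kappa(\theta)$ is maximum pseudolikelihood: maximize the product of the auto-logistic conditionals, which for homogeneous $\theta = [\alpha, \beta]$ amounts to a logistic regression of each $x_i$ on its active-neighbor count. A minimal gradient-ascent sketch (step size and iteration count are illustrative):

import numpy as np

def mple_autologistic(A, x, iters=500, lr=0.01):
    """Maximize the log pseudolikelihood
       sum_i [ x_i * s_i - log(1 + exp(s_i)) ],  s_i = alpha + beta * (A @ x)_i,
       by gradient ascent over theta = [alpha, beta]."""
    n = A @ x                                            # active-neighbor counts
    alpha, beta = 0.0, 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(alpha + beta * n)))    # conditional P(X_i = 1)
        alpha += lr * np.sum(x - p)                      # gradient w.r.t. alpha
        beta += lr * np.sum((x - p) * n)                 # gradient w.r.t. beta
    return alpha, beta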
