Learning a Distance Metric for Structured Network Prediction
Stuart Andrews and Tony Jebara
Columbia University
Learning to Compare Examples Workshop, December 8, 2006
Outline
• Introduction
  • Context, motivation & problem definition
• Contributions
  • Structured network characterization
  • Network prediction model
  • Distance-based score function
  • Maximum-margin learning
• Experiments
  • 1-matchings on toy data
  • Equivalence networks on face images
  • Preliminary results on social networks
• Future & related work, summary and conclusions
Context
• Pattern classification
  • Inputs & outputs
  • Independent and identically distributed
• Pattern classification for structured objects
  • Sets of inputs & outputs
  • Model dependencies amongst output variables
  • Parameterize the model using a Mahalanobis distance metric
Motivation for structured network prediction
• Man-made and naturally formed networks exhibit a high degree of structural regularity
Motivation
• Scale-free networks
[Figure: protein-interaction network; visualization by Jeffrey Heer, Berkeley; Barabási & Oltvai, Nature Genetics, 2004]
Motivation
• Equivalence networks
[Figure: equivalence network on Olivetti face images, a union of vertex-disjoint complete subgraphs]
Structured network prediction
• Given
  • n entities { x_1, ..., x_n } with attributes x_k ∈ R^d
  • A structural prior on networks
• Output
  • A network of similar entities with the desired structure, encoded as a binary adjacency matrix y = ( y_{j,k} ), y_{j,k} ∈ { 0, 1 }
[Figure: a five-node example graph and its adjacency matrix]
Applications
• Tasks
  • Initializing
  • Augmenting
  • Filtering of networks
• Domains
  • E-commerce
  • Social network analysis
  • Network biology
Challenges for SNP
• How can we take the structural prior into account?
  • Complex dependencies amongst atomic edge predictions
• What similarity should we use?
  • Avoid engineering a similarity metric for each domain
Structural network priors - 1
• Degree of a node, δ(v): the number of incident edges (in the figure, δ(v) = 5)
• Degree distribution: the probability of a node having degree k, for all k (see the sketch below)
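For concreteness, here is a minimal sketch (in Python, with an illustrative function name) of computing the empirical degree distribution from a binary adjacency matrix:

```python
import numpy as np

def degree_distribution(y):
    """Empirical p(k) from a symmetric binary adjacency matrix y."""
    degrees = y.sum(axis=1).astype(int)   # delta(v) = number of incident edges
    counts = np.bincount(degrees)
    return counts / counts.sum()          # p(k) for k = 0 .. max degree
```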
Degree distributions
• "Rich get richer"
[Figure: log-log degree distributions p(k) vs. k for a protein-interaction network (4233 nodes), a social network (6848 nodes), and an equivalence network (300 nodes)]
Structural network priors - 2
• Combinatorial families
  • Chains
  • Trees & forests
  • Cycles
  • Unions of disjoint complete subgraphs
  • Generalized matchings
B-matchings
• A b-matching has δ(v) = b for (almost) all nodes v; equivalently, the degree distribution p(k) is concentrated at k = b
• Formally, y ∈ B iff y_{j,k} ∈ { 0, 1 }, Σ_k y_{j,k} = b ∀j, and Σ_j y_{j,k} = b ∀k (see the sketch below)
[Figure: 1-, 2-, 3-, and 4-matchings on a five-node graph]
• We consider b-matching networks because they are flexible and efficient
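As a sanity check, the membership test y ∈ B can be written directly from the constraints above; this is a minimal sketch with an illustrative name, assuming y is a dense NumPy array:

```python
import numpy as np

def is_b_matching(y, b):
    """Check y in B: binary, symmetric, zero diagonal, every degree equal to b."""
    y = np.asarray(y)
    return (np.array_equal(y, y.T)
            and set(np.unique(y)) <= {0, 1}
            and not y.diagonal().any()
            and bool((y.sum(axis=1) == b).all()))
```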
Predictive Model
• Maximum-weight b-matching as the predictive model:
  1. Receive nodes and their attributes
  2. Compute edge weights s = ( s_{j,k} ), s_{j,k} ∈ R
  3. Select a b-matching with maximal weight: max_{y ∈ B} Σ_{j,k} y_{j,k} s_{j,k} = max_{y ∈ B} y^T s
• Computing a maximum-weight b-matching requires O(n³) time (a b = 1 sketch follows below)
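The slides do not name a solver; as one hedged illustration, the b = 1 case (a maximum-weight 1-matching) can be solved with NetworkX's max_weight_matching (which, in recent versions, returns a set of matched pairs), while general b > 1 needs a dedicated b-matching solver:

```python
import networkx as nx
import numpy as np

def predict_one_matching(S):
    """Step 3 for b = 1: return the maximum-weight 1-matching for a
    symmetric weight matrix S as a binary adjacency matrix y."""
    n = S.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for j in range(n):
        for k in range(j + 1, n):
            G.add_edge(j, k, weight=S[j, k])
    matching = nx.max_weight_matching(G, maxcardinality=True)
    y = np.zeros((n, n), dtype=int)
    for j, k in matching:
        y[j, k] = y[k, j] = 1
    return y
```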
Structured network prediction
• The question that remains: how do we compute the weights?
Learning the weights
• Weights are parameterized by a Mahalanobis distance metric:
  s_{j,k} = ( x_j − x_k )^T Q ( x_j − x_k ), with Q ⪰ 0
• In other words, we want to find the best linear transformation (rotation & scaling) to facilitate b-matching (see the sketch below)
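A minimal vectorized sketch of this weight computation (names are illustrative; X stacks the attribute vectors x_k as rows):

```python
import numpy as np

def edge_weights(X, Q):
    """All pairwise scores s_jk = (x_j - x_k)^T Q (x_j - x_k) for
    X of shape (n, d) and a positive semidefinite Q of shape (d, d)."""
    diff = X[:, None, :] - X[None, :, :]             # shape (n, n, d)
    return np.einsum('jkd,de,jke->jk', diff, Q, diff)
```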
Learning the weights
• We propose to learn the weights from one or more partially observed networks
  • We observe the attributes of all nodes
  • But only a subset of the edges
• Transductive approach
  • Learn weights to "fit" the training edges
  • While structured network prediction is performed over both training and test edges
Example
• Given the following nodes & edges
Example
• 1-matching with the learned metric Q
[Figure: the learned matrix Q and the resulting 1-matching]
Maximum-margin (Taskar et al. 2005)
• We use the dual-extragradient algorithm to learn Q
• Define the margin as the minimum gap between the predictive value of the true structure y ∈ B and that of each possible alternative structure y_1, y_2, ... ∈ B:
  s_Q(x)^T ( y − y_i ) ≥ 1 for each alternative y_i (see the sketch below)
[Figure: the score vector s_Q(x) = vec( d_{1,1}, d_{1,2}, ... ) and the scores s_Q(x)^T y, s_Q(x)^T y_1, s_Q(x)^T y_2, s_Q(x)^T y_3, separated by margins]
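A minimal sketch of evaluating these margin constraints for a handful of alternatives, written as a hinge-style violation per constraint (names are illustrative; S is the matrix of edge scores, and adjacency matrices play the role of the flattened y vectors):

```python
import numpy as np

def margin_violations(S, y_true, alternatives):
    """Hinge violations of s_Q(x)^T (y - y_i) >= 1, written as
    matrix inner products <S, y_true - y_i>."""
    return [max(0.0, 1.0 - float(np.sum(S * (y_true - y_i))))
            for y_i in alternatives]
```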
Maximum-margin
• You can think of the dual-extragradient algorithm as successively minimizing the violation of the gap constraints
• Each iteration focuses on the "worst offending network":
  1. y_bad = argmin_{ỹ ∈ B} s_Q(x)^T ỹ
  2. Q ← Q − ε ∂gap( y, y_bad ) / ∂Q
• Since d_{j,k} = ( x_j − x_k )^T Q ( x_j − x_k ) = ⟨ Q, ( x_j − x_k )( x_j − x_k )^T ⟩ is linear in Q, the gradient is
  ∂gap / ∂Q = Σ_{jk ∈ FP} ( x_j − x_k )( x_j − x_k )^T − Σ_{jk ∈ FN} ( x_j − x_k )( x_j − x_k )^T
  (a sketch of this update follows below)
• Caveat: this is not the whole story! Thanks to Simon Lacoste-Julien for help debugging
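Following the gradient expression above, one iteration might look like the sketch below. The FP/FN bookkeeping, the step size, and the projection of Q back onto the PSD cone (needed to keep Q a valid metric) are illustrative additions, and, as the slide itself cautions, this is not the whole story:

```python
import numpy as np

def update_Q(Q, X, y_true, y_bad, eps=0.01):
    """One illustrative gradient step on Q. FP = edges in y_bad but not in
    y_true; FN = edges in y_true that y_bad misses."""
    n = X.shape[0]
    grad = np.zeros_like(Q)
    for j in range(n):
        for k in range(j + 1, n):
            outer = np.outer(X[j] - X[k], X[j] - X[k])
            if y_bad[j, k] and not y_true[j, k]:      # false positive
                grad += outer
            elif y_true[j, k] and not y_bad[j, k]:    # false negative
                grad -= outer
    Q = Q - eps * grad
    w, V = np.linalg.eigh(Q)                          # project onto PSD cone
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T
```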
Experiments
• How does it work in practice?