Cross-Graph Learning of Multi-Relational Associations

Hanxiao Liu, Yiming Yang
Carnegie Mellon University
{hanxiaol, yiming}@cs.cmu.edu

June 22, 2016
Outline

◮ Task Description
◮ New Contributions
◮ Framework
◮ Scalable Inference
◮ Empirical Evaluation
◮ Summary
Task Description

Goal: predict associations among heterogeneous graphs.

[Figure: (a) a drug–target interaction network, where compounds (structure similarity) and proteins (sequence similarity) are linked by interactions; (b) a citation network, where authors (coauthorship), papers (citation), and venues (shared foci) are linked by write, publish, and attend relations.]

Example: "John publishes a reinforcement learning paper at ICML" corresponds to the tuple (John, RL Paper, ICML).
New Contributions

◮ A unified framework for integrating heterogeneous information in multiple graphs.
◮ Transductive learning that leverages both labeled data (sparse) and unlabeled data (massive).
◮ A convex approximation enabling scalable inference over the combinatorial number of possible tuples.
Framework: Notation

◮ $G^{(1)}, G^{(2)}, \dots, G^{(J)}$ are individual graphs;
◮ $n_j$ is the number of nodes in $G^{(j)}$;
◮ $(i_1, i_2, \dots, i_J)$ is a tuple (multi-relation);
◮ $f_{i_1, i_2, \dots, i_J}$ is the predicted score for the tuple;
◮ $f$ is a tensor in $\mathbb{R}^{n_1 \times n_2 \times \cdots \times n_J}$.
Framework: Product Graph

The product graph $P$ is induced from $G^{(1)}, \dots, G^{(J)}$.

[Figure: three example graphs $G^{(1)}$, $G^{(2)}$, $G^{(3)}$ combined into a single product graph $P$.]

Tensor product:
$$P\big(G^{(1)}, G^{(2)}, G^{(3)}\big) = G^{(1)} \otimes G^{(2)} \otimes G^{(3)}$$
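As a concrete illustration (not from the slides), here is a minimal numpy sketch of forming the tensor-product graph's adjacency matrix via Kronecker products; the three small graphs are hypothetical:

```python
import numpy as np

# Hypothetical adjacency matrices for three small graphs.
A1 = np.array([[0, 1], [1, 0]])                    # G(1): 2 nodes
A2 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])   # G(2): 3 nodes
A3 = np.array([[0, 1], [1, 0]])                    # G(3): 2 nodes

# Tensor (Kronecker) product graph: a node of P is a tuple (i1, i2, i3),
# and two tuples are adjacent iff they are adjacent in every coordinate.
P = np.kron(np.kron(A1, A2), A3)
print(P.shape)  # (12, 12) = (2*3*2, 2*3*2): one row per tuple
```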
Framework

Why a product graph? It maps the heterogeneous graphs onto a single unified graph, over which labels can be propagated (transductive learning).
Framework

Assume a Gaussian prior over the vectorized tensor,
$$\mathrm{vec}(f) \sim \mathcal{N}(0, P), \tag{1}$$
which implies
$$-\log p(f \mid P) \propto \mathrm{vec}(f)^\top P^{-1}\, \mathrm{vec}(f) =: \|f\|_P^2. \tag{2}$$

Optimization problem:
$$\min_f \; \ell_O(f) + \frac{\gamma}{2} \|f\|_P^2 \tag{3}$$
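A toy sketch (our own, not the authors' code) of evaluating the regularizer in (2)–(3) naively on a tiny product graph; at realistic scale $P$ is never formed explicitly, which motivates the machinery below:

```python
import numpy as np

# Toy SPD product matrix P and toy score tensor f (both hypothetical).
rng = np.random.default_rng(0)
P = np.kron(np.array([[2., 1.], [1., 2.]]), np.eye(3) * 2)  # 6x6, SPD
f = rng.standard_normal((2, 3))

# ||f||_P^2 = vec(f)^T P^{-1} vec(f), computed via a linear solve.
v = f.reshape(-1)
norm_sq = v @ np.linalg.solve(P, v)
```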
Framework

For computational tractability, we focus on the spectral graph product (SGP) family of $P$.

Spectral Graph Product (SGP): the eigensystem of $P_\kappa\big(G^{(1)}, \dots, G^{(J)}\big)$ is parametrized by the eigensystems of the individual graphs, i.e.,
$$\Big\{ \kappa\big(\lambda_{i_1}, \dots, \lambda_{i_J}\big),\; \textstyle\bigotimes_j v_{i_j} \Big\}_{i_1, \dots, i_J} \tag{4}$$
where $\lambda_{i_j}$ / $v_{i_j}$ is the $i_j$-th eigenvalue/eigenvector of the $j$-th graph.
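A sketch of assembling an SGP eigensystem under our own reading of Eq. (4) (function and variable names are ours): eigenvalues are combined by $\kappa$, and eigenvectors by Kronecker products.

```python
import numpy as np
from itertools import product

def sgp_eigensystem(adjacencies, kappa):
    """Eigenpairs of P_kappa per Eq. (4), for symmetric adjacency matrices.

    Each SGP eigenvalue is kappa applied to one eigenvalue per graph;
    each SGP eigenvector is the Kronecker product of the matching
    per-graph eigenvectors.
    """
    eigsystems = [np.linalg.eigh(A) for A in adjacencies]
    sizes = [len(w) for w, _ in eigsystems]
    pairs = []
    for idx in product(*[range(m) for m in sizes]):
        lam = kappa([eigsystems[j][0][i] for j, i in enumerate(idx)])
        vec = eigsystems[0][1][:, idx[0]]
        for j in range(1, len(idx)):
            vec = np.kron(vec, eigsystems[j][1][:, idx[j]])
        pairs.append((lam, vec))
    return pairs

# kappa = product -> tensor product; kappa = sum -> Cartesian product.
pairs = sgp_eigensystem([np.eye(2), np.eye(3)], kappa=np.prod)
```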
Framework

Nice properties of SGP:

Subsuming basic operations:
$$\kappa(x, y) = x \times y \;\Longrightarrow\; P_\kappa(G, H) = G \otimes H \quad \text{(tensor product)} \tag{5}$$
$$\kappa(x, y) = x + y \;\Longrightarrow\; P_\kappa(G, H) = G \oplus H \quad \text{(Cartesian product)} \tag{6}$$

Supporting graph diffusions:
$$\sigma_{\text{heat}}(P_\kappa) = I + P_\kappa + \tfrac{1}{2} P_\kappa^2 + \cdots = P_{e^\kappa} \tag{7}$$
$$\sigma_{\text{von Neumann}}(P_\kappa) = I + P_\kappa + P_\kappa^2 + \cdots = P_{\frac{1}{1-\kappa}} \tag{8}$$

Order-insensitivity: if $\kappa$ is commutative, then the SGP is commutative (up to graph isomorphism).
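A quick numerical check (our own hypothetical example) of the Cartesian case in Eq. (6): the Kronecker sum $G \oplus H = G \otimes I + I \otimes H$ has eigenvalues $\lambda_i + \mu_j$, exactly $\kappa(x, y) = x + y$ applied to the factor spectra.

```python
import numpy as np

G = np.array([[0., 1.], [1., 0.]])                       # 2-node graph
H = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]]) # 3-node path

# Cartesian (Kronecker-sum) product: G ⊕ H = G ⊗ I + I ⊗ H.
P = np.kron(G, np.eye(3)) + np.kron(np.eye(2), H)

lam_G = np.linalg.eigvalsh(G)
lam_H = np.linalg.eigvalsh(H)
pairwise_sums = np.sort([x + y for x in lam_G for y in lam_H])
assert np.allclose(np.sort(np.linalg.eigvalsh(P)), pairwise_sums)
```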
Scalable Inference

For a general graph product, the semi-norm is computed as
$$\|f\|_P^2 = \mathrm{vec}(f)^\top P^{-1}\, \mathrm{vec}(f). \tag{9}$$

For an SGP, $P_\kappa$ no longer has to be explicitly computed:
$$\|f\|_{P_\kappa}^2 = \sum_{i_1, i_2, \dots, i_J}^{n_1, n_2, \dots, n_J} \frac{f\big(v_{i_1}, \dots, v_{i_J}\big)^2}{\kappa\big(\lambda_{i_1}, \dots, \lambda_{i_J}\big)} \tag{10}$$

◮ $f(v_{i_1}, v_{i_2}, \dots, v_{i_J}) = f \times_1 v_{i_1} \times_2 v_{i_2} \cdots \times_J v_{i_J}$ (multilinear mode products; sketched below).
◮ However, even evaluating (10) is expensive: the sum ranges over all $\prod_j n_j$ eigenvalue combinations.
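A sketch of Eq. (10) for $J = 2$ with symmetric adjacency matrices, where the mode products collapse to $V_1^\top f V_2$; the variable names and the positive choice of $\kappa$ are our own assumptions.

```python
import numpy as np

def sgp_seminorm_sq(f, A1, A2, kappa):
    """||f||_{P_kappa}^2 via Eq. (10) for J = 2, without forming P_kappa."""
    lam1, V1 = np.linalg.eigh(A1)
    lam2, V2 = np.linalg.eigh(A2)
    coeffs = V1.T @ f @ V2                       # f(v_{i1}, v_{i2}) for all pairs
    denom = kappa(lam1[:, None], lam2[None, :])  # kappa(lambda_{i1}, lambda_{i2})
    return np.sum(coeffs ** 2 / denom)

# e.g. kappa(x, y) = exp(x + y) > 0 keeps the denominator well-defined.
A = np.array([[0., 1.], [1., 0.]])
B = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
val = sgp_seminorm_sq(np.ones((2, 3)), A, B, kappa=lambda x, y: np.exp(x + y))
```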
Scalable Inference

Using a low-rank SGP:

◮ $f$ lies in the linear span of the eigenvectors of $P$.
◮ Eigenvectors of high volatility can be pruned away.

[Figure: eigenvectors of $G$ (blue), $H$ (red), and $P(G, H)$.]
Scalable Inference

Restrict $f$ to the linear span of the "smooth" bases of $P$:
$$f(\alpha) = \sum_{i_1, i_2, \dots, i_J = 1}^{d_1, d_2, \dots, d_J} \alpha_{i_1, i_2, \dots, i_J} \textstyle\bigotimes_j v_{i_j} \tag{11}$$
where the core tensor $\alpha \in \mathbb{R}^{d_1 \times d_2 \times \cdots \times d_J}$ and $d_j \ll n_j$.

The semi-norm becomes
$$\|f(\alpha)\|_{P_\kappa}^2 = \sum_{i_1, i_2, \dots, i_J = 1}^{d_1, d_2, \dots, d_J} \frac{\alpha_{i_1, i_2, \dots, i_J}^2}{\kappa\big(\lambda_{i_1}, \lambda_{i_2}, \dots, \lambda_{i_J}\big)}. \tag{12}$$

We then optimize w.r.t. $\alpha$ instead of $f$. Parameter size: $\prod_j n_j \to \prod_j d_j$.
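A sketch of Eq. (11) for $J = 2$, with our own variable names: $f(\alpha)$ is a Tucker-style reconstruction from the core tensor $\alpha$ and the $d_j$ "smoothest" eigenvectors of each graph. Which end of the spectrum counts as smooth depends on the operator, so that choice is flagged as an assumption below.

```python
import numpy as np

def low_rank_f(alpha, A1, A2):
    """Reconstruct f(alpha) per Eq. (11) for J = 2: f = U1 @ alpha @ U2.T,
    where U_j holds the d_j smoothest eigenvectors of graph j."""
    d1, d2 = alpha.shape
    lam1, V1 = np.linalg.eigh(A1)
    lam2, V2 = np.linalg.eigh(A2)
    # Assumption: for adjacency matrices, "smooth" = largest eigenvalues
    # (for a normalized Laplacian it would be the smallest instead).
    U1 = V1[:, np.argsort(-lam1)[:d1]]
    U2 = V2[:, np.argsort(-lam2)[:d2]]
    return U1 @ alpha @ U2.T   # (n1, n2) scores from only d1*d2 parameters
```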
Scalable Inference

[Figure: Tucker decomposition, where $\alpha$ is the core tensor.]
Scalable Inference

Revised optimization objective:
$$\min_{\alpha \in \mathbb{R}^{d_1 \times d_2 \times \cdots \times d_J}} \; \ell_O(f(\alpha)) + \frac{\gamma}{2} \|f(\alpha)\|_{P_\kappa}^2 \tag{13}$$

Ranking loss function:
$$\ell_O(f) = \frac{1}{|O \times \bar{O}|} \sum_{(i_1, \dots, i_J) \in O} \; \sum_{(i'_1, \dots, i'_J) \in \bar{O}} \Big( f_{i'_1 \dots i'_J} - f_{i_1 \dots i_J} \Big)_+^2 \tag{14}$$
which penalizes any unobserved tuple in $\bar{O}$ that scores above an observed tuple in $O$.

Gradient w.r.t. the core tensor:
$$\nabla_\alpha = \frac{\partial \ell_O}{\partial f} \left( \frac{\partial f_{i_1, \dots, i_J}}{\partial \alpha} - \frac{\partial f_{i'_1, \dots, i'_J}}{\partial \alpha} \right) + \gamma\, \alpha \oslash \kappa \tag{15}$$
where $\oslash$ denotes element-wise division by the tensor of $\kappa(\lambda_{i_1}, \dots, \lambda_{i_J})$ values, i.e., the gradient of (12).

Tensor algebra is carried out on the GPU.
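A minimal sketch of one objective/gradient evaluation for $J = 2$, assuming the squared-hinge reading of Eq. (14); the function and variable names (`pos`, `neg`, `kappa_vals`, etc.) are ours, not the authors':

```python
import numpy as np

def objective_and_grad(alpha, U1, U2, kappa_vals, pos, neg, gamma):
    """Ranking loss (Eq. 14) + SGP regularizer (Eq. 12) and grad, J = 2.

    pos/neg: lists of observed/unobserved index pairs (i1, i2);
    kappa_vals[i1, i2] = kappa(lambda_{i1}, lambda_{i2}) for the kept bases.
    """
    f = U1 @ alpha @ U2.T                    # Eq. (11): reconstruct scores
    grad_f = np.zeros_like(f)
    loss = 0.0
    for (i, j) in pos:
        for (p, q) in neg:
            margin = f[p, q] - f[i, j]       # want observed > unobserved
            if margin > 0:                   # hinge is active
                loss += margin ** 2
                grad_f[p, q] += 2 * margin
                grad_f[i, j] -= 2 * margin
    scale = 1.0 / (len(pos) * len(neg))
    loss *= scale
    grad_f *= scale
    # Chain rule back to alpha, plus regularizer gradient gamma * (alpha ⊘ kappa).
    grad_alpha = U1.T @ grad_f @ U2 + gamma * alpha / kappa_vals
    loss += 0.5 * gamma * np.sum(alpha ** 2 / kappa_vals)
    return loss, grad_alpha
```

In the slides' setting the same tensor algebra is batched on the GPU rather than looped as above.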
Empirical Evaluation

Datasets:

◮ Enzyme: 445 compounds, 664 proteins.
◮ DBLP: 34K authors, 11K papers, 22 venues.

Representative baselines:

◮ TF/GRTF: Tensor Factorization / Graph-Regularized TF
◮ NN: one-class Nearest Neighbor
◮ RSVM: Ranking SVMs
◮ LTKM: Low-Rank Tensor Kernel Machines