Relational learning with many relations
Guillaume Obozinski
Laboratoire d’Informatique Gaspard Monge, École des Ponts - ParisTech
Joint work with Rodolphe Jenatton, Nicolas Le Roux and Antoine Bordes.
Labex Bézout - Huawei Seminar - April 3rd, 2015

Modelling relations between pairs of entities
Triplets: Term 1 - Relation - Term 2
Single relation:
  Collaborative filtering
  Link prediction
  Modelling of social networks
Multiple relations:
  Collective classification
  Modelling in relational knowledge databases
  Protein-protein and protein-ligand interactions
  Natural language semantics (and semantic role labelling)

Our motivation: Learning the semantic value of verbs
Model triplets (Subject, Verb, Object), written $(S_i, R_j, O_k)$.
View this as the relation: $R_j(S_i, O_k) = 1$

Different kinds of relational learning
Learn to predict relations from object attributes:
  Binary classification from pairs of feature vectors
Exploit logical properties of relations: transitivity, implication, mutual exclusion, etc.
  Markov Logic Networks (Kok and Domingos, 2007)
Predict relations from some observed relations
  Idea: relations derive from unobserved latent attributes.
  Relational learning from intrinsic latent attributes

Stochastic Block Model
Wang and Wong (1987); Nowicki and Snijders (2001)
Graphical model with nodes $C_i$, $C'_k$ and $Z_{ik}$.
\[ P(Z_{ik} = 1) = \sum_{c, c'} P(Z_{ik} = 1 \mid C_i = c, C'_k = c') \, P(C_i = c) \, P(C'_k = c') \]
\[ P_{ik} = \sum_{c, c'} R_{cc'} \, S_{ci} \, O_{c'k} = (s_i)^\top R \, o_k \]
\[ P = S^\top R \, O \]
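
As a small numerical check of the identity $P = S^\top R\,O$, the sketch below builds class-membership probabilities and a block matrix, then compares the matrix product with the explicit double sum over classes. The sizes, the random values and the variable names are assumptions made for illustration; they do not come from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 5  # hypothetical sizes: p latent classes, n entities on each side

# Columns s_i and o_k are class-membership distributions (each sums to 1).
S = rng.dirichlet(np.ones(p), size=n).T  # shape (p, n), S[c, i] = P(C_i = c)
O = rng.dirichlet(np.ones(p), size=n).T  # shape (p, n), O[c', k] = P(C'_k = c')
R = rng.uniform(size=(p, p))             # R[c, c'] = P(Z_ik = 1 | c, c')

# Matrix form of the link probabilities.
P = S.T @ R @ O

# Explicit double sum over classes for one pair (i, k).
i, k = 1, 4
p_ik = sum(R[c, cp] * S[c, i] * O[cp, k] for c in range(p) for cp in range(p))
```

Because every column of `S` and `O` is a probability vector and the entries of `R` lie in [0, 1], each entry of `P` is a convex combination of entries of `R` and is itself a valid probability.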

A matrix factorization problem
\[ P = S^\top R \, O \]
with $0 \le R_{cc'} \le 1$, $o_k \in \triangle$, $s_i \in \triangle$, where
\[ \triangle = \{ x \in \mathbb{R}^p_+ \mid \|x\|_1 = 1 \} \]
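
One way such simplex constraints could be enforced numerically is by Euclidean projection onto $\triangle$. The sketch below uses the standard sort-based projection; it is an illustrative assumption, not a procedure stated on the slide, and the function name `project_simplex` is hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x >= 0, sum(x) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]                 # entries in decreasing order
    css = np.cumsum(u)                   # running sums of the sorted entries
    ks = np.arange(1, len(v) + 1)
    cond = u - (css - 1.0) / ks > 0      # candidate support sizes
    rho = ks[cond][-1]                   # largest valid support size
    tau = (css[cond][-1] - 1.0) / rho    # shift that makes the result sum to 1
    return np.maximum(v - tau, 0.0)

x = project_simplex(np.array([0.5, 1.2, -0.3]))
```

A point already in the simplex is left unchanged, so the projection can be applied after every update without harm.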

Stochastic Block Model for several relation types
Graphical model with nodes $C_i$, $C'_k$ and $Z^{(j)}_{ik}$.
\[ P(Z^{(j)}_{ik} = 1) = \sum_{c, c'} P(Z^{(j)}_{ik} = 1 \mid C_i = c, C'_k = c') \, P(C_i = c) \, P(C'_k = c') \]
\[ P^{(j)}_{ik} = \sum_{c, c'} [R_j]_{cc'} \, S_{ci} \, O_{c'k} = (s_i)^\top R_j \, o_k \]
\[ P_j = S^\top R_j \, O \]

Collective matrix factorization
\[ P_j = S^\top R_j \, O \]
with $0 \le [R_j]_{cc'} \le 1$, $o_k \in \triangle$, $s_i \in \triangle$, where
\[ \triangle = \{ x \in \mathbb{R}^p_+ \mid \|x\|_1 = 1 \} \]
Corresponds to the approach used in RESCAL (Nickel et al., 2012):
\[ \min_{S = O,\; R_j} \; \| Z_j - P_j \|_F^2 \]
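
The following sketch evaluates this squared Frobenius error for several relations at once with shared embeddings ($S = O$). All sizes, the random data and the names `A`, `R`, `Z` are assumptions for illustration. Summing the per-relation errors into `total_loss` is also an assumption, since the slide shows only the per-relation term.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, n_r = 3, 6, 4  # hypothetical: embedding dim, entities, relations

A = rng.normal(size=(p, n))          # shared embeddings, playing the role of S = O
R = rng.normal(size=(n_r, p, p))     # one p x p matrix R_j per relation
Z = rng.integers(0, 2, size=(n_r, n, n)).astype(float)  # observed 0/1 links

# P[j] = A^T R_j A for every relation j, computed in one call.
P = np.einsum('ci,jcd,dk->jik', A, R, A)

# Squared Frobenius error per relation, then summed (assumed aggregation).
per_relation_loss = ((Z - P) ** 2).sum(axis=(1, 2))
total_loss = per_relation_loss.sum()
```

Stacking the relation matrices into a single array of shape `(n_r, p, p)` makes the shared role of the embeddings explicit: only `R` carries a relation index.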

A bilinear logistic model
Embeddings $s_i$ and $o_k$, observations $Z_{ijk} = R_j(S_i, O_k)$.
\[ P(R_j(S_i, O_k) = 1) = P^{(j)}_{ik} = \left( 1 + \exp\left( -\eta^{(j)}_{ik} \right) \right)^{-1} \]
with an “energy”
\[ E(s_i, R_j, o_k) = \eta^{(j)}_{ik} = \langle s_i, R_j \, o_k \rangle \]
So that with $H^{(j)} = (\eta^{(j)}_{ik})_{1 \le i, k \le n}$ we have
\[ H^{(j)} = S^\top R_j \, O \]
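
A minimal sketch of the bilinear logistic model, assuming random embeddings and relation matrices (sizes and names are illustrative, not from the talk): the energies for all triplets are computed as $S^\top R_j O$ and passed through the logistic function.

```python
import numpy as np

def sigmoid(x):
    """Logistic function (1 + exp(-x))^{-1}."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
p, n, n_r = 4, 5, 3  # hypothetical: embedding dim, entities, relations

S = rng.normal(size=(p, n))        # subject embeddings s_i as columns
O = rng.normal(size=(p, n))        # object embeddings o_k as columns
R = rng.normal(size=(n_r, p, p))   # relation matrices R_j

# H[j, i, k] = <s_i, R_j o_k>, i.e. H[j] = S^T R_j O.
H = np.einsum('ci,jcd,dk->jik', S, R, O)
P = sigmoid(H)                     # P[j, i, k] = P(R_j(S_i, O_k) = 1)

# Energy of a single triplet (i, j, k), computed directly.
i, j, k = 2, 1, 4
eta = S[:, i] @ (R[j] @ O[:, k])
```

Unlike the stochastic block model, the embeddings here are unconstrained real vectors; the logistic link is what keeps `P` inside (0, 1).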

Dealing with the number of parameters? Related work
Clustering of Entities and Relations
  Miller et al. (2009); Zhu (2012)
  Bayesian Non-parametric clustering: Kemp et al. (2006); Sutskever et al. (2009)
  Clustering in the context of Markov Logic Network: Kok and Domingos (2007)
Embeddings
  Collective Matrix Factorization (RESCAL) by Nickel et al. (2012)
  Semantic Matching Energy (SME) model of Bordes et al. (2012): encodes relations as vectors for scalability.
Tensor factorization
  CANDECOMP/PARAFAC: Tucker (1966); Harshman and Lundy (1994)
  Probabilistic formulation of Chu and Ghahramani (2009)

Our solution: Latent relational factors
Idea: Modelling the relations between the relations...
\[ R_j = \sum_{r=1}^{d} \alpha^j_r \, \Theta_r, \quad \text{with} \quad \Theta_r = u_r v_r^\top \]
for some sparse vector $\alpha^j \in \mathbb{R}^d$.
Given
  $n_r$: number of relations
  $p$: embedding dimension, $R_j \in \mathbb{R}^{p \times p}$
  $d$: number of latent relational factors
  $\bar{s} \le \lambda d$: average number of non-zero $\alpha$ coefficients
⇒ we reduce the number of parameters from $n_r p^2$ to $2pd + \bar{s} n_r$
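
To make the parameter count concrete, the sketch below evaluates both formulas for one set of hypothetical sizes and assembles relation matrices from sparse combinations of rank-one factors. The numeric values, the random sparsity pattern and the names `param_counts`, `U`, `V`, `alpha` are assumptions for illustration only.

```python
import numpy as np

def param_counts(n_r, p, d, s_bar):
    """Return (full, factored) parameter counts: n_r * p^2 and 2pd + s_bar * n_r."""
    return n_r * p**2, 2 * p * d + s_bar * n_r

rng = np.random.default_rng(0)
n_r, p, d, s = 6, 4, 5, 2  # small hypothetical sizes; s non-zeros per relation

U = rng.normal(size=(d, p))  # rows are the vectors u_r
V = rng.normal(size=(d, p))  # rows are the vectors v_r
Theta = np.einsum('rc,re->rce', U, V)  # Theta[r] = u_r v_r^T (rank one)

# Sparse coefficient vectors alpha^j: s randomly placed non-zeros each.
alpha = np.zeros((n_r, d))
for j in range(n_r):
    support = rng.choice(d, size=s, replace=False)
    alpha[j, support] = rng.normal(size=s)

# R[j] = sum_r alpha[j, r] * Theta[r]
R = np.einsum('jr,rce->jce', alpha, Theta)

full, factored = param_counts(n_r=1000, p=25, d=200, s_bar=10)
```

With these hypothetical values the full parameterization needs 625,000 relation parameters and the factored one 20,000. Each `R[j]` has rank at most `s`, since it is a sum of `s` rank-one terms.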

Algorithmic approach
Large scale: $|\mathcal{P}| = 10^6$
Stochastic projected block-coordinate gradient descent algorithm
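
The slide names the algorithm without giving details, so the sketch below is only a generic illustration of a stochastic projected block-coordinate gradient step, not the authors' procedure. It assumes a logistic loss on the bilinear energy $\langle s_i, R_j o_k \rangle$, shared entity embeddings, three blocks (subject, object, relation), and a projection of embeddings onto the Euclidean unit ball; every one of these choices, and every name, is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def project_ball(v, radius=1.0):
    """Project v onto the Euclidean ball of the given radius (assumed constraint)."""
    norm = np.linalg.norm(v)
    return v if norm <= radius else v * (radius / norm)

def sgd_block_step(E, R, triplet, block, lr=0.1):
    """Update one block in place using the gradient from a single triplet."""
    i, j, k, z = triplet                  # z in {0, 1} is the observed label
    s, o = E[:, i], E[:, k]
    g = sigmoid(s @ R[j] @ o) - z         # d(logistic loss) / d(energy)
    if block == "subject":
        E[:, i] = project_ball(s - lr * g * (R[j] @ o))
    elif block == "object":
        E[:, k] = project_ball(o - lr * g * (R[j].T @ s))
    else:                                 # "relation" block, left unconstrained
        R[j] -= lr * g * np.outer(s, o)

rng = np.random.default_rng(0)
p, n, n_r = 4, 10, 3                      # hypothetical sizes
E = rng.normal(size=(p, n))
E /= np.maximum(1.0, np.linalg.norm(E, axis=0))  # start inside the unit ball
R = 0.1 * rng.normal(size=(n_r, p, p))

blocks = ("subject", "object", "relation")
for _ in range(200):
    # Sample a synthetic triplet and one block to update.
    t = (rng.integers(n), rng.integers(n_r), rng.integers(n), rng.integers(2))
    sgd_block_step(E, R, t, blocks[rng.integers(3)])
```

Each step touches a single embedding column or a single relation matrix, so its cost does not depend on the number of observed triplets, which is what makes this style of update attractive at the scale quoted on the slide.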