An efficient reconciliation algorithm for social networks
Silvio Lattanzi (Google Research NY)
Joint work with: Nitish Korula (Google Research NY)
ICERM Stochastic Graph Models

Outline
- Graph reconciliation
- Model and theoretical results
- Experimental results
- From theory to practice
- Open problems and future directions

Graph reconciliation

Real world motivations
- Intra-language network
- Inter-language network
Can we use intra-language information to improve the inter-language graph?

Graph reconciliation problem
Given two networks, identify as many users as possible across them.
Applications:
- social networks
- ontology reconciliation

Previous work
The reconciliation problem was introduced by Novak et al.
Two main approaches:
- ML on user profile features (name, location, image)
- ML on neighborhood topology
Limitations:

Previous work
There is a very rich literature on de-anonymization.
Two relevant works:
- Backstrom et al. propose an active and a passive attack.
- Narayanan and Shmatikov present a successful de-anonymization attack.

Narayanan and Shmatikov experiment
- Ground truth: 24,000 matches across the two social networks
- 80 me-links
- They could re-identify 30.8% of the mappings.

Narayanan and Shmatikov experiment
Algorithm: [figure: step-by-step example; candidate nodes labeled 0, 1, 0, 2]
Why? Is it necessary to have high-degree me-links?

Abstraction
Input: two graphs and a set of trusted matches.
We want to maximize the number of final matches.

Is the problem tractable?
- The problem is similar to graph isomorphism.
- It seems even harder, because we want to detect similar (not identical) structure.

Abstraction
Formalization of the problem:
- Start from an underlying social network.
- Build two copies $G_1$ and $G_2$ by deleting edges independently: each edge is kept in $G_i$ with probability $p_i$.
- Some initial matchings between the two copies are given.
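
The generative model on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `p1` and `p2` are read as the probabilities that an edge is kept in each copy (which is what the later expectation formulas suggest), `l` is assumed to be the probability that a node comes with a trusted initial match, and the function name `sample_copies` is hypothetical. Both copies keep the original node labels, so the identity map plays the role of the ground truth.

```python
import random

def sample_copies(edges, nodes, p1, p2, l, seed=0):
    """Sample two noisy copies of an underlying graph plus initial matches.

    edges: iterable of (u, v) pairs of the underlying network
    nodes: iterable of node identifiers
    p1, p2: assumed probability that an edge is kept in copy 1 / copy 2
    l: assumed probability that a node is given as a trusted initial match
    """
    rng = random.Random(seed)
    # Each underlying edge is kept in each copy independently.
    g1 = {e for e in edges if rng.random() < p1}
    g2 = {e for e in edges if rng.random() < p2}
    # Each node is revealed as an initial match independently.
    links = {v for v in nodes if rng.random() < l}
    return g1, g2, links
```

A real instance would also hide the labels of one copy (for example by applying a random permutation) before handing it to a reconciliation algorithm.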

Questions
- Given a constant fraction of me-links, can we reconcile the entire network?
- Given k me-links, what fraction of the network can we reconcile?

Underlying social network
Without additional assumptions on the underlying network, the problem still seems very hard.
We study two different models for social networks:
- G(n,p)
- Preferential attachment

Our algorithm
Algorithm: Narayanan-Shmatikov + degree bucketing + acceptance threshold
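
The slide names only the three ingredients, so the following Python is a minimal sketch of how they might fit together, not the authors' implementation. Assumptions: a candidate pair is scored by the number of already-linked neighbor pairs it shares, degree buckets are powers of two processed from high to low, and a pair is accepted only if it is the best candidate for both endpoints and its score reaches `threshold`. The names `reconcile`, `adj1`, `adj2`, `seeds`, `threshold` and `rounds` are hypothetical.

```python
from collections import defaultdict

def reconcile(adj1, adj2, seeds, threshold=2, rounds=2):
    """adj1, adj2: dict node -> set of neighbors (symmetric adjacency).
    seeds: dict mapping a node of graph 1 to its trusted match in graph 2."""
    links = dict(seeds)
    degrees = [len(nb) for nb in adj1.values()] + [len(nb) for nb in adj2.values()]
    max_deg = max(degrees, default=1)
    for _ in range(rounds):
        # Start from the largest power-of-two bucket and halve it each pass.
        bucket = 1
        while bucket * 2 <= max_deg:
            bucket *= 2
        while bucket >= 1:
            matched2 = set(links.values())
            scores = defaultdict(int)
            # Every linked pair (w1, w2) votes for each pair of their unmatched
            # neighbors whose degree is at least the current bucket.
            for w1, w2 in links.items():
                for u in adj1.get(w1, ()):
                    if u in links or len(adj1.get(u, ())) < bucket:
                        continue
                    for v in adj2.get(w2, ()):
                        if v in matched2 or len(adj2.get(v, ())) < bucket:
                            continue
                        scores[(u, v)] += 1
            # Best candidate for each endpoint (ties broken by first seen).
            best1, best2 = {}, {}
            for (u, v), s in scores.items():
                if s > best1.get(u, (0, None))[0]:
                    best1[u] = (s, v)
                if s > best2.get(v, (0, None))[0]:
                    best2[v] = (s, u)
            # Acceptance threshold: keep mutual best pairs with enough votes.
            for u, (s, v) in best1.items():
                if s >= threshold and best2[v] == (s, u):
                    links[u] = v
            bucket //= 2
    return links
```

Processing high-degree buckets first means that nodes with many potential witnesses are matched before low-degree nodes, whose scores are noisier; the threshold trades recall for precision.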

G(n,p)
Does the technique work if the underlying graph is random?
[figure: underlying edge probability $p$; edge survival probabilities $p_1$ and $p_2$ in the two copies]
- Same node $u$ in both copies: $\mathbb{E}[|N_{G_1}(u) \cap N_{G_2}(u)|] = (n-1)\,p\,p_1\,p_2$
- Distinct nodes $u \neq v$: $\mathbb{E}[|N_{G_1}(u) \cap N_{G_2}(v)|] = (n-2)\,p^2\,p_1\,p_2$
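
The two expectations can be derived in a few steps. The derivation below assumes, as the formulas suggest, that each edge of the underlying $G(n,p)$ graph is kept in $G_i$ independently with probability $p_i$; the node names $u$, $v$, $w$ are notation introduced here for illustration.

```latex
\begin{align*}
% Same node u in both copies: w is a common neighbor iff {u,w} is an edge of
% the underlying graph (prob. p) kept in G_1 (prob. p_1) and in G_2 (prob. p_2).
\mathbb{E}\bigl[|N_{G_1}(u) \cap N_{G_2}(u)|\bigr]
  &= \sum_{w \neq u} p\,p_1\,p_2 = (n-1)\,p\,p_1\,p_2 \\
% Distinct nodes u != v: two different underlying edges {u,w} and {v,w} are
% needed, each present with prob. p, one kept in G_1 and the other in G_2.
\mathbb{E}\bigl[|N_{G_1}(u) \cap N_{G_2}(v)|\bigr]
  &= \sum_{w \notin \{u,v\}} (p\,p_1)(p\,p_2) = (n-2)\,p^2\,p_1\,p_2 \\
% Ratio between the correct pair and a wrong pair.
\frac{(n-1)\,p\,p_1\,p_2}{(n-2)\,p^2\,p_1\,p_2}
  &= \frac{n-1}{(n-2)\,p} \approx \frac{1}{p}
\end{align*}
```

So a correct pair shares, in expectation, roughly $1/p$ times as many neighbors as a wrong pair, which is the gap the algorithm exploits.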

Concentration
We assume $\frac{c \log n}{n} \le p \le \frac{1}{6}$ and $l, p_1, p_2 \in O(1)$.
Two cases:
- $n p p_1 p_2 l \ge 24 \log n$: a Chernoff bound is enough.
- $n p p_1 p_2 l \le 24 \log n$: we never make an error. With $x = (n-2)\,p^2 p_1 p_2$,
$$P\left[\sum_{i=1}^{n} B_i \le 2\right] = (1-x)^n + n x (1-x)^{n-1} + \binom{n}{2} x^2 (1-x)^{n-2} = 1 - n^3 x^3 - o(n^3 x^3)$$
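
The binomial sum on this slide is easy to evaluate numerically. The sketch below computes it exactly for $n$ independent Bernoulli($x$) indicators $B_i$ and compares the failure probability with its leading term $\binom{n}{3}x^3$, which is of order $n^3 x^3$ when $nx$ is small. The function names are hypothetical, and `n >= 2` is assumed.

```python
from math import comb

def prob_at_most_two(n: int, x: float) -> float:
    """P[B_1 + ... + B_n <= 2] for independent Bernoulli(x) indicators."""
    return sum(comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(3))

def prob_three_or_more(n: int, x: float) -> float:
    """Complementary probability: at least three indicators fire."""
    return 1.0 - prob_at_most_two(n, x)

if __name__ == "__main__":
    n, x = 100, 1e-4
    print(prob_three_or_more(n, x), comb(n, 3) * x**3)
```

For small $x$ the two printed values are close, which illustrates why three or more spurious witnesses are so unlikely in the sparse case.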

More realistic model
Preferential attachment:
- $G_m^1$ is a single node with $m$ self-loops.
- $G_m^n$ is obtained by adding to $G_m^{n-1}$ a new node and $m$ edges, whose endpoints are chosen with probability proportional to the current degrees.
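
A minimal Python sketch of this process follows. It makes simplifying assumptions that the slide does not specify: the new node's $m$ targets are all drawn from the degrees as they were before the node arrived, and new nodes create no self-loops. The function name `preferential_attachment` is hypothetical.

```python
import random

def preferential_attachment(n, m, seed=0):
    """Return the edge list of a preferential attachment multigraph on n nodes."""
    rng = random.Random(seed)
    # G_m^1: a single node (node 0) with m self-loops.
    edges = [(0, 0)] * m
    # Node v appears deg(v) times, so a uniform draw from this list is a
    # draw proportional to the current degrees.
    endpoints = [0] * (2 * m)
    for v in range(1, n):
        # Assumption: all m targets use the degrees from before v was added.
        targets = [rng.choice(endpoints) for _ in range(m)]
        for t in targets:
            edges.append((v, t))
            endpoints.extend((v, t))
    return edges
```

For example, `preferential_attachment(1000, 3)` yields 3000 edges, and counting endpoint occurrences gives the degree sequence that the structural lemmas describe.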

Preferential attachment
A bit harder:
- Several nodes have constant degree, so we need a cascade.
- The objective is to reconcile a constant fraction of the network.

Sketch of the proof
- For high-degree nodes we can use concentration results.
- Different nodes of intermediate degree do not share many neighbors.
- High-degree nodes help to detect intermediate-degree nodes, which in turn help to detect small-degree nodes.

PA structural lemmas
- High-degree nodes are early birds. Nodes inserted after time $\phi n$, for constant $\phi$, have degree in $o(\log^2 n)$.
- The rich get richer. For nodes of degree greater than $\log^2 n$, a constant fraction of their neighbors were inserted after time $\epsilon n$, for constant $\epsilon$.
- First-mover advantage. All nodes inserted before time $n^{0.3}$ have degree at least $\log^3 n$.

High-degree nodes are early birds
[figure: the process from $G_m^1$ to $G_m^n$]