Classifying Online Social Network Users Through the Social Graph. Cristina Pérez Solà and Jordi Herrera Joancomartí. Departament d'Enginyeria de la Informació i les Comunicacions, Universitat Autònoma de Barcelona. October 25th, 2012.

Outline: 1. Introduction; 2. Classifier proposal; 3. The experiments; 4. Conclusions and further work.

About the title: Classifying...
Definition: Classification is the problem of identifying to which of a set of categories a new observation belongs. The decision is made on the basis of a training set of data containing observations whose category membership is already known.
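As a toy illustration of this definition (not the classifier proposed later, which uses SVMs), the sketch below implements a nearest-centroid rule on hypothetical 2-D observations: it learns one centroid per category from the labeled training set and assigns a new observation to the category with the closest centroid. The data and category names are made up for the example.

```python
import math
from collections import defaultdict

def train_centroids(samples, labels):
    """Return {label: centroid}, averaging the labeled training samples."""
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    return {
        y: [sum(col) / len(xs) for col in zip(*xs)]
        for y, xs in groups.items()
    }

def classify(centroids, x):
    """Assign x to the category whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))

# Hypothetical training set: 2-D observations whose categories are known.
train_x = [[1.0, 1.0], [1.5, 0.5], [8.0, 9.0], [9.0, 8.5]]
train_y = ["A", "A", "B", "B"]
model = train_centroids(train_x, train_y)
print(classify(model, [2.0, 1.0]))  # A
print(classify(model, [7.5, 8.0]))  # B
```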

About the title: ...Online Social Network Users...

About the title: ...Through the Social Graph
Definition: A social graph is a graph where nodes represent users in a social network and edges represent relationships between these users.
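A minimal sketch of how a directed social graph can be held in memory, assuming edges are (follower, followee) pairs as in Twitter; the user names are hypothetical. Storing out-neighbor and in-neighbor sets per node gives outdegree and indegree directly.

```python
from collections import defaultdict

def build_graph(edges):
    """Build out- and in-adjacency sets from directed edges (u, v), meaning u -> v."""
    out_nbrs = defaultdict(set)
    in_nbrs = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        out_nbrs[u].add(v)
        in_nbrs[v].add(u)
    nodes = set(out_nbrs) | set(in_nbrs)
    return nodes, out_nbrs, in_nbrs

# Hypothetical follow relationships: "alice" follows "bob", and so on.
edges = [("alice", "bob"), ("carol", "bob"), ("bob", "alice"), ("carol", "alice")]
nodes, out_nbrs, in_nbrs = build_graph(edges)
print(len(in_nbrs["bob"]), len(out_nbrs["bob"]))  # 2 1
```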

What do we want to do?
Goals:
- Design a user (node) classifier that uses the graph structure alone (no semantic information is needed).
- Apply this classifier to label OSN users.
- Demonstrate that OSN user classification is possible with naively anonymized graphs (graphs whose user identifiers have been removed or replaced by pseudonyms while the structure is kept intact).

Why is it interesting?
Motivation: user classification as a privacy attack.
- User classification allows an attacker to infer (private) attributes of the user.
- Attributes may be sensitive by themselves.
- Attribute disclosure may have undesirable consequences for the user.
- In any case, the user is no longer able to control the disclosure of information about themselves.

2. Classifier proposal: architecture overview, classifier modules, specific design details.

Architecture overview: classifier architecture
The proposed classifier is implemented with a five-module architecture, which includes two different classifiers: an initial classifier and a relational classifier.
Data preprocessing → Initial classifier → Neighborhood analysis → Data preprocessing → Relational classifier → New class labels
Inputs shown in the diagram: class labels, and clustering coefficients and degrees.
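The data flow above can be sketched as a chain of functions. This is only an illustration under several assumptions not stated on the slides: "data preprocessing" is taken to be min-max scaling, the neighborhood analysis counts out-neighbors per category, a 1-nearest-neighbor rule (`one_nn`, hypothetical) stands in for the SVMs, and the number of relational iterations is arbitrary. Nodes with known labels keep them throughout.

```python
import math

def scale(samples):
    """Data preprocessing (assumed here): min-max scale every feature to [0, 1]."""
    nodes = list(samples)
    cols = list(zip(*(samples[n] for n in nodes)))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return {
        n: [(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(samples[n], lo, hi)]
        for n in nodes
    }

def neighbor_counts(edges, labels, categories):
    """Neighborhood analysis: per node, count out-neighbors in each category."""
    counts = {n: [0] * len(categories) for n in labels}
    for u, v in edges:  # directed edge u -> v
        counts[u][categories.index(labels[v])] += 1
    return counts

def run_pipeline(structural, edges, known, categories, fit_predict, iterations=3):
    """Chain the five modules; known (training) nodes keep their labels."""
    nodes = sorted(structural)
    unknown = [n for n in nodes if n not in known]

    def classify(samples):
        train_x = [samples[n] for n in known]
        train_y = [known[n] for n in known]
        predicted = fit_predict(train_x, train_y, [samples[n] for n in unknown])
        return {**known, **dict(zip(unknown, predicted))}

    # Data preprocessing + initial classifier on (degree, clustering coefficient).
    labels = classify(scale(structural))
    for _ in range(iterations):
        # Neighborhood analysis using the current label assignment.
        counts = neighbor_counts(edges, labels, categories)
        # Data preprocessing + relational classifier on the combined samples.
        combined = {n: list(structural[n]) + counts[n] for n in nodes}
        labels = classify(scale(combined))
    return labels

def one_nn(train_x, train_y, test_x):
    """Hypothetical stand-in for the SVMs: 1-nearest-neighbor prediction."""
    return [
        train_y[min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))]
        for x in test_x
    ]

# Hypothetical inputs: [degree, clustering coefficient] per node.
structural = {"a": [3, 0.3], "b": [2, 1.0], "c": [2, 1.0], "d": [1, 0.0], "e": [1, 0.0]}
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a"), ("e", "a")]
known = {"a": "company", "d": "individual"}
categories = ["individual", "company"]
labels = run_pipeline(structural, edges, known, categories, one_nn)
print(labels)
```

Passing the classifier in as a function keeps the module boundaries visible: swapping `one_nn` for an SVM changes only the two classification stages.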

Classifier modules: initial classifier
The initial classifier analyzes the graph structure and maps each node to a 2-dimensional sample (degree and clustering coefficient). The output is an initial assignment of nodes to categories.
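The two features of the initial classifier can be computed as below. This sketch assumes the clustering coefficient is taken on the undirected version of the graph (the slides do not say which directed variant is used): each node gets its degree and its local clustering coefficient, i.e. the fraction of pairs of its neighbors that are themselves connected. The example graph is hypothetical.

```python
from itertools import combinations

def undirected_adjacency(edges):
    """Symmetrize directed (u, v) edges into undirected neighbor sets."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def clustering_coefficient(adj, node):
    """Fraction of neighbor pairs of `node` that are linked to each other."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def initial_features(edges):
    """Map every node to a 2-D sample: (degree, clustering coefficient)."""
    adj = undirected_adjacency(edges)
    return {n: (len(adj[n]), clustering_coefficient(adj, n)) for n in adj}

# Hypothetical graph: a triangle a-b-c plus a node d attached to a.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
feats = initial_features(edges)
print(feats["a"])  # (3, 0.3333333333333333)
print(feats["b"])  # (2, 1.0)
```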

Classifier modules: neighborhood analysis
For every node, the neighborhood analysis module reports which kinds of nodes it is connected to, using the labels assigned by the initial classifier.
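One way to encode this report, assuming it is summarized as counts of in-neighbors and out-neighbors per category (the slides leave the exact encoding, e.g. counts versus proportions, open), is sketched below; the labels stand in for hypothetical outputs of the initial classifier.

```python
from collections import Counter

def neighborhood_profile(edges, labels, categories):
    """For each node, count in- and out-neighbors in each category.

    Returns {node: [in counts per category..., out counts per category...]}.
    """
    in_counts = {n: Counter() for n in labels}
    out_counts = {n: Counter() for n in labels}
    for u, v in edges:  # directed edge u -> v
        if u in labels and v in labels:
            out_counts[u][labels[v]] += 1
            in_counts[v][labels[u]] += 1
    return {
        n: [in_counts[n][c] for c in categories] + [out_counts[n][c] for c in categories]
        for n in labels
    }

# Hypothetical labels produced by the initial classifier.
labels = {"a": "individual", "b": "company", "c": "individual"}
edges = [("a", "b"), ("c", "b"), ("b", "a")]
profile = neighborhood_profile(edges, labels, ["individual", "company"])
print(profile["b"])  # [2, 0, 1, 0]
```

With two categories, separating in- and out-neighbors turns 2 neighborhood dimensions into 4.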

Classifier modules: relational classifier
The relational classifier maps users to n-dimensional samples, using both the degree and clustering coefficient and the neighborhood information to classify users. The output is a new assignment of nodes to categories, which may differ from the initial classification.
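A plausible construction of these n-dimensional samples, assumed here rather than taken from the slides, is a plain concatenation of the structural features with the neighborhood profile, so that n = 2 + (number of neighborhood features). The feature values below are hypothetical.

```python
def relational_samples(structural, profile):
    """Concatenate (degree, clustering coefficient) with each node's neighborhood profile."""
    return {n: list(structural[n]) + list(profile[n]) for n in structural}

# Hypothetical inputs: structural features and in/out neighbor counts per category.
structural = {"a": (3, 0.5), "b": (2, 1.0)}
profile = {"a": [0, 1, 0, 1], "b": [2, 0, 1, 0]}
samples = relational_samples(structural, profile)
print(samples["b"])       # [2, 1.0, 2, 0, 1, 0]
print(len(samples["a"]))  # 6
```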

Specific design details: some details about the classifier
- The graph is directed, so we distinguish between indegree and outdegree (instead of having just degree). This distinction increases the number of dimensions in the neighborhood analysis by 2.
- The number of categories is not limited: each additional category simply adds more dimensions.
- Both classifiers are instantiated as soft-margin Support Vector Machines (SVMs).
- The relational classifier is applied iteratively.
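To make "soft-margin SVM" concrete, here is a minimal binary linear SVM trained with Pegasos-style stochastic subgradient descent on the regularized hinge loss. This is a swapped-in training method chosen for a short, dependency-free sketch; the slides do not say which SVM solver or kernel the authors used, and a multiclass problem would need an extension such as one-vs-rest. The data points are hypothetical.

```python
import random

def train_linear_svm(xs, ys, lam=0.1, epochs=200, seed=0):
    """Soft-margin linear SVM via Pegasos-style stochastic subgradient descent.

    xs: feature vectors; ys: labels in {-1, +1}.
    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)). A constant 1.0
    feature is appended so the last weight acts as a (regularized) bias.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink the weights: gradient step on the regularizer.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                # Hinge loss is active (margin violated): move toward y * x.
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

xs = [[2, 2], [2, 1], [1, 2], [-2, -2], [-2, -1], [-1, -2]]
ys = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(xs, ys)
print(predict(w, [3, 3]))    # 1
print(predict(w, [-3, -3]))  # -1
```

The regularization weight `lam` plays the role of the soft-margin trade-off: larger values tolerate more margin violations in exchange for a smaller weight vector.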

3. The experiments: experiment design, experiment results.

Experiment design: the main goal
Research question: Is an attacker able to recover attributes of OSN users knowing just the social graph structure and the attributes of a small subset of the nodes in the graph?
This is a within-network classification problem: nodes whose labels are unknown are linked to nodes whose labels are known.

Experiment design: data used in the experiments
- We collected data from 936,423 Twitter users, comprising all the neighbors of a subset of 300 nodes.
- We constructed two disjoint graphs, G1 = (V1, E1) and G2 = (V2, E2), with these users and their relationships.
- We labeled the nodes of the graphs to obtain the ground truth for a binary classification problem (individual or company) and a multiclass classification problem (normal user, blogger, celebrity, media or organization).

Experiment design: an experiment
Each experiment consisted of:
- Randomly selecting a subset of nodes (V_train) to be used as training samples: 65%, 50%, 35% or 20% of the nodes.
- Training the classifiers with those samples.
- Classifying the rest of the nodes (V_test = V \ V_train).
- Evaluating the overall performance against the ground truth.
We performed 100 experiments for each training set size and for both classification problems.
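The protocol above can be sketched as a repeated random split. The classifier is passed in as a function; `majority_baseline` is a hypothetical stand-in (it predicts the most common training label), not the paper's classifier, and the ground-truth labels in the usage example are made up.

```python
import random
from collections import Counter

def correct_rate(predicted, truth):
    """Fraction of test nodes whose predicted label matches the ground truth."""
    hits = sum(1 for n, y in predicted.items() if truth[n] == y)
    return hits / len(predicted)

def run_experiments(truth, train_fraction, fit_predict, repeats=100, seed=0):
    """Repeat: random train/test split, train, classify the rest, score it.

    truth: {node: label}; fit_predict(train_labels, test_nodes) -> {node: label}.
    Returns the mean correct rate over `repeats` random splits.
    """
    rng = random.Random(seed)
    nodes = sorted(truth)
    n_train = round(train_fraction * len(nodes))
    rates = []
    for _ in range(repeats):
        train = set(rng.sample(nodes, n_train))       # V_train
        test = [n for n in nodes if n not in train]   # V_test = V \ V_train
        predicted = fit_predict({n: truth[n] for n in train}, test)
        rates.append(correct_rate(predicted, truth))
    return sum(rates) / len(rates)

def majority_baseline(train_labels, test_nodes):
    """Hypothetical stand-in classifier: predict the most common training label."""
    top = Counter(train_labels.values()).most_common(1)[0][0]
    return {n: top for n in test_nodes}

# Hypothetical ground truth: 40 users, one in four labeled "company".
truth = {f"user{i}": "individual" if i % 4 else "company" for i in range(40)}
for frac in (0.65, 0.50, 0.35, 0.20):
    print(frac, run_experiments(truth, frac, majority_baseline))
```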

Experiment results: binary classification results
[Figure: correct rate vs. iteration (0 to 10), one curve per dataset (D1, D2) and training set size (65%, 50%, 35%, 20%).]

Experiment results: multiclass classification results
[Figure: correct rate vs. iteration (0 to 10) for "Cat a" with 65%, 50%, 35% and 20% training sets.]

4. Conclusions and further work

Conclusions
- Information found in the social graph is enough to perform classification.
- It is possible to classify OSN users using a naively anonymized copy of a social graph.
- Naive anonymization does not protect OSN users from attribute disclosure.
- The success rate varies depending on the training set size.

Further work
- Integrate both structural and semantic information to improve classification.
- Study the impact of graph anonymization techniques other than naive anonymization on classification.
- Analyze the performance of other classification techniques for relational data.

Backup slide: Linear SVM

Backup slide: Non-linear SVM
