R-SUSCEPTIBILITY: An IR-Centric Approach to Assessing Privacy Risks for Users in Online Communities
Joanna Asia Biega, Krishna P. Gummadi, Ida Mele, Dragan Milchevski, Christos Tryfonopoulos, Gerhard Weikum
SIGIR 2016
APPROACHING PRIVACY
Data publishing | Online communities
Example records (gender, age, disease): user1 male 37 cancer heart user2 male 37 female disease 20-30 user3 female 42 cancer
Prevent deanonymization, Account linking
Prevent attribute disclosure | Attribute inference
(not in this work)
PRIVACY IN ONLINE COMMUNITIES WITH TEXTUAL DATA
Build reputation: 13.07.2011, user1: "Studies show alarming depression rates among teenagers."
Get information: 17.05.2011, user2: "Should I inform my potential employer during an interview that I am 3 months pregnant?"
Share information: 13.07.2011, user3: "On a cocktail of antidepressants and getting crazy hallucinations :o"
Not obvious how to apply noise
Quantify, inform, and guide
IN THE (IR) WILD
Search: drunk wine party
HR
Results: user_1, user_2, user_3, user_4
Matching post: "Great student party #sigir2016. Shouldn’t have drunk that much #wine. #drunk ;)"
IN THE (IR) WILD - MORE EXAMPLES
Search:
- drunk wine wasted party
- bungee jump adrenaline
- depressed anxiety antidepressant
HR: Remote search / Local crawl
Results: U_1, U_2, …, U_k
IN THE WILD
PRIVACY RISKS VIA EXPOSURE IN A COMMUNITY
Criterion: <topic>
Users: U_1, U_2, …, U_k
R-SUSCEPTIBILITY
Criterion: <topic>
Rank-Susceptibility:
1. U_1
2. U_2
…
k. U_k
R-SUSCEPTIBILITY: FRAMEWORK FOR TEXTUAL DATA
(1) Topics:
  drug addiction: drug, addiction, addict, cocaine, …
  financial debts: debt, loan, pay, student, …
  depression: depression, suicide, depressed, suffer, …
(2) Sensitivity:
  drug addiction: high
  financial debts: high
  depression: medium
(3) Risk Scores (R-Susceptibility ranking per topic):
  drug addiction: U_1, U_2, …, U_k
  financial debts: U_7, U_13, …, U_14
  depression: U_78, U_1, …, U_k
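The last step of the framework, turning per-user risk scores for one topic into an R-Susceptibility ranking, could be sketched as follows. This is only an illustration: the function name `rank_susceptibility`, the tie-breaking rule, and the score values are assumptions, not taken from the slides.

```python
from typing import Dict, List, Tuple


def rank_susceptibility(risk_scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k users with the highest risk score for one topic."""
    # Sort by descending score; ties are broken by user id (an arbitrary choice).
    ranked = sorted(risk_scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical risk scores for the topic "depression" (values invented for illustration).
depression_scores = {"U_1": 0.74, "U_2": 0.12, "U_14": 0.05, "U_78": 0.91}
top_users = rank_susceptibility(depression_scores, k=2)
print(top_users)
```

Any of the risk measures discussed later can be plugged in as the source of `risk_scores`; only the relative order of users matters for the ranking.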
OVERVIEW
➤ R-Susceptibility framework
➤ Topics
➤ Topic sensitivity
➤ Risk measures
➤ Baselines
➤ Topic-model-based
➤ Experiments
➤ Summary
R-SUSCEPTIBILITY: TOPICS
Topics:
  drug addiction: drug, addiction, addict, cocaine, …
  financial debts: debt, loan, pay, student, …
  depression: depression, suicide, depressed, suffer, …
LDA topic models:
  Quora: 500 topics, 600k posts
  NYT: 500 topics, 700k articles
R-SUSCEPTIBILITY: TOPIC SENSITIVITY
Topics:
  drug addiction: drug, addiction, addict, cocaine, …
  financial debts: debt, loan, pay, student, …
  depression: depression, suicide, depressed, suffer, …
Sensitivity?
  drug addiction: high
  financial debts: high
  depression: medium
IDENTIFYING SENSITIVE TOPICS
Topics:
  drug addiction: drug, addiction, addict, cocaine, …
  financial debts: debt, loan, pay, student, …
  depression: depression, suicide, depressed, suffer, …
Question: "If a user’s post in an online community contained these words, would you consider it privacy sensitive?"
Judgements (2 topic models * 500 topics * 7 judgements per topic):
  drug addiction: yes, yes, yes
  financial debts: yes, no, yes
  depression: no, no, no
Sensitivity: # yes / #
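A minimal sketch of the sensitivity score, assuming the denominator is the total number of judgements collected for the topic (the slide only shows the "# yes" numerator). The function name and the example votes are hypothetical.

```python
from typing import Sequence


def topic_sensitivity(judgements: Sequence[bool]) -> float:
    """Fraction of 'yes' answers among all judgements for one topic."""
    if not judgements:
        raise ValueError("at least one judgement is required")
    # True counts as 1 and False as 0, so the sum is the number of 'yes' votes.
    return sum(judgements) / len(judgements)


# Seven invented yes/no judgements for a single topic.
votes = [True, True, False, True, False, True, True]
score = topic_sensitivity(votes)
print(round(score, 3))
```

A threshold on this fraction could then separate sensitive from non-sensitive topics; where to set it is left open here.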
ENTROPY BASELINE
Salient attributes for topic X: X1 = depression, X2 = anxiety, X3 = psychiatrist, X4 = paxil
Each attribute has a distribution P(0), P(1), estimated on:
  U: community without user U
  U∗: community with user U
Risk: average KL-divergence of salient words distributions
$\mathrm{risk}(U_0, X) = \frac{1}{j} \sum_{i} \sum_{v \in \{0,1\}} P_{U^*}[x_i = v] \, \log \frac{P_U[x_i = v]}{P_{U^*}[x_i = v]}$
(sum over i: over attributes; sum over v: over values)
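The entropy baseline might be sketched as below. Several things are assumptions made for this illustration: each salient word is treated as a binary attribute (present or absent in a user's posts), probabilities are estimated as smoothed fractions of users, and the divergence is written in the conventional non-negative orientation, log(p_with / p_without). The helper names and the toy community are hypothetical.

```python
import math
from typing import List, Sequence, Set


def presence_prob(users: Sequence[Set[str]], word: str, alpha: float = 0.5) -> float:
    """Smoothed P[x_i = 1]: share of users whose posts contain `word`."""
    hits = sum(1 for vocabulary in users if word in vocabulary)
    # Additive smoothing keeps the estimate strictly between 0 and 1.
    return (hits + alpha) / (len(users) + 2 * alpha)


def entropy_risk(community: List[Set[str]], user: Set[str], salient: Sequence[str]) -> float:
    """Average KL divergence (in bits) over the salient words of a topic."""
    with_user = community + [user]
    total = 0.0
    for word in salient:
        p1 = presence_prob(with_user, word)  # community with the user
        q1 = presence_prob(community, word)  # community without the user
        for p, q in ((p1, q1), (1.0 - p1, 1.0 - q1)):  # values v = 1 and v = 0
            total += p * math.log2(p / q)
    return total / len(salient)


# Toy community: one word set per user (invented data).
community = [{"dog"}, {"depression"}, {"party"}, {"wine"}]
salient = ["depression", "anxiety", "psychiatrist", "paxil"]
risk_heavy = entropy_risk(community, set(salient), salient)
risk_none = entropy_risk(community, {"dog", "party"}, salient)
print(risk_heavy > risk_none)
```

The user whose posts contain all salient words shifts the community's word distributions more than the user who mentions none of them, and therefore receives the larger score.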
DIFF-PRIV BASELINE
Salient attributes for topic X: X1 = depression, X2 = anxiety, X3 = psychiatrist, X4 = paxil
Each attribute has a distribution P(0), P(1), estimated on:
  U: community without user U
  U∗: community with user U
Inspired by the differential privacy principle:
$P_U[x_i] \le 2^{\epsilon} P_{U^*}[x_i]$ and $P_{U^*}[x_i] \le 2^{\epsilon} P_U[x_i]$
$\mathrm{risk}(U_0, X) = \max_{x_i} \max\left( \log \frac{P_U[x_i]}{P_{U^*}[x_i]},\ \log \frac{P_{U^*}[x_i]}{P_U[x_i]} \right)$
(outer max: over attributes; inner max: probability increases or decreases)
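A possible sketch of the diff-priv baseline: for each salient word it takes the larger of the two log-ratios (which equals the absolute log-ratio) and returns the maximum over words. As assumptions for this illustration, P[x_i] is read as the smoothed probability that a user's posts contain the word, and logarithms are base 2; the helper names and data are hypothetical.

```python
import math
from typing import List, Sequence, Set


def presence_prob(users: Sequence[Set[str]], word: str, alpha: float = 0.5) -> float:
    """Smoothed share of users whose posts contain `word`."""
    hits = sum(1 for vocabulary in users if word in vocabulary)
    return (hits + alpha) / (len(users) + 2 * alpha)


def diff_priv_risk(community: List[Set[str]], user: Set[str], salient: Sequence[str]) -> float:
    """Largest absolute log2-ratio between probabilities with and without the user."""
    with_user = community + [user]
    worst = 0.0
    for word in salient:
        p_without = presence_prob(community, word)
        p_with = presence_prob(with_user, word)
        # abs(log(a / b)) equals max(log(a / b), log(b / a)).
        worst = max(worst, abs(math.log2(p_without / p_with)))
    return worst


# Toy community: one word set per user (invented data).
community = [{"dog"}, {"depression"}, {"party"}, {"wine"}]
salient = ["depression", "anxiety", "psychiatrist", "paxil"]
risk_heavy = diff_priv_risk(community, set(salient), salient)
risk_none = diff_priv_risk(community, {"dog", "party"}, salient)
print(risk_heavy > risk_none)
```

Under these assumptions the returned value is the smallest exponent for which both inequalities on the slide hold for every salient word.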
OVERVIEW
➤ R-Susceptibility framework
➤ Topics
➤ Topic sensitivity
➤ Risk measures
➤ Baselines
➤ Topic-model-based
➤ Strength of interest
➤ Breadth of interest
➤ Temporal variation of interest
➤ Experiments
➤ Summary
Strength, breadth, and temporal variation of interest: which aspects matter when it comes to human risk perception?
TOPIC-MODEL RISK SCORE: BUILDING BLOCKS
Quantifying user interest in a topic
[Figure: R-Susceptibility topic model over the words antidepressant, depression, psychiatrist, oscar, celebrity; labels: X = Depression, U = User, X_U]
Details in the paper
TOPIC-MODEL RISK SCORE: STRENGTH OF INTEREST
Two example search histories:
Date        History 1                History 2
24.10.2012  misbehaving dog          anxiety
24.10.2012  dog trainers             feeling lonely
27.10.2012  dentists LA              psychiatrist nyc
29.10.2012  knitting tutorial        central park events
03.12.2012  christmas tree shop LA   antidepressants
10.12.2012  christmas recipes        xanax side effects
TOPIC-MODEL RISK SCORE: STRENGTH OF INTEREST
Three dimensions of user interest: Strength of interest
[Figure: topic words depression, xanax, psychiatrist; axis: ranking position]
$\mathrm{risk}(U, X) = \cos(\vec{U}, \vec{X})$
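The cosine score can be illustrated on two search histories like those shown for strength of interest. This is a sketch under simplifying assumptions: users are represented as plain bag-of-words vectors and the topic as a small word-weight vector with invented weights, whereas the slides build both vectors in the R-Susceptibility topic model (details in the paper). All names below are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Mapping


def bag_of_words(queries: Iterable[str]) -> Dict[str, float]:
    """Word counts over all queries of one user."""
    counts = Counter(word for query in queries for word in query.lower().split())
    return {word: float(n) for word, n in counts.items()}


def cosine(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Topic vector X with invented weights for the words shown on the slide.
topic_x = {"depression": 0.4, "xanax": 0.3, "psychiatrist": 0.3}
history_1 = ["misbehaving dog", "dog trainers", "dentists LA",
             "knitting tutorial", "christmas tree shop LA", "christmas recipes"]
history_2 = ["anxiety", "feeling lonely", "psychiatrist nyc",
             "central park events", "antidepressants", "xanax side effects"]
risk_1 = cosine(bag_of_words(history_1), topic_x)
risk_2 = cosine(bag_of_words(history_2), topic_x)
print(risk_1, risk_2 > risk_1)
```

The first history shares no words with the topic and scores zero, while the second overlaps on "psychiatrist" and "xanax" and scores higher.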
TOPIC-MODEL RISK SCORE: BREADTH OF INTEREST
Two example search histories:
Date        History 1             History 2
24.10.2012  anxiety               anxiety
24.10.2012  feeling lonely        clinical depression
03.11.2012  psychiatrist nyc      anatomy course book
07.11.2012  central park events   central park events
03.12.2012  antidepressants       liver cancer stats
10.12.2012  xanax side effects    anorexia nervosa
TOPIC-MODEL RISK SCORE: BUILDING BLOCKS REVISITED
Quantifying user interest in a topic
[Figure: R-Susceptibility topic model over the words antidepressant, depression, psychiatrist, oscar, celebrity; labels: D = Psychiatry, X = Depression, U = User, X_U]
Details in the paper
TOPIC-MODEL RISK SCORE: BREADTH OF INTEREST
Three dimensions of user interest: Strength of interest, Breadth of interest
[Figure: strength axis with topic words depression, xanax, psychiatrist (ranking position); breadth axis with depression, research career, psychiatry]
$\mathrm{risk}(U, X, D) = \cos(\vec{U}, \vec{X}) - \cos(\vec{U}, \vec{D} - \vec{X})$
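The breadth-adjusted score subtracts the user's similarity to the rest of the domain D from the similarity to the topic X. The sketch below assumes dense vectors over four invented dimensions and takes D − X componentwise; how the slides actually construct the vectors is described in the paper, so the values and names here are purely illustrative.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def breadth_adjusted_risk(user: Sequence[float], topic: Sequence[float],
                          domain: Sequence[float]) -> float:
    """cos(U, X) - cos(U, D - X), with D - X taken componentwise."""
    rest_of_domain = [d - x for d, x in zip(domain, topic)]
    return cosine(user, topic) - cosine(user, rest_of_domain)


# Invented dimensions: [depression, anorexia, cancer, unrelated].
topic_x = [1.0, 0.0, 0.0, 0.0]    # X = Depression
domain_d = [1.0, 1.0, 1.0, 0.0]   # D = Psychiatry (hypothetical composition)
focused_user = [3.0, 0.0, 0.0, 1.0]  # interest concentrated on the topic
broad_user = [1.0, 1.0, 1.0, 1.0]    # interest spread across the domain
focused_risk = breadth_adjusted_risk(focused_user, topic_x, domain_d)
broad_risk = breadth_adjusted_risk(broad_user, topic_x, domain_d)
print(focused_risk > broad_risk)
```

A user who touches the topic only as part of a broad interest in the whole domain is pushed down the ranking, while a user focused on the topic alone keeps a high score.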
TOPIC-MODEL RISK SCORE: TEMPORAL VARIATION OF INTEREST
Two example search histories with the same queries but different dates:
Query                History 1    History 2
anxiety              24.10.2012   24.10.2012
feeling lonely       24.10.2012   24.10.2012
psychiatrist nyc     03.11.2012   24.10.2012
central park events  07.11.2012   24.10.2012
antidepressants      03.12.2012   24.10.2012
xanax side effects   10.12.2012   24.10.2012
TOPIC-MODEL RISK SCORE: TEMPORAL VARIATION OF INTEREST
[Figure: timeline; axis: time]
$U = \{ (v_1, t_1), \ldots, (v_k, t_k) \}$