Fairness Constraints for Graph Embeddings*
William L. Hamilton
Assistant Professor at McGill University and Mila
Canada CIFAR Chair in AI
Visiting Researcher at Facebook AI Research
*Joint work with my PhD student Joey Bose, to appear in ICML 2019
Graph embeddings
Application: Node classification
[Figure: a graph in which some nodes have unknown labels ("?") that a machine learning model must classify]
Application: Link prediction
[Figure: a graph with candidate edges ("?") whose existence a machine learning model must predict]
Becoming ubiquitous in social applications
§ Graph embedding techniques are a powerful approach for social recommendations, bot detection, content screening, behavior prediction, and geo-localization.
§ Used at, e.g., Facebook, Huawei, Uber Eats, Pinterest, LinkedIn, and WeChat.
§ Classic collaborative filtering approaches can be re-interpreted in a more general graph embedding framework.
But what about fairness and privacy?
§ Graph embeddings are designed to capture everything that might ever be useful for the objective.
§ Even if we don't provide the model information about sensitive attributes (e.g., gender or age), the model will use this information.
§ What if a user doesn't want this information used?
Fairness from a pragmatic perspective
§ Strict privacy and discrimination concerns are one motivation.
§ But what if users just don't want their recommendations to depend on certain attributes?
§ What if users want the system to "ignore" parts of their demographics or past behavior?
Fairness in graph embeddings
§ Basic idea: How can we learn node embeddings that are invariant to particular sensitive attributes?
§ Challenges:
§ Graph data is not i.i.d.
§ There is not just one classification task that we are trying to enforce fairness on.
§ There are often many possible sensitive attributes.
Our work: Fairness in graph embeddings
Preliminaries and set-up
§ Learning an encoder function to map nodes to embeddings: z_u = ENC(u)
§ Using these embeddings to "score" the likelihood of a relationship between nodes: the score of a (possible) edge e = <u, r, v> is a function s(e) of the two node embeddings and the relation type.
§ Goal: Train the embeddings (with a subset of the true edges) so that the score for all real edges is larger than for all non-edges.
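This setup can be sketched in a few lines. A minimal, illustrative version, assuming the encoder is just a trainable embedding lookup table and the scorer is a dot product (all names and sizes here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim = 100, 16

# The "encoder" is a trainable lookup table: one d-dimensional
# embedding row per node ID.
Z = rng.normal(size=(num_nodes, dim))

def enc(u):
    """Encoder: map a node ID to its embedding vector z_u."""
    return Z[u]

def score(u, v):
    """Score a candidate edge; a higher score means the model
    considers the edge more likely to exist."""
    return float(enc(u) @ enc(v))
```

Training then adjusts `Z` so that `score(u, v)` is high for observed edges and low for sampled non-edges.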
Preliminaries and set-up
§ Generic loss function:

  sum_{e in E_train} L_edge(s(e), s(e^-_1), ..., s(e^-_m))

§ The sum is over a (batch of) training edges; L_edge is a task-specific loss function; s(e) is the score assigned to the positive/real edge, and s(e^-_1), ..., s(e^-_m) are the scores assigned to random negative sample edges.
Preliminaries and set-up: Concrete examples
§ Score functions:
§ Dot-product: s(e) = s(<z_u, r, z_v>) = z_u^T z_v
§ TransE: s(e) = s(<z_u, r, z_v>) = -||z_u + r - z_v||_2^2
§ Loss functions:
§ Max-margin: L_edge(s(e), s(e^-_1), ..., s(e^-_m)) = sum_{i=1}^m max(1 - s(e) + s(e^-_i), 0)
§ Cross-entropy: L_edge(s(e), s(e^-_1), ..., s(e^-_m)) = -log(σ(s(e))) - sum_{i=1}^m log(1 - σ(s(e^-_i)))
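The concrete score and loss functions above can be sketched directly. A minimal numpy version (function names are illustrative; `z_u`, `z_v` are node embeddings and `r` is a relation embedding, used only by TransE):

```python
import numpy as np

def dot_score(z_u, r, z_v):
    # Plain dot-product scorer; the relation embedding is ignored.
    return float(z_u @ z_v)

def transe_score(z_u, r, z_v):
    # TransE: negative squared distance between z_u + r and z_v,
    # so a perfect "translation" z_u + r = z_v scores 0 (the maximum).
    return -float(np.sum((z_u + r - z_v) ** 2))

def max_margin_loss(pos_score, neg_scores, margin=1.0):
    # Hinge loss: push the positive edge's score above each negative
    # sample's score by at least the margin.
    return sum(max(margin - pos_score + s_neg, 0.0) for s_neg in neg_scores)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_entropy_loss(pos_score, neg_scores):
    # Treat edge scoring as binary classification through a sigmoid:
    # positive edges labeled 1, negative samples labeled 0.
    return float(-np.log(sigmoid(pos_score))
                 - sum(np.log(1.0 - sigmoid(s)) for s in neg_scores))
```

Note that the max-margin loss only cares about the gap between positive and negative scores, while the cross-entropy loss pushes the scores themselves toward +inf / -inf.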
Formalizing fairness
§ How do we ensure fairness in this context?
§ Solution: representational invariance
§ We want the embeddings z_u to be independent of the sensitive attributes a_u, which is equivalent to minimizing the mutual information I(z_u; a_u) between the embeddings and the attributes.
Enforcing fairness through an adversary
Enforcing fairness through an adversary
§ Key component 1: Compositional encoder
§ Given a set of attributes, it outputs "filtered" embeddings that should be invariant to those attributes.
§ Input: a node ID and a set of sensitive attributes. For each attribute k, a trainable filter function (a neural network) outputs an embedding that is invariant to attribute k, and the output combines the filters by summing over all sensitive attributes in the set.
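A hedged sketch of the compositional encoder idea, assuming one linear "filter" per sensitive attribute and averaging as the way to combine filters for a set of attributes (the real model uses trained neural networks, and the exact combination rule here is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)
dim, num_attrs = 16, 3

# One filter matrix per sensitive attribute; a real system would use
# trained neural networks in place of these random linear maps.
filters = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(num_attrs)]

def apply_filter(z, k):
    """Filter f_k: produce an embedding meant to be invariant to
    sensitive attribute k."""
    return filters[k] @ z

def compositional_encode(z, sensitive_set):
    """Filtered embedding for a *set* of sensitive attributes,
    combining the per-attribute filters by averaging."""
    if not sensitive_set:
        return z  # nothing to filter: return the raw embedding
    return np.mean([apply_filter(z, k) for k in sensitive_set], axis=0)
```

Because filters are trained per attribute but applied as a set, the model can handle combinations of attributes it never saw jointly during training, which is the compositionality tested in the experiments below.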
Enforcing fairness through an adversary
§ Key component 2: Adversarial discriminators
§ For each sensitive attribute k, train an adversarial discriminator that tries to predict that sensitive attribute from the filtered embeddings.
§ Input: the filtered embedding for node u and an attribute value. Output: the likelihood that node u has that attribute value.
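A minimal sketch of one such discriminator, assuming a simple linear-plus-softmax classifier over the attribute's possible values (sizes and weights are illustrative; a real discriminator would be a trained neural network):

```python
import numpy as np

rng = np.random.default_rng(2)
dim, num_values = 16, 4  # embedding size, number of attribute values

# Discriminator weights for one sensitive attribute k.
W_disc = rng.normal(scale=0.1, size=(num_values, dim))

def softmax(x):
    e = np.exp(x - np.max(x))  # shift for numerical stability
    return e / e.sum()

def discriminator(z_filtered):
    """Predict a probability distribution over the attribute's
    possible values from the filtered embedding."""
    return softmax(W_disc @ z_filtered)
```

If filtering works, this discriminator should do no better than chance, i.e., its output distribution should carry no information about the true attribute value.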
Enforcing fairness through an adversary
§ Putting it all together in an adversarial loss:
§ The loss combines the original loss function for the edge prediction task with the likelihood of the discriminators predicting the sensitive attributes, weighted by a constant that determines the strength of the fairness constraints.
Enforcing fairness through an adversary
§ Putting it all together in an adversarial loss:
§ During training, the encoder tries to minimize this loss while the adversarial discriminators are trained to maximize it.
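The combined objective can be sketched as a single function, assuming the edge loss, the discriminators' log-likelihoods of the true sensitive attributes, and the fairness weight have already been computed (all names here are illustrative):

```python
def adversarial_loss(edge_loss, disc_logliks, lam):
    """Adversarial objective sketch.

    edge_loss:    task loss on a batch of training edges
    disc_logliks: for each sensitive attribute k, the discriminator's
                  log-likelihood of the true attribute value given the
                  filtered embedding
    lam:          constant controlling the strength of the fairness
                  constraints
    """
    return edge_loss + lam * sum(disc_logliks)
```

In practice this is optimized by alternating gradient steps: the encoder's parameters take a step to decrease this quantity (making the attributes hard to recover), then the discriminators' parameters take a step to increase their log-likelihood terms.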
Dataset 1: MovieLens-1M
§ Classic recommender system benchmark.
§ Bipartite graph between users and movies.
§ Nodes (~10,000): Users and movies
§ Edges (~1,000,000): The rating a user gives a movie
§ Sensitive attributes:
§ Gender
§ Age (binned to become a categorical attribute)
§ Occupation
Dataset 2: Reddit
§ Derived from public Reddit comments.
§ Bipartite graph between users and communities.
§ Nodes (~300,000): Users and communities
§ Edges (~7,000,000): Whether a user commented in a community
§ Sensitive attributes: Randomly select 50 communities to be "sensitive" communities
Dataset 3: Freebase 15k-237
§ Derived from a classic knowledge base completion benchmark.
§ Knowledge graph between a set of typed entities.
§ Nodes (~15,000): Entities
§ Edges (~150,000): 237 different relation types (e.g., married_to, born_in, capital_of, director_of)
§ Sensitive attributes: Randomly selected 3 entity type annotations (e.g., is_actor) to be "sensitive attributes"
Experiments: Three questions
1. What is the cost of invariance?
2. What is the impact of compositionality?
3. Can we generalize to unseen combinations of attributes?
MovieLens: Fairness results
§ How strongly can we enforce fairness?
§ Compare three approaches to enforcing fairness:
§ No adversary (i.e., just train on the recommendation task)
§ Independent adversarial model for each attribute
§ Full compositional model