WIKIGENDER: A MACHINE LEARNING MODEL TO DETECT GENDER BIAS IN WIKIPEDIA
Natalie Bolón, Natàlia Gullón, Sofia Kypraiou, Irene Petlacalco
WIKIGENDER: METHODOLOGY
● Dataset: overviews from Wikipedia biographies; only 17% of them refer to women.
● Encoding: each overview is mapped to a binary vector with one entry per word. An entry is 1 if word k ∈ overview and 0 if word n ∉ overview. Words in the example are labelled as noun, adjective, or stop word.
  “Ada was an English mathematician and writer” → (0, 0, 1, 0, 0, …, 0, 1)
● Balancing dataset by occupation: for each occupation we use the same number of male and female entries.
● Model: logistic regression with a binary target variable (0 / 1). Stages shown in the diagram: balanced dataset, feature extraction, train/test partition, logistic regression, prediction.
https://wiki-gender.github.io
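The encoding and balancing steps above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the stop-word list, the `occupation` and `gender` field names, and the choice to drop stop words before encoding are assumptions, and the part-of-speech filtering (nouns, adjectives) shown on the slide is left out.

```python
import random
from collections import defaultdict

# Toy stop-word list (assumption; the slides do not name the list used).
STOP_WORDS = {"was", "an", "and", "a", "the", "of"}

def encode(overview, vocabulary):
    """Binary bag-of-words: entry k is 1 if vocabulary[k] occurs in the overview."""
    tokens = {t.strip(".,;:!?\"'").lower() for t in overview.split()}
    tokens -= STOP_WORDS
    return [1 if word in tokens else 0 for word in vocabulary]

def balance_by_occupation(entries, seed=0):
    """Keep the same number of entries of each gender within every occupation.

    Each entry is a dict with hypothetical keys "occupation" and "gender"
    (0 or 1; which value denotes which gender is not stated on the slides).
    """
    rng = random.Random(seed)
    groups = defaultdict(lambda: {0: [], 1: []})
    for entry in entries:
        groups[entry["occupation"]][entry["gender"]].append(entry)
    balanced = []
    for by_gender in groups.values():
        # Downsample the larger group to the size of the smaller one.
        n = min(len(by_gender[0]), len(by_gender[1]))
        balanced += rng.sample(by_gender[0], n) + rng.sample(by_gender[1], n)
    return balanced

vocabulary = ["english", "footballer", "mathematician", "writer"]
vector = encode("Ada was an English mathematician and writer", vocabulary)
```

Downsampling the majority group per occupation is one simple way to obtain equal counts; occupations with entries of only one gender contribute nothing to the balanced set.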
WIKIGENDER: RESULTS
● Bias in Adjectives. Accuracy: 54.6±0.001%. Top 5 most predictive adjectives (women, men).
● Bias in Nouns + Adjectives. Accuracy: 62.9±0.002%. Top 5 most predictive words (women, men).
● Words listed: person, football, beautiful, offensive, marriage, musician, profit, certain, model, officer, cross, hard, creative, defensive, dancer, war, romantic, diplomatic, midfielder, footballer.
● Word categories: Positive and Negative; Family; Career; strongly and weakly subjective.
https://wiki-gender.github.io
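One common way to obtain "most predictive" words from a logistic regression is to rank the vocabulary by learned weight: the most negative weights push a prediction toward class 0, the most positive toward class 1. The sketch below assumes that approach (the slides do not state how the lists were produced) and uses a small plain-Python gradient-descent trainer with neutral placeholder words and toy data; `train_logistic_regression` and `top_predictive` are hypothetical helper names.

```python
import math

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def top_predictive(vocabulary, weights, k=5):
    """Return (words pushing toward class 0, words pushing toward class 1)."""
    ranked = sorted(zip(vocabulary, weights), key=lambda pair: pair[1])
    toward_0 = [word for word, _ in ranked[:k]]
    toward_1 = [word for word, _ in ranked[::-1][:k]]
    return toward_0, toward_1

# Toy data: "alpha" appears only in class 0, "beta" only in class 1,
# "gamma" equally in both.
vocabulary = ["alpha", "beta", "gamma"]
X = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
y = [0, 0, 1, 1]
weights, bias = train_logistic_regression(X, y)
toward_0, toward_1 = top_predictive(vocabulary, weights, k=1)
```

A word that occurs equally often in both classes, such as "gamma" here, ends up with a weight near zero and so does not appear in either list.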