CLASSIEFIER: USING MACHINE LEARNING TO PAINT A PICTURE OF SOCIAL TRENDS

  1. CLASSIEfier: Using Machine Learning to Paint a Picture of Social Trends. Dr. Paola Oliva-Altamirano, Innovation Lab, Our Community, May 2019

  2. Who am I? A foreigner: from Honduras to the US to Australia. From galaxies to taxonomies. Our Community - Innovation Lab

  3. Outline:
     • Introducing Our Community's data initiatives
     • Background: CLASSIE, a social dictionary
     • How did we scope CLASSIEfier?
     • How did CLASSIEfier evolve as a project?
     • Data science for social good concept
     • Results and conclusions

  4. Our Community is a social enterprise and B Corp that provides advice, connections, training and easy-to-use tech tools for community-builders:
     • Training and networking
     • Grants database
     • Donation platform
     • Software for grants applications

  5. Our Community - Innovation Lab 5

  6. From CLASSIE to CLASSIEfier

  7. Main objective – Classification of grants
     • Australia lacked a unified taxonomy to classify subjects, beneficiaries and organization types.
     • In 2016, OC introduced CLASSIE, the classification system for Australian social sector initiatives and entities.
     • CLASSIE opens the door to standard classification.

  8. CLASSIE: a social sector dictionary
     • Subjects
     • Populations
     • Organisation type
     Where is the money going, and how is the Australian social sector working?

  9. Hierarchical classification – e.g. Subjects
     • Level 1 (17 categories): Social sciences; Sport and recreation
     • Level 2 (132 categories): Anthropology; Interdisciplinary studies; Sport; Community recreation
     • Level 3 (492 categories): Biological anthropology; Archeology; Ethnic studies; Asian studies; Camps; Parks; Outdoor sport; Paralympics
     • Level 4 (243 categories): Indigenous studies; Mountain and rock climbing; Hiking and walking

  10. Now we have the dictionary – how do we apply it? Questions:
      • How do we ensure that users are choosing the correct category?
      • How do we classify historical data? 800,000 grant applications since 2010.

  11. CLASSIEfier is a tool that will automatically classify grants.

  12. How did we scope CLASSIEfier?

  13. Source: “One model to rule them all” by Christoph Molnar

  14. CLASSIEfier – Two different models
      1. To give automatic suggestions to grant applicants, for example: "Seems like you are applying for: ☐ Sports and recreation ☐ Art and culture ☐ Community and development"
      2. To classify historical data

  15. CLASSIEfier: How does it work?

  16. How did CLASSIEfier evolve?

  17. CLASSIEfier – The Algorithm. What do we have?
      • 800,000 grant applications
      • 4,000 grant applications labeled by users since CLASSIE went live
      How do we generate more labels? We need at least 2,000 applications per category.

  18. CLASSIEfier – The Algorithm. First phase: a simple keyword matching to extract more labels.
      Keyword matching = the process of searching for "literal" matches (e.g. "hospital") in a given piece of text (e.g. a grant description) to identify groups or subjects (e.g. health sector).
      Example: "This project will raise awareness and empower deaf people by providing key mental health information in their primary language (Australian Sign Language)." The highlighted keywords "deaf" and "Sign Language" point to the category "People with hearing impediment".
      Stages:
      • Identify keywords for CLASSIE
      • Extract applications that exhibit a strong match
      • Score the classification done by users
      For example: "orphans" is a confusing category; "wildlife welfare" is a straightforward category.
      We found that:
      • Keyword matching accuracy differs from one category to another.
      • On average it is around 80%.
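The keyword matching described on this slide can be sketched roughly as follows. This is a minimal illustration, not the CLASSIEfier implementation: the keyword lists, the function name `match_categories` and the `min_hits` threshold (standing in for a "strong match") are all assumptions, since the slides do not give the real CLASSIE keywords.

```python
import re

# Hypothetical keyword lists; the real CLASSIE keywords are not in the slides.
KEYWORDS = {
    "People with hearing impediment": ["deaf", "sign language", "hearing"],
    "Health": ["hospital", "mental health"],
    "Wildlife welfare": ["wildlife"],
}


def match_categories(text, keywords=KEYWORDS, min_hits=2):
    """Return {category: hit count} for categories with at least `min_hits`
    literal keyword occurrences in `text` (assumed meaning of "strong match")."""
    lowered = text.lower()
    hits = {}
    for category, words in keywords.items():
        count = 0
        for word in words:
            # Whole-word literal match on the lowercased text.
            pattern = r"\b" + re.escape(word) + r"\b"
            count += len(re.findall(pattern, lowered))
        if count >= min_hits:
            hits[category] = count
    return hits


description = (
    "This project will raise awareness and empower deaf people by providing "
    "key mental health information in their primary language "
    "(Australian Sign Language)."
)
matches = match_categories(description)
```

With these assumed lists, the slide's example description matches "People with hearing impediment" on two keywords, while the single "mental health" hit stays below the threshold.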

  19. CLASSIEfier – The Algorithm. Second phase: training the machine learning model.
      Training dataset: 128,000 grant applications classified by keyword matching.
      • Difficulty #1: Multilabel
      • Difficulty #2: Hierarchy
      • Difficulty #3: Number of labels per category

  20. Multilabels and hierarchy. Example: a grant application that is aimed at helping teenagers with autism.
      Beneficiaries:
      • "Children and youth" at level 1
      • "Adolescents" at level 2
      And also:
      • "People with disabilities" at level 1
      • "People with intellectual disabilities" at level 2
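One possible way to hold such a multilabel, hierarchical annotation in code is to store each label as a path from level 1 downwards. The sketch below assumes that representation; the helper name `labels_by_level` is hypothetical and not taken from the talk.

```python
# Assumed representation: one tuple per label path, level 1 first.
labels = [
    ("Children and youth", "Adolescents"),
    ("People with disabilities", "People with intellectual disabilities"),
]


def labels_by_level(paths):
    """Group label paths into one set of category names per hierarchy level."""
    levels = {}
    for path in paths:
        for depth, name in enumerate(path, start=1):
            levels.setdefault(depth, set()).add(name)
    return levels


by_level = labels_by_level(labels)
```

Here `by_level[1]` holds both level 1 beneficiaries and `by_level[2]` both level 2 beneficiaries, which shows why a single application needs several labels at every level.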

  21. Difficulty #3: Number of labels per category
      • Categories such as Confucius, North American people and Nomadic people, among others, will have fewer than 100 grant applications: 20 times fewer than the 2,000 minimum required.
      • These are niche classifications, or "black holes".

  22. How do we solve it? – Separate training
      • The model reads the application.
      • Classification Level 1: machine learning (e.g. "Information and communications", "Sports and recreation").
      • Classification Level 2: where we have enough labels, we use another ML model; where we do not have enough labels, we use keyword matching.
      • Classification Level 3: keyword matching in both cases.
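The per-level choice of method on this slide could be sketched as a small routing function. This is an assumption-laden illustration: the function name, the shape of `level2_label_counts`, the counts themselves and the rule "every level 2 category under the branch has enough labels" are all hypothetical; only the 2,000 minimum comes from the slides.

```python
MIN_LABELS = 2000  # minimum labelled applications per category, from the slides


def plan_classification(level1_category, level2_label_counts, min_labels=MIN_LABELS):
    """Return the method used at each level for one level 1 branch.

    `level2_label_counts` maps a level 1 category to a dict of
    {level 2 category: number of labelled applications} (assumed structure).
    """
    counts = level2_label_counts.get(level1_category, {})
    # Assumed rule: use ML at level 2 only if every child has enough labels.
    enough = bool(counts) and all(n >= min_labels for n in counts.values())
    return {
        1: "machine learning",
        2: "machine learning" if enough else "keyword matching",
        3: "keyword matching",
    }


# Made-up label counts, for illustration only.
example_counts = {
    "Branch A": {"Child 1": 5200, "Child 2": 2400},
    "Branch B": {"Child 3": 3100, "Child 4": 90},
}
plan_a = plan_classification("Branch A", example_counts)
plan_b = plan_classification("Branch B", example_counts)
```

In this sketch "Branch A" gets a second ML model at level 2, while "Branch B" falls back to keyword matching because one of its children has too few labels.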

  23. CLASSIEfier – The Algorithm. Third phase: model interpretation, scoring and checking for biases.
      Stages:
      • Choose the best model – k-nearest neighbours (k-NN)
      • Choose the best parameters
      • Choose the best scoring
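For readers unfamiliar with k-nearest neighbours, the idea can be sketched in plain Python. This is a simplified, single-label toy with bag-of-words vectors and cosine similarity; the features, distance measure and parameters actually used in CLASSIEfier are not stated in the slides, and the training texts below are invented.

```python
import math
from collections import Counter


def vectorise(text):
    """Bag-of-words term counts for a lowercased, whitespace-split text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_predict(train, text, k=3):
    """Predict the majority label among the k most similar training texts."""
    query = vectorise(text)
    ranked = sorted(
        train, key=lambda pair: cosine(vectorise(pair[0]), query), reverse=True
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented training examples, for illustration only.
train = [
    ("football club new uniforms", "Sports and recreation"),
    ("junior football coaching clinic", "Sports and recreation"),
    ("swimming lessons for children", "Sports and recreation"),
    ("community art exhibition", "Art and culture"),
    ("mural painting art workshop", "Art and culture"),
    ("youth theatre and art program", "Art and culture"),
]
prediction = knn_predict(train, "football coaching for juniors", k=3)
```

"Choosing the best parameters" in this setting would mean, for example, trying several values of `k` and comparing the resulting scores.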

  24. Scoring
      • Recall: $\frac{TP}{TP + FN}$, an indication of missing labels
      • Precision: $\frac{TP}{TP + FP}$, an indication of bad predictions
      (TP = true positives, FP = false positives, FN = false negatives.)
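As a worked illustration of recall and precision for one multilabel application, the sketch below counts true positives, false positives and false negatives over label sets. The function name and the example labels are assumptions, and returning 0.0 when a denominator is zero is a convention chosen here, not something the slides specify.

```python
def precision_recall(true_labels, predicted_labels):
    """Return (precision, recall) for one application's label sets."""
    true_set, pred_set = set(true_labels), set(predicted_labels)
    tp = len(true_set & pred_set)   # labels correctly predicted
    fp = len(pred_set - true_set)   # bad predictions
    fn = len(true_set - pred_set)   # missing labels
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


precision, recall = precision_recall(
    {"Children and youth", "People with disabilities"},
    {"Children and youth", "Sports and recreation"},
)
```

In this example one label is correct, one is missing and one is a bad prediction, so both values come out at 0.5.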

  25. Scoring, based on the fact that each application has several categories
      • Recall: how many categories got picked per application. 0 = none; 1 = <45%; 2 = >45%; 3 = perfect match.
      • Precision: how many categories are wrong per application. 0 = all; 1 = >55%; 2 = <55%; 3 = none (perfect match).
      • Overall scale: 0 = useless model; 6 = perfect model. CLASSIEfier scores ~4-5.
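The 0 to 3 bands on this slide can be turned into a small scoring function. The sketch below is one reading of the slide, with assumptions labelled: the total is taken to be the sum of the recall and precision scores (which gives the 0 to 6 scale), an exact 45% or 55% is placed in the lower band, and an empty label set scores 0. None of these edge cases is specified in the talk.

```python
def recall_score(true_labels, predicted_labels):
    """0 = none picked, 1 = <45% picked, 2 = >45% picked, 3 = all picked."""
    true_set, pred_set = set(true_labels), set(predicted_labels)
    if not true_set:
        return 0  # assumption: nothing to recall
    picked = len(true_set & pred_set) / len(true_set)
    if picked == 0:
        return 0
    if picked == 1:
        return 3
    return 2 if picked > 0.45 else 1  # assumption: exactly 45% scores 1


def precision_score(true_labels, predicted_labels):
    """0 = all wrong, 1 = >55% wrong, 2 = <55% wrong, 3 = none wrong."""
    true_set, pred_set = set(true_labels), set(predicted_labels)
    if not pred_set:
        return 0  # assumption: no predictions scores 0
    wrong = len(pred_set - true_set) / len(pred_set)
    if wrong == 1:
        return 0
    if wrong == 0:
        return 3
    return 1 if wrong > 0.55 else 2  # assumption: exactly 55% scores 2


def total_score(true_labels, predicted_labels):
    """Assumed combination: recall score plus precision score, from 0 to 6."""
    return recall_score(true_labels, predicted_labels) + precision_score(
        true_labels, predicted_labels
    )


# Placeholder labels: two of three true categories found, one wrong prediction.
score = total_score({"A", "B", "C"}, {"A", "B", "D"})
```

With two of three categories picked and one of three predictions wrong, this example lands at 4, the lower end of the ~4-5 range reported for CLASSIEfier.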

  26. Misclassifications and black holes will cause underfunding of minorities that are already overlooked.

  27. The Data Science for Social Good Movement. “The best minds of my generation are thinking about how to make people click ads,” he says. “That sucks.” -- Jeff Hammerbacher (Cloudera and Facebook data leader)

  28. Algorithmic bias
      • This will happen if you feed the algorithm data that is already biased, or insufficient data: the algorithm will predict biased classifications.
      • Algorithms are mirrors.
      Example shown: "Sport people".

  29. Know your model! xkcd.com/1838/

  30. • SHAP (SHapley Additive exPlanations)
      • AI Fairness 360
      • WEAT tests proposed in Caliskan et al. 2017

  31. Document everything! – this is how we tackle biases. Choose transparency.

  32. Results and conclusions. Keywords found: "Church", "Religion", "Christian". Model = Religion. Reality – a fete in a Catholic school. It is not feasible to classify natural human language with 100% accuracy.

  33. Results and conclusions. Out of 200 applications classified by users, we found that 63% were right, 18% were wrong and 19% were half right.
      • CLASSIEfier works similarly to humans, not better, not worse: ~70-80% accuracy.

  34. Results and conclusions. Approved grant applications: 85% accuracy. Declined grant applications: 75% accuracy.
      • The model is also discriminating between good and bad applications.

  35. Results and conclusions. "Seems like you are applying for: ☐ Sports and recreation ☐ Art and culture ☐ Community and development". CLASSIEfier is now feeding back into CLASSIE.

  36. CLASSIEfier – More than just an algorithm
      • Data preprocessing
      • Writing and testing the algorithm
      • Production – back and front end product
      • Maintenance

  37. Do you want to learn more?
      • LinkedIn: paola-oliva-altamirano
      • Email: paolao@ourcommunity.com.au
      • Innovation Lab: https://www.ourcommunity.com.au/innovationlab
