CLASSIEFIER: USING MACHINE LEARNING TO PAINT A PICTURE OF SOCIAL TRENDS
Dr. Paola Oliva-Altamirano, Innovation Lab, Our Community, May 2019

Who am I?
• A foreigner: from Honduras to the US to Australia
• From Galaxies to Taxonomies
Our Community - Innovation Lab

Outline:
• Introducing Our Community’s data initiatives
• Background: CLASSIE, a social dictionary
• How did we scope CLASSIEfier?
• How did CLASSIEfier evolve as a project?
• Data science for social good concept
• Results and conclusions

Our Community is a social enterprise and B Corp that provides advice, connections, training and easy-to-use tech tools for community-builders.
• Training and networking
• Grants database
• Donation platform
• Software for grants applications

From CLASSIE to CLASSIEfier

Main objective – Classification of grants
• Australia lacked a unified taxonomy to classify subjects, beneficiaries and organization types
• In 2016, OC introduced CLASSIE: the classification system for Australian social sector initiatives and entities
• CLASSIE opens the door to standard classification

CLASSIE: A social sector dictionary
• Subjects
• Populations
• Organisation type
Where is the money going? How is the Australian social sector working?

Hierarchical Classification – e.g. Subjects
• Level 1 (17 categories): Social Sciences; Sport and recreation
• Level 2 (132 categories): Anthropology; Interdisciplinary studies; Sport; Community recreation
• Level 3 (492 categories): Archeology; Biological anthropology; Ethnic studies; Asian studies; Camps; Parks; Outdoor sport; Paralympics
• Level 4 (243 categories): Indigenous studies; Mountain and rock climbing; Hiking and walking

Now we have the dictionary – How do we apply it?
Questions:
• How do we ensure that users are choosing the correct category?
• How do we classify historical data? 800,000 grant applications since 2010

CLASSIEfier is a tool that will automatically classify grants

How did we scope CLASSIEfier?

Source: “One model to rule them all” by Christoph Molnar

CLASSIEfier – Two different models
1. To give automatic suggestions to grant applicants
2. To classify historical data
Seems like you are applying for:
☐ Sports and recreation
☐ Art and culture
☐ Community and development

CLASSIEfier: How does it work?

How did CLASSIEfier evolve?

CLASSIEfier – The Algorithm
What do we have?
• 800,000 grant applications
• 4,000 grant applications labeled by users since CLASSIE went live
How do we generate more labels?
• At least 2,000 applications per category

CLASSIEfier – The Algorithm
First phase: a simple keyword matching to extract more labels
Keyword matching = the process of searching for ‘literal’ matches (e.g. “hospital”) in a given piece of text (e.g. a grant description) to identify groups or subjects (e.g. health sector).
Example: “This project will raise awareness and empower deaf people by providing key mental health information in their primary language (Australian Sign Language).” The highlighted keywords “deaf” and “Sign Language” both point to the category “People with hearing impediment”.
Stages:
• Identify keywords for CLASSIE
• Extract applications that exhibit a strong match
• Score the classification done by users
For example: “orphans” is a confusing category; “wildlife welfare” is a straightforward category.
We found that:
• Keyword matching accuracy differs from one category to another.
• On average it is around 80%.
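A minimal sketch of the literal keyword matching described on this slide. The keyword lists, the `min_hits` threshold for a "strong match" and the function name are illustrative assumptions; the slides do not give the real CLASSIE keywords or matching rules.

```python
import re

# Hypothetical keyword lists; the real CLASSIE keywords are not given in the slides.
KEYWORDS = {
    "People with hearing impediment": ["deaf", "sign language", "hearing impaired"],
    "Health": ["hospital", "mental health"],
    "Wildlife welfare": ["wildlife"],
}

def match_categories(text, keywords=KEYWORDS, min_hits=1):
    """Return {category: literal keyword hits} for categories with at least min_hits."""
    lowered = text.lower()
    hits = {}
    for category, words in keywords.items():
        count = 0
        for word in words:
            # \b keeps "deaf" from matching inside longer words such as "deafening".
            pattern = r"\b" + re.escape(word) + r"\b"
            count += len(re.findall(pattern, lowered))
        if count >= min_hits:
            hits[category] = count
    return hits

description = (
    "This project will raise awareness and empower deaf people by providing "
    "key mental health information in their primary language "
    "(Australian Sign Language)."
)
matches = match_categories(description)             # every category with a hit
strong = match_categories(description, min_hits=2)  # assumed "strong match" rule
```

Requiring more than one hit is one possible way to keep only strong matches as new labels; the actual criterion used by CLASSIEfier is not stated.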

CLASSIEfier – The Algorithm
Second phase: Training the Machine Learning model
Training dataset: 128,000 grant applications classified by keyword matching
• DIFFICULTY #1: Multilabel
• DIFFICULTY #2: Hierarchy
• DIFFICULTY #3: Number of labels per category

Multilabels and Hierarchy
Example: A grant application that is aimed at helping teenagers with autism.
Beneficiaries:
• “Children and youth” at level 1
• “Adolescents” at level 2
And also,
• “People with disabilities” at level 1
• “People with intellectual disabilities” at level 2
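One way to represent the multilabel and hierarchy example above is to expand each label with its ancestors and then encode the result as a 0/1 vector. This is only a sketch: the `PARENT` mapping covers just the four categories named on the slide, "Seniors" is a made-up extra category, and nothing here is claimed to be CLASSIEfier's actual encoding.

```python
# Hypothetical fragment of the beneficiaries hierarchy: child -> parent.
PARENT = {
    "Adolescents": "Children and youth",
    "People with intellectual disabilities": "People with disabilities",
}

def with_ancestors(labels, parent=PARENT):
    """Expand a set of labels so that every label's ancestors are included."""
    expanded = set()
    for label in labels:
        while label is not None:
            expanded.add(label)
            label = parent.get(label)  # None once a level 1 category is reached
    return expanded

def multi_hot(labels, vocabulary):
    """Encode a label set as a 0/1 vector over a fixed, ordered vocabulary."""
    return [1 if name in labels else 0 for name in vocabulary]

# "Seniors" is an invented category, added only to show a 0 entry.
VOCABULARY = sorted(set(PARENT) | set(PARENT.values()) | {"Seniors"})

labels = with_ancestors({"Adolescents", "People with intellectual disabilities"})
vector = multi_hot(labels, VOCABULARY)
```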

DIFFICULTY #3: Number of labels per category
• Categories such as Confucius, North American people and Nomadic people, among others, will have fewer than 100 grant applications: 20x less than the 2,000 minimum required.
• Niche classification or “black holes”

How do we solve it? – Separate training
• Reads the application
• Classification Level 1 – Machine learning (e.g. Information and communications; Sports and recreation)
• Classification Level 2: if we have enough labels, we use another ML model; if we do not have enough labels, we use keyword matching
• Classification Level 3: Keyword matching
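The separate-training rule can be sketched as a small routing function. The 2,000 threshold comes from the slides; the function name, the string return values and the example counts are assumptions made for illustration.

```python
MIN_LABELS = 2000  # minimum labelled applications per category, from the slides

def choose_method(level, label_count, min_labels=MIN_LABELS):
    """Pick the classification method for a category at a given hierarchy level."""
    if level == 1:
        return "ml"  # level 1 is always handled by the machine learning model
    if level == 2:
        # A second ML model is only used when the category has enough labels.
        return "ml" if label_count >= min_labels else "keyword"
    return "keyword"  # level 3 and deeper fall back to keyword matching

# Hypothetical label counts, for illustration only.
plan = {
    ("Sports and recreation", 1): choose_method(1, 30000),
    ("Adolescents", 2): choose_method(2, 5000),
    ("Nomadic people", 2): choose_method(2, 90),
    ("Paralympics", 3): choose_method(3, 2500),
}
```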

CLASSIEfier – The Algorithm
Third phase: Model interpretation: scoring and checking for biases
Stages:
• Choose the best model – k-nearest neighbours (k-NN)
• Choose the best parameters
• Choose the best scoring
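For readers unfamiliar with k-nearest neighbours, here is a toy multilabel version in plain Python. The slides do not say which text features, distance measure or parameters CLASSIEfier uses, so the word-set Jaccard similarity, `k=3`, the `min_votes` rule and the training texts below are all assumptions.

```python
from collections import Counter

def tokens(text):
    """Lower-cased set of whitespace-separated words (a deliberately crude feature)."""
    return set(text.lower().split())

def jaccard(a, b):
    """Share of words that two word sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def knn_predict(text, training, k=3, min_votes=2):
    """Return labels carried by at least min_votes of the k most similar training texts."""
    query = tokens(text)
    ranked = sorted(
        training,
        key=lambda item: jaccard(query, tokens(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, labels in ranked[:k] for label in labels)
    return {label for label, count in votes.items() if count >= min_votes}

# Invented training examples, for illustration only.
TRAINING = [
    ("new uniforms for the junior football club", {"Sport and recreation"}),
    ("football training camp for teenagers", {"Sport and recreation", "Children and youth"}),
    ("after school art classes for teenagers", {"Art and culture", "Children and youth"}),
    ("restoring the town gallery", {"Art and culture"}),
]

predicted = knn_predict("football equipment for teenagers", TRAINING)
```

Because each neighbour can carry several labels, the vote naturally produces a multilabel prediction.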

Scoring
Recall: $\frac{TP}{TP + FN}$ (indication of missing labels)
Precision: $\frac{TP}{TP + FP}$ (indication of bad predictions)
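As a rough illustration, recall and precision can be computed per application by comparing the set of true categories with the set of predicted ones, where TP counts categories in both sets, FN counts true categories that were missed and FP counts predicted categories that are wrong. The function name and the example sets are assumptions.

```python
def recall_precision(true_labels, predicted_labels):
    """Return (recall, precision) for one application's label sets."""
    true_set, pred_set = set(true_labels), set(predicted_labels)
    tp = len(true_set & pred_set)   # correctly predicted categories
    fn = len(true_set - pred_set)   # true categories that were missed
    fp = len(pred_set - true_set)   # predicted categories that are wrong
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision

recall, precision = recall_precision(
    {
        "Children and youth",
        "Adolescents",
        "People with disabilities",
        "People with intellectual disabilities",
    },
    {"Children and youth", "Adolescents", "Sports and recreation"},
)
```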

Scoring
Based on the fact that each application has several categories
Recall: How many categories got picked per application
• 0: None
• 1: <45%
• 2: >45%
• 3: Perfect match
Precision: How many categories are wrong per application
• 0: All
• 1: >55%
• 2: <55%
• 3: None – Perfect match
Scale: 0 = Useless Model, 6 = Perfect Model!! CLASSIEfier scores ~4-5
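The rubric above could be implemented along these lines. Two points are assumptions rather than facts from the slides: that the 0 to 6 scale is the sum of the recall and precision sub-scores, and that values exactly on a 45% or 55% boundary fall into the score of 2.

```python
def recall_score(true_labels, predicted_labels):
    """0-3 score for how many of the true categories were picked."""
    true_set, pred_set = set(true_labels), set(predicted_labels)
    if not true_set:
        return 0  # assumption: nothing to recall scores 0
    picked = len(true_set & pred_set) / len(true_set)
    if picked == 0:
        return 0
    if picked == 1:
        return 3
    return 1 if picked < 0.45 else 2

def precision_score(true_labels, predicted_labels):
    """0-3 score for how many of the predicted categories are wrong."""
    true_set, pred_set = set(true_labels), set(predicted_labels)
    if not pred_set:
        return 0  # assumption: no predictions scores 0
    wrong = len(pred_set - true_set) / len(pred_set)
    if wrong == 1:
        return 0
    if wrong == 0:
        return 3
    return 1 if wrong > 0.55 else 2

def total_score(true_labels, predicted_labels):
    """Assumed combined score: 0 (useless) to 6 (perfect)."""
    return recall_score(true_labels, predicted_labels) + precision_score(
        true_labels, predicted_labels
    )

example = total_score({"a", "b", "c"}, {"a", "b", "d"})
```

In the example, two of three true categories are picked (score 2) and one of three predictions is wrong (score 2), which lands in the ~4-5 range quoted for CLASSIEfier.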

Misclassifications and black holes will cause minorities that are already overlooked to be underfunded

The Data Science for Social Good Movement
“The best minds of my generation are thinking about how to make people click ads,” he says. “That sucks.” -- Jeff Hammerbacher (Cloudera and Facebook data leader)

Algorithmic bias
• This will happen if you feed the algorithm data that is already biased, or insufficient data: the algorithm will predict biased classifications.
• Algorithms are mirrors
Sport people

Know your Model! xkcd.com/1838/

• SHAP (SHapley Additive exPlanations)
• AI Fairness 360
• WEAT tests proposed in Caliskan et al. 2017

Document everything! – this is how we tackle biases
Choose transparency

Results and conclusions
Church, Religion, Christian
Model = Religion
Reality – A fete in a Catholic school
It is not feasible to classify human natural languages with 100% accuracy

Results and conclusions
Out of 200 applications classified by users, we found that:
• 63% right
• 18% wrong
• 19% half right
CLASSIEfier works similarly to humans, not better, not worse: ~70-80% accuracy

Results and conclusions
• Approved grant applications: 85% accuracy
• Declined grant applications: 75% accuracy
• The model is also discriminating between good and bad applications

Results and conclusions
Seems like you are applying for:
☐ Sports and recreation
☐ Art and culture
☐ Community and development
CLASSIEfier is now feeding back into CLASSIE

CLASSIEfier – More than just an algorithm
• Data preprocessing
• Writing and testing the algorithm
• Production – back and front end product
• Maintenance

DO YOU WANT TO LEARN MORE?
LinkedIn: paola-oliva-altamirano
Email: paolao@ourcommunity.com.au
Innovation Lab: https://www.ourcommunity.com.au/innovationlab
