Modeling Experts and Novices in Citizen Science Data Jun Yu, - PowerPoint PPT Presentation

Modeling Experts and Novices in Citizen Science Data
Jun Yu, Weng-Keen Wong, Rebecca Hutchinson
{yuju,wong,rah}@eecs.oregonstate.edu

Introduction
Species Distribution Modeling (SDM) is important for:
• Understanding species-habitat relationships
• Conservation and reserve design
• Predicting effects of climate / land use change
Many research questions require data to be collected at broad spatial and temporal scales.
[Figure: predicted distribution of tree swallows across North America (from D. Fink)]

Introduction
Citizen science: scientific research in which volunteers from the community participate as field assistants [Cohn 2008]
Pros:
• Inexpensive
• Can collect data over large spatial areas and long time periods
Cons:
• Reliability of data

Introduction
eBird:
• One of the largest citizen science programs
• Online checklist database developed by Cornell Lab of Ornithology and National Audubon Society
• Birders submit checklists of birds observed (> 1.5 million checklists in Jan 2010)

Introduction
Can we use eBird data for accurate SDM?
• Main issue: birders have different levels of expertise (novice vs. expert)
• How reliable is the data?
– Data reviewed through a verification process
– But biases still exist

Methodology
[Figure: labeled training set of checklists from birders with known expertise (e.g. Birder ID 42: Expert; Birder ID 56: Novice). Each checklist marks species such as Blue Heron, House Finch, Purple Finch and Tree Sparrow with √ or X. Workflow: train model, then use model.]
32 experts (2532 checklists), 88 novices (2107 checklists)

Methodology
Start with the Occupancy-Detection (OD) model [MacKenzie et al. 2006]
[Figure: plate diagram of the OD model. Labeled nodes: environmental covariates X_i, occupancy Z_i (latent), detection covariates W_it, detection Y_it. Also shown: o_i and d_it. Plates: t = 1,…,T_i and i = 1,…,N.]

Methodology
Assumptions on OD model
• Site closure assumption: species occupancy status stays the same over the site visits
• No false detections: can’t detect a bird if it doesn’t occupy the site
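The two assumptions above shape the likelihood of a site's visit history. The sketch below is one way to write it, not the authors' code. It assumes the usual OD formulation in which o_i and d_it are probabilities obtained from the covariates through a logistic link; the weight vectors `alpha` and `beta` and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def od_site_likelihood(x_i, w_i, y_i, alpha, beta):
    """P(Y_i1, ..., Y_iT) for one site, marginalizing the latent occupancy Z_i.

    x_i   : occupancy covariates for site i
    w_i   : one list of detection covariates per visit
    y_i   : one 0/1 observation per visit
    alpha : hypothetical occupancy weights (logistic link is an assumption)
    beta  : hypothetical detection weights (logistic link is an assumption)
    """
    o_i = sigmoid(sum(a * x for a, x in zip(alpha, x_i)))

    # Site closure: a single Z_i is shared by every visit to the site.
    lik_occupied = 1.0
    for w_it, y_it in zip(w_i, y_i):
        d_it = sigmoid(sum(b * w for b, w in zip(beta, w_it)))
        lik_occupied *= d_it if y_it == 1 else 1.0 - d_it

    # No false detections: an unoccupied site can only yield all zeros.
    lik_unoccupied = 1.0 if all(y == 0 for y in y_i) else 0.0

    return o_i * lik_occupied + (1.0 - o_i) * lik_unoccupied
```

With all weights at zero, o_i and every d_it equal 0.5, so two visits without a detection have probability 0.5 · 0.25 + 0.5 · 1 = 0.625, while any history containing a detection is explained only by the occupied branch.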

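Methodology
Occupancy-Detection-Expertise (ODE) model
[Figure: plate diagram of the ODE model. The OD nodes (X_i, o_i, Z_i, W_it, Y_it) are kept, with d_it now paired with f_it. New labeled nodes: expertise covariates U_j and expertise E_j, with v_j also shown. Plates: j = 1,…,M, t = 1,…,T_i and i = 1,…,N.]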

Methodology
ODE model details
• Allow for false detections. Results in four sets of parameters:
– True detection and false detection parameters for experts
– True detection and false detection parameters for novices
• Introduces an identifiability problem
– Add constraint during training
• Train using Expectation-Maximization
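To make the four parameter sets concrete, here is a minimal sketch of how they enter the posterior over occupancy, the kind of quantity an E-step needs for the latent Z_i when expertise is observed. It is a simplification under stated assumptions: the slides do not give the parameterization, so each true and false detection probability is a fixed constant here rather than a function of W_it, and the `params` dictionary and function names are hypothetical. The identifiability constraint is not specified in the slides and is not modeled.

```python
def detection_prob(z_i, is_expert, params):
    """P(Y_it = 1 | Z_i, E_j): true detection if occupied, false detection if not."""
    group = "expert" if is_expert else "novice"
    kind = "true" if z_i == 1 else "false"
    return params[(group, kind)]


def visits_likelihood(z_i, y_i, expert_flags, params):
    """P(Y_i1..Y_iT | Z_i = z_i) given which visits were made by experts."""
    lik = 1.0
    for y_it, is_expert in zip(y_i, expert_flags):
        p = detection_prob(z_i, is_expert, params)
        lik *= p if y_it == 1 else 1.0 - p
    return lik


def posterior_occupancy(o_i, y_i, expert_flags, params):
    """P(Z_i = 1 | observations, expertise) by Bayes' rule over the two Z_i values."""
    occupied = o_i * visits_likelihood(1, y_i, expert_flags, params)
    unoccupied = (1.0 - o_i) * visits_likelihood(0, y_i, expert_flags, params)
    total = occupied + unoccupied
    if total == 0.0:
        raise ValueError("observations have zero probability under these parameters")
    return occupied / total


# Made-up illustrative values, not estimates from the eBird data.
example_params = {
    ("expert", "true"): 0.8, ("expert", "false"): 0.05,
    ("novice", "true"): 0.6, ("novice", "false"): 0.2,
}
```

With these made-up numbers, a single detection reported by an expert raises the posterior occupancy more than the same detection reported by a novice, because the novice's higher false detection rate makes the report weaker evidence.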

Results
1. We want to predict occupancy (Z_i), but ground truth is not available, so we predict observations (Y_it) instead
– eBird data from NY, breeding season (2006-2008)
– Expertise nodes observed in training data, unobserved in test data
– Evaluating spatial data is challenging: use checkerboarding
– Compare with Logistic Regression (LR) and OD model
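The slides do not spell out the checkerboarding procedure. One common reading is to overlay a grid on the study region and alternate grid cells between training and test sets, so that nearby, spatially correlated checklists are less likely to sit on both sides of the split. The sketch below follows that reading; the cell size `cell_deg`, the record layout and the function names are assumptions for illustration.

```python
import math


def checkerboard_fold(lat, lon, cell_deg=0.5):
    """Return 0 or 1 depending on the colour of the grid cell containing the point.

    cell_deg is a hypothetical cell size in degrees; the slides give no value.
    """
    row = math.floor(lat / cell_deg)
    col = math.floor(lon / cell_deg)
    return (row + col) % 2


def checkerboard_split(checklists, cell_deg=0.5):
    """Split checklist records (dicts with 'lat' and 'lon') into train and test lists."""
    train, test = [], []
    for record in checklists:
        fold = checkerboard_fold(record["lat"], record["lon"], cell_deg)
        (train if fold == 0 else test).append(record)
    return train, test
```

Points in the same cell always land in the same fold, and cells that share an edge always land in different folds.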

Results
[Figure: two bar charts of average AUC (axis 0.50 to 0.80) for predicting observations. Panels: four common bird species (Blue Jay, White-breasted Nuthatch, Northern Cardinal, Great Blue Heron) and four hard-to-detect bird species (Brown Thrasher, Blue-headed Vireo, Northern Rough-winged Swallow, Wood Thrush).]
AUC values by method, in the order they appear on the slide (two groups of four, one group per panel):
LR:  0.6726 0.6283 0.6831 0.6641 | 0.6576 0.7976 0.6575 0.6579
OD:  0.6881 0.6262 0.7073 0.6691 | 0.6920 0.8055 0.6609 0.6643
ODE: 0.7104 0.6600 0.7085 0.6959 | 0.6954 0.8325 0.6872 0.6903

Results
2. Predict expertise (E_j) of birder given checklist history
– Site occupancy (Z_i) is unobserved in both training and testing
– Two-fold cross-validation on birders
– Repeat 20 times and report average AUC
– Compare against Logistic Regression
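The evaluation protocol above can be sketched in a few lines. This is an illustration under assumptions, not the authors' evaluation code: AUC is computed with the standard pairwise (Mann-Whitney) definition, the random halving of birders and the `seed` argument are hypothetical choices, and scoring each birder with a trained model is left out.

```python
import random


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def two_fold_birder_splits(birder_ids, repeats=20, seed=0):
    """Yield (train_ids, test_ids) pairs: two folds per repeat, split by birder."""
    rng = random.Random(seed)
    for _ in range(repeats):
        ids = list(birder_ids)
        rng.shuffle(ids)
        half = len(ids) // 2
        first, second = ids[:half], ids[half:]
        yield first, second
        yield second, first
```

Splitting by birder rather than by checklist keeps every checklist from one birder on the same side of the split, so a birder's expertise is never predicted from a model that saw that birder's label. With 20 repeats the generator yields 40 train/test pairs, and the reported figure would be the mean AUC over them.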

Results
[Figure: two bar charts of average AUC (axis 0.65 to 0.85) for predicting birder expertise. Panels: four hard-to-detect bird species (Brown Thrasher, Blue-headed Vireo, Northern Rough-winged Swallow, Wood Thrush) and four common bird species (Blue Jay, White-breasted Nuthatch, Northern Cardinal, Great Blue Heron).]
AUC values by method, in the order they appear on the slide (two groups of four, one group per panel):
LR:  0.7265 0.7249 0.7352 0.7472 | 0.7523 0.7869 0.7792 0.7675
ODE: 0.7417 0.7212 0.7442 0.7661 | 0.7761 0.7981 0.8052 0.7937

Results
3. Discovering differences between experts and novices
[Figure: two panels, common birds and hard-to-detect birds]

Future work
• Discover sources of novice bias
• Improve accuracy of species distribution models by adjusting for this novice bias
• Incorporate tree-models in occupancy and detection components
• Semi-supervised version of ODE model

Acknowledgements
• Cornell Lab of Ornithology:
– Marshall Iliff
– Brian Sullivan
– Chris Wood
– Steve Kelling
• This project supported by NSF grant CCF 0832804
