What can be done to remove biases in volunteer-gathered biological records? Nick Isaac, Arco van Strien, Tom August, Gary Powney & David Roy
Talk Outline • The Problem • Solutions? • Testing the Solutions • The Way Ahead • Tools • Applications [Figure: number of sites by year, 1970 to 2010]
Problem: ad hoc recording is biased • in time • in space • detectability • effort per visit [Figure: number of records per year (log scale, 1 to 1,000,000), 1970 to 2010, for Butterflies, Bryophyte, Orthoptera, Myriapod, Isopods, Coleoptera, Moths, Bees, Wasps] [Figure axes: Number of Species, Effort]
Most lists are incomplete • For most groups, ~50% of ‘lists’ are single species • For many groups, the prevalence of short lists varies systematically over time [Figure: proportion of all visits (0% to 100%) by list length: single species, 2 species, 3 species, > 3 species]
Solutions? • Aggregation • Data Selection methods • Correction for sampling effort • Modelling the data collection process
Aggregation into Atlas periods
Selection methods • Remove the bias, leave the signal • The ‘well-sampled set’ • Threshold number of species • Threshold number of years • Untested assumption • Loss of power? [Figure: well-sampled sites for Dragonflies]
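The two thresholds on the slide above can be sketched as a filter over records. This is a minimal sketch under one plausible reading: a "visit" is collapsed to a site-year, the species threshold applies per visit, and the year threshold applies per site. The function name `well_sampled_sites`, the record layout and the default thresholds are illustrative assumptions, not values from the talk.

```python
from collections import defaultdict


def well_sampled_sites(records, min_species=2, min_years=3):
    """Return the sites that have qualifying visits in at least `min_years` years.

    records: iterable of (site, year, species) tuples (assumed layout).
    A site-year qualifies if at least `min_species` species were listed.
    """
    species_per_visit = defaultdict(set)
    for site, year, species in records:
        species_per_visit[(site, year)].add(species)

    years_per_site = defaultdict(set)
    for (site, year), species_seen in species_per_visit.items():
        if len(species_seen) >= min_species:
            years_per_site[site].add(year)

    return {site for site, years in years_per_site.items() if len(years) >= min_years}


# Toy records: site A has three years of 2-species lists, site B does not.
records = [
    ("A", 2000, "sp1"), ("A", 2000, "sp2"),
    ("A", 2001, "sp1"), ("A", 2001, "sp3"),
    ("A", 2002, "sp2"), ("A", 2002, "sp3"),
    ("B", 2000, "sp1"),
    ("B", 2001, "sp1"), ("B", 2001, "sp2"),
]
selected = well_sampled_sites(records)
print(selected)  # {'A'}
```

Raising either threshold shrinks the retained set, which is where the "loss of power" concern on the slide comes from.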
Correction for sampling effort • by time period: Telfer’s Change index • per year: Ball’s ‘Reporting Rate’ method • per visit: Szabo’s ‘List Length’ method • in space (per grid cell or neighbourhood): Hill’s ‘Frescalo’ method
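To illustrate the per-year idea, a reporting rate can be taken as the proportion of visits in a year on which the focal species was recorded. This is a minimal sketch in the spirit of that idea only; it does not reproduce the published method in detail, and the function name `reporting_rate` and the visit layout are assumptions.

```python
from collections import defaultdict


def reporting_rate(visits, focal):
    """Proportion of visits per year that recorded the focal species.

    visits: iterable of (year, set_of_species_recorded) pairs (assumed layout).
    """
    n_visits = defaultdict(int)
    n_focal = defaultdict(int)
    for year, species in visits:
        n_visits[year] += 1
        if focal in species:
            n_focal[year] += 1
    return {year: n_focal[year] / n_visits[year] for year in sorted(n_visits)}


# Toy data: the focal species "f" is recorded twice in both years,
# but the number of visits doubles from 4 to 8.
visits = [
    (2000, {"a", "f"}), (2000, {"f"}), (2000, {"a"}), (2000, {"b"}),
    (2010, {"f"}), (2010, {"f", "a"}), (2010, {"a"}), (2010, {"b"}),
    (2010, {"a", "b"}), (2010, {"c"}), (2010, {"a"}), (2010, {"b"}),
]
rates = reporting_rate(visits, "f")
print(rates)  # {2000: 0.5, 2010: 0.25}
```

The raw count of focal records is unchanged between years, while the rate per visit halves; which of the two reflects the real trend depends on how effort per visit has changed.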
Correction: Hill’s Frescalo method • Frescalo estimates the recording intensity of each grid cell [Map: red = under-recorded, white = well-recorded] Hill, MO (2011). Local frequency as a key to interpreting species occurrence data when recording effort is not known. Methods in Ecology and Evolution, 3(1), 195–205.
Hill’s Frescalo method • Frescalo estimates which species ‘should’ be in each grid cell if well-sampled • Trends can be modelled as changes in ‘relative recording rate’
Occupancy: modelling data collection • Separation of “state” and “data generation” processes into separate submodels permits (annual) estimation of occupancy and detection [Diagram: unobserved occupancy state (extant or extinct) passes through the data generation process to give observations in Years 1 to 5]
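The separation of the two processes can be sketched by simulating them. This is a minimal sketch, not the authors' model: the persistence and colonisation probabilities, the per-visit detection probability and the name `simulate` are all illustrative assumptions. It only generates data and shows why the raw proportion of sites with records understates occupancy; fitting an occupancy model is not shown.

```python
import random


def simulate(n_sites=200, n_years=5, n_visits=3, psi0=0.6,
             persist=0.9, colonise=0.05, p_detect=0.3, seed=1):
    """Simulate a hidden occupancy state and the observations it generates.

    Returns (z, y): z[i][t] is True if site i is occupied in year t,
    y[i][t] is the number of visits in year t that detected the species.
    """
    rng = random.Random(seed)
    z, y = [], []
    for _ in range(n_sites):
        occupied = rng.random() < psi0          # initial state
        z_row, y_row = [], []
        for t in range(n_years):
            if t > 0:                           # state process
                occupied = rng.random() < (persist if occupied else colonise)
            z_row.append(occupied)
            # Data generation process: detection is only possible if occupied.
            detections = sum(occupied and rng.random() < p_detect
                             for _ in range(n_visits))
            y_row.append(detections)
        z.append(z_row)
        y.append(y_row)
    return z, y


z, y = simulate()
summary = []
for t in range(5):
    true_occ = sum(row[t] for row in z) / len(z)
    naive_occ = sum(row[t] > 0 for row in y) / len(y)
    summary.append((t + 1, true_occ, naive_occ))
    print(f"Year {t + 1}: true occupancy {true_occ:.2f}, sites with records {naive_occ:.2f}")
```

Because detection is imperfect, the proportion of sites with at least one record can never exceed true occupancy here, and any change in `p_detect` or `n_visits` over time would look like a change in occupancy if the two processes were not modelled separately.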
Testing the solutions by simulation • Generate records resembling NBN-type datasets: 1000 sites, 25 species, 10 years • Realistic scenarios of recorder behaviour, parameterized from UK and Dutch datasets • Formally compare methods for estimating trends: Type I error rate when no trend exists; power to detect genuine trend
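How a Type I error rate is estimated by simulation can be sketched on a much smaller problem than the one described above. This is a toy sketch, not the authors' simulation: one species, two time periods, no real trend, and a doubling of the number of visits. The two tests (a z-test on raw counts and a two-proportion z-test on records per visit) and all names and parameter values are assumptions chosen for illustration.

```python
import math
import random


def binomial(rng, n, p):
    """Draw a binomial count using only the standard library."""
    return sum(rng.random() < p for _ in range(n))


def naive_count_test(c1, c2):
    """z-test on raw record counts, ignoring recording effort."""
    if c1 + c2 == 0:
        return False
    return abs(c2 - c1) / math.sqrt(c1 + c2) > 1.96


def rate_test(c1, n1, c2, n2):
    """Two-proportion z-test on records per visit."""
    pooled = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return False
    return abs(c2 / n2 - c1 / n1) / se > 1.96


def type_one_error(n_reps=200, visits_early=200, visits_late=400,
                   p_record=0.3, seed=42):
    """Fraction of replicates in which each test reports a (false) trend."""
    rng = random.Random(seed)
    naive = corrected = 0
    for _ in range(n_reps):
        # Same probability of a record per visit in both periods: no trend.
        c1 = binomial(rng, visits_early, p_record)
        c2 = binomial(rng, visits_late, p_record)
        naive += naive_count_test(c1, c2)
        corrected += rate_test(c1, visits_early, c2, visits_late)
    return naive / n_reps, corrected / n_reps


naive_rate, corrected_rate = type_one_error()
print(f"Type I error: raw counts {naive_rate:.2f}, per-visit rate {corrected_rate:.2f}")
```

In this toy setting the raw-count test flags a trend in almost every replicate, while the per-visit test stays near its nominal 5% level. Power would be estimated the same way, by giving `p_record` a different value in each period and counting how often the trend is detected.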
Simulation results: Type I error rates Isaac et al (in review) Methods in Ecology & Evolution
Simulation results • Simple ‘correction’ models fail easily • Frescalo performs well but is subjective to apply • Selection methods are robust but less powerful • Occupancy most promising overall: least often wrong; most powerful overall; … but a problem with spatially-biased sampling. Isaac et al (in review) Methods in Ecology & Evolution
The Way Ahead • Occupancy + site-selection criterion? • P(detect) ≈ f(List Length, Julian Date, Previously Recorded, …) • If we knew more about the bias, we could model it • A little bit of meta-data would go a long way • Visit-based records are crucial
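One way to read the detection bullet above is as a regression of detection probability on visit-level covariates. The following is a hypothetical sketch of such a relationship using a logistic link; the functional form, the coefficient values, the seasonal peak at day 180 and the name `p_detect` are all assumptions made for illustration and are not taken from the talk.

```python
import math


def p_detect(list_length, julian_date, previously_recorded,
             b0=-2.0, b_len=0.8, b_date=-0.5, b_prev=1.0, peak_day=180.0):
    """Hypothetical per-visit detection probability for a focal species.

    list_length: number of species recorded on the visit (>= 1), a proxy for effort.
    julian_date: day of year of the visit.
    previously_recorded: True if the species was recorded at the site before.
    All coefficients are made-up values, not estimates.
    """
    date_term = ((julian_date - peak_day) / 90.0) ** 2   # penalty away from the peak
    eta = (b0
           + b_len * math.log(list_length)
           + b_date * date_term
           + b_prev * (1.0 if previously_recorded else 0.0))
    return 1.0 / (1.0 + math.exp(-eta))                  # inverse logit


short_list = p_detect(list_length=1, julian_date=180, previously_recorded=False)
long_list = p_detect(list_length=10, julian_date=180, previously_recorded=False)
print(f"single-species list: {short_list:.2f}, ten-species list: {long_list:.2f}")
```

In an occupancy model, a function of this kind would sit in the data generation submodel, which is why visit-level meta-data such as list length and date are so valuable.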
Tools • An easy way to record • Great potential for harvesting meta-data
https://github.com/BiologicalRecordsCentre http://bit.ly/18wTrrK
Applications • Identifying drivers of change in native ladybirds • Overview of trends in UK biodiversity • Developing a biodiversity indicator
Identifying drivers of change • Declines in native ladybirds are attributable to the arrival of the invasive Harlequin ladybird • Similar patterns across 8 native species in both GB & Belgium [Attributions on slide: davidkennardphotography.com; Mike Majerus] Roy et al (2012) Diversity & Distributions, 18: 717–725
Trends in British Biodiversity 1990-2000 • Good news: Median change +2.4% • Bad news: >1000 species would qualify as Vulnerable (VU) or worse
The Priority Species Indicator [Figure: United Kingdom index (1970 = 100) with 95% confidence interval, 1970 to 2010; percentage of species showing decline or increase over the long term and short term. Source: Biodiversity in Your Pocket 2013]
Conclusions • We shouldn’t remove the bias but model it • Occupancy models are especially promising • A little bit of meta-data would go a long way = a vast untapped resource
Simulated patterns of recording 1. Even recording: random sampling 2. Doubling intensity: number of visits doubles 3. Doubling with biased sampling with respect to focal sites 4. Incomplete recording (growth in short lists) 5. Detection increasing: focal species becomes more detectable 6. Non-focal declines
The ‘well-sampled sites’ model Assumptions/Caveats: • Groups are recorded collectively, as an assemblage • Effort per visit has not changed over time • Detectability per visit is constant over time • Well-sampled sites are representative Robust to: • Changes in effort over time • Change in spatial pattern of recording • Changes in community composition • Temporally & spatially precise • Can easily add covariates [Figure: well-sampled sites for Dragonflies]