Biological monitoring programmes are increasingly relying upon large volumes of citizen-science data to improve the scope and spatial coverage of information, challenging the scientific community to develop design and model-based approaches to improve inference.
Recent statistical models in ecology have been developed to accommodate false-negative errors, although current work points to false-positive errors as equally important sources of bias. This is of particular concern for the success of any monitoring programme given that rates as small as 3% could lead to the overestimation of the occurrence of rare events by as much as 50%, and even small false-positive rates can severely bias estimates of occurrence dynamics.
We present an integrated, computationally efficient Bayesian hierarchical model to correct for false-positive and false-negative errors in detection/non-detection data. Our model combines independent, auxiliary data sources with field observations to improve the estimation of false-positive rates, when a subset of field observations cannot be validated a posteriori or assumed as perfect. We evaluated the performance of the model across a range of occurrence rates, false-positive and false-negative errors, and quantity of auxiliary data.
The model performed well under all simulated scenarios, and we were able to identify critical auxiliary data characteristics which resulted in improved inference. We applied our false-positive model to a large-scale, citizen-science monitoring programme for anurans in the north-eastern United States, using auxiliary data from an experiment designed to estimate false-positive error rates. Not correcting for false-positive rates resulted in biased estimates of occupancy in 4 of the 10 anuran species we analysed, leading to an overestimation of the average number of occupied survey routes by as much as 70%.
The framework we present for data collection and analysis is able to efficiently provide reliable inference for occurrence patterns using data from a citizen-science monitoring programme. However, our approach is applicable to data generated by any type of research and monitoring programme, independent of skill level or scale, when effort is placed on obtaining auxiliary information on false-positive rates.
Additional publication details
Uncertainty in biological monitoring: a framework for data collection and analysis to account for multiple sources of sampling bias