Integrating multiple data sources in species distribution modeling: A framework for data fusion

Krishna Pacifici; Brian J. Reich; David A.W. Miller; Beth Gardner; Glenn E. Stauffer; Susheela Singh; Alexa McKerrow; Jaime A. Collazo

doi:10.1002/ecy.1710

Ecology

Abstract

The last decade has seen a dramatic increase in the use of species distribution models (SDMs) to characterize patterns of species’ occurrence and abundance. Efforts to parameterize SDMs often create a tension between the quality and quantity of data available to fit models. Estimation methods that integrate both standardized and non-standardized data types offer a potential solution to the tradeoff between data quality and quantity. Recently, several authors have developed approaches for jointly modeling two sources of data (one of high quality and one of lesser quality). We extend their work by allowing for explicit spatial autocorrelation in occurrence and detection error using a Multivariate Conditional Autoregressive (MVCAR) model, and we develop three models that share information in a less direct manner, resulting in more robust performance when the auxiliary data is of lesser quality. We describe these three new approaches (“Shared,” “Correlation,” “Covariates”) for combining data sources and show their use in a case study of the Brown-headed Nuthatch in the Southeastern U.S. and through simulations. All three of the approaches that used the second data source improved out-of-sample predictions relative to a single data source (“Single”). When information in the second data source is of high quality, the Shared model performs the best, but the Correlation and Covariates models also perform well. When the information in the second data source is of lesser quality, the Correlation and Covariates models perform better, suggesting they are robust alternatives when little is known about auxiliary data collected opportunistically or through citizen scientists. Methods that allow for both data types to be used will maximize the useful information available for estimating species distributions.
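
The abstract names a multivariate CAR model but does not give its parameterization. As a rough illustration only, the sketch below builds one common separable form of a multivariate proper-CAR precision matrix on a small grid and draws two spatially autocorrelated, cross-correlated fields from it. The function names, the rook-neighbour grid, the spatial parameter `rho`, and the cross-field covariance `Sigma` are all assumptions made for this example and may differ from the specification used in the paper.

```python
import numpy as np

def grid_adjacency(nrow, ncol):
    """Rook-neighbour adjacency matrix W for an nrow x ncol grid (hypothetical layout)."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:                      # east neighbour
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < nrow:                      # south neighbour
                W[i, i + ncol] = W[i + ncol, i] = 1.0
    return W

def mvcar_precision(W, rho, Sigma):
    """Separable multivariate proper-CAR precision: kron(inv(Sigma), D - rho * W).

    Assumes |rho| < 1 and that every cell has at least one neighbour,
    which makes D - rho * W symmetric positive definite.
    """
    D = np.diag(W.sum(axis=1))
    Q_space = D - rho * W
    return np.kron(np.linalg.inv(Sigma), Q_space)

def sample_fields(Q, n_fields, rng):
    """Draw one zero-mean Gaussian vector with precision Q; return one row per field."""
    L = np.linalg.cholesky(Q)                     # Q = L @ L.T
    z = rng.standard_normal(Q.shape[0])
    x = np.linalg.solve(L.T, z)                   # Cov(x) = inv(Q)
    return x.reshape(n_fields, -1)

# Two latent fields (for example, one per data source) on a 3 x 4 grid.
W = grid_adjacency(3, 4)
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])                    # assumed cross-field covariance
Q = mvcar_precision(W, rho=0.9, Sigma=Sigma)
fields = sample_fields(Q, n_fields=2, rng=np.random.default_rng(0))
```

In this separable form the implied covariance is `kron(Sigma, inv(D - rho * W))`, so `Sigma` controls how strongly the two fields move together at each cell while `rho` controls how smoothly each field varies across neighbouring cells.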

Additional publication details
Publication type	Article
Publication Subtype	Journal Article
Title	Integrating multiple data sources in species distribution modeling: A framework for data fusion
Series title	Ecology
DOI	10.1002/ecy.1710
Volume	98
Issue	3
Year Published	2017
Language	English
Publisher	Wiley
Contributing office(s)	Coop Res Unit Atlanta, Core Science Analytics and Synthesis, Core Science Analytics, Synthesis, and Libraries, GAP Analysis Project
Description	11 p.
First page	840
Last page	850