Small values in big data: The continuing need for appropriate metadata

Craig A. Stow; Katherine E. Webster; Tyler Wagner; Noah R. Lottig; Patricia A. Soranno; YoonKyung Cha

doi:10.1016/j.ecoinf.2018.03.002

Ecological Informatics

https://doi.org/10.1016/j.ecoinf.2018.03.002

Abstract

Compiling data from disparate sources to address pressing ecological issues is increasingly common. Many ecological datasets contain left-censored data, that is, observations below an analytical detection limit. Studies of single, typically small datasets show that common approaches for handling censored data (e.g., deletion or substitution of fixed values) result in systematic biases. However, no studies have explored the degree to which the documentation and presence of censored data influence outcomes from large, multi-sourced datasets. We describe left-censored data in a lake water quality database assembled from 74 sources and illustrate the challenges of dealing with small values in big data, including detection limits that are absent, range widely, and show trends over time. We show that substitutions of censored data can also bias analyses using ‘big data’ datasets, that censored data can be effectively handled with modern quantitative approaches, but that such approaches rely on accurate metadata that describe treatment of censored data from each source.
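
The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis. It assumes log-normally distributed concentrations and a single known detection limit, and every name and parameter value in it (`censored_normal_mle`, `TRUE_MU`, `DETECTION_LIMIT`, and so on) is hypothetical. It compares the log-scale mean obtained by deleting censored values, by substituting half the detection limit, and by a censored-data maximum likelihood fit computed with a simple EM iteration, which stands in here for the "modern quantitative approaches" the abstract mentions.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal, used for pdf and cdf


def censored_normal_mle(observed, limits, n_iter=500, tol=1e-10):
    """EM estimate of (mu, sigma) for normal data with left-censored values.

    observed: fully measured values (here, log concentrations)
    limits:   one detection limit per censored observation, on the same scale
    """
    n = len(observed) + len(limits)
    # Start from the naive fit that treats each limit as a measured value.
    start = list(observed) + list(limits)
    mu = fmean(start)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in start) / n)
    for _ in range(n_iter):
        # E-step: mean and variance of a normal truncated above at each limit.
        exp_vals, exp_vars = [], []
        for c in limits:
            a = (c - mu) / sigma
            lam = STD_NORMAL.pdf(a) / STD_NORMAL.cdf(a)
            exp_vals.append(mu - sigma * lam)
            exp_vars.append(sigma ** 2 * (1 - a * lam - lam ** 2))
        # M-step: update mu and sigma from the completed-data moments.
        new_mu = (sum(observed) + sum(exp_vals)) / n
        ss = sum((x - new_mu) ** 2 for x in observed)
        ss += sum(v + (e - new_mu) ** 2 for e, v in zip(exp_vals, exp_vars))
        new_sigma = math.sqrt(ss / n)
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


# Simulated data; all values below are assumptions chosen for illustration.
rng = random.Random(42)
TRUE_MU, TRUE_SIGMA = 0.0, 0.5   # log-scale mean and standard deviation
DETECTION_LIMIT = 1.0            # concentration units
log_dl = math.log(DETECTION_LIMIT)

log_conc = [rng.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(5000)]
observed = [x for x in log_conc if x >= log_dl]   # values the lab reports
n_censored = len(log_conc) - len(observed)        # reported only as "< DL"

# Approach 1: delete censored observations.
deletion_mean = fmean(observed)
# Approach 2: substitute half the detection limit for each censored value.
half_dl = math.log(DETECTION_LIMIT / 2)
substitution_mean = fmean(observed + [half_dl] * n_censored)
# Approach 3: censored-data maximum likelihood.
mle_mu, mle_sigma = censored_normal_mle(observed, [log_dl] * n_censored)

print(f"true log-scale mean:     {TRUE_MU:.3f}")
print(f"deletion:                {deletion_mean:.3f}")
print(f"half-DL substitution:    {substitution_mean:.3f}")
print(f"censored MLE (mu, sigma): {mle_mu:.3f}, {mle_sigma:.3f}")
```

Because `censored_normal_mle` takes one limit per censored observation, the same sketch extends to sources with different detection limits, but only if each record carries its limit. That dependence on per-source metadata is the point the abstract makes.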

Additional publication details
Publication type	Article
Publication Subtype	Journal Article
Title	Small values in big data: The continuing need for appropriate metadata
Series title	Ecological Informatics
DOI	10.1016/j.ecoinf.2018.03.002
Volume	45
Year Published	2018
Language	English
Publisher	Elsevier
Contributing office(s)	Coop Res Unit Leetown
Description	5 p.
First page	26
Last page	30