The selection of a distributional assumption suitable for modelling macroinvertebrate density data is typically challenging. Macroinvertebrate data often exhibit substantially larger variances than expected under a standard count assumption, that of the Poisson distribution. Such overdispersion may derive from multiple sources, including heterogeneity of habitat (historically and spatially), differing life histories for organisms collected within a single collection in space and time, and autocorrelation. Taken to extreme, heterogeneity of habitat may be argued to explain the frequent large proportions of zero observations in macroinvertebrate data. Sampling locations may consist of habitats defined qualitatively as either suitable or unsuitable. The former category may yield random or stochastic zeroes and the latter structural zeroes. Heterogeneity among counts may be accommodated by treating the count mean itself as a random variable, while extra zeroes may be accommodated using zero-modified count assumptions, including zero-inflated and two-stage (or hurdle) approaches. These and linear assumptions (following log- and square root-transformations) were evaluated using 9 years of mayfly density data from a 52 km, ninth-order reach of the Upper Mississippi River (n = 959). The data exhibited substantial overdispersion relative to that expected under a Poisson assumption (i.e. variance:mean ratio = 23 ??? 1), and 43% of the sampling locations yielded zero mayflies. Based on the Akaike Information Criterion (AIC), count models were improved most by treating the count mean as a random variable (via a Poisson-gamma distributional assumption) and secondarily by zero modification (i.e. improvements in AIC values = 9184 units and 47-48 units, respectively). Zeroes were underestimated by the Poisson, log-transform and square root-transform models, slightly by the standard negative binomial model but not by the zero-modified models (61%, 24%, 32%, 7%, and 0%, respectively). However, the zero-modified Poisson models underestimated small counts (1 ??? y ??? 4) and overestimated intermediate counts (7 ??? y ??? 23). Counts greater than zero were estimated well by zero-modified negative binomial models, while counts greater than one were also estimated well by the standard negative binomial model. Based on AIC and percent zero estimation criteria, the two-stage and zero-inflated models performed similarly. The above inferences were largely confirmed when the models were used to predict values from a separate, evaluation data set (n = 110). An exception was that, using the evaluation data set, the standard negative binomial model appeared superior to its zero-modified counterparts using the AIC (but not percent zero criteria). This and other evidence suggest that a negative binomial distributional assumption should be routinely considered when modelling benthic macroinvertebrate data from low flow environments. Whether negative binomial models should themselves be routinely examined for extra zeroes requires, from a statistical perspective, more investigation. However, this question may best be answered by ecological arguments that may be specific to the sampled species and locations. ?? 2004 Elsevier B.V. All rights reserved.
Additional publication details
Selecting a distributional assumption for modelling relative densities of benthic macroinvertebrates