Normality of raw data in general linear models: The most widespread myth in statistics

Marc Kery; Jeff S. Hatfield

doi:10.1890/0012-9623(2003)84[92:NORDIG]2.0.CO;2

Bulletin of the Ecological Society of America

Abstract

In years of statistical consulting for ecologists and wildlife biologists, by far the most common misconception we have come across has been the one about normality in general linear models. These comprise a very large part of the statistical models used in ecology and include t tests, simple and multiple linear regression, polynomial regression, and analysis of variance (ANOVA) and covariance (ANCOVA). There is a widely held belief that the normality assumption pertains to the raw data rather than to the model residuals. We suspect that this error may also occur in countless published studies, whenever the normality assumption is tested prior to analysis. This may lead to the use of nonparametric alternatives (if there are any), when parametric tests would indeed be appropriate, or to use of transformations of raw data, which may introduce hidden assumptions such as multiplicative effects on the natural scale in the case of log-transformed data. Our aim here is to dispel this myth. We very briefly describe relevant theory for two cases of general linear models to show that the residuals need to be normally distributed if tests requiring normality are to be used, such as t and F tests. We then give two examples demonstrating that the distribution of the response variable may be nonnormal, and yet the residuals are well behaved. We do not go into the issue of how to test normality; instead we display the distributions of response variables and residuals graphically.
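
The central claim, that a response variable can be clearly nonnormal while the model residuals are well behaved, can be illustrated with a small simulation. The sketch below is not one of the two examples from the article, which this record does not reproduce. It assumes a hypothetical two-group design (the setting of a t test) with group means 0 and 10, a common standard deviation of 1, and 500 observations per group; all of these values, the function names, and the choice of excess kurtosis as a rough summary of shape are assumptions made for illustration only.

```python
import random
import statistics


def excess_kurtosis(values):
    """Sample excess kurtosis; roughly 0 for normally distributed data."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def simulate_two_groups(n_per_group=500, mean_a=0.0, mean_b=10.0, sd=1.0, seed=42):
    """Simulate a two-group design (hypothetical values) and return
    the pooled raw response together with the model residuals."""
    rng = random.Random(seed)
    group_a = [rng.gauss(mean_a, sd) for _ in range(n_per_group)]
    group_b = [rng.gauss(mean_b, sd) for _ in range(n_per_group)]

    # Raw response: both groups pooled, which yields a bimodal distribution.
    response = group_a + group_b

    # Residuals: each observation minus the fitted mean of its own group,
    # which is what the linear model y = group mean + error implies.
    fit_a = statistics.fmean(group_a)
    fit_b = statistics.fmean(group_b)
    residuals = [y - fit_a for y in group_a] + [y - fit_b for y in group_b]
    return response, residuals


response, residuals = simulate_two_groups()
raw_kurtosis = excess_kurtosis(response)
resid_kurtosis = excess_kurtosis(residuals)

print(f"excess kurtosis of raw response: {raw_kurtosis:.2f}")
print(f"excess kurtosis of residuals:    {resid_kurtosis:.2f}")
```

Under these assumptions the pooled response is strongly bimodal, so its excess kurtosis is far below zero, while the residuals stay close to zero, as expected for normal errors. A normality check applied to the raw response would flag a problem that the model itself does not have.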

Additional publication details
Publication type	Article
Publication Subtype	Journal Article
Title	Normality of raw data in general linear models: The most widespread myth in statistics
Series title	Bulletin of the Ecological Society of America
DOI	10.1890/0012-9623(2003)84[92:NORDIG]2.0.CO;2
Volume	84
Issue	2
Year Published	2003
Language	English
Contributing office(s)	Patuxent Wildlife Research Center
Description	3 p.
First page	92
Last page	94