Statistical characterization of a large geochemical database and effect of sample size

C. Zhang; F. T. Manheim; J. Hinde; J. N. Grossman

doi:10.1016/j.apgeochem.2005.06.006

Applied Geochemistry

Abstract

The authors investigated statistical distributions for concentrations of chemical elements from the National Geochemical Survey (NGS) database of the U.S. Geological Survey. At the time of this study, the NGS data set encompassed 48,544 stream sediment and soil samples from the conterminous United States analyzed by ICP-AES following a 4-acid near-total digestion. This report includes 27 elements: Al, Ca, Fe, K, Mg, Na, P, Ti, Ba, Ce, Co, Cr, Cu, Ga, La, Li, Mn, Nb, Nd, Ni, Pb, Sc, Sr, Th, V, Y and Zn. The goal and challenge for the statistical overview was to delineate chemical distributions in a complex, heterogeneous data set spanning a large geographic range (the conterminous United States) and many different geological provinces and rock types. After declustering to create a uniform spatial sample distribution with 16,511 samples, histograms and quantile-quantile (Q-Q) plots were employed to delineate subpopulations that have coherent chemical and mineral affinities. Probability groupings are discerned by changes in slope (kinks) on the plots. Major rock-forming elements, e.g., Al, Ca, K and Na, tend to display linear segments on normal Q-Q plots. These segments can commonly be linked to petrologic or mineralogical associations. For example, linear segments on K and Na plots reflect dilution of clay minerals by quartz sand (low in K and Na). Minor and trace element relationships are best displayed on lognormal Q-Q plots. These sensitively reflect discrete relationships in subpopulations within the wide range of the data. For example, small but distinctly log-linear subpopulations for Pb, Cu, Zn and Ag are interpreted to represent ore-grade enrichment of naturally occurring minerals such as sulfides. None of the 27 chemical elements passed the test for either normal or lognormal distribution on the declustered data set. Part of the reason is the presence of mixtures of subpopulations and outliers.
Random samples of the data set with successively smaller numbers of data points showed that few elements passed standard statistical tests for normality or lognormality until sample size decreased to a few hundred data points. Large sample size enhances the power of statistical tests and leads to rejection of most statistical hypotheses for real data sets. For large sample sizes (e.g., n > 1000), graphical methods such as histograms, stem-and-leaf plots, and probability plots are recommended for a rough judgement of the probability distribution, if needed. © 2005 Elsevier Ltd. All rights reserved.
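
The sample-size effect described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic (a mild mixture of two normal subpopulations, not NGS measurements), the mixture parameters are arbitrary, and the Jarque-Bera test is used only because it is easy to write with the standard library; the abstract does not name the specific tests the authors applied. The function names `jarque_bera_pvalue`, `make_population` and `rejection_rate` are hypothetical.

```python
import math
import random


def jarque_bera_pvalue(values):
    """Return the asymptotic p-value of the Jarque-Bera normality test."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    m2 = sum(d ** 2 for d in deviations) / n
    m3 = sum(d ** 3 for d in deviations) / n
    m4 = sum(d ** 4 for d in deviations) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exactly exp(-x / 2).
    return math.exp(-jb / 2.0)


def make_population(size, rng):
    """Synthetic stand-in data (assumption): 90% N(0, 1) plus 10% N(1.5, 1)."""
    return [rng.gauss(1.5, 1.0) if rng.random() < 0.1 else rng.gauss(0.0, 1.0)
            for _ in range(size)]


def rejection_rate(population, n, trials, rng, alpha=0.05):
    """Fraction of random subsamples of size n for which normality is rejected."""
    rejected = sum(
        jarque_bera_pvalue(rng.sample(population, n)) < alpha
        for _ in range(trials)
    )
    return rejected / trials


rng = random.Random(0)
# 16,511 matches the declustered sample count; the values are synthetic.
population = make_population(16511, rng)
rates = {n: rejection_rate(population, n, 50, rng) for n in (100, 1000, 10000)}
for n, rate in rates.items():
    print(f"n = {n:>5}: normality rejected in {rate:.0%} of subsamples")
```

Under these assumptions the same slightly non-normal population is rejected far more often as the subsample grows, which is the behaviour the abstract attributes to the increased power of tests at large n.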

Additional publication details
Publication type	Article
Publication Subtype	Journal Article
Title	Statistical characterization of a large geochemical database and effect of sample size
Series title	Applied Geochemistry
DOI	10.1016/j.apgeochem.2005.06.006
Volume	20
Issue	10
Year Published	2005
Language	English
Larger Work Type	Article
Larger Work Subtype	Journal Article
Larger Work Title	Applied Geochemistry
First page	1857
Last page	1874