Spatial data are commonly scarce and may have been collected in the process of confirming the profitability of a mining venture or investigating a contaminated site. In such situations, measurements are often taken preferentially in the most critical areas (sweet spots, allegedly contaminated areas), which conditionally biases the sample. While preferential sampling makes good practical sense, using such data directly leads to distorted sample moments and percentiles. Spatial clustering is a problem that has been identified in the past and addressed with approaches ranging from ad hoc solutions to highly elaborate mathematical formulations, mostly concerned with the effect of clustering on the cumulative frequency distribution. The method proposed here is a form of resampling that is free of special assumptions, does not use weights to adjust the measurements, does not find solutions by successive approximation, and provides variability in the results. The new method is illustrated with a synthetic dataset that has an exponential semivariogram and was purposely generated to follow a lognormal distribution. The lognormal distribution is both difficult to work with and typical of many attributes of practical interest. Testing of the new solution shows that sample subsets derived from resampled datasets can closely approximate the true probability distribution and the semivariogram, clearly outperforming the original preferentially sampled data.
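The abstract does not spell out the resampling algorithm, so the sketch below is only a generic illustration of the underlying problem and of one simple weight-free idea; it is not the authors' method. It assumes a hypothetical 20 × 20 grid of lognormal values with a high-valued "sweet spot" (unlike the paper's dataset, it has no exponential semivariogram), a preferential sample that covers the sweet spot densely, and a hypothetical `cell_subset` helper that keeps one randomly chosen measurement per square cell. Repeating the draw yields a spread of estimates rather than a single weighted answer. `experimental_semivariogram` is the standard Matheron estimator and can be applied to each subset in the same way.

```python
import math
import random
import statistics
from collections import defaultdict


def experimental_semivariogram(points, values, lag, tol):
    """Matheron's estimator: half the mean squared difference over all
    pairs whose separation distance lies within lag +/- tol."""
    total, n_pairs = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if abs(math.dist(points[i], points[j]) - lag) <= tol:
                total += (values[i] - values[j]) ** 2
                n_pairs += 1
    return total / (2 * n_pairs) if n_pairs else float("nan")


def cell_subset(points, values, cell_size, rng):
    """Hypothetical weight-free declustering: keep one randomly chosen
    measurement from every occupied square cell of side cell_size."""
    cells = defaultdict(list)
    for p, v in zip(points, values):
        cells[(int(p[0] // cell_size), int(p[1] // cell_size))].append((p, v))
    chosen = [rng.choice(members) for members in cells.values()]
    return [p for p, _ in chosen], [v for _, v in chosen]


def make_synthetic(rng):
    """Hypothetical lognormal values on a 20 x 20 grid with a high-valued
    'sweet spot' at x < 5, y < 5 (no spatial correlation beyond that)."""
    field = {}
    for x in range(20):
        for y in range(20):
            mu = 3.0 if (x < 5 and y < 5) else 0.0
            field[(x, y)] = math.exp(mu + rng.gauss(0.0, 0.3))
    return field


rng = random.Random(42)
field = make_synthetic(rng)
true_mean = statistics.fmean(field.values())

# Preferential sample: every node in the sweet spot plus a sparse
# regular grid (spacing 4) everywhere else.
sample = [p for p in field
          if (p[0] < 5 and p[1] < 5) or (p[0] % 4 == 0 and p[1] % 4 == 0)]
values = [field[p] for p in sample]
naive_mean = statistics.fmean(values)

# Many random weight-free subsets give a spread of estimates, not one number.
subset_means = []
for _ in range(200):
    _, sub_values = cell_subset(sample, values, 5.0, rng)
    subset_means.append(statistics.fmean(sub_values))
resampled_mean = statistics.median(subset_means)

print(f"true {true_mean:.2f}  naive {naive_mean:.2f}  resampled {resampled_mean:.2f}")
```

In this toy setup the naive mean is pulled far above the population mean by the clustered sweet-spot values, while the median of the subset means lands much closer to it. The cell size of 5 is an arbitrary assumption; in practice the result depends on how the subsets are formed.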
Additional publication details
| Field | Value |
|---|---|
| Publication Subtype | Journal Article |
| Title | Resampling of spatially correlated data with preferential sampling for the estimation of frequency distributions and semivariograms |
| Series title | Stochastic Environmental Research and Risk Assessment |
| Contributing office(s) | Eastern Energy Resources Science Center |