The use of process models to inform and improve statistical models of nitrate occurrence, Great Miami River Basin, southwestern Ohio

Scientific Investigations Report 2012-5001

National Water-Quality Assessment Program



Statistical models of nitrate occurrence in the glacial aquifer system of the northern United States, developed by the U.S. Geological Survey, use observed relations between nitrate concentrations and sets of explanatory variables (representing well-construction, environmental, and source characteristics) to predict the probability that nitrate, as nitrogen, will exceed a threshold concentration. However, the models do not explicitly account for the processes that control the transport of nitrogen from surface sources to a pumped well. Instead, as a simplified conceptualization of the source area, they use area-weighted mean spatial variables computed within a circular buffer around the well. Models that explicitly represent physical-transport processes can inform and, potentially, improve these statistical models. Specifically, groundwater-flow models simulate advective transport, which predominates in many surficial aquifers, and can contribute to the refinement of the statistical models by (1) providing improved, physically based representations of the source area to a well and (2) allowing more detailed estimates of environmental variables.

A source area to a well, known as a contributing recharge area, is the area at the water table that contributes recharge to a pumped well. A well pumped at a volumetric rate equal to the recharge through a circular buffer will have a contributing recharge area of the same size as the buffer, but its shape will depend on the hydrologic setting. These volume-equivalent contributing recharge areas approximate circular buffers where hydraulic gradients are relatively flat, such as near groundwater divides. Where gradients are steep, they are elongated in the upgradient direction and agree less with the corresponding circular buffers.
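
The volume-equivalent condition is a simple mass balance: the pumping rate Q equals the recharge rate R multiplied by the buffer area, Q = R * pi * r^2. The minimal sketch below computes that rate. The function name and the buffer radius and recharge rate in the usage lines are hypothetical values chosen for illustration, not values from this report.

```python
import math


def volume_equivalent_pumping_rate(radius_m: float, recharge_m_per_yr: float) -> float:
    """Return the pumping rate (cubic meters per year) equal to the recharge
    through a circular buffer of the given radius.

    Under steady state and with no other sources of water, a well pumped at
    this rate captures recharge over an area equal to the buffer area; the
    shape of that area depends on the simulated flow field.
    """
    if radius_m <= 0 or recharge_m_per_yr < 0:
        raise ValueError("radius must be positive and recharge non-negative")
    buffer_area_m2 = math.pi * radius_m ** 2
    return recharge_m_per_yr * buffer_area_m2


# Hypothetical example: 500-m buffer radius, 0.25 m/yr of recharge
q = volume_equivalent_pumping_rate(500.0, 0.25)
print(f"{q:,.0f} m^3/yr ({q / 365.25:,.1f} m^3/d)")
```

Because the rate depends only on area and recharge, the size of the contributing recharge area is fixed by construction. Only its shape, and therefore the spatial data it overlays, differs from the buffer.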

The degree to which process-model-estimated contributing recharge areas, which simulate advective transport and therefore account for local hydrologic settings, would inform and improve the development of statistical models can be estimated implicitly. This is done by evaluating the differences between explanatory variables estimated from the contributing recharge areas and those estimated from the circular buffers used to develop existing statistical models. The larger the difference in estimated variables, the more likely it is that the statistical models would change, and presumably improve, if explanatory variables estimated from contributing recharge areas were used in model development. Comparing model predictions from the two sets of estimated variables would further quantify, albeit implicitly, how an improved, physically based estimate of explanatory variables would be reflected in model predictions. Differences between the two sets of estimated explanatory variables and the resulting model predictions vary spatially; greater differences are associated with areas of steep hydraulic gradients. A direct comparison, however, would require developing a separate set of statistical models using explanatory variables from contributing recharge areas.

Area-weighted means of three environmental variables and one nitrogen-source variable can vary substantially between circular buffers and volume-equivalent contributing recharge areas, and among contributing recharge areas for different sets of well variables. The environmental variables are silt content, alfisol content, and depth to water, from the U.S. Department of Agriculture State Soil Geographic (STATSGO) data. The nitrogen-source variable is fertilizer-application rate from county data mapped to Enhanced National Land Cover Data 1992 (NLCDe 92) agricultural land use. The differences in estimated explanatory variables depend on the same factors that affect the contributing recharge areas, as well as on the spatial resolution and local distribution of the underlying spatial data. As a result, differences in estimated variables between circular buffers and contributing recharge areas are complex and site specific. This is evident in the estimates for circular buffers and contributing recharge areas of existing public-supply and network wells in the Great Miami River Basin. At the basin scale, evaluated with a network of uniformly spaced hypothetical wells, large differences in area-weighted mean environmental variables were observed; the spatial pattern of these differences generally is similar to spatial patterns in the underlying STATSGO data. Generally, the largest differences were observed for area-weighted nitrogen-application rate from county and national land-use data. Basin-scale differences ranged from -1,600 kilograms per year (kg/yr), indicating a larger value within the volume-equivalent contributing recharge area, to 1,900 kg/yr, whereas the underlying spatial data ranged from 0 to 2,200 kg/yr.
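
Computing an area-weighted mean over a circular buffer and over a contributing recharge area is the same operation applied to two different sets of cells. The sketch below assumes the spatial data have been resampled to a uniform grid and that the contributing recharge area is supplied as a set of grid cells (for example, from particle tracking). The grid, cell size, buffer radius, cell sets, and function names are all hypothetical. The difference is taken as buffer minus contributing recharge area, so a negative value indicates a larger value within the contributing recharge area, matching the sign convention in the text.

```python
import math
from typing import Iterable, List, Sequence, Set, Tuple

Cell = Tuple[int, int]  # (row, col) index into a uniform grid


def circular_buffer_cells(nrows: int, ncols: int, cell_size: float,
                          center: Tuple[float, float], radius: float) -> Set[Cell]:
    """Return grid cells whose centers lie within `radius` of the well.

    Cell (row, col) has its center at x = (col + 0.5) * cell_size,
    y = (row + 0.5) * cell_size.
    """
    cx, cy = center
    return {
        (r, c)
        for r in range(nrows)
        for c in range(ncols)
        if math.hypot((c + 0.5) * cell_size - cx, (r + 0.5) * cell_size - cy) <= radius
    }


def area_weighted_mean(grid: Sequence[Sequence[float]], cells: Iterable[Cell]) -> float:
    """Area-weighted mean of grid values over the selected cells.

    Every cell of a uniform grid has the same area, so the weights cancel
    and the result is the arithmetic mean of the selected cells.
    """
    values = [grid[r][c] for r, c in cells]
    if not values:
        raise ValueError("no cells selected")
    return sum(values) / len(values)


# Hypothetical 5 x 5 grid (100-m cells) in which the source variable increases eastward
grid: List[List[float]] = [[float(10 * c) for c in range(5)] for _ in range(5)]
buffer = circular_buffer_cells(5, 5, 100.0, center=(250.0, 250.0), radius=100.0)
# Hypothetical contributing recharge area: same cell count, elongated upgradient (east)
cra = {(2, 2), (2, 3), (2, 4), (1, 3), (3, 3)}
difference = area_weighted_mean(grid, buffer) - area_weighted_mean(grid, cra)
print(len(buffer), difference)  # negative: larger value within the contributing recharge area
```

Both areas contain five cells, as the volume-equivalent condition requires. The difference arises only because the contributing recharge area extends into the part of the grid where the variable is larger.
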
Silt content, alfisol content, and nitrogen-application rate are defined by the underlying spatial data and are external to the groundwater system. Depth to water, however, is an environmental variable that can be estimated in more detail, and presumably in a more physically based manner, with a groundwater-flow model than from the spatial data. Model-calculated depths to water within circular buffers in the Great Miami River Basin differed substantially from values derived from the spatial data and had a much larger range.
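
In a groundwater-flow model, depth to water at each cell can be computed as land-surface altitude minus the simulated water-table altitude. Averaging those cell values over the cells of a circular buffer gives a model-based counterpart to the depth-to-water variable derived from the soils data. This minimal sketch assumes uniform grid cells, so the area-weighted mean is a simple mean. It also treats simulated water levels above land surface as zero depth. Both are illustrative assumptions, not choices documented in this report, and the altitudes and buffer cells are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def depth_to_water(land_surface: Sequence[Sequence[float]],
                   water_table: Sequence[Sequence[float]]) -> List[List[float]]:
    """Cell-by-cell depth to water: land-surface altitude minus simulated
    water-table altitude. Cells where the simulated water table is above land
    surface are set to zero depth (an illustrative assumption)."""
    return [[max(ls - wt, 0.0) for ls, wt in zip(ls_row, wt_row)]
            for ls_row, wt_row in zip(land_surface, water_table)]


def mean_over_cells(grid: Sequence[Sequence[float]],
                    cells: Iterable[Tuple[int, int]]) -> float:
    """Arithmetic mean over the selected cells (equal-area cells assumed)."""
    values = [grid[r][c] for r, c in cells]
    return sum(values) / len(values)


# Hypothetical altitudes, in meters
land = [[260.0, 258.0, 255.0], [262.0, 259.0, 256.0]]
heads = [[250.0, 251.0, 255.5], [252.0, 250.0, 249.0]]
dtw = depth_to_water(land, heads)
buffer_cells = [(0, 0), (0, 1), (1, 0), (1, 1)]  # hypothetical buffer cells
print(dtw, mean_over_cells(dtw, buffer_cells))
```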

Differences in estimates of area-weighted spatial variables result in corresponding differences in predictions of nitrate occurrence in the aquifer. In addition to the factors affecting contributing recharge areas and estimated explanatory variables, differences in predictions also depend on the specific set of explanatory variables and the fitted slope coefficients in a given model. Two models predicted the probability of exceeding 1 and 4 milligrams per liter as nitrogen (mg/L as N). For both, predicted probabilities using variables estimated from circular buffers and from contributing recharge areas generally were correlated but differed significantly at the local and basin scales. The scale and distribution of prediction differences can be explained by the underlying differences in the estimated variables and the relative weight of the variables in the statistical models. The model of exceeding 1 mg/L as N includes only environmental variables, and differences in its predictions generally correlated with the underlying differences in STATSGO data. Differences in predictions of exceeding 4 mg/L as N were more spatially extensive because that model included both environmental and nitrogen-source variables. Predicted probabilities also differed greatly depending on whether depths to water within circular buffers were derived from the spatial data or calculated with the groundwater-flow model, with the model-calculated depths restricted to the same range. The differences in estimated explanatory variables between contributing recharge areas and circular buffers indicate that incorporating physically based contributing recharge areas likely would result in a different set of explanatory variables and an improved set of statistical models.
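
The text attributes prediction differences to both the variable differences and the fitted slope coefficients. The sketch below assumes a logistic-regression form, p = 1 / (1 + exp(-(b0 + sum(b_i * x_i)))), for the exceedance probability. It applies the same coefficients to variables estimated from a circular buffer and from a contributing recharge area. The variable names, coefficients, and values are hypothetical placeholders, not the fitted models described in this report.

```python
import math
from typing import Dict


def exceedance_probability(intercept: float, coefficients: Dict[str, float],
                           variables: Dict[str, float]) -> float:
    """Logistic-regression probability that nitrate exceeds a threshold.

    p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))
    """
    logit = intercept + sum(b * variables[name] for name, b in coefficients.items())
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical model and variable values for illustration only
intercept = -2.0
coefficients = {"silt_pct": -0.03, "n_application_kg_yr": 0.002}
buffer_vars = {"silt_pct": 20.0, "n_application_kg_yr": 800.0}
cra_vars = {"silt_pct": 20.0, "n_application_kg_yr": 1200.0}

p_buffer = exceedance_probability(intercept, coefficients, buffer_vars)
p_cra = exceedance_probability(intercept, coefficients, cra_vars)
print(f"buffer: {p_buffer:.3f}  CRA: {p_cra:.3f}  difference: {p_buffer - p_cra:+.3f}")
```

Only the nitrogen-application rate differs between the two sets here, so the prediction difference is driven entirely by that variable's difference scaled by its slope coefficient. A model without a nitrogen-source term would show no difference for these inputs.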

Using a groundwater-flow model to improve representations of source areas or to provide more detailed estimates of specific explanatory variables involves a number of limitations and technical considerations. These analyses assume that (1) recharge and pumping are in mass balance and (2) transport to a pumped well occurs under a steady-state flow field. Volume-equivalent contributing recharge areas were compared under steady-state and transient transport conditions at a location in the southeastern part of the basin. The comparison shows that the steady-state contributing recharge area is a reasonable approximation of the transient contributing recharge area after 10 to 20 years of pumping. The first assumption is the more important consideration for this analysis. A gradient effect refers to a condition in which simulated pumping from a well is less than recharge through the corresponding contributing recharge area. This generally occurs in areas with steep hydraulic gradients, such as near discharge locations, and can be mitigated by using a finer model discretization. A boundary effect refers to a condition in which recharge through the contributing recharge area is less than pumping. This indicates other sources of water to the simulated well and could reflect a real hydrologic process. In the Great Miami River Basin, large gradient and boundary effects (defined as a balance between pumping and recharge of less than one-half) occurred in 5 and 14 percent of the basin, respectively. Two populations of wells were compared: wells with pumping and recharge balanced within 10 percent, and all wells. For both populations, the agreement between circular buffers and volume-equivalent contributing recharge areas, the differences in estimated variables, and the effect on statistical-model predictions were similar. This indicates that process-model limitations did not affect the overall findings in the Great Miami River Basin. However, this result would be model specific, and prudent use of a process model needs to entail a limitations analysis and, if necessary, alterations to the model.
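
The gradient- and boundary-effect checks compare simulated pumping with recharge through each well's contributing recharge area. The sketch below classifies one well at a time. It assumes that a "large" effect means the smaller of the two rates is less than half the larger, and that "within 10 percent" is measured relative to the pumping rate. Both are interpretations of the text, and the function name, enum, and pumping and recharge values are hypothetical.

```python
from enum import Enum


class BalanceEffect(Enum):
    BALANCED = "balanced"
    GRADIENT = "gradient effect"  # simulated pumping less than recharge through the area
    BOUNDARY = "boundary effect"  # recharge through the area less than pumping


def classify_balance(pumping: float, recharge: float,
                     large_ratio: float = 0.5, tolerance: float = 0.10):
    """Classify the pumping-recharge balance for one simulated well.

    Returns (effect, ratio, is_large): ratio is min/max of the two rates
    (1.0 is perfect balance), and is_large flags ratio < large_ratio.
    Wells whose recharge is within `tolerance` of pumping are BALANCED.
    """
    if pumping <= 0 or recharge < 0:
        raise ValueError("pumping must be positive and recharge non-negative")
    ratio = min(pumping, recharge) / max(pumping, recharge)
    if abs(recharge - pumping) <= tolerance * pumping:
        effect = BalanceEffect.BALANCED
    elif pumping < recharge:
        effect = BalanceEffect.GRADIENT
    else:
        effect = BalanceEffect.BOUNDARY
    return effect, ratio, ratio < large_ratio


# Hypothetical (pumping, recharge) pairs, in cubic meters per day
wells = [(1000.0, 1050.0), (1000.0, 2500.0), (1000.0, 300.0), (1000.0, 900.0)]
results = [classify_balance(q, r) for q, r in wells]
large_gradient = sum(e is BalanceEffect.GRADIENT and big for e, _, big in results) / len(results)
large_boundary = sum(e is BalanceEffect.BOUNDARY and big for e, _, big in results) / len(results)
print(large_gradient, large_boundary)
```

Applied to a uniform network of hypothetical wells, the two fractions correspond to the share of the basin affected by large gradient and boundary effects. The BALANCED subset corresponds to the population used for the 10-percent comparison.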

Additional Publication Details

Publication Subtype: USGS Numbered Series
Title: The use of process models to inform and improve statistical models of nitrate occurrence, Great Miami River Basin, southwestern Ohio
Series title: Scientific Investigations Report
Series number: 2012-5001
Publisher: U.S. Geological Survey
Publisher location: Reston, VA
Contributing office(s): Massachusetts-Rhode Island Water Science Center
Description: x, 75 p.
Country: United States
Other Geospatial: Great Miami River Basin