Management of water quality in streams of the United States is becoming increasingly complex as regulators seek to control aquatic pollution and ecological problems through Total Maximum Daily Load programs that target reductions in the concentrations of certain constituents. Sediment, nutrients, and bacteria, for example, are constituents that regulators target for reduction nationally and in the Tualatin River basin, Oregon. These constituents require laboratory analysis of discrete samples for definitive determinations of concentrations in streams. Recent technological advances in the nearly continuous, in situ monitoring of related water-quality parameters has fostered the use of these parameters as surrogates for the labor intensive, laboratory-analyzed constituents. Although these correlative techniques have been successful in large rivers, it was unclear whether they could be applied successfully in tributaries of the Tualatin River, primarily because these streams tend to be small, have rapid hydrologic response to rainfall and high streamflow variability, and may contain unique sources of sediment, nutrients, and bacteria.
This report evaluates the feasibility of developing correlative regression models for predicting dependent variables (concentrations of total suspended solids, total phosphorus, and Escherichia coli bacteria) in two Tualatin River basin streams: one draining highly urbanized land (Fanno Creek near Durham, Oregon) and one draining rural agricultural land (Dairy Creek at Highway 8 near Hillsboro, Oregon), during 2002-04. An important difference between these two streams is their response to storm runoff; Fanno Creek has a relatively rapid response due to extensive upstream impervious areas and Dairy Creek has a relatively slow response because of the large amount of undeveloped upstream land. Four other stream sites also were evaluated, but in less detail. Potential explanatory variables included continuously monitored streamflow (discharge), stream stage, specific conductance, turbidity, and time (to account for seasonal processes). Preliminary multiple-regression models were identified using stepwise regression and Mallow's Cp, which maximizes regression correlation coefficients and accounts for the loss of additional degrees of freedom when extra explanatory variables are used. Several data scenarios were created and evaluated for each site to assess the representativeness of existing monitoring data and autosampler-derived data, and to assess the utility of the available data to develop robust predictive models. The goodness-of-fit of candidate predictive models was assessed with diagnostic statistics from validation exercises that compared predictions against a subset of the available data.
The regression modeling met with mixed success. Functional model forms that have a high likelihood of success were identified for most (but not all) dependent variables at each site, but there were limitations in the available datasets, notably the lack of samples from high-flows. These limitations increase the uncertainty in the predictions of the models and suggest that the models are not yet ready for use in assessing these streams, particularly under high-flow conditions, without additional data collection and recalibration of model coefficients. Nonetheless, the results reveal opportunities to use existing resources more efficiently. Baseline conditions are well represented in the available data, and, for the most part, the models reproduced these conditions well. Future sampling might therefore focus on high flow conditions, without much loss of ability to characterize the baseline. Seasonal cycles, as represented by trigonometric functions of time, were not significant in the evaluated models, perhaps because the baseline conditions are well characterized in the datasets or because the other explanatory variables indirectly incorporate seasonal aspects. Multicollinearity among independent variabl