PeakFlow and LowFlow Magnitude Estimates at Defined Frequencies and Durations for Nontidal Streams in Delaware
Links
 Document: Report (8.12 MB pdf) , HTML , XML
 Data Releases:
 USGS data release  Magnitude and frequency of peak flows and low flows on nontidal streams in Delaware—Peak and low flow estimates and basin characteristics
 USGS data release  PeakFQ inputs and selected outputs for selected gages in or near Delaware
 USGS data release  Basin characteristics rasters for Delaware StreamStats 2020
 USGS data release  Fundamental dataset rasters for Delaware StreamStats 2020
 Download citation as: RIS  Dublin Core
Abstract
Reliable estimates of the magnitude of peak flows in streams are required for the economical and safe design of transportation and water conveyance structures. In addition, reliable estimates of the magnitude of low flows at defined frequencies and durations are needed for meeting regulatory requirements, quantifying base flows in streams and rivers, and evaluating time of travel and dilution of toxic spills. This report, in cooperation with the Delaware Department of Transportation and the Delaware Geological Survey, presents methods for estimating the magnitude of peak flows and low flows at defined frequencies and durations on nontidal streams in Delaware, at locations both monitored by streamflowgage sites and ungaged. Methods are presented for estimating (1) the magnitude of peak flows for return periods ranging from 2 to 500 years (50percent to 0.2percent annualexceedance probability), and (2) the magnitude of low flows as applied to 7, 14, and 30consecutive day lowflow periods with recurrence intervals of 2, 10, and 20 years (50, 10, and 5percent annual nonexceedance probabilities). These methods are applicable to watersheds that exhibit a full range of development conditions in Delaware. The report also describes StreamStats, a web application that allows users to easily obtain peakflow and lowflow magnitude estimates for userselected locations in Delaware.
Peakflow and lowflow magnitude estimates for ungaged sites are obtained using statistical regression analysis through a process known as regionalization, where information from a group of streamflowgage sites within a region forms the basis for estimates for ungaged sites within the same region. Ninetyfour streamflowgage sites in and near Delaware with at least 10 years of nonregulated annual peakflow data were used for the peakflow regression analysis, a subset of the 121 sites for which peakflow estimates were computed. These sites included both continuousrecord streamflowgage sites as well as partial record sites. Fortyfive streamflowgage sites with at least 10 years of nonregulated lowflow data available were used for the lowflow regression analyses, a subset of the 68 sites for which lowflow estimates were computed. Estimates for gaged sites are obtained by combining (1) the station peakflow statistics (mean, standard deviation, and skew) and peakflow estimates using the recent Bulletin 17C guidelines that incorporate the Expected Moments Algorithm with (2) regional estimates of peakflow magnitude derived from regional regression equations and regional skew derived from sites with records greater than or equal to 35 years. Example peakflow estimate calculations using the methods presented in the report are given for (1) ungaged sites, (2) gaged sites, (3) sites upstream or downstream from a gaged location, and (4) sites between gaged locations. Estimates for lowflow gaged sites are obtained by combining (1) the station lowflow statistics (mean, standard deviation, and skew) and lowflow estimates with (2) regional estimates of lowflow magnitude derived from regional regression equations. Example lowflow estimate calculations using the methods presented in the report are given for (1) ungaged sites, (2) gaged sites, (3) sites upstream or downstream from a gaged location, and (4) sites between gaged locations. A total of 54 sites in the Coastal Plain region were used to develop peakflow regressions for the region and 40 sites were used for the Piedmont region. Similarly, 24 sites were used for lowflow regression equation development in the Coastal Plain, with 21 in the Piedmont. Peak and lowflow site inclusion in the Coastal Plain tended to be more restricted with tidal influence and ranges of basin characteristics, including drainage area, limiting regression equation development and application.
Regional regression equations for peak flows and low flows, as applicable to ungaged sites in the Piedmont and Coastal Plain Physiographic Provinces in Delaware, are presented. Peakflow regression equations used variables that quantified drainage area, basin slope, percent area with welldrained soils, percent area with poorly drained soils, impervious area, and percent area of surface water storage in estimating peakflow estimates, whereas lowflow regression equations used only drainage area and percent poorly drained soils in the estimation of low flows. Average standard errors for peakflow regressions tended to be lower than those for low flow regressions, with lower errors in the Piedmont region for both peak and lowflow regressions. For peakflow estimates, a sensitivity analysis of Piedmont regression equation estimates to changes in impervious area is also presented.
Additional topics associated with the analyses performed during the study are discussed, including (1) the availability and description of 32 basin and climatic characteristics considered during the development of the regional regression equations; (2) the treatment of increasing trends in the annual peakflow series identified at 18 gaged sites and inclusion in or exclusion from the regional analysis; (3) regional skew analysis and determination of regression regions; (4) sample adjustments and removal of sites owing to regulation and redundancy; and (5) a brief comparison of peak and lowflow estimates at gages used in previous studies.
Introduction
Reliable estimates of the magnitude of peak flows in streams at defined frequencies are required for the economical design of transportation and waterconveyance structures such as roads, bridges, culverts, storm sewers, dams, and levees (Ries and Dillow, 2006). These estimates are also needed for the effective planning and management of land use and water resources to protect lives and property in floodprone areas and to determine floodinsurance rates (Ries and Dillow, 2006).
In addition, reliable estimates of the magnitude of low flows are needed by engineers, scientists, natural resource managers, and many others for (1) establishment of regulatory minimum flowby requirements for streams and rivers, (2) quantifying base flow in streams and rivers, (3) wastewater discharge permitting, (4) watersupply planning and management, (5) protection of stream biota, and (6) evaluation of time of travel and dilution of toxic spills (Carpenter and Hayes, 1996; Ries, 2006; Doheny and Banks, 2010).
Estimates of peakflow magnitude at defined frequencies and lowflow magnitude at defined frequencies and durations are needed, both at locations monitored by streamflowgages and at ungaged sites where no streamflow information is available for determining estimates (Ries and Dillow, 2006). Estimates for ungaged sites are usually achieved through a process known as regionalization, where peakflow and lowflow information for groups of streamflowgage sites within a region are used to calculate estimates for ungaged sites (Ries and Dillow, 2006).
Because peakflow and lowflow statistics that are computed at individual streamflowgage sites can represent different record lengths and different periods of time, they should be considered estimates when used for predicting longterm and future extreme peakflow or lowflow conditions (Ries, 2006; Doheny and Banks, 2010). Statistics also change over time as more streamflow data are recorded at individual streamflowgage sites and with the addition of extreme hydrologic conditions (Ries, 2006; Doheny and Banks, 2010). As a result, both peakflow and lowflow statistics at individual streamflowgage sites should be updated periodically to reflect the longer periods of data collection, and for use in the regionalization process to produce updated estimates at ungaged sites (Ries, 2006; Doheny and Banks, 2010).
Purpose and Scope
The purpose of this report is to present methods for estimating the magnitude of peak flows and low flows at defined frequencies and durations on nontidal streams in Delaware. This report (1) describes methods used to estimate the magnitudes of both peak flows and low flows at sites monitored by streamflowgage sites; (2) presents estimates of peakflow magnitude at the 50percent (2 year [yr]), 20percent (5 yr), 10percent (10 yr), 4percent (25 yr), 2percent (50 yr), 1percent (100 yr), 0.5percent (200 yr), and 0.2percent (500 yr) annualexceedance probability (AEP) for 94 streamflowgage sites in and near Delaware; (3) presents estimates of lowflow magnitudes at selected frequencies (7Q2, 7Q10, 7Q20, 14Q2, 14Q10, 14Q20, 30Q2, 30Q10, and 30Q20) describing 7, 14, and 30consecutiveday lowflow periods with annual nonrecurrence intervals of 2, 10, and 20 years (annual nonexceedance probabilities of 50 percent, 10 percent, and 5 percent, respectively) for 45 streamflowgage sites in and near Delaware; (4) describes methods used to develop regression equations to estimate the magnitude of peak flows and low flows for defined frequencies and durations at ungaged sites in Delaware; (5) describes the accuracy and limitations of the peakflow and lowflow equations; (6) presents example applications of the peakflow and lowflow methods; and (7) describes the U.S. Geological Survey (USGS) StreamStats web application from which basin characteristics, streamflow statistics, and estimates can be obtained.
Previous Investigations
Methods for determining peakflow magnitude estimates for nontidal streams in Delaware were published in previous reports by Tice (1968), Cushing and others (1973), Simmons and Carpenter (1978), Dillow (1996), and most recently by Ries and Dillow (2006). The regionalization methods described in the earlier reports relied on fewer monitored sites and shorter periods of recorded data than those used in Ries and Dillow (2006). Since Ries and Dillow (2006) was published, an additional 13 water years of streamflow data and computationally improved regionalization techniques have become available.
Methods for determining lowflow statistics for nontidal streams in Delaware were published previously in Carpenter and Hayes (1996). Regionalization methods used in that report relied on data through March 1987 (the end of the 1986 climatic year). An additional 31 years of streamflow data and computationally improved regionalization techniques for lowflow statistics have become available since the analyses described by Carpenter and Hayes (1996).
Doheny and Banks (2010) also presented selected lowflow statistics for 114 continuousrecord streamflowgage sites in Maryland based on available data through the 2009 climatic year. Lowflow statistics at several streamflowgage sites from the Coastal Plain area of Maryland and the Piedmont area of northeastern Maryland included in that investigation were updated using an additional 8 years of continuous streamflow data and used for regionalization in this investigation.
Description of Study Area
The study area, composed of the State of Delaware, is in the MidAtlantic region of the United States. Delaware lies between lat 38°27' and 39°51' N. and long 75°04' and 75°48' W. and is bordered on the north by Pennsylvania, on the west and south by Maryland, on the east by Delaware Bay, and the Atlantic Ocean in the southeast section of the State (fig. 1). The State of New Jersey is on the eastern shore of the Delaware River and Delaware Bay. Delaware has a land area of 1,954 square miles (mi^{2}) and a 2020 population of 986,809, an increase of 1.04 percent from 2019 (Macrotrends, 2021).
The climate in the study area is generally temperate (Ries and Dillow, 2006). The mean annual temperature across Delaware ranges from about 54 degrees Fahrenheit (° F) in northern New Castle County to 58.1° F along the Atlantic coast of southern Delaware. Mean annual precipitation is about 45 inches statewide (Delaware State Climatologist, 2021). The precipitation is distributed fairly evenly throughout the year, though with greater precipitation surplus (precipitation minus potential evapotranspiration) during the winter and spring months. Annual peak flows in the State are triggered by a mix of frontal storms with rain and melting snow in the spring, thunderstorms in the summer, and tropical storms and hurricanes in the summer and fall (Ries and Dillow, 2006). Annual lowflow periods are the result of extended periods with minimal rainfall that occur periodically, generally between July and October.
The study area is in two major physiographic provinces, the Coastal Plain and the Piedmont (fig. 1; Fenneman, 1938). The Fall Line, which crosses the northeast corner of Delaware about 5 miles (mi) south of the northwest corner of the State, forms the divide between the two provinces. The Piedmont Physiographic Province, northwest of the Fall Line, consists of gently rolling landscape with maximum elevations generally less than 400 feet (ft) above sea level. Delaware streams in this province have steep gradients and drain to either the Delaware River or to Delaware Bay (Dillow, 1996; Ries and Dillow, 2006). The Coastal Plain Physiographic Province, southeast of the Fall Line, consists of an area of low relief adjacent to the Chesapeake Bay and Delaware Bay, with elevations ranging from sea level to less than 100 ft. Streams in the Coastal Plain are often affected by tides for substantial distances above their mouths. The Fall Line is named as such because numerous waterfalls occur where rivers and streams drop in elevation from the Piedmont onto the Coastal Plain (Ries and Dillow, 2006).
Table 1.
Summary of streamflowgage sites in and near Delaware used in peakflow regional regression equations.Methods for Estimating the Magnitude of Peak Flows at Defined Frequencies
This report describes methods for estimating the magnitude of peak flows associated with defined frequencies for sites monitored by streamflowgage sites and for ungaged sites in and around Delaware. The process followed for determining peakflow magnitude estimates for ungaged sites in a given region usually includes (1) selecting a group of streamflowgage sites in and around the region with at least 10 years of annual peakflow data and streamflow conditions that generally represent the whole area; (2) computing initial peakflow magnitude estimates and atsite skew values as determined by use of “Guidelines for Determining Flood Flow Frequency—Bulletin 17C” (England and others, 2019); (3) computing physical and climatic characteristics (hereafter referred to as “basin characteristics”) that have a conceptual relation to the generation of peak flows for the drainage basins associated with the monitored sites; (4) analyzing the initial atsite skew coefficients to compute regional skew values; (5) recomputing the peakflow magnitude estimates for the sites monitored by streamflowgages by weighting the atsite skew coefficients and the new regional skew values that correspond to their respective record lengths; (6) analyzing relations between peakflow magnitude estimates and basin characteristics to determine homogeneity throughout the region or whether to divide the region into subregions; (7) using regression analysis to develop equations to estimate peakflow magnitudes at ungaged sites in the region or subregions; and (8) assessing and describing the accuracy associated with estimation of peakflow magnitudes for ungaged sites (Ries and Dillow, 2006).
Streamflowgage sites within Delaware and in adjacent states having basin centroids within 25 mi of the Delaware border were investigated for possible use in the regional analysis. Gaged sites in this region were not used in the analysis if less than 10 years of annual peakflow data were available, or if peak flows at the stations were substantially affected by dam regulation or storage of storm flows by reservoirs. Use of these criteria resulted in the initial selection of 121 gaged sites for use in the regional analysis (Hammond, 2021a) (fig. 2, table 1). The number of gaged sites that describe basins with an appreciable amount of urban development within the region was insufficient to develop separate regression equations based solely on urban drainage basins; however, the Delaware Department of Transportation is interested in understanding how development can affect peakflow magnitude estimates, so gaged sites that monitor basins with appreciable development were not excluded from the analysis.
Determination of Basin Characteristics
Basin characteristics that were initially considered for use in the regression analyses included data variables related to drainage area, basin slope, impervious area, soil types, basin storage, developed land, mean basin elevation, basin relief, forest cover, meanannual and maximum precipitation totals, hydraulic conductivity, development density, outletpoint elevation, flowpath lengths, housing units and housing density, and basin population. A list of specific basin characteristics considered for use in both the peakflow and lowflow regression analyses is shown in table 2.
Table 2.
Basin characteristics considered for use in the peakflow and lowflow regression analyses.Name  Units  Description 

DRNAREA  Square miles  Drainage area 
BSLDEM3M  Percent  Mean basin slope 
IMPERV  Percent  Percent impervious surface area from 2016 National Land Cover Database (Homer and others, 2012) 
SOILA  Percent  Percent soil type A, well drained (SSURGO) 
SOILD  Percent  Percent soil type D, poorly drained (SSURGO) 
STORNHD  Percent  Percent area of surface water storage from National Hydrography Dataset 
LC16STOR  Percent  Percent area of surface water storage from 2016 National Land Cover Database (Class 10, 90, 95) (Homer and others, 2012) 
LC16DEV  Percent  Percent of developed land use categories from 2016 National Land Cover Database (Class 21–24) (Homer and others, 2012) 
ELEV  Feet  Mean basin elevation 
RELIEF  Feet  Maximum minus minimum basin elevation 
FOREST  Percent  Percent of forested land use from 2016 National Land Cover Database (Class 41–43) (Homer and others, 2012) 
PRECIP  Inches  Mean annual precipitation 
I24HXY  Inches  24hour, 2, 5, 10, 25, 50, 100, 200, 500, 1,000year maximum precipitation where X is the recurrence interval in years (recurrence intervals correspond to the 50, 20, 10, 4, 2, 1, 0.5, 0.2, and 0.1percent chance annualexceedance probabilities) 
SSURGOKSAT  Inches per hour  Mean surface layer vertical hydraulic conductivity (SSURGO) 
DINTNLCD16A  NA  Development intensity (categories 21, 22, 23, 24; weighted 0.1, 0.25, 0.65, 0.9, respectively) from 2016 National Land Cover Database (Homer and others, 2012) 
OUTLETELEV  Feet  Outlet point elevation 
LFPLENGTH  Miles  Longest flowpath length 
BSHAPELFP  Unitless  Longest flowpath length squared divided by drainage area 
LFPSLPFM  Feet per mile  Longest flowpath slope 
LFPSLP1085FM  Feet per mile  Longest flowpath slope from the U.S. Geological Survey 10–85 slope method 
HOUSINGTOT2010  Number of housing units  Total housing units from 2010 census block groups (U.S. Census Bureau, 2010) 
HOUSINGDEN2010  Housing units per acre  Housing density from 2010 census block groups (U.S. Census Bureau, 2010) 
POPTOTAL10  Population  Total population in basin from 2010 census block groups (U.S. Census Bureau, 2010) 
POPDENS10  Population per acre  Basin population density from 2010 census block groups (U.S. Census Bureau, 2010) 
Flow Statistics for StreamflowGage Sites
There were 121 streamflowgage sites identified either within the State of Delaware or within a 25mi buffer of the State of Delaware border and located within Maryland, New Jersey, or Pennsylvania, each of which is associated with 10 or more years of annual peakflow data. Of these sites, 94 were subsequently used for the development of peakflow regression equations following analyses of trend, redundancy, and regulation as described in the sections below. Following Wagner and others (2016), record lengths from individual gaged sites were not combined following the analysis of contributingarea redundancy and flow regulation. Record combination would only have been possible for two sets of gage sites, and instead of introducing potential error, the longer record was chosen for each set of redundant gages (01467087 and 01467089, 01476480 and 01476500) providing a dataset composed solely of individual streamflowgagesite data records.
For each of the 121 streamflowgage sites, peakflow magnitudes at the 50, 20, 10, 4, 2, 1, 0.5, and 0.2percent AEPs (2year to 500year recurrence intervals) were determined using the PeakFQ program version 7.3 (Veilleux and others, 2014) and guidance from Bulletin 17C Guidelines for determining floodflow frequency (England and others, 2019). Bulletin 17C is implemented in PeakFQ version 7.3 using the Expected Moments Algorithm (EMA) (Cohn and others, 1997) and a generalized version of the GrubbsBeck test for low outliers (Cohn and others, 2013). Flow statistics are provided for sites included in regression analyses as well as those not suitable for inclusion in regression equations owing to regulation, redundancy, or the presence of trend as detailed below. Atsite, regressionbased, and weighted flow statistics are provided for sites included in regression equation development. Atsite, regressionbased, and weighted peakflow estimates along with basin characteristics can be found in Hammond (2021a).
Procedures for Implementing the Bulletin 17C Guidelines
Procedures for implementing the Bulletin 17C guidelines include (1) EMA analysis for fitting the logPearson Type III (LP3) distribution, incorporating historical information where applicable; (2) the use of weighted skew coefficients (based on weighting atsite skew coefficients, fig. 3, with regional skew coefficients from the section “Regional Skew Analysis and Determination of Streamflow Analysis Regions” below); and (3) the use of the Multiple GrubbsBeck Test (MGBT) for identifying potentially influential low flows (PILFs) (England and others, 2019; Sando and McCarthy, 2018).
The standard procedures for incorporating historical information with the EMA methodology involve defining and applying the bestavailable flow intervals and perception thresholds for peakflow magnitude analyses that include recorded streamflow data, historical peak flows, and ungaged historical periods. The EMA method improves upon the standard LP3 method in Bulletin 17B by integrating censored and interval peakflow data into the analysis (Cohn and others, 1997). There are two types of censored peakflow data in this report: (1) annual peak flows at creststage gages for which the flow is less than a minimum recordable value, and (2) historical annual peak flows that have not exceeded a recorded historical peak (Eash and others, 2013). In EMA, interval discharges were used to characterize peak flows known to be greater or less than a specific value, peak flows that could only be reliably estimated within a certain range, or to characterize missing data in periods of recorded data. The MGBT is used by EMA to identify not just low outliers but also other PILFs (Cohn and others, 2013). Although low outliers are typically one or two homogeneous values in a dataset that do not conform to the trend of the other observations, PILFs have a magnitude that is much smaller than the flood quantile of interest, occur below a statistically significant break in the flood magnitudefrequency plot, and have excessive influence on the estimated magnitude of large floods at defined frequencies. An example implementation of Bulletin 17C methods is shown in figure 4 for USGS gage 01480000 (Red Clay Creek at Wooddale, Delaware).
For crest stage gage locations, which differ from continuous record sites in that measurements are only available for the maximum stage of an individual event, it was also necessary to determine the minimum recordable flow as part of implementing perception thresholds. The minimum recordable flow threshold primarily affects treatment of peak flows near the extreme lower tail of the frequency distribution. Generally, the handling of peak flows near the extreme lower tail of the frequency distribution has limited effect on magnitudefrequency analyses, with very low peak flows either censored by PILF thresholds or exerting small influence in determining the distributional parameters of the LP3 distribution.
For some streamflowgage sites, the peakflow records are not well represented by the standard procedures and require informed user adjustments. For this study, adjustments were made to account for atypical lowertail peakflow data at seven sites (01412500, 01467086, 01467317, 01467351, 01473470, 01475019, and 01488500) by implementing the Single GrubbsBeck Test instead of the recommended MGBT, resulting in an improved EMA fit of the LP3 distribution. The PeakFQ models for all 121 sites used in this study are provided in Hammond (2021b).
Analysis of Trends in Annual PeakFlow Time Series
PeakFQ implements the nonparametric MannKendall test for the identification of monotonic trends in annual peakflow data. The results from this test show that peakflow data from 18 of the original 121 sites had significant trends, where significance is defined by pvalues less than 0.05. Similar to Wagner and others (2016), if sites displayed a significant trend, the MannKendall trend test was implemented on variations of the original annual peakflow series by removing 5 percent of the annual peakflow data from the beginning or end of the dataset, or a combination of the two. This practice assumes that if a significant trend is not identified using 90 or 95 percent of the data, then the trend was likely not present and not impactful on the peakflow magnitude analysis (Eash and others, 2013). If a trend remained after removing data as described, the site was not included in the development of regional regression equations. Nine of the 18 sites were kept for further analysis, although some not included were possibly eliminated owing to redundancy or regulation as described in the sections below. The nine sites that were not used for regression equation development owing to trends were 01467317, 01475000, 01477800, 01484100, 01484270, 01484500, 01491000, 01492500, and 01493000.
Regional Skew Analysis and Determination of Regression Regions
To assess whether separate regions with different regional skew values were present within our study area, we implemented the nonparametric Wilcoxon rank sum test for significant differences between the Coastal Plain and Piedmont physiographic regions used in the prior peakflow frequency study (Ries and Dillow, 2006). For this analysis, we only considered sites with more than 35 years of peakflow record, following Bulletin 17B (Interagency Advisory Committee on Water Data, 1982) and Wagner and others (2016). The Wilcoxon ranksum test showed no significant difference between the longterm skew values in the Coastal Plain and Piedmont physiographic regions; thus, the average of all Coastal Plain and Piedmont atsite skews were used to calculate a regional skew value (0.264) applicable to the entire study region. This value was used along with the atsite skew coefficients to produce the weightedskew PeakFQ models. Though only one regional skew value was used across the study area, when regression equations were constructed, separate regression equations were built for the Coastal Plain and Piedmont physiographic regions owing to improved model performance for individual regions as compared to using one model for the entire study area. Sites were assigned to physiographic regions based on the region containing the majority of their contributing drainage area. For one site, we deviated from this rule. USGS 01479000 White Clay Creek near Newark, DE, has 55 percent of the contributing area in the Coastal Plain and 45 percent from the Piedmont. The 45 percent located in the Piedmont is in the headwater of the basin where much of the flood generating urban influence is likely located, and this was the justification for assigning it to the Piedmont region in the prior study (Ries and Dillow, 2006).
Sample Adjustments and Removal of Sites Owing to Redundancy
Streamflowgage sites with significant degrees of regulation were not included in the original dataset if there were regulation qualifiers applied to the annual peak flows in the USGS National Water Information System (NWIS) database. Adjustments to the dataset were also made based on any redundancy among sites that could influence the relations determined by regression analysis. The process used to justify removal of streamflowgage sites as a result of redundancy is discussed below.
Statistical redundancy occurs when data from one streamflowgage site can be partly or completely deduced from the data associated with another nearby gage on the same stream or river, or nearby in the same watershed. Inclusion of redundant data in a peakflow analysis can influence the resulting regressionderived relation. The peakflow dataset for this investigation was evaluated for redundancy in order to remove any potential influence or bias on the peakflow magnitude regression analyses. Sites were flagged for possible redundancy if the drainage area ratio between two sites was less than 5.0 and the standardized distance between the two sites was less than 0.5 (Wagner and others, 2016).
There were 25 potential instances of redundancy identified in the peakflow dataset, involving a total of 30 sites. Decisions regarding site redundancy for peak flows were made based primarily on record length and drainagearea size. Consideration was also given to the length of the data record, any breaks in the continuous record or annual peakflow record, and how well the peakflow record represents the current watershed conditions.
Of the 30 sites potentially affected by redundancy issues and questions, 18 were removed from the peakflow analysis. For this study, no site records for peak flow were combined following removal of regulated or trendaffected sites, as a minimum of 10 years of concurrent data were not available (following Bulletin 17C guidelines and Wagner and others [2016]). The final number of streamflowgage sites used for peak regression equation development in each state and region is provided in table 3, and a summary of the drainage areas for these streamflowgage sites is provided in table 4. The ranges of basin characteristics used to develop the equations for each subregion are provided in table 5.
Development of Regression Equations
In developing regional regression equations, basin characteristics for candidate regression models were evaluated based on multicollinearity with other basin characteristics and statistical significance using ordinary leastsquares regression. Subsequently, generalized leastsquares regression was used to develop the final regression models for each region.
Explanatory Variable Selection for PeakFlow Estimation
Each of the 32 basin characteristics in table 2 were considered for inclusion as explanatory variables in the peakflow magnitude estimating equations. Prior to identifying the strongest peakflow magnitude predictor variables, a multicollinearity matrix was calculated for all candidate variables and used to identify variables that were strongly correlated with each other. Multicollinearity was evaluated using the variance inflation factor (VIF), which provides an index of the magnitude of variance increase for an estimated regression coefficient owing to multicollinearity (Helsel and others, 2020). For this study, it was desirable to keep the VIF for all explanatory variables below 2.5 with statistical significance determined at the 95percent confidence level (p≤0.05). In each case of multicollinearity between two variables, the simpler variable with more direct causal connection to peakflow magnitude was retained for further consideration. Following the multicollinearity analysis, a subset of 20 basin characteristics remained for consideration using an iterative ordinary leastsquares (OLS) regression approach to determine the best set of basincharacteristic variables to include in the peakflow magnitude estimation equations.
Ordinary LeastSquares Regression for PeakFlow Estimation
OLS regression techniques were used to select basin characteristics to use as explanatory variables for each AEP and to assess whether developing separate peakflow estimation equations for study subregions based on physiographic provinces (as was done in prior reports) remained a justifiable approach. The mathematical objective of leastsquares regression is to minimize the sum of the squared residuals (the differences between the actual and predicted value of a response variable) for all observations included in the regression (Farmer and others, 2019). The technique provides a consistent, reproducible way to estimate the linear best fit for a given set of data. OLS is a specific form of leastsquares regression that is characterized by an underlying assumption of homoscedasticity, meaning that the uncertainty associated with each responsevariable observation in a dataset is approximately equal. As detailed by Southard and Veilleux (2014), other assumptions inherent in the use of OLS regression include (1) a linear relation between the independent explanatory variable (basin characteristic) and the dependent response variable (the Ppercent AEP), and (2) normality of residuals. All variables were transformed using base10 logarithm to meet the first assumption. Variables expressed in percent, such as landcover characteristics, were transformed by the addition of a value of 1 prior to log base10 transformation to avoid the potential for computing an undefined quantity (the logarithm of zero). The normality of residuals resulting from various regressionbased relations was assessed to assure that the final assumption was valid before considering inclusion of those relations in the results of this study.
Initial OLS regression equations were developed for the 1percent AEP using 20 explanatory variables for 94 streamflowgage sites in the study area. The Q1 percent, which is the 1percent AEP, was chosen for optimizing the selection of explanatory variables because it is the AEP most often used by water managers, engineers, and planners (Eash and others, 2013). An iterative OLS regression approach using the allReg function from the USGS smwrStats R package (R Core Team, 2020; Lorenz, 2020) was utilized to examine the best subsets of explanatory variables for each AEP following the methodology suggested by Helsel and others (2020). A maximum of five concurrent variables for each AEP were considered for use in peakflow estimation. Separate OLS models for the entire study area, as well as for each potential subregion (Coastal Plain, Piedmont), were developed using this method. Candidate regression models in each case were evaluated with respect to several metrics, including (1) maximizing the coefficient of determination (R^{2}) and adjusted coefficient of determination (R^{2} adj), (2) minimizing residual standard error, (3) minimizing Mallow’s Cp, and (4) minimizing the predicted residual sum of squares (PRESS).
Additionally, peakflow magnitude data for each streamflowgage site and each AEP were analyzed to determine whether they exerted undue leverage and influence during variable selection using the iterative OLS approach. Although data from several sites displayed either influence or leverage that exceeded recommended values, no sites were removed as the watershed properties and peakflow characteristics for the sites identified were not unrepresentative of those common in the study area, and no other disqualifying rationale was identified. Finally, sites representing basins that straddle the boundary between Piedmont and Coastal Plain physiographic regions were examined to determine if any displayed undue influence or leverage on the results of the selected OLS regressions. Only one site (01467087) exceeded the statistical threshold for undue influence (for the 0.02 AEP only), with no straddle sites exerting undue leverage for any of the AEP relations.
Generalized LeastSquares Regression for Peak Flow Estimation
Generalized leastsquares (GLS) multiplelinear regression was used to compute the final regression coefficients and measures of accuracy for the regression equations to estimate peak flows at defined frequencies. The GLS method weights data from streamflowgage sites in the regression analysis according to differences in streamflow reliability (record length), variability (record variance), and spatial cross correlations of concurrent streamflow among streamflowgage sites (Farmer and others, 2019). Compared to OLS regression, the GLS regression approach provides improved estimates of streamflow statistics and increases the predictive accuracy of the regression equations (Stedinger and Tasker, 1985). The weightedmultiplelinear regression (WREG) program (Eng and others, 2009) updated to run in R (Farmer, 2017) was used to perform the GLS regressions. A correlation smoothing function is used by WREG to compute a weighting matrix for the streamflowgage sites in each subregion. The smoothing function relates the correlation between annual peak discharges from each pair of streamflowgage sites to the geographic distance between the sites. Smoothing functions were developed iteratively and generated separately for the Coastal Plain and Piedmont subregions. The statistical parameters needed to perform predictioninterval computations for peakflow magnitude estimates can be found in table 6 of this report. The resulting GLSderived regression equations follow the general structure shown below:
whereQ_{p}
is the dependent variable, the Ppercent AEP, in cubic feet per second (ft^{3}/s);
A and B
are independent (explanatory) variables; and
a,b, and c
are regression coefficients.
If the dependent variable, Q_{p}, and the independent variables, A and B, are logarithmically transformed, the model takes the following form:
(2)
GLSderived regression equations for Coastal Plain and Piedmont subregions:Coastal Plain
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
(15)
(16)
(17)
(18)
Table 3.
Number of streamflowgage sites included in the peakflow regression analyses by subregion and state.Table 4.
Summary of drainage area, number of streamflowgage sites, and average years of record used in the peakflow regression analyses for Delaware.Table 5.
Ranges of basin characteristics used to develop the peakflow regression equations.Table 6.
Values needed to determine 90percent prediction intervals for the best peakflow regression equations by subregion in Delaware.Table 7.
Mean and median percent differences between peakflow magnitudes at defined frequencies computed from the annual peakflow data for streamflowgage sites included in both the current study and previous studies.Comparison of Results with Previous Study
Median percent differences in peakflow magnitudes computed from the annual peakflow data from streamflowgage sites included in this study show that flow estimates were generally within 10 percent of those from Ries and Dillow (2006), with slightly higher estimates for the sites and years included in this report (table 7). Sites in the Coastal Plain appeared to have higher median percent differences, particularly sites with additional recorded data since the prior study.
In Ries and Dillow (2006), variables to predict peakflow magnitudes for the Piedmont region included drainage area, the percent area forested, the percent area with well drained soils (soil A), the percent area with impervious cover, and the percent area of surface water storage. For the Coastal Plain, variables used included drainage area, mean basin slope, and percent area with well drained soils (soil A). Similarly, in this report, for the Piedmont drainage area, mean basin slope, percent area of surface water storage, and percent area with impervious cover were chosen, but percent area poorly drained soils (soil D) was selected instead of well drained soils (soil A) and percent area forested. Variables chosen for the Coastal Plain were identical to the prior report. The model errors for the equations generated in this report (table 8) were generally within 5 percent of those from the prior report, with slightly higher model error in the Piedmont subregion and slightly lower error for the Coastal Plain subregion.
Accuracy and Limitations of PeakFlow Regression Equations
The accuracy of a regression equation depends on the model error and the sampling error associated with the data used in its development. Model error measures the ability of a set of explanatory variables to estimate the values of peakflow characteristics calculated from the recorded streamflowgage data used to develop the equation (Ries and Dillow, 2006). Sampling error measures the ability of a finite number of sites with a finite number of recorded annual peak flows to describe the true peakflow characteristics of a site. Model error depends on the number and predictive power of the explanatory variables in a regression equation. Sampling error depends on the number of sites and the amount of peakflow data at each site used in the analysis. Sampling error decreases as the number of sites and amount of data increase (Ries and Dillow, 2006). By definition, approximately twothirds of the flow estimates derived using a given equation will contain errors within the standard error (estimate or prediction) associated with that equation.
Traditional measures of the accuracy of peakflow regression equations are the standard error of estimate and the standard error of prediction. The standard error of estimate is derived from the model error, and is a measure of how well the estimated peak discharges generated using a regression equation agree with the peakflow statistics generated from the atsite annual peakflow data used to create the equation (Ries and Dillow, 2006). The standard error of prediction is derived from the sum of the model error and sampling error and is a measure of how accurately the estimated peak discharges generated using an equation will be able to predict the true value of peak discharge at the selected AEP (Ries and Dillow, 2006). Average standard errors of estimate and prediction associated with equations 3 through 18 are shown in table 8.
For the Coastal Plain region, Thomas and SanchezClaros (2019) similarly developed regional regression equations for 50 to 0.2percent annualexceedance probabilities using Bulletin 17B guidance and data through 2017 (Interagency Advisory Committee on Water Data, 1982). The study area overlapped well with sites used in the present study, including only sites in the Coastal Plain. Their equations included drainage area, watershed slope, and percent area in soil class A, which were the same variables used by Coastal Plain equations in the present study, and had similar values of standard error as equations used in this report. For the Piedmont region, Thomas and Moglen (2016) similarly developed regional regression equations for 50 to 0.2percent annualexceedance probabilities using Bulletin 17B guidance and data through 2012 (Interagency Advisory Committee on Water Data, 1982). The study area only partly overlapped with the sites in the present study including sites in the Appalachian Plateau, Allegheny Ridge, Blue Ridge, and Great Valley regions in addition to the Piedmont. Drainage area, limestone as a percent of watershed area, impervious area, and forest as percent of watershed areas were included in the regional regression equations—and though the present report used drainage area, watershed slope, percent watershed in soil class D, impervious area and storage along the NHD network—standard error for both sets of regional regression equations were similar.
Table 8.
Average standard errors of estimate and prediction for selected peakflow regression equations by subregion in Delaware.Table 9.
Sensitivity analysis of Piedmont regression equation estimates to changes in impervious area. Table values are the median percent change in flow estimates with corresponding changes in impervious area. Impervious area increase columns—for example, 5 percent—report the percent change in flow estimate for each annualexceedance probability with a 5percent increase in the 2016 National Land Cover Database impervious percent for each watershed.A measure of uncertainty that can provide information about the range of error associated with an estimate is the prediction interval. The prediction interval states the minimum and maximum values within which there is a defined probability that the true value of an estimated variable exists. For instance, defining the 90percent prediction interval for the 1percentchance AEP at an ungaged site would allow the user to know the minimum and maximum flow magnitudes that have a 90percent chance of bounding the range in which the true flow value exists. A full explanation, including example computations, of the prediction interval can be found in Ries and Dillow (2006; p. 24–28).
The regression equations can be used to estimate peakflow magnitudes for ungaged sites with natural flow conditions in Delaware. The equations should not be applied to streams with substantial floodretention storage upstream from sites of interest (Ries and Dillow, 2006).
Additionally, in this report, static impervious area from the National Land Cover Database [NLCD] (2016) dataset was used to represent the role of urban effects in the Piedmont regression equations (Homer and others, 2012). Although the NLCD dataset has temporal coverage back to the early 2000s, impervious percent estimates at high spatial resolution are more uncertain before that time. In an effort to assess the sensitivity of peak flow estimates to changes in impervious area through time, we artificially increased the impervious percent of each watershed in each Piedmont regression equation to examine the changes in peakflow estimates. Doubling the impervious area of a watershed on average resulted in a 7–14 percent increase in the flow estimate depending on the AEP used (table 9). Future studies may utilize alternate datasets with proxy information on impervious area going back many decades (for example, Uhl and others, 2021) to provide an estimate of urban effects on peak flows through time.
The accuracy of the regression equations is known only within the range of basin characteristics that were used to develop the equations. The equations can be applied to ungaged sites with basin characteristics that are not within the range of applicability, but the accuracy of the resulting estimates will be unknown (Ries and Dillow, 2006).
Procedures for Estimating PeakFlow Magnitudes
At ungaged sites on ungaged streams with no current or historical recorded atsite data, the best estimates of peakflow characteristics can be obtained from regression equations 3 through 18 presented in this report. At gaged sites, and ungaged sites near gaged sites on the same stream, the best estimates of peakflow characteristics can be obtained by using the results of those equations in conjunction with recorded data from a streamflowgage site as described below.
The best magnitude estimates for peak flows are often obtained through a weighted combination of estimates produced from more than one method (Ries and Dillow, 2006). Tasker (1975) demonstrated that if two independent estimates of a streamflow statistic are available, a properly weighted average of the independent estimates will provide an estimate that is more accurate than either of the two independent estimates. Improved magnitude estimates for peak flows can be determined for Delaware streamflowgage sites by weighting estimates from the atsite annual peakflow data from each site with estimates obtained from the regression equations provided in this report. Improved estimates can be determined for ungaged sites in Delaware by weighting the estimates obtained from the regression equations with estimates determined based on the flow per unit area of an upstream or downstream streamflowgage site, when available (Ries and Dillow, 2006).
In this section, stepbystep examples are provided that demonstrate how to compute magnitude estimates for peak flows. The examples are based on the methods described previously in the section “Estimating PeakFlow Magnitudes for Ungaged Sites.” Basin characteristics that are needed for use in the estimation equations can be determined as described in table 2.
Estimation for a Gaged Site
The Interagency Advisory Committee on Water Data (1982) recommends that the best estimates of floodmagnitude statistics for a streamflowgaging site can be obtained by combining the estimates determined from LP3 analysis of the atsite annual peak data with estimates obtained for the site using regression equations. Weighted estimates were computed for sites used in regression analyses by considering atsite estimates and regression estimates using the methodology first developed and reported by Sando and McCarthy (2018). This method includes adjustment of confidence intervals to include EMA uncertainty (England and others, 2019).
Procedures for Weighting with Regional Regression Equations
The uncertainty of peakflow magnitude estimates can be reduced by combining the atsite magnitude estimates with other independent estimates, such as from regression equations, to obtain a weighted magnitude estimate at the streamflowgage site. As indicated in Bulletin 17C (England and others, 2019), the weightedmagnitude method assumes that the two magnitude estimates are independent and unbiased and the variances are reliable and consistent. The weightedmagnitude method, presented in appendix 9 of Bulletin 17C, uses the logtransformed magnitude estimates and variances from two separate estimates (s, atsite and r, regression) to compute a weightedmagnitude estimate (wtd) and confidence intervals using the following equations:
whereQ
is the magnitude estimate for estimation method s, r, or wtd, in cubic feet per second;
X
is the logtransformed magnitude estimate for estimation method s, r, or wtd;
V
is the variance for estimation method s, r, or wtd;
${U}_{wtd}$ and ${L}_{wtd}$
are the upper and lower logtransformed confidence limits for the twotailed 90percent confidence interval;
1.64
is the onetailed Student’s tvalue for the 95percent (upper) and 5percent (lower) confidence limits assuming infinite degrees of freedom; and
$C{I}_{U,wtd}$ and $C{I}_{L,wtd}$ are the upper and lower limits of the twotailed 90percent confidence interval for the weightedmagnitude estimate
in cubic feet per second.
The weightedmagnitude method using equations 19 through 27 calculates confidence intervals for the weighted estimate using the weighted variance; however, this method does not consider the confidence intervals for an atsite magnitude analysis computed using EMA. Thus, the USGS developed a method for weighting an atsite magnitude estimate with another independent estimate that preserves the characteristics of the confidence intervals computed using EMA (Sando and McCarthy, 2018). This method uses the effective variances of the upper and lower confidence intervals (V_{eff,U} and V_{eff,L}) from the atsite analysis to compute confidence intervals for the weighted estimates as shown in equations 28 through 35:
where$C{I}_{U,s}$ and $C{I}_{L,s}$
are the upper and lower limits of the twotailed 90percent confidence interval from the atsite magnitude analysis, in cubic feet per second;
${U}_{s}$ and ${L}_{s}$
are the upper and lower logtransformed confidence limits for the twotailed 90percent confidence interval from the atsite magnitude analysis;
${V}_{eff,\text{}U}$ and ${V}_{eff,\text{}L}$
are the computed effective variances for the upper and lower confidence limits;
${V}_{b}$
is the variance of the second method, such as regression equations; and
${V}_{wtd,U}$ and ${V}_{wtd,L}$
are the weighted variances of the upper and lower confidence limits, which are computed using the effective variances from the magnitude analysis.
Estimation for a Site Upstream or Downstream from a Gaged Site
Guimaraes and Bohman (1992) and Stamey and Hess (1993) presented the following method to improve floodmagnitude estimates for an ungaged site with a drainage area that is between 0.5 and 1.5 times the drainage area of a streamflowgaging site that is on the same stream. As in the previous section, the symbols used to explain this method in the source publications have been preserved in the discussion below, and the equivalent expressions used in this report are identified in the variable definitions that accompany each equation (Ries and Dillow, 2006).
To obtain a weighted peakflow estimate (Q_{T(U)w}) for recurrence interval T (where AEP is equivalent to 1/T) at the ungaged site, the weighted flow estimate for an upstream or downstream streamflowgaging site (Q_{T(G)w}) must first be determined from the “Peak flow estimates” table in Hammond (2021a). The weighted streamflowgaging site estimate is then used to obtain an estimate for the ungaged site that is based on the flow per unit area at the streamflowgaging site (Q_{T(U)g}) by use of the equation
whereA_{u}
is the drainage area (DRNAREA_{u}) for the ungaged site,
A_{g}
is the drainage area (DRNAREA_{g}) for the upstream or downstream streamflowgaging site
^{b}
is 0.68 in the Piedmont and 0.70 in the Coastal Plain, as determined by computing the mean of the drainagearea exponents for all AEPs from regressions of peakflow magnitudes against drainage area as the only explanatory variable, and where the equation constant is forced to be zero.
ΔA
is the absolute value of the difference between the drainage areas of the streamflowgaging site and the ungaged site, DRNAREA_{g}  DRNAREA_{u}, and
Q_{T(U)r}
is the peakflow estimate for AEP (Q50%, Q20%,…,Q0.2%) at the ungaged site derived from the applicable regional equation given above.
Use of the equations above gives full weight to the regression estimates when the drainage area for the ungaged site is less than 0.5 or greater than 1.5 times the drainage area for the streamflowgaging site and increasing weight to the streamflowgagingsitebased estimates as the drainage area ratio approaches 1. The weighting procedure should not be applied when the drainage area ratio for the ungaged site and streamflowgaging site is less than 0.5 or greater than 1.5. In theory, the standard errors for these estimates should be at least as small as those for the estimates derived from the regression equations alone (Ries and Dillow, 2006).
Estimation for a Site Between Gaged Sites
In the case where a peakflow magnitude estimate is needed for a site that is located between two gaged sites on a stream, the estimate may be obtained by use of the procedure presented earlier for calculating weighted estimates for a gaged site, with the following procedural alteration. For consistency, the symbology used below is the same as that used in the previous section, and the equivalent expressions used elsewhere in this report are identified in the variable definitions that accompany each equation.
Because the site is ungaged, a direct determination of the flow at the site for the selected AEP is not possible. An interpolated value can be obtained by use of the equation
A_{u}
is the drainage area (DRNAREA_{u}) for the ungaged site,
A_{gu}
is the drainage area (DRNAREA_{gu}) for the upstream gaged site,
A_{gd}
is the drainage area (DRNAREA_{gd}) for the downstream gaged site,
Q_{Tu}
is the discharge at the Tpercent AEP (Q50%, Q20%,…, Q0.2%) for the ungaged site,
Q_{Tgu}
is the discharge at the Tpercent AEP (Q50%, Q20%,…, Q0.2%) for the upstream gaged site, and
Q_{Tgd}
is the discharge at the Tpercent AEP (Q50%, Q20%,…, Q0.2%) for the downstream gaged site.
Estimation Examples
Ungaged Site on Ungaged Stream
Problem: Estimate the 1percent AEP discharge for Horsepen Arm at Fox Hunters Road (lat 38.94339° N., long 75.64637° W.) in Kent County, Delaware. This location is in the Coastal Plain subregion.
DRNAREA = 5.21 mi^{2}, BSLDEM3M = 1.64 percent, SOILA = 0.43 percent
Calculate the 1percent AEP using equation 8:
Q_{1%} =(10^{2.616})(5.21^{0.693})(1.64^{0.483})((0.43+1)^{0.356})
Q_{1%} =1,450 ft^{3}/s
Gaged Site
Problem: Estimate the 1percent AEP discharge for USGS 01488500. This site is in the Coastal Plain subregion.

Obtain the 1percent AEP discharge and variance in cubic feet per second for USGS 01488500 for the atsite (s) and regression (r) methods in the “Peak flow estimates” table in Hammond (2021a):

Calculate X_{s} and X_{r} using equations 19 and 20:
X_{s} = log_{10}(4,945); X_{r} = log_{10}(4,097)
X_{s} = 3.694; X_{r} = 3.612

Calculate X_{wtd} using equation 21:
X_{wtd} = ((3.694*0.06384)+(3.612*0.0056))/(0.0056+0.06384)
X_{wtd} = 3.688
Q_{wtd} = 10^{3.688} = 4,871 ft^{3}/s
Site Between Gaged Sites
Problem: Estimate the 20percent AEP discharge for the ungaged location (lat 39.73897° N., 75.63439° W.) at the State Route 41 bridge near Elsmere, Delaware between USGS 01480000 and USGS 01480015. This site is in the Piedmont subregion.

Obtain the 20percent AEP weighted discharge estimate and drainage areas for USGS 01480000 and USGS 01480015 from the “Peak flow estimates” and “Peak basin stats” tables in Hammond (2021a). Also obtain the drainage area for the ungaged site using the StreamStats application:
A_{u} = 51.1 mi^{2}
A_{gu} = 47.4 mi^{2}
A_{gd} = 52.3 mi^{2}
Q_{Tgu} = 3,761 ft^{3}/s
Q_{Tgd} = 5,113 ft^{3}/s
${Q}_{Tu}=$ [(51.1–47.4)/(52.3–47.4)*((5,113/52.3)–(3,761/47.4))+(3,761/47.4)]*51.1
Q_{Tu} = 4,765 ft^{3}/s
Site Upstream from a Gaged Site
Problem: Estimate the 1percent AEP discharge for ungaged location (lat 39.69087° N., long 75.70685° W.) White Clay Creek at Red Mill Road, upstream of USGS 01479000. This location is in the Piedmont subregion.

Obtain the 1percent AEP weighted discharge estimate and drainage areas for USGS 01479000 from the “Peak flow estimates” and “Peak basin stats” tables in Hammond (2021a). Also obtain the drainage area and relevant basin characteristics for the 1percent AEP regression equation for the ungaged site using the StreamStats application:
A_{u} = 79.2 mi^{2}
A_{g} = 88.9 mi^{2}
b is 0.68 in the Piedmont, as determined by computing the mean of the drainagearea exponents for all AEPs.
Q_{T(G)w}_{=} 18,705 ft^{3}/s
SOILD = 14.31
IMPERV = 6.42
STORNHD = 0.52

Calculate ${Q}_{T\left(U\right)g}$ using equation 36:

Calculate ${Q}_{T\left(U\right)r}$ using equation 16:

Calculate ${Q}_{T\left(U\right)w}$using equation 37:
Methods for Estimating the Magnitude of Low Flows at Defined Frequencies and Durations
This report also presents methods for estimating the magnitude of low flows at defined frequencies and durations both for streamflowgage sites and for ungaged sites in Delaware. The process followed for determining lowflow magnitude estimates for ungaged sites in a given region usually includes (1) selecting a group of streamflowgage sites in and around the region with at least 10 years of annual 7day lowflow data based on climatic years, and streamflow conditions that are generally representative of the area as a whole, (2) computing initial lowflow magnitude estimates for the streamflowgage sites using analytical techniques automated within the USGS SWToolbox program (Kiang and others, 2018); (3) computing basin characteristics that have a conceptual relation to the generation of low flows for the drainage basins associated with the gaged sites; (4) recomputing the lowflow magnitude estimates for the streamflowgage sites by weighting the atsite skew coefficients with the new regional skew values; (5) determining if relations between lowflow magnitude estimates and basin characteristics are homogeneous throughout the region or if the region should be divided into subregions; (6) using regression analysis to develop equations to estimate lowflow magnitudes at ungaged sites in the region or subegions; and (8) assessing and describing the accuracy associated with estimation of lowflow magnitudes for ungaged sites (Ries and Dillow, 2006; Carpenter and Hayes, 1996).
Table 10.
Summary of streamflowgage sites in and near Delaware used in lowflow regional regression equations.Determination of Streamflow Analysis Regions
Similar to the peakflow analysis, prior reports (Carpenter and Hayes, 1996; Doheny and Banks, 2010) demonstrated that separating lowflow sites into Coastal Plain and Piedmont physiographic regions resulted in improved regression equation performance. In this study, we implement the same separation and note improved regression performance when data are separated by physiographic region as described in the regression equation development section below.
Flow Statistics for StreamflowGage Sites
Sixtyeight streamflowgage sites were identified within the State of Delaware, or within a 25mi buffer of the Delaware border (located within Maryland, New Jersey, or Pennsylvania) that each had 10 or more complete climatic years (April 1 to March 30) of recorded daily flow data (fig. 5, table 10). Climatic years were used in line with common practice in prior lowflow studies (Carpenter and Hayes, 1996; Dudley and others, 2020) as this period better brackets the occurrence and duration of lowflow periods than the calendar year or water year. No partialrecord streamflowgage sites were used owing to the uncertainty associated with implementing correlations in flow between sites, and no sites containing minimum flows of zero were included in the analysis. The 7, 14, and 30consecutiveday lowflow discharges for annual nonexceedance probabilities (ANPs) of 50, 10, and 5 percent (corresponding to the 2, 10, and 20year recurrence intervals, respectively) were calculated using the USGS SW Toolbox (Kiang and others, 2018). Fortyfive of these streamflowgage sites were subsequently used to develop lowflow regression equations following analyses of redundancy and regulation as described in the sections below. Note that although several sites included in the analyses exhibit infrequent and intermittent periods of recorded stage data that are affected by tidal influences, the published daily discharge data associated with those sites (and used in the analyses) describe computed flows (and flow statistics) that are not affected by the tidally influenced stage data. Flow statistics are provided for sites included in the regression analyses as well as those not suitable for inclusion as a result of regulation or redundancy. Atsite, regressionbased and weighted flow statistics are provided for sites included in regression equation development, whereas only atsite statistics are provided for sites not included in regression equation development. Weighted flow statistics were calculated as in the section “Procedures for Weighting with Regional Regression Equations.” Atsite, regressionbased and weighted estimates of lowflow statistics, along with basin characteristics, can be found in “Low flow estimates” and “Low basin stats” in Hammond (2021a).
Sample Adjustments
Adjustments to the dataset were made based on analysis of the degree of watershed regulation or any redundancy among sites that could influence the relations determined from the regression analysis. The process used for removal of streamflowgage sites as a result of regulation and redundancy are discussed below.
Removal of StreamflowGage Sites Owing to Regulation
Sixtyeight streamflowgage sites were evaluated for possible regulation that could affect the analysis of lowflow characteristics and frequency. The data from 27 of these sites showed possible effects on flow that required further investigation and verification.
Basins were evaluated for flow regulation using the USGS GAGES II dataset (Falcone, 2017) and other datasets that represent flowaccumulated attributes. After assessment of basin characteristics including (1) drainage area; (2) stream density; (3) percent of watershed area covered by lakes, ponds, and reservoirs; (4) baseflow index; (5) power generation capacity; (6) basin storage; and (7) highdensity development, the GAGES II attributes for freshwater withdrawal and National Inventory of Dam storage accumulated to NHDPlus version 2.1 (Wieczorek and others, 2018) were chosen to broadly characterize basin regulation. Streamflowgage sites with more than 1,000 acrefeet of reservoir storage volume and more than 1,200 megaliters per square kilometer per year of freshwater withdrawal were identified and marked as locations with potentially significant regulation effects. Individual state records of permitted effluent were not used to screen sites in this study.
Published site data in the USGS NWIS database were also consulted. Consideration was given to effects from irrigation and diversions, streamflow augmentation, tidal influences, proximity of sites to impoundments or reservoirs, or if sites were located on lake outlets where streamflow is controlled by gates or valves. Reports from other lowflow investigations for this study area and adjacent areas were also consulted to aid in determining suitability of sites for the analysis (Carpenter and Hayes, 1996; Stuckey and Roland, 2011; Watson and McHugh, 2014). Twentyone of the 27 streamflowgage sites were eliminated from the lowflow analysis owing to various regulation effects, based on evaluation of the GAGES II variables and analysis of data from the sites.
Removal of StreamflowGage Sites Owing to Redundancy
Twenty possible instances of redundancy were identified in the recorded lowflow data from 19 sites. Of the 19 sites identified, 13 had already been eliminated from the lowflow analysis based on watershed regulation. Decisions regarding redundancy for the remaining six sites were made based on length of data record and drainagearea size, period of the data record, any breaks in the continuous record period, and how well the lowflow data record represented current watershed conditions at each site. Based on these considerations, two of the remaining six sites being evaluated were eliminated from the lowflow analysis owing to redundancy. The final number of streamflowgage sites used for low regression equation development in each state and region is provided in table 11, and a summary of the drainage areas for these streamflowgage sites is provided in table 12. The ranges of basin characteristics used to develop the equations for each subregion are provided in table 13.
Development of Regression Equations
Explanatory Variable Selection for LowFlow Estimation
Each of the 32 basin characteristics in table 2 were considered for inclusion as explanatory variables in the lowflow magnitude estimating equations. Prior to identifying the strongest lowflow magnitude predictor variables, a multicollinearity matrix was calculated for all candidate variables to identify strong correlations between variables. The process and parameters used in the investigation of multicollinearity between potential explanatory variables was described previously in the section “Explanatory Variable Selection for PeakFlow Estimation.” In each case of multicollinearity between two variables, the simpler variable with more direct causal connection to lowflow magnitude was retained for further consideration. Following the multicollinearity analysis, a subset of 20 basin characteristics remained for consideration using an iterative ordinary leastsquares regression approach to determine the best set of basincharacteristic variables to include in the lowflow magnitude estimation equations.
Ordinary LeastSquares Regression for LowFlow Estimation
OLS regression was used to select basin characteristics as explanatory variables for estimating low flows associated with each selected combination of duration and ANP, and to assess whether developing separate lowflow estimation equations in each study subregion based on physiographic provinces (as was done in prior reports) remains a justifiable approach. The mathematical basis of OLS, along with the accompanying assumptions and constraints, can be reviewed in the previous section titled “Ordinary LeastSquares Regression for PeakFlow Estimation.”
Initial OLS regression equations were developed for the 7day duration, 10percent ANP (7Q10) using 20 explanatory variables for 45 streamflowgage sites in the study area. The 7Q10 was selected for optimizing the selection of explanatory variables because it is the duration and ANP discharge combination most often used by water managers, engineers, and planners (Pryce, 2004). An iterative OLS regression approach using the allReg function from the USGS “smwrStats” R package (R Core Team, 2020; Lorenz, 2020) was utilized to examine the best subsets of explanatory variables for each durationANP combination following the methodology suggested by Helsel and others (2020). A maximum of five concurrent variables for each durationANP combination were considered for use in lowflow estimation. Separate OLS models for the entire study area, as well as for each potential subregion (Coastal Plain, Piedmont), were developed using this method. Candidate regression models in each case were evaluated with respect to several metrics, including (1) maximizing the coefficient of determination (R^{2}) and adjusted coefficient of determination (R^{2} adj), (2) minimizing residual standard error, (3) minimizing Mallow’s Cp, and (4) minimizing the predicted residual sum of squares (PRESS).
Additionally, lowflow magnitude data for each streamflowgage site and each combination of duration and ANP were analyzed to determine whether they exerted undue leverage and influence during variable selection using the iterative OLS approach. Although data from several sites displayed either influence or leverage that exceeded recommended values, no sites were removed, as the watershed properties and lowflow characteristics for the sites identified were not unrepresentative of those common in the study area, and no other disqualifying rationale was identified.
Weighted LeastSquares Regression for LowFlow Estimation
Weighted leastsquares (WLS) multiplelinear regression was used to compute the final regression coefficients and measures of accuracy for the lowflow regression equations. Unlike OLS, WLS can account for the heterogeneous uncertainties associated with the computed atsite lowflow statistics used to develop the estimation equations (Farmer and others, 2019). The method weights observations in proportion to the variance of the timeseries data used to compute atsite statistics or some surrogate of the variance such as the length of the atsite data record. Giving additional weight to data with less uncertainty produces more accurate estimation equations than would be the case using OLS regression. Although drainage area and percent of the watershed with soil D are chosen for both Piedmont and Coastal Plain equations, the variable exponents and equation coefficient differ, suggesting somewhat different relations between the variables and low flow in the two regions.
WLS regression equations for Coastal Plain and Piedmont subregions are as follows:Coastal Plain