The Regional Earthquake Likelihood Models (RELM) project aims to produce and evaluate alternate models of earthquake potential (probability per unit volume, magnitude, and time) for California. Based on differing assumptions, these models are produced to test the validity of their assumptions and to explore which models should be incorporated in seismic hazard and risk evaluation. Tests based on physical and geological criteria are useful but we focus on statistical methods using future earthquake catalog data only. We envision two evaluations: a test of consistency with observed data and a comparison of all pairs of models for relative consistency. Both tests are based on the likelihood method, and both are fully prospective (i.e., the models are not adjusted to fit the test data). To be tested, each model must assign a probability to any possible event within a specified region of space, time, and magnitude. For our tests the models must use a common format: earthquake rates in specified “bins” with location, magnitude, time, and focal mechanism limits.
Seismology cannot yet deterministically predict individual earthquakes; however, it should seek the best possible models for forecasting earthquake occurrence. This paper describes the statistical rules of an experiment to examine and test earthquake forecasts. The primary purposes of the tests described below are to evaluate physical models for earthquakes, assure that source models used in seismic hazard and risk studies are consistent with earthquake data, and provide quantitative measures by which models can be assigned weights in a consensus model or be judged as suitable for particular regions.
In this paper we develop a statistical method for testing earthquake likelihood models. A companion paper (Schorlemmer and Gerstenberger 2007, this issue) discusses the actual implementation of these tests in the framework of the RELM initiative.
Statistical testing of hypotheses is a common task and a wide range of possible testing procedures exist. Jolliffe and Stephenson (2003) present different forecast verifications from atmospheric science, among them likelihood testing of probability forecasts and testing the occurrence of binary events. Testing binary events requires that for each forecasted event, the spatial, temporal and magnitude limits be given. Although major earthquakes can be considered binary events, the models within the RELM project express their forecasts on a spatial grid and in 0.1 magnitude units; thus the results are a distribution of rates over space and magnitude. These forecasts can be tested with likelihood tests.
In general, likelihood tests assume a valid null hypothesis against which a given hypothesis is tested. The outcome is either a rejection of the null hypothesis in favor of the test hypothesis or a nonrejection, meaning the test hypothesis cannot outperform the null hypothesis at a given significance level. Within RELM, there is no accepted null hypothesis and thus the likelihood test needs to be expanded to allow comparable testing of equipollent hypotheses.
To test models against one another, we require that forecasts are expressed in a standard format: the average rate of earthquake occurrence within pre-specified limits of hypocentral latitude, longitude, depth, magnitude, time period, and focal mechanisms. Focal mechanisms should either be described as the inclination of P-axis, declination of P-axis, and inclination of the T-axis, or as strike, dip, and rake angles. Schorlemmer and Gerstenberger (2007, this issue) designed classes of these parameters such that similar models will be tested against each other. These classes make the forecasts comparable between models. Additionally, we are limited to testing only what is precisely defined and consistently reported in earthquake catalogs. Therefore it is currently not possible to test such information as fault rupture length or area, asperity location, etc. Also, to account for data quality issues, we allow for location and magnitude uncertainties as well as the probability that an event is dependent on another event.
As we mentioned above, only models with comparable forecasts can be tested against each other. Our current tests are designed to examine grid-based models. This requires that any fault-based model be adapted to a grid before testing is possible. While this is a limitation of the testing, it is an inherent difficulty in any such comparative testing. Please refer to appendix B for a statistical evaluation of the application of the Poisson hypothesis to fault-based models.
The testing suite we present consists of three different tests: L-Test, N-Test, and R-Test. These tests are defined similarily to Kagan and Jackson (1995). The first two tests examine the consistency of the hypotheses with the observations while the last test compares the spatial performances of the models.
|Publication Subtype||Journal Article|
|Title||Earthquake likelihood model testing|
|Series title||Seismological Research Letters|
|Publisher||Seismological Society of America|
|Google Analytic Metrics||Metrics page|