This paper reports the metrics-based results of the Dst index part of the 2008–2009 GEM Metrics Challenge. The 2008–2009 GEM Metrics Challenge asked modelers to submit results for four geomagnetic storm events and five different types of observations that can be modeled by statistical, climatological or physics-based models of the magnetosphere-ionosphere system. We present the results of 30 model settings that were run at the Community Coordinated Modeling Center and at the institutions of various modelers for these events. To measure the performance of each of the models against the observations, we use comparisons of 1 hour averaged model data with the Dst index issued by the World Data Center for Geomagnetism, Kyoto, Japan, and direct comparison of 1 minute model data with the 1 minute Dst index calculated by the United States Geological Survey. The latter index can be used to calculate spectral variability of model outputs in comparison to the index. We find that model rankings vary widely by skill score used. None of the models consistently perform best for all events. We find that empirical models perform well in general. Magnetohydrodynamics-based models of the global magnetosphere with inner magnetosphere physics (ring current model) included and stand-alone ring current models with properly defined boundary conditions perform well and are able to match or surpass results from empirical models. Unlike in similar studies, the statistical models used in this study found their challenge in the weakest events rather than the strongest events.