Techniques to improve ecological interpretability of black box machine learning models

Journal of Agricultural, Biological, and Environmental Statistics
By: , and 

Links

Abstract

Statistical modeling of ecological data is often faced with a large number of variables as well as possible nonlinear relationships and higher-order interaction effects. Gradient boosted trees (GBT) have been successful in addressing these issues and have shown a good predictive performance in modeling nonlinear relationships, in particular in classification settings with a categorical response variable. They also tend to be robust against outliers. However, their black-box nature makes it difficult to interpret these models. We introduce several recently developed statistical tools to the environmental research community in order to advance interpretation of these black-box models. To analyze the properties of the tools, we applied gradient boosted trees to investigate biological health of streams within the contiguous USA, as measured by a benthic macroinvertebrate biotic index. Based on these data and a simulation study, we demonstrate the advantages and limitations of partial dependence plots (PDP), individual conditional expectation (ICE) curves and accumulated local effects (ALE) in their ability to identify covariate–response relationships. Additionally, interaction effects were quantified according to interaction strength (IAS) and Friedman’s H2 statistic. Interpretable machine learning techniques are useful tools to open the black-box of gradient boosted trees in the environmental sciences. This finding is supported by our case study on the effect of impervious surface on the benthic condition, which agrees with previous results in the literature. Overall, the most important variables were ecoregion, bed stability, watershed area, riparian vegetation and catchment slope. These variables were also present in most identified interaction effects. In conclusion, graphical tools (PDP, ICE, ALE) enable visualization and easier interpretation of GBT but should be supported by analytical statistical measures. Future methodological research is needed to investigate the properties of interaction tests.      Supplementary materials accompanying this paper appear on-line.

Publication type Article
Publication Subtype Journal Article
Title Techniques to improve ecological interpretability of black box machine learning models
Series title Journal of Agricultural, Biological, and Environmental Statistics
DOI 10.1007/s13253-021-00479-7
Edition Online First
Year Published 2021
Language English
Publisher Springer
Contributing office(s) Eastern Ecological Science Center
Google Analytic Metrics Metrics page
Additional publication details