Comparison of machine learning approaches used to identify the drivers of Bakken oil well productivity

Statistical Analysis and Data Mining

Geologists and petroleum engineers have struggled to identify the mechanisms that drive productivity in horizontal hydraulically fractured oil wells. Three machine learning algorithms, random forest (RF), gradient boosting trees (GBT), and extreme gradient boosting (XGBoost), were applied to a dataset containing 7311 horizontal hydraulically fractured wells drilled into the middle member of the Bakken Formation from 2010 through 2017. The initial goal is to use these data‐driven machine learning algorithms to identify the most important explanatory predictors of well productivity within nine subareas and the composite area. Predictor variables representing initial gas production, the initial 180‐day water cut, and vertical depth vary spatially and are identified with geologically favorable areas. Well‐completion predictors include the well lateral length, the number of fracture stages, the volume of proppant per stage, and the volume of injected fluids per stage. The performance of the methods is compared on a common test sample. The analysis then examines the comparative predictive performance of the three algorithms for 1330 wells that began production after the initial 7311‐well sample had been producing. The computations of predictor importance identified the initial 180‐day water cut and the 30‐day initial gas production as having a dominant influence in most subareas and in the composite area. The relative importance of the well‐completion predictors, that is, the number of fracture stages per well, the volume of injected proppant per stage, the volume of injected fluids per stage, and the lateral length, varied considerably across the subareas. For the common test (holdout) sample, the models calibrated with the XGBoost algorithm had superior predictive power. The predictive power of all the algorithms trained on data from the original sample suffered some loss when tested on a sample of wells that had started production after the end of that period. Implications of the empirical findings and strategies to mitigate the loss of predictive power are discussed in the concluding section.
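
The workflow summarized in the abstract (fit tree ensembles, rank predictors by importance, then score the models on a common holdout sample and on a later sample) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the predictor names and the response function are hypothetical, and scikit-learn's `RandomForestRegressor` and `GradientBoostingRegressor` stand in for the study's RF and GBT models. It is not the authors' code and does not reproduce their results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical predictor names loosely echoing the abstract; all values are synthetic.
FEATURES = [
    "water_cut_180d", "gas_30d", "vertical_depth", "lateral_length",
    "frac_stages", "proppant_per_stage", "fluid_per_stage",
]


def make_wells(n, shift=0.0):
    """Draw synthetic wells; `shift` moves the predictor distribution
    to mimic wells completed after the training period."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, len(FEATURES)))
    # Assumed response: dominated by water cut and initial gas, plus a
    # small completion interaction and noise.
    y = (-2.0 * X[:, 0] + 1.5 * X[:, 1]
         + 0.3 * X[:, 4] * X[:, 5] + rng.normal(0.0, 0.5, n))
    return X, y


def rmse(y_true, y_pred):
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


# Original sample, split into training and a common holdout sample.
X, y = make_wells(2000)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
# Later sample, drawn from a shifted distribution.
X_later, y_later = make_wells(500, shift=0.75)

models = {
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "GBT": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Impurity-based importances, sorted from most to least important.
    ranked = sorted(zip(FEATURES, model.feature_importances_),
                    key=lambda pair: pair[1], reverse=True)
    results[name] = {
        "top_predictors": [feature for feature, _ in ranked[:2]],
        "rmse_holdout": rmse(y_test, model.predict(X_test)),
        "rmse_later": rmse(y_later, model.predict(X_later)),
    }
    print(name, results[name])
```

Comparing `rmse_holdout` with `rmse_later` for each model is one simple way to check whether predictive power degrades on wells from a later period. If the `xgboost` package is installed, its `XGBRegressor` follows the same fit/predict interface and could be added to the `models` dictionary.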

Publication type: Article
Publication subtype: Journal Article
Title: Comparison of machine learning approaches used to identify the drivers of Bakken oil well productivity
Series title: Statistical Analysis and Data Mining
DOI: 10.1002/sam.11487
Edition: Online First
Year published: 2020
Language: English
Publisher: Wiley
Contributing office(s): Eastern Energy Resources Science Center