Predicting geothermal favorability in the western United States by using machine learning: Addressing challenges and developing solutions

By: , and 

Abstract

Previous moderate- and high-temperature geothermal resource assessments of the western United States used weight-of-evidence and logistic regression methods to estimate resource favorability, but these analyses relied on some expert decisions. Expert decisions can add confidence to aspects of the modeling process by ensuring that only reasonable models are employed, but they also introduce human bias into assessments. This bias is a source of error that may affect the performance of the models and the resulting resource estimates. Our study aims to reduce expert input through robust data-driven analyses and better-suited data science techniques, with the goals of saving time, reducing bias, and improving predictive ability. We present six favorability maps for geothermal resources in the western United States, created by applying two strategies to each of three modern machine learning algorithms (logistic regression, support-vector machines, and XGBoost). To provide a direct comparison to previous assessments, we use the same input data as the 2008 U.S. Geological Survey (USGS) conventional moderate- to high-temperature geothermal resource assessment. The six new favorability maps required far less expert decision-making but broadly agree with the previous assessment. Although the 2008 assessment employed linear methods, the non-linear machine learning algorithms (i.e., support-vector machines and XGBoost) agreed more closely with the previous assessment than did the linear machine learning algorithm (i.e., logistic regression). It is not surprising that geothermal systems depend on non-linear combinations of features, and we postulate that the expert decisions made during the 2008 assessment accounted for these system non-linearities.
Two substantial challenges to applying machine learning algorithms to predict geothermal resource favorability are severe class imbalance (i.e., there are very few known geothermal systems compared to the large area considered) and the lack of negative labels: known geothermal systems provide positive labels, but all other sites have an unknown status (i.e., they are unlabeled) rather than a negative label (i.e., the known/proven absence of a geothermal resource). We address both challenges through a custom undersampling strategy that can be used with any algorithm, and we evaluate the resulting models using F1 scores.
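The abstract does not specify the custom undersampling strategy, so the following is only a minimal sketch of the general idea, assuming generic random undersampling: every known system is kept as a positive, and a random subset of unlabeled cells is treated as provisional negatives so that the classes are balanced. The function names `undersample_unlabeled` and `f1_score`, the `ratio` parameter, and the representation of a map cell as a tuple of feature values are all hypothetical illustrations, not the authors' code. The F1 score shown is the standard harmonic mean of precision and recall.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: one map cell as a tuple of feature values.
Cell = Tuple[float, ...]


def undersample_unlabeled(
    positives: Sequence[Cell],
    unlabeled: Sequence[Cell],
    ratio: float = 1.0,
    seed: int = 0,
) -> Tuple[List[Cell], List[int]]:
    """Build one balanced training set (generic random undersampling).

    All known systems are kept with label 1. A random subset of unlabeled
    cells, about ratio * len(positives) in size, is drawn without
    replacement and treated as provisional negatives with label 0. This is
    an assumed stand-in, not necessarily the paper's custom strategy.
    """
    n_neg = min(len(unlabeled), int(round(ratio * len(positives))))
    rng = random.Random(seed)  # seeded so the subset is reproducible
    negatives = rng.sample(list(unlabeled), n_neg)
    X = list(positives) + negatives
    y = [1] * len(positives) + [0] * len(negatives)
    return X, y


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2PR / (P + R), the harmonic mean of precision P and recall R."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    positives = [(0.0, float(i)) for i in range(3)]      # 3 known systems
    unlabeled = [(float(i), 1.0) for i in range(100)]    # 100 unlabeled cells
    X, y = undersample_unlabeled(positives, unlabeled, ratio=2.0, seed=42)
    print(len(X), sum(y))                        # 9 3
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

Because the label-0 cells are only provisional negatives, a model that flags an unlabeled cell hosting an undiscovered system is counted as a false positive, so an F1 score computed this way is a conservative measure. Repeating the draw with different seeds yields several balanced subsets that could each be fed to any of the three algorithms.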

Publication type: Conference Paper
Publication Subtype: Conference Paper
Title: Predicting geothermal favorability in the western United States by using machine learning: Addressing challenges and developing solutions
Year Published: 2022
Language: English
Publisher: Stanford University
Contributing office(s): Geology, Minerals, Energy, and Geophysics Science Center
Description: 18 p.
Larger Work Type: Book
Larger Work Subtype: Conference publication
Larger Work Title: Proceedings, 47th workshop on geothermal reservoir engineering
Conference Title: 47th Stanford Geothermal Workshop
Conference Location: Stanford, CA
Conference Date: Feb 7-9, 2022
Country: United States
Other Geospatial: western United States