Cost-Benefit Analysis of Computer Resources for Machine Learning

Open-File Report 2007-1398
By:


Abstract

Machine learning describes pattern-recognition algorithms, in this case probabilistic neural networks (PNNs). These can be computationally intensive, in part because of the nonlinear optimizer, a numerical process that calibrates the PNN by minimizing a sum of squared errors. This report suggests efficiencies that are expressed as cost and benefit. The cost is the computer time needed to calibrate the PNN, and the benefit is goodness-of-fit: how well the PNN learns the pattern in the data. There may be a point of diminishing returns beyond which further expenditure of computer resources does not produce additional benefit. Sampling is suggested as a cost-reduction strategy. One consideration is how many points to select for calibration; another is the geometric distribution of those points. The data points may be nonuniformly distributed across space, so that sampling at some locations provides additional benefit while sampling at other locations does not. A stratified sampling strategy can be designed to select more points in regions where they reduce the calibration error and fewer points in regions where they do not. Goodness-of-fit tests ensure that the sampling does not introduce bias. This approach is illustrated by statistical experiments that compute correlations between measures of roadless area and population density for the San Francisco Bay Area. The alternative to such training efficiencies is to rely on high-performance computer systems, which may require specialized programming and algorithms optimized for parallel performance.
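The abstract does not give the PNN formulation, the optimizer, or the stratification rule, so the following Python script is only a sketch of the cost-benefit idea under stated assumptions, not the report's method. It assumes a one-dimensional, regression-style PNN (a Gaussian-kernel, Nadaraya-Watson estimator with a single smoothing width `sigma`). It calibrates `sigma` by minimizing the leave-one-out sum of squared errors with a golden-section search, which stands in for the report's nonlinear optimizer. It uses Neyman allocation (sample each stratum in proportion to its size times the spread of the response) as a stand-in stratification rule. All function names and the synthetic data are hypothetical.

```python
import math
import random
import statistics
import time


def grnn_predict(x, xs, ys, sigma):
    """Gaussian-kernel (Nadaraya-Watson) estimate of y at x; an assumed PNN form."""
    num = den = 0.0
    for xi, yi in zip(xs, ys):
        w = math.exp(-((x - xi) ** 2) / (2.0 * sigma ** 2))
        num += w * yi
        den += w
    # If every kernel weight underflows to zero, fall back to 0.0.
    return num / den if den > 0 else 0.0


def loo_sse(xs, ys, sigma):
    """Leave-one-out sum of squared errors: the calibration objective."""
    sse = 0.0
    for i in range(len(xs)):
        pred = grnn_predict(xs[i], xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], sigma)
        sse += (ys[i] - pred) ** 2
    return sse


def calibrate_sigma(xs, ys, lo=1e-3, hi=10.0, iters=30):
    """Golden-section search on log(sigma) for the LOO-SSE minimum.

    Stands in for the report's nonlinear optimizer; assumes a single
    minimum in [lo, hi].
    """
    g = (math.sqrt(5) - 1) / 2
    a, b = math.log(lo), math.log(hi)
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = loo_sse(xs, ys, math.exp(c)), loo_sse(xs, ys, math.exp(d))
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = loo_sse(xs, ys, math.exp(c))
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = loo_sse(xs, ys, math.exp(d))
    sigma = math.exp((a + b) / 2)
    return sigma, loo_sse(xs, ys, sigma)


def stratified_sample(xs, ys, n_total, n_strata, seed=0):
    """Stratified sample with Neyman allocation (n_h proportional to N_h * S_h).

    Strata where y varies more get more points. Every non-empty stratum keeps
    at least one point, so the result has approximately n_total points.
    """
    rng = random.Random(seed)
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_strata or 1.0
    strata = [[] for _ in range(n_strata)]
    for x, y in zip(xs, ys):
        strata[min(int((x - lo) / width), n_strata - 1)].append((x, y))
    weights = [len(s) * (statistics.stdev([y for _, y in s]) if len(s) > 1 else 0.0)
               for s in strata]
    total = sum(weights) or 1.0
    picked = []
    for s, w in zip(strata, weights):
        if s:
            k = min(len(s), max(1, round(n_total * w / total)))
            picked.extend(rng.sample(s, k))
    return [x for x, _ in picked], [y for _, y in picked]


def make_data(n, rng):
    """Synthetic 1-D data: flat on the left, oscillating on the right."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [(0.0 if x < 5 else math.sin(3 * x)) + rng.gauss(0, 0.05) for x in xs]
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(42)
    xs, ys = make_data(120, rng)
    test_x, test_y = make_data(200, rng)  # held-out points measure the benefit
    for label, (cx, cy) in [("all points", (xs, ys)),
                            ("stratified", stratified_sample(xs, ys, 40, 6))]:
        t0 = time.perf_counter()
        sigma, _ = calibrate_sigma(cx, cy)
        cost = time.perf_counter() - t0  # cost: calibration time
        resid = [y - grnn_predict(x, cx, cy, sigma) for x, y in zip(test_x, test_y)]
        sse = sum(r * r for r in resid)  # benefit: held-out SSE (lower is better)
        bias = sum(resid) / len(resid)  # crude bias check on the residuals
        print(f"{label:>10}: n={len(cx):3d} sigma={sigma:.3f} "
              f"cost={cost:.3f}s held-out SSE={sse:.2f} mean resid={bias:+.3f}")
```

For each training set, the script prints the calibration time (cost) and the held-out sum of squared errors (benefit). Because leave-one-out calibration grows roughly with the square of the number of points, the stratified sample is much cheaper to calibrate. The flat strata contribute only a point or two each, so most of the sampling budget goes to the oscillating region. The mean residual is only a rough bias check and does not replace the goodness-of-fit tests the report describes.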
Publication type: Report
Publication Subtype: USGS Numbered Series
Title: Cost-Benefit Analysis of Computer Resources for Machine Learning
Series title: Open-File Report
Series number: 2007-1398
DOI: 10.3133/ofr20071398
Edition: Version 1.0
Year Published: 2007
Language: English
Publisher: Geological Survey (U.S.)
Contributing office(s): Western Geographic Science Center
Description: iv, 7 p.
Online Only (Y/N): Y