Cost-Benefit Analysis of Computer Resources for Machine Learning

Richard A. Champion

doi:10.3133/ofr20071398

Open-File Report 2007-1398

Abstract

Machine learning describes pattern-recognition algorithms, in this case probabilistic neural networks (PNNs). These can be computationally intensive, in part because of the nonlinear optimizer, a numerical process that calibrates the PNN by minimizing a sum of squared errors. This report suggests efficiencies that are expressed as cost and benefit. The cost is the computer time needed to calibrate the PNN, and the benefit is goodness-of-fit, that is, how well the PNN learns the pattern in the data. There may be a point of diminishing returns where a further expenditure of computer resources does not produce additional benefits. Sampling is suggested as a cost-reduction strategy. One consideration is how many points to select for calibration, and another is the geometric distribution of the points. The data points may be nonuniformly distributed across space, so that sampling at some locations provides additional benefit while sampling at other locations does not. A stratified sampling strategy can be designed to select more points in regions where they reduce the calibration error and fewer points in regions where they do not. Goodness-of-fit tests ensure that the sampling does not introduce bias. This approach is illustrated by statistical experiments for computing correlations between measures of roadless area and population density for the San Francisco Bay Area. The alternative to training efficiencies is to rely on high-performance computer systems. These may require specialized programming and algorithms that are optimized for parallel performance.
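The two ideas in the abstract, stratified sampling and a point of diminishing returns, can be illustrated with a minimal Python sketch. This is not the report's implementation. The function names (`stratified_sample`, `diminishing_returns_index`), the per-stratum sampling fractions, and the gain-per-cost threshold are all hypothetical choices made here for illustration. "Cost" stands for calibration time and "benefit" for a goodness-of-fit score, both assumed to have been measured separately.

```python
import random
from collections import defaultdict


def stratified_sample(points, stratum_of, fractions, seed=0):
    """Draw a random sample separately within each stratum.

    points     : list of records (any type)
    stratum_of : function mapping a record to its stratum label
    fractions  : dict of label -> fraction of that stratum to keep
                 (hypothetical values; strata not listed are skipped)
    """
    rng = random.Random(seed)

    # Group the records by stratum label.
    groups = defaultdict(list)
    for p in points:
        groups[stratum_of(p)].append(p)

    # Sample without replacement inside each stratum.
    sample = []
    for label, members in groups.items():
        k = round(fractions.get(label, 0.0) * len(members))
        sample.extend(rng.sample(members, k))
    return sample


def diminishing_returns_index(costs, benefits, min_gain_per_cost):
    """Return the index of the last run that was still worth its cost.

    costs and benefits are parallel lists ordered by increasing cost.
    The scan stops at the first step whose marginal benefit per unit of
    added cost falls below min_gain_per_cost (an assumed threshold).
    """
    for i in range(1, len(costs)):
        d_cost = costs[i] - costs[i - 1]
        d_benefit = benefits[i] - benefits[i - 1]
        if d_cost <= 0 or d_benefit / d_cost < min_gain_per_cost:
            return i - 1
    return len(costs) - 1
```

In use, one might assign a larger fraction to strata where additional points are found to reduce calibration error and a smaller fraction elsewhere, calibrate at several sample sizes, record the time and fit for each, and pass those lists to `diminishing_returns_index` to pick a sample size. A goodness-of-fit check on the sampled data would still be needed to confirm that the sampling introduced no bias.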

Additional publication details
Publication type	Report
Publication Subtype	USGS Numbered Series
Title	Cost-Benefit Analysis of Computer Resources for Machine Learning
Series title	Open-File Report
Series number	2007-1398
DOI	10.3133/ofr20071398
Edition	Version 1.0
Year Published	2007
Language	ENGLISH
Publisher	Geological Survey (U.S.)
Contributing office(s)	Western Geographic Science Center
Description	iv, 7 p.
Online Only (Y/N)	Y