A significant data quality challenge for highly variant systems is the limited ability to quantify operationally reasonable limits on the data elements being collected and to predict reasonable thresholds for them. In many instances, the number of influences that drive a resulting value or operational range is too large to permit physical sampling of each influence, or the system is too complicated to model accurately in an explicit simulation. An alternative way to determine reasonable observation thresholds is to employ an automated algorithm that emulates a human analyst visually inspecting data for limits. A methodology for determining threshold limits was developed by applying the visualization technique of self-organizing maps (SOM) to data with poorly understood relationships. To illustrate the approach, an analysis of the environmental influences that drive the abundance of a target indicator species, the pink shrimp (Farfantepenaeus duorarum), provides a real example of its applicability. The relationship of salinity and temperature to the abundance of F. duorarum is well documented, but the effect of upstream changes in water quality on pink shrimp abundance is not well understood. The highly variant nature of catch counts for organisms in the wild, together with the availability of upstream hydrology measurements of salinity and temperature, made this an ideal candidate for using the approach to assess the influence of changes in hydrology on populations of organisms.
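The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way SOM-based threshold limits could be derived, not the authors' procedure. It assumes a small rectangular SOM trained on standardized salinity and temperature, with per-node percentile limits on abundance serving as the observation thresholds. The data are synthetic, and every name and parameter (`train_som`, `assign_nodes`, `node_thresholds`, the 4 x 4 grid, the 5th and 95th percentiles) is a hypothetical choice made for illustration.

```python
import numpy as np

def train_som(data, rows=4, cols=4, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of each node, used by the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize node weights from randomly chosen samples.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
        x = data[rng.integers(0, len(data))]     # one random training sample
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))     # Gaussian neighborhood on the grid
        weights += lr * h[:, None] * (x - weights)
    return weights

def assign_nodes(data, weights):
    """Index of the best-matching node for every sample."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def node_thresholds(nodes, target, n_nodes, lo=5, hi=95, min_count=5):
    """Per-node (lower, upper) percentile limits of the target; NaN where a node is too sparse."""
    limits = np.full((n_nodes, 2), np.nan)
    for k in range(n_nodes):
        vals = target[nodes == k]
        if len(vals) >= min_count:
            limits[k] = np.percentile(vals, [lo, hi])
    return limits

# Synthetic stand-in data (NOT from the study): abundance loosely tied to salinity.
rng = np.random.default_rng(42)
n = 500
salinity = rng.uniform(5, 40, n)
temperature = rng.uniform(15, 32, n)
abundance = rng.poisson(lam=1 + 0.3 * salinity)

# Standardize the drivers so neither dominates the distance calculation.
features = np.column_stack([salinity, temperature])
features = (features - features.mean(axis=0)) / features.std(axis=0)

weights = train_som(features)
nodes = assign_nodes(features, weights)
limits = node_thresholds(nodes, abundance, n_nodes=len(weights))

# Flag observations that fall outside the limits of their own node.
lower, upper = limits[nodes, 0], limits[nodes, 1]
flagged = (abundance < lower) | (abundance > upper)
```

Under these assumptions each map node groups samples with similar salinity and temperature, so the limits adapt to local conditions instead of applying one global range. Nodes with too few samples keep NaN limits, and observations mapped to them are never flagged.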
Additional publication details
Using self-organizing maps to determine observation threshold limit predictions in highly variant data
Larger Work Title:
Proceedings of SPIE - The International Society for Optical Engineering
Signal Processing, Sensor Fusion, and Target Recognition XV