We demonstrate the effectiveness of a genetic algorithm for discovering multi-locus combinations that provide accurate individual assignment decisions and estimates of mixture composition based on likelihood classification. Using simulated data representing different levels of inter-population differentiation (Fst ~ 0.01 and 0.10), genetic diversities (four or eight alleles per locus), and population sizes (20, 40, 100 individuals in baseline populations), we show that subsets of loci can be identified that provide comparable levels of accuracy in classification decisions relative to entire multi-locus data sets, where 5, 10, or 20 loci were considered. Microsatellite data sets from hatchery strains of lake trout, Salvelinus namaycush, representing a comparable range of inter-population levels of differentiation in allele frequencies confirmed simulation results. For both simulated and empirical data sets, assignment accuracy was achieved using fewer loci (e.g., three or four loci out of eight for empirical lake trout studies). Simulation results were used to investigate properties of the 'leave-one-out' (L1O) method for estimating assignment error rates. Accuracy of population assignments based on L1O methods should be viewed with caution under certain conditions, particularly when baseline population sample sizes are low (<50).
Additional publication details
Sampling issues affecting accuracy of likelihood-based classification using genetical data