SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

Journal of Heredity

DOI: 10.1093/jhered/est056

Abstract

SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
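
To make the third step more concrete, the following is a minimal sketch of how perfect (simple) tandem repeats might be located in a single sequence with user-specified parameters. It is not the SSR_pipeline search algorithm, which the abstract does not describe; it swaps in a plain regular-expression backreference search. The function name `find_microsatellites`, its parameters, and their default values are all hypothetical and chosen only for illustration. Compound repeats are not handled.

```python
import re


def find_microsatellites(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq.

    Illustrative sketch only: the parameter names and defaults are
    assumptions, not SSR_pipeline options.
    """
    seq = seq.upper()
    # The lazy quantifier makes the regex try the shortest motif first;
    # the backreference \1 then requires further identical copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        # Skip homopolymer runs such as AAAAAAAA, which would otherwise be
        # reported as repeats of the motif "AA".
        if len(set(motif)) == 1:
            continue
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits


if __name__ == "__main__":
    example = "GG" + "AC" * 6 + "TTG" + "GAT" * 5 + "CC"
    print(find_microsatellites(example))
    # prints [(2, 'AC', 6), (17, 'GAT', 5)]
```

In this sketch, start positions are 0-based, and a read containing any character outside A, C, G, and T (for example N) simply cannot match across that position. Searching motifs up to 25 bp, as reported in the abstract, would only require raising `max_motif`, although a backtracking regular expression is unlikely to match the efficiency the authors describe.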

Additional Publication Details

Publication type:
Article
Publication Subtype:
Journal Article
Title:
SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data
Series title:
Journal of Heredity
DOI:
10.1093/jhered/est056
Volume:
104
Issue:
6
Year Published:
2013
Language:
English
Publisher:
Oxford University Press
Contributing office(s):
Forest and Rangeland Ecosystem Science Center
Description:
5 p.
Larger Work Type:
Article
Larger Work Subtype:
Journal Article
Larger Work Title:
Journal of Heredity
First page:
881
Last page:
885
Number of Pages:
5