SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

Mark P. Miller; Brian J. Knaus; Thomas D. Mullins; Susan M. Haig

doi:10.1093/jhered/est056

Journal of Heredity

Abstract

SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance in a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used on nearly any computer operating system (Linux, Macintosh, and Windows).
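To make the third step concrete, the following is a minimal Python sketch of how perfect tandem repeats with user-specified motif lengths and repeat counts might be located in a single composite sequence. It is not the search algorithm used by SSR_pipeline, which this record does not describe; the function name find_ssrs, its parameters, and its default values are hypothetical and chosen only for illustration. The sketch finds simple repeats only and does not attempt compound microsatellites.

```python
import re


def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    Hypothetical helper for illustration; not part of SSR_pipeline.
    """
    # Group 1 captures a candidate motif (shortest length tried first);
    # the backreference \1 requires at least min_repeats - 1 further copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        # Skip homopolymer runs such as AAAAAAAA, which match with motif "AA".
        if len(set(motif)) == 1:
            continue
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits


if __name__ == "__main__":
    example = "TTGACACACACACGGATGATGATGATGCC"
    print(find_ssrs(example))
    # prints [(3, 'AC', 5), (14, 'GAT', 4)]
```

Start positions are 0-based. Because the scan runs left to right, the trinucleotide repeat in the example is reported in the frame where it is first encountered (GAT rather than ATG); a real tool would likely normalize motifs to a canonical rotation before counting loci.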

Additional publication details
Publication type	Article
Publication Subtype	Journal Article
Title	SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data
Series title	Journal of Heredity
DOI	10.1093/jhered/est056
Volume	104
Issue	6
Year Published	2013
Language	English
Publisher	Oxford University Press
Contributing office(s)	Forest and Rangeland Ecosystem Science Center
Description	5 p.
Larger Work Type	Article
Larger Work Subtype	Journal Article
Larger Work Title	Journal of Heredity
First page	881
Last page	885