While short tandem repeats are commonly used in forensics and genealogy, they are not as popular in the research world. "Somehow these markers got forgotten, to some extent," says Yaniv Erlich from the Massachusetts Institute of Technology, adding that single nucleotide polymorphisms are often used in research instead.
Short tandem repeats, or STRs, are a type of genetic variation made up of repetitive elements two to six nucleotides long, and they have a high spontaneous mutation rate. STR expansions have been linked to Huntington's disease and fragile X syndrome. Currently, most STR profiling is performed using capillary electrophoresis.
Erlich and his lab have developed lobSTR, an algorithm to profile STRs using next-generation whole-genome sequencing data. Most common bioinformatics pipelines cannot detect STRs at present, so the researchers had to develop their own way to do so. They also had to develop a new alignment pipeline and genotype caller, says Melissa Gymrek, the lead author of the group's paper, which was published in Genome Research.
First, lobSTR finds and characterizes STRs from sequencing libraries using a signal processing and fast Fourier transform approach. Then, it aligns the STRs to the reference genome using the non-repetitive flanking regions as a guide, thus determining the position and length of the STR. Lastly, it genotypes the STRs using a statistical learning approach that minimizes the stutter noise that is incorporated when DNA is amplified using PCR.
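The first of those steps, spotting a repeat and working out its unit length, can be illustrated with a toy example. The sketch below is not lobSTR's code and does not use a Fourier transform; as a simpler stand-in, it slides a read against itself and reports the smallest shift, between two and six nucleotides, at which most bases line up. The function names and the 0.8 match threshold are assumptions made for illustration only.

```python
def match_fraction(read, shift):
    """Fraction of positions where read[i] equals read[i + shift]."""
    pairs = len(read) - shift
    if pairs <= 0:
        return 0.0
    hits = sum(1 for i in range(pairs) if read[i] == read[i + shift])
    return hits / pairs


def detect_period(read, min_period=2, max_period=6, threshold=0.8):
    """Return the smallest repeat unit length in [min_period, max_period]
    at which the read matches a shifted copy of itself, or None.

    The threshold is an illustrative assumption, not a lobSTR parameter.
    """
    read = read.upper()
    # A homopolymer run matches itself at every shift, so it is excluded
    # here rather than being reported as a 2-nucleotide repeat.
    if match_fraction(read, 1) >= threshold:
        return None
    for period in range(min_period, max_period + 1):
        if match_fraction(read, period) >= threshold:
            return period
    return None


if __name__ == "__main__":
    print(detect_period("GATA" * 6))             # 4
    print(detect_period("AC" * 10))              # 2
    print(detect_period("ACGTTGCAAGCTTAGGCATC"))  # None
```

A real read also carries the non-repetitive flanking sequence on either side of the repeat, which lowers the match fraction. A fuller version would score windows within the read rather than the whole read at once.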
LobSTR accepts input in FASTQ, FASTA, or BAM format. Erlich says that it works best with Illumina data. STRs can have long homopolymer sequences, which can be hard to detect using Ion Torrent and 454 machines, he adds.
LobSTR is also fast: it generally runs in a few hours and is 20 times faster than BWA and two-and-a-half times faster than Bowtie, he says. Erlich envisions lobSTR as a supplement to mainstream aligners, something that can be run quickly and at the same time as those aligners.
Gymrek says she hopes this tool gets the community thinking about STRs. "Somehow people have kind of ignored them, and now people are aware that STRs are out there and that there is a lot that you can do with them," she says.
"It opens another layer of information of the genome," Erlich adds.