New Algorithm Identifies Short Tandem Repeats From Next-Gen Sequencing Data


While short tandem repeats are commonly used in forensics and genealogy, they are not as popular in the research world. "Somehow these markers got forgotten, to some extent," says Yaniv Erlich from the Massachusetts Institute of Technology, adding that single nucleotide polymorphisms are often used in research instead.

Short tandem repeats, or STRs, are a type of genetic variation made up of repeated motifs two to six nucleotides in length, and they have a high spontaneous mutation rate. STR expansions have been linked to Huntington's disease and fragile X syndrome. Currently, most STR profiling is performed using capillary electrophoresis.
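To make the definition concrete, here is a minimal Python sketch that scans a sequence for runs of a two- to six-nucleotide motif. It is an illustration only, not part of lobSTR: the function name `find_strs` and the `min_copies` threshold are assumptions chosen for this example, and a plain regular expression like this tolerates no sequencing errors or imperfect repeats.

```python
import re

def find_strs(seq, min_copies=3):
    """Return (start, motif, copies) for each perfect tandem repeat found.

    Hypothetical helper for illustration; min_copies is an arbitrary cutoff.
    """
    # A 2-6 nt motif (shortest tried first) followed by further exact copies.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # A run of one base is a homopolymer, not a 2-6 nt motif.
            continue
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits

# Toy read: five copies of CAG between two short flanking sequences.
read = "GGCT" + "CAG" * 5 + "TTGA"
print(find_strs(read))
```

The toy read uses a CAG repeat, the motif whose expansion is associated with Huntington's disease. Real reads would need a more tolerant matcher, since a single sequencing error inside the repeat breaks an exact-match pattern.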

Erlich and his lab have developed lobSTR, an algorithm to profile STRs using next-generation whole-genome sequencing data. Most common bioinformatics pipelines cannot detect STRs at present, so the researchers had to develop their own way to do so. They also had to develop a new alignment pipeline and genotype caller, says Melissa Gymrek, the lead author of the group's paper, which was published in Genome Research.

First, lobSTR finds and characterizes STRs from sequencing libraries using a signal processing and fast Fourier transform approach. Then, it aligns the STRs to the reference genome using the non-repetitive flanking regions as a guide, thus determining the position and length of the STR. Lastly, it genotypes the STRs using a statistical learning approach that minimizes the stutter noise that is incorporated when DNA is amplified using PCR.
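The article does not describe how lobSTR's spectral step works internally, but the general idea of finding a repeat period through a Fourier transform can be sketched. The Python example below is an assumption-laden illustration, not lobSTR's code: it encodes each base as a 0/1 indicator signal, evaluates a naive discrete Fourier transform (rather than a true FFT) only at the frequencies for periods two through six, and reports the period with the most power. The function name `dominant_period` and the normalization are hypothetical choices.

```python
import cmath

def dominant_period(seq, min_period=2, max_period=6):
    """Return (period, power) for the candidate period with the most spectral power.

    Hypothetical helper: naive DFT on per-base indicator signals.
    Power is normalized by len(seq) ** 2.
    """
    seq = seq.upper()
    n = len(seq)
    best_period, best_power = None, 0.0
    for period in range(min_period, max_period + 1):
        freq = 1.0 / period
        power = 0.0
        for base in "ACGT":
            # DFT coefficient of the 0/1 signal marking where this base occurs.
            coeff = sum(
                cmath.exp(-2j * cmath.pi * freq * i)
                for i, b in enumerate(seq)
                if b == base
            )
            power += abs(coeff) ** 2
        power /= n * n
        # Small tolerance so floating-point noise never wins.
        if power > best_power + 1e-9:
            best_period, best_power = period, power
    return best_period, best_power

period, power = dominant_period("CAG" * 8)
print(period, round(power, 3))
```

A perfectly periodic sequence concentrates its power at the frequency matching its period, while non-repetitive sequence spreads power thinly, so a threshold on the returned power could serve as a crude repeat detector. One caveat of this simplified version: for longer motifs, a harmonic can carry more power than the fundamental, so the reported period may be a divisor of the true motif length.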

LobSTR accepts input in FASTQ, FASTA, or BAM format. Erlich says that it works best with Illumina data. STRs can have long homopolymer sequences, which can be hard to detect using Ion Torrent and 454 machines, he adds.

LobSTR is also fast: it generally runs in a few hours and is 20 times faster than BWA and two-and-a-half times faster than Bowtie, he says. Erlich envisions lobSTR as a supplement to mainstream aligners, something that can be run quickly alongside them.

Gymrek says she hopes this tool gets the community thinking about STRs. "Somehow people have kind of ignored them, and now people are aware that STRs are out there and that there is a lot that you can do with them," she says.

"It opens another layer of information of the genome," Erlich adds.
