Genome Features Spelled Out Using Models Trained With EvoAug Data Augmentation

In a paper appearing in Genome Biology, a Cold Spring Harbor Laboratory team describes EvoAug, a computational method designed to boost the performance of deep neural network (DNN) models used to understand regulatory motifs and other genomic features, drawing on genetic variation data and evolutionary insights. Starting with artificial DNA sequences that resemble established sequences, the approach accounts for alterations that may crop up through evolution, the researchers write, adding that the EvoAug training steps account for changes that do or do not alter functional processes. The researchers found that EvoAug-trained models outperformed models trained on biological data alone on tasks such as evaluating cis-regulatory elements and transcription factor binding sites. "Our findings support previous arguments for using evolution as a natural source of data augmentation," the authors write. They suggest EvoAug "will have broad utility in improving the efficacy of sequence-based DNNs for regulatory genomics."
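For readers who want a concrete picture of evolution-inspired data augmentation, the Python sketch below perturbs labeled DNA sequences before they would be fed to a model. It is an illustration only and not the EvoAug code or its API: the specific perturbations (point mutations, a short deletion padded back to the original length, and reverse complementation), the default rates, and every function name are assumptions chosen for this example. The augmented copy simply inherits the label of the sequence it came from, which is exactly the simplification that breaks down when a change does alter function.

```python
import random

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def random_mutation(seq, rate, rng):
    """Replace each base with a different base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def random_deletion(seq, length, rng):
    """Remove a contiguous stretch, then pad with random bases so length is unchanged."""
    if length <= 0 or length >= len(seq):
        return seq
    start = rng.randrange(len(seq) - length + 1)
    kept = seq[:start] + seq[start + length:]
    pad = "".join(rng.choice(BASES) for _ in range(length))
    return kept + pad


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def augment(seq, rng, mutation_rate=0.05, deletion_length=3):
    """Apply the hypothetical perturbations in turn (rates are illustrative)."""
    seq = random_mutation(seq, mutation_rate, rng)
    seq = random_deletion(seq, deletion_length, rng)
    if rng.random() < 0.5:
        seq = reverse_complement(seq)
    return seq


def augmented_batch(labeled_seqs, copies, seed=0):
    """Make `copies` perturbed versions of each (sequence, label) pair.

    Assumption: every perturbed copy keeps the label of its source sequence.
    """
    rng = random.Random(seed)
    return [
        (augment(seq, rng), label)
        for seq, label in labeled_seqs
        for _ in range(copies)
    ]


if __name__ == "__main__":
    data = [("ACGTACGTACGTACGTACGT", 1), ("TTTTGGGGCCCCAAAATTTT", 0)]
    for seq, label in augmented_batch(data, copies=2):
        print(seq, label)
```

Padding after the deletion keeps every sequence the same length, which is convenient for models that expect fixed-size inputs; a fixed seed makes the perturbed set reproducible.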