In the 1950s, Erwin Chargaff noted that the amount of adenine in a given stretch of DNA is equal to the amount of thymine, and the amount of guanine equal to that of cytosine, a finding that James Watson and Francis Crick used to support their double-helix model of DNA. On arXiv, two biophysicists report that they've extended these "grammar" rules for DNA. "Chargaff's rules apply to words where k=1, in other words, to single nucleotides," says The Physics arXiv Blog. "But what of words with k=2 (eg AA, AC, AG, AT and so on) or k=3 (AAA, AAG, AAC, AAT and so on)?" Searching through large genomic data sets from more than 30 species, Brazil's Michel Beleza Yamagishi and Roberto Herai found a fractal-like pattern, the blog adds. "To the best of our knowledge, these new rules show for the first time that oligonucleotide frequencies do have invariant properties across a large set of genomes, and these rules, regardless the number of nucleotides remains the same (self-similarity)," the researchers write in their paper.
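To make the idea of "words" of length k concrete, here is a minimal Python sketch that counts overlapping k-letter words (k-mers) in a DNA sequence. It is an illustration only, not the authors' method or data. The function names and the toy sequence are hypothetical. Counting both strands of a double-stranded molecule reproduces Chargaff's k=1 observation (A equals T, G equals C) by construction.

```python
from collections import Counter

# Maps each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def count_words(seq: str, k: int) -> Counter:
    """Count overlapping k-letter words (k-mers) in a DNA sequence (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in the same 5'-to-3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def double_strand_counts(seq: str, k: int) -> Counter:
    """Count k-mers on both strands of a double-stranded molecule."""
    return count_words(seq, k) + count_words(reverse_complement(seq), k)

if __name__ == "__main__":
    toy = "ATGCGATACGCTTGCA"  # made-up toy sequence, not real genomic data
    for k in (1, 2, 3):
        print(k, dict(sorted(double_strand_counts(toy, k).items())))
    ones = double_strand_counts(toy, 1)
    print(ones["A"] == ones["T"], ones["G"] == ones["C"])  # True True
```

The equality for k=1 follows from base pairing: every A on one strand is a T on the other. Whether and how analogous regularities hold for longer words across real genomes is the question the paper addresses. This sketch does not attempt to reproduce that analysis.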
The researchers add that their work has a practical application: it could be used to check short-read sequencing data for biases.