NEW YORK (GenomeWeb News) – In a paper appearing online today in PLoS ONE, a team of American researchers found that they could begin characterizing regulatory sequences in fruit flies by comparing parts of the small Drosophila genome with similar areas in the much larger genomes of fruit flies from the Tephritidae family. That, in turn, has the team touting the importance of evaluating big genomes.
"The fact that the tephritids had big genomes was originally a nuisance because we had to do more sequencing and more screening," senior author Michael Eisen, a Howard Hughes Medical Institute investigator and a molecular and cell biologist affiliated with the University of California at Berkeley and the Lawrence Berkeley National Laboratory, said in a statement. "It was only after we got the data that we realized this might actually be an advantage."
Animal genomes vary widely in size, ranging from tens of millions of bases to several billion bases of DNA. But it's unclear why some species have small genomes and others have large genomes.
Someone looking solely at the sequenced vertebrate and invertebrate species, Eisen told GenomeWeb Daily News, would probably get the impression that vertebrate genomes are usually big and invertebrate genomes are usually small. But, he said, "That's certainly not true."
He argues that this reflects a bias in the organisms that have been selected for sequencing so far. Because it costs more money and takes more time to sequence big genomes, that investment is usually reserved for vertebrate genomes — for instance, the human genome and the genomes of species important for human health and agriculture. There's a "huge, huge, huge bias in the invertebrate genomics world," Eisen said.
At first, Eisen said, "We weren't interested in the genome size issue." He and his colleagues generally study gene regulation during development, and the project started as a comparison between genes involved in early embryonic development in Drosophila and four tephritid fruit fly species.
The team selected four fly species that were the right phylogenetic distances from Drosophila to be useful for comparisons, Eisen explained, and were also readily accessible: the Mediterranean fruit fly, or medfly (Ceratitis capitata), the oriental fruit fly (Bactrocera dorsalis), the melon fly (Bactrocera cucurbitae), and a fourth tephritid fly species, Rhagoletis juglandis.
The first three species were obtained from stocks at the US Department of Agriculture's Pacific Basin Agricultural Research Station in Hawaii, while adult R. juglandis flies were captured in Arizona.
They then used propidium iodide staining and flow cytometry to estimate the size of each fly's genome. "It's always essential to know how big the genomes are," Eisen said.
Based on their results, the team estimated that the four species had genomes between 440 and 850 million bases — much larger than the 175 million-base Drosophila genome.
The researchers weren't exactly surprised that the genomes were so much larger than that of Drosophila, a model organism known to have a small genome, Eisen said. But they did find an unexpected wealth of information when they sequenced and compared regions of the genome potentially involved in embryonic development.
After screening 20 genes and pulling out four genes that were present in three or more of the tephritid species tested, the researchers found intriguing patterns in the tephritids. These flies had conserved non-coding sequences separated by relatively large chunks of non-conserved DNA, a pattern more closely resembling human or other vertebrate genomes than the Drosophila genome.
Based on the small vertebrate genomes that have been sequenced so far, such as the puffer fish genome, Eisen said, the same pattern seems to hold in vertebrates: small genomes tend to have smaller introns and intergenic regions.
Even so, the researchers reported that the non-coding DNA in the tephritid fruit fly genomes is functional. When they plopped nine conserved non-coding sequences from C. capitata into Drosophila embryos, the team saw that six of these drove expression patterns corresponding to Drosophila enhancers.
At the moment, the researchers are sequencing the entire genome of the medfly, C. capitata. That work is being done by researchers in Eisen's lab using Illumina sequencing and by collaborators at the US Department of Agriculture and Baylor College of Medicine using Roche 454 sequencing. Along with sequencing additional genomes down the road, Eisen said he is also interested in using their approach to look at modular regulatory regions and how they have evolved.
Overall, the findings highlight the notion that preferentially sequencing the smallest invertebrate genomes paints an imperfect or inaccurate picture of what genomes look like and misses valuable insights housed within the larger genomes, Eisen said, including information about genome structure, the nature of rapidly evolving sequences, and regulatory element information.
While it can be more difficult and costly to assemble large genomes packed with repetitive DNA, he added, vertebrate studies have already paved the way for tackling such challenges. "It's a little more expensive, but our point is it's worth it," Eisen said. "There's a tremendous value in sequencing these larger genomes."