A graphic on the front page of the New York Times depicts the recently sequenced genome of Aedes aegypti, the mosquito vector of the Zika virus. But what do those lines actually mean, asks Scientific American's Jen Christiansen.
To find out how to read the chart, Christiansen contacted its author, Mark Kunitomi, a postdoc in the Andino Lab at the University of California, San Francisco.
"Each of the colored lines in Kunitomi's graphic represents a string of chemical base pairs — the A, T, C and G of the mosquito's genetic code — whose accuracy researchers are highly confident about," Christiansen writes. "There are 3,752 contigs in the full map. The 2007 draft map included 36,206 contigs. The ultimate goal of continued sequencing efforts is to end up with just three lines; one continuous string of base pairs for each chromosome."
The length of each line represents the number of base pairs in that contig, she adds, ranging from about 35,000 base pairs for the shortest line on the chart to 7.9 million for the longest. The full data set comprises about 1.7 billion base pairs, covering both coding and non-coding regions of the genome, Christiansen says. Each cluster of lines groups contigs that likely belong together, though some gaps, overlaps, and conflicts remain.
"Kunitomi created the graphic with the bioinformatics visualization tool Bandage, developed by Ryan Wick (currently a research assistant in Kathryn Holt's research group at [the] University of Melbourne)," Christiansen says. "A description paper was published last year in the journal Bioinformatics: the software is available online, or you can clone the source code on GitHub."