When it comes to de novo sequencing, next-generation technologies have largely stayed within the realm of bacteria. But two independent recent genome projects show that next-gen sequencing, combined with Sanger sequencing, can tackle eukaryotic genomes, despite challenges like homopolymers and repeat sequences that these genomes tend to harbor.
“When all is said and done, our conclusion will be that [next-generation technologies] have a big contribution to make in eukaryotic hybrid assemblies,” says Stephen Kingsmore, president of the National Center for Genome Resources, who headed one of the projects.
Both projects, which sequenced the genomes of the Pinot Noir grapevine and the plant pathogen Phytophthora capsici, were presented at the Plant and Animal Genome meeting in January. Each used a different blend of Sanger and 454 sequencing to create its assembly. But while the NCGR-led P. capsici project included paired 454 reads, the grapevine project did not.
Sequencing of the 475-megabase genome of the grapevine, or Vitis vinifera, finished recently, and the researchers have just submitted the genome to the EMBL Nucleotide Sequence Database. Riccardo Velasco of the Istituto Agrario San Michele All’Adige, who led the effort, says the project began with Sanger sequencing about a year and a half ago.
However, “after 7x coverage with Sanger, we saw that to go ahead with traditional Sanger to fill in the gaps would have been very expensive and time-consuming,” he says. So about six months ago, “we tried some rounds of 454 sequencing, and it worked very well in filling up small gaps that were still present after assembling the sequence.”
In the end, the researchers settled on 7x coverage with Sanger sequencing and 4.2x coverage with 454 sequencing. Both short homopolymer stretches and longer repetitive regions, which are found mostly in eukaryotes, were a challenge for the 454 technology alone but could be overcome by combining the two technologies, Velasco says.
- Julia Karow
The Virginia Bioinformatics Institute at Virginia Tech installed the new 454 sequencer, known as the GS-FLX, in its core lab. The sequencer should generate 100 megabases of sequence in seven hours, with read lengths up to 200 base pairs, according to the institute.
Scientists at the National Institute of Allergy and Infectious Diseases have sequenced the genome of Trichomonas vaginalis, a parasite that causes the common STD trichomoniasis. According to the researchers, the genome spans 160 megabases and contains nearly 26,000 predicted genes.
Integrated Genomics will sequence and annotate the genome of a dairy lactic acid bacterium for the European dairy cooperative Arla Foods. Scientists at Arla, which operates in Sweden and Denmark, will use the sequence data with Integrated Genomics’ ERGO bioinformatics platform to help develop lactic acid bacteria for use in dairy products such as cheese, milk, and butter.
The National Center for Genome Resources has teamed up with the New Mexico Institute of Mining and Technology to create the New Mexico Genome Sequencing Center to focus on medical resequencing. Based in Santa Fe at the NCGR’s headquarters, the new center has been funded in part by $600,000 from the state of New Mexico; the center will seek an additional $1.1 million in federal funds.
US Patent 7,164,991. Specific identifiers of amino-acid base sequences. Inventors: Tetsuro Toyoda and Akiko Itai. Assignee: Institute of Medicinal Molecular Design. Issued: January 16, 2007.
According to the abstract, this patent covers “specific identifiers of sequences” that are generated “from data representing connection order of residues in the sequences by using a conversion function, such as collision intractable hash function or universal one-way hash function, and are assigned to the sequences.”
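The general idea in the abstract, deriving an identifier from the order of residues with a hash function, can be illustrated with a minimal Python sketch. This is not the patented method: the function name `sequence_identifier`, the choice of SHA-256 as the hash, and the whitespace and case normalization are all assumptions made for illustration.

```python
import hashlib


def sequence_identifier(residues: str) -> str:
    """Return a fixed-length identifier for a residue sequence.

    Hypothetical helper: SHA-256 stands in for the collision-resistant
    hash function mentioned in the abstract.
    """
    # Assumed normalization: drop whitespace and ignore letter case so
    # that formatting differences do not change the identifier.
    normalized = "".join(residues.split()).upper()
    # Hash the residue order; the hex digest serves as the identifier.
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Two spellings of the same sequence yield the same identifier.
    print(sequence_identifier("ATGGCC TTA"))
    print(sequence_identifier("atggccTTA"))
```

Because the identifier depends only on the normalized residue order, identical sequences stored in different databases would receive the same identifier under this sketch, while any change to the sequence would almost certainly produce a different one.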
US Patent 7,164,992. Method and system for polynucleotide synthesis. Inventors: John Mulligan, John Tabone, and Gregg Brickner. Assignee: Blue Heron Biotechnology. Issued: January 16, 2007.
This patent covers “an Automated Polynucleotide Synthesis Design System … which automatically generates a synthesis design for a designated target sequence specification,” according to the abstract.
Number of unique genes found in a particular Aspergillus niger strain, which was sequenced by a team led by Dutch industrial chemical firm DSM. The fungal genome has about 33.9 million base pairs.