
PacBio Sequences E. coli Genome, Increases Average Read Length to Nearly 1,000 Bases

Pacific Biosciences has used its single-molecule technology to sequence the 4.6-megabase genome of E. coli and has generated reads of up to several thousand bases, a company official said last week.

At the Advances in Genome Biology and Technology conference in Marco Island, Fla., last week, PacBio Chief Technology Officer Steve Turner presented results from this and an earlier project, in which company researchers sequenced a human bacterial artificial chromosome.

According to Turner, the company is on track to roll out its single-molecule real-time, or SMRT, sequencer in the second half of 2010.

Last year, PacBio scientists sequenced a 107-kilobase human bacterial artificial chromosome to 68-fold coverage, achieving an average read length of 446 bases; some reads exceeded 2,000 bases, Turner reported. The coverage was “fairly uniform,” he said, and showed no GC bias.

The consensus accuracy was 99.99 percent in non-repetitive regions of the BAC and 99.96 percent in repetitive regions. The system also called 21 of 24 SNPs in non-repetitive regions and 13 of 20 SNPs in repeat regions.

The project used a sample-prep method that PacBio first presented last fall, which ligates hairpin adaptors to the ends of double-stranded DNA (see In Sequence 10/14/2008).

Since the BAC project, “we have made improvements in the protocols, the sequencing conditions, and the performance had improved enough to move on to the next logical step,” Turner said. That step was sequencing the 4.6-megabase genome of E. coli K12, which the company completed last month.

The run reached 38-fold coverage, with an average read length of 568 bases and a maximum read length of approximately 2,800 bases; 99.3 percent of the genome was “unambiguously” covered.

The sequence quality reached Q61 across the 4.5 megabases of the genome covered at 20-fold or higher, and Q54 across the entire genome excluding duplicated regions.

Overall, the SMRT system produced four sequencing errors for the E. coli genome and discovered one novel variant that differed from the reference.
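The Q61 and Q54 figures are on the Phred scale, which relates a quality value Q to an error probability p by Q = -10 log10(p). The minimal sketch below, using only Python's standard library, converts those values to error rates and, as a rough cross-check, turns the four consensus errors into an implied Q value. Dividing by the full 4.6 megabases is a simplifying assumption for illustration, not how PacBio computed its figures.

```python
import math

def phred_to_error(q: float) -> float:
    """Error probability for a Phred quality value Q (Q = -10 * log10(p))."""
    return 10 ** (-q / 10)

def error_to_phred(p: float) -> float:
    """Phred quality value for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

for q in (54, 61):
    p = phred_to_error(q)
    # Expected errors per megabase at this quality level.
    print(f"Q{q}: p = {p:.2e}, about {p * 1e6:.1f} errors per Mb")

# Rough cross-check: 4 errors over an assumed 4.6 Mb denominator.
implied_q = error_to_phred(4 / 4.6e6)
print(f"Implied quality: Q{implied_q:.1f}")
```

Four errors in 4.6 megabases works out to roughly Q60.6, in line with the reported Q61, although the two numbers are computed over different portions of the genome.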

The coverage of the genome was “remarkably close to Poisson distribution,” according to Turner, after accounting for the fact that the E. coli DNA was enriched for sequences near the origin of replication due to culture conditions. No significant GC bias was present, he added.
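Under idealized random sampling (the Lander-Waterman assumption), per-base coverage at mean depth c follows a Poisson distribution. The sketch below, a standard-library illustration rather than PacBio's analysis, computes the expected fraction of bases covered at least k times at the reported 38-fold mean, which can be set against the roughly 4.5 of 4.6 megabases covered at 20-fold or more.

```python
import math

def poisson_at_least(lam: float, k: int) -> float:
    """P(X >= k) for X ~ Poisson(lam): fraction of bases covered at least k times."""
    term = math.exp(-lam)  # P(X = 0)
    below = 0.0
    for i in range(k):
        below += term          # accumulate P(X = i)
        term *= lam / (i + 1)  # advance to P(X = i + 1)
    return 1.0 - below

mean_coverage = 38  # reported E. coli coverage
for k in (1, 8, 20):
    print(f">= {k}x: {poisson_at_least(mean_coverage, k):.6f}")
```

The idealized model ignores the origin-proximal enrichment Turner described, so measured coverage would be expected to be somewhat less even than this sketch predicts.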

As expected, the sequence accuracy was consistent over the length of the reads, with less than 5 percent coefficient of variation over 1,200 bases on average.
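The coefficient of variation is the standard deviation divided by the mean. A minimal sketch, assuming accuracy values per read window have already been computed from alignments; the window size and the values themselves are hypothetical placeholders, not PacBio data, and the population standard deviation is one of two common choices.

```python
import statistics

def coefficient_of_variation(values: list[float]) -> float:
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    return statistics.pstdev(values) / mean

# Hypothetical accuracy per 200-base window along a read (placeholder values).
window_accuracy = [0.90, 0.92, 0.88, 0.91, 0.89, 0.90]
cv = coefficient_of_variation(window_accuracy)
print(f"CV = {cv:.1%}")  # a value below 5 percent would match the reported consistency
```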

To study the technology’s error profile further, PacBio scientists examined the data at only eight-fold coverage and found “significant differences” in error rates. They traced these to the fluorophores, some of which are excited less efficiently by the laser, and are therefore less bright, than others. Turner said the firm has new fluorophores in development that will alleviate this problem.

‘Like Toys’

Since the E. coli sequencing project, the company has increased the average read length of the instrument to 946 bases. It has also generated a 3,200-base read from the E. coli genome that covers a 2.6-kilobase exact repeat. “It’s the first time that such a large repeat section has ever been bridged with any technology,” according to Turner.

Three years from now, he said, the company is hoping to reach read lengths of between 20,000 and 40,000 bases.

However, PacBio does not intend to offer long reads only, Turner said. Rather, users will be able to switch between short reads at high throughput and long reads at lower throughput. “No longer will you have to maintain one instrument for short reads at high throughput and another for longer reads.”

The system can also be used for “redundant sequencing,” or consensus sequencing, in which a small circular molecule is sequenced over and over to increase consensus accuracy. Because every pass reads the same molecule, this approach avoids the heterogeneity of systems that sample different molecules.
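The idea behind redundant sequencing can be illustrated with a per-position majority vote over repeated passes of one molecule. This is a simplified model, not PacBio’s consensus algorithm: it assumes the passes are already aligned to equal length, that errors in different passes are independent, and that a position is wrong whenever most passes are wrong. The 0.15 per-pass error rate is a hypothetical value chosen for illustration.

```python
from collections import Counter
from math import comb

def consensus_call(passes: list[str]) -> str:
    """Per-position majority vote over equal-length, pre-aligned passes of one molecule."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))

def majority_error(per_pass_error: float, n_passes: int) -> float:
    """Probability that a majority of n (odd) independent passes are wrong at a position."""
    e = per_pass_error
    return sum(comb(n_passes, k) * e**k * (1 - e) ** (n_passes - k)
               for k in range(n_passes // 2 + 1, n_passes + 1))

print(consensus_call(["ACGT", "ACGA", "TCGT"]))
for n in (1, 3, 5, 9):
    # 0.15 is a hypothetical per-pass error rate, not a PacBio figure.
    print(n, f"{majority_error(0.15, n):.2e}")
```

Under these assumptions the consensus error falls quickly with the number of passes; in practice, wrong calls rarely agree on the same base, so this model is on the pessimistic side.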

To test the system for this application, PacBio scientists mixed two DNA constructs that differed at a single base position in various ratios and sequenced the pools. Their results accurately reproduced the mixing ratios.

“Even this system, which is a far cry from what we will be shipping in over a year, suggests that we will be able to reliably detect minor fractions of less than 1 percent,” Turner said.
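Detecting a minor fraction near 1 percent is partly a counting problem: the number of reads at the variant position limits how precisely the fraction can be estimated. The sketch below uses the Wilson score interval, a standard choice for proportions near zero, treating each read as an independent draw; the read counts are hypothetical and the method is not attributed to PacBio.

```python
import math

def wilson_interval(minor_count: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95 percent)."""
    p = minor_count / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    margin = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - margin, center + margin

# Hypothetical: 10 of 1,000 reads carry the minor base at the variant position.
low, high = wilson_interval(10, 1000)
print(f"minor fraction 1.0%, 95% CI {low:.2%} to {high:.2%}")
```

Unlike the simple normal approximation, the Wilson interval does not collapse to zero width or go strongly negative when the observed fraction is small, which is the regime relevant to sub-1-percent detection.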

The company has 12 prototype instruments operating today, some used for production sequencing and others for development.

These prototypes currently have 3,000 zero-mode waveguides. Turner declined to reveal how many reads the commercial system will produce in parallel, saying that this number has yet to be determined and will be announced “in due course.”
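Turner did not disclose the commercial system’s parallelism, but the prototype figures allow a back-of-the-envelope yield estimate. The sketch assumes each productive ZMW yields one read of average length per run; the fraction of ZMWs that actually produce a usable read is a hypothetical parameter here, since not every waveguide is loaded with a single active polymerase.

```python
def bases_per_run(n_zmws: int, mean_read_length: float, productive_fraction: float) -> float:
    """Back-of-the-envelope yield: productive ZMWs times mean read length."""
    if not 0.0 <= productive_fraction <= 1.0:
        raise ValueError("productive_fraction must be between 0 and 1")
    return n_zmws * productive_fraction * mean_read_length

# 3,000 ZMWs and 946-base average reads come from the article;
# the productive fractions are hypothetical.
for frac in (1.0, 0.5, 0.3):
    print(f"{frac:.0%} productive: {bases_per_run(3000, 946, frac) / 1e6:.2f} Mb")
```

Even with every waveguide productive, the prototypes would yield under 3 megabases per run, which helps explain why the number of ZMWs in the commercial instrument matters so much for throughput.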

“We are on track for delivery in the second half of 2010 of an instrument that will make these prototypes look like toys,” he said.

“I think they have made a lot of progress,” Chad Nusbaum, co-director of the Broad Institute’s genome sequence and analysis program, told In Sequence last week. “It’s a very nimble machine. It takes an hour to run the thing, which is fantastic for development cycles.”

Nusbaum also finds the long reads the system offers appealing. “I’m sort of an old-school genome sequencer — long reads are extraordinarily appealing for so many reasons,” he said.

However, PacBio will need to stay ahead of the established players if it wants to be successful, he cautioned. “They have to be able to beat what Illumina is doing substantially on the day of their launch if they want to get in the game, and they have to continue to stay ahead.”

George Grills, director of operations of core facilities at the life sciences core laboratories center at Cornell University, from which PacBio originally spun off, agreed that the company has made progress. “Since last year’s [AGBT] meeting, there have been some really major developments just in their public disclosure,” he said.

“They are asking the right questions,” he added. “The real thing is going to be, ‘How well do they actually implement [the answers to] those questions?’”

He said it will be interesting to see what choices PacBio will make in terms of density, robustness of the instrument, technical support, and total cost for customers, including service contracts.

“It’s not surprising that they don’t have the answers yet; they are still in that development phase,” Grills said. “But I have liked what I hear in terms of their presentations, I like what I hear when I talk to them directly, … and I would like to have one of their toys.”