A non-genomics researcher mentioned to Mike the Mad Biologist that there was a lot of criticism of the E. coli O104:H4 outbreak sequences, and Mike writes on his blog that he wasn't sure what his colleague was referring to. Then he realized that assessments of the quality of the sequence data could look like criticism of the science to an outsider, even when they're not. Mike writes that even with a high-quality sequence, the error rate could be between 1 in 100,000 and 1 in 1,000,000 per base. "That sounds good until you realize that a typical E. coli genome is around five million bases long," Mike says, later adding that "no genome sequence, even a finished one, is perfect. But we can still do good science, even as we recognize the flaws in the data."
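To put rough numbers on that point, here is a minimal back-of-the-envelope sketch. It assumes errors are independent and spread evenly across the genome, which is a simplification not claimed in Mike's post, and the function and variable names are purely illustrative.

```python
def expected_errors(genome_length: int, bases_per_error: int) -> float:
    """Expected number of wrong bases if one error occurs per `bases_per_error` bases.

    Assumes independent, uniformly distributed errors (an illustrative simplification).
    """
    return genome_length / bases_per_error


GENOME_LENGTH = 5_000_000  # approximate E. coli genome size quoted in the post

# Error rates quoted for a high-quality sequence: 1 in 1,000,000 to 1 in 100,000 per base.
low = expected_errors(GENOME_LENGTH, 1_000_000)
high = expected_errors(GENOME_LENGTH, 100_000)

print(f"Expected errors per genome: about {low:.0f} to {high:.0f}")
# prints "Expected errors per genome: about 5 to 50"
```

Under those assumptions, even a high-quality E. coli genome would be expected to carry somewhere between 5 and 50 erroneous bases.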
Jul 12, 2011