"Contamination of various sorts has plagued genome projects from the get-go," complains Keith Robison at Omics! Omics!. While doing some background searching, he found a lot of contamination in sequence data sets, including plasmid and vector sequences among eukaryotic ones. He says everyone's New Year's resolution should be to recheck their sequencing pipelines. "The solution is to run filters -- search everything you do against vectors, E. coli and other common contaminants. In addition, especially in this day-and-age, if your 'human' mRNA sequence doesn't match the genome, you've got some 'splaining to do," he writes.
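For readers who want a feel for what such a filter does, here is a minimal sketch in Python. It is not Robison's method or any particular pipeline's: it assumes a simple exact k-mer match on both strands, and every sequence, name and threshold in it (`toy_vector`, `toy_ecoli`, `k=12`, `min_shared`) is made up for illustration. A real screen would more likely rely on an alignment tool such as BLAST run against a curated contaminant database.

```python
# Toy contamination screen: flag reads that share exact k-mers with known
# contaminant sequences. All sequences and names below are hypothetical.

def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_contaminant_index(contaminants, k):
    """Map each k-mer (from either strand) to the contaminants containing it."""
    index = {}
    for name, seq in contaminants.items():
        for strand in (seq, reverse_complement(seq)):
            for kmer in kmers(strand, k):
                index.setdefault(kmer, set()).add(name)
    return index


def screen_reads(reads, index, k, min_shared=1):
    """Return {read_id: [contaminant names]} for reads with enough shared k-mers."""
    flagged = {}
    for read_id, seq in reads.items():
        hits = {}
        for kmer in kmers(seq, k):
            for name in index.get(kmer, ()):
                hits[name] = hits.get(name, 0) + 1
        matched = sorted(n for n, count in hits.items() if count >= min_shared)
        if matched:
            flagged[read_id] = matched
    return flagged


# Made-up "contaminant database" standing in for vectors and E. coli.
CONTAMINANTS = {
    "toy_vector": "GGATCCTCTAGAGTCGACCTGCAGGCATGCAAGCTTGGC",
    "toy_ecoli": "ATGACCATGATTACGGATTCACTGGCCGTCGTTTTACAA",
}

# Made-up reads: one clean, one carrying a vector fragment, and one carrying
# an E. coli fragment on the opposite strand.
READS = {
    "read_clean": "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG",
    "read_vector": "AAAA" + CONTAMINANTS["toy_vector"][6:28] + "TTTT",
    "read_rc": reverse_complement(CONTAMINANTS["toy_ecoli"][10:34]),
}

K = 12
FLAGGED = screen_reads(READS, build_contaminant_index(CONTAMINANTS, K), K)

if __name__ == "__main__":
    for read_id, names in sorted(FLAGGED.items()):
        print(read_id, "matches", ", ".join(names))
```

With this toy data, `read_vector` and `read_rc` are flagged and `read_clean` passes. Exact k-mer matching is chosen here only because it fits in a few lines; it misses contaminants that carry sequencing errors or mutations, which is one reason alignment-based searches are the usual choice.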
Check It Before You Wreck It
Jan 27, 2009