As part of a research project funded by the US Food and Drug Administration, researchers at the University of Maryland's Institute for Genome Sciences have been awarded a $1.4 million research program contract to sequence, assemble, and annotate a population of bacterial pathogens and make that data available in public databases.
The overall goal of the study is to create a comprehensive, curated database of microbial genome sequences and associated metadata that will serve as a reference for evaluating high-throughput sequencing-based diagnostic devices.
For their part in the project, researchers at IGS, which is part of Maryland's School of Medicine, will use instruments developed by Illumina and Pacific Biosciences to sequence the genomes of a number of yet-to-be-disclosed pathogens. By using two complementary sequencing platforms, the researchers expect to be able to cross-validate consensus sequences and achieve the highest possible genome sequence accuracy.
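As a rough illustration of what cross-validating two consensus sequences could involve, the sketch below compares two consensus strings position by position and reports where they disagree. It is a minimal sketch, not the IGS team's method: it assumes the two sequences have already been aligned to the same coordinates and have equal length, and the function name `discordant_positions` is hypothetical.

```python
def discordant_positions(consensus_a, consensus_b):
    """Return (index, base_a, base_b) for every position where the two
    consensus sequences disagree.

    Assumption: both sequences are already aligned to the same coordinates
    and have the same length. Real pipelines would align them first.
    """
    if len(consensus_a) != len(consensus_b):
        raise ValueError("consensus sequences must be the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(consensus_a, consensus_b))
        if a != b
    ]


# Hypothetical toy sequences, one per platform.
platform_one = "ACGTAC"
platform_two = "ACGAAC"
print(discordant_positions(platform_one, platform_two))
```

Positions flagged this way would be candidates for closer review; agreement between platforms raises confidence in a base call but does not by itself prove it correct.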
After sequencing, they'll apply multiple software packages to handle genome assembly, data quality assurance and control, and data annotation and curation. Luke Tallon, the scientific director and founding leader of UMD's Genomics Resource Center, told BioInform that his team's pipelines will include tools such as the Hierarchical Genome Assembly Process, a de novo assembly algorithm developed by PacBio; different "flavors" of the Celera Assembler; and MaSuRCA, whole-genome assembly software developed by UMD researchers. He also said that the researchers are developing additional assembly tools and error detection methods, as well as new applications for the QA/QC steps.
"The goal is to make sure that we are getting the best possible assembly with each dataset and each assembler has some strengths and weaknesses and so we are always trying to use them to complement one another and find the best possible assembly for each genome," he said.
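One simple way to compare the output of several assemblers on the same dataset is a contiguity metric such as N50. The sketch below is illustrative only: the article does not say which metrics the team uses, N50 measures contiguity rather than correctness, and the names `n50` and `pick_most_contiguous` and the contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50: the length of the contig at which the running total
    of contig lengths, sorted longest first, reaches half the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("assembly has no contigs")


def pick_most_contiguous(assemblies):
    """Given {assembler_name: [contig lengths]}, return the name with the
    highest N50. Assumption: contiguity is the only criterion compared."""
    return max(assemblies, key=lambda name: n50(assemblies[name]))


# Hypothetical contig lengths from two assemblers run on the same reads.
candidates = {
    "assembler_a": [80, 70, 50, 40, 30, 20],
    "assembler_b": [100, 50, 50],
}
print(pick_most_contiguous(candidates))
```

In practice a higher N50 can hide misassemblies, so a comparison like this would normally sit alongside the error detection and QA/QC checks described above.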
Beyond the genomic sequence data, other participants in the project will collect metadata such as phenotype information. UMD researchers will help manage that data and make it available in public resources, Tallon said. The data will be released through various repositories managed by the National Center for Biotechnology Information. Tallon also said that his team will work with the FDA to build a separate database that will host all of the data generated by the project.
"This database will be an important reference for the scientific and medical diagnostic communities," according to IGS Director Claire Fraser. "We have worked with federal agencies and global scientific partners to sequence and analyze an extensive population of bacterial pathogens since our Institute launched in 2007 and are pleased to develop this reference database with the FDA," she said in a statement.