NEW YORK (GenomeWeb) – A University of New Mexico research team is developing a long-read sequencing technology that it plans to use for population sequencing projects.
Jeremy Edwards, director of bioinformatics at UNM's Comprehensive Cancer Center, who is spearheading the research, told GenomeWeb that the technology is somewhat similar to 10X Genomics' linked read technology in that it involves using barcodes to join together short DNA fragments that have been sequenced with a short-read sequencing technology.
His group has not yet published the method in a peer-reviewed journal, but aims to do so this year. Eventually, he said, the group would look to commercialize the technology.
Edwards said the team plans to use the technology in population sequencing projects such as the New Mexico Genome Project, which aims to better understand the state's diverse populations, particularly those with Hispanic and Native American heritage. Edwards said his group also aims to partner with other groups to gain access to resources such as large sample collections. For example, they plan to collaborate with researchers on the Genome Korea project, a population sequencing project that initially aims to sequence 10,000 Korean genomes.
In addition, he said the group plans to use the technology for cancer genome sequencing, in order to help elucidate structural variants, rearrangements, and other complex mutations that can be difficult to sequence with short-read technology.
The long-read technology that Edwards' group is developing is based on a patent application he has filed, titled "DNA Sequencing and Epigenome Analysis." Essentially, the method involves first immobilizing and stretching DNA. The researchers then generate optical maps of the DNA, attach short barcodes at various locations, and sequence it with short-read technology. While standard optical maps give only the distances between marks, this technique gives more detailed information, Edwards said.
"It gives you more insight into what the features are at each barcode site," he said. The barcodes then also act like "zip codes" to orient and order the shorter sequenced fragments to generate scaffolds. Thus far, Edwards said that the technique has enabled the team to generate scaffolds longer than 1 megabase in size and that they are now working on getting longer and more accurate scaffolds.
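The "zip code" idea can be illustrated with a short Python sketch. This is a toy example only, not the UNM group's method, which has not been published: the function name `order_fragments`, the barcode strings, and the positions are all hypothetical, and the sketch assumes that each barcode's position along the stretched molecule is already known from the optical map.

```python
from collections import defaultdict


def order_fragments(barcode_positions, reads):
    """Group short reads by barcode and order the groups along the molecule.

    barcode_positions: dict mapping barcode -> position on the stretched DNA
                       (hypothetical values, assumed to come from an optical map).
    reads: iterable of (barcode, sequence) pairs from short-read sequencing.
    Returns a list of (position, barcode, [sequences]) sorted by position.
    """
    by_barcode = defaultdict(list)
    for barcode, sequence in reads:
        # Reads whose barcode was never placed on the map cannot be ordered.
        if barcode in barcode_positions:
            by_barcode[barcode].append(sequence)

    # Sort the barcodes by their mapped position to get the scaffold order.
    ordered = sorted(by_barcode, key=barcode_positions.get)
    return [(barcode_positions[b], b, by_barcode[b]) for b in ordered]


if __name__ == "__main__":
    # Hypothetical barcodes and positions (in base pairs) for illustration.
    positions = {"AAC": 0, "GTT": 1500, "CCA": 700}
    short_reads = [
        ("GTT", "TTGA"),
        ("AAC", "ACGT"),
        ("CCA", "GGCC"),
        ("AAC", "CGTA"),
        ("NNN", "AAAA"),  # unmapped barcode, dropped
    ]
    for position, barcode, sequences in order_fragments(positions, short_reads):
        print(position, barcode, sequences)
```

A real scaffolding pipeline would also have to assemble the reads within each barcode group and handle barcode errors and ambiguous placements, which this sketch leaves out.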
One of the first applications will be for cancer genome sequencing. UNM's Comprehensive Cancer Center already has a protocol in place to do whole-genome sequencing for consenting patients. Edwards said that incorporating the long-read sequencing technology for these individuals would make the most sense as a starting point.
However, he said he is especially looking to apply the technology to population sequencing projects, such as the New Mexico Genome Project. For that project, it will be necessary to recruit individuals outside of the cancer center to ensure a diverse representation, he said. The project is in the very early stages, and a pilot phase is still being planned.
The researchers have initial funding of $100,000 for that phase and plan to sequence 50 genomes in 2017 for the pilot. Aside from Edwards' lab, researchers from UNM's departments of anthropology, biology, chemistry, and earth and planetary sciences will participate in the project, as will researchers from the Museum of Southwestern Biology, the Maxwell Museum of Anthropology, and the Center for Advanced Research Computing, Edwards said.
New Mexico's populations are "underrepresented in a lot of the genome sequencing projects" that have been conducted elsewhere in the US, Edwards said. The state has a very heterogeneous population, and researchers would like to better understand the evolutionary history of those populations. Individuals with Hispanic heritage have migrated from many different geographic locations.
Edwards said there are probably over a dozen genetically distinct groups in New Mexico that are all lumped together in the same category. By better representing those groups in the New Mexico Genome Project, Edwards said, the researchers hope not only to get a better idea of their ancestry and history, but also to start asking questions about how their genomes relate to the health of those populations.
He said that one key to the project would be developing better and faster bioinformatics pipelines. Already, he said, he has worked with Mountain View, California-based Sentieon to speed up their analysis.
Another important piece of the project will be putting in place strategies to recruit participants, he said, adding that the Hispanic and Native American populations will have their own institutional review boards.
The researchers will also need funding beyond the initial pilot phase of 50 genomes. Edwards said that this fall New Mexico residents will vote on a bond measure to fund the creation of a new building that would support interdisciplinary science groups. The Physics & Astronomy and Interdisciplinary Sciences building project would provide lab and computational space for interdisciplinary groups. It would also house the Center for Bioinformatics and Genomics, which will have a sequencing facility, leverage resources from Sandia National Laboratories and other UNM partners, and host the New Mexico Genome Project.
Edwards said that the goal is to make the long-read technology his lab is developing cost-effective enough to be used for every whole genome that is sequenced. The method is incorporated into the initial library prep, and Edwards said that the researchers are working to make the additional cost and time negligible.