NEW YORK (GenomeWeb) – Canada's Genomics Enterprise, or CGEn, has embarked on a nationwide effort to sequence the genomes of 150 species deemed important to the country.
The project, known as CanSeq150, will span a broad range of plants, animals, and microbes. The unifying theme: each new genome sequence should enhance research on the species at hand, while providing some sort of economic, social, or other benefit to Canada, explained Naveed Aziz, a researcher with the Centre for Applied Genomics at SickKids and CGEn's chief administrative and scientific officer.
"They don't have to be Canadian species, as long as that sequencing will help Canada," he said.
Although the precise sequencing strategy may vary somewhat depending on the species and the type of genome at hand, the researchers expect to sequence each of the species to at least thirtyfold average coverage using Illumina short reads in combination with Pacific Biosciences' long reads and 10x Genomics' linked-read technology (the Chromium de novo assembly solution), when possible. The project has already established partnerships with all three firms.
"One of the reasons we wanted to partner with [CanSeq150] is we're very interested in supporting these large sequencing activities," Anushka Brownley, informatics product manager at 10x Genomics, said. She noted that the 10x de novo assembly methods scale well for large-scale efforts such as CanSeq150 that look at organisms with a range of genome sizes and input DNA amounts, though they are optimized for diploid organisms.
"If you're going to do your first-pass genome using 10x Genomics, it's incredibly cost-effective and gets you almost all the way there in a lot of cases," added Sara Agee, head of commercial marketing for 10x Genomics.
In some cases, the team will do de novo sequencing on species for which there is currently no genomic data available, Aziz noted. For other species, the CanSeq150 researchers plan to top up existing sequence data generated by other groups to inch closer to a reference-quality genome.
The CanSeq150 executive committee has already selected several species for sequencing, including the Canada gray jay, Canada lynx, snowshoe hare, Vancouver Island marmot, Steller sea lion, white-sided dolphin, and northern fur seal. It has also received genome candidate applications from the Toronto Zoo.
But it will not just be cuddly charismatic animals on the sequencing docket, Aziz emphasized. At least one unicellular green algae species is expected to join the list. And the CanSeq150 executive committee meets about once a month to review applications for a wide range of animals, plants, and microbes.
"There's a lot of opportunity to provide support for researchers, especially smaller labs who are not working on humans or wheat and maize crops, but are working on species that are still very important," Aziz explained, noting that there has been "little support in the system" for such projects until now.
The CanSeq150 committee will continue taking applications until they reach 150 species — a goal selected as a nod to Canada's 150th anniversary of confederation in 2017. Over the longer term, Aziz said, CGEn hopes to continue contributing to the country's research community, "hopefully even beyond the 150 species."
The project has secured funds from the University of Toronto's McLaughlin Centre, and participating centers such as the Hospital for Sick Children (SickKids) in Toronto, McGill University, the BC Cancer Agency, and the University of British Columbia.
Aziz did not provide an estimated final price tag for the project, but said the team has sufficient funds to complete 150 genomes. Keeping with the "150" theme, he noted, the investigators are aiming to complete the genomes in 150 weeks, or nearly three years.
CanSeq150 is one of the first major initiatives from CGEn — a genome sequencing and analysis network centered in Toronto, Montreal, and Vancouver that was established in 2014 with the help of more than C$100 million (US$78 million) in funding from the Canada Foundation for Innovation, Genome Canada, and other organizations.
"The mandate for CGEn is much bigger than CanSeq150: it is supporting the sequencing, including clinical sequencing and basic research sequencing, in Canada across all sectors," Aziz said.
Although the team is embarking on efforts that support basic research, CGEn is also looking ahead to a time when clinical sequencing becomes more widespread across the country.
"All three sites are in the process of setting up clinical-grade sequencing facilities or labs," Aziz said. "Right now we are primarily research-focused, but … CGEn's mandate is to set up that ready state for [clinical sequencing in] Canada."
Across the CGEn network, researchers generated more than 14,000 whole-genome sequence libraries in 2016 and 2017 alone, he noted, though participating sites are also using a range of other genomics technologies such as real-time PCR and microarrays.
The network may ultimately establish a central sequence repository, though Aziz noted that most of the large-scale sequencing projects it participates in have distinct data release and data management policies depending on the scope of the effort and region.
Because much of CGEn's sequencing is done for public sector projects, he added, the team is currently working through the relevant policies and guidelines for establishing cloud-based sequence repositories that are publicly accessible.
For the CanSeq150 project, the team is partnering with Toronto-based cloud computing company DNAstack, which is helping the investigators tackle the diverse genome assembly challenges associated with sequencing such a wide range of organisms across the phylogenetic tree.
"It's an interesting project for us," DNAstack CEO Marc Fiume said. "If you're wanting to do a new genome from scratch, it's computationally super-demanding. That's where the advantages of using cloud-computing are really realized."
The firm teamed up with CGEn, SickKids, and McGill's Centre of Genomics and Policy to launch the Canadian Genomics Cloud earlier this year. DNAstack is also participating in the SickKids-led "Precision Oncology for Young People" (PROFYLE) project, a Terry Fox Research Institute-funded effort to sequence relapsed or treatment-resistant cancers occurring in children and young adults across Canada.
Fiume noted that CanSeq150 and other large-scale sequencing projects spearheaded by investigators at SickKids and other centers highlight "the kind of leadership we have in Canada, particularly around open science."
"We don't know how valuable [CanSeq150] is going to be, but in a good way," he added. "I think … we're going to open up new opportunities in the study of all kinds of interesting species that have to do with agriculture, that have to do with environment, that have to do with the microbiome."
Aziz noted that he is in touch with leaders of other large-scale sequencing efforts, such as the Vertebrate Genome Project, as well as government- and academic-led sequencing efforts in Canada, to avoid overlap between the projects. He and the executive committee plan to allocate genomes to the three participating sequencing centers as species applications are approved.
"If the [principal investigators] are in one of the three areas or provinces, it makes a lot of sense to allocate it to that center because the managers at each center will have to work with the PIs to get the samples in, do the work, and provide the data," Aziz said.
In other cases, the team will put together data generated in Toronto, Montreal, and Vancouver to showcase the "all-for-one ability" of the sequencing sites.
Investigators who are interested in applying to CanSeq150 can obtain a two-page application form by emailing through the CGEn site, he said. Successful applicants will be contacted as the executive committee approves each set of applications.