NEW YORK (GenomeWeb) – How much time and money would it take to sequence the genomes of 9,000 eukaryotic families, 150,000 to 200,000 genera, and 1.5 million species?
The answer, according to organizers of the Earth BioGenome Project (EBP), is about 10 years and roughly $4 billion. The attempt to sequence all eukaryotic species on Earth has been likened by its coordinators to the Human Genome Project (HGP), both in its ambitions and in the benefits they hope it will bring to biological research.
"You have to view this as a biology infrastructure project — this would be the infrastructure of the future of biology research, and all aspects of biology from evolution to ecology and conservation," University of California, Davis evolution and ecology professor and EBP co-founder Harris Lewin said in an interview. "We frame it as sequencing life to understand and conserve biodiversity. But it's even deeper than that. We now have settled on a broad mission which could be described as creating the new foundation for biology that drives solutions for preserving Earth's biodiversity, managing ecosystems, spawning bio-based industries, and sustaining human societies."
The plan to sequence all life on Earth began with Lewin; John Kress, a research botanist and curator at the Smithsonian National Museum of Natural History; and Gene Robinson, director of the University of Illinois at Urbana-Champaign's Carl R. Woese Institute for Genomic Biology.
"I've been working on the Genome 10K Project since its inception, and it became apparent to me that it would be possible to sequence everything at a certain point," Lewin said. In 2015, Lewin, Kress, and Robinson invited 30 guests to a Smithsonian workshop, including representatives from academia, government, and science funding agencies such as the National Human Genome Research Institute's Eric Green and the National Science Foundation's Paula Mabee.
The reaction? "People seemed to think it was a great idea whose time has come," Lewin said. And so began the work to flesh out the idea into a detailed plan of how it could be done, what resources would be needed, and how much it might all cost.
Building a Community
More meetings took place and more presentations were made at conferences. Eventually, BGI and the Wellcome Trust Sanger Institute expressed interest in joining in. At BioGenomics2017, the Global Biodiversity Genomics Conference in Washington, DC, in February, Lewin again made presentations to various research groups, and held workshops to develop a more detailed roadmap, funding plan, and organizational framework, and to drum up some enthusiasm among the gathered evolutionary biologists, ecologists, botanists, entomologists, vertebrate and invertebrate researchers, and conservationists.
The project's organizers realized that if it was going to work, it would take the help and effort of all of these research communities to set standards for the project, gather samples, do the sequencing, analyze and store the data, write and publish papers, and even invent new technologies that could help speed the effort.
"We went much deeper [at the BioGenomics2017 meeting] and got an enthusiastic green light from the community," Lewin said. "Part of that was to build out the idea of the community of communities, bringing together all the existing genome communities to try to work together." Groups such as the Global Invertebrate Genomics Alliance (GIGA), the i5K insect genome sequencing project, G10K, and others endorsed the project. That's important in that the work those groups have already done can be integrated into the EBP's research — that way, the consortium can avoid having to redo sequencing and analysis that's already been done.
Importantly, the Global Genome Biodiversity Network (GGBN) also gave a thumbs up. "A very big step was to get GGBN," Lewin said. "The hardest part [of this project] is to get the samples to do the sequencing, and so GGBN, being the largest consortium in the world collecting samples for genome sequencing, was also supportive of the concept because they were just collecting, but they had no mechanism or plan of getting to the sequencing and analysis part." Having access to the GGBN's samples could give the EBP a leg up on one of its biggest challenges.
Step One: Collect Samples
Sample collection will be arduous, and it relates in part to the EBP's proposed strategy. The group is taking two tacks. One is what Lewin calls a "phylogenetic wave." It works from the top of the taxonomic system to the bottom, and involves first sequencing 9,000 eukaryotic families in the first three years, then the 150,000 to 200,000 genera, and then the remaining species in the last four years of the project. By itself, this will require an intense collection effort to make sure researchers have the necessary samples.
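To give a sense of the collection and sequencing pace this schedule implies, here is a back-of-the-envelope sketch in Python. The taxon counts and the first and last phase lengths are the figures given above. The three-year genus phase is an assumption (10 years minus the three and four stated), and the calculation assumes one genome per taxon and ignores any overlap between phases. The function name `phase_rates` is purely illustrative.

```python
# Rough sequencing throughput implied by the "phylogenetic wave".
# From the article: 9,000 families in the first three years, 150,000 to
# 200,000 genera next, the remaining species in the last four years,
# 1.5 million species and about 10 years in total.
# ASSUMPTION: the genus phase fills the 3 years in between (10 - 3 - 4).
# ASSUMPTION: one genome per taxon; overlap between phases is ignored.

PROJECT_YEARS = 10
FAMILIES = 9_000
TOTAL_SPECIES = 1_500_000


def phase_rates(genera):
    """Return a list of (phase, genomes, years, genomes_per_year)."""
    phases = [
        ("families", FAMILIES, 3),
        ("genera", genera, PROJECT_YEARS - 3 - 4),
        ("remaining species", TOTAL_SPECIES - genera, 4),
    ]
    return [(name, count, years, count / years) for name, count, years in phases]


for genera in (150_000, 200_000):  # low and high estimates from the article
    print(f"With {genera:,} genera:")
    for name, count, years, per_year in phase_rates(genera):
        print(f"  {name}: {count:,} genomes over {years} years = {per_year:,.0f} per year")
```

Under these assumptions, the final phase would need well over 300,000 genomes per year, roughly a hundred times the 3,000 per year of the family phase.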
The second strategy is just as challenging. The group plans to do deep and repeated ecological sampling in every kind of environment on Earth, both to collect as many organisms as possible and to look at how environmental or climate change affects biodiversity.
"We'd like to focus on biodiversity hotspots and it's also a way to get at the ecology. If you do a phylogenetic wave, you can't get at who's talking to who. But if you do a high ecological context, you have a way to understand that ecosystem in a more robust way," Lewin said. "So we'll be looking at agricultural ecosystems, desert ecosystems, marine ecosystems, fresh water: all of the main ecosystems that support the planet." He added that the "brute force of sequencing" all these ecosystems could also bolster research on understudied species and could even lead to the discovery of new organisms.
What's not currently on the list for the EBP is an examination of prokaryotes. With efforts like the Human Microbiome Project (HMP) underway to sequence bacteria and archaea, Lewin said the EBP isn't currently considering duplicating those efforts. However, he added that he and the other EBP organizers are considering a conversation with the HMP and others about integrating their sequence data, though that conversation is still a long way off.
Also not on the list as of now: extinct species. The EBP's current plan is to stick to what's alive on the planet right now. However, Lewin added, the genome reconstruction expertise of researchers in the paleogenomics community is welcome.
"I'd say genome reconstruction is an important part of what we want to do, because it's an important part of evolutionary science," he noted. "We'll reconstruct the genome and the chromosome organizations, but reconstructing the species is a different kind of question. If you get genomes along the way from extinct species, those are going to be very helpful in any kind of reconstruction work, but [sequencing extinct species is] not an explicit goal at this time."
Challenges and Cooperation
"I'm pretty jazzed about it," Aristides Patrinos, deputy director for research at New York University's Center for Urban Science and Progress and a professor of chemical and biomolecular engineering, said in an interview. Patrinos was one of the leaders of the HGP (back then, he was with the Department of Energy) and knows firsthand what a project like this entails and the challenges the organizers and researchers have ahead of them.
Though Patrinos is not currently participating, he would like to get involved in the EBP. "It's a very ambitious project and has several things going against it at this time, so a herculean effort needs to be launched," he said. "Some of the goals seem very impossible at the onset, but it depends on the energy we bring to the project, and the individuals and groups we can seduce into supporting it one way or another." He also emphasized the international nature of the effort, noting the diversity of thought and experience as a strength for the project.
But like Lewin, Patrinos also said that the challenges inherent in a project like this are not minimal. And the biggest one he sees is sample collection. "I think once this gets off the ground, the enthusiasm will mobilize the people we'll need who are ready and willing to do these expeditions, and do them in the same way so we have the same standards whether it's in China or in Yemen or in Canada," he said.
Lewin re-emphasized the importance of cooperation between communities in other aspects of the project as well. Though the EBP organizing committees aren't specifying which sequencing technologies researchers should use, they are setting certain standards.
"What we're shooting for is reference-quality [genomes] for the families, and draft quality for the remaining species," Lewin said. "If you have a good reference at the family level, you can do a lot with in silico techniques without going all the way, which is very expensive at this time. And then there will be many of the groups themselves that will do higher quality assemblies."
But setting standards doesn't mean enforcing them, something the EBP organizers can't really do, he explained. So cooperation from all the participating groups will be necessary in order to keep quality at an even level.
Informatics and data storage are another area where cooperation will be required. The EBP is currently discussing a plan for the informatics and data analysis side of this equation, and is trying to figure out where all the data will be stored. "My guess is that it's going to be distributed around to where the different sequencing centers are in the world," Lewin said, adding that the EBP model calls for organizing the data around geographical nodes like the US, the UK, China, and Brazil, among other places. The plan also calls for "a coordinating network of centers that will handle all the data," he added.
Show Me the Money
Funding is also being heavily debated. With a price tag in the $4 billion range, it will take more than one or two sources of funding to accomplish the EBP's goals. And at a time when US President Donald Trump is calling for billions of dollars in cuts to the NIH's budget, it may be safe to assume that a project like the EBP will not get much help from the US government. So the EBP is talking to whoever it can to get the funding it needs, from foundations to government agencies and even foreign governments.
The price tag might also worry researchers who fear that their own funding could be threatened if the EBP gets major support from agencies such as the NIH and the NSF. "At the conference, there was a lot of support, but I also saw some reactions that were like what I experienced at the dawn of the Human Genome Project, which is 'Whoa, you're [planning] a huge program that takes $4 billion, and where is that money coming from? Is it going to impact my funding?'" Patrinos said. "[We need] to reassure these individuals that this is not an attempt to raid their budgets, but if this is successful this will enhance their work because we'll provide capabilities, data, [and] resources that would significantly help them in their research."
But Patrinos is optimistic that wealthy individuals and foundations could fill some funding gaps. "We're blessed in this country by having foundations and so many generous rich people willing to fund long-term research," he said.
He also believes the indirect impact of this project on human health may cause some people to look at it askance rather than with optimism. "This is a biological initiative that doesn't necessarily involve human health. Some people argue there could be new discoveries of substances that could prove useful to human health, and if you allow the ecosystem we rely on to collapse, we're not going to be that far behind. But it's still indirect," he explained. "And it represents something different from what traditional PIs do in biology and medicine. They want a hypothesis-driven proposal and this is an infrastructure project — there's no hypothesis inherent in the proposal. We need to get a better handle of what's around us, and we're panicking about the sixth extinction. There is a sense of urgency, but it doesn't resonate with many people because it doesn't directly deal with people."
Now is the Time
But despite the challenges, Patrinos is adamant that now is the right time for this project, and Lewin is certain that the benefits — much like those of the HGP — will outweigh the costs. "When you think it might be expensive, over 10 years, it's really not that much. The [HGP] started at $3 billion and ended up at $2.7 billion. In today's dollars that's $4.8 billion. And so [we're looking at] doing this project at less than the cost of the HGP," Lewin said. "We're starting to hear there are bloggers asking why we're going to do this when we need the money for other things — well those are the questions that were asked about the HGP, and I don't think there's anyone around today who doesn't see the value in the HGP. [This project is for] every species, and the impact of that on all of biology, not just human medicine, is going to be orders of magnitude greater than [the impact] the HGP had."
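The arithmetic behind that comparison can be laid out in a few lines of Python. This is only a sketch using the figures quoted in this article. The per-species number is a simple average and should not be read as an EBP estimate, since the plan calls for reference-quality genomes at the family level and draft quality elsewhere, which would cost different amounts. The variable names are illustrative.

```python
# Figures quoted in the article, in US dollars.
EBP_COST = 4.0e9           # roughly $4 billion
EBP_YEARS = 10             # about 10 years
EBP_SPECIES = 1_500_000    # 1.5 million eukaryotic species
HGP_FINAL_NOMINAL = 2.7e9  # HGP final cost, per Lewin
HGP_FINAL_TODAY = 4.8e9    # the same cost in "today's dollars", per Lewin

# Simple averages; real per-genome costs would vary by quality tier.
per_year = EBP_COST / EBP_YEARS
per_species = EBP_COST / EBP_SPECIES

# Inflation multiplier implied by Lewin's two HGP figures.
implied_multiplier = HGP_FINAL_TODAY / HGP_FINAL_NOMINAL

# EBP budget as a share of the inflation-adjusted HGP cost.
ebp_vs_hgp = EBP_COST / HGP_FINAL_TODAY

print(f"EBP average spend per year:    ${per_year:,.0f}")
print(f"EBP average spend per species: ${per_species:,.0f}")
print(f"Implied HGP inflation factor:  {implied_multiplier:.2f}x")
print(f"EBP as share of adjusted HGP:  {ebp_vs_hgp:.0%}")
```

On these numbers, the EBP budget works out to an average of about $2,700 per species and about 83 percent of the HGP's inflation-adjusted cost.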
Further, he said, just as the HGP positively affected technological invention, so too could the EBP. "Where the innovation needs to happen is first on the sample collection side. We're going to need to collaborate with engineers — I'm thinking drones, robotics. You can get a lot of citizens — citizen science is very important — but I think especially when you get to the oceans and hard-to-get-to places on Earth, I think drones collecting, processing DNA, and maybe even sequencing might be necessary, and that kind of integrated technology development could be driven by a project like this," Lewin said. "And then there are innovations in computing architectures, algorithms, and especially data visualizations that are going to be necessary for a project like this."
Patrinos agreed, saying that new technology is "absolutely" an outcome he sees for the EBP. "The advances that this will trigger will be extremely valuable, not just for this project, but for many others," he added. "After the HGP was completed, there was a natural regression. People said, 'Now we've done that, so let's go back to the way we did business before.' It's time to kick start it again."