Sanger Institute Launches New Program to Perturb Genotypes, Predict Phenotypes


NEW YORK – A new program at the Wellcome Sanger Institute will attempt to understand and predict the effects of changes to human genomes and to engineer biological systems.  

Announced earlier this month, the Generative and Synthetic Genomics Program will join other Sanger efforts such as the Human Genetics and Tree of Life programs. It plans to bring together computational and experimental scientists who will generate large amounts of genomic data to feed into computational models that predict the effects of editing DNA, such as the impact of mutations on disease. The researchers will also develop technologies to write and edit genomes.

"Biology has accelerated to a point where a PhD student today can perform more experiments on genes and proteins than the entire global research effort could a decade ago," said Ben Lehner, a professor at Sanger who will head the new program. "Plus, we can develop highly predictive models that use artificial intelligence. It will be the combination of these technologies that will enable us to solve the fundamental question of how genetic sequence determines the properties and regulation of proteins. To do this we require huge amounts of data, and the Sanger Institute's capabilities of large-scale data generation and genomics expertise make it the natural place for us to undertake this ambitious research." 

"This basic direction is where the field is going and needs to go," said George Church, a researcher at Harvard Medical School whose lab has long pursued projects that marry computational and synthetic biology. "It's time to really be scaling this up, and Sanger has shown time and again, it's good at scaling up." 

Like many, Lehner was locked out of his lab during the COVID-19 pandemic, giving him a lot of time to think. "I was spending far too much time with a five-year-old… and was discussing with two or three friends via Zoom what we should really be doing next in biology," he said.  

"There are two key technological advances that we believe make this a special time in biology," he said. "First, DNA synthesis and sequencing now allow us to perform millions of highly quantitative perturbation experiments in a way that has never been possible before. Second, the world has become very good at using artificial intelligence to build predictive models when data of sufficient scale and diversity exists. We are also starting to learn how to extract knowledge and understanding from these models." 

"It quickly became obvious that the Sanger Institute would be the best place to locate such an effort as the institute has a unique track record in large-scale data production and the analysis of big data," he said.  

He moved from the Centre for Genomic Regulation in Barcelona, Spain to Sanger in 2022.  

Generative and Synthetic Genomics will be the first new program at Sanger since the Tree of Life, which launched in 2019 to investigate biodiversity and create eukaryotic reference genomes. The new program is being launched with "redeployment" of existing funding at the Sanger Institute as well as supplemental funding from the Wellcome charitable foundation. Sanger did not disclose how much funding has been made available.

"I think it's easy to overestimate how much is needed," Church said. "I think there's a tendency for big genomics groups to throw a lot of money when what is needed is a new way of thinking." 

At launch, Lehner's lab and Leopold Parts' lab will be the first two "core" groups, joined by several associated faculty. There are open calls for three additional core groups, Lehner said, and he predicted that the program could soon grow to about 100 people.

"We are really looking for labs that will take on bold transformative projects," Lehner said. "Our goal is to generate the data and models that will allow biology to be engineered as easily as software, electronics, and cars and to greatly accelerate the understanding of genomes and the development of therapeutics." 

A key goal is to bring together computational and experimental labs, he added. "We believe this is crucial to make more rapid progress." 

The program will employ several strategies. One is to convert all its assays to use next-generation sequencing, which is "very cheap, very accurate, has a huge dynamic range, and allows measurements to be massively multiplexed," Lehner said.  

The researchers will use these assays to look at phenotypes that are directly encoded by DNA sequence, including the properties, regulation, and interactions of proteins and RNAs.

Lehner noted that synthetic DNA is an important technology for the program. "New technologies for longer, cheaper, and more multiplexed DNA synthesis would transform what we can do and we are very interested in any company trying to achieve this," he said.  

The program will also interface with legal and ethics experts. The Sanger Institute said its policy team has already carried out initial research to consider the ethical implications of creating synthetic genomes. The researchers will build on this work to proactively consider the implications of this new program and develop processes for responsible governance and wider engagement, the institute said.