![Regeneron Genetics Center Regeneron Pharmaceuticals](https://crain-platform-genomeweb-prod.s3.amazonaws.com/s3fs-public/styles/230x150/public/regeneronsequencers.jpg)
NEW YORK (GenomeWeb) – Three years after Regeneron Pharmaceuticals began sequencing human exomes on a large scale and analyzing them along with electronic health records, the effort has started to bear fruit in the form of new drug targets and additional insights into existing ones.
Over a short time span, large-scale human genetics, a term the company prefers over 'genomics', has become an indispensable tool in its approach to drug discovery. "It's so important to us, it's intricately integrated into everything we do," said George Yancopoulos, Regeneron's co-founder, president, and CSO, during a recent media event at the company's sprawling campus in Tarrytown, New York.
As a result, he said, the firm plans to double its investment in the Regeneron Genetics Center (RGC), the wholly owned subsidiary it founded in 2014 to get the effort underway. Last week, the RGC celebrated sequencing its 250,000th exome, and the plan is to do at least as many next year. The ultimate goal is to sequence some 5 million to 10 million individuals within a decade or less, and the company is busy identifying additional collaborators with suitable cohorts to reach that goal.
In particular, the RGC has been mining its exome data for mutations with large effects on a person's health or phenotype — either increasing their risk for disease in a big way or protecting them from disease. "We're looking for X-Men and -Women," Yancopoulos said. "We are looking for people who have mutations that make them special."
"Our mission was to apply large-scale human genetics to find these large-effect genes that would drive all the new targets in this company," said Aris Baras, vice president and co-head of the RGC. That mission appears to have been fulfilled: "It's how we do R&D now every day," he said. "It's every target, it's every program, it's every therapeutic area. Every clinical development program has a heavy and huge dose of human genetics in every part of it."
This includes, for example, finding "human knockouts" with loss-of-function mutations in a particular gene. If that mutation is protective for a certain disease, the idea is to create a drug, such as an antibody, to block the protein product of that gene, which would presumably have an effect similar to that of the knockout.
Regeneron has a track record of developing drugs that are based on human genetics discoveries: For example, its Praluent (alirocumab) antibody, which inhibits the PCSK9 enzyme and is used to lower LDL-cholesterol levels in patients with atherosclerotic cardiovascular disease or familial hypercholesterolemia (FH), came out of human genetic studies that linked mutations in PCSK9 to FH and deletions of PCSK9 to very low blood cholesterol levels.
Finding individuals with such large-effect mutations requires access to many samples, since they are rare. It also requires generating DNA sequence information on a big scale and having access to deep medical and phenotyping data, Yancopoulos said. What has been rate limiting is not the ability to sequence, he said, but the ability to prepare samples for sequencing and to find the right study cohorts.
Finding the right partners
To that end, the RGC has formed approximately 55 research collaborations of various sizes for different types of projects, starting with Geisinger, a large integrated health system serving more than 3 million patients, mostly in Pennsylvania. Regeneron is adding between 20 and 30 new collaborations per year, Baras said, both small and large ones.
Geisinger runs the MyCode Community Health Initiative, a precision medicine project that has been collecting blood and other samples from Geisinger patients in a biobank for genomic analysis. As part of its 2014 collaboration with Geisinger, the RGC has sequenced the exomes of almost 100,000 samples so far and has been analyzing them along with patients' de-identified electronic health records. Last year, the two partners published initial results in Science from the so-called DiscovEHR study that involved 51,000 patients, and Regeneron's goal is to sequence at least 250,000 exomes from Geisinger patients.
Geisinger in 2015 started to return actionable results in a limited number of genes to patients after validating them in a CLIA-certified laboratory and has found that this can have clinical utility, for example by identifying cancer early in patients with increased genetic risk.
Returning results to patients is not Regeneron's main interest, though; it is merely "a useful byproduct" of the process, Yancopoulos said. Instead, the company has been funding sequencing and sample collection for the project in order to identify new actionable targets. "We're in it to create the world's most powerful genetics database where we can get the ideas of the future," he said.
What has made the Geisinger collaboration especially compelling is the quality of the health system's electronic health records. EHRs are predominantly used for billing purposes these days, Yancopoulos said, and doctors frequently enter false diagnoses to get certain tests or procedures covered by insurance, which he said "contaminates the health record." With Geisinger, on the other hand, which is both payor and healthcare provider, such conflicts of interest don't exist, and as a result, the information in their EHRs is more reliable.
Another collaboration, on an even larger scale, is Regeneron's partnership with the UK Biobank and GlaxoSmithKline, which was announced earlier this year. The goal of that project, which involves lots of phenotyping information, including imaging, is to sequence all 500,000 participants.
Besides collaborations that involve unselected populations, Regeneron is also engaged in more focused types of projects. These include family-based studies of Mendelian diseases; projects involving founder populations, such as the Amish; and patient cohorts for diseases that are neither rare nor common, such as autoimmune and neurodegenerative diseases.
Baras said the company has accumulated cohorts of at least 10,000 patients for several dozen diseases, which it plans to sequence next year. "It's not going to work everywhere, but playing the averages, we're going to see a lot of big signals there, and that is going to lead to a lot of new and exciting programs," he said.
Identifying X-Men and X-Women
Already, the existing collaborations have yielded insights that may ultimately result in new drugs. Regeneron's philosophy for selecting which leads to follow, Yancopoulos explained, is to go where the biology takes the researchers rather than to focus on pre-defined therapeutic areas based on market size or the company's existing drugs and commercial infrastructure.
According to Baras, the RGC has already identified 40 to 50 large genetic signals from its studies. "We're pushing them into the Regeneron target biology process, we're making the [mouse] knock-outs, the knock-ins, … we're thinking about the therapeutic strategy," he said.
So far, the data have led to a number of new drug targets, for example for nonalcoholic steatohepatitis (NASH) and inflammatory bowel disease. Yancopoulos said that NASH, which is often associated with obesity and type 2 diabetes, is becoming the leading cause of liver failure and liver transplants. RGC data, he said, uncovered a genetic pathway that seems to protect individuals from NASH. "There are 'X-People' walking around that are protected from this despite having obesity and fat deposits in their liver," he said.
Also, at the American Society of Human Genetics annual meeting last month, Regeneron and Geisinger presented data on human gene knockouts in 61,000 exomes from their study. In all, they found almost 6,700 individuals who had a putative knockout mutation in one of almost 1,700 genes.
Looking at their EHR data and clinical phenotypes, the researchers found associations between knockouts in CRHR2, which encodes corticotropin-releasing hormone receptor 2, and hypertension as well as anxiety disorder. In addition, they found relationships between GCKR knockouts and morbid obesity and chronic nonalcoholic liver disease; between CTNS knockouts and cystinosis; and between NPHP4 knockouts and kidney disease.
Furthermore, RGC data have validated the targets of existing drug candidates and have provided evidence for novel indications. This spring, for example, Regeneron researchers published a study in the New England Journal of Medicine that looked at almost 60,000 patients from the Geisinger study and uncovered loss-of-function mutations in the ANGPTL3 gene that led to low blood lipid levels. It also found that the anti-ANGPTL3 antibody evinacumab, which Regeneron has in Phase II clinical trials for the treatment of FH, decreased lipid levels and atherosclerotic lesions in mice.
Another drug target that RGC data have helped to illuminate is IL-33, against which Regeneron is developing an antibody, REGN3500, to treat inflammatory diseases. That drug is currently in a Phase I clinical trial for the treatment of asthma.
Exomes still trump genomes
Exome sequencing remains the approach of choice for the RGC, Baras said. The overwhelming majority of pathogenic mutations in Mendelian diseases, which usually have large effects, reside in coding regions, while non-coding regions remain difficult to interpret. Also, he said, there is still a cost differential of almost 10:1 between sequencing genomes and exomes, and the additional cost does not justify the incremental gain in biological insight. Further, all of Regeneron's existing drugs or drug candidates that came out of human genetics involved mutations in genes. While whole-genome sequencing undoubtedly enables interesting science, "we haven't come up with any really worthy reason to spend that money to go after something that could tell us something informative, from a translational therapeutic insight," Baras said. However, the company does genotype each sample with a whole-genome microarray, which, combined with imputation, delivers some information on non-coding regions.
While Baras declined to disclose Regeneron's internal cost per exome, he said that costs have declined over the past years, which "has been a huge factor in terms of our ability to scale up and take on more projects." Sample prep automation has contributed to that cost reduction, has allowed the company to "physically handle" a large volume of samples, and has made data quality more consistent, he said.
On the technology side, the RGC, which has a staff of 60 to 70 employees and moved into a new building on Regeneron's campus in 2015, has seen several recent upgrades.
This summer, it installed a new robotic system, built internally with parts from third-party providers, to capture exomes and prepare sequencing libraries in high throughput. The new system, which has an automated refrigerator and increased consumables capacity, is capable of prepping several hundred thousand samples per year and can run unattended for several days. A second liquid handling system that will come online shortly will double the capacity, according to John Overton, senior director and head of sequencing and lab operations at the RGC.
For exome capture, it has moved to reagents from Integrated DNA Technologies, and for pre-capture, it now uses custom kits from New England Biolabs and Roche's Kapa Biosystems.
Earlier this year, the center also started to replace its existing Illumina HiSeq 2500 sequencers with NovaSeq 6000 instruments, installing two NovaSeqs in the spring and another two just a month ago. Each NovaSeq is equal to five or six HiSeqs, Overton said, and the RGC has found it easy to transition between the two platforms.
Eight of 15 HiSeqs currently remain in the lab, but the plan is to fully transition to the NovaSeq in early 2018, he said. The RGC uses the NovaSeqs to generate 75-base paired-end reads, which takes about 20 hours per run. Overton declined to disclose how many samples the company multiplexes per run but said that the new S4 flow cells, which he plans to start using early next year, will increase the instrument's capacity about threefold. According to Baras, the RGC is well on its way to being able to sequence 400,000 to 500,000 samples per year.
Yancopoulos said Regeneron has come a long way since he cloned and sequenced the very first gene at the company — a nerve growth factor gene. The results the RGC has delivered already have far exceeded his expectations: "I was hoping for one important thing after five years," he said.
He said he hopes Regeneron's human genetics approach will be adopted by others and will accelerate the drug discovery process in general. Last year, for example, the FDA approved only 22 new drugs, he said, of which just eight represented a new class of drugs, five of them for rare diseases.
"It would be great if a lot of other people really started adopting and incorporating these large-scale human genetic approaches to everything that they did because then maybe the productivity of the whole industry would go up, and we need that," Yancopoulos said.