Large-Scale Clinical Sequencing Projects Report Progress, But Challenges Remain


ROCHESTER, Minn. (GenomeWeb) – Large-scale sequencing projects that straddle the line between research and clinical care are progressing and starting to bear fruit that is making a difference in patients' lives, researchers reported this week at the Individualizing Medicine conference hosted by the Mayo Clinic.

Leaders from Genomics England's 100,000 Genomes Project, the MSSNG Autism Genome Sequencing Project led by Autism Speaks, University of Toronto, and Google, and the US federal government's Precision Medicine Initiative touted the advantages of whole-genome sequencing and discussed their strategies for engaging participants, who, unlike participants in past research studies, expect to receive information about their genomes.

The researchers said one of the main challenges they grapple with is figuring out the best practices that serve both research and clinical needs. They must keep sequence data secure and private while maintaining the ability to share data for research. In addition, there are still technical challenges, particularly in obtaining high-quality DNA from tumors.

By the numbers

Genomics England's 100,000 Genomes Project is about 10 percent of the way through its sequencing, while the MSSNG Autism Genome Sequencing Project led by Autism Speaks has sequenced around 7,500 of 10,000 genomes. Meanwhile, the US's 1 million-participant Precision Medicine Initiative is gearing up to launch later this year or early next year, researchers said this week.

Mark Caulfield, chief scientist at Genomics England, said in a presentation at the conference that the project had received 12,000 genomes and generated 1 petabyte of primary data, including 200 million germline variants and 48 million somatic variants. It has also recorded around 70,000 human phenotype ontology terms in an effort to standardize phenotype reporting.

The 100,000 Genomes Project includes three sub-projects: rare disease, cancer, and infectious disease. It has completed a rare disease pilot, which included data from 15,000 total individuals, including 4,800 affected individuals and their family members; and it is in the process of scaling that up to 16,000 affected individuals. Thus far the three major categories of rare disease have been related to intellectual disability, neurology, and cardiology. In addition, the researchers are collecting secondary data, such as the number of hospital episodes participants had experienced, which was 250,000.

"Imagine the cost of this," Caulfield said, explaining that one goal of the project is to enable earlier diagnoses of disease, which would both "change the course of patients' disease" and "avoid huge costs to healthcare," he said.

The Genomics England team has also sequenced 3,000 multidrug-resistant strains of Mycobacterium tuberculosis, and the National Health Service is now implementing a sequencing-based diagnostic for tuberculosis. The cancer arm was put on pause while the team worked out technical issues related to obtaining high-quality DNA from tumor tissues, but Caulfield said the group has now done a series of pilots, including one of chronic lymphocytic leukemia, with 164 tumor/normal samples sequenced and 150 more going through the pipeline. The CLL cases will be one of the first publications to come from the cancer arm of the project, he added. "We're seeing somatic mutations that trigger a relapse and also the potential to identify personalized care," he said of the CLL cohort.

Stephen Scherer, director of the Centre for Applied Genomics at The Hospital for Sick Children, also known as SickKids, in Toronto, said that of the 7,500 genomes sequenced as part of the MSSNG project, 5,200 have been annotated. In addition to whole-genome sequencing, the team is running microarrays, since they sometimes identify copy number variants that sequencing misses. Thus far, the researchers have identified 64 autism risk genes, 13 of which are novel. Copy number variants affect around 5.8 percent of patients, and some patients have both a CNV and a point mutation, he said.

Participants: not just research samples

A key component of the three projects is that the participants are more than just research samples. In all cases, participants can opt to get sequencing results back.

For both the rare disease arm of Genomics England's project and Scherer's autism sequencing project, having samples from the affected individual and both parents helps in obtaining a diagnosis.

For about half the cases in Genomics England's project, the team has data from both parents. Another 22 percent of cases include two related individuals, 23 percent include only the affected individual (typically because the patient is an adult), and 7 percent include data from four or more family members.

Already, Caulfield said, the project has made an impact. For example, he described one girl who was just over 4 years old when she entered the study. She had begun exhibiting signs of developmental delay and intractable seizures at 4 months. Drugs that treat epilepsy did not help her seizures, and her developmental delay continued to progress. Her parents had been searching for an answer for four years when the family's genome was sequenced.

The Genomics England team identified a de novo mutation in the glucose transporter 1 gene, which was preventing her from being able to transfer sugar to her brain. The solution was to put her on a high-fat diet. The brain was able to break down the fat into the sugar it needed. Her seizures stopped and she even showed some improvement in her developmental delay symptoms. Caulfield said that this case highlights the potential of whole-genome sequencing for rare disease, and especially of doing the sequencing sooner rather than later.

"If we could have discovered that at 4 months, what difference would that have made?" Caulfield asked.

SickKids' Scherer said that there is a close collaboration between the researchers and the clinicians in his autism project, which he acknowledged would be difficult to scale up to the size of a project like the 100,000 Genomes Project. There are many "challenges of the delivery of information," he said. Aside from ensuring that families understand the implications of the information itself, the timing of the delivery is also crucial. Families with autistic children are likely going through a number of challenges, so it is important to deliver the information when they are able to handle it, he said.

Nonetheless, he said, whole-genome sequencing is making a difference in patients' lives and helping to point to potential treatment. At the very least, he said, sequencing could help diagnose autism earlier, when interventions are the most successful. "There are no effective medications for the core autism features," Scherer said. "We need new drug targets," and genomics could help identify those targets.

One possibility for a future clinical trial, Scherer said, would be sleep medications. Some of the novel genes that the group has identified as affecting autism are also related to the circadian clock, he said. "That opens up the possibility for treating sleep disorder complications," he said.

The next step in Scherer's autism sequencing project is to develop a cloud-based database and tools in collaboration with Google. The goal of that is to make the data accessible to other laboratories studying autism or a related disorder. Eventually, he said, he wants to make the data available to all investigators and also to develop a public portal that will be accessible by the families themselves.

Engaging participants and their families is a key aspect of the US's Precision Medicine Initiative as well. As part of the project, Kathy Hudson, deputy director for science, outreach, and policy at the National Institutes of Health, said that participants would stay engaged with the researchers for one year. They will "provide lots of information to us, and in return, we'll provide lots of information to them," she said. The PMI is looking to recruit individuals through healthcare organizations as well as to enable individuals to sign up directly on their own.

Initially, Hudson said, the data would be "high-value data" from participant questionnaires, EMRs, a baseline physical exam, as well as 'omics and other data from biospecimens. Eventually, though, she said, the group wants to incorporate data that can be extracted from mobile and wearable technology, as well as geospatial and environmental data.

Because the project wants to gather so much data from so many individuals and also have a cohort that is representative of the diversity in the US, Hudson acknowledged that one main challenge will be engaging communities that have typically been underserved and not involved in research. "We know that some people have a deep-seated distrust" of the government, she said. However, she added, "we know that some people are eager to have a seat at the table as co-collaborators."

Hudson added that another way the team is looking to build trust in and buy-in for the program is by focusing on issues that matter most to the participants, rather than those that might be the most cost effective from a research standpoint.

One component of the PMI's infrastructure will be a Participant Technology Center. The center will be led by Eric Topol at the Scripps Research Institute, but will also work with patient advocacy groups like PatientsLikeMe. Other partners will include Vibrent Health, Sage, and Walgreens. The center will be the point of contact for the direct-access individuals, both for signing up for the program and for ensuring that they go through the initial steps of the physical exam and specimen collection. In addition, Hudson said, the center would be testing the wearable technology component of the project.

Another issue regarding trust will be convincing participants that their data will be safe. With the number of high-profile hacking incidents this year, that will be a hard task. Data security "is an important question," Hudson said. "We have to put in place the best security systems we can, but we also know and have to tell people that there's no perfect system."


Regarding the cancer arm of the project, Caulfield said that the researchers paused that program while they tried to figure out protocols for sequencing DNA from tumor biopsies. Typically, pathologists take the biopsy and then formalin fix and paraffin embed it, which is "great for preserving tissue, but bad for DNA," Caulfield said.

He said that researchers developed protocols to optimize the process and also implemented protocols so that after pathologists took a biopsy, some of that biopsy would be fresh frozen. He said they were able to convince pathologists to go this route after doing a pilot comparing sequencing results from fresh-frozen tissue, the optimized FFPE protocol, and the standard FFPE protocol. The difference in sequence data between fresh frozen and the standard FFPE was dramatic enough to convince researchers to go with fresh frozen when possible.

Another challenge is data standardization and interpretation, of both the phenotypic data and the sequence data. Caulfield said one major advance of the project was that for the rare disease pilot, the researchers had defined 56,000 human phenotype ontology terms, including nearly 13,000 that indicate the presence of a specific feature and 43,000 that indicate the absence of a feature.

A portion of the Genomics England project is its Clinical Interpretation Partnership (GeCIP), which seeks to bring together clinicians and researchers to more quickly interpret sequence data. The GeCIP is collaborating with researchers from 300 different institutions in 24 countries, working in various disease-specific domains, to interpret whole-genome sequence data. After a genome is sequenced, the data are sent through the GeCIP network for interpretation.

Both the PMI and the Autism Speaks project are collaborating with Google on developing data infrastructure, while Hudson said the PMI is working with Verily, Google's life science branch, to develop cloud-based data centers.

She also said the PMI is working with researchers at Genomics England to implement some of their best practices regarding their methodologies for making data available for both research and clinical purposes.

In addition, Hudson, Caulfield, and Scherer all acknowledged that to get even more value from their respective projects, the data should be in a format that allows them to be shared, compared, and studied across the projects.

"The sequencing is the easy part," Caulfield said.