Ambitious TGAC Collaboration Looks to Re-Engineer Genomics Research to Predict Phenotypes


CHICAGO (GenomeWeb) – The Genomic Ascertainment Cohort (TGAC), a nascent project of the National Institutes of Health and Falls Church, Virginia-based Inova Health System, is an ambitious attempt to re-engineer genomics research.

Announced at the beginning of March, the two-year TGAC pilot will collect 10,000 human genomes and exomes, then apply analytics to predict phenotypes from genes and gene variants. Researchers then will have the opportunity to examine the DNA sample donors to test the accuracy of the forecasts.

"That's where we really need to go in predictive medicine — sequence someone and use that to predict healthcare," said Leslie Biesecker, chief of the medical genomics and metabolic genetics branch at the NIH National Human Genome Research Institute, which is leading the TGAC pilot.

"We need a research model that matches that. To do that, we have to start with a lot of research participants who are sequenced and then see if we can make predictions and if those predictions are accurate," Biesecker explained.

The project will seek to reverse-engineer data on groups of individuals who have already been sequenced, repurposing genome and exome analyses.

Inova is contributing nearly 8,000 sequences of mother-father-child trios from its extensive Longitudinal Childhood Genome Study, while NHGRI and several other NIH institutes will share 1,000 genome and exome sequences from ClinSeq and other research programs, participants indicated. The other 1,000 sequences will come from newly recruited patients.

Biesecker hopes eventually to include others to create what he called a "virtual supercohort," but for the initial pilot, work will concentrate on the already large collection of 10,000 sequences.

All of the patients included in the TGAC have already given express consent to be called in for re-examination, which is key to the project. Researchers using the database will be able to request that patients with specific genotypes come into the NIH Clinical Center in Bethesda, Maryland, for phenotyping via blood tests and medical imaging.

"What we're doing is coupling [sequencing] to the NIH Clinical Center, which is arguably the best research phenotyping institution on the planet," Biesecker said. "We can then select a sequenced patient, bring them into the Clinical Center, and do research phenotyping on them, because that's extraordinarily difficult to do outside the research context," he continued.

"If I have a finding in the genome that I think predicts, let's say, an MRI finding in an individual, at a private hospital, who's going to pay for their MRI? That's a huge problem. That's exactly what the NIH Clinical Center is designed to do."

Researchers won't call in all 10,000 people, but they will have that cohort to choose from for recontact. "We're allowed to ask. We can contact them and explain to them what we are thinking about and what we're doing," Biesecker said. He added that while not everyone will agree, enough of them might to make the endeavor worthwhile.

"It's something that a lot of us in the field have wanted to do, and it's a fantastic adjunct when you can take a sequenced cohort and when you do have the ability to recontact them and follow up with them, you can do more and interesting things with the data," Biesecker added.

He said NIH has been following this model on a pilot basis for several years with the 1,000-person ClinSeq cohort. "They've been really fantastic participants and allowed us to do rather amazing things to try and do these kinds of clinical phenotyping experiments on them and with them," according to Biesecker.

The Inova Longitudinal Childhood Genome Study dates to 2012. Early in pregnancies, Inova offers this research opportunity to expectant mothers, said John Niederhuber, CEO of the Virginia health system's Inova Translational Medicine Institute.

"We follow them through pregnancy," Niederhuber said, adding that Inova takes occasional blood and tissue samples for various tests, including genomic sequencing. At the time of birth, the health system takes specimens from the mother, the father, and the baby. Virginia law requires newborn screening for 29 specific disorders. "We just tag along on that little sample for ourselves," Niederhuber said.

For those who have opted into the study, Inova follows up with the families about every six months with a questionnaire on environmental and health factors. Niederhuber reported that Inova has seen an 80 percent success rate in obtaining follow-up survey responses.

This runs for three years, or about the first 1,000 days of life, after which time Inova asks for consent for the families to continue in the study. "We did that just to be sure that from an ethical standpoint we were not imposing on them or not taking advantage of them," Niederhuber said.

Through February, Inova had 3,900 families in the study, and more than 1,400 whole-genome sequences, according to Niederhuber. Counting this and two other family studies, the health system has close to 8,000 WGS records on more than 2,500 mother-father-child trios at a minimum sequencing depth of 40x, and obtains an average of 30 to 40 new consents a month, he said.

This genome database has been on the NIH radar for some time. About two years ago, Biesecker and Richard Siegel, clinical director and chief of the autoimmunity branch at NIH's National Institute of Arthritis and Musculoskeletal and Skin Diseases, started talking with Niederhuber, a former director of the National Cancer Institute, about the work he was doing at ITMI. Biesecker, Siegel, and other NIH clinical leaders then visited the Inova center, Niederhuber reported.

"We've had many, many meetings to discuss what they were doing with a database called ClinSeq and how they might use that model and expand it significantly by bringing in other genomic databases," Niederhuber said. The NIH leaders also were thinking about how to apply analytics tools to harmonize the different databases for easy searching.

For TGAC, the NIH National Human Genome Research Institute is building and supporting all of the analytics, but the institute has received some significant help from Daniel MacArthur, codirector of medical and population genetics at the Broad Institute in Cambridge, Massachusetts. MacArthur has donated the web architecture and infrastructure for the Genome Aggregation Database, or GnomAD, to NIH for TGAC, Biesecker said.

"We have basically cloned that in the intramural genome project," Biesecker explained. "Now, we're populating it with, initially, the ClinSeq data set, which is what's in it now," he said. NIH will then add the 8,000 Inova sequences and pull in additional sequence data from other NIH cohorts. The Broad is not participating in TGAC.

NIH has to make a few minor modifications to the GnomAD architecture to accommodate these particular data sets. For example, much of the Inova data consists of trios of two parents and a child, something that MacArthur had not set up his system for. Otherwise, NIH will be using the technology as is. "We're not reinventing that wheel," Biesecker said.

ClinSeq data has been in the database since early March. It will take a few months to get the large Inova data sets onto the platform, the NIH geneticist said.

For the pilot, only NIH and Inova researchers will have access to the TGAC cohort. "We're modestly funded to start, and we can't open a floodgate like national or international [researchers] because we know we couldn't handle it. We have to walk before we run," Biesecker said.

Expansion might be in the longer-term plans. "We're starting it with these databases that we have available right now, but we're pretty confident that others will want to contribute their databases to this as well," Niederhuber said.

First, though, the two partners have to prove the concept.

"We're hoping, first of all, to show that this approach is a very viable, useful approach, a way of thinking how to go from a database of genomic information and work our way back to understanding clusters of variations [and] to see how those clusters of variations within the database relate to potential phenotypes," Niederhuber said.

Ultimately, they want to be able to make earlier, more accurate diagnoses, develop new interventions, and better manage patients. "We're all in this business because we want to make a difference in terms of patient care and diseases that affect them," Niederhuber noted.