NEW YORK (GenomeWeb) – As the number of large-scale research initiatives grows, the Garvan Institute of Medical Research in Australia has invested in high-performance computing infrastructure from Dell EMC to support its Data Intensive Computer Engineering (DICE) group's efforts to analyze high-throughput genomic data.
Garvan began building its base infrastructure in 2012 and has expanded the system with the help of grants as its sequencing activities have grown, according to Warren Kaplan, the institute's chief of informatics. While Garvan is not disclosing how much it spent on the Dell system, Kaplan said that the latest expansion represents an investment that matches its previous expenditure on high-performance computing infrastructure over the years.
Part of the incentive for expanding the infrastructure came from the institute's involvement in several large-scale genomics and precision medicine initiatives, he said. An additional push came from the establishment of a single-cell genomics facility at Garvan in partnership with Israel's Weizmann Institute of Science in 2016, which requires greater capacity to analyze several million single cells, Kaplan said in an interview, "and [we want] to be able to support analytics around that." Furthermore, Garvan supports an internal bioinformatics community of around 80 researchers with various analytics needs that will benefit from the hardware expansion, he added.
Garvan's updated system includes Dell PowerEdge servers, Intel Xeon processors, NVIDIA graphics processing units, and field-programmable gate arrays (FPGAs). It offers 744 terabytes of NVM Express storage to accelerate in-memory processing of genome datasets, as well as 41 terabytes of random access memory. "In terms of actual computing cores we've expanded to over 5000 compute cores from our previous 2000," Kaplan said in an email. "However, the overall performance of the new machines is way better than before, and we expect to see far more performance than a linear scaling that the number of cores would suggest."
Prior to selecting Dell's infrastructure, Garvan received several proposals in response to tenders for potential partners, and Dell offered "the most competitive and most complete" proposal, Kaplan said. Garvan currently has a grant from Microsoft that supports genome analyses on the Azure cloud and a partnership with Australia's National Computational Infrastructure for some of its computational needs. However, some of Garvan's analysis needs are not met by its current partners, he said.
"For example, we run both traditional HPC jobs and Spark jobs on our single infrastructure [and] to support this, we have around 20 terabytes of fast storage on each node," Kaplan explained. "Our partners at the national supercomputing facility don't support this type of hardware configuration, or a Spark cluster." Although it is possible to run these types of analyses on commercial clouds, this system is designed to support research projects, which often don't have the budget to pay for cloud costs, he said.
The new infrastructure offers opportunities for Garvan's researchers to explore some new functionality in greater detail. For example, they can use FPGAs to speed up Garvan's iteration of the Broad Institute's Genome Analysis Toolkit and haplotype caller, Kaplan said. "Our attitude has always been [to] take full ownership of everything" from the hardware to the operating systems to the cluster management and software platforms built to run on the infrastructure, he said. "We purchase all our hardware with support upfront [and] we don't purchase any other type of software support, because we make it our business to fully understand and maintain all our technologies ourselves."
The infrastructure will also support Vectis, an open-source software platform that Kaplan and his team designed to help biologists, clinicians, researchers, and data scientists store, analyze, and query genomic data efficiently. Users can search for information by chromosome coordinates, gene names, and annotation. They also have access to a Beacon, which can be used to locate variants from studies in the Global Alliance for Genomics and Health's network. In addition, the developers have created filtering capabilities, currently in beta, that will enable users to categorize patients based on clinical attributes, as well as search for specific genotypes at the individual level.
The impetus for building the Vectis platform grew out of funding provided by the New South Wales government's office of Health and Medical Research to set up the Sydney Genomics Collaborative in 2014. Through this initiative, the government is funding whole-genome sequencing across three programs, comprising about 10,000 whole genomes.
Programs under the auspices of the collaborative are the Medical Genome Reference Bank, which houses about 4,000 whole-genome sequences from healthy elderly individuals that are intended for use as controls in disease-specific studies; the NSW Health Collaborative Genomic Medical Research Grants Program, which funds whole-genome sequencing projects focused on diseases; and the NSW Cancer Genomic Medicine Program, which focuses on early detection, prevention, and management of cancer.
To support the collaborative's efforts, Garvan developed the Vectis platform as a hub for housing datasets from the project, Kaplan said. Vectis is also used by Genome.One, Garvan's wholly owned subsidiary, to integrate genomic and phenotypic information as part of its whole-genome sequencing service offering.
The platform has also been tapped to support the Australian Genomics Health Alliance (AGHA), a national initiative that brings together 80 organizations to explore applications of genomics within healthcare. It currently houses information from about a dozen patient cohorts associated with AGHA that together include over 8,200 patients and participants. AGHA's iteration of the platform, called Vectis Variant Atlas, currently has sequence data from over 4,600 genomes – about 700 terabytes of data. Earlier this year, the Alliance kicked off a program called Acute Care Genomics, supported by Edico Genome, which aims to bring rapid genomic testing to 12 neonatal intensive care units and six pediatric ICUs in Australia.
Other projects that will benefit from the new hardware include the Lions Kids Cancer Genome Project, a joint initiative of the Garvan Institute, the Lions Clubs International Foundation, and the Australian Lions Childhood Cancer Research Foundation that is focused on providing whole-genome sequencing and analysis for children with high-risk cancers.
The infrastructure will also support the Australian Genomic Cancer Medicine Program, which seeks to identify new treatment options for patients with rare cancers or patients who have exhausted other treatment options, through the Molecular Screening and Therapeutics clinical trials initiative. The program recently received $50 million in funding to expand to all states and territories in Australia.
In addition, the new hardware will support some of the Garvan Institute's internal genomics initiatives and projects, Kaplan said. These include research projects at the Kinghorn Center for Clinical Genomics, which seeks to improve the interpretation of genomes and genome variants to further the use of genomic information in patient care. Researchers from the center recently published a pair of studies that compared the ability of whole-genome sequencing and of more targeted approaches to diagnose hereditary cardiovascular disorders.
"Our enthusiasm for this expansion [is] not just about meeting [the demand] that we have right now," Kaplan said, "but also to be able to design and understand what the actual requirements and best approaches are that are needed." One of the advantages of having the infrastructure in-house is that "we [can] optimize the analysis solutions, and then share our experiences with others, like our partners, the national supercomputing facility, who can adopt our exact specifications," he said.