Dana-Farber Informatics Team Launches GenoSpace to Link Genomic and Clinical Data via the Cloud


GenoSpace, a Cambridge, Mass.-based informatics startup, has launched a cloud-based platform intended to provide access to a variety of 'omic and phenotype data.

The company, which opened its doors in 2011, has taken advantage of "lessons" learned from compute infrastructure development efforts at the Dana-Farber Cancer Institute focused on linking clinical and genomic information, John Quackenbush, a professor of computational biology and bioinformatics at Dana-Farber and GenoSpace's CEO, told BioInform in an interview last week.

One difference between the Dana-Farber informatics efforts and the GenoSpace offering is that the company's "core infrastructure is built from the ground up using a cloud-based approach," Quackenbush said. The company offers what it says is a secure environment based on Amazon's cloud infrastructure for sharing and storing genomic data.

In addition to providing a general data access portal, GenoSpace works with clients to build customized research portals that will enable them to ask particular research questions. For example, a user might want to locate candidates in a disease cohort that have a particular mutational profile required for a clinical trial, Quackenbush explained.

Additionally, "we have built tools focused on the assays that people now regularly run as part of clinical practice in different diseases and the information that we can provide back to them help[s] them use genomic data with other sources of information — scientific literature, for instance — to make an informed decision," he said.

As an illustration, a clinician treating a colon cancer patient with a mutation in the KRAS gene could consult medical literature for treatment options and use that information to choose an optimal therapy, he said. The labels for the colorectal cancer drugs Vectibix and Erbitux recommend KRAS testing to pick out best responders.

GenoSpace is also considering pulling in publicly available information that it will make available to customers, he said.

Through its platform, the company seeks to establish online communities that connect physicians, individuals, and researchers who are interested in using clinical and genomic information, as well as to provide analysis and interpretation tools that support personalized medicine.

GenoSpace has begun developing a bespoke research portal for an undisclosed disease foundation, and is holding talks with a contract research organization and a pathology group, whose identities also aren't being disclosed, Quackenbush said.

He added that the company is "very interested" in signing new partners who would like to take advantage of its infrastructure.

In terms of pricing, the company has adopted a transactional pricing model in which the costs of access to data storage and analysis tools vary depending on the specific needs of the customer in question, he said.

Besides disease foundations, CROs, and pathology departments, other likely clients for GenoSpace's cloud power include pharmaceutical companies and information vendors, he said.

The Evolution of GenoSpace

Quackenbush co-founded GenoSpace with Mick Correll, the associate director of Dana-Farber's Center for Cancer Computational Biology. Correll serves as the company's chief technology officer.

Although the company's doors officially opened in 2011, "the system started to come together in a very real way earlier this year," Quackenbush said.

GenoSpace is intended to address what its founders see as a research bottleneck — "having the tools that you need to move all this data around" as well as to "get it into a useful form and pull out the bits that you need."

He said the company is built on "a philosophy that we've been developing here at Dana-Farber for a long time," which is to make data both "useful and usable."

"First, you have to make sure that there is information of value that is being provided--that is the 'useful' part. We have partners who can provide outstanding, reliable content that we can link to genomic alterations," he explained.

But that’s not all. "You also need the data and tools to be usable," he continued. "One of the groups we worked with sat down with us and opened the conversation by telling us that if we were going to show them another R command window as the portal to analysis that we might as well stop. What we showed them were tools that were designed to answer questions that users had in ways that most biologists found quite accessible."

When it comes to handling genomic data "you need both," he said. "Pretty interfaces with poor quality analysis won't fly. And outstanding analyses that nobody can run isn't of much value either."

Quackenbush told BioInform that the road leading to GenoSpace's launch began in 2006 when Dana-Farber received an Oracle grant that it used to build an integrated genomic and clinical data warehouse.

The grant was a two-year, $1 million Oracle Commitment Grant that the center used to develop a data warehouse that would store patient information securely. Around the same time, Dana-Farber partnered with InforSense — now owned by IDBS — to develop translational research informatics infrastructure that would give its researchers access to the clinical and experimental data in its repository (BI 4/18/2008).

"That early experience taught us a lot about things that worked and things that didn't work well in linking clinical and genomic data," Quackenbush said.

Dana-Farber later received a grant from the National Heart, Lung, and Blood Institute to build data collection and analysis infrastructure for the Lung Genomics Research Consortium, which would add genetic, genomic, and epigenetic data and analysis tools to an existing clinical biorepository at NHLBI (BI 2/11/2010).

That project, according to Quackenbush, "got us really involved in [the] process of trying to understand how we integrate clinical and genomic data to drive our understanding of disease."

Through that experience, "we recognized that there were some limitations in what we were doing," but, he added, on the plus side, the group also realized that "there was real potential to try and take what we had learned and apply it more broadly."

Furthermore, as the cost of sequencing continues to fall, putting access to genomic information well within the reach of an increasing number of individuals, "we recognized ... that having access to that kind of data was going to change the way we had to think about managing information for genomic studies," Quackenbush said.

That led Quackenbush and Correll to "reengineer and rethink some of the aspects of the systems we had built that were problematic and to really think through some of the ramifications of having access to genomic sequence," he said.

One of those ramifications that required consideration was the fact that "the whole concept of de-identification is rendered moot by having genome sequence data," he said.

That's because "whole-genome sequence data is, by definition, identifiable," he explained, pointing out that about 250 kilobases of sequence has enough SNPs to uniquely identify someone.

"What that means is that the data should be reliably encrypted, yet accessible, so that one can control access to the data and safeguard the security and integrity of the data," he said.

As a result, the first building block of GenoSpace's infrastructure was "a secure data archive" that allowed "controlled and rapid access" to the data.

That ruled out relational databases, Quackenbush said. "What you want is ... a modern web-based architecture that can scale to cloud-based infrastructure."

The next step was to ensure that the resource addressed the needs of the target user communities — patients and physicians, disease foundations, pharmaceutical companies, contract research organizations, and research scientists.

Besides access to data, these groups require tools that allow them to ask questions that are of interest to them and further their research efforts, Quackenbush noted. Therefore, developing a well-rounded tool meant understanding the kinds of research questions that users might want to ask of the data and then building tools that would allow them to do that.

For example, research scientists might want to explore the activity of a gene or pathway of interest in a cohort. GenoSpace enables that "by allowing them to define cohorts of their own" based on clinical and genomic information, he said.
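As a rough illustration of what defining a cohort from combined clinical and genomic criteria could look like, the Python sketch below filters a small set of made-up patient records by diagnosis, age, and required gene mutations. The `Patient` record, the `define_cohort` function, and the sample data are all hypothetical and are not drawn from GenoSpace's actual platform or interfaces.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Patient:
    """Hypothetical record combining clinical and genomic attributes."""
    patient_id: str
    diagnosis: str
    age: int
    mutated_genes: frozenset = field(default_factory=frozenset)


def define_cohort(patients, diagnosis, required_mutations, min_age=0):
    """Return patients matching the diagnosis, the minimum age, and
    carrying mutations in every gene listed in required_mutations."""
    required = frozenset(required_mutations)
    return [
        p for p in patients
        if p.diagnosis == diagnosis
        and p.age >= min_age
        and required <= p.mutated_genes  # subset test: all required genes mutated
    ]


# Illustrative, invented data (not real patients).
patients = [
    Patient("P001", "colorectal cancer", 61, frozenset({"KRAS", "TP53"})),
    Patient("P002", "colorectal cancer", 48, frozenset({"APC"})),
    Patient("P003", "lung cancer", 70, frozenset({"KRAS"})),
]

cohort = define_cohort(patients, "colorectal cancer", {"KRAS"})
print([p.patient_id for p in cohort])  # prints ['P001']
```

The same filter, with different criteria, would cover the clinical trial case described earlier, in which a user looks for candidates in a disease cohort with a particular mutational profile.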

The cloud also caters to the needs of larger collaborative projects that generate large quantities of information and require a solution that allows participants to move data freely amongst partner institutions and researchers, he said.

"That's a problem we are seeing time and again" in these large projects, he said.

Quackenbush acknowledged that companies like DNAnexus and Illumina, which offer cloud-based genomic storage solutions, are also attempting to tackle issues of data access and movement.

DNAnexus, in particular, will likely have its eye on GenoSpace, whose offering competes with its own. Quackenbush pointed out, however, that DNAnexus has historically catered to research markets and has only recently tried to break into the clinical arena.

In response to whether GenoSpace would offer some sort of consulting or data analysis service to customers, Quackenbush said that it isn't "something we thought about doing as a business."

Part of the underlying reason for that decision is that Dana-Farber already offers a genomic data analysis service through its Center for Cancer Computational Biology, which Quackenbush directs.

"I don't want to compete with what I am doing at Dana-Farber," he said. "What we want to focus on is the infrastructure to allow people to do research."