In May, Toby Bloom became the deputy scientific director for informatics at the New York Genome Center, a role in which she will be in charge of developing and implementing the analysis pipelines, infrastructure, and services that will support the center's projects.
Before she came to NYGC, Bloom was the director of informatics for the genome sequencing platform at the Broad Institute. In this role, she led efforts to develop infrastructure for analyzing, visualizing, and managing genomic sequence data. That infrastructure has been used in large-scale research projects at the institute and in the scientific community at large. Prior to the Broad, she was chief technology officer for Clinsoft Corporation and executive director for Phase Forward.
This week, BioInform spoke with Bloom about informatics infrastructure plans for the NYGC and her decision to leave the Broad after more than a decade there. What follows is an edited version of the conversation.
I'll start with the obvious question. Why did you decide to leave the Broad?
The New York Genome Center is a startup and there is just such an opportunity to have an impact here. I really like the fact that it's very clinically focused. The NYGC is a consortium of 12 founding hospitals and we now have some associate members as well. We are collaborating on patient studies with hospitals all the time, trying to figure out how genomics can be applied in the clinic. That’s the center of everything. That seemed really exciting to me.
How long were you at the Broad?
I was at the Broad for 11 years.
What are your responsibilities as NYGC's deputy scientific director for informatics?
I'm responsible for all of the bioinformatics services, which is all the analysis we’re doing on the sequencing done here and the analysis consulting for the collaborating hospitals, as well as all of the methods development. I have a team of computational biologists working for me and I’m responsible for the whole bioinformatics infrastructure, the software engineering, and the research computing and IT.
What sort of bioinformatics services will the NYGC be providing?
Most of the sequencing we do here is part of collaborations with researchers at the hospitals. We also do some fee-for-service sequencing. On all of the sequencing done here, we provide analysis through variant calling and annotation, and we will do interpretation if requested. Also, there are lots of researchers and physicians at our collaborating hospitals who are now seeing that maybe they could use genomics as part of their studies. So we are providing a lot of front-end help and consulting on that as well – everything from what kind of sequencing would be best for their study, and how to set up the study, through the analysis.
You mentioned that you'll be doing some methods development. Can you elaborate on what that will involve?
We are not … saying we are going to do methods development in any one particular area. Rather, we’re taking the approach of [determining what] we have to do to analyze these genomes or this sequence or get the results for this project. Where we find that we need to add to [existing] methods, then we do. Often that starts out as more manual work, exploring the data, and then eventually crystallizes into more standardized methods.
One thing we’re working on now is longitudinal analysis of RNA-Seq data, which is related to our pilot autoimmune disease project. Another area we’re interested in is integrating and analyzing disparate kinds of data. We're doing a lot of work right now in comparing existing methods and trying to figure out why the results you get from them are so different and why you get better results from one method versus another. That may also lead us to some methods development.
Let's talk a bit further about what methods you're comparing and how you are conducting the comparisons.
This actually started as trying to figure out what the best results were that we could get from analysis. We’ve been looking at somatic variant callers, structural variant callers, and fusion transcript tools for RNA-Seq. If you look at any three tools in one area, you realize you get very different answers from each. Sometimes it's obvious why you are not getting the same answers, and sometimes it's not. We’re trying, like everyone else is, to figure out how to resolve the differences, or under what circumstances to use one versus another.
We're just starting to build our automated pipelines right now and we’re trying to decide how to standardize. We'll often run several callers in the pipeline and give our customers back all of the different answers. They have all the data and if they think one is more trusted than another, they can use it.
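As a rough illustration of the comparison Bloom describes, the sketch below tabulates which callers support each variant and how much any two call sets overlap. It is not NYGC's pipeline: the caller names, the toy variants, and the choice of Jaccard overlap as the agreement measure are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations

def support_table(calls):
    """Map each variant to the sorted list of callers that reported it."""
    support = defaultdict(list)
    for caller in sorted(calls):
        for variant in calls[caller]:
            support[variant].append(caller)
    return dict(support)

def pairwise_jaccard(calls):
    """Jaccard overlap (shared calls / all calls) for every pair of callers."""
    overlap = {}
    for a, b in combinations(sorted(calls), 2):
        union = calls[a] | calls[b]
        overlap[(a, b)] = len(calls[a] & calls[b]) / len(union) if union else 1.0
    return overlap

# Hypothetical call sets; each variant is (chromosome, position, ref, alt).
calls = {
    "caller_a": {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 50, "C", "T")},
    "caller_b": {("chr1", 100, "A", "T"), ("chr2", 50, "C", "T")},
    "caller_c": {("chr1", 100, "A", "T"), ("chr3", 75, "T", "G")},
}

support = support_table(calls)
overlap = pairwise_jaccard(calls)

# Report every variant with its supporting callers, rather than picking one answer.
for variant, callers in sorted(support.items()):
    print(variant, callers)
for pair, score in overlap.items():
    print(pair, round(score, 2))
```

Returning the full support table, rather than a single merged call set, mirrors the approach of handing back every caller's answer and leaving the choice of which to trust to the recipient.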
Will you offer bioinformatics services to groups that aren't part of the NYGC consortium?
Yes, on a limited basis we will provide sequencing, bioinformatics, compute, and storage services to non-members. Organizations can also become members, which gives them access to NYGC scientific initiatives. [Also] we will host large datasets for the community.
Do you know which datasets you'll be hosting?
That’s open right now. There is an Alzheimer's dataset that we'll probably bring in; there's likely to be an autism dataset, and there is some autoimmune data we are bringing in. There are different consent issues on all these.
What sort of hardware do you have set up?
We have a standard Isilon parallel file cluster, and a Linux compute farm. It’s a very standard data center organization, but with the ability to provide compute and storage services for our members, and to host large datasets that many members would like to access, we become something of a community bioinformatics cloud.
Any reason you're not looking at the cloud?
A lot of our data can't go on the cloud for security reasons. The other reason is that I've done a lot of testing on the cloud. I had an NHGRI grant to move the Broad's primary pipelines to the cloud. I don’t feel like the standard cloud architecture is very amenable to sequencing pipelines. You'll see papers out there about how any one method works well on the cloud. When you string together 12 or 24 of those, all of a sudden the cloud isn't so efficient anymore. It doesn’t have some of the flexibility we need.
You mentioned earlier that you joined NYGC because of its clinical focus. Have you come up against any unique challenges of handling data in this space?
Security is a big challenge. I think we're still learning what the best way is to start using genomics in the clinic.
What are some projects you've worked on so far?
We’re working on several projects in a variety of areas. We’re preparing to start large clinical studies on glioblastoma and on autoimmune diseases. And I’ve been collaborating with a number of hospitals on a clinical data warehouse project. The autoimmune study is across multiple diseases, including rheumatoid arthritis, Crohn's disease, multiple sclerosis, and possibly lupus. We want to see what changes when there are flares in these diseases. When patients have a flare, we are going to look at baseline expression levels and then at the levels immediately preceding, during, and after the flare.
The ability to do longitudinal genomic analysis on patients is something not a lot of people have done. This may be a very different way of finding the mechanisms of disease. It also might give us early indications of when a flare is going to happen, in which case the doctor has the opportunity to intervene earlier and possibly lessen the severity or prevent flares. So we are going to see if we can find a way to predict flares earlier and to find better information about what the expression changes are during disease stages.
For the glioblastoma study, we’re going to enroll a number of recently diagnosed patients in the New York area. We’ll do whole-genome and RNA [sequencing], and we’ll try to match mutations in each patient to a combination of drugs. This is all about researching how genomics can best be used in the clinic.
We’re also in an informatics collaboration with six hospitals. We've submitted a proposal to host a repository for basic, anonymized clinical data that we’ll receive for large numbers of patients who are pre-consented. That will enable us to do easier cohort identification for new research studies and more easily do comparative effectiveness research across large numbers of patients. And we'll have the ability to collect data for prospective studies as well.
Do you have sufficient staff in house or are you hiring?
We are hiring. I am looking for bioinformatics scientists, software engineers with or without genomics experience, IT people, [and] bioinformatics programmers.