Name: Andrew Brooks
Title: Chief operating officer, RUCDR Infinite Biologics
If 2012 was the year of the exome array, then 2013 just might be the year that biobank arrays dominate the market. Last month, Affymetrix and the UK Biobank announced an agreement that will see the array vendor genotype UKBB's collection of 500,000 samples (BAN 3/26/2013). In addition, Illumina last year launched a biobank genotyping array; and last week Fluidigm and the Rutgers University Cell and DNA Repository, one of the largest biorepositories in the US, disclosed that they would launch a biorepository-focused SNP panel later this year (see related story, this issue).
According to Andy Brooks, chief operating officer of RUCDR — which was recently rebranded as RUCDR Infinite Biologics — there will be more large biobank array projects to come. Brooks joined RUCDR eight years ago from the University of Rochester Medical School, where he was director of URMS' Functional Genomics Center and Microarray Analysis Core, and brought his knowledge of new technologies with him. In his time at RUCDR, Brooks has worked to automate and develop the repository's service infrastructure to provide high-throughput sample management and analysis for DNA, RNA, and protein-based technologies to hundreds of labs globally.
Last week, at the Human Genome Meeting and International Congress of Genetics in Singapore, BioArray News spoke with Brooks about the new Fluidigm panel, the challenges facing biorepositories, and Affymetrix and Illumina's new biobanking arrays. Below is an edited transcript of that interview.
Can you provide a general overview of RUCDR's activities?
We are the federal repository for many [US National Institutes of Health] institutes, including the National Institute of Mental Health, the National Institute of Diabetes and Digestive and Kidney Diseases, the National Institute on Drug Abuse, and others. We are the central genetics resource for any trials or large-scale clinical projects that go on within those institutes. We provide those same services to different foundations.
There are several main functions that a fully functional biorepository offers. The first is sample collection. We do all of the trial management, we have a communications team that deals with all of the investigators, the collection sites, the collection kit fulfillment, site visits, training, all of the things that it takes to get sample collection going. Even [contract research organizations] that we work with have to be trained for those specific protocols. Then there is sample processing. That is really key for making sure that we have high-quality biomaterials and renewable resources. We are responsible for all of the sample storage, and all of the distribution of those samples, with all the right chain of custody and approvals for requests. We have built out over the last five years a big analytical services structure.
There is often a big disconnect between distributing samples and the time to data. So, we do all of the genotyping, expression, and sequencing that people might need as a service, so that instead of requesting the samples, normalizing them, and shipping them out, they can request the output of data that they want from different collections. We provide the whole continuum of services for managing a clinical trial or large study, whether it be genetic analysis or proteomic analysis or what have you.
And the funding for that comes from the state and your partners.
We have three different sources of funding. We have federal funding, and that comes in the form of grants and contracts. The second source of funding is through foundations. Most of these are disease-related foundations. Those are contracts for sample logistics, storage, and the services we provide. And then the third source of funding is through our private clients: industrial, pharma, and biotech clients. Those are all fee-for-service contracts. Now, with a new contract with Biostorage Technologies and the formation of a joint Bioprocessing Solutions management business, those are all managed through Bioprocessing Solutions. As much as we know how to handle government contracts and grants, we don't have that same kind of business acumen in the private sector. Biostorage's clients include the top five pharma companies as well as 15 of the top 20 biotechs and pharmas. They know how to manage those relationships. We have the technical and service expertise, so it's a very good partnership.
Where do you get your samples?
We collect samples from hundreds of sites around the world. They come from major labs like LabCorp or Quest … from companies that do clinical data collection, like RTI or Westat … from private family doctors … or from central coordinating centers — academic centers that have some specific collection sites as a part of a major medical center, where they will collect the samples and send them to us. The one thing that is uniform among those is that they all use our sample collection kits.
During your talk, you mentioned that 98 percent of errors are due to sample collection issues.
When you look at family-based studies, or things that happen in a clinic or in a medical center, a lot of times tubes are mislabeled or put back in the wrong kit. There are errors in reporting. I would say that for the small percentage of overall errors that occur in the RUCDR, there is a percent and a half of sample identity error. That might not sound like a lot, but even if there is a one percent error rate, it's a big deal. Of that percent and a half or 2 percent error rate, 98 percent of the errors are from errors in field collection. The other 2 percent of that … error rate is processing contamination or sample mix-ups or something of that nature. We have to be able to proactively address any sample issues for any given sample.
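As a rough illustration of the arithmetic Brooks describes, the sketch below splits a 1.5 percent sample identity error rate into the 98 percent attributed to field collection and the 2 percent attributed to processing. The cohort size of 100,000 samples and the function name are hypothetical and chosen only to make the numbers concrete; they do not come from the interview.

```python
def split_identity_errors(n_samples, identity_error_rate, field_share):
    """Split expected sample-identity errors into field and in-house counts."""
    total = n_samples * identity_error_rate   # all identity errors
    field = total * field_share               # errors made at collection sites
    in_house = total - field                  # contamination, mix-ups in processing
    return total, field, in_house


# Hypothetical cohort size; the two rates are the ones quoted in the interview.
total, field, in_house = split_identity_errors(100_000, 0.015, 0.98)
print(f"identity errors: {total:.0f}, field: {field:.0f}, in-house: {in_house:.0f}")
```

Under these assumptions, roughly 1,500 of 100,000 samples would carry an identity error, about 1,470 of them introduced in the field and about 30 during processing.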
And you developed your own quality control tools, including the RUID panel.
The RUID panel is not only a QC panel that will tell the quality of the sample based on what SNPs are called, what that data looks like. It also qualifies that sample for downstream analysis, whether it is with quantitative PCR or microarrays or sequencing. But it also is an identity panel, so it has enough complexity so that you don't see the same reproduction of that panel in 2 x 10^24 individuals. It gives you ethnicity, it gives you parentage analysis, so in the case of family studies, you can confirm family structure from those 96 SNPs. For less than $5, you have a functional performance of that sample in perpetuity. And as long as the cost of the panel is less than the cost of the extraction, it's a bargain. For people not running the panels in volume, the panel might cost $10. But even for $10, before you go do an exome sequencing run for $1,000 or a microarray for a couple hundred dollars, or even qPCR in thousands of samples, to know that you are going to get quality data from that sample, and that sample is banked to be used and distributed over and over again, $10 is a very small investment.
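To make the idea of a combined quality and identity panel concrete, here is a minimal sketch of how two genotype profiles from a 96-SNP panel might be compared. It is not the RUID algorithm, which the interview does not describe. The genotype encoding (0, 1, or 2 copies of an allele, with None for a no-call), the function names, and both thresholds are assumptions made for illustration.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # assumed encoding: 0, 1, 2 allele copies; None = no call


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of SNPs that produced a call; a crude proxy for sample quality."""
    return sum(c is not None for c in calls) / len(calls)


def concordance(a: Sequence[Genotype], b: Sequence[Genotype]) -> float:
    """Fraction of matching genotypes among SNPs called in both profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same SNP panel")
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    return sum(x == y for x, y in shared) / len(shared)


def same_individual(a, b, min_concordance=0.95, min_call_rate=0.90) -> bool:
    """Hypothetical decision rule: both samples pass QC and genotypes agree."""
    passes_qc = call_rate(a) >= min_call_rate and call_rate(b) >= min_call_rate
    return passes_qc and concordance(a, b) >= min_concordance


# Toy profiles over 96 SNPs (synthetic values, not real genotypes).
reference = [i % 3 for i in range(96)]
resample = list(reference)
resample[10] = None                      # one failed call in the re-run
other = [(i + 1) % 3 for i in range(96)]  # differs at every SNP

print(same_individual(reference, resample))
print(same_individual(reference, other))
```

In this toy setup, a re-run of the same sample with one failed call is still accepted as a match, while a profile that disagrees at every SNP is rejected. A production panel would also need to handle genotyping error, relatedness, and population allele frequencies, none of which is modeled here.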
We also have a similar panel we have developed for gene expression. It's called the RUID GX Panel. That has been licensed to Wafergen.
When did you start using these panels, and what had been the QC process up until that point?
We filed the first patent for the RUID panel in 2009, and the panel was implemented in late 2008. The RUID GX Panel we filed in 2010. Before that, what we did was not unlike what a lot of repositories do now. We would run gels on DNA samples, we would do restriction enzyme digest, we would see nice banding. That might tell you that the DNA is clean enough to be cut by a restriction enzyme, which might say that it's okay for PCR, but there was no downstream validation that said that if you had a good-looking gel, you had a guarantee that you were going to get a good sequencing performance with the sample.
Think about it from a potential sample contamination standpoint. Even if you have high-quality DNA, if there is contamination in that sample, it wouldn't affect the gel results, but it might affect your ability to PCR amplify the sample. The biggest limitation for that approach is that you didn't know if it was the right sample. You could have high-quality DNA, but there was nothing to say it was the right sample.
What is your view on the new biobank arrays that have been developed by Affymetrix and Illumina?
I think these products are at some level addressing what we created with the RUID panel, but with a different cost and at a different level of data generation. The RUID panel doesn't measure any identifying personal information, although it is a panel that can tell if one sample is different from another DNA sample. It doesn't have the complexity to lead back to a subject. Even with ethnicity, using the RUID panel, we can place a person on one of four continents, but we can't specify any further and say, 'There were five people in the study that came from this town.' Don't get me wrong: with a certain level of bioinformatic ability, could you link a sample back to a person? You could, but with only a certain level of confidence.
The biobank or core array is a concept that is not only aimed at looking at sample performance, but also being able to generate limited discovery-based information for every sample. Both of these companies have lowered the level for cost and maximized the amount of content so you can look at pharmacogenetic markers. You can look at genome-wide associations. You can impute other SNPs. They basically made a cost-effective tool for screening large numbers of samples that you can actually make data assessments on. Not just identity, not just sample quality, but actually generating a data set off of those samples at incremental cost. It really depends on the study and what its goals are to see if those arrays are applicable.
So, I think these arrays enhance the discovery process, and that they don't replace the RUID panel. If the biobank arrays were $80 and the RUID panel was $80, I would say, 'Don't run both.' But for $5 you can make sure that the data you get with the biobank array is high quality. But the concept of biobank arrays I think is an excellent one. It allows you to choose the samples for deeper investigation more sensibly.
Recently, UK Biobank announced an agreement with Affymetrix to genotype 500,000 samples. Do you think we will see genotyping projects of similar size?
I think you ultimately will. I think you will see it in pharma and industry first, where we are talking with some of our industry partners who are interested in running biobank arrays on their new collections as they come in. So that process is happening, but people aren't hearing about it, because they don't typically make those kinds of announcements. This had been going on since before the UK Biobank announced its plans. It's already been in motion, it's something we've been running, but I think that announcements like this will help people who are either on the fence or not aware of what the potential is to move in that direction.
What it really boils down to is: Can you afford to spend on these collections another $60 or $80 a sample or not? What often happens with these large collections is that all the money goes into the sample collection, the data collection, but the biobanking part is a very small piece. And the studies don't often design the cost of running these arrays into their budgets. I think it is going to require a change in the thought process of putting these projects together, how institutions that are prospectively collecting samples consent their subjects and how they budget their collections. But I do think you will see more of these studies.
Would you like to see other biobanks adopt the RUID panel?
What we are hoping is that it will become a standard for quality control assessment. So that one day, when we connect with, say, the UK Biobank, we might have different collection approaches, use different chemistries for extraction, quantify our samples using different technologies, but if we could agree and normalize based on the functional quality of every sample, then we could share samples without concern. We could provide samples across both organizations, knowing that the results are going to be in play.
The issue here is that we are not concerned about the quality of the RUCDR and the UK Biobank, which are both very large. It is all of the hundreds of little biobanks and labs that don't operate with the quality control and assurance that we do and that the UK Biobank does, which leads to all the variability and questions. So, it's not just a proposal for a gold standard for large biobanks, but for anybody who is banking DNA samples and then anybody who is looking to qualify samples for clinical use, so that you know before that test that you have the right sample, and know the quality is going to give you a call you can trust.
We are happy to share. There is nothing terribly magical about this panel. This was a brute force effort. There is some ingenuity in the selection and algorithms we used for making these calls, but those are now all available, and really what we are hoping is that the community of people who are collecting samples for translational research, basic science research, will all use a single technology, so that I know when I get that sample, I can trust that sample.
How do you communicate with other biorepositories? Is it all informal or do you have meetings?
There is one society called ISBER, the International Society of Biological and Environmental Repositories, that has an annual meeting where people involved in biorepositories now go. This year it is in Sydney, Australia, in the first week of May. That is really the only loose association. I say loose, in that there is no accreditation, there is no regulation, it's just people getting together to share ideas. And, by the way, there are now biorepository meetings held by Cambridge Healthtech [Institute] and all of the European companies that do meetings because it is a very popular topic. There are probably about a dozen biorepository meetings you can go to a year if you want to. ISBER's annual meeting, though, is the central one that large biorepositories know about.
What the community really needs is regulatory oversight, and not through self-assessment tools. Different agencies in the US are already starting to recognize this. The College of American Pathologists does CAP accreditation for clinical labs, and now has an accreditation program for biorepositories, with detailed checklists, audits, and everything else that you need to know you are operating at a certain level. And they are continuing to evolve and develop that program. So, in addition to those kinds of meetings that people put together, like the meeting here, we are waiting to see how standardization programs and accreditation will be driven within the community. I do think there needs to be an international effort outside of ISBER. Although they have their working groups, ultimately the community needs regulations that we will have to follow.