Biobanking is changing rapidly, and it's in no small part due to the demands of systems biology. While small, university-centered banks have existed for decades, large-scale biobanks — whether tissue repositories or population databases — have recently been implemented all across the world. Many are also adding clinical annotation, genetic data, and increasingly genomic, proteomic, and other 'omics information. Population-wide biobanks exist in Iceland, the UK, Sweden, Canada, Estonia, Latvia, Singapore, and Japan. The UK Biobank is one of the most ambitious of these projects, intending to collect, store, and study the genetic information of 500,000 people with the hope of finding correlations between disease and lifestyle, environment, and genes. In the US, the Utah Population Database is the largest of these types of biobanks, housing data from generations of families for more than 8 million people.
Because biobanks not only collect and store specimens but also serve as a library of sorts for researchers wishing to work with those samples, they have many requirements. These repositories have always faced a number of challenges: ethical concerns, including informed consent; maintaining high-quality samples through good collection and handling techniques; syncing material information with donor clinical information; and protecting patient privacy once samples are in researchers' hands. Now, as demand from the systems biology community ramps up, biobanking practices are changing to deal with new hurdles. In fact, biobanking is becoming a science in and of itself.
A biobank's main job is to collect, store, process, and distribute biological specimens like tissue, blood, or urine, clinical data about those specimens, or both. In large-scale biology studies, all samples should ideally come from one biobank. But because genomic and proteomic studies require such vast numbers of samples, researchers draw from many banks — and it's a safe bet that those banks have handled their samples differently.
European initiatives are leading the way in what's called harmonization: making sure that all biobanks follow evidence-based standards for collecting, storing, and handling specimens. The goal, says Jennifer Harris at the Norwegian Institute of Public Health in Oslo, is to develop a common infrastructure that encourages sharing in order to make high-throughput work possible. "Getting the most out of the data will require a certain amount of sharing and data release," she says.
When it comes to maximizing sample use, one of the biggest challenges is making sure the banks are interoperable. "We really want things to be set up so that across all of these platforms, biobanks can talk with each other and work together," adds Harris, who is also the coordinator for PHOEBE, or Promoting Harmonization of Epidemiological Biobanks in Europe. Other projects working toward building a network of biobanks include P3G, or Public Population Project in Genomics, and BBMRI, or Biobanking and Biomolecular Resources Research Infrastructure. BBMRI is focused on the design and management of biobanks, standard protocols for sample handling, cataloguing and comparing information, and coordinated bioinformatics.
When he first started his lab several years ago, Philip Bernard says that the biggest concern was getting fresh tissue to run microarrays for breast cancer biomarkers. Bernard, who is the medical director of the Solid Tumor Molecular Diagnostics Laboratory at ARUP and an investigator at the Huntsman Cancer Institute of the University of Utah, started the fresh tissue biobank at Huntsman Hospital mainly because so little fresh tissue was available. In 2008, the project consented 3,500 cancer patients. "Five years ago, I realized how important it was … to have good quality specimen that we could get intact RNA from so we could do microarray [studies]," Bernard says. He adds that today, one of the biggest challenges is making sure that the quality of the tissue is what's expected based on what the investigator wants to use it for. To that end, he "set up standard operating procedures on how tissues are supposed to be collected, and what types of tubes they're supposed to be procured in," he says. "But in the end, even if you follow the protocol and you think that you've done everything right, there [are] still variables that you don't know for sure until you actually analyze the tissue to see if it's of the right quality."
Elisa Eiseman, a senior scientist with the RAND Corporation and advisor to the National Biospecimen Network Design Team, was contracted by NCI to draft the first report in the US analyzing what's needed for a high-quality biobank network. She conducted case studies of 12 existing human tissue banks to identify best practices for optimizing genomics- and proteomics-based research. Taking into account everything from sample collection and processing to bioinformatics and privacy, ethical, and consent issues, she found that the largest concern was standardization. "The biggest problem is the way one biobank does its collection, processing, and storage may not be the same as another biobank. And when researchers use tissue or samples from those two different biobanks, the results may not match because they weren't collected, processed, and stored in the same way," she says. "I think what everyone's trying to do is to set out some guidelines and rules for biobanks to follow so that there is some kind of standardization across biobanks."
For large-scale studies, increased standards mean decreased variability. When it comes to determining whether variability seen in an assay is due to actual sample variability or variability from different biobanks, standard protocols are key. In another study that Eiseman is working on for NCI's Office of Biorepositories and Biospecimen Research, she's looking at what she calls "pre-analytical variables," or variables that might be introduced anytime between when the sample is taken and when it ends up in the researcher's hands. "If you're trying to determine changes that might be due to a cell becoming cancerous but you're seeing all these changes, how do you know whether it's because the cell is a cancer cell or because it was sitting on the bench top for too long?" Eiseman says. "Standardization is going to go a long way."
Biomatrica CSO Rolf Müller thinks that the sample volume needs of high-throughput studies are going to force biobanking to become "green" in order to be sustainable. His company has developed a technology for long-term dry storage based on extremophile biology, which ultimately could replace the costly and energy-consuming freezer or nitrogen-storage techniques that are needed today for reproducible and reliable experimental results. "It is a very big energy and carbon footprint, and it's also very, very expensive," Müller says of current biobanking storage techniques. "In the systems biology environment, we look at thousands of genes, thousands of proteins. In order to make good analysis, you need samples that are not partially degraded … so sample stability is key." So far his company has applied the techniques to DNA and RNA, and is working on applying it to proteins.
Biofx to the max
Going hand in hand with sample collection, storage, and handling come the bioinformatics challenges of not only associating phenotype and study data with samples, but also making the bank's information accessible to clinicians and researchers. Now that sample sizes have gotten so large and banks continue to house more and more data, including genotypic, proteomic, clinical, and demographic information, most experts agree that bioinformatics infrastructure and expertise are pressing concerns. "It's absolutely essential, and it's key," says Harris. PHOEBE's Databases and Biobank Information Management Systems platform is being put to good use — how samples are traced and how the metadata are integrated into the biobank information management systems is especially important. LIMS have given way to BIMS, or "biological information management systems, [which] are much more extensive in terms of the kinds of information that are in those systems," Harris says, noting that these have put a higher demand on researchers in terms of knowing how to use them.
Linking clinical information to these samples is also becoming a daunting task, considering the amount of available information from large population databases and the huge sample sizes. Getting consent from patients, which lets researchers link clinical information to samples, requires a biospecimen tracking database that's both manageable and searchable, Bernard says. The backlog has been in entering data from pathology reports, which are usually written by hand. "Getting the information into the database in a way that it's searchable has been a real challenge, and I think most of the institutions around the country have struggled," Bernard says. "[Bioinformatics] is huge and it continues to be needed."
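The kind of searchable biospecimen tracking database Bernard describes can be illustrated with a minimal sketch. The schema, field names, and sample records below are hypothetical assumptions for illustration, not a description of any institution's actual system:

```python
import sqlite3

# Hypothetical, minimal schema for a searchable biospecimen tracking
# database; real BIMS hold far richer clinical and 'omics metadata.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE specimens (
        specimen_id   TEXT PRIMARY KEY,
        pseudonym     TEXT NOT NULL,      -- de-identified donor code
        tissue_type   TEXT NOT NULL,
        collected_on  TEXT NOT NULL,      -- ISO date of collection
        rna_intact    INTEGER NOT NULL    -- 1 if RNA passed quality control
    )
""")
conn.executemany(
    "INSERT INTO specimens VALUES (?, ?, ?, ?, ?)",
    [
        ("S001", "P-1042", "breast tumor", "2008-03-14", 1),
        ("S002", "P-1042", "normal breast", "2008-03-14", 0),
        ("S003", "P-2219", "breast tumor", "2008-06-02", 1),
    ],
)

# A researcher's query: breast tumor specimens with intact RNA,
# the kind of tissue suitable for microarray work.
rows = conn.execute(
    "SELECT specimen_id, pseudonym FROM specimens "
    "WHERE tissue_type = 'breast tumor' AND rna_intact = 1"
).fetchall()
print(rows)  # → [('S001', 'P-1042'), ('S003', 'P-2219')]
```

The point of the sketch is the searchability: once pathology data are entered in structured fields rather than handwritten reports, a one-line query replaces a manual records search.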
Ethics for all
Ethical concerns surrounding patient privacy have always been important. While traditional biobanks face similar issues, scaling them up for large-scale studies presents new challenges around informed consent, patient privacy and confidentiality, and the sharing of research or clinical benefits.
One of the first steps to obtaining patient samples is getting informed consent, which traditionally means patients agreeing to how their samples may be used. As banks grow in size and samples are used for increasing numbers of studies, the concept of consent is changing. Genome-wide association studies will soon be followed by proteomic, transcriptomic, and metabolomic assays as researchers try to make the most of samples. According to Harris, "This kind of traditional idea of informed consent is difficult to fulfill if your definition of 'informed' is that people know how their samples are going to be used because sometimes you don't know how they're going to be used." The idea of broad consent doesn't always pass muster with IRBs and ethics review boards, she adds.
Additionally, with more and more samples being used in large GWAS and other studies, patient privacy is a big concern for many donors. "Consent [from] patients allows us to link information from their [sample] material to clinical information," Bernard says, so there has to be some method to de-identify samples for research purposes. Elisa Eiseman says that informed consent has always been an issue, and while it's especially important for bigger banks that collect so many samples, a larger issue is standardizing collection, storage, and de-identification processes to make sure patient information isn't somehow leaked between biobank and researcher. "What people worry about when you start talking about genomic studies is that once you have enough information from a single person, can you easily identify that person?" Eiseman says. "The whole idea of privacy, confidentiality — people worry about that more."
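One common way to de-identify samples while preserving the ability to link them to clinical data is keyed pseudonymization, in which a secret key held only by the biobank maps patient identifiers to stable codes. The sketch below is illustrative; the key handling, identifier formats, and function names are assumptions, not the practice of any specific biobank:

```python
import hashlib
import hmac

# Illustrative secret held only by the biobank's honest broker; in
# practice this would live in secure key storage, never in source code.
SECRET_KEY = b"held-by-the-biobank-only"

def pseudonymize(patient_id: str) -> str:
    """Derive a stable, non-reversible pseudonym from a patient ID.

    Researchers see only the pseudonym; the biobank, holding the key,
    can regenerate it to link new clinical data to existing samples.
    """
    digest = hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256)
    return "P-" + digest.hexdigest()[:10]

# The same patient always maps to the same code, so clinical records can
# be joined to specimens without exposing the real identifier.
assert pseudonymize("MRN-000123") == pseudonymize("MRN-000123")
assert pseudonymize("MRN-000123") != pseudonymize("MRN-000456")
```

Because the mapping is keyed rather than a plain hash, an outsider cannot recover pseudonyms by guessing patient IDs, which addresses part of the re-identification worry Eiseman raises; it does not, however, protect against re-identification from the genomic data itself.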
Virtual Biobanking: The Serious Adverse Events Consortium
While large repositories continue to hoard increasing numbers of samples, not everyone is jumping on that bandwagon. The future of biobanking, according to Arthur Holden of the Serious Adverse Events Consortium, is targeted collection — not housing thousands of samples that may never be used, but going out and finding specimens for specific studies. As the founder of the SAEC, which was set up in 2007, he's amassed a large network of different academic institutions and pharmas that provide hard-to-access samples for genotyping studies. They've focused on two serious adverse events so far — drug-induced liver injury and serious rash — and they collaborate with Expression Analysis to run the genome-wide association study. Their goal is to identify and validate genetic variants that may be predictive of drug-associated serious adverse events, says Holden, or "something that is severely debilitating, if not lethal, and typically requires the immediate cessation of the drug."
Because it's not easy to collect samples from such rare cases, Holden says it made sense to set up the SAEC as a network, rather than spend years trying to acquire the necessary number of samples on their own. He works out of his office in Chicago, while the partner institutions span the globe, and the genotyping facility is in Durham, NC. "The reason it came about is that no one organization — no pharmaceutical company, the government, no single health provider — has the scale because of the rarity of these events to aggregate the necessary cohorts both in terms of ethnic variation associated with the phenotype or the differences across and within drugs that may be associated with the phenotype," Holden says.
The samples for phase I of the drug-induced liver injury study, which now has almost 500 patients, came from leading academic institutions across Europe; for serious skin rash, the major contributor was GlaxoSmithKline. Building a collaborative bank underscores the reality that biobanking for 'omics studies is becoming more of a team sport. "Instead of saying, 'We're just going to bank a whole bunch of people and figure out what science we're going to do off of that,' we said, 'No, we want to do some science and so we're going to build a network to yield the bank to enable us to do that science.'" Holden sees future steps toward Web-based consent and enrollment, as well as expanding the use of cases generated in pharma clinical trials.