NEW YORK (GenomeWeb) – BioRealm Research's SmokeScreen Genotyping array for drug and addiction research is ready for heavy-duty usage in high-throughput studies and will soon be deployed in a genotyping study of over 50,000 samples from a National Institute on Drug Abuse biobank.
Two small business innovation research grants from NIDA funded development of the array, which analyzes 1,031 genes related to addiction using 273,493 biomarkers. It also incorporates the Affymetrix Axiom Biobank Array for genome-wide association studies, additional markers related to addiction, SNPs related to smoking and lung cancer, and further markers BioRealm thought might be important, such as loss-of-function and pharmacogenetic markers.
In all, researchers can look at 646,247 markers in each well of the 96-well plate.
Monument, Colorado-based BioRealm designed the array as part of an SBIR grant, but the firm involved Affymetrix early on in the development process.
"We evaluated the Affymetrix Axiom technology and it was a perfect fit for this array," Chris Edlund, bioinformatic principal at BioRealm, told GenomeWeb. The Axiom Biobank Array provides the backbone of the array, he said, and BioRealm went back and forth with Affymetrix for several iterations to come up with the final design.
"The design goal of the array was to cover all SNPs and genes associated with tobacco addiction as well as providing genome-wide coverage of multiple populations," including genetic variation found in European, East Asian, and African populations, Affymetrix COO Andy Last told GenomeWeb.
Affymetrix manufactures the array, which is built to run on the GeneTitan instrument. For the NIDA genotyping project, RUCDR Infinite Biologics, a unit of Rutgers University's Human Genetics Institute of New Jersey, will perform the genotyping.
Last added that Affymetrix will have a role in commercializing the array through its standard commercial channels.
In addition to designing the array, BioRealm developed the back-end software to process the data. The bioinformatics package, which runs on Amazon's cloud services, can also take in phenotypic information associated with a sample and integrate that with the genotype, Edlund said.
The project started in 2012, when BioRealm won an NIDA-sponsored Phase I SBIR grant to develop a genetic screening tool for research on tobacco dependency. BioRealm Co-founder James Baurley said the firm had experience in the array space, having been involved in consulting projects making custom Illumina arrays.
"The specifications were loose, so we developed a plan based on the general requirements that they laid out," Edlund said. "They mentioned a list of 5,000 SNPs that they were interested in. Those 5,000 SNPs made it onto the array, but we expanded it to almost 650,000."
In a pilot study conducted during the Phase II grant, the SmokeScreen array was able to identify a gene previously associated with nicotine metabolism, providing some proof that the tool would be useful in research.
Three considerations guided the array design, Edlund said. The firm wanted to provide coverage of genes associated with addiction, a genome-wide backbone, and other high-priority SNPs BioRealm had identified in the scientific literature. "Once we went to design it, we realized we could add a lot more," he said. The firm asked researchers in the field to nominate additional genes and SNPs of interest.
Additions included the National Human Genome Research Institute GWAS catalog and 3,000 SNPs associated with lung cancer.
The firm has validated the array by genotyping about 800 samples from multiple populations in collaboration with Rutgers, Edlund said. "The arrays performed very well," he said, and BioRealm is writing a manuscript that will report the validation statistics.
BioRealm's informatics pipeline first uses Affymetrix's best practices workflows to filter out poorly performing SNPs, Edlund said. After the quality control has been performed, it uses Minimac software for imputation. "It's designed to be flexible so that we can spawn up instances as needed," he said. Multiple instances running in parallel on the cloud can allow the software to finish running in a day.
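To illustrate what filtering out poorly performing SNPs can look like, here is a minimal Python sketch of one common quality-control criterion, the per-SNP call rate. It is not BioRealm's pipeline or the Affymetrix best practices workflow: the 0.95 threshold, the SNP names, and the data layout are all assumptions made for the example.

```python
from typing import Dict, List, Optional

# Genotypes coded as minor-allele counts (0, 1, 2); None marks a failed call.
Genotypes = Dict[str, List[Optional[int]]]


def call_rate(calls: List[Optional[int]]) -> float:
    """Fraction of samples with a non-missing genotype call."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


def filter_snps(genotypes: Genotypes, min_call_rate: float = 0.95) -> Genotypes:
    """Keep only SNPs whose call rate meets the threshold.

    The default threshold is a hypothetical value chosen for illustration.
    """
    return {
        snp: calls
        for snp, calls in genotypes.items()
        if call_rate(calls) >= min_call_rate
    }


# Hypothetical toy data: two SNPs genotyped in four samples.
example = {
    "snp_a": [0, 1, 2, 1],        # every sample called
    "snp_b": [0, None, None, 1],  # half of the calls failed
}
kept = filter_snps(example)
```

In this toy data only snp_a passes the filter, and the SNPs that remain are the ones that would be handed on to imputation.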
The software also does some simple GWAS statistical analysis, like providing association p-values for SNPs, Edlund said, as well as some more advanced statistics, which he declined to specify.
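As a rough illustration of a single-SNP association p-value, the Python sketch below runs a basic allelic chi-square test on a 2x2 table of allele counts in cases and controls. Edlund did not say which tests the software uses, so the choice of test, the function name, and the counts are assumptions.

```python
import math
from typing import Tuple


def allelic_chi_square(case_counts: Tuple[int, int],
                       control_counts: Tuple[int, int]) -> Tuple[float, float]:
    """Pearson chi-square test on a 2x2 table of allele counts.

    Each tuple is (minor allele count, major allele count).
    Returns (chi-square statistic, p-value) for 1 degree of freedom.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one allele")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square tail probability
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 100 cases and 100 controls, two alleles per person.
chi2, p_value = allelic_chi_square(case_counts=(60, 140),
                                   control_counts=(40, 160))
```

A genome-wide scan repeats a test like this for every marker, which is why GWAS results are usually judged against a much stricter significance threshold than 0.05.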
Researchers can download the data in standard formats, though Edlund said BioRealm's preferred format is for the PLINK toolset.
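For readers unfamiliar with PLINK, the Python sketch below formats genotype data in the toolset's text layout, a .map file describing variants and a .ped file with one row per sample. The article does not say which PLINK format BioRealm exports, so the text flavor is an assumption, and the sample, family, and variant identifiers are hypothetical.

```python
from typing import List, Tuple

# (chromosome, variant ID, genetic distance in cM, base-pair position)
Variant = Tuple[str, str, float, int]


def map_line(variant: Variant) -> str:
    """One .map row: chromosome, variant ID, cM distance, bp position."""
    chrom, variant_id, cm, bp = variant
    return f"{chrom}\t{variant_id}\t{cm:g}\t{bp}"


def ped_line(family_id: str, sample_id: str, sex: int, phenotype: int,
             genotypes: List[Tuple[str, str]]) -> str:
    """One .ped row: six leading columns, then two alleles per variant.

    Father and mother IDs are written as "0" (unknown). Sex is 1 for male
    and 2 for female; phenotype is 1 for control and 2 for case. A missing
    genotype is written as the allele pair ("0", "0").
    """
    fields = [family_id, sample_id, "0", "0", str(sex), str(phenotype)]
    for allele1, allele2 in genotypes:
        fields += [allele1, allele2]
    return "\t".join(fields)


# Hypothetical variant and sample, for illustration only.
variant_row = map_line(("1", "snp_a", 0.0, 1000))
sample_row = ped_line("FAM1", "S1", sex=1, phenotype=2,
                      genotypes=[("A", "G"), ("0", "0")])
```

The variants in the .map file must appear in the same order as the allele pairs in each .ped row.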
"We developed the bioinformatics pipeline to be flexible enough to use with any Affymetrix Axiom array," Edlund said, including both custom- and fixed-content arrays.
Baurley, the BioRealm Co-founder, said that the firm is currently exploring options to package its pipeline and analysis services with other Affymetrix Axiom arrays. "Some pieces of it could be modified for other diseases or complex traits," he said.