SAN FRANCISCO (GenomeWeb) – Arima Genomics, a startup that spun out of the University of California, San Diego, aims to tap into the growing market for studying long-range genomic interactions.
The firm has developed a kit, a modified version of the standard Hi-C protocol, that captures 3D chromosome conformation information, and it recently completed an early-access program. In addition, researchers participating in the international Vertebrate Genomes Project have been using Arima's technology to generate Hi-C proximity ligation data in the first phase of the project.
The firm has 11 employees and is currently funded primarily by federal grants, but it has also received external funding from industry partners such as Agilent Technologies. It may raise additional funding later this year, according to Siddarth Selvaraj, Arima's CEO and founder.
Arima joins two other companies — Santa Cruz, California-based Dovetail Genomics and Seattle, Washington-based Phase Genomics — that have developed Hi-C sequencing-based products and services to cater to the growing interest of researchers in capturing more than just sequence information. Having additional information about long-range genomic interactions could help with phasing, better assemblies, and understanding 3D genome structure. Both Dovetail and Phase launched Hi-C sequencing services early last year, but have since expanded to offer kits and products tailored to specific applications. Dovetail, for instance, has developed technology to identify structural variants from formalin-fixed paraffin-embedded samples, while Phase has developed a Hi-C product for metagenome assembly.
Selvaraj began working with Hi-C sequencing as a graduate student at UCSD's Ludwig Institute for Cancer Research and published a study in Nature Biotechnology in 2013 describing an approach for haplotyping using proximity ligation and sequencing.
The Hi-C sequencing technology enabled DNA sequence and structural information to be captured simultaneously and used to analyze phasing as well as gene regulation in the context of the genome's 3D conformation, Selvaraj said. But because the Hi-C protocol is "time-intensive and challenging," it "wasn't accessible to the broader community."
Thus, the goal of forming Arima was to further develop the technology to make it more user friendly and faster, he said.
The difference between Arima's technology and other Hi-C protocols, Selvaraj said, is that the process has been significantly modified to cut the turnaround time from around two days to six hours.
In addition, the team moved from using one restriction enzyme to multiple enzymes, which Selvaraj said helps generate more uniform coverage across the genome. Because Hi-C data are concentrated near the enzyme's cut sites, if just one enzyme is used, "close to 30 percent of the genome might be far from that cut site," he explained. Multiple enzymes cut at different sites, resulting in more uniform coverage.
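To illustrate the reasoning, the toy Python sketch below scans a random DNA sequence for restriction-site motifs and reports what fraction of positions lie more than a chosen distance from the nearest cut site, first with one motif and then with several. Everything in it is an assumption for illustration only: the uniform random "genome," the 1,000-bp threshold, the helper names `cut_sites` and `fraction_far_from_cut`, and the motifs themselves (AAGCTT, GATC, GAATTC), which are not Arima's actual enzyme combination.

```python
import bisect
import random
import re


def cut_sites(seq: str, motifs: list[str]) -> list[int]:
    """Sorted start positions of every occurrence of any motif in seq (hypothetical helper)."""
    sites: set[int] = set()
    for motif in motifs:
        # Zero-width lookahead so overlapping matches are all reported.
        for match in re.finditer(f"(?={re.escape(motif)})", seq):
            sites.add(match.start())
    return sorted(sites)


def fraction_far_from_cut(seq_len: int, sites: list[int], max_dist: int) -> float:
    """Fraction of positions whose nearest cut site is more than max_dist bp away."""
    if not sites:
        return 1.0
    far = 0
    for pos in range(seq_len):
        i = bisect.bisect_left(sites, pos)
        # The nearest site is either the one just before pos or the one at/after it.
        nearest = min(abs(sites[j] - pos) for j in (i - 1, i) if 0 <= j < len(sites))
        if nearest > max_dist:
            far += 1
    return far / seq_len


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the toy run is reproducible
    genome = "".join(rng.choice("ACGT") for _ in range(200_000))  # assumed uniform random sequence
    max_dist = 1_000  # hypothetical "too far from a cut site" threshold, in bp

    single = cut_sites(genome, ["AAGCTT"])                    # one illustrative motif
    multi = cut_sites(genome, ["AAGCTT", "GATC", "GAATTC"])   # illustrative motif mix
    print(f"one motif:    {fraction_far_from_cut(len(genome), single, max_dist):.1%} far from a cut site")
    print(f"three motifs: {fraction_far_from_cut(len(genome), multi, max_dist):.1%} far from a cut site")
```

Because a real genome is far from uniform, the toy's percentages will not match the roughly 30 percent figure Selvaraj cites. The point is only structural: adding enzymes can only add cut sites, so no position ends up farther from its nearest site than it was with a single enzyme.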
He noted that the firm's researchers essentially spent the first two years testing multiple combinations of enzymes to find a set that worked well together.
To reduce the turnaround time, they went through each step of the protocol to adjust temperature, volume, and concentration. The kit enables inputs as low as 100,000 cells.
According to Caitlin Castaneda, a postdoctoral researcher at Texas A&M University's animal genetics lab, who is an early-access user of Arima's technology, the kit was easy to use in a pilot study her group conducted on horse genomes. Ultimately, she said, the goal is to use the Arima Hi-C technology to study a complicated region within the horse genome where there appear to be interactions between genes.
For the initial pilot study, the group focused on a simpler, well-characterized region in which two genes interact to create a multicolored coat pattern known as the paint phenotype.
"It's very well documented, so the thought process was that if we can see that these two genes are interacting and behaving differently in the paint horse, we can use it to help us research the more complicated genes we're trying to look at," Castaneda said.
So far, the results look good, but the team is still analyzing the data, Castaneda said. She noted that the technique was easy to use and seemed to produce high-quality data, and that quality control checks at each step of the protocol helped ensure the process was working. The researchers had previously tried a different Hi-C technology and "consistently had problems," Castaneda said. She never figured out exactly what went wrong, but the group was never able to generate usable sequencing libraries with it.
Her group is also testing 10x Genomics' linked-read technology, and has data from both technologies on the same genome, but has not yet analyzed the 10x data or compared it with the Arima Hi-C protocol.
She said that it took her group between two and three days to go from raw sample to a prepared library.
Another early-access user, Hiruy Meharena, a postdoctoral fellow in Li-Huei Tsai's laboratory at MIT, said his group is using the Arima technology to study neurological disorders.
The data quality is good, he said, and, importantly, high-complexity libraries can be generated from low sample inputs. For instance, with an input of 2 million cells, the library complexity was consistent with what would be seen with 15 billion cells.
The other advantage, he said, is that the protocol is much shorter than the standard Hi-C sequencing protocol, which he called "very laborious," taking four to five days. Using the Arima technology, that has been reduced to around two days, including the six-hour Arima process and one day for library prep.
Meharena said his group has tested the Arima Hi-C technology on induced pluripotent stem cells and is now applying it to cells from the mouse brain to look at differences in gene expression and the influence of chromatin interactions. "We know that epigenetics plays a role, but we want to understand how that is linked with gene expression changes, so we're trying to explore whether there are long-range interactions," he said.
One area that could still be improved, Meharena said, is the bioinformatics analysis. Because the Arima method uses multiple restriction enzymes, currently available computational tools "have to be tweaked," he said, adding that he expects the firm to develop such tools.
Selvaraj said that currently, Arima's product is focused on two main applications — to scaffold contigs for better genome assemblies and to study chromatin conformation for epigenetic studies. But in the future, he said, the firm plans to develop its technology for other applications, such as phasing and understanding cancer translocations.
Despite the advances that have been made in long-read sequencing technologies from Pacific Biosciences and Oxford Nanopore Technologies, Selvaraj said, there is still a place for Hi-C. "I think these are two complementary technologies," he said. For instance, any sequencing technology will still be limited by the size of the DNA molecule, while Hi-C can provide chromosome-length information, as well as interactions between chromosomes. Hi-C also preserves information about genome structure from before the DNA is fragmented for sequencing, an important feature that he said researchers involved in the Vertebrate Genomes Project plan to use to study evolution.