NEW YORK – Biostate AI this week exited stealth mode by unveiling a total RNA-seq service alongside Omics CoPilot, a natural language-based AI tool for RNA-seq analytics. The company hopes the two offerings will make RNA-seq cheaper and more accessible for academic and pharmaceutical researchers.
The Houston-based company, which has raised $4.2 million in venture capital funding and has 16 full-time employees, is also generating longitudinal, organ-specific gene perturbation data on humans and animals for its OmicsWeb RNA-seq repository. It hopes the repository will help potential customers select patients for clinical trials by determining their likelihood of experiencing adverse treatment effects.
OmicsWeb is "basically an interface [for] the RNA-seq data we're collecting [and] access to the tools that we are internally using to analyze that data," said Ashwin Gopinath, Biostate's chief technology officer.
Biostate is collecting a wealth of longitudinal RNA-seq data from animal models and de-identified human samples. The company had been sourcing animal samples from China-based WuXi AppTec, although it recently switched to US vendors Charles River and Jackson Labs because of uncertainty surrounding the US Biosecure Act. Biostate acquires human cell and tissue samples from Discovery Life Sciences and several nonprofit organizations.
Gopinath said that relatively little data currently exists to answer questions about how the transcriptome changes over time in response to continued administration of a given drug.
"The cost of RNA-seq is just too much for somebody to explore rich datasets like this," he said.
Gopinath explained that Biostate AI has found a way to significantly reduce the cost of RNA-seq through its patent-pending Barcode Integrated Reverse Transcription (BIRT) library prep technology, which allows scalable profiling of both coding and noncoding RNA.
In BIRT, primers with predetermined secondary structures prevent the self-folding that can otherwise block reverse transcription of RNA sequences. A poly-N primer sequence then enables binding to both mRNA and noncoding RNA. This is in contrast to most other library prep techniques, which use poly-T primers to capture only mRNA.
CEO David Zhang said that BIRT enables the application of sample barcodes during the reverse transcription step so that up to 96 samples can be pooled together for subsequent purification and library preparation. Reducing library preparation costs delivers more bang for the buck than reducing sequencing costs alone, Zhang noted.
"The sequencing is only about 10 to 15 percent of the overall cost," he said. "So what we've been doing is developing new technologies for reducing the cost of everything upstream of the actual Illumina sequencing itself."
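The pooling that Zhang describes depends on each read carrying a barcode that identifies its sample of origin, so that reads can be sorted back out after sequencing. The Python sketch below illustrates that general idea only. The 4-nucleotide barcodes, the sample names, and the `demultiplex` function are hypothetical, and Biostate has not described its barcode design or software here.

```python
from collections import defaultdict

# Hypothetical 4-nt sample barcodes (illustration only; real designs are
# typically longer and are not described in the article).
BARCODES = {"ACGT": "sample_01", "TGCA": "sample_02", "GATC": "sample_03"}
BARCODE_LEN = 4


def demultiplex(reads):
    """Group pooled reads by the sample barcode at the start of each read."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        # Reads whose barcode is not recognized are set aside, not guessed.
        sample = BARCODES.get(barcode, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Toy pooled library: two reads from sample_01, one from sample_02,
# and one read with an unreadable barcode.
pooled = ["ACGTTTGACC", "TGCAGGCATA", "ACGTCCATGA", "NNNNACGTAC"]
result = demultiplex(pooled)
for sample, inserts in sorted(result.items()):
    print(sample, inserts)
```

Because the barcode is attached before pooling, every downstream step can in principle be run once on the pooled material rather than once per sample, which is where the savings Zhang points to would come from.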
Zhang said that Biostate has so far only tested its BIRT technology on Illumina platforms but that it should be compatible with all sequencing platforms. The company plans to test it on the Ultima Genomics platform shortly.
Zhang said that Biostate's overall cost of RNA-seq, from sample to analysis, is between $100 and $240 per sample, whereas other companies and core facilities often charge between $360 and $800 per sample.
The cost of conducting an RNA-seq experiment from sample preparation to analysis can vary widely based on specifics such as sequencing depth, the type of RNA being assayed (mRNA, miRNA, total RNA, etc.), and library preparation method. Price ranges posted by core sequencing facilities at the University of Pennsylvania and Boston University, for example, estimate the cost of library preparation alone at approximately $250 per sample. The University of Pennsylvania and the University of Pittsburgh core facilities listed bioinformatics analysis costs ranging from approximately $390 to $3,000 depending on experimental needs.
Biostate is also collaborating with synthetic biology company Twist Bioscience to develop and refine even higher-efficiency sequencing methods, although Zhang said that it is too early to disclose further details of that collaboration.
Biostate's stripped-down approach to RNA-seq also means that network biology algorithms are used to reconstruct the whole transcriptome from sparse sequencing data, a process that the company refers to as "AI in-painting."
The idea behind this method, Zhang explained, is akin to an AI model such as DALL-E completing an image in which a section has been "blotted out."
"We can do the same thing with RNA sequencing," he said. "We can blot out about 15 percent of the genes and be able to accurately reconstruct the part that's blotted out."
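The general idea of reconstructing "blotted out" genes from the remaining ones can be illustrated on synthetic data. The sketch below is not Biostate's method, which the article does not describe in detail. It swaps in plain ridge regression, and it assumes simulated expression values driven by a few shared latent factors, so that genes are correlated enough to predict one another. All names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "expression" matrix: 200 samples x 100 genes, generated from
# 5 shared latent factors plus noise (an assumption, not real data).
n_samples, n_genes, n_factors = 200, 100, 5
factors = rng.normal(size=(n_samples, n_factors))
loadings = rng.normal(size=(n_factors, n_genes))
expr = factors @ loadings + 0.1 * rng.normal(size=(n_samples, n_genes))

# "Blot out" 15 percent of the genes; the rest remain observed.
masked = rng.choice(n_genes, size=15, replace=False)
observed = np.setdiff1d(np.arange(n_genes), masked)

# Learn the mapping on reference samples, then reconstruct held-out ones.
train, test = expr[:150], expr[150:]


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)


weights = fit_ridge(train[:, observed], train[:, masked])
reconstructed = test[:, observed] @ weights

# Agreement between reconstructed and true values of the masked genes.
corr = np.corrcoef(reconstructed.ravel(), test[:, masked].ravel())[0, 1]
print(f"correlation on masked genes: {corr:.3f}")
```

The reconstruction works here only because the simulated genes share strong common structure. How well any such approach holds up on real transcriptomes is the question the skeptics quoted below raise.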
Biostate is in the process of submitting its methodology to academic journals and has posted a manuscript describing it on bioRxiv.
Some researchers expressed skepticism regarding the AI in-painting method described in that preprint.
Federico Giorgi, a professor of pharmacy and biotechnology at the University of Bologna in Italy, who employs systems biology approaches in his research, said via email that as currently written, he found the method itself and its specific applications unclear.
"It is not clearly described, it is not tested, and it is not available in any usable form [such as] Python or an R package," he said.
In particular, he pointed to an analysis of single-cell expression data as an instance where the underlying biology might need to be better understood before applying Biostate's informatic methods to it. In that analysis, the model showed suboptimal negative predictive performance, which the authors attributed to inherent bias in the dataset, "where 89.2 percent of the raw molecular expression counts are 0, and another 7.5 percent are 1," the authors wrote.
"It's not bias," Giorgi said. "It's the nature of the cell."
Reliable gene regulatory network training data, Giorgi explained, comes from large, high-quality datasets, which are not available in the single-cell context because of technological limitations and the dropout effect. In dropout, a given gene is observed at a moderate expression level in one cell but is not detected in another cell of the same type from the same sample. This causes underestimation of the gene's expression level and overestimation of variation in the data, which may generate false-positive results.
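The consequences of dropout that Giorgi describes can be seen in a small simulation. The sketch below assumes Poisson-distributed counts for a single gene and a fixed, made-up 40 percent chance that the gene goes undetected in any given cell. Neither figure comes from the preprint or from real single-cell data.

```python
import numpy as np

rng = np.random.default_rng(1)

n_cells = 5000
true_mean = 5.0      # assumed true expression of one gene, in counts per cell
dropout_rate = 0.4   # assumed probability that the gene is missed in a cell

# True counts in each cell, then the counts actually observed after dropout.
true_counts = rng.poisson(true_mean, size=n_cells)
detected = rng.random(n_cells) >= dropout_rate
observed_counts = np.where(detected, true_counts, 0)


def coefficient_of_variation(x):
    """Standard deviation relative to the mean, a unitless spread measure."""
    return x.std() / x.mean()


print(f"mean expression: true {true_counts.mean():.2f}, "
      f"observed {observed_counts.mean():.2f}")
print(f"coefficient of variation: true {coefficient_of_variation(true_counts):.2f}, "
      f"observed {coefficient_of_variation(observed_counts):.2f}")
print(f"fraction of zero counts observed: {(observed_counts == 0).mean():.2f}")
```

In this toy setting the observed mean falls below the true mean while the relative variation rises, matching the underestimation and overestimation described above.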
Gopinath agreed that the high proportion of zero values in single-cell datasets is not so much a bias as it is a characteristic of those datasets. However, he said that the company's primary aim in using those datasets was to demonstrate that it could successfully apply the architecture of its in-painting method to a biological setting.
"This proof of concept was crucial for laying the groundwork for more advanced applications," Gopinath said. "By testing on data with high proportions of zeros, we can identify areas where our model needs improvement to better handle the unique characteristics of single-cell data."
One of the more advanced applications that Biostate is exploring is transfer learning, in which information gained through experiments in one model organism can be directly applied to another.
The company has posted another proof-of-concept paper on this topic to bioRxiv while it awaits peer review.
"Our ultimate aim is to develop methods that can work effectively with inherently sparse data, and various data sources [such as] cells, tissues, [and] species, while also leveraging the architecture's capabilities for transfer learning and cross-species predictions," Gopinath said. "These approaches have the potential to uncover biological insights that might be missed by methods that struggle with the characteristics of a single data source or are limited to single-species analyses."
Zhang said that while Biostate continues to work on proving the reliability of its core transcriptomic methodology, the company has also bolstered its portfolio by recently licensing intellectual property from the California Institute of Technology related to developing scalable methods for omics beyond RNA, including DNA methylation and proteomics.
Zhang said that labs at Stanford University and Harvard University have been using Biostate's tools as part of an early-access program. The company has also been exploring collaborations with organizations in the rare disease space, including potential work with the GUARDIAN study to apply its RNA-seq technology to rare disease screening for newborn infants.
GUARDIAN (Genomic Uniform-screening Against Rare Diseases in All Newborns) is a partnership between Columbia University Irving Medical Center, NewYork-Presbyterian, the New York State Department of Health, Sema4, and Illumina to explore the utility of whole-genome sequencing for screening newborns for rare diseases.
With pharmaceutical clients, the company aims to leverage its wealth of RNA-seq data and proprietary AI tools to more rationally select patients for clinical trials by matching drugs to the patients most likely to respond, based on their transcriptome profiles.
Gopinath said that Biostate's computational methods already allow it to predict drug response in preclinical models, and the company is also collecting organ-specific transcriptomic data to evaluate organ-specific drug effects.
"[Getting] as much information on every single one of these biological systems as possible is, I think, very helpful to society as a whole, in terms of accelerating the development of new drugs," Zhang said.