NEW YORK (GenomeWeb) – Lexogen, a transcriptomics and next-generation sequencing products and services provider, is seeking participants for an ongoing early access program (EAP) that it has launched to test-drive Mix2, an algorithm that it has developed to more accurately quantify transcript isoforms in data from RNA-sequencing experiments.
Participants in the EAP will have an opportunity to test Mix2 on their data and to compare the algorithm's performance and results to existing methods with the support of Lexogen specialists in RNA-seq analysis, Jungsoo Park, the company's senior marketing and sales manager, told GenomeWeb via email.
Candidates selected to participate in the program will receive a free standalone executable of the Mix2 software, which they can use to analyze their own RNA-seq data, the company said. They'll be expected to submit reports of their analyses, with particular emphasis on how Lexogen's software compares to existing algorithms used for gene isoform quantification. The company will award a first-place prize and two second-place prizes to the authors of the most comprehensive analysis reports.
Applications to participate in the EAP were initially due on January 31, but the company has extended the deadline to February 15. The company has already selected one batch of testers from the first round of applications and will begin distributing the software this week.
Mix2 is Lexogen's first software solution for the RNA-seq market. It was developed in response to "strong demand" from the scientific community for improved accuracy in transcript concentration calculations, according to Park. Among other advantages, the solution offers improved transcript concentration estimates compared with existing solutions such as Cufflinks and PennSeq; more accurate detection of differential expression; repeatable concentration estimates across different library preps; and detection and classification of bias types in RNA-seq data. It's not clear at this point how Lexogen plans to distribute the software or when the official release date will be. The company is still mulling its options on that front, according to Park.
Last November, Lexogen published a preprint paper on bioRxiv that describes Mix2 at length as well as the results of its application to synthetic and real sample datasets.
According to the paper, Mix2 addresses the problem of positional bias in RNA-seq data analysis, which results from preferentially producing cDNA fragments from certain positions in the transcript. Improper representation of this and other biases in RNA-seq by current statistical models is one of the main reasons for inaccuracies in transcript quantification measurements, the paper notes.
Mix2 addresses this issue, its developers wrote, by using "a mixture of probability distributions to model the transcript-specific positional fragment bias." In contrast to the positional bias models used in Cufflinks and PennSeq, "the Mix2 model is parametric, which considerably simplifies adaptation of the [its] parameters," the researchers wrote.
Those parameters "can be efficiently trained with the expectation maximization algorithm resulting in simultaneous estimates of transcript abundances and transcript-specific positional biases," and the parameters "can be tied between transcripts with similar fragment distribution leading to improved estimates of the relative abundances," the researchers wrote. Finally, because Mix2 uses a mix of probability distributions, it's flexible enough "to allow for multiple positional biases of arbitrary complexity."
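To make the idea concrete, the following is a minimal Python sketch of expectation maximization on a toy model of this kind. It is not Lexogen's implementation: the Gaussian components, their fixed relative centers, the choice to tie the bias weights across all transcripts, and every function and parameter name here are assumptions made for illustration. Each transcript's positional fragment distribution is a mixture of components placed at fixed relative positions along the transcript, and EM estimates the transcript abundances and the shared component weights at the same time.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def component_means(starts, length, rel_centers):
    """Absolute center of each bias component for each transcript."""
    return [[s + c * length for c in rel_centers] for s in starts]


def simulate_fragments(n, starts, length, rel_centers, rel_sd, alpha, beta, seed=0):
    """Draw toy fragment positions: pick a transcript, then a bias component."""
    rng = random.Random(seed)
    means = component_means(starts, length, rel_centers)
    fragments = []
    for _ in range(n):
        t = rng.choices(range(len(alpha)), weights=alpha)[0]
        k = rng.choices(range(len(beta)), weights=beta)[0]
        fragments.append(rng.gauss(means[t][k], rel_sd * length))
    return fragments


def em_abundance_and_bias(fragments, starts, length, rel_centers, rel_sd, n_iter=50):
    """Jointly estimate abundances (alpha) and tied bias weights (beta) by EM."""
    n_t, n_k = len(starts), len(rel_centers)
    alpha = [1.0 / n_t] * n_t  # transcript abundances, uniform start
    beta = [1.0 / n_k] * n_k   # bias weights shared by all transcripts
    means = component_means(starts, length, rel_centers)
    sd = rel_sd * length
    # Component densities do not change between iterations, so compute once.
    dens = [[[normal_pdf(x, means[t][k], sd) for k in range(n_k)]
             for t in range(n_t)] for x in fragments]
    for _ in range(n_iter):
        t_counts = [0.0] * n_t
        k_counts = [0.0] * n_k
        # E-step: responsibility of each (transcript, component) pair.
        for d in dens:
            joint = [[alpha[t] * beta[k] * d[t][k] for k in range(n_k)]
                     for t in range(n_t)]
            total = sum(sum(row) for row in joint)
            if total == 0.0:
                continue  # fragment too far from every component to assign
            for t in range(n_t):
                for k in range(n_k):
                    r = joint[t][k] / total
                    t_counts[t] += r
                    k_counts[k] += r
        # M-step: renormalize the expected counts.
        n_eff = sum(t_counts)
        alpha = [c / n_eff for c in t_counts]
        beta = [c / n_eff for c in k_counts]
    return alpha, beta


if __name__ == "__main__":
    starts = [0.0, 300.0]      # hypothetical transcript start coordinates
    rel_centers = [0.25, 0.75]  # hypothetical bias components (5' and 3' side)
    frags = simulate_fragments(3000, starts, 1000.0, rel_centers, 0.08,
                               alpha=[0.7, 0.3], beta=[0.3, 0.7])
    print(em_abundance_and_bias(frags, starts, 1000.0, rel_centers, 0.08))
```

In this toy setup the two transcripts are offset so that their components do not coincide, which keeps the abundances and the bias weights separately identifiable. Tying the weights, as the sketch does, halves the number of bias parameters that have to be learned from the same fragments.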
Also described in the paper are the results of internal experiments run by the Lexogen researchers to compare Mix2 to two iterations each of Cufflinks and PennSeq, both of which are used for estimating transcript abundance. For the analysis, they used synthetic data that covered seven genes of different complexity, four types of fragment bias, and sample sizes of 500, 1,000, 5,000, and 10,000 fragments, as well as real RNA-seq data from the Universal Human Reference (UHR) and Human Brain Reference (HBR) datasets from the Microarray Quality Control experiments.
In tests using the artificial data, Lexogen reports that its software largely outperformed the competing solutions, with the exception of one iteration of Cufflinks. Tests with the real datasets "showed better correlation" between the qPCR and FPKM values for Mix2 than for both Cufflinks and PennSeq, according to the researchers. "Furthermore, the correlation between the qPCR and FPKM fold changes between UHR and HBR are noticeably higher for the Mix2 model than for the other methods," the researchers wrote, "leading to substantially higher accuracy in the detection of differential expression."
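For readers unfamiliar with the metrics, the short Python sketch below shows one way such a comparison could be computed. FPKM is the standard fragments per kilobase of transcript per million mapped fragments. The article does not say which correlation coefficient or transformation the researchers used, so the Pearson correlation on log2 fold changes here is an assumption, as are all function names.

```python
import math


def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


def log2_fold_changes(sample_a, sample_b):
    """Per-transcript log2 ratio of sample A to sample B (values must be > 0)."""
    return [math.log2(a / b) for a, b in zip(sample_a, sample_b)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fold_change_agreement(fpkm_a, fpkm_b, qpcr_a, qpcr_b):
    """Correlation between RNA-seq and qPCR log2 fold changes (assumed metric)."""
    return pearson(log2_fold_changes(fpkm_a, fpkm_b),
                   log2_fold_changes(qpcr_a, qpcr_b))
```

As a worked case, a 2,000 bp transcript with 500 fragments in a library of 10 million mapped fragments has `fpkm(500, 2000, 10_000_000)` equal to 25.0. In the sketch, `fpkm_a` and `fpkm_b` would hold per-transcript values for the two samples being compared, such as UHR and HBR, with matching qPCR measurements in `qpcr_a` and `qpcr_b`.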