
Bioinformatics: New NYU tool blends languages for genome analysis


Salvatore Paxia has no particular background in biology, but his latest bioinformatics advance may be just the thing for this increasingly complex field.

Challenging the Babel-like conditions of today’s computational biology realm, Paxia, a senior research scientist in Bud Mishra’s lab at the Courant Institute of NYU, has worked for the past three years on a novel database and programming environment capable of integrating any number of programming languages.

The project began with DOE funding when Paxia and the rest of Mishra’s team, which was designing the software for optical mapping at the time, decided they needed a platform to support their constant need to write new algorithms. “There were no tools to build prototypes of bioinformatics applications rapidly,” says Paxia, whose background is in electrical engineering.

Today, the tool is known as Valis, and in the next few months it will be made available to non-commercial users. At press time, Paxia says, Mishra’s team was filing patents on Valis and the NYU IP lawyers were knee-deep in laying out terms for licensing the technology.

That’s important because Mishra and Paxia both predict a fruitful future for Valis as bioinformaticists realize its value. What may first draw users in is the appeal of the language-blind environment: program modules can be written in any language and called by modules written in any other language. Got a neat Perl script and a separate program in Python? No problem, says Paxia. He recognizes not only that certain languages are better suited to certain applications, but also that the past decade of work in bioinformatics has left mountains of valuable algorithms and applications that need to be grandfathered into any truly useful programming environment.

Valis’ other components will prove equally user-friendly, Paxia says. Valis allows for drag-and-drop building of graphical user interfaces “in the same way Visual Basic does,” which he says is faster and easier than the methods traditionally associated with computational biology. He adds that “once you build these graphical user interfaces you can customize them and control them from any scripting language.”

Behind the scenes, Valis relies on a novel database structure that Paxia developed. “We designed this special free-form database [for] arbitrarily long string sets and flexible arrays,” he says. Valis eschews the hierarchical structure of fixed-format or relational databases because Paxia believes his method is a more natural way to handle the unstructured data typical of bioinformatics repositories.

“I’ve had a lot of fun working with biologists,” says Paxia, whose interest in integrating programming languages goes back several years. He hopes to add more features to Valis before releasing it into the wild, but says he expects that users will contribute as well. “At some point we really have to wait for other groups to use it so that it can be expanded and be more useful to all users,” he says.

— Meredith Salisbury
