
Penn State, Enlis Genomics Awarded Top Honors in Illumina's Next-Gen Sequencing Software Challenge

This article has been updated from a previous version to correct an error in the report. The overall winner from academia is Pennsylvania State University.

By Uduak Grace Thomas

Illumina has named Pennsylvania State University and bioinformatics startup Enlis Genomics as the overall winners in the Illumina Data Excellence Award, a competition intended to encourage academic and commercial developers to create better software for interpreting next-generation sequence data.

The two teams were among six winning entries from three academic and three commercial groups announced during a conference for the competition, dubbed iDEA, held in San Diego last week.

Awards were presented in two categories alongside the overall winners: the most creative algorithm, which went to Germany's University of Tubingen and Partek; and the most creative visualization, which went to the Flanders Institute for Biotechnology and Genomatix.

The Penn State team was awarded a $50,000 grant by Illumina to help it continue developing its software, while Berkeley, Calif.-based Enlis was offered a one-year co-marketing deal with Illumina, details of which are still being ironed out. The other four winning teams received "recognition awards."

In total, 30 groups entered the competition and 13 finalists were invited to San Diego to present their entries to non-Illumina judges over the course of the day-and-a-half event.

Commercial participants included Strand Life Sciences, Golden Helix, IBM, Softberry, and others, while academic entries came from Harvard University; the University of Delaware; the University of California, San Diego; the University of Georgia; and elsewhere.

The challenge datasets, generated on Illumina sequencing instruments, included RNA-seq, directional RNA-seq, small RNA, DNA methylation, and genomic DNA data from a series of breast cancer cell lines (BI 06/18/2010).

Entries were evaluated by a panel of judges that included Steven Jones of the British Columbia Cancer Research Centre, John Quackenbush of the Dana-Farber Cancer Institute, Steven Salzberg of the University of Maryland, Gavin Sherlock of Stanford University, and the Broad Institute's Bang Wong and Jared Maguire.

The New Guys

As the overall winner in the commercial category, newcomer Enlis Genomics beat bigger, more established vendors that are poised to become its competitors once the company launches the winning software, dubbed Enlis Research Edition, later this year.

Enlis hasn't even debuted as a formal company yet. Founder Devon Jensen told BioInform that he is filing the necessary paperwork to incorporate. The company has a staff of four and plans to expand once it is up and running, which he said will be within the next two months.

Research Edition, a research-use-only tool, enables researchers to analyze and visualize genomic sequence data. Jensen said that his team began developing the software about a year ago with the goal of creating a tool that is accessible to researchers of all types.

He explained that current methods of generating and presenting data are "not accessible" to many researchers, particularly those in the wet lab biology space such as immunology and medical research.

These users have "been the focus of the software that we have developed," he said.

Among its capabilities, Research Edition lets users load and analyze more than 100 full human genomes simultaneously and locate differences between two whole human genomes in less than one minute.

The development team is also working on enabling the tool to import different data types as well as making sure that Illumina's data works well with the software.

In addition to Research Edition, Enlis is developing Genome Oncology Edition, which will support the analysis of genomic sequence data for cancer diagnosis and treatment; and Genome Personal Edition, which is geared toward consumers.

A beta version of Enlis Genome Personal Edition is available for download on the company's website, but Jensen did not give a timeline for the full release of either product.

Enlis has not yet determined pricing for Research Edition, but Jensen expects that it will be similar to other genomic data analysis packages on the market, which he said typically run between $2,000 and $4,000 annually per seat.

While the exact details of his company's co-marketing deal with Illumina are still being ironed out, Jensen said a potentially "good fit for co-promotion" is the Illumina Genome Network, a service that links researchers to institutions that offer sequencing services on Illumina's sequencers.

"We think that’s a very natural fit with our software because it is also geared toward people who are not bioinformatics experts," he said.

It is this "ease of use" that’s the "differentiating factor" between Research Edition and other packages on the market, Jensen said. He also noted that many of these commercial tools focus on post-sequencing tasks such as read alignment and variant calling, but don't go much further.

The software "recognizes that for ... wet lab folks, the work starts after variants have been called," he said. "You need to build software that is accessible."

Another point of distinction, Jensen said, is that Research Edition provides all its information — for example structural variants, copy number variants, SNPs, insertions/deletions, and so on — in a single .genome file format, while other tools present this information in separate files.

This way, "I [can] say to my biomedicine colleagues or my medical research colleagues, 'Here is the genome file for that patient with the muscle disease,' or, 'Here is the genome file for the mouse that developed spina bifida,'" he said, adding that this makes the data "a lot easier to use."

Enlis faces a number of competitors in the genome analysis market, including CLC Bio, GenomeQuest, and DNAnexus, which did not participate in the iDEA challenge.

It will also go up against Genomatix Software and Partek, both of which entered the contest and won awards in the most creative visualization and most creative algorithm categories, respectively.

Headquartered in Munich, Germany, Genomatix entered the most recent versions of its current product offerings — Mining Station and Genome Analyzer — for the challenge.

The updates, which include improvements in genomic annotation, SNP analysis, and tools to handle indels, will be available to customers in the next release of the software later this year, Klaus May, chief business officer of Genomatix, told BioInform.

Genomatix Mining Station lets users map NGS reads onto genomes, transcriptomes, and splice junction libraries. It also includes tools to detect known and de novo splicing events as well as for SNP analysis and copy-number variation.

Genomatix Genome Analyzer lets users analyze data from ChIP-seq, RNA-seq, or genotyping experiments, and it includes genome annotation and background data for 31 species.

Both packages can be accessed through MyGenomatix, a next-generation sequence data-analysis service that Genomatix launched in April (BI 03/18/2011).

MyGenomatix has been "well received" by the market, May said, particularly as an option for customers who need sequences analyzed but are hesitant to make the initial software investment.

Representatives from Partek could not be reached for comment by press time.

A 'Sanger-Like Look'

Penn State developed its overall winner, Integrative Next-generation Genome Analysis Pipeline, or inGAP, to detect SNPs and indels from Roche 454 and Illumina sequence data.

Since it was originally published in 2009 in Bioinformatics, the algorithm has undergone some changes and now includes improved methods of detecting and visualizing structural variations, which "appealed to the iDEA judges," Stephan Schuster, a co-author on the inGAP paper and Penn State physician, told BioInform.

"We put a lot of information [in the data] by showing paired-end reads and then do a color coding with regard to the expected distance [from] short to long but also whether the paired ends end up on the correct strand," he explained. "We then suggest 12 different structural variants that can be detected ... inGAP shows you the datasets in a graphical display and [a user can] then pull up a chart [that] will suggest to you whether it is a duplication, inverse repeat," and so on.
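For readers unfamiliar with the approach Schuster describes, the following is a minimal Python sketch, not inGAP's actual code, of how read pairs can be sorted into the kinds of classes such a display color-codes: by comparing each pair's mapped span with an expected insert-size window and by checking mate orientation. It assumes a forward/reverse paired-end library and approximates the span from mapping start positions; the `ReadPair` type, the `classify_pair` function, and the category labels are hypothetical illustrations, and inGAP's 12 structural-variant classes are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    # Hypothetical record for one mapped read pair.
    chrom1: str
    pos1: int      # mapping start position of mate 1
    strand1: str   # "+" or "-"
    chrom2: str
    pos2: int      # mapping start position of mate 2
    strand2: str   # "+" or "-"


def classify_pair(pair: ReadPair, min_insert: int, max_insert: int) -> str:
    """Classify a read pair against an expected insert-size window.

    Assumes a forward/reverse (FR) library: in a concordant pair the
    leftmost mate maps to "+" and the rightmost mate maps to "-".
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal (possible translocation)"

    # Order mates by position so the orientation checks are symmetric.
    (lpos, lstrand), (rpos, rstrand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )

    # Both mates on the same strand suggests an inversion.
    if lstrand == rstrand:
        return "same strand (possible inversion)"
    # Mates pointing away from each other suggests a tandem duplication.
    if lstrand == "-" and rstrand == "+":
        return "everted (possible tandem duplication)"

    # Approximate span from start positions (read length ignored).
    distance = rpos - lpos
    if distance > max_insert:
        return "long (possible deletion)"
    if distance < min_insert:
        return "short (possible insertion)"
    return "concordant"


if __name__ == "__main__":
    print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 1350, "-"), 200, 500))
    print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 5000, "-"), 200, 500))
```

With an expected window of 200 to 500 bp, a pair whose mates start 350 bp apart on opposite strands is called concordant, while the same pair starting 4,000 bp apart is flagged as a possible deletion. A display tool could map each label to a color.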

According to Schuster, in its initial design, inGAP was developed to give next-gen sequence data a "Sanger-like look" by producing color-coded output that "mimicked" capillary sequencing displays, since most biologists were comfortable with that data format.

inGAP uses the same color scheme for its structural variant displays and includes an application that installs all the packages needed to explore the data. It also incorporates a toolset that lets users split a FASTA file in half or convert one file format to another.

These design details were all part of efforts to "overcome the little hurdles that might be a showstopper for a bench scientist [but that would make] a bioinformatics person … just laugh about it," he said.

Schuster's team plans to publish a paper describing the updated software in a few months. That paper will describe how they used inGAP to detect structural variations in a test dataset from the 1000 Genomes Project.

Schuster said that the bulk of the $50,000 iDEA award will go to co-authors Ji Qi and Fangqing Zhao, who were primarily responsible for coding the software. Both currently hold junior-faculty positions in institutions in China.

The funds will be used, among other things, to support continued development of inGAP, including adapting the software for other sequencing platforms such as Life Technologies' Ion Torrent PGM.

"The driving force [for software development] is always the projects that we have on hand," Schuster said. "All of the tools we develop are ... to solve any kind of data-analysis problem that we face going through our project."

Ring-Around-the-Genome

Kay Niesalt, a group leader of bioinformatics at the University of Tubingen, whose team won the most creative algorithm portion of the competition, told BioInform that her group entered its proposal for GenomeRing, an algorithm for aligning and visualizing genomic data, in order to receive feedback from the community to help it decide whether or not to continue developing the software.

Once completed, the tool will let users align multiple genomes and visualize the results as a ring, an approach intended to make it easier to compare more than two genomes.

Current genome browsers represent data linearly, which poses problems for users who want to compare larger quantities of data, "particularly if there are a lot of events that happened between the genomes," Niesalt explained.

She said the idea was inspired in part by software used to visualize prokaryotic genomes. These packages display a chromosome as concentric rings that reflect certain of its characteristics, such as where the proteins are located.

The team also hopes to develop a better genomic coordinate system, Niesalt said, pointing out that the species-specific reference genomes that are included in genome browsers often miss mutations or regions of newly sequenced genomes that may not be included in the original reference.

Buoyed by her team's success at iDEA, Niesalt hopes to launch a first version of GenomeRing later this year. She noted that the main challenge will be to develop software that allows "interactive manipulation" and lets users explore as many genomes as they want.


Have topics you'd like to see covered in BioInform? Contact the editor at uthomas [at] genomeweb [.] com.