Harvard Team Develops Scriptome to Help Non-Programmers Handle Perl

Like many bioinformatics support teams, the Computational Biology Group at Harvard University's Bauer Center for Genomics Research has its hands full. The team of seven people is responsible for supporting several hundred experimental biologists across all of Harvard — including its medical school and affiliated hospitals — and "we can't write things for all of them every day," said Amir Karger, a bioinformatics programmer in the group.

The challenge for the Harvard team is a common one, Karger said: "People who are not programmers still need to work with large batches of data, and they get very frustrated because they just don't have the tools, and they don't have six months to become a fancy programmer, so instead they have to ask programmers to do it — or they try to do it themselves by hand, and then they're working on it for hours when you can do it in a few minutes with a little Perl script."

Faced with a choice between constantly writing scripts or teaching hundreds of biologists how to program, Karger and his colleagues opted for the middle ground and began building a set of simple, one- and two-line Perl scripts that biologists can copy and paste into a Unix command line interface to create their own bioinformatics workflows.

The set of scripts, which they've dubbed "the Scriptome," has grown to around 40 simple sequence- and microarray-analysis tools that are available on a dedicated website (http://cgr.harvard.edu/cbg/scriptome/). Karger said that the team adds new tools to the website "every couple of days."

The scripts are simpler than those in BioPerl, which still requires a bit of Perl programming skill to use, Karger said. Each tool on the Scriptome website comes with thorough documentation, he added, so that non-programmers can easily pick the right tools for their research task. On the website, each tool is displayed in a blue box, with any text that may need to be edited, such as input and output filenames, highlighted in red.
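The article does not reproduce any of the Scriptome's Perl one-liners, so the following is only a hypothetical sketch, written in Python rather than Perl, of the kind of small sequence task such a tool might handle: listing the length of every sequence in a FASTA file. The filenames and the function name `fasta_lengths` are assumptions made for illustration; the two filename constants stand in for the "red text" a user would edit.

```python
from pathlib import Path

# Hypothetical filenames: the parts a user would edit before running.
INPUT_FASTA = "input.fasta"
OUTPUT_TABLE = "lengths.tab"


def fasta_lengths(fasta_path, table_path):
    """Write one 'sequence ID<TAB>length' line per FASTA record."""
    lengths = {}
    current_id = None
    for line in Path(fasta_path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the ID is the first word after '>'.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            lengths[current_id] = 0
        elif current_id is not None:
            # Sequence line: add its residues to the current record.
            lengths[current_id] += len(line)
    rows = [f"{seq_id}\t{length}" for seq_id, length in lengths.items()]
    Path(table_path).write_text("\n".join(rows) + ("\n" if rows else ""))
    return lengths


if __name__ == "__main__":
    # Only run when the (hypothetical) input file is actually present.
    if Path(INPUT_FASTA).exists():
        fasta_lengths(INPUT_FASTA, OUTPUT_TABLE)
```

As with the tools described above, a user would change only the two filenames at the top and leave the rest alone.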

One unintended side effect of the project, Karger noted, is that it has proven to be an effective way for biologists to pick up Perl programming skills at their own pace. Because the scripts serve as "working examples" of how to solve particular problems, Karger said, several Harvard biologists have already learned to make minor edits that adapt the tools to their own research needs.

"We're hoping that this is a way to get biologists started in little steps," Karger said. Researchers can use the ready-made scripts to get a head start on their own programming efforts, "and they're not forced to worry about syntax issues," he said.

While the project has only been underway for a few months, Karger said that his team — and the broader Harvard biology community — is already seeing results. "Our trial users have been able to use the Scriptome to get real research done," he said.

Karger added that the tools are available to anyone — not just Harvard researchers — and that the project welcomes suggestions and contributions from the broader bioinformatics community. "We're at the point in the project where we want to get feedback from other people dealing with this issue," he said.

Eitan Rubin, who heads the Computational Biology Group, said that the team is seeking partners — both from industry and academia — interested in extending the capabilities of the current set of tools.

Karger said the team is particularly interested in feedback on the Scriptome's bare-bones interface. The team chose not to develop a "fancy," complex GUI, he said, because it would take a long time to write and because the simpler interface allows more flexibility. Now, he said, "We can write a new tool in 20 minutes and have it on the web in another five." Depending on the feedback, however, the team may still write its own GUI for the Scriptome, or it may draw on existing projects, like the Pasteur Institute's PISE, to create a more intuitive front end.

"We're still in the early stages of the project, and we have lots of ideas about where it can go," he said.

- Bernadette Toner
