
Thou Shalt Not Reinvent the Wheel: Carole Goble on the ‘Seven Deadly Sins’ of Bfx

Carole Goble
Professor
University of Manchester
Carole Goble, a professor in the School of Computer Science at the University of Manchester, has played a large role in bioinformatics software development as director of the UK’s myGrid project, which created the open source workflow package Taverna.
 
In July, Goble created a bit of a stir in the bioinformatics community by warning attendees of the Bioinformatics Open Source Conference about the “seven deadly sins of bioinformatics,” which she compiled with the help of several colleagues.
 
Goble’s list veers a bit from the traditional seven deadly sins of lust, gluttony, greed, sloth, wrath, envy, and pride to encompass failings particular to the bioinformatics community. These include parochialism and insularity; exceptionalism; scientific method sloth; autonomy at all costs; monolith megalomania; vanity; and instant gratification.
 
Her full presentation is available here.
 
A few weeks ago, BioInform spoke to Goble about her thoughts on these “sins” and where myGrid is going.
 
The following is an edited version of that discussion.
 

 
Where did you come up with ‘the seven deadly sins of bioinformatics’ and why did you choose to talk about it at BOSC?
 
The team and I have been doing bioinformatics for a long time. I’m a computer scientist who builds solutions for bioinformaticists [and] biologists, and I have been in this area since about 1995. So when you’ve been doing it this long, you see the same old things come up over and over again, people making the same mistakes, people coming up with similar solutions [that they did] before, constantly reinventing the wheel, [but] we’ve [also] seen the discipline kind of grow and evolve.
 
And so, as we’ve observed this, there is a tremendous amount of repetition and reinvention and territorial maneuvering, sort of [saying], ‘My group is different … by some subtle, completely non-obvious way, so we are going to do it in a different kind of way.’

So we began to get more and more frustrated because a lot of these things had nothing to do with technology; they all have to do with social aspects, so we were talking at the pub — as we do — about the seven deadly sins of bioinformatics, and we couldn’t come up with seven virtues. So we thought that would be a great idea for the BOSC workshop since the people there build systems, they’re bioinformaticians. I think it struck a chord.
 
A conference attendee who was sitting in on the BOSC session wanted me to ask you what you consider the virtues.
 
(Laughs) That’s tricky. Obviously, the virtue of bioinformatics is the fact that it really, genuinely, enables you to harness the benefits of others if you are willing to do that. There is the resource [side] of bioinformatics and then there’s the technology side of bioinformatics.
 
There are the databases, the mechanisms we use, like the algorithms and protocols we use, and if you can really harness these and share these, then you can make tremendous inroads. You can’t do biology now without bioinformatics; it’s impossible. … You couldn’t have sequenced the human genome without bioinformatics. … Whether you actually need, though, the over 200 pathway databases that are available is more of a moot point.
 
One of the veterans of the field I spoke to at [the Intelligent Systems for Molecular Biology Conference] felt that the term ‘bioinformatics’ was a bit of a misnomer, so it’s interesting that you are saying there is no biology that could be done without bioinformatics right now.
 
Well, the biology can be done; I am sure people do biology without bioinformatics — at least they think they do. But if you are doing biology and putting your results into a database, you are doing bioinformatics. If you are putting them into a spreadsheet and running an algorithm, you are doing bioinformatics. So it’s the processing of data; it’s the running of simulations. … It has to be some way to record and manipulate results [to be considered bioinformatics], and share and infer new information, disseminate that information in a way other people can consume and use it.
 
[Can] you give me an overview of myGrid? What is the main goal of this project? What was its aim from the outset?
 
MyGrid is actually a project and a suite of technologies. The myGrid project started in 2001 with a consortium and now it’s part of something called the Open Middleware Infrastructure Institute in the UK.
 
It builds a series of different components. The main aim is to support bioinformaticians … who were basically doing routine bioinformatics, largely data intensive, who were trying to link together the various public resources available to them, published by the community, and link together their own resources. They were building their own data sets.
 
And so [we wanted them to be able] to do that without having to write their own Perl scripts or trying to simulate cutting and pasting between various web browsers, trying to bring a more industrial-level or more precise repeatable, reproducible approach to bioinformatics.
 
We did this by largely building workflows … and that is free and open source and allows you to build data-chaining scripts to link together different resources that are available. … With Taverna, you have access to about 3,500 different services that are published, from Blast to your local database.

If you want that, you need to describe what they mean — that’s where the semantics come in. … And the workflows are protocols. … You should be able to examine them, and share them.
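
[Editor's illustration: the "data-chaining" idea Goble describes, where the output of one resource becomes the input of the next, can be sketched in a few lines of Python. This is not Taverna's API or workflow format. The function names, the accession "ACC1", and the sequence data are hypothetical stand-ins for real services.]

```python
from typing import Callable, List

# A workflow step takes one value and returns the value passed to the next step.
Step = Callable[[str], str]


def run_workflow(steps: List[Step], data: str) -> str:
    """Feed the output of each step into the next one, in order."""
    for step in steps:
        data = step(data)
    return data


# Hypothetical stand-in for a remote sequence database lookup.
def fetch_sequence(accession: str) -> str:
    return {"ACC1": "atgcatgc"}.get(accession, "")


# Hypothetical stand-in for a formatting service.
def to_upper(seq: str) -> str:
    return seq.upper()


# Hypothetical stand-in for an analysis service.
def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


# The workflow is just an ordered, inspectable list of steps,
# so it can be examined, rerun, and shared like a protocol.
workflow = [fetch_sequence, to_upper, reverse_complement]
result = run_workflow(workflow, "ACC1")
```

[Because the workflow is an explicit list rather than a sequence of manual cut-and-paste operations, the same chain can be repeated on a different input simply by calling run_workflow again.]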
 
We’ve also got, as part of the myGrid suite, something called myExperiment [BioInform 06-29-07], which is like a Facebook social networking [site for] scientists where [they] can share with one another … and alter [their workflows].
 
So we build these different services that are out in the wild, so if they are out in the wild, you know people are actually using them.
 
Who are the primary users of myGrid?
 
Front-line bioinformaticians [who are] scattered throughout the world. It’s used in a number of places, used by institutions such as the [Netherlands Bioinformatics Center]. Many universities use it as part of their post-grad curriculum. … It’s being used by systems biologists as well; [and] it’s being rolled out in Thailand, too.
 
Does myGrid have competitors? If so, what are they?
 
There are about 50 workflow systems. There has been some commercial stuff that is similar to it, [such as] Pipeline Pilot and InforSense. InforSense are buddies of ours. …
 
[Our version is so quick, though], so it took us only 15 minutes to incorporate all of the national text mining [that is available].
 
I bridge commonalities. I have worked with biologists for a long time and have actually built something that people use.
