Ewan Birney, Team Leader at European Bioinformatics Institute

AT A GLANCE: Holds Ph.D. in genetics from Cambridge University, undergraduate biochemistry degree from Oxford University. Began working for EBI in January, running the Ensembl project. Enjoys running and cooking.

Q: Where will bioinformatics be in five years? Ten years?

A: I suspect part of bioinformatics will just disappear into general research biology, much as molecular biology has. Gathering datasets, manipulating and interpreting them will become part of everyday life. But on top of that I see the field still growing and probably merging with other “informatics” fields, such as econometrics or aspects of social studies. Bioinformatics is the best example in my mind of “applied computer science” and I think we will be leading this field for a while.

Q: What are the biggest challenges bioinformatics must overcome?

A: There are many: Large data sets, complex data sets, heterogeneous data sets. Different parts of bioinformatics stretch different aspects. Storage, compute resource, algorithmic ability or simple, straightforward software engineering are all problems for some areas. Undoubtedly the biggest problem is getting skilled people into the field. It is not about bringing in biologists or computer scientists any more; it is about training real bioinformaticists, regardless of their background.

Q: What hardware do you use?

A: At the Hinxton campus (where both the Sanger Center and European Bioinformatics Institute are) there is a mixture of Compaq Alpha, SGI, Sun Microsystems, and Linux boxes. The largest compute and storage systems are built from Compaq Alphas, but we have a number of cost-effective Linux farms as well.

Q: Which databases do you use? Public, proprietary, or third party?

A: Being one of the main international sites for databases, the EBI hosts a large number of public domain content databases. These are managed in a variety of implementations: Oracle for the large, primary archive databases, such as the European Molecular Biology Laboratory data library, which has been stably managed at the EBI now for over a decade. SRS plays a role in managing smaller databases. Inside Ensembl we use the open source RDB MySQL heavily. MySQL for us handles the throughput well, is easy to administer and can run on laptops, giving everyone a development environment they can take home. I wouldn’t use MySQL for everything, however: its lack of transactions and foreign key constraints would scare someone concerned about watertight data integrity. I expect MySQL to improve steadily over the next couple of years in this data integrity area with the announcement that MySQL is being re-licensed under the GNU General Public License.

Q: What bioinformatics software do you use? Do you use in-house developed or third party software?

A: I use a lot of open source bioinformatics software. Open source software is a great fit for bioinformatics as it is hard to provide “one size fits all” software for bioinformatics, and in any case, the real value is in the data, not the software. Ensembl is a big Perl system at the moment, and sits on top of Bioperl as a base level bioinformatics library. We are planning to transition over to Java, again using the open source BioJava project as our base level library. Ensembl itself uses many pieces of academic software, such as Genscan, Est2Genome and GeneWise.

Q: How do you integrate your data?

A: With hard work and good algorithms! There is no magic bullet for data integration, just understanding and sweat.

Q: How large is your bioinformatics staff? Is the organization hiring additional bioinformatics staff?

A: I think the Hinxton campus has the largest number of bioinformatics staff anywhere in the world, with upwards of 200 people doing bioinformatics as a central part of their work. Ensembl is growing as well.
