
In the Informatics Trenches

  • Title: Team Leader, European Bioinformatics Institute
  • Education: PhD, Washington University, 2004
  • Recommended by: Alan Guttmacher

As head of the Vertebrate Genomics Group at the European Bioinformatics Institute and co-leader of the data flow group for the 1,000 Genomes Project, Paul Flicek is intimately familiar with the ongoing battle to make sense of the endless stream of next-gen sequencing data. Flicek guides the Vertebrate Genomics Group, which is responsible for the comparative, variation, and functional genomics resources within Ensembl, a joint project of EBI and the Sanger Institute that maintains eukaryotic genome data. In his work for the 1,000 Genomes Project, which already has upwards of 3 terabases of data, he has the daunting task of getting that data processed so the community can use and analyze it.

Before he took up informatics arms on the front lines of next-gen data management for large-scale genome projects, Flicek was a graduate student with his sights set on tissue engineering and artificial organ production. An introductory course on computational molecular biology at Washington University, taught by Sean Eddy, changed all that. "I was in that course for about three weeks when I decided that I was going to do [computational biology] rather than anything else I had decided to do at the time," Flicek says. "That was at the same time when human genome sequencing was ramping up at Washington University and the whole excitement around finishing the human genome was very real, so the first work that I did was on gene prediction and comparative genomics-based gene prediction with the program TwinScan." Soon after finishing his PhD, Flicek joined EBI as a postdoc, where he continued his genome annotation work and also became a member of the ENCODE project.

EBI is also where Flicek met Ewan Birney, who gave him a practical perspective on how to approach real-world bioinformatics problems, he says. "One of the aspects of working with Ewan and the way he thinks [helped] me understand the real importance of the pieces of the puzzle for bioinformatics, and as a side effect, once the large-scale pieces get built, to be able to handle data that most other people would struggle to handle," says Flicek.

One of the biggest challenges he faces is next-gen sequence data. "I have a slide that I give that compares the number of base pairs sequenced for the Human Genome Project to the number of base pairs that a big genome center can sequence today with next-gen technologies," Flicek says. "It used to be that the whole Human Genome Project could be done in a week, and now it's just a few hours. Keeping up with that and making sense of it is a big challenge because our ability to produce data is way ahead of our ability to analyze and make sense of it."

Looking ahead

Flicek says that improvements in sequencing accuracy will be critical to accelerating how scientists can apply that data. "A massive change in the accuracy of sequencing would mean that the amount of sequencing that we generate in a project like the 1,000 Genomes Project or in the ENCODE project, we could do many different things very quickly," he says. That will be a big step, though: Flicek's wish would be to make raw data quality "a million times more accurate" than it is today, "so that the chance of an error in the sequencing was [less] than one in 3 billion."

Publications of note

In 2007, Flicek and a team of researchers at EBI and the Wellcome Trust Sanger Institute published "Ensembl 2008" in Nucleic Acids Research. The team provided an update to the research community on new additions to the project, including extensive support for functional genomics data in the form of a specialized functional genomics database, genome-wide maps of protein–DNA interactions, the Ensembl regulatory build, and other improvements.
