Associate professor in medical genetics
University of British Columbia
At A Glance
Name: Francis Ouellette
Position: Associate professor in medical genetics, University of British Columbia, since 2004; Director, bioinformatics core facility for the Center for Molecular Medicine and Therapeutics, since 1998.
Background: GenBank coordinator for the National Center for Biotechnology Information, 1992-1998.
Manager, Yeast chromosome I sequencing project in the department of biology at McGill University, 1992-1993.
The Blueprint Initiative, a non-profit organization led by Chris Hogue that developed the Biomolecular Interaction Network Database, recently laid off about half of its staff after failing to renew funding from Genome Canada. ProteoMonitor caught up with Francis Ouellette, a former administrator of BIND, to find out about the early days of the database, and about database work being done now at UBC.
How did you become involved in bioinformatics and BIND?
My background's in molecular biology. As a graduate student, I sort of took an affinity to computers and became the computer person. And then for my first sort-of bioinformatics job, I was involved in the yeast genome project — that was in the early 1990s. And after that I worked at the NCBI, where I was the GenBank coordinator for five years. I was there from 1992 to 1998 — from year five to year 10 of NCBI.
And then while I was there, I actually gave a talk at a workshop in Vancouver at UBC and they asked me to consider a position in Vancouver, and that's how I ended up doing that. I guess also relevant to my protein interaction world is that while I was at NCBI, that's where I met Chris Hogue; he was a postdoc there while I was there. We were part of the intelligentsia, and we sort of got to know each other.
Were you a postdoc at NCBI?
Well, that's actually a sort-of checkered part of my history. When I was a PhD student at McGill, I actually quit, and I never finished my PhD. When I got hired onto the yeast genome project, it was actually an ad for a postdoc, but they decided that I had the experience that was necessary to do the job, so they hired me instead. And then at NCBI, I was a government employee. And now I'm actually an associate professor in medical genetics without a PhD.
What were some of the biggest challenges of the yeast genome project, and then GenBank?
The yeast genome project for myself and most bioinformatics people at the time was challenging because there were no standards — we were just flying by the seat of our pants, basically. The yeast genome was the first eukaryotic genome to be done. We were a small shop in Canada, and there was another small shop in Europe, and the yeast genome was sort of stitched together from many small pieces. The large genome centers that we know today sort of came out of that. We all learned from that experience.
To set a perspective on my GenBank days, when I joined GenBank in 1993 there were 300,000 records in GenBank. When I left GenBank five years later, there were two million records, and today there are 50 million records in GenBank. DNA databases are sort of growing faster than Moore's law, so we have to sort of deal with the data. They've managed to sort of make the available computing power deal with the size of the database. There's obviously been advances in database technology and compressing data.
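As a rough check on the growth figures quoted above, the sketch below computes the doubling time implied by going from 300,000 to two million records in five years. It assumes smooth exponential growth between those two counts, and it takes Moore's law to mean a doubling roughly every two years; both are simplifying assumptions, not claims from the interview. The function name and the constant are illustrative only.

```python
import math

def doubling_time_years(start_count, end_count, years):
    """Doubling time in years, assuming smooth exponential growth."""
    growth_factor = end_count / start_count
    return years * math.log(2) / math.log(growth_factor)

# Record counts quoted in the interview: 300,000 growing to two million in five years.
genbank_doubling = doubling_time_years(300_000, 2_000_000, 5)

# Assumption: Moore's law taken as a doubling roughly every two years.
MOORE_DOUBLING_YEARS = 2.0

print(f"GenBank records doubled about every {genbank_doubling:.2f} years")
print("Faster than the assumed Moore's law pace:", genbank_doubling < MOORE_DOUBLING_YEARS)
```

Note that this counts records rather than bases of sequence, so it is only a loose comparison.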
Were you involved at GenBank with developing search algorithms?
No, mostly I was responsible for genome annotation. I was also involved as a sort of mid-level type senior person with many aspects of testing and developing all sorts of tools and applications.
Why did you decide to leave NCBI?
I left NCBI because I had a great opportunity to come to Vancouver and start a group and lead my own research activities, and have a little bit more freedom about what I was to do and where I was going to go. I'm heading up this new bioinformatics center at UBC, and that's definitely the culmination of this opportunity to develop my own thing over the years.
What were you working on when you first went to UBC?
When I first came to Vancouver, my first job was to be director of the bioinformatics core facility, so we were providing training and support for bioinformatics at the Center for Molecular Medicine and Therapeutics (CMMT). In parallel, I was responsible for bioinformatics training and development for the Canadian Genetic Diseases Network (CGDN). And that's where we developed the Canadian bioinformatics workshop series, which is sort of like a traveling road show of bioinformatics training across Canada. We've taught one- and two-week workshops in Vancouver, Calgary, Toronto, Montreal, Fredericton, Ottawa, and so forth. I'm one of the instructors and I do a lot of teaching.
When did you become involved in protein work?
Soon after I came to Vancouver, I was involved with Chris Hogue in establishing and setting up the BIND database, the Biomolecular Interaction Network Database. That was around 1999. That was the first time I became involved with protein interaction type work; it was definitely sort of a new dimension in the bioinformatics world.
I was mostly involved in the establishment of the various resources — writing a lot of grants — and getting the thing off the ground. I was involved in developing the curation models, and so forth. And then when it actually got funded, funds ended up going only to Ontario, not to British Columbia, so therefore I was less involved. But I was still involved as a member of the scientific advisory group of BIND.
How did starting up BIND compare to your experience with GenBank?
I think the big difference is it was again at the leading edge, or bleeding edge — there were no standards in the field. People didn't know how to work with the data, they complained about the data quality. Having to deal with making the data open access was something new. Getting the community on board, and journals, was a task. It's still a challenge today.
Were you involved in trying to get this latest round of funding?
No, I wasn't. I'm still currently in the scientific advisory group of BIND, but I was not involved in the current demise of BIND. I'm pretty upset about it though.
So the biggest challenge to BIND was that there were no previous standards?
There were no standards. Chris and his group were definitely the pioneers of that, and are owed the credit of establishing standards for interaction databases. We have to work with dealing with quality of the data — what's a good interaction, what's a bad interaction — and the various methods that we have to work with in representing that information. I've spent a bit of time thinking and working on that in the last couple of years, not really as a participant of the BIND process, but more of a normal bioinformatics user in the community. Basically, my work is about integrating the various interaction databases that are out there — we've sort of looked at the top five interaction databases. So it's about getting all this data from all of these databases into a sort of standard format that will allow us to compare and contrast and accumulate all the information from all the databases. And then we can start thinking of ways to put some metric on the quality of the interactions.
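The integration idea described above can be sketched roughly as follows: records from several interaction databases are mapped to one common form, merged by protein pair, and then scored by how many independent sources report each pair. This is a minimal illustration, not the group's actual method. The database names, protein identifiers, field names, and the "count the sources" metric are all hypothetical.

```python
from collections import defaultdict

def normalize(record):
    """Map a source-specific record to an order-independent protein pair plus its method."""
    a = record["a"].strip().upper()
    b = record["b"].strip().upper()
    pair = tuple(sorted((a, b)))  # (A, B) and (B, A) become the same key
    return pair, record.get("method", "unknown")

def merge(sources):
    """Accumulate, for each protein pair, the sources and methods that report it."""
    merged = defaultdict(lambda: {"sources": set(), "methods": set()})
    for source_name, records in sources.items():
        for record in records:
            pair, method = normalize(record)
            merged[pair]["sources"].add(source_name)
            merged[pair]["methods"].add(method)
    return merged

def support(entry):
    """Hypothetical quality metric: number of independent databases reporting the pair."""
    return len(entry["sources"])

# Made-up example data; "db_one" and "db_two" stand in for real interaction databases.
sources = {
    "db_one": [{"a": "p1", "b": "P2", "method": "two-hybrid"}],
    "db_two": [
        {"a": "P2", "b": "P1 ", "method": "pull-down"},
        {"a": "P3", "b": "P4"},
    ],
}

merged = merge(sources)
for pair, entry in sorted(merged.items()):
    print(pair, support(entry), sorted(entry["methods"]))
```

In practice the hard part is the step this sketch glosses over: mapping each database's own identifiers to a shared key before pairs can be compared at all.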
There's definitely a lot of discussion amongst these top players to work together and have a standard format, a standard key identifier. There's a paper that came out last year on the Proteomics Standards Initiative adopting a molecular interactions standard for all of these databases, and BIND was a co-signatory of that, as well as a number of other databases. All these databases sort of agreed that they need to move towards a standard, it's just taking them a bit of time to get there. It's a lot of work, and I think funding of databases is actually quite challenging right now. There's a few examples of where funding of databases is very well supported, the classic example being GenBank and NCBI, but for all the other databases, it's still quite a challenge to get those funded. And not just funded one time, but funded long term so people can actually rely on those databases being there and develop software and resources around those databases.
Myself, right now, I have a few papers in the works that rely on BIND being out there. Although I saw press clips out there today about BIND being supported by Blueprint Asia and Singapore, it's still a very unstable future.
Now that you're not as involved with BIND, what projects are you working on?
I'm very much interested and involved with genome annotation and database integration. A lot of time is spent right now on integrating interaction data with other data types — gene ontology and gene expression and so on. I think one of the great challenges for bioinformatics in the near future is the integration — there's tons and tons of data out there, but it's not that readily integrated and it's become a big mess. We're definitely interested in uncluttering the mess and making it a more workable information space for people and for ourselves to use.
How many people do you have in your research group?
About a dozen people.
What are some of your long-term goals for the future?
We're looking to do more data integration — integrating not just networks of protein interactions, but networks from gene expression, networks from text mining. Networks in general is definitely where the new stuff is happening. It's a new sort of graph theory type stuff. We've been working on those tools already.
If you take an interaction network, for example, there's some classic pictures of all the proteins from the yeast genome that interact with each other, and if you look at it, it looks like a big hairball. We need to find new ways of representing that information graphically. Although graphic representation of networks is not an area of expertise of mine, it's definitely an area where we want to work with people who do do that type of stuff to help us with the networks we're generating.