NEW YORK (GenomeWeb News) – Four years ago, a team of Canadian researchers published an article reporting the results of a 2008 international survey of nearly 2,000 scientists that tried to ascertain how they develop and use software in their research.
They found, among other things, that "the knowledge required to develop and use scientific software is primarily acquired from peers and through self-study, rather than formal education and training." Specifically, author Jo Erskine Hannay of the University of Oslo, et al., reported that about 97 percent of respondents thought that informal self-study was important for both developing and using software. About 60 percent and 69 percent thought learning from peers was important for developing and using software, respectively. Meanwhile, roughly 34 percent thought that education at an institution was important for developing software, and around 27 percent thought it was important for using software.
It is not clear from the paper whether the survey participants were all life science researchers; however, some of its conclusions appear to hold for at least some bioinformaticists who came to the field with a purely biological background. For example, Krisztina Rigó, a staff scientist at next-gen sequencing informatics firm Omixon, was trained as a zoologist, with an emphasis in parasitology, and prior to joining the company had next to no experience with NGS, let alone data analysis, she told GenomeWeb. Rigó now blogs regularly for Omixon about bioinformatics workflows and related reading materials based on things she learned on her own.
Similarly, Maria Victoria Schneider, a member of the senior management team at The Genome Analysis Center, TGAC, in the UK, started her career as an evolutionary biologist but then gradually moved over to the software side, teaching herself some of the basics of bioinformatics and also taking some classes, she said. Others, like Stephen Turner, director of the University of Virginia's bioinformatics core and assistant professor of public health sciences, come into the field with backgrounds in biology and mathematics. Turner was trained as a molecular biologist and geneticist but also picked up a degree in statistics en route to becoming a bioinformaticist.
On the other side of the spectrum are those who come to the field from the computer science side. C. Titus Brown, an assistant professor in Michigan State University's departments of computer science and engineering and microbiology and molecular genetics, for example, started out programming, developing, and managing software for users in a variety of scientific disciplines, including climatology, before working in a genomics and molecular biology laboratory as part of his doctoral degree. Richard Holland, chief business officer at bioinformatics consultancy Eagle Genomics, studied computer science before landing a job in a biotechnology company and working his way into the field from there.
Part of the reason for all this self-learning is that it's only been within the last decade that universities began offering structured degrees in bioinformatics, mostly master's, doctoral, and a few certificate programs. These programs tend to be conversion courses; that is, they provide bioinformatics training to students coming to the field from either biology or computer science.
However, in the US at least, bioinformatics training at the undergraduate level is still lacking, Pavel Pevzner, a computer science professor at the University of California, San Diego, told GenomeWeb.
Currently, "universities graduate biologists who are mainly computationally illiterate," he says. Even though computational tools are now used extensively in biological research, "there are no required bioinformatics classes and … as a result, a whole generation of biologists [have] graduated not ready to work in their own discipline."
Part of the problem is that "we have not figured out yet how to teach … computational concepts … to people who are not exposed to computational culture," he said. "There are so many things to know in computational sciences before you go to more complex problems," and without some understanding and knowledge about these basic concepts, "it's very difficult to teach a computational discipline."
With this in mind, the annual Conference on Bioinformatics Education was first organized to discuss the best ways of teaching biologists the computational concepts of bioinformatics, Pevzner said. He is also involved in the Bioinformatics for Biologists effort, a project that has developed an introductory textbook for undergraduate biology students that combines diverse content submitted by multiple researchers.
Furthermore, Pevzner is developing an online course that will be available this fall through Coursera, an educational technology company that offers open online courses. Each module will begin with a biological question and will show students how to transform the problem into a computational one as well as what computational methods can help solve it. "This way, it doesn’t matter whether their university offers or doesn’t offer bioinformatics courses … biology students will be able to take the Coursera course and learn what bioinformatics is all about," he said.
Graduate degrees and certificates
For those seeking a more structured entry into the field, there are plenty of bioinformatics, computational biology, and biomedical informatics degrees and certificate programs to choose from. The International Society for Computational Biology's website has a comprehensive list of international institutions that offer relevant programs.
One example of a US-based program is the master's degree in bioinformatics offered by Johns Hopkins University. This 11-course degree, offered both onsite and online, includes core courses in molecular biology and epigenetics; introductory bioinformatics courses that teach the basics of the field and some programming basics; and computer science courses that, among other things, provide students with a foundation in algorithms. Students can also choose from a variety of elective courses and can plan independent research projects.
The nearly 10-year-old bioinformatics degree strives to provide students with a balanced education that marries both computer science and bioscience curricula, Kristina Obom, director of JHU's center for biotechnology education and program director for its MS programs in bioinformatics and biotechnology, told GenomeWeb. This way, "students walk out of the program being able to program in more than one language like Java or C++ or Perl" and at the same time "understand algorithms and how they work [as well as] the biology behind the bioinformatics."
JHU's entry requirements are quite steep. Students are required to have at least a bachelor's in the life sciences or a related engineering field and to have taken organic chemistry, biochemistry, calculus, and biostatistics, as well as programming and data structures courses. But this shouldn’t discourage interested students, according to Obom. About 85 percent of incoming students do not have all the prerequisites, she said. They're admitted on a provisional basis and allowed to pick up the classes they are missing at JHU or at another institution.
Separately, JHU also offers a post-master's certificate in sequence analysis and genomics. This program, Obom said, is intended for candidates who want a basic understanding of bioinformatics tools and concepts and their application to DNA sequence assembly and analysis. "It’s a five-course program [for] people who need some of these skills and a credential as well [but] don't need a full degree."
Another example is the bioinformatics program at Boston University, which offers both master's and doctoral degrees in the subject. The core curriculum for both programs includes courses in computational biology, biological database systems, and molecular biology, as well as internship opportunities and graduate seminars. Paola Sebastiani, a professor of biostatistics at BU, told GenomeWeb that while the university's core bioinformatics curriculum tends to focus more on the computational aspects of the field, this is done to ensure that students are proficient in both the statistical and computational methods needed for bioinformatics. The program strives to balance this by offering a variety of additional classes taken from multiple disciplines, including chemistry, biostatistics, and the school of medicine, she said.
Blogs, chat rooms, workshops, and online education platforms
For the cash-strapped, those who don't have time for another degree, and the constant learners, there are much cheaper ways to get a bioinformatics education. Blogs, wiki pages, discussion forums, and yes, even Twitter, can provide simpler, shorter, and more manageable chunks of information about specific applications of bioinformatics tools and techniques. They're also good places to get involved in discussions about best practices and learn about new methods as well as new ways of running analyses. Furthermore, because of the open nature of these resources, it's much easier to spread information about new methods.
Some examples are the Getting Genetics Done blog, co-authored by University of Virginia's Stephen Turner, and the Living in an Ivory Basement blog authored by MSU's Titus Brown. Both are updated regularly and share tips and useful information about bioinformatics tools and how to use them in research. Turner also maintains a list of bioinformatics workshops.
There are also forums such as BioStar and SeqAnswers, which are outlets for researchers to ask and answer research-related informatics questions and also to share information about new methods and approaches for using bioinformatics tools. "Those are high-quality websites in terms of the people that post there and the opinions they have," MSU's Brown said. Also, since many researchers read and contribute to these forums, "you can sort of trust what you read there because there are a lot of people reading it and correcting it," he said.
Free and fee-based workshops and webinars are another way to learn bioinformatics' ins and outs. Cold Spring Harbor Laboratory and the website Bioinformatics.ca both offer workshops and short courses that cover bioinformatics content. Topics include programming for biology, computational and comparative genomics, bioinformatics for cancer, and informatics on high-throughput sequencing data. Similarly, the European Bioinformatics Institute offers courses focused on topics such as metagenomics data analysis and advanced RNA- and ChIP-sequencing analysis.
Bellevue, Wash.-based OpenHelix offers both free and subscription-based tutorials that introduce bench scientists to a variety of tools they might need to use in their research, such as the UCSC Genome Browser and Galaxy, as well as databases such as FlyBase and BioMart. These tutorials focus on equipping biological researchers with the fundamental concepts needed to use the tools in their projects, and also to develop and run more complex kinds of analyses, Mary Mangan, the company's president, told GenomeWeb.
Meanwhile, some commercial bioinformatics vendors provide training in the form of webinars and workshops or through company-run blogs. For example, Omixon's blog includes posts that introduce bioinformatics neophytes to file formats, data analysis workflows, and suggested reading. Golden Helix uses a mix of blog posts and webcasts to discuss applications of its analysis products and to provide education on topics not related to its business, Joshua Forsythe, the company's vice president of marketing, told GenomeWeb. The company selects topics based on interactions with the research community at conferences and trade shows and also from projects it's working on with customers, he said.
Another resource currently being developed is the Global Organization for Bioinformatics Learning, Education, and Training, or GOBLET, a newly minted international consortium that aims to build a registry of bioinformatics trainers and training materials, and to set standards for bioinformatics education and training, among other activities. Last month, the organizers published a paper in Briefings in Bioinformatics that lists several best practices for providing bioinformatics training.
GOBLET grew out of the Bioinformatics Training Network, BTN (BI 7/30/2010), an earlier community-based repository of training materials. TGAC's Maria Victoria Schneider, one of GOBLET's organizers, explained that the new consortium was launched in order to "develop an institutional agnostic initiative that will keep what worked from the BTN but evolved … according to the community needs and not a specific 'personal' [or] 'institution driven' agenda." Currently, 26 organizations have joined the consortium and signed its memorandum of understanding.
Finally, online education platforms such as Coursera offer useful classes on topics such as biological network analysis and computational molecular evolution. There's also Rosalind, an open education platform that uses a problem-solving approach similar to those used by Project Euler and Google Code Jam to teach bioinformatics concepts. Users are guided through a series of increasingly complex problems that address key concepts in biology and programming (BI 10/5/2012).
Designing a personal curriculum
With so many options to choose from, Eagle's Holland suggested that those getting their feet wet in the field start with a course or workshop offered by an established research institute such as the EBI. "It's very easy just to Google your problems and find a million bulletin boards and blogs that might talk about it, but there is no real peer review of those resources, some of them might be different and some might not be," he said. "If you haven’t learned to spot the good from the bad, try to stick to the major institutes and resources first."
Alongside learning about what tools are out there, Holland also suggested that newbies try to understand the fundamental concepts that underlie the tools they use, so that they can apply them to more complex questions. "There are a million people out there who know how to run BLAST [for instance] but only in a majority of common situations," he said. "[If you] ask a complicated question, it's only the really good ones that can answer … not because they know the answer but because … they really understand how these things work [and] can apply them to unusual situations."
For those coming at this from the computer science perspective, University of Virginia's Turner suggested spending time reviewing biological literature and becoming familiar with the sorts of analyses being done. "Start with some reviews … and look at some primary literature until you understand what they are doing and why they are doing it," he said.
"The nice thing about our field is that a lot of the data is publicly available, and so you can go to the Gene Expression Omnibus or the Short Read Archive and download data and maybe download code used to run their analysis and try to reproduce what they are doing at least to some extent." Also, Twitter can be a good resource for learning about new tools and different training resources, he said.
MSU's Brown also highlighted Twitter as a useful source. He suggested following the Twitter accounts of people who write blog posts that prove useful or answer specific questions. "Odds are that they will then be retweeting blogposts that they find interesting and then follow those people on Twitter," he said.
Brown also suggested seeking out and becoming part of a local network of like-minded researchers.
"People often ask us in our various training sessions, what language should we be learning, Python or R … a very sensible answer to that is go around your department, find other people who are doing similar things and ask them what language they know. If the answer is 90 percent of them know R, learn R because that way you will have somebody to ask for help … that same networking strategy applies to bioinformatics," he said. "Start building that local network of people to ask for advice on where to look online and what to read about and what papers to follow."