Enlightenment Is Near


As the kids head back to school, it's natural for grownups to feel a twinge of nostalgia for the good old days when September marked the beginning of another wonderful year of learning. Well, stop sitting around like a statue with your head in your hands. Sign up for a distance-learning course in bioinformatics, and once again you'll be able to savor the pleasures of education. You can bet that if Rodin's Thinker lived today, he'd be right there with you (virtually, of course), glued to his computer, studying bioinformatics online.

In this article, I'll flip the pages of the premier academic online course, the offerings of a leading commercial vendor, and online lecture notes from several flesh-and-blood courses. I'll also point and click across the most portable solution: good, old-fashioned textbooks.

There are plenty of choices. The challenge is to find the courseware that fits your background, learning style, and budget.

Open source à la S-Star

The premier academic course is from S-Star, a consortium of six academic institutions: the Karolinska Institute and the University of Uppsala from Sweden, the National University of Singapore, Stanford University, the University of Sydney, and the University of the Western Cape in South Africa.

The online course is free, and while enrollment is limited, anyone can sit in on the class or download the course material apparently without restriction. It's kind of like "open source" for courseware. Dare I say "open scourse"?

The curriculum consists of 14 lectures delivered by noted experts (see table). They vary in length from about half an hour to almost two hours. Registered students can participate in discussion groups and take the exams. Wow — an opportunity to take exams. You have to be really nostalgic to wait in line for that.

The lectures are delivered via the Web as streaming video accompanied by PowerPoint-like slides in PDF format. The idea is to play the video and follow along with the slides, just as if you were watching the lecture live. You can also view the slides without the video.

A key practical limitation of the lecture format is that there is no way to scan the video. The only choice is to watch it from beginning to end. If you miss a point, you can't go back and play it over. And if you totally understand or don't care about some topic, you can't skip forward. You can go back and forth in the slides, but the video doesn't follow.

I looked at several of the lectures — slides only, as I didn't have the patience to sit through the videos. In a lecture on ESTs, Winston Hide gives a good treatment of the practical difficulties of assembling ESTs into consensus sequences. Russ Altman gives a nice introduction to genetic networks, gene expression clustering, and RNA folding. Michael Levitt provides a remarkably lucid explanation of the physics behind protein folding, replete with a great discussion of how unhappy water molecules give rise to the hydrophobic effect. Betty Cheng's approach to protein physics is a bit more formal. Jan-Olov Hoog's lecture on proteomics has a good discussion of protein identification via mass spectrometry.

Glitz & Glamour at GeneEd

If you want a more polished course, GeneEd is the place for you. The commercial firm offers about 15 courses on a range of biotech subjects, including four bioinformatics courses: introductory and advanced sequence analysis, SNPs, and microarrays. The intro course is about an hour long; the others are three hours.

Full disclosure: I know a fair bit about GeneEd's approach because I co-authored the company's microarray informatics course. But I don't get royalties (don't ask why), so I can write about it without a financial conflict of interest.

The course format is built around Flash graphics, which permit sophisticated cartoon animations to be delivered efficiently over the Web. A typical slide has some introductory text, then launches into a Flash animation to illustrate the main point of the slide. Meanwhile, an unseen lecturer speaks to you in a calm, reassuring voice. You can see a transcript of the lecture by clicking a button. There are frequent opportunities to interrupt the flow and get more detail by clicking buttons or rolling the mouse over well-marked active spots on the screen. Navigation is a real strong suit. You can jump from slide to slide and go back and forth within a slide at will.

The animations are incredibly effective in many cases. For example, the slide on the clustering of microarray data starts with a spreadsheet of expression ratios, colors the cells red or green depending on the ratio, morphs the spreadsheet into a heat map that has the colors but not the numbers, then rearranges the rows of the heat map the way a clustering algorithm would, and finally ends up with the typical picture you see in all the published microarray papers with the red and green squares nicely grouped in big blocks. The animation really demystifies the clustering process. They also have great animations of dot plots, Smith-Waterman, and many other concepts.
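The steps in that animation are simple enough to sketch in a few lines of Python. What follows is a toy illustration, not GeneEd's material: the gene names and ratios are made up, the one-letter "colors" stand in for red and green squares, and the row ordering is a simple greedy nearest-neighbor pass rather than the hierarchical clustering a real analysis package would typically use.

```python
import math

# Hypothetical expression ratios (sample / control) for five genes on four arrays.
ratios = {
    "geneA": [4.0, 3.5, 0.30, 0.25],
    "geneB": [0.25, 0.30, 4.0, 3.0],
    "geneC": [3.0, 4.2, 0.35, 0.20],
    "geneD": [0.20, 0.28, 3.6, 4.1],
    "geneE": [1.0, 1.1, 0.9, 1.0],
}

def color(log_ratio, threshold=1.0):
    """Map a log2 ratio to a heat-map cell: R = up, G = down, . = little change."""
    if log_ratio >= threshold:
        return "R"
    if log_ratio <= -threshold:
        return "G"
    return "."

def distance(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_order(profiles):
    """Order rows so each gene is followed by its nearest remaining neighbor.

    This is an assumed stand-in for a real clustering algorithm.
    """
    remaining = list(profiles)
    order = [remaining.pop(0)]
    while remaining:
        last = profiles[order[-1]]
        nearest = min(remaining, key=lambda name: distance(last, profiles[name]))
        remaining.remove(nearest)
        order.append(nearest)
    return order

# Step 1: spreadsheet of ratios -> log2 ratios.
profiles = {gene: [math.log2(r) for r in row] for gene, row in ratios.items()}
# Step 2: rearrange the rows so similar profiles sit together.
ordered = greedy_order(profiles)
# Step 3: replace the numbers with colors.
heat_map = {gene: "".join(color(v) for v in profiles[gene]) for gene in ordered}

for gene in ordered:
    print(gene, heat_map[gene])
```

After reordering, the genes with matching patterns end up next to each other, which is what produces the big red and green blocks in the published figures.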

The suave speaker is an actor reading a script. It's not as colorful as S-Star's approach of having each expert lecture in his own voice, but there are fewer "ums" and "ers," and the elocution is perfect. The actor's occasional mispronunciation of a famous scientist's name or technical term serves as an amusing diversion.

The courses are divided into multiple sections, each of which starts with learning objectives and ends with a short exam that tests how well you met them. One annoyance is that the questions are not multiple choice, and the answer-checking software is not very smart. For example, 'nucleotide sequence' is marked wrong when the program is looking for 'DNA sequence.'
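I have no idea how GeneEd's answer checker works internally, but a slightly more forgiving one is not hard to imagine. Here is a minimal sketch, assuming a hand-curated synonym list; the `SYNONYMS` table and both function names are hypothetical.

```python
import re

# Hypothetical synonym groups; a real course would curate these per question.
SYNONYMS = [
    {"dna sequence", "nucleotide sequence"},
]

def normalize(answer):
    """Lower-case the answer, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", answer.lower())
    return " ".join(cleaned.split())

def is_correct(given, expected):
    """Accept an exact match after normalization, or any listed synonym."""
    g, e = normalize(given), normalize(expected)
    if g == e:
        return True
    return any(g in group and e in group for group in SYNONYMS)

print(is_correct("Nucleotide sequence.", "DNA sequence"))
print(is_correct("protein sequence", "DNA sequence"))
```

The trade-off is that someone has to maintain the synonym list, which may be why exact matching gets used in the first place.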

Naturally, all this polish comes at a cost: $300 for introductory sequence analysis, $500 for the advanced course and SNPs, and $1,000 for microarray informatics.

The Semi-Academics

Organized courses are also offered by some semi-academic groups.

One is the Stanford Center for Professional Development, a unit of Stanford University that offers university courses to industry folks. It lists one biomedical informatics course and two bioinformatics courses among its 250 offerings. I suspect the courses are very polished, though I've never seen them. The fees are pretty steep, at $3,000 to $4,000 a pop, but that buys an entire graduate-level course. Some courses require an additional corporate membership.

A more affordable semi-academic option is the Bioinformatics Institute of India, a nonprofit educational and R&D center. BII offers one-year distance learning courses in bio-, chem-, and medical informatics, with promises of more to come. The fee is a modest $400 a year.

Lecture Notes Galore

Beyond the organized courses, the Web has stacks of online lecture materials intended for students of regular courses at universities and elsewhere (see the table "Nat's Roundup of Online Course Material"). Some places restrict access to registered students or local machines, but most seem to be open. I encountered a lot of dead links while surfing this material; this is not too surprising, as much of it was for use by students in courses from last semester or even earlier. (All the links in the table were live when I last touched them, but no promises.)

I was impressed by the quality of the material available. You can learn a lot from this stuff, although many of the university courses are aimed at computer science or math majors and might be heavy going for biology folks. Mixed in are short courses and practicums that offer useful how-to information.

Two good introductory courses are the ones by Brian Fristensky from the University of Manitoba and Robert Murphy from Carnegie Mellon University. Fristensky's course is probably better for a biologist and Murphy's for a computer scientist.

More comprehensive are courses by Russ Altman from Stanford and George Church from Harvard.

Altman covers sequence alignment and dynamic programming, multiple sequence alignment, protein structure and alignment, RNA secondary structure, microarrays, ontologies, motifs, hidden Markov models, energetics, genetic networks, gene finding, comparative genomics, phylogenetics, natural language processing, and proteomics. He ends with a lecture on career opportunities.

Church starts with introductory material on computational and statistical issues, then moves on to comparative genomics, polymorphisms and population genetics, pharmacogenomics, dynamic programming, BLAST, multiple sequence alignment, hidden Markov models, microarrays and other gene-expression methods, protein structure and drug design, mass spectrometry, metabolic networks, molecular computing, and finally cellular, developmental, social, ecological, and commercial networks.

It's hard to believe that these are one-semester courses. The homework alone would take me two years — I'm starting to remember why I was so glad to be done with school.

Hard Copies

Two books show up in a lot of courses: Bioinformatics: Sequence and Genome Analysis by David Mount and Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Second Edition, edited by Andreas Baxevanis and Francis Ouellette. A third book I like a lot is Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology by Dan Gusfield. Each has its own strengths.

Mount's book is rigorous with a heavy emphasis on statistical issues. He states in the preface that the book is "written mainly for biologists," and the early pages have that feel. But the going gets pretty tough by mid-book.

The Baxevanis and Ouellette anthology is more varied. Overall, it's more of a how-to book that explains how to use various programs and databases.

Gusfield's book is a mainstream computer science text motivated by biological sequence analysis problems. It has the best explanations I've seen anywhere of Smith-Waterman, BLAST, and other important algorithms. But it's aimed at CSers; if you don't know what O(n log n) means, this book is not for you.
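For a taste of what these algorithms look like, here is a minimal sketch of Smith-Waterman local alignment scoring. It is not taken from any of the books; the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration, and it returns only the best score, not the alignment itself. The two nested loops make the running time proportional to the product of the two sequence lengths.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)   # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Only two rows of the table are kept at a time, which is enough when the score alone is wanted; recovering the alignment requires keeping the full table for traceback.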

The Upshot

Bioinformatics courseware is within everyone's reach. The open scourse S-Star is a great place to begin. For those with more money than time, GeneEd is a great next stop. For those with enough time for a full academic program, the Stanford Center for Professional Development and the Bioinformatics Institute of India sound intriguing, though I haven't seen the actual courseware of either.

To choose among the other online material, you'll need to spend an hour or two surfing the stacks.

Among the books, Mount is the choice for statistical aspects of important methods, Baxevanis and Ouellette has the most practical information, and Gusfield is a must-read for math and CS people. Oh heck, just buy them all — books are cheap.

It's time to hit the books or computers. Happy learning, everyone.


Title, Lecturer

Introductory Molecular Biology, Anthony Weiss

An Overview of the Computational Analysis of Biological Sequences, Subramanian Subbiah

Transcript Analysis, Winston Hide

Comparative Genomics, Liping Wei

Representations and Algorithms for Computational Molecular Biology, Russ Altman

Protein Structure Primer, Shoba Ranganathan

Protein Structure Prediction, Betty Cheng

Protein Physics, Betty Cheng

Genomics and Computational Molecular Biology Genomics, Douglas Brutlag

Protein and Nucleic Acid Structure, Dynamics, and Engineering, Michael Levitt

Proteomics, Marc Wilkins

Proteomes: Proteins Expressed as a Genome, Jan-Olov Hoog

Structure Prediction for Macromolecular Interactions, Julie Mitchell

Protein-Ligand Modeling, Ten Eyck

Nat's Roundup of Online Course Material

| Institution | Name | Author | Date | General description | Other topics | URL |
| Stanford University | Representations and Algorithms for Computational Molecular Biology | Russ Altman | Spring 2002 | University course; nice set of links; parts available at S-Star; commercial version at Stanford Center for Professional Development | Genetic networks, energetics, text mining | |
| University College, London | An Interactive Web Practical in Bioinformatics | Terri Attwood, Alex Michie | Current | Online companion to Attwood & Michie book | | |
| San Diego Supercomputer Center | Biological Data and Analysis Tools | Philip Bourne | Fall 2001 | University course | Data modeling, ontologies, XML | |
| Harvard University | Genomics and Computational Biology | George Church | Fall 2001 | University course | Proteomics, metabolic kinetics | |
| University of British Columbia | Topics in Algorithms and Complexity - Bioinformatics | Anne Condon, Holger Hoos | Spring 2002 | University course; notes, not slides | | |
| University of Southern California | Computational Genome Analysis | Richard Deonier, Simon Tavaré, Michael Waterman | Spring 2002 | University course; looks like notes for a textbook; sequence material only | HMMs | |
| University of Manitoba | Bioinformatics | Brian Fristensky | Spring 2002 | University course; very detailed notes, not slides; reasonable introductory course | Linkage | |
| Yale University | Genomics & Bioinformatics | Mark Gerstein | Fall 2001 | University course; online material for some lectures only | | |
| University of Toronto | Proteomics and Bioinformatics | Walid Houry, Boris Steipe, guest lecturers | Spring 2002 | University course; online material for some lectures only | Some proteomics; part of genomics course that includes wet material | |
| University of Colorado Health Sciences Center | Introduction to Bioinformatics | Larry Hunter | Fall 2001 | University course | Text mining | |
| University of Massachusetts, Lowell | BioInformatics | James Lyons-Weiler | Fall 2001 | University course; online material for some lectures only | | unavailable |
| Biotechnology Computing Facility, Arizona Research Laboratories | Computing Concepts for Bioinformatics | Nirav Merchant | 2000-01 | | Programming languages | |
| California State University at Los Angeles | Introduction to Bioinformatics | Jamil Momand | Spring 2002 | University course | | |
| Biotechnology Computing Facility, Arizona Research Laboratories | Presentations by David W. Mount | David Mount | 2000-01 | University course | | |
| Cold Spring Harbor Laboratory Press | Bioinformatics Online | David Mount | Current | Online companion to Mount's book | | |
| Carnegie Mellon University | Computational Biology | Robert Murphy | Spring 2002 | University course | | |
| University of Washington | Computational Biology | Larry Ruzzo | Fall 2001 | University course; notes, not slides | | |
| SUNY Stony Brook | Advanced Algorithms (Computational Biology) | Steven Skiena | Fall 2000 | University course; notes, not slides | | |
| Tel Aviv University | Algorithms in Molecular Biology | Ron Shamir | Fall 2000 | University course; notes, not slides | Linkage | |
| Tel Aviv University | Analysis of Gene Expression Data, DNA Chips and Gene Networks | Ron Shamir | Spring 2002 | University course; notes, not slides; some slow to load & poor visual quality | Genetic networks | |
| University of Washington | Computational Biology | Martin Tompa | Winter 2000 | University course; looks like notes for a textbook | Many specialized sequence topics | |
| Boston University | DNA and Protein Sequence Analysis | Zhiping Weng | Fall 2001 | University course | | |
| Tel Aviv University | Introduction to Bioinformatics Course | Racheli Zakarin, Hanan Stein | Spring 2002 | University course | | |
| KISAC (Karolinska Institute) | KISAC Bioinformatics | | Spring 2002 | Five-day course | | |
| Swiss Institute of Bioinformatics | Introduction to Bioinformatics | | Fall 2002 | One-week course; slides from 2001 online now | | |
| Human Genome Mapping Project Resource Centre | Introductory Bioinformatics | | July 2001 | Training course | | |
| Universität Bielefeld | Virtual School of Natural Sciences BioComputing | | 1995-96, mostly | Heavily cited, but quite old; lots of material | | |


Andreas D. Baxevanis and Francis Ouellette. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Wiley-Liss; ISBN: 0471383910; 2nd edition (April 6, 2001).

David Mount. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory; ISBN: 0879696087; 1st edition (March 15, 2001).

Dan Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press; ISBN: 0521585198; 1st edition (January 15, 1997).



