As the kids head back to school, it's natural for grownups to feel a twinge of nostalgia for the good old days when September marked the beginning of another wonderful year of learning. Well, stop sitting around like a statue with your head in your hands. Sign up for a distance-learning course in bioinformatics, and once again you'll be able to savor the pleasures of education. You can bet that if Rodin's Thinker were alive today, he'd be right there with you (virtually, of course), glued to his computer, studying bioinformatics online.
In this article, I'll flip the pages of the premier academic online course, the offerings of a leading commercial vendor, and online lecture notes from several flesh-and-blood courses. I'll also point and click across the most portable solution: good, old-fashioned textbooks.
There are plenty of choices. The challenge is to find the courseware that fits your background, learning style, and budget.
Open source à la S-Star
The premier academic course is from S-Star, a consortium of six academic institutions: the Karolinska Institute and the University of Uppsala from Sweden, the National University of Singapore, Stanford University, the University of Sydney, and the University of the Western Cape in South Africa.
The online course is free, and while enrollment is limited, anyone can sit in on the class or download the course material apparently without restriction. It's kind of like "open source" for courseware. Dare I say "open scourse"?
The curriculum consists of 14 lectures delivered by noted experts (see table). They vary in length from about half an hour to almost two hours. Registered students can participate in discussion groups and take the exams. Wow — an opportunity to take exams. You have to be really nostalgic to wait in line for that.
The lectures are delivered via the Web as streaming video accompanied by PowerPoint-like slides in PDF format. The idea is to play the video and follow along in the slides, just as if you were watching the lecture live. You can also view the slides without the video.
A key practical limitation of the lecture format is that there is no way to scan the video. The only choice is to watch it from beginning to end. If you miss a point, you can't go back and play it over. And if you totally understand or don't care about some topic, you can't skip forward. You can go back and forth in the slides, but the video doesn't follow.
I looked at several of the lectures — slides only, as I didn't have the patience to sit through the videos. In a lecture on ESTs, Winston Hide gives a good treatment of the practical difficulties of assembling ESTs into consensus sequences. Russ Altman gives a nice introduction to genetic networks, gene expression clustering, and RNA folding. Michael Levitt provides a remarkably lucid explanation of the physics behind protein folding, replete with a great discussion of how unhappy water molecules give rise to the hydrophobic effect. Betty Cheng's approach to protein physics is a bit more formal. Jan-Olov Hoog's lecture on proteomics has a good discussion of protein identification via mass spectrometry.
Glitz & Glamour at GeneEd
If you want a more polished course, GeneEd is the place for you. The commercial firm offers about 15 courses on a range of biotech subjects, including four bioinformatics courses: introductory and advanced sequence analysis, SNPs, and microarrays. The intro course is about an hour long; the others are three hours.
Full disclosure: I know a fair bit about GeneEd's approach because I co-authored the company's microarray informatics course. But I don't get royalties (don't ask why), so I can write about it without a financial conflict of interest.
The course format is built around Flash graphics, which permit sophisticated cartoon animations to be delivered efficiently over the Web. A typical slide has some introductory text, then launches into a Flash animation to illustrate the main point of the slide. Meanwhile, an unseen lecturer speaks to you in a calm, reassuring voice. You can see a transcript of the lecture by clicking a button. There are frequent opportunities to interrupt the flow and get more detail by clicking buttons or rolling the mouse over well-marked active spots on the screen. Navigation is a real strong suit. You can jump from slide to slide and go back and forth within a slide at will.
The animations are incredibly effective in many cases. For example, the slide on the clustering of microarray data starts with a spreadsheet of expression ratios, colors the cells red or green depending on the ratio, morphs the spreadsheet into a heat map that has the colors but not the numbers, then rearranges the rows of the heat map the way a clustering algorithm would, and finally ends up with the typical picture you see in all the published microarray papers with the red and green squares nicely grouped in big blocks. The animation really demystifies the clustering process. They also have great animations of dot plots, Smith-Waterman, and many other concepts.
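For readers who would rather see the animation's steps as code, here is a minimal Python sketch of the same idea: take a table of expression ratios, color each cell, then reorder the rows the way a clustering algorithm would. The gene names, ratios, color threshold, and the choice of average-linkage clustering with Euclidean distance are all assumptions made for illustration; this is not GeneEd's method or any particular package's implementation.

```python
import math

def log_ratios(ratios):
    """Convert expression ratios to log2 so up- and down-regulation are symmetric."""
    return [[math.log2(r) for r in row] for row in ratios]

def color(value, threshold=0.5):
    """Map a log2 ratio to a heat-map cell: R (up), G (down), or . (little change)."""
    if value > threshold:
        return "R"
    if value < -threshold:
        return "G"
    return "."

def distance(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_order(rows):
    """Average-linkage agglomerative clustering; returns row indices in leaf order."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                total = sum(distance(rows[a], rows[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0] if clusters else []

def heat_map(names, rows):
    """Render each row as a gene name followed by its colored cells."""
    return [f"{name:<6}" + " ".join(color(v) for v in row)
            for name, row in zip(names, rows)]

# Made-up data: four genes measured across four arrays.
names = ["geneA", "geneB", "geneC", "geneD"]
ratios = [
    [4.0, 3.0, 0.25, 0.3],
    [0.25, 0.5, 4.0, 2.0],
    [3.5, 4.0, 0.2, 0.25],
    [0.3, 0.25, 3.0, 4.0],
]

logged = log_ratios(ratios)
order = cluster_order(logged)
clustered = heat_map([names[i] for i in order], [logged[i] for i in order])
print("\n".join(clustered))
```

With this toy data, the two genes that are up in the first two arrays end up next to each other, as do the two that are up in the last two, which is the "big blocks" effect the animation illustrates.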
The suave speaker is an actor reading a script. It's not as colorful as S-Star's approach of having each expert lecture in his own voice, but there are fewer "ums" and "ers," and the locution is perfect. The actor's occasional mispronunciation of a famous scientist's name or technical term serves as an amusing diversion.
The courses are divided into multiple sections, each of which starts with learning objectives and ends with a short exam that tests how well you met them. One annoyance is that the questions are not multiple choice, and the answer-checking software is not very smart. For example, 'nucleotide sequence' is marked wrong when the program is looking for 'DNA sequence.'
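A slightly smarter checker is not hard to imagine. The Python sketch below normalizes case and punctuation and maps known synonyms to one canonical form before comparing. The synonym table and function names are hypothetical, and treating "nucleotide sequence" as equivalent to "DNA sequence" is an assumption borrowed from the example above; nothing here reflects how GeneEd's software actually works.

```python
import re

# Hypothetical synonym table: each phrase is rewritten to its canonical form.
SYNONYMS = {
    "nucleotide sequence": "dna sequence",
    "deoxyribonucleic acid": "dna",
}

def normalize(answer):
    """Lowercase, strip punctuation, collapse whitespace, then apply synonyms."""
    text = re.sub(r"[^a-z0-9\s]", " ", answer.lower())
    text = " ".join(text.split())
    for phrase, canonical in SYNONYMS.items():
        text = text.replace(phrase, canonical)
    return text

def is_correct(answer, expected):
    """Accept an answer if it matches the expected one after normalization."""
    return normalize(answer) == normalize(expected)
```

Under these assumptions, `is_correct("Nucleotide sequence.", "DNA sequence")` is accepted, while an unrelated answer such as "protein sequence" is still rejected.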
Naturally, all this polish comes at a cost: $300 for introductory sequence analysis, $500 for the advanced course and SNPs, and $1,000 for microarray informatics.
The Semi-Academics
Organized courses are also offered by some semi-academic groups.
One is the Stanford Center for Professional Development, a unit of Stanford University that offers university courses to industry folks. It lists one biomedical informatics course and two bioinformatics courses among 250 offerings. I suspect the courses are very polished, though I've never seen them. The fees are pretty steep ($3,000 to $4,000 a pop), but this is for an entire graduate-level course. Some courses require an additional corporate membership.
A more affordable semi-academic option is the Bioinformatics Institute of India, a nonprofit educational and R&D center. BII offers one-year distance learning courses in bio-, chem-, and medical informatics, with promises of more to come. The fee is a modest $400 a year.
Lecture Notes Galore
Beyond the organized courses, the Web has stacks of online lecture materials intended for students of regular courses at universities and elsewhere (see table). Some places restrict access to registered students or local machines, but most seem to be open. I encountered a lot of dead links while surfing this material; this is not too surprising, as much of it was for use by students in courses from last semester or even earlier. (All the links in the table were live when I last touched them, but no promises.)
I was impressed by the quality of the material available. You can learn a lot from this stuff, although many of the university courses are aimed at computer science or math majors and might be heavy going for biology folks. Mixed in are short courses and practicums that offer useful how-to information.
Two good introductory courses are the ones by Brian Fristensky from the University of Manitoba and Robert Murphy from Carnegie Mellon University. Fristensky's course is probably better for a biologist and Murphy's for a computer scientist.
More comprehensive are courses by Russ Altman from Stanford and George Church from Harvard.
Altman covers sequence alignment and dynamic programming, multiple sequence alignment, protein structure and alignment, RNA secondary structure, microarrays, ontologies, motifs, hidden Markov models, energetics, genetic networks, gene finding, comparative genomics, phylogenetics, natural language processing, and proteomics. He ends with a lecture on career opportunities.
Church starts with introductory material on computational and statistical issues, then moves on to comparative genomics, polymorphisms and population genetics, pharmacogenomics, dynamic programming, BLAST, multiple sequence alignment, hidden Markov models, microarrays and other gene-expression methods, protein structure and drug design, mass spectrometry, metabolic networks, molecular computing, and finally cellular, developmental, social, ecological, and commercial networks.
It's hard to believe that these are one-semester courses. The homework alone would take me two years — I'm starting to remember why I was so glad to be done with school.
Hard Copies
Two books show up in a lot of courses: Bioinformatics: Sequence and Genome Analysis by David Mount and Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Second Edition, edited by Andreas Baxevanis and Francis Ouellette. A third book I like a lot is Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology by Dan Gusfield. Each has its own strengths.
Mount's book is rigorous with a heavy emphasis on statistical issues. He states in the preface that the book is "written mainly for biologists," and the early pages have that feel. But the going gets pretty tough by mid-book.
The Baxevanis and Ouellette anthology is more varied. Overall, it's more of a how-to book that explains how to use various programs and databases.
Gusfield's book is a mainstream computer science text motivated by biological sequence analysis problems. It has the best explanations I've seen anywhere of Smith-Waterman, BLAST, and other important algorithms. But it's aimed at CSers; if you don't know what O(n log n) means, this book is not for you.
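To give a taste of what these algorithms look like, here is a minimal Python sketch of Smith-Waterman local alignment. It is a simplified illustration, not code from any of the books: the scoring values (match +2, mismatch -1, gap -2) and the linear gap penalty are assumptions chosen to keep the example short. It fills a dynamic programming table in time proportional to the product of the two sequence lengths, then traces back from the best-scoring cell.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned piece of a, aligned piece of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the table; a cell never drops below zero, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score falls to zero.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

Calling `smith_waterman("GGGACGTAAA", "CCCACGTCCC")` picks out the shared `ACGT` stretch and ignores the unrelated flanking bases. Production tools typically add refinements such as substitution matrices and affine gap penalties, which this sketch leaves out.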
The Upshot
Bioinformatics courseware is within everyone's reach. The open scourse S-Star is a great place to begin. For those with more money than time, GeneEd is a great next stop. For those with enough time for a full academic program, the Stanford Center for Professional Development and the Bioinformatics Institute of India sound intriguing, though I haven't seen the actual courseware of either.
To choose among the other online material, you'll need to spend an hour or two surfing the stacks.
Among the books, Mount is the choice for statistical aspects of important methods, Baxevanis and Ouellette has the most practical information, and Gusfield is a must-read for math and CS people. Oh heck, just buy them all — books are cheap.
It's time to hit the books or computers. Happy learning, everyone.
THE STARS OF S-STAR
Title, Lecturer
Introductory Molecular Biology, Anthony Weiss
An Overview of the Computational Analysis of Biological Sequences, Subramanian Subbiah
Transcript Analysis, Winston Hide
Comparative Genomics, Liping Wei
Representations and Algorithms for Computational Molecular Biology, Russ Altman
Protein Structure Primer, Shoba Ranganathan
Protein Structure Prediction, Betty Cheng
Protein Physics, Betty Cheng
Genomics and Computational Molecular Biology Genomics, Douglas Brutlag
Protein and Nucleic Acid Structure, Dynamics, and Engineering, Michael Levitt
Proteomics, Marc Wilkins
Proteomes: Proteins Expressed as a Genome, Jan-Olov Hoog
Structure Prediction for Macromolecular Interactions, Julie Mitchell
Protein-Ligand Modeling, Ten Eyck
Nat's Roundup of Online Course Material
Institution | Name | Author | Date | General description | Other topics | URL |
Stanford University | Representations and Algorithms for Computational Molecular Biology | Russ Altman | Spring 2002 | University course; nice set of links; parts available at S-Star; commercial version at Stanford Center for Professional Development | Genetic networks, energetics, text mining | www.smi.stanford.edu/projects/helix/bmi214/ |
University College, London | An Interactive Web Practical in Bioinformatics | Terri Attwood, Alex Michie | Current | Online companion to Attwood & Michie book | | www.biochem.ucl.ac.uk/bsm/dbbrowser/jj/prefacefrm.html |
San Diego Supercomputer Center | Biological Data and Analysis Tools | Philip Bourne | Fall 2001 | University course | Data modeling, ontologies, XML | www.sdsc.edu/pb/edu/pharm201/ |
Harvard University | Genomics and Computational Biology | George Church | Fall 2001 | University course | Proteomics, metabolic kinetics | www.courses.fas.harvard.edu/~bphys101/lecturenotes/index.html |
University of British Columbia | Topics in Algorithms and Complexity - Bioinformatics | Anne Condon, Holger Hoos | Spring 20021 | University course; notes, not slides | | www.cs.ubc.ca/labs/beta/Courses/CPSC536A-01/ |
University of Southern California | Computational Genome Analysis | Richard Deonier, Simon Tavaré, Michael Waterman | Spring 2002 | University course; looks like notes for a textbook | Sequence material only HMMs | www.hto.usc.edu/temp_local/bisc478/notes.html |
University of Manitoba | Bioinformatics | Brian Fristensky | Spring 2002 | University course; very detailed notes, not slides; reasonable introductory course | Linkage | www.umanitoba.ca/faculties/afs/plant_science/courses/39_769/ |
Yale University | Genomics & Bioinformatics | Mark Gerstein | Fall 2001 | University course; online material for some lectures only | | bioinfo.mbb.yale.edu/mbb452a/2001/ |
University of Toronto | Proteomics and Bioinformatics | Walid Houry, Boris Steipe, guest lecturers | Spring 2002 | University course; online material for some lectures only | Some proteomics; part of genomics course that includes wet material | bioinfo.med.utoronto.ca/Biochemistry/BCH2021S.html |
University of Colorado Health Sciences Center | Introduction to Bioinformatics | Larry Hunter | Fall 2001 | University course | Text mining | compbio.uchsc.edu/Hunter_lab/Hunter/intro-course/ |
University of Massachusetts, Lowell | BioInformatics | James Lyons-Weiler | Fall 2001 | University course; online material for some lectures only | | unavailable |
Biotechnology Computing Facility, Arizona Research Laboratories | Computing Concepts for Bioinformatics | Nirav Merchant | 2000-01 | Programming languages | | amadeus.biosci.arizona.edu/~nirav/ |
California State University at Los Angeles | Introduction to Bioinformatics | Jamil Momand | Spring 2002 | University course | | www.calstatela.edu/faculty/jmomand/Bioinformaticscourse.html |
Biotechnology Computing Facility, Arizona Research Laboratories | Presentations by David W. Mount | David Mount | 2000-01 | University course | | amadeus.biosci.arizona.edu/~mount/ |
Cold Spring Harbor Laboratory Press | Bioinformatics Online | David Mount | Current | Online companion to Mount's book | | www.bioinformaticsonline.org/ |
Carnegie Mellon University | Computational Biology | Robert Murphy | Spring 2002 | University course | | www.cmu.edu/bio/education/courses/03310/LectureNotes/ |
University of Washington | Computational Biology | Larry Ruzzo | Fall 2001 | University course; notes, not slides | | www.cs.washington.edu/education/courses/527/01au/ |
SUNY Stony Brook | Advanced Algorithms (Computational Biology) | Steven Skiena | Fall 2000 | University course; notes, not slides | | www.cs.sunysb.edu/~skiena/648/ |
Tel Aviv University | Algorithms in Molecular Biology | Ron Shamir | Fall 2000 | University course; notes, not slides | Linkage | www.math.tau.ac.il/~rshamir/algmb/01/algmb01.html |
Tel Aviv University | Analysis of Gene Expression Data, DNA Chips and Gene Networks | Ron Shamir | Spring 2002 | University course; notes, not slides; some slow to load & poor visual quality | Genetic networks | www.cs.tau.ac.il/~rshamir/ge/02/ge02.html |
University of Washington | Computational Biology | Martin Tompa | Winter 2000 | University course; looks like notes for a textbook | Many specialized sequence topics | www.cs.washington.edu/education/courses/527/00wi/ |
Boston University | DNA and Protein Sequence Analysis | Zhiping Weng | Fall 2001 | University course | | sullivan.bu.edu/be561/cal.shtml |
Tel Aviv University | Introduction to Bioinformatics Course | Racheli Zakarin, Hanan Stein | Spring 2002 | University course | | www.tau.ac.il/~hanans/course_spring_2002.html |
KISAC (Karolinska Institute) | KISAC Bioinformatics | | Spring 2002 | Five-day course | | kisac.cgr.ki.se/kisac/education/courses/KIspring2002-KISAC/ |
Swiss Institute of Bioinformatics | Introduction to Bioinformatics | | Fall 2002 | One-week course; slides from 2001 online now | | www.ch.embnet.org/CoursEMBnet/Pages02/Material.html |
Human Genome Mapping Project | Introductory Bioinformatics | | July 2001 | Training course | | www.hgmp.mrc.ac.uk/Courses/Intro_3day/ |
Resource Centre Universität Bielefeld | Virtual School of Natural Sciences BioComputing | | 1995-96, mostly | Heavily cited, but quite old; lots of material | | www.techfak.uni-bielefeld.de/bcd/welcome.html |
READING LIST FOR THINKERS
Andreas D. Baxevanis and Francis Ouellette. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Wiley-Liss; ISBN: 0471383910; 2nd edition (April 6, 2001).
David Mount. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory; ISBN: 0879696087; 1st edition (March 15, 2001).
Dan Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press; ISBN: 0521585198; 1st edition (January 15, 1997).
AND FOR YOUR NEXT COURSE …
S-Star.org
http://s-star.org/
GeneEd
http://www.geneed.com/
Stanford Center for Professional Development
http://scpd.stanford.edu/
Bioinformatics Institute of India
http://www.bioinformaticscentre.org/