
Project Annotation


Researchers who study well-documented creatures like fruit flies or E. coli are blessed with the most powerful tools of biology at their fingertips. Armed with data from genome sequences, transcriptomes, ontologies, and pathways, the goal of putting pieces together in a network of interacting genes can be addressed pretty quickly.

But what if you study something else, like corn or wheat? The outskirts of biology tell a different story. Once you venture away from the well-annotated references in the model organism world, the set of research tools gets sparse. The databases look less and less complete. Annotations, if you can find them, are shadier, and words like "hypothetical" and "putative" appear around every corner. And those tools that we enjoy in the civilized world, like Gene Ontology (GO) terms or KEGG pathways? Good luck. Your genome sequence, if you have one, probably won't even merit a track at the UCSC genome browser, much less a current build.

So what do you do? Work on something else and wait for your 'omics collection to catch up? Or is there another answer?

The student force

One strategy is to seek help. Some college instructors have realized that education and annotation can coexist. They've found that involving students can speed the data annotation process and train future biologists at the same time. Rather than waiting for database curators to catch up on an ever-growing mountain of work, researchers benefit when lots of students are added to the labor pool.

This summer, I attended the iPlant Genomics Education conference in St. Louis and listened to talks about students' contributions to the world of annotating 'omics databases. Instructors spoke passionately about the results they saw from integrating the realms of teaching and research.

We learned about many kinds of annotation projects. Charles Hardnet from Spelman College showed electron micrographs of the phage that his students isolated and annotated through their school's partnership with HHMI. For the bacterial projects, Derek Wood, Brad Goodner, Daniel Rhoades, and Steve Slater from colleges across the US described a joint endeavor where students carried out functional annotation by using bacterial mutants to test whether genes can correct deficiencies in nutritional pathways. Cheryl Kerfeld from the Joint Genome Institute discussed a program where 65 bacterial and archaeal genomes can be adopted and annotated by college classes. Representing insects, Sarah Elgin of Washington University discussed a national partnership where students assemble and annotate genes from different species of Drosophila.

We also heard about annotation projects in plants. Brent Buckner from Truman State University described the use of yeast mutants to identify the functions of plant genes. Sue Wessler and Jim Burnette from the University of Georgia discussed annotating plant transposons. Iowa State's Volker Brendel presented a system where the entire community can use cDNA and EST evidence to evaluate and correct structural annotations that have been flagged as suspicious in the Plant Genome Database. And Uwe Hilgert of Cold Spring Harbor Laboratory presented a vision of the ultimate annotation pipeline.

Many iPlant attendees seemed to agree that the ultimate annotation pipeline would allow a biology class to check out a genome, review evidence, submit corrections, and give students some kind of visible credit so they could demonstrate their contribution to others.

The downside

Although the idea of combining useful work with directed studies might seem rather obvious, not everyone shares the belief that engaging students in real annotation work is a good thing.

From the researchers' perspective, the major concern with having students annotate gene sequences is quality. Wary researchers point to past horror stories of poor-quality or misidentified sequences, like dinosaur DNA, contaminating GenBank. They are cautious about student work and want the barriers kept high, lest an incorrect annotation lead them down a wrong research path.

Certainly, these researchers are right to be wary, but this attitude shouldn't keep students from contributing. After all, many of the suspicious entries came from other researchers and were reviewed by their peers. The dinosaur entry (U41319), for example, was published in 1997 in Molecular Biology and Evolution. Further, many entries aren't supported by experimental evidence; they're predicted by algorithms.

If anything, teaching students how to review the evidence by having them annotate data could help demystify the whole database business and make future researchers a little more skeptical. Students would learn that "buyer beware" and reviewing evidence with a critical eye are good policies to follow no matter where your data originates.

Given the number of students in some courses and the push for cost-effective solutions, some skepticism from the professional researchers might be justified. It is difficult to imagine that students in an undergraduate class with 200 or more students, five teaching assistants, and a very tight lab schedule would be able to finish annotation projects or make contributions that wouldn't have to be double- or triple-checked. In fact, the National Science Foundation's Vision and Change survey of undergraduate biology students, conducted during the past two years, found that one of the most common complaints was that students' schedules don't include time for repeating labs or debugging mistakes. Perhaps a system could be adopted where professional curators would spot-check student work and assign reliability ratings (behind the scenes, of course) to classes from different schools.

In another argument, some iPlant educators shared stories of peers who criticized them for "using their students as technicians," accused them of exploiting their students, and suggested that they were allowing their students' education to suffer by having them do research. Still other instructors worry about the competition for time. Research projects are time-intensive endeavors, and there's considerable pressure to pack as many topics into a course as possible. Would the time required to do a real project push other important topics out of the curriculum?

The concern about time is legitimate. Lab activities are often used to reinforce concepts from the lecture portion of the course. Some students learn best through hands-on learning, and these students could suffer if certain kinds of activities are lost.

Does the possibility of a few mistakes slipping by reviewers and contaminating databases mean we should keep students from participating in annotation projects? Does the possibility that their learning experience might change, or that change might be hard, mean we should keep things as they are? To paraphrase Paul Farmer: this is not the end of the conversation. This is the beginning.

The case for inclusion

One of the biggest drivers for including students in annotation projects comes from the students themselves. The NSF Vision and Change student survey found that students want more of an emphasis on applying knowledge and problem-solving. They want more chances to do research and learn how research is done. They want to work with real data and practice evaluating scientific evidence. What they do not want are courses full of memorization that feel disconnected from "real world" science.

Although incorporating research means that learning opportunities will change, the benefits to students can be many. These projects can give students a chance to practice wet lab and bioinformatics techniques and give them better preparation should they pursue a career in life sciences or biotechnology. Students relish the chance to do something more meaningful than writing a lab report. They want to "publish" and contribute to the community. I'm glad there are instructors giving some of them a chance.

Sandra Porter, PhD, is president of the bioinformatics education company Digital World Biology. Her blog is located at
