By Aaron J. Sender
It’s not unusual these days for a talented grad student to develop software to automate his labmates’ work. Yet when Jin-Long Li, now a postdoc at Creighton University in Omaha, submitted a paper to Genome Research describing his software for genotype data, he got a call from an incredulous reviewer. “He asked me very seriously,” Li recalls, “‘Are you sure that Microsoft Access can manipulate all this data?’”
But Li has pushed the run-of-the-mill software to its limits, transforming it into an automated genotype management system called GenoDB that lets his colleagues spend more time conducting experiments and analyses and less time whipping the data into shape.
Creighton researchers Robert Recker and Hong-Wen Deng began whole-genome scans for genes linked to osteoporosis two and a half years ago. It wasn’t long before they had a data-handling problem. They were generating hundreds of thousands of pieces of genotype and phenotype data. “To keep it all straight, doing it by hand is tedious and almost impossible,” says Recker. In fact, the lab’s 10-plus students and technicians were spending more than 80 percent of their time managing the data. “It’s a big informatics problem. Every laboratory that does this work struggles with it and there aren’t a lot of commercial solutions,” says Recker.
So they took matters into their own hands. Li, then a PhD candidate in their lab who was also earning a master’s in computer science, was charged with building an automated database management system to relieve this bottleneck in high-throughput genotyping. He turned to Microsoft Access, which, he says, offers a small university lab exactly what it needs despite some limitations. “It’s cheap, commonly available and it’s portable, so everybody in the laboratory can use it,” Li says. The raw genotype data are simply converted into an Excel file and dumped into the Access database.
GenoDB, designed to process data generated from microsatellite markers, has four main automated features: comparison of data from experiments run by different individuals, adjustment for discrepancies in allele size between experiments run on different gels or machines, automatic classification of the alleles by fragment size, and compilation of the data into Excel files that can be checked against hereditary patterns. Furthermore, when experiments are repeated, GenoDB automatically checks whether a corresponding file already exists.
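For readers who want a concrete picture of three of these steps, here is a minimal sketch in Python. It is not GenoDB's code, which the article does not show. The function names, the control-sample correction, and the 0.5 base-pair gap threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def adjust_for_gel_offset(sizes: List[float],
                          control_observed: float,
                          control_expected: float) -> List[float]:
    """Shift fragment sizes by the error seen on a control sample.

    Assumption: each gel or machine run includes a control fragment of
    known size, so a single additive offset corrects the whole run.
    """
    offset = control_expected - control_observed
    return [round(size + offset, 2) for size in sizes]


def bin_alleles(sizes: List[float], max_gap: float = 0.5) -> List[int]:
    """Group fragment sizes into allele bins.

    Sizes are sorted, and a new bin starts whenever the gap to the
    previous size exceeds max_gap (a hypothetical threshold in base
    pairs). Returns one bin label per input size, in input order.
    """
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    labels = [0] * len(sizes)
    current_bin = 0
    for rank, idx in enumerate(order):
        if rank > 0 and sizes[idx] - sizes[order[rank - 1]] > max_gap:
            current_bin += 1
        labels[idx] = current_bin
    return labels


def find_discrepancies(calls_a: Dict[str, Tuple[int, int]],
                       calls_b: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return IDs of samples scored by both people with differing calls.

    Each call is a pair of allele bin labels; the order within the pair
    is ignored, since (0, 1) and (1, 0) describe the same genotype.
    """
    shared = calls_a.keys() & calls_b.keys()
    return sorted(s for s in shared
                  if sorted(calls_a[s]) != sorted(calls_b[s]))
```

In this sketch, sizes from a run would first be passed through `adjust_for_gel_offset`, then through `bin_alleles`, and the binned calls from two scorers would be compared with `find_discrepancies` so that only the conflicting samples need a second look.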
After GenoDB was installed at the Creighton lab, the share of time spent on data management dropped from more than 80 percent to 30 percent. But as more and more data were produced, Microsoft Access began bursting at the seams. “When you manage lots of data the speed is very slow and sometimes the program crashes,” says Li. To deal with this problem, the researchers divided their set of 250,000 genotypes into 28 subclasses and set up a separate GenoDB for each one.
The software may have its limitations, but Li points out that what the lab has made available free to academic researchers is only version 1.0. He is now working on extending his GenoDB algorithms and modules to more robust database platforms, such as Microsoft SQL Server and Oracle.
As is, GenoDB may not hold up to the standards of large commercial genotyping projects, but it serves a need for academic labs with tight budgets. “It would seem useful to have different levels of data management software,” says Recker. “And this provides a level of data management software which would be useful for university laboratories and smaller industry laboratories.”