“Very often people in the mouse community or human genetics community consider sequence annotation to be a solved problem,” said Callum Bell of the National Center for Genome Resources. “But we’ve found that there are many, many scientific communities who are underfunded in the area of bioinformatics and they’re not able to do it for themselves.”
In response, Bell and his team of five NCGR colleagues developed an automated system for storing, analyzing, and visualizing DNA sequences. The system is housed behind a firewall at NCGR in Santa Fe, NM, and users access it remotely.
The software, called XGI for the X Genome Initiative, sprang from two prototypes: the Phytophthora Genome Initiative, first released in 1998, and the Medicago Genome Initiative, which was released in 2000. The reengineered version allows users to analyze and compare data from a variety of species and also offers a higher degree of flexibility and portability. An additional feature of XGI is automated annotation of sequence data through the use of Gene Ontology terms.
“There are things that many of us take for granted as being relatively easy in the bioinformatics world, but for users producing sequence, quite often they lack the computer experience to make that a smooth process,” said Bell. XGI was designed to automate as much of the process as possible. The system comprises an automated sequence analysis pipeline, a supporting relational database schema running on Sybase, and a web-based user interface. The sequence analysis step clusters expressed sequence tags (ESTs) into non-redundant sets and derives consensus sequences to reduce the complexity of subsequent analysis, Bell said.
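For readers unfamiliar with EST clustering, the idea can be illustrated with a toy sketch. This is not NCGR's pipeline, whose internals the article does not describe. It assumes a simple single-linkage rule (two reads join the same cluster if they share any exact k-mer) and uses the longest read in each cluster as a stand-in for a true consensus sequence, which in practice would require alignment or assembly. The function names and the value of `k` are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(seqs, k=8):
    """Group reads that share at least one exact k-mer (single linkage).

    Assumption: exact k-mer sharing stands in for real overlap detection.
    Reverse complements and sequencing errors are ignored in this sketch.
    """
    parent = list(range(len(seqs)))  # union-find forest over read indices

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # k-mer -> index of the first read containing it
    for idx, seq in enumerate(seqs):
        for km in kmers(seq, k):
            if km in first_seen:
                # Merge this read's cluster with the earlier read's cluster.
                parent[find(idx)] = find(first_seen[km])
            else:
                first_seen[km] = idx

    clusters = defaultdict(list)
    for idx, seq in enumerate(seqs):
        clusters[find(idx)].append(seq)
    return list(clusters.values())


def representatives(clusters):
    """Pick the longest read per cluster as a placeholder for a consensus."""
    return [max(cluster, key=len) for cluster in clusters]
```

Given three reads of which two overlap, `cluster_ests` would return two clusters, and `representatives` would return one sequence per cluster, which is the sense in which clustering reduces the volume of downstream analysis.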
Users submit raw sequence data through the internet and are then able to access and search their processed data once it passes through the pipeline and is stored in the relational database. The data stream between the user and NCGR is encrypted for security.
Bell said the turnaround time for data processing depends on the volume of sequence, but NCGR provides a database mirroring system so that users can view existing data without interruption while new data is being processed.
While the goal of the XGI project was to create a remote bioinformatics system so that users require only a web browser, Bell said it could also be installed locally to enable complete control over the system. NCGR is currently working on its first installed XGI system for Plant Research International, a non-profit research group based in Wageningen, the Netherlands. Plant Research International is collaborating with NCGR on development of the system and the two groups will share distribution of any new features.
The first user of the web-based system is another NCGR collaborator, the Samuel Roberts Noble Foundation of Ardmore, Okla. NCGR is also working on development of the system with the Syngenta Phytophthora consortium and with a second consortium of scientists interested in Phytophthora.
The Noble Foundation, the Novartis Foundation, and the USDA provided the funding to develop XGI, but Bell said NCGR is currently working out the best way to support its processing, maintenance, and customization costs. The group is trying to arrive at a coherent set of costs for non-profit and commercial customers, as well as a pricing system for local installation.
Last week, the NCGR also announced the full users’ release of its ISYS software to integrate bioinformatics software and databases. Free evaluation copies of ISYS can be downloaded from www.ncgr.org/isys.
XGI will soon be publicly available at www.ncgr.org/mgi.