Researchers at the Institute for Research in Biomedicine in Barcelona, Spain, have launched what they claim is the largest library of molecular dynamics trajectories for proteins — simulations of the molecules' 3D structures that reflect changes in shape over time.
The Molecular Dynamics Extended Library, or MoDEL, was launched last month and is the culmination of a four-year project led by Modesto Orozco, a professor of biochemistry and molecular biology at the University of Barcelona who heads the molecular modeling and bioinformatics group at IRB.
"Nowadays we design drugs as if the proteins against which they are to act were static, and this goes a long way to explain failures in the development of new drug therapies because this is not a true scenario," Orozco said in a statement. "With MoDEL, this problem is solved because it offers the user from 10,000 to 100,000 photos per protein, and these confer movement to these structures and allow a more accurate design."
A paper describing the project was published last month in Structure.
In the paper, the researchers write that MoDEL contains tools to automatically set up molecular simulations and to validate trajectories. Also included is a data warehouse, comprising a relational database and the underlying trajectories database, as well as tools for analysis.
Josep Lluís Gelpí, a professor in the biochemistry and molecular biology department at UB and a member of the IRB research group, told BioInform this week that the goal of the project was to "add movement" to protein structures deposited in the Protein Data Bank.
The new database not only includes the structure of the protein as represented in PDB, but also offers "a long enough simulation of the life of the protein" to see how the structure "accommodates different shapes and how flexible [it] is in order to be able to recognize different kinds of molecules."
By combining thousands of snapshots of the proteins, the researchers were able to create moving images of nearly 2,000 proteins culled from the PDB.
Gelpí pointed out that although PDB contains "a good number of protein structures" that have been gathered over the years, the images are "static" and do not account for things like flexibility and possible changes to the proteins' shapes, which are crucial to understanding their interactions with other molecules — particularly drug molecules.
A similar project focused on modeling protein stability, function, and folding based on data from the PDB is underway at the University of Washington, led by Valerie Daggett, a professor of bioengineering (BI 7/7/2008).
That database, called Dynameomics, aims "to characterize the native state dynamics and the folding and unfolding pathway of representatives from all known protein folds by molecular dynamics simulation," the developers state on their website, adding that they are continuing to include other proteins in the database.
So far, the Dynameomics group has performed nearly 11,000 simulations of more than 2,000 proteins for a combined simulation time of more than 340 microseconds, though the site only contains simulations for the group's top 100 targets.
For this first phase of the MoDEL project, Gelpí said that the team selected images of monomeric proteins from PDB because these were "the easiest to simulate."
The paper states that to produce the target list of proteins, the group selected proteins "with less than 90 percent sequence identity with other proteins" and then eliminated proteins that contained gaps in the structure, multimeric proteins, proteins with nonstandard residues, and proteins containing ligands.
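The selection described in the paper can be pictured as a two-step filter. The sketch below is only an illustration under stated assumptions, not the MoDEL team's pipeline: the `Entry` record, its field names, and the idea of a precomputed 90 percent identity cluster label are all hypothetical, and the entry IDs in the usage note are placeholders rather than real PDB codes.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Entry:
    """Hypothetical summary of one structure; field names are assumptions."""
    entry_id: str
    cluster_id: int              # assumed precomputed 90% sequence-identity cluster
    chain_count: int             # 1 means monomeric
    has_gaps: bool
    has_nonstandard_residues: bool
    has_ligands: bool


def is_simple_monomer(entry: Entry) -> bool:
    """Apply the four exclusion criteria listed in the paper."""
    return (
        entry.chain_count == 1
        and not entry.has_gaps
        and not entry.has_nonstandard_residues
        and not entry.has_ligands
    )


def select_targets(entries: Iterable[Entry]) -> List[Entry]:
    """Keep one representative per identity cluster, then drop excluded entries."""
    representatives = {}
    for entry in entries:
        # The first entry seen in each cluster stands in for the whole cluster.
        representatives.setdefault(entry.cluster_id, entry)
    return [e for e in representatives.values() if is_simple_monomer(e)]
```

Calling `select_targets` on a list in which two entries share a cluster and others are multimeric, gapped, or ligand-bound returns only the first member of each cluster that also passes all four checks. Because the redundancy step runs first, a cluster whose representative fails a check is dropped entirely, which mirrors the order in which the article describes the steps.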
Although the team only took a small subsection of the PDB's 40,000 structures, the developers note that the selected proteins probably represent about 40 percent of the total since proteins with similar sequences often have similar structures.
As a next step, the scientists created trajectories for some proteins spanning a 10-nanosecond time scale, which Gelpí noted was considered sufficient when the project began four years ago.
However, as protein simulation technologies have improved since then, the team has been able to create some protein simulations that span hundreds of nanoseconds, as well as a few on the microsecond scale, he said.
The proteins were simulated using the Assisted Model Building with Energy Refinement, or AMBER, software, although Gelpí noted that the team has used and currently uses other types of simulation software.
The simulations are stored in a MySQL database that occupies about 20 terabytes on IRB's server.
Researchers can query the database for simulations by keywords, protein sequence or identification number, or by protein fold.
Gelpí said that the developers currently provide a compressed version of the simulation files that offers users a "90 percent equivalency of the original trajectory."
That's because the files, which range in size from 1 gigabyte to hundreds of gigabytes, are too large for researchers to download for use in their in-house research.
Gelpí explained that to collect the images of proteins on the 10-nanosecond scale, for example, the team took snapshots of the protein's evolution every picosecond, resulting in 10,000 snapshots. Furthermore, these snapshots include images of the water medium surrounding the proteins, which is a major factor in the large file size.
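The arithmetic behind those figures is easy to check. The short sketch below reproduces the snapshot count from the trajectory length and sampling interval, and gives a rough idea of why the solvent dominates file size. The atom counts and the 4 bytes per coordinate are illustrative assumptions, not numbers reported by the MoDEL team.

```python
def snapshot_count(length_ns: float, interval_ps: float) -> int:
    """Number of snapshots when sampling every interval_ps over length_ns."""
    return round(length_ns * 1000 / interval_ps)  # 1 ns = 1,000 ps


def raw_size_bytes(n_snapshots: int, n_atoms: int, bytes_per_coord: int = 4) -> int:
    """Uncompressed coordinate storage: x, y, z per atom per snapshot."""
    return n_snapshots * n_atoms * 3 * bytes_per_coord


frames = snapshot_count(length_ns=10, interval_ps=1)  # → 10000

# Assumed, illustrative system sizes: a small protein alone versus the same
# protein with its surrounding water.
protein_only = raw_size_bytes(frames, n_atoms=2_000)    # 240,000,000 bytes
with_water = raw_size_bytes(frames, n_atoms=30_000)     # 3,600,000,000 bytes
```

Under these assumptions the solvated system needs 15 times the storage of the protein alone, which is consistent with Gelpí's point that the water medium is a major factor in file size.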
The compressed files were created using the PCAzip toolkit from the Centre for Biomolecular Sciences at the UK's University of Nottingham. The MoDEL team reduced the file size even further by excluding unnecessary details such as the images of the simulation medium.
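As the name suggests, PCAzip relies on principal component analysis. The sketch below shows the general idea with NumPy and is not the PCAzip implementation or its file format: a trajectory is stored as an average structure, a small set of principal components, and each snapshot's projection onto them. Reading "90 percent equivalency" as 90 percent of the coordinate variance retained is an assumption made for this example, and the function names are hypothetical.

```python
import numpy as np


def pca_compress(coords: np.ndarray, variance_kept: float = 0.90):
    """Compress a trajectory of shape (n_snapshots, 3 * n_atoms).

    Returns the mean structure, the retained components (k, 3 * n_atoms),
    and the per-snapshot projections (n_snapshots, k).
    """
    mean = coords.mean(axis=0)
    centered = coords - mean
    # Rows of vt are principal components; s**2 is proportional to variance.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    # Smallest number of components whose cumulative variance reaches the target.
    k = min(int(np.searchsorted(cumulative, variance_kept)) + 1, len(s))
    projections = u[:, :k] * s[:k]
    return mean, vt[:k], projections


def pca_decompress(mean: np.ndarray, components: np.ndarray,
                   projections: np.ndarray) -> np.ndarray:
    """Rebuild approximate coordinates from the compressed representation."""
    return mean + projections @ components
```

When most of the motion is concentrated in a few collective modes, the number of retained components is far smaller than the number of coordinates, so storing the projections instead of every coordinate shrinks the file. The reconstruction is approximate: the variance that is left out is lost.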
The team also provides services, using a suite of in-house tools that are part of its MoDEL Professional Platform, to groups looking for custom simulations of proteins that aren’t currently in MoDEL, or that require different environmental conditions for the simulations, among other research needs.
These tools include MDGrid, a tool for simulating protein channels, and the Classical Molecular Interaction Potentials, or CMIP, software, which analyzes macromolecular properties using three-dimensional grid representations.
A third tool, Molecular Dynamics Web Server, or MDWeb, provides standard protocols to prepare structures, run molecular dynamics simulations, and analyze trajectories.
For their next steps, the group plans to add new simulations, especially for multimeric proteins, proteins with ligands that are not well represented in the PDB, and proteins that can be used as targets for drug design, as well as specific groups of proteins such as kinases or transcription factors.