They say you can’t go home again, but Victor Markowitz is hoping to prove that adage wrong. Markowitz left the data management research and development group at Lawrence Berkeley National Laboratory in 1997 to join Gene Logic as CIO. Now he is heading back to the public sector, and he aims to bring a little hard-won industry know-how with him.
Markowitz is setting up the newly formed Biological Data Management and Technology Center (BDMTC) at LBNL, a “virtual” center that will provide informatics expertise for life science researchers at the lab as well as at other Bay Area organizations, such as the DOE’s Joint Genome Institute, the University of California, Berkeley, and UC San Francisco. In addition, Markowitz said the BDMTC is taking steps to become affiliated with the Institute for Quantitative Biomedical Research (QB3), a multi-campus effort based at UCSF, with additional facilities at UC Berkeley and UC Santa Cruz. BDMTC also plans to seek partnerships with IT companies “with a strong interest in the life sciences,” Markowitz said.
The goal of the center, according to Markowitz, is to bring a new level of “professional” data management support to research groups in Northern California that have “little experience handling large amounts of data.” In addition, he said, the center will work with academic software developers to turn their innovative algorithms into “robust, maintainable tools.”
Markowitz said that jumping from the public sector into industry presented him with the classic grass-is-greener scenario that typifies the field of bioinformatics: “In industry, there is a strong focus on developing quality products, but the price of failure is too high, so it’s not a good environment for research. But academia is the reverse: Because you can go in all kinds of [research] directions, the practice of developing robust systems is sometimes lacking. So I posed the question of whether it’s possible to do both.”
Last fall, as Gene Logic’s focus moved away from software and database development, and funding agencies such as the NIH and DOE began pledging support for large-scale informatics infrastructure projects, Markowitz decided the time was right to strike out and build what he describes as a “bridge” between those two worlds.
He negotiated an arrangement in which LBNL’s Computational Research Division will host the center, and will provide access to the DOE’s National Energy Research Scientific Computing Center as well as available computational resources at the organizations that the BDMTC supports. Funding will come from the center’s collaborators. Markowitz said his group is already collaborating with the JGI on a project to build a data resource for its environmental sequencing program, in which entire communities of organisms are sequenced at once. JGI director Edward Rubin told BioInform that the BDMTC’s capabilities are sorely needed. As the institute has ramped up its sequencing capacity — it now generates about 2 billion bases a month — “in some ways we’ve overgrown our capabilities and need more of an industrial model, both in engineering as well as in data management,” Rubin said.
BDMTC is also included on a grant proposal submitted by UC Berkeley under the NIH’s National Centers for Biomedical Computing program. If the proposal is accepted, BDMTC would provide support for data management and software development, and would account for a quarter of the total award.
BDMTC’s operational model is unique within bioinformatics, and Markowitz said he’s working hard to impress upon potential collaborators that the center “wants to be seen as a part of what they are doing … as an extension of their existing capabilities.” Dismissing any comparisons to a public-sector consulting team as carrying “a negative connotation,” Markowitz stressed that BDMTC is aiming for a “more symbiotic” relationship with its partners than consultants generally provide.
And if it’s symbiosis that Markowitz wants, he’ll be getting a healthy dose of it in his first project with the JGI. “One of the things we’re very interested in is managing this new data set of sequence that comes from sequencing environments,” Rubin said. For communities, unlike individual organisms, “the identifiers are much more complex,” he said, and include additional factors such as pH, temperature, and location. For the data to be useful, Rubin said, “we’re going to need to query it in ways that don’t fit with GenBank and normal ways of displaying sequence data. So we’re looking forward to working with this center to be able to capture community sequence data.”
JGI already employs about 50 bioinformaticists, roughly one-third of its entire staff, Rubin said, but the BDMTC will provide expertise in large-scale data management that will free up the JGI informatics staff for research.
Markowitz said he’s seeking “highly skilled professionals” in the areas of software engineering, database modeling, data warehousing, and other fields “more commonly encountered in industry” to join the center’s staff, which numbers only a few people right now but will grow as collaborative projects increase.