By “optimizing bang for the buck” in genomic computation, a geek-chic open source operating system earns corporate acclaim.
by Oliver Baker
Leading a tour of Incyte Genomics’ data center, Steve Lincoln strides around as if on springs. Over the rumble of chilled air rushing through perforated paneling he shouts, “This room cranks!”
Lincoln, 36, is vice president for bioinformatics R&D at the Palo Alto, Calif., company whose business plan revolves around packaging and selling human genome data.
Inside the white chamber, sleek boxes crowd racks that line walls and cross the floor. The computers sift seven million gene transcripts from 1,000 cell libraries, consulting every DNA and protein sequence in the world’s archives. They are the workhorses of Incyte’s effort to identify every gene in the human genome.
Meanwhile, across the country in Rockville, Md., the supercomputer installation at Incyte’s main rival, Celera Genomics, is getting all the glory. Celera’s behemoth has been touted as the most powerful in use worldwide for multiple bioinformatics applications.
Lincoln’s sidekick Stu Jackson, Incyte’s bioinformatics director, shakes his ponytail with a weary expression. For the sort of analysis Incyte does, its machines could smoke Celera’s, he says. “What we have is probably five times theirs. At least double.”
Not to mention much cheaper. Incyte runs its core computer cluster on Linux. In other words, this company with a market capitalization of $3 billion is built around a piece of free software. Incyte says it has no software service contract for the Linux farm and gets tech support free from strangers in cyberspace.
Rather than build a homogeneous set of “$20,000 per CPU boxes,” Lincoln says he buys “$1,200 per CPU boxes” that do nearly as well for 99 percent of the company’s tasks. “That’s how we built a system that’s much faster than Celera’s and costs a hell of a lot less,” Lincoln says.
Randy Scott, Incyte’s president and chief scientific officer, goes so far as to suggest that, while his company certainly isn’t giving away intellectual property, its business model is “fundamentally Linux-like.” By licensing genomic data nonexclusively, and requiring clients to license derivative products to each other, Incyte is after something akin to open-source development, he contends.
Scott says the analogy occurred to him only after Incyte embraced Linux. Perhaps that’s why he calls Lincoln not just a “technogeek” but also “one of our gurus.”
Damn Big Cluster
Linux acts like a full-blooded member of the Unix family of operating systems. But unlike most of these, it runs on the Intel chips of PCs. Like Unix, it’s intrinsically friendly to networking, and can subdivide tasks in ways that a conventional PC-oriented operating system such as Windows can’t. In effect, it can keep more balls in the air at one time.
Linux can also take advantage of the broad repertoire of applications written for doing distributed or parallel computing with a network of machines, because virtually all of these were written for Unix. As a result, with a network of PCs running Linux, one can achieve supercomputer capabilities with commodity parts.
Marshall Peterson, Celera’s vice president of infrastructure technology, says he admires Linux too, but hesitates to rely on it. “If there’s a problem, I’ve got to get somebody who has a commercial stake in getting me an answer,” Peterson says. “I don’t want to go to my CEO and say, I’ve been searching the Internet but nobody’s been able to give me an answer.”
Peterson says Celera runs Compaq’s Tru64 Unix on Compaq boxes, which together hold almost a thousand Alpha processors. These chips are widely thought to be the fastest (and most expensive) available.
Lincoln says Incyte has about 3,000 Intel Pentium processors in 1,200 boxes, which are just so many toasters to the serious IT shopper.
Tom Slezak, bioinformatics team leader at the US Department of Energy’s Joint Genome Institute in Walnut Creek, Calif., says he would have been interested to see Incyte pit its computers against Celera’s in a race to assemble the human genome. “I guess I would have put my bet on Celera,” says Slezak. “If you’ve got money put into a commercial system, that’s probably going to get the job done.”
But validation of Incyte’s approach exists throughout the world of genome analysis. Darrell Ricke, a bioinformaticist at the Novartis Agricultural Discovery Institute in San Diego, says that even if few companies have popped their hoods for others to peek, Linux is becoming commonplace in life sciences research. He says Linux clusters exist or are under construction at many facilities owned by big pharmaceutical and bioscience companies, including his own.
Even Celera’s Peterson says that, now that he has breathing room, he’s looking into Linux. When Celera had to build an analysis infrastructure from scratch in early 1999 there was no time to spare, and he couldn’t afford to take risks, “no disrespect to Linux,” he says. Peterson predicts that Celera will soon be running Linux on a cluster of either Alpha or Pentium machines, depending on what his own in-house testing proves.
Greg Lindahl, a computer systems architect for High Performance Technologies in Reston, Va., says Incyte bet that it could write software to handle large sequence databases within the limited RAM of Pentium machines, and won. He applauds Lincoln and Jackson for being ahead of the curve, for seeing the cost-efficiency of Linux and networked PCs, and for building what he calls a “damn big” cluster.
Incyte’s Jackson, clad in flip-flops, shorts, and an untucked golf shirt, says he began dabbling with Linux about four years ago, when it was still considered a nutty idea in corporate computing circles. Lincoln recalls his boss at the time slamming his fist on a table and exclaiming, “Over my dead body will we run a production system at Incyte on a freeware operating system!”
Scott Clarke, Incyte’s former chief information officer, who is now chief operating officer at BioSpace, admits he was concerned at the time about compliance with US Food and Drug Administration data collection standards. Commercial software had undergone extensive testing to meet those standards, and Linux still hasn’t, he notes.
But as Incyte narrowed its goals to non-FDA-regulated research, Clarke says he relented. He ceded some resources, Jackson scrounged others, and Jackson and Lincoln were off and running on their basement project.
Nowadays, Incyte’s upper management couldn’t be happier with the infrastructure these guys set up, says Randy Scott.
Fedoras, Fezzes and the Mom-and-Pop PC Shop
Lincoln and Jackson say they saw Linux in Incyte’s future when they tested a 20-PC Linux cluster in Jackson’s office. Using staple sequence alignment algorithms such as PHRAP and BLAST, they benchmarked their conjoined single Pentium-II-processor machines against the Alpha machines on the company’s analysis production line. They coaxed the Pentium chips to perform at four-fifths the rate of their Alpha-4100s, for which Incyte had paid as much as 10 times the price.
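The benchmark figures imply a rough price-performance ratio, which a few lines of Python make explicit. This is only a back-of-the-envelope sketch: the names are illustrative, and "as much as 10 times the price" is treated as an upper bound, so the result is a best case rather than a measured figure.

```python
from fractions import Fraction

# Figures quoted in the article, expressed relative to one Alpha machine.
PENTIUM_RELATIVE_SPEED = Fraction(4, 5)  # "four-fifths the rate"
ALPHA_RELATIVE_PRICE = 10                # "as much as 10 times the price" (upper bound)


def throughput_per_dollar_advantage(relative_speed, relative_price):
    """Return how many times more throughput per dollar the cheaper machine delivers.

    relative_speed: cheap machine's speed as a fraction of the expensive one's.
    relative_price: expensive machine's price as a multiple of the cheap one's.
    """
    return relative_speed * relative_price


advantage = throughput_per_dollar_advantage(PENTIUM_RELATIVE_SPEED, ALPHA_RELATIVE_PRICE)
print(f"Pentium throughput per dollar: up to {advantage}x the Alpha's")
```

At the quoted upper bound the Pentium boxes come out about eightfold ahead per dollar, in the same range as the tenfold gain Lincoln pitched to the software team.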
Even after the benchmark, they say it took politicking to win broad support for Linux around the office. The software team, for instance, feared a PC cluster would be hard to program. But, Lincoln says, “When you sat down and said, look guys, this is gonna let us take our computer budget and build something ten-fold more powerful, they got excited.”
The pair chose the Linux kernel and supporting software package distributed by Red Hat of Durham, NC, which offers free downloads, and sells CD-ROMs, phone support, and red felt fedoras. Jackson says that the hats were key in their campaign. Not long into it, Linux partisans phased in fezzes to augment the fedoras.
But the main objection to the Linux cluster project wasn’t about software, Lincoln says. It was, “What’s it going to take to support it?” He responds to that argument now with relish: “We have exactly one person who maintains 3,000 computers.”
That one person is helped out by the local vendors that build Incyte’s boxes and keep extras in stock. Whenever a box winks out, the vendor delivers a new one and repairs the failed one, Lincoln says. And by “vendor” he doesn’t mean Dell, IBM, or even Circuit City. He means Silicon Valley mom-and-pop PC shops. Tell one of them you want to buy 1,000 computers and “they’ll dote over you all you want,” he says.
According to his calculation, service is free because shops that provide it charge the same as those that don’t. Telling a vendor you’re happy to take your business elsewhere is one of the pleasures of the commodity-hardware approach that Linux makes possible, Lincoln says, launching into an animated soliloquy on free-market advantages of Linux clustering. As he talks, his eyelids flutter, semi-closed, as if he is channeling another plane.
The mom-and-pops suffice for service, says Jackson, because all of the system’s vulnerabilities reside in the Pentium boxes. One daytime systems administrator is enough, he says, because even the failure of a whole rack is but a paper cut to the 1,200-box system. If a Linux box dies during the night, it can wait until the next day.
When the time comes, a “magic floppy diskette” simplifies the task of bringing a replacement computer online. It holds an Incyte-modified version of Red Hat’s KickStart program. Pop it in, Lincoln says, and the computer boots, downloads the operating system from another server, and installs it. In 20 minutes it’s running BLAST jobs. “No human intervention,” he says.
A particular moment of glory for the magic floppy disk occurred one Friday afternoon. The deadline for a massive task was Monday. Meanwhile, stacks of cardboard boxes containing new computers stood in the machine room.
A three-man squad donned red commando berets and charged down to the machine room carrying large knives, Lincoln recalls. They slashed through the packaging, carried each box to a rack in the data center, plugged in power and network cords, inserted the magic floppy, and turned them on. The deadline was met. “I wish I had a picture,” he says.
Nerds on the Net
Still, Jackson and Lincoln admit, there are some problems a magic floppy and a lone systems administrator can’t solve. For instance, creating the magic floppy in the first place, or tailoring the software that brokers cooperation among a plethora of PCs to a particular hardware mix and configuration.
These are things for which Celera’s Peterson says he ungrudgingly pays to have Compaq at his side.
But Lincoln says that “thousands of nerds on the Net” are at Incyte’s side. When a problem comes up with Linux or a Linux clustering application, he says, “I can get the guy who wrote a piece of software answering my questions. That’s never going to happen if I dial Sun’s 800 number.”
In particular, Lincoln says that Incyte has benefited from e-mail exchanges with the US National Center for Biotechnology Information in Bethesda, Md. The center’s BLAST tools were developed with an eye to Linux, and helping people run them is part of the institution’s mandate. “NCBI is an active part of the open source community,” he says.
So why don’t more companies rely on the online community of Linux developers? “FUD,” says Jackson. “Fear, uncertainty, doubt.”
Lincoln cites the adage “No one ever got fired for buying IBM,” then coins his own: “If your goal is to kick butt you’ve got to take chances.”
Incyte Loves Linux
On the data center tour, Lincoln traces the evolution of his Linux cluster: from single-Pentium PCs to compact “two-way boxes” to his newest sleek, horizontally mounted boxes that hold four processors each. Pointing to one he says, “This is what we call, crassly and with apologies, the 69 configuration.”
Two two-way boards are sandwiched head-to-toe. Front and rear panels of the three-inch-thick boxes sprout cables. A rack of them packs 96 processors into four square feet of floor space.
Incyte fills new racks about once every other month. Switching to notebook-computer boards would allow processors to be added at double density, Lincoln says. This would be cool, but there’s no rush, he adds. For now, Incyte’s got the floor space.
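The density figures Lincoln quotes can be checked with simple arithmetic. The sketch below uses only the numbers in the text; the doubled figure assumes that "double density" means twice as many processors in the same rack footprint, which the article does not spell out.

```python
# Figures quoted in the article.
PROCESSORS_PER_RACK = 96
PROCESSORS_PER_BOX = 4        # two two-way boards per box
RACK_FOOTPRINT_SQ_FT = 4

boxes_per_rack = PROCESSORS_PER_RACK // PROCESSORS_PER_BOX
processors_per_sq_ft = PROCESSORS_PER_RACK // RACK_FOOTPRINT_SQ_FT

# Assumption: notebook-computer boards double the processors per rack.
doubled_processors_per_rack = 2 * PROCESSORS_PER_RACK

print(f"{boxes_per_rack} boxes per rack, {processors_per_sq_ft} processors per square foot")
print(f"At double density: {doubled_processors_per_rack} processors per rack")
```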
“Optimizing bang for the buck” is what this approach is all about, he says. Four-way boards and top-model Pentiums would violate the principle, as would assigning jobs to Pentiums that run more cost-effectively on Alphas. Already, assembly jobs involving more than 5,000 sequences are automatically diverted to the data center’s Compaq boxes.
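The diversion rule Lincoln describes amounts to a simple threshold check. The following is a minimal sketch of such a rule, not Incyte's actual scheduler: the function name, the job-type labels, and the pool names are all hypothetical, and only the 5,000-sequence cutoff comes from the article.

```python
# Cutoff quoted in the article: assembly jobs above this size go to the Compaq boxes.
ASSEMBLY_SEQUENCE_LIMIT = 5_000


def choose_pool(job_type: str, num_sequences: int) -> str:
    """Pick a hardware pool for a job (hypothetical names throughout).

    Large assembly jobs are sent to the Alpha-based Compaq machines;
    everything else stays on the commodity Linux cluster.
    """
    if job_type == "assembly" and num_sequences > ASSEMBLY_SEQUENCE_LIMIT:
        return "compaq-alpha"
    return "linux-pentium"


for job in [("assembly", 12_000), ("assembly", 800), ("blast", 50_000)]:
    print(job, "->", choose_pool(*job))
```

Keeping the rule to one comparison reflects the principle in the text: each job goes wherever it runs most cost-effectively, and the cheap boxes are the default.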
To be sure, the reasons Incyte loves Linux go a little deeper than dollars. Lincoln and Jackson don’t try to hide their nerdy preference for the “roll-your-own” approach. But, citing homemade robots and off-the-shelf sequencers that the company’s teams have reengineered, Lincoln observes that the attitude has served the company well.