As the market leader in the microarray sector, Affymetrix deserves the credit (or perhaps the blame) for generating the majority of the genomics data that researchers are wading through today. Because the company is indirectly responsible for keeping an extended family of software developers busy, if not employed altogether, any decision it makes about its informatics strategy has repercussions throughout the bioinformatics community. BioInform caught up with Steve Lincoln, Affy’s vice president of informatics, to discuss the company’s relationship with commercial and academic software developers, and to find out a bit about its internal informatics development activities. A transcript of the interview, conducted by phone and by e-mail, follows.
In light of the partnership deal announced last week with Stratagene [BioInform 02.14.05], can you discuss Affymetrix’s current partnership strategy, and how it’s evolving? For example, when do you opt to partner with an outside party, and when do you choose to develop software in-house?
Stratagene has done a great job packaging some of the best primary analysis algorithms for Affymetrix gene-expression chips into a simple user interface that makes these complex statistical methods accessible to bench scientists. These analysis methods capitalize on the Affymetrix chip designs, where we employ multiple different probes for each gene to provide the best dynamic range, precision, and accuracy. In our deal with Stratagene, we get to provide that capability to our customers in an easy-to-use form at no additional charge, so that each of them will have straightforward access to the best available gene-expression results. Stratagene has the chance to upsell those folks to the remaining, further downstream analysis and visualization capabilities of ArrayAssist.
Of course, our deal with Stratagene is entirely non-exclusive, and other vendors, academic groups, and our internal folks will continue to innovate in these areas. More importantly, the expression values produced by ArrayAssist Lite are output in Affymetrix’s standard open file formats, so that all software packages, not just Stratagene’s, can access those data and perform downstream analysis using them. This means that the best signals can be easily fed into packages such as Spotfire, Resolver, Partek, and so forth, depending on each customer’s specific needs.
Flexibility and choice are without question a good thing, as the breadth of what our customers do with GeneChips dictates that a diversity of integrated solutions be available. Both we and Stratagene agree on this.
Affymetrix is coming out with a number of new products and application areas for its arrays — CGH, exon chips, higher-density mapping arrays. Can you provide an overview of how the company’s informatics development process keeps pace with these new offerings?
Recall that we, as an industry, have had many years to refine best practices in genome-wide gene-expression analysis, and we all appreciate how much that has improved the power and utility of gene expression microarrays. Probably the biggest challenge with some of the new Affymetrix array types is that the informatics methods will be an evolving area for some time. We and our collaborators have developed some really interesting and useful tools for exploiting these new kinds of arrays, and this software will ensure that the new chips are truly useful on the day they ship. However, I think we all agree that many of these methods are simply a great start in areas that will get a lot of attention from many groups over the near term.
Thus, our strategy has two parts. First, we’re using rapid development methods so that we can quickly release updated software and algorithms to our customers in the areas where the approaches will be evolving most quickly. Second, we’re structuring these software packages so that the underlying technology components (that is, data files and analysis methods) can be re-used and integrated into other software that goes beyond what our own tools do. We also put a lot of time into documenting algorithms and, where we can, even release source code, so that people can understand what we’re doing and build off of that. People familiar with the Affymetrix Developers Network resources on the internet [http://www.affymetrix.com/support/developer/index.affx] have seen a number of announcements about new tool releases that begin to do exactly this.
Affymetrix has begun to release quite a bit of its internally developed software under an open source license. Can you discuss the rationale behind that decision?
The technology components I mentioned are relevant not just to commercial software vendors, but also to academic software developers and to in-house programmers in our customer sites.
Open source has become a key part of our developer support strategy, where we are continually trying to reduce any barriers folks could have in building solutions around our GeneChip platform. Toolkit licensing was one such barrier, so we took steps to change that in a way which supports all three categories of programmers.
The academic, and generally open source, developer community these days plays an increasingly critical part not only in developing new methods for gene expression and SNP data, but also in producing some quite useful full-featured software packages. Examples such as BioConductor, GenePattern, Genetrix, Haploview, Merlin, Mega2, and many others are helping customers get the most out of our arrays. So we’ve been actively trying to help these folks develop code and “market” their tools, just as we have done, and continue to do, with the commercial providers.
So does this change the way you deal with commercial software partners in any way?
Not in a major way, but subtly. We have about 250 licensees of our developer SDKs, and we have thousands of people on our Developer Network mailing list, including commercial, academic, and in-house programmers. We think our developer program has been shockingly successful.
Our newer technology component strategy, which I mentioned earlier, as well as some of our open source work, makes it easier for us to not just support outside developers, as we do, but also in some cases to collaborate more closely with them. This applies to both commercial and academic users.
Now that microarray-based diagnostics are hitting the market, what do you consider to be the primary informatics challenges for Affymetrix as it expands to serve both the clinical and research markets? How do the informatics needs of these two communities differ, and what is Affymetrix’s strategy for meeting these needs?
The requirements do differ, and while to date we’ve validated (and had approved) software with only modest changes between research and diagnostics, we’re rapidly evolving to a design where the application layer of the newer versions of our software will in fact be different in the two markets. In life science research, for example, a key requirement of the system is flexibility, and newer versions of our core software will greatly improve on the breadth of what customers can do with our software today.
By contrast, in diagnostics, the software user interface needs to focus on laboratory process control, which goes in the opposite direction. The two versions will share a common validated kernel, but much of the user-interface code will be different.
How important will data standards be in accelerating the adoption rate for microarray technology beyond basic research? What — if any — standards efforts does Affymetrix support in this area right now?
Well, a former boss of mine and a very wise man named Scott Clarke [now president and CEO of Discovery Innovations] was fond of saying that the best thing about standards is that there are so many to choose from, and indeed truer words have never been said about the microarray space.
But again, I’ll remind you that different standards exist for different, and often good, reasons. For example, we support MIAME [Minimum Information About a Microarray Experiment] and MAGE-ML [Microarray Gene Expression Markup Language] for publishing data and uploading it to public databases, which is a key customer need, but unfortunately MAGE is an inefficient way to transfer large data sets between microarray analysis packages. We’ve opened up our CHP and NetAffx formats as a highly efficient and standard way to meet that second need. We’re collaborating with Cold Spring Harbor and EBI on the DAS2 standards for gene annotations. We support the [Clinical and Laboratory Standards Institute’s] efforts in laboratory standards, and we’re big fans of what Rafael Irizarry has done with his Affycomp website [http://affycomp.biostat.jhsph.edu/] using standard benchmark data sets we provide.
Diagnostics brings a whole new set of established standards such as HL-7 [Health Level 7] and ICD [International Classification of Diseases], as well as new standards bodies such as the CAP [College of American Pathologists]. We work to pick the right hammers for each nail, and that’s what we recommend that others in the space do as well. At the same time we always try to talk to each other and learn from existing standards whenever possible to avoid reinventing wheels.