The European Patent Office has recently moved to make sequence data associated with biotechnology patents and patent applications more broadly available for patent examiners as well as the general public, according to Gérard Giroud, principal director in charge of search tools and documentation at the EPO.
The EPO is currently working with the US Patent and Trademark Office — as well as the European Bioinformatics Institute, the US National Center for Biotechnology Information, and the DNA Databank of Japan — “to enhance the amount of sequence disclosed in published patents through the public sequence databases. … [and] to create publication servers where the public could at any time download the sequences that have been disclosed in published patent applications,” Giroud told BioInform last week. The creation of such a worldwide repository would serve two goals, he said: First of all, it would provide patent examiners with immediate access to new sequence data to ensure “that we are not granting patents that should not be granted.” More broadly, he said, a comprehensive, publicly available patent sequence resource might help assuage public fears and correct misconceptions about biotechnology, with the goal of “making sure that biotechnology — and the role of patents in biotechnology — is well understood by the public.”
This ambitious project is still in its early stages, but the EPO has already taken a first step toward that larger goal by improving the efficiency of its own sequence search capabilities. After completing a thorough study on the applicability of available alignment algorithms to patent searching, the EPO decided to augment the publicly available Fasta algorithm that its examiners were using with Gene-IT’s GenomeQuest software — a new product that includes the GenePast algorithm that the Paris-based company published in Nature Biotechnology just over a year ago [BioInform 12-09-02]. This software will be installed at the EBI, which is supporting the EPO’s biotechnology search environment as part of a multi-year collaboration.
GenePast was designed especially for the intellectual property community, which currently relies on algorithms like Fasta, Blast, and Smith-Waterman to determine whether sequences of interest have already been published and, if so, what functions have been associated with them. But according to Gene-IT, the algorithms developed for biological research are poorly suited for IP applications because they rely on homology — the evolutionary relationship between sequences — rather than percent identity — the overall similarity of the sequence. Percent identity can detect sequences that may be overlooked by other methods, but would be of interest to IP professionals because patents are usually granted not just for the sequence itself but for all those closely related to it. “The concern of an examiner is primarily to say … whether a sequence exists in a database or has been published already,” Giroud said. “They are often less interested than a scientist in the probability of [a homologous] sequence according to the biological rules explaining the mutation of one sequence to the other.”
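To make the distinction concrete, the short Python sketch below shows how percent identity can be computed for two sequences that have already been aligned. It is an illustration only and is not the GenePast algorithm, whose internals are not described here. The function names, the gap character, and the 90 percent cutoff are all assumptions chosen for the example.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return the percentage of alignment columns holding the same residue.

    Assumes the two sequences are already aligned (equal length, with
    `gap` marking insertions or deletions). Gap columns count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


def is_potential_prior_art(query: str, subject: str, threshold: float = 90.0) -> bool:
    """Flag a database hit whose identity to the query meets a cutoff.

    The 90 percent default is a hypothetical value for illustration,
    not a figure used by any patent office.
    """
    return percent_identity(query, subject) >= threshold


if __name__ == "__main__":
    query = "ACGTACGTAC"
    subject = "ACGTACGTTC"  # differs from the query at one position
    print(percent_identity(query, subject))
    print(is_potential_prior_art(query, subject))
```

Unlike a homology score, this measure makes no statement about evolutionary relatedness or statistical significance. It simply reports how much of one sequence matches the other, which is closer to the question an examiner asks.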
The complexity of traditional bioinformatics algorithms presents an additional hurdle for patent searching, according to Stéphane Nauche, an EPO official responsible for biotechnology under Giroud’s documentation directorate. “A person fully skilled in the art — a bioinformatician — could certainly achieve very good results with Blast or whatever by playing with the parameters and settings, but it’s not the purpose of the examiners to play with all the parameters of all the algorithms,” he said. “Our aim is to provide the best interfaces with easy working conditions for the examiners.” Considering that the office’s 250 or so biotech examiners must process an average of 30,000 submitted sequences a year, and that the rate of sequence-related patent applications is growing at around 10-15 percent annually, the EPO stands to gain a considerable amount of efficiency from the simplified user interface alone.
But performance was also a factor in the decision to license Gene-IT’s software. The EPO evaluated GenePast against Blast, Fasta, and Smith-Waterman using a test devised “to confirm that this algorithm could tell us if there is really a similar sequence available, and not only a probability of there being a similar sequence,” Nauche said. With the pressure on to “never grant [a patent for] something that is already known” — especially in the contentious landscape of biotechnology-related IP — Giroud said that it was crucial for the EPO to ensure it was not missing any prior art related to DNA or protein sequence information.
An Untapped Market
For Gene-IT, the EPO agreement is something of a coup, coming just as the company rolls out its GenomeQuest software package for the IP community — a potentially sizable user base that has been largely ignored by prior developers of sequence alignment methods. “We identified through the GenePast experiments … that there were significant unmet needs of end users of intellectual property search tools, so that forms the basis of the initial thrust of the GenomeQuest offering,” said Ron Ranauro, general manager of Gene-IT. Indeed, the IP community may prove to be a better commercial market for sequence alignment tools than the biological research community. If nothing else, biotech and pharmaceutical companies have a clear economic incentive to ensure that their IP rights are secure — both in terms of the freedom to operate within the context of existing patents and in terms of determining the patentability of a novel sequence.
GenomeQuest was designed to appeal to both IP professionals and biologists within biotech and pharmaceutical firms, and includes the GenePast algorithm along with Blast, Smith-Waterman, and a proprietary fragment-search algorithm in a single software package with a consistent user interface. According to Ranauro, this combination of search features should help facilitate communication between research groups and IP analysts, who are currently separated from one another in most companies, despite the similarity of their goals.
The software interoperates with the Derwent commercial patent sequence database, as well as public and in-house resources, allowing users to search against multiple databases at once. Along with the software, the company also provides a sequence database that is updated nightly and currently holds around 40 million sequence records.
According to Ranauro, the market opportunity for the software is substantial. “You could say that every biotech, pharmaceutical, and academic research institution would have to have this as a core competence, whether it’s something that they outsource or run it in house,” he said.
Building a Global Resource
Gene-IT is also helping the EPO with its study of the feasibility of creating a single, global resource that would contain all the sequence data that is currently submitted to the world’s patent offices.
Giroud said that the study has only been underway for about six months so far, and “at least we have now a pattern of what should be improved in order to constitute a comprehensive database.” Primary challenges, he noted, include “disruptions” in the exchange mechanism that currently exists between the parties involved, as well as some discrepancies between the format required for patent depositions and for sequence repositories.
Sequence data is only the first — and possibly the least complex — of a wave of biological data formats that the EPO is anticipating in future patent applications. Nauche said that the office — in collaboration with the USPTO and JPO — is also preparing to handle 3D structural data, as well as SNP and haplotype information, and that these datasets will pose their own data management and search challenges.
Giroud acknowledged that it could take several years to reach agreement among the world’s patent and sequence data authorities on the various issues associated with a single sequence resource, but stressed that the EPO’s aim “is to become as complete as possible when it comes to searchable datasets.” After all, he added, while access to the GenomeQuest software is a marked improvement over the EPO’s previous methods, “What is the use of the best search tools if the data you’re searching is not complete?”