2002 KDD Cup Winners Include Newbie Partnership of Celera and ClearForest


Beginner’s luck? As first-time participants in the Association for Computing Machinery’s Knowledge Discovery and Data Mining (KDD) Cup competition, Celera Genomics and New York-based data mining company ClearForest took home a first-prize trophy in one of two categories last week.

The Celera/ClearForest team beat out 32 other participants in a competition to build a system that would automatically curate thousands of scientific articles on Drosophila melanogaster for the FlyBase database. The data mining algorithms had to accurately indicate which articles included results on expression of gene products, and which genes and proteins were involved.

Adam Kowalczyk and Bhavani Raskutti of Australia’s Telstra Research Laboratories won the second KDD Cup category, in which 52 teams used Medline abstracts to predict the effect of knockout genes on different sub-cellular components in yeast cells.

The eighth annual KDD Cup, held in Edmonton, Canada, and co-chaired by Mark Craven of the University of Wisconsin and Alexander Yeh of Mitre Corporation, focused on data sets in biology for the second year running. “Biology is especially compelling because it has become a very data-rich field in the last few years,” Craven said previously [BioInform 05-06-02].

Barak Pridor, CEO of ClearForest, agreed that the biological domain offered several “unique challenges” that the company hadn’t encountered in its previous work in competitive intelligence, intellectual property research, and federal intelligence applications. Taking a step away from the needs of current customers such as Kodak and Dow Chemical, ClearForest drew on the domain expertise of several researchers from Celera to gain the edge it needed in its first KDD Cup appearance, Pridor said.

The two companies had an “existing relationship” prior to their involvement in the competition, Pridor said, but he was unable to provide further details of their collaborative efforts.

ClearForest’s approach to text mining combines three common methodologies: statistical analysis, structural analysis, and semantic analysis. Pridor said the company’s natural language processing technology draws primarily on the last of these, semantic analysis, to assess the patterns among textual entities, events, and facts, but also includes statistical and structural elements.

The key, Pridor said, is knowing when to apply which type of method. For example, he noted, gene-based information can be extracted using a controlled vocabulary such as the Gene Ontology, but information about proteins requires a discovery-based approach because the goal is to uncover previously unknown relationships. The ClearForest/Celera team pooled their expertise to apply the best combination of tools to the problem, he said.
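
The distinction Pridor draws can be illustrated with a minimal Python sketch. It is not ClearForest's or Celera's method, which the article does not describe in detail. The vocabulary terms, entity names, and function names below are all hypothetical, and sentence-level co-occurrence counting is only a simple stand-in for a discovery-based approach.

```python
from collections import Counter
from itertools import combinations

# Hypothetical controlled vocabulary. A real system might load terms from a
# curated resource such as the Gene Ontology instead of a hard-coded set.
VOCABULARY = {"gene expression", "transcription", "embryonic development"}


def match_vocabulary(text, vocabulary=VOCABULARY):
    """Controlled-vocabulary lookup: return known terms found in the text."""
    lowered = text.lower()
    return sorted(term for term in vocabulary if term in lowered)


def cooccurring_pairs(sentences, entities):
    """Discovery-style stand-in: count entity pairs sharing a sentence.

    Pairs that co-occur often are only candidates for a relationship;
    nothing here confirms that a relationship actually exists.
    """
    counts = Counter()
    for sentence in sentences:
        lowered = sentence.lower()
        present = sorted(e for e in entities if e.lower() in lowered)
        counts.update(combinations(present, 2))
    return counts
```

The lookup can only find terms someone has already listed, whereas the co-occurrence count can surface pairings nobody anticipated, at the cost of many false leads that still need review.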

ClearForest is mulling commercialization options within the life science sector for its technology, but was mum on whether Celera would play any part in this effort. ClearForest has already seen “considerable interest” from the biotech and pharma community following its success at the KDD Cup, Pridor said.

— BT
