London's Natural History Museum Translates Card-Bound Species Data into Digital Resource


The Natural History Museum, London, is midway through a project to convert the wealth of data locked in its card archives into an online database of biodiversity and taxonomic information.

The museum is partnering with the University of Essex on the development of the database, and has enlisted the aid of Boulder, Colo.-based Parascript, whose FieldScript software is being used to translate legacy typewritten, handprinted, or cursive handwritten data into a digital format.

Malcolm Scoble, head of biodiversity at the Natural History Museum, said the first stage of the project involves converting data from 29,000 index cards on the Pyraloidea family of moths. He is pleased with the project's progress so far; re-typing the data manually would have required an estimated 430 man-years. The current process is not entirely automated, since a team of curators examines the data after it has been scanned in and analyzed by FieldScript, but the project is on track to complete the first phase of the database in 18 months.

The VIADOCS (Versatile, Interactive, Archive Document Conversion System) project team at the University of Essex is coordinating the IT side of the project. Andy Downton of the university’s department of electronic systems engineering said the challenges of the museum project are unique, rendering many optical character recognition packages unacceptable. In particular, the specialized Latinate vocabulary used to describe the specimens was difficult for many recognition packages to deal with. FieldScript, however, was able to identify and categorize the various fields on the cards and associate them with specific database fields with an acceptable error rate, according to Downton.

The VIADOCS team is also developing a web-based interactive verification tool for the project and currently houses the database.

Scoble said the museum intends to make the completed database part of the Species 2000 project at the University of Reading — a collection of 14 databases that currently catalogues over 220,000 species. Similar biodiversity informatics projects are on the rise worldwide, Scoble said. “There’s so much information stored on index cards in natural history museums across the globe and many of them are trying to get it accessible on the web now,” he said.

The UK’s Engineering and Physical Sciences Research Council and Biotechnology and Biological Sciences Research Council are funding the VIADOCS project. Essex has received £125,000 ($175,100), while the museum has received £71,600.

— BT
