London's Natural History Museum Translates Card-Bound Species Data into Digital Resource


The Natural History Museum, London, is midway through a project to convert the wealth of data locked in its card archives into an online database of biodiversity and taxonomic information.

The museum is partnering with the University of Essex on the development of the database, and has enlisted the aid of Boulder, Colo.-based Parascript, whose FieldScript software is being used to translate legacy typewritten, handprinted, or cursive handwritten data into a digital format.

Malcolm Scoble, head of biodiversity at the Natural History Museum, said the first stage of the project involves converting data from 29,000 index cards on the Pyraloidea family of moths. He is pleased with the progress so far on a project that would have required an estimated 430 man-years to retype manually. While the current process is not entirely automated (a team of curators examines the data after it has been scanned in and analyzed by FieldScript), the project is on track to complete the first phase of the database in 18 months.

The VIADOCS (Versatile, Interactive, Archive Document Conversion System) project team at the University of Essex is coordinating the IT side of the project. Andy Downton of the university’s department of electronic systems engineering said the museum project poses unique challenges that make many optical character recognition packages unsuitable. In particular, many recognition packages struggled with the specialized Latinate vocabulary used to describe the specimens. FieldScript, however, was able to identify and categorize the various fields on the cards and associate them with specific database fields with an acceptable error rate, according to Downton.
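To illustrate why a specialized vocabulary matters for recognition, the sketch below shows one common general technique: snapping a misread token to the nearest entry in a domain lexicon. It is only an illustration and does not describe how FieldScript or the VIADOCS system works. The lexicon contents, the function name `correct_token`, and the similarity cutoff are all hypothetical; the matching itself uses `difflib.get_close_matches` from the Python standard library.

```python
import difflib

# Hypothetical lexicon of taxon names. A real project would load a full
# taxonomic checklist rather than a hand-typed list.
LEXICON = ["Pyraloidea", "Pyralidae", "Crambidae"]


def correct_token(token, lexicon=LEXICON, cutoff=0.8):
    """Return the closest lexicon entry, or the token unchanged.

    The cutoff (an assumed value) is the minimum similarity ratio a
    lexicon entry must reach before it replaces the recognized token.
    """
    matches = difflib.get_close_matches(token, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else token


# A recognizer that confuses the letter "l" with the digit "1" might emit
# "Pyra1oidea"; the lexicon lookup maps it back to a known name.
print(correct_token("Pyra1oidea"))
# Tokens with no close lexicon entry pass through untouched.
print(correct_token("London"))
```

A general-purpose recognizer has no such lexicon for Latin names, which is one plausible reason off-the-shelf packages would fare poorly on this material.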

The VIADOCS team is also developing a web-based interactive verification tool for the project and currently houses the database.

Scoble said the museum intends to make the completed database part of the Species 2000 project at the University of Reading — a collection of 14 databases that currently catalogues over 220,000 species. Similar biodiversity informatics projects are on the rise worldwide, Scoble said. “There’s so much information stored on index cards in natural history museums across the globe and many of them are trying to get it accessible on the web now,” he said.

The UK’s Engineering and Physical Sciences Research Council and Biotechnology and Biological Sciences Research Council are funding the VIADOCS project. Essex has received £125,000 ($175,100), while the museum has received £71,600.

— BT
