Didn't Quite Mean 1-Mar

Some 20 percent of papers that include supplementary gene name lists in Excel have gene name errors, a new paper in Genome Biology reports.

A trio of researchers from the Alfred Medical Research and Education Precinct and Monash University downloaded and screened supplementary files accompanying papers published in 18 journals between 2005 and 2015, for a total of 35,175 supplementary Excel files. Among these, they identified 7,467 gene lists attached to 3,597 published papers.

From their screen, the trio led by Monash's Assam El-Osta uncovered and confirmed gene name errors in 987 supplementary files from 704 published articles. That is, 704 of the 3,597 papers with gene lists, or 19.6 percent, harbor errors.

In Excel, El-Osta and his colleagues note, gene names are often converted to dates or floating-point numbers.

For instance, Slate notes that the gene symbol for Septin 2, SEPT2, is converted to the date '2-Sep', while Membrane-Associated Ring Finger, or MARCH1, becomes '1-Mar'. At the same time, identifiers like accession '2310009E13' are converted to '2.31E+13,' it adds.
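The conversions described above leave recognizable traces, so a gene list can be checked for them after the fact. The following is a minimal Python sketch of such a check, not the screening method the researchers used; the function name `flag_suspect_entries` and both patterns are assumptions made for illustration.

```python
import re

# Illustrative patterns only (assumptions, not the paper's screening rules).
# Date-like: a day number, a hyphen, and a three-letter month, e.g. "2-Sep".
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)
# Float-like: scientific notation with an explicit "E+", e.g. "2.31E+13".
# An unconverted identifier such as "2310009E13" does not match.
FLOAT_LIKE = re.compile(r"^\d(\.\d+)?E\+\d+$", re.IGNORECASE)


def flag_suspect_entries(gene_names):
    """Return (entry, reason) pairs for entries that look spreadsheet-converted."""
    suspects = []
    for name in gene_names:
        text = str(name).strip()
        if DATE_LIKE.match(text):
            suspects.append((text, "date-like"))
        elif FLOAT_LIKE.match(text):
            suspects.append((text, "float-like"))
    return suspects


if __name__ == "__main__":
    column = ["SEPT2", "2-Sep", "MARCH1", "1-Mar", "2310009E13", "2.31E+13"]
    for entry, reason in flag_suspect_entries(column):
        print(f"{entry}: {reason}")
```

A check like this can only flag entries that look converted. It cannot by itself recover the original symbol, since '2-Sep' no longer records which gene name was typed.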

These errors, adds data scientist Neil Saunders at his What You Are Doing Is Rather Desperate blog, are not new. He points to a BMC Bioinformatics article on the topic dating back to 2004 and a post he wrote in 2012.

"We tell you not to use Excel. You counter with a host of reasons why you have to use Excel. None of them are good reasons. I don't know what else to say," he adds.