Since taking the reins as director of the National Library of Medicine in 1984, Donald Lindberg has skillfully navigated the world’s largest medical library through the rise of the Internet, the completion of the Human Genome Project, the emergence of open access scientific publishing, and the threat of bioterrorism. In the process, Lindberg has ushered the NLM through its greatest period of change since its humble beginnings in 1836 as a collection of medical books and journals in the office of the United States Army Surgeon General.
A pathologist by training, Lindberg was an early adopter of computer technology in health care research. Before his appointment as NLM director, he was professor of information science and professor of pathology at the University of Missouri-Columbia, and from 1992 to 1995 he served in a concurrent position as founding director of the National Coordination Office for High Performance Computing and Communications in the Office of Science and Technology Policy, Executive Office of the President.
As the head of the NLM, Lindberg oversees 690 employees — nearly half of whom are employed by the NCBI — and an annual budget of around $284 million.
BioInform recently spoke to Lindberg about how technological advances over the last two decades have expanded the role of biomedical informatics within the NLM and where he sees the field going in the future.
You’re approaching your 20-year anniversary as director of the NLM. How has the field of biomedical informatics changed over that time?
It’s more central now than it was 20 years ago, for a number of reasons, but at the top of the list certainly is the desire of patients, families, and the public to have direct access to reliable scientific information. Medline was created by the NLM a long time ago as the online version of Index Medicus, and Index Medicus started in 1879, so we’ve been in the business a long time. But with the increasing capability of the machines, and even more importantly, the networks, we felt able to take on the additional task of serving the public directly. We were quite surprised, when we made access to Medline public, that the volume picked up very markedly, and within a year we recognized that actually 30 percent of the searches were done by members of the public.
What is the impact of this increased interest from the general public in NLM resources?
The Internet, of course, was the major change, and there are many, many sources on the Internet about health and medicine. There certainly is misinformation out there, so of course a person has to be aware. Just a few more steps from that brings you to some serious policy problems.
The Human Genome Project has succeeded remarkably well, under budget and ahead of schedule, but it does bring you immediately to the proposition that [genetic] tests are increasingly affordable. I think the majority of Americans would believe that nobody should be penalized in a job or insurance or a license to marry because of a genetic test, particularly if the test was compelled of them. So while I think the majority of folks have that feeling, we really don’t have any federal law that assures that.
That’s a new problem, this genetic business, but the standard proposition of intellectual property rights and how they are protected and used for the public good is an old story. At the moment, in the case of the scientific literature, we publish a bibliographic citation, plus an abstract where it exists, but you naturally want to get to the full text of the article. So for some 4,000 journals we link to the home page of the commercial publisher, and then the access to the full text is variable depending on whether you have a subscription or not. There’s also PubMed Central, where around 100 journals have simply filed all of their full text, everything, with us, and made it free immediately and forever. No one really knows how all of this is going to come out. It appears to be to the good of everybody that research that’s paid for with public funds remains available to the public, but, yet, on the other hand, it doesn’t make sense to go out of your way to bankrupt journals. That’s not right either. So the [open access] market has not made itself yet in a definitive fashion and the response of the journals has varied quite a bit.
The Sabo bill was introduced in the House earlier this month on the topic of journal access [BioInform 07-04-03]. Does NLM have a stance on the bill?
I haven’t had the chance to read the bill, and I don’t have any insider view of it. We’re sort of happy to be in the middle, really, because if we can bring together electronically someone who wants the information with someone who has it, that’s great. Congress can do whatever it wants, but I’m sure that this bill or any of the others wouldn’t want to harm American publishing companies.
But the same issues arise with respect to how fast research is published by the people funded by public funds. The [Sabo] bill is saying that when something is published and it’s about science that’s supported by grants, then everyone should get to it quickly and freely. Well, okay, but preceding that point is the issue of when the experiment’s done and data are in, how fast are you going to make them available to people? One of the reasons that the Human Genome Project succeeded so well is that for many years they operated under the Bermuda rules. Now that part of it is done, but there are still many, many genomes to be done. When they arise out of research grants, the federal government hasn’t taken the position that it has any direct authority over that grant recipient. But on the other hand, we definitely want the data to be available to the research community at the earliest moment that it’s usable. So that’s basically a public policy issue.
How well integrated is NCBI with the other units of the NLM?
NCBI started out in 1989 as 12 people, and it’s now about 300. It wasn’t intended to be such a large thing, but there’s just that much progress — the medical informatics, the genomics — that’s so central to the progress in science right now. And, as you doubtless know, networks don’t maintain themselves and databases don’t maintain themselves — junk in, junk out. So it takes a lot of very informed people to monitor all this stuff in order to make sure that the descriptions match the data and all that kind of stuff.
In what’s now called homeland defense, we not only don’t have a prime role, but we don’t want to have any role at all if we can possibly avoid it. But we are attacked. I don’t like to emphasize it because I don’t want to get any more [attacks], but I’m disgusted that vandals would attack databases such as ours that are totally in the public domain, totally free. We’re not peddling anything, there are no secrets whatsoever and there never will be, so it’s just for the fun of destroying. And it costs a good bit of money to fend off those attacks.
Has the number of attempted attacks gone up recently?
They run as high as 10,000 times a month. Most of these are denial-of-service attacks — they try to tie your computer up in knots so it can’t do anything. One reason they would go after us is not because of the information we hold, which is totally public, but because in some cases if you infiltrate one federal agency it’s easier to get into other federal agencies. So they might just be trying to use us as a stepping stone to someplace more important. We can’t have that, but we want to do whatever is in the best interest of the country in terms of homeland defense. It isn’t too clear how well that’s going to work. In terms of bioterrorism, the secrets of all the organisms are sitting right here in GenBank, so that plays a very central role.
Are you getting pressured by Congress or the Department of Homeland Security to restrict access to the information in GenBank?
We haven’t. It would be silly, truthfully, because it is so very public. We did review what we have. Nobody ever told us to do it, but I recall a couple of years ago — I suppose it was in 2001 when all the bad stuff started — and it turned out that there were federal databases that in retrospect were way too freely offering [potentially dangerous information]. So we were horrified when we heard about that, and we looked over all of our databases, and we didn’t have anything like that. Everybody is equally confused and worried about it — how to still be the wonderful United States and not be a sap, not be the victim.
What’s your plan, then? To keep an eye on things and adjust as necessary, or wait for legislation to go through?
I think the former — keep an eye open. This business of terrorizing the databank is a high priority for me. We definitely don’t ignore that. We have in the past called in the FBI. We’ve gotten worried about stuff and called them. It’s a totally different approach to life that they have than I have. For a while I was a little impatient, and in the end I realized that they were absolutely super. They were just technically super, and they were within the law and wouldn’t make a move on anybody until they had really, really good evidence. So I have nothing but praise for the FBI’s mode of operating.
How much of your staff do you have to devote to this problem?
At least four people, maybe six. But the cost of rebuilding things is very, very high. Of course, you have backup tapes, etc., but it costs a lot of money to rebuild systems.
It seems like you’ve been doing a good job of keeping everything up and running. It doesn’t seem like GenBank is down very often.
I think we have done a good job, but everything does go down. You have to run these things in multiple partitions so you’re not totally at risk. One of the things that you fear is that people are technically able to hack in and change data. I don’t think that’s been done, but how do you know? You have to keep watching very, very carefully, and if you have any suspicion of that, then you have to completely rewrite that whole file. That’s sort of where the money comes in, and that’s why we worry about it. When any data has been removed, people have been punished. But unfortunately, the other side of the coin is that if there are ten thousand times they try denial of service, the FBI would certainly not thank us for informing them ten thousand times. But we do track the attacks and map them.
Looking forward, what do you see as the NLM’s biggest challenges in biomedical informatics — either from a technical standpoint or a public policy standpoint?
I really do think it’s pointed in two different directions. One is serving researchers, particularly the genomic researchers, and that does inevitably get to be pretty technical. On the other hand, we’re really trying our darnedest to give the information to the public. And in a way, I think that’s tougher because there’s no very good model. Naturally, one likes to think that these creations of NLM are user-friendly, satisfying, accurate, up-to-date, and so on, but beauty is in the eye of the beholder. So, for us, informatics is headed in two directions: one toward genomics and one toward information for the public, so-called consumer health information.
Do you see these two areas coming together? Will consumers start accessing genomic information directly?
Absolutely. You can easily see that’s not very many years ahead of us. There are parts of NCBI that are aimed at students and the public, but, in truth, how many doctors have had a course in genetics? Very damn few. Now, of course, that isn’t any reason to slow down in informing the public, but it’s just that both of them will probably get pretty well informed in the next five years.
So that’s sort of the brave new world, and I would think that within five years we will have settled down so that the medical profession has updated what it needs to come to grips with, and the public will have a pretty sound understanding of genetics, and genetic tests, and when they are something you want. There are websites right now that offer to sell you [genetic] testing, just as they offer to sell you spiral CTs in the shopping center. It’s really an interesting time, but I don’t think it will take many more years before doctors and patients settle down with a common understanding of the areas in which [genomic information] is useful to them and the areas in which it’s not useful or threatening.