Your resident bioinformatics guru is probably obsessed with finding the perfect algorithm. You know, the one that’ll plow through mounds of data to generate the optimal solution in the least amount of time. It’s a tough job, and that’s why we stay away from it. But it is worth it to learn the language of these kind, determined creatures. Goodness knows they’ve learned enough sloppy biology to assemble your sequences and search your gene lists.
That’s why we here at GT have compiled this user-friendly guide to a few problems at the heart of a bioinformaticist’s day job. They’re called NP-complete problems, and these are the hardest problems to solve in something called “non-deterministic polynomial time.” It sounds interminable, but it’s the job they chose. Here are a few … er, “translated” problems to wrap your head around while impressing that cute number cruncher down the hall.
The traveling salesman
Think of that vendor who travels through all of the labs in your institution once every fiscal quarter to rack up sales by committing you to 20 boxes of pipette tips per month. Yeah, that one. Now think of the poor salesperson that’s assigned to Bethesda. The problem: Starting at NHGRI, find the shortest and cheapest round-trip route through all 19 institutes (we’re not counting the NLM –– that would be cruel). For extra credit, an NP-harder problem: How can the vendor avoid being assigned to the NIH in the first place?
This one should be easy. Just call to mind some immensely complicated data set you have eating up near-petaflops of storage. Let’s say, all of the genes, SNPs, proteins, and metabolites you’ve documented that are associated with a given sample — better yet, a metagenomic sample. The problem: If you have a program to generate some sort of meaningful result based on this morass of data, will the program ever stop? No cheating by bringing up emergency power outages either.
Boolean satisfiability problem
This is one we encounter every day here at GT. So you’re in PubMed, looking for a particular set of papers. You’ve finally mastered the advanced search using operators like AND, OR, NOT, along with those bracketed tags à la [au], [dp], and [mh]. The problem: Given that PubMed has accomplished the unlikely task of returning results that reflect your Boolean intention, will any of the resulting papers feature conclusions that are actually true?
All right, we’ve all been here. It’s kickoff night of some conference, the exhibit hall is open, and you are not worn out by attending or giving talks. You find yourself drawn to a machine. You want to learn more. And, yes, look at that, you find yourself suddenly in possession of a squishy toy or weird pen. We understand — you only took it to give to your kids later. (Sure.) Anyway, the problem: Your conference bag can only hold so much, but the swag seems infinite. What is the most “valuable” set of exhibit hall bric-a-brac you can stuff into that bag without breaking it? You know, for the kids. Ahem.