Chief Scientific Officer
Name: Mark Skolnick
Title: Chief Scientific Officer, Myriad Genetics
Experience and Education:
— Co-founder, Myriad Genetics, 1991
— Faculty member, University of Utah, since 1974 (currently adjunct professor, department of medical informatics)
— PhD in genetics, Stanford University, 1975
— BA in economics, University of California, Berkeley, 1968
Mark Skolnick’s early research focused on the genealogy and population genetics of Italy’s Parma Valley. Later, he created a computerized database of over 170,000 three-generation families in Utah, which helped produce the first population-based analysis of genetic predisposition to cancer.
At the University of Utah, his group was responsible for genetically mapping a number of diseases.
At Myriad Genetics, where Skolnick is now CSO, his team cloned susceptibility genes for breast cancer, ovarian cancer, prostate cancer, heart disease, obesity, and depression.
At the recent Advances in Genome Biology and Technology meeting in Marco Island, Fla., Skolnick talked about sequencing the grapevine and the apple genomes using Sanger sequencing and 454’s Genome Sequencer.
In Sequence caught up with him last week to find out what Myriad has learned from these projects, and what role next-generation sequencing could play in diagnostic tests.
Why is Myriad Genetics, which is best known as a diagnostics company, involved in sequencing the grapevine and apple genomes?
Myriad has tremendous expertise in high-throughput sequencing from our BRACAnalysis test. We originally used this on the rice genome [in collaboration with Syngenta]. It was the largest fully shotgun-sequenced genome at the time, since the human [genome used] a complex strategy, and it was done just after that. There were a number of participants, but Myriad did all of the shotgun sequencing.
Then we followed that up with these projects on apple and, before that, grape (see In Sequence 1/23/2007). The reason we did that is because it provides a customer who pays for new generations of robotics and sequencers and software development [the opportunity to] test this equipment in a non-critical environment. That is to say, if the equipment does not work initially, which is usually the case on some of the robots, or if something happens when the equipment stops working for a week or two, it affects that project, but there is not a patient waiting for information.
Our purpose is to test out new equipment before moving it into the diagnostics laboratory, and also to develop software, sequence analysis [and] base-calling algorithms, and so on.
Why did you choose the 454 technology for the apple and grapevine projects?
We were hired to do these projects by the Istituto Agrario San Michele all’Adige [in Trento, Italy] and we started doing the sequencing as per our relationship. We came to the end of the grape project and had developed a highly automated robotic primer-walking platform that we were going to use to cover gaps. And at that time, 454 was just becoming available, so we thought that adding a 4X coverage with 454, randomly, shotgun, would cover remaining gaps very nicely. And it had the same non-bias that the primer walking would have, in that it was not clone-dependent. In fact, it’s better bias, because if something would not clone, you would not have the underlying clone to walk on.
In fact, it worked so well we never had to go back to any primer walking. That was a very successful project. It was a complex project, in that both grape and apple are non-inbred, natural organisms. The complexity is that you are actually sequencing two genomes simultaneously, the maternal and the paternal chromosome. You have to account for that, so when you see sequence differences, it could be an error or it could be a polymorphism. And when you account for that properly in your assembly, then it works fine.
In the apple genome project, you used a different strategy than in the grape project. Can you tell me about that?
In grape, we basically finished the assembly and were ready to do primer walking to close gaps when we decided to use the 454. We used 7X Sanger coverage and 4X 454 coverage.
In [the apple] project, we have just done a 4X [Sanger] coverage, using BACs and fosmids and 10 kilobase clones, and the majority of it in 2- to 3-kilobase clones. And then, before we assemble, [we added] a 10X coverage of 454 reads, most of which are these longer reads now that average 500 bases.
You can think of that as extending all of your sequence ends off of all of the BACs, off of all of the fosmids. Many of our 2- to 3-kilobase clones would be completely closed in, would be just a single-length sequence, before you start assembling, and your 10-kilobase clones have longer ends.
Now, our total coverage is 14X, rather than 11X, and since there are two chromosomes, a paternal and a maternal, it gives you a 7X coverage of each polymorphism, on average, rather than 5.5X, which is some further improvement in terms of determining the solidity of any specific difference between the two chromosomes.
Also, going back to the clones, we are going to use [454’s] Michael Egholm’s new 15- to 20-kilobase clones, which would have about 200 bases on each end. We are going to get a large set of those and show how they can replace both the 10-kilobase clones and the fosmid clones. And the paired-ends could replace the 2-kilobase clones. So the next step forward in this chain of replacing Sanger sequence with 454 sequence would be to replace everything except the BACs with 454 sequences. And we are not doing that in this project.
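[Editor's note: The coverage arithmetic in this answer can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above; the function name and the assumption that reads split evenly between the maternal and paternal chromosomes are ours, not Myriad's.]

```python
def per_haplotype_coverage(coverages, haplotypes=2):
    """Return (total coverage, average coverage per haplotype).

    Assumes reads are drawn evenly from each haplotype, which is an
    idealization; real coverage varies from region to region.
    """
    total = sum(coverages)
    return total, total / haplotypes


# Grape: 7X Sanger plus 4X 454, as described in the interview.
grape_total, grape_per_haplotype = per_haplotype_coverage([7, 4])

# Apple: 4X Sanger plus 10X 454, as described in the interview.
apple_total, apple_per_haplotype = per_haplotype_coverage([4, 10])

print(grape_total, grape_per_haplotype)
print(apple_total, apple_per_haplotype)
```

Under that even-split assumption, the totals of 11X and 14X correspond to the 5.5X and 7X per-chromosome figures Skolnick cites.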
Who developed the assembly software for these projects?
The assembly software was developed by Andrey Zharkikh in our group. The assembly program is unique in that it basically says, ‘I’m assembling, but I also recognized that I’m assembling two different haplotypes.’ So it puts together contigs that show sequence similarity and are therefore related to each other, and at the same time, it tries to separate them into the A and B chromosomes. So when it sees a sequence difference or a deletion, it has to ask, ‘Is this an error that I am trying to correct, or is this a real sequence difference that I am trying to understand?’
With this strategy of [assembling] a heterozygote, you get millions of genetic markers, automatically, which is really nice. So then you use a subset of 1,000 or 2,000 or 3,000 of those on all of the largest meta-contigs, and use those to position them with regard to each other.
So you get a tremendous amount of biological information, plus, you are sequencing a real-life plant and not a laboratory artifact.
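[Editor's note: The error-versus-polymorphism question described in this answer can be illustrated with a toy rule applied to one column of aligned bases. This is not Myriad's assembler, which is not public; the function name and both thresholds are hypothetical and chosen only for illustration.]

```python
from collections import Counter


def classify_column(bases, min_minor_reads=2, min_minor_fraction=0.2):
    """Classify one alignment column from a heterozygous sample.

    bases: string or list of base calls covering a single position.
    Thresholds are illustrative assumptions, not published values.
    """
    if not bases:
        raise ValueError("column has no reads")
    counts = Counter(bases)
    if len(counts) == 1:
        return "consensus"
    # Second most common base is treated as the candidate second allele.
    minor_count = counts.most_common()[1][1]
    minor_fraction = minor_count / len(bases)
    if minor_count >= min_minor_reads and minor_fraction >= min_minor_fraction:
        return "polymorphism"
    return "likely error"


print(classify_column("AAAAAAAGGGGGGG"))  # two alleles, each well supported
print(classify_column("AAAAAAAAAAAAAG"))  # a single discordant read
```

The idea is that a real difference between the two chromosomes should appear in roughly half the reads, while a sequencing error tends to appear in only one or two. A production assembler would also weigh base quality and neighboring positions.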
Do you plan to make the assembly software available to others?
We would like to. But basically, the only way we could do that would be to clone Andrey. It’s really not a program or product, it’s a series of scripts and manipulations of fragments of codes. The best we can do is to make all of this information known to 454, and ask them to include that in assemblers that they will be producing. It would be a tremendous amount of work to make a product out of it. That’s really outside our scope.
Do you plan to do more of these agricultural projects?
No. In fact, we are now very much focused on developing products in cancer personalized medicine, which is much closer to our basic business strategy. Again, it’s been very successful for us to use the 454 to get introduced to it through these agricultural projects, and now we want to turn what we have learned to our core business.
What have you learned from these projects that you will now apply to diagnostics?
The 454 [sequencer] is actually an extremely valuable instrument for looking at tumor sequences and mutations. Tumors are very highly heterogeneous, so [you] may have, for example, a K-ras mutation that has only occurred in 3 or 5 percent of a tumor. Also, the cells that you collect may be 60 or 70 percent tumor, and [the rest] other, normal tissue of the same tissue type, or infiltrates of a different tissue type that are keeping the tumor alive. You can’t detect [a variation] at 5 percent mutation frequency by Sanger sequencing. [454 sequencing] just allows a whole level of analysis of tumors that’s extremely important for us and for other people.
So that’s our main application. And again, our strategy worked very well: we got very familiar with the 454 [technology], and we got to learn its capabilities, to exercise it, if you will, in an environment of working on grape and apple. And now, without incurring these learning expenses, we are able to utilize that in our own development.
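[Editor's note: The point about low-frequency mutations can be made concrete with a simple binomial model of read sampling. This is a sketch under stated assumptions: reads are independent, sequencing errors are ignored, the mutation is heterozygous in a diploid region, and the read-count threshold is hypothetical. None of these figures come from Myriad.]

```python
from math import comb


def expected_variant_fraction(tumor_purity, subclone_fraction, heterozygous=True):
    """Expected fraction of reads carrying the mutation.

    tumor_purity: share of sampled cells that are tumor (e.g. 0.65).
    subclone_fraction: share of tumor cells carrying the mutation.
    Assumes a diploid locus with no copy number change.
    """
    allele_share = 0.5 if heterozygous else 1.0
    return tumor_purity * subclone_fraction * allele_share


def prob_at_least(k, depth, fraction):
    """P(X >= k) for X ~ Binomial(depth, fraction)."""
    p_fewer = sum(
        comb(depth, i) * fraction**i * (1 - fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


# Mutation in 5 percent of tumor cells, sample 65 percent tumor.
fraction = expected_variant_fraction(0.65, 0.05)

# Chance of seeing at least 5 variant reads (hypothetical threshold)
# at shallow versus deep per-site read depth.
for depth in (20, 1000):
    print(depth, prob_at_least(5, depth, fraction))
```

With many independent reads per site, a variant present in a few percent of molecules is sampled often enough to count, whereas a Sanger trace reports a single blended signal in which such a minority allele is hard to distinguish from background.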
Is the goal to develop a diagnostic test to characterize tumors?
Two of the big issues are, is a tumor likely to be progressing or not, and therefore, worthy of treatment or not? There are certain stage-2 cancers of different types where it is hard to know whether you should treat or not. And if you could distinguish the tumors that are likely to progress, you would treat those, and the ones that are not likely to progress, you would not treat those. So that’s one general class of problems.
The other general class of problem is assignment of therapy. Many of these expensive new biologic therapies work 30 percent of the time — to pick a number — and if you could determine who is a responder and who are the non-responders in a particular therapy, and have 80 or 90 percent responders and 10 percent non-responders, that would be fantastic in terms of the patients and in terms of the healthcare system. And this is a very big field for us and for many other companies, trying to develop these products to improve our healthcare and to reduce cost.
What would be the advantage of using sequencing for that instead of expression microarrays, or looking for expression of a particular protein?
It’s a mixed strategy. You want to use expression microarrays, you want to use protein-expression analysis, you want to look for specific mutations. And if those mutations are occurring in a small set of samples, then you need to look with a tool that is sensitive, such as the 454.
You have to be able to do immunohistochemistry, and RNA analysis and copy number analysis as well. I think that an individual product may end up using a mixture of different technologies.
Have you tested any other next-generation sequencer, like Illumina’s or ABI’s?
I believe that these different sequencers have their ideal applications. For the genomic application that we did, we think that the 454 was clearly the best. We are also comfortable with developing the system we are developing for looking for specific mutations in specific genes on the 454.
This is not to say that it can’t be done on other platforms, but this works very well for us. And at a certain point, you can spend your time testing every available platform, or find something that you are comfortable with and that you got experience with and then try to make it work. And there is a lot of energy that goes into working with each of these platforms. For us, the ability to look at the long reads is important. The turnaround is much faster on these machines, and for certain applications that’s important as well.
I think that there is sort of a battle going on for people to prove which applications are best done on which platform. And that’s really not our field. We are also technologically agnostic, meaning, as soon as somebody says, ‘This application should be done on that machine,’ and all the proper comparisons have been done, we will definitely adopt different machines for different applications.
For your purposes, do you see any good applications for one of the short-read technologies?
We are not exploring the short read technologies right now. Given the complexity of what we are trying to do, we think that we can do what we need with the 454. And again, that’s not to say that in a certain amount of time, a year or two or three, we won’t be looking at another platform. There are certainly some intriguing platforms that are being waved in front of us that are not on the market yet. So we will just have to see what comes.
Are you thinking of platforms like Pacific Biosciences’?
I was very impressed with Pacific Bio’s talk at Marco Island; that was very exciting. I think there is a good chance that we may stick with the 454 and continue to see what applications work well on it, and I think that there will be many in the long run, but the ability to do a whole genome reasonably inexpensively on a PacBio machine is certainly intriguing.
And there are others. We are all waiting to see if these single-molecule nanopore technologies are going to come to life or not. That would be very exciting. So the world is just opening before our eyes. And if you want to make the computer analogy, I think we have just hit the point where university-wide mainframes are being replaced by mini-computers, and we still have personalized computers and laptops and iPods and all that in front of us that will come out, very rapidly, in different forms of analysis of sequences and expression and protein and so on. There is just a tremendous wave of development in front of us.
How many 454 instruments does Myriad currently own?
Just one now. But we expect to have more in the not-too-distant future.
What are the challenges you still need to overcome to meet the criteria for diagnostic sequencing?
I think that the 454 would be able to do that. I don’t think that there are specific challenges. It might astound you to know how much work it is to change a platform like that. It involves all of the robotics, all of the sequencing, all of the interpretation software, all of the base-calling software. And then to validate that, and put it in a CLIA lab, is a big step. We are exploring the feasibility of that. It’s just too early to think of when that might actually hit the market.
Our criterion for diagnostic sequencing is that the answer has to be 100 percent correct, 100 percent of the time. ‘Almost’ is not good enough, ever. And we feel we are there with the Sanger platform.