Name: Ken Chahine
Title: General manager, DNA, Ancestry.com
When Ancestry.com introduced its AncestryDNA autosomal DNA testing service in 2012, it followed several entrenched players onto the market, such as 23andMe, which had been offering an array-based test for five years. And while Family Tree DNA and National Geographic's Genographic Project also introduced array-based autosomal DNA tests around the same time, both had been offering other kinds of DNA-based ancestry testing for many years.
Fast forward two years, and Ancestry.com is a direct competitor of all three, having genotyped about 300,000 samples to date. And, according to Ken Chahine, general manager of the Provo, Utah-based online genealogy company's DNA business, the consumer genomics business is only "at the very early part of the S curve," and hasn't yet hit "that massive inflection point."
Chahine joined Ancestry.com in 2011 after holding several positions in the biopharmaceutical industry, including as CEO of Avigen. At the annual RootsTech conference, held earlier this month in Salt Lake City, he addressed attendees during a general session, and also spoke with BioArray News about new features planned for AncestryDNA, including better kinship matching tools and more specific ethnicity predictions, as well as a planned launch for the service outside of the US.
The following is an edited transcript of that interview.
You introduced some new features for matching cousins during your talk at the general session.
We have taken a slightly different approach with respect to cousin matching or identity-by-descent. We recognized that some of the distant matches were false positives because the shared segments are short, five megabases of DNA. We had estimated from the beginning that about 50 percent would be false positives, and it's actually held true. These are shared pieces of DNA that are likely to represent identity by state, meaning a common deep ancestry, rather than descent.
This issue created a dilemma for us from the very beginning. We could have done what a lot of other people have done and said, "We know there are a lot of false positives, so we are never going to give you all of those matches. We are just going to give you matches where people share much more DNA." But that approach causes a false negative problem; in other words, you are missing out on a ton of real matches. But one thing that we have at Ancestry.com is trees. People can use the trees and information to figure out if a match is false or not. So rather than limiting the number of matches at the expense of failing to provide real matches, we decided to give you more matches, knowing from the beginning that a lot of them were going to be false positives. We predicted that we would get better at distinguishing true positives from false positives and now we are at a point where that prediction is coming true.
The team has studied hundreds of thousands of matches and now understands much better how to distinguish the true from false positives. We estimate that we can remove between 90 percent and 95 percent of the false matches. So now, half of the deep matches – fifth cousins, et cetera – are going to disappear, but the distant matches that remain are going to be more valuable. So, we are really at a point where we hoped we would be two years ago – a lot more matches that are more accurate.
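The figures Chahine cites can be checked with simple arithmetic. The sketch below is illustrative only: the function name `filter_outcome` and the batch of 1,000 distant matches are hypothetical, and it assumes that the filter removes no true matches, which the interview does not state.

```python
def filter_outcome(total, false_rate, removal_rate):
    """Estimate what remains after filtering false-positive matches.

    Assumption (not from the interview): no true matches are removed.
    """
    false_matches = total * false_rate
    true_matches = total - false_matches
    remaining_false = false_matches * (1 - removal_rate)
    remaining = true_matches + remaining_false
    return {
        # Share of the original matches that disappear after filtering.
        "removed_fraction": (total - remaining) / total,
        # Share of the remaining matches that are true matches.
        "precision": true_matches / remaining,
    }


if __name__ == "__main__":
    # 50 percent false positives, with 90 or 95 percent of them removed.
    for removal_rate in (0.90, 0.95):
        result = filter_outcome(1000, 0.5, removal_rate)
        print(removal_rate,
              round(result["removed_fraction"], 3),
              round(result["precision"], 3))
```

Under these assumptions, a little under half of the distant matches disappear, and more than 90 percent of those that remain are true matches, which is consistent with the statement that about half of the deep matches will go away.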
You recently moved to the 24-sample Illumina chips, but I take it that the content hasn't changed much. So how are you able to make those new matches?
There are three aspects of the matching experience that have been improved. First, to get good matches, you need to have good phasing. If the phasing isn't good, when you go to do identity-by-descent, you end up getting a lot of garbage. So, the phasing has to be done right and we have developed a new phasing algorithm that relies on a much larger reference set. The new algorithm is much better and significantly reduces the switching errors. Second, we realized that a lot of those false positives are identity by state, not identity by descent. But when you see hundreds of thousands of those matches, you start understanding how to distinguish those two so we can filter most of the identity-by-state matches that are false positives. Finally, we went to centimorgans, so now we have a relationship predictor which is quite good. It's a combination of three things that work together to give you these massive improvements.
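Chahine does not say how AncestryDNA measures switching errors. One commonly used definition counts how often the inferred phase flips relative to the true phase between consecutive heterozygous sites. The sketch below follows that definition; the function name `switch_error_rate` and the 0/1 encoding of each site (which parental haplotype carries the alternate allele) are assumptions made for illustration, not the company's method.

```python
def switch_error_rate(true_phase, inferred_phase):
    """Fraction of consecutive heterozygous-site pairs where the phase flips.

    Each list holds 0 or 1 per heterozygous site, marking which haplotype
    carries the alternate allele. This encoding is an assumption.
    """
    if len(true_phase) != len(inferred_phase):
        raise ValueError("phase lists must have the same length")
    if len(true_phase) < 2:
        return 0.0
    # True where the inferred phase agrees with the truth at a site.
    agree = [t == p for t, p in zip(true_phase, inferred_phase)]
    # A switch occurs whenever agreement changes between neighboring sites.
    switches = sum(1 for a, b in zip(agree, agree[1:]) if a != b)
    return switches / (len(agree) - 1)


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 0, 1]
    one_switch = [0, 0, 1, 0, 1, 0]   # phase flips once, after the third site
    mirrored = [1, 1, 0, 0, 1, 0]     # haplotype labels swapped throughout
    print(switch_error_rate(truth, one_switch))
    print(switch_error_rate(truth, mirrored))
```

In this example the first comparison has one switch across five adjacent pairs, a rate of 0.2, while the fully mirrored haplotype scores 0.0 because swapping the two haplotype labels everywhere is not a phasing error. A lower rate means longer correctly phased stretches, which is what identity-by-descent detection depends on.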
Will you be able to tell people where they match on specific chromosomes?
We clearly have that information. Currently, we don't pass that information to the customer. There are a couple things we are considering and one of them is the privacy of the customer. If you and I share DNA in a certain region, and that segment of DNA happens to have something interesting in it about me, by matching with me, I am implicitly telling you that you also have something interesting in this segment. So, maybe what we need is some kind of consent that says, "Yes, I am okay with you sharing my DNA segments." Those are things we have to think through carefully. I understand many serious genetic genealogists want that information, but on the other hand, we have to think about the other 99 percent of customers who aren't serious genetic genealogists. In short, the answer is not no when it comes to sharing that information, but we are thinking of how to best do that.
And you are also coming out with a new ethnicity predictor.
We are really excited about that. We have genotyped a couple thousand samples from all over the world and our team is very busy looking into what genetic variation is in those samples. A couple years ago, the idea of going subcontinental in terms of our ethnicity predictions seemed kind of silly. Now, we're clearly subcontinental. Every step we take gets harder and harder though. It just comes down to more samples, more alleles, better algorithms, and more data. But if you had told me a few years ago that we were going to get this resolution from an Illumina OmniExpress chip, which has the most common variations in the human population, I would have thought, "Not a chance." What has helped is the numbers, and the scale. The number of people has demonstrated that even though these alleles are some of the more commonly found ones, with scale, we start seeing separation of the populations that I don't think we would have predicted.
Spencer Wells, the director of National Geographic's Genographic Project, said in his talk that the number of people tested will double this year. Do you agree with that estimate?
As I announced at the keynote, we are very close to having 300,000 genotyped to date. So, the answer is yes, the growth is good and I agree with Spencer. We started with our customer base, because it made sense. But I think at this conference, what we are seeing is that more people in our customer base are getting interested, and we are starting to see general interest, lots of people outside our base who are ordering AncestryDNA, and that now seems to be the fastest growing segment.
Ancestry.com maintains a significant archive of historical documents and materials. Do you feel that gives you an advantage over competitors in the consumer genomics testing market?
There is no question. It helps us on the back end from the research standpoint, and it also helps us from the perspective of new AncestryDNA customers. If customers' DNA matches, our algorithms will compare trees and see if they can find the common connection, and even if the algorithms don't, customers can build trees using documents to try and find the common connection. At the end of the day, knowing someone is a fourth cousin is interesting, but what they really want to know is how they are related, through what line, from what country. To me, that's the richness. And if you can find a document that demonstrates where you are related, all of a sudden, I think it makes it real.
Some people have told me that genetic genealogy has never been as widely discussed at RootsTech as it has been this year. Do you agree with that assessment?
What I think you are seeing is that people are starting to have success. Before, it was, "What am I doing this for? Does the ethnicity prediction really work?" But more people are finding success as the database grows and they begin to understand how to use DNA to advance their genealogy. And now, I predict that we are at the very early part of the S curve. I think we are just getting past the early adopters, we are just barely going mainstream, and I don't think we have hit that massive inflection point. There is no question in my mind that in a couple years, we could be sitting here talking about doing a million samples a year. I think it's just going to grow rapidly from here.
Have the vendors taken a more hands-on approach given that anticipated growth?
I work directly with Matt Posard, who is the manager of Illumina's New and Emerging Market Opportunities business. He and I have believed in this market from the beginning. I guess we are the middlemen; we are the ones who interface with the consumers, so I think it makes the relationship easy for both companies. Quite frankly, with the 24-sample chip format, we were one of the customers who helped them validate that chip.
AncestryDNA is only available in the US. Do you have any plans to make it available abroad?
Definitely. It has been the plan since the beginning. The goal is to get something out soon, hopefully in the first part of next year. Our goal is to launch in Canada, the UK, Ireland, and maybe Australia as soon as we can.