Thursday, March 19, 2015

Triangulated Small Segments are Identical by Descent

   Autosomal DNA segment matching is a complex issue.  Testing and observation make it clear that some segment matches are false positives.  Matching algorithms detect any matching allele without knowing whether that allele is of paternal or maternal origin, so a run of alleles can appear to match even when it does not follow a single inherited chromosome.


   If we said that the left columns are from the father’s side and the right from the mother’s, we would see that none of the columns match.  Obviously, we can’t just draw a line down the middle and say one side is the mother’s DNA.  To determine which DNA came from mom and which came from dad, the autosomal results would need to be phased.  To phase an autosomal sample, it must be compared with the results of at least one parent.  By subtraction, the child’s result can then be split into its paternal and maternal contributions.
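The subtraction step can be sketched in Python.  This is a minimal illustration, not how any testing company actually phases data: it assumes each genotype is a two-letter string such as "AG", that only one parent has tested, and it uses made-up SNP names (snp1, snp2, ...).  Whenever the tested parent's alleles explain exactly one of the child's alleles, the other allele must have come from the untested parent.

```python
def phase_with_parent(child, parent):
    """Split a child's genotype into (allele from tested parent, allele from other parent).

    Genotypes are two-letter strings such as "AG".  Returns None when the site
    cannot be phased from this parent alone (child and parent are the same
    heterozygote) or when the genotypes are inconsistent, e.g. a genotyping error.
    """
    a, b = child[0], child[1]
    if a == b:
        # Homozygous child: both parents passed on the same allele.
        return (a, b) if a in parent else None
    if a in parent and b in parent:
        # Child and parent share the same heterozygous genotype: ambiguous.
        return None
    if a in parent:
        return (a, b)
    if b in parent:
        return (b, a)
    return None  # neither allele could have come from this parent


# Hypothetical genotypes for a child and the mother at four SNPs.
child = {"snp1": "AG", "snp2": "CC", "snp3": "AG", "snp4": "CT"}
mother = {"snp1": "AA", "snp2": "CT", "snp3": "AG", "snp4": "TT"}

for snp in child:
    phased = phase_with_parent(child[snp], mother[snp])
    if phased is None:
        print(snp, "cannot be phased")
    else:
        print(snp, "maternal:", phased[0], "paternal:", phased[1])
```

Sites like snp3, where child and parent carry the same heterozygous genotype, are exactly where the second parent's results would be needed.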


   If it were possible to phase every sample to be matched, the false positives produced by matching algorithms would be eliminated.  Unfortunately, phasing every sample is not always possible: a person’s parents may be deceased or even unknown.

   Another method of reducing or eliminating false positives is to triangulate each matching segment.  If a segment from autosomal sample A matches the corresponding segment from sample B, sample B matches sample C, and sample C matches the original sample A, then the segment is considered triangulated and identical by descent.  How confident can we be that triangulated matches aren’t just a circular series of false positives?
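The A-B, B-C, C-A test can be written as a short check.  This is a sketch under my own assumptions: each pairwise match is stored as a (start, end) position on the same chromosome, keyed by the pair of kits, and the kit names and positions below are invented.  A group counts as triangulated only if every pair matches and all of those segments overlap; the shared overlap is the triangulated segment.

```python
from itertools import combinations


def triangulated_region(matches, kits):
    """Return the (start, end) region where every pair of kits matches, or None.

    `matches` maps frozenset({kit1, kit2}) -> (start, end) for a matching segment
    on one chromosome.  Needs at least two kits.
    """
    start, end = float("-inf"), float("inf")
    for pair in combinations(kits, 2):
        segment = matches.get(frozenset(pair))
        if segment is None:
            return None  # this pair does not match at all
        # Narrow the region to its overlap with this pair's segment.
        start, end = max(start, segment[0]), min(end, segment[1])
    return (start, end) if start < end else None


# Hypothetical pairwise matches (positions in Mb) among kits A, B and C.
matches = {
    frozenset({"A", "B"}): (10.0, 12.5),
    frozenset({"B", "C"}): (10.4, 13.0),
    frozenset({"A", "C"}): (9.8, 12.2),
}
print(triangulated_region(matches, ["A", "B", "C"]))  # (10.4, 12.2)
```

This only tests whether the segments line up; whether such a line-up could still arise by chance is a separate, statistical question.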

   Let’s look at a segment on chromosome 3 that starts at rs6796502, is 2.5 cM long, and contains 946 SNPs.  Any chromosome segment could be used for this exercise.

Table 1.  Allele frequencies of 20 loci on chromosome 3.
   On that segment, there are 20 locations with published allele frequencies (NCBI).  Table 1 shows how often each allele combination (AA, AC, AG, etc.) appears in a European population.  Based on these frequencies, Table 2 lists the most common combination of alleles in this section of chromosome 3 for a population of European descent.  I have deliberately selected the most common combination to simulate a large portion of the population of European descent.  About 1 in 3,400 people, or roughly 300,000 people, should have this combination.
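The frequency of one exact combination is the product of the per-location frequencies, which treats the 20 locations as independent (an assumption that linked SNPs on the same segment may not fully meet).  The sketch below uses invented frequencies, not the values in Table 1.  The figures 1 in 3,400 and about 300,000 people together imply a reference population of roughly one billion people of European descent, which is the population value used here.

```python
from math import prod


def combination_frequency(genotype_freqs):
    """Chance that one person carries a specific genotype at every listed location.

    Assumes the locations are independent, so their frequencies multiply.
    """
    return prod(genotype_freqs)


def expected_carriers(one_in, population):
    """Expected number of people with a combination that occurs 1 in `one_in`."""
    return population / one_in


# Invented per-location frequencies of the most common genotype (not Table 1).
freqs = [0.9, 0.8, 0.75, 0.6, 0.5]
f = combination_frequency(freqs)
print(f"about 1 in {1 / f:,.0f}")

# Population implied by "1 in 3,400" and "about 300,000 people".
population = 3_400 * 300_000  # about 1.02 billion
print(f"{expected_carriers(3_400, population):,.0f} carriers")  # 300,000 carriers
```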

Table 2.  Predicted allele combination.
   Imagine for a moment that you roll six dice.  The first die comes up one, the second comes up two, and so on.  The probability of rolling a one on the first die is 1/6 (one side of a six-sided die).  The probability of rolling a one and then a two is 1/6 times 1/6, or 1/36; on average it will happen once every 36 rolls.  The full sequence of one through six would happen once in every 46,656 rolls.  Now imagine that this sequence is your DNA and we are looking for a match.  The other person would need one through six in the same order.  To calculate that probability, we multiply 46,656 by 46,656 and get 2,176,782,336.  Actual DNA matching has a better probability of matching, because allele combinations are not equally likely and a half match needs only one shared allele at each location.
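The dice arithmetic can be checked exactly with Python's fractions module, which avoids any rounding.  The code mirrors the analogy: one specific ordered result on six dice, and then two independent people both rolling that same result.

```python
from fractions import Fraction

p_face = Fraction(1, 6)    # one particular face on one die
p_two = p_face ** 2        # a one followed by a two
p_sequence = p_face ** 6   # one through six, in order, on six dice
p_both = p_sequence ** 2   # two people both roll that exact sequence

print(p_two)           # 1/36
print(1 / p_sequence)  # 46656
print(1 / p_both)      # 2176782336
```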


   Table 3 lists the most common alleles again, along with the potential alleles that would generate a half match and the corresponding summed frequency.  The probability that all 20 compatible combinations occur together is the product of the frequencies, 0.759.  This probability has to be extrapolated from the 20 loci to all 946 SNPs, giving 2.45 × 10⁻⁶, or about 1 in 400,000.  There is a 1 in 400,000 chance of a completely random match on this section of chromosome 3 for the alleles with the highest frequency.  It is well within reason to expect false positives for this one-to-one match.
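Here is a minimal sketch of the half-match calculation.  At each location, a half match needs the other person to share at least one allele, so the frequencies of every compatible genotype are added; the per-location results are then multiplied, again assuming independence.  The genotype frequencies are invented for illustration.  For the extrapolation, I assume the 20-location product is raised to the power 946/20, i.e. the unpublished SNPs are treated as behaving like the published ones; this is one plausible reading of the step, not necessarily the exact calculation behind Table 3.

```python
def half_match_probability(sample_genotype, genotype_freqs):
    """Chance that a random person shares at least one allele with the sample here.

    `genotype_freqs` maps genotypes such as "AA", "AG", "GG" to population frequencies.
    """
    return sum(freq for genotype, freq in genotype_freqs.items()
               if set(genotype) & set(sample_genotype))


def extrapolate(p_observed, loci_observed, loci_total):
    """Scale a probability computed over some loci to the whole segment."""
    return p_observed ** (loci_total / loci_observed)


# Invented genotype frequencies at one location (not from Table 3).
freqs = {"AA": 0.64, "AG": 0.32, "GG": 0.04}
print(round(half_match_probability("AA", freqs), 2))  # 0.96

# Extrapolate the 20-location product of 0.759 to 946 SNPs.
p_segment = extrapolate(0.759, 20, 946)
print(f"{p_segment:.2e} (about 1 in {1 / p_segment:,.0f})")
```

With 0.759 this lands at roughly 2 × 10⁻⁶, the same order of magnitude as 2.45 × 10⁻⁶.  Because the product is raised to about the 47th power, small differences in the inputs or in the extrapolation method shift the result noticeably.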

Table 3.  Probability of a half match within a European population.
   In the event of a three-way match (triangulation), we multiply by 2.45 × 10⁻⁶ again, giving us a probability of 1 in 167 billion.  Now we are outside of what is statistically reasonable.

   The most common set of European alleles doesn't produce the highest probability of a random match.  When the two alleles at a location differ (AC, AG, CT, etc.), there is a higher chance of an autosomal half match, because another person needs to carry only one of them.  Table 4 shows an actual set of alleles and the corresponding alleles that would generate a half match.
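Why a heterozygous location matches more often can be seen with a small model.  For illustration I assume a biallelic SNP (alleles A and G) in Hardy-Weinberg equilibrium with an invented A frequency of 0.8; neither assumption comes from the NCBI data used here.  A homozygous AA person half-matches anyone carrying an A, but an AG person shares an allele with every possible genotype at a biallelic SNP.

```python
def hwe_genotype_freqs(p):
    """Genotype frequencies at a biallelic SNP (alleles A, G) under Hardy-Weinberg
    equilibrium, where p is the frequency of allele A."""
    q = 1.0 - p
    return {"AA": p * p, "AG": 2 * p * q, "GG": q * q}


def half_match_probability(sample_genotype, genotype_freqs):
    """Chance that a random person shares at least one allele with the sample."""
    return sum(freq for genotype, freq in genotype_freqs.items()
               if set(genotype) & set(sample_genotype))


freqs = hwe_genotype_freqs(0.8)  # invented allele frequency
for genotype in ("AA", "AG", "GG"):
    print(genotype, round(half_match_probability(genotype, freqs), 2))
# AA 0.96
# AG 1.0
# GG 0.36
```

Under these assumptions every heterozygous location contributes a factor of 1 to the product, which is why a real sample can have a much higher chance of a random half match than the most common homozygous combination.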

Table 4.  Probability of a half match within a European population using actual sample.
   This actual sample takes us from a false positive probability of 1 in 400,000 to 1 in 5,900 (0.000169).  A probability of 1 in 5,900 indicates that we should regularly be seeing completely random matches with no genetic relationship.  With about 1.6 million autosomal tests taken, each of us would have about 270 false positive matches (1.6 million × 0.000169) on a segment similar to the one shown.
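The 270 figure is simply the chance of a random match multiplied by the number of tests available for matching.  The sketch below repeats that arithmetic for both one-to-one probabilities; like the estimate in the text, it treats each comparison as an independent trial.

```python
def expected_random_matches(p_random_match, database_size):
    """Expected number of chance matches when comparing against database_size tests."""
    return p_random_match * database_size


DATABASE_SIZE = 1_600_000  # approximate number of autosomal tests taken

for label, p in (("most common alleles", 1 / 400_000), ("actual sample", 0.000169)):
    print(label, round(expected_random_matches(p, DATABASE_SIZE)))
# most common alleles 4
# actual sample 270
```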

   Triangulated matches exist for this segment of chromosome 3.  For the probability of this triangulated segment, we multiply by 0.000169 again, giving us 2.87 × 10⁻⁸, or about 1 in 35 million.  Considering the number of results available for matching (about 1.6 million), it is not realistic that we are matching randomly.  In fact, most triangulated matches involve more than three test results.  If four test results are triangulated, the probability drops to 1 in 205 billion.  These probabilities indicate that triangulated results cannot be random and are matching due to common genetic descent.
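The triangulation arithmetic follows the same rule throughout: each additional test result in the group multiplies the pairwise chance-match probability in again, so a group of n results has probability p^(n-1).  This treats the pairwise matches as independent events, as the calculation in the text does.  The last line applies the same database-size multiplication used for one-to-one matches.

```python
def triangulation_probability(p_pair, n_results):
    """Chance that n_results tests all match one another on a segment by chance,
    multiplying in the pairwise probability once per additional result."""
    return p_pair ** (n_results - 1)


for p_pair in (2.45e-6, 0.000169):
    for n in (3, 4):
        prob = triangulation_probability(p_pair, n)
        print(f"pairwise {p_pair:g}, {n} results: about 1 in {1 / prob:.3g}")

# Expected chance three-way matches across about 1.6 million tests.
print(triangulation_probability(0.000169, 3) * 1_600_000)
```

The expected number of chance three-way matches comes out around 0.05, well under one.  With the rounded 0.000169, the four-result case lands near 1 in 207 billion, close to the 205 billion quoted above.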

   I have intentionally used two examples that have a higher probability of false positive matches.  As soon as we look at matches that don’t have the higher-frequency European alleles, the probability of a false positive diminishes.

Table 5.  Probability of a half match within a European population with a Mediterranean sub-component.
   Table 5 shows a typical set of alleles.  The alleles at rs7630053 and rs4558783 are not typical for Europeans and may indicate Mediterranean ancestry.  The probability that a one-to-one match on this segment is a false positive works out to 1 in 7 quadrillion.

   Currently, we cannot examine the allele frequency of every SNP in every match we attempt.  When looking for autosomal matches, consider phasing or triangulation.  Phasing the data is very valuable, but the resources are not always available.  I’ve shown that triangulation eliminates false positives and that triangulated matches are statistically identical by descent.  Triangulated small-segment matching is very valuable in our research.



References:

Maglio, MR (2015) Autosomal DNA and the Triangulation of Small Segments: A Statistical Approach

© 2015 Michael Maglio and OriginsDNA.  All Rights Reserved. 
