Tuesday, December 2, 2014

DNA, SNP, STR, OMG!

(Originally published May 2014 in Going In-Depth)

   Oh my gosh, there are many acronyms in genetic genealogy.  You have to agree that using the acronym DNA is better than writing deoxyribonucleic acid repeatedly.  But when we talk about using DNA for genealogy and use only acronyms, they start to lose their meaning and become just another ‘thing’.  “Hey, I’ve got a SNP.  Do you have a SNP?”  “I dunno, let me check.”  Maybe I’m weird.  I like to understand what all the acronyms mean and how they play a part in the larger picture.

   Let’s start with some DNA basics.  We have DNA in nearly every cell; mature red blood cells are the exception.  Inside the nucleus of our cells, we have 46 chromosomes, or 23 pairs (nuclear DNA).  One set of 23 comes from dad and one set comes from mom.  If we took the tightly coiled DNA from one cell and stretched it out, it would be about six feet long.  In that six-foot double helix from one cell, there are over 3 billion base pairs.  If you picture our double helix DNA as a twisted ladder, each rung is a base pair: two nucleotides (DNA building blocks) joined together, drawn from four possible types.  Every rung is either an adenine-thymine pair or a cytosine-guanine pair.
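
   If you like to see things spelled out, here is a minimal Python sketch of the pairing rule.  The sequence is made up for illustration, and the function name is my own invention, not part of any genetics library.

```python
# Pairing rules for the rungs of the ladder: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the partner base for every base in the strand, rung by rung."""
    return "".join(PAIRS[base] for base in strand)


strand = "ATGCCGTA"  # made-up example sequence, not real data
partner = complement(strand)

# Print the ladder one rung at a time, e.g. "A-T".
for left, right in zip(strand, partner):
    print(f"{left}-{right}")
```

   Given one side of the ladder, the other side is fully determined, which is why a sequence is normally written as a single string of letters.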



   When we talk about DNA, we often also talk about mitochondrial DNA.  Mitochondria exist outside of the nucleus as an energy source for the cell and have their own independent DNA.  Mitochondrial DNA has just over 16,000 base pairs in comparison to the 3 billion base pairs in our nuclear DNA.  We inherit our mitochondrial DNA only from our mothers.

   DNA is divided into coding regions (genes that define proteins for such things as eye color) and non-coding regions (sometimes called junk DNA).  The coding region that defines us is less than 2% of our overall DNA and, within that, there are fewer than 25,000 genes.  A gene is a sequence of nucleotides averaging about 23,000 base pairs.  One of the largest genes, which encodes the Caspr2 protein, has over 2.3 million base pairs.


   Within the 3 billion base pairs of our DNA there are variations (normally occurring mutations), where one base pair has been replaced with another base pair.  As an example, it was adenine (A) and now it’s guanine (G).  This is a single nucleotide polymorphism or SNP (pronounced snip).  There are over 15 million SNPs in our DNA.  Once a SNP occurs, it is usually permanent in the population.  The farther back in time that the SNP occurred, the more people will have that particular mutation.  To be considered a SNP, a variation has to exist in greater than 1% of the population.  SNPs are found in both the coding and non-coding regions of our DNA.  In the coding regions, SNPs are often markers for genes.
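
   To make the A-to-G example concrete, here is a rough Python sketch that compares a sample against a reference and lists the single-letter differences.  Both sequences are made up, and `find_snps` is a name I chose for illustration.  It only spots the differences; deciding whether one counts as a SNP would still require checking that it shows up in more than 1% of the population.

```python
def find_snps(reference, sample):
    """List (position, reference_base, sample_base) wherever the two differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


# Made-up sequences: the sample has G where the reference has A at position 5.
reference = "GATTACAGATTACA"
sample = "GATTGCAGATTACA"

print(find_snps(reference, sample))  # prints [(5, 'A', 'G')]
```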

   Let’s divide our DNA into four groups.  Group one, the autosomes, is the first 22 pairs of chromosomes.  The next two groups, the sex chromosomes, are one X and one Y if you are male and two Xs if you are female.  That gives us yDNA and xDNA.  The last DNA group is mitochondrial.  All types of DNA have SNPs.  Autosomal SNPs are used for health and ethnicity.  Mitochondrial and yDNA SNPs are used to determine world haplogroups.  While there are thousands of X SNPs, there doesn’t seem to be much research around them.

   Most SNPs have no direct effect on health, but their presence may predict a health risk.  If you had an autosomal test from 23andMe (prior to the FDA ruling), they would have delivered health information with your results.  They were able to report SNPs in the coding region associated with gene combinations responsible for health risks, like cancer or Alzheimer’s, or basic information, like eye and hair color.  Even though you cannot get health information from 23andMe currently, you can still use your autosomal results with Promethease from SNPedia.com to research your health risks.

   Combinations of SNPs are analyzed to determine ancestry-informative markers (AIM – another new acronym for you).  AIMs are used to estimate the ethnicity or at least the geographic origins of your ancestors.  When you receive ethnicity results from an autosomal test, they will be based on the AIMs that the testing company uses.  The companies don’t all use the same markers, so results will vary.  There are even 42 SNPs associated with having Neandertal ancestry.

   SNPs are used to organize us into larger branches of the human family tree (haplogroups).  Our maternal family tree is organized into 26 branches (A through Z) using mitochondrial DNA.  Our paternal tree is similarly organized into 20 branches (A through T) using yDNA SNPs.  As an example, take four men (I use men because the scenario works for both mitochondrial DNA and yDNA): Abe, Bob, Chaz and Dave.  Test each of them for three SNPs: X, Y and Z.  You find that they all test positive for SNP Z, Abe and Chaz test positive for X, and Bob and Dave test positive for Y.  You can start to see the branches and the beginning of a tree.
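
   Here is the same four-man example as a small Python sketch.  It is a toy: it assumes a simple two-level tree (one trunk SNP and the branches under it), and the function names are my own.  It leans on the idea that a SNP shared by more people is older, so it sits closer to the trunk.

```python
from collections import defaultdict

# Test results from the example: which SNPs each man tested positive for.
results = {
    "Abe": {"Z", "X"},
    "Bob": {"Z", "Y"},
    "Chaz": {"Z", "X"},
    "Dave": {"Z", "Y"},
}


def carriers_by_snp(results):
    """Map each SNP to the sorted list of men who tested positive for it."""
    carriers = defaultdict(list)
    for man, snps in results.items():
        for snp in snps:
            carriers[snp].append(man)
    return {snp: sorted(men) for snp, men in carriers.items()}


def print_tree(results):
    """Print the most widely shared SNP as the trunk, the rest as branches."""
    carriers = carriers_by_snp(results)
    # More carriers means an older SNP; ties are broken alphabetically.
    ordered = sorted(carriers, key=lambda snp: (-len(carriers[snp]), snp))
    trunk, branches = ordered[0], ordered[1:]
    print(f"{trunk}: {', '.join(carriers[trunk])}")
    for snp in branches:
        print(f"  {snp}: {', '.join(carriers[snp])}")


print_tree(results)
```

   Running it prints Z with all four men on the trunk, then X (Abe, Chaz) and Y (Bob, Dave) as the two branches.  Real haplogroup trees have many more levels, so treat this as a picture of the idea and not as how the testing companies build their trees.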



   The first yDNA and mtDNA trees were built using only a few dozen SNPs.  Today, the paternal and maternal haplogroup trees are much more detailed, based on thousands of SNPs.  Complete SNP testing has been available for mitochondrial DNA for a number of years.  Since last year, complete SNP testing has been available for yDNA from companies like FamilyTreeDNA with their Big Y test.  Previously, yDNA SNP tests were designed to look for specific SNPs.  With advances in technology, they can now look for all the SNPs across over 12 million yDNA base pairs.

   Just to add another acronym to the pile, there are also STRs or short tandem repeats (aka microsatellites).  STRs are short sequences of base pairs that repeat.  These repeats are found in autosomal, Y and X DNA.  You may have heard the term CODIS if you watch crime dramas on television.  CODIS is the FBI’s Combined DNA Index System (more acronyms).  When DNA is collected for CODIS, they typically test for 13 STR markers across the autosomes.  When you have a yDNA STR test done, genetic genealogy companies test for up to 111 markers only on the Y chromosome.  They will also perform a basic SNP test to identify your paternal haplogroup.  SNPs and STRs are different in that SNPs appear to be permanent changes in our DNA and STRs are variable.  STRs are identified by location on the chromosome and by the number of times that the repeat occurs.  The number of repeats per STR can change over time, sometimes increasing, sometimes decreasing, or increasing and then decreasing again (known as a back mutation).  The combined set of STR markers is your haplotype and may be unique to your surname or span multiple surnames.  With the advances in yDNA SNP testing, SNPs will be found that are unique to your surname, which could make STR testing obsolete.
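
   To show what “number of repeats” means in practice, here is a short Python sketch.  The sequence, the repeat counts and both function names are made up for illustration.  The marker names are real yDNA STR names, but the values are not from any real test.  Adding up the repeat differences is my own simple way of comparing two haplotypes, not any company’s matching method.

```python
import re


def count_repeats(sequence, motif):
    """Return the longest run of back-to-back copies of motif in sequence."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


def haplotype_difference(haplotype_a, haplotype_b):
    """Sum the repeat-count differences over the markers both men tested."""
    shared = haplotype_a.keys() & haplotype_b.keys()
    return sum(abs(haplotype_a[m] - haplotype_b[m]) for m in shared)


# A made-up stretch of DNA where the motif GATA repeats four times.
sequence = "CC" + "GATA" * 4 + "TT"
print(count_repeats(sequence, "GATA"))  # prints 4

# Made-up haplotypes: marker name -> number of repeats.
abe = {"DYS393": 13, "DYS390": 24, "DYS19": 14}
chaz = {"DYS393": 13, "DYS390": 25, "DYS19": 14}
print(haplotype_difference(abe, chaz))  # prints 1
```

   The two men differ by a single repeat on one marker, which is the kind of small difference you would expect between close paternal relatives.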

   We all have DNA: 23 pairs of chromosomes in our cell nuclei, half from mom and half from dad.  We also have mitochondrial DNA from our moms.  Less than 2% of our DNA is in the form of genes, which define who we are.  SNPs can be used to identify our “good” and “bad” genes.  SNPs can also help identify our ethnicity and build our paternal and maternal family trees.  STRs can organize us down to the paternal surname level.  When folks start talking DNA, don’t be afraid to ask them, “What kind of DNA?”, “What does that SNP indicate?” or “What type of STR is being tested?”  We’ll never get away from using acronyms to simplify how we communicate genetic genealogy.  That doesn’t mean we need to let the acronyms simplify the meanings to a point where the science is lost.  Every little bit of knowledge adds to our understanding of ourselves.


© Michael Maglio
