June 14, 2011

Very slow and variable mutation rate in humans

A direct measure of the mutation rate in two family trios (mother, father and child), one from Utah and the other from SW Nigeria, finds that the average mutation rate is of the order of one mutation per 100 million sites, which totals some 60 new inherited mutations per newborn child.
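
As a rough sanity check on those two figures, the arithmetic can be sketched in a few lines of Python. The genome size used here (about 3 billion sites per haploid copy, two copies per child) is an assumption of this sketch, not a number taken from the study, and the function name is purely illustrative.

```python
# Rough check: does ~1 mutation per 100 million sites give ~60 per newborn?
MUTATION_RATE_PER_SITE = 1e-8  # from the post: about 1 per 100 million sites per generation
HAPLOID_GENOME_SITES = 3.0e9   # ASSUMPTION: approximate size of one haploid human genome copy


def expected_new_mutations(rate_per_site, haploid_sites, copies=2):
    """Expected number of new mutations in a child carrying `copies` genome copies."""
    return rate_per_site * haploid_sites * copies


# One copy comes from each parent, hence copies=2 by default.
print(expected_new_mutations(MUTATION_RATE_PER_SITE, HAPLOID_GENOME_SITES))
```

Under these assumptions the result is about 60, in line with the figure quoted above; a different genome size or per-site rate would shift it proportionally.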

There was some remarkable variation in the number of mutations inherited from each parent, but the data failed to support the hypothesis that the male line is more prone to introduce mutations just because more cell divisions take place in the production of sperm than in the formation of ova.

This is not the first time that real measures of the mutation rate turn out to be slower than hypothesized: 
And let's not forget that we diverged from chimpanzees at least 8 million years ago. But way too many geneticists and other people who write on genetics ignore these facts of life, which I find pretty much unscientific (careless at best, mischievous at worst).


  1. Dienekes recently posted a study published in the Journal of Human Genetics about haplogroup Q1a in Siberian populations. Little did he know that the 13.8 ± 3.9 Ka divergence time between Q1a3-M346 and Amerindian Q1a3a was found using the evolutionary rate, which he so despises. I already posted this on his blog.
    This is from the study:
    Malyarchuk et al. (2011)
    "The age of STR variation within haplogroups was estimated as the average squared difference in the number of repeats between all current chromosomes and the founder haplotype (formed by the median values of the repeat scores at each STR locus within the haplogroup), averaged over STR loci and divided by means of a mutation rate [25,33]. The evolutionary effective mutation rate of 6.9×10⁻⁴ per 25 years based on STR variation within Y chromosome haplogroups in the populations with documented short-term histories was used [33]. The upper bound for divergence time of two groups of haplotypes was calculated as divergence time estimate, assuming STR variance in repeat number at the beginning of population subdivision (Vo) equal to zero [34]. In the age calculation procedures, only tri- and tetranucleotide markers were used, so, besides the ambiguous loci DYS385a and DYS385b, locus DYS438 with pentanucleotide repeats was excluded from the calculations."
    So if we follow his correction for the genealogical rate and divide the numbers by a factor of 3, the divergence time between these Q1a3(*)-M346 haplotypes and the Amerindian-specific haplogroup Q1a3a-M3 would be 4.6±1.3 Ka, which is awfully recent considering the archaeological record of the colonization of the Americas. It seems the genealogical mutation rate might have hit a brick wall.
    The almost complete absence of R-M269 in Turkic-speaking populations is indeed interesting: with the exception of 2 Tofalars (N=30), the rest of the populations seem to lack R-M269. So I fail to see the link between R-M269 and Turkic speakers from Altaic, as Klyosov claimed.
    This is from the study:
    Malyarchuk et al. (2011)
    “R1b1b2- M269 that is frequent in Europe is rarely observed in diverse set of Siberian populations: Evenks (2.4%), Buryats (0.7%), Mongols (4.3%) and Tofalars (6.7%). However, more interesting fact is the presence of haplogroup R1b1b1-M73 in the whole series of Turkic-speaking populations—Shors (13.2%), Teleuts (11.4%), Khakassians (3.2%), Tuvinians (1.9%), Altaians (1.1%), as well as in Mongolic-speaking Kalmyks (2.2%).”

    Here are more interesting results from the study:
    Malyarchuk et al. (2011)
    “Coalescence age of South Siberian Q1a3*-M346, based on the average squared difference in the number of tri- and tetranucleotide repeats, is about 4.03±1.25Ka, while the age of the Koryak Q1a*-MEH2 appears to be only 1.0±1.0Ka.

    Median network of haplogroup R1b1b1-M73 shows that there are two subclusters of haplotypes in Siberian populations studied. One of them (designated as A in Figure 2) is determined by median haplotype 14-13-16-13-17-22-11-13-13-15-10-13 (for all loci studied: DYS19, DYS385a, DYS385b, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439) and another one (B) by haplotype 14-13-13-14-16-19-11-13-13-15-10-13 (Figure 2). The coalescence age of R1b1b1-M73 in South Siberia, based on tri- and tetranucleotide repeats, is estimated as about 18.2±10.5Ka. The ages of subclusters A and B are amounted to 4.4±1.5 and 5.6±4.0Ka, respectively. Analysis of published data demonstrates that these two subclusters are present simultaneously not only in Siberia but also in different ethnic populations of China [7] and the Caucasus [23,36].”

    PS: I have the study, so if you need a copy of it, let me know, and I'll send it to you.

  2. The "evolutionary rate" method clearly approximates reality better than the "germline rate" that Dienekes and some others love so much (out of wishful thinking), but I still think it falls short on at least one point: a wrong estimate of the Homo-Pan divergence, which is systematically underestimated.

    Anyhow, I find it really hard to estimate ages in the Y chromosome because the full sequences are never known, so you need to rely on a set of more or less random markers, which are often criticized. The mtDNA is instead much better known, which allows for a more precise measure.

    However, what we see in mtDNA is that some lineages have fewer mutations than others, even when we can access the whole sequences. This is not explained, but my best understanding is that large haplogroups mutate less in effective terms because novel mutations in the population are easily drifted out by the dominant lineage(s), which is/are the older one(s). In smaller populations, instead, the chances of an old or a novel (mutated) lineage being the survivor are similar, so eventually a mutated lineage takes the dominant role, only to be replaced by another soon after.

    That's how I explain that H and U, for example, even if they are of similar age when counting from upstream (from the shared root in R), show very different mutation rates downstream: U sublineages are highly mutated and show few and small star-like nodes, while H shows very large star-like nodes from early on (the signature of a brutal sudden expansion) and its sublineages often show only a few downstream mutations (the large parent lineages must have reabsorbed the daughter ones time and again).

    "I have the study, so if you need a copy of it, let me know, and I'll send it to you".

    Sure, send me a copy, please. I am not really interested in Y-DNA molecular clock hunches but I imagine that the paper has more than that, right? At least it should produce some haplotype trees interesting to look at.

  3. To what email address can I send you the paper?

  4. It's in my Blogger profile (but remove anti-spam protection: "DELETETHIS"). Anyhow it is:

    lialdamiz [AT] gmail [DOT] com

  5. I'm not an expert in DNA evaluation, but it would seem that the migration of Q1a3 and related haplogroups may have taken place over a long period of time rather than all at once. It is my opinion that there were possibly many migrations from Asia to North America and possibly back to Asia. Additionally, these people probably didn't all arrive in the same way: some may have walked, others may have come down the coast in small boats, etc., both of course following the food chain.
    When these groups arrived, some commingled while others didn't or were isolated; this could explain the difference between the Q1a3 and Q1a3a groups. We have just scratched the surface of DNA genealogical research. As the DNA pool expands, a clearer picture of all the groups that came to the Americas by way of Asia will emerge. We must think outside the box on such issues and not be predisposed to any one scenario.

    1. Not sure why you posted that here, but it's apparent to me that we should not estimate the arrival of humans to America based on molecular-clock-o-logy but on archaeological evidence, and this nowadays clearly supports dates of c. 17-16,000 BP for North America and 15-14,000 BP for South America. It's very possible therefore that the direct ancestors of Native Americans were in Beringia (or nearby areas like Kamchatka or whatever) at the latest at the end of the Last Glacial Maximum, and maybe even earlier.

      This is IMO consistent with Q1a3 coalescing in Siberia after 40-30 Ka ago, roughly the dates of arrival of Homo sapiens (with West Eurasian "Aurignacoid" technology) to the area of Altai, replacing the pre-existent hominins with Mousterian technology (Neanderthals and their Denisovan "cousins").

  6. I find that Chandler's 'Father-Son' marker-specific mutation rates may be more reasonable, and closer to the results obtained with the averaged-out 'evolutionary rate', than the other types of rates available. I made a blog post about my experiment with different mutation rates on YDNA STR haplotypes here: http://ethiohelix.blogspot.com/2012/06/finding-tmrca-of-ethiopian-ydna.html

    1. Sincerely, Ethyopis, I can't accept that haplogroup A (old or revised) is ONLY 60 Ka old; it must rather be on the order of 150-200 Ka, which implies ages 3x, or 2x at the very least, those obtained with Chandler's father-son rates.

      Why must it be more (maybe much more) than what the pedigree rate gives? Because of the "cannibal mum" effect, in this case the "cannibal dad": as mutations only happen every so many generations, most relevant people (adult men in this case) in every generation have the non-mutant state, which implies that the mutant state will generally be "drifted out".

      There is a conservative tendency, so to speak. I tried to figure out the numbers for mtDNA (with the help of a math-loving friend) and could not (simulations must be made, apparently, to estimate the actual chances). All we could do was confirm that this tendency exists for almost all Ne values (not for Ne=2 but indeed for Ne=10 or higher). The case may be a bit more complicated for Y-DNA, depending on the number of markers under consideration, but the overall effect should be the same.

      As for E1b1b1-M35, I also think it should be 2x or rather 3x the figures you get with Chandler's rates. E-V13 is known to have existed in a rather evolved form some 7000 years ago (L'Arbreda, Catalonia), implying an origin (at least) in the Greek/Albanian Neolithic or Mesolithic, some 10,000 years ago or more (if Mesolithic), which again gives at least 2x figures, even compared with the oldest Chandler-derived dates.

      Q.E.D. (as far as I can tell)

    2. Remember that the haplotypes I used for those calculations come from FTDNA, i.e. from privately sampled individuals who can afford a DNA test, so there is a selection bias towards those of European and/or New World descent. This means that the haplotypes are not really representative of the putative origin of the lineages (especially A and E-M35) for which I attempted to get TMRCA estimates. It is just a general guide to see what the different rates out there would yield in terms of TMRCA, and in that respect Chandler's rates definitely yield the most reasonable dates. The inclusion of several diverse haplotypes of haplogroup A from Africa would thus definitely change the results. The same goes for E1b1b1-M35: the inclusion of haplotypes of the E-V42, E-V6, E-M293 and E-V92 lineages would most definitely (in my opinion) boost the TMRCA estimate of the lineage close to that of Cruciani (~22 KYA).

      The 7,000-year-old E-V13 lineage found in SW Europe was associated with the Neolithic; I have never seen evidence of haplogroup E entering Europe in any type of pre-Neolithic wave. As for the E-V13 haplotypes I used in my experiment, all it really says is that they seem to be a more closely related bunch than the other haplotypes of the M35 lineage. The reason for this closeness can only be speculated upon: for instance, several waves of E-V13 may have entered Europe at different post-Neolithic times, and the descendants of the 7,000-year-old E-V13 could have simply gone extinct.

    3. I'm not sure whether what you say changes much: people in two distant sub-branches of any haplogroup should be similarly distant, be they European or African or whatever (because of the phylogenetic constraints; I'm assuming that you're not comparing two random samples without first being sure that they belong to distinct and distant enough haplogroups, right?). I'm even surprised that there could be any native Europeans with haplogroup A at all in those databases; most likely they are Africans or African-Americans.

      Whatever the case, I had Franchthi Cave (Argolis, Greece) in mind when thinking of Mesolithic continuity, but now I read that the continuous settlement there dates to c. 20,000, probably a tad too early for the arrival of E1b-pre-V13.

      Still, the beginnings of the Greek Neolithic (co-existent pre-Sesklo and proto-Sesklo, leading to the Sesklo-Balkan-Danubian and Cardium-Impressed Pottery cultures respectively) date from c. 10,000 years ago (8000 BCE), which is still 2x your "Chandler" estimate.

      I think it's a good exercise, as you are into molecular-clock-o-logy but want to be scientific and not just scholastic, to find haplogroups that look easy to calibrate (E-V13 can be one but there are others for sure) and use them as reality checks in order to arrive at better estimation methods for the "molecular clock", if such a thing is possible.

    4. “I'm assuming that you're not comparing two random samples without first being sure that they belong to distinct and distant enough haplogroups, right?”

      Yes, that is correct: the STR haplotypes were grouped based on the UEPs defining the relevant lineages before I did the calculations. The reason why I think more diverse haplotypes would yield different results is simple. Say you have a lineage x that has two sons, x1 and x2. If you want to find the TMRCA of x, then you need to use STRs from a variety of samples that come from both the x1 and x2 lineages, while if you want to find the TMRCA of x1, then repeats from the x1 lineage alone suffice. When you include haplotypes from both the x1 and x2 lineages, the putative ancestral haplotype (or modal) will change, and so will the squared differences between each sample's repeat values and the modal values, which are then accumulated to make a TMRCA estimate.

      “as you are into molecular-clock-o-logy”

      Anybody who studies this type of genetic analysis is into molecular clockology, including yourself; you just think that the clock should be ticking slower, while others think it should be ticking faster.
      We all know that all these YDNA haplotypes had a common origin in one man who had the relevant UEP, which means there was only one STR profile in the common ancestor. Once he started having children, and his children had more children, the STR profiles started changing in the descendants. The issue is how quickly the profiles were changing, not whether they changed...

      BTW, YDNA haplogroup A has been found outside of Africa, including in Europe, most especially A-M13. In the dataset I used for my experiment there were several self-declared hap A haplotypes from England, Italy, Scotland, Ireland and even Finland.

    5. I do not just think that the effective "clock" may be ticking much more slowly, but also that there are probably many very important irregularities in that alleged tick-tock, caused by issues like population size (large populations accumulate almost no ticks, while very small ones (Ne<10) go at almost pedigree rate, but we can't know for sure which was which some 20 or 50 Ka ago) or mere randomness.

      In theory Y-DNA is exempt from such issues, but only if you sequence the full chromosome or a fraction large enough to have at least one mutation per generation, i.e. when the risk of the studied haplotype being the same in parent and child has been neutralized (with current methods that risk is just too big, altering everything).

      So I am not really too interested in molecular-clock-o-logy for that reason: it almost invariably produces arbitrary unrealistic results.
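
For readers who want to follow the numbers discussed in this thread, here is a minimal Python sketch of the average-squared-difference (ASD) age estimate, as described in the passage quoted from Malyarchuk et al. (2011), plus the division by 3 applied in the first comment. The toy haplotypes and all function names are hypothetical, made up for illustration. Only the evolutionary rate (6.9×10⁻⁴ per 25 years) and the 13.8 ± 3.9 Ka figure come from the thread. This is a sketch of the method as quoted, not the study's actual code.

```python
from statistics import median

EVOLUTIONARY_RATE = 6.9e-4  # per STR locus per 25 years, as quoted from the study
YEARS_PER_STEP = 25         # time unit attached to that rate


def founder_haplotype(haplotypes):
    """Median repeat count at each STR locus (the 'founder' in the quoted method)."""
    return [median(locus) for locus in zip(*haplotypes)]


def asd_age_years(haplotypes, rate=EVOLUTIONARY_RATE, years_per_step=YEARS_PER_STEP):
    """Average squared difference to the founder, averaged over loci, divided by the rate."""
    founder = founder_haplotype(haplotypes)
    squared = [
        (repeats - f) ** 2
        for hap in haplotypes
        for repeats, f in zip(hap, founder)
    ]
    asd = sum(squared) / len(squared)  # mean over all samples and all loci
    return asd / rate * years_per_step


def rescale_age(age_ka, error_ka, factor=3.0):
    """Divide an age and its error by a constant factor (3, as in the first comment)."""
    return age_ka / factor, error_ka / factor


# HYPOTHETICAL toy data: 4 haplotypes typed at 3 STR loci (not real samples).
toy_haplotypes = [
    [14, 13, 22],
    [14, 13, 23],
    [15, 13, 22],
    [14, 13, 22],
]

print(founder_haplotype(toy_haplotypes))
print(round(asd_age_years(toy_haplotypes)))

age, err = rescale_age(13.8, 3.9)
print(f"{age:.1f} ± {err:.1f} Ka")
```

The last line reproduces the 4.6 ± 1.3 Ka figure from the first comment. Because the estimate is simply ASD divided by the rate, a rate three times faster gives an age three times smaller, which is all the "evolutionary versus genealogical" dispute amounts to numerically. Pooling haplotypes from two sub-lineages changes the median founder and therefore the squared differences, which is the point made in the reply about lineages x1 and x2.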

