May 22, 2014

Autosomal modeling getting closer to archaeological facts by halving the mutation rate (doubling the inferred ages)

An interesting attempt at autosomal (nuclear DNA) clock-o-logy. It is not quite there yet, but it is interesting nevertheless because it approximates what seems to be the reality, judging by archaeological data, much better than previous attempts.

Stephan Schiffels & Richard Durbin, Inferring human population size and separation history from multiple genome sequences. Pre-published at bioRxiv, 2014. Freely accessible.

The availability of complete human genome sequences from populations across the world has given rise to new population genetic inference methods that explicitly model their ancestral relationship under recombination and mutation. So far, application of these methods to evolutionary history more recent than 20-30 thousand years ago and to population separations has been limited. Here we present a new method that overcomes these shortcomings. The Multiple Sequentially Markovian Coalescent (MSMC) analyses the observed pattern of mutations in multiple individuals, focusing on the first coalescence between any two individuals. Results from applying MSMC to genome sequences from nine populations across the world suggest that the genetic separation of non-African ancestors from African Yoruban ancestors started long before 50,000 years ago, and give information about human population history as recently as 2,000 years ago, including the bottleneck in the peopling of the Americas, and separations within Africa, East Asia and Europe.

Based on Figure 4c:

Figure 4: Genetic Separation between population pairs
(...) (c) Comparison of the African/Non-African split with simulations of clean splits. We simulated three scenarios, at split times 50kya, 100kya and 150kya. The comparison demonstrates that the history of relative cross coalescence rate between African and Non-African ancestors is incompatible with a clean split model, and suggests it progressively decreased from beyond 150kya to approximately 50kya. (...)

This comparison reveals that no clean split can explain the inferred progressive decline of relative cross coalescence rate. In particular, the early beginning of the drop would be consistent with an initial formation of distinct populations prior to 150kya, while the late end of the decline would be consistent with a final split around 50kya. This suggests a long period of partial divergence with ongoing genetic exchange between Yoruban and Non-African ancestors that began beyond 150kya, with population structure within Africa, and lasted for over 100,000 years, with a median point around 60-80kya at which time there was still substantial genetic exchange, with half the coalescences between populations and half within (see Discussion). We also observe that the rate of genetic divergence is not uniform but can be roughly divided into two phases. First, up until about 100kya, the two populations separated more slowly, while after 100kya genetic exchange dropped faster. We note that the fact that the relative cross coalescence rate has not reached one even around 200kya (Figure 4c) may possibly be due to later admixture from archaic populations such as Neanderthals into the ancestors of CEU after their split from YRI [29].

Their population size estimates follow:

Figure 3: Population Size Inference from whole genome sequences
(a) Population size estimates from four haplotypes (two phased individuals) from each of 9 populations. The dashed line was generated from a reduced data set of only the Native American components of the MXL genomes. Estimates from two haplotypes for CEU and YRI are shown for comparison as dotted lines.

A serious problem I have with this graph is that the gradual bottleneck affecting Eurasian-plus populations does not begin to recover within this simulation before c. 40 Ka. That doesn't seem good enough because by that time the Asian population must have expanded at least moderately, as they had colonized the whole continent and even Australasia by that date.

This means that there is a lot of refining still to be done to the methodology, because there should be a signal of expansion in Asia much earlier than 40 Ka, and not an ever more apparent decrease of the population size, which is totally inconsistent with the ongoing colonization of a whole continent.

I could try to double the time scale again (that is, halve the mutation rate once more) to get a more consistent Asian expansion age of c. 80 Ka, but that would push the Eurasian-plus bottleneck to a much earlier date, 600 Ka ago, which is simply nonsensical. So the only possible conclusion is that the algorithm is far from realistic and still needs a lot of work.

Non-Bantu East Africans belong to the proto-Eurasian cluster:
Our results suggest that Maasai ancestors were well mixing with Non-African ancestors until about 80kya, much later than the YRI [Yoruba]/Non-African separation. This is consistent with a model where Maasai ancestors and Non-African ancestors formed sister groups, which together separated from West African ancestors and stayed well mixing until much closer to the actual out-of-Africa migration.

South Asians exchanged a lot with West Eurasians before Neolithic:
.... the GIH [Gujarati emigrants to Texas] ancestors remained in close contact with CEU [NW European emigrants to Utah] ancestors until about 10kya, but received some historic admixture component from East Asian populations, part of which is old enough to have occurred before the split of MXL.

Figure 4: Genetic Separation between population pairs
(...) (d) Schematic representation of population separations. Timings of splits, population separations, gene flow and bottlenecks are schematically shown along a logarithmic axis of time. (...)

Overall their population tree makes good sense, except for the apparently too recent dates for nearly all the events, especially the intra-Eurasian split. There are no doubt confounding factors acting here. Probably if MXL (Native American component) were excluded, the West-East split could be moved backwards in time.

They rely heavily on the MXL Native American element to calibrate the clock, which makes sense on the surface. But the fact that Native American origins are themselves a mix of West/South Eurasian and East Asian origins may be tricking them. In the tree, MXL derives from East Asians, when it actually should be, as we know for a fact, intermediate between East Asia and West/South Eurasia, something that is not reflected at all and that is almost certainly altering the picture.

But, as said above, there are more corners, some quite prominent, to be polished in the whole modeling process before a future version of it can be acknowledged as a reliable "clock" (emphasis on reliable, because some people put way too much faith in these rough approximations, which is clearly an error).

On mutation rates:
Our results are scaled to real times using a mutation rate of 1.25×10⁻⁸ per nucleotide per generation, as proposed recently [16] and supported by several direct mutation studies [14-16]. Using a value of 2.5×10⁻⁸ as was common previously [44, 45] would halve the times. This would bring the midpoint of the out-of-Africa separation to an uncomfortably recent 30-40kya, but more concerningly it would bring the separation of Native American ancestors (MXL) from East-Asian populations to 5-10kya, inconsistent with the paleontological record [25, 26].

In short: using the usual scholastic mutation rates would have been nonsensical. Halving them, which doubles the inferred ages, was the common sense needed to achieve minimal coherence with observed reality (how many times have I said that?). It is obviously not enough, but it was needed in any case.
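
To make the arithmetic explicit, here is a minimal sketch of how such dates scale with the assumed mutation rate. It is not the MSMC code. The function name `rescale_age` is made up for illustration. The two rates and the 60-80kya midpoint are taken from the quoted text, and the 40 Ka figure is the recovery date I discussed above. The only assumption is the usual one: the data fix the product of mutation rate and time, so inferred ages are inversely proportional to the rate.

```python
# Illustrative sketch only: rescale_age is a hypothetical helper, not part of MSMC.
MU_PAPER = 1.25e-8  # per nucleotide per generation, the rate used in the paper
MU_OLD = 2.5e-8     # the previously common rate mentioned in the quote


def rescale_age(age_kya, mu_used, mu_new):
    """Convert an age inferred under mu_used into the age implied by mu_new.

    Assumes ages are inversely proportional to the assumed mutation rate.
    """
    return age_kya * (mu_used / mu_new)


# Out-of-Africa midpoint of 60-80kya under the paper's rate,
# re-expressed under the older, faster rate (comes out at about 30-40kya).
midpoint_old_rate = [rescale_age(t, MU_PAPER, MU_OLD) for t in (60.0, 80.0)]

# Halving the rate once more doubles every date, e.g. 40 Ka becomes about 80 Ka.
recovery_doubled = rescale_age(40.0, MU_PAPER, MU_PAPER / 2)

print(midpoint_old_rate, recovery_doubled)
```

The point is that any change of rate moves all dates by the same factor: it cannot make the Asian expansion older without also making the bottleneck older.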


  1. I glanced through this paper today. I thought the analytical approach interesting. However, again, and I am getting bored of saying it, the populations chosen in Africa are too few.

    1. You mean surely that the good old hunter-gatherer references, representing some of the most ancient branches of Humankind, are missing. Good point.

  2. Yep.

    Also, I'd like to see this paper reconciled with the recent U6 gene flow paper:

    And this as well:

    1. I blogged the U6 paper and also commented at Dienekes on it. Their incorporation of archaeological evidence from particular well dated and defined archaeological cultures and paleoclimate data into their genetic analysis, and willingness to acknowledge that those dates may be more solid than mtDNA dating in close cases, is refreshing. For example, their association of U6 as a whole with the intrusive Levantine Aurignacian, and of U6a with the Iberomaurusian culture in the Maghreb, are both well reasoned.

      Their willingness to acknowledge that the history of U6 could have been a complex, multi-waved process, for example, with at least one or two UP waves followed by several successive waves in the Neolithic and later, is also appreciated. This data set is also well suited to mapping out those kinds of complex histories.

      Their argument for a non-Levantine origin of mtDNA hg U as a whole is also interesting, if a bit over specific to Central Asia given what the data requires. Their narrative of expansion within Africa is also a sensible read of the evidence.

    2. In general, as I will surely mention tomorrow, I find their chronology for U6 very realistic (within reasonable CIs). As for the arrival of U6 to NW Africa from West Asia, I still have doubts because there's nothing known between Aterian and Oranian (Iberomaurusian). An alternative possibility could be that pre-U6 actually entered NW Africa from Europe, expanding then from a North Moroccan center of radiation in the Oranian. The Aurignacoid wave apparently only reached as far West as Cyrenaica (Dabban), and that is a serious problem for claiming a West Asian origin even for pre-U6. Food for thought.

      But in most aspects it is a top quality paper that I really want to write about. Tomorrow almost certainly.

      BTW, did you expand on the first article at your blog, Andrew? Because I remember it much shorter (although memory can play strange tricks sometimes).

  3. The first one has been on my list since Monday or so. It's a high quality review of U6. Nothing really too new, but it is refreshing to see all the previous research more or less confirmed, with a much greater wealth of detail.

    The second one, I rather dislike: the Fst-based conclusions seem a bit amateurish, especially as Denisovan admixture and drift by isolation are not considered for Australasian aborigines, which are the keystone of the argument. Also their chronological estimates are absurdly recent and the alleged Central Asian route just clashes with archaeological reality. So I'm ignoring it because it lacks merit IMO.

  4. The U6 paper is a must read.

    Regarding the second paper, I think it is a very good paper, primarily because it is one of the first papers to do a very good job of combining geometric morphometrics (a very powerful technique) with genetics. Their results are supported by several other recent papers.

    1. Its results make no sense whatsoever, really. And I have no idea what other studies might support it, much less why.

      The "morphometrics" thing is like adding gibberish to babbling, sorry. Anthropometry is pretty much obsolete and rather meaningless.

  5. They calibrate from 15kya for a distinct Beringian population bottleneck. Given climate and archaeology, a date of 20kya to 22kya or so is closer to the mark for that event. A 33%-50% longer set of dates would improve the archaeological fits quite a bit.

