August 24, 2011

R1b-M269 debate: new paper vindicates my stand

A must read:

George B.J. Busby et al., The peopling of Europe and the cautionary tale of Y chromosome lineage R-M269. Proceedings of the Royal Society B, 2011. Open access. [doi: 10.1098/rspb.2011.1044]

This paper is an almost total vindication of what I have been saying, especially over the last year, since the infamous Balaresque paper was published and received all that undeserved attention from the media and blogs.

What I said back then can be found on my old blog Leherensuge:


Busby et al. refer to all three of these papers repeatedly; however, they seem to side almost entirely with Morelli (whose research I also applauded) and disagree profoundly with Balaresque. No wonder.


Sample

They used the Myres sample, enlarged especially to improve coverage of Western Europe (fig. S2). Sadly, the demographic center of Paleolithic Europe is clearly undersampled (except for Provence, well covered by Myres, and single dots in what are probably Toulouse and Santander): not a single sample was taken in Perigord (Dordogne), Gascony or the Basque Country, which is probably a major shortcoming.


Molecular clock 'not credible'

They also discuss what is usually known as the molecular clock, with a rather negative assessment of the methods currently in use:

... we conclude that at the present time it is not possible to make any credible estimate of divergence time based on the sets of Y-STRs used in recent studies. Furthermore, we show that it is the properties of Y-STRs, not the number used per se, that appear to control the accuracy of divergence time estimates, attributes which are rarely, if ever, considered in practise. 

They return to this issue in the discussion section:

Dating of Y chromosome lineages is notoriously controversial [25,41–44], the major issue being that the choice of STR mutation rate can lead to age estimates that differ by a factor of three (i.e. the evolutionary [25] versus observed (genealogical) mutation rates [33,45]). Interestingly, despite the fact that Myres et al. and Balaresque used different STR mutation rates and dating approaches, their TMRCA estimates overlap: 8590–11 950 years using a mutation rate of 6.9 × 10⁻⁴ per generation, and 4577–9063 years using an average mutation rate of 2.3 × 10⁻³, respectively. Separately, Morelli calculated the TMRCA based only on Sardinian and Anatolian chromosomes, and estimated the R-M269 lineage to have originated 25 000–80 700 years ago [22], based on the same evolutionary mutation rate [25,41] as Myres et al.
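To see why the choice of mutation rate matters so much, here is a minimal sketch assuming the simplest stepwise-mutation reasoning, in which STR variance grows by roughly μ per generation, so that a naive age is variance divided by μ. The variance value, the 25-year generation time and the helper name are hypothetical placeholders, not figures or code from Myres, Balaresque or Busby; only the two mutation rates come from the quoted passage.

```python
# Naive STR dating sketch (an assumption, not any paper's actual estimator):
# under a symmetric single-step mutation model the expected variance in
# repeat count grows by about mu per generation, so the TMRCA in
# generations is roughly variance / mu.

EVOLUTIONARY_RATE = 6.9e-4  # per generation, the rate quoted for Myres et al.
OBSERVED_RATE = 2.3e-3      # per generation, the average rate quoted for Balaresque


def naive_tmrca_years(variance, mu_per_generation, generation_years=25):
    """Convert an average STR variance into an age in years (hypothetical helper)."""
    generations = variance / mu_per_generation
    return generations * generation_years


hypothetical_variance = 0.3  # invented value, for illustration only
slow_clock_age = naive_tmrca_years(hypothetical_variance, EVOLUTIONARY_RATE)
fast_clock_age = naive_tmrca_years(hypothetical_variance, OBSERVED_RATE)

# The ratio does not depend on the variance or generation time chosen:
# it is simply OBSERVED_RATE / EVOLUTIONARY_RATE, about 3.33.
ratio = slow_clock_age / fast_clock_age
print(f"{slow_clock_age:.0f} vs {fast_clock_age:.0f} years (ratio {ratio:.2f})")
```

Whatever variance is plugged in, the two rates alone pull the estimates apart by roughly a factor of three, which is the gap the authors describe.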

This leaves any casual (and even knowledgeable) observer quite perplexed, right? The authors' conclusion is clear: the molecular clock can't be trusted.

Even Dienekes admits it, saying that this paper could well be titled "An epitaph for Y-STR".


Diversity 

Busby and colleagues followed the same methods as Balaresque but, instead of treating R1b1a2 as a single amorphous haplogroup, they treated the various clades downstream of it as distinct entities:

We next calculated STR diversity for each population for the whole R-M269 lineage, and for the R-S127 and R-M269(xS127) sub-haplogroups, and investigated the relationship between average STR variance and longitude and latitude in exactly the same fashion as Balaresque. (...) We normalized latitude and longitude, and performed a linear regression between these values and the median microsatellite variance for the three R-M269 sub-haplogroups. We found no correlation with latitude (data not shown) and, contrary to Balaresque, we did not find any significant correlation between longitude and variance for any haplogroup.
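The regression described in the quote can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the population values are invented, z-score scaling is an assumption about what "normalized" means (it does not change R² for a single predictor), and the helper names are hypothetical. It also mimics the kind of check Busby et al. ran on the Balaresque data, refitting after dropping the easternmost populations.

```python
from statistics import fmean, pstdev


def zscore(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    mean, sd = fmean(values), pstdev(values)
    return [(v - mean) / sd for v in values]


def linear_fit(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b, r_squared)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


def fit_variance_on_longitude(pops):
    """Regress median STR variance on normalized longitude (hypothetical helper)."""
    longitudes = zscore([lon for lon, _ in pops.values()])
    variances = [var for _, var in pops.values()]
    return linear_fit(longitudes, variances)


# Invented (longitude in degrees, median STR variance) pairs, NOT real data
populations = {
    "pop_A": (-8, 0.30), "pop_B": (-2, 0.31), "pop_C": (2, 0.29),
    "pop_D": (10, 0.30), "pop_E": (20, 0.32), "pop_F": (30, 0.45),
    "pop_G": (40, 0.47),
}

_, _, r2_all = fit_variance_on_longitude(populations)

# Refit after dropping the two easternmost populations
western = {name: v for name, v in populations.items() if v[0] < 30}
_, _, r2_western = fit_variance_on_longitude(western)

print(f"R^2, all populations: {r2_all:.2f}; eastern pair removed: {r2_western:.2f}")
```

With these toy numbers the fit weakens sharply once the two eastern points are dropped, the same kind of sensitivity the authors report for the Turkish samples: a correlation can hinge on a handful of populations.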

The results are apparent in fig. 2 (left: frequency; right: variance in relation to longitude):



If anything, the result is the opposite, showing a mild tendency towards greater variance in the West.

They explain the differences with Balaresque as follows:

The Balaresque dataset presents genotype data only to the resolution of SNP R-M269. Our results show that the vast majority of R-M269 samples in Anatolia, approximately 90 per cent, belong to the R-M269(xS127) sub-haplogroup. Removing these Turkish populations from the Balaresque data and repeating the regression removes the significant correlation (R2 = 0.23, p = 0.09; details in the electronic supplementary material and figure S2). These populations are therefore intrinsic to the significant correlation.

This is something I had already noticed back in the day: Balaresque's bias blinded her to the subtleties of the haplogroup's downstream structure, so she treated the whole clade as a blank slate.

The apparently greater diversity observed in Turks and Armenians is probably caused by the combination of (1) the high diversity of R1b1a2(xR1b1a2a1) and (2) a backflow of European R1b1a2a1 that is also diverse (but clearly derived, even in Balaresque's own data).

This backflow must be pre-Neolithic as far as I can discern, because since the Neolithic the flow of people has been almost exclusively from East to West.
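A toy calculation helps show why pooling two distinct clades inflates apparent diversity. The sketch below uses invented repeat counts at a single hypothetical STR locus and the law of total variance: the pooled variance equals the average within-clade variance plus the variance between the clade means, so a sample mixing a local clade with a differently centred backflow clade looks more diverse than either clade alone. The helper name is hypothetical.

```python
from statistics import fmean, pvariance


def variance_decomposition(groups):
    """Law of total variance for pooled samples.

    Returns (pooled, within, between), where pooled == within + between:
    `within` is the size-weighted mean of each group's variance and
    `between` is the size-weighted spread of the group means around the
    grand mean. Population (not sample) variances are used throughout.
    """
    pooled_values = [x for group in groups for x in group]
    n = len(pooled_values)
    grand_mean = fmean(pooled_values)
    within = sum(len(g) * pvariance(g) for g in groups) / n
    between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups) / n
    return pvariance(pooled_values), within, between


# Invented repeat counts at one hypothetical STR locus, NOT real data
local_clade = [13, 13, 14, 14, 14, 15]
backflow_clade = [11, 12, 12, 12]

pooled, within, between = variance_decomposition([local_clade, backflow_clade])
print(pooled, pvariance(local_clade), pvariance(backflow_clade))
```

With these numbers the pooled variance (1.4) is roughly three times the larger within-clade variance, even though neither clade is especially diverse on its own.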

Another serious criticism they make of Balaresque is her use of a Y-search dataset to represent Ireland (surprisingly amateurish!). When it is compared with actual samples (Y-search relies on the goodwill of people reporting their own results online), the low diversity that Balaresque found for Ireland vanishes.


A 'West Asian' sublineage of R1b1a2?

This paper falls short of finding the defining SNP for such a hypothesized sub-haplogroup, but it does confirm Morelli 2010's finding that the Eastern or Anatolian bloc forms a distinct, STR-defined clade of its own. My annotations on Morelli's work:




What is most intriguing, in my opinion, is that, if this second haplogroup is confirmed, R1b1a2 may ultimately have expanded from the Balkans, where most carriers of the core node seem to live today.

This could be consistent with Busby's new finding that R1b1a2* reaches its highest frequency in Bulgaria and Romania (Morelli's 'Balkans' are actually Serbia, where the lineage is rare).


Distribution of some sublineages

This paper also somewhat expands our knowledge of the distribution of the most common (and best studied) sublineages under R1b1a2a1 (fig. 3):

(a) R1b1a1a1a (S21), (b) R1b1a1a1b4 (S145), (c) R1b1a1a1b3 (S28)

It must be said here that the major known sublineages of R1b1a1a1b (P312/S116) are as follows (update: corrected Mar 15 2012):
  • R1b1a2a1a1b2 (Z196) ··> most basally diverse among Basques and Gascons but also common among Catalans and other East Iberians and found as well among "French", Bavarians and (it seems now) some Scandinavians (see comments)
  • R1b1a2a1a1b3 (S28/U152) ··> see map (c) above
  • R1b1a2a1a1b4 (L21/M529/S145, L459) ··> see map (b) above

In addition, most R1b1a1a1b* (not yet assigned to any sublineage) is found in SW Europe, in France and Iberia, where it often makes up the majority of the Y-DNA pool. I have therefore argued that this lineage probably coalesced within the Franco-Cantabrian region, around which all its sublineages fan out. Its penetration into North Italy is admittedly hard to explain, but I cannot think of a better explanation, because neither Italy nor Central Europe seems to host enough basal diversity to be considered a potential homeland for R1b1a2a1a1b.

I have also argued that the "brother" haplogroup R1b1a2a1a (M405/S21/U106), shown in map (a) above, may be related to the somewhat distinct Hamburgian-Ahrensburgian-Maglemosian techno-cultural complex of Northern Europe. The people of this cultural group surely saw their expansion favored by the end of the Ice Age.

19 comments:

  1. Interesting paper! So, molecular clocks aren't a reliable source, what a big surprise :P

    "but I cannot think of any better explanation because neither Italy nor Central Europe seem to host enough basal diversity to be considered potential homelands for R1b1a2a1a1b. "

    Where does this statement come from? Are you talking about Y-chromosome or DNA as a whole? Or only archaeological data?

  2. The statement is only about Y-DNA data. I discussed this at length when dealing with Myres 2010.

    Maybe most illustrative is this map, where we can see that:

    1. The supposed 'ancestral clade' (probably hiding a distinct Eastern sub-haplogroup) is concentrated in West Asia, the Balkans, Central Europe...

    2. There is a small 'transitional' or 'Western root' clade scattered between Hungary and Iberia (but absent in the East, except in Crete).

    3a. There is a sizable 'North clade' centered in the Netherlands (at least frequency-wise)

    3b. There is a large 'South clade' centered in SW Europe, not just frequency-wise (that'd be Basques probably) but also diversity-wise.

    There is a lot to clarify here, but this 'South clade', now renamed R1b1a1a1b (S116), quite clearly has its greatest basal diversity, by known sublineages, in South France. Meanwhile, the under-researched "asterisk" R1b1a1a1b* paragroup is also most common in SW Europe (France and Iberia), so it's virtually impossible for the logical origin of this lineage to be displaced towards the North or the East: the Franco-Cantabrian region seems the most likely origin.

  3. "This paper is an almost total vindication of what I have been saying"

    It may come as a surprise but I have nothing but respect for your ideas concerning European haplogroup origin and dispersal.

    You wrote at Dienekes:

    "We would have MNOPS expanding maybe c. 70 Ka ago in SE Asia, P soon after in Bengal, R then in NW India and R1b would be in SW Asia c. 50 Ka, while some of its Q 'cousins' would be reaching to Altai simultaneously".

    I think MNOPS's expansion in SE Asia is more recent than 70k, perhaps as recent as 40k. I don't think K was the first Y-hap across Wallace's Line. That distinction goes to Y-hap C. M's expansion into Melanesia may be as recent as 30k. And Q may not have reached Altai until about then either.

    "an also diverse (but clearly derived, even in Balaresque's own data) backflow of European R1b1a2a1"

    That's interesting. It makes sense that haplogroups should move backwards and forwards though. It is obviously a mistake to assume unidirectionality.

  4. "I think MNOPS's expansion in SE Asia is more recent than 70k, perhaps as recent as 40k. I don't think K was the first Y-hap across Wallace's Line".

    What I care about is that, for Q to be in Altai and R1b in Europe maybe as early as 48 Ka ago, P must have diversified (in or near Bengal) in the 60-50 Ka range, and MNOPS must have diversified in SE Asia earlier than that.

    Of course it could have been fast, or MNOPS could have languished in a "larval" state for a long time in SE Asia before crossing.

    I associate MNOPS "backflow" to South Asia with the flow of mtDNA N in the same direction and the coalescence of mtDNA R. This is, according to my own reconstruction, in CR mutational step 4 (7 mutations downstream of L3), which is prior to the colonization of West Eurasia, i.e. older than 50 Ka.

    Maybe 70 Ka is too old? IDK but for sure MNOPS must be older than 50 Ka., at least within my understanding of the Great Eurasian expansion.

  5. "It is obviously a mistake to assume unidirectionality".

    Very much agreed. However the backflow can be used to establish reference dates. If we have Central European or SW European derived lineages in Turkey (enough to be detected in a random sample), we should look for when they arrived.

    In at least one case the Turkish lineage is Iberian-derived, and it is one of the most common ones among the European-derived lineages in Turkey.

    Of course one could argue it is Sephardic but why would Sephardites become Muslim Turks? Beats me. Also it should be relatively easy to check if the haplotype exists among real Sephardites (among whom R1b is low).

    Other than perhaps the Roman slave trade, there are no instances that could explain such a W->E migration in (post-)Neolithic times.

    However in the late Upper Paleolithic (or maybe Epipaleolithic) we have a flow of Western-style art to Turkey (and maybe from there to Egypt?), which is at least one possible instance of such backmigration. This again allows us to find plausible time-frame references.

  6. "What I care about is that, for Q to be in Altai and R1b in Europe maybe as early as 48 Ka ago"

    Can we be sure they are that old?

    "I associate MNOPS 'backflow' to South Asia with the flow of mtDNA N in the same direction and the coalescence of mtDNA R".

    I certainly agree with the association with R but, as you know, I am yet to be convinced that any other N haplogroup is involved with the backflow.

    "Maybe 70 Ka is too old? IDK but for sure MNOPS must be older than 50 Ka., at least within my understanding of the Great Eurasian expansion".

    I can easily accept a date of 50k, or perhaps a little older.

    "However in the late Upper Paleolithic (or maybe Epipaleolithic) we have a flow of Western-style art to Turkey (and maybe from there to Egypt?), which is at least one possible instance of such backmigration".

    I suppose it's been a while since you've looked at Cavalli-Sforza's map of the 5th principal component for Europe. He claims it shows the Basque presence but to me it looks very much like a movement from Western Europe into the Balkans and Anatolia:

    http://www.google.co.nz/imgres?q=5th+principal+component+europe+cavalli-sforza&um=1&hl=en&sa=N&biw=899&bih=373&tbm=isch&tbnid=jsWn_5ADZnPeSM:&imgrefurl=http://rugiland.narod2.ru/evropeiskii_genofond/european_genetic_variation/&docid=91XiHEomBnlYFM&w=450&h=284&ei=YQZaTp2YAbH3mAWVn4SODA&zoom=1&iact=rc&dur=156&page=3&tbnh=110&tbnw=168&start=18&ndsp=10&ved=1t:429,r:9,s:18&tx=109&ty=78

  7. "Can we be sure they are that old?"

    We can't ultimately be sure of anything at all. According to a friend of mine, the world began in 1968 (i.e. when he and I were born) and all that is claimed to have happened before is nothing but implanted memories and stories written in books...

    We can well choose to believe that the World began yesterday and all that we think we remember is but an illusion and the coherence we find in reality just patterns we are programmed to find even if they do not exist at all.

    But, discarding this theory, I'm quite certain that it is the case: neither R1b nor Q could have produced the founder effects needed to become hegemonic in their respective regions except at the very moment of human colonization, or in any case in the Paleolithic. Q must also have been in Siberia during the LGM in order to take part in the colonization of America, which happened as soon as conditions improved, c. 17 Ka ago or more.

    ...

  8. "I can easily accept a date of 50k, or perhaps a little older"... [for MNOPS].

    You don't think big enough. If the OoA happened c. 120 Ka or 90 Ka at the most recent time possible, the date of 50 Ka is just too late.

    Let's see (using the latest possible dates of 90 Ka and so on, being "conservative", hyper-shy, based on this):

    1. mutational step 1 (or 4 counting from L3): M explosion in South Asia. Must have been between 80 Ka (earliest MSA) and 74 Ka (Toba aftermath).

    2. mut. step 2: many M subclades expand in Asia and Papua.

    3. mut. step 3: more M sublineage expansion, first Australians incl., and N node.

    4. mut. step 4: more M and some N sublineage expansion in Asia and Sahul and R node.

    5. mut. step 5: expansion of R0 (necessarily in West Asia), which we can calibrate to c. 50-48 Ka.

    Let's be hyper-conservative and use the 74-48 Ka timeline:

    74 - 48 = 26 Ka

    5 -1 = 4 mut. steps

    26 Ka / 4 mut. steps = 6.5 Ka per mut.

    So now we can estimate when each of the steps took place and:

    ·step 2 would be dated to c. 67.5 Ka ago, which is totally consistent with the Luzon metatarsal, dated c. 69 Ka ago.

    ·step 3 (N node): c. 61 Ka.

    ·step 4 (R node): c. 54.5 Ka.
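The interpolation above assumes that the mutational steps are evenly spaced in time between the two calibration points (74 Ka for step 1, 48 Ka for step 5). A minimal sketch of that arithmetic, with a hypothetical helper name, reproduces the dates in the list:

```python
def interpolate_steps(start_ka, end_ka, first_step, last_step):
    """Assign evenly spaced dates (in Ka) to mutational steps between two
    calibration points, assuming a constant time per step (hypothetical helper)."""
    ka_per_step = (start_ka - end_ka) / (last_step - first_step)
    return {step: start_ka - (step - first_step) * ka_per_step
            for step in range(first_step, last_step + 1)}


dates = interpolate_steps(start_ka=74, end_ka=48, first_step=1, last_step=5)
print(dates)  # {1: 74.0, 2: 67.5, 3: 61.0, 4: 54.5, 5: 48.0}
```

Moving the step 1 calibration further back (for example to 110 Ka) pushes every intermediate date older, while the step 5 anchor stays fixed.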

    I estimate the coalescence time of Y-DNA MNOPS to be coincident with that of mtDNA N, so at least 60 Ka. (All these dates are the most conservative possible ones: had we used the 110 Ka reference for step 1, the results would all have been much older, proportionally so, and the other, less important, possible fine-tunings all point towards older ages too.)

    But you see the difference between you and me: I argue with data and good logic, you just say "I think that..."

    I don't care! I don't care what you "think". I am interested in good data and good reasoning founded on that good data, not just the opinion of some random guy from the antipodes. I believe that you can understand that, can't you?

  9. "I suppose it's been a while since you've looked at Cavalli-Sforza's map of the 5th principal component for Europe. He claims it shows the Basque presence but to me it looks very much like a movement from Western Europe into the Balkans and Anatolia"...

    Wrong. You can't read components in negative.

  10. Maju:

    To what extent is the conventional wisdom of a Bronze Age expansion of R-M269 out of Anatolia based on the Balaresque paper? I remember reading an abstract of it and being stunned by the idea that the movement of R-M269 to Western Europe could be that recent. It seemed like a novel idea at the time, but I am not that familiar with all the literature.

    Conventional wisdom changes slowly, so nothing may change in the immediate future. But is there any other basis to place this expansion so recently?

  11. Bronze Age?!

    There's no convention, no wisdom and no fat chance of R1b expanding from Anatolia (or anywhere else) in the Bronze Age; that's the weirdest idea I have ever heard. Admittedly, I wouldn't be surprised if the likes of Dienekes espoused it, but I doubt they have any reason other than wishful thinking.

    I may have stumbled now and then upon the occasional "Indoeuropeanist" proposing an R1b expansion related to that of R1a in the Chalcolithic, Bronze and Iron ages, but from Eastern Europe, within the Kurgan Model. The Anatolian hypothesis for IE expansion relates IE languages and culture to Neolithic flows.

    In any case, all the Balaresque paper says is that R1b is amorphous (and that, if it has any structure, she could not care less about it) and that she gets TMRCAs approximating the Neolithic. She could not care less about actual Neolithic archaeology, etc. It's an ideological paper with just some raw data of any value.

    "But is there any other basis to place this expansion so recently?"

    I advocate for the Paleolithic model because, as far as I can tell, there is no other basis for recent massive demic expansions (and the genocides they would necessarily entail). At least not in the southern half of the continent, where all changes seem to take place in a general context of relative continuity, to a great extent even in the Neolithic.

    Archaeology doesn't seem to support the big, sudden kind of demographic changes required to explain the current distribution, diversity and structure of R1b in South Europe (at least) with a Holocene timeline, as Balaresque and others have argued. Ancient DNA is not supportive either (it could be for Central Europe, however, where the modern demographic structure was maybe only achieved in the late Bronze Age, with the Urnfield culture).

  12. I didn't make that up. Check out Eupedia.com if you want to see what I consider "conventional wisdom." (I mean a web site aimed at people like me, someone with a degree in English literature.) Maybe popular culture is a better description. Those sites describe an R1b-M269 expansion out of Anatolia around 4000 BCE (just before the Bronze Age).

    Anyway, my point is that several such web sites talk in these terms and I wondered if that was a direct result of the Balaresque paper or not. And conventional wisdom/popular culture changes slowly. So it may be an error that perpetuates itself.

    My personal view is that such opinions place too much weight on inherently flawed statistical analysis and not enough weight on archeological data. Statistical models are not complex enough to take into account all historical phenomena, as Dienekes himself has noted on other occasions. Consequently they underestimate dates. See his July 6, 2011 comments on the Simon Gravel et al. paper. And that paper had excellent data and did good analysis. It still didn't correspond to reality as understood by archeology. Other statistical analysis is not as good to begin with.

    But yes, there is a bunch of stuff on the internet that dates the R1b-M269 expansion into Europe that late.

  13. Ah, Eupedia: that explains your confusion. While the raw data shown there is correct as far as I can tell, the theories presented have no basis I know of (maybe molecular clock speculations - but these have been declared 'heresy' this month, with even Dienekes' blessings and all).

    What is clear is that this paper by Busby et al. doesn't just bury STR-based molecular clock speculations but, on the issue of R1b1a2, it supports, at least to some extent, Morelli's finding that most of the lineage found in West Asia (especially among Turks and Armenians) is a distinct sub-haplogroup.

    The "high diversity" of Anatolian R1b is actually an artifact of less important back-migrating Western lineages (and this was already apparent in Balaresque's own data, but she chose not to see it). We learn a lesson here: high STR/haplotype diversity does not automatically mean origin. We must understand first the underlying phylogeny and not just treat large haplogroups like R1b1a2 as amorphous categories. We must understand the haplogroup diversity and not the mere, sometimes confusing, haplotype diversity.

    In Busby 2011 we see, as in Morelli 2010, that the likely origin of R1b1a2-M269 is in the Balkans, even if the lineage is rare there.

    We also see, though more in line with Myres 2010 in this case, that the sub-haplogroup structure does not match any known Neolithic flow and that the highest diversity of the largest sublineage is in SW Europe, possibly in the Franco-Cantabrian region. I discussed this in the entry on Myres 2010.

  14. "We learn a lesson here: high STR/haplotype diversity does not automatically mean origin".

    I have been trying to tell you so for some time now.

    "We must understand the haplogroup diversity and not the mere, sometimes confusing, haplotype diversity".

    But often even the haplogroup diversity is a product of other than mere 'origin'.

  15. Why?

    That you think so doesn't make it automatically true.

    In Y-DNA, haplotype diversity is misleading because the same STR values can arise repeatedly over time (unlike SNPs, whose chance of random recurrence is almost infinitely low). That's why, when you test your Y-DNA and the haplotype suggests some assignment, an SNP test [yes, I do NOT pronounce "snip" but es-en-pee] is the next step and the only test that can confirm or refute the haplotype-based assumption.

    The only reason why STRs are used is because they are cheap but as my grandma used to say: "cheap often ends up being expensive". Or a total waste.

    But do not dare compare the cheap imprecise version with the expensive precise one.
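The recurrence problem with STRs can be illustrated with a small simulation. This sketch assumes a symmetric single-step mutation model with an arbitrary, deliberately high rate (mu=0.01 per generation) and invented starting repeat counts; the function names are hypothetical. It counts how often two unrelated lineages that start two repeats apart end up with identical counts purely by chance. Under the usual infinite-sites assumption for SNPs, by contrast, a given SNP arises only once, so sharing it implies shared descent.

```python
import random


def evolve_repeat_count(start, generations, mu, rng):
    """Symmetric single-step mutation model for one STR locus in one lineage:
    each generation, with probability mu, the repeat count moves by +1 or -1."""
    count = start
    for _ in range(generations):
        if rng.random() < mu:
            count += rng.choice((-1, 1))
    return count


def chance_convergence(start_a, start_b, generations, mu, trials, seed=0):
    """Fraction of simulated trials in which two independent lineages end up
    with the same repeat count (identical by state, not by descent)."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(trials):
        a = evolve_repeat_count(start_a, generations, mu, rng)
        b = evolve_repeat_count(start_b, generations, mu, rng)
        matches += a == b
    return matches / trials


# Hypothetical parameters: lineages start 2 repeats apart, deliberately high mu
rate = chance_convergence(12, 14, generations=200, mu=0.01, trials=2000)
print(f"Unrelated lineages matched at this locus in {rate:.1%} of trials")
```

Real Y-STR panels combine many loci, which lowers the chance of a full-haplotype match, but the same recurrence operates at every locus, which is why an SNP test is used to confirm a haplotype-based assignment.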

  16. Maju wrote, "R1b1a2a1a1b2 (Z196) ··> almost exclusive of Basques and Gascons."

    I know you said "almost" but I would say it is a bit further away from "exclusive" than almost.

    L165, a subclade of Z196, appears (to Ethnoancestry anyway) a Norse marker. SRY2627, another Z196 subclade, is found in Germany and Scandinavia. SRY2627 also appears at decent frequencies all across Iberia, not just in the Basque region. The Z196* North-South Cluster is so named because it appears in the north as well as the south. Of course, you find Z196 in the British Isles as well.

  17. The Busby paper said, "We next calculated STR diversity for each population for the whole R-M269 lineage, and for the R-S127 and R-M269(xS127) sub-haplogroups, and investigated the relationship between average STR variance and longitude and latitude in exactly the same fashion as Balaresque. (...) We normalized latitude and longitude, and performed a linear regression between these values and the median microsatellite variance for the three R-M269 sub-haplogroups. We found no correlation with latitude (data not shown) and, contrary to Balaresque, we did not find any significant correlation between longitude and variance for any haplogroup."

    I have no problem with their finding, but do you notice their poor logic? They complain about Y-STR diversity being a poor indicator and then use it as the basic data analysis for the primary argument of the paper: the argument that Balaresque is wrong. You can't have it both ways: you can't say it's no good and then use it as your primary proof.

  18. "I know you said "almost" but I would say it is a bit further away from "exclusive" than almost."

    You are right: I mistook the whole for the part; I meant its subclade R1b1a2a1a1b2a-M153. I did not realize until yesterday that this one and R1b1a2a1a1b2b (L176.2/S179.2) had been found to be relatives.

    "L165, a subclade of Z196, appears (to Ethnoancestry anyway) a Norse marker".

    I did not know that: it's all new to me (in fact it's all probably quite new and the product of uneven research by DNA companies). No paper I have dealt with paid any attention to this lineage or its parts, except those focused on the Basque Country or Iberia (e.g. Adams 2008). Only very old papers mentioned it, at c. 3% frequencies in Bavaria, but it looked like an odd exception anyhow.

    "I have no problem with their finding, but do you notice their poor logic? They complain about Y STR diversity as being a poor indicator and then they use it as the basic data analysis to make their primary argument in the paper - the argument that Barlaresque wrong. You can't have it both ways. It's no good but you use it for your primary proof".

    I have replied in the Basque & Gascon Y-DNA entry. The authors do seem, as you suggest, to hope that the STR markers can still be used (after due refinement of the method) to estimate age, but they argue that the recentist estimates do not make sense, so, with due caution, they seem to back Morelli's estimate of 25-80 Ka as plausible and totally reject the estimates by Balaresque and Myres, which converge near the Neolithic era.

    I have also corrected the link, which had gone broken.

  19. I corrected the distribution description. Thanks for the correction, Mike.

    R1b1a2a1a1b2 (R-Z196) is still, anyhow, most basally diverse among Basques and Gascons, who are the only ones to have both basal subclades, R1b1a2a1a1b2a (M153) and R1b1a2a1a1b2b (L176.2/S179.2), at least at notable frequencies. I'd think of an Aquitanian origin for this particular clade, always depending on what is to be found in the rest of the French Republic, especially Occitania and Guyenne.


Please be reasonably respectful when making comments. In particular, I do not tolerate sexism, racism or homophobia. Personal attacks, manipulation and trolling are also very much unwelcome here. The author reserves the right to delete any abusive comment.

Preliminary comment moderation is... ON (your comment may take some time, maybe days or weeks to appear).