June 21, 2015

Some improved knowledge of major R1b sublineage S116

While it is far from being the last word on the matter, the new study by researchers at the University of the Basque Country adds some important information to our knowledge of the main European R1b subhaplogroup S116, which dominates much of the continent, centered on the south-west and spanning from Ireland to Italy and from Iberia to Germany.

Laura Valverde, María José Illescas et al., New clues to the evolutionary history of the main European paternal lineage M269: dissection of the Y-SNP S116 in Atlantic Europe and Iberia. European Journal of Human Genetics, 2015. Pay per view (as usual supp. materials are freely accessible) → LINK [doi: 10.1038/ejhg.2015.114]

Preliminary status of the research

Knowledge of this major European lineage had been stuck, as far as I know, since the various studies published in 2010, notably Myres et al. (discussed HERE). At the risk of repeating myself, I will again display here some of the maps derived from the data of that key paper, as they are very useful references for discussing the new one:

Frequency of R1b subclades relative to overall R1b (per Myres 2010)
note: M529 is wrongly labeled M259
Composite image showing the overall frequency of R1b-S116 (red) and R1b-U106 (blue)
according to Myres et al. 2010

The new data

The new study is not as comprehensive in its sampling as that of Myres, so it relies heavily on previously published data, which is enough for the already studied subclades. The resulting maps are however somewhat different from those of Myres, because a lot of newer Basque and Iberian data is present here (Myres did not sample Basques, whose frequencies and diversity for S116 are outstanding):

Fig. S1

They do however consider a third major sublineage of S116, defined by the mutation DF27, which is strongest among Basques and other SW Europeans. They also considered a sublineage defined by the SNP L238 but could only find a single individual carrying it (a Breton from Brest), so it should be considered part of the wider S116* paragroup and not relevant on its own.

These are the results for DF27 and S116*, i.e. S116(xM529,U152,DF27):

An important detail is that, after excluding the three major subhaplogroups, the remaining S116* seems concentrated in Ireland and the Basque Country. However we should await new information that could come from France, Southern Germany or even parts of England in the future (these areas showed some notable S116* in Myres 2010 but DF27 was not excluded then).

In this sense it must be mentioned that a sublineage of DF27 (SRY2627) has been known for more than a decade, since even before the modern nomenclature arose in 2001 (Rosser 2000 called it Hg22), and was indeed spotted not just among Basques but also among some Bavarians. So I would not dare to exclude at least some presence of DF27 further northeast than what this study shows. However it is indeed clear that its primary distribution is in Iberia and particularly among Basques.

It is also important to underline that the maps may be a bit misleading because of the existence of two different Basque samples: a rural one (mostly with Basque surnames), with overall R1b and also S116* frequencies similar to the Irish, and an urban one (of more mixed ancestry) with somewhat lower frequencies.

From the supplementary material I gather that Basques have the following frequencies (first figure: rural Basques, second figure: urban Basques):
  • S116*: 16%, 8%
  • DF27: 71%, 51%
  • M529: 2%, 3%
  • U152: 3%, 1%
Otherwise the frequency of S116* is most notable among the Irish (18%) and Central-East Iberians (8-12%) but lower in Brittany (6%), Cantabria (6%) and Portugal (4%), being absent in Galicia and Asturias.

The frequency of DF27 is highest among Basques (63% on average, 71% for the rural sample) and then similarly high across Iberia (40-48%). It reaches 17% among Bretons and just 1% among the Irish. It must be noted that when you apportion DF27 as a share of S116, the result is similar throughout Iberia (72-80%), Basques included (hat tip to Jean).
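
The apportioning of DF27 within S116 is simple arithmetic. As a minimal sketch, assuming that total S116 is just the sum of the four rounded percentages listed above for each Basque sample (the paper works from raw counts, so small rounding differences are to be expected), the share can be recomputed like this. The names `basque_freqs` and `df27_share` are mine, for illustration only:

```python
# Rounded percentages as listed above; the S116 total is ASSUMED to be their sum.
basque_freqs = {
    "rural": {"S116*": 16, "DF27": 71, "M529": 2, "U152": 3},
    "urban": {"S116*": 8, "DF27": 51, "M529": 3, "U152": 1},
}


def df27_share(freqs):
    """Return DF27 as a fraction of all S116 (sum of the listed subclades)."""
    return freqs["DF27"] / sum(freqs.values())


for sample, freqs in basque_freqs.items():
    print(f"{sample}: DF27/S116 = {df27_share(freqs):.0%}")
```

The same function can be fed the figures of any other sampled population, as long as all four S116 categories are provided.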

The frequency of M529 is very high among Irish (54%) and Bretons (52%) but under 5% everywhere else, except the following: 6% in both Asturias and in Cantabria, 7% among Galicians. The lineage is present in all sampled populations except Alicante and Andalusia. Therefore, inside Iberia, it shows some NW-SE clinality.

The frequency of U152 is under 10% across the board but also found in all sampled populations. The lowest ones are urban Basques (1%) and the highest ones Galicians and Asturians (9% and 8% respectively).

Note: figures above corrected (Jun 22) because there was a confusion between Cantabrians and Galicians in the first version of this entry. Thanks to Cousso for noticing.

Some conclusions

In the original entry I wrote here that I had found very striking a very sharp contrast between Basques, on one side, and Cantabrians and Asturians on the other. This was wrong because I made an error when parsing my notes and therefore confused Cantabrians with Galicians. Hence the "sharp contrast" at the Western edge of the Basque Country is not so sharp after all, because Cantabrians act as a buffer. There is still a curious contrast between Basques and the old Gallaecia province (later also Suebian Kingdom and Kingdom of Asturias-León), that is: Galicians+Asturians. These show the highest peninsular frequencies of M529 and U152, while Basques only have low frequencies of them; maybe even more significantly, Basques have one of the highest S116* frequencies in Iberia (just below the Irish in the overall sample), while Galicians and Asturians have none of it. In any case all this is reminiscent of the overall genetic contrast between West Iberia and the rest of the peninsula mentioned on other occasions, a contrast that affects many haplogroups but not all.
[Paragraph edited: Jun 22].

The other conclusion is that, while we must await further data, particularly from the French state and also from Southern Germany, the combination of this new data with that of Myres 2010 only confirms my previous conclusions, which are:
  • R1b overall originated in West Asia, expanding in several directions from that region.
  • R1b-M269 (the main West Eurasian subclade) expanded from either the Balkans or Highland West Asia (Iran?)
  • The subsequent expansion in Europe (L23, M412 and L11 stages) is not too clear but could well have followed a double route, Central European and Mediterranean. More research is needed for these transitional stages between the Balkan/West Asian phase and the Western European ones.
  • Once in Western Europe, two L11 sublineages experienced parallel expansions:
    • U106 probably expanded from the Netherlands or Frisia (or maybe Doggerland in a Paleolithic scenario). Detailed research awaits however.
    • S116 surely expanded from somewhere in what is now France (possibly towards the Atlantic, judging by where S116* is most common), with three main subclades, each one following its own pattern of expansion:
      • M529 towards the Northwest (Brittany, Britain, Ireland...)
      • U152 towards the East (most notable in Switzerland and Italy, but also important in France, Germany and Britain, with offshoots of plausible Celtic transport in the Balkans and even Anatolia).
      • DF27 mostly to the South, peaking among Basques but also important in much of Iberia. It remains to be discerned how important it is in other European regions.
I put these notions on a map. It must be considered a rough sketch, a working hypothesis, because there is not enough data to be reasonably certain about all the details:

I would not dare to give tempos here. The sketched pattern of expansion can be equally consistent with a Neolithic or a Paleolithic model. The important pivotal role of France and the Netherlands could weigh in favor of a Paleolithic model, but it is true that aDNA and certain prehistoric reconstructions could allow for the French role (at least) to fit within an Atlantic Neolithic (Megalithism + Bell Beaker) theory for the expansion of S116, and I see no reason why the Netherlands could not have also played a similar role in NW Europe.

Thanks to Jean and Mike for the heads up.

Update (Jul 4): two "forgotten" papers of relevance:

1. George B.J. Busby et al., The peopling of Europe and the cautionary tale of Y chromosome lineage R-M269. Royal Society Proceedings B, 2011 → LINK.

Must read: demolishing (and well deserved) criticism of Balaresque 2010 and to some extent also of Myres 2010. They totally dismiss STR-based age estimates as wrong and misleading.

2. Rosa Fregel et al., Demographic history of Canary Islands male gene-pool: replacement of native lineages by European. BMC Evolutionary Biology, 2009 → LINK.

The ancient Guanche mummies' Y-DNA pool includes 10% R1b-M269. Considering that the islands were colonized c. 1000 BCE, I can only imagine that the Steppe Horde will find some way to blame the forgotten squire of Herakles for that. Or something...

Thanks to Georg for mentioning: that's the kind of feedback I love.

June 16, 2015

Allentoft 2015: more ancient DNA from beyond the Rhine

I've got some "friendly spam" insisting that I write something on this paper:

Morten E. Allentoft et al., Population genomics of Bronze Age Eurasia. Nature 2015. Pay per view (supp. materials are freely accessible though) → LINK [doi:10.1038/nature14507]

On the positive side, I think the study is of much better quality than Haak's, with much more extensive and good quality supplementary materials, many of which are interesting. On the negative side, it totally lacks any Western European samples, except the already known Epipaleolithic ones, being therefore pretty much useless for understanding the formation of the modern European genetic pool, except in a negative sense (stuff that is still clearly missing with the Central and Eastern references we have).

This huge blank in Western Europe, which nobody seems willing to fill, would be quite apparent in the locator map except that it is poorly disguised by cutting a good chunk of Europe out of it:

from fig.1
What do we get in this paper then? Lots of info about Central Europe, notably the Carpatho-Danubian region, also about the Greater Caucasus, Siberia and historical Denmark. Add to that a couple of novel sequences from Bronze Age Montenegro and one from Estonia. 


It's interesting in its own way but I'm still demanding Western European ancient DNA. In fact this study (piling up with others) clearly shows that Western European, and particularly Atlantic, aDNA must be key to the understanding of the formation of European peoples. For example:

Something is amiss, right? Well, that is the LCT allele. Bell Beaker Blogger spotted it very well but he could not explain it. So I told him: There are no samples from Western Europe. That's why! 

Later I added:
Actually Chalcolithic Basques from the Ebro banks already had more rs4988235-T than any of the samples shown here: 27% overall with as much as 31% in San Juan Ante Porta Latinam.

Notice anyhow that T allele frequency is not equal to "lactase persistence" because the allele may be distributed unequally among individuals (and that's precisely what we see in the Chalcolithic Basque samples, suggesting two different populations). Also there may be other alleles producing the same effect, just that they are not that famous or even known at all.
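
To illustrate why allele frequency and the share of people carrying the allele are different things, here is a minimal sketch. It assumes a dominant allele and Hardy-Weinberg proportions within each group; the split into two equal-sized groups and the 54%/0% figures are purely hypothetical (not taken from the Basque data), and `carrier_fraction` is a name of my own:

```python
def carrier_fraction(p):
    """Fraction of individuals with at least one copy of an allele of
    frequency p, ASSUMING Hardy-Weinberg proportions: 1 - (1 - p)^2."""
    return 1 - (1 - p) ** 2


# One well-mixed population with the T allele at 27%.
mixed = carrier_fraction(0.27)

# Hypothetical alternative: two equal-sized groups, one at 54% and one at 0%,
# which also average out to a 27% allele frequency overall.
structured = 0.5 * carrier_fraction(0.54) + 0.5 * carrier_fraction(0.0)

print(f"well-mixed: {mixed:.1%} carriers")
print(f"two groups: {structured:.1%} carriers")
```

With the same overall allele frequency, the structured scenario yields fewer carriers, which is why a raw allele frequency cannot be read directly as a persistence rate.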

I must say that the recent finding in the Seine basin of mtDNA pools similar to those found in ancient Basques, and resembling modern ones, does reinforce the notion of Western Europe being key, not just for this allele but in general for the formation of modern genetic pools in much of Europe.

It would seem (Malmström 2007) that the Gokhem people (Megalithic SW Swedes) also had some notable LCT frequencies, which are missing everywhere else (in Central-East and Mediterranean Europe, that is) before modernity. Hence the European LCT allele must have expanded from the Atlantic.


It's not only this particular trait, mind you. For example, another detail that Bell Beaker Blogger spotted in this paper's data is that the first known modern S116-derived lineage is found in a Bell Beaker individual. Not in Yamnaya, not in Corded Ware, not in Unetice... but in the Bell Beaker carriers that (culturally at least) came from the Southwest.

Specifically it is R1b-U152 (alias S28), which has sometimes been dubbed the "Celtic" subclade because of its distribution across the Alps, being particularly important in Switzerland and North Italy (including non-Celtic regions like Tuscany, Piedmont and Liguria, as well as Corsica). It also has some notable presence in much of France, SW Germany and Belgium.

I already argued back in 2010 that R1b-S116 must have expanded from Southwestern Europe, possibly Southern France. No need to dwell on this matter because there has been nothing new in all these five years (sadly enough). I just attach some old maps here for your convenience:

Relative prevalence of R1b-M269 subclades
note: M529 is wrongly listed as M259

Frequencies of the main European R1b subclades: S116 (red) and U106 (blue)


Hidden deep in the supplemental materials there is sup. fig. 6, which is an ADMIXTURE analysis of ancient and modern sequences. I simplified it by removing modern Asian and African samples as well as low K scores, all of them pretty much irrelevant, adding clearer labels and rotating it:

The most interesting aspect is that this analysis pretty much gets rid of the exaggerated Yamna influence attributed by Haak, bringing it down to much more acceptable levels. Even Corded Ware peoples were, it seems now, only weakly related to Yamnaya and much more strongly related to Paleo-European hunter-gatherers (tan) and Mediterranean early farmers (yellow). Some of the Caucasus (teal) component was anyhow already present in pre-Indoeuropean farmers from Hungary and it is only the minor Siberian element (blue shades) that really marks the distinctiveness of Indoeuropeans.

Notice that in this analysis, early European farmers like Stuttgart are not single color but dual: they appear as almost exactly a 50-50 mixture of Paleo-Europeans (tan) and Mediterranean farmers (yellow). The only (almost) true yellow reference is the Naqab Bedouins, pointing again to a PPNA origin of the migrant farmers, who, after admixture with European aborigines, surely in Thessaly, spread the Neolithic through the subcontinent.

When, at K=19, the Naqab Bedouins form their own distinctive component (pale yellow), the yellow component suddenly expands in all other samples at the expense of the rest. Beware: it changes meaning at that point, becoming the Sardinian component. One can well argue that this component is very much dominant among early European farmers (but always mixed with some Paleo-European, some Caucasus, even some Bedouin), yet there is no single sample clearly dominated by it at >80% frequencies (visual estimate): only Sardinians score that high. Admittedly it'd be interesting to re-run this without Sardinians and see what happens.

From the Basque origins viewpoint, I find it notable that (again) the Basque and Swedish farmer sequences are similar all along. The latter have more yellow component however, implying that Basques are even more Paleo-European than they were.

In contrast, in extended data fig. 6, Basques appear as less related to WHG than our immediate neighbors (Spanish and French), which is probably an artifact of Indo-European admixture in the latter, as Indoeuropeans were no doubt largely Paleo-European and the Siberian elements they carried may also weigh in that same direction. This is also visible in the ADMIXTURE graph above: French clearly carry some more tan Paleo-European, in addition to the Caucasian teal (and in some K-values also minor Siberian blue shades).

In general we see an apparent increase of the Paleo-European component as we move away from the Neolithic, but while in the French case this can be partly attributed to Indoeuropean flows from, ultimately, Eastern Europe, that explanation does not apply to the Basque case. There must be another source of that excess Paleo-European element and that source must necessarily be in Atlantic Europe.

Time to do proper research, time to sample the ancient Far West.