September 4, 2015

Neolithic Catalan aDNA provides further insight into European genesis

Again Marnie points me towards a very interesting study, this time totally new and dedicated to archaeogenetics.

Iñigo Olalde et al., A common genetic origin for early farmers from Mediterranean Cardial and Central European LBK cultures. Molecular Biology and Evolution, 2015. Open access LINK [doi:10.1093/molbev/msv181]


The spread of farming out of the Balkans and into the rest of Europe followed two distinct routes: an initial expansion represented by the Impressa and Cardial traditions, which followed the Northern Mediterranean coastline; and another expansion represented by the LBK tradition, which followed the Danube River into Central Europe. While genomic data now exist from samples representing the second migration, such data have yet to be successfully generated from the initial Mediterranean migration. To address this, we generated the complete genome of a 7,400 year-old Cardial individual (CB13) from Cova Bonica in Vallirana (Barcelona), as well as partial nuclear data from five others excavated from different sites in Spain and Portugal. CB13 clusters with all previously sequenced early European farmers and modern-day Sardinians. Furthermore, our analyses suggest that both Cardial and LBK peoples derived from a common ancient population located in or around the Balkan Peninsula. The Iberian Cardial genome also carries a discernible hunter-gatherer genetic signature that likely was not acquired by admixture with local Iberian foragers. Our results indicate that retrieving ancient genomes from similarly warm Mediterranean environments such as the Near East is technically feasible.  

Very interestingly, the main hunter-gatherer input into early European farmers is not best explained by truly Western hunter-gatherers (Loschbour, La Braña) but rather by a close relative from Hungary (KO1). This very probably hints at a true Balkan source for this admixture, which TreeMix places at the very root of the Neolithic branch as something proto-KO1 rather than truly close to KO1 as such:

Fig. S7. TreeMix (Pickrell and Pritchard 2012) analysis considering: (...) (B) four migration edges.

Note: the proto-Stuttgart migration edge to Mbuti may well be, I believe, an artifact of the famous "Basal Eurasian" component, which is probably African admixture running in the opposite direction: not yet well understood but quite undeniable (most patently in Y-DNA E1b).

The overall data are analyzed both with PCA and with the ADMIXTURE algorithm:

Figure 2. Genetic affinities of CB13. (A) Procrustes PCA of hunter-gatherers, Early Neolithic, Middle Neolithic and Copper Age farmers. The PCA analysis was performed using only transversions (to avoid confounding effects related to post-mortem damage). (B) Ancestry proportions assuming 11 ancestral components, as inferred by ADMIXTURE analysis.
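The caption's restriction to transversions can be illustrated with a small filter. Post-mortem deamination turns C into T (and G into A on the complementary strand), so any transition difference in ancient DNA might be damage rather than a real allele, whereas transversions (a purine swapped for a pyrimidine or vice versa) are not produced by that damage pattern. The following is a minimal Python sketch under that assumption; the `(snp_id, ref, alt)` tuple layout, the function names and the SNP records are hypothetical, not the pipeline used in the paper.

```python
# Hypothetical sketch: keep only transversion SNPs before running a PCA.
# Transitions (A<->G, C<->T) are dropped entirely because C->T / G->A
# post-mortem damage makes them unreliable in ancient samples.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(ref: str, alt: str) -> bool:
    """True if the ref/alt pair crosses the purine/pyrimidine divide."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid biallelic SNP: {ref}/{alt}")
    # A transversion changes the base class (purine vs pyrimidine)
    return (ref in PURINES) != (alt in PURINES)


def transversions_only(snps):
    """Filter (snp_id, ref, alt) tuples down to transversions."""
    return [s for s in snps if is_transversion(s[1], s[2])]


if __name__ == "__main__":
    # Made-up SNP records for illustration only
    snps = [("rs1", "C", "T"), ("rs2", "A", "C"), ("rs3", "G", "A"), ("rs4", "G", "T")]
    print([s[0] for s in transversions_only(snps)])  # ['rs2', 'rs4']
```

The cost of this choice is discarding roughly two thirds of typical SNP panels, which is why it is usually reserved for analyses, like this PCA, where damage could otherwise pull ancient samples off their true position.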

All this is very similar to what we are used to seeing in other comparable studies. However, it is not exactly the same, and I like to emphasize that slightly (or, more rarely, strikingly) different points of view on autosomal data, whose processing is always subject to the limitations of statistical analysis, are worth considering, because clinging too tenaciously to any single point of view may cause confusion and bias.

So I annotated the supplemental materials' version of the above PCA as follows:

I merely drew two axes of admixture. The first one is strictly parallel to the PC1 axis, which pretty much describes the axis of West Asian vs Paleoeuropean admixture, using KO1 as reference. In West Asia it falls on the (Naqab) Bedouin B sample. While the Early Neolithic ancient samples approximate this axis (at a roughly 60:40 proportion), it is very apparent that they do not fall strictly on it but are rather spread above and below it. Those falling above the axis may have minor extra EHG or other "Oriental" admixture, but many fall below it, and so far we lack ancient references that could tell us what is causing that deviation (although intuitively it would seem to be something African or at least "hyper-Mediterranean"). This tendency is reinforced in all the Middle Neolithic (Early Chalcolithic) samples, which also tend (along with Spain_EN) further towards the WHG pole (something also apparent in the ADMIXTURE graph). So it is probably something carried by that WHG-like influence, which is however "too Mediterranean" even for La Braña.
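To make a reading like "roughly 60:40 along the axis, but scattered above and below it" concrete: placing a sample on a two-pole axis amounts to projecting its (PC1, PC2) position onto the segment between the poles and measuring how far off that line it sits. Below is a minimal Python sketch; the pole and sample coordinates are made-up placeholders in arbitrary units (not values read from the figure), and treating the position along the axis as an admixture proportion assumes simple linear mixing, which a PCA only roughly approximates.

```python
import math


def project_on_axis(p, pole_a, pole_b):
    """Locate point p relative to the admixture axis pole_a -> pole_b.

    Returns (t, offset):
      t      = fractional position along the axis (0 at pole_a, 1 at pole_b);
      offset = signed perpendicular distance, positive to the left of
               pole_a -> pole_b (i.e. "above" when the axis points along +PC1).
    """
    ax, ay = pole_b[0] - pole_a[0], pole_b[1] - pole_a[1]
    px, py = p[0] - pole_a[0], p[1] - pole_a[1]
    length_sq = ax * ax + ay * ay
    if length_sq == 0:
        raise ValueError("the two poles coincide")
    t = (px * ax + py * ay) / length_sq  # scalar projection, normalized
    offset = (ax * py - ay * px) / math.sqrt(length_sq)  # 2-D cross product / length
    return t, offset


if __name__ == "__main__":
    # Hypothetical (PC1, PC2) positions, arbitrary units
    ko1_like = (0.0, 0.0)
    bedouin_like = (10.0, 0.0)
    sample = (6.0, -0.5)
    t, offset = project_on_axis(sample, ko1_like, bedouin_like)
    print(f"t = {t:.2f}, offset = {offset:.2f}")  # t = 0.60, offset = -0.50
```

With these placeholder values the sample sits at a 60:40 split between the two poles (t = 0.6) and slightly below the axis (negative offset), which is the kind of off-axis scatter described above.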

Is it something from the original Balkan HG (Lazaridis' UHG) admixture? Something Iberian but distinct from La Braña? Something North African? Or something else? I can't say at this point; all I can say is that the references needed for that fine tuning are still missing.

The other axis I annotated was something the PCA was almost commanding me to draw: the axis spanning from Samara_HG to Sardinia and the bulk of the Early Neolithic samples, which runs right through the middle of the bulk of modern European samples. In this graph, even though there are no Yamna or Corded Ware samples, they seem unnecessary: Eastern European hunter-gatherers explain everything on their own. On the other hand, Ma1 (alias ANE) sits up there, almost hidden between the legend boxes, no longer needed to explain anything, because Samara_HG does the job much better.

So what is left of what some have called the fateful triangle of European admixture, first proposed by Lazaridis 2014 as a {x.EEF+y.WHG+z.ANE} formula? In this PCA at least, it seems we can set those three elements aside and replace them with a better-fitting {x.Spain_MN+y.Samara_HG+z.Bedouin-B} formula:
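A three-term formula like {x.Spain_MN+y.Samara_HG+z.Bedouin-B} can be read geometrically as barycentric coordinates: in a 2-D PCA a sample at point p is written as x·A + y·B + z·C with x + y + z = 1, and it falls inside the triangle only if all three weights are non-negative. The Python sketch below assumes placeholder pole coordinates (not taken from the figure) and, like any PCA-based reading, only approximates real admixture proportions.

```python
def barycentric(p, a, b, c):
    """Return (x, y, z) such that p = x*a + y*b + z*c and x + y + z = 1."""
    (x1, y1), (x2, y2), (x3, y3) = a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("the three poles are collinear")
    x = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    y = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return x, y, 1.0 - x - y


def inside_triangle(weights, tol=1e-9):
    """A sample fits the three-pole model only if no weight is negative."""
    return all(w >= -tol for w in weights)


if __name__ == "__main__":
    # Hypothetical pole positions (PC1, PC2), arbitrary units
    spain_mn, samara_hg, bedouin_b = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
    for name, point in [("sample_1", (1.0, 1.0)), ("sample_2", (3.0, 3.0))]:
        w = barycentric(point, spain_mn, samara_hg, bedouin_b)
        print(name, [round(v, 2) for v in w], inside_triangle(w))
```

Here sample_1 gets weights 0.5, 0.25 and 0.25 and fits, while sample_2 gets a negative Spain_MN weight (-0.5). That is what "excluded by the triangle" means in this geometric reading, the same situation the text describes for Sicilians and Maltese under the Lazaridis poles.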

It would of course need further testing, but at least in this PCA it works very well, and even most Sicilians and some Maltese fit within it. That was not the case with the Lazaridis formula, whose triangle (not shown) runs between Stuttgart, Loschbour and Ma1 and would effectively exclude Sicilians and Maltese in this PCA as well, while leaving huge empty areas towards the left (WHG) and the top (Ma1).

I've tried to do something similar using Starcevo (KO2) and Corded Ware or Yamna (not shown here), but those tend to exclude a much larger number of modern Europeans: not just Balkan peoples and Italians but also some Northern Europeans like Lithuanians. Such a triangle can of course explain many things, but it is not as inclusive.

Of course, it does not need to be a triangle. In fact it is very likely that a fourth axis of polarity runs between Basques and Cypriots, pointing towards the North Levant and passing through Spain and Sicily. This axis does not seem to be explained either by Neolithic admixture or by the Kurgan-like admixture of the late Chalcolithic onwards, and very probably indicates some other, specifically Mediterranean source of admixture that at the very least influenced Italy and the Balkans.

In fact this cross-like structure of European autosomal genetics is invariably more apparent when only (or almost only) Europeans are considered. It is also apparent here, but it runs rather diagonally because of the heavy weight of the West Asian (and various Jewish) samples.

In any case, I can draw on this PCA a rather obvious trapezoid that clearly includes all modern Europeans and also almost all known Neolithic and post-Neolithic ancient samples:
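Checking whether a sample falls inside a four-cornered figure like this trapezoid is a standard point-in-convex-polygon test: walk the corners in order and require the sample to lie on the same side of every edge. The Python sketch below assumes hypothetical corner coordinates in arbitrary units; it does not reproduce the actual trapezoid drawn on the figure.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o): > 0 if b is left of the line o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def inside_convex_polygon(p, corners):
    """True if p lies inside or on the border of a convex polygon.

    corners must be listed in order around the polygon (either direction).
    """
    sides = set()
    n = len(corners)
    for i in range(n):
        c = cross(corners[i], corners[(i + 1) % n], p)
        if c != 0:  # zero means p is on this edge's line, so it decides nothing
            sides.add(c > 0)
    # Inside if p was never seen on both sides of the edges
    return len(sides) <= 1


if __name__ == "__main__":
    # Hypothetical trapezoid corners (PC1, PC2), counter-clockwise
    trapezoid = [(0.0, 0.0), (10.0, 0.0), (8.0, 6.0), (2.0, 6.0)]
    samples = {"sample_a": (5.0, 3.0), "sample_b": (1.0, 5.0)}
    for name, point in samples.items():
        print(name, inside_convex_polygon(point, trapezoid))
```

With these placeholders sample_a falls inside and sample_b falls just outside the left slanted edge. The same-side test only works for convex shapes, which a trapezoid is; a concave outline would need a ray-casting test instead.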

It works much better than any triangle I could imagine. Within it I also drew (dotted lines) what seem to be the most important axes of European variation, which should roughly correspond to PC1 and PC2 in any Europe-only analysis. The first one runs between KO2 (Starcevo culture) and Samara_HG and, as I said before, the figure itself was calling for it; it is almost identical to the one I drew above. The best reference is likely not quite KO2 but some Greek hunter-gatherer that has yet to be studied and should fall further down on PC2.

The other axis I drew is also demanded by the scatter of the graph and would roughly correspond to PC2 of a Europe-only analysis. This axis is clearly displaced towards the Samara_HG pole relative to the Neolithic and Early Chalcolithic European samples, which seems to suggest that this admixture happened after the first Kurgan impact, possibly in the Bronze or Iron Age.

However, the period when Cyprus (which seems to be one of the poles) played its most active role in the Mediterranean overlaps with the Kurgan expansion and only barely touches the Bronze Age, being rather Chalcolithic. One could also think of the Phoenicians, but it does not make much sense that they had such an impact (they did not even colonize most of Sicily, let alone Italy or the Balkans). I don't have a good answer yet, but my impression is that it is something of the Late Chalcolithic, or the early Bronze Age at the latest, and hence overlapping in time with the Kurgan impact but with a Mediterranean dimension.

Of course the Cypriot reference may be misleading, and the actual source may be something more loosely Anatolian (even if not quite like modern Turks either). In any case the Bronze Age (the Late Bronze Age in the Eastern Mediterranean), when Mycenaean Greeks ruled the waves after the collapse of the Eteocretan civilization, seems a bit late. I'll leave it at that: the window for this change, which especially affected SE Europe, seems rather narrow, spanning the late Chalcolithic and the early Bronze Age at the latest.


  1. The Adygei and Balkar of the North Caucasus area fit nicely with the "German" LBK_EN (10 samples). Where does Spain_EN come into play with these "German" samples and the Adygei/Balkar?

    Are we still playing around with this Phoenician movement with regard to the Spanish?

    1. I don't understand what you mean. Do you mean on PC1 (the horizontal axis)? Personally I would not jump to any conclusions based only on that, as there is no structure or continuity of any sort, nor has any other data ever suggested any relationship between those populations.

      "Are we still playing around with this Phoenician movement with regard to the Spanish?"

      It would not be Phoenician, and it would concern Sicilians in particular (and peninsular Italians and Balkan peoples more broadly), not Spaniards. So no.

      The issue is that Sicilians and Maltese diverge so strongly in that direction that they clearly did not fit in the Lazaridis triangle, or triple admixture model. What I argue here is that the best-fitting triangle, or more likely trapezoid, that fully explains the genesis of modern Europeans may well be better described in terms different from those of Lazaridis. The trapezoid retains one of the Lazaridis poles (EEF, although moved to Starcevo/KO2 for a better fit), replaces ANE (Ma1) with Samara_HG, and replaces WHG (Loschbour) with Spain_MN (Portalón), which is close to Gokhem in this graph. This alone would make a nice triangle, similar in the basics to that of Lazaridis but still excluding Sicilians, etc. This can be solved by replacing KO2 with an arbitrary Bedouin (presumably close to the West Asian ancestors of EEFs), but it is probably better solved by factoring in Cypriots (or some nearby population) as a fourth pole. This four-pole solution fits remarkably well with the usual spontaneous self-organization of Europe-only PCAs along the Russia-Sardinia PC1 and the Basque-Malta (or Basque-Adygei with other samples) PC2. This second axis of self-organization in European PCAs had already intrigued me, and I have mentioned it in other entries or in comments on other blogs. Nobody answered, which I interpret as "OK, sure, but what does it mean?" Well, obviously an extra axis of admixture that is absent or much less apparent in Northern Europe, where everyone tends to cluster along the Russia-Sardinia axis, i.e. EHG-EEF.

      It is quite obviously an element of extra complexity. It is not apparent in Spaniards, who also cluster on the EHG-EEF axis, but it is apparent in SE Europe (Italy, the Balkans and most especially Sicily) and, in the opposite direction, tending towards greater Paleoeuropean admixture, among Basques.

      In the case of Basques it should be clear that this is greater HG admixture, much as in the "MN" ancient samples, notably the Western ones. This is probably a Megalithic trait. In the case of SE Europeans, however, it means a smaller, notably smaller, Paleoeuropean component than even Starcevo, so it almost necessarily implies another source of admixture from West Asia, for example Cyprus, which played an important but obscure role in the Chalcolithic and Bronze Age Mediterranean.

      Open question anyhow.

  2. Regarding your first paragraph:
    The only modern group that sits with LBK_EN in your charts is the North Caucasus group of Adygei and Balkar. Why would we not assume anything else when no other modern group can compare?

    1. I think you are confusing Adygei with Sardinians. Both are coded as "greenish" squares, one darker and the other lighter and a bit more bluish. As for the Balkars, no idea, unless you're confusing them with Turks or Bulgarians (green-blue triangles), but those also fall far away from the Neolithic samples.

      Unless you are focusing only on PC1 (the horizontal axis) and disregarding the vertical distance (PC2). In any case, you are not explaining yourself properly.

  3. Thanks Maju. It would have been interesting to see where *non-Jewish* Moroccans, Tunisians, and Libyans would fit on the above PCA. Closer to the Canary Islanders and Sardinians I expect. We already know that the Canarians have strong Berber-affinity, and last week Razib Khan confirmed that Sardinians also have strong Berber-affinity [which many already knew or suspected].
    Razib: "In any case, I also did a little analysis of North Africans, Spaniards, and Orcadians and French as token outgroups[..]The gene flow into the Spaniards, and Sardinians, is Berber-like, not Middle Eastern."

    So if Canarians and Sardinians are proven to have very strong Berber-affinity, and they also cluster close to 'early farmers' on your PCA... and all of these cluster quite far away from Middle Eastern populations... I would say that is a strong hint that 'EEF' or 'basal Eurasian' is North African and not Middle Eastern.

    1. A PCA will always be conditioned by the samples included. If you include North Africans, two things can happen: (a) they are very few and hence are artificially forced to align relative to the others, or (b) they carry enough weight to take over a PC dimension, in which case European variance would be even blurrier. Personally I think that a Europe-only PCA is almost invariably the most interesting one when we are trying to study Europeans. In these all-West-Eurasian PCAs, dimension 1 is invariably overweighted and aligned along the Europe vs West Asia polarity. In this one, luckily enough, it is similar to the (roughly defined) early farmers vs late farmers axis, but it's not quite the same. In most cases internal West Asian variability, along with the parallel European one, also takes over PC2. However, in Skoglund's study, with many more Swedish samples, West Asians were compacted and everything in Europe was shifted slightly, maybe significantly, with PC2 clearly defined as a Swedes vs Sámi polarity, ha! Clearly different sampling strategies produce different statistical analyses, be they PCAs, ADMIXTURE plots, etc. But in PCA this becomes even more important, as we can never go as deep and as detailed as with ADMIXTURE. PCAs are, even in the best case, simplistic. Careful then.

      It's a pity that I had to stop following Razib because of his continuous far-right political ranting, because now and then he does post something interesting in the field of population genetics (though not often, and interspersed with dozens of far-right pamphlets). However, he clearly has it wrong when he says that "Spaniards are the direct descendants of early farmers", because if you look at where the dots fall, it is obviously not Spaniards but Basques (the "Spanish_North" sample is a Basque sample).

      As for the TreeMix graph he posts, it's not from the paper but something else he did, resulting in an apparent (but false) North African ancestry for all West Eurasians. In any case, the Tunisian (not Moroccan, Algerian, Libyan or Egyptian) admixture edges are strange and curious, and to me they suggest (with doubts) Carthaginian admixture in the 3rd century BCE (when much of Iberia was unified by the Barca family under the colonialist banner of the Carthaginian Republic). This would also explain the influence in Sardinia, which was also a Carthaginian colony just years earlier.

      But in any case the whole tree is a fallacious construction that would easily be corrected by including Asian and African outgroups, unavoidably resulting in a completely different tree and different admixture edges. Sampling strategy is most important, and you can totally change the results of statistical analyses by manipulating just that.

      "So if Canarians and Sardinians are proven to have very strong Berber-affinity"...

      In those graphs it's all proto-Tunisian (other North Africans behave very differently) and Tunisian samples are (1) extremely low in diversity and (2) extremely European-like, even in Neanderthal ancestry fraction. The HGDP/1KGP Tunisian samples tend to distort everything, even when you study North Africa. I would rather not use them, really.

  4. Quite interesting and a good analysis.

