August 30, 2015

France's autosomal genetics highlight Gascon-Basque distinctive cluster

A rather decent analysis of French autosomal genetics has been privately pre-published recently (thanks to Jean Secques for calling my attention to it and to the lead author for making it available online). 

Aude Saint Pierre et al. The fine-scale genetic structure of the French population. Submitted to the American Journal of Human Genetics in 2015. Freely accessible: LINK, direct PDF link.

A highlight of the study is that the samples all belong to people born in the 1930s and locations refer to their place of birth, so the results should reflect the historical demographics of the French Republic in the early 20th century. 

There are no supplemental materials available at this point, so it's only possible to get a glimpse of the general results and we can't go too much into fine detail. These general results are anyhow interesting. Let's see:

Figure 6: Prediction of geographic location of individuals from the test set (n=3,733) using multiple linear regression model. A) Expectation: The seven geographical regions of France according to the geographical coordinates of individuals in the test sample; B) Prediction of geographical coordinates according to the multiple linear regression model.

This figure alone synthesizes the findings: most French citizens cluster in a single unit, which geographically would correspond to NE France (GE region). Only SW French (Gascons and Basques mostly) deviate very clearly, roughly fitting their own geography towards the Bay of Biscay (or Bay of Gascony, as the French call it). Some samples from the SE (MED and RA regions) also follow this trend. A few outlier samples from the East (GE, RA) look rather Rhenish German, although the lack of controls from outside the Hexagon does not allow me to confirm this impression. 

You may have noticed that I ignored the IDF samples. That is because IDF is the Paris region (Île-de-France), which already back in the 1930s was too cosmopolitan to be informative. That is of course reflected in all the results, with the "orange" dots showing nearly all affinities. 
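
For readers wondering what such a prediction involves, here is a minimal sketch of a multiple linear regression of geographic coordinates on a few predictors. It is not the authors' pipeline: nothing quoted here says which predictors they used, so I assume PC scores, and the function names, sample sizes, coefficients and data below are all made up for illustration.

```python
import numpy as np


def fit_coordinate_model(predictors, coords):
    """Least-squares fit of coords (n x 2) on predictors (n x k) plus an intercept."""
    design = np.column_stack([np.ones(len(predictors)), predictors])
    beta, *_ = np.linalg.lstsq(design, coords, rcond=None)
    return beta  # shape (k + 1, 2): one column for latitude, one for longitude


def predict_coordinates(beta, predictors):
    """Apply the fitted coefficients to new individuals."""
    design = np.column_stack([np.ones(len(predictors)), predictors])
    return design @ beta


# Synthetic stand-in data (assumption): three "PC" predictors per individual
# and invented coefficients mapping them to (latitude, longitude).
rng = np.random.default_rng(0)
true_beta = np.array([[46.5, 2.5], [-3.0, 1.0], [1.5, -2.0], [0.5, 0.8]])

train_pcs = rng.normal(size=(700, 3))
train_design = np.column_stack([np.ones(700), train_pcs])
train_coords = train_design @ true_beta + rng.normal(scale=0.1, size=(700, 2))

test_pcs = rng.normal(size=(300, 3))

beta = fit_coordinate_model(train_pcs, train_coords)
predicted = predict_coordinates(beta, test_pcs)  # one (lat, lon) row per test individual
```

Plotting `predicted` next to the true birthplaces gives the kind of side-by-side comparison shown in panels A and B. When most individuals are genetically very similar, their predicted coordinates will pile up near the same spot, which seems to be what the big central cluster reflects.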

Next come the principal component analyses, whose most salient information is again the peculiarity of Southwesterners, i.e. Gascons, Basques, and nearby populations.

Figure 2: The scatter plot of the first three PCs from PCA performed on the SNP genotype data of the 4,433 individuals from the 3 Cities study. Individuals are coloured according to the region where they were born. (Note: the legend corresponds to both PCAs)

Other than the "Gascon" specificity, which takes over PC1, I'd say that PC2 shows an "anti-Mediterranean" tendency and that PC3 instead shows a "pro-Mediterranean" tendency. This I gather from the relative position of the "red" MED cluster. They both weigh the same.

Interestingly there is a prominence of the GO region (Mid-West between the Seine and the Garonne) which may indicate some sort of "Armorican" or "Briton-like" specificity. In appearance it could meld both the "pro-" and "anti-Mediterranean" tendencies but, without being able to discern the particular dots (ID and location), I cannot swear to that. 

Much clearer is the "anti-Mediterranean" tendency of Gascons, Basques and allies when they are strongly detached from the main French cluster; the closer they are to mainstream French, the more they show a "pro-Mediterranean" tendency instead, overlapping at the extreme with the MED cluster. This happens in both PC2 and PC3. 

Little more to say, honestly. Maybe that the small Eastern group of outliers prominent in "anti-Mediterranean" tendency in PC2 probably corresponds pretty well with the outliers of the first graph, which looked German-like. So I guess that the positive side of PC2 probably corresponds with a Northern European tendency.
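
As a rough illustration of what a PCA on genotype data does, here is a minimal sketch using a plain SVD. It assumes genotypes coded as 0/1/2 allele counts and simple mean-centering (real pipelines often also scale each SNP and prune linked markers), and the function name `pca_scores` and the two simulated populations are hypothetical, not taken from the paper.

```python
import numpy as np


def pca_scores(genotypes, n_components=3):
    """Return PC scores and explained-variance fractions for an individuals x SNPs matrix."""
    centered = genotypes - genotypes.mean(axis=0)  # remove the per-SNP mean
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # projection of each individual
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Simulated data (assumption): a large "mainstream" group and a smaller group
# whose allele frequencies are shifted away from it.
rng = np.random.default_rng(1)
n_snps = 1000
base_freq = rng.uniform(0.2, 0.8, size=n_snps)
shifted_freq = np.clip(base_freq + rng.normal(scale=0.2, size=n_snps), 0.05, 0.95)

majority = rng.binomial(2, base_freq, size=(80, n_snps))
minority = rng.binomial(2, shifted_freq, size=(20, n_snps))
genotypes = np.vstack([majority, minority]).astype(float)

scores, explained = pca_scores(genotypes)
# Difference between the two group means along PC1
pc1_gap = abs(scores[:80, 0].mean() - scores[80:, 0].mean())
```

In a toy setup like this the distinctive minority takes over the first component, much as the Gascon-Basque samples appear to do in PC1 above. The sign of each PC is arbitrary, so "positive" and "negative" sides only mean something relative to where known groups fall.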

I'm particularly interested in what you have to say on this one, reader.


  1. Seems like Caesar may have been right.

    1. We don't know if we can extrapolate these data to the Roman period. Remember the consequences of the Albigensian crusade, for example, and also the persecution of Huguenots later on, most of whom were again southerners.

      Caesar was quite obviously right on ethno-linguistic matters but extrapolating that to genetics is not correct, more so when the SW tendency seems to be (as we already knew from other studies) towards the Basque very distinctive cluster and not towards generic Iberians, who may be closer to mainline French instead, especially those with the Mediterranean tendency.

    2. Or in other words: the Iberian tendency seems to be rather the low PC2 and high PC3 and not the high PC1, which seems Basque-Gascon specific.

  2. People were sampled in Bordeaux, Dijon and Montpellier where they live, not in Paris.

    " the fact that the 3C study only includes elderly people born before 1935 at a period in time where migrations were rather limited was also an advantage for this study. Their place of birth was, except perhaps for the IDF region, a good indicator of the region of origin of their ancestors. This was not the case however for the places where they were sampled that could not be used to trace back their origin. This raises serious concerns on the studies that used sampling places as surrogate for geographical origins."

    Maju, what do those results mean?

    " The largest difference was observed between NO and SO
    (Fst = 0.068%) followed by GE and SO (Fst = 0.049%) and NO and MED (Fst =0.046%) "

    " the strongest Fst values found between North-East and South-West regions (Fst=0.0007 between NO and SO)"

    "All the pairwise Fst values we computed here between the 7 regions were below 0.001 but still we could detect some differences between the regions "

    1. Montpellier, eh? My bad. I'll correct that.

      ... "what do those results mean?"

      (1) " The largest difference was observed between NO and SO
      (Fst = 0.068%) followed by GE and SO (Fst = 0.049%) and NO and MED (Fst =0.046%) "

      That the greatest statistically estimated genetic distances were between NO and SO, i.e. along the Atlantic facade of France, followed at a distance by the NE-SW and the NW-SE diagonals, which one would more naturally expect to show the maximum distances (but nope). I guess they correspond well with the PC axes.

      (2) " the strongest Fst values found between North-East and South-West regions (Fst=0.0007 between NO and SO)"

      Same as the first part of the previous sentence, just that the figure is expressed as a raw fraction rather than a percentage (so two more zeros): 0.068% = 0.00068, which rounds to 0.0007.

      (3) "All the pairwise Fst values we computed here between the 7 regions were below 0.001 but still we could detect some differences between the regions "

      That overall distances are in any case low even among those most distant.

  3. The data set seems pretty good, but the analysis is really surprisingly shallow compared to what is par for the course in these kinds of papers these days. Razib and Eurogenes, for example, frequently do more sophisticated data analysis, with appropriate comparisons to public databases, in ho-hum blog posts.

    1. This comment has been removed by the author.

    2. This comment has been removed by the author.

    3. OK, you're talking about the paper. Well, Latin underdevelopment, or AFAIK British too, for whatever it's worth. Only the Germans and sometimes the Swedes are at the level you require. But not the French, not anymore.

  4. According to you, why are between 40 and 75% of individuals assigned to each region by the KNN algorithm?

    What can you say about Froh1 and Froh5? Are the means significant?

    Thank you Maju for sharing your knowledge.

    1. I'm not familiar with the KNN algorithm. I thought it was very weird the way it behaved: every region was dominated by the GE blue column except the SO. It looked almost totally useless and so declared the authors, who found ADMIXTURE much more capable.

      I'm also not familiar with the F-sub-ROH, I guess it refers to heterozygosity but without further explanation I'm lost.

      Thanks to you for pointing to this interesting paper. I only know that much anyhow, I'm a reporter not a researcher.

  5. If it is of any interest, I got a friend of mine tested. He's from the region between the city of Vichy and la Creuse, with no recent ancestor from outside since the French Revolution. Despite his quite northern location, he still displays some affinity with Basques and actually does not cluster with most French people but in an intermediate position between the two aforementioned populations. Based on the little I know, he is less ANE than most French but is not as WHG as most Basques (Basques being at Northern European levels of WHG affinity, interestingly).
    One sample is obviously not enough but this example could possibly mean Basque-like affinity is present much farther North and East than expected.

    1. Well, it's of anecdotal interest because he'd be in the RA region and there are a few people from the RA area that tend strongly towards Basques and Gascons and even overlap with them, not the majority though. Vichy is in Northern Auvergne, a mountainous region that may well have helped older genotypes to survive through the centuries and millennia in less well-connected valleys (unsure, just a hunch).

      "One sample is obviously not enough but this example could possibly mean Basque-like affinity is present much farther North and East than expected".

      It seems quite apparent that some individuals in those areas (and that may well mean some specific districts which have been less exposed to migrations or whatever) do tend to overlap with the SO cluster. These are often MED/RA people although also from GO and even GE. In the case of GO, I'm personally very intrigued re. Angoumais, who visually appeared to me particularly Basque-like but I presume there are other such "islands" of Basque affinity scattered.

    2. "Basques being at Northern European levels of WHG affinity, interestingly"

      Depends on who you read or rather how you assess what is "WHG". There are at least three main sources of Paleoeuropean (HG) blood:

      1. WHG (and maybe SHG). IBD analysis shows this is stronger in populations from the area: mainline French are for example much more directly related to Loschbour man than Danes, even if in Lazaridis Danes have more overall "WHG" (generic Paleoeuropean) blood (most likely owed to other sources like EHG). Similarly I've read somewhere that Basques and Iberians are more closely related IBD-wise to La Braña than other Europeans.

      2. UHG ("unknown HGs") which makes up about 50% of the EEF component and is probably of Balkan origin. Very similar to WHG (low ANE) but not the same thing ultimately. When EEF is used as reference this fraction of the Paleoeuropean ancestry is diminished naturally.

      3. EHG (Eastern European HGs), which was brought mostly by Indoeuropean invasions and is directly associated with the ANE component. This seems to be the main cause of excess HG blood among Northern Europeans but still needs improved qualification. Here again, as happens with EEF and its intrinsic UHG, when Yamna or Corded Ware are used as direct reference, this part of Paleoeuropean ancestry gets diminished and I'm guessing is the kind of figure you have in mind.

  6. This study is quite fascinating. Such a fine scale, regionalised and "historical" sampling of France is something that Vincent (Heraus) and I have long fantasised about.

    I found the mapping of the PC back onto French geography to actually be the most interesting.
    PC1 clearly follows the principal divide in France: ie. Basque/Gascon vs everyone else.
    PC2 seems inversely related to post-Neolithic Middle Eastern/Levantine admixture, perhaps of Roman/Phoenician/Greek origin: it is very low along the Mediterranean coast and increases northward while avoiding Gascony, and is at its lowest among some outlying Alsacien or Lorrain samples, which I can believe is due to Jewish ancestry (more common in that area), or even Italian ancestry (especially in Lorraine, where Italian immigration dates back to the 19th century).
    PC3 seems inversely related to HG ancestry, having its lows in Gascony and the North/Northeast.

    I dearly hope that more comes out of all this data, this time placed among a European/International context and making use of newer tools too.

    On a humorous note, the most notable cultural overlap with PC1 is this

    1. LOL, I like the map! As a chocolate-lover (and hater of the chocolate sandwiches my, largely non-Basque, mum prepared when I was a kid) I can't but agree 200%. We are sweet-tooths at a massive scale, while, south of the border, Castilians (or local "Celts") tend to disdain sugar instead. I once had a girlfriend of Castilian parents who made cakes that were almost like bread because of lack of sugar and my Castilian uncle-in-law always got up from the table without dessert, go figure!

      Actually I'm having a hard time trying to think of a single Spanish (non-Basque) dessert. Turrón, I guess, crema catalana (= crème brûlée) but they are not Castilian in any way. Wonder if this anthropological cuisine issue existed already when the Celts arrived. Did Celts hate sweets?

    2. Anyway, what's up with Vincent (Heraus)? I haven't heard from him like in ages. I hope it is for the good.

      I have to question your interpretation of PC2 and PC3. IMO it is PC3 that shows the Mediterranean influence, which may well be more scattered and older than Greeks and Romans. Shouldn't it rather be persistence of the "Neolithic 1" (EEF) genotype, which is rather extremely Mediterranean? I guess only comparison with Sardinians (would Corsicans do?) can clarify, but that is my impression, if nothing else because it is in PC3 where the positive scores correlate with the Mediterranean group, while in PC2 the correlation is negative. As for HG ancestry, mainline French have it very high, at Basque-like levels per the latest study (Günther & Valdiosera), even if a sizable fraction of it should surely be attributed to EHG (Indoeuropean) influence.

      What intrigues me the most is that GO anomaly, which I suspect Breton, that tends both to positive scores in PC2 and PC3, i.e. Mediterranean and anti-Mediterranean at the same time. Why?

