March 14, 2012

Basque and Gascon Y-DNA survey

Using the same nice sampling strategy as in the recent mtDNA paper by Behar et al., researchers from the same team have now published on Y-DNA:

Begoña Martínez Cruz, Evidence of pre-Roman tribal genetic structure in Basques from uniparentally inherited markers. PNAS 2012. Pay per view (for 6 months or outright free in some world regions). 
Abstract
Basque people have received considerable attention from anthropologists, geneticists and linguists during the last century due to the singularity of their language and to other cultural and biological characteristics. Despite the multidisciplinary efforts performed to address the questions of the origin, uniqueness and heterogeneity of Basques, the genetic studies performed up to now have suffered from a weak study-design where populations are not analyzed in an adequate geographic and population context. To address the former questions and to overcome these design limitations, we have analyzed the uniparentally inherited markers (Y chromosome and mitochondrial DNA) of ∼900 individuals from 18 populations, including those where Basque is currently spoken and populations from adjacent regions where Basque might have been spoken in historical times. Our results indicate that Basque-speaking populations fall within the genetic Western European gene pool and they are similar to geographically surrounding non-Basque populations, and also that their genetic uniqueness is based on a lower amount of external influences compared to other Iberians and French populations. Our data suggest that the genetic heterogeneity and structure observed in the Basque region results from pre-Roman tribal structure related to geography and might be linked to the increased complexity of emerging societies during the Bronze Age. The rough overlap of the pre-Roman tribe location and the current dialect limits supports the notion that the environmental diversity in the region has played a recurrent role in cultural differentiation and ethnogenesis at different time periods.

I do not have a copy yet, so I can only discuss the Y-DNA pool as such, borrowed from Dienekes:

Codes: BIG, Bigorre; BEA, Béarn; CHA, Chalosse; ZMI, Lapurdi/Baztan; NLA, Lapurdi Nafarroa; SOU, Zuberoa; RON, Roncal and Salazar valleys; NCO, Central Western Nafarroa; NNO, North Western Nafarroa; GUI, Gipuzkoa; GSO, South Western Gipuzkoa; ALA, Araba; BBA, Bizkaia; BOC, Western Bizkaia; CAN, Cantabria; BUR, Burgos; RIO, La Rioja; NAR, North Aragon.



R1b (South Clade)

The dominant lineage is, of course, R1b1a2a1a1b (P312/S116), which is also the most important R1b sublineage worldwide, which I have called in the past the South Clade and will call hereafter as R1b-S, followed when needed by the last letters and digits of the subclade (see also ISOGG). This lineage is widespread in Western Europe, especially in the South-West. Maps from a previous entry (based on Myres 2010 data):

Naming convention is obsolete but distinctions remain (only R1b1a2-M269 is considered).
Typo: M529, also known as L21, is wrongly written as M259.


Approximate dominance of R1b-S (red) and R1b-N (blue).
The Basque Country was not sampled in Myres 2010; it should in fact be darker red.

The lineage is not only dominant in numbers but also important in diversity. Per the last revision, R1b-S has three main known sublineages:
  • R1b-S-2  (Z196), which includes:
    • R1b-S-2a (M153): Basques and Gascons almost exclusively
    • R1b-S-2b  (L176.2/S179.2): Gascons and Catalans especially, but more widespread
  • R1b-S-3  (S28/U152): Not too frequent but not rare either among Basques and Gascons (more common in Pyrenean Navarre: RON and NNO) but widespread through mainland Europe, especially Italy (in plain blue in the first map).
  • R1b-S-4  (L21/M529/S145, L459): Often known as the Irish clade, it is not restricted to Ireland at all but does have a mostly Atlantic distribution (West France, England...). Now we come to know that it is also quite common among Basques and Gascons (yes: this is novel data), reaching >5% among all of them (but not in the border areas of CAN, BUR and NAR). In turquoise shades in the first map.
The South Clade has three other formally described basal (?) sublineages per ISOGG, one described as "private" (very minor) and the other, without branches, may also be private or at least rare enough - none of them was tested for in this survey in any case. It also has lots of unknown asterisk R1b1a2a1a1b*, which may hide huge basal diversity (or not, but probably it does). This is a known problem for all SW European R1b and it includes a good deal of Basque Y-DNA as well, even if Basques have been researched somewhat more than your usual Iberian or French.

All this reinforces my idea of R1b-S being original from the Franco-Cantabrian Region and scattered with, possibly, Magdalenian culture. Other possibilities may exist but in any case it requires an origin in SW Europe, where it is quite obviously most basally diverse. You cannot just argue about R1b1a2-M269 as a whole: you need to specifically explain R1b1a2a1a1b-P312/S116 on its own merits, and that demands discussing the prehistory of SW Europe.


Other lineages

The most common Neolithic (??) lineage is by far I2a1a (the Sardinian clade), which is present in all samples except Bigorre, being >5% in many of them (Béarn, Dax, all Northern Basque Country, most of Navarre, Cantabria, Rioja and North Aragon - less important in the Western Basque Country however).

Then, of the Mediterranean type, J2a comes next, being important in parts of Navarre, Rioja and Burgos. Less relevant but still deserving mention are E1b-V65 (Araba) and T (Cantabria).

The other two major I subclades are also fairly represented in the Basque/Gascon area, even if not as common as the Sardinian variant. I2a2a (typical of Low Germany) is found at >5% frequencies in Gascony (Bigorre and Dax), while I1 (typical of Scandinavia) is found (>5%) in Araba. Neither of them looks like Viking legacy at all.

This, together with the practical absence (some but very low frequencies) of R1b-N and R1a, makes me think that the distribution of Y-DNA I in general is at least partly Paleolithic and not Neolithic, even if chance (drift) has concentrated it in some areas. But it is hard to say based only on this data.

Also R1b1a2a* (L23) must be mentioned, as it is found (>5%) in Bigorre, with some presence in other areas (Béarn, Lapurdi, Rioja). It may be a Neolithic arrival from, ultimately, Anatolia or a remnant of Upper Paleolithic flows.

Other Neolithic lineages (E1b-V13 or G, documented in ancient DNA from Languedoc or Catalonia) have almost no presence in the Basque/Gascon area. Uralic haplogroup N is found 2/54 in Rioja (maybe a Celtic/Indo-European legacy?) and as a singleton (erratic) in Zuberoa.

14 comments:

  1. Mahu posted "The dominant lineage is, of course, R1b1a2a1a1b (P312/S116), which is also the most important R1b sublineage worldwide, which I have called in the past the South Clade and will call hereafter as R1b-S "

    I suggest renaming R1b-S as R1b-W for R1b-West as the predominant distribution is not really U106/S21 to the North and P312/S116 to the South but more U106/S21 to the North and P312/S116 to the West and scattered to the North as well. You can see this in the Old Norway Project. The point I have is that P312/S116 has high frequencies even up into Scandinavia; it's just that its exceptionally high Atlantic-facing frequencies drown out the rest when put on the same chart scales.

  2. Mahu posted "All this reinforces my idea of R1b-S being original from the Franco-Cantabrian Region and scattered with, possibly, Magdalenian culture. Other possibilities may exist but in any case it requires an origin in SW Europe, where it is quite obviously most basally diverse."

    You are possibly right, but I don't think the Basque Y DNA frequencies necessarily support this. The highest diversity within P312/S116 subclades is U152's. U152 barely shows up in Iberia and is heaviest along the Rhine and into Northern Italy (old Cisalpine Gaul.) See http://www.u152.org/

    If U152's STR diversity is higher than P312*'s then, as the eldest son, shouldn't U152 be present at P312's launchpad through Europe?

    This is all speculative, but I'm just saying the Basque data may actually support a P312 launch elsewhere than in the Pyrenees. It is worth noting that according to the DNA project data, P312's (all) highest diversity is in SE France and the Alps region. This is not far from the Franco-Cantabrian region so I wouldn't rule out your hypothesis either.

  3. @FlyOverMan: It's a choice of words. I chose 'South' because it is distributed mostly south of the North clade, 'Southwest' fits (but longer to write) and West is kind of confusing because the North clade is found predominantly in NW Europe. Both are similarly 'West' in fact: it's 'North' and 'South' within the Western European or generally Western (worldwide) subclade of R1b.

    @Mike: the matter is that U152 is indeed found in SW Europe: West France, Occitania and, at lower frequencies, Vasconia and Iberia. The other clades are not found at high frequencies in Central or North Europe and we still have lots of R1b-S* to be analyzed in SW Europe (while North and Central Europe have been much more thoroughly surveyed by DNA companies). There's very little dark blue in Central Europe: that's the real issue.

    But sure I did not mean that Basque frequencies of U152 alone meant that. Actually they do not: they are low - although widespread, suggesting old dispersal.

    "It is worth noting that according to the DNA project data, P312's (all) highest diversity is in SE France and the Alps region".

    That's part of the Franco-Cantabrian Region in fact (from Asturias to the Alps, from the Pyrenees to the Loire). Anyhow I would survey Perigord, Aquitaine, Poitou... before reaching sharp conclusions. Perigord (Dordogne department) was in fact 'the Metropolis of Upper Paleolithic Europe', hosting very large densities of sites and surely developing both Solutrean and Magdalenian techno-cultures.

    One of our problems is lack of data for Southern France (or for France in general, especially in a regionalized manner), as 'The Hexagon' is obviously not a single homogeneous bloc and has played a major role in the demographic history of Europe. I've been calling for the genetic survey of France, especially the South, for years now. Also for Iberia at a more detailed level for R1b and other clades (in this sense I applaud the detail produced for haplogroup I in this study).

  4. Thanks for this very informative article!
    As a reminder, my Y-DNA is R1b-S-2b (formerly R1b1b2a1a2c), which appears to be more widespread than initially thought (I think it is by far the prevailing lineage in the valley of Aran).

  5. Mahu wrote, "I chose 'South' because it is distributed mostly south of the North clade, 'Southwest' fits (but longer to write) and West is kind of confusing because the North clade is found predominantly in NW Europe. Both are similarly 'West' in fact: it's 'North' and 'South' within the Western European or generally Western (worldwide) subclade of R1b."

    My response:
    I don't think both are "similarly west", at least as far as the far west/Atlantic fringe is concerned. P312/S116 is of exceptionally high frequency along the Atlantic coast, whether it is north or south. That's the problem I have with the word "south" for describing P312.

    The true picture is that P312/S116 is very dense along the Atlantic but of significance further to the east, into Scandinavia, Germany, Switzerland, Northern Italy, etc.

    U106/S21 is not of greater frequency in Scandinavia, just about the same, so the divide is not really north vs. south; rather, there is quite a bit of overlap, with P312/S116 just being very strong to the west.

    Why not just call them P312 and U106? That is objective.

    It doesn't matter. They are both still members of the L11 family.. and of about the same age. You are right, this is a problem for a SW Europe origin for L11(S127). P312 and U106 should be from approximately the same origin.

    U106 just doesn't show up along the Atlantic. The truth is P312's eldest son, U152, doesn't either. It is true that SE France is a good candidate for P312's launchpad. U106 doesn't fit well with that though.

    This is why I lean towards the Alps themselves or the Danube Valley as a launchpad for L11. Some went along to the Rhone valley and the Franco-Cantabrian region whereas others went up the Rhine and others may have even gone along the east side of the Carpathians to the Baltic.

    However, as far as an Ice Age refugium, I don't think the Franco-Cantabrian region fits at all. The age of these L11 subclades can be argued about but they are still younger in Europe than E, G and I, etc. If you're talking about an Ice Age refugium, the Franco-Cantabrian is hard to place timing wise.

  6. @Mike: Whether we prefer a Paleolithic or Neolithic explanation, it's clear that the Atlantic Islands are a destination and not the origin and, in mainland Europe, their distribution is South and North, although of course the coast slides to the East as we go northwards.

    Then again, my theory is that R1b-S originates from the Franco-Cantabrian region, while R1b-N would be from Doggerland (now under the North Sea). If that's correct, then it's almost strictly a South-North axis.

    "The true picture is that P312/S116 is very dense along the Atlantic but of significance further to the east, into Scandinavia, Germany, Switzerland, Northern Italy, etc."

    Most of that (notably Italy, Switzerland) is "South" relative to where R1b-N is dominant. Scandinavia would, like the Atlantic Islands, be a destination and not an origin in any case.

    I could settle for R1b-Main and R1b-North but it's just a matter of choice and I'm not changing the wording of this entry just because.

    "Why not just call them P312 and U106? That is objective".

    Because I have to look at the ISOGG page almost each time a lineage is defined by a mutation name, including these. It's too easy to mix U106 with, say, U105 - and then they are very different lineages.

    "They are both still members of the L11 family.. and of about the same age".

    I have no idea what their ages may be because the "molecular clock" is and has always been a sham. Even Dienekes admitted months ago that it's totally broken, at least for Y-DNA (Y-STR based "clock"), after which an ugly argument with Klyosov ensued.

    "You are right, this is a problem for a SW Europe origin for L11(S127)."

    I would say that the transitional clade R-M412* and its subset R-L11* are from Central Europe or Italy, although the distribution is scattered enough for that to be blurry (they are also common in both forms in the SW). But my impression is Hungary >> North Italy >> SW Europe, which is exactly the route of Aurignacoid industries (excluding the Upper Danube, where L11 but not M412 is found).

    (By the way, I'm posting a larger version of that reference map because the one I linked to is tiny).

    ...

  7. ...

    "P312's eldest son, U152"

    I cannot accept that claim. Unless proven otherwise all clades are of indefinite age, as the Y-STR MC methodology has been declared null and void. And it was never demonstrated right to begin with (which is a shame, especially given how many researchers use it as if it were "God's word").

    "This is why I lean towards the Alps themselves or the Danube Valley as a launchpad for L11".

    L11 maybe but R1b-S not. But the "age" of U152 has nothing to do with it. First R1b-S (P312) must coalesce: there's no direct link between L11 and U152. You cannot explain P312 based on L11, while P312 would be one of three pillars (along with U106 and L11*) in order to discern L11 itself.

    And where did R1b-S coalesce? Where the geometry and/or diversity of its basal subclades point us, which, as far as I can discern, is SW Europe. There's a lot of R1b-S* to discern yet but most of it is in SW Europe, so I am pretty confident that my prediction will be eventually confirmed with the unveiling of that hidden diversity.

    The main doubt is whether it coalesced in the Franco-Cantabrian Region (which demands a Paleolithic origin) or in Portugal (where it could agree with a Neolithic-Megalithic spread). However, the latter would have needed to carry around other Portuguese lineages like E-M81 and (with the occasional exception) we do not find that anywhere.

    "The age of these L11 subclades can be argued about but they are still younger in Europe than E, G and I, etc".

    I simply do not accept any "age" argument: age cannot be inferred from STR variance, at least not with any certainty. We have to explore the history of lineages being "blind" to the temporal dimension, which we can only infer (barring the odd aDNA finding) from the geography.

  8. Mahu wrote, "Unless proven otherwise all clades are of indefinite age, as the Y-STR MC methodology has been declared null and void. And it was never demonstrated right to begin with."

    Nothing is proven beyond a shadow of a doubt when trying to associate genes with old population movements. Only ancient DNA will prove much of anything, and not even that much, because it won't be a representative and in-depth sample for a long, long time, if ever.

    You can declare STR diversity as being null and void as a useful tool but that doesn't make it so. Scientists are still using it, and with good reason, mutations occur with generations and generations occur with time. Diversity is the result of time.

    Yes, you can argue it is a very imprecise tool and the mutation rates are controversial, but at worst case relative STR diversity between clades provides information as to the relative timing (if not absolute timing) of clades. That's why I say U152 is P312's eldest son. It has the highest diversity.

    Busby's paper has a couple of major holes in it. I don't know if it is worth the time to go through it here, but Busby in fact does use STR diversity in building his basic counter-argument to Balaresque. Busby doesn't declare STR diversity null and void; they just say Balaresque used the wrong markers and L11(S127) is about the same age all across Western Europe. To me, this finding is significant in and of itself. The L11 family must have spread quickly.

  9. "@FlyOverMan:"
    I apologize. I'm not sure why that identity showed. The identity should say Mike or MikeW.

  10. It's not beyond a shadow of doubt, it's minimal scientific credibility. When C14 was first implemented it had to pass some tests (accurately dating items of known age such as Egyptian mummies or Greco-Roman artifacts) before people began trusting it. This, which seems only a minimal requirement for any allegedly scientific claim, is disregarded in the case of the molecular clock, which has not been proven even once. The MC is a mere scholastic construct with weak theoretical basis and, crucially, zero empirical evidence.

    And it's Maju, with "j" and pronounced closer to the English "j" than to the Spanish "j" (kh), though it's maybe more like a "y": mah-yoo.

    "You can declare STR diversity as being null and void as a useful tool but that doesn't make it so".

    It's not just me: it's a landmark scientific paper (signed by 28 experts from 24 different faculties and similar institutions from all over Europe). Oddly enough, even Dienekes (a long time defender of the shortest possible version of the MC conjecture) admitted it was correct and sound and that it dealt a death blow to the method.

    "Scientists are still using it"...

    They are obsolete. Sadly it happens a lot: just as they still claim time and again that the Pan-Homo divergence would be 7-5 Ma, when it's been demonstrated time and again that it can't be less than 8 Ma.

    In any case it has never been demonstrated to any level of satisfaction: it is just a complacent belief in the power of statistical inference and, through it, in the power of population genetics beyond what it can really do. The greater the claim, as long as it is believed, the greater the gain and the prestige... but that has more to do with economy and power than with science: it actually comes dangerously close to religion (which uses the same scam methodology: everybody says it, join the bandwagon, don't be a loony).

    "but at worst case relative STR diversity between clades provides information as to the relative timing"...

    I am totally skeptical. STR neighbor-joining trees can sometimes give some information in the absence of proper SNP-based phylogenies but otherwise their power of inference has been shown time and again to be low.

  11. As for the last thing you say, it'd be best to debate in the proper entry, but anyhow, I quote from the paper:

    Contrary to common belief, estimates of ASD [average squared distance], and therefore T [coalescence time], vary widely when different subsets of STRs are used with the same sample.

    ...

    Interestingly, despite the fact that Myres et al. and Balaresque used different STR mutation rates and dating approaches, their TMRCA estimates overlap: 8590–11 950 years using a mutation rate of 6.9 × 10⁻⁴ per generation, and 4577–9063 years using an average mutation rate of 2.3 × 10⁻³, respectively. Separately, Morelli calculated the TMRCA based only on Sardinian and Anatolian chromosomes, and estimated the R-M269 lineage to have originated 25 000–80 700 years ago [22], based on the same evolutionary mutation rate [25,41] as Myres et al.

    ... we found that different sets of STRs gave different values for T. It is clear, then, that coalescence estimates explicitly depend on the STRs that one uses.

    They do seem to hope to be able to find a realistic formula in the future but they feel unable to offer one that gives any guarantees as of now:

    For now, we can offer no date as to the age of R-M269 or R-S127, but believe that our STR analyses suggest the recent age estimates of R-M269 [20] and R-S116 [21] are likely to be younger than the true values, and the homogeneity of STR variance and distribution of sub-types across the continent are inconsistent with the hypothesis of the Neolithic diffusion of the R-M269 Y chromosome lineage.
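
The dependence of T on the chosen STRs quoted above can be illustrated with a minimal sketch. It assumes the simple textbook estimator T ≈ ASD / μ (average squared distance, in repeat units, to a presumed founder haplotype, divided by a per-generation mutation rate), which may not be the exact procedure used in the paper. The locus names, the haplotypes, the founder and the 25-year generation length are all hypothetical; only the mutation rate is taken from the quoted passage.

```python
from statistics import mean

# Hypothetical founder haplotype and sample: repeat counts at four made-up loci.
founder = {"A": 14, "B": 12, "C": 24, "D": 11}
haplotypes = [
    {"A": 14, "B": 12, "C": 25, "D": 11},
    {"A": 15, "B": 12, "C": 23, "D": 11},
    {"A": 14, "B": 12, "C": 26, "D": 11},
    {"A": 14, "B": 13, "C": 24, "D": 11},
]


def asd_to_founder(haplotypes, founder, loci):
    """Average squared distance to the founder over the chosen loci and all samples."""
    return mean(
        (h[locus] - founder[locus]) ** 2
        for h in haplotypes
        for locus in loci
    )


def coalescence_years(haplotypes, founder, loci, mu, years_per_generation=25):
    """T = ASD / mu generations, converted to years (generation length is an assumption)."""
    generations = asd_to_founder(haplotypes, founder, loci) / mu
    return generations * years_per_generation


mu = 2.3e-3  # one of the per-generation rates mentioned in the quoted passage

# Same sample, same rate, two different STR subsets.
t_ab = coalescence_years(haplotypes, founder, ["A", "B"], mu)
t_cd = coalescence_years(haplotypes, founder, ["C", "D"], mu)
print(f"loci A+B: {t_ab:.0f} years; loci C+D: {t_cd:.0f} years")
```

With this toy data the two subsets disagree by a factor of three even though the sample and the rate are identical, which is the kind of instability the quoted authors describe.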

  12. I hope that one day we can actually get Y-DNA from Paleolithic skeletons, or can get a more complete sequence from some skeleton from Paleolithic Europe. Ancient DNA is probably the only way to finally resolve where and when the different R1b groups developed.

    I don't trust the molecular clock; I think Dienekes' discussion of the topic best fits my understanding. So until we get enough ancient DNA, I don't think it can be resolved.

    The distribution of the Magdalenian and R1b-L23 (in Western Europe) is so close that I would like to believe there is a connection.

    Any explanation of the distribution needs to explain how it became so dominant along the Atlantic region if it wasn't originally there.

    However, I also find it hard to believe that any of the Paleolithic Y-DNA survived the influx of Neolithic people (with much higher population levels) into the area. Men tend to fare badly during such changes.

    At any rate, I am holding out for Ancient DNA for any more definite conclusions.

    Cheers!

  13. Maju, a quick question: The Basque people are Euskara, the Norsk Vinland saga refers to Skraelings, might the Norsk have believed that the natives there were Basque?

    I figure the vik-ing (veg/vey/way + ing) Norsk knew that Basque people were whaling and fishing folks, but possibly they had never met them personally, so didn't know the clothing or culture. Does Eusk- mean something?

    Replies
    1. "the Norsk Vinland saga refers to Skraelings, might the Norsk have believed that the natives there were Basque?"

      For sure not. It's widely acknowledged it's an appellative for Native Americans of some kind.

      There were surely no Basques in Newfoundland so early in time. Basque presence in Newfoundland is only reliably attested since 1520 or so. They were surely there earlier, but they were also maybe following the tracks of the Portuguese, who most likely knew of America before the Castilians; some even say they sent Columbus to distract the Castilians from Portugal's primary goals: Africa (gold) and the African route to the Indies (spices).

      It's also possible that Basques "discovered" Newfoundland on their own from either Ireland (where they went to fish traditionally) or the Portuguese base of Azores. The Azores, which are half-way to Newfoundland, may have been known since 1350 or so (although settlement only began in the 1440s).

      Neither the discovery and settlement of the Azores (largely by Flemings) and other Atlantic islands (the Canary Is. were first a Norman feudal claim) nor the fishing expeditions to Newfoundland were in any case a matter of any single people (at least before the states monopolized them) but generically of various Atlantic European sailing populations. In the case of Newfoundland we find other peoples from "France" and "Castile", notably Bretons and Galicians. While Basques are the most famed and surely the core, they were not the only ones involved.


      Euskara is the name of the Basque language, which comes from a root eusk- and suffix -ara or -era (euskera also exists). -era/-ara means "language" or "dialect" or, more generally, "mode", "way of...". The root "eusk-" is more debated but IMO must be that of the verb "eutsi", which means "to persist" or "to sustain", also used in the meaning of "to resist". So maybe "euskara" means "the way (or language) of persistence". Eutsi > eusk- is well attested: euskarri (pillar), euskailu (bowl, cup, container), etc.

      Whatever the case, the name for Basques is traditionally euskaldun, which is short for euskara duen(-a) = who has the euskara or Basque language. In other words: Basque speaker. In modern times, as Basque proficiency has been decaying between persecution and lack of institutional support, neologisms have been invented to mean Basque in the socio-political sense, such as euskotar. But these are not traditional words: euskalherriko is more genuine but long, while euskaldun has attained a second, less linguistic meaning as well.

