April 27, 2012

Ancient mitochondrial DNA from the Basque Country and Cantabria: unmistakable mtDNA H in Magdalenian Cantabria

This seems to be a very important paper because, if everything is correct, it is the first peer-reviewed publication to establish conclusively that mitochondrial haplogroup H, the most common matrilineage of Europe today, existed in Paleolithic populations in Europe, specifically in Cantabria, Spain.

There have been previous reports of a significant presence of mtDNA H in the region, notably Chandler 2005 (on Epipaleolithic and Neolithic Portuguese) and Kéfi 2005 (on Upper Paleolithic Moroccans). However, for some reason, neither study was formally published, which some have used to grasp at straws in claiming that this lineage arrived in Europe (and North Africa) only after the Neolithic, and to oversimplify the Paleolithic gene pool of Europe as consisting almost only of variants of haplogroup U (never mind that U is such a large, ancient and widespread haplogroup that saying "U" is almost like saying nothing).

Edited section: the authors did test for RFLPs, so the haplogroup assignment is as good as it can be.

After my first reading of the paper, my impression was that they had only tested the HVS (both segments, I and II), leaving some doubt, however small, about the assignment of the Cantabrian Magdalenian lineages to mtDNA H.

But, luckily for the truth, Jean Lohizun (see comments) found that the authors, who are partly the same team as Izagirre & De la Rúa 1999 and use the same methodology up to a point, had actually also tested a standard list of coding-region RFLP markers. However, the mention is so cryptic that it is very hard to spot:

In order to classify the mitochondrial variability of the individuals analyzed in this study, we proceeded to amplify 11 markers, which are required for defining the 10 Caucasian haplogroups described [63]. The protocol and primers are described in [17], [20], [42]. The digestion patterns were verified using a fragment Bioanalyzer (Agilent Technologies).

This seemed to imply that when the authors say "H" they mean -7025a (aka -7025 AluI), but this marker as such is not listed anywhere that we can easily see.

So I racked my brain until I concluded that Jean must be correct, but a very slight doubt remained, so I emailed the authors, who quickly responded, confirming Jean's keen reading.

Excerpts and translations:

Concepción De la Rúa (corresponding author) said:

Para conocer con más seguridad el haplogrupo, se hicieron además  RFPLs, que es una tecnica que permite evaluar la existencia de mutaciones puntuales en la region codificante del mtDNA.
Translation: In order to determine the haplogroup with greater certainty, RFLPs were also performed, a technique that allows the evaluation of point mutations in the coding region of mtDNA.

She added, in relation to a previous study she co-authored:

Ambos analisis se realizaron en el paper recien publicado, pero no en el paper de Izagirre & de la Rúa de 1999, en donde solo se hicieron RFPLs.

Translation: Both analyses [RFLPs and HVS] were performed in the recently published paper, but not in Izagirre & de la Rúa 1999, where only RFLPs were tested.

Montserrat Hervella (lead author) said:

También, hemos realizado PCR-RFLPs con el fin de determinar los SNPs de la región codificante y así reconfirmar el haplogrupo al que pertenecen cada uno de los haplotipos que hemos obtenido mediante secuenciación. Los enzimas que hemos utilizado aparecen en varias citas bibliográficas en el articulo (17, 20, 42, 63).

La cuestión que nos plantea en torno a la determinación del haplogrupo H, la respuesta es que si hemos usado la enzima Alu I para determinar el SNP en la posición  7025 de la region codificante del ADNmt.

Translation: We have also performed PCR-RFLPs in order to determine the coding-region SNPs and thus reconfirm the haplogroup to which each of the haplotypes obtained by sequencing belongs. The enzymes we used appear in several bibliographic references cited in the article (17, 20, 42, 63).

As for your question regarding the determination of haplogroup H, the answer is that we did indeed use the enzyme AluI to determine the SNP at position 7025 in the coding region of mtDNA.
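
To make the "-7025 AluI" criterion a bit more concrete, here is a minimal in-silico sketch in Python (3.9+). It relies only on the standard behaviour of the enzyme: AluI recognizes AGCT and cuts between G and C (AG^CT). The amplicon sequences and the site_start offset below are hypothetical toy values, not real mtDNA around np 7025 and not the authors' primers or protocol. The only point illustrated is the scoring logic: an amplicon that is not cut at the expected position (site lost, "-7025 AluI") is scored as H.

```python
"""Toy in-silico AluI digestion illustrating how '-7025 AluI' scores haplogroup H.

Assumption: the amplicons used here are invented toy sequences, not real mtDNA.
"""

ALUI_SITE = "AGCT"  # AluI recognition sequence
ALUI_CUT = 2        # AluI cuts AG^CT, i.e. 2 bases into the site (blunt ends)


def alui_cut_positions(seq: str) -> list[int]:
    """Return every index in seq at which AluI would cut."""
    seq = seq.upper()
    cuts = []
    i = seq.find(ALUI_SITE)
    while i != -1:
        cuts.append(i + ALUI_CUT)
        i = seq.find(ALUI_SITE, i + 1)
    return cuts


def alui_digest(seq: str) -> list[str]:
    """Split seq into the fragments an AluI digestion would produce."""
    bounds = [0] + alui_cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def score_7025(amplicon: str, site_start: int) -> str:
    """Score a sample as H if the AluI site expected at site_start is lost.

    site_start is a hypothetical offset: where np 7025 falls inside the toy amplicon.
    """
    if site_start + ALUI_CUT in alui_cut_positions(amplicon):
        return "non-H"  # site present: +7025 AluI
    return "H"          # site lost: -7025 AluI


if __name__ == "__main__":
    non_h = "TTGCAAGCTGGAC"  # toy amplicon with an intact AGCT at offset 5
    h = "TTGCAAGCCGGAC"      # same toy amplicon with the site destroyed (AGCC)
    for label, seq in (("sample A", non_h), ("sample B", h)):
        print(label, alui_digest(seq), score_7025(seq, site_start=5))
```

In a real lab setting the same decision is read from the band pattern of the digested PCR product (here, per the quoted methods, verified with a Bioanalyzer) rather than from a sequence string.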

This should dispel any doubt: there was mtDNA H (and H6) in Magdalenian Cantabria.

(End of the edited section)

In any case, these are the results as reported by the researchers (and synthesized by me in a single image):

Note that the dots are not very precise as geographic locators (I may try to create a better map later).

Instead of using numerals, I repeated the name of each reported haplogroup to give a more visually intuitive impression (without going through the process of creating charts). 

The results for Paternabidea (tested for coding region markers) were discussed in November. They basically seem to establish that modern Basques are almost identical to Neolithic Basques.

But what about Paleolithic ones? The results of this study suggest a little bit of both: on the side of continuity, the same U5 sequence is reported for Erralla (Magdalenian) and Marizulo (Neolithic); on the side of change, all Paleolithic Basque sequences (n=2) are U5 lineages, while U5 is a much less important matrilineage today. 

It may well be argued that this result is a fluke, a coincidence (the sample is tiny), but it is consistent with what we see in Epipaleolithic Central and Northern Europe (mostly U5 with some U4, Bramanti 2009), suggesting that, while the model of recolonization of much of Europe from the Franco-Cantabrian region may well be correct, the story does not end there at all.

On the other hand, we see that the dominant lineage today among Basques and Western Europeans in general, haplogroup H, was already present in Cantabria in Magdalenian times, which is consistent with its being found in Epipaleolithic Portugal and Oranian Morocco and really puts to rest the hypothesis that it arrived from West Asia with Neolithic colonists (something argued as a matter of fact but without any clear evidence). 

Food for thought (a hypothesis)

Lately, although I have yet to formalize it, I've been thinking that a serious possibility is that mtDNA H might have spread partly with Dolmenic Megalithism. While there is some apparent Neolithic expansion of H in the Basque area (visible in this paper and resulting in an almost modern gene pool), in other parts of Europe this is less obvious, with H showing up but not reaching modern levels during the Neolithic alone. In fact, the loss of other Neolithic lineages like N1a strongly suggests that the populations of Central Europe were largely replaced after the Early Neolithic. Where from? Again from Southwest Europe, I suspect, but not in the context of a Magdalenian expansion, as was once believed, but perhaps in the context of the Megalithic expansion instead, originating not in the Franco-Cantabrian region but in Portugal. 

This is just a draft hypothesis that I have mentioned before only in private discussions or, at best, in a comments section somewhere, and it would certainly need more research. My main argument is that, before these results for Cantabria, the only pre-Neolithic location where the data clearly suggested very high levels of mtDNA H (near 75%) was Portugal (Chandler 2005), and Portugal played a major role in Neolithic and Chalcolithic Western Europe. As I have said before, the ancient Portuguese apparently developed the Dolmenic Megalithic phenomenon (culture, religion...) and later some of the earliest Western European civilizations, from the third millennium BCE onward, especially Zambujal, which were central elements of this long-lasting Megalithic culture and later of the Bell Beaker phenomenon as well.

And we need very high levels of mtDNA H in a colonizing population to change the genetic landscape from c. 20% H to c. 45% H: something like a 70-80% H ancestral population, unless replacement was total, which I think unlikely. 
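
The arithmetic behind this claim can be checked with a simple two-source mixing model, sketched below in Python under loudly stated assumptions: a single admixture event, no drift, no selection and no sex-biased migration, with the c. 20% (before) and c. 45% (after) H frequencies taken from the paragraph above. Solving m * colonizer_h + (1 - m) * native_h = target_h for m gives the share of later maternal ancestry the newcomers must contribute; m = 1 means total replacement, and m > 1 means the target cannot be reached at all.

```python
def required_colonizer_share(native_h: float, target_h: float, colonizer_h: float) -> float:
    """Fraction m of colonizer ancestry needed so that
    m * colonizer_h + (1 - m) * native_h == target_h (simple two-source mixing).
    """
    if colonizer_h <= native_h:
        raise ValueError("colonizers must carry more H than the natives to raise its frequency")
    return (target_h - native_h) / (colonizer_h - native_h)


if __name__ == "__main__":
    native, target = 0.20, 0.45  # c. 20% H before, c. 45% H after (figures from the text)
    for colonizer in (0.33, 0.45, 0.60, 0.75, 0.80):
        m = required_colonizer_share(native, target, colonizer)
        verdict = "impossible (more than total replacement)" if m > 1 else f"{m:.0%} of maternal ancestry"
        print(f"colonizers at {colonizer:.0%} H -> {verdict}")
```

Under these simplifying assumptions, a source population at c. 75% H would need to contribute roughly 45% of later maternal ancestry, while a source at 45% H would require total replacement, which is the point made above.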

What happened to that overwhelmingly H population of Portugal (assuming that the hypothesis is correct)? They were probably colonized in due time, possibly in the Bronze Age (the mysterious archaeological "horizons" that replaced urban life in much of Southern Portugal in that period, with their strange crab-shaped elite tombs vaguely resembling Mycenaean circular walled ones), and/or during the Celtic invasions from inland Iberia later in the Iron Age (Hallstatt periphery).

It's just a tentative hypothesis so far but one that I believe I must state for your consideration.

Aerial view of part of the acropolis of Zambujal (the roofed area is a modern farm that sits on top of it)


  1. Maju, I looked at the "Analysis of the variability of mtDNA" section, and it turns out they did test for the "11 markers/SNPs that define the 10 Caucasian haplogroups."

    See here:

    "Sequencing of mtDNA HVR-I, nucleotide positions (nps) 15,998–16,400, and mtDNA HVR-II, nps 16504-429 as per [61], was undertaken in six overlapping fragments, each with a length of approximately 100 bp (base pair). HVR-II sequencing was carried out in samples with no polymorphic positions in HVR-I (Table S6). Similarly, the fragment between primers 8F and 8R (Table S6) was amplified in all samples to determine position 73 of HVR-II. The PCRs were performed in 25 μl of reaction mixture containing 10 mM Tris-HCl pH 8.3; 2 mM of MgCl2, 0.1 μM of each dNTP, 0.4 μM of each primer, 5 units of AmpliTaq Gold (Applied Biosystems) and 10 μl of diluted DNA (1 μl of DNA extract in 10 μl of 1 mg/ml BSA). Cycling parameters were 95°C for 10 min; followed by 40 cycles of 95°C for 10 sec, annealing temperature for 30 sec, 72°C for 30 sec; and a final step of 72°C for 10 min. The annealing temperatures of the primers of HVR-I were as follows: 60°C for the A1/A1R primer pair, 58°C for 2F/2R and 4F/4R, 57°C for 1F/1R and 55°C for 3F/3R and 5F/5R, the primer sequences are listed in [62]. The sequence and annealing temperatures of the HVR-II primers are shown in Table S6. The amplification of each fragment was undertaken in independent PCRs and each fragment was amplified and sequenced twice from two independent DNA extract. In the case of positive amplification and the absence of contamination, the amplifications were purified by ExoSAP-IT (USB Corporation), with subsequent sequencing in an ABI310 automatic sequencer using chemistry based on dRhodamine. The results obtained were edited with BioEdit software (http://www.mbio.ncsu.edu/BioEdit/bioedit.html) and the sequences were aligned manually.

    In order to classify the mitochondrial variability of the individuals analyzed in this study, we proceeded to amplify 11 markers, which are required for defining the 10 Caucasian haplogroups described [63]. The protocol and primers are described in [17], [20], [42]. The digestion patterns were verified using a fragment Bioanalyzer (Agilent Technologies)."

    So it seems to me they did test for coding regions.

    1. I do not see that: the 11 markers analyzed are probably HVS markers. There is no HVS marker that defines H, however, and had they analyzed coding-region SNPs they would have published the results, which they did not.

      Sadly not.

    2. If you look at the reference they give, which is listed as 63, you will see the following study:

      63.Macaulay V, Richards M, Hickey E, Vega E, Cruciani F, et al. (1999) The emerging tree of West Eurasian mtDNAs: a synthesis of control-region sequences and RFLPs. Am J Hum Genet 64: 232–49.

      So the 11 markers refer to control-region sequences that define 10 West Eurasian haplogroups.

    3. Then why haven't they listed them anywhere, as would be almost standard practice?

    4. Sorry: "control region" is synonymous with HVS, D-loop or HVR. It's clear that they never tested for coding-region sites (sadly enough).

    5. Here is the paper:


      I believe that the 11 markers refer to RFLP markers; otherwise, why mention them, when they had already said that they sequenced HVR-I and HVR-II? Moreover, this is from the referenced paper:

      "By using high-resolution RFLP analysis, Torroni et al. (1994a) had previously identified four clusters (H, I, J, and K) among North Americans of European ancestry. Subsequently, Torroni et al. (1996) applied the same methodology to two Scandinavian population samples and identified five additional clusters (T, U, V, W, and X), which, together with the previous four clusters, appeared to encompass virtually all examined European mtDNAs."

  2. Do you know of the paper published back in 1999 by two of the authors of this study, Neskuts Izagirre and Concha de la Rúa? This is an excerpt from that paper:

    "We typed those nucleotide positions strictly necessary to correctly identify the nine haplogroups characteristic of Caucasians: 7025 AluI, 4577 NlaIII, 8249 AvaII, 9052 HaeII, 8994 HaeIII, 12308 HinfI, and 13704 BstNI."

    So it is very likely that, when the authors mention the 11 markers, they are referring to 7025 AluI, 4577 NlaIII, 8249 AvaII, 9052 HaeII, etc.

    1. After due consideration, you may well be right, Jean. But the reference is so cryptic that I'm still in doubt: they dwell at length on HVS I and II and never mention the RFLPs again, neither in the rest of the paper nor in the supplementary materials.

      If you are right, then the reported haplogroups of Izagirre & De la Rúa for the peripheral Basque Country in the Metal Ages are more or less solid (and I was displaying them separately as merely "reported" because I could not check the sequences!). It is also consistent with the paucity of details provided in previous studies by this same team.

      Well, thanks for your insistence in any case, because you may have pointed out a key issue, although right now I do not know how to resolve it other than by writing to the authors, which I may still do.

    2. I've been going through it all again and you must be right: the original quote explicitly mentions digestion patterns, so there's no other option. Still, it would not have been too much to ask for them to be a bit more explicit about it, in Table 2 for example (or somewhere).

  3. I have just sent an email to Concepción De la Rúa, who is the corresponding author. I am already 99% persuaded that you must be right, but if I don't ask, I'll burst.

    1. I already have the answers: they did indeed use RFLPs, and that means that when they say H, it means H. I will edit the entry accordingly.

  4. About N1a, I wouldn't worry too much, because LBK has been documented to pick up lots of random haplogroups along its way, and it also started off with a more or less random collection in the Balkans.

    As I have stated elsewhere, the H/U/V conundrum could possibly be explained by a "spring" effect: just before LBK, U was the last haplogroup to seek refuge, in the northernmost Franco-Cantabrian refugia, and then (of course) was also the first to resettle the North, with H "springing back" later and behind. Just a thought.

    1. Maybe, but the question is when and how mtDNA H (notably) re-expanded so as to form the modern gene pools. Here I am proposing a working hypothesis for that, regardless of what happened before.

  5. "unmistakable mtDNA H in Magdalenian Cantabria"

    At last. Vindicated. :)

  6. " What happened to that overwhelmingly H population of Portugal (assuming that the hypothesis is correct)?"

    There is an interesting country not far from Portugal, also forming part of the Franco-Cantabrian region. This country is rich in H mtDNA, megalithic monuments and ancient rock art.





    1. The text by Alinei is incorrect in many respects; to give just one example, fig. 9 has almost nothing to do with the reality of Dolmenic Megalithism. It's an ideological rant and little more.

      AFAIK, the Upper Paleolithic (UP) of Galicia and North Portugal (historical Galicia) is restricted to the areas not far from Asturias, especially Valiña, near Monforte. The rest of the country only shows strong indications of habitation by modern H. sapiens from the Neolithic onward. This may be a mere coincidence, but that is hard to imagine considering the density of findings in nearby areas like Asturias or even Estremadura (Portugal).

      It's likely in any case that some people lived in Galicia (besides the near-Asturias strip) in the Paleolithic/Epipaleolithic era, BUT the real signs of habitation only begin with the Neolithic and, very especially, with Megalithism, both most likely originating in Portugal.

      In fact, Galicia and other West Iberian areas (León, Extremadura, Asturias, West Andalusia, even parts of Cantabria) share a gene pool with Portugal, including some minor North African elements that are lacking in the rest of the peninsula. The exact reason for this shared West Iberian gene pool is debatable, but in the case of Galicia I'd say it's because its origins are in Portugal and Asturias. A less clear issue is why Portugal and Asturias are so similar... but I don't want to write too much in a single comment.

      In any case, there is not a single cultural phenomenon that we can say is particularly centered in Galicia. Megalithism and its successors (Bell Beaker, Atlantic Bronze) include Galicia, but I see no reason to imagine Galicia as central rather than just one more country in such a broad network.

      With South Portugal, instead, I do have plenty of reasons: its Megalithism is older than any other by at least a thousand years, appearing soon after the Neolithic (the very first Dolmenic Megalithism on Earth); the area hosted many towns, typically fortified, from the third millennium BCE, notably the already mentioned Castro do Zambujal (Torres Vedras), whose true dimensions remain unexplored and which was the capital of a relatively small but prosperous and surely influential civilization of Estremadura (both Portuguese peninsulas of Lisboa and Setúbal).

      If any other country played a major role in the spread of early Megalithism in Atlantic Europe, often linked to the first Neolithic, it was Brittany (more precisely Brittany + Loire + Upper Normandy = ancient Armorica). Its peculiarly hierarchical (priestly?) burials may especially have influenced British Megalithism and, I imagine, the origins of Druidism much later in time [Druidism is not originally Celtic: Iberian Celts had no druids, and the Vaccei were even described as "atheists", because they had been isolated from the mainland and Britain since before La Tène; Roman historians attested that Druidism had only recently arrived in mainland Europe from Britain].

      Another reason to say Portugal is that, using Chandler 2005 as a reference, and regardless of whether I consider the reported lineages or only those whose assignment I can confirm using PhyloTree, the proportion of mtDNA H among Neolithic Portuguese was c. 75%, something without comparison in any modern population (or in any ancient one that we know of). And we'd need something like that to generate mtDNA pools of c. 45% H after admixture with populations of Central-North Europe, where only some 20-25% H was reported at the beginning of the Neolithic. You either need LOTS of extra H or must resort to a hypothetical biological adaptiveness of this haplogroup.

      I would not totally discard the latter but I'm looking for alternative explanations first of all.

  7. Maju,

    I agree with the idea that mtDNA H could have spread with Megalithism, and all from a source in Southern Portugal.

    One thing that may need to be explored is whether this Southern Portuguese population was intrusive from NW Africa or native?!

    1. I must say that I have many doubts myself, but some of the data do suggest such an idea, while other data might point the other way.

      For example, there is, as we know, some North African genetic influence in West Iberia. If this influence is early Neolithic or even Paleolithic and the Megalithic replacement hypothesis is correct, we should also see it scattered through Atlantic Europe, and we do not (it is extremely thin or localized at most).

      Also, two Megalithic samples (West France, West Sweden) have produced mtDNA pools that are only 33% H, considerably lower than the hypothesis seems to demand; while a bit higher than other Neolithic samples, they are not really out of line with them, nor are they modern enough yet.

      We probably lack some data or some better explanation than mine.

  8. Firstly, thanks for this great work and discussion. While I am a novice in DNA, I have a fair amount of knowledge of South West Iberia, including deep scan data (see www.Merlinburrows.com). The core piece I think you are missing in your hypothesis is that this region was absolutely devastated by a 300 m mega-tsunami that would have killed a huge part of the population. As a small part of my evidence I use the Betica Depression and the inland sea (the last of six over the millennia was about 12,000 years ago). This would significantly affect your results, as the source population was depleted, and would also account for the movement of that population. You have to understand that the (large) Dolmens are markers of such a cataclysm. See the position of Dolmen de Soto and the Dolmen de Alberite. These indicate that the movement of the dolmen builders was forced. Do you have any update on this great work you have done?

    1. That's highly speculative reasoning, Andreas. Even if what you say about the tsunamis is correct, Iberia is very rugged and elevated territory, and even the largest tsunami, or even a total melting of the polar ice, would only submerge small areas. I trust much more empirical data on actual populations, such as archaeological findings, their density over time, ancient DNA, etc.

      Dolmens are clan tombs; they exist both near the sea and on mountaintops, and their "invention" in SW Iberia c. 4800 BCE and their expansion elsewhere many centuries later are totally unrelated to any tsunami. On the other hand, it has been speculated that it could be related to cod fishing.

