June 7, 2014

Y-DNA macro-haplogroup K-M526 originated in Indonesia

Most probably it did, although there is always some uncertainty. This is what a new study demonstrates almost beyond doubt.

Tatiana M. Karafet et al., Improved phylogenetic resolution and rapid diversification of Y-chromosome haplogroup K-M526 in Southeast Asia, EJHG 2014. Pay per view [doi:10.1038/ejhg.2014.106]

It also demonstrates that "Australasian" haplogroups M and S, as well as several other K sublineages from that area belong to the same subhaplogroup, "brother" of P and "cousin" of NO. 

The sample, focused on SE Asia and Oceania, is quite massive (4413 K-M526 samples), so there is very limited chance that further studies will produce major changes in this understanding. However there are some geographic blanks like Myanmar, which can produce surprises when they are finally properly studied. Mitochondrial DNA from the Bamar (ethnic Burmese) was shown in a recent study to have very high top-level diversity, suggesting that their ancestors played some key role in the formation of the peoples of Asia and beyond.

But while we wait for those future studies, or even the political chance to perform them, let us see what this excellent paper can tell us.

First of all the new data allows for a re-drawing of the K haplogroup tree, including renaming proposals:

For easier understanding, I annotated in red the new version of the tree with the populations carrying each of the sublineages in SE Asia and Australasia (but excluding island Oceania because of its recent colonization date and simplicity). I also annotated in green the proposed timeline of formation of various nodes downstream of K, per this study:

The presence of so many basal haplogroups and paragroups (signaled with an asterisk) in Island SE Asia makes it compulsory to accept that not only K2 (formerly known as K(xLT) or MNOPS and right now listed in ISOGG as just K) but also its descendants K2b, K2b1 and K2b2 (P) must have originated in what is now the Malay Archipelago but was once a large emerged peninsula known as Sundaland.
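For readers who prefer to see the branching written down, here is a minimal sketch of the proposed nomenclature as a nested structure. It only encodes the labels used in this post (K1/LT, K2, K2a/NO, K2b, K2b1 with M and S, K2b2/P, K2c, K2d, K2e); the names `TREE`, `path_to` and `mrca` are made up for illustration, and the many minor sublineages and paragroups reported in the paper are left out.

```python
# Illustrative only: the proposed K-M526 nomenclature as described in this post.
# Minor sublineages and paragroups are omitted; K2e is placed directly under K2,
# although its exact position (possibly close to NO) remains to be confirmed.
TREE = {
    "K": {
        "K1 (LT)": {"L": {}, "T": {}},
        "K2": {
            "K2a (NO)": {"N": {}, "O": {}},
            "K2b": {
                "K2b1": {"M": {}, "S": {}},
                "K2b2 (P)": {"P1": {}},
            },
            "K2c": {},
            "K2d": {},
            "K2e": {},
        },
    }
}


def path_to(label, tree=TREE, trail=()):
    """Return the chain of nodes from the root down to `label`, or None."""
    for name, children in tree.items():
        here = trail + (name,)
        if name == label:
            return here
        found = path_to(label, children, here)
        if found:
            return found
    return None


def mrca(a, b):
    """Most recent common ancestor node of two labels present in TREE."""
    pa, pb = path_to(a), path_to(b)
    common = [x for x, y in zip(pa, pb) if x == y]
    return common[-1]
```

With this sketch, `mrca("M", "K2b2 (P)")` gives `"K2b"` (M as "brother" of P) and `mrca("S", "K2a (NO)")` gives `"K2"` (S as "cousin" of NO).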

This is my reconstruction of the likely centroids of K2 sublineages (named) and the K2* paragroup (stars):

The map originally included several work layers in order to analyze the geographical scatter of the downstream haplogroups within K2b but, for visibility reasons, I chose to make them invisible.

Instead I made the following map of approximate plausible routes for the various sublineages of K2:

I must say that K-247, labeled here as K2e but reported as a close relative of NO in a previous study (which named it "X"), and which is found only in India (reported in two men), may add some extra complexity to the K2a (NO) arrow. It is for example possible that K2a'e and P migrated northwards jointly, parting ways somewhere in Indochina (K2e migrating to India with P1, and maybe some already formed Q remaining in Indochina as well). This matter however requires more investigation, and so far other possibilities, such as later independent minor flows between South and SE Asia, are equally likely.

Although not detailed enough to capture the nuances of the rare basal sublineages found in the various populations of Island SE Asia, this map may be of help for some in order to illustrate the importance of patrilineal haplogroup K-M526 globally:

Overall this study underlines and vindicates my repeated claim that SE Asia also played an important role in the formation of the Asian+ branch of Humankind, together with South Asia. Something I have repeatedly suggested is that mtDNA macro-haplogroup N appears to have coalesced in SE Asia, while its most prolific "daughter" R instead seems to have originated in South Asia, but that both have left a legacy East and West of the Brahmaputra regional divide.

I am not sure how exactly to couple mtDNA N/R with the spread of Y-DNA K2, but it seems almost certain that they are related to a great extent.

I also suspect that the Toba supervolcano catastrophe may well have caused enough damage to allow for a sudden expansion of one or several human populations after it. I would think that the Toba catastrophe marks the beginning of the expansion of Y-DNA K2 and mtDNA N, although it is quite possible that some other lineages like C were also involved in secondary roles in this secondary, yet so influential, expansion in Asia and Oceania.

Another possible element which may have aided this expansion could be dog domestication, which, although so far it cannot be documented before 33,000 years ago in Altai, is suspected to have happened first in SE Asia.


  1. There is another explanation. If you look at glacial maximum maps, you will see that the only tropical/rain forest zone that also had a gateway to other places of the world was Sundaland. That is, with the exception of Southern Africa, which was cut off by an immense super arid zone, much larger than the actual super arid zone of Sahara in the Holdridge system (which is actually now a rather small stripe), and of India. So, maybe this is not something so old.

    1. I don't follow your logic at all, sorry, and anyhow it seems that there is a misconception in your idea of "an immense super arid zone, much larger than the actual super arid zone of Sahara".

  2. I got it from here:


    If you look at the average precipitation in the Sahara, you will see that it doesn't reach values as extremely low as those described for the LGM. And if you look at its actual map, in satellite view, you will see that the zone of extreme aridity is actually a thin stripe.

    1. Alright. Interesting (although I miss a legend for those maps I can guess the meanings more or less).

      The key issue here is that LGM-style conditions only applied to that period. Although there were fluctuations in the Mid-Late Pleistocene, except for some brief periods, mostly caused by super-volcanoes, the glaciation conditions were in general milder.

      Here we are talking (in my understanding) of the period between the Toba event (~74 Ka BP) and the well documented colonization of Western Eurasia ~50-40 Ka BP, including Altai (key in tracking Y-DNA Q to Siberia and America). While the immediate millennia after Toba were probably extreme, conditions later ameliorated until the HE4 (Campanian Ignimbrite event, ~40 Ka BP), so there is a bracket of 30 millennia when those extreme conditions did not exist.

      Even with the super-volcanoes' cooling effects, there was never anything like the LGM since the previous LGM, c. 140 Ka ago (before the OoA): → http://anthro.palomar.edu/homo/images/Pleistocene_temp_change_graph.gif

      Here instead we perceive an LGM-like episode after Toba, which clearly ameliorated from c. 50 Ka BP, and also an even milder climate before Toba:

      → http://i90.photobucket.com/albums/k247/dhm1353/Climate%20Change/PleistoceneCO2vTemp.png

      So most of your arid zones were not really that arid most of the time, only in specific periods. It is an error to assume that the climate of the whole Late-Middle and Upper Pleistocene was like the LGM: not at all.

    2. Yes, I am aware of that. But I am also noticing the human carrying capacity of different biomes, even during very short times, when humans would be close to extinction, except for Sundaland.

      The Toba event would be destructive everywhere, but I think it should be worse for places close to it, due to the immediate aftereffects of the explosion. So, humanity would be constrained to small groups everywhere.

      In the case I pointed out, the carrying capacity of Sundaland, without the Toba, would be much higher than anywhere else and the repopulation wave from there would be very intense and fast, against very sparse population groups elsewhere.

    3. I had doubts about N coming north from Sundaland because Shi 2013 didn't find any in Thai, Malaysians or Filipinos, and only 0.09 in Indonesians (could that have got there via later trade with China?). I think N and O must have split in South China, with only O going south.

    4. This study does not deal directly with NO but with its ancestors and side relatives. I estimated for my reconstruction maps above that NO coalesced (i.e. the NO split) in South China, but in any case, its precursor ("pre-NO") almost certainly traveled northwards from Sundaland, either following the coast or the Mekong or whatever.

      That's mostly determined because K2*, K2c and K2d are only found in that area, while the same kind of reconstruction applied to K2b strongly suggests also Sundaland as origin.

      In the case of NO anyhow, it must be emphasized that what Karafet calls K2e has been previously reported (under the name of "lineage X") as a brother lineage to NO, different from MP (the core of K2b identified in that study) and exclusive to India (n=2). The exact position within the new refined phylogeny needs to be confirmed but, if it's actually part of "pre-NO" or K2a, then it would indicate a previous split in (probably) Indochina. It's just a detail but worth considering if what you have in mind is NO.

  3. How much of that Y-DNA formerly called N did this study actually find in Southeast Asia, south of China? It seems to have a much more northerly distribution than the Y-DNA formerly called O. Wang & Li 2013, http://www.investigativegenetics.com/content/pdf/2041-2223-4-11.pdf

    1. N and O retain their names, even if they also get (proposed) a new equivalent nomenclature in terms of K (K2a for NO in Karafet's terms).

      N seems most basally diverse, as far as I know, in Southern China, while N1 may have originated closer to Tibet. O also seems to originate towards Southern China or SE Asia, so overall a SE Asian or South China origin for NO seems pretty conclusive.

      The overall pattern suggested by Wang & Li (your link) does not seem incorrect (grosso modo) but there should be less emphasis on a hypothetical West Asian origin.

      While it is no doubt correct that the OoA migration went via Arabia c. 125-90 Ka BP, it is also quite well demonstrated by now that modern human presence in India and China is as old as c. 100 Ka BP. On the other hand, the modern human penetration in West Eurasia other than Arabia/Palestine (and probably the Persian Gulf, then a marsh) can only be tracked to the first Upper Paleolithic ("Aurignacoid" industries), which does not seem much older than 50 Ka BP.

      That means that the OoA had the following main phases:
      1. Preliminary stage in Arabia (since 125 Ka BP).
      2. South and SE Asian stage (since 100 Ka BP).
      3. Post-Toba readjustments (to which I believe the expansion of K2 and mtDNA N/R belongs), surely including the colonization of Australasia.
      4. Further expansions: (a) West Eurasia (with some back-flow to Africa) and (b) NE Asia (and later America). These areas presented further difficulties: Neanderthals (a) and cold (b and parts of a) and therefore were not readily available right after the OoA.

      See also this: http://forwhattheywereweare.blogspot.com/2013/04/synthesis-of-spanish-language-series-on.html

  4. How much of that Y-DNA formerly called N did this study actually find in Southeast Asia, south of China? It seems to have a much more northerly distribution than the Y-DNA formerly called O. Wang & Li 2013, http://www.investigativegenetics.com/content/pdf/2041-2223-4-11.pdf

    1. I think you need to learn more about the mtDNA maternal line in order to distinguish the different ethnic groups in the Far East Asian region. In fact, Northern and Southern Chinese have similar paternal haplogroups but quite different maternal ones, and mtDNA is probably slightly more representative of the Asian physical phenotype than the Y chromosome, even though neither of these two kinds of haplogroup is responsible for human physical appearance at all.

  5. So when did the K1 and K2a groups emerge... how many years before the K2b2 group?

    1. According to Karafet, the expansion of K was very fast. If the expansion of K2b happened less than 3 Ka after the K node, that says it all: in less than 3 millennia (pre-)K1, K2, (pre-K2a) and K2b formed.

      They do not study K1 (LT) nor K2a (NO) here, so there are no specific age proposals for them but at the very least the distinctive stem ("pre-" stage) was already there.

      As the P1 node is at most 20 Ka younger than K, the whole process probably happened in just that time-span, including the formation of K2a/NO (i.e. the basal division between N and O, at the very least). K1/LT is more difficult to ascertain, as it evolved in a different geographic context (Pakistan and surroundings). Whatever the case these are just age estimates and always subject to revision, refining and even demolishing criticism, it's not "rocket-science" at all.

    2. Well, I assumed Karafet's lines were relevant and that M20, M184, M214 started at the same time as P331.
      Maybe I got confused because the old system had G,H,I,J,K,L,T breaking off from F way way way before the R group was even formed or even before P formed.

    3. What you say about F subclades is a good example: the stem of G ("pre-G") is the first line to diverge from F but it is probably also one of the most recent to coalesce as such haplogroup, i.e. to diversify in various sublineages (G1, G2, etc.) A line can remain as a minor lineage for a huge time, even up to present day (in which case it does not get a name but would fall within the F* or F-others category). At any point, if luck is on its side, it can expand and then, and only then, it becomes a haplogroup (emphasis in "group": a set of related haplotypes).

      So the stem is not "the flower", the haplogroup. Divergence at any bifurcation is approximately simultaneous (at least in Y-DNA, mtDNA is more ambiguous) but the formation of the haplogroup as such, as the set of multiple related lineages we perceive from the vantage point of present time, may happen much later.

      When exactly? That is what molecular clock estimates try to guess but so far I am skeptical of the methodology and results, although recently some full chromosome sequencing instances have provided a more realistic approximation than the crude "traditional" STR methods, although always subject to calibration bias.

    4. Then you need to correct the new branches and state that M20 and M184 have never been found east of modern Bangladesh.


      and your paper in 2012 says so as well

      Clearly the paper should at least say when M526 split from P326 to make any sense of the rest of the paper; without this it's another "fictional" attempt IMO

    5. I only annotated in red those lineages found in Island SE Asia and Oceania (excluding for practical reasons the islands of post-Neolithic colonization). I thought it was clear but maybe there's room in my phrasing for confusion.

      It is implicit that neither L nor T have been found in SE Asia (no annotation in red) and that is also reflected in the arrows' map, where K2 (and not K as a whole) is shown arriving from South Asia.

      "and your paper in 2012 says so as well"

      My "paper"? I have never published any "paper" (except very arguably a non-reviewed article in Spanish that I formatted in PDF for easier sharing).

      "Clearly the paper should at least say when M526 split from P326 to make any sense"...

      Why? The authors only mention their estimates for three nodes and all relative to K. Of course this is arguable, like all molecular clock estimates, but I thought it worth mentioning for reference and because it implies, in their opinion, a quick branching of K and K2 in just a few millennia.

      You are free to agree or not. I'm personally agnostic on that aspect but the study is in any case most interesting, even if ignoring that part, as it clarifies the structure of haplogroup K2 (a most important advance) and shows that the most likely origin of this haplogroup is in Sundaland. Those findings are the highly informative core of this study and that is what makes it so interesting.

      A few relative chronological estimates are much less relevant, but they are stated with enough humility, relative to the K node and not absolute, that even these deserve my respect, if only for not being just another futile attempt to misinterpret everything with risky methodology. What these chronological notes underline is that the expansion (split) of K, K2b and K2b2 (P) happened in "rapid" sequence, with few millennia between each node.

      Or in other words:
      1. There was a split between K1 (LT) and K2 as this one migrated eastwards.
      2. Then K2 split in Sundaland.
      3. Then K2b expanded in what is now Borneo.
      4. Then P expanded from ~ Northern Borneo, with a branch (P1) back-migrating to South Asia.

      This whole process took, according to Karafet, not more than 5000 years, but if you think that, say, 10,000 years is better for whatever reason, I am not going to argue.

  6. Hello Maju, I am back! I found this interesting post on Internet on Burmese yDNA (http://www.anthrogenica.com/showthread.php?2234-Y-DNA-data-from-Myanmar):
    "The following study by MS Peng et al let us have a peek into Myanmarese ydna

    Bamar people are the main ethnic group, which makes up two thirds of the total population. They show some variety in their ydna and have some non-trivial south asian input as well as some interesting ones like D and NO.

    C3-M217 1
    D-M174 3
    F*-P14 3
    H1a-M82 1
    J2a2-L27 3
    J2b2-M241 1
    L1c-M357 1
    NO*-rs9341279 1
    O1a1-P203.1 2
    O1a2-M110 1
    O2a*-PK4 1
    O2a1*-M95 10
    O2a1a-M88 2
    O3a2c-8Y2897 21
    Q1a3-L56 3
    R1a1a1-Page7 3
    R1b1a2a-L23 1
    R2a-M124 1
    Total 59

    Rakhine/Arakanese is also similar to bamar. On the other hand, other groups, chin and naga, are almost exclusively of haplogroup O. All the 15 naga samples belong to O3a2c-8Y2897."

    To sum up: a lot of different O clades, that might be the oldest, with the exception of NO and perhaps F* and D; plus a few Indian lines. The presence of D is always interesting. The relationship of Q1a3-L56 with other Q1a3 lines would also be worth further research in order to assess its possible older presence. Moreover, I am wondering what the relation of NO*-rs9341279 is to haplogroup X.

    1. The Bamar are also very diverse mtDNA-wise, particularly in the M* paragroup but also in general (80 different matrilineages in a sample of n=327, not more than 6 individuals in any specific haplogroup). It seems clear that Myanmar, and particularly the Bamar, is quite relevant to Asian+ paleohistory.
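As a small arithmetic check on the Bamar Y-DNA list quoted in the comment above, the counts can be tallied by top-level haplogroup. This is only a sketch: the counts are copied as quoted, while the helper names (`BAMAR`, `top_level`, `tally`) and the crude "first letter" grouping are assumptions made for illustration, not anything from the cited study.

```python
# Bamar Y-DNA counts exactly as quoted in the comment (haplogroup label -> men).
BAMAR = {
    "C3-M217": 1, "D-M174": 3, "F*-P14": 3, "H1a-M82": 1, "J2a2-L27": 3,
    "J2b2-M241": 1, "L1c-M357": 1, "NO*-rs9341279": 1, "O1a1-P203.1": 2,
    "O1a2-M110": 1, "O2a*-PK4": 1, "O2a1*-M95": 10, "O2a1a-M88": 2,
    "O3a2c-8Y2897": 21, "Q1a3-L56": 3, "R1a1a1-Page7": 3,
    "R1b1a2a-L23": 1, "R2a-M124": 1,
}


def top_level(label):
    """Crude grouping (an assumption): 'NO' for NO*, else the first letter."""
    return "NO" if label.startswith("NO") else label[0]


def tally(counts):
    """Sum the counts per top-level haplogroup."""
    totals = {}
    for label, n in counts.items():
        key = top_level(label)
        totals[key] = totals.get(key, 0) + n
    return totals


totals = tally(BAMAR)
total = sum(BAMAR.values())      # matches the quoted "Total 59"
share_O = totals["O"] / total    # fraction of the sample in O clades
```

Tallied this way, the O clades add up to 37 of the 59 men (about 63%), while NO* remains a single individual.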

