October 24, 2010

MtDNA stars (notes)

[Updated to include some data that showed up in the comments and that I did not know initially]

In February, I wrote a 'working note' at Leherensuge on mitochondrial DNA stars. As I said then, star-like structures in haplogroups (stars for short) indicate rapid expansion within a few thousand years after a founder effect. They are in fact very hot markers in the genealogical tree.

There are the following categories of stars (slightly modified from the original February note):
  • Giant: only M and H, both with around 40 basal sublineages
  • Large: R, H1 and D4, with 15-18 basal sublineages
  • Medium: N and M4"64, with 12-13 basal sublineages (counting only coding-region, CR, mutations!!!)
  • Small: a host of them (5-10 basal sublineages)
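
The categories above amount to a simple rule on the number of basal branches hanging from a single node. A minimal sketch of that rule in Python follows. The function names and the toy tree are made up for illustration, and the cut points between categories (10/11, 14/15 and 30/31) are assumptions chosen to fill the gaps that the ranges above leave open; only the minimum of five branches and the 5-10 range for small stars come directly from the list.

```python
def classify_star(basal_branches):
    """Map a count of basal branches to a star size class.

    Cut points between classes are assumptions: the list above gives
    ranges with gaps (for instance 11, 14 or 25 branches are not covered).
    """
    if basal_branches < 5:
        return "not a star"
    if basal_branches <= 10:
        return "small"
    if basal_branches <= 14:
        return "medium"
    if basal_branches <= 30:
        return "large"
    return "giant"


def find_stars(tree):
    """Return {node: size class} for every node with 5+ basal branches.

    `tree` maps each node name to the list of branches hanging
    directly from it.
    """
    stars = {}
    for node, children in tree.items():
        label = classify_star(len(children))
        if label != "not a star":
            stars[node] = label
    return stars


# Toy tree with invented branch names, not real haplogroups.
toy_tree = {
    "root": ["A", "B"],
    "A": ["A1", "A2", "A3", "A4", "A5", "A6"],  # 6 basal branches
    "B": ["B1", "B2", "B3"],                    # 3 basal branches
}
print(find_stars(toy_tree))
```

Only branches hanging from the same node are counted, so a lineage that reaches six sub-branches through a chain of successive forks would not register as a star in this sketch.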
All the medium-to-giant stars can be said to belong to the following two conceptual categories:
  • Early Eurasian Expansion: M, M4"64, N and R (in chronological order counting from L3)
  • Peripheral Eurasian Expansion: H, H1 and D4 (in Europe and NE Asia)
Many small stars also belong to these categories. However, a number seem to happen relatively late, especially in West Eurasia (and occasionally Japan and America).

Some stars also belong to the African geography, but all happen in the earlier period (simultaneous with the main Eurasian expansion). The only exception is the mother of all Eurasians (and many Africans), L3, which is necessarily older than M and is therefore the first star-like structure one can detect in the human matrilineal genealogy.

While the mtDNA phylogeny goes a lot deeper than L3, there is no sign of strong rapid expansion before this matri-clan. This may be a time marker and may refer to the Abbassia Pluvial, c. 120-90 Ka ago.


The chronology of stars is as follows (loosely based on this previous exploration):
  • A very long period of African coalescence and gradual expansion (almost 30 molecular clock ticks, about half of human history). No star-like structures (no node with 5 or more basal branches)
  • L3 in East Africa (Sudan-Ethiopia-Eritrea) - Abbassia Pluvial?
  • Two ticks without major events
  • M in South Asia and into East Asia and Melanesia - beginning of Eurasian expansion. Huge demic explosion
  • M4"64 in South Asia.
  • Triple expansion (good climatic moment?)
    • N, probably in SE Asia (Burma?), with expansion Westwards
    • M30 in South Asia
    • L1b1a, maybe around Chad
  • R in SE Asia (updated from South Asia), with expansions to the West, North and South
  • P in Melanesia and B4'5 in SE/East Asia (updated)
  • Four expansions:
    • D4 in East/NE Asia
    • M5a in South Asia
    • HV in West Asia
    • L3e1 maybe in Sudan
  • D1 in Beringia?
  • Several expansions (7 haplogroups in 4/5 regions):
    • H and V in Europe. Huge demic explosion (Aurignacian?)
    • M1a and U2'3'4'7'8'9 in West Asia (two different centers possibly)
    • Z and A in NE Asia
    • L2a1 in East Africa probably
  • Two expansions (three haplogroups but two centers):
    • H1 and H3 in West Europe with eventual penetration into North Africa
    • X2, probably from the Levant, into Central Asia and eastwards (reaching to America eventually)
  • Six haplogroups in four regions:
    • C, D4a1 and A2 in NE Asia
    • R2a in West Asia
    • H2a in Europe
    • U6a in North Africa (Egypt?)
  • A tick without anything notable happening
  • Two expansions:
    • M7a1a in Japan (?)
    • J1c in West Asia
  • I and U5b3 in Europe probably
  • W in West Asia, B2 in America (updated).
  • D4h3a in America
  • Two ticks without major events
  • T2 and K2a in West Eurasia
  • Another idle tick
  • K1a1b and T2b in West Eurasia (Danubian Neolithic?)
  • No more stars till present
Color code: purple: L3(xM,N), red: M, blue: N(xR), green: R, black: others

Size code: largest type: giant stars, large type: medium and large stars, normal type: small stars.

Note on chronology: My understanding of the molecular clock is that you must count from the top down, i.e. from older to younger node. This differs from what most geneticists do, which explains their failure to provide archaeologically reasonable dates: they need to average the length of downstream branches, which causes massive distortions. In my understanding, large haplogroups tend to remain stable as the dominant lineages generally 'win' the tug-of-war of genetic drift, so mutations tend to accumulate especially in smaller haplogroups, originally belonging to small isolated populations where drift would be more purely chaotic (similar odds for all existing lineages, as they were all similarly tiny).
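
To make the difference between the two ways of counting concrete, here is a minimal Python sketch on a toy tree. Everything in it is an assumption for illustration: the node names, the mutation counts and the function names are invented, and the "downstream" function is only a simple stand-in for the averaging of downstream branches mentioned above, not any specific published estimator.

```python
# Toy tree: child -> (parent, mutations on the branch leading to the child).
# Names and counts are invented for illustration only.
toy_tree = {
    "X": ("root", 4),
    "X1": ("X", 1),
    "X2": ("X", 5),
    "Y": ("root", 2),
}


def ticks_from_root(node, tree):
    """Top-down count: mutations accumulated from the root down to `node`."""
    ticks = 0
    while node in tree:  # the root has no entry, so the loop stops there
        parent, mutations = tree[node]
        ticks += mutations
        node = parent
    return ticks


def tip_distances(node, tree):
    """Mutation counts from `node` to each of its descendant tips."""
    children = [child for child, (parent, _) in tree.items() if parent == node]
    if not children:
        return [0]  # a tip is at distance 0 from itself
    distances = []
    for child in children:
        step = tree[child][1]
        distances.extend(step + d for d in tip_distances(child, tree))
    return distances


def mean_downstream(node, tree):
    """Bottom-up style count: average mutations from `node` to its tips."""
    distances = tip_distances(node, tree)
    return sum(distances) / len(distances)


print(ticks_from_root("X", toy_tree))   # position of X counted from the root
print(mean_downstream("X", toy_tree))   # depth of X averaged over its tips
```

In this toy case the top-down count places X by the four mutations between it and the root, while the downstream average depends entirely on how long the branches below X happen to be (one short, one long), which is where the two approaches can diverge.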

However, I realize that this method 'causes' a very early founder effect in America, maybe c. 45 Ka ago, much earlier than generally claimed. I don't know yet how to explain this well, but either it is a founder effect limited to Beringia or it is the genetic indicator for the somewhat ghostly Paleo-Indians (maybe with an initial limited spread along the Pacific coast of North America?).

Noticeable patterns:
  • All secondary star-like (fast) expansions of R happen in West Eurasia except P (Melanesia). The same can be said of N, except for A (NE Asia).
  • All secondary star-like (fast) expansions of M happen in NE Asia, except the first ones, which happen in South Asia.
  • There are three or four moments of geographically diverse expansiveness that may be associated with good climatic conditions and may help to calibrate. These are:
    • The expansion of N, M30 and L1b1a, soon after the beginning of the Eurasian expansion
    • The expansion of D4, M5a, HV and L3e1, before the colonization of Europe (maybe in the early Mousterian Pluvial - c. 50 Ka ago?)
    • The expansion of H, V, M1a, U2'3'4'7'8'9, Z, A and L2a1 (maybe late Mousterian Pluvial, c. 45-40 Ka ago?)
    • The expansion of C, D4a1, A2, R2a, H2a and U6a (soon after, Gravettian era?) - it is more regionally-specific than the others, as there are no Tropical African lineages involved.
That's all for now.

18 comments:

  1. "a very early founder effect in America, maybe c. 45 Ka ago, much earlier than generally claimed. I don't know yet how to explain this well but either is a founder effect limited to Beringia or it is the genetic indicator for the somewhat ghostly Paleo-Indians"

    I suspect it's the signature of a relatively large and diverse immigrant population. They had possibly been in Beringia (or nearby) for nearly 45,000 years. Many of the American haplogroups have been drifted out in Eastern and Northern Eurasia by later arrivals.

    "R in South Asia with expansions to the East and West
    P in Melanesia"

    No mention of B? Haplogroup R11'B rapidly diversified into three. Then R11, with one click, formed three haplogroups (R11, R24 and B6), and B4'5, also with one click, split into B4 and B5. Then B4 split immediately into B4a, B4b'd'e, B4c and B4f. Looks easily as rapid an expansion as haplogroup P to me.

  2. "No mention of B? Haplogroup R11'B rapidly diversified into three".

    I did not consider anything under five basal branches to be a "star"; small ones have 5-9 branches. It's more than two but not much anyhow. Also, all branches must hang from the same node (at least by CR criteria).

    The kind of tree you see in R11'B actually tells of a gradual, sustained but limited expansion, not of any explosive growth of any sort.

  3. Hi Maju, I don't know if this has anything to do with this post or not, but I'd like to know if you've already read this:

    http://www.metrolyrics.com/2010-ozzy-osbourne-a-descendant-of-a-neanderthal-news.html

    I've commented it on my blog, but I don't know if it's real (because I couldn't find the study anywhere) or it's all crap. What's your opinion?

  4. Looks like a propaganda stunt. There's nothing in that whole list that is unlikely for your typical West European. I'm just not sure if it's his propaganda stunt or that of the DNA testing company.

  5. "The kind of tree you see in R11'B actually tells of a gradual, sustained but limited expansion, not of any explosive growth of any sort".

    Especially if you include the control region mutations, which I note you eliminate when you consider haplogroup R as a whole. Do you change what evidence you are prepared to consider depending on what result you're looking for?

  6. Regardless of whether you consider them or not. When I considered the 'stars' it was already only with coding region mutations. Anyhow the differences are not really notable in spite of you wanting to make an issue of that.

    If you know of a star of five or more branches I am missing, then report it by name, please.

  7. "If you know of a star of five or more branches I am missing, then report it by name, please".

    If you disregard the control region mutations B4'5 has a star of six haplogroups: B5, B4b'd'e, B4c, B4f, B4a and B4g. Even more evidence for an SE Asian origin for mtDNA F.

  8. Aha. I count them now. Not for F but indeed for B4'5 (six branches). I was surely misled by the HVS-defined branches, which are several here.

    I also see another 10-pointed star in B2 (under B4b). This is interesting, indeed.

  9. I have updated to reflect the new stars. It's notable that B2 and D4h3a seem near simultaneous, possibly indicating a moment of colonization of America, regardless of the D1 'anomaly'.

  10. "It's notable that B2 and D4h3a seem near simultaneous, possibly indicating a moment of colonization of America, regardless of the D1 'anomaly'".

    Quite likely. But what do you mean by 'the D1 'anomaly'?

  11. "N, probably in SE Asia (Burma?), with expansion Westwards"

    But how did it get to Burma? It seems unlikely that it was through India.

  12. "But what do you mean by 'the D1 'anomaly'?"

    The one I already mention in the main post: D1's star appears to be as old as H or HV, looking some 40-50 Ka old. AFAIK D1 is an exclusively Native American lineage.

    Of course there are molecular clock estimates that say otherwise (17 or 20 Ka, in line with the rest) but that is not what I'm seeing in the phylogeny.

    "But how did it get to Burma? It seems unlikely that it was through India".

    Why? Between Arabia and Burma... you have to go through India. There's no way you can take India out of the equation: it's right in the middle of all and its role is supported by genetics and archaeology at all levels.

  13. "Between Arabia and Burma... you have to go through India. There's no way you can take India out of the equation: it's right in the middle of all and its role is supported by genetics and archaeology at all levels".

    Talk about 'stubborn'. I give up until I'm proved correct yet again.

  14. Why not N in South Asia?

    Also, what do you think of the idea of using mtDNA variation as a measure of population size? This is very much in line with John Hawks' observation that mutation rates at a population level are very much a product of population size. Hence, we would expect the mutation clock to tick much more slowly in thin populations than dense ones.

  15. Palanichamy's paper is the classical reference (thanks for mentioning it) for arguing a coalescence of R in South Asia. However, the knowledge of smaller haplogroups seems to have expanded of late and changed the perception somewhat. Now, following Wikipedia, there seem to be more R basal sublineages East than West of Assam. That's the lucky shot of Terry.

    As for N, you already see in that paper that East/West lineages are 4/4 but we know now of four other lineages that are all Eastern (SEA and Oceania), so now it is 8/4 in favor of the East.

    Most geneticists and I will agree that larger basal variance is strongly suggestive of origin. Alternatively you can use geometrical models to calculate centers of gravity of such sets of basal sublineages, which should again be strongly suggestive of origin. Both systems are the same one after all, as each basal lineage weighs the same regardless of numbers and stem length.

    Another thing I've tried to do is to focus on those lineages that fork (expand) the earliest: the results are typically similar to those considering all branches, but this other method emphasizes the earliest centers of such expansion anyhow, where lineages did not have to remain latent for long but could continue their ancestor's expansion right away.

    It's all very visual and quite simple, and working with mtDNA has the advantage that normally all the mutations are known (at least for the largest clades). Anybody familiar with maps and with some ability to work in fine detail can explore these methods I use (and that must be essentially correct).

    "... what do you think of the idea of using mtDNA variation as a measure of population size?"

    It's an interesting but still tentative idea. Population size is one of the dynamic variables involved in the molecular clock equations; however, it is also one that is usually ignored because we know little to nothing about it. IF the results happen to be coincident with other data (as happened with the estimates for Neanderthal population size in Europe, coincident with estimates for early/middle UP population in the same continent based on the archaeological record) then we can say it's probably a correct estimate.

    But for something as ambitious as Atkinson's paper, it's difficult to say. It probably needs some fine tuning before the results can be accepted. For instance, I notice that fig. 2 does not indicate a post-LGM expansion in Europe, while Bocquet-Appel sees that population must have grown then by almost 6 times!! On the other hand, while similar estimates on Neolithic demographics do indicate clear growth at the arrival of the Neolithic, this is usually followed by decline sooner rather than later, and after the decline the overall growth is of about 2-3 times, not more.

    So the Neolithic expansion may have been less pronounced than the Late Paleolithic one. This is not visible at all in Atkinson's paper, so the results do not seem to pass the reality check well enough.

  16. I overlooked this:

    "Hence, we would expect the mutation clock to tick much more slowly in thin populations than dense ones".

    I think it's exactly the opposite:

    In "large" populations mutations happen more often (because of mere numbers) but these novel mutations have almost no chance of being consolidated because the majority is not them and the odds are with the majority. Drift defeats novel mutations in relatively large populations almost always, pushing them once and again to the verge of extinction or real extinction (especially if a contraction happens, and they do happen).

    In very small populations however, novel mutations happen less (because of smaller numbers) but they have much better chances of becoming consolidated because the odds can be maybe 1/5 instead of 1/1000 (for instance). So I understand that the molecular clock ticks faster in smaller populations because novel mutations consolidate a lot more often.

    This seems to be ratified by the quite smaller number of downstream mutations in large stars like M and H, when compared with less dramatically expansive "sisters" such as N and U. This is, I understand, another serious distortion in the usual equations applied in molecular clock age estimates.

  17. "Why not N in South Asia?"

    From the link:

    "We identified five new autochthonous haplogroups (R7, R8, R30, R31, and N5) and fully characterized the autochthonous haplogroups (R5, R6, N1d, U2a, U2b, and U2c)"

    So apart from R, which I've finally convinced Maju coalesced in SE Asia, not India, the Indian N haplogroups are downstream mutations. Again from the link:

    "The new haplogroup N5 is characterized by (at most) six transitions in the coding region (at sites 1719, 5063, 7076, 9545, 11626, and 13434) and two in the control region (at sites 16111 and 16311)"

    And N5 splits off N1'5. N1 is widely spread through SW Asia, and only N1d is Indian. So, in spite of what the authors say ('For the less frequent haplogroups (R8, R30, R31, N1d, and N5), comparative HVS-I information would suggest their indigenous status, too'), it looks most likely that even the autochthonous N5 is an immigrant, from the west. So they should presumably be included in:

    "In India, a minority of lineages are of western Eurasian ancestry; the ancestral population probably entered Pakistan and India either from the west (Iran) or the north (via Central Asia)"

    "That's the lucky shot of Terry".

    I keep telling you that luck had nothing to do with it. It came about as a result of actually looking at the evidence with no pre-conceived beliefs.

    "Another thing I've tried to do is to focus on those lineages that fork (expand) the earliest"

    And you've done a marvelous job of that.

    "In 'large' populations mutations happen more often (because of mere numbers) but these novel mutations have almost no chance of being consolidated because the majority is not them and the odds are with the majority. Drift defeats novel mutations in relatively large populations almost always, pushing them once and again to the verge of extinction or real extinction".

    Which is exactly the reason why it's extremely unlikely that haplogroup N moved east through India to SE Asia, and then back west through India without being wiped out along the way in either direction.

    "In very small populations however, novel mutations happen less (because of smaller numbers) but they have much better chances of becoming consolidated"

    Which again argues for N having coalesced in some region where M was not yet present. Even if I were prepared to concede N coalesced in SE Asia that still leaves the problem of a total lack of basal descendants in India.

  18. I'm a little fed up with some of your style. Examples from the last comment:

    "I've finally convinced Maju"


    Diagnostic: self-aggrandizing, repetitive, insisting not on the good of mutually illuminating constructive debate but on the importance that you and your pre-conceptions have.

    "it looks most likely that even the autochthonous N5 is an immigrant, from the west"

    Replacing what should be "equally likely" with an arbitrary "most likely" without providing a single detail of evidence to support that in the whole paragraph (nor elsewhere).

    "It came about as a result of actually looking at the evidence with no pre-conceived beliefs".

    Wishful thinking of your own process and motivations. You may deceive yourself but you are not deceiving me nor surely any other casual reader.

    When you finally stumbled on some new data that changed the things for R, you were unable to explain that and I had to guess or randomly find where you were drinking from and that, for once, you were right that time (it seems).

    We discussed all this matter in Spring 2009 and, while you were helpful providing some info on the location of some rare clades back then, there was no evidence then altering the likelihood of R being of SA origin (having most basal sublineages in SA, and SA+WEA in general).

    "Which is exactly the reason"...

    No. If I am right or wrong on the MC ticking somewhat or even a lot faster in smaller populations/lineages, that does not help at all to clarify the routes of N and R. Because:

    1. They could always move faster than one single mutation takes place.

    2. In most relevant branches, many mutations have in fact accumulated before they fork.

    3. We do not even know if these proto-lineages were distinct populations or just minority lineages in populations dominated by some other clade.

    Point 3 is particularly important.

    4. My point is rather to help explain why some large haplogroups have so few mutations downstream of the main node/s (in comparison to others and in spite of being well researched): why the clock "stopped" (slowed down a lot) for some lineages and not others. It did because "mum ate her children", metaphorically speaking.

    It does not say that the clock ticks faster for any clade (it does not), just why the clock almost stopped ticking for some large clades instead, why the "ticks" are amiss.

    "Which again argues for N having coalesced in some region where M was not yet present".

    Evidently N coalesced (expanded, became a haplogroup) in SEA, and I agree that it probably did so where M was rather rare. But that has nothing to do, as far as I can tell, with the issue of MC "ticks" (novel mutations) being erased in large populations.

    Unless you are able to write a proper article demonstrating how these two things are related. Just because your recurrent obsessive neurones believe they have found another handle to grab in their quest for accumulating anecdotal-information-pretending-to-be-evidence-in-support-of-your-preconceptions, that doesn't mean it is actually evidence. I think it's not and you will have to demonstrate it with some consistent homework, which I think you will fail to do (you may try but won't produce the results you hope - prove me wrong).

    "... a total lack of basal descendants in India" [for N].

    False: N1'5 and N2 are still there. Together with R they are nearly all the Western N. Only X exists in that zone you claim.

