April 14, 2013

Southern Native American Y-DNA: no correlation with language, extensive info on haplogroup C3

Genetics does not necessarily correlate with linguistic families. It often does not. This seems to be the case with Native Americans as well.

Lutz Roewer et al., Continent-Wide Decoupling of Y-Chromosomal Genetic Variation from Language and Geography in Native South Americans. PLoS Genetics 2013. Open access [doi:10.1371/journal.pgen.1003460]


Numerous studies of human populations in Europe and Asia have revealed a concordance between their extant genetic structure and the prevailing regional pattern of geography and language. For native South Americans, however, such evidence has been lacking so far. Therefore, we examined the relationship between Y-chromosomal genotype on the one hand, and male geographic origin and linguistic affiliation on the other, in the largest study of South American natives to date in terms of sampled individuals and populations. A total of 1,011 individuals, representing 50 tribal populations from 81 settlements, were genotyped for up to 17 short tandem repeat (STR) markers and 16 single nucleotide polymorphisms (Y-SNPs), the latter resolving phylogenetic lineages Q and C. Virtually no structure became apparent for the extant Y-chromosomal genetic variation of South American males that could sensibly be related to their inter-tribal geographic and linguistic relationships. This continent-wide decoupling is consistent with a rapid peopling of the continent followed by long periods of isolation in small groups. Furthermore, for the first time, we identified a distinct geographical cluster of Y-SNP lineages C-M217 (C3*) in South America. Such haplotypes are virtually absent from North and Central America, but occur at high frequency in Asia. Together with the locally confined Y-STR autocorrelation observed in our study as a whole, the available data therefore suggest a late introduction of C3* into South America no more than 6,000 years ago, perhaps via coastal or trans-Pacific routes. Extensive simulations revealed that the observed lack of haplogroup C3* among extant North and Central American natives is only compatible with low levels of migration between the ancestor populations of C3* carriers and non-carriers. 
In summary, our data highlight the fact that a pronounced correlation between genetic and geographic/cultural structure can only be expected under very specific conditions, most of which are likely not to have been met by the ancestors of native South Americans.

There's only so much to say about language families and patrilineages; they do not agree in any obvious way:

Table 1. Correlation between Y-SNP haplogroup and language class.

However, the paper also addresses the interesting matter of the NE Asian and Native American paragroup C3(xC3b), which within the Americas is found almost exclusively among Ecuadorean Natives (Kichwa and Waorani speakers). The only other known case among Native Americans, according to the authors, is an individual of Southern Alaskan native ancestry.

Figure 1. Origin of male native South American samples.
For each sampling site, its geographic location as well as the size (proportional to the circle area) and Y-SNP haplogroup composition of the respective sample are shown. Blue lines: major aquatic systems; dashed gray lines: current national boundaries.

Overall distribution of Y-DNA C3* (yellow), which I understand to mean C3(xC3b) for this study:

Figure 4. Prevalence of Y-SNP haplogroup C-M217 (C3*) around the Pacific Ocean.
Light blue: previous studies; dark blue: present study; yellow: relative frequency of C-M217 (C3*) carriers.

In any case, the most interesting information may be in the haplotype network:

Figure 5. Median-joining network of 167 different Asian and American Y-STR haplotypes carrying Y-SNP haplogroup C3* (from this and previously published studies).
The median-joining network is based upon markers DYS19, DYS389I, DYS389II-DYS389I, DYS390, DYS391, DYS392, DYS393 and DYS439 (see Materials and Methods for details). ALA: Alaskan; KOR: Korean; CHI: Chinese, including Daur, Uygur, Manchu; MON: Mongolian, including Kalmyk, Tuva, Buryat; ANA: Anatolian; INDO: Vietnamese, Thai, Malaysian, Indonesian, Philippines; JAP: Japanese; TIB: Tibetan, Nepalese; ALT: Altaian, including Kazakh, Uzbek; SIB: Teleut, Khamnigan, Evenk, Koryak; ECU: Ecuadorian, including Waorani, Lowland Kichwa; COL: Colombia, including Wayuu; RUS: Russian.

The network clearly shows that the Native American C3* haplotypes are mostly or totally related to a cluster of Altaian, Mongol and Chinese roots. The Altaian connection is particularly strong for all but one of the lineages. This is very much concordant with a proto-Amerind patrilineal origin in Altai (where NE Asian and American Y-DNA Q and mtDNA X2 variants surely originated in the early Upper Paleolithic), from where the lineage traveled to Beringia via Mongolia or nearby regions, spreading mode 4 (blade technology) to East Asia c. 30,000 years ago.

This is not the view of the authors but mine. The authors instead speculate about (i) a late wave or (ii) even naval contact between East Asia and South America. I find both hypotheses lacking in merit and I lean towards a founder effect model instead.

On the other hand, the C3b presence in NW North America, critically among Na-Dene speakers, may still represent a second wave: that of Na-Dene speakers, whose "recent" linguistic connections to Siberia (Yenisean family) have found strong support in recent years.


  1. "Virtually no structure became apparent for the extant Y-chromosomal genetic variation of South American males that could sensibly be related to their inter-tribal geographic and linguistic relationships."

    1. This supports Greenberg's hypothesis that at least all of the indigenous languages of South America (and probably at least some of North and Central America) ought to be grouped in one big "Amerind" macrofamily.

    2. This favors the hypothesis that technology transfers from Poverty Point cultures to Central America, and in turn from there to the South American Incas and possibly settled Amazonian communities as well, must have been mostly cultural transfers or independent innovations, rather than the demic migrations seen in many cases outside the New World.

    3. This supports the case from multiple lines of genetic evidence for a single wave, serial founder effect model of South American indigenous population structure (with South American genetic diversity being a homogeneous subset of proto-Amerind genetic diversity) in contrast to at least three (proto-Amerind, Na-Dene and Inuit) and maybe four or five (an additional NE North American and an additional Dorset Paleo-Eskimo) waves of mass migration in North America.

    4. EXCEPT for that pocket of C*

    The Altaic link you note on the maps makes it pretty clear that C* could not have come from the Austronesian maritime exchange that brought the kumara from South America to Oceania and left some possible minor Oceania-to-South America traces. The attenuated nature of the traces left on both sides suggests that only a handful of voyages may have been involved in the Oceania-South America link.

    C* therefore had to get to Ecuador separately via the Pacific Coast of North America and then South America without leaving traces anywhere else, which is not how a full-fledged "wave" of migration would look at any time depth of thousands of years.

    While a founder effect in the initial wave of Amerind migration is possible, the absence of any other traces of it upstream or downstream in an otherwise monolithically genetically homogeneous population has problems as well. It would have had to have been a very low frequency part of the initial founding population of the Americas (perhaps just one individual) who left no descendants at any early stage of the migration to South America and did not have any of his descendants participate in further settlement (or had so few that they dropped out due to genetic drift despite a wildly expanding population in virgin territory). And, if that had happened, it should have had some distinct New World-only variants by then.

    Given known early post-Columbian contacts between Japan/Korea/North China and this region (the signature dish of Korea, kimchi, has New World-derived ingredients, for example), continuing Japanese populations in Pacific coastal South America, and the lack of evidence of pre-Columbian maritime contact by this route, my bet would be on a cryptic and historically unattested early post-Columbian source for this Y-DNA in Ecuador, perhaps from just one or two Asian sailors who got locals pregnant in that era, before there was any significant European population there.

    1. Just for the record, there's been some speculation of pre-Columbian contacts between NE Asia and parts of America. Speculation that the authors of this paper echo. I doubt it has much merit and I don't have time to search for relevant papers right now but I think I must mention it.

      What you say of Amerind, I'd personally support tentatively. However, modern linguists tend to dismiss Amerind as unproven. An issue may be that if Amerind is real, it should be quite a bit older than originally imagined: from c. 17 Ka ago maybe, which is too much for conventional linguistics to confirm (or deny).

    2. I agree that linguistic data is quite hard pressed to reveal much at a time depth of 17 kya, although it isn't entirely out of the question because isolated populations that don't come into contact with people of other languages can be quite linguistically conservative. There is good reason to believe that a model in which most linguistic change over time is due to random linguistic drift is wrong and that instead language change is punctuated around instances of language contact, language schism, and encounters with new unnamed things (either due to new geography or new technology). South America might very well have been static linguistically relative to most Old World societies, which have histories involving many waves of migration of ethnically distinct populations interacting with each other and more technological change than South America experienced until quite late (there is evidence of linguistic change in this direction due to technological change in Mesoamerica, but a lot of those technologies didn't reach South America).

    3. Not only was there a lot of pre-Columbian conflict and rise-and-fall of civilizations in South America, but there were also at least some naval contacts with North America (Mesoamerica if you wish). For example, we recently discussed here how the tomato was originally South American but the modern one comes from Mexico, where it was finally domesticated from a semi-domesticated Andean template. AFAIK the Incas and the Mexicas had naval trade routes between them along the Pacific coast. However, it does seem like contact was limited, with most crops and domestic animals belonging to either area and not both.

  2. Ecuadorian NRY C3* could be the counterpart of mtDNA D4h3a, which was detected at a frequency of 23% in the Cayapa sample (Rickards et al. 1999). C3* was also reported in a Tlingit sample, and On Your Knees Cave, where the 10,300-year-old D4h3a sequence was obtained, was in Tlingit territory.

    1. Then we should be prudent before it is claimed that "Na-Dene" has C3 samples. Tlingit is not a typical Na-Dene-Athabaskan language but a parallel branch.

  3. Nice reading, I absolutely agree with Maju.

  4. I'd like to emphasize that the old idea of a single wave of Amerindian people / languages (resuscitated by Greenberg) is just completely false. I'm currently working on some Californian languages that have Uralic connections and should publish a paper on that this year.
    So I'm not surprised that genomics shows heterogeneity in Amerindian people. My PoV is that the more Amerindian genomics is studied the more it will surface that they do not amount to a single population / wave.
    As far as C3 is concerned, I understood that this was a kind of Austronesian intrusion from other sources. Though you (=Maju) seem to deem that unlikely, there are plenty of indications that Polynesian people crossed the Pacific, so they actually landed in South America in such numbers as to have a genomic impact. This is what this new study shows, as far as I understand it.

    1. I remain cautious on the Amerindian language superfamily issue but I am of the opinion that mass lexical comparison has its merits, especially when you have to dig beyond the objective limits of the comparative method (c. 10 Ka) and/or need to issue a preliminary assessment while waiting for more thorough studies. Linguistics is not rocket science in any case.

      Contrary to what you claim, I am strongly of the opinion that Polynesians did arrive in South America (chickens, sweet potato, even recently some direct mtDNA evidence). However, I fail to see how this can have any relation with C3*, C3 being a NE Asian lineage. I also imagine that Polynesians left very little genetic or linguistic legacy in a continent that was fully colonized long before their arrival (with population numbers much greater than all Polynesia) and of comparable technological level.
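
One point raised in the discussion above is whether a lineage carried by a single founder could plausibly drop out through drift even while the population was expanding. A rough way to put a number on that is sketched below. It assumes a Galton-Watson branching process in which each man leaves a Poisson-distributed number of sons; that model, the function name `extinction_probability` and the example growth rates are illustrative assumptions and are not taken from the paper.

```python
import math


def extinction_probability(mean_sons: float, tol: float = 1e-12,
                           max_iter: int = 100_000) -> float:
    """Probability that a lineage founded by one man eventually dies out.

    Assumes each man leaves a Poisson-distributed number of sons with the
    given mean (a hypothetical parameter chosen for illustration only).
    """
    if mean_sons <= 1.0:
        # A lineage that does not grow on average is lost with certainty.
        return 1.0
    q = 0.0
    for _ in range(max_iter):
        # Fixed-point iteration on q = exp(mean_sons * (q - 1)); starting
        # from 0 it converges to the smallest root, the extinction probability.
        q_next = math.exp(mean_sons * (q - 1.0))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


# Hypothetical growth rates (mean number of sons per man).
for m in (1.1, 1.5, 2.0):
    print(f"mean sons = {m}: P(lineage lost) = {extinction_probability(m):.3f}")
```

Under these assumptions, a lineage starting from one man is lost with a probability of roughly 0.42 when men average 1.5 sons each, so loss by drift in a growing population is not far-fetched in itself. The sketch says nothing about where or when such a lineage would leave traces, which is the other half of the argument.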

