February 2, 2011

Amerindian autosomal genetics

For some reason I missed the alert for this new paper on Native American genetics; luckily, Andrew blogged about it yesterday.


The authors make an effort to compare autosomal genetic structure with linguistic and geographic groups with some success, producing this nice-looking map for the k=7 level of analysis:


Sadly, it is becoming more and more common to show readers only one level of the structure analysis (the one the authors prefer) instead of the whole sequence. This appears intended to avoid critical analysis by readers, who might well find that, say, k=9 is more to their liking or explains things that k=7 does not.
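
To make the point concrete, here is a toy sketch (not the method or the data of the paper): made-up individuals from four hypothetical villages, A1, A2, B1 and B2, each described by a single invented ancestry coordinate. They are grouped by cutting at the widest gaps, which is single-linkage clustering in one dimension, not any model-based software. The names `cluster_1d` and `summarize` are assumptions for illustration only.

```python
def cluster_1d(values, k):
    """Split 1-D values into k groups by cutting at the k-1 widest gaps
    (single-linkage clustering in one dimension)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Gap between each pair of neighbours in sorted order.
    gaps = [(values[order[j + 1]] - values[order[j]], j)
            for j in range(len(order) - 1)]
    # Sorted positions after which we cut: the k-1 largest gaps.
    cuts = sorted(j for _, j in sorted(gaps, reverse=True)[:k - 1])
    labels = [0] * len(values)
    label = 0
    for pos, i in enumerate(order):
        labels[i] = label
        if label < len(cuts) and pos == cuts[label]:
            label += 1
    return labels


# Invented data: the A villages and the B villages differ a lot,
# while villages within each pair differ only slightly.
samples = {
    "A1": [0.100, 0.101, 0.102],
    "A2": [0.130, 0.131, 0.132],
    "B1": [0.600, 0.601, 0.602],
    "B2": [0.650, 0.651, 0.652],
}
names, values = [], []
for village, xs in samples.items():
    for x in xs:
        names.append(village)
        values.append(x)


def summarize(k):
    """Return, for each cluster at level k, the villages it contains."""
    groups = {}
    for name, lab in zip(names, cluster_1d(values, k)):
        groups.setdefault(lab, set()).add(name)
    return [sorted(g) for _, g in sorted(groups.items())]


for k in range(1, 5):
    print(k, summarize(k))
```

Only the sequence from k=1 to k=4 reveals that the A/B divide is deep while the divides within each pair are shallow; the k=4 picture alone shows four clusters that look equally distinct.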

I must mention this older paper (Sijia Wang 2007), which reaches similar results while offering much more complete information in general.

I also remember another paper from that period (2006-07) which dug down to k=16 or something like that. Sadly, I cannot find it right now.

14 comments:

  1. That survey seriously contradicts the idea that there were only three waves of Amerindian people.
    I don't know to what extent their findings in genetics can be correlated with linguistics.
    A

  2. "That survey seriously contradicts the idea that there were only three waves of Amerindian people."

    Not necessarily. Cluster analysis doesn't tell you how different one cluster is from another, and my brief read didn't see Fst or other similar statistics to measure the extent to which clusters were different. You could get seven clusters from seven villages in central Germany; they'd just be shallow ones. As Maju points out, a single k level doesn't tell you which divides are more or less fundamental.

  3. Andrew:
    Cluster analysis doesn't tell you how different one cluster is from another, and my brief read didn't see Fst or other similar statistics to measure the extent to which clusters were different.
    ***
    OK, that's probably correct. But it's possible that they are actually significantly different.
    What is Fst?
    A.
    ***


    You could get seven clusters from seven villages in central Germany; they'd just be shallow ones. As Maju points out, a single k level doesn't tell you which divides are more or less fundamental.
    ***
    Maybe you could develop that last point a bit more, I don't really understand what it means.
    A

  4. I really recommend looking at the cluster analysis in the other paper mentioned because at least it has all the Ks up to 9, which allows us to see an important detail: in spite of being so numerous in the survey, the Mesoamerican-Andean cluster only shows up very late in the cluster analysis (k=8 in this other paper), which indicates it is not too strong.

    Everything else being equal, if all clusters had identical "strength", then the Mesoamerican-Andean cluster would have shown up from k=2 or k=3 (just because of numbers). It did not happen, so I consider this cluster to be a likely artifact (and that's why, among other reasons, I demand deeper analysis).

  5. Also, in regard to your first comment, Andreu, Sijia Wang does indeed argue for a single origin and wave, and actually argues for a coastal route in the colonization of America.

  6. Oops, I meant Arnaud. Andreu would be Andrew in Catalan... :D

  7. "What is Fst ?"

    Fst is an abbreviation for "Fixation Index" and is one of several standard ways to assign a number to how different two populations are genetically. For example, the Fst distance between East Asians and Africans is about 0.19 for autosomal genetics, while the Fst distance between Germans and Swedes is about 0.001. The index is not ideal for all purposes, but it provides a relatively easy-to-understand measure of how great the genetic distance is between two clusters of people.


    "You could get seven clusters from seven villages in central Germany; they'd just be shallow ones. As Maju points out, a single k level doesn't tell you which divides are more or less fundamental.
    ***
    Maybe you could develop that last point a bit more, I don't really understand what it means.
    A"

    Random genetic drift, founder effects and some degree of inbreeding make any population that is well defined and relatively isolated for a long period of time genetically distinct. If you have seven villages in Germany where almost everyone has ancestors going back many centuries in the same village, with only modest contributions from outside that grow weaker with distance (something that is empirically the case for very large swaths of rural areas in the Old World), each one will become genetically distinct as slightly different mixes of alleles reach an equilibrium in each population, and private mutations (i.e. mutations not found in other populations, sometimes harmless ones) slowly accumulate in the population. Those genetic differences will tend to accumulate until something shuffles the populations of those villages so thoroughly that nobody can accurately guess, from the village someone lives in, where their ancestors came from.

    The differences may be subtle between those villages (i.e. the Fst distance may be very low), but as long as the differences are very consistent, clusters will emerge in cluster analysis. Even if the only difference between the populations is in the percentage of, say, Northern European vs. Southern European genes, and the range is only from 7% to 10% Southern European in any given village, if each village has a very precise Southern European percentage (e.g. everyone in Aberdeen is 7% Southern European +/- 0.1%, and everyone in Blat is 9% Southern European +/- 0.1%), then the software will put each village in its own neat cluster if you allow the program to make enough clusters. But if you increase the number of clusters one by one, these very similar populations will be some of the last to break out from the whole as different from each other. If you have seven French villages and seven German villages, the k=2 split will have the French villages in one and the German villages in the other. But the k=14 split will put almost every village in its own cluster.

    This will happen even if the total Fst distance between the two most different villages in the sample is 0.0000001. Clustering is about consistency of cluster members sufficient to make them identifiable as distinct, not about absolute degree of difference.

  8. I'll try to complement Andrew's explanation with different words: if you run for k=3 you always (or almost always) get 3 clusters, if you run k=5 you get five, and so on.

    It doesn't matter much if the five clusters are made of inbred cousins or of geographically and genetically remote populations.

    "If you have seven French villages and seven German villages, the k=2 split will have the French villages in one and the German villages in the other".

    Not necessarily. France especially is a very heterogeneous country. Depending on which villages you pick, it's likely that one cluster would be made up of some French and all German villages, while the other would include the remaining French villages, probably from the South.

    "But, the k=14 split will put almost every village in its own cluster".

    More or less. It also depends on what algorithm you use and how inbred or outbred the villagers are. This kind of result, 14 individual village clusters, is more likely to happen between mountain valleys than between better-connected villages, where overlap is much more likely.

    But you will get 14 clusters in any case and, as long as the various populations have some personality of their own, it's likely that each will make a distinct cluster more or less perfectly.

  9. "You could get seven clusters from seven villages in central Germany, they'd just be shallow ones."

    Andrew, glad to see someone else using this argument. ;) Great short explanation of Fst etc., BTW.

    "Everything else being equal, if all clusters had identical "strength", then the Mesoamerican-Andean cluster would have shown up from k=2 or k=3 (just because of numbers). It did not happen, so I consider this cluster to be a likely artifact"

    Having traveled to the region, and knowing quite a number of people/friends from it, my guess is that this cluster is very real, even if it appears rather late. The late (meaning: high k) appearance may just reflect the significant diversity in (geologically) two continents, which maybe were only settled 20,000 - 15,000 years ago, but with some settlers being ~40,000 ya Beringians, others ~15,000 ya NE Asians, and others relatively recent Inuit.

    For comparison, no one would doubt that there are one or two rather ancient Scandinavian contributions in Europe, or a distinct Central vs. Mediterranean one, or a Baltic one, or a Slavic one, or a Levantine/SW Asian one, or a West Asian/ Caucasian one, etc. - all also less than 40,000 years old, in a much, much smaller space.

  10. "For comparison, no one would doubt"...

    Sorry, but put the way you put it, I have to doubt all those claims, as a matter of principle and because I fail to see how these are real in the data I know (at least in most cases).

    You seem too "polluted" by Dienekes' biased readings, especially those using global instead of West Eurasian populations.

    I do not know how "real" the single Mesoamerican-Andean cluster is, but I would say it's a weak one, not a strong cluster, even if it is visually attractive and especially seductive for people with "recentist" ideas about demic coalescence.

  11. Maju:
    You seem too "polluted" by Dienekes' biased readings, especially those using global instead of West Eurasian populations.
    ***
    What does Dienekes say?
    A.
    ***

  12. Read his blog(s), but basically I think that he likes to emphasize (even to the point of artificiality) the N-S European dichotomy while obscuring (often intentionally) the clear W-E differences.

    His own work shows a clearly distinct SW European and NW African cluster (red, number 8 at K=15), which is not apparent in SE Europe at all. He will never comment on that but rather brush it under the carpet, because for him Romanians and Spaniards must be close because they both speak Latin dialects (and stuff like that).

    This is the kind of distortion he likes to induce because of his own prejudices. That's why I disagree with him (among other reasons, including his right-wing, probably fascist and racist, political stance).

  13. Maju:
    Read his blog(s), but basically I think that he likes to emphasize (even to the point of artificiality) the N-S European dichotomy while obscuring (often intentionally) the clear W-E differences.

    His own work shows a clearly distinct SW European and NW African cluster (red, number 8 at K=15), which is not apparent in SE Europe at all. He will never comment on that but rather brush it under the carpet, because for him Romanians and Spaniards must be close because they both speak Latin dialects (and stuff like that).

    This is the kind of distortion he likes to induce because of his own prejudices. That's why I disagree with him (among other reasons, including his right-wing, probably fascist and racist, political stance).
    ***

    Yes, people who think they discuss anthropology (=mankind-ology) and then only talk about genes must have some kind of internal "problem".
    Mankind is about culture, not genes.
    A.

  14. "Yes people who think they discuss anthropology (=mankind-ology) and then only talk about genes must have some kind of internal "problem"".

    I could well feel offended by that remark too, because I do look at genes a lot and these are important in understanding human prehistory. They are like fossils in our cells: they provide very valuable information about ourselves.

    Also I do not like the word "mankind" because it only includes half of Humankind, you are surely forgetting about "womankind"...

    "Mankind is about culture not genes".

    Humankind is about all of it. There is no blank slate and genes do matter indeed. However, there is also a most important part that is environmental (experience-based, epigenetic, cultural...). In general I do favor a greater weight for this part but, "to Caesar what is Caesar's", as Christians say; some stuff is indeed genetic.

    And in all this what matters is not so much whether genes do something or just sit there idle (they surely do not) but where those genes come from, what prehistoric (and in some cases historical) connections they evidence, etc.

    This cannot be attacked just with a priori anti-genetic ideas that do not correspond with reality. Biology does matter, even if it's complex and interactive and often hard to grasp.

    And this is not the kind of thing that has led me to hold different opinions from Dienekes: what matters is how you interpret the data, and whether you are slanting and deforming them to fit your preconceptions and agenda.

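
The Fst idea explained in comment 7 can be illustrated with a minimal sketch, assuming biallelic loci, two equally weighted populations and the simple heterozygosity-based definition Fst = (Ht - Hs) / Ht, combined over loci as a ratio of sums. The allele frequencies below are invented, the function name `fst` is hypothetical, and real studies use estimators that correct for sample size.

```python
def fst(freqs_a, freqs_b):
    """Heterozygosity-based Fst between two equally weighted populations.

    freqs_a and freqs_b hold the frequency of one allele at each biallelic
    locus. Loci are combined as a ratio of sums.
    """
    num = 0.0
    den = 0.0
    for p1, p2 in zip(freqs_a, freqs_b):
        p_bar = (p1 + p2) / 2
        # Expected heterozygosity if both populations were pooled.
        h_total = 2 * p_bar * (1 - p_bar)
        # Mean expected heterozygosity within each population.
        h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        num += h_total - h_within
        den += h_total
    return num / den if den else 0.0


# Invented frequencies: two neighbouring villages that barely differ...
village_a = [0.50, 0.30, 0.70]
village_b = [0.52, 0.31, 0.68]
# ...and two long-separated populations that differ a lot.
remote_a = [0.90, 0.20, 0.60]
remote_b = [0.30, 0.70, 0.10]

fst_close = fst(village_a, village_b)
fst_far = fst(remote_a, remote_b)
print(f"neighbouring villages: {fst_close:.5f}")
print(f"remote populations:    {fst_far:.5f}")
```

The two village samples give a value several orders of magnitude smaller than the remote pair, yet a clustering run would still report two clusters in either case, which is why the k plots need a distance measure alongside them.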
