For what they were... we are: Autosomal DNA of NE Europeans

March 12, 2013

Autosomal DNA of NE Europeans

A paper of some interest is available these days at the Public Library of Science:

Andrey V. Khrunin et al., A Genome-Wide Analysis of Populations from European Russia Reveals a New Pole of Genetic Diversity in Northern Europe. PLoS ONE 2013. Open access → LINK [doi:10.1371/journal.pone.0058552]

Abstract

Several studies examined the fine-scale structure of human genetic variation in Europe. However, the European sets analyzed represent mainly northern, western, central, and southern Europe. Here, we report an analysis of approximately 166,000 single nucleotide polymorphisms in populations from eastern (northeastern) Europe: four Russian populations from European Russia, and three populations from the northernmost Finno-Ugric ethnicities (Veps and two contrast groups of Komi people). These were compared with several reference European samples, including Finns, Estonians, Latvians, Poles, Czechs, Germans, and Italians. The results obtained demonstrated genetic heterogeneity of populations living in the region studied. Russians from the central part of European Russia (Tver, Murom, and Kursk) exhibited similarities with populations from central–eastern Europe, and were distant from Russian sample from the northern Russia (Mezen district, Archangelsk region). Komi samples, especially Izhemski Komi, were significantly different from all other populations studied. These can be considered as a second pole of genetic diversity in northern Europe (in addition to the pole, occupied by Finns), as they had a distinct ancestry component. Russians from Mezen and the Finnic-speaking Veps were positioned between the two poles, but differed from each other in the proportions of Komi and Finnic ancestries. In general, our data provides a more complete genetic map of Europe accounting for the diversity in its most eastern (northeastern) populations.

I'm not too sure how to analyze this paper because, on one hand, there's some missing data, especially regarding the ADMIXTURE analysis (FST distances between components), and for some reason the Chinese control was totally removed from further analysis as well, making it very difficult, for example, to estimate if and how much East Asian admixture exists in these NE European populations. On the other hand, nearly all Finno-Ugrian peoples (as well as the Mezen Russians, who are genetically Finno-Ugrian too) are highly endogamous, which almost invariably distorts ADMIXTURE analysis by creating many localized components of dubious relevance.

The ADMIXTURE analysis was presented (quite incorrectly, as often happens) only for values under the cross-validation optimum, which in this case is at least known: K=6 and K=7 (very similar lowest values):

Figure 4. ADMIXTURE clustering of individuals from the populations studied.

Results obtained at K = 2 to 5 are shown. Each individual is represented by a vertical line composed of colored segments, in which each segment represents the proportion of an individual’s ancestry derived from one of the K ancestral populations. Individuals are grouped by population (labeled on the bottom of the graph). In addition to populations used in principal component analysis, a Chinese sample (Han Chinese from Beijing [22]) was included. The results at K = 5 are also accompanied by average ancestral proportions by population (*). Population designations are the same as in Figure 1.

[From fig. 1:] Key: Komi_Izh – Izhemski Komi, Komi_Pr – Priluzski Komi, Rus_Tv – Russians from Tver, Rus_Ku – Russians from Kursk, Rus_Mu – Russians from Murom, Rus_Me – Russians from Mezen, Finns_He – Finns from Helsinki, Finns_Ku – Finns from Kuusamo, Rus_HGDP – Russians from the Human Genome Diversity Panel.

At least in the supplemental materials we find the missing K-values:

Figure S4. Results of ADMIXTURE clustering at K = 6 to 8. The number of populations and their order are the same as at Figure 4.
[Note: per fig. S5, the optimal K-values are K=6 and K=7]
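
As a rough illustration of how a cross-validation optimum with near-ties can be read off, here is a minimal Python sketch. The error values are made-up placeholders, not the numbers behind fig. S5, and `near_optimal_ks` with its `tolerance` parameter is a hypothetical helper, not part of ADMIXTURE.

```python
def near_optimal_ks(cv_errors, tolerance=0.0005):
    """Return (best_k, ks_close_to_best) from a {K: cv_error} mapping.

    tolerance is an arbitrary, assumed threshold for calling two
    cross-validation errors "very similar".
    """
    best_k = min(cv_errors, key=cv_errors.get)
    best_err = cv_errors[best_k]
    close = sorted(k for k, err in cv_errors.items()
                   if err - best_err <= tolerance)
    return best_k, close


# Placeholder CV errors, for illustration only (NOT taken from the paper).
example_cv = {2: 0.560, 3: 0.557, 4: 0.556, 5: 0.555,
              6: 0.5520, 7: 0.5521, 8: 0.554}

best_k, close_ks = near_optimal_ks(example_cv)
print(best_k, close_ks)
```

With these placeholder values the minimum falls at K=6, with K=7 close enough to count as a tie, which is the kind of situation described above.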

Something that may catch your attention is the relatively high value of the Chinese component in Italians (Tuscans, judging by the locator map). This anomalous effect (unheard of in other studies) may well arise because a West Asian control is clearly missing and Italians have relatively high West Asian affinity, being otherwise relatively isolated within this Northern European sample.

Notice also how every single endogamous Finno-Ugric population forms its own cluster: a generic Finno-Ugrian component at K=3 (red), a distinction between the Komi and the Finnic component at K=4 (red and purple), then at K=5 we get a mini-break with a more general North/South Europe distinction showing up (yellow and blue components), but at "optimal" K=6 and K=7, we still see other localized components forming: first Komi_Pr (brown) and then the Vepsian one (grey). So out of seven "optimal" components (K=7), four are local, corresponding to highly endogamous populations.

But I'm getting a bit ahead of myself, admittedly. The endogamy index is analyzed through runs of homozygosity (ROH): nROH for the number of runs and cROH for their cumulative length:

Table 2. Summary of ROH statistics of 16 European populations.

We can see here that large and relatively cosmopolitan populations like Germans and Italians have low ROH values. Czechs and Central Russians come next, with Poles already showing a somewhat higher endogamy index. Latvians and Estonians are still relatively low, but Northern Finno-Ugrian peoples (including Mezen Russians) deviate a lot, with values (in the non-asterisk columns) that are at best almost double those of Estonians and, at worst, six times higher.
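
To make the ROH idea concrete, here is a deliberately simplified Python sketch. It assumes that nROH counts the runs and cROH sums their length, and it measures runs in consecutive homozygous SNPs; real tools work with physical lengths and tolerate a few heterozygous or missing calls. The function name, the `min_snps` threshold and both toy genotype strings are hypothetical.

```python
def roh_stats(genotypes, min_snps=5):
    """Count runs of homozygosity in one individual's genotype sequence.

    genotypes: list of 0/1/2 alternate-allele counts along a chromosome
               (1 = heterozygous, 0 or 2 = homozygous).
    Returns (number_of_runs, total_snps_inside_runs).
    """
    n_roh = 0
    total = 0
    run = 0
    # A trailing heterozygote acts as a sentinel that closes the last run.
    for g in list(genotypes) + [1]:
        if g in (0, 2):
            run += 1
        else:
            if run >= min_snps:
                n_roh += 1
                total += run
            run = 0
    return n_roh, total


# Toy data: frequent heterozygotes versus long homozygous stretches.
outbred = [0, 1, 2, 1, 0, 0, 1, 2, 1, 0, 1, 2]
inbred = [0, 0, 2, 2, 0, 0, 1, 2, 2, 2, 0, 0, 0]

print(roh_stats(outbred))  # (0, 0)
print(roh_stats(inbred))   # (2, 12)
```

The more an individual's parents share recent ancestors, the more and longer such runs are expected, which is why both statistics rise together in endogamous populations.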

So in this particular case, and quite exceptionally, I'd say that K=2 or K=3 are the most realistic K-values, in spite of scoring quite poorly in the cross-validation test. Of course, the N-S European distinction shown at K=5 is also real and not caused by any "effect", but otherwise the clusters showing up correspond to extreme drift caused by isolation and endogamy, and therefore only tell us about that peculiarity of the European Far North.

K=2 is surely the most informative level for East Asian genetic influence, except for the already mentioned Italian anomaly (which may also affect Central Europeans to a lesser extent). However, because this study is so limited in this respect, I'd encourage the development of more informative studies, which could for example consider the FST distances between components, always informative, and/or use other population sampling strategies that better capture this aspect.

After all, this is a study focused on Russia, even if along the way it has also produced some valuable information for much of NE Europe.
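
For readers unfamiliar with FST, a minimal sketch of one common way to compute it between two components from their allele frequencies follows. It uses the basic heterozygosity form, FST = (HT - HS) / HT, combined across SNPs as a ratio of sums; this is an assumed, simplified estimator and not necessarily the one ADMIXTURE reports. The function name and the frequency vectors are hypothetical.

```python
def fst_two_groups(freqs_a, freqs_b):
    """Simple FST between two groups from per-SNP allele frequencies.

    HS is the mean within-group expected heterozygosity, HT the expected
    heterozygosity of the pooled (equal-weight) group.
    """
    num = 0.0
    den = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        hs = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2
        p_bar = (pa + pb) / 2
        ht = 2 * p_bar * (1 - p_bar)
        num += ht - hs
        den += ht
    return num / den if den > 0 else 0.0


# Hypothetical allele frequencies at four SNPs for three components.
comp_a = [0.10, 0.50, 0.80, 0.30]
comp_b = [0.12, 0.48, 0.78, 0.33]   # close to comp_a
comp_c = [0.70, 0.10, 0.20, 0.90]   # strongly differentiated

print(fst_two_groups(comp_a, comp_b))
print(fst_two_groups(comp_a, comp_c))
```

A small FST between two components suggests a recent split or a drift artifact of modest size, while a large one points to deep divergence, which is why these distances help to judge how meaningful each component is.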

Figure 3. Principal component analysis of the combined autosomal genotypic data of individuals from Russia and seven European countries (Finland, Estonia, Latvia, Poland, Czech Republic, Germany [5] and Italy [22]).
The first two PCs are shown. The color legend for the predefined population labels is indicated within the plot. Population designations are the same as in Figure 1.

Appendix: Finno-Ugrian peoples/languages map by Marting/Nug (anti-copyright):

35 comments:

eurologist, March 14, 2013 at 11:57 AM
As I have mentioned elsewhere, while drift is of course very important here, I don't think it alone can explain the PC results. The main reasons are:

(i) PC1 (or, more accurately, a component slightly slanted from top left to bottom right) also distinguishes Italians from Central Europeans (CE), and, e.g., Czechs from Balts. That surely is not due to drift, alone, but signifies a major South-North gradient that likely is related to the varying West Asian, Mediterranean, and Northern Mesolithic admixtures. In other words, Latvians are not (at all) heavily admixed Finns who forgot to speak that language, and Finns were already extreme in that respect "when they got started", in addition to having drifted.

(ii) Komi and Finns share some of this behavior with respect to Central Europeans. In other words, Komi started out somewhere where the Estonians are, in the 2-D PC plot -- not in CE.

(iii) Komi and Finns share a language subgroup and are currently neighbors. Their genetic distance, in addition to drift, may thus also be explained by different admixture histories before they (more recently) became close neighbors (again).

(iv) Unlike Finns, Komi live close to and intermarry with Mansi and Nenets. I strongly suspect that part of the second (top right to bottom left tilted) dimension in the PC diagram is admixture with N / NE Ural people (not necessarily recent, extant "Uralic" - but also those who have been displaced). There really hasn't been enough time since the Permic-Finnish split to explain the huge genetic distance between them, except considering a different admixture history.

(v) I think it would have been highly instructive to see the Nenets and Mansi on the PC plot - even if just projected onto it (i.e., without contributing to the components, proper).
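
A minimal sketch of what "projected onto it" could mean in practice, assuming the axis was computed from the reference panel only: the extra individual is centered with the reference means and multiplied by the fixed axis weights, so it gets coordinates without influencing the component. All names and numbers below are hypothetical.

```python
def project_onto_axis(sample, reference_means, axis_weights):
    """Project one individual's 0/1/2 genotypes onto a precomputed PC axis.

    reference_means and axis_weights are assumed to come from a PCA run on
    the reference panel alone, so the projected individual does not shape
    the axis itself.
    """
    return sum((g - m) * w
               for g, m, w in zip(sample, reference_means, axis_weights))


# Hypothetical per-SNP means and axis weights from a reference-only PCA.
reference_means = [1.0, 0.5, 1.5, 0.25]
axis_weights = [0.5, -0.5, 0.5, -0.5]

centroid_like = [1.0, 0.5, 1.5, 0.25]   # identical to the reference means
outlier = [2, 0, 2, 0]

print(project_onto_axis(centroid_like, reference_means, axis_weights))  # 0.0
print(project_onto_axis(outlier, reference_means, axis_weights))        # 1.125
```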
eurologist, March 15, 2013 at 11:56 AM
Yes, I meant Finnish speaking people, not just Finns in Finland - most importantly, including those of Karelia and the Veps. "Neighbors" is of course, relatively speaking.

My main gripe is not at all directed at you, but at the notion that drift can explain the first few PC components in a sufficiently large study. We don't see that with Sardinians, Irish, Orcadians, or Jews, for example - and for a reason.

Drift is of course important, also for PC2, here - but when properly done, the matrix elements of the analysis are weighted with a function of the allele frequency (e.g., EIGENSOFT; Patterson, Price, and Reich, 2006). This strongly emphasizes rare SNPs over SNPs that are mostly missing in a particular, small group, only (i.e., due to drift). You can see that in (the slightly tilted) PC1, because otherwise, as I mentioned, it would not be the main S-N differentiator. SNPs that Finns lost due to subsequent drift actually dilute the S-N differentiation (moves Northern and Central Europeans closer to Italians), which, however, is obviously not the case.
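
The allele-frequency weighting mentioned here can be sketched as follows, assuming the usual form in which each SNP column is centered on its mean and divided by sqrt(p(1-p)), with p the estimated allele frequency. This only illustrates the scaling; it does not by itself settle how much of a given PC is due to drift. The function name and the toy columns are hypothetical.

```python
import math


def normalize_snp(column):
    """Center one SNP column (0/1/2 allele counts) and scale by sqrt(p(1-p))."""
    mean = sum(column) / len(column)
    p = mean / 2.0                      # estimated allele frequency
    if p <= 0.0 or p >= 1.0:
        return [0.0] * len(column)      # monomorphic SNP: no information
    scale = math.sqrt(p * (1.0 - p))
    return [(g - mean) / scale for g in column]


# Toy columns for eight individuals: a common SNP and a rare one.
common = [0, 1, 2, 1, 0, 2, 1, 1]
rare = [0, 0, 0, 0, 0, 0, 0, 1]

common_scaled = normalize_snp(common)
rare_scaled = normalize_snp(rare)

print(max(common_scaled))  # a homozygote at the common SNP
print(max(rare_scaled))    # the single carrier of the rare allele
```

In this toy case the single carrier of the rare allele ends up with a larger normalized value than a homozygote at the common SNP, which is the sense in which the scaling gives extra weight to rare variants.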

At K=4, it looks as though Central Europeans have a good chunk of admixture with what is modal in Komi and Finns, respectively. At K=5 it becomes clear that this is in fact Baltic admixture - which (i) makes much more sense, and (ii) are known, true SNP signatures - not just the lack thereof due to drift. So, up to Estonians the (tilted) PC1 is by a vast majority due to characteristic SNPs - not the lack thereof, and thus not due to drift away from a large original population.

I am quite confident that similarly, much of the genetic distance of Komi is a set of unique SNPs, rather than a lack of them - in this case, incorporated via admixture. As you mentioned, Nenets (and surely also Mansi) are heavily removed from extant Europeans, and so likely also were the original people who were displaced/ incorporated during either-side Uralic expansion.

The main question I posed is: is this simply a N Asian signature, or perhaps an ancient NE Uralic element that is distinct, and thus of great importance in our understanding of Mesolithic and Paleolithic population structure?
Kristiina, March 15, 2013 at 9:01 PM
As for our previous discussion on the origin of western Uralic N1c, am I right that this study does not support the mixture of these people with the mainstream Chinese but rather a more Central Asian origin of N1c, as in the centroid map found here at http://www.pnas.org/content/suppl/2009/11/16/0910803106.DCSupplemental/pnas.200914264SI.pdf?
Kristiina, March 16, 2013 at 10:02 AM
On the paternal side, Komi are a mixture of Uralic N1c (29%) and N1b1 (18%), but they possess a high amount of steppe haplogroups R1a (33%), R1b (16%) and a little bit of I (5%). On the maternal side, they have very little East Asian mtDNA: Z (1.6%), D (1.6%) and A (1.6%). The Komi Zyryan (Northern branch) western Eurasian haplogroups are: H (33.9%), U4 (24.2%), U5 (9.7%), J (9.7%) and T (16.1%; of which T1 3.2%) and, in addition, they have a little bit of K, U8 and W. For me, this combination resembles the Sargat culture in Western Siberia http://eurogenes.blogspot.fi/2012/07/ancient-mtdna-from-western-siberia-aka.html , but as Komi have, compared to ancient Sargat people, far fewer eastern mtDNA haplogroups (A, C, Z), they could represent a more western culture in the western Ural area. Dienekes has also commented on the Sargat culture http://dienekes.blogspot.fi/2010/11/ancient-mtdna-from-sargat-culture.html. In the study he comments on in his blog, this ancient mtDNA is connected with the Ugric branch of Uralic, i.e. Khanty, Mansi and Ugric people. However, I would say that the Komi people are a mixture of Uralic and Kurgan people.

As for the Finns, they must have a different history. On the paternal side, the Finns have 61% of N1c, 23% of I1, 10% of R1a, 2% of R1b, 2% of K and 1% of E3b (according to the old nomenclature). As far as I have understood, both N1c and I1 have, for the most part, developed locally, but R1a has diverse origins http://www.familytreedna.com/public/r1a/default.aspx?section=results. On the maternal side, the Finnish haplogroups are X 1.3%, H 13.9%, H1 12.6%, H2 5%, H3 2.5%, H5 5.1%, I 3.8%, J1 2.6%, T 2.5%, K 2.5%, U4 2.5%, U5a 5.1%, U5b 14%, V 5.1%, W 10.1% and Z 2.5%. These haplogroups are not derived from Swedish subclades http://zincavage.org/Lappalainen2008.pdf. Hg H and U5a are surely old in Finland, as they have been detected in an old burial site in North Western Russia already 7500 years ago (H was possibly Eastern H2a). If we want to see a Bronze Age or later connection between Finnish and Baltic haplogroups, good candidates are I1, J (including J1b), T2 and W. In Finland there are quite high frequencies of both mtDNA I and W. As the frequency of X is higher in Finland than in Sweden or the Baltic area, I would say that it has arrived in Finland from Russia, having looked at the haplogroup maps here http://www.sciencedirect.com/science/article/pii/S0002929711005453.

In my opinion, the Finns do not have a Scandinavian origin maternally or paternally and it would also be somewhat unnatural, since there is a sea between Finland and Sweden, but the Eastern route is open. Compared to Finns, the Swedes have a lot of T (connected to R1a?) and K (connected to ydna R1b?).
Kristiina, March 16, 2013 at 10:23 AM
With this last remark I wanted to challenge the comment that Finns are intruders who intermarried with native Scandinavians!
Kristiina, March 16, 2013 at 3:03 PM
Okay, Finns proper may have a small amount of recent Swedish admixture, but it is probably Finland Swedes who have more. The haplogroup frequencies I listed above, however, represent the whole of Finland and not just Helsinki Finns!

Anyway, I do not know how this pool of Helsinki Finns was gathered in this Khrunin et al. study, but if there is a bigger share of Finland Swedes in this Helsinki component, then of course the difference between Kuusamo Finns and Finns proper is smaller than in this study.
Kristiina, March 16, 2013 at 3:10 PM
Eurologist, you say that half of the Helsinki Finns are closer to Swedes (or Estonians) than they are to extreme Finns. Where do you get that, as I can't find Swedes in this research?
Kristiina, March 16, 2013 at 3:45 PM
Okay, it is the Nelis et al. study! It seems that Swedes, Estonians and Russians are on the same plot while Helsinki Finns and Lithuanians are on the same plot and Kuusamo Finns stand apart. During the Bronze Age (and later?) there were close contacts between Southern Finland and the Baltic area and there was surely admixture. Perhaps there were also very close contacts between Russians, Estonians and Swedes, and these contacts were not unidirectional: initially there may have been more contacts and admixture from East to West and later on more from West to East.
Kristiina, March 16, 2013 at 5:20 PM
The biggest haplogroups in Kainuu, Northern Finland are the following: H 37%, V 9%, U 36%, W 10% and M 6% (in the whole country: H 39.3%, V 5.1%, U 27.9%, W 10.1%, M 2.5%). Kainuu Finns seem to be admixed with Saami people, as Finnish Saami have 37.7% of V and 15.9% of Z and D5. It is noteworthy that there is no difference in the amount of W and, in fact, the share of the older haplogroup U is bigger in Kainuu than in the South. On the other hand, the share of H is bigger in the South, as is expected, since the frequencies of H are 41.1% in Estonia, 35.2% in Latvia and 45.6% in Sweden (according to Lappalainen et al.).

Kristiina, March 17, 2013 at 3:37 PM
Eastern haplogroups, D and Z1a, were found at a 3,500 year old burial site in the extreme North (Bol'shoy Oleni Ostrov). U4a1, U5a and U5a1, T, C and C5 were also found. I would say that Saami people combine recent gene flow from the Eastern Arctic area with an old Northern European population (mtDNA U5b and Y-DNA I1, plus Y-DNA N1c). They lack all the Neolithic or Southern haplogroups that Finns have, but they have a lot of mtDNA V, and I do not quite understand the route of this haplogroup.
Kristiina, March 17, 2013 at 3:38 PM
As haplogroup H is very important around the Baltic Sea, I took a careful look at the median-joining networks found here:
Fig. 3 http://mbe.oxfordjournals.org/content/21/11/2012.long
Fig. 4 http://onlinelibrary.wiley.com/doi/10.1111/j.1469-1809.2007.00429.x/full

The root of H* seems well anchored in Carelia. The route of H2 (age estimate 11,000) to Finland seems to go through Eastern Slavs and the entrance is again through Carelia. It seems instead that Western H3 (age estimate 16,000) arrived in Finland through the Baltic region. Also the route of H5 seems to be Turkey - Eastern Slavs – Finland. Haplogroup H1 (age estimate 23,800) seems to be well anchored in Carelia, but looking at the map, it seems that it might have arrived in Finland from the East (even from Volga-Ural area where it has a high frequency).

It is highly interesting that the first study says that “in contrast to that found in Europeans, sub-Hgs H6 and H8 among Central Asian/Altaian populations are characterized by distinctly divergent haplotypes. This finding may reflect a long-time separation of Asian and European H6 and H8 mtDNA pools and/or an earlier expansion of H6 in the eastern part of its present range. Indeed, the coalescence age of H6 in Central Asians is very deep—40,400 years.”

As for Sweden, H2 may have arrived to Sweden through Poland - Germany (these countries are not included in the study, in the map there is only a Lithuanian circle). Based on Fig 4, you would say that H3 arrived to Sweden through Latvia, but the route through Germany is probably more plausible, as the haplogroup came from Iberia. H5 is quite rare in Sweden, and based on the map you would say that it came to Sweden from the Baltic region (Latvia-Estonia). H1 arrived in Sweden probably from the South, and also H1a and H1b, and in fact, these two sub-clusters are not found in Finland! In the median-joining network, I really was not able to find any case where a H-haplotype arrived in Finland from Sweden.

As for the old U5a, I am not sure how to read the map. I am wondering if U5a is derived from U5b and if U5b has come to Finland through the Baltic region (Latvia). Anyway, the Saami motif U5b1b1 seems to have taken the Eastern route to Lapland through Carelia.

I studied carefully also the instances of R1a in Finland (http://www.familytreedna.com/public/r1a/default.aspx?section=results) and noticed that for the most part, it arrived in Finland from the East, possibly through Carelia. This applies for Central European R1a1a1g1. This applies also for Carpathian-Russian R1a1a1b1a2. R1a1a1b1a2 - Central Eastern European branch, Southern Baltic type is found also in Northern Finland. R1a1a1b1a2d East Slavic I is found only in Carelia. The only Scandinavian R1a cluster is R1a1a1b1a3 - Scandinavian branch and it is also found in Russia.

According to Wikipedia, “the Corded Ware culture (in Middle Europe ca. 2900–2450/2350 cal. BC), alternatively characterized as the Battle Axe culture or Single Grave culture, is an enormous European archaeological horizon that begins in the late Neolithic (Stone Age), flourishes through the Copper Age and culminates in the early Bronze Age. Corded Ware ceramic forms in single graves develop earlier in Poland than in western and southern Central Europe, already 3000 BC.” The Corded Ware culture arrived in Southern and South Western Finland from North Western Poland and the Baltic region in c. 2500 BC. By looking at the haplogroup routes above, you would say that at least part of Y-DNA R1a must have arrived in Finland during the Corded Ware period through the Baltic area. I suggest that mtDNA W also arrived from Poland during that period, as its highest frequencies in Europe are in Poland, Latvia and Lithuania. The majority of mtDNA H3, I and T might also have taken that route to Finland.
Kristiina, March 17, 2013 at 6:05 PM
As for my understanding of the Saami roots, forget the "recent" geneflow from the Eastern Arctic area. If this Eastern geneflow (D and Z1a) is from Bol'shoy Oleni Ostrov, dated 3,500 uncal. yBP, it is not very recent! In this study of ancient Mtdna in North Western Russia, it is however pointed out that these Bol'shoy Oleni Ostrov people were not the ancestors of modern Saami people.
Kristiina, March 17, 2013 at 8:17 PM
I completely forgot the Saami R1a! Finnish Saami have only 4.5% of R1a, but Swedish Saami have 20% and Kola Saami 21.7%. Swedish average is 17%. Northern Norwegians have 27.1% and the Southernmost Norwegians only 13.2%, but I don't know their type and I have no idea whether they form a cluster of their own.

Now I am wondering if Mtdna V that has extreme frequencies among Saami is somehow connected to an old type of R1a. According to Family Tree search (https://my.familytreedna.com/snp-map.aspx), there is an old Pan-European R1a1a M198 that is found also in Northern Norway. For example, Poles have a lot of pre-V Mtdna.

Then, I noticed that the only R1a sub-cluster that might be old enough to be connected with Corded Ware culture in Finland, is R1a1a1b1a2 - Central Eastern European branch, Southern Baltic type. According to Family Tree, it forms a special Finnish cluster.
France_LGC, April 20, 2013 at 8:57 AM
As regards Figure 3, I understand that we have the "Average Europeans" on the left in the positive values.
What surprises me is that Uralic people do not seem to add up to a real group. Logically, Komi people are closest to unmixed Finno-Ugric people, so it would seem that the Balto-Finnic people + "Arctic" pseudo-Russians are another population that was Uralicized.
Is this a possible scenario in light of the data?