Showing posts with label IBD. Show all posts

September 23, 2015

Negligible genetic flow in Slavic expansion to the Balkans

A new genetic study confirms what most of us already knew: that Southern Slavs don't show any significant signature of immigration from the core Slavic area north and NE of the Carpathian Mountains that can be attributed to the so-called Slavic migrations of the Dark Ages.

Alena Kushniarevich et al., Genetic Heritage of the Balto-Slavic Speaking Populations: A Synthesis of Autosomal, Mitochondrial and Y-Chromosomal Data. PLoS ONE 2015. Open access → LINK [doi:10.1371/journal.pone.0135820]

Abstract

The Slavic branch of the Balto-Slavic sub-family of Indo-European languages underwent rapid divergence as a result of the spatial expansion of its speakers from Central-East Europe, in early medieval times. This expansion–mainly to East Europe and the northern Balkans–resulted in the incorporation of genetic components from numerous autochthonous populations into the Slavic gene pools. Here, we characterize genetic variation in all extant ethnic groups speaking Balto-Slavic languages by analyzing mitochondrial DNA (n = 6,876), Y-chromosomes (n = 6,079) and genome-wide SNP profiles (n = 296), within the context of other European populations. We also reassess the phylogeny of Slavic languages within the Balto-Slavic branch of Indo-European. We find that genetic distances among Balto-Slavic populations, based on autosomal and Y-chromosomal loci, show a high correlation (0.9) both with each other and with geography, but a slightly lower correlation (0.7) with mitochondrial DNA and linguistic affiliation. The data suggest that genetic diversity of the present-day Slavs was predominantly shaped in situ, and we detect two different substrata: ‘central-east European’ for West and East Slavs, and ‘south-east European’ for South Slavs. A pattern of distribution of segments identical by descent between groups of East-West and South Slavs suggests shared ancestry or a modest gene flow between those two groups, which might derive from the historic spread of Slavic people.


This is most evident in the identity-by-descent (IBD) analysis:


Fig 4. Distribution of the average number of IBD segments between groups of East-West Slavs (a), South Slavs (b), and their respective geographic neighbors.
The x-axis indicates ten classes of IBD segment length (in cM); the y-axis indicates the average number of shared IBD segments per pair of individuals within each length class.





For those not acquainted: shorter segments (left) indicate older relatedness, by now heavily fragmented by repeated chromosome recombination, while longer segments (right) indicate more recent relatedness, which has had less time to be chopped into pieces.
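As a rough illustration of why length tracks age, here is a minimal sketch using a textbook approximation that is not taken from the paper: a segment inherited from a common ancestor g generations back has passed through about 2g meioses, so its expected length is roughly 100 / (2g) cM. The function name is hypothetical, and real data are messier (detection limits, demography, many possible ancestors per segment), so read it only as a guide to the trend.

```python
def expected_segment_cm(generations):
    """Approximate mean IBD segment length (cM) for a common ancestor
    `generations` back, under the simplified 100 / (2 * g) model (an assumption)."""
    meioses = 2 * generations  # one lineage down to each of the two descendants
    return 100.0 / meioses

# Older shared ancestry -> shorter expected segments.
for g in (5, 10, 25, 50):
    print(f"{g} generations back: ~{expected_segment_cm(g):.1f} cM")
```

The point is only the monotonic trend: the further back the shared ancestor, the shorter the surviving pieces tend to be.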



The authors explain:
The presence of two distinct genetic substrata in the genomes of East-West and South Slavs would imply cultural assimilation of indigenous populations by bearers of Slavic languages as a major mechanism of the spread of Slavic languages to the Balkan Peninsula. Yet, it is worthwhile to add here evidence from the analysis of IBD segments: the majority of Slavs from Central-East Europe (West and East) share as many IBD segments with the South Slavs in the Balkan Peninsula as they share with non-Slavic populations residing nowadays between Slavs (Fig 4A and 4B; Table G in S1 File). This even mode of IBD sharing might suggest shared ancestry/gene flow across the wide area and physical boundaries such as the Carpathian Mountains, including the present-day Finno-Ugric-speaking Hungarians, Romance-speaking Romanians and Turkic-speaking Gagauz. A slight peak at 2–3 cM in the distribution of shared IBD segments between East-West and South Slavs (Fig 4A and 4B) might hint at shared “Slavonic-time” ancestry, but this question requires further investigation.


Another figure of interest is surely the set of PCA and MDS plots for the three types of genetic markers:

Fig 2. Genetic structure of the Balto-Slavic populations within a European context according to the three genetic systems.
a) PC1vsPC3 plot based on autosomal SNPs (PC1 = 0.53; PC3 = 0.26); b) MDS based on NRY data (stress = 0.13); c) MDS based on mtDNA data (stress = 0.20). We focus on PC1vsPC3 because PC2 (S1 Fig) whilst differentiating the Volga region populations from the rest of Europeans had a low efficiency in detecting differences among the Balto-Slavic populations–the primary focus of this work.

In the mtDNA graph (c) it is hard to discern any pattern, as the various studied populations seem to form rings of eccentricity around the Balkans, probably because no Western Europeans are present in this particular plot.

However, in the autosomal (a) and Y-DNA (b) figures more defined patterns do emerge. Quite apparently, in all three graphs South Slavs appear as strictly Balkan.

More interesting is probably the relative position of Russian and Baltic speakers: the former show very notable diversity, almost representative of the whole East European region and again indicative of assimilation rather than replacement being the main driver of Russian ethnic expansion, at least in the North.

Baltic peoples appear intermediate between Russians and Finns (and overlapping Estonians) in the Y-DNA graph and somewhat extreme in the autosomal graph, something that comes as no surprise, as they seem the best preserved vessel of Eastern Paleoeuropeans. Curiously, a few Sorbian individuals also tend to that same extreme, which may well be a reason to increase interest in the study of this forgotten and neglected Slavic minority of Eastern Germany. Their Y-DNA is, also intriguingly, most similar to that of Swedes, rather than to that of their geographic neighbors or ethno-linguistic relatives.

Other Western Slavs form two clearly distinct sub-clusters, with Czechs being notably more Western than Poles and Slovaks, who tend to cluster with mainline Russians and Ukrainians instead. One can of course think that this Polish-Slovak-Ukrainian-Russian cluster could be the demic or genetic core of the Slavic cluster. However, I can't help but wonder how much of that clustering, as well as the differences shown by Czechs and Sorbs, should be attributed to older periods like those of Corded Ware Culture, Eastern Bell Beaker, etc.

May 15, 2014

North African and West Asian affinity in Europe

Ryan mentioned this rather interesting study from a year ago on IBD trans-Mediterranean affinities of Europeans.

Laura R. Botigué et al., Gene flow from North Africa contributes to differential human genetic diversity in southern Europe. PNAS 2013. Freely accessible by now → LINK [doi:10.1073/pnas.1306223110]

Abstract

Human genetic diversity in southern Europe is higher than in other regions of the continent. This difference has been attributed to postglacial expansions, the demic diffusion of agriculture from the Near East, and gene flow from Africa. Using SNP data from 2,099 individuals in 43 populations, we show that estimates of recent shared ancestry between Europe and Africa are substantially increased when gene flow from North Africans, rather than Sub-Saharan Africans, is considered. The gradient of North African ancestry accounts for previous observations of low levels of sharing with Sub-Saharan Africa and is independent of recent gene flow from the Near East. The source of genetic diversity in southern Europe has important biomedical implications; we find that most disease risk alleles from genome-wide association studies follow expected patterns of divergence between Europe and North Africa, with the principal exception of multiple sclerosis.

The most interesting section is surely the one titled Long identical-by-descent haplotypes. Here the authors use long IBD readings to estimate "recent" genetic flows. However they cannot discern the direction of these flows, i.e. flows from Europe to West Asia and North Africa will look exactly the same as the reverse ones.

From fig. 2:
Haplotype-based estimates of genetic sharing between Europe and Africa show a significant latitudinal gradient where the highest sharing is in the Iberian Peninsula. Genetic sharing between geographic regions is represented as a density map of WEA estimates for 30 European populations where haplotypes are IBD with (A) Sub-Saharan Africa [not shown here, as it is comparatively very small and clearly related to the other clines], (B) North Africa and (C) the Near East. The Canary Islands are shown in the Lower Left. (...)
It seems obvious that North African affinity is concentrated in Iberia, especially in the Western half, which is consistent with previous data, and that West Asian affinity is concentrated in SE Europe. The Iberian extension of the latter may be partly related to North African affinity (or not), as North Africans also have some clear West Asian affinity (as should be apparent from the Canarian inset).

I insist that the directionality of this affinity is not clear. In the case of the West Asian one, it seems plausible that most of it is caused by Neolithic inflows into Europe, but in the case of the North African affinity cline it probably represents bidirectional flows, because previous mtDNA and autosomal data also show quite apparent Iberian influx into North Africa, although the reverse flow is also real.

I find interesting the low levels of West Asian IBD affinity among Basques when compared with estimates of ancestry from early European farmers (EEF, partly West Asian themselves). In the map above Basques score just like other Atlantic populations in this element, yet in the Lazaridis study (see also here and here), Basques score quite high in EEF ancestry, much like the French, who in this graph are clearly higher in "recent" West Asian affinity. That makes me suspect that confounding factors may be at play and reinforces the notion of taking statistical analyses of autosomal DNA with some care and contrasting different approaches before reaching conclusions.

Also interesting is fig. 3, which pinpoints the specific North African (or Arabian) regions which may show the strongest IBD affinities for the various European regions:

Fig. 3.
Population-specific estimates of haplotype sharing (in centimorgans) between North Africa and Europe. Estimates of WEA (scaled by 100 for ease of presentation) between each European population (x axis) and each North African population and the Qatari are represented by colors and symbols. A substantial increase in haplotype sharing is detected between southwestern European populations and Maghrebi populations in comparison with the remainder of the European continent. The excess of sharing between the Near East and southern central and Eastern Europe is also noteworthy.


As expected, NW Africa is the most common source-or-destination of Iberian long IBD affinities, whereas Qatar or Egypt stand out most for Italy and SE Europe, which underlines that (North) African genetics in Europe arrived mostly via West Asia (Neolithic), with the notable Iberian exception.

April 23, 2014

Loschbour's IBD in modern Europeans is greatest among Danish but most direct among French

This is a most interesting issue I forgot to discuss when previously addressing the massively interesting Lazaridis et al. study on European ancestry based on ancient autosomal DNA (see here and here). 

Identity by descent (IBD) data show interesting differences between populations in section 18 of the Supplementary Information. While Stuttgart's (early farmer) ancestry is more or less the same by both measures (Sardinians first, followed by Slovaks and some other Balkan and Central European populations), there are important differences in the ancestry of Loschbour (Epipaleolithic hunter-gatherer from Luxembourg). While the Danes score highest in overall IBD block number (rough relatedness to Loschbour), it is the French who score highest in IBD length (indicating a more direct relatedness, even if in smaller amounts).



The difference between the French and the Danes is quite significant, I believe, and seems to suggest that Loschbour's relatives had a direct impact on modern French genetics, while the impact of Loschbour as such on other populations should be considered more indirect (i.e. via other hunter-gatherer populations).

This implies that there was some important diversity among the hunter-gatherer groups that influenced modern European genetics and that Loschbour must be considered a mere generic proxy. Possibly, if Motala or La Braña had been used as reference instead, we would get some important differences in the results, as would no doubt be the case if Balkan or Eastern European hunter-gatherers were thrown into the equation.

You may have noticed that some notable samples are unmarked in the graphs; that's because they are colonial populations such as Zimbabwean or North American Whites, whose exact ancestry is not easy to track. The green and red texts are my illustrative additions.

While not marked, I also find it notable and rather perplexing that Lebanon shows up as the fourth non-colonial population most related to Loschbour by IBD length, after Scotland but before Ukraine, the Netherlands and Sweden.

In any case you can parse the data for the 10 most notable samples of each measure in the supplemental material, chapter 18.

Referenced study:

Iosif Lazaridis, Nick Patterson, Alissa Mittnik, et al., Ancient human genomes suggest three ancestral populations for present-day Europeans. bioRxiv 2013 (preprint). Freely accessible → LINK [doi:10.1101/001552]

June 22, 2013

The least homogeneous European "populations" are Italians and French

This comes from a recent IBD study on Europe:

Peter Ralph & Graham Coop, The Geography of Recent Genetic Ancestry across Europe. PLoS Biology, 2013. Open access → LINK [doi:10.1371/journal.pbio.1001555]
Abstract

The recent genealogical history of human populations is a complex mosaic formed by individual migration, large-scale population movements, and other demographic events. Population genomics datasets can provide a window into this recent history, as rare traces of recent shared genetic ancestry are detectable due to long segments of shared genomic material. We make use of genomic data for 2,257 Europeans (in the Population Reference Sample [POPRES] dataset) to conduct one of the first surveys of recent genealogical ancestry over the past 3,000 years at a continental scale. We detected 1.9 million shared long genomic segments, and used the lengths of these to infer the distribution of shared ancestors across time and geography. We find that a pair of modern Europeans living in neighboring populations share around 2–12 genetic common ancestors from the last 1,500 years, and upwards of 100 genetic ancestors from the previous 1,000 years. These numbers drop off exponentially with geographic distance, but since these genetic ancestors are a tiny fraction of common genealogical ancestors, individuals from opposite ends of Europe are still expected to share millions of common genealogical ancestors over the last 1,000 years. There is also substantial regional variation in the number of shared genetic ancestors. For example, there are especially high numbers of common ancestors shared between many eastern populations that date roughly to the migration period (which includes the Slavic and Hunnic expansions into that region). Some of the lowest levels of common ancestry are seen in the Italian and Iberian peninsulas, which may indicate different effects of historical population expansions in these areas and/or more stably structured populations. Population genomic datasets have considerable power to uncover recent demographic history, and will allow a much fuller picture of the close genealogical kinship of individuals across the world.



Most interesting in my understanding is table 1 (right), which describes the IBD relation of the sampled populations within themselves and with other Europeans.

From this table it seems very apparent that Italians and French are not homogeneous at all and therefore, in my opinion, should not be treated as single populations in genetic studies but broken down at least somewhat by region (with the optimal size of those regions yet to be determined).

The degree of internal homogeneity of the samples (only n=5 or greater) can be simplified as follows:
  • Very low (<1): Italy, France.
  • Quite low (1-1.4): Germany, UK, Belgium, England, Austria, French-Swiss.
  • Somewhat low (1.5-1.9): Spain, German-Swiss, Greece, Portugal, Netherlands, Hungary.
  • Somewhat high (2-2.9): Czech R., Romania, Scotland, Ireland, Serbia, Croatia.
  • Quite high (3-3.9): Sweden, Poland
  • Very high (4-5): Bosnia, Russia*
  • Extremely high (>10): Albania

Notes: 
  • I ignored strangely labeled samples like "Switzerland" and "Yugoslavia", which seem to mean actually "other" within these labels.  I retained the "United Kingdom" category for its large sample size, much larger than its obvious parts.
  • The level of relatedness of Russians may be exaggerated by the small sample: n=6, still above my cautionary threshold. 
  • I suspect that the extreme disparity of sample sizes may influence the results to some extent.

Eastern Europeans seem much more strongly related with others, especially other Eastern Europeans, than Western ones, while NW Europeans are more related with other groups (usually at regional level) than SW ones. In fact the Italian and Iberian peninsulas show very low levels of "recent" relatedness with other populations, which is a bit perplexing, considering their non-negligible roles in Medieval and Modern European history. I guess that this may be partly caused by geographic barriers (mountains) and also by these areas having had large populations since Antiquity or before.

Figure 3. Geographic decay of recent relatedness.
In all figures, colors give categories based on the regional groupings of Table 1. (A–F) The area of the circle located on a particular population is proportional to the mean number of IBD blocks of length at least 1 cM shared between random individuals chosen from that population and the population named in the label (also marked with a star). Both regional variation of overall IBD rates and gradual geographic decay are apparent. (G–I) Mean number of IBD blocks of lengths 1–3 cM (oldest), 3–5 cM, and >5 cM (youngest), respectively, shared by a pair of individuals across all pairs of populations; the area of the point is proportional to sample size (number of distinct pairs), capped at a reasonable value; and lines show an exponential decay fit to each category (using a Poisson GLM weighted by sample size). Comparisons with no shared IBD are used in the fit but not shown in the figure (due to the log scale). “E–E,” “N–N,” and “W–W” denote any two populations both in the E, N, or W grouping, respectively; “TC-any” denotes any population paired with Turkey or Cyprus; “I-(I,E,N,W)” denotes Italy, Spain, or Portugal paired with any population except Turkey or Cyprus; and “between E,N,W” denotes the remaining pairs (when both populations are in E, N, or W, but the two are in different groups). The exponential fit for the N–N points is not shown due to the very small sample size. See Figure S8 for an SVG version of these plots where it is possible to identify individual points.

We can also see in the above figure (bottom) how most of the relatedness, especially over longer distances, belongs to the oldest class (1-3 cM).

The authors suggest that low homogeneity within some of these groupings is influenced by regional variation, which makes good sense to me. This they illustrate with the examples of Italy and Great Britain:

Figure 2. Substructure in (A) Italian and (B) U.K. samples.
The leftmost plots of (A) show histograms of the numbers of IBD blocks that each Italian sample shares with any French-speaking Swiss (top) and anyone from the United Kingdom (bottom), overlaid with the expected distribution (Poisson) if there was no dependence between blocks. Next is shown a scatterplot of numbers of blocks shared with French-speaking Swiss and U.K. samples, for all samples from France, Italy, Greece, Turkey, and Cyprus. We see that the numbers of recent ancestors each Italian shares with the French-speaking Swiss and with the United Kingdom are both bimodal, and that these two are positively correlated, ranging continuously between values typical for Turkey/Cyprus and for France. Figure (B) is similar, showing that the substructure within the United Kingdom is part of a continuous trend ranging from Germany to Ireland. The outliers visible in the scatterplot of Figure 2B are easily explained as individuals with immigrant recent ancestors—the three outlying U.K. individuals in the lower left share many more blocks with Italians than all other U.K. samples, and the individual labeled “SK” is a clear outlier for the number of blocks shared with the Slovakian sample.

In the UK, there is a negative correlation between blocks shared with Ireland and those shared with Germany, which seems to imply a dual origin of Britons.


Age estimates (double them?):

The authors also estimate ages; however, it seems obvious from their own data that the results should be multiplied by 2.2 or something like that to make good sense:

Figure 4. Estimated average number of most recent genetic common ancestors per generation back through time.
Estimated average number of most recent genetic common ancestors per generation back through time shared by (A) pairs of individuals from “the Balkans” (former Yugoslavia, Bulgaria, Romania, Croatia, Bosnia, Montenegro, Macedonia, Serbia, and Slovenia, excluding Albanian speakers) and shared by one individual from the Balkans with one individual from (B) Albanian-speaking populations, (C) Italy, or (D) France. The black distribution is the maximum likelihood fit; shown in red is smoothest solution that still fits the data, as described in the Materials and Methods. (E) shows the observed IBD length distribution for pairs of individuals from the Balkans (red curve), along with the distribution predicted by the smooth (red) distribution in (A), as a stacked area plot partitioned by time period in which the common ancestor lived. The partitions with significant contribution are labeled on the left vertical axis (in generations ago), and the legend in (J) gives the same partitions, in years ago; the vertical scale is given on the right vertical axis. The second column of figures (F–J) is similar, except that comparisons are relative to samples from the United Kingdom.

I say that mainly because the shared ancestry between the Balkans and both Italy and France is dated here to around 3000 or 3500 years ago, when it would fit much better c. 7500 years ago (as much as 8000 BP for some parts of Italy), when the Neolithic expansion was ongoing. There is no particular reason why the Balkans would be related to France and Italy c. 3000 years ago specifically, unless one believes in undocumented massive Mycenaean migrations or something like that (and what about Albania then?).

However, I am getting a headache with this issue because no correction, low or high, seems good enough for all pairs, so, well, just take this part with your usual dose of healthy skepticism.
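For reference, this is the arithmetic behind the proposed rescaling, as a minimal sketch. The 2.2 factor is only my rough suggestion from above, not anything the authors endorse, and the function name is made up for illustration. The bracketed annotations in the excerpts that follow use roughly this conversion.

```python
# Assumed correction factor: the "2.2 or something like that" suggested above.
CORRECTION = 2.2

def corrected_years(published_years, factor=CORRECTION):
    """Rescale an age estimate (in years ago) by the assumed correction factor."""
    return round(published_years * factor)

# The authors' round dates, rescaled.
for years in (500, 1500, 2500):
    print(f"{years} ya -> ~{corrected_years(years)} ya")
```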

Some (annotated) excerpts:

In most cases, only pairs within the same population are likely to share genetic common ancestors within the last 500 years [i.e.: ~1100 years]. Exceptions are generally neighboring populations (e.g., United Kingdom and Ireland). During the period 500–1,500 ya [i.e. ~1100-3300 years ago: most of the Metal Ages], individuals typically share tens to hundreds of genetic common ancestors with others in the same or nearby populations, although some distant populations have very low rates. Longer ago than 1,500 ya [i.e. before ~3300 years ago: before the Late Bronze Age crisis], pairs of individuals from any part of Europe share hundreds of genetic ancestors in common, and some share significantly more.

On Italy:
There is relatively little common ancestry shared between the Italian peninsula and other locations, and what there is seems to derive mostly from longer ago than 2,500 ya [i.e. ~5500 y.a.: Megalithic era onwards]. An exception is that Italy and the neighboring Balkan populations share small but significant numbers of common ancestors in the last 1,500 years [i.e. the last ~3300 years: since the Mycenaean period] ...

On Iberia:
Patterns for the Iberian peninsula are similar, with both Spain and Portugal showing very few common ancestors with other populations over the last 2,500 years [i.e. 5500 years: Megalithic era onwards]. However, the rate of IBD sharing within the peninsula is much higher than within Italy... 

The low Iberian relationship with other populations seems to preclude this region as the source for the conjectured re-expansion of mtDNA H and other Western lineages. I would suggest looking to (Western) France for an alternative source, as this state's heterogeneous population shares more intense relations with other Western peoples around what could be c. 6200 BP, which is at the very beginning of the Megalithic spread in Atlantic Europe, for which Armorica (Brittany and neighboring Western France) could well have been a major source (and definitely was in the case of Britain).

Of course, if you prefer to use the authors' estimates, it would have no influence on the hypothesis because they simply can't reach so far back in time, it seems. But I feel more comfortable overall reformulating the hypothesis towards Armorica.

For better reading of each pair of relationships through time, I include here fig. S16:


The maximum likelihood history (grey) and smoothest consistent history (red) for all pairs of population groupings of Figure S12 (including those of Figure 5). Each panel is analogous to a panel of Figure 4; time scale is given by vertical grey lines every 500 years. For these plots on a larger scale, see Figure S17.

As said before, I suggest reading each vertical grey line (counting from left) as meaning ~1100 years rather than just 500.



Update (Jun 23): on IBD-based molecular-clock-o-logy:

I have now and then found a strange insistence on IBD-based chronological estimates being almost beyond reasonable doubt. I admittedly don't know a great deal on the matter, so when Davidski (see comments) insisted again on that, I asked him for a reference, so I could learn something. He kindly suggested that I read Gusev et al. 2011, The Architecture of Long-Range Haplotypes Shared within and across Populations, which is indeed a good paper. However, I could not find a clearly explained basis for the chronological estimates in general, which is probably buried deep in the bibliography. What I found instead was a clear example of these falling short of historical reality by a lot.

This example corresponds to one of the best documented populations to have suffered a "recent" bottleneck event: Ashkenazi Jews (AJ). According to Gusev et al., these would have suffered a bottleneck (founder effect of some 400 nuclear families followed by expansion) around 20 generations ago (~600 years = 1400 CE) or, a few lines later more specifically: 23 generations ago (~1320 CE). So here we do have a clear case study.

When we look at historical reality, however, it is just impossible that AJ would have had their founder effect bottleneck so late. Historical records document them often already in the Frankish period and they were definitely a vibrant expanding community by the time of the founding of Prague and Kraków c. 900 CE. A historically reasonable estimate for the AJ founder effect should be instead c. 700 CE, when they begin to appear in historical records, or maybe even a bit earlier, because of the lack of documentation in the Dark Ages.

That is not at all a mere 20-23 generations ago but almost double that (counting generation time = 30 years; if gen-time were 27 years, for example, the difference between estimates and reality would be even greater). Assuming a very reasonable AJ founder effect at 700 CE, then:
  • For gen-time = 30 years → 43 generations till now → 43/23 = 1.9 times for realistic correction
  • For gen-time = 27 years → 48 generations → 48/23 = 2.1 times for realistic correction
  • For gen-time = 25 years → 52 generations → 52/23 = 2.3 times for realistic correction
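The figures in this list can be reproduced with a few lines, shown here as a minimal sketch. It assumes "now" is 2013 (the year of this post), the 700 CE founder date argued above, whole generations (rounded down), and the 23-generation estimate quoted from Gusev et al.; the names are mine.

```python
# Assumptions: founder effect at 700 CE, present taken as 2013,
# and the 23-generation estimate quoted above.
FOUNDER_YEAR = 700
PRESENT_YEAR = 2013
ESTIMATED_GENERATIONS = 23

def correction_factor(gen_time_years):
    """Return (whole generations since the founder date, ratio to the estimate)."""
    elapsed = PRESENT_YEAR - FOUNDER_YEAR
    generations = elapsed // gen_time_years  # whole generations, rounded down
    return generations, round(generations / ESTIMATED_GENERATIONS, 1)

for gen_time in (30, 27, 25):
    generations, factor = correction_factor(gen_time)
    print(f"gen-time {gen_time} y: {generations} generations, x{factor}")
```

Moving the founder date earlier, or shortening the generation time, only pushes the factor higher.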
While it has nowadays become standard to assume a generation time of 30 years, this is not an absolute measure, because the actually observed generation time (i.e. the average age difference between parental and child generations) varies in real life depending on cultural factors (such as marriage age), gender (female generation time is almost invariably shorter than male), life expectancy (mothers who die in childbirth at a young age, for example, don't have any more children), etc. So it is in the fine detail a somewhat blurry issue, with some significant variability among cultures and surely also through time.

Another issue is whether this "short term" estimate correction is stable over time or does in fact vary somewhat. I can't say.

Whatever the case, the approximate x2 correction proposed above seems to stand in general terms.