March 7, 2013

Y-DNA survey of Bulgaria

This new paper fills in a blank in the Y-DNA mapping of Europe:

Sena Karachanak et al., Y-Chromosome Diversity in Modern Bulgarians: New Clues about Their Ancestry. PLoS ONE 2013. Open access [doi:10.1371/journal.pone.0056779]


To better define the structure and origin of the Bulgarian paternal gene pool, we have examined the Y-chromosome variation in 808 Bulgarian males. The analysis was performed by high-resolution genotyping of biallelic markers and by analyzing the STR variation within the most informative haplogroups. We found that the Y-chromosome gene pool in modern Bulgarians is primarily represented by Western Eurasian haplogroups with ~ 40% belonging to haplogroups E-V13 and I-M423, and 20% to R-M17. Haplogroups common in the Middle East (J and G) and in South Western Asia (R-L23*) occur at frequencies of 19% and 5%, respectively. Haplogroups C, N and Q, distinctive for Altaic and Central Asian Turkic-speaking populations, occur at the negligible frequency of only 1.5%. Principal Component analyses group Bulgarians with European populations, apart from Central Asian Turkic-speaking groups and South Western Asia populations. Within the country, the genetic variation is structured in Western, Central and Eastern Bulgaria indicating that the Balkan Mountains have been permeable to human movements. The lineage analysis provided the following interesting results: (i) R-L23* is present in Eastern Bulgaria since the post glacial period; (ii) haplogroup E-V13 has a Mesolithic age in Bulgaria from where it expanded after the arrival of farming; (iii) haplogroup J-M241 probably reflects the Neolithic westward expansion of farmers from the earliest sites along the Black Sea. On the whole, in light of the most recent historical studies, which indicate a substantial proto-Bulgarian input to the contemporary Bulgarian people, our data suggest that a common paternal ancestry between the proto-Bulgarians and the Altaic and Central Asian Turkic-speaking populations either did not exist or was negligible.

Surely the most informative material is the haplogroup list (fig. 2, organized by pre-1999 provinces, in order to be as accurate as possible with the locality of paternal ancestry):

See ISOGG for the most up-to-date standard nomenclature

I find it notable that the most common haplogroup seems to be I (27.6%), in all its main variants: I1 (4.3%), I2a1b (20.2%) and I2a2a (1.7%), plus 0.4% I*. The only important missing clade is Western Mediterranean I2a1a. It reinforces my notion of I originating in SE Europe, maybe in the Eastern Balkans or Ukraine. Feel free to correct me if you think you know better.

The second most common lineage is haplogroup E1b1b1 (M35), most of which belongs to E1b1b1a1b-V13 (18.1%), part of a local Balkan structure (although it is ultimately African, of course). This paper claims an Epipaleolithic origin for this Balkan cluster, followed by a Neolithic expansion. I am not really surprised about this, I must say.

The third most notable haplogroup is R1a1a (17.5%), an Eastern European and Northern South Asian lineage with some spread in Central Asia as well. The authors remain cautious about this lineage's origins and dates of spread, pending further phylogenetic resolution.

Within R1b (10.7%) the authors mention the relatively high frequency of R1b1a2a* (L23) at 5.2%. There is still some research to do in understanding the possible roots of R1b1a2 (M269) and its first sublayer R1b1a2a (L23) in the Balkans (where both are important, at least relative to the frequencies of R1b in general) and in West Asia, where the highest diversity seems to be in Iran. For the purpose of comparison, I'm recycling here a map I drew in 2010 (based on Myres et al., see here):

Notice that the nomenclature is a bit obsolete and that marker M529 was mistyped as M259

Haplogroup J2 (10.7%) is also important, while the presence of J1 is minor (3.4%), both in reasonable accordance with what is found in nearby populations. The authors attribute a Neolithic origin to the expansion of subclade J2b2 (M241) in the Balkans.

Less important are G (4.8%) and T (1.6%). E(xE1b1b1) (0.4%), Q (0.4%) and C (0.5%) are present only minimally. The near-absence of these last two seems to rule out any significant retention of the presumed Turkic ancestry of the Bulgarian ethnicity (the original Bulgars were Turkic speakers, somehow related to the Huns).

