AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 143:13–20 (2010)

Genetic Make Up and Structure of Colombian Populations by Means of Uniparental and Biparental DNA Markers

Winston Rojas,1* María Victoria Parra,1 Omer Campo,1 María Antonieta Caro,1 Juan Guillermo Lopera,1 William Arias,1 Constanza Duque,1 Andrés Naranjo,1 Jharley García,1 Candelaria Vergara,1,2 Jaime Lopera,1 Erick Hernandez,1 Ana Valencia,1 Yuri Caicedo,1 Mauricio Cuartas,1 Javier Gutiérrez,1 Sergio López,1 Andrés Ruiz-Linares,3 and Gabriel Bedoya1

1 Laboratory of Molecular Genetics, Institute of Biology, University of Antioquia, Medellín, Colombia
2 Institute for Immunological Research, University of Cartagena, Cartagena, Colombia
3 Department of Genetics, Evolution and Environment, University College, London, UK

KEY WORDS

Colombia; mestizo; genetic markers; admixture

ABSTRACT Colombia is a country with great geographic heterogeneity and marked regional differences in pre-Columbian native population density and in the extent of past African and European immigration. As a result, Colombia has one of the most diverse populations in Latin America. Here we evaluated ancestry in over 1,700 individuals from 24 Colombian populations using biparental (autosomal and X-chromosome), maternal (mtDNA), and paternal (Y-chromosome) markers. Autosomal ancestry varies markedly both within and between regions, confirming the great genetic diversity of the Colombian population. The X-chromosome, mtDNA, and Y-chromosome data indicate that there is a pattern across regions indicative of admixture involving predominantly Native American women and European and African men. Am J Phys Anthropol 143:13–20, 2010. © 2010 Wiley-Liss, Inc.

Colombia has one of the most diverse populations in Latin America. Of a total population of 50 million, 86% recently self-reported as of mixed ancestry, 10.5% as African-Colombian, and 3.4% as Native American (DANE, 2006). The population of mixed ancestry is concentrated mainly in urban areas, particularly in the Andes. African-Colombians live predominantly on the Caribbean and Pacific coasts and islands. Native American populations are concentrated mainly in the East (in the vast Orinoco and Amazon river basins) and in rural areas of the Southwest and North of the country.

Genetic studies can provide refined descriptions of patterns of admixture in human populations. These studies can provide insights into the biological history of these populations. They are also important for the identification of trait-associated genes through "admixture mapping" (Smith et al., 2001; Collins-Schramm et al., 2002; Patterson et al., 2004; Smith and O'Brien, 2005; Seldin, 2007) and because of the need to correct for population stratification in association studies (Pritchard and Rosenberg, 1999; Hoggart et al., 2003). Admixture analyses use so-called "ancestry informative markers" (AIMs; i.e., autosomal and X-chromosome markers showing large allele frequency differences in the parental populations) (Chakraborty et al., 1992; Dean et al., 1994; Parra et al., 1998; Tang et al., 2005), as well as Y-chromosome and mtDNA variants defining paternal and maternal lineages with well-established geographic distributions (e.g., American, European, or African).

In recent years, admixture studies have documented the considerable heterogeneity of the population of Latin America, albeit with great variation in the number and type of genetic markers used across studies (Dipierri et al., 1998; Carvajal-Carmona et al., 2000; Sans, 2000; Collins-Schramm et al., 2002; Rodas et al., 2003; Bertoni et al., 2005; Bolnick et al., 2006; Mao et al., 2007; Price et al., 2007; Seldin et al., 2007; Wang et al., 2008). We have previously reported detailed analyses of admixture in the province of Antioquia in North-West Colombia. Here we aim to provide a broader description of admixture across Colombia through the analysis of autosomal, X-chromosome, mtDNA, and Y-chromosome markers in 24 population samples from regions with different settlement histories. This represents one of the largest surveys of admixture carried out so far in a Latin American country.


MATERIALS AND METHODS

Population samples

A total of 24 population samples were examined across Colombia (Fig. 1 and Table 1). These comprise 15 urban centers (from the Andes and the sea coasts), 8 Native American groups, and 1 Afro-Colombian population. The Native populations examined have been classified into three linguistic stocks (Chibchan, Andean, and Equatorial-Tucano). Several samples from previous population studies were used: Antioquia, Ticuna, Ingano, Zenu, and Embera (Carvajal-Carmona et al., 2000; Bedoya et al., 2006). The Cauca and Valle del Cauca samples came from repositories of paternity tests and were provided by the Genetics Unit of Valle University Hospital in 2004. Other samples were collected between 2003 and 2007 and stored as anonymous DNA samples in the laboratory of G.B.B.; some of these were collected as controls in association studies of asthma (Bolivar), Parkinson's disease (Peque), malaria (Waunana, Afro choco, and Choco), and Chagas disease (Arhuaco, Kogi, Arzario, Magdalena, and Casanare). A standard genealogical interview was used to confirm that sampled individuals were not closely related and had four local grandparents. All individuals provided informed consent. Ethical approval for this study was provided by the Ethical Committee of the Institute of Biology of the Universidad de Antioquia.

Additional Supporting Information may be found in the online version of this article.
WR and MVP participated equally in this work.
Grant sponsor: Colombian Institute for the Development of Science and Technology, COLCIENCIAS; Grant number: 111540520279; Grant sponsor: Programa de Sostenibilidad CODI 2009–2010 Universidad de Antioquia.
*Correspondence to: Winston Rojas, Laboratorio de Genética Molecular, Instituto de Biologia, Universidad de Antioquia, Medellín, Colombia. E-mail: [email protected]
Received 13 August 2009; accepted 10 December 2009
DOI 10.1002/ajpa.21270
Published online 8 June 2010 in Wiley InterScience (www.interscience.wiley.com).

Genotyping

Fig. 1. Geographic distribution of mestizo (rectangle), Native American (triangle), and African Colombian (circle) populations used in this study. Population codes are as in Table 1. The Andean region is shown in gray.

Uniparental markers. mtDNA ancestry was characterized in 1,737 individuals through PCR-RFLP and PCR-AFLP using five single nucleotide polymorphisms (SNPs) and one indel according to experimental conditions described previously (Hertzberg et al., 1989; Bailliet et al., 1994;

TABLE 1. Geographic distribution of mestizo (rectangle), Native American (triangle) and African Colombian (circle) populations used in this study

                                                                          Self-reported ancestry (%)b
Code  Population        Urban centera  Region        Classification  Native  African  MAc
1     Bolívar           Cartagena      Caribbean     Mestizo         0.1     27.6     72.3
2     Magdalena         Santa Marta    Caribbean     Mestizo         0.8     9.83     89.4
3     Antioquia         Medellín       North west    Mestizo         0.5     10.8     88.7
4     Peque(d)          Peque          North west    Mestizo         10.0    2.0      88.0
5     Caldas            Manizales      North west    Mestizo         4.3     2.54     93.2
6     Quindio           Armenia        Central west  Mestizo         0.4     2.46     97.1
7     Norte Santander   Cúcuta         Northeast     Mestizo         0.6     1.85     97.5
8     Santander         Bucaramanga    Northeast     Mestizo         0.1     3.15     96.7
9     Cundinamarca      Bogotá         Central       Mestizo         0.3     3.37     96.3
10    Casanare          Yopal          Central east  Mestizo         1.5     1.44     97.1
11    Nariño            Pasto          South west    Mestizo         11      18.82    70.4
12    Cauca             Popayán        South west    Mestizo         22      22.19    56.3
13    Huila             Neiva          South west    Mestizo         1.1     1.17     97.8
14    Valle del Cauca   Cali           South west    Mestizo         0.5     27.2     72.2
15    Chocó(e)          Quibdo         Pacific       Mestizo         12      82.7     5.42
16    Afro choco(e)     Quibdo         Pacific       Afrodescent     0       100      0
17    Waunana(e,f)      Quibdo         Pacific       Native          100     0        0
18    Embera(f)         Dabeiba        Caribbean     Native          100     0        0
19    Zenu(g)           Sincelejo      Caribbean     Native          100     0        0
20    Arhuaco(f)        Santa Marta    Caribbean     Native          100     0        0
21    Kogi(f)           Santa Marta    Caribbean     Native          100     0        0
22    Arzario(f)        Santa Marta    Caribbean     Native          100     0        0
23    Ingano(h)         Mocoa          South west    Native          100     0        0
24    Ticuna(i)         Leticia        South east    Native          100     0        0

a The capital urban center.
b According to the 2005 Census (Departamento Administrativo Nacional de Estadística, DANE).
c Mixed ancestry.
d This population is located within Antioquia but is separated by >200 km in the mountain regions.
e The three populations (Mestizo, Native, and Afro-descendant) come from sites near each other within Choco.
f Chibcha.
g Nonclassified.
h Andean.
i Equatorial-Tucano.

Macaulay et al., 1999). These markers discriminate six mitochondrial (mtDNA) haplogroups; the major Native American haplogroups were defined as "A" (+HaeIII 663), "B" (9-base-pair deletion in the intergenic region COII/tRNAlys), "C" (-HincII 13259), and "D" (-AluI 5176). In addition, the African "L" paragroup (+HpaI 3592) and the European H haplogroup (-AluI 7025) were determined. Individuals that could not be assigned to the above haplogroups were classified as "other."

Paternal ancestry was determined in 1,116 Y-chromosomes using eight biallelic markers (through PCR-RFLP). Haplogroups were defined using the current Y-chromosome nomenclature (Consortium, 2002; Karafet et al., 2008). In summary, the "derived" state of YAP and the "ancestral" state of M2 define DE*(xE1b1a); "derived" M2 is diagnostic for E1b1a; "ancestral" M9 defines F*, although some of these chromosomes could be A and B (almost entirely restricted to Africa) or C (at high frequencies in Asia). Though of African origin, B has been previously described in South America at low frequencies (Guerreiro-Junior et al., 2009). "Ancestral" M9 and "derived" 12f2a define J; "derived" M9 and "ancestral" 92R7 are diagnostic for K*(xP); "derived" 92R7 and "ancestral" M167 are distinctive for P* and R*, which is defined as P*(xQ); "derived" M167 is diagnostic for R1b1b2d; "derived" M242 defines haplogroup Q*(xQ1a3a) and "derived" M3 defines Q1a3a. Lineages not defined by a diagnostic character (indicated by the symbol *) include all chromosomes except those listed in parentheses, and can be called "paragroups" because they are potentially paraphyletic. Both Q haplogroups are the major lineages seen amongst Native Americans (Bortolini et al., 2003).

Lineage E1b1a, which includes the most frequent haplogroups within Africa, was taken as the paternal African contribution, although it is present at low frequency in Eurasian populations (Karafet et al., 2008). All the remaining Y-chromosome haplogroups were considered European (although some of them have been observed in East Asia, Oceania, Indonesia, and Australia, there has been no substantial migration from these regions to Colombia; this was further excluded through the genealogical interview applied to each volunteer). The J lineage has a high frequency in eastern and central Mediterranean populations (Di Giacomo et al., 2004). The majority of Western European Y-chromosomes can be assigned to the R* haplogroup, with the R1b1b2d haplogroup being most frequent in the Spanish Basque and Catalan populations (Flores et al., 2004). Y-chromosome diversity was further evaluated with six short tandem repeats (STRs) on the Y-chromosome nonrecombinant region: DYS19, DYS388, DYS390, DYS391, DYS392, and DYS393. These were typed as previously described (Thomas et al., 1999; Bedoya et al., 2006).

Biparental markers. Eleven unlinked autosomal AIMs (indels and SNPs) were typed using PCR-RFLP and PCR-FLP in 1,781 individuals using previously described protocols (Parra et al., 1998; Shriver et al., 2003). Additionally, eight X-linked STRs were genotyped in 385 males as described in detail in Collins-Schramm et al. (2002). The X-chromosome markers could not be evaluated in the Peque, Embera, Arzario, Zenu, Ingano, and Ticuna populations.

Statistical analyses

The proportion of maternal and paternal ancestry of each population sample was estimated as the percentage of continental-specific mtDNA and Y-chromosome haplogroups. The "expected" autosomal ancestry was taken as the mean of the mtDNA and Y-chromosome ancestries. Within- and between-population diversity estimates were obtained using Arlequin v3.1 (Excoffier et al., 2005), available at http://cmpg.unibe.ch/software/arlequin3. Autosomal and X-chromosome admixture proportions were obtained using Long's WLS method (Long, 1991) as implemented in the ADMIX.PAS program. For these calculations, allele frequencies for Colombian natives were used, assuming that they contributed to the admixture of local mixed populations (Wang et al., 2008). Public databases and the literature were used to obtain allelic frequencies of autosomal and X-chromosome markers for European and African populations (Parra et al., 1998; Shriver et al., 2003; Bedoya et al., 2006) (http://www.marshfieldclinic.org/research/pages/index.aspx). Individual ancestry proportions were obtained using Structure v2.2 (Pritchard et al., 2000) (http://pritch.bsd.uchicago.edu). We used an admixture model with K = 3 (parental populations), correlated allelic frequencies, and unsupervised runs of 50,000 burn-in steps followed by 50,000 iterations. The cluster with the major proportion in Colombian Native Americans (K1) and the cluster with the major proportion in Afro Choco (K2) were taken as the putative Native American and African parental populations, respectively; the European ancestral population was deduced as K3.
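The idea behind a least-squares admixture estimate can be sketched as follows. This is a simplified, unweighted illustration and not Long's WLS method or the ADMIX.PAS program: it assumes that the allele frequency in the admixed sample is a linear mixture of the three parental frequencies, with proportions that sum to one. The function name and all allele frequencies below are hypothetical.

```python
def estimate_admixture(parental, hybrid):
    """Unweighted least-squares admixture proportions for three parental groups.

    parental: list of (p_native, p_european, p_african) allele frequencies, one per marker.
    hybrid:   list of allele frequencies in the admixed sample, one per marker.
    Returns (m_native, m_european, m_african), constrained to sum to 1.
    """
    # Eliminate the third proportion using m3 = 1 - m1 - m2
    x1 = [p[0] - p[2] for p in parental]
    x2 = [p[1] - p[2] for p in parental]
    y = [h - p[2] for h, p in zip(hybrid, parental)]
    # Normal equations for the two free proportions
    a11 = sum(v * v for v in x1)
    a12 = sum(u * v for u, v in zip(x1, x2))
    a22 = sum(v * v for v in x2)
    b1 = sum(u * v for u, v in zip(x1, y))
    b2 = sum(u * v for u, v in zip(x2, y))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("markers do not separate the parental populations")
    m1 = (b1 * a22 - a12 * b2) / det
    m2 = (a11 * b2 - a12 * b1) / det
    return m1, m2, 1.0 - m1 - m2


# Hypothetical frequencies at four markers (Native, European, African)
parental = [
    (0.90, 0.10, 0.05),
    (0.20, 0.80, 0.10),
    (0.05, 0.15, 0.95),
    (0.60, 0.30, 0.70),
]
true_m = (0.5, 0.3, 0.2)
# Noise-free admixed frequencies built from the assumed proportions
hybrid = [sum(f * m for f, m in zip(row, true_m)) for row in parental]
estimated_m = estimate_admixture(parental, hybrid)
```

Unlike a weighted estimator, this sketch ignores sampling variance and does not bound the proportions to the interval from 0 to 1, so noisy data can yield slightly negative estimates.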

RESULTS

Uniparental ancestry

The frequencies of six major mtDNA lineages in the Colombian populations examined are shown in Table 2. In the urban populations, the average frequency for Native American mtDNA haplogroups (A–D) was 90% (range: 56–100%). Haplogroup A has the highest frequency in the combined sample (37%), followed by haplogroup B (33%) and haplogroup C (15%). The highest frequency of African paragroup L was observed in the sample from Afro Choco (53%) and the lowest in Caldas (2%). The lowest mtDNA diversity was observed in Peque (0.53) and Cundinamarca (0.46) and the highest in Santander, Magdalena, and Nariño (0.75). Diversity is similar among populations from the north (Bolivar, Magdalena), northeast (Santander del Norte and Santander), center west (Quindío), and southwest (Nariño). Urban centers showed a significant differentiation in mtDNA haplogroup frequencies (Fst = 0.066, P < 0.01). A high differentiation was observed amongst Native populations (Fst = 0.38, P < 0.001), which also showed relatively lower mtDNA diversity.

Urban populations had a high frequency of non-Native Y-chromosome haplogroups (Table 3) with a predominance of European lineages (average haplogroup P*(xQ) frequency of 43.5%), followed by Native lineages (haplogroup Q frequency of 11.9%), J (10.2%), F* (9.9%), E1b1a (8.9%), DE*(xE1b1a) (7.4%), K*(xP) (5.6%), and R1b1b2d (2.6%). European Y lineages were observed in Native (Embera and Zenu), Afro Choco, and all urban populations. Amerindian Q haplogroups had the highest frequency (> 20%) in Magdalena, Peque, Quindío, Cundinamarca, and Nariño and the lowest frequency (< 5%) in Antioquia and Caldas. The highest frequency of the African E1b1a haplogroup was observed in the samples from Afro Chocó (24%), Bolivar (26%), and Valle del Cauca

TABLE 2. Frequency of mtDNA haplogroups in Colombian populations

Column groups in the original layout: Native (A, B, C, D), European (H), African (L), Other, Ha. Cells marked n/r, and the columns C, D, H, L, and Other, could not be aligned to rows.

Population        n     A     B     Ha
Bolivar           80    0.21  0.24  0.71
Magdalena         32    0.38  0.22  0.75
Antioquia         80    0.45  0.37  0.64
Peque             163   0.17  0.66  0.53
Caldas            193   0.53  0.37  0.58
Quindío           58    0.36  0.43  0.67
Norte Santander   35    0.48  0.09  0.71
Santander         82    0.22  0.34  0.75
Cundinamarca      24    0.65  0.35  0.46
Casanare          24    0.55  0.20  0.63
Nariño            206   0.30  0.32  0.75
Cauca             61    0.36  0.16  0.67
Huila             24    0.33  0.38  0.71
Valle             109   0.21  0.44  0.69
Choco             161   0.37  0.37  0.71
Average Mestizo   1332  0.37  0.33  0.66
Afro choco        47    0.19  0.15  0.65
Arhuaco           54    0.88  n/r   0.21
Kogi              75    0.80  n/r   0.36
Arzario           18    0.68  n/r   0.44
Embera            43    0.76  n/r   0.37
Waunana           74    0.22  n/r   0.64
Zenu              46    0.19  n/r   0.72
Ingano            20    0.14  n/r   0.66
Ticuna            28    0.13  n/r   0.63

Values for the remaining cells, in source order: 0.06 | 0.05 0.16 0.01 | 0.07 0.11 0.08 0.14 0.09 0.07 | 0.21 0.13 0.05 0.06 0.02 0.05 0.14 0.09 | 0.23 0.11 | 0.05 | 0.02 0.20 0.28 | 0.25 0.20 0.41 0.13 0.15 0.14 0.13 0.13 0.12 0.20 0.32 | 0.24 0.51 0.41 0.43 0.14 | 0.15 0.02 0.08 0.04 0.03 0.07 | 0.03 0.30 0.39 0.37 | 0.03 0.02 0.08 0.07 | 0.03 0.08 0.07 0.06 0.53 | 0.01 | 0.24 0.05 | 0.01 0.02 0.03 | 0.05 0.04 0.02 | 0.34

a Diversity defined as $H = n\,(1 - \sum_i p_i^2)/(n - 1)$; n = number of copies; $p_i$ = frequency of haplogroup i.
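The diversity statistic defined in the Table 2 footnote can be computed directly from haplogroup counts. The snippet below is a minimal sketch; the function name and the example counts are hypothetical and are not taken from the study data.

```python
def haplogroup_diversity(counts):
    """Diversity H = n * (1 - sum(p_i^2)) / (n - 1) from haplogroup counts."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two copies")
    # p_i is the frequency of haplogroup i among the n copies
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n * (1.0 - sum_sq) / (n - 1)


# Hypothetical sample of 24 copies split between two haplogroups
h_two = haplogroup_diversity([16, 8])
# A sample fixed for a single haplogroup has zero diversity
h_fixed = haplogroup_diversity([20])
```

The n/(n - 1) factor corrects for sample size, so small samples are not scored as less diverse simply because they contain few copies.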

TABLE 3. Frequency of Y-chromosome haplogroups in Colombian populations

Column groups in the original layout: Native (Q*(xQ1a3a), Q1a3a), European (P*(xQ), R1b1b2d, K*(xP), F*, J), African (DE*(xE1b1a), E1b1a), Ha. Cells marked n/r, and the columns Q*(xQ1a3a), R1b1b2d, K*(xP), F*, J, DE*(xE1b1a), E1b1a, and Ha, could not be aligned to rows.

Population        n     Q1a3a  P*(xQ)
Bolivar           183   0.027  0.290
Magdalena         12    0.333  0.167
Antioquia         51    0.020  0.392
Peque             62    0.306  0.500
Caldas            190   0.011  0.489
Quindío           14    0.286  0.643
Norte Santander   31    0.097  0.387
Santander         58    0.034  0.517
Cundinamarca      22    0.182  0.455
Casanare          9     0.111  0.444
Nariño            170   0.212  0.394
Cauca             33    0.152  0.545
Huila             7     0.143  0.857
Valle del Cauca   36    0.028  0.389
Choco             76    0.066  0.605
Average mestizo   954   0.097  0.435
Afro chocó        21    n/r    0.286
Waunana           29    0.069  n/r
Embera            13    0.846  n/r
Zenu              22    0.591  n/r
Arhuaco           19    1.000  n/r
Kogi              17    1.000  n/r
Arzario           6     1.000  n/r
Ingano            9     0.111  n/r
Ticuna            26    0.731  n/r

Values for the remaining cells, in source order: 0.005 | 0.005 | 0.038 0.083 0.196 0.016 0.121 | 0.131 0.250 0.078 0.016 0.100 | 0.098 | 0.148 0.083 0.059 0.032 0.063 | 0.257 0.083 0.039 0.032 0.037 | 0.097 | 0.032 | 0.017 0.045 | 0.190 | 0.045 | 0.071 | 0.018 0.030 | 0.112 0.091 | 0.091 0.111 0.065 0.121 | 0.028 | 0.083 0.026 0.056 0.095 | 0.083 0.092 0.099 | 0.028 0.039 0.102 0.095 | 0.028 0.053 0.074 0.048 | 0.333 0.079 0.089 0.238 | 0.045 | 0.091 | 0.83 0.82 0.69 0.66 0.70 0.58 0.70 0.75 0.73 0.73 0.76 0.75 0.40 0.76 0.61 0.70 0.86 0.37 0.28 0.48 0.52 | 0.065 0.011 0.032 0.034 0.045 0.041 | 0.039 0.022 0.238 0.931 0.077 0.182 | 0.889 0.269 | 0.059 0.016 0.086 | 0.026 | 0.077 0.091 | 0.157 0.032 0.153 0.071 0.355 0.121 0.136 0.333 0.053 0.061 | 0.035 | 0.22 0.32

a Diversity as defined in mtDNA haplogroups.

(33%). Y-chromosome diversity is similar across urban populations but relatively low in the Native American samples. A significant differentiation in Y-chromosome haplogroup frequency between urban populations was observed (Fst = 0.042, P < 0.01). Summary statistics of STR haplotype diversity are shown in Table 4 (allelic frequencies for Y-chromosome STRs are given in Appendix 1 and the absolute frequency of each six-locus haplotype is shown in Appendix 2). Haplogroup P*(xQ) had the lowest diversity (modal + one-step derived haplotypes: 0.68; number of pairwise differences: 1.76). The highest diversity was observed in the Amerindian Q*(xQ1a3a) and Q1a3a haplogroups (modal + one-step derived haplotypes: 0.33 for each, with 2.38 and 2.72 pairwise differences, respectively). The remaining haplogroups showed highly differentiated haplotypes and a low frequency of the modal and related haplotypes. Some Q haplotypes found in urban populations (No. 79, 139, 268, and 299 in Appendix 2) were not observed in Colombian Natives and have not been reported elsewhere (www.yhrd.org).

Biparental ancestry

Allelic frequencies for autosomal markers in Colombian populations are listed in Appendix 3. Available data for African and European populations were used to estimate admixture proportions in each population (Table 5). On average, urban populations had similar Native American and European autosomal ancestry (47 and 42%, respectively) and relatively low African ancestry (10%). However, there is considerable heterogeneity across regions. African ancestry is relatively high in Afro Chocó (68%), in the urban samples from the Caribbean coast (Bolivar 44% and Magdalena 28%), and in the southwest (Cauca 24% and Valle del Cauca 22%). A greater Amerindian than European ancestry is observed in Bolivar, Peque, Norte de Santander, Casanare, Nariño, Cauca, and Huila, while a higher European than Native ancestry is seen in Magdalena, Antioquia, Caldas, Quindio, Santander, and Afro Chocó. A similar Amerindian and European ancestry is observed in Cundinamarca, Valle del Cauca, and Choco. The relative Amerindian/African contribution is smaller in Bolivar, Magdalena, and Afro Choco.

Admixture proportions for each population were also obtained with X-chromosome markers (Table 5; allelic frequencies are provided in Appendix 4). These data show that, for all populations, Native American ancestry on the X-chromosome is higher than on the autosomes. The expected ancestry (from uniparental markers) for each parental population is given in Table 5. When the estimated and expected autosomal Native American ancestries were compared, they were similar (differences <5%) only in some populations (Peque, Santander del Norte, Nariño, Cauca, and Huila), while in others the estimated ancestry was lower than the expected (Bolivar, Magdalena, Antioquia, Caldas, Quindio, Santander, Cundinamarca, Valle del Cauca, and Choco). Estimated and expected European ancestry is similar only in Peque, Santander del Norte, Cundinamarca, Huila, Valle del Cauca, and Choco. In Bolivar, Magdalena, Antioquia, Cauca, and Valle del Cauca, a higher African ancestry was observed compared to the expected.

The distribution of estimated individual ancestry proportions in each Colombian population is shown in Figures 2–4. A high proportion of individuals (50%) with a high African ancestry (60%) was observed in Bolivar and Magdalena. In Valle del Cauca and Cauca a similar level of African ancestry was found in only 30% of individuals (in all other populations less than 20% of individuals had this level of African ancestry). A high Native American ancestry (60%) was observed in a large proportion of individuals from Casanare (55%), Peque (33%), Nariño (31%), Cauca (28%), and Cundinamarca (27%).

TABLE 4. Diversity estimates of haplotypes within Y-chromosome haplogroups in Mestizo populations of Colombia

Haplogroup    n    Haplotypes  Modal haplotypea  Modal + one step  MPDb
P*(xQ)        415  82          1                 0.68              1.76 ± 1.02
R1b1b2d       25   7           1.1               0.64              1.24 ± 0.81
K*(xP)        53   21          11                0.32              2.43 ± 1.34
F*            94   50          25                0.12              3.42 ± 1.77
J             97   35          12                0.24              2.49 ± 1.36
DE*(xE1b1a)   71   25          7                 0.46              2.33 ± 1.29
E1b1a         85   42          14                0.25              2.62 ± 1.41
Q*(xQ1a3a)    21   12          3                 0.33              2.38 ± 1.35
Q1a3a         93   46          3.1               0.33              2.72 ± 1.46

a Modal haplotype as defined in Table S2.
b MPD = Mean number of pairwise differences.
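The comparison between expected and estimated ancestry described in this section can be sketched in a few lines. The function names are hypothetical, the mtDNA and Y-chromosome proportions are invented for illustration, and the 5-point threshold mirrors the "differences <5%" criterion used in the text; the two Native ancestry pairs at the end are the expected and estimated autosomal percentages reported in Table 5 for Peque and Bolivar.

```python
def expected_ancestry(mtdna, ychrom):
    """Mean of maternal and paternal ancestry proportions for each parental group."""
    return {group: (mtdna[group] + ychrom[group]) / 2 for group in mtdna}


def is_similar(expected_pct, estimated_pct, threshold=5.0):
    """True when two percentages differ by less than `threshold` points."""
    return abs(expected_pct - estimated_pct) < threshold


# Hypothetical haplogroup-based proportions for one sample
mtdna = {"Native": 0.90, "European": 0.05, "African": 0.05}
ychrom = {"Native": 0.10, "European": 0.80, "African": 0.10}
expected = expected_ancestry(mtdna, ychrom)

# Native ancestry (%) from Table 5: (expected, estimated autosomal)
peque_similar = is_similar(66, 62.2)
bolivar_similar = is_similar(40, 32.9)
```

With these inputs the Peque values fall within the threshold and the Bolivar values do not, which matches how the two populations are grouped in the text.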

TABLE 5. Percentage of expected [(mtDNA frequency + Y-chromosome frequency)/2] and estimated (autosomal and X-chromosome) Native, European, and African admixture in Mestizo populations of Colombia

                  Expected autosomal admixture    Estimated autosomal admixture
Population        Native  European  African       Native       European     African
Bolivar           40      34        27            32.9 ± 4.1   23.3 ± 3.5   43.8 ± 2.7
Magdalena         61      27        13            21.8 ± 12.1  50.0 ± 10.9  28.2 ± 7.6
Antioquia         45      50        5             26.0 ± 9.0   63.5 ± 4.8   10.6 ± 4.8
Peque             66      30        5             62.2 ± 5.5   32.1 ± 5.3   5.8 ± 2.6
Caldas            50      48        2             36.4 ± 7.7   59.3 ± 7.4   4.3 ± 32.6
Quindio           66      32        3             38.3 ± 7.9   57.3 ± 7.6   4.4 ± 3.3
Norte Santander   48      44        9             53.3 ± 6.5   42.0 ± 6.3   4.7 ± 2.9
Santander         49      45        6             42.4 ± 7.9   56.2 ± 7.8   1.4 ± 2.4
Cundinamarca      61      40        0             51.6 ± 12.1  45.4 ± 11.8  3.0 ± 4.7
Casanare          55      45        0             74.7 ± 6.6   24.5 ± 6.6   0.8 ± 2.0
Nariño            62      38        1             65.2 ± 5.2   32.1 ± 5.1   2.7 ± 2.0
Cauca             56      42        3             56.9 ± 7.8   19.6 ± 6.9   23.5 ± 5.0
Huila             56      44        0             60.8 ± 8.5   39.6 ± 8.4   0.0 ± 1.0
Valle del Cauca   46      38        16            39.3 ± 5.9   39.2 ± 5.4   21.5 ± 3.6
Choco             51      43        7             44.8 ± 9.2   46.6 ± 8.6   8.6 ± 4.6
Afro Choco        33      34        34            10.8 ± 10.8  21.1 ± 9.4   68.1 ± 7.1

                  Estimated X-chromosome admixture
Population        Native       European     African
Bolivar           70.9 ± 10.3  20.4 ± 20.4  4.8 ± 17.7
Magdalena         75.1 ± 12.9  37.5 ± 26.7  0 ± 23.0
Antioquia         43.5 ± 8.3   45.3 ± 15.2  11.2 ± 13.3
Peque             NDa          NDa          NDa
Caldas            63.3 ± 13.8  55.7 ± 26.4  0 ± 23.0
Quindio           96.9 ± 2.7   21.3 ± 13.9  0 ± 11.1
Norte Santander   79.4 ± 5.9   31.9 ± 12.7  0 ± 10.9
Santander         77.8 ± 16.9  66.9 ± 37.8  0 ± 32.1
Cundinamarca      87.7 ± 6.2   4.5 ± 13.3   7.8 ± 11.5
Casanare          81.8 ± 7.1   64.9 ± 17.9  0 ± 15.0
Nariño            96.3 ± 9.8   0 ± 26.4     6.5 ± 22.1
Cauca             76.3 ± 13.4  0 ± 22.5     33.7 ± 19.8
Huila             69.8 ± 13.9  0.4 ± 23.9   29.7 ± 21.1
Valle del Cauca   78.7 ± 9.1   31.1 ± 19.3  0 ± 16.6
Choco             87.2 ± 7.1   5.4 ± 15.4   7.4 ± 13.2
Afro Choco        58.7 ± 16.8  30.5 ± 30.9  10.8 ± 27.1

a Not determined.

Fig. 2. Ancestral proportion of Native American populations in several stratification levels in mestizo populations.

Fig. 3. Ancestral proportion of European populations in several stratification levels in mestizo populations.

Fig. 4. Ancestral proportion of African populations in several stratification levels in mestizo populations.

DISCUSSION

Consistent with other studies in Latin America, the Colombian populations we examined here show evidence of a past biased admixture in that mtDNA ancestry is predominantly Native and Y-chromosome ancestry predominantly non-Native. The most common mtDNA lineages in the admixed Colombian populations are Native haplogroups A and B. These lineages are also the most frequent in Native American Chibchan groups widely spread across Colombia, consistent with a genetic continuity between pre- and post-Columbian populations in the country (Carvajal-Carmona et al., 2000; Wang et al., 2007). Maternal ancestry shows regional differentiation, particularly an important African maternal contribution in the Atlantic (Bolivar and Magdalena), Pacific (Afro Choco), and Northeast (Norte Santander). Interestingly, certain populations (Caldas, Cundinamarca, Casanare, and Peque) have high frequencies of a single mtDNA haplogroup, suggesting a possible maternal founder effect in these populations.

There is considerable heterogeneity in Y-chromosome ancestry across the Colombian populations examined. European ancestry is detectable across the country; African ancestry is high mainly in the West and Northwest (in Choco and the mestizo populations from Bolivar and Valle del Cauca). The occurrence, at similar frequencies, of highly differentiated African E haplotypes suggests an ancestry from differentiated African populations. The relatively low diversity of the common European haplogroup P*(xQ) suggests that this lineage was introduced by a relatively small number of European founders (Melo, 1996; Bedoya et al., 2006). The relatively high diversity of Amerindian Q haplogroups is in agreement with genetically differentiated Native male founders. Additionally, the higher differentiation level in maternal mtDNA compared with paternal Y-chromosome ancestry in urban populations is suggestive of a greater gene flow of men in relation to women.


Fig. 5. Proposed models of interethnic admixture in the formation of mestizo populations from Colombia. 1) Admixture between Native American women and European men in approximate 1:1 proportions without later introduction of any parental population after the first admixture event (white). 2) Same as 1 but with an additional introduction of Native American Y-chromosomes (light gray). 3) Same as 1 but with an additional introduction of European Y-chromosomes (dark gray). 4) Same as 1 but with an African contribution across the admixture process.

The higher Native and African ancestry estimated from X-chromosome markers, compared with autosomes, agrees with the sex-biased admixture at the origin of the populations examined, as inferred from the mtDNA and Y-chromosome data (Bedoya et al., 2006; Wang et al., 2008). Admixture between Native American, European, and African groups may have occurred to different extents in different parts of Colombia, further contributing to geographic structure in the patterns of genetic variation in mestizo populations. When the differences between estimated (autosomal) and expected (mtDNA and Y-chromosome) admixture were analyzed, the patterns suggest several probable scenarios for the formation of Colombian hybrid populations (see Fig. 5). In the first, Amerindian women and European men admixed in approximate 1:1 proportions with no later introduction of any parental population (Norte Santander, Cundinamarca, and Choco). In the second, Native American Y-chromosomes were introduced after the formation of the hybrid population, which can explain the larger Native American contribution in populations such as Peque, Casanare, Nariño, Cauca, and Huila. In the third, gene flow from European men followed the initial admixture, which could account for the greater European admixture relative to Native American (Antioquia, Caldas, Quindio, and Santander). In the fourth, all three parental populations (Native American, African, and European) contributed (Bolivar, Magdalena, Cauca, and Valle del Cauca).

The autosomal estimates of admixture show that genetic ancestry can vary substantially not only across the country but even between regions located relatively close to each other. For instance, although Peque is located within the political unit of Antioquia, it has a markedly different ancestry from the sample obtained in Medellin (which is likely to approximate a region-wide average). Conversely, the ancestry of the province of Caldas is similar to that of Antioquia, probably because Caldas was established by migrants from the central Antioquian settlements during the 18th to 19th centuries (a colonizing movement known as the "Colonización Antioqueña").

The results show that estimated autosomal African ancestry was correlated with self-reported African ancestry in the 2005 Census (r² = 0.472, P = 0.003, data not shown), mainly for Bolivar, Valle del Cauca, and Cauca, whereas no significant correlation was found for Native American or mixed ancestry (r² = 0.161, P = 0.123 and r² = 0.157, P = 0.127, respectively). This highlights that ethnic self-identification in Colombia is culturally and biologically complex, and that self-reported ancestry alone is not a sufficient variable for association studies. Our results also suggest that there is some degree of substructure within populations; for example, it is possible to find fractions of a population with lower and higher ancestry from any of the putative ancestral groups. These results emphasize the importance of appropriate population matching when designing case/control studies for human diseases in Latin American populations (even within regions of a specific country).
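The sex-bias argument made earlier in this discussion, that higher ancestry on the X-chromosome than on the autosomes points to a predominantly maternal contribution, can be illustrated with a small calculation. The sketch below is not part of the study: it assumes a single admixture event with equal numbers of mothers and fathers, in which a parental group contributes a fraction f of the mothers and m of the fathers. Because daughters receive one X-chromosome from each parent and sons receive one only from the mother, two thirds of X-chromosomes in the offspring trace to mothers, whereas autosomes trace equally to both sexes. The function names and input values are hypothetical.

```python
def autosomal_ancestry(f, m):
    """Expected autosomal ancestry: mothers and fathers contribute equally."""
    return (f + m) / 2


def x_ancestry(f, m):
    """Expected X-chromosome ancestry: two thirds maternal, one third paternal."""
    return (2 * f + m) / 3


# Hypothetical sex-biased admixture: 90% of mothers and 10% of fathers are Native
f_native, m_native = 0.90, 0.10
auto_native = autosomal_ancestry(f_native, m_native)
x_native = x_ancestry(f_native, m_native)
```

Under these assumptions the X-chromosome estimate exceeds the autosomal one whenever f is greater than m, which is the direction of the difference reported for Native American ancestry in Table 5.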

ACKNOWLEDGMENTS

The authors thank all the individuals who agreed to participate in this study. They also thank Claudio Bravi for comments on earlier versions of the manuscript. This article is dedicated to the memory of Sergio López, who passed away while the work was being carried out. They thank Colciencias (Programa Nacional para Estudios a Nivel de Doctorado).

APPENDIX

Appendix 1: Allelic frequencies of Y-chromosome STR markers in Colombian populations.
Appendix 2: Absolute frequencies of six-STR haplotypes within each haplogroup in Colombian populations. Identical haplotypes in different haplogroups are distinguished by a comma.
Appendix 3: Allelic frequencies of autosomal markers in Colombian populations. The allele at each locus is defined as the presence of the insertion or the absence of the polymorphic restriction site.
Appendix 4: Allelic frequencies of X-chromosome markers in Colombian populations.

LITERATURE CITED Bailliet G, Rothhammer F, Carnese FR, Bravi CM, Bianchi NO. 1994. Founder mitochondrial haplotypes in Amerindian populations. Am J Hum Genet 55:27–33. Bedoya G, Montoya P, Garcia J, Soto I, Bourgeois S, Carvajal L, Labuda D, Alvarez V, Ospina J, Hedrick PW, Ruiz-Linares A. 2006. Admixture dynamics in Hispanics: a shift in the nuclear genetic ancestry of a South American population isolate. Proc Natl Acad Sci USA 103:7234–7239. Bertoni B, Jin L, Chakraborty R, Sans M. 2005. Directional mating and a rapid male population expansion in a hybrid Uruguayan population. Am J Hum Biol 17:801–808. Bolnick DA, Bolnick DI, Smith DG. 2006. Asymmetric male and female genetic histories among Native Americans from Eastern North America. Mol Biol Evol 23:2161–2174.


Bortolini MC, Salzano FM, Thomas MG, Stuart S, Nasanen SP, Bau CH, Hutz MH, Layrisse Z, Petzl-Erler ML, Tsuneto LT, Hill K, Hurtado AM, Castro-de-Guerra D, Torres MM, Groot H, Michalski R, Nymadawa P, Bedoya G, Bradman N, Labuda D, Ruiz-Linares A. 2003. Y-chromosome evidence for differing ancient demographic histories in the Americas. Am J Hum Genet 73:524–539. Carvajal-Carmona LG, Soto ID, Pineda N, Ortiz-Barrientos D, Duque C, Ospina-Duque J, McCarthy M, Montoya P, Alvarez VM, Bedoya G, Ruiz-Linares A. 2000. Strong Amerind/white sex bias and a possible Sephardic contribution among the founders of a population in northwest Colombia. Am J Hum Genet 67:1287–1295. Chakraborty R, Kamboh MI, Nwankwo M, Ferrell RE. 1992. Caucasian genes in American blacks: new data. Am J Hum Genet 50:145–155. Collins-Schramm HE, Phillips CM, Operario DJ, Lee JS, Weber JL, Hanson RL, Knowler WC, Cooper R, Li H, Seldin MF. 2002. Ethnic-difference markers for use in mapping by admixture linkage disequilibrium. Am J Hum Genet 70:737–750. DANE. 2006. Colombia: Una Nacion Multicultural. Su Diversidad Etnica. Bogota: DANE. 43 p. Dean M, Stephens JC, Winkler C, Lomb DA, Ramsburg M, Boaze R, Stewart C, Charbonneau L, Goldman D, Albaugh BJ, Goedert JJ, Beasley RP, Hwang L, Buchbinder S, Weedon M, Johnson PA, Eichelberger M, O’Brien SJ. 1994. Polymorphic admixture typing in human ethnic populations. Am J Hum Genet 55:788–808. Di Giacomo F, Luca F, Popa LO, Akar N, Anagnou N, Banyko J, Brdicka R, Barbujani G, Papola F, Ciavarella G, Cucci F, Di Stasi L, Gavrila L, Kerimova MG, Kovatchev D, Kozlov AI, Loutradis A, Mandarino V, Mammi C, Michalodimitrakis EN, Paoli G, Pappa KI, Pedicini G, Terrenato L, Tofanelli S, Malaspina P, Novelletto A. 2004. Y chromosomal haplogroup J as a signature of the post-neolithic colonization of Europe. Hum Genet 115:357–371. Dipierri JE, Alfaro E, Martinez-Marignac VL, Bailliet G, Bravi CM, Cejas S, Bianchi NO. 1998. 
Paternal directional mating in two Amerindian subpopulations located at different altitudes in northwestern Argentina. Hum Biol 70:1001–1010. Excoffier L, Laval G, Schneider S. 2005. Arlequin (version 3.0): an integrated software package for population genetics data analysis. Evol Bioinform Online 1:47–50. Flores C, Maca-Meyer N, Gonzalez AM, Oefner PJ, Shen P, Perez JA, Rojas A, Larruga JM, Underhill PA. 2004. Reduced genetic structure of the Iberian peninsula revealed by Y-chromosome analysis: implications for population demography. Eur J Hum Genet 12:855–863. Guerreiro-Junior V B-MR, Marrero A, Hunemeier T, Salzano F, Bortolini MC. 2009. Genetic signatures of parental contribution in black and white populations in Brazil. Genet Mol Biol 32:1–11. Hertzberg M, Mickleson KN, Serjeantson SW, Prior JF, Trent RJ. 1989. An Asian-specific 9-bp deletion of mitochondrial DNA is frequently found in Polynesians. Am J Hum Genet 44:504–510. Hoggart CJ, Parra EJ, Shriver MD, Bonilla C, Kittles RA, Clayton DG, McKeigue PM. 2003. Control of confounding of genetic associations in stratified populations. Am J Hum Genet 72:1492–1504. Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL, Hammer MF. 2008. New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830–838. Long JC. 1991. The genetic structure of admixed populations. Genetics 127:417–428. Macaulay V, Richards M, Hickey E, Vega E, Cruciani F, Guida V, Scozzari R, Bonne-Tamir B, Sykes B, Torroni A. 1999. The emerging tree of West Eurasian mtDNAs: a synthesis of control-region sequences and RFLPs. Am J Hum Genet 64:232–249. Mao X, Bigham AW, Mei R, Gutierrez G, Weiss KM, Brutsaert TD, Leon-Velarde F, Moore LG, Vargas E, McKeigue PM, Shriver MD, Parra EJ. 2007. A genomewide admixture mapping panel for Hispanic/Latino populations. Am J Hum Genet 80:1171–1178. Melo J. 1996. Historia de Colombia: El establecimiento de la Colonizacion Española. Bogota: Biblioteca Luis Angel Arango. 218 p. Parra EJ, Marcini A, Akey J, Martinson J, Batzer MA, Cooper R, Forrester T, Allison DB, Deka R, Ferrell RE, Shriver MD. 1998. Estimating African American admixture proportions by use of population-specific alleles. Am J Hum Genet 63:1839–1851. Patterson N, Hattangadi N, Lane B, Lohmueller KE, Hafler DA, Oksenberg JR, Hauser SL, Smith MW, O’Brien SJ, Altshuler D, Daly MJ, Reich D. 2004. Methods for high-density admixture mapping of disease genes. Am J Hum Genet 74:979–1000. Price AL, Patterson N, Yu F, Cox DR, Waliszewska A, McDonald GJ, Tandon A, Schirmer C, Neubauer J, Bedoya G, Duque C, Villegas A, Bortolini MC, Salzano FM, Gallo C, Mazzotti G, Tello-Ruiz M, Riba L, Aguilar-Salinas CA, Canizales-Quinteros S, Menjivar M, Klitz W, Henderson B, Haiman CA, Winkler C, Tusie-Luna T, Ruiz-Linares A, Reich D. 2007. A genomewide admixture map for Latino populations. Am J Hum Genet 80:1024–1036. Pritchard JK, Rosenberg NA. 1999. Use of unlinked genetic markers to detect population stratification in association studies. Am J Hum Genet 65:220–228. Pritchard JK, Stephens M, Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics 155:945–959. Rodas C, Gelvez N, Keyeux G. 2003. Mitochondrial DNA studies show asymmetrical Amerindian admixture in Afro-Colombian and Mestizo populations. Hum Biol 75:13–30. Sans M. 2000. Admixture studies in Latin America: from the 20th to the 21st century. Hum Biol 72:155–177. Seldin MF. 2007. Admixture mapping as a tool in gene discovery. Curr Opin Genet Dev 17:177–181.
Seldin MF, Tian C, Shigeta R, Scherbarth HR, Silva G, Belmont JW, Kittles R, Gamron S, Allevi A, Palatnik SA, Alvarellos A, Paira S, Caprarulo C, Guilleron C, Catoggio LJ, Prigione C, Berbotto GA, Garcia MA, Perandones CE, Pons-Estel BA, Alarcon-Riquelme ME. 2007. Argentine population genetic structure: large variance in Amerindian contribution. Am J Phys Anthropol 132:455–462. Shriver MD, Parra EJ, Dios S, Bonilla C, Norton H, Jovel C, Pfaff C, Jones C, Massac A, Cameron N, Baron A, Jackson T, Argyropoulos G, Jin L, Hoggart CJ, McKeigue PM, Kittles RA. 2003. Skin pigmentation, biogeographical ancestry and admixture mapping. Hum Genet 112:387–399. Smith MW, Lautenberger JA, Shin HD, Chretien JP, Shrestha S, Gilbert DA, O’Brien SJ. 2001. Markers for mapping by admixture linkage disequilibrium in African American and Hispanic populations. Am J Hum Genet 69:1080–1094. Smith MW, O’Brien SJ. 2005. Mapping by admixture linkage disequilibrium: advances, limitations and guidelines. Nat Rev Genet 6:623–632. Tang H, Peng J, Wang P, Risch NJ. 2005. Estimation of individual admixture: analytical and study design considerations. Genet Epidemiol 28:289–301. Thomas MG, Bradman N, Flinn HM. 1999. High throughput analysis of 10 microsatellite and 11 diallelic polymorphisms on the human Y-chromosome. Hum Genet 105:577–581. Wang S, Lewis CM, Jakobsson M, Ramachandran S, Ray N, Bedoya G, Rojas W, Parra MV, Molina JA, Gallo C, Mazzotti G, Poletti G, Hill K, Hurtado AM, Labuda D, Klitz W, Barrantes R, Bortolini MC, Salzano FM, Petzl-Erler ML, Tsuneto LT, Llop E, Rothhammer F, Excoffier L, Feldman MW, Rosenberg NA, Ruiz-Linares A. 2007. Genetic variation and population structure in Native Americans. PLoS Genet 3:e185. 
Wang S, Ray N, Rojas W, Parra MV, Bedoya G, Gallo C, Poletti G, Mazzotti G, Hill K, Hurtado AM, Camrena B, Nicolini H, Klitz W, Barrantes R, Molina JA, Freimer NB, Bortolini MC, Salzano FM, Petzl-Erler ML, Tsuneto LT, Dipierri JE, Alfaro EL, Bailliet G, Bianchi NO, Llop E, Rothhammer F, Excoffier L, Ruiz-Linares A. 2008. Geographic patterns of genome admixture in Latin American Mestizos. PLoS Genet 4:e1000037.