
© 2011 Nature America, Inc. All rights reserved.

A genome-wide association study identifies new susceptibility loci for non-cardia gastric cancer at 3q13.31 and 5p13.1

Yongyong Shi1,9, Zhibin Hu2–4,9, Chen Wu5,9, Juncheng Dai2, Huizhang Li2, Jing Dong2, Meilin Wang6, Xiaoping Miao7, Yifeng Zhou8, Feng Lu2, Hanze Zhang2, Lingmin Hu2, Yue Jiang2, Zhiqiang Li1, Minjie Chu2, Hongxia Ma2, Jiaping Chen2,3, Guangfu Jin2,3, Wen Tan5, Tangchun Wu7, Zhengdong Zhang6, Dongxin Lin5 & Hongbing Shen2–4

Gastric cancer, including the cardia and non-cardia types, is the second leading cause of cancer-related deaths worldwide. To identify genetic risk variants for non-cardia gastric cancer, we performed a genome-wide association study in 3,279 individuals (1,006 with non-cardia gastric cancer and 2,273 controls) of Chinese descent. We replicated significant associations in an additional 6,897 subjects (3,288 with non-cardia gastric cancer and 3,609 controls). We identified two new susceptibility loci for non-cardia gastric cancer at 5p13.1 (rs13361707 in the region including PTGER4 and PRKAA1; odds ratio (OR) = 1.41; P = 7.6 × 10−29) and 3q13.31 (rs9841504 in ZBTB20; OR = 0.76; P = 1.7 × 10−9). Imputation analyses also confirmed previously reported associations of rs2294008 and rs2976392 on 8q24, rs4072037 on 1q22 and rs13042395 on 20p13 with non-cardia gastric cancer susceptibility in the Han Chinese population.

Approximately 40% of all cases of gastric cancer worldwide occur in China, and this form of cancer remains one of the key public health issues in cancer prevention and control. Gastric cancer includes non-cardia gastric and cardia gastric carcinoma. Cardia gastric carcinoma is localized to the gastroesophageal junction and differs from non-cardia gastric carcinoma in epidemiological characteristics, etiology and clinical features. It has been shown that smoking and dietary deficiencies contribute to the risk of cardia gastric cancer, whereas education may have a protective effect1,2.
In contrast, gastroesophageal reflux and obesity increase the risk of cardia gastric cancer only in individuals of European ancestry and rarely affect risk in individuals of Chinese descent1,3. In non-cardia gastric cancer, Helicobacter pylori infection is an established etiological factor4. Although the prevalence of H. pylori infection ranges from 40% to 80% in humans, only a small proportion of infected individuals develop non-cardia gastric cancer4, suggesting that genetic factors may have an important role in gastric cancer pathogenesis. Thus, the identification of genetic variants associated with gastric cancer could improve understanding of the etiology and mechanisms of gastric cancer development, which in turn is important for early detection, diagnosis and targeted treatment of this malignancy.

Recently, two genome-wide association studies (GWAS) related to gastric cancer in the Chinese population have been published. One study reported that the rs2274223 single-nucleotide polymorphism (SNP) at 10q23 and the rs13042395 SNP at 20p13, which are associated with risk of esophageal squamous cell carcinoma, are also associated with risk of cardia gastric carcinoma5. However, this study did not test whether these two SNPs are associated with risk of non-cardia gastric carcinoma. The other study identified two SNPs at 10q23 (rs2274223) and 1q22 (rs4072037) that were significantly associated with risk of mixed types of gastric cancer in the GWAS, but these findings were not replicated in an independent validation cohort with a relatively small sample size6. In the latter study, rs2274223 at 10q23 was shown to be a susceptibility locus specific for cardia gastric cancer but not for non-cardia gastric cancer6. Although there might be shared susceptibility loci for both cardia and non-cardia gastric cancer, the identification of phenotype-specific loci would represent an important advancement in understanding the pathogenesis of different subtypes of gastric cancer.

1Bio-X Institutes, Ministry of Education Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, Shanghai, China.
2Department of Epidemiology and Biostatistics, Ministry of Education Key Laboratory for Modern Toxicology, School of Public Health, Nanjing Medical University, Nanjing, China.
3Section of Clinical Epidemiology, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Cancer Center, Nanjing Medical University, Nanjing, China.
4State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China.
5Department of Etiology and Carcinogenesis, State Key Laboratory of Molecular Oncology, Cancer Institute & Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.
6Department of Occupational Medicine and Environmental Health, School of Public Health, Nanjing Medical University, Nanjing, China.
7Department of Epidemiology and Biostatistics, Ministry of Education Key Laboratory for Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
8Cyrus Tang Hematology Center, Jiangsu Institute of Hematology, Medical College, Soochow University, Suzhou, China.
9These authors contributed equally to this work.
Correspondence should be addressed to H.S. ([email protected]), D.L. ([email protected]) or Y.S. ([email protected]).

Received 7 June; accepted 21 September; published online 30 October 2011; doi:10.1038/ng.978

Nature Genetics  VOLUME 43 | NUMBER 12 | DECEMBER 2011

[Figure 1 plot: x axis, chromosome (1–22); y axis, −log10 P (0–11).]

Figure 1  Plot of the genome-wide P values of association with non-cardia gastric cancer in the Han Chinese population. Scatter plot of P values in −log10 scale from the additive model in 1,006 cases and 2,273 controls. Blue horizontal line, P = 1.0 × 10−5; red horizontal line, P = 1.0 × 10−7; green dots, rs9841504 and rs13361707.

Here we performed a GWAS of non-cardia gastric cancer in a case-control collection of individuals of Han Chinese descent by genotyping 906,703 SNPs in 1,033 subjects with non-cardia gastric cancer and 2,285 control subjects using Affymetrix Genome-Wide Human SNP Array 6.0 chips. Cases were individuals with non-cardia gastric cancer, and controls were cancer-free individuals. After standard quality control filtering, 555,896 SNPs from 1,006 cases (550 recruited from Nanjing and 456 from Beijing) and 2,273 controls (1,155 from Nanjing and 1,118 from Beijing) remained in the final statistical analysis (Supplementary Table 1). Assessment of ancestral origin by principal-component analysis (PCA) confirmed that all the subjects were of Han Chinese descent and demonstrated moderate genetic stratification between case and control subjects, especially for the Nanjing cohort (λ = 1.074 for the Nanjing study and 1.014 for the Beijing study; converting to a sample size of 1,000 cases and 1,000 controls7 resulted in λ = 1.097 and 1.021, respectively; Supplementary Fig. 1). The GWAS analysis was therefore performed using logistic regression with principal component–based correction for population stratification and treating the analysis of samples from the Nanjing and Beijing cohorts as independent studies. The scatter plot of P values derived from a meta-analysis of the Nanjing and Beijing GWAS using logistic regression based on an additive model with adjustment for the first principal component, age, gender, and smoking and drinking status is shown in Figure 1.

We selected SNPs for the first stage of replication according to a series of criteria (see Online Methods). Following this selection, we included a total of 17 SNPs with Pmeta ≤ 1.0 × 10−5 for all GWAS samples and consistent associations of P ≤ 1.0 × 10−2 in both the Nanjing and Beijing studies in the first replication stage (Supplementary Table 2). The results for these 17 SNPs in the GWAS and the replication study are shown in Supplementary Tables 2 and 3. Among them, only rs13361707 at 5p13.1 and rs9841504 at 3q13.31 were consistently validated in the first replication stage in 1,894 cases and 2,090 controls, with the same direction of association as in the GWAS analysis (Supplementary Table 3). We further evaluated these two SNPs in an independent replication study consisting of 1,394 cases and 1,519 controls, and the results were similar to those seen in the GWAS analysis and the first stage of replication (Table 1 and Supplementary Table 3). In the combined analysis of pooled replication and GWAS samples, the associations of rs13361707 and rs9841504 with risk of non-cardia gastric cancer achieved genome-wide significance (P < 5.0 × 10−8) after adjustment for the first principal component, age, gender, and smoking and drinking status (OR = 1.41, 95% confidence interval (CI) of 1.32–1.49, Pcombined = 7.6 × 10−29 and OR = 0.76, 95% CI of 0.69–0.83, Pcombined = 1.7 × 10−9, respectively) (Table 1). We obtained consistent and significant results for both SNPs in the two validation studies (P for heterogeneity = 0.436 for rs13361707 and 0.386 for rs9841504; I2 = 0.0% for both SNPs). There was some heterogeneity between the validation studies and the GWAS for rs9841504 (P for heterogeneity = 0.009; I2 = 79.0%) but not for rs13361707 (P for heterogeneity = 0.714; I2 = 0.0%).
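The rescaling of λ to 1,000 cases and 1,000 controls mentioned above can be illustrated with a short sketch. It assumes the commonly used conversion λ1000 = 1 + (λ − 1) × (1/n_cases + 1/n_controls) / (1/1,000 + 1/1,000); the text cites ref. 7 for the conversion but does not spell out the formula, so this form and the function name `lambda_1000` are assumptions. The sample sizes used are the pre-quality-control cohort sizes given in Online Methods.

```python
def lambda_1000(lam: float, n_cases: int, n_controls: int) -> float:
    """Rescale a genomic-control inflation factor to 1,000 cases and 1,000 controls.

    Assumed formula (not stated explicitly in the paper):
    lambda_1000 = 1 + (lambda - 1) * (1/n_cases + 1/n_controls) / (1/1000 + 1/1000)
    """
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lam - 1.0) * scale


# Reported lambda values with the cohort sizes listed in Online Methods.
nanjing = lambda_1000(1.074, n_cases=565, n_controls=1162)
beijing = lambda_1000(1.014, n_cases=468, n_controls=1123)
print(f"Nanjing: {nanjing:.3f}, Beijing: {beijing:.3f}")
```

Under these assumptions the rescaled values round to the reported 1.097 and 1.021. Using the post-filtering sample sizes instead gives slightly larger values, so the choice of sample size matters at the third decimal place.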
To increase the spectrum of variants tested for association with non-cardia gastric cancer in these two regions, we performed imputation using the HapMap 3 Chinese Han in Beijing (CHB) and Japanese in Tokyo (JPT) samples as references (imputed r2 > 0.3, MAF > 0.05, 400 kb on either side of the two marker SNPs). We then performed tests to identify associations of all directly genotyped and well-imputed SNPs spanning each locus that were conditional on the index SNP. After imputation, we were able to test the associations of 1,109 imputed SNPs with non-cardia gastric cancer risk (Fig. 2). We found that, although residual associations were detected at a number of SNPs across the region encoding PTGER4, TTC33 and PRKAA1 at 5p13.1, the peak signals occurred at the rs13361707 index SNP and its highly correlated SNPs (Fig. 2a and Supplementary Table 4). None of the imputed SNPs around rs9841504 at 3q13.31 showed significant (P < 1.0 × 10−5) association (Fig. 2b and Supplementary Table 4).
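The inclusion thresholds for imputed SNPs described above (imputation r2 > 0.3, MAF > 0.05 and a window of 400 kb on either side of each marker SNP) can be sketched as a simple filter. This is an illustration only, not the authors' pipeline: the `ImputedSnp` record, the `keep_snp` function and every identifier, position and statistic in the toy data are hypothetical.

```python
from dataclasses import dataclass

WINDOW_BP = 400_000   # 400 kb on either side of the marker SNP (from the text)
MIN_R2 = 0.3          # imputation quality threshold (from the text)
MIN_MAF = 0.05        # minor allele frequency threshold (from the text)


@dataclass
class ImputedSnp:
    snp_id: str
    position: int   # base-pair position on the chromosome
    r2: float       # imputation quality reported by the imputation software
    maf: float      # minor allele frequency


def keep_snp(snp: ImputedSnp, index_position: int) -> bool:
    """Return True if the SNP passes the window, quality and frequency filters."""
    in_window = abs(snp.position - index_position) <= WINDOW_BP
    return in_window and snp.r2 > MIN_R2 and snp.maf > MIN_MAF


# Toy records with made-up identifiers, positions and statistics.
index_position = 1_000_000
candidates = [
    ImputedSnp("snp_a", 1_100_000, 0.95, 0.20),  # passes every filter
    ImputedSnp("snp_b", 1_500_000, 0.95, 0.20),  # outside the 400 kb window
    ImputedSnp("snp_c", 900_000, 0.25, 0.20),    # poorly imputed
    ImputedSnp("snp_d", 950_000, 0.80, 0.01),    # too rare
]
kept = [s.snp_id for s in candidates if keep_snp(s, index_position)]
print(kept)
```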

Table 1  Summary of GWAS and replication studies for the two newly discovered risk loci

                                             Cases (N = 4,294)            Controls (N = 5,882)              MAFb
SNP                         Study            WT hom.  WT het.  Var. hom.  WT hom.  WT het.  Var. hom.  Cases  Controls  ORadd (95% CI)c    Padd
rs9841504 C/Ga at 3q13.31   GWAS                 828      169          4    1,637      563         43   0.09      0.14  0.58 (0.47–0.72)   6.1 × 10−7
                            Replication I      1,423      425         24    1,483      549         37   0.13      0.15  0.81 (0.71–0.93)   1.9 × 10−3
                            Replication II     1,096      283         15    1,112      373         34   0.11      0.15  0.78 (0.67–0.92)   3.4 × 10−3
                            Combined           3,347      877         43    4,232    1,485        114   0.11      0.15  0.76 (0.69–0.83)   1.7 × 10−9
rs13361707 T/Ca at 5p13.1   GWAS                 160      517        302      607    1,154        507   0.57      0.48  1.42 (1.24–1.62)   4.3 × 10−7
                            Replication I        371      941        561      578    1,034        464   0.55      0.47  1.38 (1.26–1.51)   2.8 × 10−12
                            Replication II       237      675        480      392      745        376   0.59      0.49  1.46 (1.31–1.63)   6.9 × 10−12
                            Combined             768    2,133      1,343    1,577    2,933      1,347   0.57      0.48  1.41 (1.32–1.49)   7.6 × 10−29

ORadd, OR in additive model; Padd, P values for additive model; hom., homozygote; het., heterozygote; Var., variant. aMajor allele/minor allele. bMinor allele frequency. cAdjusted for the first principal component, age, gender, and smoking and drinking status.
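As a worked check of Table 1, the minor allele frequencies can be recomputed from the genotype counts, and a crude (unadjusted) allelic odds ratio can be derived from the same counts. This is a minimal sketch: the function names are illustrative, and the crude allelic OR is a different quantity from the covariate-adjusted additive-model OR reported in the table, so only approximate agreement is expected.

```python
def minor_allele_frequency(wt_hom: int, het: int, var_hom: int) -> float:
    """MAF from genotype counts: each heterozygote carries one minor allele,
    each variant homozygote carries two."""
    return (het + 2 * var_hom) / (2 * (wt_hom + het + var_hom))


def crude_allelic_or(case_counts, control_counts) -> float:
    """Unadjusted allelic odds ratio (minor versus major allele)."""
    def allele_counts(wt_hom, het, var_hom):
        return 2 * wt_hom + het, het + 2 * var_hom  # (major, minor)

    case_major, case_minor = allele_counts(*case_counts)
    ctrl_major, ctrl_minor = allele_counts(*control_counts)
    return (case_minor / case_major) / (ctrl_minor / ctrl_major)


# Genotype counts (WT homozygote, heterozygote, variant homozygote) from Table 1.
rs9841504_gwas_cases = (828, 169, 4)
rs9841504_gwas_controls = (1637, 563, 43)
rs13361707_combined_cases = (768, 2133, 1343)
rs13361707_combined_controls = (1577, 2933, 1347)

print(round(minor_allele_frequency(*rs9841504_gwas_cases), 2))      # 0.09
print(round(minor_allele_frequency(*rs9841504_gwas_controls), 2))   # 0.14
print(round(crude_allelic_or(rs13361707_combined_cases,
                             rs13361707_combined_controls), 2))     # 1.42
```

The crude allelic OR of about 1.42 for rs13361707 in the combined samples is close to the adjusted additive-model estimate of 1.41.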


[Figure 2 plots. Panel a, 5p13.1 (rs13361707: combined P = 7.6 × 10−29; imputed P = 5.2 × 10−10); genes shown: PTGER4, PRKAA1, TTC33, C7, RPL37, C6, HEATR7B2, SNORD72 and CARD6; x axis, position on chr. 5 (Mb), 40.4–41.0. Panel b, 3q13.31 (rs9841504: combined P = 1.7 × 10−9; imputed P = 1.6 × 10−4); genes shown: TIGIT, ZBTB20 and MIR568; x axis, position on chr. 3 (Mb), 114.0–114.6. Left y axis, −log10 P; right y axis, recombination rate (cM/Mb); color scale, r2.]

Figure 2  Regional plot of the two new susceptibility loci for non-cardia gastric cancer. (a) 5p13.1 and (b) 3q13.31. Association results (−log10 P) of both genotyped (filled circle) and imputed (open diamond) SNPs in the GWAS samples are shown for SNPs in the regions 400 kb upstream and downstream of the marker SNPs. The marker SNPs are shown in purple, and the r2 values of the remaining SNPs are indicated by color. The genes within the region are annotated and shown as arrows.

To evaluate whether these two loci are also associated with cardia gastric cancer, we genotyped these SNPs in 905 individuals with cardia gastric cancer. As shown in Supplementary Table 5, the allele frequencies of rs13361707 and rs9841504 in cardia gastric cancer cases differed significantly from those in non-cardia gastric cancer cases (P = 1.62 × 10−4 and P = 1.41 × 10−3, respectively), and neither was significantly associated with cardia gastric cancer risk (Supplementary Table 5). We also performed stratification analyses for the associations of rs13361707 and rs9841504 with non-cardia gastric cancer risk. Heterogeneity tests showed that there were no significant differences in odds ratios between subgroups that were stratified by sex, age, or drinking or smoking status (Supplementary Table 6). We also tested for gene-gene multiplicative interactions between the two SNPs and for gene-environment multiplicative interactions between the SNPs and smoking or drinking, but we found no significant interactions.

The rs13361707 SNP is located in the first intron of PRKAA1 (encoding protein kinase, AMP-activated, alpha 1 catalytic subunit) at 5p13.1. The PRKAA1 protein is one of the subunits of the mammalian 5′-AMP–activated protein kinase (AMPK), a central metabolic switch found in all eukaryotes that governs glucose and lipid metabolism in response to alterations in nutrient and intracellular energy levels. AMPK has been implicated in a number of diseases related to energy metabolism, including cancer8. By imputation, we found that significant signals in the 5p13.1 region also came from PTGER4 (encoding prostaglandin E receptor 4). The PTGER4 protein has important functions in the regulation of cell migration9 and in the immune response10. It has been reported that activation of the PTGER4 and PTGER2 pathways can induce growth inhibition of human gastric carcinoma cell lines11. Moreover, the PTGER4 locus might be associated with Crohn’s disease12.

rs9841504 is located in the intron of ZBTB20 (encoding zinc finger and BTB domain containing protein 20) at 3q13.31. It has been suggested that the ZBTB20 protein is a key regulator of α-fetoprotein expression in adult liver, and it is highly homologous to BCL6 and may be involved in hematopoiesis, immune responses and oncogenesis13,14.

We also examined the associations of previously reported gastric cancer susceptibility loci from GWAS analyses5,6,15 in our study population. Because of the different genotyping platform used, only rs2274223 at 10q23 was included in our GWAS. We did not find significant association between rs2274223 and non-cardia gastric cancer risk (OR = 1.11, P = 0.136), which is consistent with the previous report that rs2274223 is a marker specific for cardia gastric cancer6. Our imputation analysis did confirm previously reported associations with non-cardia gastric cancer susceptibility for rs4072037 (OR = 0.73, P = 1.0 × 10−4) at 1q22 (ref. 6), rs2294008 (OR = 1.36, P = 2.1 × 10−7) and rs2976392 (OR = 1.35, P = 3.7 × 10−7) at 8q24 (ref. 15) and rs13042395 (OR = 0.80, P = 6.9 × 10−4) at 20p13 (ref. 5) in our study population.

Of interest, several SNPs with strong signals in our original GWAS data were not replicated (including rs11165389, rs12463773 and rs13295747).
Possible explanations for the failure to replicate the associations for these SNPs include disease heterogeneity, chance findings in the GWAS, genotyping error, the approaches used for SNP selection for validation and replication sample size. To evaluate possible genotyping error, we genotyped the two replicated SNPs (rs9841504 and rs13361707) and the above three candidate SNPs in 380 randomly selected samples from the GWAS using TaqMan assays. The concordance rates between the Affymetrix chips and the TaqMan assays were 99.5% for rs9841504, 100% for rs13361707, 92.9% for rs11165389, 96.8% for rs12463773 and 99.5% for rs13295747. Therefore, part of the reason that associations for SNPs were not replicated (especially for rs11165389) could be genotyping error in the screening platform at the first stage. Nevertheless, in this GWAS of non-cardia gastric cancer in the Han Chinese population, we identified two new susceptibility loci at 5p13.1 and 3q13.31 and confirmed previously reported susceptibility loci at 1q22, 8q24 and 20p13.

URLs. PLINK 1.07, http://pngu.mgh.harvard.edu/~purcell/plink/; R 2.11.1 statistical environment, http://www.cran.r-project.org/; MACH 1.0, http://www.sph.umich.edu/csg/abecasis/MACH/index.html; LocusZoom 1.1, http://csg.sph.umich.edu/locuszoom/.

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturegenetics/.

Note: Supplementary information is available on the Nature Genetics website.

Acknowledgments
We thank all the study participants, research staff and students who took part in this work. Funding was provided by grants from the China National High-Tech Research and Development Program (2009AA022705, 2009AA022704 and 2009AA022701), the National Natural Science Foundation of China (30700684, 81130022, 30671814, 81001276, 81072380 and 31000553), the Jiangsu Natural Science Foundation (BK2008221), the Foundation for the Author of National Excellent Doctoral Dissertation (201081 and 201026), the Program for New Century Excellent Talents in University (NCET-09-0550 and NCET-10-0178) and the Priority Academic Program for the Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine).

AUTHOR CONTRIBUTIONS
H.S. took full responsibility for the study, including study design, obtaining financial support, data interpretation and manuscript submission. D.L. and Y.S. co-directed the study, obtained financial support and were responsible for study design, the interpretation of results and writing the manuscript. Z.H. contributed financial support, performed overall project management, performed statistical analyses along with J. Dai and drafted the initial manuscript. H. Li, F.L., H.Z., L.H., Y.J., Z.L. and M.C. were responsible for sample processing and managed the genotyping data. M.W., Y.Z., H. Li, J. Dong, H. Ma, G.J., J.C. and Z.Z. were responsible for subject recruitment and sample preparation for the Jiangsu samples. C.W., X.M. and W.T. were responsible for subject recruitment and sample preparation for the Beijing samples and reviewed the manuscript. T.W. participated in the study design and reviewed the manuscript. All authors approved the final manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Published online at http://www.nature.com/naturegenetics/. Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Tran, G.D. et al. Prospective study of risk factors for esophageal and gastric cancers in the Linxian general population trial cohort in China. Int. J. Cancer 113, 456–463 (2005).
2. Gammon, M.D. et al. Tobacco, alcohol, and socioeconomic status and adenocarcinomas of the esophagus and gastric cardia. J. Natl. Cancer Inst. 89, 1277–1284 (1997).
3. Mayne, S.T. & Navarro, S.A. Diet, obesity and reflux in the etiology of adenocarcinomas of the esophagus and gastric cardia in humans. J. Nutr. 132, 3467S–3470S (2002).
4. Peek, R.M. Jr. & Blaser, M.J. Helicobacter pylori and gastrointestinal tract adenocarcinomas. Nat. Rev. Cancer 2, 28–37 (2002).
5. Wang, L.D. et al. Genome-wide association study of esophageal squamous cell carcinoma in Chinese subjects identifies susceptibility loci at PLCE1 and C20orf54. Nat. Genet. 42, 759–763 (2010).
6. Abnet, C.C. et al. A shared susceptibility locus in PLCE1 at 10q23 for gastric adenocarcinoma and esophageal squamous cell carcinoma. Nat. Genet. 42, 764–767 (2010).
7. Freedman, M.L. et al. Assessing the impact of population stratification on genetic association studies. Nat. Genet. 36, 388–393 (2004).
8. Shackelford, D.B. & Shaw, R.J. The LKB1-AMPK pathway: metabolism and growth control in tumour suppression. Nat. Rev. Cancer 9, 563–575 (2009).
9. Sugimoto, Y. & Narumiya, S. Prostaglandin E receptors. J. Biol. Chem. 282, 11613–11617 (2007).
10. Kabashima, K. et al. Prostaglandin E2–EP4 signaling initiates skin immune responses by promoting migration and maturation of Langerhans cells. Nat. Med. 9, 744–749 (2003).
11. Okuyama, T. et al. Activation of prostaglandin E2-receptor EP2 and EP4 pathways induces growth inhibition in human gastric carcinoma cell lines. J. Lab. Clin. Med. 140, 92–102 (2002).
12. Libioulle, C. et al. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet. 3, e58 (2007).
13. Xie, Z. et al. Zinc finger protein ZBTB20 is a key repressor of alpha-fetoprotein gene transcription in liver. Proc. Natl. Acad. Sci. USA 105, 10859–10864 (2008).
14. Zhang, W. et al. Identification and characterization of DPZF, a novel human BTB/POZ zinc finger protein sharing homology to BCL-6. Biochem. Biophys. Res. Commun. 282, 1067–1073 (2001).
15. Sakamoto, H. et al. Genetic variation in PSCA is associated with susceptibility to diffuse-type gastric cancer. Nat. Genet. 40, 730–740 (2008).

ONLINE METHODS

Study populations. Demographic and clinical information is summarized in Supplementary Table 1. We performed a three-stage case-control analysis. The GWAS included 1,033 non-cardia gastric cancer cases and 2,285 controls derived from studies in Nanjing (565 cases and 1,162 controls) and Beijing (468 cases and 1,123 controls). The first-stage validation included 1,894 cases and 2,090 controls from subjects in Jiangsu Province, and the second-stage validation samples consisted of 1,394 cases and 1,519 controls from subjects in Beijing. In addition, 905 individuals with cardia gastric cancer were recruited from Jiangsu. All non-cardia gastric cancer case-control samples were hospital based, and the subjects were unrelated individuals of Han Chinese descent, some of whom participated in previous studies16–18. The case subjects were histopathologically or cytologically confirmed to have non-cardia or cardia gastric cancer by at least two local pathologists. All cancer-free control subjects were selected from individuals receiving routine physical examinations in local hospitals or participating in community screening for non-communicable diseases, and they were frequency matched for age, gender and geographic region to each set of non-cardia gastric cancer cases. We collected information about smoking and drinking habits through interviews. Those who had smoked an average of less than one cigarette per day and less than 1 year in their lifetime were defined as nonsmokers; otherwise, they were considered smokers. Individuals were classified as alcohol drinkers if they drank at least twice a week and continuously for 1 year during their lifetime; otherwise, they were considered nondrinkers. At recruitment, informed consent was obtained from each subject, and this study was approved by the Institutional Review Boards of all participating institutions. Quality control in the GWAS. The GWAS was conducted using Affymetrix Genome-Wide Human SNP Array 6.0 chips. 
We performed systematic quality control on the raw genotyping data to filter out both unqualified samples and SNPs. The samples with overall genotype completion rates of [...] 0.25). Heterozygosity rates were calculated and samples >6 s.d. from the mean19 were excluded (eight samples). SNPs were excluded if they (i) were not mapped to autosomal chromosomes (for chr. 24 and 26, 1,584 SNPs; and for chr. 23, 35,817 SNPs for the Nanjing study and 35,846 SNPs for the Beijing study), (ii) had a call rate [...] 6 s.d. from the mean) were identified and excluded. The genomic-control inflation factor (λ, based on the bottom 90% of the quantile-quantile plot) was calculated separately for the Nanjing (λ = 1.074) and Beijing (λ = 1.014) studies with adjustment for the first principal component.

SNP selection and genotyping in replication phases. After GWAS analyses of data in the discovery phase, SNPs for the first stage of replication were selected using the following criteria: (i) the SNP had Pmeta ≤ 1.0 × 10−5 for all GWAS samples (meta-analysis of Nanjing and Beijing studies) and had a consistent association with P ≤ 1.0 × 10−2 in both the Nanjing and Beijing studies, (ii) only the SNP with the lowest P value was selected when multiple SNPs showed strong linkage disequilibrium (r2 ≥ 0.8) and (iii) clear genotyping clusters were present. A total of 17 SNPs that matched these criteria were included in the first stage of replication (1,894 cases and 2,090 controls). The SNPs that were significantly associated with non-cardia gastric cancer risk in the first stage of replication were also genotyped in the samples from the second stage of replication (1,394 cases and 1,519 controls), as well as in 905 cardia gastric cancer samples. Genotyping in the two replication sample sets was performed by TaqMan assays (Applied Biosystems). The primers and probes used are available upon request. Laboratory technicians who performed genotyping experiments were blinded to case-control status. We randomly selected 10% of the samples and repeated genotyping with a reproducibility of 100%.

Statistical analysis. GWAS analysis was performed using the additive model by logistic regression analysis. Population structure was evaluated by PCA in the software package EIGENSTRAT 3.0 (ref. 20). We used PLINK 1.07 for general statistical analysis21. SNPs for the first stage of replication were selected based on a separate meta-analysis for every SNP from the Nanjing and the Beijing GWAS. Our default meta-analysis used a fixed-effect model with inverse variance weighting and a calculation of Cochran’s Q statistic and the I2 statistic for heterogeneity22. When there was no indication of heterogeneity for a SNP (P for Q > 0.05), the fixed-effect model was kept in place. When heterogeneity was present (P for Q ≤ 0.05), we adopted a random-effects model (DerSimonian-Laird) for that SNP.
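The meta-analysis rule described above (inverse-variance fixed-effect pooling, Cochran's Q and I2, and a switch to a DerSimonian-Laird random-effects model when P for Q ≤ 0.05) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' PLINK-based workflow. The function names are hypothetical, and the standard errors are back-calculated from the rounded ORs and 95% CIs in Table 1, which is an assumption that limits precision.

```python
import math

Z_975 = 1.959964  # normal quantile used to turn a 95% CI into a standard error


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    y = x / 2.0
    if df % 2:                       # odd df: start from Q(1/2, y) = erfc(sqrt(y))
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:                            # even df: start from Q(1, y) = exp(-y)
        a, q = 1.0, math.exp(-y)
    while a < df / 2.0:              # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def log_or_and_se(odds_ratio: float, ci_low: float, ci_high: float):
    """Log odds ratio and its standard error recovered from a 95% CI."""
    return math.log(odds_ratio), (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_975)


def meta_analyse(estimates, alpha: float = 0.05) -> dict:
    """Pool (log OR, SE) pairs; fixed effect unless Cochran's Q has P <= alpha."""
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1.0 / se ** 2 for _, se in estimates]
    beta_fixed = sum(wi * b for wi, (b, _) in zip(w, estimates)) / sum(w)
    q = sum(wi * (b - beta_fixed) ** 2 for wi, (b, _) in zip(w, estimates))
    df = k - 1
    p_q = chi2_sf(q, df)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    if p_q > alpha:
        model, beta, weights = "fixed", beta_fixed, w
    else:                            # DerSimonian-Laird between-study variance
        tau2 = max(0.0, (q - df) / (sum(w) - sum(wi ** 2 for wi in w) / sum(w)))
        weights = [1.0 / (se ** 2 + tau2) for _, se in estimates]
        beta = sum(wi * b for wi, (b, _) in zip(weights, estimates)) / sum(weights)
        model = "random"
    return {"model": model, "or": math.exp(beta), "se": math.sqrt(1.0 / sum(weights)),
            "q": q, "p_q": p_q, "i2": i2}


# rs13361707 in the two replication stages (rounded values from Table 1).
replication = [log_or_and_se(1.38, 1.26, 1.51), log_or_and_se(1.46, 1.31, 1.63)]
result = meta_analyse(replication)
print(result["model"], round(result["or"], 2), round(result["p_q"], 2), result["i2"])
```

With these rounded inputs the sketch keeps the fixed-effect model, with a heterogeneity P value close to the reported 0.436 and I2 of 0% for rs13361707 across the two validation studies. Results for other SNPs may differ from the published figures because the inputs are rounded and unadjusted for the exact analysis the authors ran.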
The Manhattan plot of −log10 P was generated using R 2.11.1. We used MACH 1.0 software to impute ungenotyped SNPs using the LD information from the HapMap 3 database (CHB and JPT as reference set, released Feb. 2009). The chromosome region was plotted using an online tool, LocusZoom 1.1. PCA identified one significant (P < 0.05) eigenvector, which we included in the logistic regression analysis with the other covariates of age, gender, and smoking and drinking status for both the GWAS and the combined analysis. The chi-squared (χ2)-based Cochran’s Q statistic was also calculated to test for heterogeneity between groups in stratified analysis. Gene-gene and gene-environment multiplicative interactions were tested by a general logistic regression model using the equation

Y = b0 + b1 × A + b2 × B + b3 × (A × B) + e

where Y is the logit of case-control status, A and B are factors (SNP or environmental), b0 is constant, b1 and b2 are the main effects of factor A and B, respectively, and b3 is the interaction term. P values are two sided, and the ORs reported in the manuscript are from an additive model by logistic regression analyses unless otherwise specified. The analyses were also performed using SAS version 9.1.3 (SAS Institute) or Stata version 9.2 (StataCorp LP).
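To make the role of the interaction term concrete, the following sketch evaluates the logit model above for given coefficients and shows that, when b3 = 0, the odds ratio for joint exposure is simply the product of the two main-effect odds ratios. The coefficient values are illustrative assumptions: the main effects are set to the log of the combined per-allele ORs from Table 1, and the non-zero b3 is hypothetical (the study reports no significant interactions).

```python
import math


def logit(a: float, b: float, b0: float, b1: float, b2: float, b3: float) -> float:
    """Linear predictor Y = b0 + b1*A + b2*B + b3*(A*B), without the error term."""
    return b0 + b1 * a + b2 * b + b3 * (a * b)


def odds_ratio_vs_reference(a: float, b: float, b1: float, b2: float, b3: float) -> float:
    """Odds ratio for exposure (A=a, B=b) relative to (A=0, B=0); b0 cancels out."""
    return math.exp(logit(a, b, 0.0, b1, b2, b3) - logit(0, 0, 0.0, b1, b2, b3))


# Illustrative main effects: log of the combined per-allele ORs in Table 1.
b1 = math.log(1.41)   # rs13361707
b2 = math.log(0.76)   # rs9841504

# No interaction: the joint OR equals the product of the main-effect ORs.
joint_no_interaction = odds_ratio_vs_reference(1, 1, b1, b2, b3=0.0)

# Hypothetical interaction: exp(b3) multiplies the joint OR.
joint_with_interaction = odds_ratio_vs_reference(1, 1, b1, b2, b3=math.log(1.2))

print(round(joint_no_interaction, 4), round(joint_with_interaction, 4))
```

Testing for multiplicative interaction amounts to testing whether b3 differs from zero, that is, whether exp(b3) differs from 1.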

16. Wu, C. et al. Two genetic variants in prostate stem cell antigen and gastric cancer susceptibility in a Chinese population. Mol. Carcinog. 48, 1131–1138 (2009).
17. Lu, Y. et al. Genetic variation of PSCA gene is associated with the risk of both diffuse- and intestinal-type gastric cancer in a Chinese population. Int. J. Cancer 127, 2183–2189 (2010).
18. Zhang, H. et al. Genetic variants at 1q22 and 10q23 reproducibly associated with gastric cancer susceptibility in a Chinese population. Carcinogenesis 32, 848–852 (2011).
19. Purdue, M.P. et al. Genome-wide association study of renal cell carcinoma identifies two susceptibility loci on 2p21 and 11q13.3. Nat. Genet. 43, 60–65 (2011).
20. Price, A.L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
21. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
22. Higgins, J.P. & Thompson, S.G. Quantifying heterogeneity in a meta-analysis. Stat. Med. 21, 1539–1558 (2002).
