Genes and Immunity (2012) 13, 71–82 & 2012 Macmillan Publishers Limited All rights reserved 1466-4879/12 www.nature.com/gene
ORIGINAL ARTICLE
Identification of biomarkers for tuberculosis disease using a novel dual-color RT–MLPA assay SA Joosten1, JJ Goeman2, JS Sutherland3, L Opmeer1, KG de Boer1, M Jacobsen4,5, SHE Kaufmann4, L Finos2, C Magis-Escurra1,6, MOC Ota3, THM Ottenhoff1 and MC Haks1 Department of Infectious Diseases, Leiden University Medical Center, Leiden, The Netherlands; 2Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands; 3Bacterial Diseases Programme, Medical Research Council Laboratories, Banjul, The Gambia and 4Department of Immunology, Max Planck Institute for Infection Biology, Berlin, Germany 1
Owing to our lack of understanding of the factors that constitute protective immunity during natural infection with Mycobacterium tuberculosis (Mtb), there is an urgent need to identify host biomarkers that predict long-term outcome of infection in the absence of therapy. Moreover, the identification of host biomarkers that predict (in)adequate response to tuberculosis (TB) treatment would similarly be a major step forward. To identify/monitor multi-component host biomarker signatures at the transcriptomic level in large human cohort studies, we have developed and validated a dual-color reversetranscriptase multiplex ligation-dependent probe amplification (dcRT–MLPA) method, permitting rapid and accurate expression profiling of as many as 60–80 transcripts in a single reaction. dcRT–MLPA is sensitive, highly reproducible, high-throughput, has an extensive dynamic range and is as quantitative as QPCR. We have used dcRT–MLPA to characterize the human immune response to Mtb in several cohort studies in two genetically and geographically diverse populations. A biomarker signature was identified that is strongly associated with active TB disease, and was profoundly distinct from that associated with treated TB disease, latent infection or uninfected controls, demonstrating the discriminating power of our biomarker signature. Identified biomarkers included apoptosis-related genes and T-cell/B-cell markers, suggesting important contributions of adaptive immunity to TB pathogenesis. Genes and Immunity (2012) 13, 71–82; doi:10.1038/gene.2011.64; published online 29 September 2011 Keywords: dcRT–MLPA; host biomarkers; tuberculosis
Introduction Biomarkers are defined as ‘characteristics that are objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes or pharmacological responses to a therapeutic intervention’.1 Biomarkers, or rather, ‘surrogate endpoints’, offer the means to classify disease status, disease activity, disease prognosis and disease progression, as well as the effects of interventions (drugs, surgery, vaccines and so on). Biomarkers can be analyzed in any tissue or body fluid, but peripheral blood is the most commonly used source in clinical practice. Biomarkers can be determined at different levels, for example, at the cellular-, protein- or transcript level, but informational complexity increases from transcriptome to proteome to cellulome. Quantitative changes in RNA expression levels are currently Correspondence: Dr MC Haks, Department of Infectious Diseases, Leiden University Medical Center, Group of Immunology and Immunogenetics of Mycobacterial Infectious Diseases, Albinusdreef 2, 2333 ZA Leiden, The Netherlands. E-mail:
[email protected] 5 Current address: Department of Immunology, Bernhard-NochtInstitute for Tropical Medicine, Hamburg, Germany. 6 Current address: Department of Pulmonary Diseases, Nijmegen University Medical Center Dekkerswald, Groesbeek, The Netherlands. Received 25 May 2011; revised 1 August 2011; accepted 12 August 2011; published online 29 September 2011
being analyzed using either genome-wide (microarray) or single-gene (real-time PCR) screening methods. As biomarker signatures encompass multiple gene transcripts, neither method is ideally suited for monitoring expression of a restricted set of genes involved in defined biological processes. Furthermore, microarray analysis and real-time PCR are technically challenging and too costly to be applied on a routine basis in resource-poor settings. Moreover, gene expression profiling using microarray analysis is characterized by complex data analysis, whereas the rather limited dynamic range (2–3 logs)2 compromises the ability to accurately quantify RNA expression levels. Here we describe a novel technique that is especially designed to combine a set of markers at the transcriptomic level. This technique, dual-color reverse-transcriptase multiplex ligation-dependent probe amplification (dcRT–MLPA) is inexpensive, fast, robust, and permits rapid and accurate RNA expression profiling of as many as 80 transcripts in a single reaction. Genes of interest can be selected on a tailor-made basis, and a PCR amplification step within dcRT–MLPA ensures assay sensitivity, which is an essential prerequisite for the relative quantification of scarcely expressed genes. As this assay is high-throughput (96-well format) and requires low amounts of RNA (100 ng), it is an exceptionally suitable technique to determine biomarker signatures in larger cohort studies.
Biomarker identification using a novel dcRT–MLPA assay SA Joosten et al
72
A particularly useful application of dcRT–MLPA is the identification and monitoring of host–biomarker profiles to investigate the human immune response to infection on a population scale. In this report we investigated the potential of dcRT–MLPA to identify and monitor host– biomarker profiles in tuberculosis (TB). It is estimated that one-third of the global population is infected with Mycobacterium tuberculosis (Mtb).3,4 However, immediate progression to active TB is rare and most frequently, the infection is initially contained by the host immune system resulting in latent TB infection (LTBI). LTBI can later (re)activate resulting in TB disease, which occurs in an estimated 3–10% of cases, 80% of whom within 2 years after infection.5 A major roadblock in the development of effective new TB vaccines (both preventive and post exposure) is our lack of understanding of the factors that constitute protective immunity during natural infection with Mtb or induced by vaccination.5 Currently, the success of a vaccine is measured by the prevention of active TB outbreak in study group participants in large (phase III efficacy) trials. Thus, the diagnosis of incipient TB disease serves as the clinical endpoint in necessarily large and long-lasting TB vaccine trials. Considering the relatively low disease incidence and the potentially long latency period, clinical efficacy trials would hugely benefit from the identification of biomarkers that could serve as early surrogate endpoints that predict the outcome at an early stage and can replace clinical endpoints. Thus, there is an urgent need to identify host biomarkers of protective host immunity, which will allow the early identification of those LTBIs at risk of progressing to active TB and in need of preventive treatment after exposure, versus those whose immune system will effectively contain the infection. Beside these important applications of biomarkers predicting progression to TB disease and/or vaccine efficacy, the identification of biomarkers that predict (in)adequate response to TB treatment would similarly be a major step forward, as it would allow more effective treatment and thus reduce the occurrence of multidrug-resistant/extensive drugresistant Mtb strains. Here we characterized and evaluated the potential of dcRT–MLPA, and used this novel technique to evaluate gene expression profiles in several cohort studies in two genetically and geographically diverse populations, from the Gambia and Paraguay. A biomarker signature is identified that is associated with active TB disease, and is profoundly distinct from that associated with treated TB disease, latent infection or uninfected healthy controls, demonstrating the discriminating power of our biomarker signature by dcRT–MLPA technology.
Results Principle of the dcRT–MLPA technique The MLPA assay is based on the ligation of two halfprobes hybridized adjacently to a target sequence, followed by a quantitative PCR amplification of the ligated products that are size-separated using capillary electrophoresis, and was originally described to determine the copy number of DNA sequences.6 More recently, in order to enable sensitive detection of RNA transcripts, a RT step with gene-specific primers before the probe annealing stage was introduced.7 However, a Genes and Immunity
major disadvantage of both the assays is the laborintensive and costly preparation of the M13-derived half-probes. To bypass this serious drawback, we replaced the production of M13 vector-based half-probes by chemically synthesized oligonucleotides. Furthermore, instead of using ‘spacer’ sequences, the discriminative length of each amplicon was assured by varying the length of the target-specific sequence in each probe set. The inherent restriction of this approach is the oligonucleotide synthesis length and as a consequence the number of probes that can be combined within a single assay. To circumvent these limitations and to maximize the number of target genes that can be analyzed within a single RT–MLPA assay, we designed a dcRT–MLPA assay, combining two sets of half-probes, each amplified by primers labeled with a different fluorophore.8 The principle of this technique is outlined in Supplementary Figure S1. Validation of the dcRT–MLPA technique To reliably monitor changes in gene transcription, it is essential that a novel technique such as dcRT–MLPA has an extensive dynamic range, is sensitive and has a low assay cutoff. To validate the dcRT–MLPA assay for these parameters and to be able to tightly control target gene copy numbers, cDNA was replaced as a hybridization template by a mixture of chemically synthesized oligonucleotides. To determine the dynamic range and sensitivity of the assay, serial dilutions of the synthetic template oligonucleotides were prepared and subjected to dcRT–MLPA analysis using a validation probe set. The gene targets included in the validation probe set encompassed genes known to be induced (IFNG, TNF and IL2) or downregulated (CD8A, RAB33 and CD14) upon mitogenic stimulation of peripheral blood mononuclear cells (PBMCs) and genes whose transcriptional activity was not affected by mitogenic stimulation of PBMCs, including reference genes (GUSB and GAPDH). As shown in Figure 1a, dcRT–MLPA was found to have an extensive dynamic range of 4–7 logs, with an average of 5 logs and a detection threshold as low as 20 oligonucleotide copies of any given gene. These observations indicate that when comparing a single-gene expression assay such as real-time PCR with a multiplex gene expression assay such as dcRT–MLPA, only minor compromises in sensitivity and dynamic range have to be accepted. To establish the dcRT–MLPA assay cutoff, the average percentage s.d. of decreasing log2 transformed peak area intervals was calculated (Figure 1b). Because assigned peaks, with a peak area p7.64, could only be called by the GeneMapper software (Applied Biosystems, Warrington, UK), if the peaks were perfectly shaped, the percentage s.d. profoundly increased below 7.64. Therefore, the cutoff value of this assay was established to correspond with the threshold value for noise cutoff in GeneMapper software. In conclusion, the potential of this assay to sensitively and accurately quantify gene expression levels over a broad range of synthetic template copy numbers highlights its prospective use as a new tool for biomarker identification and monitoring. Comparison between dcRT–MLPA and Taqman real-time PCR As both dcRT–MLPA and real-time PCR heavily depend on amplification of target products by PCR, it was
Biomarker identification using a novel dcRT–MLPA assay SA Joosten et al
73 Relative gene expression levels (log2)
15.5
Gene
14.5 CD14 CD8A GAPDH GUSB IFNG IL2 RAB33A TNF
13.5 12.5 11.5 10.5 9.5 8.5
20 molecules
4 5 6 7 5 5 4 5
1
5x 10 2 5x 10 3 5x 10 4 5x 10 5 5x 10 6 5x 10 7 5x 10 8 5x 10 9 5x 10 10 5x 10 1
7.5
Dynamic range (log10)
Template input (pmol) Cut- off Average % stdev
100 80 60 40 20 > 12.29
11.97-12.29
11.55-11.97
10.97-11.55
8.81-9.97
9.97-10.97
9.64-9.81
9.45-9.64
9.23-9.45
8.97-9.23
8.64-8.97
8.23-8.64
7.64-8.23
< 6.64
6.64-7.64
0
Intervals of peak areas (log2)
Figure 1 Dual-color RT–MLPA assay validation. To determine the dynamic range and sensitivity of dcRT–MLPA, cDNA was replaced as a hybridization template by a mixture of chemically synthesized oligonucleotides that were complementary to the RNA sequence and encompassed the combined target-specific sequences of the left- and right-hand half-probes. Serial dilutions of chemically synthesized template oligonucleotides were analyzed in four independent experiments by dcRT–MLPA using the validation probe set. (a) Shown is the median of the relative expression (log2 transformed peak areas) of GAPDH (solid symbols) and CD8A (open symbols) ±s.d. of triplicate reactions plotted against input concentrations of the synthetic template oligonucleotides (left panel) and the detected dynamic range of the different target genes present in the validation probe set (right panel). Calculation of the detection threshold was corrected for the facts that only a small proportion of the ligation products are PCR amplified and only a fraction of the PCR products are analyzed on a capillary sequencer. The dotted line represents assay cutoff. (b) The cutoff of the dcRT–MLPA assay was determined by calculating the average percentage standard deviation of decreasing log2 transformed peak area intervals.
anticipated that these two methods display similar assay characteristics. To directly compare gene expression profiles using dcRT–MLPA and real-time PCR, PBMCs from five Dutch healthy donors were stimulated for 6 h with phytohemagglutinin or anti-CD3/CD28 beads (Figure 2a) or phytohemagglutinin and subsequently mixed with unstimulated PBMCs at the indicated ratios (Figure 2b) before RNA isolation was performed followed by dcRT–MLPA (black bars) and real-time PCR (white bars) analysis on the same samples using the validation probe set. Data illustrated in Figure 2 clearly show that the results obtained with both RNA expression-profiling techniques are highly comparable. Moreover, both weak and strong variations in gene expression levels are faithfully detected by dcRT–MLPA, demonstrating the accurateness and robustness of this method to quantify specific changes in RNA expression levels. Reproducibility of the dcRT–MLPA assay To evaluate the reproducibility of the dcRT–MLPA assay, whole blood was collected from 12 Dutch healthy donors at two different time points (B2 months apart). At both the time points, whole blood of each donor was collected in five separate PAXgene tubes (BD Biosciences, Breda, The Netherlands). Gene expression profiles of all the
collected samples were analyzed in two independent dcRT–MLPA assays using the complete probe set (Supplementary Table S1). To evaluate the inter-assay variation of dcRT–MLPA, the relative gene expression profiles of several independent duplicate samples were compared. Interassay correlation between representative data sets of independent samples was excellent (0.998) for both scarcely and abundantly expressed genes (Figure 3a), highlighting the reliability of this technique to monitor the (subtle) changes in gene expression profiles. Subsequently, variance component estimation was calculated on the obtained data set. Initially, the total observed variation in gene expression profiles, regardless of its size, was set at 100% and the proportional contribution of each of the four components (donor, collection, tube and assay) to this total variation was determined. Figure 3b displays examples of gene expression profiles that were profoundly affected by variations in one of these components. Clearly, direct ex vivo RNA expression levels of CD4 were highly comparable between donors and also appeared steady over time. In sharp contrast, CD8A gene expression levels differed considerably between donors (but were constant over time), whereas B2M expression levels changed significantly over time, excluding CD8A as a Genes and Immunity
Biomarker identification using a novel dcRT–MLPA assay SA Joosten et al
74 IFNG
10000
TNF
IL2
Fold change
1000 100 10
CD3/28
PHA
-
CD3/28
PHA
-
CD14
CD8A
2.0
RAB33A
1.5 1.0 0.5
dcRT-MLPA Taqman real-time PCR
1:0.002
1:0.004
1:0.008
1:0.032
1:0.063
PBMC dilutions (Unstimulated:PHA)
RAB33A
IFNG
10
100
10
1:0.002
1:0.004
1:0.008
1:0.016
1:0.032
1:0.063
1:0.125
1:0.25
1:0.5
1:1
1 0:1
1:0.002
1:0.004
1:0.008
1:0.016
1:0.032
1:0.063
1:0.125
1:0.25
1:0.5
1:1
0:1
1
Fold change
Fold change
1:0.125
1:0.25
1:0.5
0.1
PBMC dilutions (Unstimulated:PHA)
0.1
CD3/28
PHA
-
CD3/28 0:1
1
1:1
Fold change 1:0.002
1:0.004
1:0.008
1:0.016
1:0.032
1:0.063
1:0.25
1:0.5
1:1
0:1
1:0.125
10 1
TNF
10
100
0.1
PHA
-
IL2
1000 Fold change
CD3/28
-
PHA
0.0
1:0.016
Fold change
CD3/28
-
0.1
PHA
1
dcRT-MLPA Taqman real-time PCR
0.1 PBMC dilutions (Unstimulated:PHA)
PBMC dilutions (Unstimulated:PHA)
Figure 2 Comparison between dcRT–MLPA and Taqman real-time PCR. PBMCs from healthy Dutch donors were stimulated for 6 h with (a) phytohemagglutinin and anti-CD3/CD28 beads or (b) phytohemagglutinin and subsequently mixed with unstimulated PBMCs at indicated ratios before RNA isolation was performed followed by dcRT–MLPA (black bars) and Taqman real-time PCR (white bars) analyses on the same samples. Expression profiles were determined of those genes present within the validation probe set. Mean gene expression levels were normalized to GAPDH and fold induction was calculated relative to the unstimulated control samples±s.d. of triplicate reactions. Data shown correspond to one representative experiment out of four performed and one representative healthy donor out of five healthy donors analyzed.
potential useful biomarker in Dutch adults of European descent and excluding B2M as a suitable reference gene in European cohort studies. The relative variation introduced by each component can have an insignificant as well as a major impact on the overall variation, depending on the range of the overall variation. Therefore, we recalculated the variance component estimation and displayed the data as fold-change that each gene contributed to the variation of each component (Figure 3c). The large majority of the genes within the complete probe set contributed only to a minor extent to the variation introduced by each of the four components. Furthermore, as expected, the predominant factor introGenes and Immunity
ducing variation to the data was the donor component, whereas the components collection, tube and assay introduced comparable but significantly less variation to the data, underscoring the reproducibility of dcRT– MLPA. Identification and monitoring of biomarker signatures To investigate the potential of dcRT–MLPA in identifying host–biomarker profiles in direct ex vivo whole-blood samples in a clinically relevant setting, we started out to characterize the human immune response to infection with Mtb, with particular emphasis on the expression of genes associated with TB disease. Gene expression
Relative expression experiment 2 (log2)
Biomarker identification using a novel dcRT–MLPA assay SA Joosten et al
75 19 17 15 13 11
R2=0.998 R2=0.998
9 7 9
7
11
13
15
19
17
Relative expression experiment 1 (log2) CD8A
Relative gene expression levels (log2)
CD4 14
14
13
13
12
12
11
11
15 14
10
10 1 2 3 4 5 6 7 8 9 101112 Donor Donor : Collection : Tube : dcRT-MLPA :
27% 22% 30% 21%
B2M 16
13 1 2 3 4 5 6 7 8 9 101112 Donor Donor : 59% Collection : 2% Tube : 13% dcRT-MLPA : 26%
1 2 3 4 5 6 7 8 9 101112 Donor Donor : Collection : Tube : dcRT-MLPA :
27% 47% 15% 11%
Month 0 Month 2
Counts (number of genes)
45 Donor
40
Collection
35
Tube
30
dcRT-MLPA
25 20 15 10 5 0 1-1.1
1.1-1.2 1.2-1.3 1.3-1.4 1.4-1.5 1.5-1.6 1.6-1.7 1.7-1.8 1.8-1.9 1.9-2.0 Fold change
Figure 3 Reproducibility of the dcRT–MLPA assay. Two independent dcRT–MLPA assays were performed on whole-blood RNA extracted from five PAXgene tubes collected from 12 Dutch healthy donors at two different time points. Variance component estimation was performed to calculate the contribution of the following four components to the variation in the gene expression profiles: (1) donor variation—the variation within a cohort of healthy individuals, (2) collection variation—the variation within the same healthy individual over time, (3) tube variation—the variation between the five separate PAXgene tubes collected from the same healthy donor, and (4) assay variation—the variation between temporally different RT–MLPA assays. (a) Plotted are the relative expression levels of genes (log2 transformed) in two representative independent pairs of samples showing highly concordant data for both weak and strong signals. (b) Direct ex vivo RNA expression levels (bars represent median peak areas normalized to GAPDH and log2 transformed±s.d. of five separate PAXgene tubes) of the indicated genes are displayed for all donors at both time points (top panel). Variance component estimation, calculating the proportional contribution of each of the four components to the overall variation (set at 100%), is shown below each corresponding bar diagram. (c) Calculation of the variance component estimation taking into account the range of the overall variation. Shown is the fold-change that each gene contributes to the variation of each component.
profiles of 79 TB patients at recruitment (TB cases), 83 Mtb-infected healthy controls (LTBIs) and 74 uninfected healthy controls from The Gambia were analyzed by dcRT–MLPA using the complete probe set (Supplementary Table S1). The complete probe set contained genes that have been associated with active TB disease or protection against disease in literature (IL4/IL4d2, CCL22,
CD163, TGFBR2),9–12 genes identified by microarray that were differentially expressed between PBMCs from active TB patients and healthy household contacts (FPR1, BPI, MARCO, SEC14L1, RAB24, RAB13, RAB33A, FCGR1A, LTF),13 genes identified by microarray that were differentially expressed between tissue located near and distant from tuberculomas (CCR7, CCL13, IL22RA1, Genes and Immunity
Biomarker identification using a novel dcRT–MLPA assay SA Joosten et al
76
IL10, CCL4, TNFRSF18, IL2RA), effector T-cell markers (IFNG, CXCL10) and reference genes (GAPDH, ABR, GUSB, B2M). Analysis of variance testing for global differences in gene expression profiles indicated significant differences between the study groups (P ¼ 5.4 1028), whereas pairwise testing of the obtained
SPP1, BLR1, CCL19, MMP9, TIMP2), apoptosis-related genes differentially regulated by Mtb (TNFRSF1A, TNFRSF1B, BCL2, CASP8, TNF, FASLG),14 genes that identify different lymphocyte subsets (CD3E, CD4, CD8A, CD14, CD19, NCAM1), regulatory T-cell-associated markers (FOXP3, IL7R, TGFB1, CTLA4, LAG3, IL7R
CD8A
BLR1
***
***
***
***
15
Relative gene expression levels (log2)
CD3E *** ***
15 14
14
***
14
***
* 12 11
12
13 13
10 10
12
9
11 12
8
8 CD19
FCGR1A
*** ***
10.5 10.0 9.5 9.0 8.5 8.0
***
***
13.5
CD163
MMP9
*** ***
11.5 11.0 10.5 10.0 9.5 9.0 8.5 8.0 7.5
11.5 9.5 7.5
12 11 10 9 8 TB cases LTBIs Uninfected controls
TB cases versus LTBIs
TB cases versus uninfected controls
LTBIs versus uninfected controls
Gene (Intercept)
Gene (Intercept)
Gene (Intercept)
Regressioncoefficients -46.61
LTF FOXP3 FCGR1A CD14 IL4δ2 IL2R BPI CD4 CD19 IL10 BLR1 CD8A IL7R
-63.36
IL2R CD14 FCGR1A TGFB1 CTLA4 CCR7 FOXP3 CD8A CXCL10 CD4 CD19 NCAM1 BLR1 IL7R CD3E
-0.31 -0.18 -0.13 -0.09 -0.09 -0.07 0.10 0.21 0.28 0.45 0.81 0.95 1.91
TB cases vs LTBIs
Sensitivity
Regressioncoefficients
-0.18 -0.10 -0.09 0.01 0.03 0.04 0.14 0.23 0.30 0.37 0.37 0.40 0.81 1.30 1.56
TNF IL2R FOXP3
TB cases vs uninfected controls 1.0
1.0
0.8
0.8
0.8
0.6
0.6
0.6
0.4
0.4
0.4
0.2
0.2
0.2
AUC = 90.8% 0.0
0.2
0.4
0.6
0.8
1.0
AUC = 86.0%
0.0 0.0
0.2
0.4
0.6
0.8
-0.15 -0.13 0.16
LTBIs vs healthy controls
1.0
0.0
Regressioncoefficients 0.90
1.0
AUC = 53.1%
0.0 0.0
0.2
0.4
0.6
0.8
1.0
1-Specificity
Predicted probability
TB cases vs LTBIs
Genes and Immunity
TB cases vs uninfected controls
1.0
1.0
0.8
0.8
0.6
0.6
0.4
0.4
0.2
0.2
0.0
0.0
LTBI vs uninfected controls
TB cases LTBIs Uninfected controls
Biomarker identification using a novel dcRT–MLPA assay SA Joosten et al
data set revealed profound differences in the overall gene expression profiles between TB cases versus LTBIs and TB cases versus uninfected healthy controls (Pp1016 and Pp1014, respectively), whereas the difference between LTBIs and uninfected healthy controls was statistically significant but much less pronounced (P ¼ 6.7 105). Subsequently, individual biomarkers that were differentially expressed between the study groups were identified (Supplementary Figure S2, Supplementary Table S2a) and examples of gene expression profiles displaying strong association with TB disease (CD3E, IL7R, CD8A, BLR1, CD19, FCGR1A and MMP9) or as a comparison, no association at all (CD163) are shown in Figure 4a. To determine biomarker signatures with the highest discriminatory power, the data set was analyzed using the Lasso regression model, which is a shrinkage and selection method for linear regression.15 It minimizes the usual sum of squared errors, with a bound on the sum of the absolute values of the coefficients. The biomarker signatures that classified TB cases versus LTBIs, TB cases versus uninfected healthy controls and LTBIs versus uninfected healthy controls encompassed 13, 15 and 3 biomarkers, respectively (Figure 4b). The classifying capability of these biomarker signatures is displayed either as a receiver operating characteristic curve where the true positive rate (sensitivity) is plotted against the false positive rate (1specificity or 1true negative rate, Figure 4c) or as a box-and-whisker plot, indicating the predicted probability using the identified biomarker signatures as classifiers (Figure 4d). Clearly, although the biomarkers that are differentially expressed between LTBIs and uninfected healthy controls have limited classifying value (area under the curve of 53.1%) when combined into a biomarker signature, the biomarker signatures that are used to classify TB cases versus LTBIs or TB cases versus uninfected healthy controls have excellent classifying values (area under the curve of 90.8% and 86.0%, respectively). Adding more biomarkers to the biomarker signatures did not further improve the classifying capability of these signatures as a nonselective Ridge regression (including all genes comprising the complete dcRT–MLPA set) displayed area under the curve values for TB cases versus LTBIs or TB cases versus uninfected healthy controls of 91.8% and 84.5%, respectively, (data not shown). As these data indicate that the identified biomarker signature to classify TB cases versus LTBIs is composed of biomarkers that are strongly associated with TB disease, we next evaluated the ability of this biomarker signature to classify TB patients in the Gambia during treatment. As shown in Figure 5a and Supplementary Table S2b, the discriminative power of this biomarker signature is already apparent in TB patients treated for 2
months only, whereas at 4 months after therapy, biomarker profiles of treated TB patients have become indistinguishable from those of LTBIs. Examples of gene expression profiles (CD3E, IL7R, CD8A, BLR1, CD19, FCGR1A and MMP9) during the treatment of TB patients are shown in Figure 5b.
77
Comparison of biomarkers associated with TB disease between an African and South-American population To investigate whether the biomarkers associated with TB disease are specific for the Gambian (or African) population, gene expression profiles were determined on direct ex vivo whole-blood RNA samples in a small Paraguayan (South American) cohort consisting of TB cases, TB patients treated for 4 weeks and healthcare workers (latently infected, long-term exposed to TB). Despite the limited cohort size, a comparison of gene expression patterns between TB cases and healthcare workers already confirmed 16 of the 25 genes (for example, CD3E, IL7R, BLR1, CD19, FCGR1A, CXCL10, CD4, TNF, BCL2, CASP8 and CCL4) that were differentially expressed in TB cases in the Gambia (Figure 6), suggesting that the discriminatory power of the majority of the identified biomarkers is not restricted to the Gambia or even to Africa. In conclusion, dcRT–MLPA is a sensitive, robust and reproducible assay for biomarker detection and biomarker signature profiling. The first data demonstrate the power of dcRT–MLPA to identify biomarker signatures in direct ex vivo whole-blood samples with sufficient discriminating power when using small cohorts.
Discussion Using the novel dcRT–MLPA assay, we identified multiple genes that were differentially expressed with great statistical significance between TB cases, LTBIs and uninfected healthy controls (Supplementary Table S2). From the 25 genes (out of 45 tested) that were differentially regulated between TB cases and LTBIs, 20 genes also displayed distinct expression patterns between TB cases and uninfected healthy controls. In contrast, differences in direct ex vivo RNA expression levels between LTBIs and uninfected controls were less pronounced and only eight genes were found to be differentially expressed between these two study groups, including three regulatory T-cell markers (FOXP3, IL2RA and TGFB1). In concordance with these findings, Lasso regression analysis easily identified biomarker signatures with excellent classifying value between TB cases versus LTBIs and TB cases versus uninfected controls (area under the curves of 90.8% and 86.0%, respectively), but was unable to identify a biomarker signature
Figure 4 Identification of biomarker signatures. Dual-color RT–MLPA was performed on direct ex vivo RNA isolated from PAXgene tubes derived from TB cases, LTBIs and uninfected healthy controls from the Gambia. (a) Median gene expression levels (peak areas normalized to GAPDH and log2 transformed) of the indicated genes are shown as box-and-whisker plots (5–95 percentiles; dotted line represents assay cutoff). Significant differences between study groups were determined using Kruskal–Wallis H and Dunn’s multiple comparison tests. *0.01oPo0.05, **0.001oPo0.01 and ***Po0.001. (b) Composition of biomarker signatures identified by the Lasso test to classify TB cases versus LTBIs, TB cases versus uninfected healthy controls and LTBIs versus uninfected healthy controls. Numbers represent coefficients of logistic regression (log-odds ratios). (c) Receiver operator characteristics curves or (d) box-and-whisker plots (5–95 percentiles) showing the accuracy of identified biomarker signatures to discriminate between TB cases, LTBIs and uninfected healthy controls. AUC, area under the curve. Genes and Immunity
Biomarker identification using a novel dcRT–MLPA assay SA Joosten et al
78 1.0
Predicted probability
0.8
TB cases TB cases treated - 2 months TB cases treated - 4 months TB cases treated - 6 months Uninfected controls
0.6
0.4
0.2
0.0
CD3E
IL7R
***
***
*** 15
15 14 13 12 11 10 9 8
14
Relative gene expression levels (log2)
13 12 11 12 CD19 *** ***
10
*
7
13
***
11 10 9 8 7
9.5 7.5 TB cases
*** ***
14
12
13
11
12
10
11
9
10
8
9
7
8
TB cases treated - 2 months TB cases treated - 6 months
CD163
MMP9 *** *** 13
11.5
***
12
*** 13.5
9 8
***
FCGR1A *** ***
*** 11
***
***
***
14 13
*** ***
***
***
***
BLR1
***
***
*** 15
CD8A
TB cases treated - 4 months Uninfected controls
Figure 5 Monitoring of biomarker signatures. Dual-color RT–MLPA was performed on direct ex vivo RNA isolated from PAXgene tubes derived from TB cases, LTBIs and TB patients treated for 2, 4 or 6 months from the Gambia. (a) Box-and-whisker plots (5–95 percentiles) showing the accuracy of the biomarker signature that discriminates between TB cases and LTBIs, to classify treated TB patients. (b) Median gene expression levels (peak areas normalized to GAPDH and log2 transformed) of the indicated genes are shown as box-and-whisker plots (5–95 percentiles; dotted line represents assay cutoff). Significant differences between study groups were determined using Kruskal–Wallis H and Dunn’s multiple comparison tests. *0.01oPo0.05, **0.001oPo0.01 and ***Po0.001.
with sufficient discriminative power between LTBIs and uninfected controls. The classifying value of the identified biomarker signatures may be further enhanced in future studies by selecting other markers for dcRT– MLPA. Interesting candidate markers to include may be derived from recent microarray studies, highlighting a potential role for type I interferon-ab signaling pathways.16 Certainly, improving the classifying value of the biomarker signature that classifies the infection status in healthy controls may not sufficiently benefit from this approach as the differences in gene expression levels between LTBIs and uninfected controls may be too subtle to detect using ex vivo whole blood. To circumvent this issue, antigen-specific stimulation could be applied to enhance differences and thereby allow such discrimination. Another advantage of incorporating antigen-specific stimulation is that LTBIs are a very heterogeneous group that may have been exposed recently to other pathogens endemic to the Gambia, and therefore direct ex vivo profiles may not necessarily directly relate to TB Genes and Immunity
disease outcome. Moreover, antigen-specific stimulation studies combined with direct ex vivo RNA profiling of patients (co)infected with other pathogens will help to determine the specificity of the identified biomarker signature for TB disease. Genes encompassing the complete dcRT–MLPA set were derived primarily from literature and previous microarray studies.9–14,17,18 Several biomarkers identified earlier to discriminate between TB cases and LTBIs in German and South African cohorts (for example, FCGR1A (CD64), LTF and RAB33A),13,19 were confirmed in the current study cohort from the Gambia. Intriguingly, FCGR1A is the only biomarker from our signature that was also identified by a recent large-scale microarray analysis in UK and South-African TB patients,16 discriminating active TB disease from LTBI, despite being not specific for TB when compared with other inflammatory diseases. Identification of FCGR1A in all these studies emphasizes the potential significance of FCGR1A as a TB biomarker and warrants further
Biomarker identification using a novel dcRT–MLPA assay SA Joosten et al CD3E
IL7R
Relative gene expression levels (log2)
***
CD8A
***
16
16
15
15
*** **
14
14
14
13
13
12 10
13
12
11
9
12
11
10
8
CD19
FCGR1A
*** 11 10 9 8 7
*
11
12
13
79
BLR1
CD163
MMP9
*** **
13 12 11 10 9 8 7 TB cases
**
13
11.5
12
11.0
11
10.5
10
10.0
9
9.5
TB cases treated
Health care workers
Figure 6 Identification of biomarkers associated with TB disease in a small Paraguayan cohort. dcRT–MLPA was performed on direct ex vivo RNA isolated from PAXgene tubes derived from TB patients, TB patients treated for 4 weeks and latently infected healthcare workers. Median gene expression levels (peak areas normalized to GAPDH and log2 transformed) of the indicated genes are shown as box-andwhisker plots (5–95 percentiles; dotted line represents assay cutoff). Significant differences between study groups were determined using Kruskal–Wallis H and Dunn’s multiple comparison tests. *0.01oPo0.05, **0.001oPo0.01 and ***Po0.001.
analysis of the role of FCGR1A and antibodies in TB pathogenesis. The suggestion that humoral immunity could have a more important role in TB pathogenesis than is currently anticipated,20,21 is also supported by the identification of multiple B-cell-related genes that are differentially expressed between TB patients and LTBIs (in both our Gambian and Paraguayan cohort), including besides FCGR1A, CD19 and BLR1 (CXCR5). In addition to a strong correlation between B-cellrelated genes and TB disease, a clear association between T-cell lineage markers and TB disease could also be observed. Expression levels of CD3E, CD4, CD8A, IL7R and FOXP3 were significantly distinct between TB cases and LTBIs, and with the exception of FOXP3, genes were more abundantly expressed in LTBIs compared with TB cases. The lower FOXP3 gene expression levels in recently infected individuals that were previously described by Burl et al.22 and now confirmed in the present study, may represent a shifted balance between effector and regulatory T-cell populations in early stages after infection or mirror increased regulatory T-cell migration to the tissue, or local lymph nodes, in early phases of disease,23 as has also been suggested during acute viral infections.24 The decreased expression of T-cell subset markers CD3E, CD4 and CD8A during acute TB disease has neither been reported before, nor have significant decreases in T-cell numbers. However, although most prominent in our cohort from the Gambia, also TB patients from Paraguay displayed significantly reduced expression levels for CD3E and CD4. The value of these markers as novel TB biomarkers will have to be confirmed in future large-scale cohorts. Recently, a set of apoptosis-related genes was identified in an Ethiopian cohort whose expression was associated with TB disease.14 Indeed, four out of six genes included in the complete dcRT–MLPA set (TNRFSF1A, CASP8, BCL2 and TNF) were confirmed to
be associated with TB disease in our Gambian cohort, supporting their value as TB biomarkers. Although the nature of the biomarkers identified in this study is quite diverse, they all appear promising and contribute to the biomarker signature discriminating between TB cases and LTBIs. During treatment of TB cases, gene expression profiles changed over time and at the end of treatment (6 months) were similar to gene expression profiles observed in LTBIs. This indicates that TB treatment does affect gene expression profiles in peripheral blood, but time is needed to achieve levels observed in healthy infected contacts. Interestingly, the changes in expression levels of different genes during treatment showed distinct kinetics. Several genes returned to the expression levels observed in LTBIs within 2 months (for example, CD3E and BLR1), whereas others reached control levels only after the full 6 months of treatment (for example, FCGR1A). The time needed for particular markers to normalize to levels detected in LTBIs may indicate response to treatment, but more detailed analyses are required to support this hypothesis. As demonstrated by the identification of powerful biomarker signatures discussed above, dcRT–MLPA proved to be a valuable tool for monitoring gene expression profiles in large cohorts. The assay was adapted from an assay described by Eldering et al.7 into a dual-color assay8 to increase the number of transcripts to be analyzed, and using synthetic oligonucleotides to greatly accelerate the probe manufacturing process. We validated the assay extensively and showed that the assay is sensitive, reproducible and robust, with results comparable to Taqman real-time PCR. Variation primarily resulted from donor to donor variation and was not attributable to inherent assay variation. dcRT–MLPA is positioned in between microarray and real-time PCR, as it will give quantitative expression data of multiple Genes and Immunity
Biomarker identification using a novel dcRT–MLPA assay SA Joosten et al
80
genes. dcRT–MLPA is cheap compared with microarray and real-time PCR, requires less handling time and only low amounts of RNA, and can be performed in a highthroughput (96-well) format. Importantly, technology transfer to resource-poor settings is possible, making dcRT–MLPA an interesting method for identification and monitoring of biomarkers in large cohorts in TB endemic regions. Application of dcRT–MLPA is not limited to vaccination studies or infectious diseases, but any (disease) stage associated with specific changes in marker expression can be assessed using dcRT–MLPA. During TB infection, dcRT–MLPA may be useful in follow-up analysis of LTBIs to identify those LTBIs at risk of progressing to active TB and in need of preventive treatment after exposure and in vaccination studies using novel candidate vaccines potentially to shorten follow-up times required to demonstrate efficacy. Moreover, techniques may be adapted to measure biomarker profiles in other biological fluids, for example, sputum. The longevity of biomarkers after infection or vaccination is unknown and will have to be determined to estimate the predictive value of these biomarkers, especially in case–contact settings and in long-term follow-up vaccination studies. However, similar profiles may be re-expressed in PBMCs after short-term recall antigen triggering, especially for genes expressed by memory T cells, and is currently explored. In conclusion, we have demonstrated that dcRT– MLPA is highly suitable for biomarker evaluation in large cohorts of contacts or vaccine, as the technique is robust, reproducible, sensitive and quantitative. Costs per gene and the 96-well format of the assay make it extremely suitable for screening large groups of individuals even in less developed settings. Initial analysis using a 45-gene dcRT–MLPA set allowed excellent discrimination between TB cases versus LTBIs and TB cases versus uninfected controls, illustrating the power of biomarker monitoring using dcRT–MLPA. Biomarkers that allowed discrimination of infection and disease included CD3E, CD4, CD8, BLR1, FCGR1A and CD19, suggesting important contributions of adaptive immunity to TB pathogenesis.
Materials and methods Ethics statement The research was approved by the Institutional Review Boards of the Leiden University Medical Center (the Netherlands), MRC (The Gambia) and the Ministry of Public Health and Social Welfare (Paraguay). Informed consent was obtained from all participants, and the clinical investigation was conducted according to the principles expressed in the ‘Declaration of Helsinki’. Donors and reagents PBMCs purified from anonymous buffy coats were collected from healthy blood bank donors (Dutch, adults) that all had signed informed consent. PBMCs (2 106 per 24-well) were stimulated for 6 h with phytohemagglutinin (2 mg ml1, Remel, Oxoid, Haarlem, the Netherlands) or anti-CD3/CD28 beads (Dynal Biotech, Hamburg, Germany) in Iscove’s modified Dulbecco’s medium (Invitrogen, Breda, the Netherlands) containing 10% Genes and Immunity
pooled human serum. Subsequently, cells were pelleted (5 min, 1200 r.p.m.), resuspended in 1 ml Trizol reagent (Invitrogen) and total RNA was purified according to the manufacturer’s protocol. Whole-blood samples were collected in PAXgene tubes (BD Biosciences) from a Dutch cohort (12 healthy subjects), a Gambian cohort (79 TB patients at recruitment (TB cases), 83 Mtb-infected healthy controls (LTBI), 74 uninfected healthy controls, 23 TB patients treated for 2 months, 12 TB patients treated for 4 months and 29 TB patients treated for 6 months), and a Paraguayan cohort (8 TB patients at recruitment, 12 TB patients treated for 4 weeks and 21 latently infected healthcare workers). PAXgene whole-blood RNA isolation Total RNA from venipuncture PAXgene blood collection tubes (stored at 80 1C, Supplementary Figure S3) was extracted and purified using the PAXgene Blood RNA kit (BD Biosciences) including on-column DNase digestion according to the manufacturer’s protocol. The RNA yield from 2.5 ml of whole blood was determined by a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA) and ranged from 4.2 to 8.5 mg of total RNA (average 6.02±1.5 mg) with an average OD260/280 ratio of 2.0±0.04. To assess the quality and integrity of the RNA, samples were run on an Agilent 2100 BioAnalyzer (Agilent Technologies, Amstelveen, The Netherlands) using the RNA 6000 Nano Chip kit. The average RNA integrity number of the total RNA samples obtained from Paxgene tubes was 9.5±0.08. dcRT–MLPA assay, data analysis and quality control Half-probes consisted of chemically synthesized oligonucleotides (Sigma-Aldrich Chemie B.V., Zwijndrecht, the Netherlands) and right-hand half-probes were 50 phosphorylated to facilitate ligation. For each targetspecific sequence, a specific RT primer was designed that is complementary to the RNA sequence and located immediately downstream of the probe target sequence. As a positive control, chemically synthesized oligonucleotides were used that were complementary to the RNA sequence and encompassed the combined targetspecific sequences of the left and right hand half-probes. To avoid detection of contaminating DNA fragments, all target sequences have an exon boundary near the probe ligation site. Moreover, splice variants and singlenucleotide polymorphisms present in the mRNA were taken into account. All half-probes and RT primers used in this study are described in detail in Supplementary Table S1. Development of the dcRT–MLPA required multiple adaptations to the protocol described by Eldering et al.7 RNA samples (2.5 ml of a 50 ng ml1 solution) were mixed with 1 MMLV reverse transcriptase buffer, dNTPs (0.4 mM of each nucleotide) and 80 nM of the targetspecific RT primers in a final volume of 4.5 ml. After heating for 1 min to 80 1C and incubation for 5 min at 45 1C, 30 U (1.5 ml) of MMLV reverse transcriptase (Promega, Leiden, the Netherlands) was added and incubated for 15 min at 37 1C before heat inactivation of the enzyme for 2 min at 98 1C. Subsequently, 1.5 ml of half-probe mix (4 nM) and 1.5 ml of SALSA MLPA buffer (MRC-Holland, Amsterdam, the Netherlands) were added to the reaction, heat denatured for 1 min at 95 1C
Biomarker identification using a novel dcRT–MLPA assay SA Joosten et al
followed by hybridization for 16 h at 60 1C. Ligation of the annealed half-probes was performed for 15 min at 54 1C by adding 3 ml ligase-65 buffer A, 3 ml ligase-65 buffer B, 25 ml H2O and 1 ml of ligase-65 (MRC-Holland). Heat inactivation of the ligase enzyme was subsequently performed for 5 min at 98 1C. Ligation products were amplified in a 20-ml PCR reaction containing 5 ml ligation product, 2 ml SALSA PCR buffer, 1 ml SALSA enzyme dilution buffer, 1 ml SALSA FAM-labeled MLPA primers, 1 ml HEX-labeled MAPH primers (2 mM each, forward primer 50 -GGCCGCGGGAATTCGATT-30 , reverse primer 50 -GCCGCGAATTCAC TAGTG-30 ), 14.75 ml H2O and 0.25 ml SALSA polymerase (MRC-Holland). Thermal cycling conditions encompassed are as follows: 33 cycles of 30 s at 95 1C, 30 s at 58 1C and 60 s at 72 1C, followed by 1 cycle of 20 min at 72 1C. PCR amplification products were 1:10 diluted in HiDi formamide-containing 400HD ROX size standard and analyzed on an Applied Biosystems 3730 capillary sequencer in GeneScan mode (Applied Biosystems). Trace data were analyzed using the GeneMapper software package (Applied Biosystems). The areas of each assigned peak (in arbitrary units) were exported for further analysis in Microsoft Excel spreadsheet software. Signals below the threshold value for noise cutoff in GeneMapper (log2 transformed peak area p7.64) were assigned the threshold value for noise cutoff. Results from target genes were calculated relative to the average signal of one of four reference genes present within the gene set (ABR, GUSB, GAPDH or B2M). Subsequently, the percentage s.d. was calculated to determine which reference gene was most stably expressed across the evaluated samples. Following normalization of the data, signals below the threshold value for noise cutoff (peak area p7.64) were again assigned the threshold value for noise cutoff. To monitor assay performance, a negative control (without RNA), a positive control (using synthetic template oligonucleotides as hybridization templates) and a commercial Human Universal Reference RNA (Clontech, Palo Alto, CA, USA) were included on each 96-well plate. To determine which data points should be excluded from the analysis, the total signal intensity of all four reference genes (GAPDH, B2M, ABR and GUSB; HEX- and FAM-labeled genes were analyzed separately) was calculated for each sample followed by calculating the average±3 s.d. of these genes over the total 96-well plate using the following formula: (intensity (B2M þ ABR þ GUSB þ GAPDH) sample) o (((intensity (B2M þ ABR þ GUSB þ GAPDH) sample 1x)/ samples) ±3 s.d.’s of (intensity (B2M þ ABR þ GUSB þ GAPDH) sample 1x))). Statistical analysis To evaluate the reproducibility of the dcRT–MLPA assay, an orthogonal experimental design was created. The statistical model used is a random effects model with nested factors as described by Searle et al.25 In this model, the ‘donor’ represents the highest factor (that is, source of variability within a population), ‘collection’ (that is, variability between time points within the same subject) is modeled as a nested factor within the donor, ‘tube’ (that is, variability between samples of a subject at a certain time point) is nested within the collection and ‘dcRT–MLPA’ (i.e., variability between assays/assay
reproducibility) is the last nested factor. This latter factor is also representative of the reliability of the dcRT–MLPA measurement. The nested model was subsequently fitted on the trace data collected for each target gene. Before performing the statistical analysis, trace data were analyzed and processed as described above (dcRT– MLPA data analysis) followed by log2 transformation of the data values. To test for differences between expression profiles of TB cases, LTBIs and uninfected healthy controls, we used the Global Test (PMID 14693814), testing for pairwise differences between the groups. Next, for each of the three pairwise comparisons, the genes were clustered in a hierarchical clustering, based on average linkage and absolute correlation distance. All single genes, as well as all sets of genes implied in the clustering, were tested for differential expression profiles using the Global Test. Correction for multiple testing was performed separately for each of the three clustering graphs using the method of Meinshausen.26,27 These tests and multiple-testing procedures were performed using the package Global Test (version 5.3.3) in R (version 2.10.1; Supplementary Figure S2). To test for significant differences of single genes between study groups, Kruskal–Wallis H and Dunn’s multiple comparison tests were applied (Figures 4–6). For the Lasso regression analysis, data were randomly split into a training set of 157 subjects and an independent test set of 79 subjects. On the training set, a separate Lasso logistic regression model was fitted for each pairwise comparison between the three groups of TB cases, LTBIs and uninfected healthy controls. The tuning parameter l that determines the amount of shrinkage in the Lasso model was chosen in each of the fitted models by optimizing the cross-validated log likelihood in the training set. Predicted values for the test set were found by applying the fitted model to the test data, and these predicted values (test set only) were used to generate receiver operating characteristic curves. For comparison, a logistic ridge regression model was fitted on the same training and test set, and using the same criterion for selecting the tuning parameter l. All the calculations for Lasso and ridge regression were performed using the package penalized (PMID: 19937997, version 0.9–32) in R (version 2.10.1). The R script used to perform the Lasso regression analysis is depicted in Supplementary Figure S4.
81
Taqman real-time PCR First-strand cDNA was synthesized using 400 ng of RNA and oligo(dT) primers in a 10-ml reverse transcriptase reaction mixture using SuperScript III First-Strand Synthesis System (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s protocol. Quantitative real-time PCR was performed on a 7900HT Fast RealTime PCR System (PE Applied Biosystems, Norwalk, CT, USA) and was conducted in a total volume of 25 ml containing 900 nM forward and reverse primer, 250 nM FAM dye-labeled Taqman MGB probe and 10 ng of cDNA in a 1 TaqMan Universal PCR Master Mix, No AmpErase UNG, including a passive reference (ROX) fluorochrome. Optimized thermal cycling conditions encompassed are as follows: 1 cycle of 10 min at 95 1C, followed by 50 cycles of 15 s at 95 1C and 1 min at 60 1C. Relative gene expression was calculated using the DCt Genes and Immunity
Biomarker identification using a novel dcRT–MLPA assay SA Joosten et al
82
method and normalized to the expression of GAPDH. All assays were performed in triplicate and the data are presented as the mean±s.d. The inventoried TaqMan Gene Expression Assays were purchased from Applied Biosystems and included: IFNG (Hs00989291_m1), TNF (Hs00174128_m1), IL2 (Hs00174114_m1), CD8A (Hs00233520_m1), RAB33A (Hs00191243_m1), CD14 (Hs00169122_g1), GUSB (Hs00939627_m1) and GAPDH (Hs03929097_g1).
Conflict of interest The authors declare no conflict of interest.
Acknowledgements We thank Dr SJ White for his technical support in designing a dual-color RT–MLPA, Dr EMS Leyten and Dr M van Westreenen for their contribution to the Paraguay cohort design and sample collection, and Dr A Geluk and Dr T van Hall for critically reviewing the manuscript. We gratefully acknowledge all the funding that made the work possible. We especially acknowledge the Bill and Melinda Gates Foundation (Grand Challenges in Global Health GC6#74), 6th framework programme TBVAC contract no. LSHP-CT-2003-503367, 7th framework programme NEWTBVAC contract no. HEALTH-F3-2009-241745 (the text represents the authors’ views and does not necessarily represent a position of the Commission that will not be liable for the use made of such information), The Netherlands Organization for Scientific Research (VENI grant 916.86.115) and the Gisela Thier foundation of the Leiden University Medical Center. The funders had no role in study design, data collection and analysis, decision to publish or in preparation of the manuscript.
References 1 Atkinson AJ, Colburn WA, DeGruttola VG, DeMets DL, Downing GJ, Hoth DF et al. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin Pharmacol Ther 2001; 69: 89–95. 2 Bowtell DD. Options available—from start to finish—for obtaining expression data by microarray. Nat Genet 1999; 21: 25–32. 3 Lonnroth K, Castro KG, Chakaya JM, Chauhan LS, Floyd K, Glaziou P et al. Tuberculosis control and elimination 2010–50: cure, care, and social development. Lancet 2010; 375: 1814–1829. 4 World Health Organization. Global tuberculosis control epidemiology, strategy, financing. WHO Report, WHO, Geneva 2009. 5 Kaufmann SH. Future vaccination strategies against tuberculosis: thinking outside the box. Immunity 2010; 33: 567–577. 6 Schouten JP, McElgunn CJ, Waaijer R, Zwijnenburg D, Diepvens F, Pals G. Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification. Nucleic Acids Res 2002; 30: e57. 7 Eldering E, Spek CA, Aberson HL, Grummels A, Derks IA, de Vos AF et al. Expression profiling via novel multiplex assay allows rapid assessment of gene regulation in defined signalling pathways. Nucleic Acids Res 2003; 31: e153.
8 White SJ, Vink GR, Kriek M, Wuyts W, Schouten J, Bakker B et al. Two-color multiplex ligation-dependent probe amplification: detecting genomic rearrangements in hereditary multiple exostoses. Hum Mutat 2004; 24: 86–92. 9 Bonecini-Almeida MG, Ho JL, Boechat N, Huard RC, Chitale S, Doo H et al. Down-modulation of lung immune responses by interleukin-10 and transforming growth factor beta (TGF-beta) and analysis of TGF-beta receptors I and II in active tuberculosis. Infect Immun 2004; 72: 2628–2634. 10 Demissie A, Abebe M, Aseffa A, Rook G, Fletcher H, Zumla A et al. Healthy individuals that control a latent infection with Mycobacterium tuberculosis express high levels of Th1 cytokines and the IL-4 antagonist IL-4delta2. J Immunol 2004; 172: 6938–6943. 11 Knudsen TB, Gustafson P, Kronborg G, Kristiansen TB, Moestrup SK, Nielsen JO et al. Predictive value of soluble haemoglobin scavenger receptor CD163 serum levels for survival in verified tuberculosis patients. Clin Microbiol Infect 2005; 11: 730–735. 12 Lienhardt C, Azzurri A, Amedei A, Fielding K, Sillah J, Sow OY et al. Active tuberculosis in Africa is associated with reduced Th1 and increased Th2 activity in vivo. Eur J Immunol 2002; 32: 1605–1613. 13 Jacobsen M, Repsilber D, Gutschmidt A, Neher A, Feldmann K, Mollenkopf HJ et al. Candidate biomarkers for discrimination between infection and disease caused by Mycobacterium tuberculosis. J Mol Med 2007; 85: 613–621. 14 Abebe M, Doherty TM, Wassie L, Aseffa A, Bobosha K, Demissie A et al. Expression of apoptosis-related genes in an Ethiopian cohort study correlates with tuberculosis clinical status. Eur J Immunol 2010; 40: 291–301. 15 Tibshirani R. Regression shrinkage and selection via the lasso. J Royal Statist Soc B 1996; 58: 267–288. 16 Berry MP, Graham CM, McNab FW, Xu Z, Bloch SA, Oni T et al. An interferon-inducible neutrophil-driven blood transcriptional signature in human tuberculosis. Nature 2010; 466: 973–977. 17 Jenner RG, Young RA. Insights into host responses against pathogens from transcriptional profiling. Nat Rev Microbiol 2005; 3: 281–294. 18 Mistry R, Cliff JM, Clayton CL, Beyers N, Mohamed YS, Wilson PA et al. Gene-expression patterns in whole blood identify subjects at risk for recurrent tuberculosis. J Infect Dis 2007; 195: 357–365. 19 Maertzdorf J, Repsilber D, Parida SK, Stanley K, Roberts T, Black G et al. Human gene expression profiles of susceptibility and resistance in tuberculosis. Genes Immun 2011; 12: 15–22. 20 Abebe F, Bjune G. The protective role of antibody responses during Mycobacterium tuberculosis infection. Clin Exp Immunol 2009; 157: 235–243. 21 Cooper AM. Cell-mediated immune responses in tuberculosis. Annu Rev Immunol 2009; 27: 393–422. 22 Burl S, Hill PC, Jeffries DJ, Holland MJ, Fox A, Lugos MD et al. FOXP3 gene expression in a tuberculosis case contact study. Clin Exp Immunol 2007; 149: 117–122. 23 Shafiani S, Tucker-Heard G, Kariyone A, Takatsu K, Urdahl KB. Pathogen-specific regulatory T cells delay the arrival of effector T cells in the lung during early tuberculosis. J Exp Med 2010; 207: 1409–1420. 24 Lund JM, Hsing L, Pham TT, Rudensky AY. Coordination of early protective immunity to viral infection by regulatory T cells. Science 2008; 320: 1220–1224. 25 Searle SR, Casella GM, McCulloch CE. Variance components. Wiley-Interscience 1992. 26 Goeman JJ, Solari A. The sequential rejection principle of familywise error control. Ann Statist 2010; 38: 3782–3810. 27 Meinshausen N. Hierarchical testing of variable importance. Biometrika 2008; 95: 265–278.
Supplementary Information accompanies the paper on Genes and Immunity website (http://www.nature.com/gene)
Genes and Immunity