
Original Article

Detection and Monitoring of Normal and Leukemic Cell Populations with Hierarchical Clustering of Flow Cytometry Data

Karel Fišer,1,2* Tomáš Sieger,3 Angela Schumich,4 Brent Wood,5 Julie Irving,1 Ester Mejstříková,2 Michael N. Dworzak4

1 Northern Institute for Cancer Research, Newcastle University, Newcastle, United Kingdom

2 CLIP Childhood Leukaemia Investigation Prague, Department of Paediatric Haematology and Oncology, Charles University Prague, 2nd Medical School, Czech Republic

3 Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University, Prague, Czech Republic

4 Children’s Cancer Research Institute and St. Anna Children’s Hospital, Vienna, Austria

5 Department of Laboratory Medicine, 1959 NE Pacific St., Seattle, Washington 98195

Received 18 October 2010; Revision Received 5 September 2011; Accepted 11 September 2011

Additional Supporting Information may be found in the online version of this article.

Grant sponsor: JGW Patterson Foundation, UK; Grant number: BH070348; Grant sponsor: Czech Ministry of Education; Grant number: MSM0021620813;

Abstract

Flow cytometry is a valuable tool in research and diagnostics including minimal residual disease (MRD) monitoring of hematologic malignancies. However, its gradual advancement toward increasing numbers of fluorescent parameters leads to information-rich datasets, which are challenging to analyze by standard gating, an approach that does not reflect the multidimensionality of the data. We have developed a novel method to analyze complex flow cytometry data, based on hierarchical clustering analysis (HCA) but with a new underlying algorithm, using the Mahalanobis distance measure. HCA is scalable to analyze complex multiparameter datasets (here demonstrated on up to 12-color flow cytometry and on a 20-parameter synthetic dataset). We have validated this method by comparison with standard gating approaches performed independently by expert cytometrists. Acute lymphoblastic leukemia blast populations were analyzed in diagnostic and follow-up datasets (n = 123) from three centers. HCA results correlated very well (Passing–Bablok correlation coefficient = 0.992, slope = 1, intercept = −0.01) with standard gating data obtained by the I-BFM FLOW-MRD study group. To further improve the performance in follow-up samples with low MRD levels and to automate MRD detection, we combined HCA with support vector machine (SVM) learning. HCA in combination with SVM provides a novel diagnostic tool that not only allows analysis of increasingly complex flow cytometry data but also is less observer-dependent compared with classical gating and has potential for automation. © 2011 International Society for Advancement of Cytometry

Key terms hierarchical clustering; minimal residual disease; acute lymphoblastic leukemia; support vector machines; multiparameter flow cytometry

Cytometry Part A 81A: 25–34, 2012

*Correspondence to: Karel Fišer; CLIP Childhood Leukaemia Investigation Prague, Department of Paediatric Haematology and Oncology, Charles University Prague, 2nd Medical School, Czech Republic. Email: [email protected]

Published online 11 October 2011 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/cyto.a.21148 © 2011 International Society for Advancement of Cytometry

In childhood acute lymphoblastic leukemia (ALL), response to therapy as measured by minimal residual disease (MRD) monitoring is an important biomarker for predicting relapse and stratifying treatment (1–5). MRD can be assessed by molecular analysis of B- and T-cell receptor gene rearrangements or by flow cytometric analysis of aberrant immunophenotypes. Flow cytometric MRD monitoring is a fast and sensitive method and has been incorporated in several large childhood ALL clinical trials (1,6,7). However, flow cytometry generates increasingly large and information-rich datasets, which provide new challenges for analysis. Modern multilaser flow cytometers are able to simultaneously measure 12 or more parameters and acquire such information from millions of single cells (8,9). Traditional gating of populations on two-parameter plots is tedious (e.g., 28 plots in six-color flow cytometry, 66 plots for 10-color analysis, 91 plots for 12-color analysis, etc.) and does not reflect the multidimensionality of the data. Moreover, both the setting of the gates and interpretation of the results are observer-dependent and require intensive training and high levels of expertise. Therefore, new analytical tools are needed that are less observer-dependent, reflect the multidimensionality of the data, and enable automation. This would facilitate the more widespread use and applicability of flow cytometry for MRD monitoring within large international multicenter trials.

Alternative analytical methods have been tested for flow cytometry data (10–16), but most of them rely on prior knowledge of the number of clusters (cell populations) expected in the sample. These methods produce a limited number of clusters without information on their inner hierarchy (subpopulations). Although hierarchical clustering methods can place each cell in its hierarchical context within the analyzed dataset, their downside is that they often fail to reflect the elliptical shapes of flow cytometric populations. One suggested solution lies in the use of Mahalanobis distance measurement for computing the distance of clusters (17); however, this method was unable to cluster samples from single events, a feature necessary for the detection of small populations such as in monitoring MRD in patients with leukemia.

Therefore, we have developed a novel algorithm that retains the advantages of hierarchical clustering and uses Mahalanobis distance measurement, but which has the ability to cluster data from single events. With this approach, it is possible to present complex flow data in one figure while still allowing easy separation of subpopulations and their quantification. Here, we validate this novel analysis algorithm in 123 MRD datasets from 45 patients with ALL (including data from 38 patients from the I-BFM FLOW-MRD study group QC investigations, similar to recently published work) (18,19).
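The plot counts quoted in the introduction can be reproduced by counting parameter pairs. The short sketch below assumes that every event carries two light-scatter parameters (forward and side scatter) in addition to the fluorescence channels; the function name is illustrative only.

```python
from math import comb


def two_parameter_plots(n_colors: int, n_scatter: int = 2) -> int:
    """Number of distinct two-parameter plots for a panel.

    Assumes n_colors fluorescence parameters plus n_scatter
    light-scatter parameters (forward and side scatter) per event.
    """
    n_parameters = n_colors + n_scatter
    return comb(n_parameters, 2)  # unordered pairs of parameters


for colors in (6, 10, 12):
    print(colors, "colors:", two_parameter_plots(colors), "plots")
```

Under this assumption, six, 10, and 12 colors give 28, 66, and 91 plots, matching the figures in the text.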

DESIGN AND METHODS

Samples and Flow Cytometry

Samples from bone marrow or peripheral blood were collected, processed, and analyzed by flow cytometry using standard protocols (18,20). Samples were contributed by laboratories from their respective national reference centers. An example of a typical gating strategy is shown in Supporting Information Figure 1. All work was approved by local Ethics Committees, and informed consent was obtained from patients or patients’ parents or legal guardians according to the Declaration of Helsinki.

Data Preprocessing

We analyzed flow cytometry datasets that were acquired with FACSCalibur, CantoII, or LSRII cytometers (Becton Dickinson, San Jose, CA) or a Dako Cyan ADP (Beckman Coulter, Brea, CA). Data were exported from their respective operating software (Cell Quest Pro, DiVa, or Summit) into FCS files (version 2.0 or 3.0). Raw data were extracted from FCS files and imported into the MATLAB environment (MathWorks, Natick, MA), in which all subsequent steps were carried out. Data from the FACSCalibur were exported as compensated and log-transformed datasets. Data from the CantoII and Dako Cyan were exported as linear, noncompensated datasets. For compensation, the fluorochrome spill-over table exported from the operating software was transposed and inverted to form a compensation matrix, which was then applied to the data.

Linear data were processed by the HyperLog transform (21) to allow better visualization of the data distributions. A logarithmic scale display is very useful for medium to high values but does not correctly display low and negative values, for which a linear scale is more appropriate. The HyperLog transformation combines both scales. It is an inverse hybrid linear/exponential function that is defined over the real-numbered domain. It allows for a smooth transition from negative and low positive values on a linear scale to higher values on a logarithmic scale, resulting in a single continuous display. The resulting display of data is based on the number of decades to be displayed, the resolution of the cytometer (or of the analog-to-digital conversion), and the coefficient (here, the fifth percentile of negative values in each parameter) controlling the range of linearly displayed data. All datasets were normalized (z-score) before further analysis.

Hierarchical Clustering

Hierarchical clustering analysis (HCA) is an alternative method to traditional gating for identifying cells with similar characteristics. This method measures the similarity of cells based on the complete profile of all recorded parameters (i.e., both light scatter and all fluorescence channels) rather than on consecutive single- or dual-parameter comparisons.
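As an illustration of the compensation and normalization steps described under Data Preprocessing, the following minimal sketch uses only the Python standard library. It is not the authors' MATLAB code. It assumes that entry spillover[i][j] is the fraction of fluorochrome i's signal seen in detector j, and that the z-score uses the population standard deviation; the text states neither, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def compensate(events, spillover):
    """Apply inverse(transpose(spillover)) to every event (list of detector values)."""
    transposed = [list(col) for col in zip(*spillover)]
    comp = invert(transposed)
    return [[sum(c * v for c, v in zip(row, event)) for row in comp]
            for event in events]


def zscore_columns(events):
    """Scale each parameter (column) to zero mean and unit standard deviation.

    Assumes no parameter is constant (a zero standard deviation would divide by zero).
    """
    stats = [(mean(col), pstdev(col)) for col in zip(*events)]
    return [[(v - m) / s for v, (m, s) in zip(event, stats)] for event in events]


# Two fluorochromes with 20% and 10% spill-over into each other's detector.
spill = [[1.0, 0.2], [0.1, 1.0]]
print(compensate([[105.0, 70.0]], spill))  # approximately [[100.0, 50.0]]
```

The HyperLog transform is deliberately left out of this sketch because its parameters depend on the instrument and on reference (21).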
HCA builds a hierarchical tree (dendrogram) of cell populations by bottom-up merging of clusters of cells based on their similarities. This merging process starts from single cells and results in a single cluster consisting of all the cells. The internal structure of the resulting dendrogram reflects the merging process, which, in turn, reflects the hierarchy of populations in the input dataset. We developed a new adaptive linkage algorithm for HCA, called Mahalanobis-average linkage, which is especially suitable for flow cytometry data, which often contain cell populations of elongated multidimensional ellipsoid shapes (Supporting Information Fig. 2). The Mahalanobis-average linkage algorithm proved to be superior to other HCA metrics when tested on a synthetic dataset (Supporting Information Fig. 3). This translates into correct population recognition when compared with other HCA metrics (Supporting Information Fig. 4).

The linkage uses a scale-invariant Mahalanobis distance (22) to define the proximity of clusters. The Mahalanobis distance between an ellipsoid (fitted to a cell cluster) and a point (a single cell) is the Euclidean (ordinary) distance of the point from the center of the ellipsoid, compensated by the length of the ellipsoid in the direction from the center to the point. This means, for example, that all the points at the "surface" of the ellipsoid have the same Mahalanobis distance to the center of the ellipsoid. During the merging process, the distance of two clusters is computed as the mean of Mahalanobis distances between all data points from one cluster and the ellipsoid fitted to the other cluster, and vice versa. For single cells (or observation vectors) and clusters containing small numbers of cells, ellipsoids that would correctly fit (i.e., represent) such small clusters could not be computed, so Euclidean distance was used. Therefore, Mahalanobis-average linkage starts in a pure HCA fashion from single data points. When computing intercluster distance, the linkage smoothly shifts from Euclidean through weighted Euclidean/Mahalanobis to Mahalanobis distance measurement. The shift is controlled by a threshold above which only Mahalanobis distance is used. This threshold parameter is proportional to the dataset size. For 10^4 events, the threshold was set at 0.1% of all events. This means that a minimum of 10 events is used to define a fitted multidimensional ellipsoid needed for Mahalanobis distance measurements. The distance to clusters consisting of fewer observations is computed as the weighted mean of Mahalanobis and Euclidean distances: the fewer observations in the cluster, the weaker the contribution of the Mahalanobis distance.
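The distance computation described above can be sketched as follows. This is an illustrative reading of the description, not the authors' implementation. The text gives neither the weighting formula nor the way the two directions are averaged. The sketch therefore assumes a linear weight (cluster size divided by the threshold), falls back to pure Euclidean distance when a cluster has too few events to fit a full-rank ellipsoid, and pools both directions into one mean. All function names are hypothetical.

```python
from math import dist, sqrt


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def covariance(points):
    """Sample covariance matrix of a cluster (rows are events)."""
    mu = centroid(points)
    d, n = len(mu), len(points)
    return [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / (n - 1)
             for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis(point, cluster):
    """Mahalanobis distance from a point to the ellipsoid fitted to a cluster."""
    diff = [p - c for p, c in zip(point, centroid(cluster))]
    return sqrt(sum(d * s for d, s in zip(diff, solve(covariance(cluster), diff))))


def point_to_cluster(point, cluster, threshold):
    """Blend Euclidean and Mahalanobis distance by cluster size (assumed linear weight)."""
    euclid = dist(point, centroid(cluster))
    if len(cluster) <= len(point):  # too few events to fit an ellipsoid
        return euclid
    weight = min(1.0, len(cluster) / threshold)
    return weight * mahalanobis(point, cluster) + (1.0 - weight) * euclid


def linkage_distance(a, b, threshold=10):
    """Mean of point-to-ellipsoid distances from a to b and from b to a."""
    d_ab = [point_to_cluster(p, b, threshold) for p in a]
    d_ba = [point_to_cluster(p, a, threshold) for p in b]
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))


# Points on the "surface" of an elongated cluster share one Mahalanobis distance.
cluster = [(-2.0, 0.0), (2.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
print(mahalanobis((2.0, 0.0), cluster), mahalanobis((0.0, 1.0), cluster))
```

The default threshold of 10 corresponds to 0.1% of 10^4 events, as stated in the text. For two single-event clusters the function reduces to the ordinary Euclidean distance, which is how the linkage can start from single data points.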
We analyzed datasets with n = 10^4 events because storing the pairwise distances of the data points requires O(n^2) space in memory; it therefore becomes impractical (or even impossible) to analyze bigger datasets on current desktop computers. The use of Mahalanobis-average linkage allowed us not only to build the hierarchy from single events but also to retain the advantage of using Mahalanobis distance measurements to compute the distance between larger clusters. The resulting hierarchy of the cells is displayed as a dendrogram (hierarchical tree), accompanied by the dataset table. This table is in the form of a heatmap, where the individual parameter values are color-coded (e.g., blue, low expression; red, high expression). This display, which we term a dendroheatmap, allows visualization of the hierarchically clustered flow data with all parameters displayed in a single plot. Cell populations are then selected by the investigator as clusters (branches of the dendrogram) based on their inner compactness and distance to outer clusters. Alternatively, clusters can be selected automatically by cutting the dendrogram into clusters at an absolute intercluster distance threshold (which is either chosen or computed). These clusters can then be plotted on traditional scatter plots.

Supervised Learning

To automate detection of clusters of interest, for example, MRD in follow-up samples, a support vector machine (SVM) was used as a supervised learning method (using Spider, an SVM package for MATLAB) (23). SVM classifiers are trained on a known class in the training dataset (here: leukemic blast populations in diagnostic samples). The classifiers are then able to recognize the class of interest in test datasets (MRD populations in follow-up samples), as was also reported previously (24). In our study, classifiers were built based on leukemic populations identified as clusters by HCA. In 10-fold cross-validation, two kernel functions were used to train classifiers: a radial basis function kernel [sigma (0.25–8), C (5–100)] and a polynomial kernel [order (1–8), C (5–50)]. Cross-validation was performed on a training dataset, which was split 10 times to form subsets with the same representation of class-positive and class-negative events (balanced splits). On each subset of the training dataset, all kernel functions and their respective parameters were tested. The best classifier (and therefore kernel function and its parameters) was then chosen based on the lowest estimated rate of misclassification. Misclassification means erroneous classification, that is, situations where the classifier assigns an event to the wrong class, and the rate of misclassification is the frequency of such errors. Before testing new samples for the presence of the cluster of interest (i.e., presence of an MRD population in new follow-up samples), data were scaled to fit the parameter ranges of the training data (diagnostic dataset). Training of SVM classifiers was done on 10^4 data points. Testing for the presence of the cluster of interest was performed on 10^4 events (to match HCA, which is limited in the size of dataset that can be analyzed), as well as on whole datasets (5 × 10^4 to 5 × 10^5 events). Resulting class estimates (MRD values) were compared with both an independent HCA and the original MRD analyses, performed by specialists from the contributing flow cytometry centers.
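The balanced 10-fold splitting and the parameter search described under Supervised Learning can be sketched as below. The code does not train an SVM; it only shows how balanced folds and a candidate grid might be produced before handing them to an SVM library of choice. The grid points are hypothetical values inside the ranges given in the text, which reports only the ranges, and the function names are illustrative.

```python
import random
from collections import defaultdict
from itertools import product


def balanced_folds(labels, k=10, seed=0):
    """Split event indices into k folds with near-equal class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def parameter_grid(grid):
    """Enumerate every combination of candidate parameter values."""
    names = sorted(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]


# Hypothetical grid points within the reported RBF-kernel ranges.
rbf_grid = parameter_grid({"sigma": [0.25, 0.5, 1, 2, 4, 8], "C": [5, 10, 50, 100]})

labels = [1] * 30 + [0] * 70  # 1 = leukemic cluster from HCA, 0 = all other events
folds = balanced_folds(labels)
print(len(rbf_grid), "candidate settings;", len(folds), "folds")
```

Each candidate setting would then be scored by its mean misclassification rate over the folds, and the setting with the lowest rate kept.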
All scripts needed to reproduce the presented work (MATLAB M-files) are available from the authors on request.

Descriptive Statistics

As current cytometer operating software does not allow easy identification of events (even though in the FCS files they are clearly defined by the order in which they were recorded), we could not compare the assignments of individual events. We used overall percentages and checked the overall agreement of LAIPs (leukemia-associated immunophenotypes) where possible. For assessing agreement between different analytical methods, Passing and Bablok (PB) regression was used. It is described in the text by r (correlation coefficient), s (slope), and i (intercept) of the regression.
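For readers unfamiliar with Passing and Bablok regression, the sketch below follows the commonly described procedure: the slope is a shifted median of all pairwise slopes and the intercept is the median of the residual offsets. It is a minimal illustration rather than the authors' code. It omits confidence intervals and the linearity test, and it assumes the two methods are positively correlated.

```python
from statistics import median


def passing_bablok(x, y):
    """Return (slope, intercept) of a Passing-Bablok regression of y on x."""
    slopes = []
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # identical points carry no slope information
            if dx == 0:
                slopes.append(float("inf") if dy > 0 else float("-inf"))
            elif dy / dx != -1:  # slopes of exactly -1 are discarded
                slopes.append(dy / dx)
    slopes.sort()
    n = len(slopes)
    k = sum(1 for s in slopes if s < -1)  # offset that shifts the median
    if n % 2:
        slope = slopes[(n - 1) // 2 + k]
    else:
        slope = 0.5 * (slopes[n // 2 - 1 + k] + slopes[n // 2 + k])
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Illustrative values only: percentages from two hypothetical methods.
gating = [1.0, 2.0, 3.0, 4.0, 5.0]
clustering = [2.0 * v + 1.0 for v in gating]
print(passing_bablok(gating, clustering))  # (2.0, 1.0)
```

A slope near 1 and an intercept near 0 indicate that the two methods agree without proportional or constant bias.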

RESULTS

Hierarchical Clustering of Multidimensional Flow Cytometry Data

Three major populations are usually detected in peripheral blood and bone marrow based on forward (FSC) and side (SSC) light scattering characteristics: lymphoid, monocytoid, and granulocytic cells. As a first test of our method, nine normal bone marrow samples (i.e., samples from end of treatment or samples without disease involvement in bone marrow) were analyzed for the presence of these populations. HCA of six-color (eight-parameter) ungated flow cytometry data allowed us to identify these three populations as prominent clusters in all samples (e.g., Figs. 1A and 1B). Using color coding, the relative FSC and SSC values of all clusters (i.e., populations) can be visualized on the heatmap. Moreover, in all normal samples, we found clusters corresponding to candidate cell populations from B-cell development (Supporting Information Fig. 8).

Next, the shape of these clusters (populations) was analyzed on scatter plots. Clusters (populations) defined by HCA reflect similarity in all parameters and, therefore, form compact populations with near lognormal distributions. Hierarchical clustering using Mahalanobis-average distance measurements in the linkage process also proved able to define populations with elongated ellipsoidal shapes, a feature not easily obtainable by conventional HCA metrics but essential for defining biologically relevant populations.

One advantage of the multidimensionality of HCA is that it can visualize populations that are otherwise difficult to detect (hidden populations). This is exemplified in Figure 1. Although a B-cell antibody panel was applied for this analysis, a distinct CD38+ B-lin− population was identified that overlapped with other populations in all 28 two-parameter plots (and most likely represents T cells (25)) but is formed by cells with similar characteristics in all parameters.

Figure 1. HCA: main hematopoietic populations in normal BM. A: Dendrogram with heatmap: HCA of 10^4 ungated events acquired from normal BM. The heatmap shows relative levels of all eight parameters (columns) in all 10^4 events (rows) in color coding (blue, low expression; red, high expression). The dendrogram shows the hierarchy of cells based on their similarity in all parameters measured. The x-axis under the dendrogram represents similarity distance. Colored branches of the dendrogram are selected clusters, as displayed in B. B: The main populations (as identified by HCA) are displayed on a conventional forward versus side scatter dot plot. C: All two-parameter combination plots of a defined cluster (marked by the red rectangle in A). This population is negative for most of the antibodies from the B-cell panel and may be difficult to detect by standard gating as it overlaps with other populations in all 28 two-parameter plots. Nevertheless, it is a compact, homogeneous population, most likely of T-cell origin. This red population was drawn on top of the remaining cells (gray). D: Mirror image of C. The gray population was drawn on top of the red population. Fluorescent dyes used were: CD10-FITC, CD22-PE, CD117-PerCPCy5.5, CD38-PE-Cy7, CD34-APC, CD19-APC-Cy7, and in all cases pulse height was used.

Detection and Quantification of Leukemic Blast Populations by HCA of Flow Cytometry Data

To test the relevance of our method for clinical use, datasets from ALL samples at diagnosis (n = 48) were analyzed. The mean reference blast population, as determined by gating analysis in the participating laboratories, was 67.4% (standard deviation: 23.8%). HCA was performed in a blinded fashion, without prior knowledge of the expected percentages or of the leukemia-associated immunophenotypes (LAIP). Using HCA, we correctly identified the leukemic blast populations in all samples analyzed. The blast populations were manually selected based on their distinct pattern on the heatmaps and their clear separation in the dendrograms. When clusters were plotted on traditional dot plots, it was confirmed that their LAIP matched that of the original populations as determined in the reference laboratories. The correlation between blast percentage data derived from standard gating and HCA was 0.968 (PB: r = 0.968, s = 1.01, i = −0.52).

ALL MRD Monitoring by HCA

To further validate our method in a clinical trial setting, we analyzed flow cytometry datasets from 23 BCP-ALL and 15 T-ALL sample pairs included in the most recent QC investigation of the I-BFM FLOW-MRD study group (18). Diagnostic (see also above) as well as day 15 datasets of patients treated with the AIEOP-BFM-ALL 2000 protocol (26) were again analyzed in a blinded fashion. There was excellent concordance between the HCA results and the reported values from the QC trial (Fig. 2). First, BCP-ALL: for diagnostic samples (range of expected values: 23.92–90.21%, median: 63.83%), the correlation was 0.984 (PB: r = 0.984, s = 1, i = 0.52), and for day 15 MRD samples (range: <0.01–19.37%, median: 0.68%) the correlation was 0.995 (PB: r = 0.995, s = 1.02, i = −0.03). In T-ALL diagnostic samples (range of expected values: 28.00–92.00%, median: 82.00%), the correlation was 0.913 (PB: r = 0.913, s = 1.17, i = −14.45), and in day 15 MRD samples (range: <0.01–81.00%, median: 14.60%), the correlation was 0.996 (PB: r = 0.996, s = 1.01, i = −0.02). The overall correlation of all samples analyzed (n = 123, including 76 samples from QC trials) was 0.992 (PB: r = 0.992, s = 1, i = −0.01).

Figure 2. Correlation of HCA analysis results with standard gating. 123 samples from three independent centers, including 38 sample pairs from the most recent QC investigations of the I-BFM FLOW-MRD study group, were analyzed by standard gating (5 × 10^4 to 5 × 10^5 events) and HCA (1 × 10^4 events). The reported blast population percentages are calculated as percentage of all events analyzed. A: The Passing and Bablok regression. B: Bland-Altman plot showing differences of two measures (clustering and gating) from the mean values.

ALL MRD Monitoring by HCA and SVM

To further improve MRD monitoring, we combined the advantages of HCA in characterizing the leukemic population with fast and automatic recognition of the blast population by SVM classifiers. The workflow for the combined use of HCA and SVM analyses is illustrated in Figure 3. First, HCA was performed on the flow dataset of the original diagnostic sample to define the leukemic population as a cluster, and then this cluster was used to train an SVM classifier. The trained classifier was then used in all follow-up samples from the same patient to automatically detect MRD. The MRD estimates were compared with results from independent HCA and with the standard gating analysis from the different reference laboratories.

MRD levels were analyzed using SVM classifiers in 15 follow-up samples from five patients with ALL with persistent leukemia throughout induction, as defined by conventional flow MRD analysis (Fig. 4). SVM classifiers trained on diagnostic populations from BM or PB performed well in the follow-up samples whether those populations constituted the majority or the minority of cells in the sample (range 3.51–92.38%). HCA and SVM showed good concordance with conventional gating in follow-up samples with the percentage of persisting blasts ranging from 0.004 to 57.54%, median 0.65% (as estimated by gating). In samples with low MRD levels (less than 0.5%), SVM (performed on the complete dataset) correlated better with standard gating than HCA (performed on 10^4 events only): the correlations were 0.967 (PB: r = 0.967, s = 1.03, i = 0.01) and 0.910 (PB: r = 0.91, s = 1.33, i = 0), respectively (Fig. 4).

Figure 3. Example of detection and quantification of ALL blasts. Diagnostic (left) and day 15 (right) bone marrow samples from one patient with ALL. A: Gates used for conventional gating of the leukemic population (red) using a standard antibody panel (CD58, CD10, CD45, CD34, CD19, CD20). B: Result of independent HCA of the same flow data. Clusters were selected from the hierarchy as branches of the dendrogram. Cells included in such a branch are colored red. The red clusters are displayed on similar dot plots as for the standard gating. (Note the slightly different shapes of populations, due to differences between biexponential (DiVa) and HyperLog transformations. This is also the reason for better visualization of inner heterogeneity in HCA and SVM of the d15 blast population.) C: SVM for automatic detection of the MRD population in the day 15 sample. First, the classifier is trained based on known class labels (here: the red cluster from HCA at diagnosis). Second, the classifier is asked to automatically assign class distribution in a test sample (here: the day 15 sample from the same patient). The results are also displayed in two relevant dot plots, where coloring is the result of automatic class selection (here: class "MRD", red).

Figure 4. Comparison of MRD monitoring by gating, HCA, and SVM. Datasets from five patients were obtained and analyzed without prior knowledge of the immunophenotype or percentage of the leukemic blast population. After HCA of the diagnostic sample, the leukemic cluster was used to train classifiers to be applied in the follow-up samples. In the follow-up samples, SVM was performed on 10^4 events as well as on all recorded events (5 × 10^4 to 5 × 10^5). A: An example of a patient’s follow-up monitoring by all three methods. B–D: Comparisons of HCA and SVM to standard gating.

Current Challenges in Flow MRD Monitoring

One of the key challenges of modern multiparameter flow cytometry is the increasing complexity of its datasets. HCA can be used for analysis of datasets with increasing numbers of parameters, as was shown with six- and eight-parameter datasets as well as with 10-parameter datasets (Supporting Information Figs. 5 and 6). As proof of principle, we also successfully applied this method to a 20-parameter synthetic dataset (Supporting Information Fig. 7).

Figure 5. HCA and SVM correctly identify the MRD population despite a shift in CD34 expression. Histogram of CD34 expression on the leukemic populations at diagnosis (black) and at week 11 (gray). The insert provides a comparison of MRD with HCA and SVM (SVM not present at d0 as it is trained at diagnosis).

Specific challenges of flow cytometric MRD monitoring are the identification of leukemic blasts that have undergone an immunophenotypic shift during therapy and the discrimination of leukemic blasts from regenerating hematogones with a related immunophenotype. There were two such examples in our cohort. Figure 5 shows HCA and SVM analyses of samples in which the leukemic blast population downregulated CD34 expression during treatment. Despite this antigen shift, both methods were able to correctly identify and quantify the MRD population. Similarly, analyses of a postinduction bone marrow (day 78 of the ALL-BFM 2000 protocol) are shown in Figure 6, in which HCA correctly distinguished between leukemic blasts and hematogones by placing them on separate branches of the dendrogram. Because of their immunophenotypic similarity, these two populations were otherwise very close in the cluster hierarchy. On the other hand, the SVM classifier was not trained to distinguish between the two populations, because the regenerating population was not present at diagnosis, and it therefore assigned both the leukemic and the normal regenerating population to one cluster.

Figure 6. Regenerating hematogones are distinguished from leukemic cells by HCA. A: Populations in a day 78 bone marrow as defined by standard gating (i) and HCA (ii). B: Enlarged section of the dendrogram with heatmap corresponding to populations in ii. Red, leukemic population; blue, regenerating hematogones (lower CD10, CD20 expression); green, debris (SYTO41 negative).

DISCUSSION

For both hematological research and diagnostics, modern multiparameter flow cytometry is a powerful tool for phenotyping normal and leukemic cell populations (27). One of its key applications lies in the monitoring of MRD as a clinically important biomarker for relapse prediction and treatment stratification (1–5). However, the rapidly rising complexity of multiparameter flow cytometry datasets creates new challenges. Currently, the analytical standard in the field involves gating, in which one or more gates are defined in each histogram or dual-parameter plot and a sequence or combination of gates defines the population of interest. This process is tedious or even unfeasible, requires highly experienced cytometrists, and is observer-dependent. Most importantly, sequential gating is limited in its ability to reflect the multidimensionality of the data. A solution to both the multidimensionality of flow data and the observer dependency lies in the use of unsupervised learning methods. A number of methods have been suggested for use in flow cytometry (10–16). Most methods rely on an estimate of the number of populations, leaving behind the hierarchical nature of a complex biological sample, or use the hierarchy of clusters only as a proxy for building models with an estimated number of components/clusters (28). HCA, on the other hand, offers a picture of all recorded events in a hierarchically organized fashion so that cells with similar characteristics reside close to each other. HCA in its standard form measures distances between data points once and then uses these distances in the linkage part of the algorithm for merging clusters and thus building the hierarchy. This setting, however, does not reflect the ellipsoidal shape of populations in flow cytometry datasets and, therefore, there has been a lack of unsupervised learning methods applicable to flow cytometry.
We have developed a new algorithm for HCA, using adaptive Mahalanobis-average linkage, to cluster flow data. In this algorithm, merging of clusters is based on distances of data points of one cluster to an ellipsoid fitted to another cluster and vice versa. Mahalanobis-average linkage allows the formation of clusters starting from single cells. This feature has two important advantages over some previous methodologies (17). The major one is that clustering from single cells increases the sensitivity of population detection, which has important implications for both MRD monitoring and explorative analyses of flow data. Second, Mahalanobis-average linkage also allows clustering in a pure HCA manner without the need for initial data splitting, thus avoiding any possible introduction of errors in this first step. As a marker of quality, populations chosen as clusters from HCA using Mahalanobis-average linkage usually have an even distribution of measured cell parameters. HCA dissects such populations not only when they are relatively well separated from each other but also when they overlap with other populations in the sample and would therefore be difficult or even impossible to find using a traditional gating approach. This was exemplified by the distinct CD38+ B-lin− population in Figure 1.

Using this novel HCA algorithm, it became possible to correctly assign both the immunophenotype and the percentage of leukemic blasts in a large cohort of diagnostic and follow-up samples from children with ALL. There was an excellent correlation between leukemic levels determined by traditional gating and HCA. The correlation was equally high for both BCP-ALL and T-ALL samples, even though T-ALL is traditionally more challenging due to interfering normal cell populations, blast heterogeneity, or loss of immaturity-associated markers. Most importantly, this correlation compared well with the interlaboratory QC investigations recently reported from the I-BFM FLOW-MRD study group (18). This comparison served as validation of our method, demonstrating the potential of this new approach for clinical use. The biggest challenge for HCA using this algorithm is the size of the dataset that can be analyzed on current desktop computers, which is limited to 2 × 10^4 events.
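The dataset size limit follows from the quadratic growth of the pairwise distance table. The back-of-the-envelope sketch below assumes that distances are stored once per pair (upper triangle only) as 8-byte double-precision values; the authors' actual storage layout is not described, so the figures are indicative only.

```python
def pairwise_distance_bytes(n_events: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to store one distance per unordered pair of events."""
    n_pairs = n_events * (n_events - 1) // 2
    return n_pairs * bytes_per_value


for n in (10_000, 20_000, 100_000):
    print(f"{n:>7} events: {pairwise_distance_bytes(n) / 1e9:.1f} GB")
```

Under these assumptions, 10^4 events need about 0.4 GB, 2 × 10^4 events about 1.6 GB, and 10^5 events about 40 GB, which illustrates why clustering whole acquisitions was impractical on desktop hardware.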
The sensitivity of any flow cytometry analysis is limited by the number of events recorded and by the minimal number of cells recognized as a population. For the datasets of 10^4 events analyzed by HCA in this work, using 10 cells as the threshold for defining a population, the sensitivity is 0.1%. In MRD monitoring, where high numbers of cells must be analyzed, this limitation can be overcome by using SVM. SVM is a supervised learning method able to automatically detect populations of interest in datasets with high numbers of acquired events (>10^6, data not shown). The combination of hierarchical clustering and SVM allowed detection of low levels of the residual disease population in follow-up ALL samples. Other possibilities to overcome the cell number limitation, not presented here, are either to use SVM on populations derived from other methods (e.g., binning) or to split the data before HCA is applied (either manually, by subgating, or by the sequence: HCA on a representative subset, then SVM on the chosen subset, then HCA on the subset defined by SVM).

Other challenges for flow cytometric MRD monitoring, and thus for HCA or SVM analysis, include immunophenotypic shifts in ALL blasts following therapy, as well as the discrimination of persisting leukemic blasts from regenerating normal hematogones (29–31). In principle, any supervised learning is prone to be affected by significant changes in the parameters of the population of interest, and these issues will need further investigation. In our hands, immunophenotype modulation did not hamper HCA or SVM analysis (Fig. 5). Regenerating populations detected with HCA were, however, more challenging for SVM (Fig. 6). This difficulty may be addressed by training the classifiers on several classes simultaneously, for example, by positive training on the class of interest (here, the malignant population) and by negative training on the remaining data points (here, residual normal bone marrow).

Despite these challenges, HCA using Mahalanobis-average linkage opens up a new perspective on how to view flow cytometry data. The inherent multidimensional nature of the analysis leads to the identification of homogeneous populations not only on histograms or two-parameter dot plots but also in the n-dimensional space (with n representing the number of parameters analyzed). The method can be scaled up and is readily applicable to modern high-parameter flow cytometry data that are otherwise difficult to analyze. In addition, HCA can show sample populations not only in the hierarchical context of other populations but also with their inner hierarchy. Dissecting tumor heterogeneity is key to understanding clonal evolution (32) and the development of drug resistance (33), and to identifying candidate leukemia-propagating stem cell populations (34).

Most important for its clinical applicability, HCA is less observer dependent than traditional gating. By reanalyzing 23 follow-up samples from the I-BFM FLOW-MRD study group (18), we achieved high concordance with standardized flow cytometry analysis without participating in the respective intensive training and feedback framework.
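The sensitivity limit discussed earlier in this section is simple arithmetic: the smallest detectable population, as a fraction of the sample, is the minimal number of cells accepted as a population divided by the number of events recorded. The following sketch restates that relation. The function name is hypothetical, and only the event counts and the 10-cell threshold are taken from the text.

```python
def detection_sensitivity(n_events, min_cells):
    """Smallest detectable population as a fraction of all recorded events."""
    if n_events <= 0 or min_cells <= 0:
        raise ValueError("n_events and min_cells must be positive")
    return min_cells / n_events

# Event counts mentioned in the text, with the 10-cell population threshold.
for n_events in (10**4, 2 * 10**4, 10**6):
    fraction = detection_sensitivity(n_events, min_cells=10)
    print(f"{n_events} events: {fraction * 100:.4f}% of the sample")
```

With 10^4 events and a 10-cell threshold, this gives the 0.1% stated for HCA alone. The same threshold applied to 10^6 events would correspond to 0.001%, which illustrates why a method that scales to large event counts matters for MRD monitoring.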
SVM testing is completely observer independent once the classifier is trained to recognize the residual disease population, and it can be fully automatic. However, the choice of the population used to train the classifier is still observer dependent and relies on the method used. Because HCA clusters form compact populations that provide optimal classes for SVM, it appears advantageous to combine HCA with SVM. HCA can identify various relevant populations without prior knowledge of the sample, whereas SVM, once the classifier is trained, can perform the test automatically, quickly, and on very large datasets. The hierarchy of flow cytometry events produced by HCA is independent of the person or laboratory performing the analysis. Cluster selection can be formalized (as a description of dendrogram branches), and population size and structure can be discussed. This will allow a new level of easy and formally precise standardization and quality control in large international multicenter trials.

In summary, we have developed a new algorithm that allows HCA to be applied to flow cytometry and opens new opportunities for the scientific and clinical use of flow cytometry. Most importantly, this approach reflects the multidimensionality of the data and enables analysis of complex, multiparameter flow data. It provides a new tool to study tumor heterogeneity. From a clinical perspective, in combination with SVM learning, HCA using Mahalanobis-average linkage is applicable to leukemia diagnostics and MRD monitoring. It has been validated against a standardized set of flow MRD data from the I-BFM FLOW-MRD study group, and it has the potential for automation.

Cytometry Part A 81A: 25–34, 2012

Author Contributions
K.F. designed research, wrote MATLAB scripts, analyzed data, prepared the figures, and wrote the paper; T.S. developed the Mahalanobis-average algorithm, wrote MATLAB scripts, and critically reviewed the manuscript; A.S. retrieved data; B.W., J.I., E.M., and M.N.D. provided data and critically reviewed the manuscript.

ACKNOWLEDGMENTS
The authors are grateful to Marian Case for retrieving the flow cytometry datasets. This work contains data that were generated within an international cooperative study of the IBFM ALL FLOW-MRD study group, represented by M.N. Dworzak (source of data files), G. Basso (Univ. of Padova), G. Gaipa (Tettamanti Research Center, Monza), and L. Karawajew (Robert-Roessle Clinic, Medical University of Berlin Charité). Senior author: Michael N. Dworzak, coordinator of the IBFM ALL FLOW-MRD network.

LITERATURE CITED
1. Borowitz MJ, Devidas M, Hunger SP, Bowman WP, Carroll AJ, Carroll WL, Linda S, Martin PL, Pullen DJ, Viswanatha D, et al. Clinical significance of minimal residual disease in childhood acute lymphoblastic leukemia and its relationship to other prognostic factors: A Children’s Oncology Group study. Blood 2008;111:5477–5485.
2. Ciudad J, San Miguel JF, Lopez-Berges MC, Vidriales B, Valverde B, Ocqueteau M, Mateos G, Caballero MD, Hernandez J, Moro MJ, Mateos MV, Orfao A. Prognostic value of immunophenotypic detection of minimal residual disease in acute lymphoblastic leukemia. J Clin Oncol 1998;16:3774–3781.
3. Coustan-Smith E, Sancho J, Hancock ML, Boyett JM, Behm FG, Raimondi SC, Sandlund JT, Rivera GK, Rubnitz JE, Ribeiro RC, et al. Clinical importance of minimal residual disease in childhood acute lymphoblastic leukemia. Blood 2000;96:2691–2696.
4. Coustan-Smith E, Behm FG, Sanchez J, Boyett JM, Hancock ML, Raimondi SC, Rubnitz JE, Rivera GK, Sandlund JT, Pui CH, et al. Immunological detection of minimal residual disease in children with acute lymphoblastic leukaemia. Lancet 1998;351:550–554.
5. Dworzak MN, Froschl G, Printz D, Mann G, Potschger U, Muhlegger N, Fritsch G, Gadner H. Prognostic significance and modalities of flow cytometric minimal residual disease detection in childhood acute lymphoblastic leukemia. Blood 2002;99:1952–1958.
6. Langebrake C, Creutzig U, Dworzak M, Hrusak O, Mejstrikova E, Griesinger F, Zimmermann M, Reinhardt D. Residual disease monitoring in childhood acute myeloid leukemia by multiparameter flow cytometry: The MRD-AML-BFM Study Group. J Clin Oncol 2006;24:3686–3692.
7. Schultz KR, Pullen DJ, Sather HN, Shuster JJ, Devidas M, Borowitz MJ, Carroll AJ, Heerema NA, Rubnitz JE, Loh ML, et al. Risk- and response-based classification of childhood B-precursor acute lymphoblastic leukemia: A combined analysis of prognostic markers from the Pediatric Oncology Group (POG) and Children’s Cancer Group (CCG). Blood 2007;109:926–935.
8. Perfetto SP, Chattopadhyay PK, Roederer M. Seventeen-colour flow cytometry: Unravelling the immune system. Nat Rev Immunol 2004;4:648–655.
9. Wood B. 9-color and 10-color flow cytometry in the clinical laboratory. Arch Pathol Lab Med 2006;130:680–690.
10. Lo K, Brinkman RR, Gottardo R. Automated gating of flow cytometry data via robust model-based clustering. Cytometry Part A 2008;73A:321–332.
11. Rogers WT, Moser AR, Holyst HA, Bantly A, Mohler ER III, Scangas G, Moore JS. Cytometric fingerprinting: Quantitative characterization of multivariate distributions. Cytometry Part A 2008;73A:430–441.
12. Wilkins MF, Hardy SA, Boddy L, Morris CW. Comparison of five clustering algorithms to classify phytoplankton from flow cytometry data. Cytometry 2001;44:210–217.
13. Zeng QT, Pratt JP, Pak J, Ravnic D, Huss H, Mentzer SJ. Feature-guided clustering of multi-dimensional flow cytometry datasets. J Biomed Inform 2007;40:325–331.
14. Boedigheimer MJ, Ferbas J. Mixture modeling approach to flow cytometry data. Cytometry Part A 2008;73A:421–429.
15. Baudry JP, Raftery AE, Celeux G, Lo K, Gottardo R. Combining mixture components for clustering. J Comput Graph Stat 2010;9:332–353.
16. Finak G, Bashashati A, Brinkman R, Gottardo R. Merging mixture components for cell population identification in flow cytometry. Adv Bioinformatics 2009;247646.
17. Zamir E, Geiger B, Cohen N, Kam Z, Katz BZ. Resolving and classifying haematopoietic bone-marrow cell populations by multi-dimensional analysis of flow-cytometry data. Br J Haematol 2005;129:420–431.
18. Dworzak MN, Gaipa G, Ratei R, Veltroni M, Schumich A, Maglia O, Karawajew L, Benetello A, Potschger U, Husak Z, et al. Standardization of flow cytometric minimal residual disease evaluation in acute lymphoblastic leukemia: Multicentric assessment is feasible. Cytometry Part B Clin Cytom 2008;74B:331–340.
19. Basso G, Veltroni M, Valsecchi MG, Dworzak MN, Ratei R, Silvestri D, Benetello A, Buldini B, Maglia O, Masera G, et al. Risk of relapse of childhood acute lymphoblastic leukemia is predicted by flow cytometric measurement of residual disease on day 15 bone marrow. J Clin Oncol 2009;27:5168–5174.
20. Irving J, Jesson J, Virgo P, Case M, Minto L, Eyre L, Noel N, Johansson U, Macey M, Knotts L, et al. Establishment and validation of a standard protocol for the detection of minimal residual disease in B lineage childhood acute lymphoblastic leukemia by flow cytometry in a multi-center setting. Haematologica 2009;94:870–874.
21. Bagwell CB. Hyperlog: A flexible log-like transform for negative, zero, and positive valued data. Cytometry Part A 2005;64A:34–42.
22. Mahalanobis. On the generalised distance in statistics. Proc Natl Inst Sci India 1936;2:49–55.
23. Vapnik VN. An overview of statistical learning theory. IEEE Trans Neural Netw 1999;10:988–999.
24. Toedling J, Rhein P, Ratei R, Karawajew L, Spang R. Automated in-silico detection of cell populations in flow cytometry readouts and its application to leukemia disease monitoring. BMC Bioinformatics 2006;7:282.
25. Deaglio S, Mehta K, Malavasi F. Human CD38: A (r)evolutionary story of enzymes and receptors. Leuk Res 2001;25:1–12.
26. Ratei R, Basso G, Dworzak M, Gaipa G, Veltroni M, Rhein P, Biondi A, Schrappe M, Ludwig WD, Karawajew L. Monitoring treatment response of childhood precursor B-cell acute lymphoblastic leukemia in the AIEOP-BFM-ALL 2000 protocol with multiparameter flow cytometry: Predictive impact of early blast reduction on the remission status after induction. Leukemia 2009;23:528–534.
27. Basso G, Buldini B, De Zen L, Orfao A. New methodologic approaches for immunophenotyping acute leukemias. Haematologica 2001;86:675–692.
28. Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL. Model-based clustering and data transformations for gene expression data. Bioinformatics 2001;17:977–987.
29. Gaipa G, Basso G, Aliprandi S, Migliavacca M, Vallinoto C, Maglia O, Faini A, Veltroni M, Husak D, Schumich A, et al. Prednisone induces immunophenotypic modulation of CD10 and CD34 in nonapoptotic B-cell precursor acute lymphoblastic leukemia cells. Cytometry Part B Clin Cytom 2008;74B:150–155.
30. Gaipa G, Basso G, Maglia O, Leoni V, Faini A, Cazzaniga G, Bugarin C, Veltroni M, Michelotto B, Ratei R, et al. Drug-induced immunophenotypic modulation in childhood ALL: Implications for minimal residual disease detection. Leukemia 2005;19:49–56.
31. Mejstrikova E, Fronkova E, Kalina T, Omelka M, Batinic D, Dubravcic K, Pospisilova K, Vaskova M, Luria D, Cheng SH, et al. Detection of residual B precursor lymphoblastic leukemia by uniform gating flow cytometry. Pediatr Blood Cancer 2010;54:62–70.
32. Shackney SE, Shankey TV. Genetic and phenotypic heterogeneity of human malignancies: Finding order in chaos. Cytometry 1995;21:2–5.
33. Pieters R, den Boer ML, Durian M, Janka G, Schmiegelow K, Kaspers GJ, van Wering ER, Veerman AJ. Relation between age, immunophenotype and in vitro drug resistance in 395 children with acute lymphoblastic leukemia: Implications for treatment of infants. Leukemia 1998;12:1344–1348.
34. le Viseur C, Hotfilder M, Bomken S, Wilson K, Rottgers S, Schrauder A, Rosemann A, Irving J, Stam RW, Shultz LD, et al. In childhood acute lymphoblastic leukemia, blasts at different stages of immunophenotypic maturation have stem cell properties. Cancer Cell 2008;14:47–58.

