proceedings of spie

PROCEEDINGS OF SPIE SPIEDigitalLibrary.org/conference-proceedings-of-spie

Direct and label-free gram classification of bacterial colonies on agar using hyperspectral imaging

R. Midahuen, P. Barlas, C. Fulchiron, E. Laloum, P. Mahé, et al.

R. Midahuen, P. Barlas, C. Fulchiron, E. Laloum, P. Mahé, D. Leroux, "Direct and label-free gram classification of bacterial colonies on agar using hyperspectral imaging," Proc. SPIE 10685, Biophotonics: Photonic Solutions for Better Health Care VI, 1068537 (17 May 2018); doi: 10.1117/12.2306256 Event: SPIE Photonics Europe, 2018, Strasbourg, France Downloaded From: https://www.spiedigitallibrary.org/conference-proceedings-of-spie on 5/25/2018 Terms of Use: https://www.spiedigitallibrary.org/terms-of-use

Direct And Label Free Gram Differentiation Of Microbial Colonies On Agar by Hyperspectral Imaging

R. Midahuen (a), P. Barlas (b), C. Fulchiron (d), E. Laloum (c), P. Mahé (b), D. Leroux* (a)

(a) Clinical Unit R&D, Marcy l’Etoile; (b) Data Analytics Unit R&D, Marcy l’Etoile; (c) Industry Unit R&D, Craponne; (d) Microbiology R&D, La Balme Les Grottes; bioMérieux S.A., France

ABSTRACT

Diffuse reflectance spectra of bacterial colonies, extracted from hyperspectral images, allowed a label-free classification into Gram-positive (GP) and Gram-negative (GN) types. Thirty-eight strains belonging to 14 bacterial species typically encountered in urinary tract infections (UTI) were cultivated on chromID CPS Elite translucent chromogenic culture medium to build training and testing sets. Using Support Vector Machine (SVM) supervised learning models, we demonstrated excellent classification rates, with a percentage of correctly classified samples as high as 95%. Because determining the discriminant spectral channels is critical both for fundamental reasons (to help understand the origin of the discriminant signal) and for practical reasons (to envision simpler multispectral systems), a parsimonious analysis was conducted using either a Fused LASSO (Least Absolute Shrinkage and Selection Operator) or an uncertainty test in Partial Least Squares (PLS) regression analysis. Two prominent, distinct spectral regions were thus identified, leading us to hypothesize that cytochrome ratios might be, at least in part, at the origin of the differences observed between Gram-negative and Gram-positive bacterial populations.

Keywords: In Vitro Diagnostics, Bacterial colonies, culture medium, Gram testing, Pathogens, Hyperspectral Imaging, Microbiology

1. INTRODUCTION

The aim of this study was to evaluate the potential of hyperspectral imaging (HSI) to classify growing microbial colonies on a Petri plate, that is, directly on the culture medium and without any sample preparation besides the microbial culture itself. Our prime motivation was to simplify the microbiological workflow by replacing the rather cumbersome Gram staining procedure [1], which traditionally requires staining and microscopic observation, with a fast and non-invasive procedure. This is a very attractive proposition in terms of cost, time to result and environmental impact. As an example, a common practice is to use the result of the Gram test to guide the selection of the appropriate Vitek® 2 automated system (bioMérieux, France) cards used for pathogen identification (ID) or antimicrobial susceptibility testing. A fast and simple Gram classification distinguishing Gram-positive (GP) from Gram-negative (GN) bacteria would make it possible to select the appropriate card directly or indirectly (in combination with screening test results).

Interesting alternatives to Gram staining were proposed by B. Park [2] and by S. Berezin [3], both requiring sub-cellular spatial resolution. Those technical solutions are based respectively on the measurement of light scattering originating from the membrane (requiring an offline test, sample drying and the use of a hyperspectral microscope) and on the measurement of single cell tip-enhanced Raman scattering (TERS). We describe here instead a simpler solution well adapted to cultured samples of microbial colonies, requiring no staining step and only moderate spatial resolution equipment.
Also in the field of clinical diagnostics, Alberto Signoroni and Giovanni Turra [4] from the University of Brescia investigated, in collaboration with Copan Italia S.P.A (Brescia, Italy), the use of HSI to classify uropathogens’ microbial colonies (Escherichia coli, Enterococcus faecalis, Staphylococcus aureus, Proteus mirabilis and Candida albicans) on a non-chromogenic medium (Columbia agar with 5% sheep blood, COS). The clear potential advantage of this approach is to possibly reduce the cost of pre-identification by substituting expensive chromogenic media with conventional culture media (referred to as “virtual chromogenic agar”).

Biophotonics: Photonic Solutions for Better Health Care VI, edited by Jürgen Popp, Valery V. Tuchin, Francesco Saverio Pavone, Proc. of SPIE Vol. 10685, 1068537 © 2018 SPIE · CCC code: 0277-786X/18/$18 · doi: 10.1117/12.2306256

2. MATERIALS AND METHODS

2.1 Sample preparation. Thirty-eight strains belonging to 14 bacterial species typically encountered in urinary tract infections (UTI) were cultivated on chromID CPS Elite translucent chromogenic culture medium (CPSE) to build training and testing sets (Table 1). All strains were spread onto Petri dishes using an automated PREVI® Isola instrument at a concentration of 10^5 CFU/ml (after pre-dilution with aqueous 0.45% NaCl, pH 7.2, and adjustment to McFarland 0.5 density with Densicheck®). Cultures were prepared by picking colonies from a stock culture, subculturing on a solid culture medium and incubating at 37°C for 18 to 24 h. Petri dishes were then removed from the incubator and hyperspectral data acquisition was performed.

Table 1: List of strains included in training and testing sets (API N° is a strain-specific code internal to bioMérieux)

TRAINING SET
Species name                    API ID N°   Str. N°   Gram
Escherichia coli                0008013     1         GN
Escherichia coli                8112023     2         GN
Escherichia coli                9203095     3         GN
Klebsiella pneumoniae           9607019     7         GN
Enterobacter cloacae            8112012     8         GN
Serratia marcescens             9203215     9         GN
Citrobacter freundii            9811037     10        GN
Proteus mirabilis               9001054     11        GN
Providencia stuartii            9203169     12        GN
Morganella morganii             9705006     13        GN
Staphylococcus saprophyticus    8302033     14        GP
Staphylococcus saprophyticus    8306152     15        GP
Staphylococcus saprophyticus    0504058     16        GP
Streptococcus agalactiae        7701031     17        GP
Streptococcus agalactiae        8709013     18        GP
Streptococcus agalactiae        9809084     19        GP
Enterococcus faecium            9503093     4         GP
Enterococcus faecalis           9806310     5         GP
Enterococcus faecalis           1006023     6         GP

TESTING SET
Species name                    API ID N°   Str. N°   Gram
Escherichia coli                9709127     20        GN
Escherichia coli                0303034     21        GN
Escherichia coli                1006021     22        GN
Klebsiella pneumoniae           9203122     26        GN
Enterobacter aerogenes          9405046     27        GN
Serratia marcescens             9505151     28        GN
Citrobacter freundii            0101069     29        GN
Proteus mirabilis               8607091     30        GN
Providencia rettgeri            8211021     31        GN
Morganella morganii             9504010     32        GN
Staphylococcus saprophyticus    9004035     33        GP
Staphylococcus saprophyticus    8905074     34        GP
Staphylococcus saprophyticus    8712081     35        GP
Streptococcus agalactiae        0510053     36        GP
Streptococcus agalactiae        8602036     37        GP
Streptococcus agalactiae        0411038     38        GP
Enterococcus faecium            8012091     23        GP
Enterococcus faecium            9807061     24        GP
Enterococcus faecalis           8712066     25        GP

2.2 Acquisition of hypercubes and pixel extraction. The HSI hardware system consists of a hyperspectral camera (Pika II, Resonon, MT, USA) equipped with an objective lens (Schneider Xenoplan 23 mm). Hyperspectral data were acquired by scanning the culture plate on a moving stage in a so-called “linescan mode”. The camera detector is of the CMOS type, sensitive in the VIS-NIR region. The spectral range extends from 394 to 893 nm with a 2.1 nm spectral resolution (240 spectral channels in total). A remotely controlled motorized translational stage, interfacing with a custom-made automated incubator through a robotic arm, ensures the translational movement of the sample (details were published in prior work [5,6]). The lighting system is composed of two 50 W “deep blue” halogen-tungsten lamps used in a front lighting configuration and oriented at 45 degrees with respect to the vertical axis. A dark matte surface was used as sample background in order to reduce the projection of colony shadows on the bottom of the culture plate. The software “SpectrononPro v.2.53” (Resonon, Bozeman, MT, USA) was used for data acquisition and data preprocessing. All classification steps were performed in the “R” statistical software [7].

The scanning line was 640 pixels wide and the scan length 1200 lines long, covering an area of 9 cm x 4.8 cm (Figure 1). This configuration translates into a sampling step (i.e. maximum attainable spatial resolution) of 75 µm in both directions, normal and parallel to the scanning direction. Acquisition parameters are summarized in Table 2.

Image segmentation was performed semi-manually by extracting 5 regions of interest (ROI) from each hypercube, of size 4.2 mm x 4.2 mm and containing at least one isolated colony, while avoiding pixels from the specular reflection region (originating from direct reflection on the hemispherical colonies). A mosaicked image of those selected ROIs is shown in Figure 2.

Figure 1: Images of Escherichia coli colonies on CPSE. Selection of colonies to create a sub-hypercube from the original hypercube before any binning (bright regions are specular reflection excluded from the analysis) (A) and after a 4×4×1 (x, y, λ) binning (B)

Table 2: Parameters of image acquisition

Acquisition parameters (unit)                 Value
Exposure time (ms)                            18.22
Frequency of line acquisition (Hz)            50
Time of acquisition for one hypercube (s)     24
Sampling step (µm)                            75
ROI size of full hypercube (cm x cm)          9 x 4.5
#f objective                                  4.2
Bit depth                                     12
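As a quick consistency check on the acquisition geometry, the sketch below recomputes the field of view and the scan time from the pixel count, the line count, the sampling step and the line frequency quoted in the text. The function and variable names are illustrative only and do not come from the acquisition software; the 4-fold spatial binning is the one described for the spectra preprocessing.

```python
# Acquisition values quoted in the text and in Table 2.
PIXELS_PER_LINE = 640      # scanning line width (pixels)
LINES_PER_SCAN = 1200      # scan length (lines)
SAMPLING_STEP_UM = 75      # sampling step (micrometers)
LINE_RATE_HZ = 50          # frequency of line acquisition (Hz)
BINNING = 4                # spatial binning factor along x and y


def scan_geometry(pixels, lines, step_um, line_rate_hz, binning=1):
    """Return field of view, scan time and binned grid for a line-scan setup."""
    um_per_cm = 10_000
    return {
        "width_cm": pixels * step_um / um_per_cm,
        "length_cm": lines * step_um / um_per_cm,
        "scan_time_s": lines / line_rate_hz,
        "binned_step_um": step_um * binning,
        "binned_shape": (lines // binning, pixels // binning),
    }


geometry = scan_geometry(
    PIXELS_PER_LINE, LINES_PER_SCAN, SAMPLING_STEP_UM, LINE_RATE_HZ, BINNING
)
for name, value in geometry.items():
    print(f"{name}: {value}")
```

With these inputs the scan covers 9 cm along the scanning direction and 4.8 cm across it, and one hypercube takes 24 s, in line with the acquisition time listed in Table 2. After 4×4 binning each spatial sample spans 300 µm.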


Figure 2: RGB mosaicked image built by merging five regions of interest (4.2 mm x 4.2 mm in size, along rows 1 to 5) for each of the 19 strains (columns 1 to 19) forming the training set (testing set not shown). Axes: ROI (rows) and Strain N° (columns, labeled with the API ID N° of each strain).

2.3 Spectra preprocessing. Diffuse reflectance hypercubes S (1200 lines x 640 pixels x 240 spectral bands) are calibrated following typical steps, which consist in acquiring two flat-field correction vectors (single line x 640 pixels x 240 bands): the first vector, W, is the response from a white standard corresponding to the white patch of a ColorChecker™ target (X-Rite, Manchester, UK), and the second vector, D, is a measurement of the dark current. Each reflectance hypercube line S_l is calculated from the corresponding radiance hypercube line I_l using those two flat-field correction vectors, by applying the following equation: S_l = (I_l - D) / (W - D). Pixel spectra were binned spatially four times in the x and y spatial directions (16-fold in total). Several preprocessing techniques were then applied to mitigate the impact of scattering and other related physical factors or “size effect” by correcting signals for noise, namely Standard Normal Variate (SNV), Savitzky-Golay first derivative, smoothing and mean centering.

2.4 Classification analysis. Several classification algorithms were considered to predict the Gram type of a colony based on a collection of HSI spectra obtained at the pixel level. The classification performance was estimated by means of a leave-one-strain-out (LOSO) procedure, in which a classification model is learned on 18 of the 19 strains and is used to predict the 19th, left-out strain. The first algorithm we consider is a standard SVM based on a linear kernel. The regularization parameter of the model (usually denoted as C) is optimized by means of an internal step of 10-fold cross-validation. SVMs are known to be efficient at learning prediction models in high dimension (here 240 spectral channels). Using a linear kernel leads to learning a vector of weights, one per channel. These weights define in turn a score for each spectrum, which is used to classify it as GP if the score is positive and as GN if it is negative.
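The leave-one-strain-out loop and the sign rule on the linear score can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R implementation: the linear model is a simple class-centroid rule used as a stand-in for the SVM (no C parameter, no internal 10-fold cross-validation), and the strain names and three-channel "spectra" are made-up toy values.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_vector(rows):
    """Channel-wise mean of a list of spectra."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_centroid_rule(spectra, labels):
    """Stand-in linear model (NOT an SVM): weights point from the GN mean to the GP mean."""
    gp = mean_vector([s for s, y in zip(spectra, labels) if y == "GP"])
    gn = mean_vector([s for s, y in zip(spectra, labels) if y == "GN"])
    weights = [p - n for p, n in zip(gp, gn)]
    midpoint = [(p + n) / 2 for p, n in zip(gp, gn)]
    return weights, -dot(weights, midpoint)


def predict(weights, bias, spectrum):
    """Sign rule from the text: positive score -> GP, otherwise GN."""
    return "GP" if dot(weights, spectrum) + bias > 0 else "GN"


def leave_one_strain_out(spectra, labels, strains):
    """Hold out each strain in turn; return the fraction of its pixels predicted correctly."""
    rates = {}
    for held_out in sorted(set(strains)):
        train = [i for i, s in enumerate(strains) if s != held_out]
        test = [i for i, s in enumerate(strains) if s == held_out]
        w, b = fit_centroid_rule([spectra[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(w, b, spectra[i]) == labels[i] for i in test)
        rates[held_out] = hits / len(test)
    return rates


# Toy data: 4 hypothetical strains, 2 pixels each, 3 channels per spectrum.
spectra = [[0.10, 0.30, 0.50], [0.12, 0.31, 0.52],   # strain A (GN)
           [0.11, 0.29, 0.49], [0.09, 0.28, 0.51],   # strain B (GN)
           [0.20, 0.30, 0.40], [0.22, 0.31, 0.41],   # strain C (GP)
           [0.21, 0.32, 0.39], [0.19, 0.29, 0.42]]   # strain D (GP)
labels = ["GN"] * 4 + ["GP"] * 4
strains = ["A", "A", "B", "B", "C", "C", "D", "D"]

loso_rates = leave_one_strain_out(spectra, labels, strains)
print(loso_rates)
```

Splitting by strain rather than by pixel keeps all pixels of a given strain on the same side of the train/test boundary, so the per-strain rate reflects how the model generalizes to a strain it has never seen.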
As with a standard SVM, all these weights are non-null. All classification statistics were computed at the pixel level, without aggregating data to the colony level.

2.5 Determination of discriminant spectral channels. A Fused LASSO approach was used to determine the most discriminant channels. The LASSO is used here as a penalized logistic regression method, which performs both variable selection and regularization. Fused LASSO [8] is an extension of the LASSO designed to handle data with a spatial or temporal structure. It combines the classical LASSO constraint (which encourages sparsity of the coefficients) with a second constraint that smooths the variations between successive coefficients. These two constraints are controlled by two parameters: λ1 to encourage sparsity and λ2 to encourage smoothness. The classification task relies on a weighted logistic loss, and a leave-one-out cross-validation procedure is used to select the λ1 and λ2 values that provide a good compromise between model complexity (number of coefficients different from zero) and classification accuracy.

A Partial Least Squares (PLS) regression method was also used as an alternative, to predict the Gram type through a dummy variable approach (binary coded as either 0 for GN or 1 for GP) from the multivariate input (linear combination of seven spectral reflectances). In this case, however, the analysis was performed on colony-averaged spectra (not at the pixel level). Because there can be non-null regression coefficients associated with insignificant variables, an uncertainty test, also based on a cross-validation procedure, was performed in order to retain only significant predictive variables. This procedure thus creates many different models, whose regression coefficients vary from one model to the next.
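For reference, the two penalties described above can be written in the usual fused LASSO form. The expression below is a sketch under stated assumptions rather than the exact objective used in this study: the per-observation weights w_i and the intercept β_0 are notation introduced here for illustration, y_i is coded 0 for GN and 1 for GP, x_ij is the preprocessed reflectance of spectrum i in channel j, and p = 240 is the number of spectral channels.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
  \sum_{i=1}^{n} w_i \,\ell\!\left(y_i,\ \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}\right)
  \;+\; \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
  \;+\; \lambda_2 \sum_{j=2}^{p} \lvert \beta_j - \beta_{j-1} \rvert ,
\qquad
\ell(y, s) = \log\!\left(1 + e^{s}\right) - y\,s .
```

The λ1 term drives individual channel coefficients to zero, while the λ2 term penalizes jumps between neighboring channels, so the selected channels tend to form contiguous spectral windows rather than isolated wavelengths.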

3. RESULTS AND DISCUSSION

3.1 Exploratory data analysis. Mean spectra of the GP and GN classes, all strains combined, are represented within a 1 standard deviation envelope in Figure 3 (left). From this graph, the channel maximizing the difference between the average spectra of the GP and GN groups was visually identified at 427.8 nm. Using simple descriptive statistics, with the two groups depicted in a box-and-whisker plot (Figure 3, right), the reflectance difference appears to be significant. The possibility of discriminating the Gram type from a single reflectance intensity acquired on a colony at 428 nm was first tested via a simple univariate linear regression between the raw reflectance and the Gram type. A single reflectance value did not allow correct discrimination (R² Pearson coefficient = 0.65). Indeed, many optical and matrix effects cause large baseline variations. However, with an appropriate technique relying on more complete spectral information, such as multivariate modeling, we showed that an excellent Gram prediction can be achieved.

Figure 3: (Left) Mean spectra of GP (red) and GN (blue) classes within a 1 standard deviation envelope, plotted against wavelength (nm) from 400 to 900 nm; (Right) box-and-whisker plot analysis for GP and GN intensities at 427.8 nm


3.2 SVM classification analysis; overall and per-strain identification rate. SVM classification predicted the Gram type with high Correct Identification Rates (CIR = no. of properly classified pixels / total no. of pixels x 100%). The best overall performance was observed for centered spectra, although we noticed very little difference between the several preprocessing approaches that were tested (cf. section 2.3), with the exception of first-derivative spectra, which led to below-average performance. Consequently, only results from centered spectra are reported below.

In cross-validation, an overall CIR of 93% was achieved, with per-strain CIR ranging from 72% to 100% as shown in Figure 4, indicating that performance seems quite strain-dependent. Similar classification rates were calculated (Table 3) for GP and GN bacteria (94.0% and 92.6% for GP and GN class accuracies, respectively). Although SVM is a great predictive tool, it is usually known to be difficult to interpret. Indeed, we were unable to determine the most discriminant channels of interest by analyzing the SVM model weights per channel, as shown in Figure 5 (top graph). To alleviate this problem, we chose to also classify using different algorithms, as described in the next section.

The performance of the learned model was also tested on new strains (listed as the testing set in Table 1). An overall CIR of 94.6% was achieved (Table 3), with per-strain CIR ranging from 12.5% to 100% as shown in Figure 4 (bottom graph). Strain 22 is atypical, with a very low CIR value of 12.5%, a value that is not explained at this early stage (no attempt has been made yet to reproduce this outlier).

Figure 4: Per-strain Correct Identification Rate calculated in cross-validation and test modes from training and testing sets (GN in red, GP in blue). Per-strain pixel populations are not balanced.


Table 3: Overall and per-class Correct Identification Rate from pixel statistics for training and testing sets

              Training set                     Testing set
N pixels      True ID   False ID   CIR         True ID   False ID   CIR
GN            1703      136        92.6%       1910      131        93.6%
GP            530       34         94.0%       633       14         97.8%
GP and GN     2233      170        92.9%       2543      145        94.6%
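The rates in Table 3 follow directly from the pixel counts through the CIR definition given in section 3.2. The short sketch below recomputes them; the dictionary layout and function name are illustrative choices, while the counts are copied from Table 3.

```python
def correct_identification_rate(true_id, false_id):
    """CIR (%) = correctly classified pixels / total pixels x 100."""
    return 100.0 * true_id / (true_id + false_id)


# Pixel counts (True ID, False ID) copied from Table 3.
table3 = {
    "training": {"GN": (1703, 136), "GP": (530, 34)},
    "testing": {"GN": (1910, 131), "GP": (633, 14)},
}

cir = {}
for dataset, classes in table3.items():
    for gram, (true_id, false_id) in classes.items():
        cir[(dataset, gram)] = correct_identification_rate(true_id, false_id)
    # The overall rate pools the pixels of both classes before dividing.
    total_true = sum(t for t, _ in classes.values())
    total_false = sum(f for _, f in classes.values())
    cir[(dataset, "GP and GN")] = correct_identification_rate(total_true, total_false)

for key, value in cir.items():
    print(key, f"{value:.1f}%")
```

Because the overall rate pools pixels, it is weighted toward the GN class, which contributes roughly three times as many pixels as the GP class in both sets.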

3.3 Discriminative channels determination by Fused LASSO or by PLSR. Determining the discriminant spectral channels is critical, fundamentally to help understand the origin of the discriminant signal and practically to envision simpler multispectral systems. Because SVM analysis proved inadequate to identify spectral regions of interest in our case, we relied instead on Fused LASSO algorithms to limit the number of spectral channels in our model. Feature selection is achieved through an L1-penalization approach, which tends to provide sparse solutions while maximizing the prediction accuracy. In addition, a PLS regression method was also tested to perform feature selection. Two spectral windows, in the 415-430 nm and 750-800 nm regions, were identified as discriminant by both the Fused LASSO and the PLS regression methods, as shown respectively in Figure 5 (bottom graph) and Figure 6.

Over the 415-430 nm spectral region, the reflectance intensities of GN bacteria appear to be systematically smaller than those of GP bacteria. This difference in intensity could possibly be explained by differences in cytochrome content. Cytochromes are porphyrin-containing chromophores, i.e. cellular pigments able to absorb light at specific wavelengths via their heme cofactor (due to an electron dipole allowing π-π* transitions). We also know from the literature that GN bacteria contain more cytochromes than GP bacteria, which lack cytochromes of type a, a1 and d [9,10]. Cytochromes represent a significant fraction of cellular chromophores and, for the majority of them, the Soret absorption peak has a dominant spectral intensity.

The interpretation of the second spectral band, in the very near infrared, is still a work in progress. We only emphasize that this spectral region lies outside the visible range and is often described, in studies dealing with microbial identification in the VNIR region, as carrying little microbial information. We acknowledge that correlation does not imply causation and that complementary experiments will have to be conducted to confirm the cytochrome hypothesis, for instance by testing strains with different metabolic pathways (strict aerobes vs. strict anaerobes, for instance) affecting their cytochrome content.


Figure 5: SVM (top) & Fused LASSO (bottom) model weights per wavelength

Figure 6: PLS regression coefficients per spectral channel, with the most significant spectral channels represented in black. The algorithm finds the correct linear combination with appropriate weights for each variable. Some weights can further be completely cancelled after statistical significance testing.

4. CONCLUSIONS

The Gram classification performance is excellent for CPS medium (95%), suggesting that the conventional Gram stain reference test could possibly be replaced by hyperspectral imaging analysis after microbial culture on agar. Obvious benefits include (i) performing the test online (compared to offline testing), non-invasively and without contact, (ii) reducing reagent consumption, (iii) reducing time of analysis, (iv) reducing human interpretation errors and (v) reducing operator hands-on time. Confirmation of those results with extended biological models, encompassing more strains and species growing on a diversity of isolation media commonly used in clinical laboratories, is advisable. Also, a more thorough parsimonious analysis will provide better insight into the level of performance achievable with multispectral systems.

REFERENCES

[1] Samuel, L. P., Balada-Llasat, J. M., Harrington, A. and Cavagnolo, R., "Multicenter assessment of gram stain error rates," J. Clin. Microbiol. 54, 1442–1447 (2016).
[2] Park, B., Seo, Y. and Yoon, S., "Hyperspectral microscope imaging methods to classify gram-positive and gram-negative foodborne pathogenic bacteria," Trans ASABE 58, 5–16 (2015).
[3] Berezin, S., Aviv, Y., Aviv, H., Goldberg, E. and Tischler, Y. R., "Replacing a Century Old Technique – Modern Spectroscopy Can Supplant Gram Staining," Sci. Rep. 7, 3810 (2017).
[4] Arrigoni, S., Turra, G. and Signoroni, A., "Hyperspectral image analysis for rapid and accurate discrimination of bacterial infections: A benchmark study," Comput. Biol. Med. 88, 60–71 (2017).
[5] Leroux, D. F., Midahuen, R., Perrin, G., Pescatore, J. and Imbaud, P., "Hyperspectral imaging applied to microbial categorization in an automated microbiology workflow," Proc. of SPIE-OSA 9537, 953726-1 (2015).
[6] Guillemot, M., Midahuen, R., Archeny, D., Fulchiron, C., Montvernay, R., Perrin, G. and Leroux, D. F., "Hyperspectral imaging for presumptive identification of bacterial colonies on solid chromogenic culture media," Proc. of SPIE 9887, 98873L-1 (2016).
[7] R Development Core Team, "R: A language and environment for statistical computing," Vienna, Austria, http://www.R-project.org (2012).
[8] Tibshirani, R., "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B (Methodological), 267–288 (1996).
[9] Georgiou, C. D. and Webster, D. A., "Identification of b, c, and d cytochromes in the membrane of Vitreoscilla," Archives of Microbiology 148(4), 328–33 (1987).
[10] Meyer, D. J. and Jones, C. W., "Distribution of Cytochromes in Bacteria: Relationship to General Physiology," International Journal of Systematic Bacteriology 23(4), 459–467 (1973).
