Med Oncol (2015) 32:383 DOI 10.1007/s12032-014-0383-z

ORIGINAL PAPER

Terahertz time-domain spectroscopy combined with fuzzy rule-building expert system and fuzzy optimal associative memory applied to diagnosis of cervical carcinoma

Na Qi • Zhuoyong Zhang • Yuhong Xiang • Yuping Yang • Peter de B. Harrington



Received: 15 November 2014 / Accepted: 18 November 2014 / Published online: 30 November 2014
© Springer Science+Business Media New York 2014

Abstract

The feasibility of fast and reliable diagnosis of cervical carcinoma by terahertz time-domain spectroscopy combined with a fuzzy rule-building expert system (FuRES) and a fuzzy optimal associative memory (FOAM) was studied. Terahertz spectra of 52 cervical tissue specimens were collected. The raw data were preprocessed by the Savitzky–Golay (S–G) first derivative, principal component orthogonal signal correction (PC-OSC) and emphatic orthogonal signal correction (EOSC) to improve the performance of the FuRES and FOAM models, and the effect of the different pretreatment methods on prediction accuracy was evaluated. The FuRES and FOAM models were validated using the bootstrapped Latin-partition method. The FuRES and FOAM models optimized with the combination of the S–G first derivative and PC-OSC showed the better predictive ability, with classification rates of 92.9 ± 0.4 and 92.5 ± 0.4 %, respectively. The results indicate that terahertz spectroscopy combined with fuzzy classifiers has potential for the diagnosis of cancerous tissue.

Keywords Cervical carcinoma · Terahertz time-domain spectroscopy · Fuzzy rule-building expert system · Fuzzy optimal associative memory · Cancer diagnosis

N. Qi
College of Life Science, Capital Normal University, Beijing 100048, China

N. Qi · Z. Zhang (✉) · Y. Xiang
Department of Chemistry, Capital Normal University, Beijing 100048, China
e-mail: [email protected]

Y. Yang
School of Science, Minzu University of China, Beijing 100081, China

P. B. Harrington
Center for Intelligent Chemical Instrumentation, Clippinger Laboratories, Department of Chemistry and Biochemistry, Ohio University, Athens, OH 45701-2979, USA

Introduction

Cervical cancer is the third most common cancer in gynecological oncology worldwide [1]. Currently, different diagnostic techniques have been proposed, including the thin prep cytological test (TCT), histopathological examination and manual inspection with colposcopy and cervicography [2]. Cancer diagnosis relies on the availability of qualified pathologists. It is time consuming and expensive because specimens must be collected and assessed histologically during surgery. A fast and reliable diagnostic technique can save time and relieve patient discomfort.

Terahertz (1 THz = 10¹² Hz) electromagnetic waves occupy the region of the electromagnetic spectrum between the microwave and infrared regions, typically corresponding to frequencies from 0.1 to 10 THz. Low-frequency vibrational modes of molecules, such as torsional and collective vibrational modes and hydrogen-bond modes, and rotational modes of molecules absorb in this region [3, 4]. THz radiation is low in energy and non-ionizing and is therefore not harmful to living tissue [5]. Pathologic diagnosis making use of the difference in terahertz absorbance between normal and cancerous tissues has been reported in the medical literature [6, 7]. Terahertz spectroscopy and imaging have been applied to medical testing and diagnosis [8]. THz imaging has been used for


detecting micro-metastatic foci in the lymph nodes of early-stage cervical cancer [9].

Chemometrics provides multivariate tools for exploring the relationships among data objects and variables, as well as classifiers that can be used to identify cancerous samples from their THz spectra. Principal component analysis has been applied to THz images to understand the origin of contrast [10]. Wavelet transform algorithms have been used to denoise terahertz data [11]. Neural networks and support vector machines have been reported for diagnosing colon cancer from THz spectra of tissue samples [12].

In this paper, human cervical tissue slides were subjected to terahertz time-domain spectroscopy (THz-TDS). Fuzzy rule-building expert systems (FuRESs) [13] and fuzzy optimal associative memories (FOAMs) [14] classified the THz data for cervical cancer diagnosis. The sample data were converted to first derivatives by a Savitzky–Golay polynomial filter. The effectiveness and feasibility of two preprocessing methods, emphatic orthogonal signal correction (EOSC) [15] and principal component orthogonal signal correction (PC-OSC) [16], were also evaluated.

Experiments and methods

Sample

Fifty-two cervical tissue sections (32 normal and 20 cancerous) were provided by Beijing Haidian Maternal and Child Health Hospital. All cervical tissues were fixed in 4 % formaldehyde solution and then washed with ethanol solutions for dehydration. The tissues were put into xylene for clearing and then into paraffin wax for embedding. After being properly cooled, the paraffin-embedded tissues were sliced into 8-µm-thick sections. The sections were placed in water for flattening and then mounted on quartz plates. The slides were dried in a regulated heating oven at 60 °C for half an hour to remove water. Two replicate slides were taken from each of the fifty-two tissue sections. The transmission THz spectra of all samples were measured with the terahertz time-domain spectrometer. As given in Fig. 1, the two replicate slides of each tissue were

Fig. 1 Schematic representation of tissue samples for TDS measurement


secured together, tissue to tissue, with a sample holder and then measured by the THz-TDS system. The thickness of the tissue was increased in this way, so that more appreciable absorption occurred. A total of 52 sample spectra were obtained as the experimental data.

Instrumentation

A schematic of the THz-TDS instrument is given in Fig. 2. The THz-TDS system used a commercially available femtosecond laser (SPECIM, MaiTai). Photoconductive antennas for the generation and detection of THz pulses are included in the system. The laser source is split into two beams. One beam is the pump beam, which illuminates a GaAs-based semiconductor antenna and gives rise to the THz electromagnetic pulse. The coupling efficiency of the THz radiation is improved by a parabolic mirror with a hemispherical silicon lens. The sample holder is positioned at the focus of the parabolic mirror. The beam that passes through the sample is collected by another parabolic mirror and sent to the photoconductive detector. The other beam is the probe beam, which provides the reference detection pulse on the detecting antenna. The two beams are coherent, so the detector can reject incoherent radiation and improve the signal-to-noise ratio. In the experiments, the volume of the spectrometer through which the THz beam passes was filled with dry nitrogen (N2) to reduce absorption caused by water vapor in the air.

Theory

Parameters extraction

In this work, the absorption spectra of the samples were statistically analyzed. A reference pulse and a sample pulse are required to calculate the absorption coefficient of a sample. The sample pulse is transmitted through the tissue slides, and the corresponding reference signal is obtained with the sample removed. The THz electric field pulses are measured directly as a function of time. The frequency spectra of both the sample and reference signals were obtained by the fast Fourier transform.
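As a minimal sketch of this extraction step, the Python function below converts an amplitude ratio and a phase difference at one frequency into a refractive index and an absorption coefficient, using the thin-sample relations that this section cites from Refs. [17–19] as Eqs. (1) and (2). The function name, argument names and numerical values are illustrative assumptions and not part of the original work. The amplitude ratio is assumed to be sample over reference, and the phase difference is assumed to be the unwrapped phase delay of the sample pulse.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def optical_constants(freq_hz, amp_ratio, phase_diff, thickness_m):
    """Evaluate Eqs. (1) and (2) at a single frequency.

    freq_hz     : frequency in Hz (converted to angular frequency below)
    amp_ratio   : |E_sample| / |E_reference| (assumed convention)
    phase_diff  : unwrapped phase delay of the sample pulse, in radians
    thickness_m : sample thickness d, in metres
    Returns (n, alpha) with alpha in 1/m.
    """
    omega = 2.0 * math.pi * freq_hz
    n = phase_diff * C_LIGHT / (omega * thickness_m) + 1.0
    alpha = (2.0 / thickness_m) * math.log(
        4.0 * n / (amp_ratio * (n + 1.0) ** 2)
    )
    return n, alpha


# Illustrative values only (not measured data): a 16 um sample at 1 THz.
n, alpha = optical_constants(1.0e12, 0.90, 0.30, 16e-6)
print(f"n = {n:.3f}, alpha = {alpha / 100.0:.1f} cm^-1")
```

In practice the function would be applied bin by bin to the ratio of the two Fourier spectra, after restricting the analysis to the band where the reference amplitude is well above the noise floor.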
The sample's refractive index n(ω) and absorption coefficient α(ω) describe its dispersion and absorption characteristics, respectively. The parameters are given by [17–19]:

$$n(\omega) = \frac{\varphi(\omega)\,c}{\omega d} + 1 \tag{1}$$

$$\alpha(\omega) = \frac{2\kappa(\omega)\,\omega}{c} = \frac{2}{d}\,\ln\!\left[\frac{4n(\omega)}{A(\omega)\,\bigl(n(\omega)+1\bigr)^{2}}\right] \tag{2}$$

in which d is the thickness of the sample and c is the velocity of light in vacuum; ω and κ(ω) represent the angular frequency and the attenuation coefficient, respectively. A(ω) and φ(ω) are the


Fig. 2 Schematic of a terahertz time-domain transmission spectrometer system used in this work

amplitude ratio and phase difference between the sample and reference signals, which can be obtained directly from the measured spectra.

Orthogonal signal correction

Orthogonal signal correction (OSC) is a chemometric data processing technique that removes systematic variations that are orthogonal to, or unrelated to, the dependent variables (i.e., the cancer class designations) [20]. The important information is retained, and structured noise, such as baseline shifts, instrument variation and changes in measurement conditions, is removed. The PC-OSC method is based on principal component analysis (PCA) and is the least constrained and simplest of the OSC methods. The detailed procedure of PC-OSC can be found in Ref. [16]. The EOSC method was developed and used for the correction of near-infrared spectra. The theory and procedures of EOSC can be found in Ref. [15].

Fuzzy rule-building expert system (FuRES)

FuRES is a classification tree model based on the fuzzy entropy of classification, in which each rule is a temperature-constrained sigmoid logistic function [13]. The classification tree is constructed using the inductive dichotomizer 3 (ID3)

algorithm to minimize the entropy of classification, H(C|A). The length of the weight vector w must be normalized because the fuzziness of each rule is controlled through a computational temperature parameter t. The optimal computational temperature is the one at which the magnitude of the derivative of the entropy of classification with respect to temperature is maximized. The fuzzy membership is given by

$$\chi_A(x_k) = \left[1 + e^{-(x_k w - a)/t}\right]^{-1} \tag{3}$$

for which a is the bias value and χ_A(x_k) is the degree of fuzzy membership of object x_k. The conditional probability p(c_i|a_j) is obtained by summing the membership functions of the objects with attribute a_j that belong to class c_i:

$$p(c_i \mid a_j) = \sum_{k=1}^{n_i} \chi_A(x_k) \Big/ \sum_{k=1}^{n} \chi_A(x_k) \tag{4}$$

where n_i is the number of objects in class c_i. The classification entropy H(C|a_j) of the attribute a_j is given by

$$H(C \mid a_j) = -\sum_{i=1}^{n} p(c_i \mid a_j)\,\ln p(c_i \mid a_j) \tag{5}$$

The classification entropy H(C|A) of the system is the weighted sum of the entropy for each attribute:

$$H(C \mid A) = \sum_{j=1}^{2} p(a_j)\,H(C \mid a_j) \tag{6}$$
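Equations (3) to (6) can be evaluated for a single rule as in the following sketch. It is only an illustration under stated assumptions, not the authors' MATLAB implementation: the rule output x_k·w − a is passed in as a precomputed score, the second branch a_2 is assumed to have membership 1 − χ_A, and p(a_j) is assumed to be the mean membership of the branch. The function names are hypothetical.

```python
import math


def membership(score, t):
    """Eq. (3) with score = x_k . w - a, written with tanh to avoid overflow."""
    return 0.5 * (1.0 + math.tanh(score / (2.0 * t)))


def fuzzy_entropy(scores, labels, t):
    """Return H(C|A) of Eq. (6) for one rule applied to all objects."""
    chi = [membership(s, t) for s in scores]
    # Assumption: branches a_1 and a_2 have memberships chi and 1 - chi.
    branches = (chi, [1.0 - c for c in chi])
    total = 0.0
    for branch in branches:
        mass = sum(branch)
        if mass == 0.0:
            continue
        entropy = 0.0
        for cls in set(labels):
            # Eq. (4): class share of the membership mass in this branch.
            p = sum(m for m, y in zip(branch, labels) if y == cls) / mass
            if p > 0.0:
                entropy -= p * math.log(p)  # Eq. (5)
        # Eq. (6), with p(a_j) assumed to be mass / number of objects.
        total += (mass / len(scores)) * entropy
    return total


# Illustrative scores for six objects (not measured data).
scores = [-2.0, -1.5, -0.5, 0.4, 1.2, 2.5]
labels = ["normal", "normal", "normal", "cancer", "cancer", "cancer"]
for t in (2.0, 0.5, 0.1):
    print(t, round(fuzzy_entropy(scores, labels, t), 3))
```

For scores that separate the classes, a very low temperature makes the rule nearly crisp and the entropy approaches zero, whereas a high temperature pushes all memberships toward 0.5 and the entropy toward its maximum.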

The FuRES model provides inductive logic in the tree structure of the classifier, which can accommodate overlapping data, and the temperature constraint helps to avoid overfitting.

Fuzzy optimal associative memory (FOAM)

The FOAM method is an optimal associative memory (OAM) that encodes multivariate data as a two-way binary image instead of a one-way vector [14]. First, the data are encoded as a bipolar matrix using uniformly sized grid units. A vector of m variables is converted to an m × h bipolar matrix. After removing the u unused grid units, the number of grid units is k = (m × h) − u. The FOAM stores patterns in a weight matrix W, which is expressed by

$$W = \sum_{i=1}^{n} y_i\,y_i^{T} \tag{7}$$

for which y_i is the ith bipolarly encoded pattern. The stored grid-encoded spectra are orthogonalized to form a basis using singular value decomposition. The encoded predicted background scan z_f can be obtained by

$$z_f = V\left(V^{T} z_i\right) \tag{8}$$

for which V is the orthogonalized pattern. Then z_f is decoded to a spectrum by reversing the gridding procedure. A data object is assigned to the best-fitting class by building a basis for each class and selecting the class with the minimum reconstruction error. In this work, FOAM used its standard configuration of 100 intensity grids and a 19-point triangular fuzzy membership function.

Data treatment and computation

To establish a model for the diagnosis of cervical cancer, FuRES and FOAM were used to build classification models, and the classification results of the two methods were compared. The original signal was converted to first derivatives by the Savitzky–Golay filter (first derivative, polynomial order 3, window size 5). Then, PC-OSC or EOSC was applied to the signals to eliminate background and reduce noise. The performance of the preprocessing and modeling approaches was evaluated by the prediction rate. The pretreatment methods were constructed from the training sets and applied to the prediction set. The generalized prediction accuracy was measured by five Latin partitions bootstrapped one hundred times. For each bootstrap, the data were split into training and


prediction sets so that each spectrum was used only once in the prediction set. Four Latin partitions were combined into a model-building set, and the fifth was used for prediction. The results of the five prediction sets from each partition were pooled. This approach was used for all FuRES and FOAM models to measure the prediction accuracies. All models were constructed from the training data. The average classification accuracies and standard deviations were obtained from the prediction sets across the 100 bootstraps to provide 95 % confidence intervals (CIs). The number of components of the OSC model was selected by finding the maximum average classification rate across internal 100 × 5 bootstrap Latin partitions. All parameter optimization and model construction were performed in MATLAB.
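The data treatment described above can be sketched in plain Python as follows. This is a hedged illustration rather than the authors' MATLAB code: the function names are hypothetical, the Savitzky–Golay derivative simply drops the two edge points on each side, the 95 % interval uses a normal approximation (1.96 times the standard deviation divided by the square root of the number of bootstraps), which the paper does not specify, and the nearest-mean classifier is only a stand-in for FuRES or FOAM.

```python
import random
import statistics

# Five-point cubic Savitzky-Golay first-derivative weights (per sample step).
SG_WEIGHTS = (1 / 12, -8 / 12, 0.0, 8 / 12, -1 / 12)


def sg_first_derivative(spectrum):
    """S-G first derivative at interior points; two edge points per side are dropped."""
    half = len(SG_WEIGHTS) // 2
    return [
        sum(w * spectrum[i - half + k] for k, w in enumerate(SG_WEIGHTS))
        for i in range(half, len(spectrum) - half)
    ]


def latin_partitions(labels, n_parts, rng):
    """Split indices into n_parts folds that keep the class proportions."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(n_parts)]
    offset = 0
    for members in by_class.values():
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[(pos + offset) % n_parts].append(idx)
        offset += len(members)
    return folds


def bootstrap_latin(X, labels, fit, predict, n_boot=100, n_parts=5, seed=0):
    """Mean percent correct and 95 % CI half-width over n_boot repartitions."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_boot):
        correct = 0
        for fold in latin_partitions(labels, n_parts, rng):
            held_out = set(fold)
            train = [i for i in range(len(labels)) if i not in held_out]
            # The model sees only the training folds; each object is predicted once.
            model = fit([X[i] for i in train], [labels[i] for i in train])
            correct += sum(predict(model, X[i]) == labels[i] for i in fold)
        rates.append(100.0 * correct / len(labels))
    # Assumption: normal-approximation interval.
    half_width = 1.96 * statistics.stdev(rates) / len(rates) ** 0.5
    return statistics.mean(rates), half_width


# Hypothetical stand-in classifier: nearest class mean on a single feature.
def fit_means(X, y):
    return {c: statistics.mean(x for x, lab in zip(X, y) if lab == c) for c in set(y)}


def predict_nearest(model, x):
    return min(model, key=lambda c: abs(model[c] - x))
```

Any supervised preprocessing step, such as an OSC model, would be fitted inside `fit` on the training folds only, so that the held-out fold never influences the correction.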

Results and discussion

The absorption spectra of all tissues are given in Fig. 3. The signal and the first two principal components for the absorption data preprocessed by the S–G first derivative are given in Figs. 4 and 5, respectively. The numbers of components for PC-OSC and EOSC had to be optimized. The optimal number of components was determined from the average classification result across internal 100 × 5 bootstrap Latin partitions. The EOSC and PC-OSC models were constructed from the training data and applied to the prediction sets. FuRES and FOAM are parameter-free classifiers. FuRES and FOAM each built 500 classification models using the same sets of training and prediction data within each of the 5 Latin partitions and 100 bootstraps. Figure 6 gives the prediction accuracy with respect to the number of EOSC components for the first derivative spectra. FuRES (36 components) achieved the largest average prediction rate of 92.9 ± 0.4 %, and FOAM (36 orthogonal components) yielded the largest average prediction rate of 92.6 ± 0.5 %. For the data

Fig. 3 Absorption spectra for all tissue samples


Fig. 4 Signal for the spectra data preprocessed by S–G first derivative


Fig. 7 Percent correct classification with respect to the number of orthogonal components in PC-OSC and 95 % CIs from 5 Latin partitions and 100 bootstraps for the classification of the tissue thin sections using FuRES and FOAM

Table 1 Comparison of the percent correct classification and 95 % CIs from 100 bootstraps and 5 Latin partitions of FuRES and FOAM with different preprocessing procedures for the test data set

Data preprocessing method        Classification accuracy
                                 FuRES (%)      FOAM (%)
None                             66.4 ± 1.2     57.4 ± 0.8
S–G first derivative             85.3 ± 0.4     79.4 ± 0.7
S–G first derivative + EOSC      92.9 ± 0.4     92.6 ± 0.5
S–G first derivative + PC-OSC    92.9 ± 0.4     92.5 ± 0.4

Fig. 5 Principal component scores for the spectra data preprocessed by S–G first derivative. Normal sample is designated by A and cancerous sample is designated by B

Fig. 6 Prediction rate averages with respect to the number of orthogonal components in EOSC and 95 % CIs from 5 Latin partitions and 100 bootstraps for the classification of the tissue thin sections using FuRES and FOAM

pretreated by EOSC, there was no significant difference in prediction accuracy between FuRES and FOAM. The average prediction rates with respect to the number of PC-OSC components for the first derivative spectra are given in Fig. 7. The maximum classification accuracies obtained by FuRES (34 components) and FOAM (36 components) were 92.9 ± 0.4 and 92.5 ± 0.4 %, respectively.

The average classification accuracies of FuRES and FOAM with the different pretreatments are compared in Table 1. As shown in Table 1, the classification rates of FuRES and FOAM differed clearly when no preprocessing or only the S–G first derivative was applied. Combined with the first derivative, EOSC and PC-OSC were both effective for FuRES and FOAM, but PC-OSC was more convenient because its computational cost is lower than that of EOSC. The optimal FuRES model with the S–G first derivative and PC-OSC pretreatment classified the samples with a prediction accuracy of 92.9 ± 0.4 %. The details of the performance of FuRES and FOAM optimized by the S–G first derivative filter and PC-OSC are contained in two confusion matrices, given in Tables 2 and 3, respectively. The average prediction results of FuRES and FOAM were calculated across the 100 bootstraps and are reported with 95 % CIs. In the two matrices, each row represents the actual class and each column represents the predicted class. The matrices show that both FuRES and FOAM can distinguish between the normal and malignant categories.


Table 2 Confusion matrix obtained from the validations of FuRES classifiers with S–G first derivative filter and PC-OSC

FuRES                      Predicted class
Actual class               Normal         Malignant
Normal (32 samples)        30.1 ± 0.2     1.9 ± 0.2
Malignant (20 samples)     1.8 ± 0.2      18.2 ± 0.2

Table 3 Confusion matrix obtained from the validations of FOAM classifiers with S–G first derivative filter and PC-OSC

FOAM                       Predicted class
Actual class               Normal         Malignant
Normal (32 samples)        29.8 ± 0.2     2.2 ± 0.2
Malignant (20 samples)     1.7 ± 0.2      18.3 ± 0.2
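As a quick arithmetic check, the mean counts in Tables 2 and 3 can be converted to percentages. The sketch below recovers the overall classification rates reported in Table 1 (about 92.9 % for FuRES and 92.5 % for FOAM). It also derives sensitivity and specificity, which are not reported in the paper and are shown only as quantities that follow from the same counts, treating malignant as the positive class. The function and variable names are hypothetical.

```python
def summarize(confusion):
    """Percent accuracy, sensitivity and specificity from confusion[actual][predicted]."""
    tn = confusion["normal"]["normal"]
    fp = confusion["normal"]["malignant"]
    fn = confusion["malignant"]["normal"]
    tp = confusion["malignant"]["malignant"]
    return {
        "accuracy": 100.0 * (tn + tp) / (tn + fp + fn + tp),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }


# Mean counts copied from Tables 2 and 3 (rows: actual, columns: predicted).
FURES = {"normal": {"normal": 30.1, "malignant": 1.9},
         "malignant": {"normal": 1.8, "malignant": 18.2}}
FOAM = {"normal": {"normal": 29.8, "malignant": 2.2},
        "malignant": {"normal": 1.7, "malignant": 18.3}}

for name, table in (("FuRES", FURES), ("FOAM", FOAM)):
    print(name, {k: round(v, 1) for k, v in summarize(table).items()})
```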

Conclusions

In this work, a THz-TDS system was applied to measure normal and malignant cervical tissue sections. Classification models of FuRES and FOAM combined with different pretreatments were established to propose a new technique for diagnosing cervical carcinoma based on terahertz spectra. The effects of the different preprocessing methods on model performance were investigated. Comparison of the classification accuracies obtained with the different preprocessing methods indicated that FuRES and FOAM combined with the S–G first derivative and PC-OSC could provide a useful approach for the early diagnosis of cervical carcinoma from terahertz spectra of tissue, with classification accuracies of 92.9 ± 0.4 and 92.5 ± 0.4 %, respectively. Coupled with terahertz technology, the proposed procedure could provide a convenient, solvent-free and environmentally friendly approach with potential for development as a cancer diagnosis method.

Acknowledgments This work was supported by the National Instrumentation Program (2012YQ140005) and the Natural Science Foundation of China (21275101).

Conflict of interest We declare that we have obeyed the laws and ethics and have no conflicting relationships with other people or organizations that could inappropriately influence our work. The paper does not contain any secret information and can be published in the journal.

References

1. Jemal A, Bray F, Center MM, Ferlay J, Ward E, Forman D. Global cancer statistics. CA Cancer J Clin. 2011;61(2):69–90.
2. Janicek MF, Averette HE. Cervical cancer: prevention, diagnosis, and therapeutics. CA Cancer J Clin. 2001;51(2):92–114.
3. Yu B, Zeng F, Yang Y, Xing Q, Chechin A, Xin X, et al. Torsional vibrational modes of tryptophan studied by terahertz time-domain spectroscopy. Biophys J. 2004;86(3):1649–54.
4. Walther M, Plochocka P, Fischer B, Helm H, Jepsen PU. Collective vibrational modes in biological molecules investigated by terahertz time-domain spectroscopy. Biopolymers. 2002;67(4–5):310–3.
5. Fitzgerald AJ, Berry E, Zinovev NN, Walker GC, Smith MA, Chamberlain JM. An introduction to medical imaging with coherent terahertz frequency radiation. Phys Med Biol. 2002;47(7):R67–84.
6. Woodward RM, Cole BE, Wallace VP, Pye RJ, Arnone DD, Linfield EH, et al. Terahertz pulse imaging in reflection geometry of human skin cancer and skin tissue. Phys Med Biol. 2002;47(21):3853–63.
7. Knobloch P, Schildknecht C, Kleine-Ostmann T, Koch M, Hoffmann S, Hofmann M, et al. Medical THz imaging: an investigation of histo-pathological samples. Phys Med Biol. 2002;47(21):3875–84.
8. Qi N, Zhang Z, Xiang Y. Application of terahertz technology in medical testing and diagnosis. Spectrosc Spectr Anal. 2013;33(8):2064–70.
9. Jung E, Lim M, Moon K, Do Y, Lee S, Han H, et al. Terahertz pulse imaging of micro-metastatic lymph nodes in early-stage cervical cancer patients. J Opt Soc Korea. 2011;15(2):155–60.
10. Brun MA, Formanek F, Yasuda A, Sekine M, Ando N, Eishii Y. Terahertz imaging applied to cancer diagnosis. Phys Med Biol. 2010;55(16):4615–23.
11. Ferguson B, Abbott D. De-noising techniques for terahertz responses of biological samples. Microelectr J. 2001;32(12):943–53.
12. Eadie LH, Reid CB, Fitzgerald AJ, Wallace VP. Optimizing multi-dimensional terahertz imaging analysis for colon cancer diagnosis. Expert Syst Appl. 2013;40(6):2043–50.
13. Harrington PB. Fuzzy multivariate rule-building expert systems: minimal neural networks. J Chemom. 1991;5(5):467–86.
14. Wabuyele BW, Harrington PB. Fuzzy optimal associative memory for background prediction of near-infrared spectra. Appl Spectrosc. 1996;50(1):35–42.
15. Zhang J, Zhang Z, Xiang Y, Dai Y, Harrington PB. An emphatic orthogonal signal correction-support vector machine method for the classification of tissue sections of endometrial carcinoma by near infrared spectroscopy. Talanta. 2011;83(5):1401–9.
16. Harrington PB, Kister J, Artaud J, Dupuy N. Automated principal component-based orthogonal signal correction applied to fused near infrared-mid-infrared spectra of French olive oils. Anal Chem. 2009;81(17):7160–9.
17. Duvillaret L, Garet F, Coutaz J-L. A reliable method for extraction of material parameters in terahertz time-domain spectroscopy. IEEE J Sel Top Quantum Electron. 1996;2(3):739–46.
18. Duvillaret L, Garet F, Coutaz J-L. Highly precise determination of optical constants and sample thickness in terahertz time-domain spectroscopy. Appl Opt. 1999;38(2):409–15.
19. Dorney TD, Baraniuk RG, Mittleman DM. Material parameter estimation with terahertz time-domain spectroscopy. J Opt Soc Am A. 2001;18(7):1562–71.
20. Wold S, Antti H, Lindgren F, Öhman J. Orthogonal signal correction of near-infrared spectra. Chemom Intell Lab Syst. 1998;44(1–2):175–85.