Discrimination of serum Raman spectroscopy between normal and colorectal cancer using selected parameters and regression-discriminant analysis

Xiaozhou Li,1,2,* Tianyue Yang,1 and Siqi Li3

1 School of Science, Shenyang Ligong University, No. 6 Middle Nanping Road, Shenyang, Liaoning, 110159, China

2 School of Physics and Optoelectronic Engineering, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning, 116024, China

3 Solon High School, 33600 Inwood Drive, Solon, Ohio 44139, USA

*Corresponding author: [email protected]

Received 8 July 2011; revised 22 December 2011; accepted 21 May 2012; posted 24 May 2012 (Doc. ID 149933); published 11 July 2012

Raman spectroscopy of tissues has been widely studied for the diagnosis of various cancers, but biofluids have seldom been chosen as the analyte because of their low concentration. Herein, serum samples from 30 normal people, 46 colon cancer patients, and 44 rectum cancer patients were measured by Raman spectroscopy and analyzed. Information on the Raman peaks (intensity and width) and on the fluorescence background (baseline function coefficients) was selected as parameters for statistical analysis. Principal component regression (PCR) and partial least squares regression (PLSR) were applied separately to the selected parameters to assess their diagnostic performance. PCR performed better than PLSR on our spectral data. Linear discriminant analysis (LDA) was then applied to the principal components (PCs) obtained from the two regression methods on the selected parameters, and the diagnostic accuracies were 88% and 83%, respectively. We conclude that the selected parameters retain the information of the original spectra well and that Raman spectroscopy of serum has potential for the diagnosis of colorectal cancer. © 2012 Optical Society of America

OCIS codes: 170.5660, 300.6450.

1559-128X/12/215038-06$15.00/0

APPLIED OPTICS / Vol. 51, No. 21 / 20 July 2012

1. Introduction

The incidence of colorectal cancer ranks fourth in men and third in women, with increasing trends in most areas of the world [1]. The screening methods most often used are occult blood tests, fecal DNA tests, endoscopy, and CT colonography, but all have certain deficiencies such as low sensitivity, complicated procedures, or high equipment requirements [2]. Raman spectroscopy provides vibrational and rotational information about molecules and has been widely used in tissue detection [3]. Raman peaks of biofluids have generally been detected using enhancement methods [4] or some pretreatment of the biofluids such as drying [5]. Research concerning normal Raman spectroscopy (NR) of biofluids has seldom been reported, possibly because of the low sensitivity of NR and the low concentration of biofluids. However, some experiments have shown the feasibility of using the NR of biofluids as a tool for analyzing their components and diagnosing diseases [6,7], and statistical methods or algorithms are essential for improving the diagnostic performance.

Representative features commonly used in spectral analysis are the intensity ratios of the main Raman peaks [8], Raman-fluorescence ratios [9], the full width at half-maximum (FWHM), spectral trends [10], spectral differences [11], and the shifts of Raman peaks. Algorithms such as principal component analysis (PCA) [12], linear discriminant analysis (LDA) [13], and partial least squares (PLS) [14] are often combined with these spectral features. However, these parameters provide only limited information about the whole spectrum and neglect other information such as the fluorescence background, which has been used on blood for the diagnosis of several cancers [6,15]. Richer information can be obtained by using the features of both the Raman peaks and the background fluorescence.

In this paper, spectral parameters that represent both the Raman peaks (position, intensity, and width) and the fluorescence background (the coefficients of the fluorescence baseline function) were selected. Two regression methods, principal component regression (PCR) and partial least squares regression (PLSR), were then used for dimension reduction of the spectral data. Finally, LDA was applied to the principal components (PCs) to evaluate the diagnostic performance.

2. Materials and Methods

A. Serum Sampling

Blood samples were drawn from donors who completed an informed consent form in accordance with the ethical guidelines published by the Council for International Organizations of Medical Sciences (CIOMS) [16]. The experiment involved 30 healthy people, 44 colon cancer patients, and 46 rectum cancer patients. All samples were provided by Liaoning Cancer Hospital & Institute. About 2 mL of venous blood was drawn from each subject in the morning before breakfast to avoid interference from food. No anticoagulant was added to the blood samples, so serum was obtained as the supernatant after centrifugation at 3,000 rpm for 10 min. The serum samples were then stored at 4°C, and the Raman spectra were measured within four days to prevent degradation of their contents.

B. Experimental Setup

Light at 488 nm from an argon-ion laser (No. 772 Factory, Nanjing, China) was used for excitation. A double spectrometer (grating 1200 grooves/mm, reciprocal dispersion 0.7 nm/mm, resolution 0.05 mm, blaze wavelength 500 nm, wavelength precision 0.01 nm, scan speed 400 nm/min) was used to disperse the Raman-scattered light. A chopper operating at 700 Hz and a lock-in amplifier were used to amplify the Raman signal. The Raman signals were detected by a photomultiplier tube (PMT) and processed by a computer (a schematic is shown in Figure 1). The scanned spectral range was 505 to 535 nm, and the laser power was 3.5 mW. During measurement, about 1 mL of serum was placed in a transparent tube between the chopper and the double spectrometer.

C. Statistical Analysis

Raman spectral data analysis was performed using Matlab (2009b, The MathWorks, Inc., Natick, Massachusetts, USA). The Curve Fitting Toolbox and Statistics Toolbox were used in our spectral treatments. Before analysis, the data were treated with least-squares smoothing and vector normalization to reduce the noise and the disturbance caused by laser power fluctuation. Fourteen feature parameters were selected to represent both the Raman and the fluorescence features of our spectra: the position, intensity, and width of the three main Raman peaks (nine parameters), and the first five coefficients of the fluorescence baseline function. The asymmetric Huber function was chosen as the cost function for the fluorescence baseline correction because of its adaptability in the half-quadratic minimization of cost functions, and it has been applied to various kinds of spectroscopy [17]. The polynomial order was set to 10 and the threshold to 0.01.

Analysis of variance (ANOVA) is a significance test for comparing the means of more than two populations. The p value of each parameter was calculated by one-way ANOVA to test its significance, and the parameters with low p values were retained for further statistical analysis.

PCR and PLSR are two methods that reduce the dimension of data. Both reduce the original data to several PCs that represent the data; the difference is that PLSR takes the response variables into consideration, whereas PCR does not. Although the two methods are theoretically related, they perform differently depending on the data to which they are applied [18]. LDA produces a linear classifier (Fisher's linear function) that separates different groups of data, and it is often combined with regression methods such as PCR and PLSR. PCR/PLSR-LDA was applied to the selected parameters to assess the effectiveness of the parameters and the ability of serum Raman spectroscopy to diagnose colorectal cancer. The algorithm was then validated by 10-fold cross validation, in which the original data were randomly separated into 10 subgroups and each partition was tested by a model built with the rest.

Fig. 1. (Color online) Schematic drawing of the Raman spectroscopy system for serum detection.

3. Results and Discussion

A. Raman Spectroscopy

Fig. 2. (Color online) All Raman spectra from the three groups: normal (sample numbers 1–30), rectum cancer (31–74), and colon cancer (75–120). Three major Raman peaks exist in each spectrum; the shape and trend of the fluorescence background are similar across the groups.

Fig. 3. (Color online) Averaged spectrum of each group. Three Raman peaks at Raman shifts of about 1029 cm−1, 1170 cm−1, and 1538 cm−1 are present in all three groups.
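The peak positions are quoted as Raman shifts in cm−1, whereas the spectra were scanned in wavelength (505 to 535 nm under 488 nm excitation) and Table 1 lists the peak wavelengths in nm. As a minimal sketch of how the two scales relate, the Python snippet below converts wavelength to Raman shift. It assumes an excitation wavelength of exactly 488.0 nm and ignores any air-to-vacuum correction; the function name `raman_shift_cm1` is illustrative and is not taken from the Matlab analysis described here.

```python
def raman_shift_cm1(scattered_nm: float, excitation_nm: float = 488.0) -> float:
    """Return the Raman shift (cm^-1) of light scattered at scattered_nm.

    Assumption: excitation_nm defaults to 488.0 nm, the laser wavelength
    stated in the experimental setup; no air/vacuum correction is applied.
    """
    # 1 nm = 1e-7 cm, so a wavelength in nm has wavenumber 1e7 / nm in cm^-1.
    return 1e7 / excitation_nm - 1e7 / scattered_nm


# Mean peak wavelengths (nm) of the normal group for R1, R2, R3 (Table 1).
peak_wavelengths_nm = [513.7481, 517.6153, 527.6930]
peak_shifts = [raman_shift_cm1(w) for w in peak_wavelengths_nm]

# Limits of the scanned spectral range reported in the experimental setup.
scan_range = (raman_shift_cm1(505.0), raman_shift_cm1(535.0))

for wavelength, shift in zip(peak_wavelengths_nm, peak_shifts):
    print(f"{wavelength:.4f} nm -> {shift:.0f} cm^-1")
print(f"scan range: {scan_range[0]:.0f} to {scan_range[1]:.0f} cm^-1")
```

Under these assumptions the three mean peak wavelengths of the normal group map to roughly 1027, 1172, and 1541 cm−1, within a few cm−1 of the positions quoted in the text, and the scanned range corresponds to roughly 690 to 1800 cm−1.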

Raman spectra of the healthy, colon cancer, and rectum cancer groups were compared after smoothing and normalization. The shape and trend of the spectra of the three groups were very similar: there are three main Raman peaks at about 1029 cm−1, 1170 cm−1, and 1538 cm−1, and the outlines of the fluorescence background are almost the same (Figure 2). Electron-rich groups (e.g., C═O, C═N, and C═C) are the major source of Raman scattering [19], and many Raman peaks are caused by the same Raman-sensitive group belonging to different biomolecules [20]. However, there are still some visible differences: the bands at 1170 cm−1 (assigned to tryptophan and phenylalanine [8]) and 1538 cm−1 (assigned to beta carotene [8]) decrease from the control group to the cancer groups. The differences between groups can be observed more clearly in the averaged spectra of each group (Figure 3). The decrease represents a corresponding decrease of tryptophan, phenylalanine, and beta carotene from the control group to the colorectal cancer groups, and these three substances all have anticancer effects. We presume that these three chemical components in serum are key factors that counteract cancer cells, or that the decrease of these substances induces cancer formation.

B. Selection of Parameters

Apart from the apparent differences at the last two Raman peaks (1170 cm−1 and 1538 cm−1), some small differences existed (e.g., in the fluorescence background and in the correlation between the two kinds of spectroscopy). To identify the differences, 14

Table 1. Mean and Standard Deviation of the 14 Selected Spectroscopy Parameters^a

Values are Mean (SD).

Parameter               Normal              Rectum cancer       Colon cancer
Wavelength (nm)
  R1                    513.7481 (0.3444)   513.8434 (0.4778)   513.8137 (0.4508)
  R2                    517.6153 (0.2998)   517.5278 (0.3033)   517.5257 (0.3780)
  R3                    527.6930 (0.2995)   527.5819 (0.3161)   527.5814 (0.3739)
Relative Intensity
  R1                    0.0398 (0.000897)   0.0399 (0.000703)   0.0405 (0.000631)
  R2                    0.0445 (0.000973)   0.0434 (0.000637)   0.0429 (0.000490)
  R3                    0.0488 (0.001450)   0.0465 (0.001110)   0.0447 (0.000932)
Peak Width
  R1                    5.7675 (0.9845)     8.4595 (2.7356)     14.2915 (10.4755)
  R2                    3.2418 (0.4357)     4.2574 (0.6025)     5.7041 (1.2132)
  R3                    2.8541 (0.3701)     3.6207 (0.4773)     4.9157 (1.2365)
Function Coefficients
  1st                   0.2320 (0.1201)     0.4211 (0.1242)     0.6269 (0.1353)
  2nd                   0.6372 (0.1499)     0.6010 (0.1539)     0.4476 (0.1756)
  3rd                   −0.5556 (0.1711)    −0.6369 (0.2278)    −0.7611 (0.1889)
  4th                   0.1057 (0.1156)     0.2449 (0.1172)     0.4549 (0.1797)
  5th                   −0.0531 (0.0609)    −0.0939 (0.0890)    −0.1773 (0.1112)

p value (per parameter, in the row order above):

0.652 0.454 0.294