
Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 161 (2016) 58–63



An auto-adaptive background subtraction method for Raman spectra

Yi Xie a, Lidong Yang a, Xilong Sun a, Dewen Wu a, Qizhen Chen b, Yongming Zeng b, Guokun Liu c,⁎

a Fujian Key Laboratory of Sensing and Computing for Smart City, School of Information Science and Engineering, Xiamen University, China
b College of Chemistry and Chemical Engineering, Xiamen University, China
c State Key Laboratory of Marine Environmental Science, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China

Article info

Article history: Received 25 November 2015; received in revised form 6 February 2016; accepted 22 February 2016; available online 24 February 2016.

Keywords: Raman spectrum; Background subtraction; Auto-adaptive

Abstract

Background subtraction is a crucial step in the preprocessing of Raman spectra. Usually, the parameters of a background subtraction method must be tuned manually to remove the background efficiently, which makes the quality of the resulting spectrum dependent on the operator's experience. To avoid such artificial bias, we propose an auto-adaptive background subtraction method that requires no parameter adjustment. The main procedure is to (1) select the local minima of the spectrum while preserving major peaks, (2) apply an interpolation scheme to estimate the background, and (3) use an iteration scheme to improve the adaptability of the background subtraction. Both simulated data and Raman spectra have been used to evaluate the proposed method. Comparison with the backgrounds obtained from three widely applied methods (the polynomial method, Baek's method and airPLS) shows that the auto-adaptive method meets the demands of practical applications in terms of efficiency and accuracy.

© 2016 Elsevier B.V. All rights reserved.

1. Introduction

By providing fingerprint information, Raman spectroscopy has been widely applied in material characterization and identification [1]. However, the Raman signal of a target may be obscured or swamped by background arising from the fluorescence of the surrounding medium, of contaminants [2,3] or of the target itself. Under these conditions, background subtraction is required to obtain a reliable Raman spectrum for further analysis.

Various strategies have been applied to subtract the background from Raman spectra. The polynomial method [4], one of the most widely applied methods in commercial Raman software, estimates the background by least-squares polynomial fitting. The degree of the polynomial is manually adjusted according to the profile of the Raman spectrum [5], and the fitting quality is improved by applying the IasLS method [6] during the polynomial fitting. Wavelet transforms play a core role in other background subtraction methods [7–9]. These methods can estimate the background when the mother wavelet and the decomposition level are chosen appropriately [10]. Although the above-mentioned methods remove the background from Raman spectra accurately and efficiently, their performance is significantly affected by parameter tuning. Such inconvenience hinders, to some extent, the wide application of the Raman technique. For example, a non-specialist user of a handheld Raman spectrometer prefers to obtain reliable data directly, without any complicated data analysis.

⁎ Corresponding author. E-mail address: [email protected] (G. Liu).

http://dx.doi.org/10.1016/j.saa.2016.02.016 1386-1425/© 2016 Elsevier B.V. All rights reserved.

In order to meet this demand, methods based on least squares, such as the airPLS method [11], were proposed to reduce the demands on the user's experience [12]. By introducing an iteration scheme that adaptively adjusts the weight vector for background estimation, airPLS needs only one adjustable parameter, λ, whose influence is mostly compensated by the iteration scheme. Therefore, the background subtraction obtained by airPLS is less sensitive to parameter values than that of the above-mentioned methods. Furthermore, Baek et al. [13] proposed an automatic method without parameter adjustment, which subtracts the background once the spectral peaks have been quickly detected using a derivative operation. However, Baek's method loses accuracy when Raman peaks overlap.

By incorporating the merits of Baek's method and airPLS, we propose an auto-adaptive background subtraction method free of parameter tuning. Here, the derivative of the raw data, smoothed three times in succession, is regarded as the derivative of the background. An iteration scheme is used to improve the adaptability and to overcome the issue of peak overlap. We use both simulated data and Raman spectra (from different targets and from the same target under different conditions) to evaluate the auto-adaptive method, which is compared with three reported methods: the polynomial method, Baek's method and airPLS. The proposed method is more accurate than the polynomial method and Baek's method, and slightly less accurate than airPLS with its optimal parameter. The experimental results demonstrate that the auto-adaptive method works well in subtracting the background under different conditions and provides performance comparable to airPLS, one of the best methods reported for background subtraction.


2. Materials and methods


In general, the intersections of the raw data and its background curve are the local minima of the spectrum without background, as shown in Fig. 1(a). Therefore, we can detect the locations of the local minima of a spectrum and appropriately fill in data between these minima to estimate the background. The auto-adaptive method performs four main operations to obtain the background: smoothing, local minimum detection, interpolation between local minima, and an iterative procedure.

2.1. Smoothing operation

Smoothing is necessary and is used throughout the proposed method. The Savitzky–Golay filter [14] is selected because it acts as a generalized moving-average filter that smooths the Raman signal without significantly destroying the original characteristics of the raw data. The raw input data are expressed as a vector $x[j]$, $j = 1, \dots, L_x$, where $L_x$ is the vector length. Using the Savitzky–Golay filter, the smoothed vector $y = \mathrm{smooth}(x, 2N+1)$ is calculated according to Eq. (1) with a span of $2N+1$. In this smoothing function, the filter coefficients $w_t$ are derived from an unweighted linear least-squares fit using a polynomial of degree 2.

$$ y[j] = \sum_{t=-N}^{N} w_t\, x[j-t] = w_{-N}\, x[j-N] + \dots + w_N\, x[j+N] \tag{1} $$

Because the Savitzky–Golay smoothing coefficients are symmetric ($w_{-t} = w_t$), the order of the indices in the expanded sum is immaterial.

2.2. Local minimum detection

Searching for the local minima of a Raman spectrum is a critical step in estimating the background. In this paper, the curve of the raw data is denoted as $s(q)$, where $q$ is the wavenumber. As shown in Eq. (2), $s(q)$ is the sum of the Gaussian peaks $g(q)$ and the background $b(q)$, and the derivative of $s(q)$ is then expressed as Eq. (3). Therefore, $g(q)$ can be regarded as the spectrum without background, whose derivative can be calculated by Eq. (4).

$$ s(q) = g(q) + b(q) \tag{2} $$

$$ \frac{ds(q)}{dq} = \frac{dg(q)}{dq} + \frac{db(q)}{dq} \tag{3} $$

$$ dg \triangleq \frac{dg(q)}{dq} = \frac{ds(q)}{dq} - \frac{db(q)}{dq} \tag{4} $$

Here, $ds(q)/dq$ can be approximated by Eq. (5), since the raw spectrum is discrete and denoted as a vector $s$, where $l$ is the sample index corresponding to $q$. In order to reduce the influence of noise, we apply the smoothing operation twice, before and after $\mathrm{diff}(s)$, as shown in Eq. (6). $db(q)/dq$ can be calculated by Eq. (7), since the first derivative of the background is similar to the derivative of the raw data smoothed three times in succession (denoted as s3ds) [13]. The spans of the Savitzky–Golay filter, $L_n$ and $L_b$, are set to appropriate values.

$$ ds \triangleq \frac{ds(q)}{dq} \approx \mathrm{diff}(s) = s[l] - s[l-1] \tag{5} $$

$$ ds = \mathrm{smooth}(\mathrm{diff}(\mathrm{smooth}(s, L_n)), L_n) \tag{6} $$

$$ \frac{db(q)}{dq} \triangleq \mathrm{s3ds} = \mathrm{smooth}(\mathrm{smooth}(\mathrm{smooth}(ds, L_b), L_b), L_b) \tag{7} $$
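As a minimal sketch of Eqs. (1) and (4)–(7), the derivative estimates could be computed as follows. The closed-form quadratic Savitzky–Golay weights, the clamping of indices at the ends of the spectrum, the reduction of an even span by one, and all function names are assumptions made for illustration; the text does not specify the end treatment, and this is not the authors' C implementation.

```python
def sg_weights(n_half):
    """Quadratic Savitzky-Golay smoothing weights w_t for t = -N..N (span 2N+1)."""
    m = n_half
    denom = (2 * m + 3) * (2 * m + 1) * (2 * m - 1)
    return [3 * (3 * m * m + 3 * m - 1 - 5 * t * t) / denom
            for t in range(-m, m + 1)]


def smooth(x, span):
    """Eq. (1): y[j] = sum over t of w_t * x[j - t].

    Assumptions: an even span is reduced by one, and indices that fall
    outside the vector are clamped to the nearest end sample.
    """
    n_half = (span - 1) // 2
    if n_half < 1:
        return list(x)
    w = sg_weights(n_half)
    last = len(x) - 1
    return [sum(w[t + n_half] * x[min(max(j - t, 0), last)]
                for t in range(-n_half, n_half + 1))
            for j in range(len(x))]


def diff(s):
    """Eq. (5): s[l] - s[l-1]; the first element is set to 0 (assumption)
    so that the output keeps the length of the input."""
    return [0.0] + [s[l] - s[l - 1] for l in range(1, len(s))]


def derivative_estimates(s, span_n, span_b):
    """Return (ds, s3ds, dg) for a spectrum s."""
    ds = smooth(diff(smooth(s, span_n)), span_n)   # Eq. (6)
    s3ds = ds
    for _ in range(3):                             # Eq. (7): three successive smoothings
        s3ds = smooth(s3ds, span_b)
    dg = [a - b for a, b in zip(ds, s3ds)]         # Eq. (4)
    return ds, s3ds, dg
```

With the spans reported in Section 3.2, a call would read `ds, s3ds, dg = derivative_estimates(s, 140, 6)`, where `s` is the raw spectrum as a list of floats.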

Therefore, the first derivative $dg$ is obtained on the basis of Eqs. (4), (6) and (7). As shown in Fig. 1(b), the local minima of $g(q)$ exist where the sign of $dg$ changes from negative to positive. We then search for the local minima by checking the signs of the elements of $dg$, and record the indexes of all minimum candidates $p_k$, $k = 1, \dots, K$. In order to avoid interference from noise and unexpected factors, the local minima are screened according to the following rules.

1. If the region between two adjacent local minima $p_k$ and $p_{k+1}$ is smaller than the span of the Savitzky–Golay smoothing, i.e. $(p_{k+1} - p_k) < L_n$, then both minima are deleted. This efficiently removes false local minima caused by noise.
2. Consider an effective region between two adjacent local minima, $p_k < l < p_{k+1}$. We draw a line segment from $s(p_k)$ to $s(p_{k+1})$, denoted as $\mathrm{pline}_k(l)$. If $\mathrm{pline}_k(l)$ and the curve segment intersect at some point, we find the index of $\min_{p_k < l < p_{k+1}} s(l)$, denoted as $p'_{k+1}$, and then replace $p_{k+1}$ with $p'_{k+1}$.

Fig. 1. (a) The local minima of simulated data. (b) Simulated data and its first derivative. (c) Overlapping peaks in simulated data. (d) Estimated background by four methods. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

2.3. Interpolation operation

The background is estimated by interpolating values between two adjacent local minima whose indexes are $p_k$ and $p_{k+1}$. Baek et al. [13] have shown that simple linear interpolation cannot yield a satisfactory background. The proposed method therefore builds the segment of the background curve $b_k(l)$ within $[p_k, p_{k+1}]$ according to its first derivative $db_k(l)$, which can be estimated by $\mathrm{s3ds}_k(l)$ according to Eq. (7). According to the definition of integration for discrete data and Eq. (7), $b_k(l)$ can be approximated as the cumulative sum of $db_k(l)$ plus the line segment $\mathrm{pline}_k(l)$, as shown in Eq. (8). Here, $\overline{\mathrm{s3ds}_k}$, the average of $\mathrm{s3ds}_k(l)$, acts as an adjustment value.

$$ b_k(l) = \mathrm{pline}_k(l) + \sum_{i=p_k}^{l} \left( \mathrm{s3ds}_k(i) - \overline{\mathrm{s3ds}_k} \right), \quad p_k < l < p_{k+1} \tag{8} $$
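The candidate search of Section 2.2, Rule 1, and the segment construction of Eq. (8) might be sketched as below. This is an illustrative reading rather than the authors' code: the function names are hypothetical, a zero in $dg$ is counted as positive, Rule 1 is checked once over the original adjacent pairs, the segment mean is taken over the closed interval $[p_k, p_{k+1}]$, and Rule 2 is omitted for brevity.

```python
def minimum_candidates(dg):
    """Indexes where dg changes sign from negative to positive.

    Assumption: a value of exactly zero is treated as positive.
    """
    return [l for l in range(1, len(dg)) if dg[l - 1] < 0 <= dg[l]]


def apply_rule_1(p, span_n):
    """Rule 1: drop both members of every adjacent pair closer than span_n."""
    drop = set()
    for a, b in zip(p, p[1:]):
        if b - a < span_n:
            drop.update((a, b))
    return [q for q in p if q not in drop]


def segment_background(s, s3ds, pk, pk1):
    """Eq. (8) for l = pk..pk1: the chord from s[pk] to s[pk1] plus the
    cumulative sum of the mean-removed smoothed derivative.

    Because the deviations from the mean sum to zero over the segment,
    the last value equals s[pk1].
    """
    seg = s3ds[pk:pk1 + 1]
    mean = sum(seg) / len(seg)
    out, acc = [], 0.0
    for l in range(pk, pk1 + 1):
        acc += s3ds[l] - mean
        pline = s[pk] + (s[pk1] - s[pk]) * (l - pk) / (pk1 - pk)
        out.append(pline + acc)
    return out
```

Subtracting the segment mean inside the sum is what keeps each segment anchored to the spectrum at the next minimum; with a constant `s3ds` the result reduces to the straight chord.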

Finally, the background $b(l)$ consists of all curve segments of $b_k(l)$, $k = 1, \dots, K$. Since the background is a slowly changing and relatively smooth curve [9], the proposed method adopts the smoothing operation to achieve the final result of estimated background $b$, shown as Eq. (9).

$$ b = \mathrm{smooth}\Big( \bigcup_{\forall k} b_k,\; L_b \Big) \tag{9} $$

2.4. Iterative procedure

In order to enhance the adaptability of the auto-adaptive method, an iterative procedure is introduced to improve the estimated background in complex situations. For example, the simulated raw data (the blue curve) shown in Fig. 1(c) consist of two overlapping peaks. The background obtained from a single pass of local minimum detection and interpolation (the green dashed curve) is quite far from the known background (the black dashed curve), which may be due to the intersection of the two peaks. We therefore assume that the estimated background can itself be regarded as a curve with Gaussian peaks. The iterative procedure then takes the estimated background obtained by each interpolation operation as the input spectrum data $s$ for the next cycle. In the new cycle, the local minimum detection and interpolation operation are repeated to obtain a newly estimated background. The iterations continue until the threshold number of cycles is reached, which ensures that the background is smooth enough. When the iteration stops, the final background $b$ is obtained, and the spectrum without background is calculated by a simple subtraction, $g = s - b$. Scheme 1 summarizes the procedure by which the proposed auto-adaptive method obtains $g$, the spectrum without the background $b$, from the raw spectrum data $s$. Here, "count" represents the number of iterations, whose threshold is 5.

Scheme 1. The flow of the auto-adaptive method for background subtraction.
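The control flow summarized in Scheme 1 can be sketched as a short loop in which each estimated background is fed back as the input of the next cycle, for a fixed number of cycles, before the final subtraction. In the sketch below, `one_pass` stands for a single detection-plus-interpolation pass; `clip_to_neighbour_mean` is a deliberately simple, hypothetical stand-in used only to make the example runnable and is not the estimator described in Sections 2.2 and 2.3.

```python
def iterate_background(s, one_pass, max_count=5):
    """Feed each estimated background back as the next input, then subtract.

    s         : raw spectrum (list of floats)
    one_pass  : function mapping a curve to an estimated background
    max_count : threshold number of iterations (5 in the text)
    """
    b = list(s)
    for count in range(max_count):
        b = one_pass(b)
    g = [si - bi for si, bi in zip(s, b)]   # g = s - b
    return b, g


def clip_to_neighbour_mean(s):
    """Hypothetical stand-in for one estimation pass (an assumption, not the
    paper's method): each interior point is clipped to the mean of its two
    neighbours, which flattens peaks and leaves a straight baseline unchanged."""
    b = list(s)
    for l in range(1, len(s) - 1):
        b[l] = min(s[l], 0.5 * (s[l - 1] + s[l + 1]))
    return b
```

Passing the single-pass estimator as an argument keeps the loop independent of how one pass is implemented, so the stand-in can be replaced by a full implementation without changing the iteration logic.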


2.5. Simulated data

Simulated data are often used to evaluate background subtraction methods because the known background in a simulation can be regarded as a benchmark. A simulated spectrum $s(l)$ can be expressed as the sum of the Gaussian peaks $g(l)$ and a linear or curved background $b(l)$, where $l$ is the wavenumber. The simulated raw data used in this paper are shown in Fig. 1(d), where $b(l)$ is created by Eq. (11), $g(l)$ is created by Eq. (12) with six Gaussian peaks, and $l = 1, 2, \dots, 600$.

$$ b(l) = 40 \exp\left( \frac{-(l-300)(l-300)}{100000} \right) \tag{11} $$

$$ g(l) = 100 \exp\left( \frac{-(l-100)(l-100)}{50} \right) + 200 \exp\left( \frac{-(l-500)(l-500)}{100} \right) + 150 \exp\left( \frac{-(l-470)(l-470)}{100} \right) + 130 \exp\left( \frac{-(l-440)(l-440)}{100} \right) \tag{12} $$

2.6. Analysis of simulated data

The proposed method is compared with three typical methods: (1) the automatic Baek's method based on peak detection, (2) the widely used polynomial method, and (3) the airPLS method specifically designed for commercial Raman spectrometers. A quantitative indicator, the root mean square error (RMSE), is introduced to measure the difference between the known background and the estimated background. The expression of the RMSE is shown in Eq. (13), where $b_i$ is the estimated background and $b_i^0$ is the known background created by Eq. (11). The closer the estimated background is to the known background, the smaller the RMSE and the higher the accuracy of the background subtraction method.

$$ \mathrm{RMSE} = \sqrt{ \sum_i \left( b_i - b_i^0 \right)^2 / n } \tag{13} $$

The key parameters of the airPLS and polynomial methods are adjusted manually, so their RMSE results are influenced by the choice of these parameters. For a better description, the parameters that influence the accuracy of the background in the airPLS and polynomial methods, i.e. the penalty coefficient and the degree of the polynomial fit, are denoted as λ and Degree, respectively. In the following, we use the best results obtained by varying these parameters for comparison. Fig. 1(d) depicts the backgrounds estimated by the four methods, and Table 1 presents the corresponding RMSE of each procedure. Baek's method has the highest RMSE value owing to its weakness in dealing with regions of overlapping peaks. The influence of λ and Degree on the RMSE of the airPLS and polynomial methods is shown in Fig. S1(a–b) and Table S1. The results obtained with the optimal parameters (Degree = 2 and λ = 10^5, which give the lowest RMSE for each method) are marked with an asterisk in Table 1. The best result of airPLS is similar to the result of the proposed auto-adaptive method, as shown in Fig. 1(d). The polynomial method has a higher RMSE than the airPLS and auto-adaptive methods because it under-fits near the end of the spectrum, as shown in Fig. 1(d). Nevertheless, the curves obtained after background removal by the polynomial, airPLS and auto-adaptive methods are almost identical to each other, as shown in Fig. S2, since their backgrounds are nearly indistinguishable, as shown in the inset of Fig. 1(d).

Table 1. Comparison of the RMSE of estimated backgrounds by four methods.

Methods            Parameters    RMSE
The Baek's         Fixed         8.969245
The auto-adaptive  Fixed         0.360581
The polynomial     Degree = 1    16.17316
                   Degree = 2    1.368723 *
                   Degree = 3    4.495035
The airPLS         λ = 10^4      11.327078
                   λ = 10^5      0.311919 *
                   λ = 10^6      0.571081

* Result obtained with the optimal parameter.

3. Experimental

3.1. Raman spectra

Experimental Raman spectra were collected using the DeltaNu Inspector Raman instrument (785 nm excitation laser with a power of 60 mW; spectral range from 200 to 2000 cm−1 with an 1800 lines/mm grating) and the B&WTek i-Raman instrument (785 nm excitation laser with a power of 250 mW; spectral range from 175 to 3200 cm−1 with a 1000 lines/mm grating). Surface-enhanced Raman spectra of three target molecules, Fenthion, Ractopamine and Sodium thiocyanate, were obtained by mixing the gold colloid and the target solution in a 1:1 volume ratio. The gold colloid used to obtain the surface-enhanced Raman spectroscopy (SERS) signal of these molecules was purchased from PERSer Nanotechnology LTD. (Xiamen).

3.2. Programming

The proposed auto-adaptive method is implemented in the C language with the GVIM tool under the Windows OS (version 7.4). The spans of the smoothing operation are fixed at $L_n = 140$ and $L_b = 6$; after several iterations, the final result is insensitive to these values. The threshold of iterations is set to 5 for the sake of algorithm efficiency. After analyzing a large number of spectrum samples, we find that this configuration allows the proposed method to satisfy the accuracy requirements of the applications.

4. Results and discussion
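For reference, the two benchmark quantities used in the simulated-data analysis, the known background of Eq. (11) and the RMSE of Eq. (13), can be written as a short sketch. The function names are illustrative, and the shifted curve at the end is a hypothetical estimate invented only to exercise the function; it is not a result from the paper.

```python
import math


def known_background(l):
    """Eq. (11): b(l) = 40 * exp(-(l - 300)(l - 300) / 100000)."""
    return 40.0 * math.exp(-(l - 300) * (l - 300) / 100000.0)


def rmse(estimated, known):
    """Eq. (13): root mean square error between two backgrounds of equal length."""
    n = len(known)
    return math.sqrt(sum((b - b0) ** 2 for b, b0 in zip(estimated, known)) / n)


# Known background on the grid l = 1, 2, ..., 600 used for the simulated data.
b0 = [known_background(l) for l in range(1, 601)]

# Hypothetical estimate: the known background shifted up by 0.5 everywhere.
# A constant offset of 0.5 gives an RMSE of 0.5, up to rounding.
shifted = [v + 0.5 for v in b0]
error = rmse(shifted, b0)
```

The same `rmse` call applies to any estimated background sampled on the same 600-point grid.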

As discussed in the analysis of simulated data, the backgrounds obtained by either the polynomial method or the airPLS method change significantly when the key parameters are tuned. The optimal parameters of these two methods differ for the SERS spectra of different materials, as shown in Table 2, and the related backgrounds are displayed in Figs. S5 to S10. In the following, the proposed method is compared with the polynomial method and the airPLS method, each with its optimal parameters, in terms of accuracy. The results obtained by Baek's method are not discussed because of their low fitting quality; they are given in Fig. S3.

Table 2. The optimal parameters of the polynomial and airPLS methods in dealing with SERS spectra of Fenthion, Ractopamine and Sodium thiocyanate.

Methods         Materials           Optimal parameter
The polynomial  Fenthion            Degree = 5
                Ractopamine         Degree = 7
                Sodium thiocyanate  Degree = 6
The airPLS      Fenthion            λ = 10^6
                Ractopamine         λ = 10^7
                Sodium thiocyanate  λ = 10^5

Fig. 2(a) shows the backgrounds obtained by these three methods for the SERS spectrum of Fenthion at a concentration of 1 ppm.



Fig. 2. (a) Backgrounds of the SERS spectrum of 1 ppm Fenthion by three methods. The inset shows the enlarged region from 900 to 1300 cm−1. (b) Backgrounds of the SERS spectrum of 1 ppm Ractopamine by three methods.

All three methods clearly estimate the background in an acceptable way. Nevertheless, the polynomial method provides only the basic background trend and leaves some background residue, as shown in the inset of Fig. 2(a), owing to its low fitting quality when dealing with continuous Raman peaks. Furthermore, both the polynomial and airPLS methods under-fit near the end of the spectrum, as shown in the inset of Fig. 2(a). Similar results are obtained for the SERS spectrum of 1 ppm Ractopamine in Fig. 2(b). It should be mentioned that the airPLS method estimates the background more reliably in the range from 200 to 400 cm−1, where the Raman signal is cut off at 200 cm−1 by the edge filter used in the current Raman facility. In contrast, the polynomial and auto-adaptive methods tend to fit the background directly down to the cut-off edge instead of following the trend of the whole background. For the remaining region of the Raman spectrum, the auto-adaptive method gives a background estimate comparable to that of airPLS, with identical characteristics, as shown in Fig. 2(b). This results in similar SERS spectra after background subtraction for the auto-adaptive and airPLS methods, as shown in Fig. S4(b), whereas the baseline obtained by the polynomial method drifts upward.

Fig. 3. (a) Backgrounds of the SERS spectrum of 1 ppm Sodium thiocyanate by three methods. (b) Backgrounds of the SERS spectrum of Sodium thiocyanate mixed in milk with different concentrations by the auto-adaptive method. (c) Part from 300 to 1700 cm−1 of (a). (d) The normalization of data after background correction.


The weak fitting quality near the cut-off edge prompted us to explore whether these methods can estimate a reliable background when the Raman spectrum contains unexpected defects, such as a flat line of saturated Raman intensity induced by either too high a laser power or too long an integration time. Such defects can hardly be avoided, owing either to the fixed scanning range of the Raman spectrometer or to non-professional operation in practical applications. A typical example is shown in Fig. 3(a) with the SERS spectrum of a diluted milk solution containing 1 ppm Sodium thiocyanate. The background obtained by either the proposed auto-adaptive method or the airPLS method is little affected by the saturated part in the range of 200–300 cm−1. The two methods output similar backgrounds with a negligible difference, and the resulting Raman spectra provide the same information, as shown in Fig. 3(c).

We further investigated the reliability of the auto-adaptive method on a series of spectra with similar Raman characteristics by increasing the Sodium thiocyanate concentration to 4, 5 and 6 ppm in milk samples. As shown in Fig. 3(b), with increasing concentration not only does the background change nonlinearly, but the region of saturated Raman intensity also widens under the same collection time. (It should be mentioned that such spectra are unacceptable to any Raman specialist, but they occur constantly when non-specialist users operate portable Raman instruments. Therefore, it is highly desirable to reduce such unexpected effects as far as possible automatically, by software.) Not surprisingly, the auto-adaptive method provides good background curves for all three concentrations, displayed as dashed lines in Fig. 3(b). The normalized Raman spectra after background removal, shown in Fig. 3(d), demonstrate that neither the nonlinear background nor the fluctuating saturated-intensity region has much effect on the final result. All three Raman spectra display the same Raman characteristics with the same relative Raman intensities (all these signals are related to the proteins and fats contained in milk), except for the signal from SCN− located at ~2130 cm−1, which is concentration dependent.

5. Conclusion

An auto-adaptive method of background subtraction has been proposed to remove the unwanted background from raw Raman spectra automatically and reliably. The proposed method is inspired by Baek's method, which automatically estimates a background using derivative operations on the raw data, and it borrows the iteration scheme of the airPLS method to enhance accuracy and adaptability. According to the analysis of simulated data and real spectra, the proposed method outperforms Baek's method and the polynomial method, and achieves accuracy similar to that of airPLS, a commercial method whose performance depends on the experience of the operator. The advantage of this method is that it can handle different Raman spectra without parameter tuning and preserves the fingerprint information of the spectrum, which is significant for substance identification. In general, the proposed method is feasible, convenient and easy to popularize. It has been successfully deployed on a server that provides handheld Raman spectrometers with a network service for estimating the backgrounds of Raman spectra in real time.

Acknowledgments

We acknowledge the support from the National Special Fund for Major Research Equipment and Instruments of China (No. 2011YQ03012417), the National Natural Science Foundation of China (No. 21473140, 61379157), the Shenzhen City Special Fund for Strategic Emerging Industries (No. JCYJ20120830153030584), and the Scientific Research Fund of Sichuan Provincial Department of Science and Technology (No. 2014SZ0107, 2015GZ0333).

Appendix A. Supplementary data

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.saa.2016.02.016.

References

[1] Z.H. Sun, M.Z. Huang, Z.G. Yu, Y. Ji, Y. Wang, Laser Optoelectron. Prog. 51 (2014) 1–7.
[2] Z.M. Zhang, S. Chen, Y.Z. Liang, Z.X. Liu, Q.M. Zhang, L.X. Ding, F. Ye, H. Zhou, J. Raman Spectrosc. 41 (2010) 659–669.
[3] S. Chen, X.N. Li, Y.Z. Liang, Z.M. Zhang, Z.X. Liu, Q.M. Zhang, L.X. Ding, F. Ye, Spectrosc. Spectr. Anal. 30 (2010) 2157–2160.
[4] X.W. Feng, Z.L. Zhu, M.J. Shen, P.S. Cong, Comput. Appl. Chem. 26 (2009) 759–762.
[5] Z.J. Qin, Z.H. Tao, J.X. Liu, G.W. Wang, Spectrosc. Spectr. Anal. 33 (2013) 383–386.
[6] S.X. He, W. Zhang, L.J. Liu, Y. Huang, J.M. He, W.Y. Xie, P. Wu, C.L. Du, Anal. Methods 6 (2014) 4402–4407.
[7] Y.G. Hu, T. Jiang, A.G. Shen, W. Li, X.P. Wang, J.M. Hu, Chemom. Intell. Lab. Syst. 85 (2007) 94–101.
[8] P.M. Ramos, I. Ruisánchez, J. Raman Spectrosc. 36 (2005) 848–856.
[9] A.E. Villanueva-Luna, J. Castro-Ramos, S. Vazquez-Montiel, A. Flores-Gil, J.A. Delgado-Atencio, E.E. Orozco-Guillen, Opt. Mem. Neural Netw. 19 (2010) 310–317.
[10] G. Li, FCC '09 Proceedings of the 2009 ETP International Conference on Future Computer and Communication, New York, 2009, pp. 198–200.
[11] Z.M. Zhang, S. Chen, Y.Z. Liang, Analyst 135 (2010) 1138–1146.
[12] P.F. Gao, R. Yang, J. Ji, H.M. Guo, Q. Hu, L.X. Ding, S.L. Zhuang, Spectrosc. Spectr. Anal. 35 (2015) 1281–1285.
[13] S.J. Baek, A. Park, J. Kim, A.G. Shen, J.M. Hu, Chemom. Intell. Lab. Syst. 98 (2009) 24–30.
[14] A. Savitzky, M.J.E. Golay, Anal. Chem. 36 (1964) 1627–1639.