Classification of Brazilian Coffee Using Near-Infrared

This article was downloaded by: [Miss Lura Embrick] On: 29 November 2012, At: 05:37 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Analytical Letters Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lanl20

Classification of Brazilian Coffee Using Near-Infrared Spectroscopy and Multivariate Calibration

Klécia M. Santos (a), Maria F. V. Moura (a), Francisco G. Azevedo (a), Kássio M. G. Lima (a), Ivo M. Raimundo Jr. (b) & C. Pasquini (b)

(a) Institute of Chemistry, Grupo de Pesquisa em Quimiometria Aplicada, Federal University of Rio Grande do Norte (UFRN), Natal, Brazil
(b) Institute of Chemistry, University of Campinas (UNICAMP), Campinas, Brazil

Accepted author version posted online: 15 Feb 2012. Version of record first published: 08 May 2012.

To cite this article: Klécia M. Santos, Maria F. V. Moura, Francisco G. Azevedo, Kássio M. G. Lima, Ivo M. Raimundo Jr. & C. Pasquini (2012): Classification of Brazilian Coffee Using Near-Infrared Spectroscopy and Multivariate Calibration, Analytical Letters, 45:7, 774-781 To link to this article: http://dx.doi.org/10.1080/00032719.2011.653905


Analytical Letters, 45: 774–781, 2012
Copyright © Taylor & Francis Group, LLC
ISSN: 0003-2719 print/1532-236X online
DOI: 10.1080/00032719.2011.653905

Vibrational Spectroscopy

Downloaded by [Miss Lura Embrick] at 05:37 29 November 2012

CLASSIFICATION OF BRAZILIAN COFFEE USING NEAR-INFRARED SPECTROSCOPY AND MULTIVARIATE CALIBRATION

Klécia M. Santos,1 Maria F. V. Moura,1 Francisco G. Azevedo,1 Kássio M. G. Lima,1 Ivo M. Raimundo, Jr.,2 and C. Pasquini2

1 Institute of Chemistry, Grupo de Pesquisa em Quimiometria Aplicada, Federal University of Rio Grande do Norte (UFRN), Natal, Brazil
2 Institute of Chemistry, University of Campinas (UNICAMP), Campinas, Brazil

This work describes the use of near infrared spectroscopy (NIRS) and chemometric techniques for the classification of coffee samples from different lots and producers acquired in supermarkets and roasting industries in some Brazilian cities. Seventy-three samples of finely ground roasted coffee acquired in the market and 91 samples of roasted ground Arabica beans were analyzed in the full NIR spectral range (800–2500 nm) using a diffuse reflectance accessory coupled to an MB160 Bomem spectrophotometer. Two classification models were constructed: Soft Independent Modeling of Class Analogy (SIMCA) and PLS Discriminant Analysis (PLS-DA). The results indicate that NIR spectroscopy, coupled with either SIMCA or PLS-DA multivariate models, can be a useful tool to differentiate roasted coffee grains and to replace sensory tests.

Keywords: Coffee; NIR spectroscopy; PLS-DA; SIMCA

Received 26 April 2011; accepted 8 September 2011.
In memoriam.
The authors thank CNPq, CAPES/PROCAD, and UFRN/PPGQ.
Address correspondence to Kássio M. G. Lima, Institute of Chemistry, Grupo de Pesquisa em Quimiometria Aplicada, Federal University of Rio Grande do Norte (UFRN), CEP 59072-970, Natal, Brazil. E-mail: [email protected]

INTRODUCTION

Coffee is recognized throughout the world as one of the most widely consumed products in international markets. Of the more than 60 species of coffee commercialized, the two most economically important are Coffea arabica L., valued for its aroma and taste, and Coffea canephora Pierre (robusta), used primarily in blends sold in internal markets at lower prices. The large amount of coffee produced in Brazil is due to the climatic conditions that favor its cultivation. Consequently, there are several brands on the Brazilian market. These are produced by mixing different coffee beans to form blends that result in flavors and aromas from different species, regions, crops, and so forth (Farah et al. 2006).

Thus, the types of coffee sold in Brazil can be classified into three main classes: Traditional, consisting of Arabica beans or those blended with up to 30% robusta and sold at lower prices; Gourmet, composed of 100% Arabica beans at higher prices; and Decaffeinated, with low caffeine content. Furthermore, coffee can also be classified by type of drink, in which sensory tests (cup testing) are conducted to assess a number of parameters, such as aroma, acidity, bitterness, taste, astringency, and so forth, so that at the end of the test it is assigned to one of the following classes: Soft, pleasant aroma and sweet taste; Hard, bitter taste; Rioysh, slightly chemical taste with typical iodoform flavor; Rio, inferior flavor with excessive chemical taste (Franca, Mendonca, and Oliveira 2005). This difference in quality results from varying consumer demands. The better the coffee the higher the price, and this association between price and quality makes coffee a "target" product for adulterations (Assad et al. 2002).

Given the economic relevance of coffee, studies on its composition, fraud detection and quality assessment are extremely important. For this reason, several quality parameters must be considered, such as chemical and sensory analyses. However, two problems arise in this case: the subjectivity of sensory analysis, and the need for reagents and specialized analytical instrumentation in chemical analyses (Morgano et al. 2007). Given that composition depends on the formulation of coffee blends and varies as a function of roasting conditions, it is important to differentiate and classify coffees according to their chemical and/or sensory characteristics (Pizarro, Esteban-Díez, and González-Sáiz 2007).
Techniques such as near infrared spectroscopy (NIRS), combined with diffuse reflectance and chemometric tools (Naes et al. 2002) such as SIMCA (Wang, Wang, and Ma 2006; Flaten, Grung, and Kvalheim 2004) and PLS-DA (Bylesjö et al. 2006; Chevallier et al. 2006), have been used successfully in monitoring food quality parameters, as they provide rapid and efficient results compared to conventional methods and do not require reagents, which benefits the environment. Therefore, in this study we used NIRS combined with multivariate analysis to construct models that classify samples of roasted and ground coffee, as a rapid, reliable, and non-destructive alternative to the sensory tests currently in use.

EXPERIMENTAL SECTION

Samples

Seventy-three samples of roasted and ground coffee belonging to the traditional (37), gourmet (27) and decaffeinated (9) classes were obtained from supermarkets in the states of São Paulo and Rio Grande do Norte, Brazil. Ninety-one samples of soft (25), hard (26), rioysh (25) and rio (15) coffee were obtained from the SUMATRA coffee factory in the city of Espírito Santo do Pinhal, São Paulo state; these had previously been classified by cup testing performed by specialized SUMATRA tasters. These samples consisted of dry-processed blends of arabica and robusta varieties, which are the two most widely used species in the Brazilian industry. All the soft, hard, rioysh and rio samples were ground to standardize the coffee grain size. To this end, 6 g of each sample was ground in an electric grinder (Cuisinart DCG-20) for 40 seconds and sifted through 0.25 mm sieves.


Equipment

The spectra in the near infrared range of 14,000–4000 cm⁻¹ (714–2500 nm) were recorded in triplicate using an MB-160 D FT-NIR spectrometer (Bomem) with a spectral resolution of 8 cm⁻¹ and 50 co-added scans. A mean spectrum was then calculated for each sample by averaging the triplicate spectra. The spectrum of a polytetrafluoroethylene (PTFE) sample was used as background.

Data Analysis and Processing

Classification models were constructed based on soft independent modeling of class analogy (SIMCA) and partial least squares regression for discriminant analysis (PLS-DA). SIMCA is a classification method based on disjoint PCA modeling: a separate PCA model is built for each class in the calibration set. Unknown samples are then compared to the class models and assigned to classes according to their analogy with the calibration samples. A new sample is recognized as a member of a class if it is similar enough to the other members; otherwise it is rejected.

PLS-DA consists of a classical PLS regression in which the response variable is binary and expresses class membership. Therefore, PLS-DA cannot assign a sample to any group other than those defined at the outset. As a consequence, all measured variables play the same role with respect to the class assignment. The PLS latent variables are built to find a compromise between two purposes: describing the set of explanatory variables and predicting the response variables.

Data pretreatment for the SIMCA and PLS-DA models included calculation of first-derivative spectra using a first-order Savitzky–Golay filter with 9 smoothing points, and normalization using multiplicative scatter correction (MSC). PCA, SIMCA and PLS-DA were carried out in Unscrambler® 9.8 (CAMO SA) with the default settings of the software.
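The pretreatment described above can be illustrated with a minimal sketch. It is not the Unscrambler implementation used in this work: the function names (msc, savgol_first_derivative, preprocess) are hypothetical, the order of operations (MSC first, then derivative) is an assumption, and the derivative is expressed per data point rather than per wavelength unit. For a first-order Savitzky–Golay fit, the first derivative at the window center reduces to the least-squares slope over the window, which is what the sketch computes.

```python
from statistics import mean


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum is fitted as offset + slope * reference by least
    squares and then corrected as (spectrum - offset) / slope.
    """
    n_points = len(spectra[0])
    reference = [mean(s[j] for s in spectra) for j in range(n_points)]
    ref_mean = mean(reference)
    ref_var = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        slope = sum((r - ref_mean) * (x - s_mean)
                    for r, x in zip(reference, s)) / ref_var
        offset = s_mean - slope * ref_mean
        corrected.append([(x - offset) / slope for x in s])
    return corrected


def savgol_first_derivative(spectrum, window=9):
    """First derivative from a first-order (linear) Savitzky-Golay fit.

    Assumption: the half-window at each end of the spectrum is dropped
    instead of being padded or extrapolated.
    """
    half = window // 2
    norm = sum(k * k for k in range(-half, half + 1))  # 60 for window=9
    return [
        sum(k * spectrum[i + k] for k in range(-half, half + 1)) / norm
        for i in range(half, len(spectrum) - half)
    ]


def preprocess(spectra, window=9):
    """Apply MSC, then the first-derivative filter, to every spectrum."""
    return [savgol_first_derivative(s, window) for s in msc(spectra)]
```

Spectra that differ only by an additive offset and a multiplicative factor collapse onto the same curve after msc, which is the scatter effect the correction is meant to remove; the derivative step then suppresses any remaining constant baseline.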
RESULTS AND DISCUSSION

Figure 1a shows the set of NIR spectra for the 164 coffee samples: 91 belonging to the soft (25), hard (26), rioysh (25), and rio (15) classes, and 73 belonging to the traditional (37), gourmet (27), and decaffeinated (9) classes. To minimize the effects of undesirable random variations in the spectra, mathematical pretreatments were applied to the original spectra before construction of the models (SIMCA and PLS-DA) so that these variations did not influence the final results. The following methods were applied to each spectrum: Savitzky-Golay smoothing to improve the signal-to-noise ratio and multiplicative signal correction (MSC) to minimize the effects of light scattering. The Savitzky-Golay first derivative with a 21-point window, used to correct baseline displacement, is shown in Figure 1b.

Figure 1. (a) Set of original spectra containing 164 samples of coffee; and (b) set of spectra after application of mathematical techniques: Savitzky-Golay smoothing, multiplicative signal correction (MSC), and first derivative.

After pretreatment, the total set of spectra was divided into two subsets, one with 73 coffee samples and another with 91 samples, to construct two distinct classification models. A calibration set with 54 samples belonging to the soft, hard, rioysh and rio classes, and another set with 65 gourmet, traditional and decaffeinated coffee samples were used to construct SIMCA models, based on PCA graphs for each data set. Cross-validation was used to validate the models developed.

Figure 2 shows the score plots for the PCA models constructed for the coffee samples. After modeling the set of spectra, it was found that 2 PCs were enough to characterize the four coffee classes (soft, hard, rioysh, and rio). Good inter-class discrimination was observed for the different samples. Figure 2a shows the classification model per type of coffee. There was well defined separation of coffee classes, which can be seen in the quadrants of the score plot. Samples are distributed along the first principal component, which explains the highest data variance (92%), while the second principal component accounts for only 2% of the variance. Figure 2b illustrates the classification model for the coffee samples by quality category. There was good separation between the different types of coffee. PC1 explains 84% of data variance, whereas PC2 accounts for only 8% of total variance.

Figure 2. Score graph. (a) Soft, hard, rioysh, and rio classes; and (b) Gourmet, traditional, and decaffeinated.

To determine the predictive capacity of the SIMCA model constructed, two external validation sets were used: one containing 37 coffee samples [hard (10), soft (10), rioysh (10), rio (7)] and another with 28 samples for the gourmet (10), traditional (14), and decaffeinated (4) classes. Table 1 contains the results obtained for the SIMCA model at a 95% confidence level.

Table 1. Results (classification errors in the test set) obtained for the SIMCA model

Samples        Soft PC(3)  Hard PC(5)  Rioysh PC(3)  Rio PC(5)  Error (%)  Hit (%)
Soft (10)      0           0           0             0          0          –
Hard (10)      0           0           0             1          1.25       –
Rioysh (10)    0           0           0             0          0          –
Rio (7)        0           3           0             0          3.75       –
Total                                                           5.0        95

Samples            Gourmet PC(3)  Traditional PC(4)  Decaffeinated PC(3)  Error (%)  Hit (%)
Gourmet (10)       0              3                  0                    5.55       –
Traditional (14)   1              0                  0                    1.85       –
Decaffeinated (4)  1              0                  0                    1.85       –
Total                                                                     9.25       90.75

Numbers between parentheses indicate the number of samples used for predicting.

The values on the main diagonal of Table 1 represent the number of type I errors, that is, samples that should fall within their own class but did not. The remaining numbers indicate type II errors (samples that belonged to a given class and fell into another). The numbers in parentheses indicate the number of prediction samples chosen for each class. Ten prediction samples were chosen for the soft type, all of which were classified within their own class, generating 0% prediction error. Of the 10 predicted hard samples, only one was classified incorrectly into the rio class, resulting in an error of 1.25%. The rioysh and rio samples obtained errors of 0% and 3.75%, respectively. Thus, the 37 coffee samples showed a hit index of 95%. Furthermore, 10, 14, and 4 prediction samples were selected for the gourmet, traditional, and decaffeinated classes, respectively. For the gourmet class, three samples fell into the traditional class, generating an error of 5.55%. Traditional and decaffeinated samples obtained the same error percentage of 1.85%.
Therefore, for the 28 samples, a hit percentage of 90.8% was obtained in the prediction model. These results reveal that the SIMCA model constructed was quite satisfactory in both cases. However, the soft and rioysh classes exhibited better prediction responses.

PLS-DA models were constructed, using the same training set previously mentioned, in order to assess the classification capacity of the models investigated. The PLS-DA model maximizes the covariance between the measured data (X) and the response variable (Y), and its predictive integrity was assessed through the parameter Q². The PLS-DA modeling for the four coffee classes (soft, hard, rioysh and rio) revealed R²X and R²Y values above 0.90 and Q² values between 0.29 and 0.35. The PLS-DA modeling for the coffee samples by quality category (traditional, gourmet and decaffeinated) revealed R²X and R²Y values above 0.88 and Q² values between 0.34 and 0.55. Table 2 shows the results for the set of samples using PLS-DA at a 95% confidence level.

Table 2. Results (classification errors in the test set) obtained for the PLS-DA model

Samples        Soft  Hard  Rioysh  Rio  Error (%)  Hit (%)
Soft (10)      0     0     1       0    1.25       –
Hard (10)      0     0     0       0    0.00       –
Rioysh (10)    1     0     0       0    1.25       –
Rio (7)        0     2     0       0    2.50       –
Total                                   5.0        95

Samples            Gourmet  Traditional  Decaffeinated  Error (%)  Hit (%)
Gourmet (10)       1        0            0              1.85       –
Traditional (14)   0        1            0              1.85       –
Decaffeinated (4)  0        0            0              0          –
Total                                                   3.70       96.3

Numbers between parentheses indicate the number of samples used for predicting.

Both the soft and rioysh classes showed an error of 1.25%, while the rio class obtained an error of 2.5%. By contrast, the hard class had a 100% prediction rate. Thus, the hit index for the set of predictions of these samples was 95%. In the case of the gourmet and traditional groups, both classes obtained a 1.85% error rate, while decaffeinated coffee showed 100% hits. Altogether this data set obtained a hit rate of 96.3%.

The PLS-DA loadings were used to identify potential components responsible for the separation of the coffee groups. The relevant variables for the differentiation of the four classes investigated (soft, hard, rioysh, and rio) were the regions at 1920 nm (2nd C=O stretch overtone of CONH amides), 1750 nm (1st C-H stretch overtone of the CH3 group), 1730 nm (1st C-H stretch overtone of the (CH2)n group), and 2150 nm (combination band of C-H + C=C bonds from the HC=CH group). It should be underscored that the combination of NIR spectroscopy and multivariate analysis is important, given that the chemical differences between the coffee samples can be visualized (Chevallier et al. 2006). The models constructed indicated good performance in terms of separating the different types of coffee, as determined by sensory analyses.
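To make the PLS-DA quantities discussed above concrete, the following is a minimal sketch of a two-class PLS-DA with a single latent variable, together with a cross-validated Q². It is an illustration under stated assumptions, not the Unscrambler procedure used in this work: the function names (fit_pls1, predict_pls1, q2_leave_one_out) are hypothetical, class membership is coded as 0/1, only one latent variable is extracted, Q² is taken as 1 − PRESS/TSS, and leave-one-out is assumed as the cross-validation scheme (the type of cross-validation is not specified in the text).

```python
def fit_pls1(X, y):
    """One-latent-variable PLS1 fit on a binary (0/1) response."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in X-space with maximum covariance with y
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    w_norm = sum(v * v for v in w) ** 0.5
    w = [v / w_norm for v in w]
    # Scores of the calibration samples on the latent variable
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regression of the centered response on the scores
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, q


def predict_pls1(model, row):
    """Predicted response for one sample; values near 1 suggest class 1."""
    x_mean, y_mean, w, q = model
    score = sum((x - m) * wj for x, m, wj in zip(row, x_mean, w))
    return y_mean + q * score


def q2_leave_one_out(X, y):
    """Q2 = 1 - PRESS / TSS using leave-one-out predictions."""
    press = 0.0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        y_hat = predict_pls1(fit_pls1(X_train, y_train), X[i])
        press += (y[i] - y_hat) ** 2
    y_mean = sum(y) / len(y)
    tss = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / tss
```

In a sketch like this, a sample would typically be assigned to the class when its predicted response exceeds 0.5; that threshold is an assumption, not a setting reported here. With more than two classes, one such binary response is usually built per class.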
It is known that the quality parameters assessed depend primarily on the chemical composition of the beans, which contain innumerable volatile and non-volatile chemical compounds, including acids, sugars, proteins, amino acids, phenolic compounds, lipids, and so forth (Flaten et al. 2004). This diversity of substances characterizes the different coffee flavors: the species Coffea arabica exhibits higher concentrations of carbohydrates, lipids, and trigonelline and is considered to be of higher quality, whereas Coffea robusta exhibits higher levels of phenols and caffeine, which make the coffee more bitter and astringent (Bylesjö et al. 2006). Thus, coffee composition may be an important factor to explain the separation of different classes in the suggested classification models, especially with respect to the soft, hard, rioysh, and rio types. Functional groups are identified over the entire spectrum; however, some variables are relevant to data interpretation and are more intense in a number of wavelengths, which is the case of caffeine and lipids. Gourmet coffee contains more lipids and less caffeine than the traditional type, while decaffeinated contains low caffeine levels.

Classifying roasted and ground coffee is a complex task, since the composition of the final product is affected by two main factors, blends and adulteration, which hinder class discrimination. However, this study obtained significant results, given that it separated classes with different blends. Another relevant point was the separation of coffees by type, a subjective classification hitherto performed by sensory testing (cup testing). NIR spectroscopy identified different coffee categories better than other analytical techniques such as HPLC (De Maria et al. 1994) and MID (Díez, Saiz, and Pizarro 2004), which require more analysis time and the use of reagents. Moreover, the diffuse reflectance technique was very important for obtaining the spectra, demonstrating its capacity and efficiency when used in particulate material analyses.

CONCLUSIONS

This study proposes new techniques for classifying coffee samples using near infrared spectroscopy and chemometric classification methods (SIMCA and PLS-DA). Both classification methods showed excellent predictive capacity for the coffee samples. Thus, the method proposed proved to be a powerful and promising tool for classifying roasted and ground coffee. The methodology developed requires no sample pretreatment and may eventually replace sensory analysis testing.

REFERENCES

Assad, E. D., E. E. Sano, S. A. R. Cunha, T. B. S. Correa, and H. R. Rodrigues. 2002. Identificação de impurezas e misturas em pó de café por meio de comportamento espectral e análise de imagens digitais. Pesquisa Agropecuária Brasileira. 37: 211–216.
Bylesjö, M., M. Rantalainen, O. Cloarec, J. K. Nicholson, E. Holmes, and J. Trygg. 2006. OPLS discriminant analysis: combining the strengths of PLS-DA and SIMCA classification. J. Chemom. 20: 341–351.
Chevallier, S., D. Bertrand, A. Kohler, and P. Courcoux. 2006. Application of PLS-DA in multivariate image analysis. J. Chemom. 20: 221–229.
De Maria, C. A. B., L. C. Trugo, R. F. A. Moreira, and C. C. Werneck. 1994. Composition of green coffee fraction and their contribution to the volatile profile formed during roasting. Food Chem. 50: 141–145.
Díez, I. E., J. M. G. Saiz, and C. Pizarro. 2004. Prediction of roasting colour and other quality parameters of roasted coffee samples by near infrared spectroscopy. A feasibility study. J. Near Infrared Spectrosc. 12: 287–297.
Farah, A., M. C. Monteiro, V. Calado, A. S. Franca, and L. C. Trugo. 2006. Correlation between cup quality and chemical attributes of Brazilian coffee. Food Chem. 98: 373–380.
Flaten, G. R., B. Grung, and O. M. Kvalheim. 2004. A method for validation of reference sets in SIMCA modeling. Chemom. Intell. Lab. Systems. 72: 55–66.


Franca, A. S., J. C. F. Mendonca, and S. D. Oliveira. 2005. Composition of green and roasted coffees of different cup qualities. LWT–Food Sci. Technol. 38: 709–715.
Morgano, M. A., C. G. de Faria, M. F. Ferrao, and M. M. C. Ferreira. 2007. Determination of total sugar in raw coffee using near infrared spectroscopy and PLS regression. Quim. Nova. 30: 346–350.
Naes, T., T. Isaksson, T. Fearn, and T. Davies. 2002. A User-Friendly Guide to Multivariate Calibration and Classification. Chichester, UK: J. Wiley and Sons.
Pizarro, C., I. Esteban-Díez, and J. M. González-Sáiz. 2007. Mixture resolution according to the percentage of robusta variety in order to detect adulteration in roasted coffee by near infrared spectroscopy. Anal. Chim. Acta. 585: 266–276.
Wang, J. J., F. Wang, and L. Ma. 2006. The quality assessment of cigarette paper by SIMCA and PLS combined with near infrared spectrum. Spectrosc. Spect. Anal. 26: 1858–1862.