Computer-based tutorial on chemometrics Joseph Dubrovkin
Copyright © 2017 Joseph Dubrovkin All rights reserved
What is Chemometrics?
Chemometrics is “the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data” (www.elsevier.com/locate/chemometrics).
The literature on the subject was collected from scattered sources (chapters of manuals and books, articles and tutorials).
There are many references:
1. Chemometric Techniques for Quantitative Analysis, R. Kramer, 215 p., Marcel Dekker (1998)
2. Factor Analysis in Chemistry, 3rd ed., E. Malinowski, 432 p., Wiley (2002)
3. Applied Regression Analysis, 3rd ed., N. Draper, H. Smith, 709 p., Wiley (1998)
4. Basic Statistics and Pharmaceutical Statistical Applications, 2nd ed., J. E. De Muth, 714 p., Taylor and Francis (2006)
5. A User's Guide to Principal Components, J. E. Jackson, 592 p., Wiley (1991)
6. Pattern Recognition: Statistical, Structural and Neural Approaches, R. Schalkoff, 364 p., Wiley (1992)
7. Multivariate Analysis of Quality: An Introduction, H. Martens, M. Martens, 466 p., Wiley (2001)
8. Multivariate Calibration, H. Martens, T. Naes, 416 p., Wiley (1989)
9. Chemometrics: Statistics and Computer Application in Analytical Chemistry, M. Otto, 314 p., Wiley-VCH (1999)
10. Chemometrics: A Practical Guide, K. Beebe, R. Pell, M. B. Seasholtz, 348 p., Wiley (1998)
11. Chemometrics in Environmental Analysis, J. W. Einax, H. W. Zwanziger, S. Geiss, 404 p., VCH (1997)
There are many commercial software packages for chemometrics.
But the reference sources usually do not present the logical steps: Theory, Formulas, Calculations.
It is difficult for nonprofessionals to understand the parameters.
To make sure the output data are correct, one must check them.
All estimates are taken "on faith" by an analyst-practitioner with only a basic knowledge of mathematics and statistics. It is common "to jump" from boring theory straight to the formulas, since mathematical proofs are absent. Blind use of software may give unpredictable results, for example, the decomposition of an unresolved spectrum into "a forest" of pure components.
The foregoing prompted us to write
The Open Source Computer-based Tutorial on Chemometrics
1. The tutorial intends to provide readers (practitioners, researchers and students) with a comprehensive presentation of chemometrics, illustrated by various examples supplied with MATLAB code.
2. The reader can validate the numerical data given in the guide, understand the details of each algorithm and, if necessary, modify the corresponding computer program.
3. An interactive graphical user interface presents figures and results of numerical simulations over a wide range of model parameters.
Chemometrics
Goal: get, organize and display the most important chemical information and data in the most efficient manner.
Tools: statistics; mathematical methods.
Concepts: chemical data analysis; design; pre-processing; pattern recognition; multivariate curve resolution; calibration (linear and nonlinear, univariate and multivariate); process analytical control.
Process Analytical Control
Counterfeit tablets
Octane number determination
O. Rodionova et al., Anal. Chim. Acta, 549, 151 (2005)
MATLAB is the software package
Cleve Moler (University of New Mexico) started developing MATLAB in the late 1970s. MATLAB was first adopted by researchers and practitioners in control engineering. It is now widely used in education, in particular for teaching linear algebra and numerical analysis, and is popular among scientists involved in one- and two-dimensional signal processing.
Programming components: vectors, matrices, structures, functions, classes and objects.
Visualization: plain text, 2D and 3D graphics, graphical user interface.
Software compatibility: MATLAB can interoperate with other programming languages on the same computer and can read data written in different formats.
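A minimal sketch of a few of the programming components listed above (vector, matrix, structure, function). All variable and field names here are illustrative assumptions and are not taken from the tutorial's code.

```matlab
% Vector and matrix
v = [1 2 3];                      % 1-by-3 row vector
M = [1 2; 3 4];                   % 2-by-2 matrix
Mt = M';                          % matrix transpose

% Structure with named fields (hypothetical field names)
band.position = 120;
band.width = 50;

% Anonymous function: unit-amplitude Gaussian band,
% where w is the full width at half maximum
gauss = @(x, x0, w) exp(-4*log(2)*((x - x0)/w).^2);

% The band equals 1 at its centre and 0.5 half a width away
peakValue = gauss(band.position, band.position, band.width);
halfValue = gauss(band.position + band.width/2, band.position, band.width);
```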
EXAMPLE: Non-interactive Interface
Decomposition of the trans-stilbene fluorescence spectrum into 4 components using the asymmetrical log-normal function. Isranalytica-2017. Manual processing.
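The idea of decomposing a spectrum into components can be illustrated with a much simpler case than the one named above. The sketch below is an assumption-laden toy: it uses symmetric Gaussian bands instead of the asymmetrical log-normal function, it treats the band positions and widths as known, and it estimates only the amplitudes by linear least squares. All numbers are hypothetical and do not come from the trans-stilbene data.

```matlab
% Toy decomposition: known band shapes, unknown amplitudes.
x = (1:300)';                     % abscissa (channel index)
pos     = [100 150 190];          % assumed band positions
width   = [40 40 40];             % assumed full widths at half maximum
ampTrue = [1000; 600; 300];       % amplitudes used to simulate the data

% Each column of B is one unit-amplitude Gaussian band
B = zeros(numel(x), numel(pos));
for k = 1:numel(pos)
    B(:,k) = exp(-4*log(2)*((x - pos(k))/width(k)).^2);
end

y = B*ampTrue;                    % noise-free synthetic spectrum
ampEst = B\y;                     % least-squares amplitude estimates
residual = norm(y - B*ampEst);    % close to zero for noise-free data
```

With noise-free data the estimated amplitudes reproduce the simulated ones. When positions and widths are also unknown, the problem becomes nonlinear and needs an iterative fit, which is where a poor model or poor starting values can produce misleading components.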
EXAMPLE: Interactive Interface
RESULTS
Dataset 1. Dependences RMSE(σ_a) obtained by averaging of the regression matrix. c_pr = (0.333, 0.333, 0.333) (a) and (0.8, 0.1, 0.1) (b); σ_s = 0 (●) and 10 (■); n_cl = 1, 5, 10 and 100 from the top to the bottom curves, respectively. All plots (●) for all n_cl, and plots (■) for n_cl = 10 and 100 (panel a), merge.
Manual change of the plot parameters
Manual change of the model parameters

function [VectorAmpl,VectorImax,VectorWidth] = ...
    inputData( dataSet )
%For Gaussians data sets
start=104;
finish=203;
switch dataSet
    case 'dataset1'
        VectorAmpl=[1 1 1]*1e3;
        VectorImax=[120 160 180];
        VectorWidth=[50 50 50];