Applying Machine Learning Techniques to Chemical ...

Applying Machine Learning Techniques to Chemical Sensor Data

Eric Nallon, Qiliang Li

Department of Electrical and Computer Engineering, George Mason University, Fairfax, VA 22033

Motivation

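Baseline correction: $R_{\mathrm{Baseline}} = \frac{R - R_0}{R_0}$

L2 normalization: $R_{L2} = \frac{R}{\sqrt{\sum_i x_i^2}}$

Standardization (z-score): $z = \frac{X - \mu}{\sigma}$

[Figure label: Olfactory Bulb]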

[Figure: data analysis flow, with blocks labeled Sensor Data, Preprocessing, Feature Extraction, Dimensionality Reduction / Visualization, Classification]

[Figure label: Olfactory Receptors]

Data Preprocessing

The goal of preprocessing is to reduce noise and irrelevant information so that the data are conditioned for the subsequent analysis steps. Noise can be reduced by performing baseline correction and drift compensation on the sensor data.

[Figure: Sensor Drift Correction]
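As a minimal sketch of the preprocessing step, the snippet below applies the three normalizations that appear on the poster: fractional change relative to a baseline value R0, L2 normalization, and z-score standardization. The function names, the sample resistance values, and the choice of the first sample as R0 are illustrative assumptions, not the authors' implementation. Drift compensation is not covered here.

```python
import math
from statistics import mean, pstdev


def baseline_correct(readings, r0):
    """Fractional change relative to the baseline: (R - R0) / R0."""
    return [(r - r0) / r0 for r in readings]


def l2_normalize(values):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def z_score(values):
    """Standardize to zero mean and unit variance.

    Assumption: the population standard deviation is used.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical resistance readings (ohms) from a single sensor.
resistance = [100.0, 100.0, 110.0, 125.0, 120.0]
r0 = resistance[0]  # assumption: the first sample is taken as the baseline R0

corrected = baseline_correct(resistance, r0)
unit_length = l2_normalize(corrected)
standardized = z_score(corrected)
```

Which normalization is appropriate depends on the sensor and on the later analysis steps; the three are shown side by side only for comparison.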

Machine Learning Overview

Machine learning is a field that deals with learning from data for the purpose of pattern recognition and classification. Broadly speaking, machine learning can be separated into two subfields:

Unsupervised learning is commonly used as an exploratory data analysis tool to uncover and exploit correlations within data. Some important techniques include principal component analysis (PCA), K-means clustering, and hierarchical clustering. Linear discriminant analysis (LDA) is used in a similar exploratory way, although it relies on class labels.

Supervised learning aims to learn from a labeled portion of the data in order to predict the correct class labels for unknown data. Most supervised learning algorithms are given an M x N feature matrix, with M samples and N features. The features are attributes that describe the input object and are ultimately used for discrimination. Some important techniques include k-nearest neighbors (KNN), support vector machines (SVM), decision trees, and neural networks.

Unsupervised and supervised learning are often used in conjunction when building a complete system. For example, data sets can become very large and difficult to process accurately for classification. PCA and/or LDA may be used to reduce the dimensionality of the original data, creating a smaller data set that still describes the input object.

Feature Extraction and Selection

One of the most important steps in the analysis process is the extraction of features, which describe characteristics of the sensor response. Features can contain information related to the physical properties of the chemical that the sensor is exposed to. Some useful metrics include maximum signal change, signal-to-noise ratio, and response and recovery times.

[Figure: sensor response with labeled features: Max Change, Response Time, Recovery Time]

[Figure: Principal Component Analysis]
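The feature metrics named in the Feature Extraction and Selection text (maximum signal change, signal-to-noise ratio, response time, recovery time) can be sketched as follows. The definitions used here are assumptions for illustration: response time is taken as the time to reach 90% of the maximum change after exposure starts, recovery time as the time to fall back to 10% of the maximum change after exposure ends, and noise as the standard deviation of the pre-exposure baseline. The function name, parameters, and sample trace are hypothetical.

```python
from statistics import mean, pstdev


def extract_features(t, signal, t_on, t_off, frac=0.9):
    """Extract simple response features from one exposure cycle.

    t, signal : sample times and sensor readings (equal length)
    t_on/t_off: exposure start and end times
    frac      : threshold fraction (assumption: the common 90% convention)
    Assumes a positive-going response and at least two pre-exposure samples.
    """
    pre = [s for ti, s in zip(t, signal) if ti < t_on]
    baseline = mean(pre)
    noise = pstdev(pre)  # baseline noise estimate

    delta = [s - baseline for s in signal]
    during = [(ti, d) for ti, d in zip(t, delta) if t_on <= ti <= t_off]
    after = [(ti, d) for ti, d in zip(t, delta) if ti > t_off]

    max_change = max(d for _, d in during)
    # First time the response reaches frac of its maximum.
    response_time = next(ti for ti, d in during if d >= frac * max_change) - t_on
    # First time after exposure that the response has decayed to (1 - frac).
    recovery_time = next(
        (ti - t_off for ti, d in after if d <= (1 - frac) * max_change), None
    )
    snr = max_change / noise if noise > 0 else float("inf")

    return {
        "max_change": max_change,
        "snr": snr,
        "response_time": response_time,
        "recovery_time": recovery_time,
    }


# Hypothetical trace: exposure starts at t = 3 and ends at t = 6.
t = list(range(10))
signal = [1.0, 1.02, 0.98, 1.5, 1.95, 2.0, 2.0, 1.5, 1.05, 1.0]
features = extract_features(t, signal, t_on=3, t_off=6)
```

Each exposure of each sensor yields one such set of values, and concatenating them across the array gives one row of the M x N feature matrix.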

Dimensionality Reduction and Visualization

Allowing a feature vector to grow to a large number of dimensions can cause problems during classification. The number of features to be processed is directly related to the number of training samples needed for trustworthy classification. The feature vector can also contain features that are redundant or collinear, which causes issues when mathematical transforms are applied. PCA and LDA can be used to inspect the data visually, to detect relationships, and to help select an optimal subset of features.

[Figure: K-Nearest Neighbor classification and confusion matrix]
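To illustrate how PCA compresses redundant features, here is a small self-contained sketch that centers a feature matrix, forms its covariance matrix, and finds the leading principal components by power iteration with deflation. This is one simple way to compute PCA, chosen so that the example needs no external libraries; it is not necessarily how the authors computed it, and the data values are invented. In the sample data the second feature is roughly twice the first, so most of the variance falls on a single component.

```python
import math


def pca(X, k=2, iters=500):
    """Return (scores, components, explained_variance_ratio) for the top k PCs."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    Xc = [[row[j] - means[j] for j in range(m)] for row in X]  # centered data

    # Sample covariance matrix (m x m).
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(m)]
         for i in range(m)]
    total_var = sum(C[j][j] for j in range(m))

    components, explained = [], []
    for _ in range(k):
        # Power iteration. Assumption: the start vector is not orthogonal
        # to the eigenvector being sought.
        v = [1.0 / (j + 1) for j in range(m)]
        v_norm = math.sqrt(sum(x * x for x in v))
        v = [x / v_norm for x in v]
        for _step in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * C[i][j] * v[j] for i in range(m) for j in range(m))
        components.append(v)
        explained.append(lam / total_var)
        # Deflation: remove the component just found from the covariance.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]

    scores = [[sum(r[j] * v[j] for j in range(m)) for v in components]
              for r in Xc]
    return scores, components, explained


# Hypothetical 5 x 3 feature matrix; feature 2 is nearly 2 x feature 1.
X = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 1.5],
    [3.0, 6.2, 0.2],
    [4.0, 7.8, 1.8],
    [5.0, 10.1, 1.0],
]
scores, components, explained = pca(X, k=2)
```

Plotting the two columns of `scores` against each other gives the kind of two-dimensional view used for visual inspection. Features on very different scales would usually be standardized before this step.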

The human olfactory system consists of sensory neurons, an olfactory bulb, and the brain. The sensory neurons are receptors that respond to odors, forming a pattern (a set of signals) that is further processed by the olfactory bulb and the brain. In a human, the sensory neurons can contain 100 million odor sensors. This array contains a variety of sensors that differ in their specificity to particular classes of odors. Individual sensor elements display broad and overlapping selectivity towards chemical species. The individual sensors are not highly selective; selectivity is achieved through the unique patterns generated by arrays of differing sensors.

An electronic nose attempts to mimic the olfactory system by using a chemical sensor array and machine learning for pattern recognition. The illustration below shows an electrical sensor array containing various semiconducting materials that respond to a variety of chemicals in different ways. Features can be extracted from the sensor responses to create unique chemical signatures. Similar to how the brain interprets odor patterns, a learning algorithm can be trained on known data in order to classify unknowns.

Classification

The goal of pattern classification is to predict a class label for an unknown feature vector. The feature data are split into training data and testing data. The training data carry the correct labels, while the labels of the testing data are unknown to the classifier. Once the classifier is trained, it attempts to predict the labels of the unknown testing data.

[Figure: classification workflow, with recoverable labels Feature Vector, Training Data, Train Classifier, Testing Data]

[Figure labels from the sensor array illustration: Sensor 1, Sensor 2, Sensor 3, Sensor 4]
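The train-then-classify workflow described in the Classification text can be sketched with a small k-nearest neighbors classifier and a confusion matrix. The feature vectors, the class names "ethanol" and "acetone", and k = 3 are invented for illustration and do not come from the poster's data. The test labels are held back from the classifier and used only to score its predictions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict a label for x by majority vote among its k nearest neighbors."""
    nearest = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def confusion_matrix(true_labels, predicted, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(true_labels, predicted):
        matrix[index[t]][index[p]] += 1
    return matrix


# Hypothetical two-feature signatures for two chemicals.
train_X = [
    [0.10, 0.80], [0.12, 0.75], [0.09, 0.82],  # ethanol
    [0.50, 0.20], [0.55, 0.25], [0.48, 0.18],  # acetone
]
train_y = ["ethanol"] * 3 + ["acetone"] * 3

test_X = [[0.11, 0.78], [0.52, 0.22], [0.47, 0.21]]
test_y = ["ethanol", "acetone", "acetone"]  # held back for scoring only

predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
cm = confusion_matrix(test_y, predictions, labels=["ethanol", "acetone"])
```

Off-diagonal entries of `cm` count misclassifications, which shows which chemicals the sensor array tends to confuse.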

Distribution Statement A: Approved for public release

[Figure label: Classify Test Data]

[Figure: Sensor Response Features]

Chemical sensors that use semiconducting sensing materials can rarely discriminate sampled species directly, because they respond non-selectively to a range of chemicals. Post-processing of the sensor data can aid in discriminating between chemicals. Machine learning techniques have become increasingly popular for the analysis of large datasets and are used in fields ranging from drug discovery to marketing. The aim of this research is to apply machine learning methods to chemical sensor data and to demonstrate how chemical discrimination may be achieved by examining chemical response information.

Machine Olfaction: Application to Chemical Sensors and Sensor Arrays

The Electronic Nose

Data Analysis Approach

[Figure: electronic nose sensor array, with labels Chemical Species, Individual Sensor, Sensor Response, Unique Chemical Signature; panel numbers 2, 3, 4]