IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 28, NO. 1, JANUARY 2006
Learning Vector Quantization with Training Data Selection

Carlos E. Pedreira

Abstract—In this paper, we propose a method that selects a subset of the training data points to update LVQ prototypes. The main goal is to guide the prototypes toward more convenient locations, diminishing misclassification errors. The method selects an update set composed of points considered to be at risk of being captured by a prototype of another class. We combine the proposed methodology with a weighted norm, instead of the Euclidean norm, in order to establish different levels of relevance for the input attributes. The technique was evaluated on a controlled experiment and on Web-available data sets.

Index Terms—Learning vector quantization (LVQ), pattern classification, clustering, data selection, neural networks.
1 INTRODUCTION
Vector quantization (VQ) has been extensively explored from both theoretical and applied points of view; Gersho [1] and Zador [2] are classical references. Some broadly used algorithms may be considered within a VQ framework, among them the K-means family [3] and Kohonen's Self-Organizing Maps [4], [5]. Moving to a supervised context, Learning Vector Quantization (LVQ) [6] plays a very important role in statistical pattern classification [7].

The main idea of VQ algorithms is to build a quantized approximation to the distribution of the input data using a finite number of prototype, or codebook, vectors. In LVQ, these prototypes result from an update procedure based on the training data set. Once the prototypes are set, one may associate an assortment of them with a segment or class and classify a data point by the nearest-neighbor rule. LVQ belongs to a broad family of learning algorithms based on stochastic gradient descent [8].

Since its original formulation, a considerable amount of material on LVQ has been published. In the 1980s, Kohonen proposed a number of improvements to his algorithm, generating LVQ2, LVQ2.1, and LVQ3 [9], [5], [6]. Relevant contributions in the 1990s include Sato and Yamada [10], where the Generalized LVQ (GLVQ) algorithm is proposed; there, an explicit cost function (continuous and differentiable) is introduced and the updating rule is obtained by minimizing this cost. Without intending to provide a systematic overview of the state of the art, we mention some recent contributions on LVQ. Hammer and Villmann [11] are concerned with scaling the input dimensions according to their relevance. In [12], an initialization-insensitive LVQ is proposed; the main idea is to substitute harmonic average distances for the usual nearest-neighbor distances. In [13], the "premature clustering phenomenon" is addressed: the impact of wrongly classified patterns is increased by assigning different weights to correct and wrong patterns so as to equalize their total impact on prototype shifting, with the goal of avoiding premature convergence. In [14], an objective function based on a likelihood ratio is proposed.
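As a concrete illustration of the update procedure and nearest-neighbor rule described above, the following sketch implements the basic LVQ1 step. This is a minimal Python example of standard LVQ1, not the data selection method proposed in this paper; the function and parameter names (lvq1_update, classify, lr) are ours.

```python
import numpy as np

def lvq1_update(prototypes, proto_labels, x, y, lr=0.05):
    """One LVQ1 step: attract the nearest prototype if its class
    matches the sample's label y, repel it otherwise."""
    # Nearest prototype under the Euclidean norm.
    dists = np.linalg.norm(prototypes - x, axis=1)
    w = np.argmin(dists)
    sign = 1.0 if proto_labels[w] == y else -1.0
    prototypes[w] += sign * lr * (x - prototypes[w])
    return prototypes

def classify(prototypes, proto_labels, x):
    """Nearest-neighbor rule: assign the class of the closest prototype."""
    return proto_labels[np.argmin(np.linalg.norm(prototypes - x, axis=1))]
```

Iterating lvq1_update over the training set with a decreasing learning rate yields the converged codebook, to which the nearest-neighbor rule is then applied for classification.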
The success of a classification scheme may be directly associated with appropriate data preprocessing. Two very important aspects of preprocessing are the choice of the attribute features and the selection of input data. A considerable amount of work can be found in the machine learning literature on the choice and manipulation of input attributes; contributions addressing data selection, on the other hand, are less abundant. Focusing on LVQ, such contributions are basically restricted to Kohonen's window ideas proposed for the LVQ2.1 algorithm [5].

It is well known that redundant or uninformative attributes are, in general, harmful and frequently degrade performance. Concerning attribute preprocessing, one approach is to impose transformations on the original variables, generating more appropriate representations, e.g., Factor Analysis [15], [3], PCA [16], and ICA [17]. Among the methods related to the choice of relevant attributes, one may mention Stepwise Regression [18], Decision Trees, Neural Networks [19], Mutual Information [20], [21], and attribute weighting [22].

The main goal of this paper is to propose a new algorithm for data selection in LVQ schemes. Furthermore, we explore a combination of this algorithm with attribute-weighting preprocessing. The idea of data selection in the LVQ context was previously explored in [5] (LVQ2.1) by establishing a fixed window around the midplane between two prototypes. Data selection may also be related to the concept of margin; for LVQ, margins were explored in [24].

In this paper, we propose a data selection methodology that takes advantage of the geometry of the problem. LVQ algorithms may be viewed as a sort of dispute among the prototypes for the data points, so that each prototype is responsible for "defending" the points of its group. Some of the training data points may be totally safe from being captured by a prototype of the wrong class, while others may lie in "risk" zones. The proposed policy is to update the prototypes considering only the points in the risk zones; the prototypes then tend to move into the risk areas in order to defend these points.
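To make the risk-zone idea concrete, the sketch below flags a training point as at risk when its two nearest prototypes carry different labels and the point lies close to the midplane between them, measured under a diagonally weighted norm. The window test shown here follows Kohonen's LVQ2.1 criterion, used only as an illustration; the threshold parameter window, the weight vector, and all names are our assumptions, not the selection rule developed in this paper.

```python
import numpy as np

def weighted_dist(x, p, weights):
    """Diagonally weighted norm: attributes with larger weights
    contribute more to the distance (attribute relevance)."""
    return np.sqrt(np.sum(weights * (x - p) ** 2))

def at_risk(x, prototypes, proto_labels, weights, window=0.3):
    """Return True if x lies in a 'risk' zone: its two nearest
    prototypes have different labels and x falls near the
    midplane between them (LVQ2.1-style window test)."""
    d = np.array([weighted_dist(x, p, weights) for p in prototypes])
    i, j = np.argsort(d)[:2]               # two nearest prototypes
    if proto_labels[i] == proto_labels[j]:
        return False                       # deep inside one class: safe
    s = (1.0 - window) / (1.0 + window)    # Kohonen's window bound
    return min(d[i] / d[j], d[j] / d[i]) > s

# Update set: only points at risk of capture by a wrong-class prototype.
# update_set = [(x, y) for x, y in zip(X, Y)
#               if at_risk(x, prototypes, labels, weights)]
```

Restricting the update set in this way concentrates the prototype shifts on the contested regions of the input space, which is precisely where misclassification errors originate.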
2 THE MODEL
2.1 Notation and Prototypes Update
Let us consider a labeled data set $S = \{(x_i, y_i);\ i = 1, \ldots, N\}$, where $x_i \in$