Keywords:
Information Visualization, Multi Dimensional Scaling.
Abstract:
Multi Dimensional Scaling (MDS) is a structure preserving projection method that allows for the visualization of multidimensional data. In this paper we discuss our practical experience in using MDS as a projection method in three different application scenarios. We give several reasons why structure preserving projection methods are useful for the analysis of multidimensional data. We discuss two visual forms (glyphs and heightfields) which can be used to represent the output of the projection methods.
1 Introduction
In this paper we discuss our practical experience in using Multi Dimensional Scaling (MDS) for the visualization of multidimensional data. We show how MDS is used to gain insight into multidimensional spaces that are represented in a table. A large class of data can be characterized by tables. Such tables can be described by a matrix with attribute variables in one dimension and the outcomes of specific cases in the other. Discovery and understanding of the structure in this type of data has many applications in science and business [1]. Here the word structure refers to geometric relationships among subsets of the data variables in the table. Examples of structure include clusters, regular patterns, outliers, distance relations and proximity of data points.
There are many numerical and statistical techniques that can be used to analyze structural information from multidimensional data tables. These techniques can be used to automatically extract certain structural properties from the data. Examples of such techniques are principal component analysis (PCA), k-means and hierarchical clustering algorithms (see [2, 3]). The majority of these techniques focus on specific aspects of the structure of the data, such as clusters.
A different class of techniques for the analysis of structural information is based on the idea that the multidimensional data points can be projected into a lower dimensional space such that the structural properties of the data are preserved. We call this class of techniques structure preserving projection methods. In this paper we discuss how multidimensional data can be visualized using structure preserving projection methods. We sketch three alternative methods and point out some differences between them.
The paper is structured as follows: in the following section we give an overview of the visualization process of data analysis using projection methods. In section 3 we sketch three structure preserving projection methods. Section 4 describes the visualization of the output of the described projection methods. Three applications illustrate the methods in section 5.
The process of transforming data tables into a visual form can be considered in the context of the well-known visualization pipeline [4]. For projection based methods, a pipeline of four stages can be specified as follows (see Figure 1): data acquisition, projection, mapping, and rendering.
Figure 1: Transforming tabular data into images. (Stages shown: data acquisition, projection, mapping, rendering; interaction.)
Data acquisition is the process of acquiring and selecting the data to be analyzed. This stage results in the data table. In the projection stage, nonlinear projection techniques are used to transform data points in a high dimensional data space to a lower dimensional visualization space. The goal of these techniques is to compute a spatial representation which preserves structural properties of the data table. In the mapping stage the output of the projection is translated into a set of graphical primitives. The goal of this stage is to effectively present the data in a visual form. During rendering the graphical primitives are rendered as an image.
User interaction allows the user to investigate different aspects of the data. In all but the smallest data sets it is impossible to present all information contained in the data automatically in a single image. Therefore the user should be able to interact with the parameters in the visualization pipeline in a meaningful and understandable way.
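The four stages and the role of interaction can be sketched as a chain of functions. This is only an illustration of the pipeline structure, not the system described in this paper: all function names, the toy data table and the character-based renderer are hypothetical, and the projection stage is a placeholder that simply keeps two attributes where a structure preserving method such as MDS would be used.

```python
from typing import Dict, List, Tuple

Table = List[List[float]]
Point = Tuple[float, float]


def acquire() -> Table:
    """Stage 1 (data acquisition): return the data table, one row per case."""
    # Hypothetical toy table: 4 cases x 3 attributes.
    return [[0.0, 0.0, 1.0], [1.0, 0.0, 2.0], [0.0, 1.0, 3.0], [1.0, 1.0, 4.0]]


def project(table: Table, axes: Tuple[int, int] = (0, 1)) -> List[Point]:
    """Stage 2 (projection): map each case to a 2D position.

    Placeholder only: it keeps two attributes and drops the rest. A
    structure preserving projection method would be plugged in here.
    """
    a, b = axes
    return [(row[a], row[b]) for row in table]


def map_to_primitives(points: List[Point], symbol: str = "o") -> List[Dict]:
    """Stage 3 (mapping): turn positions into graphical primitives (glyphs)."""
    return [{"x": x, "y": y, "symbol": symbol} for x, y in points]


def render(primitives: List[Dict], width: int = 5, height: int = 3) -> List[str]:
    """Stage 4 (rendering): rasterize the glyphs into a character image."""
    xs = [p["x"] for p in primitives]
    ys = [p["y"] for p in primitives]
    x0, y0 = min(xs), min(ys)
    sx = (max(xs) - x0) or 1.0  # avoid division by zero for a flat axis
    sy = (max(ys) - y0) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for p in primitives:
        col = round((p["x"] - x0) / sx * (width - 1))
        row = round((p["y"] - y0) / sy * (height - 1))
        grid[row][col] = p["symbol"]
    return ["".join(line) for line in grid]


def run_pipeline(axes: Tuple[int, int] = (0, 1), symbol: str = "o") -> List[str]:
    """Interaction: the user re-runs the pipeline with new stage parameters."""
    return render(map_to_primitives(project(acquire(), axes), symbol))


image = run_pipeline()
print("\n".join(image))
```

Calling `run_pipeline(axes=(0, 2))` or `run_pipeline(symbol="x")` stands in for the user changing a projection or mapping parameter and receiving a new image.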
3 Projection methods
Projection methods for the analysis of structure have the following useful properties:
The methods do not depend on any control parameters that would require a priori knowledge about the data. For example, they do not depend on control parameters that determine the number of clusters.
The methods are not limited to specific types of structure. In contrast to many specific structure seeking methods, projection methods can be used for the analysis of a wide range of complex structures.
The methods use the human visual capacity to recognize and interpret structure. For example, problems concerning anomalies in the data are overcome since humans can easily eliminate troublesome data points (automatic clustering algorithms have difficulty doing this).
We briefly summarize some aspects of three projection based techniques. It is beyond the scope of this paper to discuss each technique in detail.
Multi Dimensional Scaling (MDS) computes a configuration of points in a low-dimensional Euclidean space so that the distance between each pair of points matches the original dissimilarity between the corresponding variables in the data table [5]. To apply MDS, first a distance matrix (also called a similarity or adjacency matrix) must be generated from the data table. This is done by defining a metric by which the similarity or dissimilarity between cases in the table can be determined. Depending on the data type in the table (numeric, boolean or textual), many different metrics exist to calculate this difference [6]. Formally, if $d_{ij}$ is the distance between points $x_i$ and $x_j$ and $p_i$ is the position of $x_i$ in visualization space, the minimum of the equation
$$E = \sum_{i<j} \left( d_{ij} - \lVert p_i - p_j \rVert \right)^2$$
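The two steps described above, building a distance matrix from the data table and finding positions whose mutual distances match it, can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in this paper: the Euclidean metric, the sum-of-squared-differences form of the error, the plain gradient descent optimizer and all function names and parameter values (`distance_matrix`, `stress`, `mds`, `steps`, `rate`, `seed`) are choices made for illustration only.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def distance_matrix(table: Matrix) -> Matrix:
    """Pairwise Euclidean distances between the cases (rows) of the table."""
    return [[math.dist(a, b) for b in table] for a in table]


def stress(d: Matrix, p: Matrix) -> float:
    """Sum over pairs i < j of (d_ij - ||p_i - p_j||)^2 (assumed error form)."""
    n = len(d)
    return sum((d[i][j] - math.dist(p[i], p[j])) ** 2
               for i in range(n) for j in range(i + 1, n))


def mds(d: Matrix, dim: int = 2, steps: int = 2000, rate: float = 0.05,
        seed: int = 0) -> Matrix:
    """Reduce the stress by plain gradient descent from a random start."""
    rng = random.Random(seed)
    n = len(d)
    p = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dist = math.dist(p[i], p[j]) or 1e-12  # guard against zero
                # Gradient of (d_ij - dist)^2 with respect to p_i:
                # -2 * (d_ij - dist) * (p_i - p_j) / dist
                coef = -2.0 * (d[i][j] - dist) / dist
                for k in range(dim):
                    grad[i][k] += coef * (p[i][k] - p[j][k])
        for i in range(n):
            for k in range(dim):
                p[i][k] -= rate * grad[i][k]
    return p


# Hypothetical data table: three cases described by three attributes.
table = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
d = distance_matrix(table)
start = mds(d, steps=0)   # the random starting configuration
layout = mds(d)           # the configuration after gradient descent
print(stress(d, start), stress(d, layout))
```

Gradient descent can end in a local minimum, so in practice one would typically try several starting configurations (different `seed` values here) and keep the layout with the lowest stress.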