XGvis: Interactive Data Visualization with Multidimensional Scaling

ANDREAS BUJA¹, DEBORAH F. SWAYNE², MICHAEL L. LITTMAN³, and NATHANIEL DEAN⁴

November 24, 1998

We discuss interactive techniques for multidimensional scaling (MDS) and a system, named "XGvis", that implements these techniques. MDS is a method for visualizing proximity data, that is, data where objects are characterized by dissimilarity values for all pairs of objects. MDS constructs maps of these objects in R^k by interpreting the dissimilarities as distances. MDS in its conventional batch implementations is prone to uncertainties with regard to 1) local minima in the underlying optimization, 2) sensitivity to the choice of the optimization criterion, 3) artifacts in point configurations, and 4) local inadequacy of the point configurations. These uncertainties are addressed by the following interactive techniques: 1) algorithm animation, random restarts, and manual editing of configurations, 2) interactive control over parameters that determine the criterion and its minimization, 3) diagnostics for pinning down artifactual point configurations, and 4) restricting MDS to subsets of objects and subsets of pairs of objects. MDS was originally developed for the social sciences, but it is now also used for laying out graphs. Graph layout is usually done in 2D, but we allow layouts in arbitrary dimensions. We permit missing values, which can be used to implement multidimensional unfolding. We show applications to the mapping of computer usage data, to the dimension reduction of marketing segmentation data, to the layout of mathematical graphs and social network graphs, and finally to the reconstruction of molecules in nanotechnology.

¹ Andreas Buja is Technology Consultant, AT&T Labs - Research, 180 Park Ave, P.O. Box 971, Florham Park, NJ 07932-0971.
[email protected], http://www.research.att.com/~andreas/

² Deborah F. Swayne is Senior Technical Staff Member, AT&T Labs - Research, 180 Park Ave, P.O. Box 971, Florham Park, NJ 07932-0971.
[email protected], http://www.research.att.com/~dfs/

³ Michael L. Littman is Assistant Professor, Department of Computer Science, Duke University, Box 90129, Durham, NC 27708.
[email protected], http://www.cs.duke.edu/~mlittman/

⁴ Nathaniel Dean is Associate Professor, Computational and Applied Mathematics - MS 134, Rice University, 6100 Main Street, Houston, TX 77005.
[email protected], http://www.caam.rice.edu/~nated/
¹ XGvis uses the XGobi system for visualizing point configurations. The XGvis system, which implements these techniques, is freely available with the XGobi distribution from http://www.research.att.com/areas/stat/xgobi/ or ftp://ftp.research.att.com/dist/xgobi/.

Key Words: Multidimensional Unfolding, Proximity Data, Multivariate Analysis, Interactive Data Visualization, Dimension Reduction, Graph Layout, Social Networks, Molecule Reconstruction.
1 Introduction: Basics of Multidimensional Scaling

The present section gives a short introduction to those types of MDS that are relevant to this article. Section 2 describes the functionality of the XGvis system. Section 3 outlines the MDS methodology supported by this functionality. Section 4 contains case studies of proximity analysis, dimension reduction, and graph layout in two and more dimensions.
1.1 Proximity Data and Stress Functions
Multidimensional scaling (MDS) is a family of methods for analyzing proximity data. Proximity data consist of similarity or, equivalently, dissimilarity information for pairs of objects. This contrasts with plain multivariate data, which consist of covariate information for individual objects. If the objects are labeled i = 1, ..., N, the data can be assumed to be dissimilarity values D_{i,j}. If the data are given as similarities, some monotone decreasing transformation will convert them to dissimilarities. Dissimilarity data occur in many areas (see Section 1.3 below).

The goal of MDS is to map the objects i = 1, ..., N to points x_1, ..., x_N ∈ R^k in such a way that the given dissimilarities D_{i,j} are well-approximated by the distances ||x_i − x_j||. Psychometricians often call these distances the "model" fitted to the data D_{i,j}.

The dissimilarity matrices of Figure 1 are simple examples with easily recognized optimal scaling solutions: the left matrix suggests mapping the five objects to an equispaced linear arrangement; the right matrix suggests mapping the three objects to a right triangle. The figure shows configurations actually found by MDS. The first configuration can be embedded without distortion in k = 1 dimension, while the second needs at least k = 2 dimensions. In real data, there are typically many more objects than in these examples, and the dissimilarities usually contain error as well as bias with regard to the fitted distances.

Since Kruskal (1964a,b), MDS is most often conceived as the minimization of a so-called "stress function," that is, a measure of lack of fit between dissimilarities D_{i,j} and distances ||x_i − x_j||. In the simplest case, the stress function is a residual sum of squares:

$$ S_D(x_1, \ldots, x_N) \;=\; \Bigl( \sum_{i \neq j = 1..N} \bigl( D_{i,j} - \|x_i - x_j\| \bigr)^2 \Bigr)^{1/2} \qquad (1) $$
where the outer square root is just a convention. For a given dissimilarity matrix D = (D_{i,j}), MDS minimizes the stress function over all point configurations (x_1, ..., x_N), thought of as k·N-dimensional hypervectors of unknown parameters.
$$ D = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 \\ 1 & 0 & 1 & 2 & 3 \\ 2 & 1 & 0 & 1 & 2 \\ 3 & 2 & 1 & 0 & 1 \\ 4 & 3 & 2 & 1 & 0 \end{pmatrix} \qquad D = \begin{pmatrix} 0 & 3 & 4 \\ 3 & 0 & 5 \\ 4 & 5 & 0 \end{pmatrix} $$
Figure 1: Simple Examples of Dissimilarity Matrices and Their Exact Optimal Scaling Solutions.
The minimization can be carried out by straightforward gradient descent applied to S_D, viewed as a function on R^{kN}. The dimension k of the configuration points x_i is arbitrary in principle, but low in practice: k = 1, 2, 3 are the most frequently used dimensions, for the simple reason that the points serve as easily visualized representations of the objects.
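To make the minimization concrete, here is a minimal sketch of metric Kruskal-Shepard MDS by plain gradient descent on the squared stress (which has the same minimizers as (1)). This is not XGvis's implementation; the step size, iteration count, and the helper names `stress` and `mds` are our own illustrative choices.

```python
import numpy as np

def stress(D, X):
    """Stress (1): outer square root of the sum of squared residuals."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    off = ~np.eye(len(D), dtype=bool)                # exclude i == j
    return np.sqrt(((D - d)[off] ** 2).sum())

def mds(D, k=2, iters=2000, lr=0.01, seed=0):
    """Minimize the squared stress over configurations X in R^(N x k)."""
    rng = np.random.default_rng(seed)
    N = len(D)
    X = rng.normal(size=(N, k))                      # random starting configuration
    for _ in range(iters):
        diff = X[:, None, :] - X[None, :, :]         # pairwise x_i - x_j
        d = np.sqrt((diff ** 2).sum(-1))
        d = np.maximum(d, 1e-9)                      # guard against zero distances
        resid = D - d
        np.fill_diagonal(resid, 0.0)
        # gradient: dS^2/dx_i = -4 sum_j (D_ij - d_ij) (x_i - x_j) / d_ij
        grad = -4 * ((resid / d)[:, :, None] * diff).sum(axis=1)
        X -= (lr / N) * grad                         # plain gradient descent step
    return X

# The right-hand matrix of Figure 1: a 3-4-5 right triangle in k = 2.
D = np.array([[0, 3, 4], [3, 0, 5], [4, 5, 0]], dtype=float)
X = mds(D, k=2)
print(stress(D, X))    # close to zero: the triangle embeds exactly in 2D
```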
1.2 Types of Multidimensional Scaling
There exist many MDS methods, differing mostly in the stress function they use. A natural classification of MDS methods has the form of a 2×2 table based on two dichotomies:

"Classical scaling" versus "Kruskal-Shepard scaling": In classical scaling, similarity data are modeled by inner products, while in Kruskal-Shepard scaling dissimilarity data are modeled by distances. The latter type of MDS is often called "least squares scaling," but this is a misnomer because "classical scaling" is least squares, too, the residual sum of squares being applied to similarity data and inner products:

$$ \sum_{i,j} \bigl( S_{i,j} - \langle x_i, x_j \rangle \bigr)^2 = \min. $$
A conceptual difference between classical and Kruskal-Shepard scaling is that inner products rely on an origin, while distances do not; a set of inner products determines uniquely a set of distances, but a set of distances determines a set of inner products only modulo change of origin. The computational difference between classical and Kruskal-Shepard scaling is that the minimization problem for the former can be solved with a simple eigendecomposition, while the latter requires an iterative minimization procedure.
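The eigendecomposition route can be sketched in a few lines. The following is a hedged sketch of classical (Torgerson) scaling, assuming the familiar double-centering identity B = −(1/2) J D² J that converts squared distances into centered inner products; the function name `classical_mds` is ours.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical scaling: recover a configuration from inner products
    by eigendecomposition (no iterative optimization needed)."""
    N = len(D)
    J = np.eye(N) - np.ones((N, N)) / N      # centering operator
    B = -0.5 * J @ (D ** 2) @ J              # inner products w.r.t. the centroid
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]            # keep the k largest
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# The 3-4-5 triangle of Figure 1, recovered in closed form:
D = np.array([[0, 3, 4], [3, 0, 5], [4, 5, 0]], dtype=float)
print(classical_mds(D, k=2))
```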
\Metric scaling" versus \nonmetric scaling": Metric scaling uses the actual values of
the similarities or dissimilarities, while nonmetric scaling eectively uses only their ranks (Shepard 1962, Kruskal 1964a). Nonmetric MDS is realized by estimating an optimal monotone transformation f (Di;j ) of the proximities simultaneously with the con guration. In this article, we discuss roughly speaking \Kruskal-Shepard metric scaling": We model dissimilarities by distances, and we use the actual values of the dissimilarities. We are more general in several ways, however, by allowing the dissimilarities to be transformed by a power transformation, by using weights that are a power of the dissimilarities, by permitting missing values among the dissimilarities, by providing features for localizing the stress function, ...; see Section 2.2 for a complete description of the stress function.
1.3 Three Application Areas of MDS
We describe three major uses of MDS:

MDS was invented for the analysis of proximity data, which arise in the following areas:

- The social sciences: Proximity data take the form of similarity ratings for pairs of stimuli such as tastes, colors, sounds, people, nations, ...
- Archeology: Similarity of two digging sites can be quantified based on the frequency of shared features in artifacts found in the sites.
- Classification problems: In classification with large numbers of classes, pairwise misclassification rates produce confusion matrices that can be analyzed as similarity data. An example would be confusion rates of phonemes in speech recognition.

A second early use of MDS has been for dimension reduction: Given high-dimensional data y_1, ..., y_N ∈ R^K (K large), compute a matrix of pairwise distances dist(y_i, y_j) = D_{i,j}, and use Kruskal-Shepard MDS to find lower-dimensional x_1, ..., x_N ∈ R^k (k ≪ K) whose pairwise distances approximate the D_{i,j}. A third, more recent use is graph layout, treated in Section 4.3.

Power transformations D_{i,j}^p with p > 1 distort the configuration into a star-like shape due to the greater influence of the largest dissimilarities. In XGvis, we provide interactive power transformations as a substitute for the estimation of a monotone transformation that is part of nonmetric MDS. Combining interactive power transformations with metric MDS has certain advantages: The stiffness of power transformations prevents overfitting, whereas automatically estimated transformations occasionally lead to artifacts due to overfit. In these cases, the estimated monotone transformation is not the most meaningful, although it minimizes stress.
3.5 Minkowski Distances as Alternative Distance Models, and their Use for Rotation of Configurations
Minkowski or Lebesgue distances on configuration space are sometimes used as alternatives to the Euclidean L_2 distance in order to achieve better fits. Of particular importance are the two extremes of the Minkowski family: the L_1 (or city block or Manhattan) metric, and the L_∞ (or maximum) metric. Greater plausibility of L_1 or L_∞ over L_2 can sometimes be argued. In our own experiments with interactive MDS, we found another, more frequent use for Minkowski metrics: rotation of configurations for interpretation, similar to factor rotation in factor analysis. The point is to exploit non-Euclidean metrics to break rotation invariance of
Figure 8: The Effect of Minkowski Metrics on MDS Configurations. Top row: the 5×5 grid graph; bottom row: the Morse code data, both scaled into two dimensions, in Minkowski metrics with parameters m = 2.0 (Euclidean), m = 2.5, and m = 1.7.

the stress function: For m ≠ 2, optimal configurations must align themselves in a particular way with the coordinate axes. This alignment often leads to interpretable axes. If there are, for example, certain axial (near-)symmetries in a configuration, non-Euclidean metrics may force the axes of symmetry to line up with the coordinate axes. In order to find these special alignments, the following recipe can be used: Temporarily choose an L_m metric with m > 2 or m < 2, optimize the configuration, switch back to L_2, and optimize again. The result is a rotated version of the L_2 configuration. Typically, for configurations in 2D, the major difference between solutions under L_{m=2.5} and L_{m=1.7} is a 45 degree rotation. Examples are shown in Figure 8: For the 5×5 grid graph, both the L_{m=2.5} and L_{m=1.7} metrics force certain axes of symmetry to align with the coordinate axes. Although both metrics find axes of symmetry, for most purposes we would probably prefer the one found with L_{m=2.5}. The same is true for the Morse code data, where the L_{m=2.5} metric essentially aligns the code length with the horizontal axis and the dash frequency with the vertical axis. The configuration for L_{m=1.7} is a 45 degree rotated version of the same.
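A hedged sketch of the ingredients of this recipe: a pairwise L_m distance function, plus the two-stage optimization written against a hypothetical optimizer `mds_opt` (for instance, the gradient-descent sketch of Section 1.1 extended to accept a metric parameter m and a starting configuration; `mds_opt` is not an XGvis API).

```python
import numpy as np

def minkowski_distances(X, m=2.0):
    """Pairwise L_m distances of a configuration X (N x k)."""
    diff = np.abs(X[:, None, :] - X[None, :, :])
    return (diff ** m).sum(-1) ** (1.0 / m)

# The rotation recipe, with mds_opt(D, X0, m) a stress minimizer under L_m:
#   X = mds_opt(D, X0=random_start, m=2.5)   # force alignment with the axes
#   X = mds_opt(D, X0=X, m=2.0)              # polish under the Euclidean metric
# The result is a rotated version of the plain L_2 configuration.
```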
3.6 Diagnostics with Localization Methods
By localization we mean the examination of structure contained in relatively small dissimilarities. Localization is a difficult problem because of the following received wisdom about MDS:
Figure 9: Localizing MDS to Partitions and Individual Objects. All plots show the subset of the Morse code data consisting of the codes of length 1, 2, and 5. Subsetting was performed simply to avoid clutter in the subsequent plots. The first plot shows the subset scaled globally. The second plot shows the result of applying within-groups MDS to the groups of equal code length; each group (marked by different glyphs) is scaled separately from the other groups. The effect on the digits (codes of length 5) is that they now form a loop rather than an arc. The right hand plots show the effect of anchored MDS with regard to the codes "T" and "0", respectively. In the third plot, the code "T" is shown to be about equally far from all other codes except for "E", the other code of length 1. In the fourth plot, the most distant digit from "0" is shown to be about as far as the codes of length 1 and 2.

The global shape of MDS configurations is determined by the large dissimilarities; consequently, small distances should be interpreted with caution: they may not reflect small dissimilarities.
These statements are based on a widely cited study by Graef and Spence (1979), who ran simulations in which they removed, respectively, the largest third and the smallest third of the dissimilarities. They found devastating effects when removing the largest third, but relatively benign effects when removing the smallest third.

As a consequence, if the points representing the Morse codes "T" and "E" lie close together, it does not follow that they are perceptually similar (i.e., often confused). It may rather mean that there exists a large set of codes from which they are both roughly equally dissimilar. Shared patterns in large dissimilarities are vastly more powerful contributors to the stress function than the single dissimilarity between the codes "T" and "E". Thus it would be of interest to diagnose whether "T" and "E" are indeed close. To answer these and similar questions, one may use two techniques for localizing MDS:

Within-groups MDS can be used to assess dissimilarity structure in subsets. Groups can be precomputed and entered in colors and glyphs files. Alternatively, groups can be continually redefined with brushing operations in an auxiliary linked window. Note that in within-groups MDS the relative positions of the groups to each other are meaningless (determined by the starting configuration). The second plot in Figure 9 shows an example of within-groups MDS, where the Morse codes of lengths 1, 2 and 5 are partitioned into three groups that are scaled separately.
Figure 10: The Effects of Weighting and Truncating Dissimilarities. Left: standard configuration, unweighted/untruncated dissimilarities. Center: weighting with w_{i,j} = D_{i,j}^{-4}. Right: truncating the largest third of the dissimilarities.
Anchored MDS with single objects: Nothing prevents us from forming a group containing just one object and using it as an anchor. Note that only distances between the anchor object and the other objects should be interpreted, while the relative positions of the other objects are meaningless (determined by the starting configuration). The third and fourth plot in Figure 9 show configurations for the Morse code data with anchors "T" and "0", respectively. The third plot shows that these codes are close, but not as close as the global configuration of the first plot would suggest.
3.7 Weighting and Truncating to Assess Sensitivity to Large Dissimilarities
Within-groups MDS and anchored MDS are powerful ways of exploring local structure. Yet another method of localization is often proposed for MDS: dropping the large dissimilarities from the stress function and approximating only the small dissimilarities with configuration distances. The intuition behind this idea is that small dissimilarities reflect local properties, and that by building configurations from local information one obtains more faithful global configurations. Unfortunately, this proposal is generally a great disappointment. Not only do the resulting optimizations often fail to converge to meaningful global configurations, such configurations often do not exist because of inconsistencies among dissimilarities stemming from observational error or from non-metric behavior of the dissimilarities. Such inconsistencies defeat mathematical intuitions trained on differential equations, where infinitesimal information is successfully integrated up to global solutions. Attempts at integrating local to global structure are often not successful in MDS.

Just the same, we provide two mechanisms for assessing the influence of large and small dissimilarities:

Weighting, in order to smoothly change the influence of small and large distances: If the exponent q in the weights w_{i,j} = D_{i,j}^q is chosen to be negative, large dissimilarities
Figure 11: Two Shepard Diagrams for Optimized Configurations of the Morse Code Data. Left: raw data, power p = 1; right: transformed data, power p = 6. The transformed data exhibit an improved fit over the raw data. An outlier is marked in the right hand plot: For the pair of codes (I, 2) = (.., ..---), the fitted distance is vastly larger than the target dissimilarity.

are down-weighted; conversely, if q is chosen to be positive, they are up-weighted.

Truncation, in order to drop large (or small) dissimilarities from the stress function: This is done in XGvis by moving the two handles below the histogram at the bottom of the control panel (Figure 3, left). The location of these handles is translated into thresholds for excluding large and small dissimilarities, respectively.

In our experience, both truncation and weighting have a data-dependent range in which they produce useful alternative configurations. Outside this range MDS minimization tends to disintegrate. Figure 10 shows configurations obtained with down-weighted large distances using q = −4, and with the largest third of the dissimilarities truncated. Both configurations were obtained by starting MDS minimization from the standard configuration (unweighted and untruncated). Starting from random configurations generally did not result in meaningful configurations.
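As a concrete reading of these two mechanisms, here is a minimal sketch of a stress function with weights w_{i,j} = D_{i,j}^q and truncation thresholds. The thresholds `t_lo` and `t_hi` stand in for the two histogram handles, and the function name is ours, not XGvis's.

```python
import numpy as np

def weighted_truncated_stress(D, X, q=0.0, t_lo=-np.inf, t_hi=np.inf):
    """Stress with weights w_ij = D_ij^q; dissimilarities outside
    [t_lo, t_hi] are dropped from the sum entirely (truncation)."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    off = ~np.eye(len(D), dtype=bool)
    keep = off & (D >= t_lo) & (D <= t_hi)
    w = np.where(D > 0, D, 1.0) ** float(q)   # guard D = 0 against negative q
    return np.sqrt((w[keep] * (D - d)[keep] ** 2).sum())
```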
3.8 Goodness of Fit: The Shepard Diagram
The Shepard diagram is a plot of the (transformed) dissimilarity data against the fitted distances. It provides a qualitative assessment of the goodness of fit, beyond the quantitative assessment given by the stress value. In Figure 11, for example, one easily recognizes the structural deficiency of fitting distances to untransformed dissimilarities in the Morse code data. (An XGobi window with a Shepard diagram is produced from the XGvis control panel.)
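A hedged sketch of such a diagram, assuming matplotlib for plotting and a power transformation D^p as in Figure 11; the axis labels follow the figure's `d_config` and `D^p` convention, and the function name is ours.

```python
import numpy as np
import matplotlib.pyplot as plt

def shepard_diagram(D, X, p=1.0):
    """Scatter the transformed dissimilarities D^p against fitted distances."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    i, j = np.triu_indices(len(D), k=1)       # each pair of objects once
    plt.scatter(d[i, j], D[i, j] ** p, s=5)
    plt.xlabel("d_config (fitted distance)")
    plt.ylabel("D^p (transformed dissimilarity)")
    plt.show()
```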
3.9 MDS on Large Numbers of Objects
Kruskal-Shepard MDS is an O(N²) algorithm, a fact that severely limits the reach of MDS for large N. On the hardware at our disposal, interactive use of XGvis is possible up to about N = 500. Much larger N can be processed also, but the user will essentially treat it as a batch job and leave the optimization to itself for a while. The largest N we have scaled with XGvis was N = 3648 (Section 4.3.2), but still larger N are feasible with more patience.

The reach of MDS extends when a substantial number of terms is trimmed from the stress function. Such trimming is the effect of both within-groups MDS and anchored MDS (Section 2.3). Within-groups MDS can be applied if there exist natural groups in the data, and if comparisons between groups are not essential. If one has two groups of equal size, the number of terms in the stress function is cut in half. Anchored MDS can be applied if an informative set of anchors can be found. The choice of anchors is crucial: In the example below (Section 4.3.2), random choice of anchors did not work. Because the dissimilarities of the example were derived from a graph, it was natural to use the 100 node objects of highest degree, a choice that was effective. The total number of non-missing dissimilarities was then 100·N as opposed to N², which is a substantial savings for N = 3648. The duration of one gradient step was reduced from 100 seconds to 6 seconds. The resulting configuration was useful for the purpose at hand (Section 4.3.2).

Introducing missing values among the dissimilarities is another method for reducing the number of terms in the stress function. In particular, within-groups MDS can be thought of as MDS on a block-diagonal dissimilarity matrix where each block corresponds to a group and the dissimilarities outside the diagonal blocks are missing. For two groups, this looks as follows:

$$ D \;=\; \begin{pmatrix} (D_{grp1,grp1}) & \mathrm{NA} \\ \mathrm{NA} & (D_{grp2,grp2}) \end{pmatrix} \begin{matrix} \text{group 1} \\ \text{group 2} \end{matrix} $$

Similarly, anchored MDS is achieved by using a dissimilarity matrix with the following NA pattern:

$$ D \;=\; \begin{pmatrix} (D_{ancr,ancr}) & \mathrm{NA} \\ (D_{fltr,ancr}) & \mathrm{NA} \end{pmatrix} \begin{matrix} \text{anchors} \\ \text{floaters} \end{matrix} $$
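These NA patterns are easy to express as boolean masks over the dissimilarity matrix. A minimal sketch (our helper names), which could feed a stress function that simply skips masked-out pairs:

```python
import numpy as np

def within_groups_mask(groups):
    """Keep (i, j) only when i and j belong to the same group."""
    g = np.asarray(groups)
    return g[:, None] == g[None, :]

def anchored_mask(is_anchor):
    """Keep (i, j) only when the column object j is an anchor, so
    floater-floater dissimilarities become NA."""
    a = np.asarray(is_anchor, dtype=bool)
    return np.tile(a, (len(a), 1))

# Example: 4 objects, two groups of two; objects 0 and 1 as anchors.
D = np.arange(16, dtype=float).reshape(4, 4)
D_grouped = np.where(within_groups_mask([0, 0, 1, 1]), D, np.nan)
D_anchored = np.where(anchored_mask([True, True, False, False]), D, np.nan)
```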
3.10 Multidimensional Unfolding
3.10.1 Multidimensional Unfolding of Bipartite Proximity Data
Multidimensional unfolding is a variant of MDS where the objects are partitioned into two kinds, and dissimilarities are known only for pairs of objects of different kinds. A typical application of multidimensional unfolding is to preference data, where subjects rate stimuli (teas,
nations, ...), and both subjects and stimuli are considered as objects. If a subject rates a stimulus highly, one would say that this subject and this stimulus have high proximity. The intent of multidimensional unfolding is to construct a joint map of both subjects and stimuli that reflects these proximities. Because dissimilarities are missing for objects of the same kind, the matrix has the following structure (Borg and Lingoes 1987, chapter 9):

$$ D \;=\; \begin{pmatrix} \mathrm{NA} & (D_{subj,stim}) \\ (D_{stim,subj}) & \mathrm{NA} \end{pmatrix} \begin{matrix} \text{subjects} \\ \text{stimuli} \end{matrix} $$

Alternatively, unfolding can be achieved with across-groups MDS using subjects and stimuli as the two groups. It is obvious that multidimensional unfolding is a much more tenuous endeavor than MDS proper because of the greater sparsity of the geometric information used as input.

Figure 12 shows an example where the Morse code data were subjected to multidimensional unfolding as follows: The objects of the first type were the digits 0,...,9 (codes of length 5); the objects of the second type were the letters a,...,z (codes of length 1,...,4). The figure shows three configurations corresponding to the raw dissimilarities, their 5th roots and their 6th powers, respectively. The histogram below the left configuration plot reveals that the raw dissimilarities are nearly constant. Taking the 5th root, as in the center histogram, accentuates this even more. By contrast, taking the 6th power spreads the dissimilarities nicely between zero and the maximum value. These properties of the dissimilarity distribution translate into a nearly degenerate configuration on the left, a strongly degenerate configuration in the center, and a less degenerate and therefore more meaningful configuration on the right. Degeneracy in multidimensional unfolding means that all configuration points of the first kind are about equally far from all points of the second kind. This is borne out by the diamond shape of the configuration in the center plot, where all digits and letters fall in diagonally opposite locations of the diamond so as to achieve equidistance between digits and letters. The stress values of .09, .04 and .22, respectively, give the wrong clues: the worst stress value belongs to the most meaningful configuration. This is well-known in the literature on multidimensional unfolding (Borg and Lingoes, 1987). An implication is that nonmetric varieties of unfolding often do not work because they optimize transformations of the dissimilarities with regard to stress. The effect of this optimization is transformation of the dissimilarities into degeneracy, achieving unrealistic near-zero stress values.

The right configuration of Figure 12 is the most meaningful and interpretable one: The digits lie nearly on an arc segment whose extremes are formed by the digits 0 (five dashes) and 5 (five dots). The letters with codes of length 4 are tangled up with the digits, but they, too, line up according to the frequency of dots. The letters with codes of length 3, 2, and 1 form segments at increasing distances, but always mimicking the trend from low to high fraction of dashes. This expresses, of course, the fact that again code length and fraction of dashes are the two factors determining confusion rates between digits and letters. More specifically, codes of length 4 are often confused with digits (length 5), and the shorter a code, the better it is discriminated from digits.
Figure 12: Multidimensional Unfolding of the Morse Code Data into 2D. The digits (0-9) are unfolded against the letters (a-z). The left, center and right configurations were obtained from the raw dissimilarities, their 5th roots and their 6th powers, respectively. The histograms below the configurations show the distributions of the raw and the transformed dissimilarities. The stress values are .09, .04, and .22, respectively. The center configuration is an example of a degeneracy, arising from transformed dissimilarities that are all approximately equal (see the histogram). The left configuration (raw dissimilarities) is an approximation to the degeneracy. The right configuration (power p = 6) is the most meaningful one. The results of unfolding depend even more than those of MDS proper on the choice of good starting configurations.
3.10.2 Multidimensional Unfolding of Asymmetric Proximity Data
An interesting application of multidimensional unfolding is to asymmetric proximity data, an example being the Morse code data: In Rothkopf's experiment, ordered pairs of codes were presented to subjects who were asked to tell whether the codes were the same. From this experimental setup, one should expect an order effect: The pairs (A,E) and (E,A), for example, will generally have differing confusion rates, resulting in asymmetry of the confusion matrix. In fact, eyeballing the Morse code data confirms that sizable asymmetries do occur: The pair (A,E) has a confusion rate of 6%, while (E,A) has 3%.

A general recipe for analyzing asymmetric proximity data is, roughly speaking, to apply multidimensional unfolding to the objects in the first position against the objects in the second position. To this end, we double the number of objects according to whether they appear in the first or the second position in a pair. In the Morse code data, for example, the objects would be A_1, B_1, ..., 9_1, 0_1, A_2, B_2, ..., 9_2, 0_2, where the subscript indicates the position of a code in a pair. For these positional objects, dissimilarities are known for differing positions but missing for identical positions. The raw N×N dissimilarity matrix
Figure 13: Multidimensional Unfolding of the Asymmetric Morse Code Data, First Position ("X-") Versus Second Position ("-X"). The codes in the first position are linked by lines as in Figure 2, while the codes in the second position are linked by lines to their counterparts in the first position. The left hand plot shows a highly degenerate solution for the raw data. The center plot shows a perfect degeneracy obtained from constant dissimilarities after transforming the dissimilarities with the power p = 0. The right hand plot shows a solution obtained after transforming the data with the power p = 6. This is the only meaningful configuration; it resembles the MDS solution of Figure 7 (rightmost plot), but it has additional information about positional effects.
D and the positional 2N×2N dissimilarity matrix D̃ are therefore related as follows:

$$ \tilde{D} \;=\; \begin{pmatrix} \mathrm{NA} & D \\ D^T & \mathrm{NA} \end{pmatrix} $$

By applying multidimensional unfolding to D̃, we find a joint map for the objects in both positions, the map being solely determined by the dissimilarities between codes in different positions. Figure 13 shows such maps for the Morse code data: The raw data (left plot) yield a nearly degenerate configuration: The codes in the first position clump in the shape of an oval cluster and the codes in the second position spread equidistantly around the cluster. This near degeneracy approximates the exact degeneracy of constant data (center plot). A meaningful configuration is obtained after transforming the data with the 6th power: The first and second positions are generally quite close to each other, and the overall layout is reminiscent of the configuration of Figure 7 (rightmost plot). For the codes of length 2 and 3, however, one can recognize a systematic pattern of asymmetry: The vectors from the first to the second positions generally point left towards shorter codes, indicating that the second positions are more easily confused with shorter than with longer codes in the first position.
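Constructing D̃ is a one-liner in practice. A minimal sketch (the function name is ours), using NaN for the missing same-position blocks:

```python
import numpy as np

def positional_unfolding_matrix(D):
    """Build the 2N x 2N matrix D-tilde from an asymmetric N x N matrix D:
    dissimilarities are kept only between first- and second-position copies."""
    N = len(D)
    NA = np.full((N, N), np.nan)
    return np.block([[NA, D], [D.T, NA]])

# A toy asymmetric 2 x 2 confusion-derived matrix:
D = np.array([[0.0, 6.0],
              [3.0, 0.0]])
print(positional_unfolding_matrix(D))
```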
4 A Tour of MDS Applications

We give a tour of data examples that demonstrate the wide applicability of MDS and the power of XGvis in executing it.
Figure 14: Maps of Computer Commands for Two Individuals. Left: a statistician who does programming and data manipulation. Right: an administrative assistant who does e-mail and word processing.
4.1 Distance Data for Computer Usage
A type of data that poses difficulty for visualization (or any statistical analysis, for that matter) is non-rectangular data such as transactional data and event data, where objects of interest are characterized by a variable number of transactions or events. Examples are phone call data that characterize phone company customers, or earthquake data that characterize geographic areas. The typical approach to such data is to extract a fixed number of features for each object, thus transforming non-rectangular to rectangular multivariate data. Another, lesser-known, approach is extraction of distance data: Often it is not too difficult to directly define sensible distance measures between series of transactions and events. Given a distance matrix for all pairs of objects, MDS becomes the tool bag of choice for visualization.

We illustrate transactional data with computer usage data collected at AT&T Labs as part of a project on intrusion detection for computer networks (Theus and Schonlau 1998, DuMouchel and Schonlau 1998). The purpose of this particular data collection was to examine behavioral indicators that would discriminate between known, legitimate users. The data shown here are courtesy of Schonlau and Theus. The data are essentially a record of all operating system commands executed by a given user over the course of several days. Theus and Schonlau (1998) explored individual usage patterns by calculating for a given individual a set of distance measures for commands: two commands are close for this individual if he/she often uses the commands close to each other.
Figure 15: Dimension Reduction of 4 Orthogonal Circles in 8-Space. The configurations are 2D, 3D and 8D. The 8D configuration simply shows that the circles can be fully recovered (stress near zero). The 2D and 3D configurations illustrate the tradeoffs of dimension reduction: A congested place in 8D, the intersection of the 4 circles, has to be accommodated in 2D or 3D by distorting the circles and forming a vertex. Otherwise the reduction tries to reflect the identical shapes and sizes as well as the positional symmetry of the circles.

The resulting distances were then used to create a map of the commands with MDS. The hope was that users would exhibit strong individual signatures in these maps, which they did indeed. Figure 14 shows maps for two individuals, one a statistician, the other an administrative assistant. Both maps show in the upper left a chain of commands that originates from the startup of a session, when among other things the window manager ("4Dwm") is being launched. Otherwise the two maps are drastically different: The statistician on the left shows a cluster in the center arising from programming activity, and a cluster in the lower right arising from data manipulation and use of XGvis. The administrative assistant shows some e-mail activity near the startup sequence and word processing work in the right half of the map. These two individuals are somewhat extreme in their differences, but maps of more homogeneous groups of users carry strong individual signatures also. Differentiating features of the signatures have been identified and exploited for discrimination among users by Theus and Schonlau (1998).
4.2 Dimension Reduction
In order to use MDS for dimension reduction, one computes a distance matrix from a high-dimensional multivariate dataset and subjects the matrix to MDS in lower dimensions, such as two or three. The potential advantage of MDS over other methods such as principal components is its nonlinearity and consequent greater flexibility. The disadvantage of MDS is the lack of a mapping between data space and configuration space: MDS maps only the observations, not the embedding space. By comparison, principal components analysis produces projections as mappings. In what follows we give two examples of dimension reduction, one for artificial data, one for real data.
4.2.1 An Artificial Example: 4 Circles in 8-Space
If high-dimensional data are truly high-dimensional, that is, they do not lie on a lower-dimensional manifold, then any dimension reduction technique has to decide how to flatten the data. In the following example, there exists exactly one location in data space where the data are truly high-dimensional, while the data are intrinsically 1- or 2-dimensional everywhere else. A non-linear effect of MDS will be observed in the 8-dimensional location.

The data in the example consist of points on four circles in 8-space, 50 points per circle. The circles lie in four 2-planes that are mutually orthogonal to each other, which is possible in 8-space, but not in any lower-dimensional space. The circles intersect at the origin: This is where the data are at least 4-dimensional (spanned by four tangent vectors) if not 8-dimensional (spanned by four planes). Figure 15 shows the consequences of flattening these data to two and three dimensions with MDS: The reduction distorts the circles but preserves their identical sizes and shapes and gives an indication of their symmetric position. At the origin, the four circles are distorted into cusps, a non-linear effect of MDS. By contrast, the third plot in Figure 15 shows a random 2D projection of the circles in 8-space: The circles are mapped to the obvious smooth ellipses, but their symmetric positions and identical sizes and shapes are not preserved.
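A hedged sketch of one way to generate such data (the paper does not spell out radii or parameterization; here each unit circle is centered so that all four pass through the origin):

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 50, endpoint=False)
circles = []
for c in range(4):                         # circle c lives in the plane (e_2c, e_2c+1)
    X = np.zeros((50, 8))
    X[:, 2 * c] = 1.0 - np.cos(t)          # shifted so the circle passes through 0
    X[:, 2 * c + 1] = np.sin(t)
    circles.append(X)
data = np.vstack(circles)                  # 200 points in R^8

# Pairwise Euclidean distances, ready to be scaled into k = 2 or 3 dimensions:
D = np.sqrt(((data[:, None, :] - data[None, :, :]) ** 2).sum(-1))
```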
4.2.2 Marketing Data
This example is from a market segmentation study at AT&T. Analysts constructed four market segments by applying k-means clustering to roughly 2,000 households characterized by eight variables. The data do not fall into natural clusters, and the k-means algorithm simply slices the data like a pie. The four k-means centers lie very near the plane spanned by the first two principal components, resulting in the coherent Voronoi regions visible in the right hand plot of Figure 16.

We compared the projection onto the first two principal components with the MDS reduction to two dimensions. To this end, we used XGvis basically in batch mode by starting up the optimization and walking away for several minutes: Applying MDS to 2,000 objects is a computationally expensive task, although quite feasible given some patience. The result is shown in the left plot in Figure 16: The point scatter is very similar to that of the principal component projection, although more rounded on the periphery and less crowded in the center. The stress value was a disappointing 0.254, indicating substantial residual structure. We interpret the peripheral rounding of the configuration and the low density in its center as an approximation to the null configuration of MDS in 2D (see Section 3.3.1). In light of this likely artifact, we prefer the principal component projection over the MDS configuration in this case. In general, our impression is that dimension reduction is not an optimal use of MDS.
Figure 16: Marketing Segmentation Data. Left: MDS reduction to 2D; right: largest two principal components. The glyphs represent four market segments constructed with k-means clustering using four means. MDS and principal components yield similar dimension reductions. The MDS reduction shows mild symptoms of null behavior, though, with less mass in the center and rounding of the periphery.
Figure 17: The Graphs of the Permutation Groups S4 and S5. The groups are turned into graphs by placing edges between permutations that differ only by an adjacent transposition. The two left hand plots show optimal configurations of S4 in 2D and 3D; the two right hand plots show the same for S5 in 3D and 4D (really: 2D projections thereof). The graph of S4 scales to a truncated octahedron in 3D. The graph of S5 is most naturally embedded in 4D.
4.3 Graph Layout
The XGvis software was originally motivated by the problem of laying out graphs in more than two dimensions and using XGobi as a display device (Littman et al. 1992). The use of more than two dimensions is an obvious way of avoiding edge crossings, the bane of graph layout in two dimensions. Strictly speaking, edge crossings are not avoided by laying out in 3D: the drawing plane is still 2D, but the 2D display of a moving layout in 3D creates the illusion of avoiding edge crossings. Another way of looking at 3D layout is the following: Rotating a 3D layout and projecting
Figure 18: Telephone Call Graphs. The nodes represent telephone numbers, the edges represent the existence of a call between two telephone numbers in a given time period. The two rows present three views each of two call graphs. The left column shows layouts in 2D; the middle and right hand columns show projections of layouts in 3D. For static rendering in a print medium such as this journal, the 2D layouts look more orderly. For realtime viewing with data rotations, however, the layouts in 3D are vastly superior because overplotting is a much smaller problem in 3D.

it to 2D amounts to moving among a 2-parameter family of 2D layouts. Rotation then allows users to search among the 2-parameter family of 2D layouts and find the one that best reveals local detail of interest.

In order to lay out a connected graph with MDS, one computes a distance matrix from the graph. The simplest is the so-called "shortest-path metric": For each pair of nodes, define their distance to be the length of the shortest path of adjacent edges between the two nodes. This defines distance values that satisfy the triangle inequality. These distances can be submitted to MDS, resulting in configurations that can serve as layouts. One will of course show the edges as lines connecting the configuration points. XGvis accepts graphs as input data, represented as a list of pairs of object numbers. XGvis then computes the shortest-path metric at start-up (a sketch of this computation is given below).

In the following sections we show layouts of three types of graphs: 1) mathematically defined permutation graphs, 2) graphs that arise in the analysis of telephone calling patterns, and 3) graphs that arise from a molecule reconstruction problem in chemistry. XGvis does not address very large graphs. The largest problem we have successfully scaled
Figure 19: A Larger Telephone Call Graph with 3648 Nodes, Scaled into 3D. The interest in this graph stems from an investigation into chat lines with fraudulent 800 numbers. Left: full MDS with the 3648×3648 matrix; right: anchored MDS with 100 anchors.

with XGvis had 3648 objects (see below). Full MDS took hours, but anchored MDS made the optimization considerably more efficient. For a system that can lay out large graphs in 2D, see G. Wills' "NicheWorks" (http://www.bell-labs.com/user/gwills/NICHEguide/niche.html).
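Here is a minimal sketch of the shortest-path metric via breadth-first search from every node. This is the standard computation, though not necessarily XGvis's exact code; the edge-list format mirrors the pairs-of-object-numbers input described above.

```python
import numpy as np
from collections import deque

def shortest_path_metric(edges, N):
    """All-pairs shortest-path distances (in edge counts) of a connected
    graph given as a list of node pairs."""
    adj = [[] for _ in range(N)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    D = np.full((N, N), np.inf)
    for s in range(N):                     # one breadth-first search per node
        D[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if np.isinf(D[s, v]):
                    D[s, v] = D[s, u] + 1
                    queue.append(v)
    return D

# A 4-cycle: opposite corners end up at distance 2.
print(shortest_path_metric([(0, 1), (1, 2), (2, 3), (3, 0)], N=4))
```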
4.3.1 Permutation Graphs
Visual representations of permutation graphs have found powerful uses in the analysis of ranking data, as the work of G. Thompson (1991, 1993) shows. We will not go into the analysis of such data; we are only concerned with the more basic question of how to generate natural visual representations of permutation graphs. To this end, we follow the practice in discrete mathematics of defining a graph structure on permutations as follows: Two permutations are linked by an edge iff they differ only by one adjacent transposition. For example, 1234 has an edge with 1243, but not with 1432 and neither with 2143. MDS configurations are shown in Figure 17 for permutations of n = 4 and n = 5 objects. Not unexpectedly, the configurations are extremely symmetric. As the number n of objects grows, n = 4 and n = 5 are about the only manageable cases, with N = 24 and N = 120 nodes, respectively. For n = 6 already, the graph size of N = 720 is not manageable, more for reasons of visual overload than computational limitations.
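A hedged sketch of generating this graph (our helper, not part of XGvis), producing the edge-list format that XGvis accepts:

```python
from itertools import permutations

def permutation_graph(n):
    """Nodes: all permutations of 1..n; edges: pairs differing by exactly
    one adjacent transposition (1234 -- 1243, but not 1234 -- 1432)."""
    nodes = list(permutations(range(1, n + 1)))
    index = {p: i for i, p in enumerate(nodes)}
    edges = []
    for p in nodes:
        for k in range(n - 1):
            q = list(p)
            q[k], q[k + 1] = q[k + 1], q[k]   # swap one adjacent pair
            q = tuple(q)
            if index[p] < index[q]:           # record each edge once
                edges.append((index[p], index[q]))
    return nodes, edges

nodes, edges = permutation_graph(4)
print(len(nodes), len(edges))   # 24 nodes and 36 edges for S4
```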
4.3.2 Telephone Call Graphs
Social networks can be interpreted as graphs and visualized accordingly. A particular type of social network is the telephone call graph, in which telephone numbers are nodes and the existence of calls made between two telephone numbers constitutes an edge. Call graphs as
seen by large telephone companies tend to be extremely large; the techniques described in this paper are applicable only to small subgraphs. In the following example, a call graph consisting of tens of millions of nodes was broken up into strongly connected components (Abney et al. 1998). Because we had no good prior intuitions about the structure of call graphs, we examined a few of these components visually in XGvis. Figure 18 shows two examples, one subgraph with 110 nodes, the other with 370 nodes. The two examples are typical of a certain type of subgraph: Both subgraphs are dominated by "dandelions", and both dandelions arise from hotels at their centers. This was a valuable negative insight which led us to refine the definition of the call graph by eliminating uninteresting nodes.

Figure 19 shows a much larger call graph with N = 3648 nodes scaled into 3D. The left hand plot shows the result of full MDS, which took the better part of a night. By comparison, the result of anchored MDS is shown in the right hand plot. The left hand layout shows the nodes with high degrees as fan-like shapes. The right hand layout apparently collapsed the fans with the high-degree nodes at their bases, but it offers more detail in other ways. The part of greatest interest is the left-most structure in the right hand layout, where a community of chat line participants communicates via several most likely fraudulent 800-numbers.
4.3.3 A Network of Collaborations
Social networks can be defined in many other ways besides phone call interactions. We were introduced to an interesting example by Will Hill and Loren Terveen (1998): They constructed a human network from collaborations of the participants of the yearly ACM SIGCHI (Computer-Human Interaction) conference. They defined a weighted graph on 653 participants by linking pairs according to the number of collaborations between 1993 and 1997. This graph had one large connected component of size 416 and a number of very small components. From the weighted subgraph of the large component, we constructed a dissimilarity matrix of size 416 for viewing in XGvis. The dissimilarity values were derived from the reciprocals of the weights. Figure 20 shows a fully labeled configuration in 2D after a square root transformation of the dissimilarities.

Clearly visible are two clusters whose labels are difficult to discern due to overplotting. Ironically, the dense areas that are of greatest interest suffer the most from crowded ink. With a little bit of technology, however, this problem is readily done away with: Drilling down into crowded areas is trivial in data visualization systems with interactive zoom and pan. Alternatively, one can posterize the figure and produce a large map with complete information and zero overplotting. With one such poster, Will Hill was quickly able to analyze the topography of CHI collaborations by labeling various clusters and connected areas in Figure 20: "the west coast," "the 'agents' group," "the former Bellcore folks," ...
4.3.4 Geometric Embedding of Nanotubes in 3D
The following is an application of MDS to the geometric construction of so-called nanotubes. Nanotubes are large carbon molecules in which the carbon atoms are arranged so as to form
Figure 20: A 2D Configuration of Will Hill's Graph of CHI Collaboration. The plot shows the 416 CHI participants in the dominant connected component of the graph. Lines indicate direct collaboration between 1993 and 1997. The edges were weighted by the number of collaborations.

capped tubes. The electrical and physical properties of these tubes may spur promising new technologies. [An account for the lay reader appeared in the New York Times, February 17, 1998, "Science Times" p. C1,5, complete with a photo of a wire model.] The geometry of nanotubes was described by D. Asimov (1998). His theory permits the construction of all
Figure 21: Nanotube Embedding. One of Asimov's graphs for a nanotube is rendered with MDS in 3D. The nodes represent carbon atoms, the lines represent chemical bonds. Nanotubes are composed mostly of hexagons, with the exception of six pentagons in a cap. Some of these pentagons are visible in the bottom right plot. A nanotube consists of six stripes of hexagons, winding their way down the tube; one such stripe is highlighted with large dots.

possible cappings of nanotubes. The results of his constructions are graphs in which the nodes correspond to carbon atoms and the edges correspond to chemical bonds. Given a graph representing the intrinsic structure of a nanotube, one is left with the task of embedding the graph in 3-space in order to obtain a full geometric representation of a nanotube. Such an embedding can be achieved with MDS. We generated according to Asimov's theory a graph with 222 nodes and 549 edges, resulting in 49,062 pairwise distances and representing a half-tube with a cap on one end and an opening on the other end. Subjecting the graph to MDS in XGvis produced the 3D embedding shown in Figure 21. Clearly visible is the hexagonal tiling structure of the graph. The notable exception to the hexagonal structure is in the cap, where exactly six pentagons can be found; they permit the necessary bending of the cap. These pentagons are visible in the bottom row of Figure 21.

We were surprised by the good quality of the nanotube embedding in view of the fact that the shortest-path metric derived from a graph is not Euclidean. The deviation from the Euclidean ideal turned out to be minor: The stress of the final embedding was a low 0.06. An obvious minor imperfection is the slight widening of the uncapped end of the half-tube.
5 Conclusions

The purpose of this work was threefold: to describe a system for interactive multidimensional scaling, to outline an extensive methodology for creating, visualizing and diagnosing MDS configurations, and to give a selected tour of modern MDS applications. We hope to have shown that MDS as a data visualization technique is tailor-made for contemporary interactive computing environments and of great use in some important and attractive data problems. The XGvis/XGobi software is freely available from the following web page and ftp site:

http://www.research.att.com/areas/stat/xgobi/
ftp://ftp.research.att.com/dist/xgobi/

The system runs under the X Window System™ and has been used on several types of Unix systems, on Linux, and on X emulators for PCs running under Windows. Freely available X emulators can be found with a web search (we thank Brian Ripley for this hint).
Acknowledgments

We wish to thank Jon Kettenring and Daryl Pregibon, the two managers who made this work possible.
Selected Literature

Multidimensional scaling is the subject of some recent books: Borg and Groenen (1997), Cox and Cox (1994), Davison (1992), as well as some older ones: Borg and Lingoes (1987), Schiffman (1981), Kruskal and Wish (1978). The collection edited by Davies and Coxon (1982) contains some of the seminal articles in the field, including Kruskal's (1964a,b). An older book chapter we still recommend is Greenacre and Underhill (1982). Many books on multivariate analysis include chapters on multidimensional scaling, such as Gnanadesikan (1997) and Seber (1984). For a variant of MDS that is slightly different from Kruskal's, see the work of the "Dutch school" of de Leeuw, Heiser and collaborators, such as Groenen's thesis (1993), and de Leeuw's web page, http://www.stat.ucla.edu/~deleeuw/work/research.html.
References

[1] Abney, S., Volinsky, C., and Buja, A. (1998), "Mining Large Call Graphs," AT&T Labs Technical Memorandum.

[2] Asimov, D. (1998), "Geometry of Capped Nanocylinders," AT&T Labs Technical Memorandum.

[3] Becker, R. A., and Cleveland, W. S. (1987), "Brushing Scatterplots," Technometrics 29, pp. 127-142; reprinted in Cleveland and McGill, eds. (1988), Dynamic Graphics for Statistics, Wadsworth, Inc.: Belmont, CA, pp. 201-224.

[4] Borg, I., and Groenen, P. (1997), Modern Multidimensional Scaling: Theory and Applications, Springer Series in Statistics, Springer-Verlag: New York.

[5] Borg, I., and Lingoes, J. (1987), Multidimensional Similarity Structure Analysis, Springer-Verlag: New York.

[6] Buja, A., Cook, D., and Swayne, D. F. (1996), "Interactive High-Dimensional Data Visualization," Journal of Computational and Graphical Statistics 5, pp. 78-99. A companion video tape can be borrowed from the ASA Statistical Graphics Section lending library.

[7] Buja, A., Logan, B. F., Reeds, J. R., and Shepp, L. A. (1994), "Inequalities and Positive-Definite Functions Arising from a Problem in Multidimensional Scaling," The Annals of Statistics 22, pp. 406-438.

[8] Cook, D., Buja, A., Cabrera, J., and Hurley, C. (1995), "Grand Tour and Projection Pursuit," Journal of Computational and Graphical Statistics 2(3), pp. 225-250.

[9] Cook, D., and Buja, A. (1997), "Manual Controls for High-Dimensional Data Projections," Journal of Computational and Graphical Statistics 6(4), pp. 464-480.

[10] Cox, T. F., and Cox, M. A. A. (1994), Multidimensional Scaling, Monographs on Statistics and Applied Probability, London: Chapman & Hall.

[11] Davison, M. L. (1992), Multidimensional Scaling, Melbourne, Florida: Krieger Publishing Company.

[12] Davies, P. M., and Coxon, A. P. M. (eds.) (1982), Key Texts in Multidimensional Scaling, Exeter, New Hampshire: Heinemann Educational Books Inc.

[13] de Leeuw, J., and Stoop, I. (1984), "Upper Bounds for Kruskal's Stress," Psychometrika 49(3), pp. 391-402.

[14] Di Battista, G., Eades, P., Tamassia, R., and Tollis, I. (1994), "Algorithms for Drawing Graphs: An Annotated Bibliography," Computational Geometry 4, pp. 235-282.
[15] DuMouchel, W., and Schonlau, M. (1998), "A Comparison of Test Statistics for Computer Intrusion Detection Based on Principal Components Regression of Transition Probabilities," Proceedings of the 30th Symposium on the Interface: Computing Science and Statistics (to appear), Fairfax Station, VA: Interface Foundation of North America, Inc.

[16] Gnanadesikan, R. (1997), Methods for Statistical Data Analysis of Multivariate Observations, Wiley Series in Probability and Statistics, New York: John Wiley & Sons.

[17] Graef, J., and Spence, I. (1979), "Using Distance Information in the Design of Large Multidimensional Scaling Experiments," Psychological Bulletin 86, pp. 60-66.

[18] Greenacre, M. J., and Underhill, L. G. (1982), "Scaling a Data Matrix in a Low-Dimensional Euclidean Space," in Topics in Applied Multivariate Analysis, Hawkins, D. M., ed., Cambridge, UK: Cambridge University Press, pp. 183-268.

[19] Groenen, P. J. F. (1993), The Majorization Approach to Multidimensional Scaling: Some Problems and Extensions, Leiden: DSWO Press.

[20] Hill, W. C., and Terveen, L. G. (1998), forthcoming AT&T Labs Technical Memorandum.

[21] Kruskal, J. B., and Wish, M. (1978), Multidimensional Scaling, Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-011, Beverly Hills and London: Sage Publications.

[22] Kruskal, J. B. (1964a), "Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric Hypothesis," Psychometrika 29, pp. 1-27.

[23] Kruskal, J. B. (1964b), "Nonmetric Multidimensional Scaling: A Numerical Method," Psychometrika 29, pp. 115-129.

[24] Kruskal, J. B., and Seery, J. B. (1980), "Designing Network Diagrams," Proceedings of the First General Conference on Social Graphics, July 1980, US Dept of the Census, Washington, DC, pp. 22-50.

[25] Littman, M., Swayne, D. F., Dean, N., and Buja, A. (1992), "Visualizing the Embedding of Objects in Euclidean Space," Computing Science and Statistics: Proceedings of the 24th Symposium on the Interface, Fairfax Station, VA: Interface Foundation of North America, Inc., pp. 208-217.

[26] Schiffman, S. (1981), Introduction to Multidimensional Scaling: Theory, Methods, and Applications, San Diego and London: Academic Press.

[27] Seber, G. A. F. (1984), Multivariate Observations, New York: John Wiley & Sons.

[28] Shepard, R. N. (1962), "The Analysis of Proximities: Multidimensional Scaling with an Unknown Distance Function," I and II, Psychometrika 27, pp. 125-139 and 219-246.
[29] Swayne, D. F., Cook, D., and Buja, A. (1998), "XGobi: Interactive Data Visualization in the X Window System," Journal of Computational and Graphical Statistics 7(1), pp. 113-130.

[30] Swayne, D. F., and Buja, A. (1998), "Missing Data in Interactive High-Dimensional Data Visualization," Computational Statistics 13(1), pp. 15-26.

[31] Theus, M., and Schonlau, M. (1998), "Intrusion Detection Based on Structural Zeroes," Statistical Computing & Graphics Newsletter 9(1), pp. 12-17, Alexandria, VA: American Statistical Association.

[32] Thompson, G. (1991), "Exploratory Graphical Techniques for Ranked Data," Proceedings of the 23rd Symposium on the Interface, Fairfax Station, VA: Interface Foundation of North America, Inc.

[33] Thompson, G. (1993), "Generalized Permutation Polytopes and Exploratory Graphical Methods for Ranked Data," The Annals of Statistics 21(3), pp. 1401-1430.