Web-based Statistical Graphics using XML Technologies

Yoshiro Yamamoto (Tokai University), Masaya Iizuka (Okayama University), and Tomokazu Fujino (Fukuoka Women’s University)

1 Introduction

In recent years, XML-based vector graphics formats such as Scalable Vector Graphics (SVG [4]) and Extensible 3D (X3D [5]) have come into widespread use. For statistical graphs on the Web, vector graphics offer many advantages over traditional raster graphics. Furthermore, these XML formats will play an important role in the future of the Semantic Web [6]. Here, we explain the fundamental aspects of XML-based vector graphics formats, propose a method of applying XML graphics to the field of statistics, and discuss the effects of doing so. In the remainder of this section, we outline the relationship among statistical graphs, vector graphics and XML. In Section 2, we summarize the general characteristics of XML graphics. In Sections 3 and 4, we briefly introduce the specifications of SVG and X3D using some examples. In Section 5, we propose methods for applying these formats in various statistical environments.

1.1 Vector and Raster graphics

A computer can display graphics in two ways: as raster graphics or as vector graphics. Raster graphics are expressed as an enumeration of points (dots) and their colors, and therefore do not retain information about the objects pictured in the image. As a result, outlines appear jagged when zooming in on a raster image, and information is lost when zooming out. Raster graphics are therefore not well suited to zooming or transformation.

[4] http://www.w3.org/Graphics/SVG/
[5] http://www.web3d.org/x3d/
[6] http://www.w3.org/2001/sw/


On the other hand, vector graphics hold drawing information such as the position, size and style of each shape. Software can therefore redraw the image from this information whenever it is enlarged, reduced or transformed, which prevents any deterioration in image quality. In short, the main advantage of vector graphics is that they combine a high degree of freedom with high image quality.

Fig. 1. Vector and raster graphics

1.2 Web, Statistics and Statistical Graphics

Although the Internet developed as a communication tool, its core function is now access to Web pages, and this has brought about a wide range of changes in statistics. In addition to the publication of official statistical data and the results of company research, general Internet users have begun to make data available through their own Web sites. Huge databases that anyone can access via the Internet are making information available to the public. As a result, statistical databases (see Boyens et al., 2004) have been developed to store such data, and statistical analysis methods such as data mining (see Wilhelm, 2004) have been developed to help exploit it. Moreover, new target areas of statistical analysis, such as network intrusion detection (see Marchette, 2004), have emerged.

The popularization of the Web has also brought about big changes in the statistical analysis environment and in statistics education. Early in the history of the Internet, textbooks and data were made public by individuals, so sites that gathered statistical information, such as StatLib [7], played a very important role. At present, various services and applications that exploit multimedia and multi-platform characteristics are available for statistical analysis and statistics education. For statistical analysis systems, it is important to implement client-server systems in order to save resources; Xplore [8] (Härdle et al., 2000) and Jasp [9] have this feature. Moreover, server-type commercial software, such as SPSS and S-PLUS, is available. For statistics education, numerous data sets, tutorials and analysis tools have been made available, for example by the UCLA Department of Statistics [10] and the Web Interface for Statistics Education (WISE [11]) program at Claremont University. MD*Base [12] and DASL [13] are databases for case studies. The EMILeA Stat [14] (e-stat) project and the @d project [15] (Mori et al., 2005) enable analysis on the Web using statistical engines. In addition, e-learning systems, such as New Statistics [16], use multimedia teaching materials that include video and interactive applications.

These types of content require statistical graphics in order to visualize statistical data. Early in the history of Web publication, static (non-interactive) raster formats such as JPEG or GIF were used. Java later made it possible to implement interactive and dynamic graphics on the Web, but such graphics did not become sufficiently popular. With the spread of Flash, interactive features on the Web became more common, and the demand for interactive and dynamic graphics using Web technology is rising. Web-based systems sometimes need to create graphics in response to user requests. In such cases, graphs and related information cannot be prepared beforehand. A feature that displays detailed information on request is also necessary. Statistical graphics on the Web therefore require interactive features.

[7] http://lib.stat.cmu.edu/
1.3 XML and statistical graphics

HTML can provide functions on Web pages by cooperating with other technologies, such as CGI or JavaScript, by linking to other pages, and by arranging information well. However, it is difficult to automate the reuse of information described in HTML, because the accompanying information consists only of tags that control how the information is displayed on the Web page.

[8] http://www.xplore-stat.de/
[9] http://jasp.ism.ac.jp/index-e.html
[10] http://www.stat.ucla.edu/
[11] http://wise.cgu.edu/
[12] http://www.quantlet.org/mdbase/
[13] http://lib.stat.cmu.edu/DASL/
[14] http://www.emilea.de/
[15] http://mo161.soci.ous.ac.jp/@d/
[16] http://www.fernuni-hagen.de/newstatistics/


This causes the body of data on the Web to become enormous. Therefore, the concept of the Semantic Web was devised in order to allow information to be used efficiently and effectively. The Semantic Web uses metadata that accompany all Web content to convey the meaning (semantics) of information to the computer, so that information devices can interpret and exchange it without mediation by a human operator. The basic technologies for realizing the Semantic Web are XML and the allied technologies standardized by the W3C [17]. Just like HTML, XML uses tags to assign meaning to information. Because the grammar of XML tags is easy for a computer to process while still expressing various types of data flexibly, data types can be defined freely. At present, XML-based standards are being developed for various kinds of data in order to realize the Semantic Web. StatDataML [18] and DandD [19] have been developed for statistical data, and GML [20] has been developed for geographical information. Adding XML graphics, such as SVG and X3D, to these standards makes it possible to construct a statistics environment within the Semantic Web framework.

[17] http://www.w3.org/
[18] http://www.omegahat.org/StatDataML/
[19] http://www.stat.math.keio.ac.jp/DandDIII/index.html
[20] http://opengis.net/gml/

2 XML-based graphics format

In this section, we describe the characteristics of XML-based graphics formats.

• XML: Because SVG and X3D are both XML based, they can be transformed to and from other XML documents using a language such as XSLT. For example, if the data are described in XML, then once an XSLT document has been written for a given type of statistical graph, the graph corresponding to the data can be displayed without any further work, even if the data change. In addition, content related to statistics, including statistical graphs, can be offered as a Web service using the core technologies for constructing the Semantic Web, such as SOAP, WSDL and UDDI.

• Text file: While conventional raster formats such as JPEG, GIF, PNG and BMP are binary, XML graphics are text files. Thus, even without a special tool, we can inspect or modify the contents by opening the file in a general-purpose text editor, and we can reuse the contents easily. In addition, systems can be developed more flexibly, because graphics can be produced simply by writing a text file, no matter what programming language is used. Moreover, if the graphics (e.g. a statistical graph, map or CAD drawing) are closely related to an outside resource, the relationship between each element of the graphics and the outside resource can be recorded within the graphics themselves. Raster graphics, on the other hand, are binary images that carry no information about individual figure elements, so it is difficult to link them to outside resources. Moreover, generating raster graphics requires a library for the programming language in use, and the code for producing the graphics differs from language to language.

• Vector graphics: As described in Section 1, vector graphics suffer no deterioration in image quality when an image is enlarged, reduced or transformed. In addition, the file size depends on the amount of information in the diagram and is unrelated to the displayed size of the image.

• Implementation of interactive functions and animation: Most software and plug-ins that display XML graphics provide built-in functions such as zooming and panning. Separately, new interactive functions can be added to XML graphics using JavaScript. In addition, SVG allows the attributes of an SVG element to change over time (animation) using the Synchronized Multimedia Integration Language (SMIL [21]), which can be embedded in SVG. Examples of interactive functions implemented in JavaScript are presented later.

At present, Macromedia Flash [22] is one of the most popular vector graphics formats on the Web. XML graphics differ from Flash in that they are an open standard and are text based. Nevertheless, it is desirable to use the format most appropriate for the application being developed.
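To illustrate the "text file" point above, the following is a minimal sketch of how a statistical graph might be produced as plain text. The function name barChartSvg and the canvas size are our own assumptions for illustration; any language that can write strings could do the same.

```javascript
// Hypothetical helper (not from the paper): builds an SVG bar chart
// as a plain string from an array of non-negative numbers.
function barChartSvg(values, width, height) {
  const max = Math.max(...values);
  const barWidth = width / values.length;
  const bars = values.map((v, i) => {
    const h = max === 0 ? 0 : (v / max) * height; // bar height in user units
    const x = i * barWidth;                       // bars are laid out left to right
    const y = height - h;                         // SVG y grows downward
    return `  <rect x="${x}" y="${y}" width="${barWidth}" height="${h}"/>`;
  });
  return [
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">`,
    ...bars,
    "</svg>",
  ].join("\n");
}

// The result is ordinary text that can be saved as a .svg file.
console.log(barChartSvg([1, 2, 4], 300, 100));
```

Because the output is only text, its size grows with the number of bars and not with the width and height of the canvas.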

3 SVG

3.1 Outline of SVG

SVG is an XML format for describing two-dimensional vector graphics. Before SVG, the Microsoft-led Vector Markup Language (VML) and the Adobe-led Precision Graphics Markup Language (PGML) were proposed to the W3C. SVG 1.0 was released by the W3C in September 2001 as an integrated format. The current version, SVG 1.1, became a Recommendation in January 2003.

[21] http://www.w3.org/AudioVideo/
[22] http://www.macromedia.com/flash/


SVG viewers

Like HTML, SVG consists of a plain text file. Thus, dedicated software is required to display SVG as a graphic. The functions mentioned above, such as zooming, are implemented in the viewer. Viewers can be classified into the following three categories:

• Browser plug-in: The browser plug-in is the most common form of SVG display environment. Adobe SVG Viewer 3.0 (ASV3 [23]), which supports SVG 1.0, is the de facto standard for SVG browser plug-ins. Currently, ASV6, which supports the next version of SVG (SVG 1.2), is in beta testing and is set to be released after the SVG 1.2 Recommendation is announced.
• Web browser with native rendering of SVG: SVG-enabled builds of Firefox and Mozilla (the official binary packages do not support native rendering) and Opera 8.0 render SVG natively.
• Stand-alone application: Batik [24] is a Java-based toolkit for SVG. One application of Batik is Batik-Squiggle, which is a full-fledged SVG browser.

We briefly illustrate the SVG language specification through simple examples. For details, see the W3C Web site (http://www.w3.org/Graphics/SVG/). Note that all SVG examples were rendered using Internet Explorer 6.0 + ASV3.0.

3.2 Basic structure

We first present a simple example of an SVG document:

<?xml version="1.0"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
<svg width="..." height="...">
  <title>Basic structure</title>
  <desc>Example of basic structure</desc>
</svg>

The first line is the XML declaration, and the second line is the Document Type Definition (DTD) declaration. The root element of SVG is the <svg> element, which has width and height attributes that set the canvas size. When the unit of these attributes is omitted, pixels are assumed. The <title> and <desc> elements provide a title and detailed information about the SVG document, respectively. The text content of the <title> element is shown in the title bar of the window.

[23] http://www.adobe.com/svg/
[24] http://xml.apache.org/batik/
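As a sketch of the basic structure described above, the following function assembles such a document as a string. The function name minimalSvgDocument, the added xmlns attribute and the 400 × 300 canvas are assumptions made for illustration and are not taken from the original example. No escaping of special characters is attempted.

```javascript
// Hypothetical helper: returns a minimal SVG 1.0 document as text.
// Line 1 is the XML declaration, line 2 the DTD declaration.
function minimalSvgDocument(title, desc, width, height) {
  return [
    '<?xml version="1.0"?>',
    '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" ' +
      '"http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">',
    // width and height without units are interpreted as pixels
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">`,
    `  <title>${title}</title>`,
    `  <desc>${desc}</desc>`,
    "</svg>",
  ].join("\n");
}

// Canvas size chosen arbitrarily for the example.
console.log(minimalSvgDocument("Basic structure", "Example of basic structure", 400, 300));
```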


Fig. 2. Viewport and viewbox

Coordinate System

SVG operates on an infinite plane whose coordinate system is oriented such that the positive X-axis points to the right and the positive Y-axis points downward from the origin. In many statistical graphs, however, the positive Y-axis points upward. Developers must take this into account when coding a statistical graph in SVG in order to display it correctly.

A viewport in SVG is the physical area of the screen in which graphics elements are displayed; its size is set by the width and height attributes of the svg element. When no units are specified, pixels are assumed. Other units, including cm and pt, can also be used, but no units are specified in the examples presented here. When only the width and height attributes are given, the region of size width × height starting at the origin of the coordinate plane is mapped onto a viewport of the same size. To display an arbitrary region of the coordinate plane (upper left corner: (originX, originY), size: width × height) in the viewport, the viewBox attribute is specified as follows:

viewBox="originX originY width height"
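The Y-axis issue and the viewBox syntax can be sketched as follows. The helper names toSvgPoint and viewBoxAttr, and the shape of the range and size arguments, are our own assumptions; the sketch maps a data point whose Y-axis points upward onto SVG coordinates whose Y-axis points downward.

```javascript
// Hypothetical helper: converts data coordinates (y upward) into
// SVG user coordinates (y downward) for a plot area of a given size.
function toSvgPoint(x, y, range, size) {
  const sx = ((x - range.xmin) / (range.xmax - range.xmin)) * size.width;
  // Flip the vertical direction: the largest y maps to the top (0).
  const sy = size.height - ((y - range.ymin) / (range.ymax - range.ymin)) * size.height;
  return { x: sx, y: sy };
}

// Hypothetical helper: formats the viewBox attribute described in the text.
function viewBoxAttr(originX, originY, width, height) {
  return `viewBox="${originX} ${originY} ${width} ${height}"`;
}

const range = { xmin: 0, xmax: 10, ymin: 0, ymax: 10 };
const size = { width: 200, height: 100 };
console.log(toSvgPoint(5, 2.5, range, size)); // a point in the lower half of the plot
console.log(viewBoxAttr(50, 50, 100, 100));
```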

When the aspect ratios of the region and the viewport differ, the graphic is scaled to fit the viewport by default, but the behavior can be controlled by specifying the preserveAspectRatio attribute.

Basic shapes

The basic shapes of SVG are illustrated in Fig. 3 along with the corresponding SVG code. The minimum information required to display a shape, such as its position and size, is specified through the attributes of each element. Additional information, such as line color, fill color and line width, is specified in the style attribute in Cascading Style Sheets (CSS) format.

Fig. 3. Basic shapes

Text

Text in SVG is drawn using the <text> element: the x and y attributes determine the position, and the string to display is the text node enclosed by the <text> tags. Fig. 4 shows a simple example of text in SVG, and the source code of the example is as follows:

[Listing: a title, "Example of Text Element", and three text elements, each containing the string "SVG for Statistics"; markup not recoverable.]

In the <text> element, various properties of the text, such as font size, family, weight and color, are specified. A text glyph in SVG consists of an outline and the area inside it, whose properties can be specified separately, as in the third string in the example. The text-anchor property determines which part of the string is aligned with the position specified by the x and y attributes. The three strings in the example use start, middle and end, respectively.

Fig. 4. Text elements

Group

One of the advantages of vector graphics is that they can handle layers. In SVG, layers are realized with the group element <g>: elements grouped by a <g> element are treated as one layer. Using <g> elements, it is possible to turn a specific layer on and off, and to apply styles and interactive functions (described in the next section) collectively to all elements in the layer. Fig. 5 shows a simple example of a scatter plot built with the <g> element. The data points in SVG are described as follows:

[Listing not recoverable.]

Fig. 5. Grouped elements

All data points (circle elements) are collected into one layer by a single <g> element whose style attribute applies to every data point. The file size of an SVG document that contains many elements, such as a scatter plot with many data points, grows when the style attribute is written on each element. It is therefore desirable to keep the file small by using the <g> element.
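The effect of grouping on file size can be sketched as follows. The function names, the radius of 3 and the style string are illustrative assumptions; this is not the listing from the original example.

```javascript
// Hypothetical helper: one <g> carries the style for all data points.
function scatterGrouped(points, style) {
  const circles = points.map((p) => `  <circle cx="${p.x}" cy="${p.y}" r="3"/>`);
  return [`<g id="data" style="${style}">`, ...circles, "</g>"].join("\n");
}

// Hypothetical helper: the same points, with the style repeated on each circle.
function scatterPerElement(points, style) {
  return points
    .map((p) => `<circle cx="${p.x}" cy="${p.y}" r="3" style="${style}"/>`)
    .join("\n");
}

// Ten illustrative points on a diagonal.
const pts = Array.from({ length: 10 }, (_, i) => ({ x: 10 * i, y: 5 * i }));
const style = "fill:red;stroke:black";
console.log(scatterGrouped(pts, style).length, scatterPerElement(pts, style).length);
```

The grouped version states the style once, so the saving grows with the number of data points.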


3.3 Implementation of interactive functions by JavaScript

As stated earlier, interactive functions beyond those supported by default in SVG browsers, such as zooming, can be implemented using JavaScript. In some SVG browsers, VBScript is also available. These languages access the document through the Document Object Model (DOM), which defines an API and an object model for manipulating XML documents. Fig. 6 is a simple example of an interactive function that accesses SVG elements through the DOM. The source code is as follows:

[Listing: title "Scripting Sample" and the text "Click Here!"; markup and script not recoverable.]

Each SVG element can have event handlers that run a script in response to events such as keyboard and mouse operations or the loading of the SVG document. In this example, the onclick attribute is the event handler, and the clickHere function is executed when the left mouse button is clicked on the corresponding elements. The argument evt of the clickHere function is an object that refers to the event, and its target property refers to the element on which the event occurred. The most frequently used event handlers in SVG are mouse events such as onmousedown, onmousemove, onmouseout, onmouseover and onmouseup, SVG-specific events such as onload, onresize and onzoom, and keyboard events such as onkeydown, onkeypress and onkeyup.

Tooltip

We next illustrate how to display information about a data point in a scatter plot, such as its coordinates and label, in a tooltip next to the mouse pointer when the pointer is placed over the point (Fig. 7). This technique can also be applied to show detailed information about an area on a map. The essential parts of the source code are as follows:

[Listing not recoverable.]
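As a minimal sketch of the tooltip logic discussed in this subsection (not the authors' listing), the functions below use standard DOM calls. The ids tooltipText and tooltipRect, the pixel offsets, the doc parameter and the fakeDocument stand-in are all assumptions made so that the sketch can run outside a browser; in a viewer, doc would be the SVG document.

```javascript
// Stand-in for the SVG DOM (assumption), just enough for this sketch.
function fakeElement(id) {
  const attrs = { id: id };
  const styleProps = {};
  return {
    textContent: "",
    getAttribute: (name) => attrs[name],
    setAttribute: (name, value) => { attrs[name] = String(value); },
    style: {
      setProperty: (name, value) => { styleProps[name] = value; },
      getPropertyValue: (name) => styleProps[name] || "",
    },
  };
}
function fakeDocument(ids) {
  const elements = {};
  ids.forEach((id) => { elements[id] = fakeElement(id); });
  return { getElementById: (id) => elements[id] };
}

// Show the tooltip next to the pointer, labelled with the id of the data point.
function showTooltip(evt, doc) {
  const text = doc.getElementById("tooltipText"); // hypothetical id
  const rect = doc.getElementById("tooltipRect"); // hypothetical id
  text.textContent = evt.target.getAttribute("id");
  const x = evt.clientX + 10; // offsets are arbitrary
  const y = evt.clientY + 10;
  rect.setAttribute("x", x);
  rect.setAttribute("y", y);
  text.setAttribute("x", x + 5);
  text.setAttribute("y", y + 15);
  doc.getElementById("tooltip").style.setProperty("visibility", "visible");
}

// Hide the tooltip when the pointer leaves the data point.
function hideTooltip(doc) {
  doc.getElementById("tooltip").style.setProperty("visibility", "hidden");
}

// Counter-scale the tooltip so that zooming does not magnify it.
function zoomControl(doc, currentScale) {
  doc.getElementById("tooltip").setAttribute("transform", `scale(${1 / currentScale})`);
}

const demo = fakeDocument(["tooltip", "tooltipText", "tooltipRect", "p1"]);
showTooltip({ target: demo.getElementById("p1"), clientX: 100, clientY: 50 }, demo);
console.log(demo.getElementById("tooltipText").textContent);
```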

Only two data points are described in the listing, and the drawing of axes and labels is omitted. The script defines three functions: ShowTooltip, HideTooltip and ZoomControl.

The ShowTooltip function is called from the onmousemove attribute of the <g> element whose id attribute is data. When the mouse cursor moves over the element, the script given in the onmousemove attribute is executed. When the attribute is described on a <g> element, it applies to all child elements of that element. The function first acquires the text and the rectangle prepared beforehand for the tooltip, which are child elements of the group whose id is tooltip. It then acquires the id of the element that generated the event (the data point under the mouse cursor) and updates the tooltip text with it. Next, it acquires the position of the mouse cursor and updates the positions of the text and the rectangle. Finally, the tooltip is displayed by making the hidden tooltip visible (the visibility property in the style attribute of the tooltip is set to visible).

The HideTooltip function is called from the onmouseout attribute of the element. The onmouseout attribute describes the process executed when the mouse cursor leaves the area of the element. This function hides the displayed tooltip (the visibility property in the style attribute of the tooltip is set to hidden).

The ZoomControl function is called from the onzoom attribute of the <svg> element, which describes the process executed when zooming is performed in the SVG viewer. The script adjusts the scale in the transform attribute of the tooltip according to the degree of zoom, so that the tooltip is not magnified excessively when the user zooms in.

Fig. 6. Scripting in SVG

Fig. 7. Tooltip Example

Switching layer

Displaying related information on statistical graphs and maps is an effective aid to visual analysis. However, it can become a disadvantage if too much information is displayed at one time. Visual analysis therefore needs functions that interactively switch information between displayed and hidden. SVG realizes such a function simply with JavaScript. The objects whose display state may change are first gathered into one <g> element, and the visibility property is set in its style attribute. A script called from an event then sets the property to visible or hidden. The user interface for this facility can be built either in SVG or with HTML forms. When built in SVG, the interface is described as graphic elements whose events call the script that changes the display state. However, this requires additional work, and operability may suffer because the interface is enlarged, shrunk or moved together with the graph when the SVG is zoomed or panned. Accordingly, an example using an HTML form is illustrated here.

We consider the percentage of votes received by Republican candidates in presidential elections in the United States from 1856 to 1976. The data were analyzed using the procedure described below, and the result is visualized as a chart. Dissimilarities based on the Euclidean distance are first calculated from the data, and clustering is performed by the k-means method. Principal component analysis is then performed using the dissimilarities, and the individual scores on the first and second principal components are obtained. The visualization is realized by plotting the scores on a plane with a Cartesian coordinate system. Labels are attached to the data points, and ellipses showing the domain of each cluster are drawn. A graph in which the labels and ellipses can be made visible or invisible is realized using SVG and an HTML form. If the form and the SVG were built into one page, the display area of the SVG would be limited and only part of the graph would be visible when magnified. Accordingly, the SVG component (cluster.svg) that displays the graph and the HTML form (control.html) that serves as the interface for changing the visibility of parts of the graph are placed in separate frames (index.html). The control.html form contains check boxes for showing the data point labels and the cluster ellipses, and the function executed when a check box is clicked is specified in its onclick attribute.

[Listing: HTML form with two check boxes labelled "Label on/off" and "Cluster Area on/off"; markup not recoverable.]

In cluster.svg, the elements for the cluster ellipses are described as child elements of the <g> element whose id attribute is clustArea, and the elements for the individual labels are described as child elements of the <g> element whose id attribute is label. The script that switches the display is as follows.

function init() {
  // Expose setVisibility on the enclosing frameset so that control.html can call it.
  parent.parent.setVisibility = setVisibility;
}

function setVisibility(id) {
  // svgDocument is the SVG document object provided by the viewer.
  var sty = svgDocument.getElementById(id).style;
  var curvb = sty.getPropertyValue('visibility');
  if (curvb == 'hidden')
    sty.setProperty('visibility', 'visible');
  else
    sty.setProperty('visibility', 'hidden');
}

When cluster.svg is loaded, the init function is executed first. It attaches the setVisibility function defined after it to the document of index.html, so that the function can be called from control.html. The setVisibility function receives as its argument the id attribute of the check box clicked in control.html and toggles the visibility property of the SVG element with the corresponding id. Fig. 8 shows the screen as each layer is switched.
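The wiring between the frames can be imitated outside a browser. In the sketch below, frameWindow stands for the window of index.html and doc for the SVG document; both, together with the fakeDoc stand-in and the makeSetVisibility name, are assumptions for illustration.

```javascript
// Stand-in for the SVG document (assumption): elements with a style object only.
function fakeDoc(ids) {
  const elements = {};
  ids.forEach((id) => {
    const props = {};
    elements[id] = {
      style: {
        getPropertyValue: (name) => props[name] || "",
        setProperty: (name, value) => { props[name] = value; },
      },
    };
  });
  return { getElementById: (id) => elements[id] };
}

// Returns a toggle bound to one document; mirrors the visible/hidden switch above.
function makeSetVisibility(doc) {
  return function setVisibility(id) {
    const sty = doc.getElementById(id).style;
    const next = sty.getPropertyValue("visibility") === "hidden" ? "visible" : "hidden";
    sty.setProperty("visibility", next);
    return next;
  };
}

// Corresponds to init(): publish the function where the form can reach it.
function init(frameWindow, doc) {
  frameWindow.setVisibility = makeSetVisibility(doc);
}

const frameWindow = {};
init(frameWindow, fakeDoc(["label", "clustArea"]));
// What a check box handler in control.html would do on each click:
console.log(frameWindow.setVisibility("label"));
console.log(frameWindow.setVisibility("label"));
```

As in the script above, an element whose visibility has not yet been set is hidden by the first click.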


Fig. 8. Example of switching layer

4 X3D

Extensible 3D (X3D) is an open-standard, XML-enabled three-dimensional modeling language that enables real-time communication of 3D data across all applications, including network applications. X3D is neither a programming API nor a simple file format for geometry interchange. Rather, X3D combines both geometry and runtime behavioral descriptions in a single file. X3D is the next revision of the VRML97 ISO specification and has been referred to as VRML-NG (Next Generation). This section introduces some features of X3D; a full description can be found on the Web3D Web site and in Geroimenko and Chen (2004).


4.1 Outline of X3D

Specifications

The first draft of the VRML 1.0 specification was published in 1994. In 1996, the first version of the VRML 2.0 specification was released, and the JTC1/SC24 committee of the International Organization for Standardization (ISO) agreed to publish VRML 2.0 as Committee Draft (CD) 14772. This specification is known as VRML97. X3D, the successor to VRML, has been designated International Standard ISO/IEC 19775 and was published in 2004 by the Web3D Consortium. As of September 2005, there are six X3D international standard specifications, including the X3D encodings, which specify the XML and Classic VRML encodings. The latest specifications are described on the Web3D Web site. The Web3D Consortium organizes working groups to deal with various problems regarding Web3D. Several of them, such as GeoSpatial (X3D GeoSpatial Working Group) and H-Anim (Humanoid Animation), together with a number of market-focused working groups, such as CAD, are researching and proposing solutions to specific technical problems related to X3D.

Component and Profile

Components and profiles are new X3D mechanisms for defining both extensibility and the set of services required by user content. A component defines a specific collection of nodes, and a profile is a collection of components at specific levels of support. X3D allows developers to support subsets of the specification (profiles) composed of modular blocks of functionality (components). A component-based architecture supports the creation of different profiles that can be individually supported. Components can be individually extended or modified by adding new levels, or new components can be added to introduce new features, such as streaming. Through this mechanism, the specification can advance quickly because development in one area does not slow the specification as a whole. The X3D baseline profiles are as follows:

• Interchange is the basic profile for communication between applications. It supports geometry, texturing, basic lighting and animation.
• Interactive enables basic interaction with a 3D environment by adding various sensor nodes for user navigation and interaction (e.g., PlaneSensor, TouchSensor), enhanced timing, and additional lighting (SpotLight, PointLight).
• Immersive enables full 3D graphics and interaction, including audio support, collision, fog and scripting.
• Full includes all defined nodes, including the NURBS, H-Anim and GeoSpatial components.


X3D viewers

X3D requires a viewer, either an X3D browser or a plug-in for a Web browser, to parse and render a 3D world. The world can be moved and rotated using the functions of the viewer. Details of the viewers are provided at http://www.web3d.org/applications/tools/viewers and browsers/.

Octaga Player is the first 3D player for both VRML and X3D. It supports the entire profile of X3D and is freely available for personal, non-commercial use. Octaga Player is a high-performance, standards-compliant 3D player that can run as a stand-alone application or as a plug-in in any Internet browser. In this section, all X3D objects are shown using Octaga Player.

FreeWRL is an open-source X3D/VRML browser and plug-in. Platforms that support FreeWRL include MacOS X, Linux, Unix, IRIX and Java. Other plug-ins for MS Windows include Flux Player, OpenWorlds and Vcom3D Venues.

Xj3D is a project of the Web3D Consortium focused on creating a toolkit for VRML97 and X3D content written completely in Java. A stand-alone viewer is included.

4.2 Basic structure

The basic structure of the X3D format is as follows.

[Listing not recoverable.]

The first line is the XML declaration, and the second and third lines are the DTD declaration. The root element of X3D is the <X3D> tag (called a node in X3D), whose version attribute specifies the X3D version and whose profile attribute specifies the profile. In this example, the X3D version is 3.0 and the profile is the simplest one, Interchange. The X3D world is described within the <Scene> node by arranging various contents. The example places a sphere of radius 4 centered at (3, 0, 1) (Fig. 9).
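A document with the structure just described might look like the output of the following sketch. The function name simpleX3d is our own, and the exact nesting (a Transform around a Shape) is an assumption about how the sphere is positioned, not a reproduction of the original listing.

```javascript
// Hypothetical helper: returns a minimal X3D 3.0 document as text.
// Line 1 is the XML declaration; lines 2 and 3 are the DTD declaration.
function simpleX3d(radius, center) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<!DOCTYPE X3D PUBLIC "ISO//Web3D//DTD X3D 3.0//EN"',
    '  "http://www.web3d.org/specifications/x3d-3.0.dtd">',
    '<X3D version="3.0" profile="Interchange">',
    "  <Scene>",
    // The translation field moves the sphere to its center position.
    `    <Transform translation="${center.join(" ")}">`,
    "      <Shape>",
    `        <Sphere radius="${radius}"/>`,
    "      </Shape>",
    "    </Transform>",
    "  </Scene>",
    "</X3D>",
  ].join("\n");
}

// A sphere of radius 4 centered at (3, 0, 1), as in the text.
console.log(simpleX3d(4, [3, 0, 1]));
```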

Fig. 9. Simple X3D example

Fig. 10. Coordinate axes in X3D

Node and Field

Since X3D is an object-description format, a world and its components consist of 3D, multimedia and interactive objects. An object is described by nested nodes, and the parameters of an object are described as fields. Components can be reused through grouping and through prototypes of existing nodes.

Standard units and coordinate system

X3D defines the unit of measure of the global coordinate system as the meter. All other coordinate systems are generated from transformations based on the global coordinate system. Linear distances are given in meters, angles in radians, and time in seconds. Colors are specified as three real numbers (RGB) between 0 and 1, e.g. '1 0 0' for red. X3D uses a Cartesian, right-handed, three-dimensional coordinate system. By default, the viewer is positioned along the positive Z-axis, looking in the -Z direction with the +Y-axis upward. A modeling transformation (<Transform> and <Billboard>) or a viewing transformation (<Viewpoint>) can be used to alter this default projection.

Basic objects

Objects that make up an X3D world are described within the <Shape> node. The basic nodes for figures are <Box>, <Sphere>, <Cone> and <Cylinder>, and parameters such as size and radius are specified using fields. The material of an object is specified within the <Appearance> node using the <Material> node or a texture node. The color and degree of transparency of an object are specified within the <Material> node using the diffuseColor and transparency fields. The position of an object is specified by nesting it in a <Transform> node and setting the translation field. The <Text> node displays strings given in the string field, with the font type and size set through the fontStyle field. Using the <Billboard> node, the text can be kept facing the viewer. Fig. 10 shows coordinate axes built from <Cylinder> nodes as the axes, <Cone> nodes as the arrowheads, and <Text> nodes as the axis labels. The source code is as follows:

[Listing not recoverable.]

3D scatter plot


Fig. 11. 3D scatter plot


Fig. 12. Bar chart on the city map



By arranging X3D objects such as Sphere or Box at the data coordinates in the same way, a scatter plot is composed. Fig. 11 shows an example of a 3D scatter plot with points colored according to a clustering result. Moreover, two-dimensional planes and lines can be drawn with IndexedFaceSet and IndexedLineSet. Fig. 12 displays a 3D bar chart of ward populations in the city of Sapporo (in northern Japan) plotted on a city map constructed with X3D.

Grouping and prototype
The Group node groups several objects so that all of the components can be moved or copied together. The DEF keyword names an object, which can then be reused with the USE keyword. In the above example of coordinate axes, a cylinder and a cone form an arrow, which is grouped and named "myAxis". The arrow is reused for the other axes with the USE keyword and rotated to point in the appropriate direction. The axis labels are composed of different strings, so we define a prototype of the label with the ProtoDeclare node. By creating instances of the prototype with the ProtoInstance node, individual labels are easy to create.

Interactive function
X3D has various sensors. The touch sensor, for example, generates an event when the mouse is clicked on the object of interest or the pointer approaches it. Such an event can drive an animation in which the color and position of objects change. In the example of Fig. 11, every point has a node whose description field holds the case name and point coordinates. When the mouse cursor is positioned over a point, the case name and the coordinates of the point are shown in the status bar at the bottom right corner of the viewer. Moreover, more elaborate interactive functions can be implemented by using Java or JavaScript (ECMAScript) within the Script node.
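As a rough illustration of how these pieces fit together, the following sketch builds the X3D markup for a small scatter plot as a string: each point becomes a Sphere inside a Shape, colored through the diffuseColor field of Material and positioned through the translation field of Transform. It is not the authors' listing; the function name x3dScatter, the sphere radius and the sample points are assumptions made for this example.

```javascript
// Hypothetical helper (not from the paper): build X3D markup for a scatter plot.
// points: array of {x, y, z, color}; color is an "r g b" string with values in [0, 1].
function x3dScatter(points, radius) {
  var shapes = points.map(function (p) {
    return (
      '    <Transform translation="' + p.x + ' ' + p.y + ' ' + p.z + '">\n' +
      '      <Shape>\n' +
      '        <Appearance><Material diffuseColor="' + p.color + '"/></Appearance>\n' +
      '        <Sphere radius="' + radius + '"/>\n' +
      '      </Shape>\n' +
      '    </Transform>\n'
    );
  });
  return (
    '<X3D profile="Immersive" version="3.0">\n' +
    '  <Scene>\n' +
    shapes.join('') +
    '  </Scene>\n' +
    '</X3D>\n'
  );
}

// Made-up sample: three points, two of them in a "red" cluster and one in a "blue" cluster.
var samplePoints = [
  { x: 0.1, y: 0.2, z: 0.3, color: '1 0 0' },
  { x: 0.4, y: 0.1, z: 0.2, color: '1 0 0' },
  { x: 0.8, y: 0.9, z: 0.7, color: '0 0 1' }
];
var markup = x3dScatter(samplePoints, 0.05);
console.log(markup);
```

Generating the markup from data in this way is what makes a scatter plot with many points practical, since writing one Transform per point by hand quickly becomes tedious.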

5 Applications

5.1 SVG application as teachware

As introduced in Section 1, a large amount of software is available on the Web for teaching statistical thinking and concepts. For a new generation of students who are familiar with TV and games, interactive and visual presentation is a useful way to learn statistical thinking, and teachware that makes statistics easy to grasp may deepen students' understanding. Until now, such interactive applications have been developed using Java or Flash, but the same applications can be developed by utilizing the interactive characteristics of SVG. The application introduced here is an SVG version of teachware that was originally developed using Java in the Case project25. It is intended to help students understand the difference between the mean and the median. Clicking on the coordinate axis adds a data point, and the mean and median of all data points are shown. The application thus conveys the difference between the mean and the median visually.
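The effect that the teachware demonstrates can be reproduced with a few lines of script. The sketch below is not taken from the application; the function names and the sample values are assumptions chosen for illustration. It computes the mean and median of five points, then adds one low point and recomputes both.

```javascript
// Arithmetic mean of a non-empty array of numbers.
function mean(xs) {
  var sum = 0;
  for (var i = 0; i < xs.length; i++) sum += xs[i];
  return sum / xs.length;
}

// Median of a non-empty array of numbers (average of the two middle values when n is even).
function median(xs) {
  var s = xs.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

var data = [4, 5, 6, 7, 8];             // made-up values
var withLowPoint = data.concat([-12]);  // the same data plus one low point

console.log(mean(data), median(data));                 // 6 6
console.log(mean(withLowPoint), median(withLowPoint)); // 3 5.5
```

In this example the single low point moves the mean by 3 but the median by only 0.5, which is the behavior the user is meant to discover by clicking on the axis.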

Fig. 13. Teachware for understanding the mean and median

25 http://case.f7.ems.okayama-u.ac.jp/


Fig. 14. test1.x3d


Fig. 15. test2.x3d

In Fig. 13, the top plot shows five data points and displays the mean and median as triangles. The plot below shows that adding a lower point changes the mean considerably but changes the median only slightly. An SVG application is easy to edit, since SVG is a text file. For example, to change the language of this application, one only needs to change the corresponding text, without any special authoring tool.

5.2 X3D applications

X3D scatter plot function for R
In Section 4, a 3D coordinate system was introduced. Using this object as a template, it is easy to create a 3D scatter plot in a text editor by simply transforming the coordinates and plotting points. However, this is impractical when there are numerous data points to visualize. Therefore, we provide a function for R that produces a 3D scatter plot. The options of the function cover scaling and the display of coordinate axes, coordinate planes, frames, and case labels. An interactive function that shows the case name and coordinates when a point is selected is also implemented.

> source("x3dplot3d.r") # load function
> x3dplot3d(cityecon2, "test1.x3d", frame=T, axis=T, label="axis")
> x3dplot3d(cityecon2, "test2.x3d", scale=F, axis=T)

The function x3dplot3d() can also create a grouped plot with colors, as in Fig. 11. The fourth column of the data frame gives the group category, and the colored=T option is set as follows.

> x3dplot3d(cepc3, "cepc3.x3d", scale=T, axis=T, label="axis", colored=T)


Fig. 16. 3D dendrogram on the first two principal components

Fig. 17. A representation of the association rule

Applications of three-dimensional representation
By rearranging an X3D scatter plot, it is possible to generate new graphical representations. Fig. 11 shows the result of PCA: the first three principal components are plotted, colored according to k-means clustering with six clusters. Fig. 16 shows a prototype of a 3D dendrogram. The first two principal component scores are plotted, and the dendrogram is constructed according to the hierarchical clustering method. Another application is the visualization of association rules for market basket analysis. For details of association rules, see Wilhelm (2004). In general, an association rule is described in the form X ⇒ Y, in which X and Y are referred to as the rule head and rule body, respectively. Which rules are obtained depends on the specified values of the support and confidence parameters. In addition, association rules are difficult to visualize because there are many parameters. Fig. 17 shows a prototype of a representation of association rules. Items are arranged according to item scores from Hayashi's quantification method III. The height of a bar indicates the confidence, and its color indicates the support, where red indicates high support and blue indicates low support. The translucent plane represents a confidence level of 0.55. Good rules are therefore indicated by tall, red bars.

5.3 GIS application

GIS applications on the Web are one of the important fields that can benefit from the use of SVG. A GIS application requires more than a function for displaying a map. Functions for enlarging, reducing and moving the map and functions for displaying information such as roads, railroads or facilities are required according to user demand. Furthermore, statistical information about areas and points is visualized on the map in cooperation with databases generated from statistical surveys and statistical data that depend on location. Although many such systems have been designed, most were built as Web applications using raster graphics or as Java applications, both of which have drawbacks. Applications that use raster graphics run into problems of cost, because a large number of image files must be prepared (all maps required for the system, including maps with layered information), and of operability, because extensive communication with a server occurs for every user request. Moreover, scalability problems, whereby further facilities cannot easily be added to the application, and the problems of picture quality described in the introduction are also troublesome. In the case of Java applications, development costs increase because many facilities, including the interface and drawing components, must be built in order to obtain a useful and convenient system. Adopting SVG for such GIS applications on the Web is expected to enhance productivity and convenience. The advantage of using SVG in GIS applications is comfortable operability: SVG allows only part of a graphic to be changed, and layers to be shown or hidden, in response to a user request without reloading the entire graphic. The SVG-based GIS application for the Web developed herein is described below.

Okayama trade area analysis system
This system was developed in order to visualize trade area data obtained from ten behavioral area surveys that have been conducted in Okayama prefecture, Japan, since 1979. For each survey, 7,000-8,000 replies were collected, and the sample size for each of the 78 municipalities was allocated in proportion to the population of the city, town or village in order to examine the movement of people from each municipality. The survey content examined with this system consists of questionnaire items asking in which city, town or village the respondent most frequently purchases or uses each of 15 items or services.
Based on these data, the degree of dependence and the outflow ratio, which are important characteristics in the analysis of trade areas, can be calculated. Let $n_{ab}$ be the number of inhabitants of city A who purchase a particular item in city B. Then $n_{ab} / \sum_{b} n_{ab}$ is the proportion of people who purchase the item in city B out of the total number of people in city A who purchase the item. When city A is fixed and this proportion is calculated for each city, town and village B in the prefecture, the results are collectively referred to as the outflow ratio from city A to each of the municipalities. In contrast, when one city B is fixed and the proportions are calculated for each city, town and village A in the prefecture, the results are collectively referred to as the degree of dependence on city B of each of the municipalities. The behavioral tendency of the inhabitants of a municipality can be determined by examining its outflow ratio, and the trade area of a municipality can be determined by examining the degree of dependence on it. This system can visualize the outflow ratio with a spider diagram (Fig. 19) and the degree of dependence with a choropleth map (Fig. 18). The yearly change of each characteristic can also be visualized (Fig. 20). Furthermore, because the system has a function to display or hide related information (Fig. 21), such as the locations of government offices, railways, stations, roads and municipal borders on the map, the relationship between changes in the trade area and these factors can be examined.

Fig. 18. Choropleth map

Fig. 19. Spider diagram

Fig. 20. Time variation

Fig. 21. Displaying other layers

A feature of this system is that a style of Web application implementation called Asynchronous JavaScript + XML (Ajax) is adopted, in addition to SVG being used as a tool for visualization. An Ajax application executes its processing while transmitting and receiving XML or plain text data, without reloading the entire Web page, by using the HTTP communication function of JavaScript implemented in the Web browser. In a conventional CGI-based Web application, a whole HTML page was output whenever a user request was processed, so sufficient operability was not achieved. With Ajax, however, a seamless Web application that does not make the user conscious of the server can be realized, because only the minimum information required for updating the Web page is obtained from the server, and the HTML and SVG content can be updated through the DOM API. Because SVG is XML in a text file, SVG can be treated in the same manner as HTML on Web sites that are constructed using Ajax technology.

Fig. 22. System structure of Okayama market area analysis system

The user interface is a Web browser, and an HTML file that defines a frame consisting of three files (menu.html, layercontrol.html, okayama.svg) is loaded first from the Web server. The back end on the server side is an RDBMS. A CGI script that receives requests from clients, sends queries to the RDBMS, receives the results, and sends the required data to the client is installed on the Web server. A map and related information regarding Okayama prefecture are contained in okayama.svg. The map is described by polyline elements, which are grouped into g elements with an id attribute for each municipality. In its script part, the following five functions are defined; upon loading they are associated with the document of index.html so that they can be called from menu.html and layercontrol.html.
• setBorder(col) - Draw the boundaries of cities, towns and villages in the color designated by col. The boundaries are displayed or hidden by changing their color to black or white, respectively.
• fillColor(id,col) - Fill the area of the city, town or village designated by id with the color designated by col. Called when drawing a choropleth map.
• setVisibility(id,visibility) - Set the visibility property of the related-information layer designated by id to visible or hidden.
• setArrow(id1,id2,sw) - Draw an arrow with stroke width sw from the government office of id1 to the government office of id2. Called when drawing a spider diagram.

• eraseArrows() - Erase every arrow.

layercontrol.html consists of check boxes for displaying the layers of related information, and the setVisibility function is called when an onclick event occurs for a check box. menu.html provides an interface, such as drop-down menus, for selecting the city, town or village, the shopping item, and the survey year used to calculate the degree of dependence and the outflow ratio. The functions called by events from this interface are defined in index.html. First, in the script part of index.html, an XMLHttpRequest object is created as follows:

  xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");

This object has facilities for issuing HTTP requests, obtaining resources, parsing them as XML, and building DOM trees, but only the first of these facilities is used herein. In other words, the object is used as a simple HTTP client component. The getResult function receives the URL of a resource as an argument and initiates communication between the Web browser and the server that holds the resource through the open method of XMLHttpRequest. The processing to be performed when the communication status with the server changes is assigned to the onreadystatechange property as a function; when reception has completed normally, this function stores the received data in res, and res is returned as the value of getResult. The actual program is as follows:

  function getResult(serverURL){
    var res = '';
    // third argument false: synchronous request, send() blocks until the response arrives
    xmlhttp.open("GET",serverURL,false);
    xmlhttp.onreadystatechange = function(){
      // readyState 4: exchange complete; status 200: success
      if(xmlhttp.readyState == 4 && xmlhttp.status == 200)
        res = xmlhttp.responseText;
    }
    xmlhttp.send(null);
    return(res);
  }
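The listing above passes false as the third argument of open, so the request is synchronous and the result can simply be returned. As a sketch of the asynchronous alternative, in which the result is delivered through a callback, the following may help. It is not part of the described system: the names getResultAsync, onDone and FakeXhr and the URL are assumptions for illustration, and FakeXhr is only a stand-in so that the sketch can run outside a browser.

```javascript
// Hypothetical asynchronous counterpart of getResult (not from the paper).
// xhr: an XMLHttpRequest-like object; onDone: called with the response text on success.
function getResultAsync(xhr, serverURL, onDone) {
  xhr.open("GET", serverURL, true); // true: do not block while waiting for the server
  xhr.onreadystatechange = function () {
    // readyState 4: exchange complete; status 200: success
    if (xhr.readyState === 4 && xhr.status === 200) {
      onDone(xhr.responseText);
    }
  };
  xhr.send(null);
}

// Minimal stand-in for a browser's request object; it answers immediately with a fixed body.
function FakeXhr(body) {
  this.body = body;
  this.readyState = 0;
  this.status = 0;
  this.responseText = "";
  this.onreadystatechange = null;
}
FakeXhr.prototype.open = function (method, url, async) {
  this.url = url;
  this.readyState = 1;
};
FakeXhr.prototype.send = function () {
  this.readyState = 4;
  this.status = 200;
  this.responseText = this.body;
  if (this.onreadystatechange) this.onreadystatechange();
};

var received = null;
getResultAsync(new FakeXhr("ok"), "./cgi-bin/example.cgi", function (text) {
  received = text;
});
console.log(received);
```

With a callback, the code that redraws the map would run inside onDone rather than after the call returns, which is the main structural difference from the synchronous version.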

The getDepend function passes the URL of the CGI script (getDepend.cgi) on the server to the getResult function, together with the parameters that are necessary for the calculation, and then receives the IDs of the cities, towns and villages and the color data necessary for drawing a choropleth map. A choropleth map is then drawn by calling the fillColor function with these data as arguments. The actual program is shown below.

  function getDepend(){
    var res = getResult('./cgi-bin/getDepend.cgi?itemcode='
      +shoppingitem+'&place='+place+'&year='
      +years[yidx]);
    var ary = res.split('\n');
    for(var i=0;i [...]
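The server-side calculation performed by getDepend.cgi is not shown. As a hedged sketch of the two ratios defined earlier in this section, the code below computes $n_{ab} / \sum_{b} n_{ab}$ from a small table of counts and reads it along a row (outflow ratio from a fixed city A) or along a column (degree of dependence on a fixed city B). The function names, the municipality labels and the counts are invented for illustration and are not survey data.

```javascript
// counts[a][b]: number of respondents living in a who buy the item in b (made-up values).
var counts = {
  A: { A: 60, B: 30, C: 10 },
  B: { A: 20, B: 70, C: 10 },
  C: { A: 50, B: 25, C: 25 }
};

// Share of the residents of `from` who buy in `to`: n_ab / sum over b of n_ab.
function proportion(table, from, to) {
  var total = 0;
  for (var b in table[from]) total += table[from][b];
  return table[from][to] / total;
}

// Outflow ratio: fix the home city a and vary the destination b.
function outflowRatio(table, a) {
  var result = {};
  for (var b in table[a]) result[b] = proportion(table, a, b);
  return result;
}

// Degree of dependence: fix the destination b and vary the home city a.
function degreeOfDependence(table, b) {
  var result = {};
  for (var a in table) result[a] = proportion(table, a, b);
  return result;
}

console.log(outflowRatio(counts, "A"));       // { A: 0.6, B: 0.3, C: 0.1 }
console.log(degreeOfDependence(counts, "A")); // { A: 0.6, B: 0.2, C: 0.5 }
```

Both quantities use the same proportion; only the index that is held fixed differs. The outflow ratios of one city sum to 1, whereas the degrees of dependence on one city need not.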

> devSVG(width=8,height=8)
> plot(LifeCycleSavings,oma=c(10,10,10,10))
> help(LifeCycleSavings)
> title("Life Cycle Savings: Data on the savings ratio 1960-1970.")
> dev.off()

Thus, we obtain the SVG output shown in Fig. 23. The commands required are almost the same as those for producing graphics on other devices. In addition, functions such as enlargement, reduction and movement can be used immediately, as shown in Fig. 24.

26 http://www.darkridge.com/~jake/RSvg/


RInG library
If we use techniques such as tooltip display or layer switching, as described in the section on SVG scripting, we can develop interactive statistical graphs. However, applying these techniques is difficult for users who have no knowledge of JavaScript or the DOM. In addition, RSvgDevice cannot add interactive functions; it can only output drawing information in SVG form. Interactive functions can be added by editing the SVG graphs output by RSvgDevice, but this requires a great deal of work, including adding scripts, grouping elements, and assigning IDs. To solve these problems, we developed the R Interactive Graphics (RInG) library. The RInG library provides R functions that output fundamental statistical graphs, with interactive functions included, in SVG form. At present, the source code and a binary package for Windows can be downloaded from our Web site. After installing the package, the user can load the RInG library with the following command:

> library(ringlib)

The RInG library provides three functions that output fundamental interactive statistical graphs. iplot is a function that outputs a two-dimensional scatter plot with interactive features in SVG form. When the mouse cursor is positioned over a data point, the value of the data point is displayed as a tooltip, and additional lines running from the data point to the x and y axes of the graph are also displayed. Two output examples of iplot are shown in Fig. 25 and Fig. 26. The figures are obtained by the following commands:

> iplot(cars$speed, cars$dist, col="blue", grid=T,
+ main="Speed and Stopping Distances of Cars")
> iplot(BJsales,type="h",col="red",
+ main="Sales Data with Leading Indicator")

ihist is a function that outputs a histogram in SVG form with interactive features. When the mouse cursor is positioned over a bar of the histogram, information about the bar, such as its upper limit, lower limit, class value and number of observations, is displayed as a tooltip. The histogram shown in Fig. 27 is output by the following command:

> ihist(rnorm(1000),breaks="FD")

Fig. 25. Scatter plot obtained using the iplot function in the RInG library

Fig. 26. Vertical line plot obtained using iplot

Fig. 27. Histogram obtained using the ihist function of the RInG library

Fig. 28. Boxplot obtained using the iboxplot function of the RInG library

iboxplot is a function that outputs a boxplot in SVG form with interactive features. If the mouse cursor is positioned over the box of the boxplot, Tukey's five-number summary is displayed as a tooltip. In addition, when the mouse cursor is positioned over an outlier, the value of that data point is displayed as a tooltip. The following commands output a boxplot such as that shown in Fig. 28:

> attach(InsectSprays)
> iboxplot(split(count,spray),xlab="spray",ylab="count",
+ main="Effectiveness of Insect Sprays")
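Tukey's five-number summary shown in the iboxplot tooltip consists of the minimum, lower hinge, median, upper hinge and maximum. How the RInG library computes it is not described here; the sketch below follows the hinge rule used by R's fivenum function, written in JavaScript for consistency with the scripting examples. The function name fiveNumberSummary and the sample values are assumptions for illustration, and the input is assumed to be a non-empty array of numbers.

```javascript
// Five-number summary (minimum, lower hinge, median, upper hinge, maximum)
// using the same depth rule as R's fivenum(); expects a non-empty numeric array.
function fiveNumberSummary(values) {
  var x = values.slice().sort(function (a, b) { return a - b; });
  var n = x.length;
  var n4 = Math.floor((n + 3) / 2) / 2;
  // 1-based positions in the sorted data; a fractional position means
  // "average the two neighbouring observations".
  var depths = [1, n4, (n + 1) / 2, n + 1 - n4, n];
  return depths.map(function (d) {
    return 0.5 * (x[Math.floor(d) - 1] + x[Math.ceil(d) - 1]);
  });
}

var summary = fiveNumberSummary([7, 1, 3, 10, 5, 8, 2, 9, 4, 6]); // made-up values
console.log(summary.join(" ")); // 1 3 5.5 8 10
```

A tooltip would then only need to format these five values as text.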

In the "Okayama trade area analysis system" described in the previous section, clicking the position of the public office of a city, town or village produces a time series plot of the degree of dependence and the outflow ratio.

Fig. 29. Time series plot of Okayama trade area analysis system

This feature is implemented by using the RInG library, which can output SVG code not only to a file but also directly to standard output. Therefore, by using the RInG library together with the CGIwithR library, an interactive plot that is generated dynamically in response to a user request, without a temporary file, can be provided through the Web. Up to now, there have been many CGI-based systems using R, but most of them have called R from a scripting language such as Perl. With the present method, by contrast, everything from communication with the RDBMS to statistical computing and graphics can be done in R alone.

References

1. Boyens, C., Günther, O., Lenz, H-J. (2004) Statistical Databases. In: Gentle, J.E., Härdle, W., Mori, Y. (eds) Handbook of Computational Statistics: Concepts and Methods. Springer, Berlin Heidelberg.
2. Eisenberg, J.D. (2002) SVG Essentials. O'Reilly.
3. Fitzgerald, M. (2004) XML Hacks. O'Reilly.
4. Fujino, T., Yamamoto, Y. and Tarumi, T. (2004) Possibilities and Problems of the XML-based Graphics in Statistics, COMPSTAT2004 Proceedings in Computational Statistics, 1043-1052.
5. Geroimenko, V. and Chen, C. (eds) (2004) Visualizing Information Using SVG and X3D. Springer.
6. Geroimenko, V. and Chen, C. (eds) (2003) Visualizing the Semantic Web. Springer.
7. Härdle, W., Klinke, S. and Müller, M. (2000) XploRe Learning Guide, Springer, Berlin Heidelberg.
8. Klinke, S. (2004) Statistical User Interfaces. In: Gentle, J.E., Härdle, W., Mori, Y. (eds) Handbook of Computational Statistics: Concepts and Methods. Springer, Berlin Heidelberg.
9. Marchette, D.J. (2004) Network Intrusion Detection. In: Gentle, J.E., Härdle, W., Mori, Y. (eds) Handbook of Computational Statistics: Concepts and Methods. Springer, Berlin Heidelberg.
10. Mori, Y., Fujino, T., Yamamoto, Y. and Tarumi, T. (2004) XML-based Applications in Statistical Analysis, Proceedings of Interface 2004: Computational Biology and Bioinformatics, 36th Symposium on the Interface.
11. Mori, Y., Fujino, T., Yamamoto, Y., Kubota, T. and Tarumi, T. (2004) XML-based Applications in Statistical Analysis. Proceedings of Interface 2004: Computational Biology and Bioinformatics, 36th Symposium on the Interface (CD-ROM).
12. Symanzik, J. (2004) Interactive and Dynamic Graphics. In: Gentle, J.E., Härdle, W., Mori, Y. (eds) Handbook of Computational Statistics: Concepts and Methods. Springer, Berlin Heidelberg.
13. Wilhelm, A. (2004) Data and Knowledge Mining. In: Gentle, J.E., Härdle, W., Mori, Y. (eds) Handbook of Computational Statistics: Concepts and Methods. Springer, Berlin Heidelberg.
14. Yamamoto, Y., Iizuka, M. and Fujino, T. (2005) Consideration for Developing Environments of Web-based Interactive Statistical Graphics, 55th session of the International Statistical Institute.