Thematic Mapping in the Internet: Exploring Census Data with Descartes

Gennady Andrienko*, Natalia Andrienko*, and Jackie Carter+

* GMD - German National Research Center for Information Technology, Schloss Birlinghoven, Sankt-Augustin, D-53754 Germany. Tel: +49-2241-142329, Fax: +49-2241-142072. URL http://allanon.gmd.de/and/

+ Manchester Computing, University of Manchester, M13 9PL, UK. Tel: +44(0) 161 275 6075, Fax: +44(0) 161 275 6040

1. Introduction

Analysis of demographic information is impossible without thematic mapping. Last year, significant effort went into putting such data, coupled with interactive mapping services, on the Internet. For example, 220 demographic variables from the 1990 US Census are available online with the CIESIN Demographic Viewer¹. Similar projects are running in other countries; in the UK, for instance, the results of the 1991 Census are being made available to the academic community on the WWW. For public access to census data to be really useful, it is necessary to provide a convenient interface for selecting and querying data, performing calculations, thematic mapping, and manipulating the resulting maps.

Section 2 of the paper discusses alternative architectures for Internet mapping services. A general distinction is made with regard to the client interface in terms of map and data manipulation functions. Section 3 presents some general features of our solution, Descartes², which is particularly well suited to providing a wider public with Internet access to georeferenced statistical data, but which may also be used as a Web-based professional tool for the analysis of such data. Section 3 also gives some examples of using Descartes. Section 4 presents an application of the client-server approach outlined in Section 2, and of the software described in Section 3, to a subset of the 1991 UK Census of Population, a voluminous digital collection of socio-demographic data which is freely available to the UK academic community. Descartes is being used to develop a visualization gateway to this valuable data resource, so that users will be able to assess, through visual exploration, whether the data are appropriate for their purposes before downloading them for local analysis.

¹ URL http://plue.sedac.ciesin.org/plue/ddviewer/

² Many examples of the use of Descartes are publicly available as demonstrators at http://allanon.gmd.de/and/java/iris/ and through the home page of Dialogis Software and Services GmbH, a spin-off company of GMD offering a commercial version of Descartes, cf. http://www.dialogis.de/. The further development of Descartes is partially funded by the European Commission, DGIII, under contract 28983, project CommonGIS (November 1998 – April 2001).

2. Organization of mapping services in the Internet: possible solutions

Most GIS vendors propose solutions for Internet mapping on the basis of their existing packages. The typical architecture involves a GIS running as a CGI script [6,9] on an Internet server computer (see Fig. 1). For example, version 2 of the CIESIN Demographic Viewer, the Canadian census WWW gateway, and Pennsylvania Statistics³ have a similar architecture. The interaction between the user and the system takes place through CGI forms displayed by a World-Wide Web (WWW) browser on an Internet-connected client computer. CGI forms provide a rudimentary graphical user interface. The user's input in a form is transformed into commands to the GIS. In response, the GIS generates a map and sends it to the client computer as a raster image, typically in GIF format. The limitations of this approach are obvious:
• restricted user interface;
• low interactivity (even such a standard operation as zooming is impeded);
• insufficient quality of graphics (limited not only by the speed of the Internet connection but also by some restrictions of the GIF format);
• substantial delays when big raster images are sent from the server to the client in each transaction.

Simple interactivity for server-side mapping can be implemented in the JavaScript language. A script may enable a user to select one out of a set of previously prepared GIF images to be shown on the page. For example, to highlight any one of 10 objects presented on a map, one must prepare 11 maps: one without highlighting and 10 that each highlight one object. After all the images have been downloaded to the client computer, the script allows reasonably fast interaction with them on the screen⁴. In the same way, M. Peterson implemented simple dynamic mapping and animation [8]. The advantage of this approach is that JavaScript programs can be interpreted by most browsers. The language is very simple and can be used by non-programmers. However, visualizations are limited to raster GIF files, and interaction capabilities are very restricted.

³ URLs http://atlas.gc.ca/cgi-bin/hsi/gmap/schoolnet/issuemap/ and http://www.maproom.psu.edu/
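The pre-rendered image strategy just described can be sketched as follows. Python stands in here for the JavaScript used on such pages, and the file-naming scheme and ward identifiers are hypothetical. The sketch makes the cost visible: every state the user can reach must exist as a separate image in advance, so highlighting any one of n objects needs n + 1 images.

```python
def prerendered_images(object_ids, prefix="map"):
    """Return the image files needed to support 'highlight one object'.

    One base image without highlighting, plus one image per object.
    The file-naming scheme is a hypothetical illustration.
    """
    images = {None: f"{prefix}_plain.gif"}  # key None = nothing highlighted
    for oid in object_ids:
        images[oid] = f"{prefix}_highlight_{oid}.gif"
    return images


def image_for(images, highlighted=None):
    """Pick the pre-rendered image for the current highlight state."""
    return images[highlighted]


objects = [f"ward{i}" for i in range(1, 11)]  # 10 hypothetical map objects
images = prerendered_images(objects)
print(len(images))                 # 11
print(image_for(images, "ward3"))  # map_highlight_ward3.gif
print(image_for(images))           # map_plain.gif

# Allowing any combination of highlighted objects would need 2**n images:
print(2 ** len(objects))           # 1024
```

The last line shows why such scripts stay limited to very simple interactions: the number of images grows linearly for single highlighting, but exponentially as soon as states can be combined.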

[Fig. 1 diagram: on the Internet-connected client computer, an Internet browser sends commands over the Internet (HTTP protocol) to the Internet HTTP server on the Internet server computer; a CGI script passes them to the GIS, which draws on the database and returns raster graphics (GIF files) to the browser.]

Fig.1. Internet mapping: server-side GIS running as CGI script.

Another solution consists in the use of Java applets (Fig. 2). Java is a programming language that can be interpreted by most Internet browsers on various hardware platforms. Java applets embedded in WWW pages are loaded by a browser from an Internet server computer and then executed on a client computer without requiring any installation. Applets can access data stored on the server through the HTTP protocol. The advantages of this solution are the opportunities for providing an advanced user interface, unlimited user interaction, and generation of high-quality graphics. This approach also has certain shortcomings. First, a critical issue is the size of the Java applet: downloading a big applet may take quite a long time, which may seriously inhibit extending its functionality. Second, applets may run rather slowly on some hardware and software platforms. Third, data security is not guaranteed: any other software can access the database files accessed by the applet. Although these files can be encrypted, this is not a perfect solution, since the encryption mechanism can be uncovered by decompiling the code of the applet. For these reasons, most existing mapping applets only provide very basic map-viewing capabilities⁵ and are not used for real problem solving [10].

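⁴ See URLs http://www.geog.fu-berlin.de/bb/themen/politik/brburg/k98/k98bb.html and http://maps.unomaha.edu/AnimArt/ActiveLegend/Peterson.html

[Fig. 2 diagram: on the Internet-connected client computer, the Internet browser runs the Java applet, which sends requests for connection to data (URL opening) over the Internet (HTTP protocol) to the Internet HTTP server and receives the data; the applet performs file operations on the database held on the Internet server computer.]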

Fig.2. Internet mapping: client-side GIS Java applet.

Sometimes, instead of Java, other interpretable programming languages, such as Tcl/Tk, are used. An example is the system for visual analysis of spatially referenced data named cdv [4,5]. In such a case, to make the system work on the Internet, special software called a browser plug-in must be installed on the client computer⁶. This requirement may be a serious limitation, because end users may lack the competence to do the installation or be unwilling to do it.

The available Internet mapping software, like GIS in general, is rather weak with regard to the cartographic presentation of spatially referenced (statistical) data. Most GIS developers seem to consider this function marginal. In contrast, our system Descartes is primarily designed to support visual exploration of spatially referenced data through automated cartographic visualization. Correctness of map presentations, i.e. compliance with the established principles of graphic and cartographic design, is very important for proper data interpretation and, as a consequence, for problem solving. As we do not want to presume that all potential end users are familiar with these principles, Descartes incorporates such knowledge about thematic data mapping and, on this basis, acts as a professional cartographer, providing users with automatically built maps that appropriately present user-selected data. The maps built by Descartes are not just static pictures. The system offers a wide range of interactive tools for manipulating the presentations in order to make them more expressive and to create more favorable conditions for revealing significant relationships, patterns, and regularities.

⁵ See URLs http://www.lutumtappert.de/java/wahl98.htm and http://www.vtt.co.jp/staff/sorokin/mt/

⁶ The same applies to the latest versions of the Java language, which are not yet supported by most Internet browsers. Unfortunately, different versions of the Java plug-in are not completely compatible, and this may cause some trouble for end users.

The services of Descartes are available on the Internet. For this purpose, the functionality of the system is distributed between two parts: the server and the client. The server⁷ provides database access, data manipulation (fulfillment of queries, calculations, transformation of the structure of tables, etc.), and map design (selection of presentation methods and setting of their parameters using the internal knowledge base on thematic mapping). The client provides the user interface, draws the vector-based maps, and supports interactive manipulation of these maps.
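A minimal sketch of this division of labor follows. The JSON messages, field names and ward table are hypothetical (the paper does not describe the protocol between the C++ server and the Java client), and the map-design step is reduced to a single mid-range class break. What it tries to show is the split itself: the server evaluates a row filter and an arithmetic formula over table columns and chooses the presentation, while the client only turns the returned description into something drawable.

```python
import json
import operator

# Hypothetical attribute table held on the server: one record per ward.
TABLE = [
    {"id": "W1", "population": 5200, "ill": 640},
    {"id": "W2", "population": 4100, "ill": 310},
    {"id": "W3", "population": 6100, "ill": 450},
]
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def server_handle(request_json):
    """Server side (sketch): run the query and calculation, then design the map."""
    req = json.loads(request_json)
    left, op, right = req["formula"]   # e.g. ["ill", "/", "population"]
    min_pop = req["min_population"]    # filter on table rows
    values = {
        rec["id"]: OPS[op](rec[left], rec[right])  # new, derived column
        for rec in TABLE
        if rec["population"] >= min_pop
    }
    # Map design (greatly simplified): two classes split at the mid-range.
    lo, hi = min(values.values()), max(values.values())
    return json.dumps({"method": "classified choropleth",
                       "breaks": [(lo + hi) / 2],
                       "values": values})


def client_render(reply_json):
    """Client side (sketch): turn the description into a class index per object."""
    reply = json.loads(reply_json)
    breaks = reply["breaks"]
    # Class index = number of breaks lying strictly below the value.
    return {oid: sum(v > b for b in breaks) for oid, v in reply["values"].items()}


request = json.dumps({"formula": ["ill", "/", "population"], "min_population": 4500})
print(client_render(server_handle(request)))  # {'W1': 1, 'W3': 0}
```

Only per-object numbers and a few parameters cross the network in this sketch, which is the property that lets the client draw vector maps itself instead of receiving raster images.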

[Fig. 3 diagram: on the Internet-connected client computer, the Internet browser runs the Descartes Java client, which sends commands over the Internet (HTTP protocol and socket connection); on the Internet server computer, the Internet HTTP server starts the Descartes server through a CGI script, and the Descartes server returns data, map descriptions, and results of queries and calculations. A Descartes application comprises spatial data (points, lines, polygons), attribute data, and semantic relations.]

Fig.3. Internet mapping in Descartes: client-server architecture with the Java client⁸.

The client is a Java applet implemented in Java 1.0.2. Most Internet browsers, such as Netscape versions 3.0 to 4.5 and Internet Explorer versions 3.0 to 5.0, support this version of Java. When a user loads the Internet page with the applet, the applet activates the server part of the system and starts communicating with it through a socket connection. This approach makes it possible to combine high user interactivity and dynamic data displays on the client side with powerful functionality on the server side, while keeping the applet relatively small. We admit, however, that our approach may also exhibit a drawback: some local networks are protected by firewalls that prohibit socket operations and thus prevent Descartes applications from running across the firewall. A general scheme of Descartes' functionality is presented in Fig. 3.

⁷ The server is implemented in C++ and is available for different dialects of Unix and for Windows 95 and NT.

3. Mapping attribute data in Descartes

When designing a map, Descartes takes into account several characteristics of the data variables that the user wants to visualize: the number of variables selected for simultaneous presentation, their types, the number of different values, and the semantic relationships between the variables. All these characteristics, except for the relationships, can be found in the internal database schema or derived from the data. Semantic relationships (IS-A, PART-OF, and comparability)⁹ have to be described in the application building process by a person familiar with the meaning of the data. An interactive module called Application Builder supports this activity: it supports conceptual indexing of tables with attribute data, linking them with spatial data (coordinates and geometry of spatial objects), and description of semantic relationships.

A table in Descartes is a collection of uniform records (rows) composed of fields (columns). The data should refer to geographical objects listed in one of the columns, and a file with the coordinates or outlines of these objects in vector format should be available. Fig. 4 shows how Descartes displays a table (this and all the following examples refer to a set of census demographic data for Leicester). Note the organization of the table caption: it is generated from the table index supplied in the application building process, and it also accounts for relationships between attributes.
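The linking step performed with the Application Builder can be pictured as a key-based join: each attribute record names a geographical object in its identifier column, and that name is looked up among the object outlines. The sketch below is only an illustration under assumed inputs (hypothetical ward identifiers, outlines reduced to coordinate lists); it does not reproduce Descartes' actual file formats. It also reports records and outlines that fail to match, a check one would want before mapping.

```python
from dataclasses import dataclass


@dataclass
class LinkedObject:
    object_id: str
    outline: list      # polygon vertices as (x, y) pairs
    attributes: dict   # remaining fields of the table record


def link_table(rows, id_column, geometry):
    """Join attribute records to object geometry via the identifier column.

    Returns the linked objects, the ids that have data but no geometry,
    and the ids that have geometry but no data.
    """
    linked, missing_geometry, seen = [], [], set()
    for row in rows:
        oid = row[id_column]
        seen.add(oid)
        if oid in geometry:
            attrs = {k: v for k, v in row.items() if k != id_column}
            linked.append(LinkedObject(oid, geometry[oid], attrs))
        else:
            missing_geometry.append(oid)
    unused_geometry = sorted(set(geometry) - seen)
    return linked, missing_geometry, unused_geometry


# Hypothetical inputs: outlines keyed by ward id, and a census table.
geometry = {
    "W1": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "W2": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "W4": [(0, 1), (1, 1), (1, 2), (0, 2)],
}
rows = [
    {"ward": "W1", "population": 5200},
    {"ward": "W2", "population": 4100},
    {"ward": "W3", "population": 6100},
]
linked, no_geometry, no_data = link_table(rows, "ward", geometry)
print([obj.object_id for obj in linked])  # ['W1', 'W2']
print(no_geometry, no_data)               # ['W3'] ['W4']
```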

Fig.4. A table with data as it is shown in Descartes. Census data Crown copyright.

Having a table on the screen, the user can perform some operations to define the scope of the data to be represented graphically: s/he can select table columns and/or impose a filter on table rows (restrictions on data values). S/he can also specify an arithmetic formula over the columns and let the system do the corresponding calculations for each record, storing the results in a new, dynamically generated column. When the data selection process is finished and the user presses the button “Visualize”, the system activates the visualization design. As an example (cf. Fig. 4), suppose that the user has selected the column containing values of the attribute “Ill per Capita”. Fig. 5 shows the result of the visualization design as presented to the user. The three icons symbolically represent the three alternative visualization methods that Descartes proposes and applies to build the maps (because the color screenshot was converted to grayscale, the second and third icons in Fig. 5 appear identical).

Fig.5. A single numeric variable can be presented in different ways.

Selection of other attributes will result in different, case-specific proposals of alternative map designs. Fig. 6 presents an example where the user selected several variables, each referring to a different age group of the total population. The variables are thus linked by the relationship of inclusion in a common total.

Fig.6. Presentation of several variables with the relationship of inclusion in a common total.

To actually view a map, the user selects the corresponding row and presses the "Show" button. Thus, after the third item (pie chart) in the list of Fig. 6 has been selected, a window as shown in Fig. 7 is created.

Fig.7. The presentation of a map and its legend in Descartes.

⁸ The latest version of the CIESIN data viewer has a similar architecture. However, its visualization capabilities are limited to mapping a single numeric variable by classified choropleth maps.

⁹ Examples of semantic relationships: “Female population” part-of “Total population”, “Mammals” is-a “Animals”, “Birth rate” comparable-with “Death rate”.
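The case-specific proposals of Figs. 5 and 6 can be pictured as rules over the number of selected variables and their declared relationships. The sketch below is a deliberately small, hypothetical rule set, not Descartes' knowledge base. Only one link is grounded in the paper: a shared PART-OF total makes part-to-whole diagrams such as pie charts and segmented bars appropriate, and these are not proposed when no relationships have been described with the application builder. All variables are assumed to be numeric, and every method name in the returned lists is an illustrative placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variable:
    name: str
    part_of: Optional[str] = None  # declared common total (PART-OF), if any


def propose_methods(variables):
    """Illustrative rules: propose presentation methods for numeric variables."""
    if not variables:
        return []
    if len(variables) == 1:
        # One numeric variable: several alternative designs (cf. Fig. 5).
        return ["classified choropleth", "unclassified choropleth", "graduated symbols"]
    totals = {v.part_of for v in variables}
    if len(totals) == 1 and None not in totals:
        # All variables are parts of the same total: part-to-whole diagrams apply.
        return ["bar charts", "segmented bars", "pie charts"]
    # No (or inconsistent) metainformation: treat the variables as independent.
    return ["bar charts"]


age_groups = [Variable(n, part_of="Total population")
              for n in ("Age 0-15", "Age 16-64", "Age 65+")]
print(propose_methods(age_groups))
print(propose_methods([Variable("Age 0-15"), Variable("Age 16-64")]))
print(propose_methods([Variable("Ill per Capita")]))
```

With the three age groups the part-to-whole diagrams are offered; the same variables without a declared total fall back to the independent case.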

Maps in Descartes allow a number of standard GIS operations: zooming and panning, switching geographical layers on and off, and changing the colors and drawing order of geographical layers. The system also enables interactive access to source data through maps: when the mouse cursor moves over a geographical object, the exact data values associated with this object are shown in the bottom right frame of the window (Fig. 7). Map legends provide keys to correct map interpretation and additional information useful for analysis, such as summary statistics of data variation, as shown in Fig. 8. The statistics (minimum, maximum, median, and quartiles) are presented graphically by Tukey’s “box and whiskers” plot.
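The five values behind such a legend can be computed with Python's standard `statistics` module, as in the sketch below. It is not Descartes' code: the ward values are hypothetical, and because quartile conventions differ between packages and the paper does not say which one Descartes uses, the "inclusive" method is an assumption.

```python
import statistics


def box_and_whiskers(values):
    """Minimum, quartiles, median and maximum for a box-and-whiskers legend."""
    data = sorted(values)
    # 'inclusive' treats the data as the whole population of map objects
    # (assumed convention); statistics.quantiles needs at least two values.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}


ill_per_capita = [0.05, 0.02, 0.04, 0.033, 0.03]  # hypothetical ward values
print(box_and_whiskers(ill_per_capita))
# {'min': 0.02, 'q1': 0.03, 'median': 0.033, 'q3': 0.04, 'max': 0.05}
```

As in the legend described above, the whiskers here simply reach the minimum and maximum; some box-plot variants instead cap them at 1.5 times the interquartile range.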

Fig.8. Map, legend, and dynamic manipulation control in Descartes. Highlighted are objects with a value of the attribute approximately equal to 0.033. Boundary data copyright ED-Line Consortium.

More details about visualization design and many further facilities for interactive manipulation of the maps generated by Descartes can be found in [1,2]. In this paper, we only mention one further capability: once a user has found and interactively adapted a specific map of interest, this map can be saved for further use. Importantly, such a map is not just a raster picture but a description of the data and of the visualization method chosen for presenting them (pie chart, bar chart, or the like). This description can be interpreted by Descartes itself, or by another standalone applet called ShowMap, which can be seen as a functionally reduced variant of Descartes specifically designed for map viewing. In both cases the map is shown to the user in such a way that all interactive operations are enabled: data access, panning and zooming, selection of layers, manipulation of the presentation parameters, etc. Unlike the full Descartes system, however, the ShowMap applet can only handle the given map and data, i.e. no additional attributes or alternative presentation methods can be selected.
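Why a saved description keeps the map interactive can be seen from a sketch in which the description carries the data values together with the method and its parameters. The JSON layout, field names and `MapViewer` class are hypothetical, not the format used by Descartes or ShowMap. Because the values travel with the description, the viewer can change a presentation parameter (here the class breaks) locally, but it cannot offer attributes that were never saved.

```python
import json


def save_map(method, params, attribute, values):
    """Serialize a map as a description of data and method, not as a raster."""
    return json.dumps({"method": method, "params": params,
                       "attribute": attribute, "values": values})


class MapViewer:
    """ShowMap-like viewer (sketch): works only with what the description holds."""

    def __init__(self, description):
        d = json.loads(description)
        self.method = d["method"]
        self.params = d["params"]
        self.attribute = d["attribute"]
        self.values = d["values"]

    def classes(self):
        """Class index per object: number of class breaks below its value."""
        breaks = self.params["breaks"]
        return {oid: sum(v > b for b in breaks) for oid, v in self.values.items()}

    def set_breaks(self, breaks):
        """Change a presentation parameter locally, with no server round trip."""
        self.params["breaks"] = sorted(breaks)

    def values_of(self, attribute):
        """Only the saved attribute is available; nothing else can be fetched."""
        if attribute != self.attribute:
            raise ValueError(f"{attribute!r} is not part of this saved map")
        return self.values


saved = save_map("classified choropleth", {"breaks": [0.03]}, "Ill per Capita",
                 {"W1": 0.02, "W2": 0.033, "W3": 0.05})
viewer = MapViewer(saved)
print(viewer.classes())   # {'W1': 0, 'W2': 1, 'W3': 1}
viewer.set_breaks([0.04, 0.03])
print(viewer.classes())   # {'W1': 0, 'W2': 1, 'W3': 2}
```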

For example, the maps in Fig. 9 show a cross-classification of objects on the basis of two numeric attributes: ill per capita and the average number of cars per capita. The applet offers the user the complete range of interactive manipulations of the map. For instance, a user might click on the right part of the number line for the “cars per capita” attribute, whereupon the system hides all regions where the number of cars per capita is smaller than 0.4. The remaining objects are painted in different colors depending on their “ill per capita” values. The user can also highlight any object by pointing to it on the map, on the scatter-plot, or on any of the number lines, and can dynamically change the number of classes and move the class boundaries.
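The cross-classification and the hiding of regions can be sketched as two independent class assignments combined into a pair per object, plus a threshold filter. The class breaks and ward values below are hypothetical; only the 0.4 cars-per-capita threshold comes from the example in the text. Changing the number of classes or moving a class boundary amounts to passing a different list of breaks.

```python
from bisect import bisect_left


def classify(value, breaks):
    """Class index for value given sorted breaks; a value equal to a break
    falls into the lower class."""
    return bisect_left(breaks, value)


def cross_classify(objects, ill_breaks, car_breaks, min_cars=None):
    """Pair of class indices (ill, cars) for every visible object.

    Objects whose cars per capita fall below min_cars are hidden, as when
    the user clicks on the 'cars per capita' number line.
    """
    result = {}
    for oid, (ill, cars) in objects.items():
        if min_cars is not None and cars < min_cars:
            continue  # hidden region
        result[oid] = (classify(ill, ill_breaks), classify(cars, car_breaks))
    return result


wards = {  # hypothetical (ill per capita, cars per capita) pairs
    "W1": (0.020, 0.55),
    "W2": (0.045, 0.30),
    "W3": (0.038, 0.42),
    "W4": (0.050, 0.61),
}
print(cross_classify(wards, [0.03, 0.04], [0.5]))
# {'W1': (0, 1), 'W2': (2, 0), 'W3': (1, 0), 'W4': (2, 1)}
print(cross_classify(wards, [0.03, 0.04], [0.5], min_cars=0.4))
# {'W1': (0, 1), 'W3': (1, 0), 'W4': (2, 1)}
```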

Fig.9. Interactive maps shown by the ShowMap applet. Boundary data copyright ED-Line Consortium.

The advantages are that, owing to its reduced functionality, the ShowMap applet is easier to learn, and its smaller size results in a faster loading time. A map saved by Descartes can be embedded into standard HTML pages, just like other graphics¹⁰. Clicking on the graphics invokes the ShowMap applet, thus making the picture come alive. Another significant difference from the full Descartes applet is that ShowMap does not involve any communication with the server, and therefore will not be blocked by firewalls.

4. Applications and General Tasks for MIDAS

In the UK, MIDAS (Manchester Information Datasets and Associated Services) provides a national support and dissemination service to the academic community for census statistics and related datasets and information. Online access to the census area statistics is provided by MIDAS, although until recently various weaknesses in the dissemination model, combined with issues relating to copyright and access restrictions, confined the user base to those with higher levels of computer literacy and/or data handling skills. A user would first have to register for the data they were interested in, then log into a remote Unix server over the Internet via Telnet, and issue a series of complex commands to the access software within an often unfamiliar command-line environment. Once they had extracted the required data, they could either use the ftp protocol to transfer the data from the remote machine to their local machine for analysis, or work online with the data using software supported on the Unix server. The combination of skills required to access and use the data has discouraged some researchers from using what is one of the most comprehensive socio-demographic resources available to the academic community.

Current initiatives to overcome these problems have resulted in the release of a Web-based interface to the census area statistics (Casweb¹¹), together with the development of a visualization gateway to the Census data held at MIDAS. The intention is to increase the accessibility and use of census counts and associated spatial data resources (such as boundary data files, which can be used for mapping census data at a variety of geographical levels) for all users, regardless of their level of expertise. The Web-based interface to census area statistics was launched in September 1998 and requires only that users are registered to use the data. Additional functionality is available for users who are prepared to download a plug-in for their Web browser, which then enables them to use a dynamic map interface to select the areas for which they want data. The development of a Visualization Gateway seeks to extend this work by incorporating visualization functionality into the interface, so that users can decide on the appropriateness of data for their use before they download them [3].

The cartographic data visualizer (cdv) was developed in order to demonstrate the cartographic visualization approach to exploratory data analysis (EDA) [3,4,5]. The approach taken in developing this software has its origins in Visualization in Scientific Computing (ViSC) in the 1980s [7]. The emphasis is on real-time, interactive analysis based on graphic representations of data. ViSC systems have several defining features, notably the capacity to interrogate images on the screen, interact with and control a representation, re-express the data in numerous ways, and produce multiple alternative and linked views. Such techniques can help in interpreting and understanding large and complex datasets, such as the census; the aim is to use graphics in the development of ideas and not simply for presentation purposes.

In order to use the cdv software, a user can either access it remotely using an X Windows interface to a Unix server, download it to their own PC, or run it from a CD. A hypertext guide is available which enables users to work through the functionality step by step. Datasets are bundled with the software for demonstration purposes, but additional datasets are available for registered users of the 1991 Census, and instructions have been written to allow users who have their own datasets (digitised boundary data plus associated area-enumerated attribute data) to incorporate them into cdv.

¹⁰ Several sets of HTML pages with embedded maps are available at the URLs http://allanon.gmd.de/and/IcaVisApplet/ and http://allanon.gmd.de/and/java/iris/app/elect/indexm.html

¹¹ See http://census.ac.uk/casweb/
A series of workshops and seminars has been offered throughout the UK to interested users, and several universities have incorporated the software into their undergraduate teaching programs. However, whilst cdv has been used successfully and has helped to promote a cartographic visualization approach to EDA, there are several limitations in making it Internet-enabled. Although the aforementioned Tcl/Tk plug-in provides a means of creating dynamic maps on the Internet, security considerations require that only a subset of the full application's functionality is available with the plug-in. Most of the file manipulation commands for opening and reading data files are missing. Tcl scripts written for use on the Web must therefore be distributed as single ASCII files containing both program code and data. Such scripts are a static, inefficient, and insecure means of distributing spatial information, especially when such information is access-restricted.

There are several advantages to a visualization gateway adopting the client-server approach described in Section 2. Firstly, as Web browsing software becomes more familiar to all users, the obstacles an individual faces in accessing and retrieving census data, as discussed above, will be reduced. The user can thus concentrate on the task of searching for relevant datasets for their area of interest. Secondly, by introducing a visual dynamic environment in which data can be explored before they are extracted, the user can be sure of the appropriateness of the data for their purposes before downloading them for further analysis. Combining census attribute data with digitised boundary data requires a set of skills which only a limited number of users may have. Using either a GIS or mapping software to explore the data demands specific knowledge not only of the data and the software, but also of spatial concepts (for example, how the data can be combined), as well as cartographic expertise. Additional steps may be required to reformat the data before they can be used in the GIS software. By introducing a visualization gateway which allows data selection and visual exploratory analysis to take place entirely within the familiar environment of the user's desktop, the obstacles of learning additional commands for controlling access software, and for downloading data for local use, can be avoided.

The solution offered by the Descartes system is currently being investigated at MIDAS. The server software has been installed on a Sun superserver. Demonstrator data for the UK county of Leicestershire, at electoral ward level, are currently available for investigation using the system¹².
Although these data are Crown copyright, they are available for academic use under terms and conditions previously negotiated by the UK Office for National Statistics and the ED-Line Consortium (responsible for the provision of the digitised boundary data). Data for other areas will require a user to be authorised, so authentication will be an important part of the final gateway. The overhead in using Descartes is that the application builder¹³ must be used each time a new application (i.e. a new dataset) is input to the system. The visualization gateway aims, initially, to offer users an area that will be of interest to them (e.g. a single county in the UK, at electoral ward level) so that they can select a local area for investigation. Data subsetting must therefore take place, at both the census attribute level and the geographic level, and a collection of data subsets must be made available to cover the entire country. An application will then be created for each of these data subsets, and a new user will be offered a drop-down menu from which to make their choice.

It would be helpful to be able to combine Descartes with the new census download interface (Casweb), so that a user could select any area of the country for which data were available, at any level of geography, together with the census attributes that they wished to explore. This level of flexibility is currently not possible because of the user interaction required at the application builder stage.

MIDAS is committed to providing increased access to the census of population statistics in order to return the investment that the academic community has made in the provision of this data resource. Through its client-server approach, the Descartes solution will help otherwise inexperienced users to locate, explore, and learn about census statistics for the UK in a familiar desktop environment. This solution overcomes many of the previous hurdles encountered in attempting to encourage the use of visual exploration of census datasets.

¹² See http://lenny.mcc.ac.uk/kindsdb6/ and http://kandinsky.mcc.ac.uk/VisGat/Descartes/testDescartes.asp

¹³ The application builder is an interactive tool for linking tables with geographical information. The tool also helps to describe the semantics of data, i.e. relationships between data components. This information is used later for knowledge-based visualization design. In general, this step can be skipped, i.e. no metainformation need be provided. In that case the system assumes that all attributes are independent and will not propose certain visualization techniques, such as pie charts or segmented bars.

References

1. Andrienko, G., and Andrienko, N., 1998, Intelligent Visualization and Dynamic Manipulation: Two Complementary Instruments to Support Data Exploration with GIS. In Proceedings Advanced Visual Interfaces'98 (AVI'98, L'Aquila, Italy, May 1998), ACM Press, pp. 66-75.

2. Andrienko, G., and Andrienko, N., 1999, Interactive Maps for Visual Data Exploration. International Journal of Geographical Information Science, 13 (4), to appear.

3. Carter, J., and Dykes, J., 1998, Developing a Visualization Gateway for the UK Census of Population. In Proceedings Spatial Data Handling'98 (Vancouver, Canada, July 1998), pp. 543-555.

4. Dykes, J.A., 1997, Exploring Spatial Data Representation with Dynamic Graphics. Computers & Geosciences, 23 (4), pp. 345-370.

5. Dykes, J.A., 1998, Cartographic Visualization: Exploratory Spatial Data Analysis with Local Indicators of Spatial Association using Tcl/Tk and cdv. The Statistician, 47 (3), pp. 485-497.

6. Grossley, D., and Boston, T., 1995, A Generic Map Interface to Query Geographical Information using the World Wide Web. World Wide Web Journal, 1996, pp. 723-735 (Proceedings of the 4th WWW Conference, December 1995).

7. McCormick, B.H., DeFanti, T.A., and Brown, M.D. (eds.), 1987, Visualization in Scientific Computing. Computer Graphics, 21 (6).

8. Peterson, M.P., 1999, Active Legends for Interactive Cartographic Animation. International Journal of Geographical Information Science, 13 (4), to appear.

9. Plewe, B., 1997, GIS Online: Information Retrieval, Mapping, and the Internet. Santa Fe: OnWord Press.

10. Sorokine, A., and Merzliakova, I., 1998, Interactive Map Applet for Illustrative Purposes. In Proceedings of the 6th International Symposium on Advances in Geographic Information Systems (ACM-GIS'98, Washington DC, November 6-7, 1998), ACM Press, pp. 46-51.

Last updated: March 18, 1999
