Spatial Factors Affecting User's Perception in Map Simplification: An Empirical Analysis

Vincenzo Del Fatto, Luca Paolino, Monica Sebillo, Giuliana Vitiello, and Genoveffa Tortora
Dipartimento di Matematica e Informatica, Università di Salerno, 84084 Fisciano (SA), Italy
{vdelfat,lpaolino,msebillo,gvitiello,tortora}@unisa.it
Abstract. In this paper we describe an empirical study on the application of a simplification algorithm, conducted to understand which factors affect the human perception of map changes. Three main factors were taken into account, namely Number of Polygons, Number of Vertices and Screen Resolution. An analysis of variance (ANOVA) test was applied to evaluate the collected measures. As a result, the number of vertices and the screen resolution turn out to be factors that effectively influence the human perception, while neither the number of polygons nor the interactions among the factors have any significant impact on the measure.

Keywords: Ramer-Douglas-Peucker Algorithm, Human Factors, Cognitive Aspects, Maps, Controlled Experiment, ANOVA Test.
1 Introduction

In the area of geographic information management, simplification of lines is a generalization process that aims to eliminate "unwanted" map details. That is to say, a simplification algorithm is a method to select the subset of points which best represents the geometric properties of a polyline, while eliminating the remaining points. In spite of the apparent straightforwardness of the concept of map simplification, this process may become very complex and is presently addressed as a solution to several interesting research issues in the field of spatial data management. As a matter of fact, simplification commonly yields:

• Quick map transmission: Simplified maps may be transmitted over the network more quickly than the original maps, because they are lighter in terms of number of vertices.
• Plotting time reduction: If the plotting process is slow, it may cause a bottleneck effect. After simplification, the number of vertices is reduced, therefore decreasing the plotting time.
• Storage space reduction: Coordinate pairs take up large amounts of space in GISs. The coordinate data can be significantly reduced by simplification, in turn decreasing the amount of storage space needed and the cost of storing the data.
• Quick data processing: Simplification can greatly reduce the time needed for vector processing of a data set. It can also speed up many types of symbol-generation techniques.
The problem then focuses on choosing a threshold indicating what simplification level we consider acceptable, that is to say, how many details we are ready to sacrifice in order to obtain faster plotting, faster transmission, or reduced storage space for our geometric data. Generally, choosing such a threshold is a complex problem: users are usually left to decide on a case-by-case basis, and very little human guidance or automated help is available for this.

In this paper, we carried out an empirical study aimed at finding out which factors affect the choice of the threshold when the final user's aim is just to view the original map and no post-processing has to be done. That is to say, we performed an empirical study with the goal of understanding which factors influence the map reduction threshold, by measuring the subjects' perception of changes in maps. Such an understanding is fundamental for a subsequent regression analysis aimed at extracting a general rule for calculating this threshold. One of the most important tasks we faced during the experiment design phase was producing the maps to be shown to the subjects, in order to highlight which factors mostly affect their perception. Once such factors were determined (number of vertices (NV), number of polygons (NP) and screen resolution (RES)), we chose maps which satisfy every combination of the factor levels. Then, for each map, we generated simplified versions at fixed rates by means of a web application which implements the Ramer-Douglas-Peucker (RDP) algorithm [4,16].

The remainder of this paper is organized as follows. In Section 2, we present some preliminaries. Section 3 introduces the experiment design, presenting all the choices we made and the tests we performed to support our assertions. Section 4 presents the results of the tests, along with their interpretation. A discussion on the achieved results concludes the paper in Section 5.
2 Preliminaries

Map Simplification. Contour lines are the model generally used to represent the geographic features of digital maps in Geographical Information Systems (GIS). How to simplify map contour lines is a very popular research issue in the field of Map Generalization [2,12]. In particular, the need to provide the user with the same map at different scales on-the-fly makes automated map simplification a relevant Map Generalization technique. Many methods have been developed to perform map simplification [9,5], such as, among the most cited, the Visvalingam-Whyatt algorithm [22] and its extension [24]. Nevertheless, the method most commonly adopted within GIS and cartographic tools and applications is certainly the Ramer-Douglas-Peucker (RDP) algorithm [4,16] and its improvements [7,26,17]. This algorithm uses a threshold in order to remove line vertices from the map. Such a characteristic makes the simplification process based on the RDP algorithm difficult to tune [5], because no linear relationship exists between the threshold to be chosen and the needed map scale [3].
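To make the role of the threshold concrete, the following is a minimal sketch of the classic RDP recursion in Python. It is our own illustration under simplifying assumptions (2D points, Euclidean perpendicular distance, open polylines), not the implementation used in any of the cited tools.

```python
import math

def perpendicular_distance(pt, start, end):
    """Distance from pt to the line through start and end; falls back to
    the point-to-point distance when the segment is degenerate."""
    (x, y), (x1, y1), (x2, y2) = pt, start, end
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / norm

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: keep the endpoints, find the interior vertex
    farthest from the chord, then either recurse on both halves (if that
    distance exceeds the tolerance epsilon) or drop all interior vertices."""
    if len(points) < 3:
        return list(points)
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax > epsilon:
        left = rdp(points[:index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right  # avoid duplicating the split vertex
    return [points[0], points[-1]]

# Example: a small epsilon keeps the spike, a larger one flattens the line.
line = [(0, 0), (1, 0.1), (2, 2), (3, 0.1), (4, 0)]
print(rdp(line, 1.0))  # [(0, 0), (2, 2), (4, 0)]
print(rdp(line, 3.0))  # [(0, 0), (4, 0)]
```

The sketch makes the tuning difficulty noted above visible: epsilon is expressed in coordinate units, so the same value yields very different degrees of simplification at different map scales.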
To the best of our knowledge, just one study exists in the literature that automates this process: Zhao et al. [26] use Töpfer's Radical Law [21] to modify the RDP algorithm so that the threshold is obtained automatically. In other studies the threshold is selected empirically. In [10] a threshold of 25 is chosen for data at the 1:500,000 scale, while a threshold of 35 is chosen for data at the 1:1,000,000 scale. Other studies [23,15] employ similar ranges of thresholds.
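For reference, a commonly cited formulation of Töpfer's Radical Law (our paraphrase, not quoted from [21]) expresses the number of objects $n_f$ to retain at a target scale as

$$n_f = n_a \sqrt{\frac{M_a}{M_f}},$$

where $n_a$ is the number of objects on the source map, and $M_a$ and $M_f$ are the scale denominators of the source and target maps, respectively.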
With regard to commercial applications, two solutions are available. ArcGIS™ provides the Pointremove and Bendsimplify operators, which are both based on the RDP algorithm. The former removes vertices quite effectively by a line thinning/compression method. It achieves satisfactory results but produces angularity along lines that is not very pleasing from an aesthetic point of view; in addition, in some instances, manual cartographic editing is required. The latter operator is designed to preserve cartographic quality: it removes small bends along the lines and maintains their smoothness [8].

State of the Art. As for research relating maps to the study of cognitive aspects, much work has been done especially in map design [13,11] and in usability studies based on cartographic human factors [19]. However, concerning generalization and simplification, the literature seems to be limited to a few studies on the cognitive aspects of map scale [14]. These studies concern general aspects of this topic, such as what people mean by the basic conceptual structure of scale, size, resolution, and detail; the difficulty of comprehending scale translations on maps; the definition of psychological scale classes; the definition of the relationship between scale and mental imagery; etc., but they do not take into account how users perceive map simplification.
3 Design of the Experiment

In order to understand which factors affect the user's perception of map variation when maps are simplified by means of the RDP algorithm, we performed a controlled experiment in which the influence of some key factors was measured. In particular, in this experiment we focus our attention on three important features of the visualized map, namely the number of vertices (NV), the number of polygons (NP) and the screen resolution (RES). Some other factors might influence the perception of variation, such as roughness or map size. As an example, map size may affect the user's perception because, evidently, the smaller the map, the fewer details the subject is able to perceive. However, we decided not to include this factor in our initial conjectures and, in order to prevent it from biasing the gathered results, we presented all maps at a predefined size, namely 26 cm × 15 cm.

Besides verifying whether the considered properties affect the user's perception of map variation, we also wished to measure the degree of influence of each factor, in order to find out whether some general rules exist for optimizing the reduction factor on the basis of the map properties. In the following we formally describe the experiment, highlighting the most important choices we had to face.

Independent Variables. In our experiment, the independent variables are NV, NP and RES, while the dependent one is the Significant Simplification Rate (SSR), which is defined as the RDP simplification rate at which the subject starts to see evident changes in a reduced map with respect to the original map.
Number of Vertices – NV. The number of vertices is the first independent variable we took into account in our experiment. For the sake of clarity, we call vertex a point which lies at either the start or the end of a map segment. In principle, this value may be represented by an integer ranging from 0 to infinity. However, for our purpose we decided to segment the range of possible values into four categories, namely: (1) from 1000 to 2000, (2) from 7000 to 8000, (3) from 46000 to 47000 and (4) from 100000 to 120000. Figure 1 shows an example of a map used in the experiment together with a version simplified at a 40% rate.
Fig. 1. An example map represented through two different sets of vertices
Number of Polygons – NP. The number of polygons is the second variable we investigated. It seems plausible that this factor affects the user's perception because the higher the number of objects visualized, the more detailed the perception of the map. Similarly to the number of vertices, we decided to categorize the NP variable into the following three levels: (1) less than, or equal to, 5 polygons, (2) from 6 to 10 polygons and (3) from 11 to 30 polygons.

Screen Resolution – RES. The third variable corresponds to the size of the screen where maps are visualized. By reducing the number of pixels used for visualizing the map, some vertices may overlap, thus affecting the user's perception. However, apparent overlapping can be resolved by increasing the resolution. With this in mind, we considered two screen resolutions (levels), namely 800x600 and 1280x768, the most commonly used.

Dependent Variable. Usually, during a map simplification process (the reduction of the number of points required to draw the map), the user's perception of changes may vary from very little perception to a point where the simplification becomes evident. Generally, we may identify the following five levels of perception of changes:

1. No simplification detected: the subject does not recognize any difference between the original map and the simplified map.
2. Some minor simplifications detected, which do not affect the appearance of the map. In this case, some differences are perceived but they are not relevant; for example, a simple linearization of rough lines, or smaller contour lines due to fewer overlapping points.
3. Some evident simplifications detected, which do not alter the meaning of the map. For instance, rough lines get fairly straight.
4. Some substantial simplifications detected, which affect the information the map conveys. The map is still recognizable but some changes may hide relevant details: some peninsulas may disappear, boundaries may deeply alter their position, and so on.
5. The map can no longer be recognized.
Based on the above classification, we defined the Significant Simplification Rate (SSR) as the simplification rate which makes the user's perception move from level 2 to level 3.

Participants. The participants in this research were students of the degree programme in Computer Science at the Faculty of Science, University of Salerno (Italy). Overall, 144 subjects were involved, divided into 24 groups of 6 subjects each. In order to make such groups as independent as possible, we randomly assigned subjects to groups. The rationale behind the choice of 24 groups is that 24 is exactly the number of different combinations of the levels of NP, NV and RES. All subjects were asked to complete a background information sheet, in order to collect both personal data and data about their experience with computer technology and, possibly, with GIS, also in terms of the exams they had already passed and the marks they had obtained. We were interested in understanding how people who generally have marginal experience with maps perceive different levels of simplification. Consequently, we excluded subjects who work with maps or had attended GIS courses. The individuals we selected knew the most popular map applications, such as some web-based path finders, Google Maps, etc. As for their age, they ranged between 20 and 30 years old.

Apparatus. As for the apparatus, we exploited three software applications, namely SYSTAT™ [20], MapShaper [1] and ESRI ArcView™ [6]. They were run on a Windows XP© Professional platform, on a computer with a Pentium Centrino™ processor, 1 GB of RAM, a 72 GB hard disk at 7200 rpm and a 15.4'' multi-resolution display. We used MapShaper to produce the reduced maps. It is a free online editor for polygon and polyline Shapefiles, with a Flash interface that runs in an ordinary web browser. MapShaper supports the Ramer-Douglas-Peucker (RDP) line simplification algorithm. Finally, we used ESRI ArcView 3.2 to produce the source maps used in the experiment and to express them in terms of the combinations of the factor levels for, respectively, the number of polygons (NP), the number of vertices (NV) and the screen resolution (RES).

Task. Every group was associated with a map which satisfies one of the 24 combinations of the independent variable levels. This map and a simplified version of it were successively shown to each subject in the group, for ten-second intervals. Then, according to the scale concerning the perception of changes, we asked the subjects for a value between 1 and 5. This step was repeated, increasing the simplification rate, until the subjects either reached the maximum simplification rate or rated their perception of changes as 5. For each subject we recorded the value of SSR, namely the simplification rate registered when they detected some evident simplifications which did not alter the meaning of the map (i.e., level 3 of perception).
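As an illustration of how SSR is recorded during a session, the following is a sketch of our own, with hypothetical callables standing in for the display and rating steps; it is not the actual software used in the sessions.

```python
def run_session(rates, show_pair, get_rating):
    """One experimental session for one subject.

    rates:      increasing RDP simplification rates, e.g. [10, 20, ..., 90]
    show_pair:  callable displaying the original map and the map simplified
                at the given rate, for ten seconds (hypothetical UI helper)
    get_rating: callable returning the subject's 1-5 perception rating
    """
    ssr = None
    for rate in rates:
        show_pair(rate)
        rating = get_rating()
        if ssr is None and rating >= 3:
            ssr = rate  # first rate at which evident changes are perceived
        if rating == 5:
            break  # the map is no longer recognizable: stop the session
    return ssr

# Toy usage with stub callables:
ratings = iter([1, 1, 2, 2, 3, 4, 5])
print(run_session(range(10, 80, 10), lambda r: None, lambda: next(ratings)))
# -> 50 (the rate paired with the first rating of 3)
```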
Test. Once the results had been collected, we applied a three-way ANOVA (Analysis of Variance). The theory and methodology of ANOVA were mainly developed by Fisher during the 1920s [18]. ANOVA is essentially a method for analyzing the effect of one, two, or more quantitative or qualitative variables on one quantitative response: it divides the variance of the response into the components corresponding to the identifiable sources of variation. It is useful in a range of disciplines whenever it is suspected that one or more factors might affect a response. We are interested in studying two kinds of effects: the main effect determined by each factor separately, and the possible interactions among the factors. The main effect is defined as the change in the response (the dependent variable) when the level of one of the independent variables is changed. In some experiments, we may find that the difference in the response between the levels of one factor is not the same at all levels of the other factors; when this happens, an interaction occurs between these factors. Moreover, the ANOVA method allows us to determine whether a change in the responses is due to a change in a factor or merely to the way the sample was selected.
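As an aside, a three-way full-factorial ANOVA of this kind can be computed with standard statistical software. The following is a minimal sketch in Python using statsmodels, on a hypothetical data frame with one row per subject; the file name and column names are our own assumptions, not the authors' setup.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical data: one row per subject, with the observed SSR and the
# (categorical) levels of the three factors.
df = pd.read_csv("ssr_observations.csv")  # columns: SSR, NV, NP, RES

# C(...) marks the factors as categorical; '*' expands to all main effects
# and all two- and three-way interactions (cf. H1-H7 in Table 1 below).
model = ols("SSR ~ C(NV) * C(NP) * C(RES)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```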
Research Hypotheses. Before starting the operative phase, we formally state the null hypotheses we wished to reject in our experiment.

Table 1. Experiment hypotheses

H1. There is no difference in the SSR means of the different levels of factor NV.
H2. There is no difference in the SSR means of the different levels of factor NP.
H3. There is no difference in the SSR means of the different levels of factor RES.
H4. There is no interaction between NP and NV on SSR.
H5. There is no interaction between NV and RES on SSR.
H6. There is no interaction between RES and NP on SSR.
H7. The two-way NV*NP interaction is the same at every level of RES.

As listed in Table 1, the first three hypotheses are concerned with the influence that each factor may have on the SSR measure separately. The next three hypotheses are concerned with the pair-wise interactions between the independent factors and their influence on SSR. Finally, the last hypothesis involves the study of the interaction among all three independent variables and its influence on SSR.

Assumptions. The ANOVA test is based on the following assumptions about the input data: (1) all observations are mutually independent; (2) all sample populations are normally distributed; (3) all sample populations have equal variance. The first assumption should hold on the basis of how we chose the individuals and how we assigned them to groups. The Shapiro-Wilk test (W) [18] is used for testing normality. It is the preferred test of normality because of its good properties compared to a wide range of alternative tests, and it is a standard test when the sample size is between 3 and 5000. The W-value given by this test indicates how well the sample fits a normal distribution: the closer the W-value is to 1, the better the fit.
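For illustration, this normality check can be reproduced with scipy; the sketch below assumes each group's SSR observations are available as a plain list, and the variable names and values are ours, not the experiment's data.

```python
from scipy.stats import shapiro

# Hypothetical SSR observations for one group (6 subjects per group).
group_ssr = [35.0, 40.0, 30.0, 45.0, 40.0, 35.0]

w_value, p_value = shapiro(group_ssr)
# A W-value close to 1 (and a non-significant p-value) means we cannot
# reject the hypothesis of normality for this group.
print(f"W = {w_value:.3f}, p = {p_value:.3f}")
```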
In our case, high values of W are common to all the groups we took into account; therefore we accept the hypothesis of normality for each group. Table 2 shows the W values obtained by applying the Shapiro-Wilk test to each group.

Table 2. Results of the Shapiro-Wilk test

NP