METHODOLOGY FOR THE DEVELOPMENT OF A SAMPLING RECOMMENDATION WHEN USING LINEAR ELEMENTS FOR POSITIONAL QUALITY CONTROL
Ariza López, Francisco Javier (*); Mozas Calvache, Antonio (*)
(*) Grupo de Investigación en Ingeniería Cartográfica. Dpto. de Ingeniería Cartográfica, Geodésica y Fotogrametría. Universidad de Jaén. Campus “Las Lagunillas” s/n. 23071. Jaén (Spain).
e-mail: [email protected]. Tel: +34953212469
e-mail: [email protected]. Tel: +34953212853
ABSTRACT A methodology for analyzing the key aspect of positional control performed using road lines as reference elements is presented. The methodology is mainly based on random simulation to generate a wide range of synthetic samples which will also comply with the criteria of the experiment design. Linear data bases are enriched in order to facilitate the application of selection criteria. Simulation results will be analyzed by means of adequate statistical tools for deriving a sound, operational and competitive methodological proposal as opposed to the point-based methodologies nowadays in use for positional control assessment.
1. INTRODUCTION AND OBJECTIVES Positional control of cartographic products is usually carried out by means of control points, as in the National Map Accuracy Standard, the Engineering Map Accuracy Standard, the National Standard for Spatial Data Accuracy, and so on (Ariza, 2002; Giordano and Veregin, 1994). For this purpose a set of well-defined points (control points) is identified both in the product (data set) and in a more precise source (reference data), mainly a field survey. For this case there are recommendations on the size of the control point sets and on the spatial distribution of the points (FGDC, 1998; MPLMIC, 1999). Since the development of the Global Positioning System there have been increasing indications that linear elements can be used for positional quality control in cartography. It is widely believed that such elements can give at least the same capabilities of positional control as points. Some examples have been developed with actual spatial databases and there are also some theoretical and practical studies (Ariza, 2002; Atkinson and Ariza, 2002):
¬ The Epsilon band method (Skidmore and Turner, 1992). Uses the ideas of Perkal (1966) and Blakemore (1983) (Figure 1.a) to propose a distance measure derived from the area enclosed by two sets of linear data (Figure 1.b).
¬ The Buffer Overlay Method (BOM) (Goodchild and Hunter, 1997). Measures the percentage of length of a data set included within a buffer distance from a reference data set (Figure 1.c). It was tested with data from the Digital Chart of the World (S1.000K) versus a local S25k data set.
¬ The Buffer Overlay Statistic (BOS) method (Tveite and Langaas, 1999). Both data sets, reference and product, are buffered, and some indexes are derived from the overlaid areas between the buffers (Figure 1.d). It was tested on several kinds of linear elements (roads, railways, and so on) from the Digital Chart of the World versus a S250k data set of the National Mapping Agency of Norway.
¬ Hausdorff's distance method (Abbas, Grussenmeyer and Hottier, 1995). Proposed by researchers of the National Mapping Agency of France (IGN-Fr). It is very similar to the Epsilon band method but uses the Hausdorff distance, and was tested with a road axes data set extracted from the BDCarto and BDTopo products of the IGN-Fr.
¬ The Maximum Proportion Standard (MPS) and the Maximum Distortion Standard (MDS) (Veregin, 2000). A proposal that takes some ideas from line generalization processes. In some way the MPS and the MDS are related to the positional control tests NMAS (USBB, 1947) and USGS DEM accuracy categories (Caruso, 1987; Carter, 1989) respectively. These were applied to a data set extracted from the Digital Line Graphs (USGS, 1999).
It is interesting to note that there are some other studies devoted to line generalization (Cheung and Shi, 2004) from which it would be possible to extract useful ideas for developing methodologies of positional control by means of lines. This new approach to positional quality assessment would also need new metrics to measure positional error. Proposals for this are included in some of the studies cited above and in others (e.g. Ramirez and Ali, 2003).
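As an illustration of one of the line-based metrics listed above, the following is a minimal sketch of the Hausdorff distance between a reference line and a product line. It is not the implementation used by Abbas, Grussenmeyer and Hottier (1995): the function names, the `step` densification parameter and the planar (projected, metre-based) coordinates are assumptions made here, and the result is an approximation whose resolution depends on `step`.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the straight segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_line_distance(p, line):
    """Distance from p to a polyline given as a list of (x, y) vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def densify(line, step):
    """Add intermediate vertices so that no gap is longer than `step`."""
    out = [line[0]]
    for (ax, ay), (bx, by) in zip(line, line[1:]):
        n = max(1, math.ceil(math.hypot(bx - ax, by - ay) / step))
        for i in range(1, n + 1):
            out.append((ax + (bx - ax) * i / n, ay + (by - ay) * i / n))
    return out


def hausdorff_distance(line_a, line_b, step=1.0):
    """Approximate symmetric Hausdorff distance between two polylines.

    Assumption of this sketch: each line is sampled every `step` units
    and every sample is compared against the other (exact) polyline.
    """
    d_ab = max(point_line_distance(p, line_b) for p in densify(line_a, step))
    d_ba = max(point_line_distance(p, line_a) for p in densify(line_b, step))
    return max(d_ab, d_ba)
```

For two parallel lines 2 m apart the function returns 2 m; for a product line that bulges away from the reference it returns the largest separation, which is why this metric is sensitive to isolated blunders.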
Figure 1.- Ideas for the use of linear elements in positional control: a) epsilon band concept, b) area enclosed by two lines, c) buffer overlay method, d) buffer overlay statistic method.
Another important issue is the relation between the positional accuracy obtained by point-based and line-based methods, but this has received little attention and there are few studies dealing with it (Van Niel and McVicar, 2002). We think that linear elements can be successfully used for positional accuracy control, but none of the previous studies gives advice on implementing a control process, suitable for a National Mapping Agency, using reference data from the field. Thus many questions arise about the size and spatial distribution of the sample, as well as about the influence of other properties of line elements such as orientation, sinuosity, and so on. There is also a final question: what is the relation between point-based and line-based accuracy measures?
Here we present the methodology we have developed for studying these important aspects within a research project funded by the Spanish Ministry of Science and Technology. The methodology is based on spatial and statistical simulation and on spatial database segmentation and enrichment. Our objective is to develop a methodology, applicable in a production environment, for controlling the horizontal positional accuracy of spatial data bases by using as reference data linear elements extracted from the communications network (roads) and surveyed in the field by mobile GPS. As a result of this project and its methodology, we should be able to derive recommendations on sample size, spatial distribution, which kinds of elements to use (for instance A-roads, B-roads, tracks, etc.), desirable geometric properties and so on.
This paper is organized into two sections. The first presents a general view of the proposed methodology.
Within each stage there are many activities (GPS capture, road axis calculation, and so on) with their own methodologies, but it is impossible to explain all of them here, so only an overview is given. The second presents the conclusions; since this is an ongoing research project, at this time they are necessarily limited to the methodology itself.
2. PROPOSED METHODOLOGY Here we present the general methodology proposed. Many of the activities of the project (the GPS field point survey, the mobile GPS field line survey, etc.) have their own methodologies, but discussing them is beyond the scope of this general presentation.
For our research we use two real databases from partners of the project:
¬ MTA10: The topographic data base of Andalusia at S10k scale, a product from the Instituto Cartográfico de Andalucía (ICA) (http://www.juntadeandalucia.es/obraspublicasytransportes/jsp/tema.jsp?ct=8).
¬ MTN25: The National Topographic Map at scale S25K, a product from the Instituto Geográfico Nacional (IGN) (http://www.mfom.es/ign/top_geografico.html). One sheet of the MTN25 includes 16 sheets of the MTA10.
Thus, our study is limited to the Spanish region of Andalucía, which has an area of approximately 87,000 km2 and 10,571 km of paved roads. For both products two data sets are used:
¬ All the paved roads that appear in both products.
¬ A set of 200 well-defined and well-distributed points that appear in both products.
By means of a DGPS field survey all elements, points and paved roads, are to be surveyed. A road is considered to be formed by different road features (or segments), each being the part of a road between two junctions. This criterion segments each road into more or fewer parts depending on the number of junctions with other roads. For each segment a set of important attributes will be derived and stored, allowing the segment to be classified. This allows the extraction of simulated samples of linear elements by means of random or guided sampling in conjunction with filters on parameters (length, spatial position, orientation, sinuosity, and so on). As a result we will have what we call a simulation engine, which allows us to analyze the effect of important sampling aspects on the soundness of the result. For each sample of line elements, positional accuracy measures will be derived by applying the different proposed methods, for example the BOS. Results will be compared with each other and also with positional accuracy measures derived from the global set of points for the same zone.
Below we present the main stages of the proposed methodology. Figure 2 shows a diagram.
Figure 2.- Process flow of the research
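The extraction of simulated samples described above (random sampling combined with filters on segment attributes) might be sketched as follows. This is only an illustration under stated assumptions, not the project's simulation engine: the `Segment` fields, the filter parameters and the choice of expressing sample size as a target total length in metres are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One enriched road segment (hypothetical attribute set)."""
    seg_id: int
    category: str           # kind of road
    length_m: float         # segment length in metres
    orientation_deg: float  # general direction, folded into [0, 180)
    sinuosity: float        # path length / chord length, >= 1


def draw_sample(bank, target_length_m, rng, category=None,
                min_length_m=0.0, max_sinuosity=float("inf")):
    """Draw random segments that pass the filters until the accumulated
    length reaches `target_length_m` or the candidates run out."""
    candidates = [
        s for s in bank
        if (category is None or s.category == category)
        and s.length_m >= min_length_m
        and s.sinuosity <= max_sinuosity
    ]
    rng.shuffle(candidates)  # random order, no replacement
    sample, total = [], 0.0
    for seg in candidates:
        if total >= target_length_m:
            break
        sample.append(seg)
        total += seg.length_m
    return sample


def simulate(bank, n_samples, target_length_m, seed=0, **filters):
    """Generate `n_samples` independent samples; a fixed seed makes the
    whole simulation run reproducible."""
    rng = random.Random(seed)
    return [draw_sample(bank, target_length_m, rng, **filters)
            for _ in range(n_samples)]
```

Each returned sample would then be passed to a line-based control method, so that the spread of the results across many samples of a given size and composition can be studied.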
Stage 0: Pilot test. A general test of the proposed methodology has been carried out, mainly in order to improve the field survey methodologies, to assess timing and to develop software tools. The figures used here to exemplify some ideas of the methodology come from this tuning work, carried out in the pilot zone with data from the MTN25 and applying the buffer overlay method of Goodchild and Hunter (1997).
Stage 1: Study zone. The study zone is of great importance because our interest is to develop an applicable proposal. For this reason we have to consider study zones of different characteristics: topography, density of the road network, configuration of the road network (radial, parallel; see Figure 3 for an example), interior and coastal zones, etc. Another very important issue is the size of the sample. For positional control by points we will use 200 effective control points per study zone. This figure is suggested by our own research, which indicates that this number gives a very precise positional accuracy estimation with low producer's and user's risks. As mentioned above, all the paved roads present in a study zone will be captured by mobile GPS. We have proposed six different zones for study, which represent approximately 7000 km2 and 2600 km of paved roads. Geodetic support has to be assured for the field survey. We plan to use two or more permanent GPS stations belonging to official networks, but when that is impossible we will establish temporary ones.
Figure 3.- Different configuration examples of road networks: a) radial predominance, b) vertical predominance, c) diagonal predominance
Stage 2: Sample design. Two field samples have to be determined: the sample of control points and the sample of reference roads. As mentioned above, for the first case more than 200 points have to be proposed in each zone in order to achieve an effective size of 200 points. These elements should be present in both of the products to be controlled: the MTA10 and the MTN25. The sample design is performed interactively on the computer using both data bases. We use a combination of two spatial distribution criteria, each applied to one half of the control points: the first seeks a homogeneous distribution and the second a distribution closer to that of the roads. For each control point its type is recorded because we have also found that not all control points have the same positional behavior. The sample of reference roads is directly derived from the cartography and coincides with all the elements present in it. Because of traffic, roads will be surveyed along their right side or margin, and in both directions (twice). Both sampling designs are optimized in order to minimize travel during the field work.
Stage 3: GPS Survey. Data capture is performed using differential GPS techniques. Control points are GPS-collected and processed by methodologies that ensure a final precision (1 m) three times better than that of the products being controlled (3 m). The survey of the reference roads follows similar criteria, using kinematic collection with one point per 1-second epoch at a speed of 40 km/h, that is, about one point every 11 m along the road. The GPS post-processing of points is direct and possible problems are very limited. The post-processing of GPS data corresponding to reference roads is more difficult because of greater signal loss and multipath problems.
Stage 4: Processing. Here we include two main types of processing, applied to points (4.a) and lines (4.b).
4.a. Points are used to apply a classical positional accuracy control; here two control methodologies will be applied: the EMAS (ASCI, 1983) and the NSSDA (FGDC, 1998; MPLMIC, 1999). Our idea is also to use simulation techniques to study differential behaviors and to apply the results of some of our own studies presented to this International Cartographic Conference (see Ariza and Atkinson, 2005a, 2005b).
4.b.1. For linear data each survey has to be carefully reviewed in order to decide whether or not to accept the data. After that, an axis adjustment will be applied in order to derive elements homologous to those present in both topographic products. When a road has been surveyed from both its margins, an interpolation will be applied to determine the medial axis of the road. We will also study deriving this axis from the data of only one side by adding an offset. Once the axes are derived they must be matched to the homologous elements of the MTA10 and MTN25 databases. Next, this data set of segments will be enriched by automatically deriving attributes of interest such as category, length, orientation, homogeneity, sinuosity and so on.
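The enrichment step can be illustrated with a short sketch that derives three of the attributes named above from the vertices of a segment. The definitions used here are assumptions of this example, since the text does not fix them: orientation is taken as the azimuth of the chord joining the end points, folded into [0, 180) degrees, and sinuosity as the ratio of path length to chord length. Homogeneity is left out because its definition is not given, and all function names are hypothetical.

```python
import math


def line_length(line):
    """Length of a polyline given as a list of planar (x, y) vertices."""
    return sum(math.hypot(bx - ax, by - ay)
               for (ax, ay), (bx, by) in zip(line, line[1:]))


def orientation_deg(line):
    """Azimuth of the chord from first to last vertex, measured clockwise
    from north (+y) and folded into [0, 180) so that the direction of
    travel along the segment does not matter."""
    (x0, y0), (x1, y1) = line[0], line[-1]
    return math.degrees(math.atan2(x1 - x0, y1 - y0)) % 180.0


def sinuosity(line):
    """Path length divided by chord length: 1.0 for a straight segment,
    larger for winding ones, infinite for a closed loop."""
    (x0, y0), (x1, y1) = line[0], line[-1]
    chord = math.hypot(x1 - x0, y1 - y0)
    return line_length(line) / chord if chord > 0.0 else math.inf


def enrich(seg_id, category, line):
    """Return the attribute record stored for one segment."""
    return {
        "seg_id": seg_id,
        "category": category,
        "length_m": line_length(line),
        "orientation_deg": orientation_deg(line),
        "sinuosity": sinuosity(line),
    }
```

For an L-shaped segment running 10 m north and then 10 m east, this gives a length of 20 m, an orientation of 45 degrees and a sinuosity of about 1.41.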
In this way we will create a bank, or database, of segments from which elements can be extracted according to diverse combinations of the attributes used for the enrichment. This allows us to extract simulated samples of linear elements by random and/or guided sampling in conjunction with filters on parameters (length, spatial position, orientation, sinuosity, and so on). We will then have what we call a simulation engine, which will allow us to analyze the effect of important sampling aspects on the soundness of the result.
4.b.2. Simulation of controls by means of lines. In order to obtain the results we seek, we need to analyze the relevance of many influencing aspects. For that study we need a software tool capable of automating the process of selecting a sample and generating a control result. We call this tool a simulation engine. CPLin is our simulation engine and it is capable of:
¬ Extracting multiple samples of control segments by applying random and guided criteria.
¬ Finding the homologous segments in the MTA and MTN segment sets.
¬ Applying a control method (e.g. the BOM, the BOS...) using as reference data the simulated sample and a set of homologous segments from each one of the products being controlled.
In Figure 4 we can observe a detail of a buffer operation on a reference segment and the behavior of the controlled segment.
Figure 4.- Example of a buffer operation on a reference segment and the behavior of the controlled segment
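The buffer operation of Figure 4 underlies the buffer overlay method of Goodchild and Hunter (1997), which reports the percentage of the controlled line's length that lies within a given distance of the reference line. The sketch below approximates that percentage without a GIS library. It is not the CPLin implementation: the function names and the `step_m` parameter are assumptions, and the controlled line is cut into short pieces that are classified by their midpoints, so the result is only accurate to about one piece length.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the straight segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_line(p, line):
    """Distance from p to a polyline (list of (x, y) vertices)."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def fraction_within_buffer(tested, reference, buffer_m, step_m=1.0):
    """Fraction (0..1) of the length of `tested` lying within `buffer_m`
    of `reference`. Being within distance d of the reference line is
    equivalent to lying inside its buffer of width d."""
    inside = total = 0.0
    for (ax, ay), (bx, by) in zip(tested, tested[1:]):
        seg_len = math.hypot(bx - ax, by - ay)
        if seg_len == 0.0:
            continue
        n = max(1, math.ceil(seg_len / step_m))  # pieces of at most step_m
        piece = seg_len / n
        for i in range(n):
            t = (i + 0.5) / n  # midpoint of the i-th piece
            mid = (ax + (bx - ax) * t, ay + (by - ay) * t)
            if distance_to_line(mid, reference) <= buffer_m:
                inside += piece
            total += piece
    return inside / total if total else 0.0


def inclusion_curve(tested, reference, buffers_m, step_m=1.0):
    """Fraction of included length for each buffer width in `buffers_m`."""
    return [(b, fraction_within_buffer(tested, reference, b, step_m))
            for b in buffers_m]
```

Evaluating `inclusion_curve` for increasing buffer widths gives a non-decreasing curve of included length against buffer distance for one pair of homologous segments.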
Stage 5: Results analysis. This stage mirrors the previous one, so two main analysis steps, applied to points and to lines, can be considered.
5.1. The accuracy estimations derived from points will be analyzed. The positional control developed by means of points will be used as the reference control and also to propose a relation between measures derived from points and measures derived from lines.
5.2. The application of the simulation tool requires a prior design of the experiments in order to consider all aspects of interest appropriately and to derive sound statistical results on the influence of each one of them, for example: size and spatial distribution of the sample, and length, orientation and sinuosity of the segments. Because we will have quantitative results, it will be possible to compare them using statistical methods such as the Analysis of Variance, goodness-of-fit tests and so on. Figure 5 presents some raw results obtained from the pilot zone, which show behavioral differences when considering typology, orientation and classification of the road segments.
Figure 5.- Evolution of the percentage of length of the MTN25 data set included within a variable buffer distance of the reference data: a) global behavior, b) kind of road, c) orientation of segments (degrees), d) sinuosity (level)
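The Analysis of Variance mentioned in step 5.2 can be sketched for the simplest case, a single factor. The example assumes, hypothetically, that each group holds the control results (for instance percentages of included length) obtained from simulated samples sharing one level of a factor such as kind of road; the function name and the input layout are assumptions of this sketch, and only the F statistic and its degrees of freedom are computed.

```python
def one_way_anova(groups):
    """One-way ANOVA for a list of groups of numeric observations.

    Returns (F, df_between, df_within). Each group needs at least one
    value, there must be at least two groups, and the within-group
    variability must not be zero.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variability explained by the factor (between groups).
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Residual variability (within groups).
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

The F value is then compared with the critical value of the F distribution for the two degrees of freedom, for example from a statistical table; a library routine such as `scipy.stats.f_oneway` returns the corresponding p-value directly.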
Stage 6: Proposal. As the final result of the project, the knowledge acquired will allow us to develop an operational and competitive methodological proposal as opposed to the point-based methodologies in use nowadays. The proposal will be centered on:
¬ Main aspects of the sample: characteristics, size and spatial distribution.
¬ Surveying and processing methods: GPS capture recommendations, axis calculations.
¬ Method and measurements for assessing positional accuracy.
Jointly with the proposal a software tool will be delivered in order to facilitate the practical application of the selected method for assessing the positional accuracy of a product.
3. CONCLUSIONS We have presented the general methodology of an ongoing research project whose main objective is to propose a positional accuracy control method for spatial data by using roads as control elements. We consider that this proposal is relevant because nowadays the positional control by means of lines is a competitive option due to GPS improvements (technology, GPS permanent stations networks, and so on). Also new uses of cartography, like navigation, require us to pay attention to positional quality from a perspective closer to its final use. There are some previous studies which indicate the actual possibility of using linear elements, but none have analyzed the problem considering its application within a production environment and none give advice on such important aspects as sample size and distribution, and other sample properties. Determining these important aspects is the main objective of our project and proposal. The methodology presented has been adjusted by means of a pilot study and is now being applied to the global project for which approximately 2600 km of roads will be processed and more than 1200 control points used for developing classical controls by points. Simulation and statistics are the bases of the project in order to analyze and study the relevance of what are considered key aspects. The first raw results indicate that some of the proposed aspects seem to have a true relevance to positional control.
ACKNOWLEDGEMENTS This work is funded by the Spanish Ministry of Science and Technology under grant nº BIA2003-02234.
4. REFERENCES
ABBAS, I.; GRUSSENMEYER, P.; HOTTIER, P. (1995). Contrôle de la planimétrie d’une base de données vectorielles: une nouvelle méthode basée sur la distance de Hausdorff: la méthode du contrôle linéaire, Bul. S.F.P.T., Nº 137.
ARIZA, F.J. (2002). Control de Calidad en la Producción Cartográfica. Ra-Ma. Madrid.
ARIZA, F.J.; ATKINSON, D. (2005a). Positional quality control by means of the EMAS test and acceptance curves. In Proceedings of the XXII International Cartographic Conference, La Coruña, Spain.
ARIZA, F.J.; ATKINSON, D. (2005b). Sample size and confidence when applying the NSSDA. In Proceedings of the XXII International Cartographic Conference, La Coruña, Spain.
ASCI (1983). Map uses, scales and accuracies for engineering and associated purposes. American Society of Civil Engineers, Committee on Cartographic Surveying, Surveying and Mapping Division, New York.
ATKINSON, A.D.J.; ARIZA, F.J. (2002). Nuevo enfoque para el análisis de la calidad posicional en cartografía mediante estudios basados en la geometría lineal. In Proceedings of the XIV Congreso Internacional de Ingeniería Gráfica. Santander, Spain.
ATKINSON, A.D.J. (2005). Control de calidad posicional en cartografía: análisis de los principales estándares y propuesta de mejora. Doctoral thesis, Universidad de Jaén, Jaén.
BLAKEMORE, M. (1983). Generalisation and Error in Spatial Data Bases. In Cartographica, vol. 21, nº 2/3.
CARUSO, V.M. (1987). Standards for Digital Elevation Model. Technical Paper, ASPRS/ACSM, 4th Annual Convention.
CARTER, J.R. (1989). Relative errors identified in USGS Gridded DEMs. Proceedings AutoCarto 9, Baltimore.
FGDC (1998). Geospatial Positioning Accuracy Standards, National Standard for Spatial Data Accuracy (FGDC-STD-007.3).
GIORDANO, A.; VEREGIN, H. (1994). Il controllo di qualità nei sistemi informativi territoriali. Il Cardo Editore, Venezia.
GOODCHILD, M.F.; HUNTER, G.J. (1997). A simple positional accuracy measure for linear features, Int. Journal Geographical Information Science, Vol. 11, Nº 3.
MPLMIC (1999). Positional Accuracy Handbook. Minnesota Planning Land Management Information Center. Minnesota.
PERKAL, J. (1966). On the length of empirical curves. Discussion paper nº 10. Michigan Inter-University Community of Mathematical Geographers.
RAMIREZ, R.; ALI, T. (2003). Progress in metrics development to measure positional accuracy of spatial data. In Proceedings of the 21st International Cartographic Conference (ICC). Durban.
SKIDMORE, A.; TURNER, B. (1992). Map Accuracy Assessment Using Line Intersect Sampling. In PE&RS, vol. 58, nº 10.
TVEITE, H.; LANGAAS, S. (1999). An accuracy assessment method for geographical line data sets based on buffering, Int. Journal Geographical Information Science, Vol. 13, Nº 1.
USBB (1947). United States National Map Accuracy Standards. U.S. Bureau of the Budget.
USGS (1999). Part 2: Specifications, Standards for Digital Line Graphs (DLG-3). Department of the Interior. National Mapping Division.
VAN NIEL, T.G.; MCVICAR, T. (2002). Experimental evaluation of positional accuracy estimates from a linear network using point and line-based testing methods, Int. Journal Geographical Information Science, Vol. 16, Nº 5, pp. 455–473.
VEREGIN, H. (2000). Quantifying positional error induced by line simplification, Int. Journal Geographical Information Science, Vol. 14, Nº 2.