Journal of Archaeological Science (2001) 28, 661–669. doi:10.1006/jasc.2001.0654, available online at http://www.idealibrary.com
Using Monte Carlo Simulation for the Environmental Analysis of Small Archaeologic Datasets, with the Mesolithic in Northeast Belgium as a Case Study
Veerle Vanacker† Laboratory for Experimental Geomorphology, Katholieke Universiteit Leuven, Redingenstraat 16, B-3000 Leuven, Belgium
Gerard Govers* Laboratory for Experimental Geomorphology, Katholieke Universiteit Leuven, Redingenstraat 16, B-3000 Leuven, Belgium
Philip Van Peer Laboratory for Prehistory, Katholieke Universiteit Leuven, Redingenstraat 16, B-3000 Leuven, Belgium
Cyriel Verbeek Laboratory for Prehistory, Katholieke Universiteit Leuven, Redingenstraat 16, B-3000 Leuven, Belgium
Johan Desmet, Jr Department of Land Management, National Institute for Agricultural Techniques, Burg. Ven Gansbekelaan 115, B-9820 Merelbeke, Belgium
Jeroen Reyniers Laboratory for Regional Geomorphology, Katholieke Universiteit Leuven, Redingenstraat 16, B-3000 Leuven, Belgium

(Received 19 November 1999; revised manuscript accepted 12 January 2001)

In Europe, often only small archaeological databases are available, owing to a lack of extensively prospected areas and to the disturbance of the soils. Traditional statistical techniques do not allow location analyses on small archaeological databases composed of dependent site data. Several authors have therefore developed alternative techniques, in which observed weight factors for the sample of sites are compared with a distribution of weight factors obtained by simulating a randomly distributed site population of the same size. However, Monte Carlo simulation does not require a priori defined weight factors. With this simplified technique, it becomes possible to use small archaeological datasets to demonstrate significant relations between environmental data and location behaviour in the past. An application of the technique to the Mesolithic in the northeast of Belgium demonstrated that proximity to water played a major role in location behaviour. Small changes in location choice could be linked with climate fluctuations in the Early Holocene.

© 2001 Academic Press

Keywords: LOCATION ANALYSIS, MONTE CARLO SIMULATION, MESOLITHIC, NOORDERKEMPEN, BELGIUM.

*Author for correspondence. Fax: +32 16 326400; e-mail: [email protected]
†Fund for Scientific Research, Flanders, Belgium.
V. Vanacker et al.
Figure 1. Situation of the research area. (Map legend: height in m, 20 to 40; archaeological sites; grid north; scale in metres.)
Introduction
It is commonly accepted in prehistory that settlement patterns of prehistoric sites are not random. Several studies have shown that the location of archaeological sites is related to a wide range of environmental factors, such as elevation, aspect, soil classes and the proximity to water. Before the use of Geographical Information Systems (GIS) as an analytical tool in archaeology, the evidence of environmental constraints on Mesolithic site locations was mostly anecdotal (e.g. Kozlowski, 1980; Vermeersch, 1989). The development of advanced GIS software made it possible to test statistically the significance of the observed relations between environmental data and the location of archaeological sites (Baena, Blaseo & Recuero, 1995; Brandt, Groenewoudt & Kvamme, 1992; Dalla Bona, 1996; Hasentab & Resnick, 1990; Kvamme & Jochim, 1989; Maschner & Stein, 1995; Wansleeben & Verhart, 1990; Warren, 1990).

In the European context, location analyses are often hampered by the lack of accurate archaeological information. The long occupation history of the European plains and midlands and the intensive use of the soil led to the destruction of many archaeological traces. Furthermore, intensive surveys of large areas are rare: only a few small-scale projects provide homogeneous, good-quality information. Even where detailed information is available, archaeological datasets often do not contain enough accurate information to allow traditional statistical methods (such as the chi-square analysis) to be used for environmental analysis. The major problem is that, in most cases, the number of sites is quite limited, so that statistical tests do not yield reliable or meaningful results (Siegel, 1956; Wansleeben & Verhart, 1995). In addition, statistical tests assume that each observation is independent. This is not the case when spatial data are analysed, as these often show a high degree of spatial autocorrelation, i.e. neighbouring landscape positions are expected to have similar values (e.g. Van Leusen, 1996).

This paper has both a methodological and an archaeological objective. The first objective is to present a method, based on Monte Carlo simulation, which can be used for the environmental analysis of small archaeological datasets. This technique is then applied to a study of the Mesolithic in the north of Belgium. The second objective is to analyse the influence of the environment on site location for the Mesolithic in northern Belgium.
Materials and Methods
The research area is situated in the community of Weelde in the northeastern part of the province of Antwerp, Belgium (Figure 1). It belongs to the physiographic region “Noorderkempen”. This low-lying flat area dips gently to the northwest and varies in height from 35 m in the south to 25 m in the north. It forms the watershed between the drainage systems of the Maas and the Schelde (Wouters & Vandenberghe, 1994).
Monte Carlo Simulation 663
The geomorphology of the region is largely the result of Holocene reworking of the Pleistocene coversand formations. The natural landscape consists of active and fossil dune formations and local depressions, the “fens”. The dunes are well drained, but within the depressions wet conditions prevail due to the presence of a clayey substrate near the surface (De Ploey, 1961). Recent changes in the landscape are associated with human activity. Extensive deforestation led to an evolution of the mature iron podsol into a humic-iron podsol. Since the Middle Ages, farmers have used sod manuring to improve the poor soils in the areas used for cultivation. This practice led to thick humic soils around the villages and to soil truncation outside the cultivated areas. During the last decades, land redistribution projects, meant to increase agricultural productivity, changed the geomorphology and drainage system irreversibly. The levelling of small ridges and the filling of local depressions smoothed the topography. Wet areas, such as fens and swamps, were improved for agricultural use by the construction of drainage channels. The consequence of these human activities was the removal or destruction of the original topography, hypsography, soils and, most probably, a significant number of archaeological sites.
Available datasets
Three basic types of environmental information were collected: topography, hydrography and soil information. The archaeological dataset contains 64 Mesolithic sites. The accuracy of each dataset was checked beforehand because it influences the quality of the location analysis.

Digital Elevation Model (DEM)
A Digital Elevation Model (DEM) for the whole area was available from the Belgian National Geographic Institute (NGI). The model consists of a regular grid of data, sampled each 1 in latitude and 2 in longitude. Elevation was derived from scanned topographic maps at a scale of 1:50,000. Grid cells that were not crossed by a contour line were given the height value of the nearest lower contour line. This resulted in a “stair-stepped” DEM. In areas with steep slopes this “stair-step” effect is often unimportant, but in flat areas it is a serious obstacle to the interpretation of the landscape.

In order to convert this DEM into a continuously varying surface the following steps were carried out. A raster DEM with a resolution of 30 m was constructed in IDRISI using linear interpolation and mean filtering. Next, the hypsometry of the area was reconstructed by vectorizing the raster DEM. The resulting contour lines were smoothed using a low-pass filter, after which the data reduction algorithm of Douglas and Peucker was applied to reduce the size of the vector dataset. The resulting vector lines were rasterized again and a DEM with a 30-m resolution was constructed using kriging and mean filtering.

The accuracy of this DEM was evaluated by comparing it with an accurate reference DEM for a test area of 53 km². The reference DEM was constructed by digitizing the contour lines from topographic maps at a scale of 1:10,000 and interpolating with kriging in SURFER at a resolution of 30 m. The Root Mean Square Error (RMSE) of the 1:50,000 DEM is 0.83 m and is mainly caused by the lower relief intensity of the 1:50,000 DEM. The spatial distribution of the altimetric difference shows a systematic pattern: local depressions and small ridges are not well represented by the 1:50,000 DEM.

Drainage system
Stream courses were digitized by the Administration for Environmental, Nature, Land and Water Management of Flanders (AMINAL) from topographic maps at a scale of 1:25,000 with a positional accuracy of c. 13 m. This database also contains information about the nature of rivers and streams, the periodicity of the water content, the quality of the water, etc. These data are important because they make it possible to exclude artificial waterways (channels, drainage ditches) from the database. However, the current drainage system is no longer representative of the past because of anthropogenic changes.

First, only the fens that are still present in the landscape are included in the digital database. Due to the intensification of land use at the end of the 19th century, most of the low-productivity heath areas were converted to conifer forest or to farmland. Fens were filled in and the drainage of these wet areas was improved by digging canals. Only in exceptional cases, e.g. in nature reserves, are the fens still preserved in the landscape. Information on the location of the fens that have disappeared was obtained by digitizing the fens from the first edition of the topographical map of Belgium (1866–1881) at a scale of 1:20,000.
The second problem consists of natural changes of river courses and of recent anthropogenic changes by channelization: the present-day location of a river does not necessarily correspond with its location in prehistoric times. For this reason, the analysis was conducted using distance classes rather than the exact distance to the nearest fen or river.

Soil map
The digital soil map, made by the National Institute for Agricultural Techniques (Rijksstation voor landbouwtechnieken), is a vector layer consisting of polygons. For 81% of the polygons, these vector files contain information about soil texture, drainage and profile. The map was compiled on the basis of soil augering with a density of c. 1–2 augerings per ha, complemented by a geomorphological analysis of the
landscape (Ameryckx, Verheye & Vermeire, 1985). The map purity can be used to estimate the quality of the maps (Van Meirvenne, 1998). It measures the conformity between two independent observations of the same area. Van Meirvenne evaluated the map purity with respect to textural classes for two test areas (the Polders of East-Flanders and the Sand-Loam area in the west of Flanders) and obtained map purities of 50% and 60% respectively. About 35% of the observations were classified in a textural class just above or below the correct textural class.
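The notion of map purity can be illustrated with a minimal sketch. It assumes that purity is computed as the share of check locations at which the mapped class agrees with an independent observation; the function name and the two class lists are hypothetical and are not data from Van Meirvenne (1998).

```python
def map_purity(mapped, observed):
    """Fraction of locations where the mapped class equals the independent observation."""
    if len(mapped) != len(observed) or not mapped:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(1 for m, o in zip(mapped, observed) if m == o)
    return matches / len(mapped)


# Hypothetical textural classes at ten check locations (illustrative values only).
mapped_classes = ["sand", "sand", "loamy sand", "sand", "loamy sand",
                  "sand", "sand", "loamy sand", "sand", "sand"]
observed_classes = ["sand", "loamy sand", "loamy sand", "sand", "sand",
                    "sand", "sand", "loamy sand", "loamy sand", "sand"]

purity = map_purity(mapped_classes, observed_classes)
print(f"map purity = {purity:.0%}")
```

A purity of 50% to 60%, as reported for the two test areas, would mean that roughly every second location falls in a different textural class on the map than in the field check.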
Archaeological dataset
The quality of location analyses is highly dependent on the accuracy of the archaeological information. First, only intensively surveyed areas can be taken into account; otherwise, the archaeological dataset will be biased by an inadequate sampling strategy. In Belgium, the area of “Weelde” is a unique example, as it has been thoroughly prospected under the guidance of an experienced archaeologist (Verbeek & Vermeersch, 1993, 1994, 1995). Secondly, the accuracy of the reported site positions should be verified. For the study area, information on some sites is also available from publications. A comparison of published site co-ordinates with the survey co-ordinates revealed a mean difference of c. 200 m. Consequently, the literature information could not be used in the analysis, as landscape characteristics often vary significantly over distances much smaller than 200 m. This demonstrates a fundamental problem in modern archaeological research. A lot of effort is spent on detailed topographical measurements of the sites, but very often their precise location in a broader context is neglected. This will often make a later spatial analysis of site locations impossible.

For the research area a digital archaeological dataset was created, using only the survey data of Verbeek. In the area of 130 km², 64 Mesolithic sites were documented: 8 Early Mesolithic, 12 Middle Mesolithic and 24 Late Mesolithic sites, and 20 sites of which only the major period is known.
Methodology
Selecting Ground Disturbance Areas
Ground Disturbance Areas (GDAs) are regions affected by geomorphological processes (such as water erosion, wind deflation or sedimentation) and by specific modern or historical land use (sod manuring, urban and residential areas, woods and pasture) (Hasentab & Resnick, 1990). Because the topsoil has been disturbed or removed, or because of dense vegetation or residential land cover, the chance of finding archaeological remains in situ is virtually non-existent. Therefore, the GDAs were excluded from the datasets.
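Excluding the GDAs amounts to masking raster cells before any sampling or counting takes place. The sketch below assumes a small boolean raster and site positions given as (row, column) pairs; the mask, the site list and the function name `exclude_gda` are hypothetical and serve only as an illustration.

```python
# Hypothetical 4 x 5 raster: True marks a cell inside a Ground Disturbance Area.
gda_mask = [
    [False, False, True,  True,  False],
    [False, False, True,  False, False],
    [False, False, False, False, False],
    [True,  False, False, False, False],
]

# Hypothetical site positions as (row, column) pairs.
sites = [(0, 1), (0, 2), (2, 4), (3, 0)]


def exclude_gda(cells, mask):
    """Keep only the cells that lie outside the disturbance mask."""
    return [(r, c) for r, c in cells if not mask[r][c]]


all_cells = [(r, c) for r in range(len(gda_mask)) for c in range(len(gda_mask[0]))]
valid_cells = exclude_gda(all_cells, gda_mask)   # area that remains in the analysis
valid_sites = exclude_gda(sites, gda_mask)       # sites that remain in the analysis

print(len(valid_cells), "cells and", len(valid_sites), "sites remain")
```

Applying the same mask to the environmental layers and to the sites keeps the area proportions and the site proportions comparable in the later steps.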
Creation of secondary datalayers
Using the three primary environmental datasources (topography, hydrography and soils), nine secondary datalayers were created. These secondary layers form the basis for further analyses. Each calculated variable was categorized in two or three classes. Table 1 shows all primary and secondary datalayers.
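The categorization step can be sketched as follows. The aspect limits and the 200 m and 400 m distance limits are taken from Table 1; how a value lying exactly on a class limit is assigned, the class labels "near" and "far", and the function names are assumptions made for this illustration.

```python
def aspect_class(aspect_deg):
    """Assign an aspect in degrees to one of the three aspect classes of Table 1."""
    a = aspect_deg % 360.0
    if a >= 270.0 or a < 90.0:       # class that wraps around north
        return "270-90"
    if 135.0 <= a < 225.0:
        return "135-225"
    return "90-135; 225-270"         # the two remaining sectors form one class


def distance_class(distance_m, limit_m):
    """Two-class split of a distance at a class limit (200 m or 400 m in Table 1)."""
    return "near" if distance_m <= limit_m else "far"   # labels are hypothetical


# Hypothetical pixel: aspect of 250 degrees, 350 m from the nearest river.
print(aspect_class(250.0))            # 90-135; 225-270
print(distance_class(350.0, 200.0))   # far
print(distance_class(350.0, 400.0))   # near
```

Working with classes rather than exact distances also limits the effect of the positional uncertainty of the present-day river courses discussed under "Drainage system".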
Univariate statistical analysis
The most commonly used statistical method for univariate analyses is the chi-square analysis (Brandt, Groenewoudt & Kvamme, 1992; Gillings, 1995; Wansleeben, 1995). This method has important limitations concerning the amount of archaeological information in each category of the variables. When the number of sites is lower than 40 and the lowest expected frequency of a category is below five, the chi-square analysis is no longer usable (Siegel, 1956).

Another popular univariate method is the t-test (Kvamme, 1990). This test deals with continuous data and compares the mean and standard deviation of the sample of sites with those of the population (=entire dataset). The biggest constraint of this method is that the sample must contain independent site data, randomly selected from a normally distributed larger dataset (Hayes, 1988). In archaeological applications, sites are often clustered, so that the sample of sites is not independent. Furthermore, environmental datasets rarely have a normal distribution (Parker, 1985).

Archaeologists have therefore tried to develop alternatives that do not require assumptions about the distribution of the data and/or the number of sites. The method of Atwell & Fletcher (1985) is based on Monte Carlo simulation and on the calculation of weight factors, which are estimates of the relative importance of different categories. In the case of three categories, the following formula is used to calculate the weight factors A, B and C (Atwell & Fletcher, 1987):

$$A = \frac{a'bc}{a'bc + ab'c + abc'}, \qquad B = \frac{ab'c}{a'bc + ab'c + abc'}, \qquad C = \frac{abc'}{a'bc + ab'c + abc'}$$

where:
a, b, c: proportion of pixels within the three categories
a', b', c': proportion of sites in the three categories

Weight factors vary between 0 and 1. When the sites are randomly located, each weight factor equals 1 divided by the number of categories.
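The weight-factor formula can be checked numerically with a short sketch. Dividing numerator and denominator by abc shows that each weight factor equals the ratio of site proportion to area proportion for that category, normalised by the sum of these ratios; the code uses this equivalent form. The area proportions below are hypothetical; the site proportions are those of the eight Early Mesolithic sites in Table 2 (2, 3 and 3 sites).

```python
def weight_factors(area_props, site_props):
    """Weight factors in the sense of the formula above, for any number of categories.

    area_props: proportion of pixels in each category (all must be > 0)
    site_props: proportion of sites in each category
    """
    ratios = [s / a for s, a in zip(site_props, area_props)]
    total = sum(ratios)
    return [r / total for r in ratios]


area = [0.5, 0.3, 0.2]            # hypothetical proportions of pixels a, b, c
sites = [2 / 8, 3 / 8, 3 / 8]     # proportions of sites a', b', c'

A, B, C = weight_factors(area, sites)
print(round(A, 3), round(B, 3), round(C, 3))

# A random site distribution (site proportions equal to area proportions)
# gives 1/3 for each of the three categories.
print(weight_factors(area, area))
```

In this hypothetical example the third category receives the largest weight factor because it holds 37.5% of the sites on only 20% of the area.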
Table 1. Primary and secondary datalayers used in the location analysis

Primary datasource                  Secondary datalayer             Categories
Environmental datalayers
Digital Elevation Model             Aspect                          270–90 / 90–135; 225–270 / 135–225
                                    Slope                           0.45%
                                    Geomorphological unit           Ridge / Depression / Sloping surface
Drainage System                     Distance 1 to nearest river     200 m
                                    Distance 2 to nearest river     400 m
                                    Distance 1 to nearest fen       200 m
                                    Distance 2 to nearest fen       400 m
Soil Series                         Texture                         Sand
                                    Drainage                        Good / Middle / Poor
Archaeological datalayers
Database of archaeological sites    Early Mesolithic
                                    Middle Mesolithic
                                    Late Mesolithic

Observed weight factors for the site population are compared with a distribution of weight factors obtained by simulating a randomly distributed site population of the same size a hundred times. Wansleeben & Verhart (1995) criticize this method and propose an alternative, based on the calculation of a Kj-factor, which is defined as:

$$K_j = \sqrt{\frac{p_s \, (p_s - p_a)}{p_w}}$$

where, given that the average site density is less than 1 per unit cell:
p_s = the proportion of the sites incorporated in the model
p_a = the proportion of the area incorporated in the model
p_w = the proportion of the area without archaeological sites

The observed Kj-value is compared with a distribution of values obtained by 1000 simulations of a randomly distributed site population. As pointed out by the authors, this method also has limitations. The most important drawback of a Kj-analysis is that only preferred environments can be identified: it does not allow environments that were avoided to be identified. Also, the distribution of Kj-values depends very strongly on the number of sites.

However, a Monte Carlo analysis does not require the a priori definition of weight factors or an index, but can be applied directly to the proportion of sites located inside a certain map category. In such an analysis, the set of sites is considered as a sample of N elements from the population (the entire dataset). The entire population is approached by k random samples of N elements. For each of the k+1 samples (k random samples and the sample of the sites) the distribution of the N pixels over m categories is calculated. This is illustrated in Table 2 with an analysis of the relationship between the Early Mesolithic site distribution and the soil drainage.

Table 2. Analysis of the relationship between the Early Mesolithic site distribution and the soil drainage

             Category 1:    Category 2:                 Category 3:
             dry soils      moderately drained soils    wet soils
Samples
1            2              4                           2
2            1              6                           1
3            1              1                           6
4            0              5                           3
5            1              2                           5
6            0              5                           3
7            1              2                           5
8            0              5                           3
9            1              7                           0
10           1              5                           2
...          ...            ...                         ...
100          0              6                           2
Sites        2 (0.25)       3 (0.38)                    3 (0.38)

Population of Early Mesolithic sites: N=8. Soil drainage: m=3. 1=dry soils; 2=moderately drained soils; 3=wet soils. Number of random samples=k=100. Values in parentheses are the proportions of the 8 sites.
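The sampling step that produces a table like Table 2 can be sketched in a few lines. The sketch assumes that the category map is available as a flat list of category codes, one per pixel, and that each random sample draws N distinct pixels; the raster composition, the seed and the function name `simulate_counts` are hypothetical.

```python
import random
from collections import Counter


def simulate_counts(category_raster, n_sites, k, categories, seed=0):
    """Draw k random samples of n_sites pixels and count the pixels per category."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    results = []
    for _ in range(k):
        sample = rng.sample(category_raster, n_sites)   # N distinct pixels
        counts = Counter(sample)
        results.append([counts.get(c, 0) for c in categories])
    return results


# Hypothetical study area of 1000 pixels: 15% dry (1), 55% moderately drained (2), 30% wet (3).
raster = [1] * 150 + [2] * 550 + [3] * 300

random_counts = simulate_counts(raster, n_sites=8, k=100, categories=[1, 2, 3])
site_counts = [2, 3, 3]                # observed Early Mesolithic sites (Table 2)

for row in random_counts[:5]:
    print(row)
print("sites:", site_counts)
```

Each row plays the role of one numbered sample in Table 2, and the observed site counts are added as the (k+1)-th sample.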
Table 3. Exceedance probabilities of the 100 randomly sampled site distributions

Distribution of the 8 pixels for the 101 samples

Rank i    Category 1:    Category 2:           Category 3:    Exceedance
          dry soils      moderate wet soils    wet soils      probability
1         0              0                     0              (100–1)/101
2         0              0                     0              (100–2)/101
3         0              1                     0              (100–3)/101
4         0              1                     0              (100–4)/101
5         0              1                     0              (100–5)/101
6         0              1                     0              (100–6)/101
7         0              1                     1              (100–7)/101
8         0              2                     1              (100–8)/101
9         0              2                     1              (100–9)/101
10        0              2                     1              (100–10)/101
...       ...            ...                   ...            ...
100       4              8                     7              (100–100)/101
The probability of having a sample with a given distribution of observations over the m categories can then be estimated by ranking the k samples in ascending order and by calculating for each of the k values an exceedance probability with the following formula (Kvamme, 1997):

$$P(x \ge x_i) = \frac{k - i}{k + 1}$$

where:
k = number of samples
x_i = sample of the population
i = rank number after ordering the samples

The exceedance probabilities can then be plotted against the ranked sampled values of each category. Finally, for each category the exceedance probability of the sample of the archaeological sites can be derived from the plot and compared with a given significance level. When the derived exceedance probability lies outside the probabilities that bound the confidence interval, it may be concluded that the distribution of sites over the map categories is not random.

Correct use of the Monte Carlo technique requires that a sufficient number of simulations is carried out so that the empirical probability distribution is stable. This can be investigated by looking at the evolution of
Figure 2. Plot of the exceedance probabilities against the 100 ranked randomly sampled values. (Axes: ranked randomly sampled values, exceedance probability. Series: dry soils, moderately drained soils, wet soils; the sample of archaeological sites is marked.)
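The ranking procedure behind Table 3 and Figure 2 can be sketched as follows. The input is the dry-soil column of the ten samples listed in Table 2, so k=10 here rather than 100. Because counts are discrete, several ranks share the same value; reading the observed site count off as a range of probabilities is an assumption of this sketch, not a rule stated in the text, and the function names are hypothetical.

```python
def exceedance_table(values):
    """Rank the k simulated values in ascending order and attach (k - i) / (k + 1)."""
    k = len(values)
    ranked = sorted(values)
    return [(i, x, (k - i) / (k + 1)) for i, x in enumerate(ranked, start=1)]


def read_off(observed, table):
    """Range of exceedance probabilities of the ranks whose value equals `observed`.

    Returns None when no simulated sample takes that value.
    """
    probs = [p for _, x, p in table if x == observed]
    return (min(probs), max(probs)) if probs else None


# Dry-soil counts of the ten samples listed in Table 2.
dry_counts = [2, 1, 1, 0, 1, 0, 1, 0, 1, 1]

table = exceedance_table(dry_counts)
for rank, value, prob in table:
    print(rank, value, round(prob, 3))

observed_range = read_off(2, table)    # two Early Mesolithic sites lie on dry soils
print("observed:", observed_range)
```

With the full set of 100 simulations the same functions would give the (100–i)/101 column of Table 3.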
Figure 3. Evolution of the statistical characteristics of the sample population in function of the number of simulations. (Axes: number of simulations, 0 to 150; exceedance probabilities, 0 to 1. Curves: d5, d10, d50, d90, d95.)
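A stability check of the kind shown in Figure 3 can be sketched by recomputing summary statistics as simulations accumulate. The sketch reads d5, d10, d50, d90 and d95 as the 5th to 95th percentiles of the simulated values, which is an assumption; the raster, the seed and the nearest-rank percentile rule are likewise hypothetical choices.

```python
import random


def percentile(values, q):
    """Nearest-rank percentile (q an integer from 0 to 100) of a non-empty list."""
    ranked = sorted(values)
    rank = max(1, (q * len(ranked) + 99) // 100)   # ceiling of q * n / 100, at least 1
    return ranked[rank - 1]


def running_percentiles(values, qs=(5, 10, 50, 90, 95), step=10):
    """Percentiles of the first 10, 20, ... simulated values."""
    rows = []
    for k in range(step, len(values) + 1, step):
        rows.append((k, [percentile(values[:k], q) for q in qs]))
    return rows


# Hypothetical raster: 15% dry (1), 55% moderately drained (2), 30% wet (3).
rng = random.Random(1)
raster = [1] * 150 + [2] * 550 + [3] * 300

# Number of wet-soil pixels among 8 randomly drawn pixels, for 150 simulations.
wet_counts = [sum(1 for c in rng.sample(raster, 8) if c == 3) for _ in range(150)]

history = running_percentiles(wet_counts)
for k, stats in history:
    print(k, stats)
```

When the rows stop changing as k grows, the empirical distribution can be regarded as stable for the chosen number of simulations.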
Table 4. Results of the location analysis
Secondary datalayer: Soil texture, Soil drainage, Distance to the nearest river, Distance to the nearest fen, Geomorphology
Category Sand Loamy sand Good Poor