Chapter 4
Semantics of Point Spaces Through the Topological Weighted Centroid and Other Mathematical Quantities: Theory and Applications
Massimo Buscema, Marco Breda, Enzo Grossi, Luigi Catzola, and Pier Luigi Sacco

M. Buscema (*): Semeion Research Center of Sciences of Communication, Via Sersale 117, Rome, Italy; Department of Mathematical and Statistical Sciences, CCMB, University of Colorado, Denver, Colorado, USA; e-mail: [email protected]
M. Breda, L. Catzola, P.L. Sacco: Semeion Research Center of Sciences of Communication, Via Sersale 117, Rome, Italy
E. Grossi: Bracco Research Division, Via Folli, 50, 20134 Milan, Italy
W.J. Tastle (ed.), Data Mining Applications Using Artificial Adaptive Systems, DOI 10.1007/978-1-4614-4223-3_4, © Springer Science+Business Media New York 2013

Part A: Syntax of Physical Space: Theory
4.1 The Conceptual Context
4.1.1 Introduction
The spatial dimension is often key to understanding the structure of a phenomenon. In some cases this is relatively obvious, because the phenomena themselves are defined in spatial terms, as with diffusion processes. In other cases, however, the relevance of the spatial dimension is not obvious. Think, for example, of a comparative analysis of different socio-economic systems in which geographical coordinates are not necessarily part of the database. A vast array of alternative, statistically based approaches has been developed to deal with the spatial character of phenomena, each rooted in a different field such as physics, biology, economics, or geography. But an aspect that has been somewhat overlooked so far is the semantic dimension of space, that is to say, the interpretation of the topological or metric dimension of space as conveying an intrinsic meaning that may have a substantial bearing on the
interpretation of the underlying phenomena. Thus, whether we are considering a physical space or some sort of abstract, representational space, we must consider the possibility that the spatial dimension carries relational information as to why certain entities ‘stay together’ in a given environment; this may add substantially to our understanding of the respective phenomena. It is important to stress the somewhat original meaning that we give to the term ‘relational’ in this context: we claim that, to make significant sense of a certain phenomenon, we must consider its variables as negotiating their position within the state space according to a semantics defined in terms of relative proximity. In other words, the physical entities produced by a given phenomenon situate themselves in the state space as if they were aware, to some degree, of their relative position and adjusted to each other in appropriate ways, as if they were abiding by a sort of implicit grammar of the phenomenon. We can unscramble and interpret such negotiation by suitably re-mapping the space so as to give proximity its most expressive meaning. Insofar as the semantic aspects of the phenomenon relate to the spatial characteristics only, we are basically reasoning in terms of a syntax of space, i.e., the internal rules by which spatial features combine appropriately to generate proper structures of meaning. This is the most basic level of a relational spatial analysis. If, in addition, the phenomenon presents some characteristic dimensions of a non-spatial nature, we can speak of a full semantics. In this case, we will have to develop a more articulated approach, which is presented in the second part of the chapter. In this chapter, we present a number of new mathematical quantities that are particularly useful for capturing such relational dimensions and thus allow for a rigorous analysis of the semantics of space.
Specifically, such quantities are practical and relatively easy to use in cognitively accessible spaces, namely two- or three-dimensional ones. More indirectly, they can also be used to analyze the semantics of points defined in a higher-dimensional space. In this case, a multidimensional scaling algorithm should first be applied in order to obtain a projection of the high-dimensional source space onto a two- or three-dimensional target space. For the target space semantics to be representative of the original source space, the scaling algorithm must minimize the distortion of the hyperdistances when they are projected from the source to the target space. The smaller the distortion error, the more closely the semantics conveyed by the target space matches the original one. Since the R records of a V-variable dataset can always be seen as a set of R points in a V-dimensional space, the introduced quantities can be used to describe some semantic aspects of the dataset that are related to the relative position of the R records in their V-dimensional space. By the same token, one may transpose the reasoning and regard the same quantities as illustrative of some semantic aspect of the relative positions of the V variables in their R-dimensional space, provided that such transposition is computationally feasible. All the proposed quantities will be defined considering a set of K points, called entities, in a two-dimensional space; the extension to the three-dimensional case is straightforward. As we will see, the proposed mathematical quantities are points,
curves or scalar fields. They are listed below and then defined in the specific sections:
• Topological Weighted Centroid (TWC)¹;
• Self Topological Weighted Centroid (STWC);
• Proximity Scalar Field;
• Gradient of the Scalar Field;
• Relative Topological Weighted Centroid (TWC_i);
• Paths from the Arithmetic Centroid to entities;
• Paths between entities;
• Scalar Field of the trajectories.
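The requirement that a projection preserve the distances of the source space can be made concrete with a small numerical sketch. The code below does not implement a multidimensional scaling algorithm; it only measures, for a hand-picked projection, how much the pairwise distances change. The function names, the normalised-stress formula used as the distortion measure, and the toy dataset are all illustrative assumptions, not quantities defined in this chapter.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance between every pair of rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def distortion(source, target):
    """Normalised stress between source and target distances (assumed measure).

    Returns 0.0 when every pairwise distance is preserved exactly.
    """
    d_src = pairwise_distances(source)
    d_tgt = pairwise_distances(target)
    upper = np.triu_indices(len(source), k=1)  # count each pair once
    squared_error = ((d_src[upper] - d_tgt[upper]) ** 2).sum()
    return np.sqrt(squared_error / (d_src[upper] ** 2).sum())


# Hypothetical dataset: R = 4 records described by V = 3 variables,
# i.e. 4 points in a 3-dimensional source space.
data = np.array([[0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [1.0, 1.0, 0.0]])

flat = data[:, :2]       # drops a coordinate that is constant: no distortion
collapsed = data[:, :1]  # drops an informative coordinate: distances change

print("distortion of 2-D projection:", distortion(data, flat))
print("distortion of 1-D projection:", distortion(data, collapsed))

# Transposed view: the V = 3 variables as points in an R = 4 dimensional space.
variables_as_points = data.T
print("variables as points, shape:", variables_as_points.shape)
```

In this toy case the two-dimensional projection keeps every distance intact, while the one-dimensional projection does not, so only the former would convey a semantics congruent with the source space in the sense described above.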
These quantities assume that all entities in the set of K points share the same features apart from their position within the N-dimensional space. We can, however, also allow each entity to have features other than its position, which gives rise to more complex interactions with the other entities within the N-dimensional ambient space. To deal with this further complication, we will apply a new algorithm, named the Auto Contractive Map, to evaluate the relationships among the K non-homogeneous entities while taking their specific qualitative features into account. Furthermore, we will propose a new method for combining the derived relationships among non-homogeneous entities with the preexisting geographical information. Overall, we have thus developed a conceptually innovative methodology for dealing with space semantics that may prove effective in tackling especially complex problems, including some kinds of problems that are commonly believed to be intractable with the currently available methods. To illustrate the scope and power of such techniques, we apply them, in the second part of the chapter, to a variety of problems taken from various disciplines, whose heterogeneity makes a clear case for the ‘universality’ of our approach.
4.1.2 Location Theory
In the scientific literature, the problem of the semantics of geographic space is typically analyzed within the framework of Location Theory (Buscema et al. 2009a; Brantingham and Brantingham 1981, 1984; Levine 2004; O’Leary 2006; Buscema and Terzi 2006, 2006a). Location theory is concerned with one of the central issues in geography: it attempts to find an optimal location for a given distribution of activities, population, or events over a region, according to a specific criterion.
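To see how the choice of criterion changes the "optimal" location, the sketch below compares two classical criteria on a made-up set of event coordinates: the arithmetic centroid, which minimises the sum of squared distances to the events, and the centre of minimum distance, which minimises the sum of plain distances and is approximated here with a Weiszfeld-type iteration. Neither is the Topological Weighted Centroid; the function names, the iteration count, and the data are assumptions chosen only for illustration.

```python
import numpy as np


def arithmetic_centroid(points):
    """Mean of the coordinates: minimises the sum of squared distances."""
    return points.mean(axis=0)


def minimum_distance_center(points, iterations=200, eps=1e-12):
    """Weiszfeld-type iteration: approximates the point that minimises
    the sum of Euclidean distances to all points."""
    estimate = points.mean(axis=0)  # start from the arithmetic centroid
    for _ in range(iterations):
        dist = np.linalg.norm(points - estimate, axis=1)
        weights = 1.0 / np.maximum(dist, eps)  # guard against division by zero
        estimate = (weights[:, None] * points).sum(axis=0) / weights.sum()
    return estimate


def total_distance(points, location):
    """Sum of Euclidean distances from `location` to every point."""
    return np.linalg.norm(points - location, axis=1).sum()


# Hypothetical event locations: four clustered events and one outlier.
events = np.array([[0.0, 0.0],
                   [2.0, 0.0],
                   [0.0, 2.0],
                   [2.0, 2.0],
                   [10.0, 1.0]])

centroid = arithmetic_centroid(events)
median = minimum_distance_center(events)

print("arithmetic centroid:       ", centroid)
print("centre of minimum distance:", median)
print("total distance from centroid:", total_distance(events, centroid))
print("total distance from centre:  ", total_distance(events, median))
```

The single outlier pulls the arithmetic centroid to (2.8, 1.0), whereas the centre of minimum distance stays closer to the cluster. The same distribution of events thus yields different answers under different criteria, which is why the criterion has to be specified before the location problem is well posed.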
1 The Topological Weighted Centroid and its equations were designed by M. Buscema in 2008 at Semeion.
The specific location problem we want to deal with can be simply defined in the following terms. Let us consider a distribution of K points in an N-dimensional space, $X_i = \{x_{i1}, \ldots, x_{iN}\}$, with $i = 1, \ldots, K$, and typically $N = 2$ or $N = 3$, depending on whether we want to deal with a two- or three-dimensional space. Let these points be spatial positions representing locations where something meaningful happened in relation to the phenomenon under study. They could be, for instance, locations where deaths occurred during a disease outbreak, or crime scenes where serial offences took place. We want to define and calculate a point function H :