arXiv:1603.01446v1 [math.AT] 4 Mar 2016
Sheaves are the canonical data structure for sensor integration

Michael Robinson
Mathematics and Statistics, American University, Washington, DC, USA
[email protected]

March 7, 2016

Abstract. A sensor integration framework should be sufficiently general to accurately represent all information sources, and also be able to summarize information in a faithful way that emphasizes important, actionable features. Few approaches adequately address these two discordant requirements. The purpose of this expository paper is to explain why sheaves are the canonical data structure for sensor integration and how the mathematics of sheaves satisfies our two requirements. We outline some of the powerful inferential tools that are not available to other representational frameworks.
1 Introduction
There is increasing concern within the data processing community about “swimming in sensors and drowning in data,” because current data fusion only addresses localized objects. This refrain is repeated throughout many scientific disciplines, because there are few treatments of unified models for complex phenomena and it is difficult to infer these models from heterogeneous data. A sensor integration framework should be (1) sufficiently general to accurately represent all information sources,[1] and (2) able to summarize information in a faithful way that emphasizes important, actionable features. Few approaches adequately address these two discordant requirements. Models of specific phenomena fail when multiple or unexpected sensor types are combined, and cannot assemble local fusions into a global picture consistently. Bayesian and network-theoretic methods tolerate more sensor types, but suffer a severe lack of sophisticated analytic tools.

[1] In this article, the terms “sensor” and “information source” are used interchangeably.
The mathematics of sheaves addresses our two requirements and provides powerful inferential tools that are not available to other representational frameworks. Sheaf theory represents locally valid datasets in which the datatypes and contexts vary, and therefore provides a common, canonical language for heterogeneous datasets. Other methods typically aggregate information either exclusively globally (on the level of whole semantic ontologies) or exclusively locally (through various maximum likelihood stages). This limits the kind of inferences that can be made by these approaches. For instance, the data association problem in tracking frustrates local approaches (such as those based on optimal state estimation) and remains essentially unsolved. The analysis of sheaves avoids both of these extremes by specifying information where it occurs – locally at each data source – and then uses global relationships to constrain how this information is interpreted.

The foundational and canonical nature of sheaves means that existing approaches that address aspects of the knowledge integration problem space already illuminate some portion of sheaf theory without exploiting its full potential. In contrast to the generality that is naturally present in sheaves, existing approaches to combining heterogeneous quantitative and qualitative data tend to reflect specific domain knowledge on small collections of data sources. Even in the most limited setting of pairs of quantitative data sources, considerable effort has been expended in developing models of their joint behavior. Our approach leverages these pairwise models into models of multiple-source interaction. Additionally, where joint models are not yet available, sheaf theory provides a context for understanding the properties that such a model must have.
1.1 Contribution
Sometimes data fusion is described methodologically, as it is in the Joint Directors of Laboratories (JDL) model [26], which defines operationally-relevant “levels” of data fusion. A more permissive – and more theoretically useful – definition was given by Wald [61]: “data fusion is a formal framework in which are expressed the means and tools for the alliance of data originating from different sources.” This article shows that this is precisely what is provided by sheaf theory. In Chapter 3 of Wald’s book [62], the description of the implications of this definition is strikingly similar to what is presented in this article. But we go further, showing that the power of sheaf theory is not just a full-fidelity representation of the data and its internal relationships, but also a jumping-off point for analysis. According to Hubbard [29, Appendix A7]: “a sheaf is nothing more and nothing less than local data.” Because of the generality of their definition, any other theory of data fusion will recapitulate sheaf theory, but as Hubbard goes on to say, “It is fairly easy to understand what a sheaf is, especially after looking at a few examples. Understanding what they are good for is rather harder; indeed, without cohomology theory, they aren’t good for much.” This is precisely what the problem of information integration has been: there are many universal languages for information representation but few families of strong analytic frameworks. This article shows how sheaf cohomology is a powerful and practical inferential tool, and captures certain classes of inferential hazards.
1.2 Historical context
There are essentially two threads of research in the literature on data fusion: (1) physical sensors that return structured, numerical data (“hard” fusion), and (2) unstructured, semantic data (“soft” fusion) [35]. Hard fusion is generally performed on spatially-referenced data (images), while soft fusion is generally referenced to a common ontology. Especially in hard fusion, the spatial references are taken to be absolute. Most sensor fusion is performed on a pixel-by-pixel basis amongst sensors of the same physical modality (for instance [60, 1, 64]). It generally requires image registration [14] as a precondition, especially if the sensors have different modalities (for instance [38, 25]). When image extents overlap but are not coincident, mosaics are the resulting fused product. These are typically based on pixel- or patch-matching methods (for instance [10]). Because these methods look for areas of close agreement, they are inherently sensitive to differences in physical modality. It can be difficult to extend these ideas to heterogeneous collections of sensors. Like hard fusion, soft fusion requires registration amongst different data sources. However, since there is no physically-apparent “coordinate system,” soft fusion must proceed without one. There are a number of approaches to align disparate ontologies into a common ontology [20, 19, 33, 44], against which analysis can proceed. There, most of the analysis derives from a combination of tools from natural language processing (for instance [28, 49, 24]) and artificial intelligence (like the methods discussed in [41, 31, 55]). That these approaches derive from theoretical logic, type theory, and relational algebras is indicative of deeper mathematical foundations. These three topics have roots in the same category theoretic machinery that drives the sheaf theory discussed in this article. 
Weaving the two threads of hard and soft fusion is difficult at best, and is usually approached statistically, as discussed in [63, 16, 27]. Unfortunately, this clouds the issue. If a stochastic data source is fused with another data source (deterministic or not), it stands to reason that the fused result will be stochastic. This viewpoint is quite effective in multi-target tracking [57, 47] or in event detection from social media feeds [2], where there are sufficient data to estimate probability distributions. But if two deterministic data sources are fused, one numeric and one textual, why should the result be stochastic? Regardless of this conundrum, information theoretic or possibilistic approaches seem to be natural and popular candidates for performing knowledge integration, for instance [11, 7, 8]. They are actually subsumed by category theory [5] and arise naturally when needed (via Lemma 9). These models tend to rely on the homogeneity of information sources in order to obtain strong theoretical results. Sheaf theory extends the reach of these methods by explaining that the most robust aspects of networks tend to be topological in nature. For example, one of the strengths of Bayesian methods is that strong convergence guarantees
are available. However, when applied to data sources arranged in a feedback loop, Bayesian updates can converge to the wrong distribution, or not at all [46]! The fact that this is possible is witnessed by the presence of a topological feature – a loop – in the relationships between sources. Sheaf theory provides canonical computational tools that sit on the general framework and has occasionally [43, 53, 32] been used in applications. Unfortunately, it is often difficult to turn the general machinery into a systematic computational tool. Combinatorial advances [3, 4, 56, 12] have enabled the approach discussed in this article through the use of finite topologies and related structures. Sheaves, and their cohomological summaries, are best thought of as being part of category theory. There has been a continuing undercurrent that sheaves aspire to be more than that – they provide the bridge between categories and logic [23]. The pioneering work [22] by Goguen took this connection seriously, and inspired this paper – our axiom sets are quite similar – though the focus of this paper is somewhat different. The approach taken in this paper is that time may be implicit or explicit in the description of a data source; for Goguen, time is always explicit. More recently, category theory has become a valuable tool for data science, for instance [58], and for information fusion [39, 40].
2 A motivating example
Although the methodology presented in this article is sufficiently general to handle any configuration of sensors and data types, we will focus on a few simple examples for expository purposes. We begin with the well-understood problem of forming image mosaics from a collection of imaging sensors. In its usual formulation (for instance [60, 1]), several cameras with similar perspectives are trained on a scene. Assuming that the perspectives and lighting conditions are similar enough, large mosaics can be formed. The problem is more complicated if the lighting or perspectives differ substantially, but it can still be addressed. In either case, the methodology aims to align pixels or regions of pixels. If some of the imaging sensors are of different modalities – say a LIDAR paired with a RADAR – pixel alignment is doomed to fail. Then, the problem is fully heterogeneous, and is usually handled by trying to interpolate into a common coordinate system [38, 25]. If there is insufficient sample support or perspectives are too different, then it is necessary to match object-level summaries rather than pixel regions [64]. We will consider three different mosaic-forming problems characterized by imaging modality: 1. All sensors yield images, with identical lighting and perspective: perfect mosaics can be formed (left frame of Figure 1) 2. All sensors yield images, though lighting and perspective can differ: imperfect mosaics can be formed (right frame of Figure 1)
[Figure 1 panel labels: “Perfect mosaic” (left) and “Imperfect mosaic” (right); a background discontinuity is indicated in the imperfect mosaic.]
Figure 1: Two mosaics of photographs. The mosaic on the left is perfect, in that the subimages on the overlap have identical pixel values. The mosaic on the right is imperfect, in that the pixel values of the two subimages differ somewhat on the overlap
3. Sensors yield a mixture of images or semantic detections: a fully heterogeneous data fusion problem (see Figure 4 for a preview of what this looks like)
These three examples showcase the transition from homogeneous data in problem (1) to heterogeneous data in problem (3), with homogeneous data being a clear special case. These examples highlight how fusion of heterogeneous data is to be performed, and make the resulting structure-preserving summaries easy to interpret. Moreover, the examples provide a context for three novel features that we now outline:
1. Sections of sheaves correspond to globally consistent data,
2. Different configurations of sensors lead to different base space topologies, and
3. Structure-preserving summaries are obtained by enhancing the algebraic structure of the raw sensor data through category representations.
When two cameras A and B capture adjacent images as in Figure 1, one can examine the difference between groups of pixels to tell if the images overlap. If the difference is small, then the images are deemed to overlap, and a mosaic – an image formed on the union of the sets of pixels – can be formed. This idea is precisely that of a section: a globally consistent picture derived from locally consistent, smaller pictures. Mathematically, one tries to ensure that the image portions on the overlap are exactly the same (left frame of Figure 1), though realistically some differences are tolerated (right frame of Figure 1). Depending on the number and arrangement of cameras, the collection and shape of overlaps between sensing domains might be rather more or less intricate. The topological structure of the sensors and the portions of the scene
they image provides important constraints on the kinds of mosaics that can be formed. Since a sheaf model of the sensors is a faithful, full-fidelity representation, it is subject to these topological constraints. Fortunately, topological constraints on sheaves are straightforward to compute, and have clear interpretations. For instance, densely overlapping regions of images are often highly redundant: many pixels will have their values supplied by many different cameras, while sparsely overlapping regions may be determined by only one camera. The pixels which are specified by a single camera will be constrained by pixels elsewhere in the image, where there is some overlap. These constraints are directly computable as the zeroth cohomology classes H^0 of an appropriate sheaf. Depending on the configuration of images, it is also possible that anomalous sections can be posited as mosaics. These are mediated by the underlying topology of the sensor configuration. For instance, it is possible to obtain locally reasonable but globally infeasible images, such as the (in)famous Penrose triangle, as first cohomology classes H^1 of the same sheaf. (The infeasible object is in R^3; it was no particular problem to draw this image in the plane!) In order to compute these cohomology classes, the sheaf must be built from data that have sufficient algebraic structure. Images as vectors of pixel values have sufficient structure, though object detections do not. Since we stipulated that raw data be stored in the sheaf, this presents an apparent obstacle, especially when there is a mixture of data types. The appropriate solution is to re-encode the data with enriched algebraic structure through category representations. Recent work in knot theory [36, 37] has demonstrated the theoretical power of the related technique of categorification, and some advances in categorification in data science [48] point to its value.
Intuitively, category representation causes discrete data – such as object detections – to be replaced with distributions and functions to be replaced with stochastic maps.
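To preview how such cohomological summaries are computed, consider a deliberately tiny example (an editor's sketch, not drawn from the paper): two cameras observing overlapping strips of pixels. Global mosaics are exactly the kernel of the linear map that compares the two cameras' crops to their overlap, so the dimension of H^0 falls out of ordinary linear algebra.

```python
import numpy as np

# Camera A sees pixels 0-3; camera B sees pixels 2-5; the overlap is 2 pixels.
# A 0-cochain assigns a value vector to each camera: (x, y) with x, y in R^4.
# The coboundary compares the two crops to the overlap:
#     d(x, y) = (x[2] - y[0], x[3] - y[1])
d = np.zeros((2, 8))
d[0, 2], d[0, 4] = 1.0, -1.0   # x[2] versus y[0]
d[1, 3], d[1, 5] = 1.0, -1.0   # x[3] versus y[1]

# H^0 = ker d consists of exactly the glueable assignments, i.e. the mosaics.
dim_h0 = 8 - np.linalg.matrix_rank(d)
print(dim_h0)   # 6: a mosaic is determined by one value per pixel of the union
```

Two sets cannot produce a nonzero H^1; that requires a cover whose overlap pattern contains a loop, the mechanism behind globally infeasible figures such as the Penrose triangle.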
3 An axiomatic framework
To emphasize the generality of sheaves in information integration from the outset, we posit a set of axioms for a sensor integration framework, and briefly outline some of the results that follow from these axioms. These axioms start quite general, and increase in specificity. In the subsequent sections, we delve more deeply into the Axioms and their implications, highlighting their connection to our example of camera mosaics and a few others.
Axiom 1: The entities observed by the sensors lie in a set X.
Axiom 2: All possible attributes assigned by the integrated sensor system lie in a collection of sets A.
The first axiom is not controversial. “Entities” might be targets, raw sensor data, or even textual documents. Axiom 2 loses no generality by positing that
each sensor can assign attributes from a set. This set of attributes may differ from those assigned by other sensors.
Axiom 3: The collection of all groups of entities whose attributes might be related forms a topology T on X.
The topology T specifies “where” fusions can be performed: ultimately every open set in T will be a region in which data are provided. Following the example of image mosaics, raw sensor data will be supplied at some of the open sets in T. The raw data are then merged on unions of these open sets whenever the data are consistent along intersections. Therefore, the precise relationships among open sets in T are crucial and generally should have physical meaning. Axiom 3 is a true limitation (instead of a topology, one might consider generalizing further to a Grothendieck site, but this level of generality seems excessive for practical sensor integration), which posits that entities are organized into local neighborhoods. In all situations of which the author is aware, the appropriate topology T is either the geometric realization of a cell complex or (equivalently) a finite topology built on a preorder as described in [59]. The three axioms encompass unlabeled graphs as a special case, for instance. Finite topologies are particularly compelling from a practical standpoint even though relatively little has been written about them. The interested reader should consult [45] for a (somewhat dated) overview of the interesting problems.
Example 1. In the case of image mosaics, the open sets in T should have something to do with the regions observed by individual cameras. Let the set of entities X consist of the locations of points in the scene. There are two obvious possibilities for a topology:
1. The topology T1 of the scene itself, generally a manifold or a subspace of Euclidean space, or
2. The topology T2 which is generated by a much smaller collection of sets U, in which each U ∈ U consists of the set of points observed by a single camera. (Recall that the topology generated by U consists of all possible unions and finite intersections of elements of U.)
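For a finite collection of camera footprints, T2 can be generated by brute force, closing the cover under pairwise unions and intersections. The sketch below is an editor's illustration; the six-point “scene” and the three footprints are assumptions made for the example.

```python
from itertools import combinations

def generated_topology(cover, X):
    """Close a collection of sets under pairwise intersection and union."""
    sets = {frozenset(), frozenset(X)} | {frozenset(s) for s in cover}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(sets), 2):
            for c in (a & b, a | b):
                if c not in sets:
                    sets.add(c)
                    changed = True
    return sets

# Three overlapping camera footprints covering a six-point "scene"
X = range(6)
U = [{0, 1, 2}, {2, 3, 4}, {4, 5, 0}]
T2 = generated_topology(U, X)
print(len(T2))  # 18 open sets, versus 2^6 = 64 for the pixel-level topology
```

Because the scene is finite, arbitrary unions reduce to finite ones, so this closure really is the generated topology.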
Although T1 is rather natural – and is what is usually used when forming mosaics – it has the disadvantage of being quite large. It also requires fusions to be performed at the pixel level, which is generally inappropriate when there are heterogeneous sensors that don’t return pixels. This can also be computationally intense, as pixel-level fusion scales badly. In contrast, T2 is much smaller (usually finite) and therefore more amenable both to computations and to fusion of heterogeneous data types.
Example 2. (See also [32]) Axiom 3 also specifies how the tables in a relational database can be related. If the collection of attributes is a single set A = {A}, then the entities and their attributes correspond to one another in a relation
Figure 2: The simplicial complex associated to the relation in Example 2. Image credit: Cliff Joslyn [32]
R ⊆ X × A. The appropriate topology is that of a Dowker complex [17], which is an abstract simplicial complex. For instance, consider the relation given by a table whose rows are labeled A, C, E, K, T1, T2, and V and whose columns are labeled S, O, P, I, L, and R, with a check mark (√) recording each (row, column) pair belonging to the relation; in particular, the rows A, C, and K are each checked in column P.
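Concretely, if such a relation is stored as a set of (row, column) pairs, the Dowker complex has one simplex per column, spanned by the rows checked in that column. In the sketch below (an editor's illustration), the check pattern is hypothetical except for column P, which the text states is checked in rows A, C, and K.

```python
def dowker_simplices(relation):
    """One simplex per column: the set of rows related to that column."""
    simplices = {}
    for row, col in relation:
        simplices.setdefault(col, set()).add(row)
    return simplices

# Hypothetical check pattern; only column P's checks are taken from the text.
R = {("A", "P"), ("C", "P"), ("K", "P"), ("A", "S"), ("C", "I")}
S = dowker_simplices(R)
print(sorted(S["P"]))   # ['A', 'C', 'K']: a 2-simplex on the rows providing P
# S["I"] = {'C'} is a face of S["P"], so the simplex for column I is not maximal.
```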
Figure 2 shows the simplicial complex in which the vertices are given by rows of the table and the simplices are given by columns. Notice that some simplices (namely I) are not maximal, as they are contained in higher dimensional simplices. If each row corresponds to an entity and each column corresponds to an attribute, then the topology given by the simplicial complex indicates when attributes of multiple entities ought to be compared. For instance, the data supplied by A, C, and K all provide the P attribute. Surprisingly, the homotopy class – a topological type – is preserved if we swap entities and attributes [17].
This example motivates a tight connection between attributes and entities. Based on this connection, the following axiom uses the local neighborhoods of entities to relate attributes from neighboring sensors and entities.
Axiom 4: Each information source assigns one of the sets of attributes to an open subset of entities.
• This assignment is denoted by S : T → A, so that the attribute set assigned to an open set U ∈ T is written S(U) ∈ A.
• The set S(X) is called the set of global sections of S, or simply the set of sections.
• S(U) is called the set of observations or local sections over the sensing domain U.
There is complete freedom in what the set of observations for a given data source can be. Generally, it is best if the set of observations associated to a physical sensor is the space of raw data corresponding to that sensor. For a camera, this should be the set of possible images, perhaps R^{3mn} (m rows, n columns, and 3 colors for each pixel). For textual data sources, this could be the content of a document as a list of words or a word frequency vector. However, for open sets corresponding to intersections or unions of sensing domains (and therefore not corresponding to a physical sensor), the set of observations needs to be more carefully chosen and may not correspond to raw data. If the open set corresponds to the intersection of two sensors of the same type, then usually the observations ought to also be of that type. For instance, the observations on the overlap between two camera images should be merely a smaller image. On an open set corresponding to the intersection between a camera and a radar, each observation might be a listing of target locations rather than the raw data of either sensor. If there are known to be n targets, each in R^3, then clearly this space is parameterized by R^{3n}. If the number of targets is unknown but is less than n, then the space of observations would be given by
{⊥} ⊔ R^3 ⊔ R^6 ⊔ ··· ⊔ R^{3n},
where ⊔ represents disjoint union and ⊥ represents the state of having no targets in the scene. This example indicates that moving from the observations over one open set to a smaller set ought to come from a particular transformation that depends on the sensors involved. Put another way, removing entities of a sensing domain from consideration should impact the set of observations in a systematic way.
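One assumed encoding of the observation space {⊥} ⊔ R^3 ⊔ ··· ⊔ R^{3n} represents an observation as either ⊥ or a tuple of target positions; the transformation to a smaller sensing domain then simply drops the targets that fall outside it:

```python
# An observation on a camera/radar overlap: either ⊥ (no targets in view)
# or a tuple of target positions in R^3; the tag is just the tuple's length.
Bottom = None   # stands for ⊥

def restrict_targets(obs, in_domain):
    """Keep only the targets lying in the smaller sensing domain."""
    if obs is Bottom:
        return Bottom
    kept = tuple(t for t in obs if in_domain(t))
    return kept if kept else Bottom

obs = ((0.0, 1.0, 2.0), (5.0, 5.0, 0.0))   # two targets: a point of R^6
left_half = lambda t: t[0] < 2.5           # a hypothetical smaller domain
print(restrict_targets(obs, left_half))    # ((0.0, 1.0, 2.0),)
```

This is exactly the shape of map demanded by the next axiom.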
Axiom 5: It is possible to transform sets of observations by reducing the size of sensing domains.
• Whenever a pair of information sources assign attributes to open sets of entities U ⊆ V, there is a (fixed) function S(U ⊆ V) : S(V) → S(U) from the set of attributes over the larger sensing domain to the set of attributes over the smaller one.
• If U ⊆ V ⊆ W and U ⊆ Y ⊆ W, then S(U ⊆ V) ◦ S(V ⊆ W) = S(U ⊆ Y) ◦ S(Y ⊆ W), which means that we may refer to S(U ⊆ W) without ambiguity.
[Figure 3 annotations: subset relations between sensing domains correspond to restriction maps between sets of sections; the left panel is labeled “Coarse topology,” the right panel “Finer topology.”]
Figure 3: Cropping an image reduces the size of the sensing domain. The topology determines what crops can be performed as restriction maps: a coarse topology (left) permits fewer crops than a finer one (right)
• S(U ⊆ U) is the identity function.
• The function S(U ⊆ V) is called a restriction.
The topology T imposes a lower bound on how small sensing domains can be, and therefore on how finely restrictions are able to discriminate between individual entities. Generally speaking, one wants the topology to be as small as practical, so that fewer restrictions need to be defined (and later checked) during the process of fusing data.
Example 3. Axiom 5 has quite a clear interpretation for collections of cameras – the restriction maps perform cropping. The topology T controls how much cropping is permitted. Figure 3 gives two examples, in which the topology on the left is smaller than the one on the right. Indeed, if we consider the topology T1 from Example 1, then a huge number of crops must be defined, including many that are unnecessary to form a mosaic of the images. However, the topology T2 from that example requires the modeler to define only those crops necessary to assemble a mosaic from the images.
Axiom 6: It is possible to merge mutually consistent observations from overlapping sensing domains. (This is called conjunctivity in the sheaf literature.)
• Suppose that {Uα} is a collection of sensing domains indexed by α ∈ I and there are observations aα ∈ S(Uα) taken from each sensing domain. If all of the aα from different sensing domains are equal upon restricting to intersections (where they can be compared on equal footing), namely for all α and β in I,
S(Uα ∩ Uβ ⊆ Uα)(aα) = S(Uα ∩ Uβ ⊆ Uβ)(aβ),
then there is an a ∈ S(∪Uα) that restricts to each aα via aα = S(Uα ⊆ ∪Uα)(a).
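For two cameras, the content of Axiom 6 can be exercised directly. In the sketch below (an editor's illustration), images are 1-D pixel arrays, the restriction maps are crops, and gluing succeeds exactly when the two crops to the overlap agree:

```python
import numpy as np

def glue(a, b, overlap):
    """Glue a ∈ S(U1) and b ∈ S(U2) when they agree on U1 ∩ U2.

    `overlap` is the number of shared pixels: the last `overlap` pixels
    of U1 coincide with the first `overlap` pixels of U2.
    """
    if not np.array_equal(a[-overlap:], b[:overlap]):  # restrictions disagree
        return None                                    # no section exists
    return np.concatenate([a, b[overlap:]])            # an element of S(U1 ∪ U2)

a = np.array([10, 20, 30, 40])   # camera A's four pixels
b = np.array([30, 40, 50, 60])   # camera B's four pixels; two-pixel overlap
print(glue(a, b, 2))                            # [10 20 30 40 50 60]
print(glue(a, np.array([31, 40, 50, 60]), 2))   # None: no exact mosaic
```

One might replace the exact equality test with a tolerance to model the imperfect mosaics of Figure 1.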
In the case of two overlapping sets U1 , U2 , Axiom 6 can be visualized as a pair of commutative diagrams
             U1 ∪ U2                              S(U1 ∪ U2)
            ↗       ↖               S(U1 ⊆ U1∪U2) ↙         ↘ S(U2 ⊆ U1∪U2)
          U1         U2                      S(U1)           S(U2)
            ↖       ↗               S(U1∩U2 ⊆ U1) ↘         ↙ S(U1∩U2 ⊆ U2)
             U1 ∩ U2                              S(U1 ∩ U2)
in which the arrows on the left represent subset relations, while the arrows on the right represent restriction maps. The Axiom asserts that pairs of elements on the middle row that restrict to the same element on the bottom row are themselves restrictions of an element in the top row. With Axiom 6, it becomes possible to draw inferences from data that amount to data fusion. Example 4. Figure 4 shows a set of photographs of a table, on which a number of coins are placed. Assuming Axiom 6 holds allows the construction of joint observations over the union of those two regions regardless of the types of information the sensors provide. It is immediately apparent that the content of the observations over the union will differ for different types of sensors. For instance, Figure 4 shows three possible configurations of sensors: 1. Two cameras, in which case Axiom 6 constructs mosaics (left frame of Figure 4), 2. A camera and an object-level detector, in which case Axiom 6 constructs joint image/object pairs (middle frame of Figure 4), or 3. A camera and a semantic analytic, in which Axiom 6 joins images to semantic interpretation (right frame of Figure 4). Notice that in the latter two cases, the information stored in the intersection is not an image, but something rather more processed. The example shows that restriction maps in all cases therefore provide not only “restriction” of the domain, but also some kind of summarization. It is to be emphasized that this summarization must be provided by the modeler. The set of axioms thus far is general enough to include any sensor integration approach, yet is also minimal in the sense that removing any of them results in a severely deficient approach. Specifically, • Removing Axiom 4 means that sensors don’t return a well-defined set of attributes, which precludes any commonality between overlapping sensors. 
When this occurs in practice – typically when textual feeds have nonoverlapping semantic frameworks – integration is severely hampered.
[Figure 4 panels: left, two photographs of coins on a table joined as a mosaic; middle, a photograph paired with object detections (“2 quarters, 0 dimes, 1 nickel, 1 penny,” with counts broken down inside and outside the overlap); right, a photograph paired with semantic values (“$0.51” and “$0.56”).]
Figure 4: Three possible ways to combine information about coins on a table. Left: forming the mosaic of two images, with no further interpretation. Middle: fusing an image with object-level detections, but no interpretation. Right: fusing an image with the value of the coins, which requires interpretation of objects in the scene
• Removing Axiom 5 means that it is not possible to consistently isolate entities (or groups of them) in the intersections between sensing domains. Although more information can be obtained by a pair of information sources taking two views of an entity, their observations cannot be compared.
• Removing Axiom 6 means that even if two separate observations of an entity are comparable, it is impossible to infer that observations of nearby entities are related.
Example 5. In a relational database, Axiom 6 ensures that the join of two tables is nonempty when there are matching rows and columns.
It is helpful (though strictly not necessary) to assume that even if there are systemic variations and noise, the assignment of attributes is deterministic. In principle, all state variables are accounted for, although they may be hidden from observation. The reader is cautioned that probabilistic models are not excluded; instead (like Bayesian networks), they are deterministic at the level of probability.
Axiom 7: The assignment of attributes to entities is deterministic.
• A collection of observations on all sensing domains individually completely determines the assignment of attributes, independent of the sensors. (This is called the monopresheaf condition.)
• Suppose that U is a sensing domain and that a, b ∈ S(U) are assignments of attributes to that sensing domain. If these two observations
Join of Tables 1 and 2:
  Name  Flight  Start  Destination  Miles  Status
  Bob   123     DCA    SYR          300    Econ

Table 1:
  Name  Flight  Start  Destination
  Bob   123     DCA    SYR
  Mary  345     ORD    ATL

Table 2:
  Name  Miles  Status
  Bob   300    Econ
  Mike  1200   Bus

Common columns of Tables 1 and 2: Name

Figure 5: The diagram of the sheaf modeling two tables in a passenger database owned by an airline
are equal on all sensing domains V in a collection V of sensing domains with ∪V = U – that is, S(V ⊆ U)(a) = S(V ⊆ U)(b) for every V ∈ V – then we can conclude that a = b.
Remark 6. If the observations consist of probability distributions, it is usually best to think of “equality” in Axioms 6 and 7 as being “equality almost everywhere”, rather than exact equality.
Axioms 1-7 constrain an information integration approach to be represented by a sheaf S. Because of this general framework, anything true for sheaves must carry over to information integration approaches satisfying these axioms. There are some benefits from the sheaf-theoretic perspective implied by these Axioms:
1. The data organization suggested by sheaves is natural and supports distributed storage and computation (through the Čech cellular sheaf and its cohomology, Section 4),
2. Specific instances of observations can be compared easily across a collection of sensing domains (through the construction of sections, Definition 18),
3. Specific collections of observations will usually not lead to assignments of attributes to all entities due to conflicts between observations. The level of agreement can be assessed (Proposition 23), and
4. Sensing domains that are appropriate for repeat observations to resolve conflict are thereby easily identified (Remark 20).
Example 7. Real world examples of relational databases often do not satisfy Axiom 7, which is sometimes a source of problems. Consider a database with two tables like so:
1. a table whose columns are passenger names, flight numbers, a starting airport, and a destination airport, and
2. a table whose columns are passenger names, frequent flier miles, and flier upgrade status.
The two tables are shown at left and right middle in Figure 5. The restrictions (shown by arrows in the Figure) correspond to projecting out columns to form subtables. Axiom 6 ensures that if the two tables each have a row with the same passenger’s name, then there really is at least one passenger with all of the associated attributes (start, destination, miles, and status). This is a table join operation: rows from the two tables that match up correspond to rows in the joined table. Axiom 7 ensures that there is exactly one such passenger, so in Figure 5 this means that “Bob” in Table 1 really is the same as “Bob” in Table 2. From personal experience – the author has a relatively common name – Axiom 7 does not hold for some airlines.
Example 8. Axiom 7 would seem to rule out the possibility of forming imperfect mosaics – those in which the lighting or perspective differs between cameras – since the pixels in the overlaps will not be exactly equal. However, computer vision approaches almost always account for this problem by instead comparing object-level detections. In this way, two images are said to form a unique mosaic when their object-level detections match. This can be encoded in a sheaf by placing object detections (rather than subimages) on the overlap, as shown in Figure 6. One can visualize the mosaic formed in this way as an imperfect mosaic (as in the top of Figure 6), though this is only suggestive, because the visualization actually specifies multiple values for the pixels in the overlap. These multiple values are constrained by the requirement that both images must yield the correct object-level detections, so it is indeed difficult to form a faithful visualization.
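The join in Example 7, and the trouble that arises when Axiom 7 fails, can be sketched with plain dictionaries (an editor's illustration using the rows of Figure 5, not a database engine):

```python
def join(table1, table2, key):
    """Natural join on a shared column; matching rows glue into joined rows."""
    return [{**r1, **r2}
            for r1 in table1
            for r2 in table2
            if r1[key] == r2[key]]

t1 = [{"Name": "Bob", "Flight": 123, "Start": "DCA", "Destination": "SYR"},
      {"Name": "Mary", "Flight": 345, "Start": "ORD", "Destination": "ATL"}]
t2 = [{"Name": "Bob", "Miles": 300, "Status": "Econ"},
      {"Name": "Mike", "Miles": 1200, "Status": "Bus"}]

print(join(t1, t2, "Name"))   # one row: Bob with all six attributes
# If Axiom 7 fails - two distinct passengers both named "Bob" - the join
# silently pairs one Bob's flight with the other Bob's miles.
```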
It is rather like the fact that separate charts in an atlas for a manifold may show the same point in different ways.

However, Hubbard states [29] "It is fairly easy to understand what a sheaf is, especially after looking at a few examples. Understanding what they are good for is rather harder; indeed, without cohomology theory, they aren't good for much." This is an essential point. Although there are many co-universal information representations (of which sheaves are one), the stumbling block for integration approaches is the lack of theoretically-motivated computational approaches. What sheaf theory provides is a way to perform controlled algebraic summarization – a weaker set of invariants that still permit certain kinds of inference. In order to perform these summarizations, the attributes must be specially encoded, which leads to our final Axiom.

Axiom 8: Each set of attributes S(U) has the structure of a vector space and each restriction S(V ⊆ U) is linear.

The payoffs are immediate and sweeping. Among others,
[Figure 6 shows two camera images, the imperfect mosaic they form, and the object detections in their overlap: 2 quarters, 0 dimes, 1 nickel, 1 penny.]

Figure 6: Two overlapping images can still be joined into an imperfect mosaic if their object-level detections agree
1. It becomes possible to compute the degree of agreement between information sources using linear algebra (Proposition 23),
2. There is a canonical collection of invariants, against which all others can be compared (Definition 24), and
3. Different collections of information sources can be compared for their levels of specificity against this canonical collection and against each other (Theorem 26).

Although Axiom 8 usually does permit reasonable options for restrictions, it may also be highly constraining. Many effective algorithms that could be used as restrictions are not linear. Although this seems to dramatically limit the effectiveness of our subsequent analysis, there is a way out that actually increases the expressiveness of the model. Specifically, through the technical tool of category representations [50, 48], a non-linear restriction map can be "enriched" into one that is linear.

Lemma 9. (Set Representation Lemma) If f : A → B is a function between two sets and R is a ring with 1, it can be extended to an R-module homomorphism F : R[A] → R[B] in which F(1 · a) = f(a) for each a ∈ A.

Proof. By definition, each element x of the free R-module R[A] can be written as a sum $x = \sum_{a \in A} c_a \cdot a$ where $c_a \in R$. We therefore need only define
$$F(x) = F\left(\sum_{a \in A} c_a \cdot a\right) = \sum_{a \in A} c_a \cdot F(1 \cdot a) = \sum_{a \in A} c_a \cdot f(a).$$
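A minimal computational sketch of Lemma 9 over R = ℝ (the sets and the map f here are hypothetical): the extension F is just the |B| × |A| matrix whose column at a is the indicator vector of f(a), and it automatically carries probability distributions to probability distributions.

```python
# Lemma 9 over R = the reals: extend a set map f : A -> B linearly.
A = ["a1", "a2", "a3"]
B = ["b1", "b2"]
f = {"a1": "b1", "a2": "b2", "a3": "b1"}     # hypothetical set map

# Column at a is the indicator vector of f(a), so F(1*a) = 1*f(a).
F = [[1.0 if f[a] == b else 0.0 for a in A] for b in B]

def apply_linear(F, x):
    """Matrix-vector product: F acting on an element of R[A]."""
    return [sum(fij * xj for fij, xj in zip(row, x)) for row in F]

# The basis vector i_A(a2) = (0, 1, 0) goes to i_B(b2) = (0, 1).
print(apply_linear(F, [0.0, 1.0, 0.0]))      # [0.0, 1.0]

# F restricts to a stochastic map P[A] -> P[B]: nonnegative coefficients
# summing to 1 stay nonnegative and sum to 1.
p = [0.5, 0.25, 0.25]                        # a "mixed state" on A
q = apply_linear(F, p)
print(q, sum(q))                             # [0.75, 0.25] 1.0
```

The stochastic behavior is automatic because each column of F is itself a probability distribution (concentrated on one element of B).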
Lemma 9 can be represented diagrammatically:
$$\begin{array}{ccc}
R[A] & \xrightarrow{\;F\;} & R[B] \\
{\scriptstyle i_A}\big\uparrow & & \big\uparrow{\scriptstyle i_B} \\
A & \xrightarrow{\;f\;} & B
\end{array}$$
where we caution the reader that the vertical maps in the diagram, iA : A → R[A] and iB : B → R[B], are not R-linear even if A or B are R-modules to begin with.

Given a sheaf S whose stalks are sets – but not vector spaces – one can replace each stalk with a free vector space, and each restriction function with a homomorphism. Although this seems remarkably naïve, [50] shows that it can yield new, relevant invariants that describe a system. This new sheaf contains the original one as a subsheaf – and so remains a full-fidelity representation of the original heterogeneous information system – but it contains additional structure as well. This new information consists of "mixed states" from the various sensors, which can be interpreted probabilistically if R = ℝ. Specifically, consider the subset
$$P[A] = \left\{ \sum_{a \in A} c_a \cdot a \in \mathbb{R}[A] : \sum_{a \in A} c_a = 1 \text{ and } c_a \ge 0 \text{ for all } a \in A \right\},$$
in which the coefficients of elements sum to 1. P[A] is precisely the set of probability distributions on A. The image of iA, which takes a ↦ 1 · a, lies in P[A]. Moreover the map F obtained by Lemma 9 restricts to a function P[A] → P[B]; thus F is a stochastic map. We therefore interpret iA : A → P[A] as replacing a sensor's observation of an element a ∈ A with a distribution concentrated on that element.

Example 10. Consider the task of identifying objects in a scene, such as the visible coins in Figure 6. We might consider this situation as specifying the sheaf given by the following diagram
$$\mathbb{Z}^{3mn} \xrightarrow{\;f\;} \{\text{penny, nickel, dime, quarter, dollar}\} \xleftarrow{\;\mathrm{id}\;} \{\text{penny, nickel, dime, quarter, dollar}\}$$
where the image is assumed to be a rectangular grid of m × n pixels, each specified by three integers (giving the red, green, and blue components, for instance), and the restriction map f is some kind of detection algorithm. A quick look at the vast image processing literature indicates that there are many options for such a function! Clearly, the set {penny, nickel, dime, quarter, dollar} has no algebraic structure on its own, so it is impossible for any of these algorithms to be linear.
When we use Lemma 9 to replace f with a linear function F : P[ℤ^{3mn}] → ℝ^5, we merely permit uncertainty in the input image in a systematic way. Notice in particular that this does not constrain our choice of f at all, so we are free to choose whatever detection algorithm suits.

The resulting sheaf S of vector spaces has a canonical family of invariants called cohomology groups H^k(S) for k = 0, 1, .... In this general setting, the cohomology groups can be constructed in various ways using linear algebraic manipulations, most notably the canonical or Godement resolution [21][9, II.2]. We'll use the Čech construction in the next section to define the cohomology groups, but will state the main result here.

Theorem 11. The cohomology groups H^k(S) are sheaf invariants: any pair of isomorphic sheaves will have the same set of cohomology groups.

For sensor integration, the most notable fact is that H^0(S) ≅ S(X) (shown in Proposition 23); in other words, the group of sections can be computed from the cohomology using linear algebra.

Remark 12. Theorem 11 is generally not stated as plainly as we state it here. For instance, it is implied by [34, Prop. 2.2.2] and the discussion following, and it is also discussed in [30, II.3], in [9, II.2], in [56, Sec 1.2], and in [12, Sec 6.2], though not called out directly. For a direct proof, the reader may consult [52, Thm 4.1].
4 Faithful discretization: the Čech viewpoint
One usually does not have access to all sensing domains. Instead, information sources from a certain subcollection U ⊂ T will be available. Suppose (without really losing generality) that the union of U is X. We construct a discretized version of the sheaf S, called a cellular sheaf Š on a combinatorial space (the nerve) associated to U. Under appropriate conditions (described below by Theorem 26), the cohomology of Š completely recovers the cohomology of S.

Definition 13. An abstract simplicial complex A on a set V is a collection of ordered subsets of V that is closed under the operation of taking subsets. (When taking subsets, we do not require order to be preserved.) We call an element of A which itself contains k + 1 elements a k-cell, and we say that k is that cell's dimension. We usually call a 0-simplex a vertex and a 1-simplex an edge. If a, b are simplices with a ⊂ b (proper subset), we say that a is a face of b, and that b is a coface of a. Usually, we write a ↝ b to emphasize that a and b are simplices with a ⊂ b. Unlabeled graphs are in one-to-one correspondence with those abstract simplicial complexes containing only dimension 0 and 1 simplices.

We can now define a combinatorial representation of the collection U of sensing domains.
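Definition 13, together with the nerve construction of Definition 14 below, is directly computable. A short sketch, with hypothetical coverage footprints standing in for the photographs of Figure 7 (simplices are represented as tuples of cover indices):

```python
from itertools import combinations

def nerve(cover):
    """All index subsets of the cover whose members have a common point
    (Definition 14), returned as an abstract simplicial complex."""
    idx = range(len(cover))
    return {s for k in idx for s in combinations(idx, k + 1)
            if set.intersection(*(cover[i] for i in s))}

# Hypothetical coverage footprints for three photographs, as in Figure 7.
with_gap = [{1, 2}, {2, 3}, {3, 1}]            # pairwise overlaps, no common point
no_gap   = [{0, 1, 2}, {0, 2, 3}, {0, 3, 1}]   # all three share point 0

print(max(len(s) for s in nerve(with_gap)))    # 2: edges only, a hollow triangle
print(max(len(s) for s in nerve(no_gap)))      # 3: a 2-simplex fills the gap
```

The family produced is automatically closed under taking subsets, since any subfamily of an intersecting family of sets also intersects.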
[Figure 7 contrasts two covers: on the left, no 2-simplex – there is a gap in scene coverage; on the right, a 2-simplex – there are scene points in common.]

Figure 7: Two sets of photographs of the same scene with their corresponding nerves. The set on the left contains a gap in the middle, while the set on the right does not. The topology of their corresponding nerves reflects this difference in coverage
Definition 14. If U is a collection of open subsets, then the nerve N(U) is the collection of subsets of U in which {U₁, ..., U_k} ∈ N(U) whenever U₁ ∩ ⋯ ∩ U_k ≠ ∅. N(U) is an abstract simplicial complex.

Example 15. Figure 7 shows how the nerve of the sensing domains can indicate missing data. Specifically, the Figure shows two sets of photographs taken of a scene consisting of several objects on a table. In the set of photographs on the left, there is a gap in coverage; the white mug in the center is not visible in any of the images. However, the white mug is visible in all of the images on the right. Supposing that the set of locations visible from a given photograph corresponds to an open set, the set of images on the left corresponds to a collection U₁ whose nerve N(U₁) consists of three vertices and three edges. On the right, the collection U₂ has a nerve N(U₂) that additionally contains a 2-dimensional simplex. The topology of these two nerves differs: the one on the right is contractible, while the one on the left is not. This can be detected using homology, and has implications for making inferences from the sheaves describing the set of images themselves. Indeed, the non-contractibility of N(U₁) opens the possibility of erroneous inferences that involve points in the gap between the images.

Given the sheaf S describing all of the possible observations of the entities, we can restrict our attention to a smaller object over N(U).

Definition 16. Suppose S is a sheaf on the topological space (X, T) and that U ⊆ T is a collection of open sets whose union is X. Following [56], the Čech cellular subsheaf Š on N(U) assigns the following data to the simplices of N(U):

1. Š({U}) = S(U) for each U ∈ U, called the stalk at U, and more generally Š({U₀, ..., U_k}) = S(U₀ ∩ ⋯ ∩ U_k).
2. If U, V ∈ U have nonempty intersection, then the restriction function
$$\check{S}(\{U\} \rightsquigarrow \{U, V\}) : \check{S}(\{U\}) \to \check{S}(\{U, V\})$$
is given by
$$\check{S}(\{U\} \rightsquigarrow \{U, V\}) = S\left((U \cap V) \subseteq U\right),$$
which is a function S(U) → S(U ∩ V).

3. We call a choice of observations a_k ∈ Š(A_k) for some set of simplices {A_k} ⊆ N(U) an assignment.

Lemma 17. Both S and Š are functors – the former from Open(X, T) and the latter from the face category of N(U). However, S is contravariant while Š is covariant.

The proof of this fact merely involves checking that the diagram of restrictions for each of S and Š commutes.

At this point, we have merely reduced the information represented in S to that which is actually available to the deployed sensors. From an implementation standpoint, although S specifies the spaces of sensor outputs from all possible sensor configurations, Š can be stored in a distributed fashion. Each sensor needs only to keep track of its own data (the stalk), which sensors are its neighbors, and how to translate its data into a form that is common with its neighbors (the restrictions). Given distributed storage, the process of testing for consistency within an assignment and assembling a more complete picture across sensors leads to the computation of sections of Š.

Some assignments of Š correspond directly to sections of S. We call these sections of Š:

Definition 18. If a_k is an assignment to a set of simplices {A_k}, then we call it a local section over that set provided that
$$\check{S}(A_k \rightsquigarrow A_m)\,a_k = a_m$$
for all k and m (whenever A_k is a face of A_m). If a_k is a section on all simplices, we call it a global section or simply a section.

Proposition 19. Global sections of Š and S are in one-to-one correspondence with each other.

Proof. Every section of Š corresponds to a section of S via Axiom 6. Conversely, every section of S corresponds to a section of Š by restricting (Axiom 5) to each element of U.

According to the Proposition, sections of either sheaf are the result of combining related observations from different sensors. They are precisely "fused" data from multiple sensors. Unfortunately, it is often the case that actual observations from sensors do not correspond to global sections, since errors can cause the equation in Definition 18 to fail on some simplices. This can happen due
to errors in the data and in the models captured by the restrictions, since the models are usually approximate. There are several ways around this problem. The most naïve is to replace the exact equality in Definition 18 with a metric distance bound. This approach can be successful [54], for instance in the formation of imperfect mosaics (right frame of Figure 1).

Remark 20. A different approach is to require exact equality as in Definition 18, but to use the "size" of a local section as a measure of internal consistency. An assignment of a sheaf Š to all faces of a simplicial complex partitions the faces into equivalence classes: two faces are equivalent if there is a path from one to the other along which the assignment satisfies the restriction maps according to Definition 18. If the distribution of sizes of the equivalence classes is skewed toward many small sets of faces, then very little agreement is present. Fused results should then be considered suspect; operationally, one should consider performing additional data collections to try to resolve the conflict. If the distribution of sizes is skewed toward a few large collections of faces, then there is considerable agreement, and fused results are more reliable. There are many interesting open combinatorial and operational questions around the structure of local sections.

The reader is immediately cautioned that there are usually local sections of Š that are not even representable in S.

Our main task is to find conditions under which we can recover the cohomology of S from information about Š. A relatively strong sufficient condition is given by Theorem 26, which makes use of a cohomology theory for Š. Recall that an abstract simplicial complex X consists of ordered sets. Suppose B = [v₀, ..., v_k] is a k-face, where we use square brackets to emphasize that order matters in what follows.
Then all of its faces can be obtained by deleting one of the vertices in that list. So if A is a (k−1)-face, define
$$[B : A] = \begin{cases} (-1)^i\, \epsilon(\sigma) & \text{if } A = \sigma([v_0, \ldots, \widehat{v_i}, \ldots, v_k]) \\ 0 & \text{if } A \text{ is not a face of } B, \end{cases}$$
where the hat indicates an excluded vertex, σ is a permutation on k elements, and ε(σ) is the parity (±1) of that permutation. This admittedly opaque sign convention is computationally quite useful for algebraically computing spaces of sections. The sign of [C : A] for two faces A, C of a simplicial complex is chosen according to the observation that the equation in Definition 18 for a sheaf Š can be rearranged. Consider the case of the sheaf Š on the simplicial complex shown in Figure 8. A section s taking the
[Figure 8 depicts open sets U_A and U_B, the nerve N(U) with vertices A, B and the edge C = [A, B], and the sheaf Š with stalks Š(A), Š(B), Š(C) and restrictions Š(A ↝ C), Š(B ↝ C).]

Figure 8: Constructing the sheaf Š over an open cover
value s_A on A and s_B on B satisfies the equation
$$\check{S}(A \rightsquigarrow C)(s_A) = \check{S}(B \rightsquigarrow C)(s_B)$$
$$\check{S}(A \rightsquigarrow C)(s_A) - \check{S}(B \rightsquigarrow C)(s_B) = 0$$
$$[C : A]\,\check{S}(A \rightsquigarrow C)(s_A) + [C : B]\,\check{S}(B \rightsquigarrow C)(s_B) = 0$$
$$\begin{pmatrix} \check{S}(A \rightsquigarrow C) & -\check{S}(B \rightsquigarrow C) \end{pmatrix} \begin{pmatrix} s_A \\ s_B \end{pmatrix} = 0,$$
thus the section s can be thought of as lying in the kernel of a matrix built of the restriction maps. The signs come from the indices [C : A] and [C : B], which as the equation above indicates can be chosen arbitrarily up to a point.

Definition 21. The cellular cochain groups are given by the spaces of assignments to the simplices of a particular dimension. Namely, the space of assignments to k-simplices is given by
$$C^k(\mathcal{U}; \check{S}) = \bigoplus_{A \in N(\mathcal{U}),\ \dim A = k} \check{S}(A),$$
where we observe that the A in the direct sum ranges over all k-simplices. The coboundary maps are functions
$$d^k : C^k(\mathcal{U}; \check{S}) \to C^{k+1}(\mathcal{U}; \check{S})$$
that take an assignment a defined on k-simplices to a new assignment d^k a defined on (k+1)-simplices. This assignment takes the following value on a (k+1)-simplex B:
$$(d^k a)_B = \sum_{A \text{ a } k\text{-face of } N(\mathcal{U})} [B : A]\, \check{S}(A \rightsquigarrow B)\, a_A.$$
It can be shown that d^k ∘ d^{k−1} = 0, so that the image of d^{k−1} is a subspace of the kernel of d^k, which allows us to make the following definition.
Definition 22. The k-th cellular sheaf cohomology of Š on an abstract simplicial complex N(U) is
$$H^k(\mathcal{U}; \check{S}) = \ker d^k / \mathrm{image}\; d^{k-1}.$$

Proposition 23. [52, Thm 4.3] H^0(U; Š) = ker d^0 consists precisely of those assignments s which are global sections of Š.

This delivers on one of our promises made earlier: it is possible to compute the agreement between data sources using linear algebra. Therefore, agreement can be measured algebraically, and in closed form.

Again, we are faced with the fact that H^0(U; Š) and H^0(S) are typically rather different. This is immediately apparent because H^0(S) has no dependence on our choice of sensing domains U, while H^0(U; Š) apparently does. However, under the right circumstances, the two agree. To identify those circumstances, we need one rather technical theoretical bridge – the definition of Čech cohomology for S.

Definition 24. (Compare [29, Lemma A7.2.3]) Suppose that V refines U, so that every V ∈ V is a subset of some U ∈ U. In this case, there is a group homomorphism H^k(U; Š) → H^k(V; Š). Given this, the cohomology of S is given by
$$H^k(S) = H^k(X; S) = \varinjlim H^k(\mathcal{U}; \check{S}).$$
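Proposition 23 turns agreement into a kernel computation. As a sketch (not the paper's code; the stalk dimensions are hypothetical), take the two-camera sheaf of Figure 8 with two-pixel images overlapping in one shared pixel:

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix via Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0]) if M else 0):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Restrictions to the edge C = [A, B]: each camera's crop to the shared pixel.
S_AC = [[0, 1]]          # keeps the second pixel of camera A's stalk R^2
S_BC = [[1, 0]]          # keeps the first pixel of camera B's stalk R^2

# d^0 = ( S(A ~> C) | -S(B ~> C) ) : C^0 = R^4 -> C^1 = R^1
d0 = [S_AC[0] + [-x for x in S_BC[0]]]

dim_H0 = 4 - rank(d0)    # ker d^0 = global sections (Proposition 23)
dim_H1 = 1 - rank(d0)    # no 2-simplices, so H^1 is the cokernel of d^0
print(dim_H0, dim_H1)    # 3 0
```

Sections form a 3-dimensional space: two free pixels plus one shared value. A nonzero H^1 would flag edge data that no pair of camera images could produce.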
Practically speaking, the nontrivial elements of H^k for k > 0 represent obstructions to consistent sensor data. They indicate situations in which it is impossible for an assignment from part of the nerve to be extended to the rest of the nerve. When these situations arise in practice, they usually indicate a failure of the model to correspond to reality. For instance, in all of the mosaic-forming examples discussed in this article, there has been an implicit assumption that all images were acquired at the same time. When time evolves between imaging events, nontrivial H^k elements indicate that inconsistency can easily arise, as the next example shows.

Example 25. Consider the collection of photographs shown in Figure 9. As before, consider the sheaf Š over the nerve shown in the diagram for three sensing domains U = {U_i}, where the stalks consist of various spaces of images and the restrictions consist of image cropping operations. Consider the two assignments to the set of vertices that consist of the photograph on the left and the bottom and one of the photographs on the right. Neither of these assignments corresponds to a global section, because one of the restriction maps out of the vertex on the right will result in a disagreement on one of the edges incident on that vertex. In particular, the photographs shown on the edges could not possibly have arisen by restricting any assignment of photographs to the vertices, that is, as the image through d^0 of an element of C^0(U; Š). Therefore, the assignment of photographs to the edges shown in the Figure is an example of a nontrivial class in H^1(Š), since there are no nonempty intersections between all three sensing domains.
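The obstruction in Example 25 can be exhibited numerically. In a minimal hypothetical linearization (one real "finger present" value per image and per overlap, identity restrictions up to sign), the nerve is a hollow triangle and d^0 computes differences across edges:

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix via Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0]) if M else 0):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Hollow triangle: vertices U0, U1, U2; edges [U0,U1], [U1,U2], [U2,U0];
# no 2-simplex, because the three domains share no common point.
d0 = [
    [1, -1, 0],    # edge [U0, U1]
    [0, 1, -1],    # edge [U1, U2]
    [-1, 0, 1],    # edge [U2, U0]
]

dim_H0 = 3 - rank(d0)    # consistent assignments: the constants
dim_H1 = 3 - rank(d0)    # no 2-simplices, so H^1 = C^1 / image d^0
print(dim_H0, dim_H1)    # 1 1
```

An edge assignment such as (1, 0, 0) cannot be written as (x₀ − x₁, x₁ − x₂, x₂ − x₀), since the latter always sums to zero; it represents the nontrivial class pictured in Figure 9.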
[Figure 9 shows three overlapping photographs with edge labels "finger present" on two overlaps and "finger not present" on the others, and a contradiction (≠) where the finger is out of the scene.]

Figure 9: A collection of images in a nontrivial H^1 cohomology class
[Figure 10 shows left and right original images, and the same left and right views with the obstacle removed.]

Figure 10: Trying to see behind a book with two different views
Notice that in light of Definition 24 and the example, we may be tempted to add more sensors to our collection to try to resolve the ambiguity. The essential fact is that without bringing time into the model of mosaic-forming, problems can still arise.

Under fairly general conditions, such as those in the above example, the deployed sensor sheaf Š contains all of the information truly encoded by S, from the standpoint of cohomology.

Theorem 26. (Leray theorem, [42][29, Thm A7.2.6][9, Thm 4.13 in III.4]) Suppose S is a sheaf on X, U ⊆ T is a collection of open sets, and Š is the Čech cellular subsheaf associated to U. If, whenever U is the intersection of a collection of elements in U, it follows that H^k(U; S) = 0 for all k > 0, then we can conclude that H^p(U; Š) ≅ H^p(S) for all p.

Example 27. Consider the case of trying to use a collection of sensors to piece together what is behind an obstacle. This occurs in optical systems as well as those based on radio or sound propagation. Usually, the process involves
taking collections from several angles followed by an obstacle-removal step. If not enough looks are obtained from enough angles, then it will not be possible to infer anything behind the obstacle. However, if the sensors' locations are not well known, it may be impossible to detect that this has occurred. This is an effect of a difference between the deployed sensors' view of H^1(U; Š) and the actual H^1(S).

Figure 10 shows the process of trying to see behind a book using two views. The original photographs on the left show the scene, while the photos on the right show the obstacle removed. Suppose that the space of sensors consists of a set X of the following abstract points: L for the left camera, R for the right camera, V₁ for the upper overlap, and V₂ for the lower overlap. There is a natural topology for the sensors, namely
T = {{R, V₁, V₂}, {L, V₁, V₂}, {V₁, V₂}, {V₁}, {V₂}, X, ∅},
in which the overlaps are separated. Modeling the deployed cameras as U = {U_L, U_R}, where U_L = {L, V₁, V₂} and U_R = {R, V₁, V₂}, we obtain the following Čech cellular subsheaf M̌ associated to U:
$$\mathbb{R}^{3m} \longrightarrow \mathbb{R}^{3p} \oplus \mathbb{R}^{3q} \longleftarrow \mathbb{R}^{3n}$$
where m is the number of pixels in the left image, n is the number of pixels in the right image, p is the number of pixels in the upper overlap region, and q is the number of pixels in the lower overlap region.

Since the overlap region consists of two connected components, does this impact the mosaic process? It does not, according to Theorem 26, though this requires a little calculation. It is clear that the only place we need to be concerned about is the overlap. So we need to examine the intersection U_L ∩ U_R = {V₁, V₂} and compute
$$H^k(\{V_1, V_2\}; \mathcal{M}) = \varinjlim H^k(\mathcal{V}; \mathcal{M}),$$
where V ranges over refined covers of {V₁, V₂}. There is a maximally refined cover, which is V = {{V₁}, {V₂}, {V₁, V₂}}. The sheaf diagram for this (with the corresponding sets of V below each stalk for clarity) is
$$\begin{array}{ccccccccc}
\mathbb{R}^{3p} & \xrightarrow{\;\mathrm{id}\;} & \mathbb{R}^{3p} & \xleftarrow{\;\mathrm{pr}_1\;} & \mathbb{R}^{3p} \oplus \mathbb{R}^{3q} & \xrightarrow{\;\mathrm{pr}_2\;} & \mathbb{R}^{3q} & \xleftarrow{\;\mathrm{id}\;} & \mathbb{R}^{3q} \\
\{V_1\} & & \{V_1\} \cap \{V_1, V_2\} & & \{V_1, V_2\} & & \{V_2\} \cap \{V_1, V_2\} & & \{V_2\}
\end{array}$$
which is easily computed to have dim H^k = 0 for k > 0. (In the diagram, pr_k is the projection onto the k-th factor of a product and id is the identity map.)

The above example shows that fusing relatively raw data works well.⁶ However, problems arise if the data are processed a bit more, as the processing induces correlations across the sensing domains.

⁶It works well up to the limit of having enough agreement between the sensors to deduce the overlaps, of course. This is not necessarily easy!
Example 28. Suppose instead that we are interested not in the images themselves but rather whether there is an object in the scene. Continuing the previous example, we are led to consider a sheaf P on T that keeps track of whether there is an object on the left, right, or middle of the scene. The stalk over U_L (or U_R) is ℝ², whose two components should be interpreted as the probability that an object is present on the left (or right) and in the overlap. The stalks over the three sets {V₁, V₂}, {V₁}, and {V₂} are all given by ℝ, which we interpret as the probability that there is an object in the middle of the scene. Then the Čech cellular subsheaf P̌ associated to U is given by the diagram
$$\mathbb{R}^2 \xrightarrow{\;\mathrm{pr}_2\;} \mathbb{R} \xleftarrow{\;\mathrm{pr}_2\;} \mathbb{R}^2$$
The cohomology of P̌ is easily seen to have dim H^0(P̌) = 3 and dim H^1(P̌) = 0.

The sheaf P̌ is sensitive to the fact that the overlap is in two pieces. Again considering the refined cover V from the previous example,
$$\begin{array}{ccccccccc}
\mathbb{R} & \xrightarrow{\;\mathrm{id}\;} & \mathbb{R} & \xleftarrow{\;\mathrm{pr}_2\;} & \mathbb{R}^2 & \xrightarrow{\;\mathrm{pr}_2\;} & \mathbb{R} & \xleftarrow{\;\mathrm{id}\;} & \mathbb{R} \\
\{V_1\} & & \{V_1\} \cap \{V_1, V_2\} & & \{V_1, V_2\} & & \{V_2\} \cap \{V_1, V_2\} & & \{V_2\}
\end{array}$$
where a significant change has occurred: both restriction maps out of the center of the diagram are now the same map.

This change causes the map d^0 : C^0(V; P̌) → C^1(V; P̌) to no longer be surjective. Thus dim H^1(V; P̌) = 1, which means that the hypotheses of Theorem 26 are no longer satisfied. This means that we constructed the Čech cellular subsheaf over a cover that was too coarse, so we ought to refine the cover. Specifically, consider the refinement U′ = {U_L, U_R, {V₁}, {V₂}} of U. The diagram of P̌′ over this new cover is
$$\begin{array}{ccccc}
\mathbb{R} & \xleftarrow{\;\mathrm{id}\;} & \mathbb{R} & \xrightarrow{\;\mathrm{id}\;} & \mathbb{R} \\
{\scriptstyle \mathrm{pr}_2}\big\uparrow & & & & \big\uparrow{\scriptstyle \mathrm{pr}_2} \\
\mathbb{R}^2 & & & & \mathbb{R}^2 \\
{\scriptstyle \mathrm{pr}_2}\big\downarrow & & & & \big\downarrow{\scriptstyle \mathrm{pr}_2} \\
\mathbb{R} & \xleftarrow{\;\mathrm{id}\;} & \mathbb{R} & \xrightarrow{\;\mathrm{id}\;} & \mathbb{R}
\end{array}$$
(Here the middle entries of the left and right columns are the stalks over U_L and U_R, the middle column holds the stalks over {V₁} at top and {V₂} at bottom, and the corners are the stalks over the edges of the nerve.)
which has dim H^1(P̌′) = 1. It can be checked that this cover satisfies the conditions of Theorem 26.

The previous two examples strongly advocate for performing fusion as close to the sensors as possible, to avoid causing unexpected correlations across sensing domains due to processing artifacts. This view is shared by others, for instance [47], though in some cases the artifacts can be anticipated. For instance,
[6] shows how to use spectral sequences to compensate for when the hypotheses of Theorem 26 are known to fail. As a special case for graphs, [53][52, Sec 4.7] contains an algorithmic approach that is somewhat simpler.
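The cohomology claimed for P̌ over the coarse cover in Example 28 can be checked directly. A sketch (not the paper's code), with d^0 the difference of the two pr₂ restrictions:

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix via Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0]) if M else 0):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Nerve of U: vertices U_L, U_R with stalk R^2, one edge with stalk R.
# d^0 : R^2 (+) R^2 -> R subtracts the two overlap probabilities (pr_2).
d0 = [[0, 1, 0, -1]]

dim_H0 = 4 - rank(d0)    # matching dim H^0 = 3 claimed in Example 28
dim_H1 = 1 - rank(d0)    # matching dim H^1 = 0 claimed in Example 28
print(dim_H0, dim_H1)    # 3 0
```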
5 Conclusions
This article has demonstrated that sheaves are an effective theoretical tool for understanding information integration. There are several mathematical frontiers that are opened by repurposing sheaf theory for information integration. For instance, although we have focused on global sections and obstructions to them, the combinatorics of local sections is essentially unstudied. The extension of local sections to global sections arises in many important problems. In homogeneous data situations, such as the reconstruction of signals from samples [51] and the solution of differential equations [18], sheaves are already useful. The generality afforded by sheaves indicates that problems of this sort can also be addressed when the data types are not homogeneous.

Persistent homology has received considerable recent attention. Persistent cohomology [15] has shown promise in signal processing and is a sheaf [52]. This article has shown that sheaves can be used to model sensor data in a bottom-up, data-driven way. It therefore stands to reason that, dually, cosheaves represent world-centric phenomenological models. Recent results [12] in sheaf-cosheaf duality may provide the necessary tools.

Finally, derived categories play an essential role in traditional sheaf theory. There is a cellular version, explicated by Shepard [56], that likely can play a similar role in applications. In these more intricate settings, spectral sequences [6] are particularly promising as a computational tool.

In order to facilitate engineering applications for sheaves, there is also considerable work to be done. Computing cohomology can require significant computing resources. In homology calculations, the notions of collapses and reductions are essential for making computation practical. A similar methodology – based on discrete Morse theory [13] – exists for computing sheaf cohomology. In either case (reduced or not), efficient finite field linear algebra could be an enabler.
Finally, practical categorification [48] is still very much an art, and it has a large impact on the performance of sheaf-based information integration systems that rely on cohomology.
Acknowledgement This work was partially supported under DARPA SIMPLEX N66001-15-C-4040.
References

[1] Luciano Alparone, Bruno Aiazzi, Stefano Baronti, Andrea Garzelli, Filippo Nencini, and Massimo Selva. Multispectral and panchromatic data
fusion assessment without reference. Photogrammetric Engineering & Remote Sensing, 74(2):193–200, 2008. [2] Samar M Alqhtani, Suhuai Luo, and Brian Regan. Multimedia data fusion for event detection in Twitter by using Dempster-Shafer evidence theory. World Academy of Science, Engineering and Technology, International Journal of Computer, Electrical, Automation, Control and Information Engineering, 9(12):2234–2238. [3] K. Baclawski. Whitney numbers of geometric lattices. Adv. in Math., 16:125–138, 1975. [4] K. Baclawski. Galois connections and the Leray spectral sequence. Adv. Math., 25:191–215, 1977. [5] David Balduzzi. On the information-theoretic structure of distributed measurements. EPTCS, 88:28–42, 2012. [6] Saugata Basu. Computing the Betti numbers of arrangements via spectral sequences. Journal of Computer and System Sciences, 67(2):244–262, 2003. [7] Salem Benferhat and Claudio Sossai. Reasoning with multiple-source information in a possibilistic logic framework. Information Fusion, 7(1):80–96, 2006. [8] Salem Benferhat and Faiza Titouna. Fusion and normalization of quantitative possibilistic networks. Applied Intelligence, 31(2):135–160, 2009. [9] Glen Bredon. Sheaf theory. Springer, 1997. [10] Jinwei Chen, Huajun Feng, Kecheng Pan, Zhihai Xu, and Qi Li. An optimization method for registration and mosaicking of remote sensing images. Optik - International Journal for Light and Electron Optics, 125(2):697–703, 2014. [11] James L Crowley. Principles and techniques for sensor data fusion. In Multisensor Fusion for Computer Vision, pages 15–36. Springer, 1993. [12] J. Curry. Sheaves, cosheaves and applications, arXiv:1303.3255. 2013. [13] J. Curry, R. Ghrist, and V. Nanda. Discrete Morse theory for computing cellular sheaf cohomology, arXiv:1312.6454. 2015. [14] Suma Dawn, Vikas Saxena, and Bhudev Sharma. Remote sensing image registration techniques: A survey.
In Abderrahim Elmoataz, Olivier Lezoray, Fathallah Nouboud, Driss Mammass, and Jean Meunier, editors, Image and Signal Processing: 4th International Conference, ICISP 2010, Trois-Rivières, QC, Canada, June 30-July 2, 2010, Proceedings, pages 103–112. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.
[15] V. de Silva, D. Morozov, and M. Vejdemo-Johansson. Persistent cohomology and circular coordinates. Discrete & Computational Geometry, 45(4):737–759, 2011. [16] M. H. DeGroot. Optimal statistical decisions. Wiley, 2004. [17] C.H. Dowker. Homology groups of relations. Annals of Mathematics, pages 84–95, 1952. [18] Leon Ehrenpreis. Sheaves and differential equations. Proceedings of the American Mathematical Society, 7(6):1131–1138, 1956. [19] Jérôme Euzenat, Alfio Ferrara, P Hollink, A Isaac, Cliff Joslyn, Véronique Malaisé, Christian Meilicke, Andriy Nikolov, Juan Pane, Marta Sabou, François Scharffe, Pavel Shvaiko, Vassilis Spiliopoulos, Heiner Stuckenschmidt, Ondřej Šváb-Zamazal, Vojtěch Svátek, Cássia Trojahn dos Santos, George Vouros, and Shenghui Wang. Results of the ontology alignment evaluation initiative 2009. In Proc. 4th Int. Wshop. on Ontology Matching (OM-2009), CEUR, volume 551, 2009. [20] Jérôme Euzenat and P. Shvaiko. Ontology Matching. Springer-Verlag, Heidelberg, 2007.
[21] R. Godement. Topologie algébrique et théorie des faisceaux. Hermann, Paris, 1958. [22] Joseph A Goguen. Sheaf semantics for concurrent interacting objects. Mathematical Structures in Computer Science, 2(02):159–191, 1992. [23] Robert Goldblatt. Topoi, the Categorial Analysis of Logic. Dover, 2006. [24] Michelle L Gregory, Liam McGrath, Eric Belanga Bell, Kelly O'Hara, and Kelly Domico. Domain independent knowledge base population from structured and unstructured data sources. In FLAIRS Conference, 2011. [25] Z. Guo, G. Sun, K. Ranson, W. Ni, and W. Qin. The potential of combined LiDAR and SAR data in retrieving forest parameters using model analysis. In Geoscience and Remote Sensing Symposium, 2008. IGARSS 2008. IEEE International, volume 5, pages V-542–V-545, July 2008. [26] David L Hall, Michael McNeese, James Llinas, and Tracy Mullen. A framework for dynamic hard/soft fusion. In Information Fusion, 2008 11th International Conference on, pages 1–8. IEEE, 2008. [27] David Lee Hall and Sonya AH McMullen. Mathematical techniques in multisensor data fusion. Artech House, 2004. [28] Joachim Hammer, Hector Garcia-Molina, Junghoo Cho, Rohan Aranha, and Arturo Crespo. Extracting semistructured information from the web. 1997.
[29] John H. Hubbard. Teichmüller Theory, volume 1. Matrix Editions, 2006. [30] B. Iverson. Cohomology of Sheaves. Aarhus universitet, Matematisk institut, 1984. [31] G Jakobson, L Lewis, and J Buford. An approach to integrated cognitive fusion. In Proceedings of the 7th ISIF International Conference on Information Fusion (FUSION2004), pages 1210–1217, 2004. [32] C. Joslyn, E. Hogan, and M. Robinson. Towards a topological framework for integrating semantic information sources. In Semantic Technologies for Intelligence, Defense, and Security (STIDS), 2014. [33] Cliff Joslyn, Patrick Paulson, and Amanda White. Measuring the structural preservation of semantic hierarchy alignments. In Proc. 4th Int. Wshop. on Ontology Matching (OM-2009), CEUR, volume 551, 2009. [34] M. Kashiwara and P. Schapira. Sheaves on manifolds. Springer, 1990. [35] Bahador Khaleghi, Alaa Khamis, Fakhreddine O Karray, and Saiedeh N Razavi. Multisensor data fusion: A review of the state-of-the-art. Information Fusion, 14(1):28–44, 2013. [36] Mikhail Khovanov. A categorification of the Jones polynomial. Duke Math. J., 101(3):359–426, 02 2000. [37] Mikhail Khovanov and Radmila Sazdanovic. Categorification of the polynomial ring. Fundamenta Mathematicae, 230:251–280, 2015. [38] Benjamin Koetz, Guoqing Sun, Felix Morsdorf, K.J. Ranson, Mathias Kneubühler, Klaus Itten, and Britta Allgöwer. Fusion of imaging spectrometer and LIDAR data over combined radiative transfer models for forest canopy characterization. Remote Sensing of Environment, 106(4):449–459, 2007. [39] M. Kokar, J. Tomasik, and J. Weyman. Formalizing classes of information fusion systems. Information Fusion, 5:189–202, 2004. [40] Mieczyslaw M Kokar, Kenneth Baclawski, and Hongge Gao. Category theory-based synthesis of a higher-level fusion algorithm: An example. In Information Fusion, 2006 9th International Conference on, pages 1–8. IEEE, 2006. [41] Nicholas Kushmerick. Wrapper induction: Efficiency and expressiveness.
Artificial Intelligence, 118(1):15–68, 2000. [42] J. Leray. L'anneau spectral et l'anneau filtré d'homologie d'un espace localement compact et d'une application continue. Journal des Mathématiques Pures et Appliquées, 29:1–139, 1950.
[43] J. Lilius. Sheaf semantics for Petri nets. Technical report, Helsinki University of Technology, Digital Systems Laboratory, 1993. [44] Eric G. Little and Galina L. Rogova. Designing ontologies for higher level fusion. Information Fusion, 10(1):70–82, 2009. [45] J. P. May. Finite topological spaces: Notes for REU, 2003. [46] Kevin P. Murphy, Yair Weiss, and Michael I. Jordan. Loopy belief propagation for approximate inference: An empirical study. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pages 467–475. Morgan Kaufmann Publishers Inc., 1999. [47] Andrew J. Newman and Glenn E. Mitzel. Upstream data fusion: history, technical overview, and applications to critical challenges. Johns Hopkins APL Technical Digest, 31(3):215–233, 2013. [48] Emilie Purvine, Michael Robinson, and Cliff Joslyn. Categorification in the real world. In Joint Mathematics Meetings, Seattle, WA, January 2016. [49] Ellen Riloff, Rosie Jones, et al. Learning dictionaries for information extraction by multi-level bootstrapping. In AAAI/IAAI, pages 474–479, 1999. [50] M. Robinson. Asynchronous logic circuits and sheaf obstructions. Electronic Notes in Theoretical Computer Science, pages 159–177, 2012. [51] M. Robinson. The Nyquist theorem for cellular sheaves. In Sampling Theory and Applications, Bremen, Germany, 2013. [52] M. Robinson. Topological Signal Processing. Springer, 2014. [53] M. Robinson. Imaging geometric graphs using internal measurements. J. Diff. Eqns., 260:872–896, 2016. [54] Michael Robinson. Pseudosections of sheaves with consistency structures. Technical Report 2015-2, AU-CAS-MathStats, 2015. [55] Sunita Sarawagi. Information extraction. Foundations and Trends in Databases, 1(3):261–377, 2008.
[56] Allen Shepard. A cellular description of the derived category of a stratified space. PhD thesis, Brown University, 1985. [57] Duncan Smith and Sameer Singh. Approaches to multisensor data fusion in target tracking: A survey. Knowledge and Data Engineering, IEEE Transactions on, 18(12):1696–1710, 2006. [58] D. I. Spivak. Category Theory for the Sciences. MIT Press, 2014. [59] R. E. Stong. Finite topological spaces. Trans. Amer. Math. Soc., 123(2):325–340, June 1966.
[60] Pramod K. Varshney. Multisensor data fusion. Electronics & Communication Engineering Journal, 9(6):245–253, 1997. [61] L. Wald. Some terms of reference in data fusion. IEEE Trans. Geosciences and Remote Sensing, 37(3):1190–1193, 1999. [62] Lucien Wald. Data Fusion: Definitions and Architectures: Fusion of Images of Different Spatial Resolutions. Presses des MINES, 2002. [63] Edward Waltz, James Llinas, et al. Multisensor Data Fusion, volume 685. Artech House, Boston, 1990. [64] Jixian Zhang. Multi-source remote sensing data fusion: status and trends. International Journal of Image and Data Fusion, 1(1):5–24, 2010.
Appendix
The purpose of a set of information sources is to describe entities in the universe under discussion by ascribing attributes to them in some fashion. Scientific study emphasizes that relationships between the attributes assigned to entities are central, because unrelated observations about separate phenomena must be understood very differently from repeated observations of the same phenomenon. According to Axiom 1, we begin by representing the collection of entities in the universe by a set, so that each entity is an element of this set. With this level of structure – no attributes have yet been ascribed to entities – there is nothing to distinguish entities from one another. By enriching a set model of the universe to describe entity relationships (against which attributes of different entities can then be related), one obtains a topological space as posited by Axiom 3. Given a topological space model of entities, the space of attributes that can be assigned to this set forms a sheaf. Suppose that the entities in the universe under discussion lie in a set X as posited by Axiom 1. A topology on X specifies which groups of elements have attributes worthy of comparison. Intuitively, a topology expresses the idea that if two entities are unrelated, they can be described in isolation from each other. If they are closely related, they will be "hard to disentangle" and will usually be described together. Definition 29. A topology on a set X is a collection U of subsets of X satisfying the following properties: 1. X ∈ U. 2. ∅ ∈ U. 3. The union of any collection of elements of U is also in U. 4. The intersection of any finite collection of elements of U is also in U.
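On a finite set, Definition 29 can be checked mechanically. The following Python sketch (the function name `is_topology` and the example sensing domains are illustrative assumptions, not part of the formal development) verifies all four axioms by brute force; on a finite set it suffices to check unions over every sub-collection and pairwise intersections.

```python
from itertools import chain, combinations

def is_topology(X, U):
    """Check that the collection U is a topology on X per Definition 29."""
    opens = {frozenset(s) for s in U}
    if frozenset(X) not in opens or frozenset() not in opens:
        return False  # Axioms 1 and 2: X and the empty set must be open
    # Axiom 3: closure under unions (all sub-collections suffice on a finite set)
    for sub in chain.from_iterable(combinations(opens, r) for r in range(1, len(opens) + 1)):
        if frozenset().union(*sub) not in opens:
            return False
    # Axiom 4: closure under finite intersections (pairwise suffices by induction)
    for A, B in combinations(opens, 2):
        if A & B not in opens:
            return False
    return True

# Two sensing domains {1,2} and {2,3} over entities X = {1,2,3}:
X = {1, 2, 3}
U = [set(), {2}, {1, 2}, {2, 3}, {1, 2, 3}]
print(is_topology(X, U))       # True: the overlap {2} and the union X are included
print(is_topology(X, U[:-1]))  # False: X itself is missing, violating Axiom 1
```

Note that the overlap {2} must itself be a sensing domain: dropping it from U would violate Axiom 4.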
Usually, the elements of U are called open sets, and the pair (X, U) is called a topological space. In the context of information sources, a U ∈ U is called a sensing domain. The first axiom for a topology states that it is possible to ascribe attributes to all entities (though it may be impractical). The second axiom states that it is possible to describe no entities at all. The third and fourth axioms are more subtle. They state that two entities can be related through their relationships with other entities. The finiteness constraint in the fourth axiom avoids examining an infinite regress through increasingly entity-specific relationships.7 In a dual way, according to Axiom 2, the attributes themselves lie in one of a collection of sets, and related groups of attributes can then also be described as a topological space. Attributes that lie in disjoint open sets are essentially independent of one another. Given this, a function is clearly the right abstraction for assigning attributes to a set of entities: each entity is paired with its attributes. However, if the entities and the attributes both lie in topological spaces, a typical function will not preserve the relatedness of entities and their attributes. The functions that do respect the topologies – those for which the preimage of every open set is open – are called continuous. While continuous functions work well when all possible attributes lie in a single set, they are not descriptive when there are many different sets of attributes, as Axiom 2 posits. Given the available information sources, different topological spaces of attributes may be in use for different information sources. One might initially think of defining partial functions on the topological entities, but this does not address the problem of attributes lying in different spaces. The best one can hope for is to model information sources as assigning different functions (with different sets of attributes) to open sets of entities.
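Continuity as described above can also be tested directly on finite spaces. The sketch below (the function name `is_continuous` and the tiny example spaces are illustrative assumptions) checks that the preimage of every open set of attributes is an open set of entities.

```python
def is_continuous(f, U_X, U_Y):
    """f maps points of X to points of Y (as a dict); f is continuous
    when the preimage of every open set of Y is open in X."""
    opens_X = {frozenset(s) for s in U_X}
    for V in U_Y:
        preimage = frozenset(x for x in f if f[x] in V)
        if preimage not in opens_X:
            return False
    return True

# Entities {a, b} with topology {∅, {a}, {a,b}};
# attributes {0, 1} with topology {∅, {1}, {0,1}}.
U_X = [set(), {"a"}, {"a", "b"}]
U_Y = [set(), {1}, {0, 1}]
print(is_continuous({"a": 1, "b": 0}, U_X, U_Y))  # True
print(is_continuous({"a": 0, "b": 1}, U_X, U_Y))  # False: preimage of {1} is {b}, not open
```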
Axiom 4 posits that the way out of this problem is to abstract slightly, away from the particular attributes assigned by an information source to the set of attributes it may assign. One way to interpret this is that instead of describing individual functions from open sets of entities, we should instead assign a function space to each open set. But Axiom 4 allows further generalization, in that we do not have to describe specific entities explicitly. Information sources merely need to describe the collection of entities in an open set in some fashion, without committing a particular piece of a measurement to a particular entity. A single observation from an information source assigns attributes to its sensing domain (an open set) as a whole. This assignment is denoted by the function S : T → A, which assigns a set of attributes S(U) ∈ A to each open set U in the topology T. It is only by further analysis – and comparison with other observations – that individual elements can be discriminated. That this is possible is posited by Axioms 5–7, and is formalized by constructing restriction functions S(U ⊆ V) : S(V) → S(U) whenever U ⊆ V are open sets in T. Without constraints on the nature of these restrictions, it is in general impossible to combine observations from overlapping sensing domains. Axiom 6 supplies the appropriate constraint, by requiring there to be an assignment of attributes over a union of sensing domains whenever there are observations that agree on intersections of sensing domains. Axiom 7 asserts that this "globalized" observation (a section of S) is unique – a statement of the underlying determinism of the representation. The framework posited by Axioms 1–7 is sufficiently general to address information integration problems, but it can be rather difficult to analyze on two fronts: (1) computation of the set of sections S(X) is difficult (usually requiring exhaustive searches), and (2) we do not usually have access to all possible sensing domains. Axiom 8 asserts that we should assume sets of attributes carry additional algebraic structure, which addresses issue (1). Given that framework, it is possible to address (2) rather completely, which we postpone until the next section. It is sufficient to require attributes to lie in a vector space and restrictions to preserve this structure, so that observations can be compared by subtraction. This algebraic structure can be supplied without loss of information by the process of category representation. In general, there are many possible representations of a given attribute set, and choosing among them remains more art than science at present. (See [50, 36] for several different examples.) As a simple example, consider that a set A of attributes may instead be thought of as the abelian group Z[A], which consists of the elements of A as independent generators with integer coefficients.8 The group Z[A] contains the set A along with other "linear combinations" of elements of A. These "linear combination" elements provide a model of uncertainty within elements of A.
7 In some topologies, the intersection of any collection of elements remains in the topology, but requiring this as an axiom would destroy our ability to describe certain phenomena.
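The restriction and gluing behavior demanded by Axioms 6 and 7 can be made concrete in a minimal Python sketch. Here the topology, the attribute values, and the helper names (`restrict`, `glue`) are illustrative assumptions: an observation over a sensing domain is modeled as a dictionary from entities to attributes, restriction forgets values outside the smaller domain, and gluing succeeds exactly when two observations agree on the overlap.

```python
# Two sensing domains on entities {1, 2, 3} that overlap in {2}.
U1, U2 = frozenset({1, 2}), frozenset({2, 3})

def restrict(obs, U):
    """Restriction S(U ⊆ V): forget the values assigned outside U."""
    return {x: v for x, v in obs.items() if x in U}

def glue(obs1, V1, obs2, V2):
    """Axioms 6 and 7: observations agreeing on the overlap assemble into a
    unique observation over the union; otherwise no section exists (None)."""
    overlap = V1 & V2
    if restrict(obs1, overlap) != restrict(obs2, overlap):
        return None
    return {**obs1, **obs2}

a = {1: "hot", 2: "warm"}   # an observation over U1
b = {2: "warm", 3: "cold"}  # agrees with a on the overlap {2}
c = {2: "cold", 3: "cold"}  # disagrees with a on {2}
print(glue(a, U1, b, U2))   # the unique glued section over U1 ∪ U2
print(glue(a, U1, c, U2))   # None: no consistent global observation exists
```

The second call illustrates why computing the full set of sections S(X) is hard in general: every combination of local observations must be checked for consistency on every overlap.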
8 One can think of elements of this group as polynomials with integer coefficients, with the elements of A serving as the variables. The group operation is component-wise addition.
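The description of Z[A] as polynomials with component-wise addition admits a direct implementation. In the sketch below (the helper names `zA` and `add` are illustrative assumptions), an element of Z[A] is stored as a mapping from attributes to integer coefficients, with zero coefficients dropped so that equality of group elements is equality of mappings.

```python
from collections import Counter

def zA(**coeffs):
    """An element of Z[A]: a formal integer combination of attributes,
    stored as attribute -> coefficient with zero coefficients dropped."""
    return Counter({a: c for a, c in coeffs.items() if c != 0})

def add(p, q):
    """Component-wise addition: the group operation of Z[A]."""
    out = Counter(p)
    for a, c in q.items():
        out[a] += c
    return Counter({a: c for a, c in out.items() if c != 0})

# "hot" observed twice and "cold" once, combined with one retraction of "hot":
p = zA(hot=2, cold=1)
q = zA(hot=-1)
print(dict(add(p, q)))  # {'hot': 1, 'cold': 1}
```

A bare attribute a ∈ A embeds as the generator zA(a=1); sums such as 2·hot + 1·cold, which lie outside the image of A, are the "linear combination" elements that model uncertainty.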