Spatio-Temporal Event Stream Processing in Multimedia Communication Systems

Mingyan Gao
University of California, Irvine
[email protected]

Xiaoyan Yang
National University of Singapore
[email protected]

Ramesh Jain
University of California, Irvine
[email protected]

Beng Chin Ooi
National University of Singapore
[email protected]

ABSTRACT
Emerging multimedia communication environments, such as Environment-to-Environment (E2E) systems, require detecting complex events in environments using multimodal sensory data. Based on these spatio-temporal events, systems select and send data from appropriate sensors. Most existing stream processing systems consider temporal streams of alphanumeric data and provide efficient approaches for answering queries in these environments. When events are detected in different sensory data types, including audio and video collected at different locations, new approaches are needed to represent, combine, and process events to answer queries. In this paper, we present our approach to managing event stream processing to address the needs of a real-time E2E system being developed in our laboratory. We introduce the modeling of our problem and describe in detail the filtering and matching algorithms for querying spatio-temporal event streams. Experimental results demonstrate the efficacy and efficiency of our approach.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$10.00.

1. INTRODUCTION
Stream processing has been gaining attention in the database community in recent years [8, 5, 2, 12, 19]. Streams have many applications, ranging from traditional network monitoring systems and stock markets to emerging applications such as RFID tag tracking systems. Among these applications, the processing of event streams has attracted particular interest lately. In these applications, an event stream is usually modeled as a sequence of events [2, 12], each of which is a tuple consisting of a primary timestamp and several attributes. One important task in these applications is to understand the current situation of the system, based on which proper decisions can be made automatically. To realize this goal, it is important to develop query processing techniques that facilitate the definition and detection of events in the streams. Efforts have been made to define complex event processing (CEP) in information systems [18]. Due to the time-sequence nature of the streams, events are mainly detected based on changes in attributes with respect to time.

In many emerging applications, however, data streams may come from different live sensors, such as video cameras and microphones. Such data streams pose a greater challenge due to the more complex semantics of events in sensory data. Moreover, many applications may require events to be defined based on spatially distributed sensors, where the semantics of events depends on both temporal and spatial relationships between events. In these applications, a sensory data stream is first processed to detect the appropriate features and is then transformed into a stream of atomic events. Afterwards, the spatio-temporal atomic event streams are examined to detect more complex events, which support intelligence at the situation level. Environment-to-Environment (E2E) is such an intelligent multimedia communication system. We give a brief introduction to the E2E system before stating our problem.

E2E connection has been developed as a new form of communication that allows users to connect their natural physical environments for communication [21, 1, 20]. The goal is to design an architecture that pushes sensors and other devices into a supporting role in the background and focuses on natural human-human interaction. Thus, users need not worry about staying within the proximity, field of view, audible distance, and so on of a sensor or an output device (e.g., screen, speaker, etc.), but rather just interact in their natural settings and let the system find the most appropriate input and output devices to support communication.
To realize E2E connection, many heterogeneous sensors analyze data to detect and monitor objects and activities. The system analyzes this sensor information to detect events in the physical environment, and then assimilates, stores, and indexes them in a dynamic real-time EventBase. The sensor information and EventBase for each environment are shared by an Event Server (ES) over the Internet to create a Joint Situation Model, which represents a combined environment. Such a web-based architecture has multiple far-reaching consequences in terms of the use of web-based technologies such as ‘ontologies’ for handling experiential data, the adoption and scalability of the approach, as well as support for ‘serendipitous interoperability’ across environments. Thus, a person in one environment can interact with objects and observe activities from other environments by interacting with the appropriate ES in a natural setting.

[Figure 1: E2E System. Blocks shown: Env. Model, Situation Model, Security Settings, Control Requests, Data, Data Acq. & Analysis, MM DB, Distribution/Networking, Event Server, Presentation.]

Figure 1 [20] shows the architecture for an E2E node to build a flexible communication paradigm. The ‘Data acquisition and analysis’ (DAA) component converts data from each sensor into data events or features. The translation of sensor-based data events into application events uses a physical model of the sensors and the environment. This is handled via the Environment Model (EM), which creates indices between the various sensors, their physical locations, and the overall physical environment. Thus, if a camera and a microphone detect the sub-events ‘person present’ and ‘person talking’, the environment model is useful in deciding which location these sub-events originate from and hence whether they refer to the same person. The actual semantic understanding of the event requires additional contextual information to be added by the specific Situation Model (SM). The SM represents all the domain-dependent information required to support application functionality.

As can be seen, in this application heterogeneous data streams are collected at different locations and converted to atomic data event streams, also called feature streams, which should then be combined into higher-level application event streams. Many emerging applications in telepresence [10], surveillance and monitoring [22], as well as ethnographic studies [16] have similar characteristics. Let us consider one example to illustrate the problems involved in our event processing system.

Suppose that we are dealing with a sensor-rich environment where there are cameras, microphones, RFID detectors, and motion detectors at many different locations to cover all areas of a multi-storey hospital building. The system should detect events such as ‘Dr. Miller went from Radiation Therapy area to meet a patient Mr. Jones in Oncology ward’. This event can be considered to be composed of the sub-events ‘Dr. Miller leaving Radiation Therapy in the basement’, ‘Dr. Miller catching an elevator’, ‘Dr. Miller leaving elevator on the fifth floor’, ‘Dr. Miller walking to room 518’ and ‘Dr. Miller greeting Mr. Jones’. The five sub-events are depicted in Figure 2, each of which carries specific spatio-temporal information. For example, sub-event AE2 is captured in the elevator stopping at the basement. In terms of occurrence time, the five sub-events happen in sequential order. In the rest of the paper, we call a sub-event such as those in the previous example an atomic event, which contains information about time, location, person and event type. Events that are constructed from atomic events are called composite events. The problem then becomes: given an atomic event stream with spatio-temporal information, how can the system detect pre-defined composite events efficiently?

1.1 Challenges and Contributions
To solve the problem, the following challenges should be carefully considered. First, events in our problem carry spatial information, which requires efficient processing. Assume each atomic event is associated with a spatial point. If we are interested in a sequence of atomic events at different places, the corresponding composite event will consist of a sequence of points. Given a set of incoming atomic events, efficiently matching them to a sequence of points is a challenge. Second, real-life composite events usually involve many atomic events that happen concurrently as well as sequentially, which makes the processing of complex temporal relationships between atomic events another challenge.

In this paper, we propose an event processing strategy to address the above challenges. To the best of our knowledge, there is no prior work on processing spatio-temporal data in event streams. Our main contributions are:
1. We present a systematic study on real-life event detection in multimedia communication systems, especially for composite events consisting of spatio-temporal atomic events. We provide precise semantics for this new class of queries over event streams. In particular, we propose a graph-based model to represent composite events.
2. We implement an Event Processor for efficient detection of these types of composite events. It is composed of two components:
a) Spatial filter: we propose a spatial signature based on the Bloom filter to represent the spatial information of a composite event. Candidate events are first pruned based on the comparison of signatures.
b) Graph matching: we propose a graph-based matching algorithm to efficiently evaluate complex temporal relationships over atomic events.
3. We have conducted a comprehensive experimental study, the results of which demonstrate the effectiveness of our signature-based pruning method and the efficiency of our graph-based matching algorithm.

The rest of the paper is organized as follows. We introduce the problem formulation in Section 2, and discuss the related work in Section 3. In Section 4, we describe the general system architecture and the spatial filter component. Section 5 presents our graph model and matching algorithms. Experiments and results are presented in Section 6. The paper is concluded in Section 7.
[Figure 2: Examples of Atomic Events. Panels (a) and (b) show the atomic events of Dr. Miller, AE1 to AE5, along a time axis t and at their locations (Radiation Therapy, Basement, 5th Floor, 518).]

ID    Event Description
AE1   Leaving Radiation Therapy in the basement
AE2   Catching an elevator
AE3   Leaving elevator on the fifth floor
AE4   Walking to room 518
AE5   Greeting Mr. Jones

2. PROBLEM FORMULATION
We first introduce our data model, based on which we formulate our problem.

2.1 Atomic Event and Atomic Event Stream
An atomic event is the finest data and semantic unit in our system. It indicates the occurrence of a real-life event, and captures one type of happening involving one person at one spatio-temporal point. We represent an atomic event using a tuple, as defined below:
Definition 1. An atomic event is a tuple $e(Pid, T_s, T_e, LocRec(x_1, y_1, x_2, y_2), E_t)$, where $Pid$ is the person ID, $T_s$ and $T_e$ are the start and end time respectively, $LocRec$ is the location where the event happened, and $E_t$ is the event type.

Different from [2, 12, 7, 19], an atomic event in our system need not be instantaneous, and can last for a period of time. This is because each atomic event corresponds to a real-life event or activity, the span of which could be an interval rather than a single time point. Instantaneous events are represented by $T_s = T_e$. Each time point discussed in this paper is an integer in $\mathbb{N}$, while the spatial geometry considered is a 2D plane. $(x_1, y_1)$ and $(x_2, y_2)$ in $LocRec$ define the coordinates of the bottom-left and top-right corners of a rectangle respectively. When the location of an event is a spatial point, we have $x_1 = x_2$ and $y_1 = y_2$. The event type $E_t$ could be ‘walking’, ‘talking’, ‘entering room’, etc. The complete set of $E_t$ values that can be recognized by the system is defined by the DAA component (Section 1). The input to our system is a stream of atomic events.

Definition 2. An atomic event stream is defined as $AES = e_1, e_2, \ldots, e_i, \ldots$, where each $e_i$ is an atomic event, and atomic events are ordered by their end time $T_e$.

Different atomic events may have the same start and end times ($T_s$ and $T_e$). In our model we assume that atomic events arrive in order of end time; out-of-order arrival is not considered.
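As an illustration of Definitions 1 and 2, the following minimal Python sketch shows one possible in-memory representation of an atomic event and a check of the stream ordering assumption. The class name `AtomicEvent`, the field names, the helper `is_ordered_by_end_time`, and the sample values are our own assumptions for illustration; they are not taken from the system described in this paper.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class AtomicEvent:
    """One atomic event e(Pid, Ts, Te, LocRec, Et), following Definition 1."""
    pid: str                                     # person ID (Pid)
    ts: int                                      # start time Ts
    te: int                                      # end time Te
    loc_rec: Tuple[float, float, float, float]   # (x1, y1, x2, y2)
    et: str                                      # event type Et

    def __post_init__(self) -> None:
        x1, y1, x2, y2 = self.loc_rec
        if self.ts > self.te:
            raise ValueError("Ts must not exceed Te")
        if x1 > x2 or y1 > y2:
            raise ValueError("(x1, y1) must be the bottom-left corner")

    @property
    def is_instantaneous(self) -> bool:
        # Instantaneous events are represented by Ts = Te.
        return self.ts == self.te

    @property
    def is_point(self) -> bool:
        # A spatial point has x1 = x2 and y1 = y2.
        x1, y1, x2, y2 = self.loc_rec
        return x1 == x2 and y1 == y2


def is_ordered_by_end_time(stream: Iterable[AtomicEvent]) -> bool:
    """Check the Definition 2 assumption: events arrive in order of Te.

    Ties are allowed, since events may share the same end time.
    """
    events = list(stream)
    return all(a.te <= b.te for a, b in zip(events, events[1:]))


# Hypothetical sample values, for illustration only.
walk = AtomicEvent("Miller", 10, 25, (0.0, 0.0, 4.0, 2.0), "walking")
greet = AtomicEvent("Miller", 30, 30, (5.0, 1.0, 5.0, 1.0), "talking")
```

Making the class immutable (`frozen=True`) is a design choice of this sketch: an event that has entered the stream is treated as a fixed observation.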
2.2 Temporal Relationships and Patterns

2.2.1 Temporal Relationships
The base temporal relationships between intervals were introduced in Allen’s Interval Algebra [4], as shown in Table 1. Symmetric (inverse) relations are omitted due to space limitations. We adopt this proposal to describe the relationships between events.
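To make the base relationships concrete, the sketch below classifies the relationship between two intervals using plain comparisons on their start and end points. It is an illustrative sketch only: the function names `base_relation` and `allen_relation` are hypothetical, intervals are given as `(start, end)` pairs, and we assume proper intervals with start strictly less than end, as in Allen's algebra (instantaneous events are therefore outside the scope of this sketch).

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (start, end) with start < end


def base_relation(a: Interval, b: Interval) -> Optional[str]:
    """Return the base relation of a with respect to b, or None if the
    pair is only related by the inverse of a base relation."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if s1 < s2 < e1 < e2:
        return "overlaps"
    if s1 == s2 and e1 < e2:
        return "starts"
    if s2 < s1 and e1 < e2:
        return "during"
    if s2 < s1 and e1 == e2:
        return "finishes"
    if s1 == s2 and e1 == e2:
        return "equal"
    return None


def allen_relation(a: Interval, b: Interval) -> str:
    """Classify a against b, reporting inverse relations by name."""
    for s, e in (a, b):
        if s >= e:
            raise ValueError("this sketch assumes proper intervals (start < end)")
    rel = base_relation(a, b)
    if rel is not None:
        return rel
    # For proper intervals the thirteen relations are exhaustive, so the
    # swapped pair must match one of the base relations.
    return "inverse of " + str(base_relation(b, a))
```

For instance, `allen_relation((1, 6), (5, 8))` yields `"overlaps"`, while swapping the arguments yields `"inverse of overlaps"`.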
2.2.2 Basic Temporal Patterns
We now introduce five basic temporal patterns, SEQ, EQ, CONJ, DISJ and NEG, which are designed to express the temporal relationships of atomic events. Each temporal pattern is specified by a temporal requirement, which defines the occurrence and order of atomic events in the pattern. In addition, every temporal pattern is associated with one induced time bound ($ITB = [T_s, T_e]$) indicating the time boundary of the pattern, which is computed from the time points of the atomic events in the pattern.

First, SEQ and EQ are defined to express sequential and concurrent relationships respectively. In the following definitions, $i \in [1, n]$, $e_i$ is an atomic event, and $t_i \in \{s, e\}$.
• Pattern $P_1 = \mathrm{SEQ}(e_1.T_{t_1}, \ldots, e_i.T_{t_i}, \ldots, e_n.T_{t_n})$ requires that $e_1.T_{t_1} < \ldots < e_i.T_{t_i} < \ldots < e_n.T_{t_n}$. The induced time bound $ITB_{P_1}$ is $[e_1.T_{t_1}, e_n.T_{t_n}]$. Besides time points $e_i.T_{t_i}$, atomic events $e_i$ can always be directly included in patterns. The interpretation of this case is introduced in the next subsection.
• Pattern $P_2 = \mathrm{EQ}(e_1.T_{t_1}, \ldots, e_i.T_{t_i}, \ldots, e_n.T_{t_n})$ requires that $e_1.T_{t_1} = \ldots = e_i.T_{t_i} = \ldots = e_n.T_{t_n}$. $ITB_{P_2} = [e_1.T_{t_1}, e_1.T_{t_1}]$.

Besides these two patterns, we also support three general ones: conjunction (CONJ), disjunction (DISJ) and negation (NEG). In contrast, most existing event stream processors [12, 2] study only sequential patterns (SEQ) when considering temporal relationships of events, and ignore CONJ and DISJ.
• Pattern $P_3 = \mathrm{CONJ}(e_1, \ldots, e_i, \ldots, e_n)$ requires that each $e_i$ occurs, but no time order is required among these events. $ITB_{P_3} = [\min(e_i.T_s), \max(e_i.T_e)]$.
• Pattern $P_4 = \mathrm{DISJ}(e_1, \ldots, e_i, \ldots, e_n)$ requires that a non-empty subset of these events occurs, with no time order required among these atomic events. $ITB_{P_4} = [\min(e_j.T_s), \max(e_j.T_e)]$, where $j \in [1, n]$ and $e_j$ occurs.
• Pattern $P_5 = \mathrm{NEG}(e)$ ($e$ is an atomic event) requires that $e$ does not occur. Given the infinity of time, it is infeasible to negate an event without proper time constraints. Therefore, in this paper we consider a negation only when it is bounded by two events. Pattern $P_5 = \mathrm{NEG}(e, e_p.T_{t_p}, e_q.T_{t_q})$ ($e_p$, $e_q$ are atomic events, $t_p, t_q \in \{s, e\}$) requires no occurrence of $e$ between $e_p.T_{t_p}$ and $e_q.T_{t_q}$. $ITB_{P_5} = [e_p.T_{t_p}, e_q.T_{t_q}]$.

2.2.3 Nested Temporal Patterns
Up to now, only atomic events have been considered in the basic temporal patterns. However, the patterns can easily be extended to support nested temporal patterns by treating (basic) patterns as atomic events and their corresponding $ITB$ as their time points.
• Pattern $P^{1} = \mathrm{SEQ}(e_1.T_{t_1}, \ldots, P, \ldots, e_n.T_{t_n})$ ($i \in [1, n]$, $e_i$ is an atomic event, $t_i \in \{s, e\}$) requires that $e_1.T_{t_1} < \ldots < P.T_s < P.T_e < \ldots < e_n.T_{t_n}$. $ITB_{P^{1}} = [e_1.T_{t_1}, e_n.T_{t_n}]$. An atomic event $e_i$ in a pattern is by default a non-instantaneous atomic event, and can be expanded as $(e_i.T_s, e_i.T_e)$, which is a special case of a nested pattern $P$. If $e_i$ is instantaneous, it should be written as $\mathrm{EQ}(e_i)$.
• Pattern $P^{2} = \mathrm{EQ}(e_1.T_{t_1}, \ldots, P, \ldots, e_n.T_{t_n})$ ($i \in [1, n]$, $e_i$ is an atomic event, $t_i \in \{s, e\}$) requires that $e_1.T_{t_1} = \ldots = P.T_s = P.T_e = \ldots = e_n.T_{t_n}$. $ITB_{P^{2}} = [e_1.T_{t_1}, e_1.T_{t_1}]$. Similarly, $\mathrm{EQ}(e_i)$ can be expanded as $\mathrm{EQ}(e_i.T_s, e_i.T_e)$.
• Pattern $P^{3} = \mathrm{CONJ}(e_1, \ldots, P, \ldots, e_n)$ ($i \in [1, n]$, $e_i$ is an atomic event) requires that pattern $P$ and all other atomic events occur, with no time order required between them. $ITB_{P^{3}} = [\min(e_i.T_s, P.T_s), \max(e_i.T_e, P.T_e)]$.
• Pattern $P^{4} = \mathrm{DISJ}(e_1, \ldots, \mathrm{SEQ}_P(T_s, T_e), \ldots, e_n)$ ($i \in [1, n]$, $e_i$ is an atomic event, $t_i \in \{s, e\}$) requires that a subset of the atomic events and pattern $P$ occur, with no time order required between them. If pattern $P$ occurs, $ITB_{P^{4}} = [\min(e_j.T_s, P.T_s), \max(e_j.T_e, P.T_e)]$ ($j \in [1, n]$, $e_j$ occurs); otherwise $ITB_{P^{4}} = [\min(e_j.T_s), \max(e_j.T_e)]$ ($j \in [1, n]$, $e_j$ occurs).
• Negation has two types of nested patterns. 1) Pattern $P^{51} = \mathrm{NEG}(P_{n1}, P_{n2})$ requires that no pattern $P_{n1}$ occurs between $P_{n2}.T_s$ and $P_{n2}.T_e$. $ITB_{P^{51}} = [P_{n2}.T_s, P_{n2}.T_e]$. 2) Pattern $P^{52} = \mathrm{NEG}(P_{n1}, P_{n2}, P_{n3})$ requires that no pattern $P_{n1}$ occurs between $P_{n2}.T_e$ and $P_{n3}.T_s$. $ITB_{P^{52}} = [P_{n2}.T_s, P_{n3}.T_e]$.

For example, Allen’s temporal relationships and their corresponding temporal patterns are listed in Table 1.

Table 1: Temporal Relationships and Corresponding Temporal Patterns
Base relationship                 Equivalent to
$e_1$ takes place before $e_2$    $\mathrm{SEQ}(e_1.T_e, e_2.T_s)$
$e_1$ meets $e_2$                 $\mathrm{EQ}(e_1.T_e, e_2.T_s)$
$e_1$ overlaps with $e_2$         $\mathrm{SEQ}(e_1.T_s, e_2.T_s, e_1.T_e, e_2.T_e)$
$e_1$ starts $e_2$                $\mathrm{SEQ}(\mathrm{EQ}(e_1.T_s, e_2.T_s), e_1.T_e, e_2.T_e)$
$e_1$ during $e_2$                $\mathrm{SEQ}(e_2.T_s, e_1.T_s, e_1.T_e, e_2.T_e)$
$e_1$ finishes $e_2$              $\mathrm{SEQ}(e_2.T_s, e_1.T_s, \mathrm{EQ}(e_1.T_e, e_2.T_e))$
$e_1$ is equal to $e_2$           $\mathrm{SEQ}(\mathrm{EQ}(e_1.T_s, e_2.T_s), \mathrm{EQ}(e_1.T_e, e_2.T_e))$

2.3 Composite Events and Queries
A composite event is a semantically more meaningful event. It is defined over a set of constituent events, which can be atomic events or composite events. Users describe their interests through the specification of a composite event, which includes the temporal patterns of the constituent events as well as predicates on other attributes of atomic events, i.e. $Pid$, $LocRec$ and $E_t$. Predicates on $Pid$ and $E_t$ are equality comparisons on values, e.g. $Pid = 123$, $E_t = $ ‘walking’, while predicates on $LocRec$ can be comparisons on either spatial points or area ranges, e.g. 4 [...] can have the temporal pattern $\mathrm{SEQ}(e_1.T_s, e_2.T_s)$ and predicate $e_2.T_s - e_1.T_s < 2$ sec. Finally, a window predicate is defined, which specifies the maximum time that one composite event can last, e.g. 10 mins.

The specification of a composite event essentially forms a standing query $Q$, pre-registered in the system. When the atomic event stream $AES$ flows by, queries in the system are answered on the fly. To make it easier for users to specify composite events, we use the SQL-like declarative language introduced in [2, 19] to express queries. An example of a composite event and the corresponding query is given as follows.

Example 1. Consider a scenario in which Tom and John, at the office in the US, start the weekly meeting with Mohan in Singapore. Tom is in his office, room 2059, equipped with 8 cameras and 4 microphones (and many other devices as part of E2E). The system determines that he is working at his desk ($e_1$). At this time, John enters the room ($e_2$). Then John talks to Tom about calling Mohan in Singapore for the meeting ($e_3$). Tom connects with Mohan for discussions ($e_4$). For further discussions, Tom and John will go to the Lab in the CalIT2 building. Connectivity should be maintained so that the discussion can continue from the Lab. To satisfy this requirement, our system should be able to recognize the occurrence of this composite event. There are four atomic events ($e_i$, $1 \le i \le 4$, as indicated above) constituting the composite event. The specification of the composite event in the SQL-like query language is the following:
SELECT SEQ(CONJ(e1, e2), e3, e4)
WHERE e1.Pid = 'Tom' AND e1.Et = 'working'
AND e2.Pid = 'John' AND e2.Et = 'entering room'