Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 109C (2017) 640–647
www.elsevier.com/locate/procedia
The 8th International Conference on Ambient Systems, Networks and Technologies (ANT 2017)
On Continuous Queries in Stream Processing
K. Vidyasankar
Department of Computer Science, Memorial University, St. John's, Newfoundland, Canada A1B 3X5
Abstract

Stream processing is about processing continuous streams of data by programs in a workflow. Continuous execution is discretized by grouping input stream tuples into batches and using one batch at a time for the execution of programs. The programs may generate stream data which may be input to subsequent programs in the workflow. They may also read as well as write some data in persistent store. Continuous queries are processed in the workflow. A continuous query (CQ) consists of a sequence of one time queries. There is a general agreement that each CQ normally spans over several batches of stream inputs. Apart from this, different notions of CQs exist in the literature, their one time queries ranging from just transactions (that is, single executions of programs) to composite transactions. In this paper, we look at how CQs can be defined generically and propose a correctness criterion for concurrent executions of CQs.

1877-0509 © 2017 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the Conference Program Chairs.

Keywords: Stream processing; composite transactions; continuous queries; consecutive serializability; saga
∗ K. Vidyasankar. Tel.: +1-709-864-4369; fax: +1-709-864-2009. E-mail address: [email protected]
doi: 10.1016/j.procs.2017.05.370
K. Vidyasankar / Procedia Computer Science 109C (2017) 640–647

1. Introduction

Stream processing is about processing continuous streams of data. Stream data arriving from external sources are processed by programs in a workflow. Continuous execution is discretized by grouping (input) stream tuples into batches and using one batch at a time for the execution of programs. The programs may generate stream data which may be input to subsequent programs in the workflow. They may also read as well as write some data in persistent store, referred to simply as the database in this paper. As source input batches arrive continuously, several batches may be processed in the workflow simultaneously. In addition, some OLTP transactions, accessing the database, may also be executed concurrently in the workflow. Ensuring correctness of these concurrent executions is important.

Concurrency issues have been studied widely in the database context. The transaction concept has been extremely helpful to regulate as well as ensure the correctness of concurrent executions in database applications. Transactions are characterized by ACID properties: Atomicity, Consistency, Isolation and Durability. In stream processing, we take each execution of a program in the workflow as a transaction and then group several transactions into composite transactions, and study the appropriate ACID properties for them. The composite transactions need to be defined in a meaningful way. First, all the transactions that process a source input batch, both directly and transitively, can be considered to constitute a batch composite transaction, abbreviated as BCT. Secondly, composite transactions may correspond to continuous queries, abbreviated as CQs in this paper, that are evaluated in the workflow.

We illustrate these with the example shown in Figure 1. Here, the workflow is a sequence of three programs: P1 and P2, each processing every input batch individually, and P3, computing an aggregate value over three consecutive batches. Individual program executions give rise to transactions Ti,j. The set of transactions processing batch a3, that is, {T1,3, T2,3, T3,1, T3,2, T3,3}, will constitute a BCT. Interpreting the aggregate computation as a CQ, all the transactions contributing to, and including, T3,i, for a single i, would constitute a one-time execution of the CQ, abbreviated as 1-CQ. An example is the set of transactions contributing to T3,2, which is {T1,2, T1,3, T1,4, T2,2, T2,3, T2,4, T3,2}. Then, a CQ comprises several one-time executions. In this example, we assume that P2 accesses the database, represented as a relation R. The database may be updated periodically by OLTP transactions. Thus, several transactions and composite transactions are involved.

[Figure 1, panels (a) and (b): transactions T1,1 to T1,4, T2,1 to T2,4 and T3,1 to T3,3 over input batches a1 to a4, with intermediate batches a'1 to a'4 and b'i, outputs c1 to c3, and values r1 and r2 of relation R.]
Fig. 1. Aggregation over three batches

In this paper, we focus on CQs. There is a general agreement that each CQ normally spans over several batches of stream inputs. Apart from this, different notions of CQs exist in the literature, their 1-CQs ranging from just transactions (that is, single executions of programs) to composite transactions.
In this paper, we look at how CQs can be defined generically and propose a correctness criterion for concurrent executions of CQs.

There have been several studies on the application of the transaction concept in stream processing, including Botan et al. [1], Wang et al. [2], Meehan et al. [3], Gürgen et al. [4], Conway [5] and Oyamada et al. [6]. We elaborate on these approaches in the Related Work section. We start with core definitions of compositions, transactions and continuous queries in stream processing in Section 2. We describe a correctness criterion for concurrent executions in Section 3. We discuss related work in Section 4 and conclude in Section 5.

2. Executions

A stream processing workflow is a composition of programs. Formally, a composition C is (P, ≺p), where P is a set of transaction programs {P1, P2, . . . , Pn}, simply called programs, and ≺p is a partial order, called program order, among them. The partial order consists of dataflow order (of the streams) and control order. We call the (acyclic) graph representing the partial order the composition graph GC(C). Streams coming from outside the composition are called source streams. The output streams (of any program) are called derived streams. Each execution of a program yields a transaction. A transaction may have some stream and/or non-stream inputs, and may produce some stream and/or non-stream outputs. Non-stream data are assumed to be stored persistently in a database.

[Figure 2, panels (a) and (b): the example of Figure 1 with added programs P0 and PR, write-only transactions T0,1 to T0,4 producing batches a1 to a4, and transactions TR,1 and TR,2 writing values r1 and r2 of relation R, alongside T1,1 to T1,4, T2,1 to T2,4 and T3,1 to T3,3.]
Fig. 2. All transactions version

We use the example in Figure 1 to illustrate the definitions. The composition is a workflow consisting of a sequence of three programs P1, P2 and P3. Input batches are denoted by unprimed variables xi and the corresponding outputs by primed variables x'i. Stream inputs/outputs for P1, P2 and P3 are denoted by a, b and c, respectively. The sequence of input batches for P1 is a1, a2, . . . , and the executions are transactions T1,1, T1,2, . . . (the first index is that of the program and the second index is that of the input batch), producing the output sequence a'1, a'2, . . . .

Arrival of a source input batch may trigger an execution instance of the composition. Some of the programs in the composition may be executed. We have called the resulting composite transaction a batch composite transaction (BCT).

A composite transaction T is a partially ordered set of transactions ({T1, T2, . . . , Tm}, ≺t) which are, as already mentioned, executions of programs in a composition. It is possible that T has more than one execution of some Pj. We denote {T1, T2, . . . , Tm} as set(T). The partial order ≺t is called transaction order. It reflects the program partial order ≺p, that is, if Ti is an execution of Pj, Tk is an execution of Pl and Pj ≺p Pl, then Ti ≺t Tk. In addition, ≺t will contain triggering relationships, if any. The graph representing ≺t is called the transaction graph GT(T). The transaction graphs are acyclic.

Programs in the workflow may access data from the database. One way to implement this in stream processing is to convert database relations into streams, using relation-to-stream operators as discussed in Conway [5], and to use them as stream inputs in the execution of the program.
An alternative approach is discussed in Botan et al. [1]: both stream and non-stream data are treated the same way. Input batches from streams are said to be read by the program, and the output batches are written by the program. Thus, a program reads inputs from streams and/or the database and may write outputs in streams and/or the database. We use this model in this paper. Further, we associate a hypothetical program with each source input stream and consider each batch input in that stream as being written by a write-only transaction of that program. Thus, we can deal (only) with transactions, as in traditional databases. This is depicted in Figure 2, for the example in Figure 1. Transactions T0,i write the batches ai. Transaction TR,1 is assumed to write R1, which is used for T2,1, T2,2 and T2,3, and TR,2 is assumed to write R2, which is used for T2,4.

We identify the BCT for batch b as T(b). Suppose b is input to transaction Ti. Then we define T(b) as the union of {Ti} and all the transactions triggered directly or indirectly by Ti in the composition, with the corresponding partial order. In our example in Figure 2, T(a3) will have the set of transactions {T0,3, T1,3, T2,3, T3,1, T3,2, T3,3}.
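The definition of a BCT as a closure over triggering relationships can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: transactions are modelled as (program, index) pairs, the window size of three follows the running example, and the helper names `trigger_edges`, `bct` and `one_cq` are hypothetical names introduced here, not notation from the paper.

```python
from collections import deque

WINDOW = 3  # P3 aggregates over three consecutive batches, as in Figure 1


def trigger_edges(num_batches):
    """Build an assumed triggering relation for the Figure 2 workflow.

    (0, i) writes batch a_i, (1, i) and (2, i) process it, and
    (3, j) aggregates the batches j, j+1, ..., j+WINDOW-1.
    """
    edges = {}
    for i in range(1, num_batches + 1):
        edges[(0, i)] = [(1, i)]
        edges[(1, i)] = [(2, i)]
        # T2,i feeds every aggregate whose window contains batch i.
        edges[(2, i)] = [(3, j) for j in range(max(1, i - WINDOW + 1), i + 1)]
    return edges


def bct(start, edges):
    """Return `start` plus every transaction it triggers, directly or indirectly."""
    seen = {start}
    queue = deque([start])
    while queue:
        t = queue.popleft()
        for nxt in edges.get(t, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def one_cq(j, edges):
    """Return the transactions contributing to, and including, T3,j."""
    target = (3, j)
    return {t for t in edges if target in bct(t, edges)} | {target}


edges = trigger_edges(4)
print(sorted(bct((0, 3), edges)))
# [(0, 3), (1, 3), (2, 3), (3, 1), (3, 2), (3, 3)]
print(len(one_cq(2, edges)))
# 10
```

With four batches, `bct((0, 3), edges)` reproduces the transaction set given for T(a3), and `one_cq(2, edges)` yields the ten transactions that contribute to T3,2 once the write-only transactions T0,i are included.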
Stream input batches arrive in sequence, for example, as b1, b2, . . . . The batch order is denoted ≺b. The batch b2 and a few more batches may arrive before all the transactions in T(b1) are completely executed. Thus many BCTs may be executed concurrently. The programs in the workflow processing the batches should follow the batch order.

Several continuous queries may be evaluated in a composition. Continuous queries (CQs) consist of a sequence of one-time queries (1-CQs). (Although they are referred to as 'queries', we allow both reads and writes in their executions.) In the example of Figure 2, the CQ consists of the 1-CQs, each computing one aggregate value. The set of transactions involved in the second 1-CQ in the figure is {T0,2, T0,3, T0,4, T1,2, T1,3, T1,4, T2,2, T2,3, T2,4, T3,2}.

In an execution of a composition, there could be several CQs, several BCTs and several OLTP type transactions and composite transactions. In this paper, we use the term 'transaction' exclusively to denote some Ti; an unsubscripted T always denotes a composite transaction. A composite transaction is referred to as CT.

Let T be a set of transactions executed in a composition. We define an execution graph GE(T) as the graph with vertex set T and edges for the following:
• the transaction partial order ≺t of each CT that is defined over T;
• the serial order among the transactions of the same program, for each program in the workflow; and
• the conflict order among the transactions.

We want the execution graph to be acyclic. We assume in this paper that each program in the workflow is executed serially. This implies that each transaction in a composite transaction is executed atomically, akin to the assumption that each operation in a database transaction is executed atomically. In addition, we want to prescribe the atomicity requirements for the CTs.
The strict notion is consecutive serializability, that is, the execution should be equivalent to a serial execution where all the transactions in the CT occur consecutively. We refer to this as C-SER in this paper. Consecutive serializability of a CT amounts to requiring that contracting the subgraph induced by its transaction set into a single vertex does not create a directed cycle in the execution graph. In our example, the second 1-CQ has the transaction set {T0,2, T1,2, T2,2, T0,3, T1,3, T2,3, T0,4, T1,4, T2,4, T3,2}. In the execution graph, there will be a conflict edge from T2,3 to TR,2 and one from TR,2 to T2,4. These two edges will form a directed cycle on contraction of the associated subgraph.

Absence of this consecutive serializability requirement yields the saga notion. This will accept interleaving of TR,2 among the other transactions as above. We denote the atomicity requirement of a CT as ψ. Our contention is that:
♦ ψ should be defined individually for 1-CQs and CQs, and each can be either C-SER or saga.
(The statements related to the correctness criterion are labelled with ♦.) We refine this notion in the next section.

3. Correctness Criterion

In general, several CQs will be processed concurrently over the same stream inputs. A workflow will be designed to accommodate several CQs simultaneously. On arrival of new CQs, and similarly on removal of some current ones, the workflow will be re-arranged. In this paper, we consider only a simple workflow composed for a single CQ. We extend it to include a second CQ later.

We take the composition given in Figure 2 and consider several variants of CQs. We have programs P1, P2, P3 and PR in our schema. The program PR reflects the update of R. Initially, its value is written by TR,1. It is updated by TR,2 in between the executions of T2,3 and T2,4.
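The acyclicity requirement on the execution graph and the contraction test for consecutive serializability, both described at the end of Section 2, can be sketched in Python as follows. This is a minimal sketch under assumptions: the function names `has_cycle`, `contract` and `is_c_ser`, the string labels for transactions, the label "CT" for the contracted vertex, and the particular edge list (a plausible fragment of the execution graph around the second 1-CQ of Figure 2) are all introduced here for illustration.

```python
def has_cycle(vertices, edges):
    """Return True if the directed graph (vertices, edges) contains a cycle."""
    succ = {v: set() for v in vertices}
    for u, v in edges:
        succ[u].add(v)
    state = {v: 0 for v in vertices}  # 0 = unvisited, 1 = on DFS stack, 2 = done

    def visit(u):
        state[u] = 1
        for v in succ[u]:
            if state[v] == 1 or (state[v] == 0 and visit(v)):
                return True
        state[u] = 2
        return False

    return any(state[v] == 0 and visit(v) for v in vertices)


def contract(vertices, edges, group, label="CT"):
    """Merge all vertices of `group` into one vertex `label`, dropping self-loops."""
    def rename(v):
        return label if v in group else v

    new_vertices = {rename(v) for v in vertices}
    new_edges = {(rename(u), rename(v)) for u, v in edges if rename(u) != rename(v)}
    return new_vertices, new_edges


def is_c_ser(vertices, edges, group):
    """C-SER holds for `group` if contracting it leaves the graph acyclic."""
    return not has_cycle(*contract(vertices, edges, group))


# Second 1-CQ of Figure 2 plus the update transaction TR,2 (assumed edge set).
GAMMA2 = {f"T{p},{i}" for p in (0, 1, 2) for i in (2, 3, 4)} | {"T3,2"}
V = GAMMA2 | {"TR,2"}
E = set()
for i in (2, 3, 4):  # transaction order within each batch, ending at T3,2
    E |= {(f"T0,{i}", f"T1,{i}"), (f"T1,{i}", f"T2,{i}"), (f"T2,{i}", "T3,2")}
for p in (0, 1, 2):  # serial order among transactions of the same program
    E |= {(f"T{p},2", f"T{p},3"), (f"T{p},3", f"T{p},4")}
E |= {("T2,3", "TR,2"), ("TR,2", "T2,4")}  # conflict edges on R

print(has_cycle(V, E))         # False: the execution graph itself is acyclic
print(is_c_ser(V, E, GAMMA2))  # False: contraction creates a cycle through TR,2
```

The execution graph is acyclic, so the interleaving is acceptable under the saga notion, but contracting the 1-CQ yields edges in both directions between the contracted vertex and TR,2, so the 1-CQ is not consecutively serializable.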
We define the consistency requirement χ for 1-CQs and CQs, with respect to the values of R, as same-R or different-R, referring to whether the values of R used in the computation should be the same or could be different, respectively. The consistency requirement depends on the application. An example where a changed database value leads to inconsistency is monitoring temperatures from sensors, arriving as stream data, while the unit of measurement stored in the database is changed from Celsius to Fahrenheit. An example where changes in values must be considered is computing the chill factor with wind velocity ai arriving as stream data and the current temperature stored in R.

We denote a CQ by Γ and a 1-CQ by γ. The variants are with respect to executions of P3.

Ex-1. Program P3 computes an aggregate over single source input batches, and a CQ consists of a single 1-CQ.
We denote the CQs as Γi, for executions over the source input batch ai, and the 1-CQ as γi,1. The set of transactions in the 1-CQ γi,1, set(γi,1), is {T0,i, T1,i, T2,i, T3,i}. For χ being either same-R or different-R for both 1-CQ and CQ, the appropriate ψ value is C-SER for each 1-CQ and CQ.

Ex-2. Program P3 computes an aggregate over single source input batches, and a CQ consists of a sequence of 1-CQs executed over single source input batches. Here, we have only one CQ, denoted Γ. Again, representing the 1-CQ executed with batch ai as γi, set(γi) is {T0,i, T1,i, T2,i, T3,i}. Here also, ψ is C-SER for the 1-CQs for either value of χ. For the CQ, if χ(Γ) is same-R, then we need to split the CQ into segments each of which is consistent. They are Γ1 and Γ2, consisting of {γ1, γ2, γ3} and {γ4}, respectively. The first segment uses R1, and the second R2, as the value of R. For each CQ segment, ψ is C-SER. If χ(Γ) is different-R, then ψ(Γ) is saga, and all the four 1-CQs can be in the same CQ segment. Thus:
♦ We split a CQ into segments, and apply the atomicity requirement ψ to the CQ segments, namely, to the 1-CQs in the individual segments.
♦ For the collection of segments, we always have saga as the atomicity requirement.

Ex-3. Program P3 computes an aggregate over independent sets of three consecutive source input batches, and a CQ consists of sequences of such 1-CQs. Here, the first 1-CQ will be executed over batches {a1, a2, a3} and the second over {a4, a5, a6}. For same-R consistency, ψ is C-SER for the 1-CQs. For different-R consistency, ψ is saga; in this case, a 1-CQ executed over the batches {a2, a3, a4} will also be consistent. If the CQ has different-R consistency, both the above 1-CQs can be in the same segment. That is, the CQ is a saga, allowing for update of R in between the two 1-CQs. For same-R consistency, one segment of the CQ must end with the first 1-CQ, and another will start with the second 1-CQ.
Here, ψ for the segments is C-SER.

Ex-4. We now consider the original example, that is, P3 computes aggregates over every set of three consecutive source input batches. Each computation forms a 1-CQ and the CQ consists of sequences of such 1-CQs. We denote the 1-CQ over the first three batches as γ1, the second three batches as γ2, etc. For same-R consistency of 1-CQs, γ1 is consistent, and γ2 and γ3 are not, and the inconsistent 1-CQs should not be in any CQ segment. If the R value remains as R2 for the entire computation of γ4 (not shown in the figure), it will be consistent. Then, γ1 and γ4 can be in the same CQ segment if the CQ has different-R consistency, and they will be in two different segments for same-R consistency; the ψ value will be saga and C-SER, respectively, for the CQ segments.

For different-R consistency of 1-CQs, ψ is saga for the 1-CQs, and all the 1-CQs are consistent. In this case, if χ for the CQ is different-R and hence ψ is saga, then all the four 1-CQs can be in the same CQ segment. Here, same-R for χ and hence C-SER for ψ appears meaningless for the CQ segment. This is because the adjacent 1-CQs are overlapping, and the update of R is interleaved with the transactions of these 1-CQs taken together.

To deal with this, we introduce an intermediate grouping of 1-CQs: groups of adjoining intersecting 1-CQs are treated as clusters. We define:
♦ A 1-CQ cluster as a maximal sequence of 1-CQs in which any two consecutive 1-CQs have some transactions in common.
Therefore, no two clusters will have any transactions in common. Now, a CQ segment is a sequence of 1-CQ clusters. Thus, all the three 1-CQs will be in the same cluster. Now ψ for CQ segments will apply to the 1-CQ clusters in each segment. In the current case, with only one 1-CQ cluster in the segment, we can prescribe ψ as C-SER for the CQ segment.
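The same-R test for 1-CQs and the cluster definition above can be sketched in Python as follows. The sketch rests on assumptions that go beyond the text: 1-CQs are represented as sets of (program, batch) pairs, the workflow of Figure 2 is extended to six batches so that γ4 exists, every T2,i with i ≤ 3 is taken to read R1 and every later one R2, and the names `gamma`, `same_r_consistent` and `clusters` are hypothetical.

```python
def gamma(j, window=3):
    """Transaction set of the 1-CQ over batches j..j+window-1 (assumed encoding)."""
    batches = range(j, j + window)
    return {(p, i) for p in (0, 1, 2) for i in batches} | {(3, j)}


def same_r_consistent(one_cq, r_version):
    """A 1-CQ is same-R consistent if its transactions read a single version of R."""
    versions = {r_version[t] for t in one_cq if t in r_version}
    return len(versions) <= 1


def clusters(one_cqs):
    """Split a sequence of 1-CQs into maximal runs in which
    any two consecutive 1-CQs share at least one transaction."""
    result = []
    for q in one_cqs:
        if result and result[-1][-1] & q:
            result[-1].append(q)  # overlaps the previous 1-CQ: same cluster
        else:
            result.append([q])    # no overlap: start a new cluster
    return result


# Assumed versions of R read by the executions of P2: TR,2 runs between T2,3 and T2,4.
r_version = {(2, i): (1 if i <= 3 else 2) for i in range(1, 7)}
gammas = [gamma(j) for j in range(1, 5)]  # gamma_1 .. gamma_4

print([same_r_consistent(g, r_version) for g in gammas])
# [True, False, False, True]
print(len(clusters(gammas)))
# 1
print(len(clusters([gammas[0], gammas[3]])))
# 2
```

Under different-R consistency all four overlapping 1-CQs fall into one cluster, while under same-R consistency only γ1 and γ4 survive and, having no transactions in common, form two separate clusters.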
As another example, with the current composition, if TR,2 is executed between T2,4 and T2,5, then for same-R consistency of the 1-CQs, γ1 and γ2 will be consistent and will form a cluster. The next consistent 1-CQ will be γ5 (assuming that R is not updated in the meantime), and all the 1-CQs that use the R2 value of R will be in the same cluster. Then, for different-R consistency of the CQ segments, these two (or more) 1-CQ clusters will be in the same CQ segment, and for same-R consistency, these two clusters will be in different CQ segments. It seems appropriate that:
♦ ψ for a 1-CQ cluster be the same as that for the 1-CQs it contains.
♦ The serializability requirement of C-SER or saga is with respect to the 1-CQ clusters in the segments: saga allows execution of other transactions in between the clusters whereas C-SER does not.

We give further examples in Ex-5 below. We propose the following correctness criterion.

Definition: A concurrent execution of CQs is correct if the execution graph (i) is acyclic and (ii) satisfies the atomicity requirements ψ of (a) each 1-CQ cluster and (b) each CQ segment.

[Figure 3: write-only transactions T0,1 to T0,6 producing batches a1 to a6, transactions T1,1 to T1,6 reading values R1 and R2, transactions T2,1 to T2,4 reading values R'1 and R'2, and transactions T3,1 to T3,3.]
Fig. 3. Two aggregates

Ex-5. Figure 3 describes two aggregations, the first, CQ1, over every three consecutive source input batches as in Ex-4, and the second, CQ2, over two of the first aggregates. The program P2 accesses relation R, and the program P3 accesses relation R'. The initial value of R is R1, written by TR,1, and it is updated to R2 by TR,2 between T1,3 and T1,4. The initial value of R' is R'1, written by TR',1, and it is updated to R'2 by TR',2 between T2,2 and T2,3. The relation update transactions are not shown in the figure to reduce clutter.

We denote CQ1 as Γ1 and CQ2 as Γ2. Segments of Γi will be Γi,j, for different j's. The 1-CQs of Γi will be represented as γi,j, and a cluster of γi,j to γi,k as γi,j−k.

As before, we can observe that the ψ value C-SER implies same-R consistency and does not allow interleaving of the relation R update among the transactions of the CT, whereas saga allows it. The same property applies to the update of R' in the consideration of CQ2.

5.1. ψ(γ1) is saga. Then every 1-CQ1 in the figure is consistent, the four 1-CQ1s given in the figure form a 1-CQ1 cluster γ1,1−4 and this forms a CQ1 segment. Because of the single element, a ψ value of C-SER as well as saga applies to the CQ1 segments. Two subcases arise with respect to CQ2:
(a) ψ(γ2) is saga. Then, every 1-CQ2 in the figure is consistent. We have a cluster with all the three 1-CQ2s given in the figure. This cluster constitutes a CQ2 segment. Again, ψ for this can be either C-SER or saga.
(b) ψ(γ2) is C-SER. Then, the second 1-CQ2 in the figure is not consistent. It cannot be in any CQ2 segment. Therefore, one segment will have only γ2,1 and the next segment will start with γ2,3. Here also, each CQ2 segment has only one element and so ψ(Γ2) could be either C-SER or saga.
5.2. ψ(γ1) is C-SER. Then, only γ1,1 and γ1,4 are consistent. These two 1-CQ1s are available for the second aggregation by P3. Two subcases arise:
(a) ψ(γ2) is saga. Then, the above two 1-CQ1s can be aggregated to get a 1-CQ2, which will form a CQ2 segment, with ψ either C-SER or saga.
(b) ψ(γ2) is C-SER. Then the above two 1-CQ1s cannot be aggregated for 1-CQ2.

4. Related Work

Transactional properties for continuous queries in stream processing have been discussed in several papers. The unit of atomicity in Conway [5] is just a transaction, that is, execution of a single program in the composition. The model is called window isolation, stated as, in the context of mixed join, “during the computation of a single window’s worth of results, the mixed join accesses an immutable, consistent snapshot of all the relations in the system”. The same is true in Meehan et al. [3]. However, they do consider nested transactions, which are CTs.

A unified transaction model, called UTM, is proposed in Botan et al. [1]. It treats stream data and database data the same way, using reads and writes. Here also, each program execution is a transaction which is the unit of atomicity.

Updating the database as part of a program execution is also done in Wang et al. [2], in the context of processing over event streams, the update being triggered by active rules. They also define stream ACID properties for transactions: s-Atomicity, s-Consistency, s-Isolation and s-Durability. The s-Atomicity notion requires that “all operations stimulated by a single input event should occur in their entirety”.

In Oyamada et al. [6], read-only parts of CQs and write-only database updates are considered. A concurrency control mechanism is also given. When an inconsistent execution of a 1-CQ is detected, appropriate upstream batches are reprocessed.
Referring to our example in Figure 2, for same-R consistency, for the execution of T3,2, T2,2 and T2,3 will be re-executed with R2 instead of R1, but with the same stream inputs a2 and a3, respectively. This will guarantee consistency of the resulting γ2. This approach will create problems if the CQs can also write in the database. In that case, two executions of P2, for each of a2 and a3, will write in the database. Apart from re-executions, other options may be available. In the same example, since the inconsistent 1-CQs do not contribute to any CQ instance, they need not be computed. This amounts to not executing T3,2 and T3,3. Another option, if the semantics accepts aggregation even over one or two source input batches, is to execute T3,2 with just b2 and b3, and similarly, T3,3 with just b3.

In Gürgen et al. [4], a temporally nested transaction model is given for CQs. CQs are decomposed into 1-CQs, and the unit of atomicity is a single transaction whose consistent execution is discussed.

The paper by Meehan et al. [3] introduces atomic batches and batch composite transactions (though not using the term BCT). The BCTs are studied in the context of batches being split, merged and overlapped in executions in Vidyasankar [7] and Vidyasankar [8]. Updates of the database by OLTP transactions are not considered in them.

5. Conclusion

Continuous queries form an integral part of stream processing. Several CQs may be involved and each may be over several stream input batches. Applying a single overall transactional property for the entire execution will be difficult. In this paper, we have proposed applying transactional properties piecemeal to each CQ. We have decomposed a CQ into segments, each consisting of 1-CQ clusters, each of which comprises one or more adjoining 1-CQs. We find that the cluster notion is helpful since computations are shared between several consecutive 1-CQs when batches are merged or overlapped in the execution.
The cluster notion is akin to the atomic batch set notion in Vidyasankar [7]; it refers to a set of batches that are processed in isolation. We have shown that the atomicity requirement (consecutive serializability or saga) needs to be prescribed separately for 1-CQ clusters and CQ segments, and each could have a different property. We believe this proposal is new.

We have used (single) transactions that update the database to bring out the distinctions. We can generalize this to composite (for example, OLTP) transactions. Then we come across the notion of compatible transactions of Garcia-Molina [9], applied to CTs, namely: (i) two compatible CTs can interleave arbitrarily, whereas incompatible ones need to be serialized, and (ii) a CT may be compatible with some CT but incompatible with some other CT.
We have exhibited some inconsistent 1-CQs and CQs. We should design concurrency control mechanisms that could detect and perhaps avoid inconsistent 1-CQ and CQ computations. In some applications, inconsistent executions cannot be avoided, and they should simply be ignored. For example, if an aggregate is meaningful only when taken over four source input batches, then 1-CQs executed until the fourth batch arrives will not be useful. The same situation will occur in an execution that is started after recovery from some failure.

Acknowledgment

This research is supported in part by the Natural Sciences and Engineering Research Council of Canada Discovery Grant 3182.

References

1. Botan, I., Fischer, P.M., Kossmann, D., Tatbul, N. Transactional stream processing. In: Proceedings of the 15th International Conference on Extending Database Technology. EDBT ’12; New York, NY, USA: ACM. ISBN 978-1-4503-0790-1; 2012:204–215. URL: http://doi.acm.org/10.1145/2247596.2247622. doi:10.1145/2247596.2247622.
2. Wang, D., Rundensteiner, E.A., Ellison III, R.T. Active complex event processing over event streams. In: Proceedings of the VLDB Endowment. ACM Press; 2011:634–645.
3. Meehan, J., Tatbul, N., Zdonik, S., Aslantas, C., Cetintemel, U., Du, J., Kraska, T., Madden, S., Maier, D., Pavlo, A., Stonebraker, M., Tufte, K., Wang, H. S-store: Streaming meets transaction processing. Proc VLDB Endow 2015;8(13):2134–2145.
4. Gürgen, L., Roncancio, C., Labbé, S., Olive, V. Transactional issues in sensor data management. In: Proceedings of the 3rd International Workshop on Data Management for Sensor Networks (DMSN’06), Seoul, South Korea. 2006:27–32.
5. Conway, N. Transactions and data stream processing. In: Online Publication. http://neilconway.org/docs/stream txn.pdf; 2008:1–28.
6. Oyamada, M., Kawashima, H., Kitagawa, H. Continuous query processing with concurrency control: Reading updatable resources consistently. In: Proceedings of the 28th Annual ACM Symposium on Applied Computing. SAC ’13; New York, NY, USA: ACM. ISBN 978-1-4503-1656-9; 2013:788–794. URL: http://doi.acm.org/10.1145/2480362.2480514. doi:10.1145/2480362.2480514.
7. Vidyasankar, K. On atomic batch executions in stream processing. In: Procedia Computer Science. Elsevier; 2016:72–79. doi:10.1016/j.procs.2016.09.013.
8. Vidyasankar, K. Transactional composition of executions in stream processing. In: 27th International Workshop on Database and Expert Systems Applications (DEXA 2016). IEEE Computer Society Conference Publishing Services; 2016:114–118. doi:10.1109/DEXA.2016.26.
9. Garcia-Molina, H. Using semantic knowledge for transaction processing in a distributed database. ACM Trans Database Syst 1983;8(2):186–213. URL: http://doi.acm.org/10.1145/319983.319985. doi:10.1145/319983.319985.