2009 International Conference of Soft Computing and Pattern Recognition
Rule Modeling Engine for Optimizing Complex Event Processing Patterns
Babak Behravesh, Siti Mariyam Shamsuddin, and Anazida Zainal
Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, 81310 Skudai, Malaysia
[email protected],
[email protected], and
[email protected]
each pair of state and input symbol there may be several possible next states. This distinguishes it from the deterministic finite automaton (DFA), where the next state is uniquely determined.
To extract meaningful rules from raw events, an algorithm is required. In computer science and data mining, Apriori is a classic algorithm for learning association rules. It is designed to operate on databases containing transactions (for example, collections of items bought by customers, or the details of visits to a website). Other algorithms are designed for finding association rules in data with no transactions or no timestamps (e.g., DNA sequencing) [13, 31, 32]. As is common in association rule mining, given a set of itemsets (for instance, sets of retail transactions, each listing the individual items purchased), the algorithm attempts to find subsets that are common to at least a minimum number C of the itemsets (the cutoff, or support threshold). A-priori uses a "bottom up" approach, where frequent subsets are extended one item at a time and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found. A-priori uses breadth-first search and a tree structure to count candidate itemsets efficiently. It generates candidate itemsets of length k from frequent itemsets of length k − 1, and then prunes the candidates that have an infrequent sub-pattern. By the downward closure lemma, the candidate set contains all frequent itemsets of length k. It then scans the transaction database to determine which candidates are frequent. A-priori, while historically significant, suffers from a number of inefficiencies or trade-offs, which have spawned other algorithms. Candidate generation produces large numbers of subsets, and bottom-up subset exploration (essentially a breadth-first traversal of the subset lattice) finds any maximal subset S only after all $2^{|S|} - 1$ of its proper subsets [13].
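The level-wise procedure described above (generate candidates of length k from frequent itemsets of length k − 1, prune by downward closure, then scan the transactions) can be sketched as follows. This is a minimal illustration and not the implementation used in the paper; the function name `apriori`, the parameter `min_support`, and the sample baskets are assumptions introduced only for this example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Scan the transactions to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

# Hypothetical retail baskets, used only to exercise the sketch.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
frequent_itemsets = apriori(baskets, min_support=2)
```

With a support threshold of 2, every single item and every pair is frequent in these baskets, while the triple {bread, butter, milk} occurs only once and is rejected after the database scan.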
Abstract- Complex Event Processing (CEP) is concerned with searching a sequence of incoming events for a specified pattern. CEP is widely used in today's enterprises. It can act on sent and/or received events, and its results can generate further events that are used in different layers of an enterprise system. The growing number of areas that deal with event streams, such as Business Activity Monitoring (BAM), fraud detection, and intrusion detection, makes CEP a hot topic for researchers. This paper addresses the generation of efficient, high-performance patterns. A pattern can be built from any query given by the user. The user-defined query is written in CQL (Continuous Query Language), which is suited to time series data. An NFA (Nondeterministic Finite Automaton) is used for modeling patterns, although it has some defects, which are addressed. The focus of this paper is on developing a rule modeling engine that takes into account the role of historical data in building efficient patterns. We developed algorithms for each component of the proposed model. The results are optimized patterns produced from historical data and user queries. Finally, we show that these techniques can be efficient when dealing with high-volume event-based data.
Keywords- Data Mining, Association Rules, A-priori algorithm, Event streams, Pattern matching, Query optimization.
I. INTRODUCTION
Complex Event Processing (CEP) is primarily an event processing concept that deals with the task of processing multiple events with the goal of identifying the meaningful events within the event cloud. CEP employs techniques such as detection of complex patterns of many events, event correlation and abstraction, event hierarchies, and relationships between events such as causality, membership, and timing, and event-driven processes. CEP ultimately creates complex events even if some or all of the source events are simple events [5, 8]. In this paper Non-deterministic Finite Automaton is used for modeling CEP patterns. In the theory of computation, a Non-deterministic finite state machine or Non-deterministic Finite Automaton (NFA) is a finite state machine where for
978-0-7695-3879-2/09 $26.00 © 2009 IEEE DOI 10.1109/SoCPaR.2009.37
A. Events types and Event processing
An event is an action that can affect a system. A given object is able to react to a set of events; an event has happened whether or not the object reacts to it. Events can be regarded as tuples in a database with attributes such as trigger time, duration, and source. The trigger time and the tag ID reported by the reader are two of the key attributes of events. Fig. 1 and fig. 2 illustrate the different event types and their relationships.
An event pattern contains event templates, relational operators, and variables. An event pattern can match sets of related events by replacing variables with values. Examples:
i. A pattern of events defining the sets of events in a completed sales transaction.
ii. A pattern of events in an email correspondence: String Msg, Time T1, T2; Send(John, Msg, T1) and Receive(John, Msg, T2).
iii. A pattern defining the events in any successfully resolved customer complaint: Customer C, Agent A, Problem P, Time T1, T2, T3; Complain(C, P, T1) → Engage(A, C, T2) → Resolved(P, T3).
Event patterns can be specified graphically [1, 2, 6, 8]. The remaining gap is that the role of historical data in building patterns has not been addressed [1, 6, 7, 10, 18]. Fig. 3 shows the model that earlier researchers used to build event patterns.
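The idea that a pattern matches a set of related events by replacing its variables with values can be sketched as follows, using the email correspondence example. The representation of events as (type, arguments) tuples and the names `Var`, `match_template`, and `match_pattern` are assumptions made for this illustration; the paper does not prescribe a data structure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """A template variable such as Msg or T1."""
    name: str

def match_template(template, event, bindings):
    """Match one event against one template.

    Returns the extended bindings, or None when the event does not fit.
    """
    (t_type, t_args), (e_type, e_args) = template, event
    if t_type != e_type or len(t_args) != len(e_args):
        return None
    new = dict(bindings)
    for param, value in zip(t_args, e_args):
        if isinstance(param, Var):
            if new.setdefault(param.name, value) != value:
                return None  # variable already bound to a different value
        elif param != value:
            return None      # constant parameter mismatch
    return new

def match_pattern(templates, events, bindings=None):
    """Yield every binding under which all templates match some event."""
    bindings = {} if bindings is None else bindings
    if not templates:
        yield bindings
        return
    for event in events:
        extended = match_template(templates[0], event, bindings)
        if extended is not None:
            yield from match_pattern(templates[1:], events, extended)

Msg, T1, T2 = Var("Msg"), Var("T1"), Var("T2")
# Send(John, Msg, T1) and Receive(John, Msg, T2)
email_pattern = [("Send", ("John", Msg, T1)), ("Receive", ("John", Msg, T2))]
# Hypothetical events, used only to exercise the sketch.
events = [
    ("Send", ("John", "hello", 1)),
    ("Receive", ("John", "hello", 4)),
    ("Receive", ("Mary", "hello", 5)),
    ("Receive", ("John", "bye", 6)),
]
matches = list(match_pattern(email_pattern, events))
```

Only the first two events are related through a shared value of Msg, so the pattern yields a single binding: Msg = "hello", T1 = 1, T2 = 4. The sketch ignores temporal ordering between templates, which a sequence operator would add.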
Figure 1. Events types (diagram labels: Events, Simple, Complex)
Before defining complex events, the notion of abstraction is needed. An event is an abstraction of a set of events if it summarizes, represents, or denotes that set of events [5]. A complex event is an event that is an abstraction of other events, called its members; e.g., the 2008 stock market crash is an abstraction denoting many thousands of member events, including individual stock trades [8]. Event processing is computing that performs operations on events, including reading, creating, transforming, and deleting events. A derived event is an event that is generated as a result of applying a method or process to one or more other events. A composite event is a kind of derived event created by combining base events using a specific set of event constructors such as disjunction, conjunction, and sequence. A composite event always includes the base (member) events from which it is derived [12].
B. Event template and event pattern
Events are related by time, causality, abstraction, and other relationships. Time and causality impose partial orderings upon events. Two points should be noted. Firstly, regarding the relationships among composite, derived, and complex events: a composite event or a derived event is a complex event [8], but the converse is not necessarily true. Secondly, the term aggregate event is sometimes used for some forms of composite or derived event. An event template is an event form or descriptor, some of whose parameters are variables. An event template matches single events by replacing the variables with values, e.g., a Send of any message: String Msg; Send(John, Msg).
II. BACKGROUND
Recently, CEP has become a hot topic because of its broad use in enterprises. Several academic research teams work on CEP optimization, including SASE and SASE+ (UC Berkeley/UMass Amherst) and Cayuga (Cornell University) [6, 7, 18]. According to fig. 3, the general idea is to have a nondeterministic finite automaton (NFA), which is simulated while newly mined rules arrive. Ultimately, the NFA model can use mined rules to generate event patterns appropriate for an individual problem. Because the automaton is nondeterministic, it can be in multiple states at the same time, which is determined by the selected strategy [1, 2]. Every time the automaton arrives at an accept state, a complex event is detected and can therefore be constructed. Although researchers have worked on how to use NFA to build CEP patterns [1, 2, 3, 4, 6, 7, 9, 18], an efficient rule-based technique for optimizing CEP patterns over NFA has not yet been defined [1, 6, 7, 10, 18]. NFA also has some defects in supporting out-of-order strings [13]. Some recent research papers have proposed a buffer for keeping match cases [1, 2].
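The NFA simulation discussed in this section (several states active at the same time, with a complex event detected whenever an accept state is reached) can be sketched as follows. The transition table, the symbol names, and the functions `step` and `detect` are hypothetical and serve only to illustrate the mechanism; they are not taken from SASE, Cayuga, or the paper's own engine.

```python
def step(transitions, active, symbol):
    """Advance every active state on one input symbol."""
    nxt = set()
    for state in active:
        nxt |= transitions.get((state, symbol), set())
    return nxt

def detect(transitions, start, accepting, stream):
    """Return the stream positions at which an accept state is reached."""
    active = {start}
    hits = []
    for i, symbol in enumerate(stream):
        active = step(transitions, active, symbol)
        if active & accepting:
            hits.append(i)  # a complex event is detected here
    return hits

# Hypothetical pattern: a "shelf" reading eventually followed by "exit".
# State 0 loops on every symbol, so on "shelf" the automaton is in both
# state 0 and state 1 afterwards; this is the nondeterministic choice.
transitions = {
    (0, "shelf"): {0, 1},
    (0, "cashier"): {0},
    (0, "exit"): {0},
    (1, "shelf"): {1},
    (1, "cashier"): {1},
    (1, "exit"): {2},
}
hits = detect(
    transitions,
    start=0,
    accepting={2},
    stream=["shelf", "cashier", "shelf", "exit", "exit"],
)
```

In this run the accept state is reached only at position 3 (the first "exit"); the second "exit" finds no pending match because state 2 has no outgoing transitions.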
Figure 2. Complex event types (diagram labels: Derived Events, Composite Events)
Figure 3. A model for making event pattern (diagram labels: RFID Readers, Events, User, Query, NFA Modeler, Modeled Event Pattern)
Earlier researchers have applied NFA as a tool for modeling [6, 7, 15, 19], pattern matching [18, 20], and fuzzification [16, 17]. CEP developers, in turn, have used it for modeling [21, 23, 24] and pattern matching [6, 7, 22, 25]. The A-priori algorithm has been employed in rule mining [26, 27, 28, 29, 30, 31, 32] and modeling [33, 34, 35, 36]. In this paper, a new technique is presented for optimizing CEP patterns. It reduces system overload by optimizing the NFA model generated over mined rules. We propose an algorithm, the Enhanced A-priori Algorithm (EAA), which is inspired by the A-priori algorithm and plays an important role in the optimization process.
Phase 1. Investigating the Association Rules formalism over CEP
Phase 2. Investigating the NFA formalism over Association Rules
Phase 3. Developing a new algorithm to generate Association Rules given query and raw events
III. THE PROPOSED METHOD
This section explains the methodology used to build an efficient rule engine for complex event processing. The phases of the research are presented with brief descriptions in fig. 4, and each phase is then described in more detail. For the more complicated phases, the input, output, and process are stated explicitly.
Phase 4. Developing a new algorithm to model Association Rules using NFA
Phase 5. Generating Complex Event Patterns Using modeled rules
A. Investigating the Association Rule formalism over CEP - Phase 1
In this step, a survey and investigation is carried out to show how the Association Rules formalism can be used over CEP. The specifications of the most appropriate algorithm for this goal are investigated, and we then examine whether that algorithm is expressive enough for this case study. Section 1 introduces CEP in depth and shows how it works; a further subsection discusses the strength of Association Rules in condensing high-volume data.
Phase 6. Developing a program for proposed algorithms
Phase 7. Implementing the program for the mentioned case study
Figure 4. The main phases of research
D. Developing a new algorithm to model Association Rules using NFA to generate patterns - Phases 4 and 5
In this section, a new algorithm is proposed to model Association Rules with the NFA model. The input of this stage is the Association Rules generated in 3.3. The expected output is the extracted rules, all mapped onto the NFA model. The input is processed using the appropriate algorithm investigated in 3.2. The rules modeled over NFA then have to be formalized as Complex Event Patterns. In validating the algorithm, quality attributes such as expressibility have to be considered. The validation is based on the method used in the computer program discussed in section 4.
B. Investigating the NFA formalism over Association Rules - Phase 2
This step deals with how the NFA model can be used to formalize Association Rules. The solutions proposed by other researchers are considered in order to find the most efficient algorithm; the more closely a previous algorithm matches the case study, the better. In section 1, NFA was defined and its strength in modeling nondeterminism and evaluating systems was shown.
C. Developing a new algorithm to generate Association Rules given query and raw events - Phase 3
In this section, a new algorithm is developed. Incoming events are the input, and the extracted Association Rules are the expected output. The input events are processed using the appropriate algorithm nominated in 3.1. In validating the algorithm, quality attributes such as expressibility have to be considered. The validation is based on the method used in the computer program discussed in section 5.
E. Developing the program for the case study - Phases 6 and 7
A computer program is provided to implement the algorithms. Random events start to arise when the program is run, and they can continue indefinitely. To generate Association Rules, a query and raw events are needed. The result is a set of patterns.
The result shows three important points. Firstly, it shows the compatibility of events with patterns. Secondly, the system can take an appropriate action when it reaches a final state. Thirdly, it continues to run by clearing the buffer and jumping back to the first step. The output can be raised as a warning, a critical alert, or a text message suggesting a decision to be taken by the person in charge.
III. CASE STUDY, RESULTS AND ANALYSIS
In this stage, we illustrate and examine our findings from section 4. The case study is a real, event-based example, although simulated data is used; a shopping mall is a good candidate for generating event streams. A complex event is made up of many constituent events. We may be able to prune extra predicates on some transition lines: a transition line may carry predicates that are never checked, so they are extra and can be pruned without any effect on the final result. Pruning these predicates increases system performance by reducing the number of predicates that have to be checked. Moreover, there may be extra states that can be pruned without any effect on the final results. Pruning extra predicates or extra states can reduce system overload in terms of allocated memory and CPU time. Before going into details, a model is needed. Several models are introduced in order to give a better understanding of the advantages of the optimized model. The description of the last model, fig. 7, can with some changes easily be used to describe the two earlier models, which are shown in fig. 5 and fig. 6, respectively.
Figure 6. 2nd proposed model for modeling system activities (diagram labels include: Model, Event, RFID Devices)
The second model, fig. 6, has the following shortcomings:
• Historical data will be lost.
• Revision is needed, but how it affects the modeling process is ambiguous.
• Incoming events are not involved in modeling.
If we investigate another model by changing the place of the RFID Devices and their arrow so that they are connected to the NFA Model, we have the same defects as above.
B. Developing a model to construct Rule modeling Engine
Fig. 7 shows an optimized model for a rule generator. RFID devices send events as a set of discrete tuples to the storage, where they are saved. The user, in turn, can send queries to the rule generator. In order to generate rules, the relevant event tuples have to be retrieved from storage: the rule generator sends a request to the storage for the events relevant to the query given by the user, and can then generate rules. The key point is that, by taking historical data into account, we can modify different parts of the query, including the SEQUENCE and WHERE clauses. Mined rules go to the NFA modeler to be modeled. NFA is appropriate for modeling because it is able to express uncertainty [1, 2] about events. The NFA modeler models the rules as event patterns. From then on, events can be sent directly to the event pattern. Event patterns can be generated while the system is working or while it is idle. An overall flowchart of the optimization process is drawn in fig. 8.
A. Analysis on some of the proposed models
Before developing the efficient model, we proposed some other models. Firstly, we briefly investigate two models (fig. 5 and fig. 6) and list their defects. The description given for the third model, fig. 7, can be used for both earlier models with small changes. The first model, fig. 5, has the following shortcomings:
• There is an ambiguity in generating the Optimized Query with respect to the properties of the system.
• Historical data will be lost.
• Revision is needed, but how it affects the modeling process is ambiguous.
• Events are not involved in modeling.
Figure 7: 3rd proposed model for modeling system activities (diagram labels: RFID Readers, Events, Storage, Relevant Events, Request for Relevant Events, Rule Generator, User, Query, Mined Rules, NFA Modeler, Modeled Rules, Event Pattern, Rule modeling Engine (White Box))
Figure 5. 1st proposed model for modeling system activities (diagram labels: User, Query, NFA Modeler, Revise, Model, Analyze, Event, Event Pattern, RFID Devices, Properties of the system)
Figure 9: A flowchart for generating mined rules (boxes: Start; Is there any query which has not been checked?; Is there any match case respect to Sequence part of query?; Delete the query from the list of queries subjected to make patterns; Is there any predicate makes no limitation on data returned by Query?; Delete the predicate from the list of predicates for this query; Finish)
Flowchart boxes of fig. 8: Start; Query given by user; Rule modeling engine asks is there any suitable event pattern?; Request for relevant event tuples from storage respect to the query given by user; Sending back relevant event tuples from storage to the rule generator; Generating Mined Rules from relevant event tuples by Rule generator; To model mined rules using NFA modeler; Generating event patterns using modeled rules; Match the case on the pattern; End
of itemsets and uses a candidate generation function which exploits the downward closure property of support. In this paper, we were inspired by the A-priori algorithm for extracting association rules from incoming events. The steps are:
a. Check if there is any query which has not been checked; if so, go to step b, otherwise go to End.
b. Check if there is any match case with respect to the Sequence part of the query; if so, go to step c, otherwise delete the query from the list of queries subject to pattern generation.
c. Check if there is any predicate that places no limitation on the data returned by the query. If there is, delete the predicate from the list of predicates for this query; otherwise go to step a.
The flowchart in fig. 9 may help in understanding EAA.
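Steps a to c can be sketched as a pair of loops over the queries and their predicates. This is only one possible reading of the steps, not the authors' EAA implementation: the query representation (a dict with `name`, `sequence`, and `predicates`), the history tuples, and the test for a predicate that "places no limitation" (removing it leaves the result on the historical data unchanged) are all assumptions made for this example.

```python
def sequence_matches(history, sequence):
    """Return the tags whose readings visit `sequence` in time order."""
    by_tag = {}
    for tag, location, time in sorted(history, key=lambda e: e[2]):
        by_tag.setdefault(tag, []).append(location)
    matches = []
    for tag, locations in by_tag.items():
        it = iter(locations)
        if all(step in it for step in sequence):  # ordered subsequence test
            matches.append(tag)
    return matches

def prune_queries(queries, history):
    optimized = []
    for query in queries:                       # step a: next unchecked query
        candidates = sequence_matches(history, query["sequence"])
        if not candidates:                      # step b: no match, drop query
            continue
        kept = dict(query["predicates"])
        for name in list(kept):                 # step c: test each predicate
            with_p = [t for t in candidates
                      if all(p(t) for p in kept.values())]
            rest = [p for n, p in kept.items() if n != name]
            without_p = [t for t in candidates if all(p(t) for p in rest)]
            if with_p == without_p:
                del kept[name]                  # predicate filters nothing out
        optimized.append({**query, "predicates": kept})
    return optimized

# Hypothetical historical readings: (tag, location, time).
history = [
    ("T1", "Shelf", 1), ("T1", "Exit", 2),
    ("T2", "Shelf", 3), ("T2", "Cashier", 4), ("T2", "Exit", 5),
]
paid = {"T2"}  # tags seen at the cashier, precomputed for brevity
queries = [
    {"name": "staff", "sequence": ["Warehouse", "Shelf", "Exit"],
     "predicates": {}},
    {"name": "customer", "sequence": ["Shelf", "Exit"],
     "predicates": {
         "no_cashier": lambda tag: tag not in paid,
         "tag_not_null": lambda tag: tag is not None,
     }},
]
optimized = prune_queries(queries, history)
```

On this toy history the "staff" query is dropped because no tag was ever read in the warehouse, and the "customer" query keeps only the `no_cashier` predicate, since `tag_not_null` never excludes a candidate.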
Figure 8: A flowchart for the model
Some function boxes related to database optimization have been left out of fig. 7 for the sake of simplicity, but they are addressed concisely in section 6.
C. A-priori algorithm for optimization
Considering the event-based case study we are working on, a high-performance algorithm is needed. Many algorithms for generating association rules have been presented; some well-known ones are A-priori, Eclat, and FP-Growth. A-priori is an algorithm for mining association rules over time that is used by many researchers and has been found to be one of the best [13, 26, 27, 28, 29, 30, 31, 32]. It uses a breadth-first search strategy to count the support of itemsets and uses a candidate generation function which exploits the downward closure property of support. In this paper, we used the A-priori algorithm for extracting association rules from incoming events.
E. System overload
Overload may arise in our system in two circumstances:
a. The user may give a non-optimized query with extra states or predicates.
b. The user may not know the data, so the query may contain extra states or predicates.
A non-optimized query needs many states in modeling, and consequently a large pattern is produced. It is therefore necessary to check the predicates assigned to each transition arrow between states. Event processing for such a pattern can be time and memory consuming. The data presented in table I is a set of simulated data sent to RFID devices from RFID tag readers. The RFID readers then send the data to the storage as discrete tuples. Attributes of this
D. A-priori algorithm for optimization
Considering the event-based case study we are working on, a high-performance algorithm is needed. Many algorithms for generating association rules have been presented; some well-known ones are A-priori, Eclat, and FP-Growth. A-priori is an algorithm for mining association rules over time that is used by researchers and has been found to be one of the best [13, 26, 27, 28, 29, 30, 31, 32]. It uses a breadth-first search strategy to count the support
Table II. Two queries for shoplifting cases
table are: Tag_ID is a unique identifier for each RFID tag and shows which tag sent data to the RFID reader. Reader_ID is the unique identifier of the RFID reader. Arrival_Time shows when a certain event happens (exactly when the RFID reader reads the RFID tag). Duration is the time duration of the event.
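The attributes just listed can be captured in a small typed record. The sketch below is an assumption about how such a tuple might be represented in a program; the class name `RfidEvent`, the helper `parse_event`, and the derived `end_time` property are not part of the paper. The day-month-year date format follows the values shown in table I.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class RfidEvent:
    tag_id: str             # unique identifier of the RFID tag
    reader_id: str          # unique identifier of the RFID reader
    arrival_time: datetime  # when the reader read the tag
    duration: timedelta     # how long the event lasted

    @property
    def end_time(self):
        """Arrival time plus duration (a derived value, assumed useful)."""
        return self.arrival_time + self.duration

def parse_event(tag_id, reader_id, arrival, duration_sec):
    """Build an event from the textual fields of one table row."""
    return RfidEvent(
        tag_id,
        reader_id,
        datetime.strptime(arrival, "%d-%m-%Y %H:%M:%S"),
        timedelta(seconds=int(duration_sec)),
    )

# First row of table I.
first = parse_event("P2191015", "RW-1501", "12-12-2008 13:40:30", "6120")
```

For the first row, a duration of 6120 seconds (1 hour 42 minutes) gives an end time of 15:22:30 on the same day.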
1st continuous query:
PATTERN SEQ(Warehouse w, ShoppingSaloon Sal, Shelf Sh, ~(CashRegister c), ~(Exit x))
WHERE skip_till_next_match(w, Sal, Sh, c, x) {
    w.tag_id ≠ Null and
    Sal.tag_id = w.tag_id and
    w.tag_id = Sh.tag_id and
    c.tag_id = Sh.tag_id and
    x.tag_id = c.tag_id /* equivalently, [tag_id] */ }
WITHIN 1 year
There are two scenarios:
1. One of the staff is very likely shoplifting (1st query and 1st path).
2. One of the customers or staff is shoplifting (2nd query and 2nd path).
In table I, the tuples returned for the two queries of table II over the data on hand in storage are highlighted. Tuples marked with the same level of darkness form a pair and are related to each other, since they have been matched by the 2nd query.
2nd continuous query:
PATTERN SEQ(Shelf Sh, ~(CashRegister c), Exit x)
WHERE skip_till_next_match(Sh, c, x) {
    Sh.tag_id ≠ Null and
    c.tag_id = Sh.tag_id and
    x.tag_id = c.tag_id /* equivalently, [tag_id] */ }
WITHIN 12 hours
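To make the 2nd query concrete, the sketch below evaluates a simplified reading of it over a few rows copied from table I: a Shelf reading, followed by an Exit reading of the same tag within 12 hours, with no Cashier reading in between. The interpretation of skip_till_next_match as "take the next Exit reading of the same tag", the inclusive time bounds, and the function name are assumptions of this example, not the semantics of any particular CEP engine.

```python
from datetime import datetime, timedelta

FMT = "%d-%m-%Y %H:%M:%S"

# (tag_id, arrival_time, reader_location) from rows 5, 6, 8, 9, 10, 13, 16.
rows = [
    ("P2191015", "14-12-2008 10:20:10", "Shelf"),
    ("P1111111", "14-12-2008 10:12:20", "Shelf"),
    ("P2191015", "14-12-2008 12:24:10", "Exit"),
    ("P1111111", "14-12-2008 12:12:20", "Exit"),
    ("P9999999", "12-12-2008 12:16:30", "Shelf"),
    ("P9999999", "12-12-2008 12:16:30", "Cashier"),
    ("P9999999", "12-12-2008 13:10:30", "Exit"),
]

def shelf_exit_without_cashier(rows, window=timedelta(hours=12)):
    """Return (tag, shelf_time, exit_time) for every unpaid shelf-to-exit."""
    events = sorted((datetime.strptime(t, FMT), tag, loc)
                    for tag, t, loc in rows)
    matches = []
    for t_shelf, tag, loc in events:
        if loc != "Shelf":
            continue
        # Assumed skip_till_next_match: next Exit reading of the same tag.
        exits = [t for t, g, l in events
                 if g == tag and l == "Exit" and t > t_shelf]
        if not exits or exits[0] - t_shelf > window:
            continue
        t_exit = exits[0]
        # Negation ~(CashRegister c): no cashier reading for this tag
        # between the shelf and exit readings (bounds inclusive).
        paid = any(g == tag and l == "Cashier" and t_shelf <= t <= t_exit
                   for t, g, l in events)
        if not paid:
            matches.append((tag, t_shelf, t_exit))
    return matches

suspects = [tag for tag, _, _ in shelf_exit_without_cashier(rows)]
```

On these rows the sketch reports tags P1111111 and P2191015; tag P9999999 is excluded because it has a Cashier reading before its Exit reading.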
Table I. Data sent from RFID readers to the storage
R   Tag_ID    ReaderID  Arrival_Time         Duration (sec)  Reader_Location
1   P2191015  RW-1501   12-12-2008 13:40:30  6120            Warehouse
2   P1234567  RW-1555   12-12-2008 13:40:40  1650            Warehouse
3   P1234567  RW-1555   12-12-2008 14:20:05  5               Shopping Saloon
4   P2191015  RP-0100   13-12-2008 18:40:30  5               Shopping Saloon
5   P2191015  RS123211  14-12-2008 10:20:10  2100            Shelf
6   P1111111  RS-2345   14-12-2008 10:12:20  2220            Shelf
7   P4555111  RW-1501   12-12-2008 11:12:20  1225            Warehouse
8   P2191015  RE-3244   14-12-2008 12:24:10  5               Exit
9   P1111111  RE-1115   14-12-2008 12:12:20  5               Exit
10  P9999999  RS-1501   12-12-2008 12:16:30  6220            Shelf
11  P8888888  RW-1555   12-12-2008 12:20:40  1650            Shelf
12  P2191015  RE223244  14-12-2008 12:24:10  5               Exit
13  P9999999  RS-1501   12-12-2008 12:16:30  5               Cashier
14  P8888888  RW-1555   12-12-2008 12:20:40  5               Cashier
15  P2222222  RW-4040   12-12-2008 12:40:05  5               Shopping Saloon
16  P9999999  RS-1501   12-12-2008 13:10:30  5               Exit
17  P8888888  RW-1555   12-12-2008 13:10:40  5               Exit
18  P1212121  RW-1212   12-12-2008 14:04:20  1290            Warehouse
Figure 10. Non-optimized pattern without considering association rules
Nodes can have other edges, as is expected in an NFA [1, 2, 3, 6], and multiple strategies can be applied [1, 2, 3, 7, 15].
Table III. Transition predicates on each state for each path
1st path 0,1,6,7,4,5:
    w.tagID ≠ Null
    Sal.tagID = w.tagID
    Sh.tagID = w.tagID
    ¬C.tagID = Sh.tagID
    x.tagID = C.tag_id ^ x.tagID = Sh.tagID
2nd path 0,2,3,5:
    Sh.tagID = w.tagID ^ w.tagID ≠ Null
    ¬Cash.tagID = Sh.tagID
    Exit.tagID = ¬C.tagID ^ x.tagID = Sh.tagID
Considering the association rules, we can be sure that the 1st scenario never happens: there is no match case for the 1st scenario in the association rules. We can therefore skip checking it by pruning the extra parts. By pruning this extra branch, we check fewer conditions at runtime and obtain better system performance. For the data in table I, this can reduce the overload to 50%. Moreover, some of the predicates that the user mentions in the query may be extra, e.g. Exit.tagID = Shelf.tagID, which is deleted from the optimized model because it has no effect on our system. Removing extra overload can optimize system performance.
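The pruning of a branch that has no support in the association rules can be sketched as a reachability computation over the NFA edges. The state numbers of the two paths are taken from table III; the edge-list encoding, the `supported` set (here simply the edges of the 2nd path), and the function name `prune_nfa` are assumptions made for this illustration.

```python
def prune_nfa(transitions, start, supported):
    """Drop unsupported transitions, then drop states no longer reachable."""
    kept = [t for t in transitions if t in supported]
    reachable, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        for src, dst in kept:
            if src == state and dst not in reachable:
                reachable.add(dst)
                frontier.append(dst)
    # Keep only transitions that leave a reachable state.
    return [(s, d) for s, d in kept if s in reachable], reachable

# 1st path 0,1,6,7,4,5 and 2nd path 0,2,3,5 as (source, target) edges.
transitions = [
    (0, 1), (1, 6), (6, 7), (7, 4), (4, 5),
    (0, 2), (2, 3), (3, 5),
]
# Assumed outcome of rule mining: only the 2nd path has matching cases.
supported = {(0, 2), (2, 3), (3, 5)}
edges, states = prune_nfa(transitions, start=0, supported=supported)
```

After pruning, only states 0, 2, 3, and 5 and the three edges of the 2nd path remain, so the predicates attached to the 1st path are never evaluated at runtime.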
The techniques mentioned can reduce search time by minimizing the search space. NFA is in some cases not expressive enough [14]. The advantages of using the proposed rule engine have been discussed. Although implementing the storage and the processes related to association rules is costly, using this system is worthwhile. Association rule extraction can be done during idle time.
Figure 11. Optimized pattern considering association rules
Table IV. Transition predicates on each state for each path
2nd path 0,2,3,5:
    Sh.tagID = w.tagID ^ w.tagID ≠ Null
    ¬Cash.tagID = Sh.tagID
    Exit.tagID = ¬Cash.tagID ^ Exit.tagID = Sh.tagID
Sometimes the changes are smaller than the above, and just a single predicate is subject to pruning. If all the predicates for moving from one state to the next are extra, we can prune the related state.
IV. DISCUSSION
This research described the steps needed to optimize complex event processing patterns. A comparison of the advantages and disadvantages of the proposed system is a key factor, which is investigated in this section.
A. Overall performance
The overall performance of the system showed better results than prior systems due to:
i. Reducing CPU overload.
   a. Fewer patterns are investigated for pattern matching, which can reduce CPU overload.
   b. Patterns can be built while the system is idle, which reduces CPU overload when many events have to be checked against patterns.
ii. Data on hand in storage are useful for:
   a. Running SQL or CQL, which can give better and more reliable results.
   b. Prediction and estimation of critical circumstances, optimizing decision support systems using the data on hand in storage.
   c. Utilizing storage and query together, which is a more efficient technique than using the query alone.
Defects are:
i. Storage is needed for the rule engine.
ii. Generating association rules puts overload on the system.
The optimization will have a large effect on complicated queries with a long sequence of events.
V. SUMMARY
In order to implement an efficient rule engine, storage is needed to store incoming event tuples. These stored events are used to generate association rules. The generated rules are the basis of our optimization, because they reveal which predicates or states are extra. The extracted rules are then modeled using the NFA modeler, and the pattern is constructed. This system performs better than other systems [1, 2, 3, 7, 18] that do not use historical data. Although memory is cheap today, data cleansing and data clustering techniques are useful for searching the storage space efficiently.
VI. ACKNOWLEDGMENTS
The authors would like to thank the Research Management Centre (RMC), Universiti Teknologi Malaysia, for the research activities, and the Soft Computing Research Group (SCRG) for the support in making this study a success.
VII. REFERENCES
[1] J. Agrawal, Y. Diao, D. Gyllstrom, and N. Immerman, "Efficient pattern matching over event streams," Proc. ACM SIGMOD International Conference on Management of Data, Vancouver, Canada, pp. 147-160, June 9-12, 2008.
[2] D. Gyllstrom, J. Agrawal, Y. Diao, and N. Immerman, "On supporting Kleene Closure over event streams," The 24th International Conference on Data Engineering (ICDE), Cancún, México, pp. 1391-1393, April 7-12, 2008.
[3] E. Wu, Y. Diao, and S. Rizvi, "High-performance complex event processing over streams," ACM SIGMOD International Conference on Management of Data, ACM, New York, NY, USA, pp. 407-418, 2006.
[4] T. J. Owens, "Survey of event processing," Air Force Research Laboratory Information Directorate, Rome Research Site, New York, 2007.
[5] D. Luckham, "The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems," Addison Wesley Publishers, 2002.
[6] A. Demers, J. Gehrke, M. Riedewald, V. Sharma, B. Panda, and W. White, "Cayuga: A General Purpose Event Monitoring System," Proc. Biennial Conf. on Innovative Data Systems Research (CIDR), pp. 411-422, 2007.
[7] D. Gyllstrom, Y. Diao, E. Wu, H. J. Chae, P. Stahlberg, and G. Anderson, "SASE: Complex Event Processing over Streams," 3rd Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, California, USA, pp. 407-411, January 7-10, 2007.
[8] D. Luckham and R. Schulte, "Event Processing Glossary - Version 1.1," [website] Available at: "http://www.ep-ts.com/component/option,com_docman/task,doc_download/gid,66/Itemid,84/", 2008.
[9] M. K. Aguilera, R. E. Strom, D. C. Sturman, M. Astley, and T. D. Chandra, "Matching events in a content-based subscription system," PODC, Atlanta, Georgia, United States, pp. 53-61, 1999.
[10] R. Sadri, C. Zaniolo, A. Zarkesh, and J. Adibi, "Optimization of sequence queries in database systems," 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Association for Computing Machinery, Santa Barbara, pp. 71-81, May 21-23, 2001.
[11] R. Sadri, C. Zaniolo, A. Zarkesh, and J. Adibi, "Expressing and optimizing sequence queries in database systems," ACM, Volume 29, Issue 2, pp. 282-318, June 2004.
[12] F. Bry and M. Eckert, "Temporal order optimizations of incremental joins for composite event detection," ACM International Conference Proceeding Series, Vol. 233, International Conference on Distributed Event-Based Systems, Toronto, Ontario, Canada, ISBN: 978-1-59593-665-3, pp. 85-90, 2007.
[13] R. Agrawal and R. Srikant, "Fast algorithms for mining association rules in large databases," Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), Santiago, Chile, pp. 487-499, September 1994.
[14] Y. Mei and S. Madden, "ZStream: A Cost-based Query Processor for Adaptively Detecting Composite Event," Proc. SIGMOD'09, Providence, Rhode Island, USA, pp. 193-206, June 29-July 2, 2009.
[15] Y. Diao, N. Immerman, and D. Gyllstrom, "SASE+: An agile language for Kleene closure over event streams," UMass Technical Report 07-03, 2007.
[16] S. Konstantinidis, S. Nicolae, and S. Yu, "Fuzzification of rational and recognizable sets," Fundamenta Informaticae, v. 76, n. 4, ISSN: 01692968, IOS Press, pp. 413-447, 2007.
[17] G. Liu, Yonghui Fang, Xufei Zhen, and Yuhui Qiu, "Tuning Neurofuzzy Function Approximator by Tabu Search," Springer, ISNN (1), pp. 276-281, 2004.
[18] R. A. Baeza and G. H. Gonnet, "Fast text searching for regular expressions or automaton searching on tries," Journal of the ACM (JACM), ISSN: 0004-5411, Volume 43, Issue 6, pp. 915-936, 1996.
[19] D. Wagner and D. Dean, "Intrusion Detection via Static Analysis," IEEE conference, Oakland, CA, USA, ISBN: 0-7695-1046-9, pp. 156-168, May 2001.
[20] Y. Abarbanel, I. Beer, L. Gluhovsky, S. Keidar, and Y. Wolfsthal, "FoCs - Automatic Generation of Simulation Checkers from Formal Specifications," Lecture Notes in Computer Science, Springer, ISBN: 3-540-67770-4, pp. 538-542, 2000.
[21] M. Suntinger, J. Sciefer, H. Obweger, G. Hannes, and M. Eduard, "The event tunnel: Interactive visualization of complex event streams for business process pattern analysis," Proc. IEEE Pacific Visualization Symposium (PacificVis 2008), pp. 111-118, 2008.
[22] C. Zang, Y. Fan, and R. Liu, "Architecture, implementation and application of complex event processing in enterprise information systems based on RFID," Special Issue on Enterprise Information Systems (EIS), v. 10, n. 5, pp. 543-553, November 2008.
[23] R. Vecera, S. Rozsnyai, and H. Roth, "Indexing and search of correlated business events," Proc. The Second International Conference on Availability, Reliability and Security (ARES 2007), ISBN: 0-7695-2775-2, pp. 1124-1131, 2007.
[24] R. S. Barga and H. Caituiro-Monge, "Event Correlation and Pattern Detection in CEDR," Springer-Verlag, Berlin Heidelberg, ISBN 3-540-46788-2, pp. 919-930, 2006.
[25] K. Kepaptsoglou, M. G. Karlaftis, T. Bitsikas, P. Panetsos, and S. Lambropoulos, "A methodology and decision support system for scheduling inspections in a bridge network following a natural disaster," Proc. 3rd International Conference on Bridge Maintenance, Safety and Management - Bridge Maintenance, Safety, Management, Life-Cycle Performance and Cost, pp. 419-420, 2006.
[26] R. J. Bayardo Jr. and R. Agrawal, "Mining the most interesting rules," International Conference on Knowledge Discovery and Data Mining 1999, pp. 145-154, 1999.
[27] G. Cong, A. K. H. Tung, X. Xu, F. Pan, and J. Yang, "FARMER: finding interesting rule groups in microarray datasets," Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, Paris, France, June 13-18, 2004, doi: 10.1145/1007568.1007587.
[28] M. J. Pazzani, "Knowledge discovery from data," IEEE Intelligent Systems, vol. 15, no. 2, pp. 10-13, doi: 10.1109/5254.850821, Mar/Apr. 2000.
[29] W. Kosters, E. Marchiori, and A. Oerlemans, "Mining cluster with association rules," Lecture Notes in Computer Science 1642, Springer, pp. 39-50, 1999.
[30] J. Hipp and U. Güntzer, "Is pushing constraints deeply into the mining algorithms really what we want?: an alternative approach for association rule mining," ACM SIGKDD Explorations Newsletter, v.4 n.1, pp. 50-55, June 2002.
[31] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules Between Sets of Items in Large Databases," Proc. ACM SIGMOD Conf. Management of Data, pp. 207-216, Washington, D.C., May 1993.
[32] E. R. Omiecinski, "Alternative interest measures for mining associations in databases," IEEE Transactions on Knowledge and Data Engineering, Volume 15, Issue 1, pp. 57-69, Jan.-Feb. 2003.
[33] C. Romero, S. Ventura, and P. De Bra, "Knowledge Discovery with Genetic Programming for Providing Feedback to Courseware Authors," User Modeling and User-Adapted Interaction, v.14 n.5, pp. 425-464, January 2005.
[34] A. Abu Bakar, M. Nasir Sulaiman, M. Othman, and M. Hasan Selamat, "IP algorithms in compact rough classification modeling," Intelligent Data Analysis, v.5 n.5, pp. 419-429, October 2001.
[35] J. Lin, T. Huang, and B. Zhao, "A Fast Fuzzy Set Intrusion Detection Model," International Symposium on Knowledge Acquisition and Modeling, pp. 601-605, Dec 2008.
[36] F. Heuvel and G. Vosselman, "Efficient 3D-modeling of buildings using A-priori geometric object information," Videometrics V, Proceedings of SPIE - The International Society for Optical Engineering, v. 3174, pp. 38-49, 30 July 1997.