Event queries formulate patterns of events that (i) are of certain types, (ii) may have specified payload data values, (iii) occur within a time window, and (iv) are matched according to a consumption policy. Consumption policies control how a CEP system handles multiple events of the same type that arrive at different positions in the event stream. If events match the query, a result event is emitted and can be processed further.
Executing Event Queries
Matching incoming events against event queries is not trivial. For example, CEP systems must account for queries containing closures, i.e., sequences of events of indefinite length. Queries can therefore be implemented as nondeterministic finite automata (NFAs). Consider a query in which an event of type A shall be followed by multiple events of type B and finally an event of type C. An incoming event of type A initializes the automaton. Each subsequent event of type B then moves the automaton simultaneously to a state waiting for more B events and to the state waiting for C. With each event of type B consumed by the CEP system, the number of intermediate partial matches can grow exponentially.
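The growth of partial matches can be made concrete with a minimal sketch. It assumes that every B event may be either taken or skipped by each partial match (a "skip-till-any-match" style of selection, which the text above does not name explicitly). The function name `evaluate` and the string-based event encoding are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def evaluate(stream: List[str]) -> Tuple[List[Tuple[str, ...]], List[int]]:
    """Evaluate the pattern A, B+, C over a list of event types.

    Assumption: each B may be taken or skipped by every partial match.
    Returns the complete matches and the number of partial matches
    held after each input event.
    """
    partial: List[Tuple[str, ...]] = []   # open partial matches
    complete: List[Tuple[str, ...]] = []  # finished matches
    sizes: List[int] = []

    for position, event_type in enumerate(stream):
        event_id = f"{event_type}{position}"
        if event_type == "A":
            # An A event starts a new partial match.
            partial.append((event_id,))
        elif event_type == "B":
            # Nondeterminism: keep every partial match as it is (skip B)
            # and also add a copy extended by this B (take B).
            partial += [match + (event_id,) for match in partial]
        elif event_type == "C":
            # Only partial matches that contain at least one B can finish.
            complete += [match + (event_id,) for match in partial if len(match) > 1]
        sizes.append(len(partial))

    return complete, sizes


matches, sizes = evaluate(["A", "B", "B", "B", "C"])
print(sizes)         # [1, 2, 4, 8, 8]
print(len(matches))  # 7
```

Under this assumption, each B event doubles the number of partial matches that started with the same A, which is why keeping all intermediate results quickly becomes expensive.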
Maintaining all intermediate results is challenging for commercial CEP systems. Yet, efficiency in terms of execution speed is crucial for CEP systems, as the value of information decreases over time and many use cases for CEP are time-critical. For instance, the credit card fraud detection example mentioned above relies on minimal delays to protect customers from financial losses.
Increasing the Efficiency of CEP Systems
One approach to speeding up CEP is to parallelize pattern matching [1]. The event stream is split into batches, and multiple compute resources perform the matching concurrently. Matches that span batch boundaries are captured by letting adjacent batches overlap, at the cost of some redundant computation.
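The idea of overlapping batches can be sketched as follows. This is a simplified illustration, not the algorithm from [1]: the pattern (an A followed by a C at most `window` positions later), the function names, and the use of a thread pool are all assumptions made for brevity. The overlap is set to the window length so that no match within the window is lost at a batch boundary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Set, Tuple


def split_with_overlap(n_events: int, batch_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges; adjacent ranges share `overlap` events."""
    if not 0 <= overlap < batch_size:
        raise ValueError("overlap must be non-negative and smaller than batch_size")
    step = batch_size - overlap
    ranges = []
    start = 0
    while start < n_events:
        end = min(start + batch_size, n_events)
        ranges.append((start, end))
        if end == n_events:
            break
        start += step
    return ranges


def match_a_then_c(events: Sequence[str], offset: int, window: int) -> Set[Tuple[int, int]]:
    """Find (A, C) pairs at most `window` positions apart, as global positions."""
    found = set()
    for i, event_type in enumerate(events):
        if event_type != "A":
            continue
        for j in range(i + 1, min(i + window, len(events) - 1) + 1):
            if events[j] == "C":
                found.add((offset + i, offset + j))
    return found


def parallel_match(events: Sequence[str], batch_size: int, window: int) -> Set[Tuple[int, int]]:
    ranges = split_with_overlap(len(events), batch_size, overlap=window)
    # Threads keep the sketch short; a real system would use separate
    # processes or machines to obtain actual parallel speed-up.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(
            lambda r: match_a_then_c(events[r[0]:r[1]], r[0], window), ranges))
    # The set union removes matches found twice in overlapping regions.
    return set().union(*results)


stream = list("ABCACBBCAC")
print(sorted(parallel_match(stream, batch_size=4, window=2)))
```

Matches found in an overlapping region are computed by two workers, which is the redundant computation mentioned above; the final set union discards the duplicates.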
Another way to increase execution speed is to lower the latency of fetching additional remote data. The EIRES framework achieves this goal by two means: prefetching and lazy evaluation [3]. Prefetching predicts which data is needed in a subsequent state based on the information available in a prior state. The prefetched data is cached locally. However, a prediction may turn out to be wrong, in which case the data must be requested again. Lazy evaluation postpones the evaluation of a predicate to a subsequent state, by which time the requested data has arrived. Although data does not need to be fetched again, lazy evaluation leads to additional partial matches. The framework combines both approaches. Decisions on when to fetch which data are made based on utility estimations and monitoring of cache misses.
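The two mechanisms can be illustrated with a toy sketch. It is not the EIRES implementation or its API: the classes `RemoteStore`, `PrefetchCache`, and `PartialMatch`, the card limits, and the payment event are all hypothetical, and the utility-based decision logic of the framework is left out entirely.

```python
from typing import Callable, Dict, List


class RemoteStore:
    """Hypothetical stand-in for a slow remote data source."""

    def __init__(self, data: Dict[str, int]) -> None:
        self._data = data
        self.requests = 0  # number of round trips to the remote source

    def fetch(self, key: str) -> int:
        self.requests += 1
        return self._data[key]


class PrefetchCache:
    """Local cache that is filled ahead of time from predictions."""

    def __init__(self, store: RemoteStore) -> None:
        self._store = store
        self._entries: Dict[str, int] = {}
        self.misses = 0

    def prefetch(self, predicted_key: str) -> None:
        # Issued in a prior state, before the predicate is evaluated.
        self._entries[predicted_key] = self._store.fetch(predicted_key)

    def get(self, key: str) -> int:
        # A wrong prediction shows up here as a miss and a second request.
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._store.fetch(key)
        return self._entries[key]


class PartialMatch:
    """Partial match that can carry predicates postponed by lazy evaluation."""

    def __init__(self) -> None:
        self.events: List[dict] = []
        self.deferred: List[Callable[[], bool]] = []

    def extend_lazily(self, event: dict, predicate: Callable[[], bool]) -> None:
        # Keep the match alive now and decide once the remote data is there.
        self.events.append(event)
        self.deferred.append(predicate)

    def resolve(self) -> bool:
        return all(check() for check in self.deferred)


store = RemoteStore({"card-1": 500, "card-2": 900})
payment = {"card": "card-1", "amount": 700}

# Prefetching with a wrong prediction: the needed key is requested again.
cache = PrefetchCache(store)
cache.prefetch("card-2")
over_limit = payment["amount"] > cache.get(payment["card"])

# Lazy evaluation: the partial match is extended first and checked later.
pending = PartialMatch()
pending.extend_lazily(payment, lambda: payment["amount"] > cache.get(payment["card"]))
still_valid = pending.resolve()
```

In this sketch, the wrong prediction costs one cache miss and a second remote request, while the lazily extended partial match has to be kept in memory until its postponed predicate is resolved.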
A third approach to increasing the efficiency of CEP is load shedding [4]. It aims to provide best-effort processing instead of accurate results when the CEP system is overloaded. Rather than slowing down the computation, the system detects only certain relevant matches and thereby keeps the response latency from increasing. The presented approach combines two techniques: disregarding parts of the input, which is very beneficial for performance, and discarding partial results, which provides finer-grained filtering.
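Both kinds of shedding can be sketched in a few lines. The functions `shed_input` and `shed_state`, the relevance test, and the utility function (here simply the length of a partial match) are hypothetical placeholders; the approach in [4] relies on its own estimates to decide what to drop.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def shed_input(events: Sequence[T], is_relevant: Callable[[T], bool]) -> List[T]:
    """Input-based shedding: drop events before they reach the matcher."""
    return [event for event in events if is_relevant(event)]


def shed_state(partial_matches: Sequence[T], utility: Callable[[T], float],
               capacity: int) -> List[T]:
    """State-based shedding: keep only the `capacity` most useful partial matches."""
    if len(partial_matches) <= capacity:
        return list(partial_matches)
    return sorted(partial_matches, key=utility, reverse=True)[:capacity]


# Input-based: events of types the query never uses are disregarded.
kept_events = shed_input(["A", "X", "B", "X", "C"], lambda e: e in {"A", "B", "C"})

# State-based: assume longer partial matches are closer to completion.
kept_matches = shed_state([("A",), ("A", "B"), ("A", "B", "B")], utility=len, capacity=2)
```

Dropping input is cheap because the discarded events never create any work, whereas dropping partial matches allows a more selective choice of which potential results to give up.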
Conclusion
Analyzing events is a promising solution for many applications that make decisions or monitor systems in real time. Complex event processing derives sophisticated knowledge by matching patterns of events in continuous event streams. As the execution of queries on event streams is computationally complex but time-critical, there are several approaches to reducing the response latencies of CEP systems. These approaches increase performance by:
- distributing computation to multiple resources,
- lowering the time spent on waiting for additional remote data, and
- intelligently focusing on a subset of pattern matches in overload situations.
References
[1] Cagri Balkesen et al. "RIP: Run-based Intra-query Parallelism for Scalable Complex Event Processing". In: Proceedings of the International Conference on Distributed Event-Based Systems (DEBS). 2013.
[2] Lukasz Golab and M. Tamer Özsu. "Issues in Data Stream Management". In: SIGMOD Record 32.2 (2003).
[3] Bo Zhao et al. "EIRES: Efficient Integration of Remote Data in Event Stream Processing". In: Proceedings of the International Conference on Management of Data (SIGMOD). 2021.
[4] Bo Zhao, Nguyen Quoc Viet Hung, and Matthias Weidlich. "Load Shedding for Complex Event Processing: Input-based and State-based Techniques". In: Proceedings of the International Conference on Data Engineering (ICDE). 2020.
Figure 1 is taken from Prof. Weidlich's presentation, which is available on Tele-Task.