Hasso-Plattner-Institut
Prof. Dr. Tilmann Rabl

Efficient Complex Event Processing

Matthias Weidlich, HU Berlin

Biography

Prof. Matthias Weidlich has been the head of the Database and Information Systems group at Humboldt-Universität zu Berlin since 2018. His group focuses on process-oriented and event-driven information systems, especially on research in process mining, event stream processing, and explorative data analysis. Before his full professorship, he was a junior professor at the department. Previous affiliations include Imperial College London, the Technion in Haifa, and Université Paris-Dauphine. He received his Ph.D. (Dr. rer. nat.) under the supervision of Prof. Weske at HPI in 2011.

Summary

Written by Stephan Krumm & Daniel Lindner

Events are everywhere around us. Everything from receiving an email to a credit card transaction to hiring a person can be denoted as an event and be further processed, aggregated, and analyzed by specific systems. In his lecture, Prof. Weidlich introduced the foundations of events and complex event processing (CEP) before outlining current approaches in research on performant and scalable CEP systems.

Though different definitions of an event exist in the literature, we can describe an event as a significant change in state or a happening of interest. These definitions are intentionally broad and cover virtually anything happening in the real world or a computer system, as long as it has three characteristics. In particular, an event is

  • observable and leads to a change in state,
  • instant and has no duration, and
  • occurring once, while only the notification of an event occurrence can be passed elsewhere.

Nevertheless, the terms event and event notification are commonly used interchangeably.
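The distinction between an event and its notification can be sketched as a small data structure. The following is an illustrative Python sketch, not part of any real CEP system; the class and field names are assumptions:

```python
import time
from dataclasses import dataclass, field

# Hypothetical minimal event notification: the event itself occurs once
# in the real world; only this immutable record of it travels through
# the system.
@dataclass(frozen=True)
class EventNotification:
    event_type: str            # e.g., "SmokeDetected", "Transaction"
    timestamp: float           # instantaneous occurrence, no duration
    payload: dict = field(default_factory=dict)

e = EventNotification("Transaction", time.time(),
                      {"amount": 42.0, "ip": "10.0.0.1"})
```

The single timestamp reflects the second characteristic above: an event is instant and has no duration.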


From Events to Complex Event Processing

A single event can already be of interest, e.g., an alarm from a fire detector that measures smoke emission is already sufficiently significant. In other cases, only patterns of multiple events provide enough information to conclude that a particular situation has occurred. For instance, the fact that various transactions were made from a single IP address using different user accounts and credit card numbers in a short period hints at potential credit card fraud. Complex event processing aggregates these individual events and derives knowledge from event patterns, resulting in the emission of a new event.

Such CEP systems listen to event streams, i.e., infinite, real-time, continuous, ordered sequences of events [2]. Hence, they have no control over the arrival time or the order of incoming events. Events in an event stream are of a specific type and carry payload data that characterizes the event. CEP systems execute event queries on event streams and may additionally access data from a data store to fulfill their task. Figure 1 shows an overview of a CEP system.

Figure 1: Overview of a CEP System.

Event queries formulate patterns of events (i) being of certain types that (ii) may have specified payload data values, (iii) within a time window, and (iv) according to a consumption policy. These consumption policies control how a CEP system handles multiple events of the same type that arrive at different positions in the event stream. If events match the query, a result event can be returned and further processed.
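The four ingredients of an event query can be collected in one structure. The sketch below is illustrative only; the field names and the consumption-policy label are assumptions, not the API of any actual CEP engine:

```python
from dataclasses import dataclass

# Hypothetical sketch of the four query ingredients: (i) event types,
# (ii) payload predicates, (iii) a time window, and (iv) a consumption
# policy controlling which of several same-type events are matched.
@dataclass
class EventQuery:
    sequence: list              # (i) ordered event types, e.g. ["A", "B", "C"]
    predicates: dict            # (ii) event type -> predicate over its payload
    window_seconds: float       # (iii) all matched events within this span
    consumption: str = "skip-till-next-match"  # (iv) policy name (assumed)

# Toy fraud query: two large transactions within one minute.
fraud_query = EventQuery(
    sequence=["Transaction", "Transaction"],
    predicates={"Transaction": lambda p: p["amount"] > 1000},
    window_seconds=60.0,
)
```

A matching engine would walk the stream, test each event's payload against the predicate for its type, and report a result event once the full sequence is observed inside the window.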

Executing Event Queries

Matching event streams against event queries is not trivial. For example, CEP systems must account for queries containing closures, i.e., sequences of events of indefinite length. Such queries are therefore commonly implemented as nondeterministic finite automata (NFAs). Imagine a query in which an event of type A shall be followed by multiple events of type B and finally an event of type C. An incoming event of type A initializes the automaton. Each subsequent event of type B then simultaneously transitions the automaton to a state waiting for more B events and to the state waiting for C. With each event of type B consumed by the CEP system, the number of intermediate partial matches grows exponentially.
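The exponential growth can be reproduced with a tiny simulation of the SEQ(A, B+, C) example: each incoming B either extends an existing partial match or is skipped, so the set of partial matches doubles with every B. This is an illustrative sketch (assuming no window pruning), not a full NFA implementation:

```python
# Toy enumeration of partial matches for SEQ(A, B+, C): every run that
# has seen A may consume or skip each subsequent B, doubling the number
# of open partial matches per B event.
def partial_matches(stream):
    matches = []                      # open partial matches (lists of events)
    for ev in stream:
        if ev == "A":
            matches.append(["A"])     # start a new automaton run
        elif ev == "B":
            # each existing run branches: one copy consumes this B
            matches += [m + ["B"] for m in matches]
    return matches

runs = partial_matches(["A", "B", "B", "B"])
# 1 run for A alone plus 7 extensions -> 8 partial matches after 3 Bs
```

After n events of type B, the system maintains 2^n partial matches, which is exactly why maintaining intermediate results becomes expensive.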

Maintaining all intermediate results is challenging even for commercial CEP systems. Yet, efficiency in terms of execution speed is crucial, as the value of information decreases over time and many use cases for CEP are time-critical. The credit card fraud detection example mentioned above, for instance, relies on minimal delays to protect customers from financial losses.

Increasing the Efficiency of CEP Systems

One approach to speed up CEP is parallelizing pattern matching [1]. The event stream is split into batches, and multiple compute resources perform the matching concurrently. Accounting for matches that span batches is achieved by overlapping batches and a share of redundant computation.
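The batching idea can be sketched in a few lines. The function below is a simplified illustration, not the partitioning scheme of [1]; in practice, the overlap must be at least as large as the query's time window so that every cross-boundary match falls entirely into some batch:

```python
# Split a stream into overlapping batches so that matches spanning a
# batch boundary are still fully contained in at least one batch.
# Each batch can then be matched independently by a separate worker;
# duplicate matches from the overlap are deduplicated afterwards.
def overlapping_batches(events, batch_size, overlap):
    step = batch_size - overlap       # redundant computation: `overlap` events
    return [events[i:i + batch_size] for i in range(0, len(events), step)]

batches = overlapping_batches(list(range(10)), batch_size=4, overlap=2)
```

The overlap is the price of parallelism: each worker re-processes a share of its neighbor's events, trading redundant computation for independent, concurrent matching.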

Another way to increase execution speed is lowering the access latencies of fetching additional remote data. The EIRES framework achieves this goal by two means: prefetching and lazy evaluation [3]. Prefetching predicts which data is needed in a subsequent state based on the information available in a prior state. The prefetched data is cached locally. However, the predictions may be incorrect, in which case the data must be requested again. Lazy evaluation moves the evaluation of predicates to a subsequent state in which the required data has arrived. Though data does not need to be fetched again, lazy evaluation leads to additional partial matches. The framework combines both approaches. Decisions on when to fetch which data are made based on utility estimations and monitoring of cache misses.
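The two ideas can be illustrated on a toy two-state query. The function names, the in-memory "remote store", and the risk-score predicate are all assumptions for the sketch; this is not the actual EIRES API:

```python
# Illustrative sketch of prefetching vs. lazy evaluation for a query
# whose later state needs remote data about a user.
remote_store = {"user42": {"risk_score": 0.9}}   # stand-in for a remote DB
cache = {}

def on_first_event(user_id):
    # Prefetching: predict that the later state will need this user's
    # record and pay the (simulated) network round trip now.
    cache[user_id] = remote_store[user_id]

def on_second_event(user_id, threshold=0.5):
    # Predicate evaluation in the later state: with prefetching it hits
    # the local cache; under lazy evaluation, the fetch would happen
    # only here, after the data-dependent predicate is deferred.
    record = cache.get(user_id) or remote_store[user_id]
    return record["risk_score"] > threshold
```

Prefetching hides latency but wastes work on mispredictions; lazy evaluation never fetches unnecessarily but keeps extra partial matches alive until the predicate can finally be checked.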

The third approach to increasing the efficiency of CEP is load shedding [4]. It aims to provide best-effort processing instead of exact results when the CEP system is overloaded. Rather than slowing down the computation, only the most relevant matches are detected, without increasing the response latency. The presented approach combines disregarding parts of the input, which is very beneficial for performance, with discarding partial matches, which provides finer-grained filtering.
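Input-based shedding can be sketched as dropping low-utility events under overload. The utility function and the drop rate below are illustrative assumptions, not the cost model of [4]:

```python
# Sketch of input-based load shedding: when overloaded, keep only the
# highest-utility input events instead of queueing everything, trading
# some matches (recall) for bounded response latency.
def shed(events, overloaded, drop_rate=0.5,
         utility=lambda e: e.get("amount", 0)):
    if not overloaded:
        return events                 # normal operation: process everything
    keep = max(1, int(len(events) * (1 - drop_rate)))
    return sorted(events, key=utility, reverse=True)[:keep]
```

State-based shedding works analogously, but discards open partial matches instead of raw input, which allows finer-grained decisions at a higher bookkeeping cost.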

Conclusion

Analyzing events is a promising solution for many applications that make decisions or monitor systems in real-time. Complex event processing enables deriving sophisticated knowledge based on matching patterns of events from continuous event streams. As the execution of queries on event streams is computationally complex but time-critical, there are several approaches to reduce the response latencies of CEP systems. These approaches increase performance by:

  1. distributing computation to multiple resources,
  2. lowering the time spent on waiting for additional remote data, and
  3. intelligently focusing on a subset of pattern matches in overload situations.


References

[1] Cagri Balkesen et al. "RIP: Run-based Intra-query Parallelism for Scalable Complex Event Processing". In: Proceedings of the International Conference on Distributed Event-Based Systems (DEBS). 2013.

[2] Lukasz Golab and M. Tamer Özsu. "Issues in Data Stream Management". In: SIGMOD Record 32.2 (2003).

[3] Bo Zhao et al. "EIRES: Efficient Integration of Remote Data in Event Stream Processing". In: Proceedings of the International Conference on Management of Data (SIGMOD). 2021.

[4] Bo Zhao, Nguyen Quoc Viet Hung, and Matthias Weidlich. "Load Shedding for Complex Event Processing: Input-based and State-based Techniques". In: Proceedings of the International Conference on Data Engineering (ICDE). 2020.

Figure 1 is taken from Prof. Weidlich's presentation, which is available on Tele-Task.