In the last two decades, process mining has developed from a niche discipline
to a significant research area with considerable impact on industry.
Organizations can gain deep insights into their running business processes by
applying different process mining techniques such as discovery, conformance
checking, and performance analysis. All these techniques require an event log
as input, that is, a list of
timestamped events that mark meaningful happenings in an organization. These
events are created from the organization’s running information systems and are
usually extracted from the respective databases beforehand.
The extraction usually requires access to the whole database, specific knowledge
about its structure, and domain expertise. This is challenging for several
reasons, e.g., the limited number of database experts who know the whole
database schema, and distributed databases or microservice architectures where,
by definition, there is no holistic perspective on all database layers. This makes the
extracted event logs even more valuable in that, besides capturing the behavioral
information, they are a rich source of domain-specific knowledge. This aspect of
event logs is usually overlooked.
This thesis proposes 1) a method that facilitates the extraction of an event log
from a data source other than a database and 2) a set of methods that enable
the discovery of the contextual and domain information shaping the business
processes.
Specifically, this thesis introduces a method that extracts an event log from
so-called redo logs (i.e., logs used to bring the database into a consistent
state in case of system failure) without having access to the source database.
Using this method, process experts can work on process mining projects even when
the organization's database structure and domain knowledge are unavailable or
only partially accessible.
Process experts can use the event log to discover the process model (i.e., by
applying any process mining discovery algorithm), which provides information
about the steps executed within an organization to reach a certain business goal.
However, the discovered process model is not sufficient to understand how the
process affects its contextual environment and how the environment, in turn,
affects the process. The environment can be data or other processes that are
intertwined with the process at hand. This thesis equips process experts with
additional methods that enable the identification of the data involved in the
process and the discovery of the behavior of such data during process execution.
Finally, the data within one organization cannot always be isolated, because
several processes can run side by side and share access to common data. That
means that a narrow view of the data each process accesses or modifies is not
sufficient to understand how data changes and, most importantly, how these
business processes are interconnected. This thesis concludes with a method to
discover the relations between processes, in the form of a Business Process Archi-
tecture, from a set of event logs. The state-of-the-art research on Business Process
Architecture relies only on process models, thus neglecting process execution
information such as event logs. Our discovered architecture captures behavioral
dependencies between two or more business process models using information
stored in event logs. We extend an existing Business Process Architecture
representation to accommodate such behavior.
For each method mentioned above, a prototype is implemented and its feasibility
is evaluated on real-life datasets.