Process Mining: An Introduction

Greetings, Habrahabr!

In this article I will try to lift the curtain on an interesting technology from the field of business process management (BPM). Process mining focuses on the discovery, analysis, and optimization of business processes based on data from event logs. It is the missing link between classical model-based analysis of business processes and data mining.

This article is based on materials from the Coursera online course Process Mining: Data Science in Action, owned by Eindhoven University of Technology. The materials in this article may be used only with the permission of the course authors and with a reference to the source.

Figure 1. Positioning of Process Mining.

Below we expand on positioning, touch on use cases, discuss the source data, and look at the different types of process mining.


Positioning

Process mining uses data to analyze business processes rather than to analyze the data itself. In other words, unlike data mining, process mining is not interested in low-level patterns in the source data and does not try to make decisions based on them. Its goal is to optimize the business processes (especially end-to-end ones) reflected in that data.

The questions that process mining answers fall into two groups (see the left and right arrows in Figure 1):
  • Performance (efficiency) questions.
  • Conformance (consistency) questions.

Use cases

The table below lists some process mining use cases and the questions they raise, sorted into the groups above.

No. | Use case | Questions | Question group
1 | Discovering the real business process | What does the process that actually describes current activity look like (as opposed to what is said or assumed in theory)? | Conformance
2 | Finding bottlenecks in business processes | Which places in the process limit its overall execution speed? What causes them? | Performance
3 | Identifying deviations in business processes | Where does the real process deviate from the expected (ideal) process? Why do such deviations occur? | Conformance
4 | Finding the fastest / shortest ways to run business processes | How can the process be completed in the least time? How can it be completed in the fewest steps? | Performance
5 | Predicting problems in business processes | Is it possible to predict delays / deviations / risks / ... while the process is running? | Performance / Conformance

Source data

The starting point for process mining is often data from event logs. Consider a log that suits our purposes. Each line in such a log corresponds to a single event. Each event, in turn, carries information about the case that produced it, the activity performed as part of it, and the time it was recorded. An event log can therefore be viewed as a set of cases, and each case as the sequence of events that refer to it.

Based on these assumptions, the main attributes of events in a log are:
  • Case identifier (case id): identifies the instance (object) for which the sequence of log events is built.
  • Activity name (activity name): the action performed as part of the event.
  • Timestamp (timestamp): the date and time the event was recorded.
  • Resource (resource): the main actor behind the event (whoever performed the action).
  • Other data (other data): all remaining information in the log that is not of interest for the analysis.
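The attributes above map naturally onto a small record type. The following is a minimal Python sketch rather than anything taken from the course: the `Event` class, the `group_into_cases` helper, and the sample rows are all hypothetical and serve only to show how a log can be viewed as a set of cases.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Event:
    case_id: str          # the case (instance) the event belongs to
    activity: str         # the action performed
    timestamp: datetime   # when the event was recorded
    resource: str = ""    # who performed the action
    other: dict = field(default_factory=dict)  # any remaining attributes


def group_into_cases(events):
    """Return {case_id: [activity, ...]} with events ordered by timestamp."""
    cases = defaultdict(list)
    for event in sorted(events, key=lambda e: e.timestamp):
        cases[event.case_id].append(event.activity)
    return dict(cases)


# Made-up sample rows, for illustration only.
log = [
    Event("patient-2", "registration", datetime(2024, 1, 1, 9, 30), "Dr. B"),
    Event("patient-1", "registration", datetime(2024, 1, 1, 9, 0), "Dr. A"),
    Event("patient-1", "examination", datetime(2024, 1, 1, 10, 0), "Dr. A"),
    Event("patient-2", "surgery", datetime(2024, 1, 2, 8, 0), "Dr. A"),
]

cases = group_into_cases(log)
```

Sorting by timestamp before grouping means the rows may arrive in any order, which is often the case with real exports.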

Figure 2. Event Log - Patient Admission Data.

Of course, which columns play these roles depends on the goals of the analysis. For example (see Figure 2), if we are interested in the process by which patients receive treatment, we use patients as case identifiers (column patient), the procedures patients undergo as activities (column activity), and the doctors performing those procedures as resources (column doctor). If we are instead interested in the process by which doctors perform procedures, the case identifiers are the doctors themselves (column doctor), the activities are the procedures those doctors perform (column activity), and the resources, note, are again the doctors (column doctor).
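Switching perspective amounts to choosing a different column as the case identifier. Here is a small Python sketch under stated assumptions: the column names patient, activity, and doctor follow the ones mentioned for Figure 2, while the rows, the integer `time` column, and the `traces_by` helper are invented for illustration.

```python
from collections import defaultdict

# Made-up rows; "time" is a simplified stand-in for a real timestamp.
rows = [
    {"patient": "P1", "activity": "examination", "doctor": "Dr. A", "time": 1},
    {"patient": "P2", "activity": "examination", "doctor": "Dr. A", "time": 2},
    {"patient": "P1", "activity": "x-ray", "doctor": "Dr. B", "time": 3},
    {"patient": "P2", "activity": "surgery", "doctor": "Dr. B", "time": 4},
]


def traces_by(rows, case_column, activity_column="activity", time_column="time"):
    """Group activities into time-ordered traces keyed by the chosen case column."""
    traces = defaultdict(list)
    for row in sorted(rows, key=lambda r: r[time_column]):
        traces[row[case_column]].append(row[activity_column])
    return dict(traces)


patient_view = traces_by(rows, case_column="patient")  # what happens to each patient
doctor_view = traces_by(rows, case_column="doctor")    # what each doctor does
```

The same four rows yield two different sets of traces, and so two different processes to analyze.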

Types of Process Mining

Process mining focuses on the relationship between business process models and event data. There are three types of such relationships, and they determine the types of analysis.

Play-Out

We start with a ready-made process model. We then simulate various execution scenarios of the process (according to the model) in order to fill an event log with the events recorded during the simulation.

Figure 3. Example Play-Out.

Figure 3 shows an example of simulating a ready-made workflow model. The model is expressed in simplified BPMN notation. The steps along one possible execution path are highlighted in red, and the log below is filled with the events in the order they were recorded along that path.

Play-Out is used to check whether developed process models produce the data (sequences of events) expected from their execution.
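As a rough illustration of Play-Out, the sketch below walks a toy model at random and records the visited activities as traces. It assumes a deliberately simplified model with sequences and exclusive choices only (no parallel branches), so it is not the BPMN model from Figure 3. The `MODEL` dictionary and the `play_out` function are hypothetical names.

```python
import random

# Hypothetical model: each activity maps to the activities that may follow it.
# An empty list marks an end activity.
MODEL = {
    "register": ["examine"],
    "examine": ["treat", "refer"],
    "treat": ["discharge"],
    "refer": ["discharge"],
    "discharge": [],
}


def play_out(model, start, rng, max_steps=100):
    """Walk the model from `start`, picking a random allowed successor each step."""
    trace = [start]
    while model[trace[-1]] and len(trace) < max_steps:
        trace.append(rng.choice(model[trace[-1]]))
    return trace


rng = random.Random(42)  # fixed seed so that runs are repeatable
event_log = [play_out(MODEL, "register", rng) for _ in range(5)]
```

Every generated trace is by construction a path the model allows, which is exactly what makes the resulting log useful for checking the model against expectations.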

Play-In

We start with ready-made data in an event log. We then derive a process model that allows the sequences of events present in the log to be executed (in other words, we train the process model on the data).

Figure 4. Example Play-In.

Figure 4 shows an example of deriving a process model from ready-made sequences of events (marked in red). If you look closely, you will notice that every sequence of events in the figure begins at step a and ends at step g or h. The resulting process model reflects exactly these observed properties, which illustrates the basic principle of deriving a model from data.

Play-In is useful when you need a formal description of processes that generate known data.
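One simple way to get a feel for Play-In is to count which activities start a trace, which end it, and which directly follow one another. The sketch below does that in Python. It is not the algorithm used to produce Figure 4, only a basic directly-follows summary; the traces are made up (they merely echo the property that each one starts at a and ends at g or h), and `discover` is a hypothetical helper.

```python
from collections import Counter

# Made-up traces; each starts at "a" and ends at "g" or "h".
traces = [
    ["a", "b", "d", "e", "g"],
    ["a", "c", "d", "e", "h"],
    ["a", "b", "d", "e", "h"],
]


def discover(traces):
    """Count start activities, end activities, and directly-follows pairs."""
    starts = Counter(t[0] for t in traces)
    ends = Counter(t[-1] for t in traces)
    follows = Counter(pair for t in traces for pair in zip(t, t[1:]))
    return starts, ends, follows


starts, ends, follows = discover(traces)
```

The keys of `follows` can be read as the arcs of a graph whose only entry point is a and whose exits are g and h. Real discovery algorithms go further and also infer choices and parallelism.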


Replay

Here we use both a process model (possibly obtained through Play-In) and data from an event log (possibly obtained through Play-Out) to replay real sequences of events on the model.

Figure 5. Replay example.

Figure 5 shows an attempt to replay an existing sequence of events on a ready-made process model. The attempt fails because the model requires step d to be completed before the transition to step e becomes enabled (studying the gateways of BPMN notation will give a more detailed understanding of why the replay fails).
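The kind of failure described here can be sketched in a few lines of Python. The model below is a hypothetical simplification, not the BPMN model from Figure 5: each activity simply lists the activities that must already have occurred in the same case, which roughly imitates a parallel join. The names `REQUIRES` and `replay` are assumptions made for this example.

```python
# Hypothetical model: each activity lists the activities that must all have
# happened earlier in the same case.
REQUIRES = {
    "a": set(),
    "b": {"a"},
    "d": {"a"},
    "e": {"b", "d"},  # e becomes enabled only after both b and d
    "g": {"e"},
}


def replay(trace, requires):
    """Return (index, activity, missing) for the first deviation, or None.

    `missing` is None when the activity is not part of the model at all.
    """
    seen = set()
    for index, activity in enumerate(trace):
        needed = requires.get(activity)
        if needed is None:
            return index, activity, None
        missing = needed - seen
        if missing:
            return index, activity, missing
        seen.add(activity)
    return None


ok = replay(["a", "b", "d", "e", "g"], REQUIRES)
failed = replay(["a", "b", "e", "g"], REQUIRES)
```

In the second call the replay stops at e because d was never observed, which mirrors the situation in the figure.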

Replay lets you find deviations between models and real processes, but it can also be used to analyze process performance: if you record event timestamps during replay, you will see where delays occur and which sections of the process paths run fast.
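The performance side of replay can be illustrated by averaging the time that passes between consecutive activities. The Python sketch below uses made-up timestamped traces, and `mean_transition_times` is a hypothetical helper. It assumes every trace already fits the model, so no conformance check is performed.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Made-up timestamped traces: case id -> [(activity, time recorded), ...]
log = {
    "case-1": [
        ("a", datetime(2024, 1, 1, 9, 0)),
        ("b", datetime(2024, 1, 1, 9, 10)),
        ("c", datetime(2024, 1, 1, 11, 10)),
    ],
    "case-2": [
        ("a", datetime(2024, 1, 2, 9, 0)),
        ("b", datetime(2024, 1, 2, 9, 20)),
        ("c", datetime(2024, 1, 2, 13, 20)),
    ],
}


def mean_transition_times(log):
    """Average elapsed time for each pair of consecutive activities."""
    durations = defaultdict(list)
    for events in log.values():
        for (src, t0), (dst, t1) in zip(events, events[1:]):
            durations[(src, dst)].append(t1 - t0)
    return {
        step: sum(values, timedelta()) / len(values)
        for step, values in durations.items()
    }


means = mean_transition_times(log)
slowest = max(means, key=means.get)  # the transition with the longest mean wait
```

Transitions with a large mean duration are candidates for bottlenecks, while those with a small one are the fast sections of the process.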


For those who want to try applying this knowledge in practice, there is a tool that will help you carry out your bold plans. ProM is a free framework that includes everything needed for process mining. A stable version of ProM is available for download for Windows and other operating systems. General information (including sample source data, guides, and exercises) can be found on the ProM Tools website.


The existing gap between the analysis of business process models and the analysis of data makes it hard to solve many interesting and complex problems of the modern world, where the value of data has long been compared to the value of oil (see Data is the new oil). Process mining aims to bridge this gap and take business process analysis to a new level.

Thank you for your attention. I strongly recommend continuing to study the topic on your own! A great place to start is the aforementioned Coursera online course, Process Mining: Data Science in Action.
