Business Processes: Extracting a BPMN Model from a Document. Part 1
Modern projects to optimize and automate business processes typically begin by analyzing large volumes of the Customer's documents so that the as-is business processes can be modeled from them in a short time. The documents analyzed may include regulations, industry standards, interview protocols, technical specifications and other corporate documents.
This task falls to the project analyst. It is laborious and routine at the same time, and no tools currently exist to automate it. A review of modern business process modeling tools shows that even well-known products such as Enterprise Architect, Business Studio and Bizagi Modeler provide no support for building business process models from their textual description.
This article addresses the problem of extracting a BPMN model from a document.
Note that the business process management (BPM) market already offers a process analytics technology, Process Mining. However, unlike the approach described below, a Process Mining system takes as input a database recording the execution of the business process being modeled, not a set of documents describing that process in text.
Formulation of the problem
The ideal version of the task can be pictured as a “big red button”: pressing it automatically converts the entire set of documents under analysis into a network of BPMN models of the Customer's business processes, ready for analysis, optimization and automation.
Solving the problem in this form remains a matter for the future. For a realistic pilot task, we introduce a number of logical and technical constraints.
Objective: to minimize the effort of building a business process model from its textual description while ensuring the completeness and connectedness of the model.
Input: a Microsoft Word document that:
- contains a textual description of a single internal business process (Private Business Process);
- involves a single performer (Participant) in the business process;
- describes the business process at a single level of detail (there are no sub-processes).
Output: an XML file in BPMN 2.0 format that:
- contains a business process model at the basic descriptive level (BPMN Descriptive Conformance Sub-Class);
- opens correctly for editing in Bizagi Modeler.
As a test case, we use the textual description of a widely used process, Incident Management, from the ITIL (Information Technology Infrastructure Library) standard. The test case is deliberately taken in English: English has no grammatical cases, which simplifies the processing of references (coreferences) to business process elements within the pilot task (this is discussed in more detail in the second part).
The expected output is an Incident Management model that is “no worse than” the flowchart provided in the ITIL library. By “no worse” we mean at least the same completeness and connectedness of business functions, data, decision conditions and business process participants.
Figure 1. A flowchart of the Incident Management process (ITIL v.3 Official Introduction, p.98)
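The “no worse than” criterion is stated informally. As a minimal sketch, assuming the extracted model and the ITIL flowchart are both reduced to lists of element names, and the extracted model additionally to a list of (source, target) edges, completeness can be read as coverage of the reference elements and connectedness as weak connectivity of the extracted graph. The function names and the coverage measure are illustrative assumptions, not the author's evaluation method.

```python
from collections import defaultdict, deque


def coverage(extracted_names, reference_names):
    """Share of reference elements that also occur in the extracted model (1.0 = complete)."""
    reference = set(reference_names)
    if not reference:
        return 1.0
    return len(reference & set(extracted_names)) / len(reference)


def is_weakly_connected(nodes, edges):
    """True if all nodes are mutually reachable when edge directions are ignored."""
    nodes = list(nodes)
    if not nodes:
        return True
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:  # breadth-first search from the first node
        for nxt in neighbours[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen.issuperset(nodes)


def no_worse_than(extracted_nodes, extracted_edges, reference_nodes):
    """Hypothetical criterion: full coverage of the reference and a single connected graph."""
    return (coverage(extracted_nodes, reference_nodes) == 1.0
            and is_weakly_connected(extracted_nodes, extracted_edges))


# Illustrative element names, not taken from the article's markup.
nodes = ["Incident identification", "Incident logging",
         "Incident categorization", "Service Desk"]
edges = [("Incident identification", "Incident logging"),
         ("Incident logging", "Incident categorization"),
         ("Service Desk", "Incident logging")]
print(no_worse_than(nodes, edges, ["Incident logging", "Incident categorization"]))  # prints True
```

Weak connectivity (edge directions ignored) is chosen here because Data and Participant vertices are usually attached through associations rather than sequence flows; a stricter check could require every Flow vertex to lie on a path from a start event to an end event.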
Solution concept
According to the BPMN glossary (Business Process Model and Notation, version 2.0), a business process (Process) is represented as “a graph of Flow Elements (a set of Activities, Events and Gateways) and the Sequence Flows that link them into an executable flow”.
Definition. By a BPMN graph we mean a finite directed graph (in the graph-theoretic sense) with the following extensions:
- The vertices of the graph correspond to the BPMN elements of the process (Flow, Data, Participant).
- The edges of the graph correspond to the BPMN process connections (Sequence Flow, Message Flow, Association).
- Vertices and edges have mandatory attributes: identifier (id), name (name) and comment (documentation).
- The required vertex types are elements of the Flow category (Activity, Event, Gateway).
- The required edge type is the control flow connection (Sequence Flow).
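The definition above maps naturally onto a small data structure. The sketch below is one possible Python representation, assuming vertex and edge types are plain strings and interpreting the “required” types as “the graph must contain at least one Flow vertex and at least one Sequence Flow”; all class and method names are hypothetical.

```python
from dataclasses import dataclass, field

FLOW_TYPES = {"Activity", "Event", "Gateway"}           # Flow category
VERTEX_TYPES = FLOW_TYPES | {"Data", "Participant"}
EDGE_TYPES = {"SequenceFlow", "MessageFlow", "Association"}


@dataclass
class Vertex:
    id: str
    name: str
    type: str                 # one of VERTEX_TYPES
    documentation: str = ""   # the "comment" attribute


@dataclass
class Edge:
    id: str
    name: str
    type: str                 # one of EDGE_TYPES
    source: str               # id of the source vertex (edges are directed)
    target: str               # id of the target vertex
    documentation: str = ""


@dataclass
class BpmnGraph:
    vertices: dict = field(default_factory=dict)  # id -> Vertex
    edges: dict = field(default_factory=dict)     # id -> Edge

    def add_vertex(self, vertex):
        if vertex.type not in VERTEX_TYPES:
            raise ValueError(f"unknown vertex type: {vertex.type}")
        self.vertices[vertex.id] = vertex

    def add_edge(self, edge):
        if edge.type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge.type}")
        if edge.source not in self.vertices or edge.target not in self.vertices:
            raise ValueError(f"edge {edge.id} refers to a missing vertex")
        self.edges[edge.id] = edge

    def has_required_elements(self):
        """At least one Flow vertex and at least one Sequence Flow edge."""
        has_flow = any(v.type in FLOW_TYPES for v in self.vertices.values())
        has_sequence = any(e.type == "SequenceFlow" for e in self.edges.values())
        return has_flow and has_sequence


graph = BpmnGraph()
graph.add_vertex(Vertex("Event_1", "Incident reported", "Event"))
graph.add_vertex(Vertex("Activity_1", "Incident logging", "Activity"))
graph.add_edge(Edge("Flow_1", "", "SequenceFlow", "Event_1", "Activity_1"))
print(graph.has_required_elements())  # prints True
```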
Statement 1. The natural-language description of a business process in a document contains the BPMN graph in implicit form.
Statement 2. The task of extracting a BPMN model from a document belongs to the class of problems of information extraction from semi-structured machine-readable documents (Information Extraction), whose main subtasks are entity identification (named entity recognition), relationship identification (relationship extraction) and reference resolution (coreference resolution).
Combining graph theory algorithms with information extraction, we arrive at the following solution steps:
- Markup of the document with BPMN tags (to identify process elements).
- Compilation of BPMN tags into a BPMN process model (to identify the connections between process elements).
- Verification of the BPMN model (to resolve references).
- Correction of the BPMN model (if the model does not match the textual description).
- Export of the BPMN model to an XML file (to convert the BPMN graph into a standard format).
Figure 2. Process diagram of extracting a BPMN model from a document (BPMN Text Extraction)
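The export step can be sketched with Python's standard xml.etree.ElementTree module. The sketch writes a single process with flow nodes and sequence flows in the BPMN 2.0 model namespace; the function name, the tuple layout of its arguments and the targetNamespace value are assumptions for illustration. It omits the diagram interchange (BPMN DI) section with shape coordinates, which a graphical editor such as Bizagi Modeler may need in order to lay out the diagram.

```python
import xml.etree.ElementTree as ET

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"
ET.register_namespace("bpmn", BPMN_NS)  # serialize with the usual "bpmn:" prefix


def q(tag):
    """Qualify a tag name with the BPMN 2.0 model namespace."""
    return f"{{{BPMN_NS}}}{tag}"


def export_bpmn(process_id, process_name, nodes, flows):
    """Return BPMN 2.0 XML for one process.

    nodes: (id, name, element, documentation) tuples, where element is a BPMN 2.0
           tag such as "startEvent", "task", "exclusiveGateway" or "endEvent".
    flows: (id, name, source_id, target_id) tuples for sequence flows.
    """
    definitions = ET.Element(q("definitions"), {
        "id": "Definitions_1",
        "targetNamespace": "http://example.com/bpmn",  # placeholder namespace
    })
    process = ET.SubElement(definitions, q("process"), {
        "id": process_id, "name": process_name, "isExecutable": "false"})
    for node_id, name, element, documentation in nodes:
        node = ET.SubElement(process, q(element), {"id": node_id, "name": name})
        if documentation:  # the "comment" attribute of a vertex
            ET.SubElement(node, q("documentation")).text = documentation
    for flow_id, name, source, target in flows:
        ET.SubElement(process, q("sequenceFlow"), {
            "id": flow_id, "name": name, "sourceRef": source, "targetRef": target})
    return ET.tostring(definitions, encoding="unicode")


xml_text = export_bpmn(
    "Process_1", "Incident Management",
    [("Start_1", "Incident reported", "startEvent", ""),
     ("Task_1", "Incident logging", "task", "Record the incident"),
     ("End_1", "Incident closed", "endEvent", "")],
    [("Flow_1", "", "Start_1", "Task_1"), ("Flow_2", "", "Task_1", "End_1")])
print(xml_text)
```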
Solution. Step 1: Markup of the document with BPMN tags
To mark the BPMN elements of the business process in the document, we use BPMN tags.
Definition. A BPMN tag is a colored text marker with an identifier that contains the type of the BPMN element. The name and color of a BPMN tag correspond to a specific category of BPMN element.
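Since the identifier of a BPMN tag carries the element type and is stored as a Word bookmark, one simple convention is an identifier of the form Type_number, from which the type can be read back. The sketch below assumes this convention and Word's bookmark naming rules as we understand them (a leading letter, then letters, digits or underscores, at most 40 characters); the type names, categories and colors are placeholders, since the actual values are defined in Table 1.

```python
import re
from dataclasses import dataclass

# Placeholder type -> (category, colour) mapping; the real values are in Table 1.
TAG_TYPES = {
    "BusinessProcess": ("Process", "red"),
    "Activity": ("Flow", "yellow"),
    "Event": ("Flow", "green"),
    "Gateway": ("Flow", "orange"),
    "DataObject": ("Data", "blue"),
    "Participant": ("Participant", "grey"),
}

# Word bookmark names (as assumed here): a letter, then letters, digits or
# underscores, at most 40 characters in total.
BOOKMARK_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,39}")


@dataclass(frozen=True)
class BpmnTag:
    id: str            # bookmark name that embeds the element type, e.g. "Activity_3"
    element_type: str
    category: str
    color: str
    text: str          # the marked text fragment


def make_tag(element_type, number, text):
    """Create a tag whose identifier is <element_type>_<number>."""
    category, color = TAG_TYPES[element_type]  # KeyError for an unknown type
    tag_id = f"{element_type}_{number}"
    if not BOOKMARK_NAME.fullmatch(tag_id):
        raise ValueError(f"not a valid bookmark name: {tag_id}")
    return BpmnTag(tag_id, element_type, category, color, text)


def element_type_of(tag_id):
    """Recover the element type from an identifier such as "Gateway_2"."""
    return tag_id.rsplit("_", 1)[0]


tag = make_tag("Activity", 3, "Incident logging")
print(tag.id, tag.category, element_type_of(tag.id))  # prints Activity_3 Flow Activity
```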
Table 1 below lists the colors, categories and types of BPMN tags, together with recommendations for marking up the document (defining exact rules for identifying BPMN elements is a task for the next stage of the project).
Table 1. Description of BPMN tags.
The general principle for working with BPMN tags is to select the text fragment containing a BPMN element and press the button of the corresponding BPMN tag.
For example, to mark the business process, select “INCIDENT MANAGEMENT” and click the <Business Process> button. The background of the selected BPMN element is filled with the color of the chosen BPMN tag, and a bookmark with the BPMN tag identifier is added to the document's bookmarks.
Figure 3. The menu bar of the BPMN tab (a group of BPMN tags, Edit tags)
The following are the main operations on BPMN tags:
- Add (BPMN Tag): adds a new BPMN tag to the document's bookmarks (Word Bookmarks) and highlights the selected text in the corresponding color.
- Show/Hide (Show Tags): shows or hides BPMN tags in the document text.
- Resize: changes the text range marked by a BPMN tag.
- Delete: removes a BPMN tag (bookmark and marker) from the document.
- Details: shows detailed information about a BPMN tag (identifier, category, type and text).
- Report: shows statistics on the number and types of BPMN tags in the active document.
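The operations listed above can be modeled outside Word to make their semantics explicit. In this hypothetical sketch, a tag is a character range in the document text (standing in for a Word bookmark), and Report counts tags per category and type; the class and method names and the category strings are illustrative only, and no Word automation API is used.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Tag:
    id: str
    category: str
    element_type: str
    start: int  # character offsets of the marked range (stand-in for a bookmark)
    end: int


class TagStore:
    """Hypothetical in-memory model of BPMN tags kept as Word bookmarks."""

    def __init__(self, text):
        self.text = text
        self.tags = {}
        self.visible = True

    def _check_range(self, start, end):
        if not 0 <= start < end <= len(self.text):
            raise ValueError("selection is outside the document")

    def add(self, tag_id, category, element_type, start, end):  # Add
        self._check_range(start, end)
        if tag_id in self.tags:
            raise ValueError(f"duplicate tag id: {tag_id}")
        self.tags[tag_id] = Tag(tag_id, category, element_type, start, end)

    def toggle_visibility(self):  # Show / Hide
        self.visible = not self.visible

    def resize(self, tag_id, start, end):  # Resize
        self._check_range(start, end)
        tag = self.tags[tag_id]
        tag.start, tag.end = start, end

    def delete(self, tag_id):  # Delete: drops bookmark and marker together
        del self.tags[tag_id]

    def details(self, tag_id):  # Details
        tag = self.tags[tag_id]
        return {"id": tag.id, "category": tag.category,
                "type": tag.element_type, "text": self.text[tag.start:tag.end]}

    def report(self):  # Report: number of tags per (category, type)
        return Counter((t.category, t.element_type) for t in self.tags.values())


store = TagStore("INCIDENT MANAGEMENT. The Service Desk logs the incident.")
store.add("BusinessProcess_1", "Process", "BusinessProcess", 0, 19)
store.add("Participant_1", "Participant", "Participant", 25, 37)
print(store.details("Participant_1")["text"])  # prints Service Desk
print(store.report())
```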
Marking up the test document gives the following result.
Figure 4. BPMN markup of the textual description of the Incident Management process
Note that the text contains “duplicate” BPMN tags with the same text and color (for example, Service Desk, Problem Management, Incident Record); these are references to one and the same process element. Processing of such references (coreferences) will be covered in the second step of the solution.
To be continued…