Buy a ready-made MDM system or build your own?

    I have already written about what an MDM system is and why it is needed. Now I want to address the choice that sooner or later confronts everyone thinking about managing master data: buy a ready-made MDM system, or develop one in-house?

    There is no universal recipe, and everyone must decide for themselves which path to take. To make the right decision, you need to define a set of requirements for the MDM system and then honestly evaluate your own capabilities and functional needs.

    So I will begin by describing the typical functionality that a modern MDM system should provide.

    Master data lifecycle management


    The key functionality of an MDM system is the ability to manage master data throughout its life cycle: from the moment it is defined to the moment it is retired.

    To do this, the MDM system must support the following features:

    • Creating a data model. Creating a data model means defining master data objects, their model structures, and their attributes. It is fundamentally important to be able to create and modify the data model flexibly throughout the life cycle of a master data object. In day-to-day work, situations regularly arise where you need to quickly add a missing attribute or change the existing schema of a master data object's model. This must be possible quickly, in user mode, without reprogramming or stopping the system.
    • Use of universal data warehouses. Unlike, say, ERP systems, an MDM system stores data in special formats that allow it to be kept in several DBMSs simultaneously. This provides quick access to the data in various scenarios and enables horizontal scaling and clustering of the data warehouses. A typical approach is to keep different information domains in different data warehouses.
    • In-memory caches of "hot" data with active replacement. To ensure fast access, the most frequently requested data is loaded into caches in RAM. Special mechanisms monitor changes in request activity and, using forecasting tools, keep the cached data up to date.
    • Managing groupings and hierarchies of master data objects. Combining master data objects into groups or hierarchies is used to solve many applied problems, for example building a hierarchy of the organizations that make up a holding, or grouping goods by some characteristic.
    • Creating and managing relationships. Relationships exist between master data objects both within a single domain and across domains. For example, several types of relationships can be established between individuals and organizations: an individual can work for an organization, be its client, be its supplier, and so on.
    • Versioning and storing change history. It is very important to store historical information not only about the master data objects themselves and the attributes of their models, but also about the structure of the models, relationships with other objects, hierarchies, groupings, and so on. For example, to make a decision it may be important to know that a particular individual used to be an employee of a particular organization. Ideally, the history should make it possible to roll the data back to any selected recovery point.
    • Maintaining taxonomies. Various taxonomies can be defined for master data objects. For example, one or more classifiers can be specified for material and technical resources: one grouping items by product category from the buyer's point of view, another grouping them by supplier. The set of attributes in the model of a particular master data element may depend on the assigned taxonomy.
    • Data security. The MDM system should have tools for configuring and enforcing access control both at the record level and at the attribute level.
    • Conducting an audit. It should be possible to trace the entire history of changes to data and models: by whom and when each change was made.
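    To illustrate the versioning and audit requirements above, here is a minimal, hypothetical sketch of a master data object whose attribute set can be extended at runtime and whose every change is stored as a new version with an audit trail. All class and field names here are my own invention for the demo, not taken from any product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Version:
    attributes: dict          # attribute name -> value at this point in time
    changed_by: str           # audit trail: who made the change
    changed_at: datetime      # audit trail: when it was made

class MasterDataObject:
    def __init__(self, object_id, attributes, changed_by):
        self.object_id = object_id
        self.versions = [Version(dict(attributes), changed_by,
                                 datetime.now(timezone.utc))]

    @property
    def current(self):
        return self.versions[-1].attributes

    def update(self, changes, changed_by):
        # A new version copies the previous state and applies the changes;
        # adding a previously unknown attribute needs no schema migration.
        new_state = dict(self.current)
        new_state.update(changes)
        self.versions.append(Version(new_state, changed_by,
                                     datetime.now(timezone.utc)))

    def rollback(self, version_index, changed_by):
        # Restore an earlier state by appending it as a new version, so the
        # audit history itself is never rewritten.
        old = dict(self.versions[version_index].attributes)
        self.versions.append(Version(old, changed_by,
                                     datetime.now(timezone.utc)))

org = MasterDataObject("ORG-1", {"name": "Acme Ltd"}, changed_by="steward1")
org.update({"inn": "7701234567"}, changed_by="steward2")  # new attribute, no downtime
org.rollback(0, changed_by="steward1")                    # recovery point = version 0
```

    A real MDM system would, of course, also version the model structure, relationships, and hierarchies, but the principle is the same: state is never overwritten in place.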

    Quality control


    Poor-quality data nullifies the entire effect of centralizing data and managing it centrally.

    To manage data quality in the system, the following mechanisms and tools must be present:

    • Data analysis and profiling. Before performing any manipulations on data, you need to study it. Automatic analysis and profiling mechanisms make it possible to roughly assess data quality, identify errors, and build a strategy for processing them. For analysis without reference to any subject area, statistical methods are most often used. Such analysis reveals the presence and depth of completeness problems (omissions), "suspicious" records (extreme values and outliers in one of the attributes, records that did not fall into any cluster), and attributes unsuitable for machine learning methods without prior preparation (omissions, outliers, extreme values, low frequency of some unique values, and so on). If the analysis is conducted with immersion in the subject area and the analyzed domains, then for each attribute you should also consider its data kind (ordered or categorical) and type (continuous or discrete). Having determined these, you can meaningfully interpret the calculated statistical indicators, profile the available data, determine ways of correcting values, and prepare the data for modern modeling methods.
    • Validation, standardization, cleansing, and enrichment of data. The simplest mechanisms here include converting values to a single format (for example, phone numbers), deleting or replacing stray characters from a "foreign" alphabet, removing extra spaces, expanding abbreviations and acronyms using a dictionary, correcting obvious typos, and so on. More sophisticated mechanisms based on business rules can also be applied, using external databases (for example, address databases or registries of legal entities) for cleansing and enrichment.
    • Identifying duplicate master data entities. This is one of the key features of the system. There should be both deduplication mechanisms based on explicit business rules for structured data (often used in the Clients domain) and various complex semantic mechanisms with self-learning capabilities for weakly structured and unstructured data (often used in the Nomenclature domain).
    • Support for the work of data stewards (experts) engaged in semi-automatic or manual data processing. There should be workplaces where it is convenient to perform the manipulations that could not be completed automatically at earlier stages, or stages for which a decision maker is responsible. The work of data stewards may include editing attributes that do not lend themselves to automatic processing, confirming duplicates, selecting the "surviving" element or attribute, and so on.
    • Evaluation of data quality changes over time. This means the ability to create specialized data quality KPIs and track them over time. These indicators can also underpin a motivation policy for the company's master data management team.
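    The standardization and deduplication points above can be sketched in miniature. This is a deliberately toy example: real MDM systems use configurable business rules and trained matchers, and the function names `normalize_phone` and `find_duplicates` are illustrative, not from any product. The Russian-style phone handling is an assumption for the demo.

```python
import re
from difflib import SequenceMatcher

def normalize_phone(raw):
    """Bring phone numbers to a single digits-only format."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("8"):
        digits = "7" + digits[1:]   # assumption: 8-prefixed Russian numbers
    return "+" + digits

def find_duplicates(records, threshold=0.85):
    """Pairwise fuzzy comparison of names; O(n^2), fine only for a demo."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = SequenceMatcher(None, records[i]["name"].lower(),
                                    records[j]["name"].lower()).ratio()
            if score >= threshold:
                pairs.append((i, j, round(score, 2)))
    return pairs

clients = [
    {"name": "Acme Trading Ltd",  "phone": normalize_phone("8 (495) 123-45-67")},
    {"name": "ACME Trading Ltd.", "phone": normalize_phone("+7 495 123 45 67")},
    {"name": "Globex Corp",       "phone": normalize_phone("+1 202 555 0143")},
]
# The first two records normalize to the same phone and score high on name
# similarity, so they would be flagged for a data steward to confirm.
```

    In a production system the quadratic comparison would be replaced by blocking or indexing, and the final "survivor" selection would stay with a data steward, as described above.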

    Integration and synchronization of information


    The task of integrating and synchronizing information between the MDM system and the application systems consuming master data is one of the most important. Data must stay synchronized between all participants in the interaction. Often this function is provided not by the MDM system itself but by a specialized ESB or MQ system. Ideally, the ESB should be built on a technology platform unified with the MDM system, as this ensures the tightest integration between them.

    To build interaction mechanisms, the following capabilities must be present:

    • Receiving data, or changes to data, from application systems in synchronous and asynchronous modes.
    • Distributing master data from the MDM system to application systems in synchronous and asynchronous modes.
    • Transferring various events from the MDM system to application systems, for example that certain data is out of date, or that we no longer work with a given client and the data about them needs to be deleted or archived.
    • Correcting synchronization errors: tracking data that was sent but not received, resending, resolving conflicts over the relevance of transmitted data, and so on.
    • Real-time interaction, which is important for the effective functioning of end-to-end business processes. This is especially important with the operational style of using MDM (Operational MDM), when application systems can call MDM services as part of a single business transaction.
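    The asynchronous distribution and resend-on-error points above are the kind of job usually delegated to an ESB/MQ layer. A minimal, hypothetical sketch (the `Event` shape, subscriber names, and retry limit are all assumptions for the demo):

```python
import queue

class Event:
    def __init__(self, event_type, payload):
        self.event_type = event_type   # e.g. "client.updated", "client.archived"
        self.payload = payload
        self.attempts = 0

class Distributor:
    MAX_ATTEMPTS = 3

    def __init__(self, subscribers):
        self.subscribers = subscribers  # name -> delivery callable that may fail
        self.outbox = queue.Queue()
        self.dead_letter = []           # events a data steward must look at

    def publish(self, event):
        self.outbox.put(event)

    def drain(self):
        # Push every queued event to every subscriber; on failure, re-queue
        # the event until MAX_ATTEMPTS, then park it in the dead-letter list.
        while not self.outbox.empty():
            event = self.outbox.get()
            event.attempts += 1
            try:
                for deliver in self.subscribers.values():
                    deliver(event)
            except Exception:
                if event.attempts < self.MAX_ATTEMPTS:
                    self.outbox.put(event)
                else:
                    self.dead_letter.append(event)

received = []
flaky = {"calls": 0}
def crm(event):
    received.append(("crm", event.event_type))
def billing(event):
    flaky["calls"] += 1
    if flaky["calls"] == 1:
        raise ConnectionError("billing temporarily down")
    received.append(("billing", event.event_type))

d = Distributor({"crm": crm, "billing": billing})
d.publish(Event("client.archived", {"id": 42}))
d.drain()   # first pass fails at billing, the resend succeeds
```

    Note that the retry redelivers the event to the CRM subscriber as well: this at-least-once behavior is typical of real message buses, which is why subscriber systems must handle master data events idempotently.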

    This list does not claim to be complete: MDM systems have many other functions. I have simply listed the most important ones, without which most full-fledged MDM implementations cannot do.

    And yet, write or buy?


    If you are inclined to implement an MDM system on your own, evaluate which of the functions above you will need not only now but also in the future. Companies that take this path often start by implementing a kind of central database into which largely finalized master data objects are placed (manually or via batch loading) and from which master data is then distributed to subscriber application systems. This approach is often called "centralizing reference data entry": it provides a single point of entry for reference information, essentially just simplifying its input. In most cases, in-house development projects end there, and the functionality develops no further, because the other capabilities of MDM systems are much harder to implement. Strictly speaking, such results cannot be considered full-fledged master data management. Nevertheless, for some companies with few master data entities and low data quality requirements, this is enough.

    If you do not want to be limited to simple "centralization of reference data entry", then in all likelihood you will end up implementing a ready-made MDM system. Here you face the difficult choice of which MDM system suits you best.

    You should analyze a system's functionality only after you have a reasonably clear idea of the tasks you want to solve with MDM: which data domains you have, and which method of use and implementation style you are choosing (I have written about this in more detail earlier). Only after defining the tasks can you evaluate the functionality of an MDM system, since not all MDM systems are equally good across the variety of domains, usage methods, and implementation styles.

    In addition to the functions listed above, pay particular attention to the following aspects of MDM systems:

    • Domain support. Historically, many MDM systems developed architectures that support a single domain, such as Clients. Such systems often support other domains poorly and do not specialize in them. For example, the principles of working with data in the Clients domain and data in the Product domain are very different. It is therefore categorically insufficient to analyze a system's functionality using a single domain as an example; you need to look at all of them.
    • If you plan to implement the Collaborative method of use, pay attention to how convenient it is to configure business processes and user roles. Ideally this should be possible without programming, in parametric mode, since processes and regulations change frequently.
    • If you plan to implement the Operational method of use, with maximum automation of data processing and minimal involvement of data stewards, pay attention to the availability of automatic processing mechanisms, the means of configuring the sequence in which they are applied, and the availability of fast methods for transferring data between source systems and MDM.

    It is also critical to pay attention to the performance and fault tolerance of the system being offered to you. Without these two properties, any MDM functionality is useless.

    Here are some points to be sure to check:

    1. Ask a potential MDM vendor to model your largest master data object and load that data into the MDM system. Measure the data loading speed.
    2. Perform various searches over the loaded data: search by basic attributes, search by additional attributes, fuzzy search with different algorithms, full-text search. Measure the search speed. This is a very important baseline: the speed and quality of search determine the speed of many other system functions. If the system is slow at this stage, it will only get worse.
    3. Change the model of some master data object or one of its attributes. Measure the speed of restructuring the information and the speed of rollback in case of an unforeseen situation.
    4. Measure the system's response time to standard queries in the usage mode planned for your company. For example, many MDM systems work satisfactorily in the Transactional Hub mode, when all data is entered directly into MDM and then distributed to subscriber systems, but their performance is insufficient in the Coexistence Hub mode, when systems must interact very quickly in both directions in real time.
    5. Analyze which integration mechanisms the MDM system supports and how well they match the systems it is supposed to interact with. Check how easily and quickly new subscriber systems can be connected. Also important is the ability to change the logic and routes for receiving and distributing data without deep modification of all the systems and with minimal downtime.
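    The search check in point 2 can be rehearsed even before involving a vendor. A rough, hypothetical harness over synthetic data (the sizes and names are made up; the point is only how to collect comparable timings, for example indexed exact lookup versus a naive linear scan):

```python
import random
import string
import time

random.seed(1)  # deterministic synthetic master data
records = ["".join(random.choices(string.ascii_lowercase, k=12))
           for _ in range(200_000)]
index = {name: i for i, name in enumerate(records)}   # exact-match index

def timed(label, fn):
    """Run fn once and print how long it took."""
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.4f}s")
    return result

target = records[150_000]
timed("indexed exact search", lambda: index[target])
timed("linear substring scan", lambda: [r for r in records if target[:6] in r])
```

    Against a real MDM system you would replace the two lambdas with calls to its search API and repeat the runs at realistic data volumes, since a system that is fast at 200 thousand records may degrade badly at 20 million.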

    In any case, choosing a path is a creative process, and it is impossible to anticipate every question that will arise, but I have tried to cover the main ones that seem important to me.

    Maxim Vlasov, Development Director
