Layered scalable architecture for an enterprise data warehouse. LSA and SAP BW. The traditional approach

    Enterprises have been automating their business processes with ERP systems for more than 40 years. Over time, as the number and depth of automated business processes have grown, data volumes have grown at a rapid pace. For companies operating in a competitive environment, analyzing this information and drawing the right conclusions from that analysis can bring commercial success: higher revenue, lower costs, greater efficiency.

    The problem is that as data volumes grow, it becomes harder to analyze the information. The main issues are low performance and, often, the lack of dedicated analysis tools in ERP systems. Staying within the current ERP architecture, it is no longer possible to analyze data in an acceptable time: everything runs slowly, or with a steady tendency to slow down. Even increasing the computing power of the ERP servers often helps only in the short term.

    Therefore, about 30 years ago, software architects started thinking about a new class of systems: the data warehouse (DW). The goals of implementing a data warehouse are typically the following:
    • Accumulation and storage of data in structures optimal for analysis;
    • Consolidation of data from several systems;
    • Reducing the computational load on the ERP systems that supply the data;
    • Giving users the ability to:
      • Build reports on DW data on their own, using convenient tools;
      • Use data mining and OLAP analysis capabilities that were not available before.

    The question immediately arose of how to organize the data in the warehouse properly, so that it would not only satisfy the requirement of “structures optimal for analysis”, but would also make it possible to keep development and maintenance costs under control. Through trial and error, data warehouse architects arrived at a layered-cake architecture: LSA, Layered Scalable Architecture. As the name implies, it is not just layered, but also scalable.

    [Figure: LSA layers between source systems and reporting tools]
    In the figure, LSA is the green block. Above it are the tools for visualizing report data, based on Excel, web browsers, or simply file formats for downloading data from the DW. Below the green LSA block are the types of source systems that provide the data.

    From the implementation point of view, LSA is a logical division of the warehouse's data structures (in terms of DBMS tables) into several levels, each of which performs a specific function. Data is copied from level to level, being transformed along the way, and ends up in structures optimal for analysis (see the goals of implementing a warehouse above). A level is not a single object, a single structure, or a single table: in 99% of cases a level contains many objects with different structures.

    As mentioned earlier, data should be copied from the source systems into the DW so that the load of analytical calculations falls on the DW, a system specially optimized for such tasks. For temporary storage of data in the format in which it arrives, LSA provides a special “loading level”. When loading to this level, no data transformations are performed; only type checks are done, for example, that no letters end up in a numeric field, that there is no “August 32”, and so on.
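    To make the idea of “type checks only” concrete, here is a minimal sketch in Python of the kind of validation that might run at the loading level. The record layout and field names are invented for illustration; this is not SAP BW code.

```python
from datetime import datetime

def check_record(record: dict) -> list[str]:
    """Return a list of type errors found in one raw record (illustrative layout)."""
    errors = []

    # A numeric field must not contain letters.
    try:
        float(record["amount"])
    except (ValueError, TypeError):
        errors.append(f"amount is not a number: {record['amount']!r}")

    # A date field must be a real calendar date -- no "August 32".
    try:
        datetime.strptime(record["posting_date"], "%Y-%m-%d")
    except (ValueError, TypeError):
        errors.append(f"posting_date is not a valid date: {record['posting_date']!r}")

    return errors

# Records with errors would typically be rejected or flagged; clean records
# are written to the loading-level table without any transformation.
raw = [
    {"amount": "199.90", "posting_date": "2018-08-31"},
    {"amount": "19x.90", "posting_date": "2018-08-32"},
]
for rec in raw:
    print(rec, "->", check_record(rec) or "OK")
```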

    Once the data at this level is saved, from the point of view of further processing the DW no longer cares where the data came from: a file, an ERP system, or another type of source. From the LSA point of view, storage at the loading level is assumed to be temporary. Therefore, usually immediately after the next load completes successfully, the data is copied “upwards”, being transformed on the way to the next level, and so on. At the loading level it is usually recommended to keep data for no more than 5-10 days and then delete it. Keeping data there longer and accumulating large volumes at this level is irrational for performance reasons: when reading, filtering on non-indexed fields of the tables at this level can take a long time.

    However, there are scenarios where data in the “as loaded” format may be needed in the DW much later than 5-10 days. For such cases, the DW provides a “corporate memory” level, not shown in the figure. In structure and field composition, each corporate memory object is identical to the corresponding loading level object, but data in corporate memory is not deleted so soon. Still, in order not to accumulate effectively “infinite” volumes there, data from this level should be archived regularly, with the possibility of fast recovery on demand.

    At the next level, called the “data warehouse level”, data is stored in the most detailed structures that consumers (user reports, other systems) may need. An important caveat: only for the consumers and requirements defined in the project! Although this is not always the case: an experienced DW architect should try to anticipate requirements unknown at the time of analysis and design and include them in the model. When the customer finally voices them, they will already (voilà!) be covered by the data model, or modifying the model will not be time-consuming.

    When loading to the data warehouse level, the following operations can be performed (the list is incomplete; a rough sketch follows the list):
    • Harmonization of codes that have the same meaning but differ between reference systems. For example, gender codes that arrive as “Ж” from one system and “F” from another need to be reduced to the same value;
    • Calculations that require the most detailed data available (for example, currency conversion at the exchange rate taken from the document line item);
    • Validity checks on characteristic values (check constraints);
    • Filtering.
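    A rough Python sketch of these operations, with a hypothetical harmonization map, exchange rates, and field names (in SAP BW this logic would live in a transformation, discussed below):

```python
# Hypothetical harmonization map: different gender codes from different
# source systems are reduced to a single value in the warehouse.
GENDER_MAP = {"F": "F", "Ж": "F", "FEMALE": "F", "M": "M", "М": "M", "MALE": "M"}

# Hypothetical exchange rates; in practice the rate would be taken from
# the document line item, as the text above suggests.
RATES_TO_EUR = {"USD": 0.9, "EUR": 1.0}

VALID_GENDERS = {"F", "M"}  # check constraint on a characteristic value


def transform_to_dw(row: dict) -> dict | None:
    """Apply harmonization, currency conversion, a validity check and filtering."""
    # Filtering: e.g. drop cancelled documents before they reach the DW level.
    if row.get("cancelled"):
        return None

    out = dict(row)

    # Code harmonization across source systems.
    out["gender"] = GENDER_MAP.get(str(row["gender"]).upper(), row["gender"])

    # Calculation at the most detailed level: convert the line-item amount to EUR.
    out["amount_eur"] = round(row["amount"] * RATES_TO_EUR[row["currency"]], 2)

    # Validity check of a characteristic value (check constraint).
    if out["gender"] not in VALID_GENDERS:
        raise ValueError(f"invalid gender code: {out['gender']!r}")

    return out


print(transform_to_dw(
    {"gender": "Ж", "amount": 100.0, "currency": "USD", "cancelled": False}
))
```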

    When setting up loading into the objects of this level, it is highly advisable to observe “isolation”: the transformation logic for loading one data warehouse level object should not depend on the data of another object at the same level. Otherwise, the load order has to be respected, and with complex transformation logic it is easy to end up in a situation where loads are mutually waiting for each other to finish.

    It is at the data warehouse level that the data should provide a “single version of the truth”. Any questions from consumers about data quality should be addressed at the data warehouse level and checked there. For this to work, the processes of loading data to this level in a timely and technically correct way, with all checks performed, become critical.

    Suppose all of this is done, and at the data warehouse level we have a single version of the truth. But the information here is stored in the most detailed structures, which usually means a large volume of data. In addition, not all required indicators and characteristics are calculated and stored at this level. Indeed, calculating all the required indicators at the level of maximum data granularity may be irrational because of the large number of unnecessary calculations, or simply impossible, since the calculation may require not only preliminary aggregation but also data from other objects at the same level. And those other objects may, for example, have a different update schedule...

    Therefore, a further level is needed, the third from the bottom, which stores subsets of data adapted to the needs of a user group and/or a group of reports. This level is called the “data mart level”.

    At this level, along with simple aggregation (for example, compaction to product group and shipment month), data from several data warehouse level objects is also merged. The objects of this level store the results of calculating indicators and characteristics.
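    As an illustration, a data mart that compacts shipments to product group and shipment month, merging in a material-to-group mapping from another object of the data warehouse level, could be sketched like this (all names and numbers are made up):

```python
from collections import defaultdict

# Detailed shipments from a data-warehouse-level object (illustrative).
shipments = [
    {"material": "M1", "ship_date": "2018-03-05", "qty": 10, "revenue": 150.0},
    {"material": "M1", "ship_date": "2018-03-20", "qty": 5,  "revenue": 80.0},
    {"material": "M2", "ship_date": "2018-03-11", "qty": 7,  "revenue": 210.0},
]

# Another DW-level object: material master data with product groups.
material_groups = {"M1": "GROUP_A", "M2": "GROUP_B"}

# Aggregate to (product group, shipment month) -- the grain of the data mart.
mart = defaultdict(lambda: {"qty": 0, "revenue": 0.0})
for row in shipments:
    key = (material_groups[row["material"]], row["ship_date"][:7])  # YYYY-MM
    mart[key]["qty"] += row["qty"]
    mart[key]["revenue"] += row["revenue"]

for (group, month), measures in sorted(mart.items()):
    print(group, month, measures)
```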

    The last level is virtual providers and reports. It is intended for combining (virtually, that is, without storing) data from various objects, both of its own level and of the data mart level. Finally, user reports are defined here, and these usually also include logic for calculating indicators.
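    The combining done at this level can be pictured as a union performed at query time, without persisting the result. A toy Python sketch of the idea (in SAP BW this role is played by objects such as the MultiProviders mentioned below; the marts here are invented):

```python
from typing import Iterable, Iterator

def virtual_union(*marts: Iterable[dict]) -> Iterator[dict]:
    """Yield rows from several underlying data marts without storing the result."""
    for mart in marts:
        yield from mart

# Illustrative marts at the same grain: actual and planned revenue.
actuals = [{"month": "2018-03", "version": "ACTUAL", "revenue": 440.0}]
plan    = [{"month": "2018-03", "version": "PLAN",   "revenue": 500.0}]

# A report defined on top of the virtual provider reads the union at runtime.
for row in virtual_union(actuals, plan):
    print(row)
```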

    The xDBMS cylinder in the diagram spans the three lower levels: it means that the data of those levels is stored in DBMS tables. The data of the top, “virtual” level, as the diagram shows, is not stored in DBMS tables but is read from them.

    And, of course, the Napoleon cake in the diagram is an excellent metaphor for layering. Just as with a Napoleon cake, where the number of layers is not set by any standard, LSA is not a dogma either. The LSA concept only suggests basing the design of the DW data model on a basic set of levels. The rest is decided by the confectioner, that is, the DW architect, together with the customer.

    SAP released a data warehouse class product back in 1998: SAP NetWeaver BW (Business Warehouse, or SAP BW), widely known in the SAP world. Since then SAP BW has been continuously improved, and new versions are released regularly. SAP BW is not a ready-made data warehouse; it is a tool for building data warehouses. The set of object types from which a data warehouse is built in SAP BW maps very well onto the LSA levels: for each level there is a recommended set of object types.
    • For the loading level, these are DataSources with PSA tables (persistent staging area);
    • For the data warehouse level, standard DSOs (DataStore Objects);
    • For the corporate memory level, write-optimized DSOs;
    • For the data mart level, InfoCubes;
    • Finally, for the last level of virtual providers and reports: MultiProviders, InfoSets, virtual InfoCubes, and the reports themselves, created in the BEx Query Designer tool.

    All these objects (except the reports and DataSources) at all these levels form the “data model”, which in SAP BW is built from so-called InfoObjects: characteristics (other names: attributes, analytics) and key figures (indicators). InfoObjects in BW are like bricks when building a house. That is the technical point of view; from the functional point of view, the characteristic InfoObjects in the SAP BW data model implement the master (reference) data.

    In SAP BW, objects of the “transformation” and “data transfer process” types are used to copy data from level to level and apply transformations along the way. To run sequences of extraction, transformation and loading steps over the data model on a regular basis, a “process chain” designer is provided.
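    Conceptually, a process chain is a scheduled sequence of steps executed strictly in order, where a failed step stops the chain. A simplified sketch of the idea in Python (not SAP BW code; the step names are invented):

```python
from typing import Callable

# Hypothetical steps of a nightly run; in SAP BW each step would be an object
# such as a data transfer process executed by the process chain.
def extract_from_source() -> None:
    print("extract: source system -> loading level")

def load_to_dw_level() -> None:
    print("transform + load: loading level -> data warehouse level")

def fill_data_marts() -> None:
    print("aggregate: data warehouse level -> data marts")

process_chain: list[Callable[[], None]] = [
    extract_from_source,
    load_to_dw_level,
    fill_data_marts,
]

for step in process_chain:
    step()  # if a step raised an exception, the remaining steps would not run
```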

    All versions of SAP BW can use the relational DBMSs of the major vendors as the primary database: Oracle, Microsoft, IBM, SAP/Sybase. “Primary” here means that all business data, metadata and system information are stored in the tables of that DBMS.
    Whichever DBMS is chosen, the same LSA rules are recommended for designing an effective data model in SAP BW.

    Starting from 2012, SAP BW supports, in addition to the DBMSs mentioned above, SAP HANA as the primary DBMS. The new capabilities of the SAP HANA DBMS and the platform of the same name made it possible to rethink LSA, so that for BW on HANA some key approaches have de facto changed, which is reflected in the new name LSA++. I will cover these changes in more detail in one of the following publications.
