6 practical tips for beginners when building a simple BI solution

    With this article, I would like to show the steps and offer some recommendations in the process of creating a BI solution using almost the entire Microsoft BI stack. In creating the BI solution, SQL Server, SQL Service Integration Services, SQL Server Analysis Services will be used.

    For example, we chose the development of our company RetailIQ - a BI-system of in-depth analysis of retail sales receipts, deliveries and stocks for a network of pharmacies. For a general understanding of the context of the topic: all data is downloaded from accounting systems (1C, M-Pharmacy, etc.), verified, added to a special database with the subsequent construction of multidimensional OLAP cubes. From sources (accounting systems) using ETL we transfer data to the storage, on the basis of which we build a cube, which will be discussed later.

    Data Mapping

    Building a data showcase is the most important step, since the correctness of the displayed data and the processing time of the SSAS cube depend on the data showcase.

    At this step, it is decided what data needs to be displayed (balances, sales, etc.) and in what sections (for example, product, date, employee, branch).

    Data sources are determined from which data can be obtained (files, web-services, other databases). What transformations need to be done in the data to maintain integrity.

    A data showcase is also being prepared - a relational database into which data from disparate sources will merge. To build a data showcase, you can use two data storage schemes: a star scheme and a snowflake . In this articleThere is a description of both schemes. Which scheme to apply in a particular case is largely dependent on the data. However, we usually use a snowflake scheme, since, in our opinion, this allows us to simplify data manipulation.

    Tip # 1 Sources for a cube should be views

    Sources for an OLAP cube, in our opinion, are best done in the form of representations, rather than directly linking to a table. This will allow you to make any changes to the source without changing the OLAP cube itself. Also, in our opinion, it is best not to make a request in the Data Source View itself, since it is problematic to make changes to the DSV of the SSAS project.

    Create ETL

    ETL is the process of transferring and transforming data according to specific business rules. For this step, we have a number of recommendations.

    Tip # 2 Line NA

    “No data” (NA) elements are added to the dimension tables. They will serve as a binding to the data for which there is no binding in the fact tables in the sources. For example, if we have sales for a product that cannot be determined, we will add NA with its key. If a dimension has a binding to other tables, then in those tables you also need to define NA elements and set a binding to these records in the NA element of the dimension.

    For example, suppose we have a Car table that has bindings to the CarType and CarMark tables. We do something like this:

    INSERT INTO CarType (ID, Name) VALUES (0, ‘NA’)
    INSERT INTO CarMark (ID, Name) VALUES (0, ‘NA’)
    INSERT INTO Car (ID, Name, CarTypeID, CarMarkID) VALUES(0, ‘NA’, 0, 0)
    


    Tip # 3 Surrogate Keys

    We recommend generating a surrogate key, even if there is a primary key in the source. It is better to write the primary key of the source in a separate cell of the measurement table with the name “Native Key”. What do we get:

    • we can determine the NA element
    • primary keys from different sources may match
    • we have the freedom to choose the format of the primary key in our database (for example, we can use Guid, even if INT is used in the sources).

    Tip # 4 Setting NA Values ​​in an ETL Package

    If the CarID and EmployeeID cell values ​​in the source of facts contain NULL, or the data that is not in our measurements, then we use the following transformation:



    In both Lookups, set the “Specify how to handle rows with no matching entries” field to “Ignore failure” . Thus, unknown keys will be NULL. In the “Set NA To Dimension” element, NULL is replaced with the NA value for each dimension.

    Tip # 5 Documenting ETLs

    Upon completion of the package, create the following .xls file, which will serve as the documentation for our ETL package.



    From this table, you can easily determine where and where the data “flows” from.

    Create a cube

    The cube is the end point of our work. To create it, we also have several tips in store.
    Date and time must be separated into different dimensions.

    Typically, OLAP cubes created using SSAS do not work very well with dimensions that contain a large number of records. In our opinion, it is best to avoid cases where the date and time are in the same dimension. Suppose we want to create a Date-Time dimension that will be accurate to the second. The records in this dimension over 10 years will be: 10 years * 365 days * 24 hours * 60min * 60 seconds = 315,360,000 ≈ 315 million records.

    Accuracy of up to a second in analytical databases is usually not required, since the tasks of the cube are different compared to the operational bases, but if you still need to add time to the cube, then it is better to do this in a separate dimension.

    Tip # 6 Creating hierarchies with the same members

    Suppose we have a task to build a hierarchy: Car type-> Car make-> Car name from a table of the form:

    CaridNameCartypeidCartypeCarmarkCarmark
    1Mercedes-Benz F 800 Style1Sport1Mercedes
    2Smart2Microcar1Mercedes


    CarID in this case will be the key to the dimension, and CarTypeID and CarMarkID are attributes of the dimension. We assume that after processing the cube we get the following:



    But, unfortunately, it just won’t succeed, the Mercedes group will belong to either Sport or Microcar (it depends on which line will be processed in the first place). This restriction can be circumvented by using the composite key for the attribute. Let's make a key for the CarMark attribute of the form CarTypeID + '_' + CarMarkID. As a result, at the entrance to the cube we get something like this table:

    CaridNameCartypeidCartypeCarmarkCarmark
    1Mercedes-Benz F 800 Style1Sport1_1Mercedes
    2Mercedes-Benz F 800 Style2Microcar1_2Mercedes


    As a result, we obtain the hierarchy necessary for us.

    We would also recommend that for each dimension attribute, to the extent possible, determine the name and key from individual sources.
    We have provided only the simplest recommendations when building an OLAP cube, which can add flexibility and extensibility to your BI solutions. We hope you find the tips useful and make the job of creating analytical solutions easier!

    Sources
    Basic information about cubes can be found in the article habrahabr.ru/post/66356 .

    The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data

    Expert Cube Development with Microsoft SQL Server 2008 Analysis Services

    www.sql.ru - :)


    Also popular now: