Many-to-many, OLAP, and MS SQL Server Analysis Services
- From the sandbox
Many-to-Many Relationships and MS SQL Server Analysis Services
I want to devote this post to such a simple problem as the OLAP cube processing of many-to-many relationships using MS SQL Server Analysis Services.
First, a little about how I approached this. The authorities said that it would be nice to deploy some kind of analytics on the existing database. After conducting a small Google search, it turned out that OLAP technology is perfect. And since MS SQL Server is deployed on the company's server, the Analysis Services component there is even more excellent.
Rubbing my hands joyfully, I grabbed the Microsoft tutorial that comes bundled with SQL Server. And two days later I was sure that I could do everything. But, it wasn’t here ... In the existing database, most of the relationships between the tables turned out to be many-to-many relationships, which at first did not portend any complications. But in practice, it turns out that one cannot do without additional very meaningful hand-waving, because otherwise, the cube simply gives incorrect information.
In RuNet, finding information on this topic was not an easy task. Either this is so a matter of course for everyone, or the skis do not go. However, I still found a very cool large English manual on this topic. Actually, those who perfectly understand English or want to understand the topic very well are invited not to read further, but to follow the link: http://www.sqlbi.com/articles/many2many/
Those who, like me before, does not have time to read English textbooks, read on. The post is not a translation of this manual: rather, a certain summation of its first examples + some basics from the Microsoft textbook mentioned.
Actually what is the problem. If the reader is acquainted with this post: http://habrahabr.ru/post/67272/, he knows that the standard schemes for Olap are the "Star" and "Snowflake" schemes. But what if we were surrounded by many-to-many relationships?
The easiest option is to get away from many-to-many relationships using views - this will positively affect the speed of processing requests. Unable to leave? Let's get it right.
Formulation of the problem. Online store. MS SQL Server, on it is a database with the notorious M2M connections, which looks like this: A table of purchase identifiers, a category table (food, sports, etc.) and an account table are attached to it. Let's complicate the task: now let several people use one account at once (for example, husband and wife make a home purchase), respectively, through M2M, a person table is linked. And so that it’s not sugar at all: let the person category table be linked to the person table through M2M. And we are interested in: which categories of people, which purchases they make more often, and when they do it.
A far-fetched example, but the problem is still visible: how to connect the future dimension with a fact table through two M2M connections? It's simple, we will tell SSAS where to look.
Having performed the preliminary steps for creating dimensions (Types, Dates, Categories, Persons, Accounts) (see http://habrahabr.ru/post/67272/ ), we are trying to create a cube on the Sales measure (number of rows). By default, Visual Studio will offer us only three measures (Types, Accounts, Dates) - because only they are directly related to our measure. After creating the cube, add the remaining two measures with your hands. Plus, we create two more auxiliary measures inside the cube that will be responsible for processing the M2M connection: Bridge Accounts Persons and Bridge Persons Categories (both - the number of rows in which tables are clear).
Thus, we will have the following picture:
We see a lot of gray boxes and the fact that Visual Studio has already processed one M2M relationship: between the Bridge Persons Categories auxiliary measure and the Accounts dimension. And that’s good, but not enough. If we ask our cube to give out something right now, we won’t get anything good. No, the Dates and Accounts dimensions will do everything right, but alas, with Persons and Categories.
To correct this misunderstanding, we will show SSAS where to look for information to process our requests. To do this, fill in the gray boxes on the tab “Using Measurements” as follows: click on the gray boxes-> ellipsis-> type of connection select “Many to many” -> Choose an intermediate group of measures, as it is written in the picture:
Voila! Everything works. You can see the next picture:
What is the magic? We indicated SSAS where and how to look for information about relationships. Please note: it is impossible, for example, to correctly fill in the intersection “Categories dimension and measure Sales” before you correctly fill in the “dimension Categories and measure Bridge Account Persons”. In options, only Bridge Persons Categories drop out, because Visual Studio simply does not know another way. But supporting measures are also measures. And the paths for them must be indicated in the same way as for ordinary (target) measures. As the table populates, Visual Studio gains knowledge and offers more options.
Now we’ll formulate a mnemonic rule on how to fill out the relationship table: “Between the target measure and the target dimension, select the nearest measure table to the target measure.” Thus, it turned out that for the Type and Date measurements in the second column there will be a full-fledged, interesting measure of Sales, and in the third, the auxiliary Bridge Accounts Persons. Similarly for the Categories measurement.
Is it always worth it to completely fill out the link table? Not. There is no need to process uninteresting information. M2M connections adversely affect cube performance so that, if possible, it is better to get rid of them.
For sim everything. I hope I have a useful post. At a minimum, I can say that at one time I was looking for exactly something similar, but I did not find it. I can send source codes of a database and a cube from an example.