Control of mobile communication costs within the organization: implementation
I would like to tell you some technical details of the creation of a mobile communications cost control system, the concept of which was described in a previous post.
As an example, we will consider an account for one of the months from MTS and create a simple OLAP project.
Judging by the responses, this topic is quite relevant, so who cares, please, under cat.
Information on expenses we take from the account file, or rather from the summary part for each number:
From the file we extract:
- phone number
- ICCID - id sim card
- invoice period
- summarized data on communication services with amounts
Also from the “header” of the account we extract:
- personal account
- provider name
We will skip the description of the parser to extract all this data, due to the fact that in each region and each provider the format of accounts will be different and there simply cannot be universal software.
You can pay attention to some features, namely:
- to check the parser is correct, it is necessary to build a comparison of the total amount for each number and the amounts of components by type of expense
- ICCID is the only unique id for each sim. Voice numbers may change over time, but the ICCID remains constant. The most correct way to build relationships is with this parameter
- personal account - if possible, “hang” the SIM cards used in different components of your system to different l / s (employees, m2m devices, communication for servers, etc.), so it will be most convenient to track expenses
We import the data into the sql table in accordance with the above components.
Raw data data looks like this:
If you select all the unique values by type of traffic, then we will get a list of several hundred lines, which in no way can contribute to the analyst’s comfortable work.
Below is only a small part of the data. You can see, for example, that inside network roaming is indicated for each city separately.
It is necessary to bring data on the type of connection to standard 10-15 values.
You can do this through the case statement, having analyzed all possible variants of the same value, for example like this:
CASE
WHEN rom.SERVICEDESCRIPTION LIKE '%роуминг%sms%' THEN 'роуминг смс'
WHEN (rom.SERVICEDESCRIPTION LIKE '%sms%' AND rom.SERVICEDESCRIPTION NOT LIKE '%роуминг%') THEN 'смс'
WHEN (((rom.SERVICEDESCRIPTION LIKE '%gprs%' OR rom.SERVICEDESCRIPTION LIKE '%мобильный интернет%') and rom.SERVICEDESCRIPTION not LIKE '%hsdpa%') AND rom.SERVICEDESCRIPTION NOT LIKE '%роуминг%') THEN 'gprs'
WHEN ((rom.SERVICEDESCRIPTION LIKE '%роуминг%gprs%' OR rom.SERVICEDESCRIPTION LIKE '%роуминг%мобильный интернет%') and rom.SERVICEDESCRIPTION not LIKE '%роуминг%hsdpa%') THEN 'роуминг gprs'
WHEN ((rom.SERVICEDESCRIPTION LIKE '%hsdpa%') AND rom.SERVICEDESCRIPTION NOT LIKE '%роуминг%') THEN 'hsdpa'
WHEN (rom.SERVICEDESCRIPTION LIKE '%роуминг%hsdpa%') THEN 'роуминг hsdpa'
WHEN ((rom.SERVICEDESCRIPTION LIKE '%исходящая связь%' OR rom.SERVICEDESCRIPTION LIKE '%исходящие%вызовы%' OR rom.SERVICEDESCRIPTION LIKE '%исх. связь%' OR rom.SERVICEDESCRIPTION LIKE '%Передача данных, факсов, WAP и Интернет%') AND rom.SERVICEDESCRIPTION NOT LIKE '%роуминг%') THEN 'исходящая связь'
WHEN (rom.SERVICEDESCRIPTION LIKE '%роуминг%исходящая связь%' OR rom.SERVICEDESCRIPTION LIKE '%роуминг%исходящие%вызовы%' OR rom.SERVICEDESCRIPTION LIKE '%роуминг%исх. связь%' OR rom.SERVICEDESCRIPTION LIKE '%за предел.Дом%') THEN 'роуминг исходящая связь'
WHEN rom.SERVICEDESCRIPTION LIKE '%роуминг%входящая связь%' THEN 'роуминг входящая связь'
WHEN ((rom.SERVICEDESCRIPTION LIKE '%входящая связь%') AND rom.SERVICEDESCRIPTION NOT LIKE '%роуминг%') THEN 'входящая связь'
WHEN rom.SERVICEDESCRIPTION LIKE '%абонентская плата%' THEN 'абонентская плата'
WHEN rom.SERVICEDESCRIPTION LIKE '%детализация%' THEN 'детализация' ELSE 'остальные' END
As a result, we get the standard values:
Next, we need to convert text values by the amount of traffic to numeric values
Using the same case statement
CASE
WHEN rom.serviceqtydescription LIKE '0.00 Мб' THEN '0'
WHEN rom.serviceqtydescription LIKE '%.%.%' THEN '1'
WHEN rom.serviceqtydescription LIKE ' ' OR rom.serviceqtydescription LIKE '' THEN '0'
WHEN rom.serviceqtydescription LIKE '%минута' THEN CAST(replace(rom.serviceqtydescription, ' минута', '') AS numeric(20, 2))
WHEN rom.serviceqtydescription LIKE '%мин.' THEN CAST(replace(rom.serviceqtydescription, ' мин.', '') AS numeric(20, 2))
WHEN rom.serviceqtydescription LIKE '%мин. 0 сек.' THEN CAST(replace(rom.serviceqtydescription, ' мин. 0 сек.', '') AS numeric(20, 2))
WHEN rom.serviceqtydescription LIKE '%секунда' THEN CAST(replace(rom.serviceqtydescription, ' секунда', '') AS numeric(20, 2)) / 60
WHEN rom.serviceqtydescription LIKE '%килобайт' THEN CAST(replace(rom.serviceqtydescription, ' килобайт', '') AS numeric(20, 2))
WHEN rom.serviceqtydescription LIKE '%байт' THEN CAST(replace(replace(rom.serviceqtydescription, ' байт', ''),',','.') AS numeric(20, 2)) / 1024
WHEN (rom.serviceqtydescription LIKE '%.% Мб' AND rom.serviceqtydescription NOT LIKE '0.0 Мб') THEN CAST(replace(rom.serviceqtydescription, ' Мб', '') AS numeric(20, 2)) * 1024
WHEN rom.serviceqtydescription LIKE '%штука' THEN CAST(replace(rom.serviceqtydescription, ' штука', '') AS numeric(20, 2))
when rom.serviceqtydescription = 'не указано' then 0
ELSE CAST(replace(rom.serviceqtydescription, ' факт', '') AS numeric(20, 2)) END AS
We get: SMS in pieces, voice in minutes, data transfer in kilobytes. The
final table looks like this.
Now the data on the amounts and traffic are amenable to arithmetic operations.
In this form, the data is already quite amenable to analysis either from the managment studio window, or through the Excel interface, connecting the database as an external data source.
As you understand, we made a fact table for the future OLAP Cuba.
To create a simple cube, you need to select the dimensions from the fact table:
- period
- personal account
- telephone
- provider
- type of traffic
It is very desirable to display the data in separate tables - directories with a separate primary key of the int format and replace the values in the fact table with the corresponding values from the directories. The values in the directories must be unique.
All of the above does not apply to the time reference; it is generated separately on the server.
We start creating an OLAP project
In BIDS, you need to create a new data mining project.
Create a new data source named DWH.
Create a data source view where we add our tables.
In this example, there is no providers table, but it is added in the same way. And also I did not follow the recommendations with the keys for the int format field - it will work, but with large amounts of data it will affect performance.
I will not describe the creation of a time measurement, the network has enough information about this.
The next step is to create a cube, for this we launch the creation wizard.
On the first screen, select - use existing tables.
Next, select our measurement table and follow the wizard
After creating the cube, the measurements will be automatically added. The
cube will take the following form:
Also, you need to double-click the time measurement and drag the necessary time components into the attribute section:
And also, if necessary, create hierarchies
You can start deploying the cube on the server, for this click on the project and we start processing.
After processing, you can connect to the cube via Excel, this is done as follows:
Data - from other sources - from analytics services, in the wizard we enter the name of the server on which we deployed the cube.
The described solution is not perfect, experts in the field of sql and BI will point out many shortcomings, but I hope that this description will allow people who are new to building business intelligence systems to start doing this.