How to display 350 million rows from a database on a web form
The title of this article is a brief summary of what the customer requested. I was on vacation at that time, and my manager, without going into details, gave the green light to implementation. Having a good rest and going to work, I felt myself in the shoes of a hedgehog who fell into the fog in one famous cartoon. And the situation demanded the Wizard of the Emerald City.
So, everything in order:
The customer was the department for dealing with problem debt. To successfully solve their problems they needed
Customers saw the solution without fail in the form of a Web form, this was due to the fact that the department was the main one, and its subdivisions were located in different cities of our vast country and all employees should have seen the same picture, delimited by access rights.
The task was slightly facilitated by the fact that a daily review of the loan portfolio was needed only in the last three months. Later information should have been on the first day of each past month.
In addition to this condition, there was another problem: a loan was understood to mean all types of lending, including credit cards, which were stored separately and differed by measurement.
At the time of implementation, there was a repository on the Oracle database, in which all information on cards and loans was stored, in the form in which it came from operational sources. Information was received with a delay of 1 day. On average, 3 million new entries appeared every day, some of which were loans, part of the card.
To solve this problem, the following algorithm was chosen:
In addition to the implementation of the algorithm, it was necessary to minimize or eliminate all the risks and obstacles that could negate the operation of this system.
The implementation itself was based on the knowledge and technology that I owned at that time and the company's software policy. Therefore, if suddenly in the implementation it seems to you that I have inflated a ball in the shape of a kitten, then I ask you to be lenient, not everything depended on me.
The system was deployed on Windows Server 2008 64-bit, the Web form on IIS, ASP.NET technology.
A little about the technical subtleties and tricks that were applied in the development of this system.
In this article, I wanted to show how you can solve problems associated with large amounts of data using available and long-existing technologies. Before implementing OLAP, my leadership, in search of a “golden bullet”, considered a lot of boxed solutions, both from large manufacturers and from small companies. It often turned out to check that the support of solutions out of the boxes required a lot of effort and financial cost. In some cases, the boxed product simply could not do what they wanted from it because of its architecture. Perhaps the company where I work was looking not in the right place and there is such a “golden” or at least a “silver” bullet. But what was, was. On the other hand, this article may stop someone from inventing “bicycles” in the form of building additional aggregated tables, some complex architectures and bulky sites.
But how many people have so many opinions and on this score I like the fictional story of the invention of the ballpoint pen:
NASA spent millions of dollars on the invention of a pen that can be written in zero gravity.
The Russians simply wrote in pencil.
But where is the pencil now and where is the pen. Therefore, any solution, even a very complex and cumbersome one, can find its user, just like a simple one.
Thanks to everyone who was able to read everything I wrote above!
So, everything in order:
Staging
The customer was the department for dealing with problem debt. To successfully solve their problems they needed
- Track the history of a problem loan: watch when it went into arrears and how it was repaid. Also check to see if the borrower who has gone on delay has other loans.
- Receive information on the loan portfolio, track trends and changes in the portfolio for any arbitrary period.
- The ability to “fail” from aggregated indicators such as “Date of issue of a loan”, “Region of issue”, “Overdue loans”, etc. to directly to the loan agreements themselves.
- Track the work of credit managers: check how successful loans they give out.
Customers saw the solution without fail in the form of a Web form, this was due to the fact that the department was the main one, and its subdivisions were located in different cities of our vast country and all employees should have seen the same picture, delimited by access rights.
The task was slightly facilitated by the fact that a daily review of the loan portfolio was needed only in the last three months. Later information should have been on the first day of each past month.
In addition to this condition, there was another problem: a loan was understood to mean all types of lending, including credit cards, which were stored separately and differed by measurement.
Decision
At the time of implementation, there was a repository on the Oracle database, in which all information on cards and loans was stored, in the form in which it came from operational sources. Information was received with a delay of 1 day. On average, 3 million new entries appeared every day, some of which were loans, part of the card.
To solve this problem, the following algorithm was chosen:
- A data showcase was built that combined loans and cards. The showcase was built according to the “Star” scheme.
- Daily, with the help of the task scheduler, yesterday “poured” it.
- Ball created OLAP cube for calculating aggregates.
- At the end of the showcase update, the OLAP cube update was launched.
- OLAP submitted data to the web form through the OLAP Viewer.
In addition to the implementation of the algorithm, it was necessary to minimize or eliminate all the risks and obstacles that could negate the operation of this system.
Preparatory work:
- A common key was created in the window for combining loans and cards, which uniquely identified both the credit and the card. The key was stored in a table, each line of which uniquely identified either a credit or a card.
- To the daily fill was added a mechanism for tracking changes in the operational tables of not only yesterday's date, but also all data for an arbitrary interval in the past, starting from yesterday. This was done in case of any changes or errors in the Yesterday database. A situation where information on loans and cards is in the database with a certain delay occurred and we had to deal with it. Checked the last 15 days from yesterday.
- The cube was created for the capabilities of the Web application. There were about 3 million loans and cards in total, and displaying them in a browser all at once was either very difficult or impossible. To solve this problem, in addition to the standard separation of loans and cards by opening date, region of reference, issued by the manager, etc. additional dimensions were invented. In the described case, gradations of loans were added by the Amount, by the Amount of the delay, by the number of days of the delay, by the sign - whether it was at least once in the delay. Also, according to the date of opening and closing the loan, hierarchies of the year-quarter-month-day type were built. As a result, all loans were grouped, the largest of which was no more than 10,000 lines. The customer was only interested in the delay, therefore, for loans that were repaid on time, no additional measurements were invented. Access to them was limited by Viewer, and could be open at any time. In the current version, the user saw on them only information aggregated by standard measurements. But some of these loans could still be seen - these were the loans of the debtor, from whom only one loan was overdue, but several were taken.
- The cube update was supposed to begin immediately after the window was updated. For this, a service was created that checked the status of the Oracle task scheduler. A utility was also written for him, with the help of which it was possible to urgently start / interrupt the update of the cube or to monitor the current status.
- It was necessary to find an OLAP Viewer that could provide all the required functionality, normal appearance and flexibility of settings. This task was not easy: many of them are not very user friendly, and working with them is not very convenient. Even with the one that was found, I had to suffer.
The implementation itself was based on the knowledge and technology that I owned at that time and the company's software policy. Therefore, if suddenly in the implementation it seems to you that I have inflated a ball in the shape of a kitten, then I ask you to be lenient, not everything depended on me.
Implementation:
The system was deployed on Windows Server 2008 64-bit, the Web form on IIS, ASP.NET technology.
- Data Showcase - Oracle Database
- Task Scheduler - Oracle Database
- OLAP Cube - Microsoft Analysis Services
- Track storefront updates and launch updates for a cube - Windows Service
- OLAP Viewer - DevExpres AspxPivotGrid
A little about the technical subtleties and tricks that were applied in the development of this system.
Technical Tricks:
- 1. For the correct operation of 32-bit and 64-bit Oracle providers under Windows:
- Download 32-bit and 64-bit Oracle drivers from the official site.
- Put in a separate folder Client32 Runtime.
- To restart a computer.
- Put in a separate folder Client64 Runtime.
- In the Windows registry, in the KEY_LOCAL_MACHINE \ SOFTWARE \ Wow6432Node \ ORACLE section, add the value TNS_ADMIN and set the path to the folder for Clinet32 - ... \ network \ admin
- To restart a computer
- When delimiting access in Analysis Services cube, on the Cubes tab, you need to set Read, then select the dimension values in the cube, select those to which the role has access, and, switching to the Advanced tab, enable the Enable Visual Totals checkbox, it is disabled by default. If it is not included, then the totals for this dimension will display the total not only for those dimensions to which the role had access, but also those measurement values to which there was no access.
- For large dimensions, ASPxPivotGrid must disable the default sorting.
About the important!
In this article, I wanted to show how you can solve problems associated with large amounts of data using available and long-existing technologies. Before implementing OLAP, my leadership, in search of a “golden bullet”, considered a lot of boxed solutions, both from large manufacturers and from small companies. It often turned out to check that the support of solutions out of the boxes required a lot of effort and financial cost. In some cases, the boxed product simply could not do what they wanted from it because of its architecture. Perhaps the company where I work was looking not in the right place and there is such a “golden” or at least a “silver” bullet. But what was, was. On the other hand, this article may stop someone from inventing “bicycles” in the form of building additional aggregated tables, some complex architectures and bulky sites.
But how many people have so many opinions and on this score I like the fictional story of the invention of the ballpoint pen:
NASA spent millions of dollars on the invention of a pen that can be written in zero gravity.
The Russians simply wrote in pencil.
But where is the pencil now and where is the pen. Therefore, any solution, even a very complex and cumbersome one, can find its user, just like a simple one.
Thanks to everyone who was able to read everything I wrote above!