Report-Driven Design
This article discusses an alternative approach to creating applications and implementing a database structure.
The main idea of this approach is that the application development process is based on the analysis of reports that should be generated by a ready-made business application.
We call this approach RDD - the Report-Driven Design .
In Developer Express, we are engaged in the creation of various components, including those that implement what is commonly called Business Intelligence or Business Analitycs, namely data analysis and processing. Therefore, we have extensive experience in communicating with users who create business applications, and one of the main tasks for these users is to generate reports.
Communicating with these users, we unexpectedly discovered a number of problems that often arise when implementing the reporting module in a typical business application.
For example, it often happens that almost everything is implemented in an application. And so, when the time comes for creating reports, it turns out that making many of them is quite problematic.
This is mainly because during the design of the database some connections were not thought out. Or vice versa, the database is overly complicated: there are many unnecessary links in the tables, most of which are not used to create reports.
Moreover, this problem can be expressed in the fact that due to the non-optimal database structure, processing of SQL queries for creating reports takes too long. This could happen, for example, because initially more attention was paid to how the information will be entered, and not to how it is better to display it.
Agree, this happens quite often. Then the reasonable question arises: “Why not take care of the reports from the very beginning? Why not make them the main goal when designing an application, if that’s exactly what we need in the end? ”
Let's first decide what we mean by a report.
What associations do you have in your head when reporting? Probably, these are pages with some data presented in a table or other form, which is convenient for further analysis. Moreover, it is clear that the report is not necessarily a printed version. It can be some kind of file, be it xls, pdf, doc or even a web page. It is clear that a report can contain not only tabular or cross-tabular data (the so-called pivot tables), but also graphs, and a lot of other information that can also be useful for analysis. Thus, by a report we mean any page with information.
So what is the Report Driven Design development process? First of all, we understand it as an iterative process, which makes it similar to other Agile practices .
In this case, we formulate the basic concept of RDD in the form of four I:
Now I will explain what all this means.
Before you begin designing the program and the database for it, you need to get from the customer a set of typical reports that will need to be generated. Then you take one of these reports and analyze what information is presented in it:
Go to the next point: Interaction . It is important to understand how the data from the current report interact with the information obtained from the analysis of previous reports, and how to modify the current structure so that it is equally convenient for the analyzed report and for all previous reports.
With the item Input , I think everything is clear too. Here it is necessary to decide how the user will enter the information presented in the report. Since we go to the design of the application from the end, from the reporting module, then in fact at the Input stage we will do what is often implemented at the very beginning - the data entry form.
Well, the last point is Iterate : go to the next report and repeat all the steps again.
I assume that at first such a procedure may not seem to be applicable to you, at least for your specific case. But if in your application reporting generation occupies a special place, then you can safely try to apply RDD and make sure that it works .
Let's look at how RDD works in a live example. To illustrate, let's take typical tasks from the life of an online store.
Suppose that when creating an online store, a program was written that allows you to enter data on goods in the warehouse of this store. Information is quickly and conveniently entered into the database on the server, which is located in the same warehouse.
Then at the same store appears 2 other warehouses, already with a different assortment of goods. And there the same approach began to be used - a server with a program where operators enter information about the product. In total, we have 3 different warehouses with 3 databases.
Now let's try to implement a typical report from the administration module. This report will display information on the availability of different goods in existing warehouses. Looking at such a report, it becomes clear that if this data is stored in different databases, physically remote from each other, then it can take a long time to form.
Thus, it would be completely logical to combine all these tables into one database and place it on the same local network with the application for generating reports.
It is possible that data entry will slow down (if the operator is on another local network), but not much, and report generation will increase significantly.
Now we move on and move on to another report. This is a typical report for the user of an online store, which is a list of products ordered by the user. It is clear that information about the warehouse in which the product ordered by the user is stored is not at all interesting to him. So, we can combine all these products into one table. And so that the information about the warehouse does not disappear at all, you can enter another column in this table - the warehouse ID.
Now consider another example from the task of creating an online store. This is the so-called Rating Control - an element for voting, for example, for the buyer to evaluate the selected product.
The algorithm here is simple: any user can vote for any product once. Following this scenario, we record each such vote in our table.
Now we need to see what reports we need based on this data.
For example, the above table will be very useful if we need a report that contains information about how a particular person voted for different products.
If such a report is not in our list, that is, information on a particular voter is not important to us, but information on a specific product is important, then we can safely remove the CustomerID field from this table.
(If we still want to save information on which products a person voted on - and we want to prevent him from voting again - then we can store this data either in a separate CustomerID + ProductID table or add another RatedProducts column to the Customers table and store this information there in the form of meta-data)
Moving on, we may suddenly realize that the above report is either not very important to us or is not needed at all.
And it’s much more important for us to optimize the application so that it quickly displays information about the current rating on the product page. Then we can even significantly simplify our table by saving not all the votes that were, but their total, as well as the total number of users by which to divide this amount.
All that remains is the division operation, which will be performed each time information on the current product rating is displayed. Excellent.
How often will this information be shown? Very often - after all, a lot of users can “leaf through” a product catalog (for example, 20 products per page), comparing their rating. Backfill question : can we get rid of the division operation in this case? ;-) Answer : we can store not the sum of all votes, but the current rating. Then the division operation will be needed exactly at the time of voting, that is, again, when entering data. Here is the formula: where x is the user's voice.
In conclusion, I can say that we do not propose using such an approach as a panacea for all ills. However, in certain situations, it may well be useful and will allow you to look at your application and your data from a different perspective. Try to use it at least once, and I hope this comes in handy.
PS
By the way, if you are suddenly going to visit DevCon'11 this week, I am ready to personally discuss with you the pros and cons of this approach. On the second day of the conference I will make a report on the topic of RDD and I will be very glad to hear your opinion, well, just shake hands with all the real Khabrovites.
See you at DevCon!
The main idea of this approach is that the application development process is based on the analysis of reports that should be generated by a ready-made business application.
We call this approach RDD - the Report-Driven Design .
In the beginning there was a report
In Developer Express, we are engaged in the creation of various components, including those that implement what is commonly called Business Intelligence or Business Analitycs, namely data analysis and processing. Therefore, we have extensive experience in communicating with users who create business applications, and one of the main tasks for these users is to generate reports.
Communicating with these users, we unexpectedly discovered a number of problems that often arise when implementing the reporting module in a typical business application.
For example, it often happens that almost everything is implemented in an application. And so, when the time comes for creating reports, it turns out that making many of them is quite problematic.
This is mainly because during the design of the database some connections were not thought out. Or vice versa, the database is overly complicated: there are many unnecessary links in the tables, most of which are not used to create reports.
Moreover, this problem can be expressed in the fact that due to the non-optimal database structure, processing of SQL queries for creating reports takes too long. This could happen, for example, because initially more attention was paid to how the information will be entered, and not to how it is better to display it.
Agree, this happens quite often. Then the reasonable question arises: “Why not take care of the reports from the very beginning? Why not make them the main goal when designing an application, if that’s exactly what we need in the end? ”
What is a report?
Let's first decide what we mean by a report.
What associations do you have in your head when reporting? Probably, these are pages with some data presented in a table or other form, which is convenient for further analysis. Moreover, it is clear that the report is not necessarily a printed version. It can be some kind of file, be it xls, pdf, doc or even a web page. It is clear that a report can contain not only tabular or cross-tabular data (the so-called pivot tables), but also graphs, and a lot of other information that can also be useful for analysis. Thus, by a report we mean any page with information.
Four "I"
So what is the Report Driven Design development process? First of all, we understand it as an iterative process, which makes it similar to other Agile practices .
In this case, we formulate the basic concept of RDD in the form of four I:
- Information
- Interaction
- Input
- Iterate (Go to next)
Now I will explain what all this means.
Before you begin designing the program and the database for it, you need to get from the customer a set of typical reports that will need to be generated. Then you take one of these reports and analyze what information is presented in it:
- horizontal information indicates which columns may be in the corresponding table;
- vertical data change provides an understanding of what information should be stored in rows;
- and grouping data in the report gives an idea of which fields to group and whether or not to place them in a separate table.
Go to the next point: Interaction . It is important to understand how the data from the current report interact with the information obtained from the analysis of previous reports, and how to modify the current structure so that it is equally convenient for the analyzed report and for all previous reports.
With the item Input , I think everything is clear too. Here it is necessary to decide how the user will enter the information presented in the report. Since we go to the design of the application from the end, from the reporting module, then in fact at the Input stage we will do what is often implemented at the very beginning - the data entry form.
Well, the last point is Iterate : go to the next report and repeat all the steps again.
I assume that at first such a procedure may not seem to be applicable to you, at least for your specific case. But if in your application reporting generation occupies a special place, then you can safely try to apply RDD and make sure that it works .
Let's look at how RDD works in a live example. To illustrate, let's take typical tasks from the life of an online store.
Example 1: Warehouses
Suppose that when creating an online store, a program was written that allows you to enter data on goods in the warehouse of this store. Information is quickly and conveniently entered into the database on the server, which is located in the same warehouse.
Then at the same store appears 2 other warehouses, already with a different assortment of goods. And there the same approach began to be used - a server with a program where operators enter information about the product. In total, we have 3 different warehouses with 3 databases.
Now let's try to implement a typical report from the administration module. This report will display information on the availability of different goods in existing warehouses. Looking at such a report, it becomes clear that if this data is stored in different databases, physically remote from each other, then it can take a long time to form.
Thus, it would be completely logical to combine all these tables into one database and place it on the same local network with the application for generating reports.
It is possible that data entry will slow down (if the operator is on another local network), but not much, and report generation will increase significantly.
Now we move on and move on to another report. This is a typical report for the user of an online store, which is a list of products ordered by the user. It is clear that information about the warehouse in which the product ordered by the user is stored is not at all interesting to him. So, we can combine all these products into one table. And so that the information about the warehouse does not disappear at all, you can enter another column in this table - the warehouse ID.
Example 2: Voting
Now consider another example from the task of creating an online store. This is the so-called Rating Control - an element for voting, for example, for the buyer to evaluate the selected product.
The algorithm here is simple: any user can vote for any product once. Following this scenario, we record each such vote in our table.
CustomerID | Productid | Rating |
---|---|---|
Now we need to see what reports we need based on this data.
For example, the above table will be very useful if we need a report that contains information about how a particular person voted for different products.
If such a report is not in our list, that is, information on a particular voter is not important to us, but information on a specific product is important, then we can safely remove the CustomerID field from this table.
Productid | Rating |
---|---|
(If we still want to save information on which products a person voted on - and we want to prevent him from voting again - then we can store this data either in a separate CustomerID + ProductID table or add another RatedProducts column to the Customers table and store this information there in the form of meta-data)
Moving on, we may suddenly realize that the above report is either not very important to us or is not needed at all.
And it’s much more important for us to optimize the application so that it quickly displays information about the current rating on the product page. Then we can even significantly simplify our table by saving not all the votes that were, but their total, as well as the total number of users by which to divide this amount.
Productid | Totalrating | CustomersNumber |
---|---|---|
All that remains is the division operation, which will be performed each time information on the current product rating is displayed. Excellent.
How often will this information be shown? Very often - after all, a lot of users can “leaf through” a product catalog (for example, 20 products per page), comparing their rating. Backfill question : can we get rid of the division operation in this case? ;-) Answer : we can store not the sum of all votes, but the current rating. Then the division operation will be needed exactly at the time of voting, that is, again, when entering data. Here is the formula: where x is the user's voice.
TOTAL
In conclusion, I can say that we do not propose using such an approach as a panacea for all ills. However, in certain situations, it may well be useful and will allow you to look at your application and your data from a different perspective. Try to use it at least once, and I hope this comes in handy.
PS
By the way, if you are suddenly going to visit DevCon'11 this week, I am ready to personally discuss with you the pros and cons of this approach. On the second day of the conference I will make a report on the topic of RDD and I will be very glad to hear your opinion, well, just shake hands with all the real Khabrovites.
See you at DevCon!