
ORM or object relational projector
Today we bring to your attention an excerpt from the book of Sergey Tarasov “Defragmentation of the brain. Software Engineering from the Inside ” , which is being prepared for release by our publishing house.
Mentioning one of the largest pillars of modern software engineering - the world of OOP, you can not ignore the other - the world of relational databases. I intentionally inserted the adjective “relational” as applied to all the main DBMSs, although back in the 1970s such a generalization would have been unlawful.
Nevertheless, it was the relational DBMS that managed to free programmers from the knowledge of unnecessary details of organizing physical data storage in the 1980s by isolating themselves from them with logical-level structures and the standardized SQL language for accessing information. It also turned out that most of the data formats used by programs fit well on the model of two-dimensional tables and the relationships between them. These two factors determined the success of relational DBMSs, and as a reward, the community received a rigorous mathematical theory at the core of the technology.
In contrast to the relational world, OOP was developed by practical engineers quite spontaneously, based on the needs of the programming community, and therefore had no rigorous theory under it. Attempts to bring it under the PLO retroactively failed. The authors of the declared UML standard achieved the maximum result, which, however, is still mainly used as pictures illustrating the code. But a bad standard is better than none.
Both relational and object models belong to the logical level of software system design. They are orthogonal and in essence represent two views of the same entity. This means that you can implement the same system while remaining within the framework of only one relational-procedural approach or following exclusively OOP.
In practice, there is a situation where programs are written primarily using OOP, while data is stored in relational databases. Without touching on the question of the appropriateness of such a cross "hedgehog with a snake", we will take the situation as a given. From which it follows the need to map (project) objects onto relational structures and vice versa.
In view of the aforementioned lack of a formal theoretical base under the PLO, this problem is unsolvable in the general case, but is feasible in particular. A component of a software system that implements mapping is called ORM (Object-Relational Mapping) or an object-relational projector — ORP. A full-fledged ORM can be a very non-trivial component that exceeds the complexity of the rest of the system. Therefore, although many developers have successfully used their own private implementations, several widely used frameworks have appeared in the industry over the past 10 years, which also perform the projection task.
A review of object-relational projection tools was not planned as part of the book. They are already enough on the network, including a small one of my own, made back in 2005, but not very outdated. Therefore, the following examples will mainly concern the NHibernate framework .
The technology for displaying objects on an RDBMS has a very important point, the understanding of which largely determines the success of your project. I have repeatedly heard the opinion of programmers that for a domain layer, the SQL generated by the projector is an analogue of translating a high-level language into the assembler of the target processor. This opinion is not only deeply mistaken, but quickly leads the team to create difficult-to-maintain systems with congenital and practically unrecoverable performance problems.
Simply put, as soon as you think of SQL as a kind of assembler in relation to the OOP language you are using, you immediately get stuck in a very bad story.
SQL is a fourth-generation high-level declarative specialized language, unlike Java or C #, which are still in the third generation of imperative languages. The only SQL statement for three dozen lines, which performs something more complicated than key selection, will require many, if not an order of magnitude, more lines in C # to achieve the same result.
This situation leads ORM developers to the need to create their own SQL-like language for manipulating objects and to translate it into a sequel (HQL - Hibernate Query Language - SQL-like query language used in Hibernate / NHibernate). Or use directly SQL with dynamic conversion of the result into a collection of objects.
Otherwise, the application programmer is doomed to extract from the database and subsequently process large amounts of data directly in his application. The table data was processed in approximately the same way in the absence of embedded SQL by developers on early versions of Clipper in the late 80s. There it was called "navigation processing." I think the term is relevant here.
In the era of the massive transition from Clipper-like programs and file-server technologies to client-server RDBMSs, many applications and their developers continued to use the navigation approach. Applications worked slowly, often blocking work in a multi-user environment. Because to work effectively with RDBMSs, you need to use approaches that focus on processing sets on the server, assuming the developer has the ability to work with declarative languages.
Nevertheless, having ORM at its disposal, the programmer often returns to navigational approaches to processing data arrays outside the RDBMS with the only difference that now this array, hopefully, does not represent the contents of the whole table.
Why? They are trying to replace the lack of knowledge of RDBMSs with additional levels of abstractions. In fact, the opposite is true: the abstraction levels do not hide the details of the object storage layer from the programmer, but rather their incompetence in the database area from the DBMS. Until some time.
Despite the thick layer of abstractions provided by Hibernate-type ORMs, it is practically impossible to make an application work effectively with an RDBMS without knowledge of the relevant principles of the orthogonal world and the SQL language.
But the attempts continue. Some still develop projectors for their internal needs, often very light. Others are looking for simplification and way out in noSQL. But for the time being, those who have “basic data” competencies are the winners, programmers and consultants who make money on those who do not possess them.
In software construction presentations, beautiful schemes are often drawn to separate the presentation layers, business logic, and stored data. The novice programmer’s blue dream is to use only one environment and language to develop all layers and forget about the need for knowledge of relational DBMSs, reducing their purpose to a kind of “intellectual file system”. The word SQL evokes negative associations associated with something ancient, not to mention triggers or stored procedures. Good people appear on the horizon, with books by various gurus about domain-oriented development under the arm, telling newcomers something like this: “Guys, relational DBMSs are a relic of a protracted era of 30 years ago. Now everything is built on OOP. And there is a wonderful thing - ORM. Start using it and forget about the difficult legacy of the past! ”
The guys accept the offer. Further evolution of the development of the system is approximately as follows.
First, the ORM framework for displaying is selected. Already at this stage, it turns out that with the theory and standards the situation is bad. It would be fine to be wary, but a presentation showing how in 10 minutes you can create the basis of an application such as a contact notebook is fascinating. Solved!
We begin to implement the domain model. Add classes, properties, relationships. We generate a database structure or connect to an existing one. We are building an interface for managing objects of the CRUD type. Everything is quite simple. At least it seems quite comparable to the manipulation of the DataSet. Those who know about them, of course, because not everyone suspects the existence of tabular data life forms in the application outside the DBGrid display grids.
Once the developers have implemented CRUD logic, the main action begins. Using the sequel directly is now difficult. Without touching upon display strategies and application portability problems between DBMSs, in fact, every SQL query with connections, having risen to a domain, is accompanied by a specific projection of the table result onto the class created for this occasion. Therefore, you have to use your own ORM query language. Non-standard, without debugging and profiling tools. If he, the language, is generally available in this ORM. For those who support appropriate integration, the .NET environment makes it possible to use LINQ, which allows you to catch some errors at the compilation stage.
Compare the expressiveness of the language with a simple example, which I will leave without comment.
SQL
NHibernate HQL
NHibernate without HQL with criteria
LINQ (NHibernate)
It suddenly turns out that your own query language generates far from the most optimal SQL. When the database is relatively small, a hundred thousand records in the longest tables, and the queries are not too complicated, then even a suboptimal sequel in many cases will not cause obvious problems. The user will wait a bit.
However, requests such as “choose employees whose salaries during the past year did not exceed the average for the previous year” already cause problems at the embedded language level. Then developers often go in the only possible way: we select a collection of objects and filter and calculate in cycles, calling methods of related objects. Or use the same LINQ over the selected array. The number of intermediate short SQL queries to the DBMS in this collection processing can amount to tens of thousands.
Hiding a database or how to cross a hedgehog with a snake
Mentioning one of the largest pillars of modern software engineering - the world of OOP, you can not ignore the other - the world of relational databases. I intentionally inserted the adjective “relational” as applied to all the main DBMSs, although back in the 1970s such a generalization would have been unlawful.
Nevertheless, it was the relational DBMS that managed to free programmers from the knowledge of unnecessary details of organizing physical data storage in the 1980s by isolating themselves from them with logical-level structures and the standardized SQL language for accessing information. It also turned out that most of the data formats used by programs fit well on the model of two-dimensional tables and the relationships between them. These two factors determined the success of relational DBMSs, and as a reward, the community received a rigorous mathematical theory at the core of the technology.
In contrast to the relational world, OOP was developed by practical engineers quite spontaneously, based on the needs of the programming community, and therefore had no rigorous theory under it. Attempts to bring it under the PLO retroactively failed. The authors of the declared UML standard achieved the maximum result, which, however, is still mainly used as pictures illustrating the code. But a bad standard is better than none.
Both relational and object models belong to the logical level of software system design. They are orthogonal and in essence represent two views of the same entity. This means that you can implement the same system while remaining within the framework of only one relational-procedural approach or following exclusively OOP.
In practice, there is a situation where programs are written primarily using OOP, while data is stored in relational databases. Without touching on the question of the appropriateness of such a cross "hedgehog with a snake", we will take the situation as a given. From which it follows the need to map (project) objects onto relational structures and vice versa.
In view of the aforementioned lack of a formal theoretical base under the PLO, this problem is unsolvable in the general case, but is feasible in particular. A component of a software system that implements mapping is called ORM (Object-Relational Mapping) or an object-relational projector — ORP. A full-fledged ORM can be a very non-trivial component that exceeds the complexity of the rest of the system. Therefore, although many developers have successfully used their own private implementations, several widely used frameworks have appeared in the industry over the past 10 years, which also perform the projection task.
A review of object-relational projection tools was not planned as part of the book. They are already enough on the network, including a small one of my own, made back in 2005, but not very outdated. Therefore, the following examples will mainly concern the NHibernate framework .
The technology for displaying objects on an RDBMS has a very important point, the understanding of which largely determines the success of your project. I have repeatedly heard the opinion of programmers that for a domain layer, the SQL generated by the projector is an analogue of translating a high-level language into the assembler of the target processor. This opinion is not only deeply mistaken, but quickly leads the team to create difficult-to-maintain systems with congenital and practically unrecoverable performance problems.
Simply put, as soon as you think of SQL as a kind of assembler in relation to the OOP language you are using, you immediately get stuck in a very bad story.
SQL is a fourth-generation high-level declarative specialized language, unlike Java or C #, which are still in the third generation of imperative languages. The only SQL statement for three dozen lines, which performs something more complicated than key selection, will require many, if not an order of magnitude, more lines in C # to achieve the same result.
This situation leads ORM developers to the need to create their own SQL-like language for manipulating objects and to translate it into a sequel (HQL - Hibernate Query Language - SQL-like query language used in Hibernate / NHibernate). Or use directly SQL with dynamic conversion of the result into a collection of objects.
Otherwise, the application programmer is doomed to extract from the database and subsequently process large amounts of data directly in his application. The table data was processed in approximately the same way in the absence of embedded SQL by developers on early versions of Clipper in the late 80s. There it was called "navigation processing." I think the term is relevant here.
In the era of the massive transition from Clipper-like programs and file-server technologies to client-server RDBMSs, many applications and their developers continued to use the navigation approach. Applications worked slowly, often blocking work in a multi-user environment. Because to work effectively with RDBMSs, you need to use approaches that focus on processing sets on the server, assuming the developer has the ability to work with declarative languages.
Nevertheless, having ORM at its disposal, the programmer often returns to navigational approaches to processing data arrays outside the RDBMS with the only difference that now this array, hopefully, does not represent the contents of the whole table.
Why? They are trying to replace the lack of knowledge of RDBMSs with additional levels of abstractions. In fact, the opposite is true: the abstraction levels do not hide the details of the object storage layer from the programmer, but rather their incompetence in the database area from the DBMS. Until some time.
Despite the thick layer of abstractions provided by Hibernate-type ORMs, it is practically impossible to make an application work effectively with an RDBMS without knowledge of the relevant principles of the orthogonal world and the SQL language.
But the attempts continue. Some still develop projectors for their internal needs, often very light. Others are looking for simplification and way out in noSQL. But for the time being, those who have “basic data” competencies are the winners, programmers and consultants who make money on those who do not possess them.
As usual use ORM
In software construction presentations, beautiful schemes are often drawn to separate the presentation layers, business logic, and stored data. The novice programmer’s blue dream is to use only one environment and language to develop all layers and forget about the need for knowledge of relational DBMSs, reducing their purpose to a kind of “intellectual file system”. The word SQL evokes negative associations associated with something ancient, not to mention triggers or stored procedures. Good people appear on the horizon, with books by various gurus about domain-oriented development under the arm, telling newcomers something like this: “Guys, relational DBMSs are a relic of a protracted era of 30 years ago. Now everything is built on OOP. And there is a wonderful thing - ORM. Start using it and forget about the difficult legacy of the past! ”
The guys accept the offer. Further evolution of the development of the system is approximately as follows.
First, the ORM framework for displaying is selected. Already at this stage, it turns out that with the theory and standards the situation is bad. It would be fine to be wary, but a presentation showing how in 10 minutes you can create the basis of an application such as a contact notebook is fascinating. Solved!
We begin to implement the domain model. Add classes, properties, relationships. We generate a database structure or connect to an existing one. We are building an interface for managing objects of the CRUD type. Everything is quite simple. At least it seems quite comparable to the manipulation of the DataSet. Those who know about them, of course, because not everyone suspects the existence of tabular data life forms in the application outside the DBGrid display grids.
Once the developers have implemented CRUD logic, the main action begins. Using the sequel directly is now difficult. Without touching upon display strategies and application portability problems between DBMSs, in fact, every SQL query with connections, having risen to a domain, is accompanied by a specific projection of the table result onto the class created for this occasion. Therefore, you have to use your own ORM query language. Non-standard, without debugging and profiling tools. If he, the language, is generally available in this ORM. For those who support appropriate integration, the .NET environment makes it possible to use LINQ, which allows you to catch some errors at the compilation stage.
Compare the expressiveness of the language with a simple example, which I will leave without comment.
SQL
SELECT *
FROM task_queue
WHERE
id_task IN (2, 3, 15)
AND id_task_origin = 10
NHibernate HQL
IList queues = session
.CreateQuery("from TaskQueue where Task.Id in (2, 3, 15) and TaskOrigin.Id = 10")
.List();
NHibernate without HQL with criteria
IList queues = session.CreateCriteria()
.Add(Expression.In("Task.Id", someTasks.ToArray()))
.Add(Expression.Eq("TaskOrigin.Id", 10))
.List();
LINQ (NHibernate)
IList queues = session
.Query()
.Where(q => someTasks.Contains(q.Task.Id) &&
q.TaskOrigin.Id == 10).ToList();
It suddenly turns out that your own query language generates far from the most optimal SQL. When the database is relatively small, a hundred thousand records in the longest tables, and the queries are not too complicated, then even a suboptimal sequel in many cases will not cause obvious problems. The user will wait a bit.
However, requests such as “choose employees whose salaries during the past year did not exceed the average for the previous year” already cause problems at the embedded language level. Then developers often go in the only possible way: we select a collection of objects and filter and calculate in cycles, calling methods of related objects. Or use the same LINQ over the selected array. The number of intermediate short SQL queries to the DBMS in this collection processing can amount to tens of thousands.