Working with a database from the application

    First, I will identify some of the problems and peculiarities of working with a database and point out the holes in the usual abstractions. Then we will analyze a simpler abstraction based on immutability.


    It is assumed that the reader is somewhat familiar with the Active Record, Data Mapper, Identity Map and Unit of Work patterns.


    Problems and solutions are considered in the context of sufficiently large projects that cannot simply be thrown away and quickly rewritten.


    Identity map


    The first problem is preserving identity. Identity is something that uniquely defines an entity: in the database it is the primary key, and in memory it is the reference (pointer). It is good when all references point to a single object.


    For the Ruby ActiveRecord library, this is not the case:


    post_a = Post.find 1
    post_b = Post.find 1
    post_a.object_id != post_b.object_id # true
    post_a.title = "foo"
    post_b.title != "foo" # true

    That is, we get two references to two different objects in memory.


    This way we can lose changes if, through carelessness, we start working with the same entity represented by different objects.


    Hibernate has a session, which is essentially a first-level cache: it stores a mapping from entity identifiers to objects in memory. If we request the same entity again, we get a reference to the already existing object. That is, Hibernate implements the Identity Map pattern.
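

    A minimal sketch of the Identity Map idea in JavaScript (the Session class and the loadFromDb function are hypothetical, for illustration only):


    // A hypothetical first-level cache: ids are mapped to objects,
    // so repeated lookups return the same reference.
    class Session {
      constructor(loadFromDb) {
        this.loadFromDb = loadFromDb; // e.g. a function issuing SELECT ... WHERE id = ?
        this.identityMap = new Map();
      }
      find(id) {
        if (!this.identityMap.has(id)) {
          this.identityMap.set(id, this.loadFromDb(id));
        }
        return this.identityMap.get(id); // always the same object for the same id
      }
    }

    const session = new Session(id => ({ id, title: 'post #' + id }));
    session.find(1) === session.find(1); // true, unlike the ActiveRecord example above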


    Long transactions


    But what if we query by something other than the id? To avoid a mismatch between the objects in memory and the state of the database, Hibernate performs a flush before executing the query,
    i.e. it writes dirty objects to the database so that the query reads consistent data.


    This approach forces you to keep the database transaction open while the business transaction is in progress.
    If a business transaction is long, the database process serving the connection sits idle as well. This can happen, for example, if a business transaction requests data over the network or performs complex calculations.


    N + 1


    Perhaps the biggest "hole" in the ORM abstraction is the N + 1 query problem.


    A Ruby example using the ActiveRecord library:


    posts = Post.all # SELECT * FROM posts
    posts.each do |post|
      like = post.likes.order(id: :desc).first
      # SELECT * FROM likes WHERE post_id = ? ORDER BY id DESC LIMIT 1
      # ...
    end

    An ORM inclines the programmer to believe that he is simply working with objects in memory. In reality it works with a service available over the network, and
    establishing connections and transferring data takes time. Even if a single query takes only 50 ms, no more than 20 such queries can be executed per second.


    Additional data


    Suppose that, to avoid the N + 1 problem described above, you write this
    query:


    SELECT * FROM posts JOIN LATERAL (
      SELECT * FROM likes WHERE post_id = posts.id ORDER BY likes.id DESC LIMIT 1
    ) AS last_like ON true;

    That is, in addition to the post attributes, all the attributes of the last like are selected as well. Onto which entity should this data be mapped? In this case you can return a pair of a Post and a Like, because the result contains all the necessary attributes.


    But what if we selected only some of the fields, or fields that are missing from the model, for example the number of likes a post has? Do we really need to map them onto entities? Maybe we should leave them as plain data?
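

    One possible answer, sketched in JavaScript: map the known columns onto entity-shaped objects and keep the ad-hoc fields as plain data (the row shape is a hypothetical result of the query above):


    // A hypothetical mapper for such a query: fields that belong to the model
    // become entity-shaped objects, ad-hoc fields stay plain data.
    function mapRow(row) {
      const post       = { id: row.id, title: row.title };    // maps onto the Post entity
      const lastLike   = { id: row.like_id, postId: row.id }; // maps onto the Like entity
      const likesCount = row.likes_count;                     // fits no entity: leave it as data
      return { post, lastLike, likesCount };
    }

    mapRow({ id: 1, title: 'Hello', like_id: 42, likes_count: 7 });
    // => { post: {...}, lastLike: {...}, likesCount: 7 }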


    State & identity


    Consider this JavaScript code:


    const alice = { id: 0, name: 'Alice' };

    Here a reference to the object was given the name alice. Since it is a constant, there is no way to make the name alice point at another object. At the same time, the object itself remains mutable.


    For example, we can assign an existing identifier:


    const bob = { id: 1, name: 'Bob' };
    alice.id = bob.id;

    Recall that an entity has two identities: a reference and a primary key in the database. And constants cannot prevent Alice from turning into Bob, even after saving.


    The object we named alice carries two responsibilities: it simultaneously models identity and state. A state is a value that describes an entity at a given point in time.


    But what if we separate these two responsibilities and use immutable structures for the state?


    function Ref(initialState, validator) {
      let state = initialState;
      this.deref = () => state;                 // read the current state
      this.swap = (updater) => {                // controlled replacement of the state
        const newState = updater(state);
        if (!validator(state, newState)) throw new Error("Invalid state");
        state = newState;
        return newState;
      };
    }

    // requires Immutable.js for the Record state
    const UserState = Immutable.Record({ id: null, name: '' });
    const aliceState = new UserState({ id: 0, name: 'Alice' });
    const alice = new Ref(aliceState, (oldS, newS) => oldS.id === newS.id);
    alice.swap(oldS => oldS.set('name', 'Queen Alice')); // ok: id is unchanged
    alice.swap(oldS => oldS.set('id', 1));               // BOOM! identity must not change

    Ref is a container for an immutable state that allows its controlled replacement. Ref models identity the same way we give names to objects: we call the river "Volga", yet at each moment in time it has a different immutable state.


    Storage


    Consider the following API:


    storage.tx( t => {
      const alice = t.get(0);
      const bobState = new UserState({id: 1, name: 'Bob'});
      const bob = t.create(bobState);
      // assumes UserState also declares a `friends` list field
      alice.swap( oldS => oldS.update('friends', old => old.push(bob.deref().id)) );
    });

    t.get and t.create return Ref instances.


    We open a business transaction t, find Alice by her ID, create Bob and indicate that Alice considers Bob to be her friend.


    The object t controls the creation of Refs.


    t can store a mapping from entity identifiers to the Refs holding their state, i.e. it can implement Identity Map. In this case t acts as a cache: if Alice is requested a second time, no database query is made.


    t can also memorize the initial state of entities in order to determine, at the end of a transaction, which changes need to be written to the database, i.e. it can implement Unit of Work. Or, if we add observer support to Ref, it becomes possible to flush changes to the database on every change. These are the optimistic and pessimistic approaches to committing changes.
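

    A rough sketch of how t might combine the Identity Map and Unit of Work roles; the Transaction constructor and the db.read / db.write helpers are assumptions, not a reference implementation:


    // Hypothetical transaction object: an identity map of id -> Ref
    // plus remembered initial states for a Unit of Work style commit.
    function Transaction(db) {
      const refs = new Map();          // Identity Map: id -> Ref
      const initialStates = new Map(); // Unit of Work: id -> state as it was loaded

      this.get = (id) => {
        if (!refs.has(id)) {           // cache miss: read from the database
          const state = db.read(id);
          initialStates.set(id, state);
          refs.set(id, new Ref(state, (o, n) => o.id === n.id));
        }
        return refs.get(id);           // cache hit: the same Ref as before
      };

      this.commit = () => {
        for (const [id, ref] of refs) {
          // states are immutable, so a reference comparison detects changes
          if (ref.deref() !== initialStates.get(id)) db.write(id, ref.deref());
        }
      };
    }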


    With the optimistic approach, you need to keep track of the versions of entity states.
    When reading from the database, we must remember the version, and when committing changes, check that the version of the entity in the database still matches the one we read. Otherwise, the business transaction must be retried. This approach allows batch insert and delete operations and very short database transactions, which saves resources.
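

    A sketch of the optimistic retry loop; the StaleStateError name and the compare-and-set behavior of the commit are assumptions for illustration:


    // Hypothetical optimistic commit: retry the whole business transaction
    // if some entity's version changed under us.
    function runWithRetries(storage, businessTx, maxAttempts = 3) {
      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          // the commit is assumed to issue a compare-and-set, e.g.
          //   UPDATE users SET state = ?, version = version + 1
          //   WHERE id = ? AND version = ?  -- the version we originally read
          // and to throw StaleStateError if no rows were affected
          return storage.tx(businessTx);
        } catch (e) {
          if (e.name !== 'StaleStateError' || attempt === maxAttempts) throw e;
          // someone committed first: loop and run the business transaction again
        }
      }
    }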


    With the pessimistic approach, a database transaction corresponds exactly to a business transaction. That is, we are forced to hold a connection from the pool for as long as the business transaction runs.
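

    For illustration, a sketch of the pessimistic variant using a row lock; the db.withTransaction helper and conn.query API are assumptions, only SELECT ... FOR UPDATE is standard SQL:


    // Pessimistic sketch: the DB transaction brackets the business transaction.
    db.withTransaction(conn => {
      // the row is locked until commit; the connection is held the whole time
      const row = conn.query('SELECT * FROM users WHERE id = $1 FOR UPDATE', [0]);
      // ... long-running business logic keeps `conn` checked out of the pool ...
      conn.query('UPDATE users SET name = $1 WHERE id = $2', ['Queen Alice', 0]);
    });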


    So far the API extracts entities one at a time, which is not very efficient. Since we have implemented the Identity Map pattern, we can add a preload method to the API:


    storage.tx( t => {
      t.preload([0, 1, 2, 3]);
      const alice = t.get(0); // from cache
    });

    Queries


    If we do not want long transactions, we cannot query by an arbitrary key, since memory may contain dirty objects and such a query would return an unexpected result.


    Instead we can use Queries to retrieve any data (state) outside the transaction, and then re-read that data once inside the transaction.


    const aliceId = userQuery.findByEmail('alice@mail.com');
    storage.tx( t => {
      const alice = t.getOne(aliceId);
    });

    This gives a separation of responsibilities. For queries we can use search engines and scale reads with replicas, while the storage API always works with the main storage (the master). Naturally, the replicas may contain stale data; re-reading the data inside the transaction solves this problem.
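

    For example, the transaction can verify that the state re-read from the master still matches what the query promised (a sketch; it assumes UserState also has an email field):


    const aliceId = userQuery.findByEmail('alice@mail.com'); // may come from a stale replica
    storage.tx( t => {
      const alice = t.getOne(aliceId);                       // re-read from the master
      if (!alice || alice.deref().email !== 'alice@mail.com') {
        throw new Error('stale query result');               // abort, or retry the query
      }
      // ... safe to proceed: the state is current
    });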


    Commands


    There are situations where an operation can be performed without reading the data at all. For example, charging the monthly fee to all customers. Or inserting a row and updating it on conflict (upsert).


    In case of performance problems, a Storage + Query combination can be replaced with such a command.
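

    A sketch of such a command, assuming a hypothetical db.execute helper: all the work happens in one SQL statement, without materializing any entities in memory:


    // Hypothetical command: charge the monthly fee to every active customer
    // without loading and saving each Customer through Storage.
    function chargeMonthlyFee(db, fee) {
      // one statement instead of N + 1 reads and writes
      return db.execute('UPDATE customers SET balance = balance - $1 WHERE active', [fee]);
    }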


    Relationships


    If entities reference each other in arbitrary ways, it is difficult to ensure consistency when they change. We try to simplify and streamline relationships and abandon unnecessary ones.


    Aggregates are a way to organize relationships. Each aggregate has a root entity and nested entities. An external entity may refer only to the aggregate root. The root ensures the integrity of the whole aggregate. A transaction cannot cross the boundary of an aggregate; in other words, an aggregate participates in a transaction as a whole.


    An aggregate may, for example, consist of a Post (the root) and its translations. Or an Order and its Positions.
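

    A sketch of the Post aggregate using the Ref and Immutable.js pieces from earlier; the PostState shape is an assumption for illustration:


    // Hypothetical aggregate: a Post root with nested translations.
    const PostState = Immutable.Record({
      id: null,
      title: '',
      translations: Immutable.Map(),   // locale -> translated title
    });

    const post = new Ref(
      new PostState({ id: 1, title: 'Hello', translations: Immutable.Map({ ru: 'Привет' }) }),
      (oldS, newS) => oldS.id === newS.id
    );

    // the aggregate changes only through its root, within one transaction
    post.swap( s => s.setIn(['translations', 'de'], 'Hallo') );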


    Our API works with whole aggregates, while ensuring referential integrity between aggregates rests on the application. The API does not support lazy loading of relationships.
    But we can choose the direction of relationships. Consider the one-to-many relationship User - Post. We could store the user ID in each post, but would that be convenient? We get far more useful information if we store an array of post IDs in the user, as the sketch below shows.
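

    For example, extending UserState with a postIds list (an assumption for illustration), adding a post touches only the User aggregate root:


    // Hypothetical extension of UserState: the user aggregate owns the
    // direction of the relationship via a list of post IDs.
    const UserState = Immutable.Record({
      id: null,
      name: '',
      postIds: Immutable.List(),
    });

    const alice = new Ref(
      new UserState({ id: 0, name: 'Alice', postIds: Immutable.List([1, 2]) }),
      (oldS, newS) => oldS.id === newS.id
    );

    alice.swap( oldS => oldS.update('postIds', ids => ids.push(3)) );
    alice.deref().postIds.toJS(); // [1, 2, 3]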


    Conclusion


    I have focused on the problems of working with a database and shown an approach based on immutability.
    The format of the article does not allow covering the topic in full detail.


    If you are interested in this approach, take a look at my book "app from scratch", which describes building a web application from scratch with a focus on architecture. It covers SOLID, Clean Architecture and patterns for working with a database. The code examples in the book and the application itself are written in Clojure, a language imbued with the ideas of immutability and convenient data processing.

