Soft delete in REST API

    image

    So that the user does not feel pain from irretrievably lost data, it is worth thinking about soft deletion. With soft deletion, the record is not physically deleted from the database, but only marked as deleted. This makes it easy to recover data by resetting the flag.

    I recently implemented soft deletion in one of our REST services. Those who are interested in what I did, I invite you to cat.

    Required entry


    The debate over whether to use mild removal is very old. Just look at the long holivars here and here .

    The most reasonable is the position according to which everything depends on the situation. There are cases when soft deletion is convenient or even necessary; there are cases when the arguments of opponents of soft deletion deserve attention. By the way, an important argument against soft deletion is the answer that came from 2018: if we are talking about user accounts, then soft deletion is contrary to GDPR .

    We decided that in our service for document storage, soft deletion is necessary.

    RESTful approach


    If we want to implement soft deletion in a service, we need to understand how it should look from the point of view of the interface. A search on the Internet showed that a typical question that people have is whether to use DELETE {resource} as before, or is it better to use the PATCH method instead with a body that includes something like {status: 'deleted'} .

    Here the opinion of the people is unequivocal: it is necessary to use DELETE as before. From the client’s point of view, deletion is also a deletion in Africa. Nothing should change: if a resource is deleted, it becomes inaccessible; if the client wants to delete the resource, he knows that the HTTP DELETE method is for this purpose. It is not necessary to dedicate the client to the details of exactly how the service implements the deletion.

    But besides this, I was worried about the question of how to recover deleted resources. Of course, this problem is solved by administering the database. However, I would like to be able to do this through the REST API. And here we come into conflict. It turns out that the client still needs to be dedicated to the implementation details?

    The search for a long time yielded no results, until I came across a good article by Dan Yoder . The article examines the semantics of different HTTP requests and suggests that instead of physical deletion, move remote resources to the archive . In addition, it will be nice if DELETE returns a link to the archived resource. The user can always restore the deleted resource by sending a POST request to the archive.

    Design


    Our REST service is built on the ASP.NET Web API using the Entity Framework. As I said, I do a soft deletion for a resource called document.

    So, first you need to add the columns to the corresponding table. As a flag, I use a timestamp called Deleted. If the value is not NULL, the resource is considered deleted. In addition, it is useful to have information about who deleted the resource.

    ALTER TABLE Documents ADD
        Deleted datetime NULL,
        DeletedBy int NULL
    GO
    

    The DELETE action in the controller will now simply set the values ​​of these fields instead of physically deleting the record. In addition, DELETE will return a body with a standard reference to the archived document:

    {
      "links": {
        "archive": "documents/{id}/deleted"
      }
    }

    In fact, this is an important point: the link helps the client understand that the document is not deleted, but moved .

    The new controller for archived documents should provide the following methods:
    GET documents / deletedGets a collection of all deleted documents
    GET documents / {id} / deletedReturns the deleted document
    POST documents / {id} / deletedRecovers deleted document;
    does not require a body; returns 201 Created
    DELETE documents / {id} / deletedPhysically deletes a document

    Implementation


    Initially, I planned to add two views to my database:

    CREATE VIEW DeletedDocuments
    AS
    SELECT *
      FROM Documents
      WHERE Deleted IS NOT NULL
    GO
    CREATE VIEW AvailableDocuments
    AS
    SELECT *
      FROM Documents
      WHERE Deleted IS NULL
    GO

    It seemed to me that this would be less trouble: instead of setting conditions in the code, I just get two different DbSet properties in my DB context. True, you have to have two identical entities in the model, but such is the property of POCO objects in the context of EF - exactly one entity corresponds to each table.

    By the way, representations in SQL can be useful for the Entity Framework in other respects: with their help, for example, you can refer to tables from another database if you do not want to create several DB contexts.

    However, in my case, the number with the views did not pass. During authorization, you need to work with all documents, because users have the same rights to deleted documents as existing ones.

    Therefore, I decided to have only one DbSet Documents in DbContext, and in the code every time I figure out what exactly is needed at the moment:

    var availableDocuments = DbContext.Documents.Where(d => d.Deleted == null);
    var deletedDocuments = DbContext.Documents.Where(d => d.Deleted != null);
    var allDocuments = DbContext.Documents;

    Related Resources


    A document is a resource that other resources are associated with. For example, we have a document alias. That is, you can get a document not only by the documents / {id} path, but also by the documents / {alias} path , where alias is a unique string.

    After deleting a document, all the aliases associated with it should become “invisible”: if earlier the client received a list of all aliases using GET documents / aliases, then after deleting the document, its aliases from the list will disappear.

    But they remained in the database! We want to provide the ability to restore the document in the state in which it was deleted. This may cause confusion for the client: he is trying to add a new alias for another document, the list returned fromGET documents / aliases does not contain such a line, but the service nonetheless refuses to add.

    I don’t think this is a serious problem. Nevertheless, if you need to solve it, you can add the endpoint GET documents / deleted / aliases . Then everything falls into place: the service cannot add an alias, since such a value is already used by the remote document.

    The question may arise: is it worth throwing an alias from the list returned from documents / aliases? Let them stay! I do not think that such a decision would be correct. Then, it turns out that the list of aliases will contain broken links, because the service will return 404 Not Found if the client tries to get the deleted document by alias. If it comes to the child resources associated with the document, then the behavior should be exactly the same as if we were deleting the document physically.

    Archive cleaning


    Soft deletion, in addition to being able to easily recover data, has several more advantages. The delete operation in relational databases is an expensive operation. And if even deleting one record leads to cascading deletion of records in other tables, then this is fraught with deadlocks. Therefore, soft removal is faster and more reliable than physical removal.

    But there is one significant drawback. The base is starting to grow.

    Therefore, at the final stage, you should take care of the automatic cleaning of the archive. You can, of course, clean the base manually, but it is better to automate this process. If we directively set the expiration date of a remote object, say, 30 days, then the client can display the archive page on which the elements whose lifetime is near the end will be highlighted.

    My hands have not reached this task yet. We plan to add to our task system a task that once a day will run a simple SQL query that removes all foul objects from the archive. As a parameter, the task should take an expiration date. It will be necessary to ensure that the current value of this parameter is stored somewhere in one place. Then it will be possible to implement a method in the service that returns this value to the client.

    Also popular now: