Triggers, permissions, and versioning on a SPARQL access point

    Anyone who tries to use a SPARQL endpoint as a database replacement in an industrial project runs into several problems. One of them is that such products lack access control, triggers, and versioning capabilities. Having studied everything the market offers today, we concluded that we had to implement this functionality ourselves.
    Apache Fuseki serves as the “guinea pig”, although the same approach can be applied to any other SPARQL endpoint.

    Architecture and Functionality

    The only way to implement this, short of modifying the product itself, is to create a proxy layer in front of the endpoint's programming interface. That is what we did. All SPARQL queries addressed to the service pass through the proxy, where they are analyzed and processed further.
    When accessing the proxy, an application can log in under a specific user account — for this the standard programming interface had to be extended slightly — or access it anonymously, staying within the standard.
    The proxy has its own back end, which lets administrators configure the access rights of users (and user groups) to ontology classes. Rights are inherited. The access level can take the following values:

    • no access rights;
    • reading;
    • change with moderation;
    • change.

    Naturally, each ontology object can simultaneously be an instance of any number of classes (including via inheritance). Of all the rights applicable to its classes and their parents, the most stringent one is selected.
    The ability to edit class and property definitions is regulated by granting access rights to the standard types, such as owl:Class.
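    The "most stringent right wins" rule can be sketched as follows. This is a hypothetical illustration, not the project's actual code; the `Access` names mirror the four levels listed above, and `effective_access` is an invented helper.

```python
from enum import IntEnum

class Access(IntEnum):
    """Access levels from the article, ordered least to most permissive."""
    NONE = 0        # no access rights
    READ = 1        # reading
    MODERATED = 2   # change with moderation
    WRITE = 3       # change

def effective_access(object_classes, rights_by_class, default=Access.WRITE):
    """Of all rights applicable to the object's classes (own and inherited),
    the most stringent (lowest) one is selected."""
    levels = [rights_by_class.get(cls, default) for cls in object_classes]
    return min(levels, default=default)
```

For example, an object that is both a `Person` (read-only for this user) and an `Employee` (writable) ends up read-only.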

    In our case, it was important to support collaborative work on the ontology. The “change with moderation” level allows a user to execute DELETE/INSERT requests, but their result is not applied to the database immediately; instead it is submitted for approval to users with the appropriate rights. Once a day, the back end notifies those users of the pending changes, and they can apply or reject them.
    All changes users make to the ontology are saved in a log kept in the back-end service's database (a relational one; the access-rights settings are stored there as well). As a result, we can build a history of changes for every property of each ontology object, with the date and author of each change.
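    Restoring an object's state at an arbitrary moment then amounts to replaying the log. A minimal sketch, assuming a hypothetical log-entry shape (the article does not specify the schema):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class LogEntry:
    timestamp: datetime
    author: str
    subject: str          # the ontology object changed
    prop: str             # the property changed
    new_value: Optional[object]   # None means the property was deleted

def state_at(log, subject, moment):
    """Replay log entries for `subject` up to `moment` (inclusive)
    to rebuild the object's properties as of that time."""
    state = {}
    for e in sorted(log, key=lambda e: e.timestamp):
        if e.subject != subject or e.timestamp > moment:
            continue
        if e.new_value is None:
            state.pop(e.prop, None)
        else:
            state[e.prop] = e.new_value
    return state
```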

    Returning to access rights: every request arriving at the proxy passes a rights check either before or after execution. If the query retrieves data (SELECT, ASK, CONSTRUCT), solutions containing objects the current user may not access are excluded from the result set (for an anonymous query, only solutions consisting entirely of instances of classes with no rights restrictions are returned). If the request is a DELETE/INSERT/UPLOAD, the set of triples it will affect is determined first, and if the user lacks edit rights to at least one of them, the request is cancelled as a whole. Of course, the front ends that work with our proxy had to be “taught” to interpret the error messages, as well as the warnings that changes have gone to moderation.
    Paired DELETE/INSERT requests are detected, and if the INSERT is cancelled, the paired DELETE is cancelled as well (just in case). In general, writing the proxy required some interesting workarounds; for example, the response to a SELECT query may omit objects the user is not allowed to access, even though those objects participate in computing the solution. Such a situation arises, for example, when executing a query like
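    The two checks described above can be sketched like this. This is an illustrative simplification: `readable` and `editable` stand in for the proxy's actual rights-resolution logic, which the article does not detail.

```python
def filter_solutions(solutions, readable):
    """Post-execution check for SELECT/ASK/CONSTRUCT: drop every solution
    that mentions at least one value the current user may not read.
    `solutions` is a list of {variable: value} bindings."""
    return [s for s in solutions if all(readable(v) for v in s.values())]

def check_update(affected_triples, editable):
    """Pre-execution check for DELETE/INSERT/UPLOAD: if the user lacks
    edit rights to at least one affected triple, the request as a whole
    is cancelled."""
    if not all(editable(t) for t in affected_triples):
        raise PermissionError("update cancelled: insufficient rights")
```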

    SELECT ?prop WHERE {
        ?object <someProperty> "some value" .
        ?object <otherProperty> ?prop
    }

    in the case where the user does not have access rights to some of the objects bound to ?object. Our proxy expands such queries and returns the ?prop values of accessible objects only. Similar processing had to be applied to queries returning COUNT(*).

    After all of the above, implementing trigger functionality was quite easy. A trigger is a procedure executed after a data-changing request if that request affects instances of certain classes. In our project, triggers are used to notify external systems of changes by sending messages to the bus; the same mechanism could also be used, for example, for cascading changes within the database itself.
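    A minimal sketch of such a trigger registry, assuming triggers are keyed by class name and receive the change as an argument (the article's actual dispatch mechanism is not described):

```python
from collections import defaultdict

class TriggerRegistry:
    """Runs registered procedures after a change request is applied,
    for every affected class that has triggers."""

    def __init__(self):
        self._triggers = defaultdict(list)

    def register(self, cls, procedure):
        """Attach a procedure (e.g. a bus-notification sender) to a class."""
        self._triggers[cls].append(procedure)

    def fire(self, affected_classes, change):
        """Called after a DELETE/INSERT is applied to the database."""
        for cls in affected_classes:
            for proc in self._triggers[cls]:
                proc(change)
```

A notification sender would then simply be registered once per class of interest, e.g. `registry.register("Person", send_to_bus)`.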

    Results and Performance

    In terms of functionality, we achieved all the intended results. The system enforces access rights regardless of which application sends a request to the endpoint, and sends out notifications about data changes. The change log allows restoring the state of any object at an arbitrary point in time. The “change with moderation” functionality provides full-fledged moderation of ontology changes.
    It remained to find out how much the additional processing affects query execution speed. We were primarily interested in keeping SELECT performance intact, since our product functions as a master-data catalog for several other information systems.
    After analyzing the SELECT queries received under real load, as well as the queries the proxy itself issues to the real SPARQL endpoint, we found that more than half of them are simple questions like “is A a subclass of B?” or “is A a member of class B?”. Such queries are easy to cache, with the cache updated on changes to the real database through our trigger mechanism. As a result, the proxy answers queries of this kind (and some more complex ones) without consulting the real endpoint, and also makes extensive use of the cache in the access-rights algorithms. The result exceeded our expectations: under real load, the system is only 13% slower than direct access to the endpoint without rights control.
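    The cache described above can be sketched as follows — a hypothetical illustration in which `endpoint_ask` stands for the call to the real SPARQL endpoint, and invalidation is wired to the trigger mechanism (here simplified to clearing the whole cache):

```python
class AnswerCache:
    """Caches answers to simple triple-pattern questions such as
    'is A a subclass of B?', falling back to the real endpoint on a miss."""

    def __init__(self, endpoint_ask):
        self._ask = endpoint_ask   # callable hitting the real SPARQL endpoint
        self._cache = {}

    def ask(self, subject, predicate, obj):
        key = (subject, predicate, obj)
        if key not in self._cache:
            self._cache[key] = self._ask(subject, predicate, obj)
        return self._cache[key]

    def invalidate(self, change):
        """Registered as a trigger: any relevant data change drops the
        cached answers so they are recomputed on the next request."""
        self._cache.clear()
```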

    With data-modification requests the situation is less optimistic: they became six times slower, since the processing of DELETE/INSERT (and especially UPLOAD) is much more complicated and does not lend itself to the same optimization. On a production system we had to accept this, trading some performance for functionality.
