Caching Layer on Top of Linq to SQL
Over the past year we have moved an impressive portion of DirectCRM settings into the database. Many elements of promotional campaigns that we previously described purely in code are now created and configured by managers through the admin panel. The result is a rather complex database structure with dozens of tables.
However, moving the settings into the database came at a price. An architecture that lets you cache rarely changing Linq to SQL entities is described below the cut.

The main unit of system settings is a promotional campaign. A campaign includes many elements: trigger mailings, prizes, and operations the consumer can perform on the site. Each element has many settings that refer to other entities in the database. The result is a deeply nested tree, and pulling that tree out takes dozens of database queries, which hurts performance.
From the start it was clear that we would eventually have to bolt on campaign caching, but we strictly followed the first rule of optimization: do not optimize while the system's performance still satisfies you. Of course, the moment eventually came when living without caching was no longer possible.
There was no question about when the cache should be invalidated: it must be refreshed whenever the campaign or any of its elements changes (since the campaign is an aggregate, implementing that logic poses no difficulty). But where to store the cache and how to invalidate it took some thought. The following options were considered:
- store the cache in Redis and clear it when a campaign is edited
- store the cache in Redis and, before each use, query the database to check whether the campaign's change date has moved
- store the cache in process memory and, before each use, query the database to check whether the campaign's change date has moved
The Redis option is attractive because it keeps a single copy of the cache for all web servers instead of loading one per server, but in practice that is penny-pinching: campaigns rarely change once their configuration is finished, and it makes no difference whether the cache is loaded four times a day or sixteen. Since there are no distributed transactions between Redis and the database, it is hard to guarantee that the cache is invalidated when a campaign changes, so the first option was ruled out.
The third option has the following advantages over the second:
- getting the cache takes one network round trip instead of two
- no need to bother with serialization (even though all it would take is putting DataMemberAttribute on the necessary fields)
The problem and the general approach to solving it are now clear. The technical details follow.
Version 1.0
To make full use of entities loaded in another DataContext within the current one, they must be cloned and attached. Attaching adds the entity to the IdentityMap of the current DataContext. The same entity cannot be attached to two different DataContexts (so each clone can be used in only one DataContext, that is, in one business transaction). MemberwiseClone will not do for cloning, because the clone would then hold references to entities attached to the DataContext in which the original was loaded rather than to the current one.
As a result, to make entities from the static cache usable, the following code was written in the repository base class using the Linq to SQL MetaModel and a little reflection:
public TEntity CloneAndAttach(TEntity sourceEntity)
{
    if (sourceEntity == null)
        throw new ArgumentNullException("sourceEntity");

    var entityType = sourceEntity.GetType();
    var entityMetaType = GetClonningMetaType(entityType);
    var entityKey = entityMetaType.GetKey(sourceEntity);

    // Guard that keeps the same entity from being cloned more than once.
    // Among other things, it takes care of circular references.
    if (attachedEntities.ContainsKey(entityKey))
    {
        return attachedEntities[entityKey];
    }

    var clonedObject = Activator.CreateInstance(entityType);
    var clonedEntity = (TEntity)clonedObject;
    attachedEntities.Add(entityKey, clonedEntity);

    // Clone the plain fields.
    foreach (var dataMember in entityMetaType.Fields)
    {
        var value = dataMember.StorageAccessor.GetBoxedValue(sourceEntity);
        dataMember.StorageAccessor.SetBoxedValue(ref clonedObject, value);
    }

    // Clone the EntityRefs.
    foreach (var dataMember in entityMetaType.EntityRefs)
    {
        var thisKeyValues = dataMember.Association.ThisKey
            .Select(keyDataMember => keyDataMember.StorageAccessor.GetBoxedValue(sourceEntity))
            .ToArray();
        if (thisKeyValues.All(keyValue => keyValue == null))
        {
            continue;
        }
        var repository = Repositories.GetRepositoryCheckingBaseTypeFor(dataMember.Type);
        var value = repository.CloneAndAttach(dataMember.MemberAccessor.GetBoxedValue(sourceEntity));
        dataMember.MemberAccessor.SetBoxedValue(ref clonedObject, value);
    }

    // Clone the EntitySets.
    foreach (var dataMember in entityMetaType.EntitySets)
    {
        var repository = Repositories
            .GetRepositoryCheckingBaseTypeFor(dataMember.Type.GenericTypeArguments[0]);
        var sourceEntitySet = (IList)dataMember.MemberAccessor.GetBoxedValue(sourceEntity);
        var clonedEntitySet = (IList)Activator.CreateInstance(dataMember.Type);
        foreach (var sourceItem in sourceEntitySet)
        {
            var clonedItem = repository.CloneAndAttach(sourceItem);
            clonedEntitySet.Add(clonedItem);
        }
        dataMember.MemberAccessor.SetBoxedValue(ref clonedObject, clonedEntitySet);
    }

    table.Attach(clonedEntity);
    return clonedEntity;
}

private ClonningMetaType GetClonningMetaType(Type type)
{
    ClonningMetaType result;
    // Reflection data for each cloned type is cached in a static ConcurrentDictionary
    // so that reflection is not run on every call.
    if (!clonningMetaTypes.TryGetValue(type, out result))
    {
        result = clonningMetaTypes.GetOrAdd(type, key => new ClonningMetaType(MetaModel.GetMetaType(key)));
    }
    return result;
}

private class ClonningMetaType
{
    private readonly MetaType metaType;
    private readonly IReadOnlyCollection<MetaDataMember> keys;

    public ClonningMetaType(MetaType metaType)
    {
        this.metaType = metaType;
        keys = metaType.DataMembers
            .Where(dataMember => dataMember.IsPrimaryKey)
            .ToArray();
        Fields = metaType.DataMembers
            .Where(dataMember => dataMember.IsPersistent)
            .Where(dataMember => !dataMember.IsAssociation)
            .ToArray();
        EntityRefs = metaType.DataMembers
            .Where(dataMember => dataMember.IsPersistent)
            .Where(dataMember => dataMember.IsAssociation)
            .Where(dataMember => !dataMember.Association.IsMany)
            .Where(dataMember => dataMember.Member.HasCustomAttribute<CacheableAssociationAttribute>())
            .ToArray();
        EntitySets = metaType.DataMembers
            .Where(dataMember => dataMember.IsPersistent)
            .Where(dataMember => dataMember.IsAssociation)
            .Where(dataMember => dataMember.Association.IsMany)
            .Where(dataMember => dataMember.Member.HasCustomAttribute<CacheableAssociationAttribute>())
            .ToArray();
    }

    public IReadOnlyCollection<MetaDataMember> Fields { get; private set; }
    public IReadOnlyCollection<MetaDataMember> EntityRefs { get; private set; }
    public IReadOnlyCollection<MetaDataMember> EntitySets { get; private set; }

    public ItcEntityKey GetKey(object entity)
    {
        return new ItcEntityKey(
            metaType,
            keys.Select(dataMember => dataMember.StorageAccessor.GetBoxedValue(entity)).ToArray());
    }
}
To avoid pulling half the database while walking EntityRef and EntitySet associations during cloning, only associations marked with the special CacheableAssociationAttribute are cloned.
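The attribute itself is just a marker. A minimal sketch of how it might be declared and applied (Campaign, CampaignElement, and their members here are illustrative stand-ins for the real model, reused in the later sketches):

using System;
using System.Data.Linq;
using System.Data.Linq.Mapping;

// Marker attribute: CloneAndAttach follows only the associations that carry it.
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)]
public sealed class CacheableAssociationAttribute : Attribute
{
}

// Illustrative entities, not the real DirectCRM model.
[Table(Name = "dbo.Campaign")]
public class Campaign
{
    private EntitySet<CampaignElement> elements = new EntitySet<CampaignElement>();

    [Column(IsPrimaryKey = true)]
    public int Id { get; set; }

    [Column]
    public DateTime ChangeDate { get; set; }

    // Marked association: the elements are cloned together with the campaign.
    [CacheableAssociation]
    [Association(Storage = "elements", OtherKey = "CampaignId")]
    public EntitySet<CampaignElement> Elements
    {
        get { return elements; }
        set { elements.Assign(value); }
    }
}

[Table(Name = "dbo.CampaignElement")]
public class CampaignElement
{
    private EntityRef<Campaign> campaign;

    [Column(IsPrimaryKey = true)]
    public int Id { get; set; }

    [Column]
    public int CampaignId { get; set; }

    [CacheableAssociation]
    [Association(Storage = "campaign", ThisKey = "CampaignId", OtherKey = "Id", IsForeignKey = true)]
    public Campaign Campaign
    {
        get { return campaign.Entity; }
        set { campaign.Entity = value; }
    }

    // An association declared without CacheableAssociation is still lazy-loadable as usual,
    // but CloneAndAttach will not follow it into the static cache.
}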
Since the cache is loaded in a separate DataContext and cloning happens after that context has been disposed, we had to write a method that eagerly loads all cacheable associations of a loaded entity:
public void LoadEntityForClone(TEntity entity)
{
    // Guard against circular references.
    if (loadedEntities.Contains(entity))
    {
        return;
    }
    loadedEntities.Add(entity);

    var entityType = entity.GetType();
    var entityMetaType = GetClonningMetaType(entityType);

    // Load all EntityRefs.
    foreach (var dataMember in entityMetaType.EntityRefs)
    {
        var repository = Repositories.GetRepositoryCheckingBaseTypeFor(dataMember.Type);
        var value = dataMember.MemberAccessor.GetBoxedValue(entity);
        if (value != null)
        {
            repository.LoadEntityForClone(value);
        }
    }

    // Load all EntitySets.
    foreach (var dataMember in entityMetaType.EntitySets)
    {
        var repository = Repositories
            .GetRepositoryCheckingBaseTypeFor(dataMember.Type.GenericTypeArguments[0]);
        var entitySet = (IList)dataMember.MemberAccessor.GetBoxedValue(entity);
        foreach (var item in entitySet)
        {
            repository.LoadEntityForClone(item);
        }
    }
}
Of course, using DataLoadOptions would cut the number of queries needed to load EntityRefs. But we did not even consider building DataLoadOptions dynamically, because Linq to SQL has a bug that makes it resolve the entity type incorrectly when DataLoadOptions is used to load an EntitySet: the type discriminator is ignored and the parent type is materialized instead of the child type.
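For context, eager loading with DataLoadOptions would look roughly like this (reusing the illustrative Campaign and CampaignElement classes from the sketch above); the LoadWith call on the EntitySet is exactly the case where the bug bites, so we stayed with the explicit traversal:

using System.Data.Linq;

public static class DataLoadOptionsExample
{
    public static void ConfigureEagerLoading(DataContext dataContext)
    {
        var options = new DataLoadOptions();

        // Loading the EntityRef together with the element saves one query per element...
        options.LoadWith<CampaignElement>(element => element.Campaign);

        // ...but LoadWith on an EntitySet is the case where, as described above,
        // the type discriminator is ignored and the base type gets materialized.
        options.LoadWith<Campaign>(campaign => campaign.Elements);

        // LoadOptions must be assigned before the first query runs on this DataContext.
        dataContext.LoadOptions = options;
    }
}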
Since we access database records through repositories, that is also where I put cache access. When a campaign or campaign element is requested, the following happens (a rough sketch follows the list):
- We check whether the campaign has already been added to the current DataContext from the cache. If so, we return the already cloned and attached object.
- If the campaign is not in the cache, we load it into the cache. Otherwise we compare the campaign's last change date in the database with the date stored in the cache and reload the campaign if they differ.
- We clone the campaign and all its elements and attach them to the current DataContext.
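Roughly, and only as a sketch, that flow looks like this (reusing the illustrative Campaign class from the attribute sketch; the delegates stand in for the real repository plumbing, and CloneAndAttach is the base-repository method shown earlier):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public class CachedCampaignLookup
{
    // Process-wide static cache: campaign Id -> (change date it was loaded with, cached campaign tree).
    private static readonly ConcurrentDictionary<int, Tuple<DateTime, Campaign>> staticCache =
        new ConcurrentDictionary<int, Tuple<DateTime, Campaign>>();

    // Per-transaction map of campaigns already cloned and attached to the current DataContext.
    private readonly Dictionary<int, Campaign> attachedCampaigns = new Dictionary<int, Campaign>();

    private readonly Func<int, DateTime> getChangeDateFromDb;     // one scalar query
    private readonly Func<int, Campaign> loadCampaignTreeFromDb;  // campaign with all cacheable associations
    private readonly Func<Campaign, Campaign> cloneAndAttach;     // CloneAndAttach from the base repository

    public CachedCampaignLookup(
        Func<int, DateTime> getChangeDateFromDb,
        Func<int, Campaign> loadCampaignTreeFromDb,
        Func<Campaign, Campaign> cloneAndAttach)
    {
        this.getChangeDateFromDb = getChangeDateFromDb;
        this.loadCampaignTreeFromDb = loadCampaignTreeFromDb;
        this.cloneAndAttach = cloneAndAttach;
    }

    public Campaign GetById(int campaignId)
    {
        // 1. Already attached to the current DataContext? Return the existing clone.
        Campaign attached;
        if (attachedCampaigns.TryGetValue(campaignId, out attached))
            return attached;

        // 2. Check cache relevance by the campaign's change date (one scalar query)
        //    and reload the whole campaign tree if the cache entry is missing or stale.
        var changeDate = getChangeDateFromDb(campaignId);
        var cacheEntry = staticCache.AddOrUpdate(
            campaignId,
            id => Tuple.Create(changeDate, loadCampaignTreeFromDb(id)),
            (id, existing) => existing.Item1 == changeDate
                ? existing
                : Tuple.Create(changeDate, loadCampaignTreeFromDb(id)));

        // 3. Clone the cached tree, attach it to the current DataContext and remember the clone.
        var clone = cloneAndAttach(cacheEntry.Item2);
        attachedCampaigns.Add(campaignId, clone);
        return clone;
    }
}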
With this approach we attach only the campaigns we actually need, and we check cache relevance (the campaign's change date) only once per transaction (once during the lifetime of the DataContext). Since the repositories offer many ways to get campaign elements (by Id, by campaign, by a unique string identifier), the cache data structure turned out rather involved, but it is something you can live with.
The new caching architecture passed all integration tests and was successfully deployed to production on one of the projects. Everything worked stably and nothing foreshadowed trouble.
Something went wrong
After rolling out the cache on the next project, errors started appearing while attaching a campaign to the current DataContext: "DuplicateKeyException: Cannot add an entity with a key that is already in use." The first thought, that we had made a mistake somewhere in the recursive cloning code and were attaching the same entity twice, turned out to be wrong. The error was more conceptual.
Who is to blame
The culprit was code in the target project that accessed the campaign without the cache before that same campaign was attached from the cache. As I wrote above, attaching adds an entity to the IdentityMap; but if the entity has already been loaded from the database, it is already in the IdentityMap, and an attempt to add an entity with the same key throws an error. Linq to SQL gives no way to check whether an entity is already in the IdentityMap before attaching, and if we simply caught and ignored attach errors, the caching would only work some of the time.
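In code, the conflict boils down to roughly this sequence (illustrative, using the Campaign class from the earlier sketch; any query that materializes the campaign before the attach has the same effect):

using System.Data.Linq;
using System.Linq;

public static class DuplicateKeyScenario
{
    public static void Reproduce(DataContext dataContext, Campaign cachedClone, int campaignId)
    {
        // Project code touches the campaign directly, bypassing the cache.
        // The entity with this key is now sitting in the DataContext's IdentityMap.
        var loadedDirectly = dataContext.GetTable<Campaign>()
            .Single(campaign => campaign.Id == campaignId);

        // Later the repository attaches the clone of the same campaign from the cache.
        // The key is already in the IdentityMap, there is no public way to check for that first,
        // so this call throws DuplicateKeyException.
        dataContext.GetTable<Campaign>().Attach(cachedClone);
    }
}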
There was an idea to strip cache-aware repositories of any way to hit the database bypassing the cache. Apart from being far from easy, that would not solve the problem completely, since many non-cached entities hold EntityRefs to cached entities, and when an EntityRef is accessed the query bypasses the repository.
What to do
There are two solutions to this problem:
- Remove from cached repositories the ability to access the database bypassing the cache, and remove references to cached entities from all non-cached entities.
- Attach the cache when creating the DataContext, before any other database queries are executed.
The first option would mean rewriting half the code, and the result would be more cumbersome, since everywhere an EntityRef used to be accessed we would have to call repository methods instead. So we settled on the second option.
Since we now attach entities when the DataContext is created rather than when specific entities are accessed, we have to state explicitly at the beginning of the transaction which entities we need from the cache (loading the entire cache every time would clearly be excessive). In other words, the chosen approach is not universal caching; it only lets us fine-tune cache usage for individual transactions.
New architecture
In some of our repositories, the results of queries by entity Id were already cached for the duration of the transaction in a Dictionary. In essence, this transaction cache can be considered our own IdentityMap layered on top of the Linq to SQL one. All we had to do was move the transaction cache into the base repository and add methods for accessing it not only by Id but by the other lookups as well. The result is a transaction cache class through which each entity is loaded at most once, no matter how it is requested. If, for example, we first request an entity by Id and then by its unique string identifier, the second query is not executed and the access method returns the entity from the cache.
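A minimal sketch of such a transaction cache (the two key kinds, Id and a unique string identifier, mirror the lookups mentioned above; the class name and the loader delegates are illustrative):

using System;
using System.Collections.Generic;

// Per-transaction (per-DataContext) cache: whichever way an entity is requested,
// it is loaded at most once and indexed by every known key afterwards.
public class TransactionCache<TEntity> where TEntity : class
{
    private readonly Dictionary<int, TEntity> byId = new Dictionary<int, TEntity>();
    private readonly Dictionary<string, TEntity> bySystemName = new Dictionary<string, TEntity>();

    private readonly Func<TEntity, int> getId;
    private readonly Func<TEntity, string> getSystemName; // the "unique string identifier"

    public TransactionCache(Func<TEntity, int> getId, Func<TEntity, string> getSystemName)
    {
        this.getId = getId;
        this.getSystemName = getSystemName;
    }

    // Used on DataContext creation to pre-fill the cache from the static cache
    // (the entity passed here is already cloned and attached).
    public void Add(TEntity entity)
    {
        byId[getId(entity)] = entity;
        bySystemName[getSystemName(entity)] = entity;
    }

    public TEntity GetById(int id, Func<int, TEntity> loadFromDb)
    {
        TEntity entity;
        if (!byId.TryGetValue(id, out entity))
        {
            entity = loadFromDb(id);
            if (entity != null)
                Add(entity); // now also reachable by its string identifier without a second query
        }
        return entity;
    }

    public TEntity GetBySystemName(string systemName, Func<string, TEntity> loadFromDb)
    {
        TEntity entity;
        if (!bySystemName.TryGetValue(systemName, out entity))
        {
            entity = loadFromDb(systemName);
            if (entity != null)
                Add(entity);
        }
        return entity;
    }
}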
Since every repository now has a transaction cache, supporting static caching comes down to filling the transaction cache from the static cache when the DataContext is created, not forgetting to clone and attach each entity. All of this looks as follows:

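Roughly sketched (the interfaces and the factory method are assumed names, not the real classes), the wiring amounts to:

using System.Collections.Generic;

// Assumed shape of a repository that owns a transaction cache and can clone and attach entities.
public interface ICachingRepository<TEntity> where TEntity : class
{
    TEntity CloneAndAttach(TEntity sourceEntity);   // the base-repository method shown earlier
    void AddToTransactionCache(TEntity entity);
}

// Assumed shape of a process-wide static cache that knows which entities this transaction asked for.
public interface IStaticEntityCache<TEntity> where TEntity : class
{
    IEnumerable<TEntity> GetRequestedEntities();
}

public static class DataContextFactorySketch
{
    // Called right after the DataContext is created and before any query runs,
    // so nothing else can put the same keys into its IdentityMap first.
    public static void FillFromStaticCache<TEntity>(
        ICachingRepository<TEntity> repository,
        IStaticEntityCache<TEntity> staticCache) where TEntity : class
    {
        foreach (var cachedEntity in staticCache.GetRequestedEntities())
        {
            var clone = repository.CloneAndAttach(cachedEntity);
            repository.AddToTransactionCache(clone);
        }
    }
}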
Speaking specifically about campaign caching, loading the static cache works like this (a sketch follows the list):
- An IQueryable of campaigns describing which campaigns we will need is passed to the factory that creates the DataContext along with all the repositories.
- The Ids and last change dates of the campaigns in that IQueryable are fetched with a single query.
- For each campaign, the fetched date is checked against the date stored in the cache, and if they differ, the campaign and all its elements are reloaded.
- All entities taken from the static cache are attached and placed into the transaction caches of the corresponding repositories.
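A rough sketch of the second and third steps (reusing the illustrative Campaign class with its ChangeDate column; the loader delegate stands in for the real tree-loading code):

using System;
using System.Collections.Concurrent;
using System.Linq;

public static class CampaignStaticCacheSketch
{
    // Process-wide cache keyed by campaign Id; each entry remembers the change date it was loaded with.
    private static readonly ConcurrentDictionary<int, Tuple<DateTime, Campaign>> cache =
        new ConcurrentDictionary<int, Tuple<DateTime, Campaign>>();

    public static Campaign[] GetActualCampaigns(
        IQueryable<Campaign> requestedCampaigns,
        Func<int, Campaign> loadCampaignTree)
    {
        // One query pulls only the Id and the last change date of every requested campaign.
        var idsWithDates = requestedCampaigns
            .Select(campaign => new { campaign.Id, campaign.ChangeDate })
            .ToArray();

        // Reload a campaign tree only if its cached entry is missing or its date has moved.
        return idsWithDates
            .Select(key =>
                cache.AddOrUpdate(
                    key.Id,
                    id => Tuple.Create(key.ChangeDate, loadCampaignTree(id)),
                    (id, existing) => existing.Item1 == key.ChangeDate
                        ? existing
                        : Tuple.Create(key.ChangeDate, loadCampaignTree(id))).Item2)
            .ToArray();
    }
}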
What's next
From here, caching can be configured for other transactions as the need arises. Besides the static cache for campaigns, generic classes were written for statically caching dictionaries that change only when the application is deployed. In other words, caching any entities whose cache relevance does not need to be checked is now trivial to enable. In the future I want to extend this functionality by adding the ability to flush the cache after a set time or based on simple database queries; it may then be possible to rework campaign caching to use the standard static cache classes with the appropriate settings.
Although the resulting architecture looks coherent and lets different kinds of caching be implemented uniformly, the need to state explicitly at the start of a transaction which entities we will need is not encouraging. Moreover, to load all the entities a transaction needs without attaching a pile of extra objects to the context, you have to know the structure of the transaction in full detail.
There is one more way to implement the cache which, unfortunately, cannot be done in Linq to SQL: replace the standard IdentityMap with our own implementation that tries to fetch an entity from the cache before adding it. With this approach, a cached entity loaded in any way would be replaced with its copy from the cache, which already has all of its EntityRefs and EntitySets loaded, so no extra queries would be needed to fetch them. If on top of that the most frequently used repository methods looked the entity up in the IdentityMap before querying the database, the number of database queries would drop drastically. And such an IdentityMap lookup before the query could be done for any entities, not only those covered by static caching. Since any entity received from the database is replaced with its copy from the IdentityMap (if one is already there), it is impossible to get two different copies of the same entity during the lifetime of a DataContext, even if it was changed in another transaction (the first loaded copy is always returned). It therefore makes no sense to request an entity twice in one transaction, for example by key, and any key-based lookups can be cached for the duration of the transaction by checking the IdentityMap before querying.

We will attempt to implement this architecture after we move to Entity Framework. EF does allow getting entities out of its IdentityMap, so the architecture can be improved at least partially.
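For instance, in Entity Framework the first-level cache is reachable: DbSet.Find checks the entities already tracked by the context before hitting the database, and DbSet.Local exposes them directly. A small sketch (EF6 here; the Campaign class is a plain illustrative POCO for this example):

using System.Data.Entity;
using System.Linq;

public class Campaign
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class CrmContext : DbContext
{
    public DbSet<Campaign> Campaigns { get; set; }
}

public static class EfIdentityMapExample
{
    public static Campaign GetCampaign(CrmContext context, int campaignId)
    {
        // Entities already tracked by the context are visible without any query at all.
        bool alreadyTracked = context.Campaigns.Local.Any(campaign => campaign.Id == campaignId);

        // Find looks in the change tracker (the EF identity map) first
        // and queries the database only if the entity is not already tracked.
        return context.Campaigns.Find(campaignId);
    }
}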