Algorithm for distributing data in a server cluster in dCache

    Continuing the series of articles about dCache, I will cover some details of the internal implementation.

    One of the important tasks in a distributed system is how to distribute the load among the available nodes. For distributed storage this task is especially important, since a decision made at write time affects how the data will later be read.



    Data that is written together is usually read together, be it photos from your last vacation or the latest results of a scientific experiment. This forces us to spread incoming data across as many nodes as possible, to avoid a pile-up of clients on any one server. The problem is easy to solve if all nodes have the same capacity and the same amount of free disk space, but that is rare in practice: new, and typically larger, servers are added when free space is already running out.

    In dCache this is solved with two mechanisms: weighted random distribution based on free space at write time, and rebalancing of data when new nodes are added.

    So, here is how weighted random distribution based on free space works:
    • each node is assigned a weight equal to the amount of free space on the node divided by the total free space in the system:
      weight = FreeN / FreeTotal
    • one of the nodes is selected at random, with the probability of choosing a node directly proportional to its weight

    In code, it looks something like this:

    private final Random rand = new Random();

    public Pool selectWritePool(Pool[] pools) {
        // total free space across all pools
        long totalFree = 0;
        for (Pool pool : pools) {
            totalFree += pool.getFree();
        }

        // weight of each pool = its free space / total free space
        double[] weights = new double[pools.length];
        int i = 0;
        for (Pool pool : pools) {
            weights[i] = (double) pool.getFree() / totalFree;
            i++;
        }

        // pick a pool with probability proportional to its weight
        return pools[weightedRandom(weights, rand)];
    }

    public static int weightedRandom(double[] weights, Random r) {
        double selection = r.nextDouble();
        double total = 0;
        int i;
        for (i = 0; i < weights.length && total <= selection; i++) {
            total += weights[i];
        }
        return i - 1;
    }
    


    This mechanism uses all nodes, but those with more free space are chosen more often. Probabilistic selection also ensures that different clients will not all be directed to the same server at the same time.
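    To see that selection frequency really tracks the weights, here is a small self-contained simulation sketch (the Pool class is replaced with plain arrays of free-space values; this is an illustration, not the actual dCache code):

    import java.util.Random;

    public class WeightedSelectionDemo {

        // weighted random selection, as described above
        public static int weightedRandom(double[] weights, Random r) {
            double selection = r.nextDouble();
            double total = 0;
            int i;
            for (i = 0; i < weights.length && total <= selection; i++) {
                total += weights[i];
            }
            return i - 1;
        }

        public static void main(String[] args) {
            long[] free = {100L, 300L, 600L};  // free space per pool
            long totalFree = 0;
            for (long f : free) totalFree += f;

            // weight = FreeN / FreeTotal
            double[] weights = new double[free.length];
            for (int i = 0; i < free.length; i++) {
                weights[i] = (double) free[i] / totalFree;
            }

            // simulate many writes and count how often each pool is picked
            int[] hits = new int[free.length];
            Random rand = new Random(42);
            int trials = 100_000;
            for (int t = 0; t < trials; t++) {
                hits[weightedRandom(weights, rand)]++;
            }

            for (int i = 0; i < free.length; i++) {
                System.out.printf("pool %d: weight %.2f, selected %.2f%n",
                        i, weights[i], (double) hits[i] / trials);
            }
        }
    }

    With enough trials, the observed selection fractions converge to the weights 0.10, 0.30, and 0.60.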

    As mentioned above, the internal rebalance command uses the same algorithm to even out server load. The load of a node is calculated as the ratio of its free space to its total capacity:
    load = FreeN / TotalN
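    The formula can be illustrated with a short sketch that compares each pool's load with the cluster-wide average to see which pools a rebalance would drain or fill (the class and method names here are assumptions for illustration, not the dCache API):

    public class RebalanceLoadDemo {

        // load = FreeN / TotalN
        public static double load(long free, long total) {
            return (double) free / total;
        }

        public static void main(String[] args) {
            long[] free  = {100, 800};    // free space per pool
            long[] total = {1000, 1000};  // capacity per pool

            // cluster-wide average free ratio
            long freeSum = 0, totalSum = 0;
            for (int i = 0; i < free.length; i++) {
                freeSum += free[i];
                totalSum += total[i];
            }
            double average = (double) freeSum / totalSum;

            for (int i = 0; i < free.length; i++) {
                double l = load(free[i], total[i]);
                // a pool with less free space than average would be drained,
                // one with more free space than average would receive data
                String action = l < average ? "drain" : "fill";
                System.out.printf("pool %d: load %.2f vs avg %.2f -> %s%n",
                        i, l, average, action);
            }
        }
    }

    Here pool 0 (load 0.10) sits below the average of 0.45 and would give up data, while pool 1 (load 0.80) would receive it.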

    This algorithm has proven itself well in production.
