Introduction to Rate Limiting with Redis [Part 1]

Recently I have implemented several different ways of limiting the number of requests using Redis, both in commercial and in personal projects. In the two parts of this publication I want to cover two different but related ways to limit the number of requests: using standard Redis commands and using Lua scripts. Each subsequent method adds new use cases and fixes the flaws of the previous ones.

This post assumes that you have some experience with Python and Redis and, to a lesser extent, with Lua, but readers without that experience should still find it interesting.

Why limit the number of requests?


For example, Twitter limits the number of requests to its API, while Reddit and StackOverflow limit the number of posts and comments.

Some services limit the number of requests to optimize resource utilization; others fight spammers. Either way, on the modern Internet, rate limiting is meant to bound the impact a single user can have on the platform. Whatever the reason, let's proceed from the premise that we must count certain user actions and prevent them once the user reaches or exceeds a certain limit. Let's start by limiting the number of requests to an API to a maximum of 240 requests per hour per user.

We know that we need to count actions and restrict the user, so we need a little helper code. First, we need a function that gives us one or more identifiers for the user performing the action. Sometimes that is just the user's IP address, sometimes their user ID; I prefer to use both when possible, falling back to the IP address alone when the user is not authenticated. Below is a function that gets the IP address and user ID using the Flask-Login plugin for Flask.

from flask import g, request

def get_identifiers():
    # Always include the client's IP address.
    ret = ['ip:' + request.remote_addr]
    # Add the user ID when the user is authenticated (Flask-Login).
    if g.user.is_authenticated():
        ret.append('user:%s' % g.user.get_id())
    return ret


Just use counters


Now that we have a function that returns user identifiers, we can begin counting actions. One of the simplest approaches available in Redis is to compute a key for a time range and increment a counter stored at that key whenever an action of interest occurs. If the counter exceeds the value we allow, we refuse the action. Here is a function that uses automatically expiring keys with a range (and lifetime) of one hour:

import time

def over_limit(conn, duration=3600, limit=240):
    # Bucket suffix encodes the duration and the current time slice.
    bucket = ':%i:%i' % (duration, time.time() // duration)
    for id in get_identifiers():
        key = id + bucket
        count = conn.incr(key)
        # Expire the key so old buckets clean themselves up.
        conn.expire(key, duration)
        if count > limit:
            return True
    return False

This is a fairly simple function. For each identifier we increment the corresponding key in Redis and set its lifetime to one hour. If the counter value exceeds the limit, we return True; otherwise we return False.

That's all. Well, almost. This solves our problem of limiting each user to 240 requests per hour. However, in reality users will quickly notice that the limit resets at the beginning of every hour, and nothing will stop them from making all 240 requests within a couple of seconds right after the hour rolls over, wasting all our work.
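
For illustration, here is a minimal sketch of how over_limit() might guard an endpoint; the /api/resource route, the conn connection, and the 429 response are assumptions for this example, not part of the original code:

import redis
from flask import Flask, jsonify

app = Flask(__name__)
conn = redis.Redis()

@app.route('/api/resource')
def api_resource():
    # Reject the call before doing any work if the user is over the limit.
    if over_limit(conn):
        # 429 Too Many Requests is the conventional status for rate limiting.
        return jsonify(error='rate limit exceeded'), 429
    return jsonify(data='ok')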

Using multiple ranges


Our primary goal of restricting requests on an hourly basis was met, but users began sending all their API requests as fast as possible at the beginning of every hour. It looks like, in addition to the hourly limit, we should introduce per-second and per-minute limits to smooth out peaks in the number of requests.

Suppose we decide that 10 requests per second, 120 requests per minute, and 240 requests per hour are enough for our users and will allow us to distribute requests over time more evenly.

To do this, we can simply reuse our over_limit() function:

def over_limit_multi(conn, limits=[(1, 10), (60, 120), (3600, 240)]):
    # Check every (duration, limit) pair; any breach blocks the request.
    for duration, limit in limits:
        if over_limit(conn, duration, limit):
            return True
    return False

This works as expected. However, each of the three calls to over_limit() executes two Redis commands per identifier: one to update the counter and one to set the lifetime of the key, and we execute them for both the IP address and the user ID. As a result, it may take up to 12 round trips to Redis just to decide whether one person has exceeded the limit for one operation. The easiest way to reduce the number of round trips is pipelining; in redis-py a transactional pipeline wraps the queued commands in MULTI/EXEC, so many commands are sent to Redis in a single request.

We are fortunate that our over_limit() function is written so that we can easily replace the separate INCR and EXPIRE calls with a single MULTI/EXEC pipeline request per identifier. This change reduces the number of round trips to Redis from 12 to 6 when used together with over_limit_multi().

def over_limit(conn, duration=3600, limit=240):
    # A transactional pipeline wraps the queued commands in MULTI/EXEC.
    pipe = conn.pipeline(transaction=True)
    bucket = ':%i:%i' % (duration, time.time() // duration)
    for id in get_identifiers():
        key = id + bucket
        pipe.incr(key)
        pipe.expire(key, duration)
        # execute() returns one result per queued command;
        # the first is the INCR result.
        if pipe.execute()[0] > limit:
            return True
    return False

Halving the number of calls to Redis is great, but we still make 6 round trips just to decide whether the user may make an API call. We could write another version of over_limit_multi() that performs all the increments and expirations at once and checks the limits afterwards, but that implementation has a subtle bug: rejected requests still increment the longer-lived counters, so while users can never exceed 240 requests per hour, in the worst case they may be able to make only 10 requests per hour. The bug can be fixed with yet another request to Redis, or we can simply move all the logic into Redis itself!
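
For the curious, here is a hedged sketch of such an all-at-once version (my reconstruction, not code from the original article); the comments point at exactly where the bug lives:

def over_limit_multi_buggy(conn, limits=[(1, 10), (60, 120), (3600, 240)]):
    # Sketch of the single-round-trip version discussed above.
    # BUG: counters are incremented even when the request is rejected,
    # so a burst that trips the per-second limit still burns the
    # per-minute and per-hour quotas.
    pipe = conn.pipeline(transaction=True)
    ids = get_identifiers()
    for duration, _limit in limits:
        bucket = ':%i:%i' % (duration, time.time() // duration)
        for id in ids:
            pipe.incr(id + bucket)
            pipe.expire(id + bucket, duration)
    # Results alternate INCR, EXPIRE; keep only the INCR return values.
    counts = pipe.execute()[::2]
    index = 0
    for duration, limit in limits:
        for id in ids:
            if counts[index] > limit:
                return True
            index += 1
    return False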

Counting correctly


Instead of patching our previous implementation, let's move it into a Lua script that we will execute inside Redis. In this script we will do the same thing we did above: iterate over the list of limits and, for each identifier, increment the counter, update its lifetime, and check whether the counter has exceeded the limit.

import json

def over_limit_multi_lua(conn, limits=[(1, 10), (60, 120), (3600, 240)]):
    # Register the script once and cache it on the connection object.
    if not hasattr(conn, 'over_limit_multi_lua'):
        conn.over_limit_multi_lua = conn.register_script(over_limit_multi_lua_)
    return conn.over_limit_multi_lua(
        keys=get_identifiers(), args=[json.dumps(limits), time.time()])

over_limit_multi_lua_ = '''
local limits = cjson.decode(ARGV[1])
local now = tonumber(ARGV[2])
for i, limit in ipairs(limits) do
    local duration = limit[1]
    -- Same bucket naming as before: :<duration>:<time slice index>
    local bucket = ':' .. duration .. ':' .. math.floor(now / duration)
    for j, id in ipairs(KEYS) do
        local key = id .. bucket
        local count = redis.call('INCR', key)
        redis.call('EXPIRE', key, duration)
        if tonumber(count) > limit[2] then
            return 1
        end
    end
end
return 0
'''

Look at the piece of code immediately after the local bucket assignment: the Lua script reads almost exactly like our previous solution and performs the same operations as the original over_limit().
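
As a quick usage sketch (a hypothetical handler; get_identifiers() needs an active Flask request context, and conn is a redis.Redis connection):

import redis

conn = redis.Redis()

def handle_api_request():
    # The script returns 1 when any limit is exceeded and 0 otherwise.
    if over_limit_multi_lua(conn):
        return 'rate limit exceeded', 429
    return 'ok', 200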

Conclusion


We started with a single time interval, and in the end we have a rate limiting method that handles several levels of restrictions, works with multiple identifiers for a single user, and performs only one round trip to Redis.

In practice, any of these limiter variants can be useful in different applications.
