Centrifuge - I will no longer refresh the page before posting a comment

    Some time has passed since I last wrote about Centrifuge, and a lot has changed over this period. Much of what was described in the earlier articles (1, 2) has sunk into oblivion, but the essence and idea of the project remain the same: it is a server for delivering real-time messages to users connected from a web browser. When an event occurs on your site that you need to instantly deliver to some of your users, you post this event to Centrifuge, and it in turn sends it to all interested users subscribed to the relevant channel. In its simplest form, this is shown in the diagram:



    The project is written in Python using the Tornado asynchronous web server. You can use it even if your site's backend is not written in Python. I would like to talk about what Centrifuge is at the moment. Trying the project out is a snap if you are familiar with installing Python packages. Inside a virtualenv:





    $ pip install centrifuge

    Launch:

    $ centrifuge

    After that, the administrative interface of the Centrifuge process you just started will be available at http://localhost:8000.

    Specially for this article, I launched a Centrifuge instance on Heroku - habrifuge.herokuapp.com . The password is habrahabr. I am counting on your honesty and prudence - the demo is in no way protected against attempts to break everything and prevent others from evaluating the project. It runs on a free dyno with all the consequences that entails. Heroku is, of course, not the best place to host this kind of application, but it will do for demonstration purposes.

    I think I will not be far from the truth if I say that Centrifuge has no analogues, at least in the open-source Python world. I will try to explain why I think so. There are many ways to add real-time events to a site. Off the top of my head:

    • freestanding asynchronous server;
    • cloud service (pusher.com, pubnub.com);
    • gevent (gunicorn, uwsgi);
    • Nginx modules / extensions;
    • BOSH, XMPP.

    In the JavaScript world there are Meteor and Derby - a completely different approach. There is also the wonderful Faye - a server that integrates seamlessly with your JavaScript or Ruby backend; but that is a solution for Node.js and Ruby. Centrifuge implements the first of the approaches listed above. The advantage of a standalone asynchronous server (and of a cloud service) is that you do not need to change the code and philosophy of your existing backend, which will inevitably happen as soon as you decide to use, for example, Gevent and monkey-patch the standard Python libraries. The separate-server approach makes it easy and painless to add real-time messages to an existing backend architecture.

    The disadvantage is that with such an architecture you end up with a somewhat "truncated" real-time. Your web application must withstand the HTTP requests from the clients that generate events: new events first reach your backend, get validated, get saved to the database if necessary, and only then are sent to Centrifuge (the same applies to pusher.com, pubnub.com and others). In most web tasks this limitation does not matter; dynamic real-time games, where a single client generates a very large number of events, can suffer from it. For such cases a tighter integration of the real-time application and the backend is probably needed - something like gevent-socketio. If events on the site are generated not by the client but by the backend itself, this drawback plays no role at all.
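
    To make this flow concrete, here is a minimal sketch of the backend side. The function publish_to_centrifuge below is a hypothetical placeholder: in a real project it would sign the payload with the project secret and POST it to Centrifuge's HTTP API (see the documentation for the exact endpoint and format); the point here is only the order of operations - validate, persist, then notify.

    # Sketch of the "events pass through your backend first" flow.
    # publish_to_centrifuge() is a hypothetical placeholder for a call to
    # Centrifuge's HTTP API; the in-memory list stands in for a real database.
    comments = []


    def publish_to_centrifuge(channel, data):
        # Here you would sign and POST {channel, data} to Centrifuge.
        print("publish to %s: %r" % (channel, data))


    def post_comment(user_id, text):
        text = text.strip()
        if not text:                                  # 1. validate the event
            raise ValueError("empty comment")
        comment = {"id": len(comments) + 1, "user": user_id, "text": text}
        comments.append(comment)                      # 2. persist it
        publish_to_centrifuge("comments", comment)    # 3. only then notify subscribers
        return comment


    if __name__ == "__main__":
        post_comment(42, "Hello, real-time!")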

    When I say there are no analogues of Centrifuge in the open-source Python world, I do not mean that there are no other implementations of a standalone server for delivering messages over WebSockets and their polyfills. I simply have not found a single such project that solves most of the problems of real-world use fully out of the box.

    Type "python real-time github" into a search engine and you will get a lot of links to examples of similar servers. But! Most of those results only demonstrate an approach to the problem without going any deeper. You get just one process, and the application has to be scaled somehow - it is good if the project documentation at least says that for this purpose you need a PUB/SUB broker (Redis, ZeroMQ, RabbitMQ), which is true, but you have to implement it yourself. Often such examples amount to a class-level variable of type set, to which new connection objects are added, and a broadcast of every new message to all clients in that set, as in the sketch below.
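
    A minimal Tornado sketch of that pattern (the handler name and URL are made up for illustration; note that the broadcast only reaches clients connected to this one process):

    # Roughly what most "python real-time" examples on GitHub look like:
    # a class-level set of open connections and a naive broadcast over it.
    # This only works within a single process - there is no PUB/SUB broker.
    import tornado.ioloop
    import tornado.web
    import tornado.websocket


    class BroadcastHandler(tornado.websocket.WebSocketHandler):
        clients = set()  # class variable shared by all connections of this process

        def open(self):
            BroadcastHandler.clients.add(self)

        def on_message(self, message):
            for client in BroadcastHandler.clients:  # naive broadcast
                client.write_message(message)

        def on_close(self):
            BroadcastHandler.clients.discard(self)


    if __name__ == "__main__":
        app = tornado.web.Application([(r"/ws", BroadcastHandler)])
        app.listen(8888)
        tornado.ioloop.IOLoop.instance().start()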

    The main purpose of Centrifuge is to provide an out-of-the-box solution to real-life problems. Let's look in more detail at some of the points it has to deal with.

    Polyfills

    WebSockets alone are not enough. If you do not believe me, check out the talk with the telling name "Websuckets" by one of the Socket.io developers. Here are the slides. And here is the video:



    Of course, there are projects (again, dynamic real-time games) for which using WebSockets is critical. Centrifuge uses SockJS to emulate WebSockets in older browsers. This means support for browsers all the way down to IE7 through transports such as xhr-streaming, iframe-eventsource, iframe-htmlfile, xhr-polling and jsonp-polling. For this, a wonderful server-side SockJS implementation is used - sockjs-tornado .
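
    To give an idea of what sockjs-tornado provides, here is a minimal echo server built on it. This is not Centrifuge's code, just a sketch of the library it relies on; SockJS itself picks the best transport available to the browser:

    # Minimal sockjs-tornado echo server - a sketch of the library Centrifuge
    # builds on, not Centrifuge itself. SockJS falls back from websocket to
    # xhr-streaming, jsonp-polling and so on, depending on the browser.
    import tornado.ioloop
    import tornado.web
    from sockjs.tornado import SockJSConnection, SockJSRouter


    class EchoConnection(SockJSConnection):
        def on_open(self, info):
            self.send("connected")

        def on_message(self, message):
            self.send(message)  # echoed back over whatever transport is in use


    if __name__ == "__main__":
        router = SockJSRouter(EchoConnection, "/echo")
        app = tornado.web.Application(router.urls)
        app.listen(8080)
        tornado.ioloop.IOLoop.instance().start()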

    It is also worth noting that you can connect to Centrifuge over "pure" WebSockets as well - without wrapping the interaction in the SockJS protocol.

    The repository has a JavaScript client with a simple and intuitive API.

    Scaling

    You can start several processes - they will communicate with each other using Redis PUB/SUB. Note that Centrifuge does not at all pretend to be a solution for huge sites with millions of visitors; for such projects you probably need something else - the same cloud services or your own development. But for the vast majority of projects, several server instances behind a balancer, connected through the Redis PUB/SUB mechanism, will be more than enough. For example, one of our instances (Redis is not needed in this case) handles 1000 simultaneous connections without problems, with an average message delivery time of under 50 ms.
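
    The idea behind the Redis engine is plain PUB/SUB fan-out: each instance publishes new messages to Redis and is subscribed to the same channels, so every instance can deliver a message to the clients connected to it. A stripped-down illustration of the mechanism using redis-py (not Centrifuge's actual engine code):

    # Stripped-down illustration of syncing several instances via Redis
    # PUB/SUB - not Centrifuge's actual engine code. Every instance runs a
    # listener loop like this and publishes its own new messages with PUBLISH.
    import json

    import redis

    r = redis.StrictRedis(host="localhost", port=6379, decode_responses=True)


    def publish(channel, data):
        # Called by the instance that received the new event.
        r.publish(channel, json.dumps(data))


    def listen(channels, deliver):
        # deliver() is the local broadcast to clients connected to this process.
        pubsub = r.pubsub()
        pubsub.subscribe(*channels)
        for item in pubsub.listen():
            if item["type"] == "message":
                deliver(item["channel"], json.loads(item["data"]))


    def show(channel, message):
        print(channel, message)


    if __name__ == "__main__":
        listen(["comments"], show)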

    By the way, here is a graph from Graphite covering a week of operation of the Centrifuge instance used in the Mail.Ru Group intranet. The blue line is the number of active connections, the green line is the average message delivery time in milliseconds. In the middle is the weekend. :)



    Authentication and Authorization

    When connecting to Centrifuge, a token is generated on your backend using HMAC, with the project's secret key as the symmetric secret. This token is validated upon connection. The connection also carries the user ID and, optionally, additional information about the user. Thus Centrifuge knows enough about your users to handle subscriptions to private channels. The mechanism is inherently very similar to JWT (JSON Web Token).

    I would like to mention one of the recent innovations. As I said in the previous articles, when a client subscribes to a private channel, Centrifuge sends a POST request to your application, asking whether the user with this ID may subscribe to the given channel. Now you can also create a private channel for which your web application is not involved at all: just name the channel however you like and, after the special character #, write the ID of the user who is allowed to subscribe to it. Only the user with ID 42 will be allowed to subscribe to this channel:

    news#42

    And you can do it like this:

    dialog#42,56

    This is a private channel for 2 users with IDs 42 and 56.

    Recent versions also added a connection expiration mechanism - it is turned off by default, since most projects do not need it. The mechanism should be considered experimental.

    Perhaps the two most difficult decisions in the course of the project's development were how to synchronize state between several processes (in the end the simplest option was chosen - Redis) and what to do with clients that connected to Centrifuge before being deactivated (banned, deleted) in the web application.

    The difficulty here is that Centrifuge keeps nothing in permanent storage except project settings and project namespaces. So a way had to be found to reliably disconnect invalid clients without being able to store their identifiers or tokens, and taking into account possible downtimes of both Centrifuge and the web application. Such a way was eventually found. However, it has not yet been applied in a real project, hence the experimental status. I will try to describe how the solution works in theory.

    As I described earlier, to connect to Centrifuge from a browser you must pass, in addition to the connection address, several required parameters - the current user's ID and the project ID. The connection parameters must also include an HMAC token generated from the project's secret key on the web application's backend. This token confirms the correctness of the parameters passed by the client.

    The trouble is that, previously, once a client had obtained such a token it could keep using it indefinitely: subscribe to public channels and read messages from them. Not write, thankfully (since messages initially pass through your backend)! For many public sites this is a perfectly normal situation, but I was sure an additional protection mechanism was needed.

    Therefore a timestamp parameter appeared among the required connection parameters. It is Unix seconds ( str(int(time.time())) ). The timestamp also takes part in token generation. So the connection now looks like this:

    var centrifuge = new Centrifuge({
        url: 'http://localhost:8000/connection',
        token: 'TOKEN',
        project: 'PROJECT_ID',
        user: 'USER_ID',
        timestamp: '1395086390'
    });
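
    On the backend side, generating these parameters might look roughly like the sketch below. The exact field order and digest used for the HMAC are defined by Centrifuge and its helper libraries (the Django wrapper mentioned later does this for you), so treat this as an illustration of the idea rather than the canonical algorithm:

    # Sketch of generating connection parameters on the backend. The concrete
    # signing scheme (field order, digest) is defined by Centrifuge and its
    # client libraries - this only illustrates the idea of an HMAC over the
    # project ID, user ID and timestamp using the project secret.
    import hmac
    import time
    from hashlib import sha256

    PROJECT_SECRET = b"project-secret-key"  # hypothetical secret from project settings


    def connection_params(project_id, user_id):
        timestamp = str(int(time.time()))   # Unix seconds, as described above
        signer = hmac.new(PROJECT_SECRET, digestmod=sha256)
        signer.update(project_id.encode())
        signer.update(user_id.encode())
        signer.update(timestamp.encode())
        return {
            "project": project_id,
            "user": user_id,
            "timestamp": timestamp,
            "token": signer.hexdigest(),    # passed to new Centrifuge({...}) on the client
        }


    if __name__ == "__main__":
        print(connection_params("PROJECT_ID", "42"))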
    

    An option has appeared in the project settings that answers the question: for how many seconds should a new connection be considered valid? Centrifuge periodically finds connections that have expired and adds them to a special list (actually a set) for verification. At a certain interval, Centrifuge sends a POST request to your application with the list of user IDs that need verification. In response, the application returns the list of user IDs that did not pass the check - those clients are immediately and forcibly disconnected from Centrifuge, with no automatic reconnection on the client side.
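
    On the web application side, the check itself can be as simple as the sketch below. The request and response format here (a plain list of user IDs in, the invalid ones out) is an assumption made for the sake of the example - consult the documentation for the actual payload:

    # Sketch of the verification step on the web application side. Centrifuge
    # periodically POSTs the user IDs of expired connections; the application
    # answers with those that are no longer valid, and they get disconnected.
    # The payload format is assumed here - check the docs for the real one.
    ACTIVE_USERS = {"42", "56"}  # stands in for a real "is this user still active?" check


    def verify_users(user_ids):
        """Return the subset of user_ids that should be disconnected."""
        return [uid for uid in user_ids if uid not in ACTIVE_USERS]


    if __name__ == "__main__":
        # e.g. user 13 was banned after connecting and will be kicked out
        print(verify_users(["42", "13", "56"]))  # -> ['13']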

    But it is not that simple. There is a chance that an "attacker", having tweaked, for example, the JavaScript on the client, will instantly reconnect after being forcibly kicked out. If the timestamp in his connection parameters is still valid, the connection will be accepted. But in the next verification cycle, once his connection has expired, his ID will be sent to the web application through the same mechanism, the application will say the user is invalid, and after that he will be disconnected for good (since by then his timestamp has expired as well). So there is a small window of time during which such a client can keep reading from public channels. Its length is configurable, though - I do not think it is a problem at all if, after the actual deactivation, a user can still read messages from the channels for a short while.

    Perhaps this diagram will make the mechanism easier to understand:



    Deploy

    The repository contains examples of the real configuration files we use to deploy Centrifuge. We run it on CentOS 6 behind Nginx under Supervisord. There is also a spec file - if you are on CentOS, you can build an rpm from it.

    Monitoring

    The latest version of Centrifuge can export various metrics to Graphite over UDP. Metrics are aggregated over a given time interval, a la StatsD. The graph from Graphite shown earlier in the article is exactly this kind of picture.
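
    Just to illustrate what sending a metric to Graphite over UDP amounts to: each metric is a plain-text line of the form "path value timestamp" sent as a datagram. The metric name below is made up, and the carbon listener is assumed to accept UDP on its usual port 2003:

    # Illustration of the Graphite plaintext protocol over UDP: one line per
    # metric, "path value timestamp". The metric name is made up, and the
    # carbon listener is assumed to accept UDP on port 2003.
    import socket
    import time

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)


    def send_metric(path, value, host="localhost", port=2003):
        line = "%s %s %d\n" % (path, value, int(time.time()))
        sock.sendto(line.encode(), (host, port))


    if __name__ == "__main__":
        send_metric("centrifuge.connections", 1024)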

    In a previous article about Centrifuge I mentioned that it uses ZeroMQ. The comments were unanimous: ZeroMQ is not needed, use Redis, whose performance is more than enough. At first I thought it over a little and added Redis as an optional PUB/SUB backend. And then this benchmark appeared:



    I was genuinely surprised. Why did ZeroMQ turn out so much worse than Redis for my tasks? I do not know the answer. Searching the Internet, I found an article in which the author also complains that ZeroMQ is not suited for fast real-time web; unfortunately, I have since lost the link to it. As a result, ZeroMQ is no longer used in Centrifuge, and only two so-called engines remain - Memory and Redis (the first is suitable if you run a single Centrifuge instance and do not need Redis and its PUB/SUB).

    As you can see in the gif above, the web interface has not gone anywhere; it is still used to create projects, change settings and watch messages in certain channels. Through it you can also send commands to Centrifuge, for example publish a message. All of this existed before - I just decided to mention it again in case you are not in the know.

    Among other changes:

    • MIT license instead of BSD;
    • namespace refactoring: a namespace is no longer a separate entity and protocol field, but simply a prefix in the channel name, separated by a colon ( public:news );
    • improvements in the JavaScript client;
    • work with JSON can now be significantly accelerated by additionally installing the ujson module (see the sketch after this list);
    • a Centrifugal organization on GitHub with repositories related to the project: besides the Python client for Centrifuge, there is now an example of deploying the project to Heroku and the first version of an adjacent library - a small wrapper for integration with Django (it simplifies generating connection parameters and has methods for conveniently sending messages to Centrifuge);
    • the documentation has been amended and expanded in accordance with the changes and additions;
    • many other changes are reflected in the changelog .
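
    The ujson point presumably comes down to the usual optional-import pattern - if the faster module is installed it gets used, otherwise the standard library does the job:

    # The usual optional-import pattern for ujson: use the faster module if it
    # is installed, otherwise fall back to the standard library json.
    try:
        import ujson as json
    except ImportError:
        import json

    print(json.dumps({"channel": "public:news", "data": {"text": "hello"}}))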

    As already mentioned earlier in the Mail.Ru Group blog on Habr, Centrifuge is used in our corporate intranet. Real-time messages added usability, color and dynamics to our internal portal. Users no longer need to refresh the page before posting a comment (no need to refresh, no need to refresh...) - isn't that great?

    Conclusion

    Like any other solution, Centrifuge needs to be used wisely. It is not a silver bullet; you need to understand that, by and large, it is only a message broker whose sole task is to keep connections with clients and send them messages.

    Do not expect guaranteed delivery of a message to a client. If, for example, a user opens a page and then puts his laptop to sleep, then when the machine "wakes up" the connection to Centrifuge will be established again, but all events that occurred while the laptop was asleep will be lost. The user either has to refresh the page, or you have to add your own logic for loading the missed events from your backend. You also need to remember that almost all objects (connections, channels, message history) are stored in RAM, so it is important to monitor its consumption. Do not forget about the operating system limits on open file descriptors and increase them if necessary. And you need to think about what kind of channel to create in each particular situation - private or public, with or without history, and so on.

    As I mentioned above, there are many ways to add real-time to your site; choose wisely - perhaps in your case the option with a separate asynchronous server will not be the most advantageous one.

    P.S. At the end of spring I attended Piter Py, a conference of Python developers in St. Petersburg. One of the talks described instantly notifying users that their task, executed asynchronously in a Celery worker, was finished. The speaker said that for this they use exactly Tornado and WebSockets. This was followed by several questions about how it works together with Django, how it is started, what kind of authorization is used... Guys who asked those questions - if you are reading this article, give Centrifuge a chance, it is great for such tasks.
