Celery in high-load projects: a bit of practice

    On the eve of Moscow Python Conf++, we had a short talk with Oleg Churkin, tech lead at a fintech startup, about his extensive experience with Celery: half a million background tasks a day, bugs, and testing.

    - Tell us a bit about the project you are working on now.

    At the moment I'm running the fintech startup Statusmoney, which analyzes users' financial data and lets customers compare their income and expenses with those of other groups of people, set spending limits, and watch on charts how their wealth grows or shrinks. For now the project is focused only on the North American market.

    To analyze financial information, we download and store all user transactions and integrate with credit bureaus to obtain additional data on credit history.

    Right now we have about 200 thousand users and 1.5 terabytes of various financial data from our providers. There are about 100 million transactions alone.

    - And what is the technological stack?

    The current project stack is Python 3.6, Django / Celery, and Amazon Web Services. We make heavy use of RDS and Aurora for storing relational data, ElastiCache for caching and for the message queues, and CloudWatch, Prometheus, and Grafana for alerting and monitoring. And, of course, S3 for file storage.

    We also use Celery very actively for all sorts of business tasks: sending notifications and mass emails, bulk-updating various data from external services, an asynchronous API, and the like.

    On the front end we have React, Redux and TypeScript.

    - What is the main nature of the load in your project, and how do you cope with it?

    The main load in the project falls on the background tasks executed by Celery. We run about half a million tasks of various kinds every day: for example, updating and processing (ETL) users' financial data from various banks, credit bureaus, and investment institutions. On top of that, we send a lot of notifications and compute a lot of parameters for each user.

    We also have an asynchronous API that "pulls" results from external sources and generates a lot of tasks as well.

    At the moment, after tuning the infrastructure and Celery, we cope without problems, but it wasn't always like that; I will definitely cover this in my talk.

    - How do you scale all this and ensure fault tolerance?

    For scaling we use Auto Scaling Groups, the tooling provided by our AWS cloud platform. Django and Celery scale well horizontally; we only had to tune the limits on the maximum amount of memory used by the uWSGI / Celery workers a bit.
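    The exact limits depend on the workload, but the usual Celery knobs for capping worker memory look roughly like this (the numbers below are illustrative, not the project's real settings):

```python
# celeryconfig.py -- illustrative values only
worker_max_memory_per_child = 300_000  # recycle a child after ~300 MB RSS (value is in KiB)
worker_max_tasks_per_child = 1_000     # also recycle after N tasks to contain slow leaks
worker_prefetch_multiplier = 1         # don't hoard prefetched tasks in memory on busy queues
```

    On the uWSGI side the analogous option is `reload-on-rss` in the uWSGI config.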

    - And what do you monitor?

    To monitor CPU / memory usage and the availability of the systems themselves we use CloudWatch in AWS; we aggregate various metrics from the application and from the Celery workers with Prometheus, and build graphs and send alerts in Grafana. For some of the data in Grafana we use ELK as the source.

    - You mentioned the asynchronous API. Tell us a little more about how it works for you.

    Our users can "link" their bank (or any other financial) account and give us access to all their transactions. We display the linking and transaction-processing progress dynamically on the site, using ordinary polling of the current results from the backend; the backend, in turn, retrieves the data by running an ETL pipeline made up of several repeated tasks.

    - Celery is a product with a controversial reputation. How do you get along with it?

    By my feeling, our relationship with Celery is now at the "Acceptance" stage: we have figured out how the framework works inside, picked settings that suit us, sorted out deployment and monitoring, and written several libraries to automate routine tasks. Some functionality was missing "out of the box", so we added it ourselves. Unfortunately, at the time we were choosing the technology stack for the project, Celery did not have many competitors, and with simpler solutions we would have had to write much more ourselves.

    We have actually never run into bugs in the fourth version of Celery. Most of the problems were caused either by our own lack of understanding of how it all works, or by external factors.

    I will talk about some of the libraries written for our project in my talk.

    - My favorite question. How do you test all of this?

    Celery tasks lend themselves well to functional tests. Integration is covered by autotests and by manual testing on QA environments and staging. At the moment we still have a couple of open questions around testing periodic tasks: how to let testers run them, and how to check that the schedule for these tasks is correct (meets the requirements).

    - What about tests for the frontend and layout? What is the overall ratio of manual to automated testing?

    On the frontend we use Jest and write only unit tests for business logic. 55% of business-critical cases are now covered by Selenium autotests; at the moment we have about 600 tests in TestRail and 3000 tests on the backend.

    - What will your talk at Moscow Python Conf++ be about?

    In the talk I will describe in detail which tasks Celery is good for and how to use it, and compare it with the existing competitors. I will explain how to avoid various pitfalls when designing a complex system with a large number of tasks: which settings should be set right away and which can be left for later, how to roll out a new version of the code without losing tasks while traffic is being switched over, and I will share the libraries we wrote for monitoring tasks and queues.

    I will also touch on implementing ETL pipelines on Celery: how to describe them cleanly, which retry policy to use, and how to granularly limit the number of tasks executed under constrained resources. On top of that, I will describe the tools we use for batch processing of tasks, which keeps memory consumption economical.

    In general, if you crave details on all of the above, come along. I hope you will find my talk useful and interesting.
