Go vs Excel hundreds of thousands of lines

    This year we already wrote on Habré about our project SmartCalls.io - a visual call designer created for business users. The project solves the business problem of mass customer dialing: a visual call script is created, an Excel file with telephone numbers is loaded, and then a dialing campaign is created. The campaign is launched - the customer calls start; at any time, you can watch the statistics, pause the campaign, boost the settings. Clients were satisfied until it became clear that sometimes it’s not just a lot of people who need to be telephoned, but VERY, VERY much. Under the cut - the essence of the problem and how we won it with the help of the HYIP (not without any basis) programming language.


    Initially, file processing was implemented in PHP 7.1 — it was an obvious choice, since the entire SmartCalls API was written on it. Work with call-lists had one limitation - the file should contain no more than 10 thousand filled lines. This restriction was from the very beginning in SmartCalls and, however, was not critical. Until a certain point.

    One large bank had a need for very large ringing campaigns: it was necessary to call many more than 10 thousand users. Of course, nothing prevented breaking large files into several small ones and loading them in turn, but making customers suffer like this is not our method. Speaking of loading one by one - if our client has already launched a telephone call campaign and suddenly wanted to add users to it, then he can easily do it. This is very convenient, because you do not need to stop dialing or launch a separate campaign for new users. But it should be understood that the possibility of reloading was not conceived as a way to load large files manually, in chunks.

    So, the team has a task - to implement the download of large files in the campaign for dialing.


    We are very competent in developing in Java - for example, in part, the Voximplant API is implemented in this language; we are also good at PHP (see the example above, Captain Obvious suggests). That is, we could quickly close this task using one of these languages, however, we had long thought to expand our technology stack, and then we remembered Go very quickly: it is fast enough (works well with memory), multi-threaded and does not need runtime because Go compiles to an executable binary. In addition, we can say about the size of the containers, but more on that later ...

    As a result, we wrote a microservice in the Go language, which accepts sheets of large size (tested up to 300 thousand lines) and format (xls, xlsx and all types). It is time for details.


    When a client loads a file> 10 thousand lines into a SmartCalls campaign, microservice is taken for it. It takes pointers to the input:

    • on the file uploaded to the S3 storage;
    • to the campaign in which this file needs to be downloaded.

    Then microservice runs through the file, hits it on the chunks of 10 thousand lines (maximum for the platform), and each chunk as a csv file loads into the S3 storage, making notes about the chunk in the database (path to the file, number of lines). Each chunk is processed and loaded in a separate thread, which gives an additional increase in execution speed.

    For reading Excel files, open-source libraries from tealeg and extrame were used . It's good that they have not only a lot of stars, but also fresh commits :)

    import (
      "github.com/tealeg/xlsx""github.com/extrame/xls"// прочие импорты

    And everything would be fine, but not without nuances. During the development, it turned out that xlsx and xls, created in different editors, are very different in format and rules for working with them. I had to do a lot of tests - OpenOffice, Excel of different versions, LibreOffice, Google Sheets, in order to teach microservice to bring files to a single form - CSV. After microservice “chews” a large file and turns it into CSV, the SmartCalls API turns on and already works with this csv file. For microservice, we left a limit of 300 thousand lines, as it strongly covers the needs of clients, and we did not encounter any greater needs.

    As a result, the implementation showed itself perfectly on the tests and the preprode, after which we rolled it into the prod.


    Our team is always trying to quickly roll out new features / improvements, because we want satisfied customers to remain so. The task with large files was not just another challenge for us, but also a good reason to introduce Go into the project, which we had been looking at for a long time. In addition to the rapid development and speed of work, Go gives us the groundwork for the future when we begin to introduce containers (to make seamless updates and that's it), which this language is very lightweight. We will write about containers separately, stay tuned :)

    Also popular now: