How Kiwi tests 1,000 Python projects
For Russian speakers, a translated version is posted here.
That is the title of Alex Viscreanu’s talk at Moscow Python Conf++. The conference is still two weeks away, but of course I've already heard what Alex will be speaking about. Below are some spoilers and a look behind the scenes of the talk preparation: what kind of open source Zoo has been developed at Kiwi, how it tests Python code, and how The Zoo differs from, for example, mypy.
— Tell us a bit about Kiwi, about yourself, and about your work within the company.
Kiwi.com is an online travel agency based in the Czech Republic. We aim to make travelling as simple and accessible as possible. The company was founded in 2012 as Skypicker, and since then it has become one of the five biggest online sellers of airline tickets in Europe. It was renamed to Kiwi.com in 2016.
The special feature that we offer at Kiwi.com is virtual interlining, which allows us to connect flights from companies that don’t usually cooperate with each other, and we cover possible connection issues caused by delayed flights.
Some of the numbers that we manage at Kiwi.com include 90 000 000+ daily searches, 25 000 seats sold daily, and a total of 15 000 000 000+ flight combinations available.
As for me, I'm Alex Viscreanu, a full-stack developer who moved from Spain to the Czech Republic to work at Kiwi.com. I have worked mainly with Python on the backend and JavaScript on the frontend, together with frameworks like Backbone.js, Angular and, recently, Vue.js.
I joined Kiwi.com to work in the Platform team, so I mainly develop internal tools and maintain some of the services used by our developers. As this position requires a lot of knowledge about infrastructure management and build tools, I think the title that best fits what I do is actually DevOps, with a stronger Dev than Ops part.
— How many Python developers do you have at Kiwi? What are their main Python projects within the company?
We have around 350 developers across all our offices. Of those, probably 200 work with Python on a daily basis. As for their main projects, Kiwi.com uses a microservice architecture, and every team is responsible for a number of services. Every project has its own importance inside our architecture.
I don't think I can tell you any specific main project that would be representative of what all our devs are working on. The backend code is not publicly disclosed so there isn't much I can say about this.
But we have some interesting projects publicly available in our GitHub organization. There you can find Phoenix, our outage announcement tool for Slack; Crane, our script for deploying to Rancher directly from GitLab; The Zoo, our service catalogue; and a bunch of smaller, but still cool, projects.
— What about CI/CD and deployment infrastructure?
We use GitLab to host our source code, so GitLab CI is the solution that integrates best. We also think it’s quite a good solution. It’s pretty flexible, and it gives us performant pipelines and, together with Crane, direct deployments to different environments with a single click (or automatically, if you are brave enough).
All CI jobs run on a fleet of autoscaling EC2 instances, which allows us to scale as far as we need during working hours while keeping costs down by not having too many unused instances outside working hours.
For orchestrating our infrastructure we are currently using Rancher, which has proven to work just fine with our load and number of services.
— Seems like Python is your language of choice. What other languages are you using and for what purposes?
The second most used language must be JavaScript, which we use mainly for all our frontend and GraphQL APIs. We also have Kotlin and Java for the Android apps; Swift and Objective-C for the iOS ones; some Go for a bunch of services; and C/C++ for our flights engine.
— You mentioned The Zoo, which is a new open source project from Kiwi. Why does Kiwi open source things? What's the catch?
Like many other companies, we rely on open source software in almost everything we develop. Maybe it sounds like a cliché, but when you take so much it’s also nice to give back and contribute to the collective knowledge that helps everyone move forward.
We also think that the projects we are open sourcing can benefit other people and, at the same time, that we can benefit from other points of view or better solutions that we hadn’t considered or didn’t know about.
— More about The Zoo. How many repositories do you check with it?
We have around 1300 repositories on our internal GitLab and around 100 on our public GitHub. In total, that is close to 1500.
We scan every repository we have, regardless of whether the service is registered in The Zoo. The main reason is that the analytics we gather from scanning are useful to us and, moreover, when a service is later registered in The Zoo, the data will already be there.
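To illustrate the point about scanning everything up front, here is a minimal sketch of the idea rather than The Zoo's actual implementation. The `Catalogue` class, its fields, and the two toy metrics are all hypothetical names chosen for this example; the only thing taken from the interview is that scan results are stored for every repository, so they are already available when a service gets registered.

```python
from dataclasses import dataclass, field


@dataclass
class Catalogue:
    """Hypothetical stand-in for a service catalogue (not The Zoo's data model)."""

    registered: set = field(default_factory=set)   # repositories with a service entry
    analytics: dict = field(default_factory=dict)  # scan results for every repository

    def scan(self, repo_name, file_names):
        # The scan runs whether or not the repository is registered.
        self.analytics[repo_name] = {
            "file_count": len(file_names),
            "has_readme": any(n.lower().startswith("readme") for n in file_names),
        }

    def register(self, repo_name):
        # Registering later does not need a new scan: the data is already there.
        self.registered.add(repo_name)
        return self.analytics.get(repo_name)


catalogue = Catalogue()
catalogue.scan("payments", ["README.md", "app.py"])
catalogue.scan("legacy-tool", ["main.py"])
payments_data = catalogue.register("payments")
```

In this sketch `legacy-tool` is never registered, yet its scan results still feed the overall analytics.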
— That's an impressive number! And how many errors do you normally find? Any memorable catches?
We currently have around 26000 issues in our database, which means around 20 issues per repository. Keep in mind that most of them are not strictly issues, just recommendations.
All of the proper issues are good catches.
The process of writing a check for The Zoo usually starts with identifying an issue in some repository. Then, if we think it could be dangerous to have in other places, we write a check for The Zoo so that we can easily identify which projects are affected and fix them as soon as possible.
Don’t expect critical security breaches or tricky context-dependent issues, though. We rely on other tools for that purpose and, even if we integrated them into our platform, The Zoo is not meant to be the first line of detection for such issues. It’s usually more about ensuring that all our repositories follow some common guidelines.
— The Zoo itself does not ship with any checks by default; it's up to the developer to write them. You created The Zoo and you write all kinds of checks for it. What are these checks? Can you name a few?
Yes, The Zoo is meant to be a platform where anyone can write their own checks. We have our own, which are quite particular to our setup and configuration, but we want to open source them as well.
As I explained previously, our checks are based on the issues we have found in our services. These issues range from simple README recommendations, which make information gathering easier, to more advanced configuration audits, such as audits of nginx configurations.
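As a rough illustration of what such checks could look like, the sketch below runs two toy checks over in-memory repositories. It does not use The Zoo's real check API: `Issue`, `check_readme`, `check_nginx_server_tokens` and `run_checks` are hypothetical names, and the nginx rule (looking for the `server_tokens off;` directive) is just an assumed example of a configuration audit, not one of Kiwi.com's actual checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    repo: str
    kind: str
    message: str


def check_readme(repo, files):
    """Recommend a README when the repository has none (files: name -> content)."""
    if not any(name.lower().startswith("readme") for name in files):
        yield Issue(repo, "readme:missing", "Add a README describing the service.")


def check_nginx_server_tokens(repo, files):
    """Assumed example rule: nginx configs should contain 'server_tokens off;'."""
    for name, content in files.items():
        if name.endswith("nginx.conf") and "server_tokens off;" not in content:
            yield Issue(repo, "nginx:server_tokens", f"{name}: consider 'server_tokens off;'.")


CHECKS = [check_readme, check_nginx_server_tokens]


def run_checks(repos):
    """Run every check over every repository and collect the issues found."""
    return [
        issue
        for repo, files in repos.items()
        for check in CHECKS
        for issue in check(repo, files)
    ]


repos = {
    "api": {"README.md": "# API", "deploy/nginx.conf": "server { listen 80; }"},
    "worker": {"main.py": "print('hi')"},
}
issues = run_checks(repos)
```

Writing each check as a small generator keeps it independent of the others, so adding a new guideline only means adding another function to `CHECKS`.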
— Those seem like things every big company should check! Will you tell us more about them during your talk?
I think the ability to run the checks over all the repositories in a simple way is more important than their content. Of course, I will talk about what we are checking at Kiwi.com, and hopefully other people will benefit from our knowledge.
I totally encourage people to play with it, write their own checks and contribute to the general knowledge base. I’m sure that someone will find something that fits their needs.
— Thanks! Finally, if you could go back five years, what one piece of Python-related advice would you give your younger self?
This is not an easy one… As someone who only started writing Python 3 about 1.5 years ago, I’d highly recommend starting with it as soon as possible. It’s a natural evolution of the language that settles its fundamentals. Right now I wouldn’t go back to coding in Python 2, not only because of its approaching end of support, but because I feel much more comfortable with Python 3’s features.
Of course, I’d also bring along the good libraries that I ended up discovering over time, together with some good practices that I have learned and am still learning.
Come to Moscow Python Conf++ on April 5 to learn the details of working with this interesting open source project and perhaps borrow from Kiwi's experience in some way.