
Kubernetes success stories in production. Part 5: Monzo Digital Bank

The series of Kubernetes success stories continues with the British startup bank Monzo. This young company belongs to the so-called “challenger banks” (yes, the term is already in the Oxford Dictionary), i.e. small banks that challenge the large, long-established financial industry. What makes this possible is the pervasive use of modern information technology at the very core of the business: traditional operations are dropped in favor of their electronic counterparts, which cuts costs substantially (banks that take this approach are also called “digital-only”). Monzo, founded only 2 years ago, is an interesting example because Kubernetes, the Go language, and other modern Open Source products familiar to DevOps engineers (and not only to them) are what help it pursue its ambitious goals.
What is Monzo?
Monzo, originally known as Mondo, was founded in 2015 by a team that had left another challenger bank, Starling Bank (created a year earlier), led by the already successful entrepreneur Tom Blomfield, who was 29 at the time. Positioning Monzo as a “bank for those who hate [traditional] banks”, the founders quickly found their audience: in early 2016 they ran “the fastest crowdfunding campaign in history”, raising 1 million GBP in 96 seconds.

In August of the same year, Monzo received its first (restricted) banking license, and in April 2017 the restrictions were lifted, which made Monzo a “fully authorized bank” and allowed it to offer its users current accounts. In its first truly active calendar year (2016), Monzo activated 83,700 cards and grew its staff from 16 to 71.
The company's end-user product is an electronic bank, available as a mobile application for iOS and Android, which receives real-time push notifications about every payment made with a Monzo card. It also stores the transaction history, including the locations where payments were made, and automatically assigns categories based on the type of merchant receiving the money (users themselves improve this data through crowd-sourced suggestions). (A more detailed description of the application's and the bank's functionality is beyond the scope of this article.)
In August 2016, IDC analysts published a report expressing confidence that Monzo would be able to “provide all guarantees offered by any [traditional] bank, and compete effectively with major transnational banks in the country.”
Infrastructure and Solutions
The first fairly detailed account of Monzo's software architecture and the infrastructure behind it came from the talk “Building a Bank with Kubernetes” by Oliver Beattie, who leads the company's engineers (at that time, the backend engineering team had 10 people). The talk was given at the Kubernetes London Meetup (October 2016) and then at KubeCon 2016 (November 2016).
Treating extensibility as one of the key requirements for the bank's core application, the company's engineers chose a microservice architecture from the start. They explained that they did not want to end up with one big, “bloated” application that cannot be changed “not only now, but also in 10, in 20 years”, which is exactly what happens at “legacy banks”, as Monzo calls them:
“The IT systems of many large banks are not extensible, in the sense that making changes is too expensive for them. Take, for example, the ability to freeze a card in the app at the press of a button. A friend at RBS (Royal Bank of Scotland, one of the UK's Big Four banks - translator's note) told me that they had considered this feature many times, but it took too long to figure out which of the 20 IT systems would need changes, and modifications to some of them had been frozen for years. Ultimately, the idea was dropped as too expensive.”
- Jonas Huckestein, Co-Founder and CTO of Monzo
At the time of Oliver's talk, the company had about 150 microservices in production (Docker was used for containers). One of the problems encountered along the way was application performance. While there were few microservices, it was enough to run a copy of each of them on every machine, but as their number grew, this approach stopped working. The engineers then began splitting microservices into groups (app, core, etc.) assigned to different hosts, but this did not work well either: the app group could run short of computing power while the machines with core sat idle. At this point, Monzo engineers arrived at Kubernetes, which allowed them “just to have a common resource pool” where the application runs and scales easily as needed. No less interesting is how this change affected the cost of the IT infrastructure: in the accompanying graph, red shows the cost of the old infrastructure and white that of the infrastructure managed by Kubernetes. (Monzo uses the AWS cloud as its underlying hardware.)
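The difference between fixed host groups and a common resource pool can be illustrated with a little arithmetic. The sketch below is only a toy model: the group names come from the text, but the capacity and demand figures are invented for illustration, and it ignores how a real scheduler packs containers onto individual machines.

```python
def shortfall_with_fixed_groups(capacity, demand):
    """CPU demand left unserved when each group may only use its own hosts."""
    return {group: max(0, demand[group] - capacity[group]) for group in demand}


def shortfall_with_shared_pool(capacity, demand):
    """CPU demand left unserved when all hosts form one common pool."""
    return max(0, sum(demand.values()) - sum(capacity.values()))


# Hypothetical numbers: CPU cores available to and requested by each group.
capacity = {"app": 8, "core": 8}
demand = {"app": 10, "core": 2}

fixed = shortfall_with_fixed_groups(capacity, demand)
shared = shortfall_with_shared_pool(capacity, demand)
idle_core = capacity["core"] - demand["core"]

print(fixed, shared, idle_core)
```

With these made-up figures, the app group is short of 2 cores while 6 cores in the core group sit idle; pooled together, the same machines cover the whole demand.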
The next problem is the flexibility of a complex system made up of many interconnected microservices. At first it was partially addressed with RabbitMQ-based queues, but this approach was not Kubernetes-oriented, and the list of requirements had grown considerably.

In essence, these requirements came down to minimal latency when responding to user requests and the highest possible chance of returning a successful response. To achieve these goals, linkerd (we have written about this project, and about the service mesh as a class of software, in a separate article) and Finagle were chosen.
A local linkerd instance was installed on each host running microservices. Local microservices sent their requests to it, and it in turn communicated with the other linkerd instances to decide where each request should go.

Illustration of using linkerd when a GET request appears on a Unicorn HTTP server
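The idea of a per-host proxy that takes a request addressed to a logical service name, picks an instance, and retries elsewhere on failure can be sketched as follows. This is a toy in-memory model and not linkerd's actual implementation or API: the `LocalProxy` class, the registry, and the service name `service.account` are all hypothetical.

```python
import itertools


class LocalProxy:
    """Toy stand-in for a per-host proxy (hypothetical, not linkerd itself)."""

    def __init__(self, registry, max_attempts=3):
        self.registry = registry        # service name -> list of instance callables
        self.max_attempts = max_attempts
        self._cursors = {}              # service name -> round-robin iterator

    def _next_instance(self, service):
        # Rotate through the known instances of a service (round-robin).
        if service not in self._cursors:
            self._cursors[service] = itertools.cycle(self.registry[service])
        return next(self._cursors[service])

    def call(self, service, request):
        last_error = None
        for _ in range(self.max_attempts):
            instance = self._next_instance(service)
            try:
                return instance(request)
            except ConnectionError as exc:
                last_error = exc        # this instance failed, try the next one
        raise RuntimeError(f"{service}: all {self.max_attempts} attempts failed") from last_error


def healthy(request):
    return f"ok:{request}"


def broken(request):
    raise ConnectionError("instance unreachable")


# The caller only knows the logical service name, not where instances live.
proxy = LocalProxy({"service.account": [broken, healthy]})
result = proxy.call("service.account", "GET /balance")
print(result)
```

The calling service never learns that the first instance was unreachable: the retry happens inside the proxy, which is what raises the chance of a successful response without adding logic to every microservice.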
To isolate the different zones of its infrastructure at the network level, Monzo adopted Calico and Kubernetes network policies. For example, a “super-secure” zone can be used to store full bank card numbers (most services do not need this information passed to them “in the clear”, so they can live in other zones).
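To give an idea of what such a rule looks like, here is a minimal sketch of a Kubernetes NetworkPolicy built as a Python dictionary and serialized to JSON (kubectl accepts JSON manifests as well as YAML). The structure follows the `networking.k8s.io/v1` API; the namespace, labels, policy name, and port are invented for illustration and are not Monzo's actual configuration.

```python
import json

# Hypothetical policy: only pods labelled role=card-processor may reach
# the pods labelled app=card-vault, and only on TCP port 8443.
policy = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "card-vault-ingress", "namespace": "super-secure"},
    "spec": {
        # Pods this policy applies to.
        "podSelector": {"matchLabels": {"app": "card-vault"}},
        "policyTypes": ["Ingress"],
        "ingress": [
            {
                # A bare podSelector here matches pods in the same namespace.
                "from": [{"podSelector": {"matchLabels": {"role": "card-processor"}}}],
                "ports": [{"protocol": "TCP", "port": 8443}],
            }
        ],
    },
}

manifest = json.dumps(policy, indent=2)
print(manifest)
```

Once a policy selects a pod for ingress, any traffic not explicitly allowed by some policy is dropped, provided the cluster's network plugin (Calico in Monzo's case) enforces network policies.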
At the next KubeCon, this time the European one held in March 2017, the same Oliver from Monzo gave another, shorter talk on using Kubernetes in the company's infrastructure, reporting that the number of microservices in production had by then grown to 200. Almost all of them were stateless: of the stateful ones, only Apache Kafka was mentioned, with the remark that “we began to actively move database technologies there too”. His talk, titled “Processing Real Money at Monzo with Kubernetes and Linkerd”, was continued by his namesake from Buoyant (Oliver Gould), whose part was more of a general introduction to linkerd (rather than anything specific to Monzo).
The most recent facts about the Monzo infrastructure come from the talk “Securing your Infrastructure with CoreOS” (May 31 at CoreOS Fest 2017):
- the number of microservices at Monzo has grown to 230;
- the Apache Cassandra DBMS is used to store all data;
- Vault (with the Cassandra backend) is used to store secrets, but not all data has been moved there yet;
- a WireGuard-based VPN, which is “excellent for container infrastructure”, is used to communicate with payment systems (or rather, between AWS and the co-location facility that talks to these systems directly).

Major incident of October 27
The fly in the ointment in this success story is an event that happened very recently, at the end of last month. In short, production at Monzo did not work correctly for about 80 minutes because of a known bug in Kubernetes combined with an unlucky set of circumstances.
Credit is due for the fact that the same Oliver posted a very detailed post-mortem on the Monzo community forum, with explanations even for those hearing about Kubernetes for the first time. The root of the problem was that 2 weeks before the incident, Monzo engineers had changed the etcd cluster, increasing the number of nodes from 3 to 9 and upgrading the etcd version at the same time, and had verified that it worked... However, one fine day a routine roll-out of an updated service caused strange problems in the system that were not resolved by rolling back to a working version.
Besides a particular sequence of infrastructure operations, two bugs played a role in this incident: bug #47131 in Kubernetes, which manifests itself with certain versions of etcd (still not closed), and bug #1219 in linkerd (closed in April with the release of linkerd 1.0). The quick response of the company's engineers and the existing incident processes and mechanisms limited the damage; nevertheless, a complete outage of the banking platform for 20 minutes (and significant problems for more than an hour) is, of course, an unacceptable failure even for such a “young” startup. Still, the community's reaction to this engineering post was a pleasant surprise: the Monzo audience was positive and welcomed the company's openness and honesty after what happened.
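As background to the etcd change mentioned above (growing the cluster from 3 to 9 nodes), the short sketch below works out the majority quorum for both sizes. This is the general arithmetic for majority-based consensus clusters such as etcd, not something taken from the post-mortem, and it says nothing about why Monzo chose 9 nodes.

```python
def quorum(members):
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can be lost while a majority still remains."""
    return members - quorum(members)


# Cluster sizes before and after the change described in the text.
sizes = {n: (quorum(n), tolerated_failures(n)) for n in (3, 9)}
print(sizes)
```

A 3-node cluster needs 2 members to agree and survives the loss of 1, while a 9-node cluster needs 5 and survives the loss of 4, at the price of more members taking part in every write.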
In-house development
You can find dozens of repositories in the Monzo account on GitHub, some of which are the company's own developments in Go. In particular:
- typhon - an RPC framework for communication between microservices;
- slog - a library for structured logging (with context and arbitrary key-value pairs for each event) that sends the logs to seelog;
- phosphor - a distributed tracing system similar to Dapper from Google and Zipkin from Twitter;
- terrors - a package for wrapping Go errors with additional information.
For more on how Go development is done at Monzo, see the talk “Building a Bank with Go”, given last March by Matt Heath, a Distributed Systems Engineer at Monzo.
Summary
As of October 30, Monzo had 469 thousand users, and the latest investment round, which closed in November, brought the company 71 million GBP. All of this points to the project's great prospects.
In general, Monzo and Kubernetes have in common not only a similar age but also the value they bring to the world: both make available today what many have dreamed of and what is a logical step in the evolution of their industry. The October incident, of course, does no credit to any bank's reputation; however, judging by the reaction that followed, in this case it is more likely to give a very talented team of engineers useful new experience than to cause a collapse of confidence.
Other articles in the series
- “Kubernetes success stories in production. Part 1: 4,200 pods and TessMaster at eBay.”
- “Kubernetes success stories in production. Part 2: Concur and SAP.”
- “Kubernetes success stories in production. Part 3: GitHub.”
- “Kubernetes success stories in production. Part 4: SoundCloud (the authors of Prometheus).”
- “Kubernetes success stories in production. Part 6: BlaBlaCar.”
- “Kubernetes success stories in production. Part 7: BlackRock.”
- “Kubernetes success stories in production. Part 8: Huawei.”
- “Kubernetes success stories in production. Part 9: CERN and 210 K8s clusters.”
- “Kubernetes success stories in production. Part 10: Reddit.”