What we read in March: five essential books for infrastructure engineers
We at Skyeng are slowly building a library of important and useful books. It started when the company's founders shared their reading lists on Facebook (links below), and now the heads of departments have joined in. In March, Nadezhda Ryabtsova, who is responsible for our IT infrastructure, presented her top picks of professional literature. I asked her to say a little more about each book; I hope Habr readers will find this list, supplemented by four weekly newsletters, useful.
First, the promised links. Georgy Soloviev shares a list of important books for entrepreneurs, and Khariton Matveev shares one for product managers.
Nadezhda Ryabtsova, aka ladamalina, has been Head of IT infrastructure at Skyeng since 2016. She came to us from Delivery Club, which was a small startup at the time. Back then we had only 15 developers; now her department of six remote engineers serves 12 teams of programmers.
I give her the floor.
You cannot force our SRE engineers to read everything, so I selected the five most essential books. The main thing to realize is that, to support the company's rapid growth, we have to introduce practices and build processes in the operations department that were simply not needed three to six months ago.
Practical Monitoring: Effective Strategies for the Real World
A must-read for growing startups, no matter how large the infrastructure. It explains the philosophy of service monitoring for a company and how to build each component. Most system administrators install Zabbix, collect a minimal set of metrics, and alert on the default thresholds. That approach does not work for us at Skyeng: for each of more than 50 projects we need to be able to spot problems at several levels, including application indicators, hardware status, and trends and anomalies in business metrics. In each of our products, the metrics are looked after by analysts, developers, and DevOps engineers.
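As a rough illustration of the difference between alerting on a default threshold and watching for anomalies in a business metric, here is a minimal sketch. It is not taken from the book or from Skyeng's monitoring; the function names, the sample numbers, and the three-sigma rule are all assumptions chosen for the example.

```python
from statistics import mean, stdev

def static_alert(value, threshold):
    """Classic default-threshold check: fire only when the value is too high."""
    return value > threshold

def anomaly_alert(history, value, sigmas=3.0):
    """Fire when the value deviates from the recent baseline by more than
    `sigmas` standard deviations, in either direction (hypothetical rule)."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    mu = mean(history)
    sd = stdev(history)
    if sd == 0:
        return value != mu
    return abs(value - mu) > sigmas * sd

# Hypothetical business metric: payments per minute over the last five minutes.
payments = [100, 98, 103, 101, 99]
current = 40  # sudden drop

high_threshold_fired = static_alert(current, threshold=500)
anomaly_fired = anomaly_alert(payments, current)
```

With these sample numbers the "too high" threshold stays silent, because a drop never crosses it, while the baseline check flags the drop. A real setup would use a longer window and account for seasonality.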
Site Reliability Engineering: How Google Runs Production Systems
If I am not mistaken, this is the first book to systematize SRE principles well and to describe the role of the reliability engineer. Using practical examples, it explains very accessibly how Google builds its processes for incident management, for monitoring and alerting in distributed systems, and for identifying routine tasks that degrade team performance. The approaches are explained in a way that makes them easy to apply to your own company, even one incomparably smaller than Google. Only six engineers serve the Skyeng infrastructure, and that is enough if we adapt the experience of the leading large companies correctly.
The Art of Capacity Planning: Scaling Web Resources in the Cloud
The book teaches you to plan infrastructure expansion for growing projects ahead of time. If we end up short of capacity, we planned poorly. If we end up with four times more than required, we wasted a lot of money. Estimates have to be made a year or more in advance, and gazing into a crystal ball will not help. It seems to me this was harder about eight years ago: cloud providers already existed, but they did not offer as many services as they do now.
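To make the "not too little, not four times too much" reasoning concrete, here is a minimal sketch of a trend-based estimate. It is an illustration only, not the book's method: it assumes growth is roughly linear, and the function names, sample usage figures, and 20% headroom are hypothetical.

```python
def linear_fit(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...; returns (slope, intercept)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def forecast(ys, months_ahead):
    """Projected usage `months_ahead` months after the last observation."""
    slope, intercept = linear_fit(ys)
    return intercept + slope * (len(ys) - 1 + months_ahead)

def months_until_full(ys, capacity, headroom=0.2):
    """Months until the trend reaches capacity minus a safety headroom.
    Returns None when usage is flat or shrinking."""
    slope, intercept = linear_fit(ys)
    if slope <= 0:
        return None
    usable = capacity * (1 - headroom)
    return max(0.0, (usable - intercept) / slope - (len(ys) - 1))

# Hypothetical monthly peak usage, e.g. CPU cores in use.
peak_cores = [100, 110, 120, 130]
needed_in_a_year = forecast(peak_cores, months_ahead=12)
runway = months_until_full(peak_cores, capacity=250)
```

Comparing `needed_in_a_year` with what is already provisioned shows whether the plan errs toward a shortage or toward paying for idle capacity. Real growth is rarely linear, so a fit like this is only a starting point for the estimate.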
The Phoenix Project. A novel about how DevOps is changing the business for the better
The only book on this short list that is available in Russian; it’s a pity so little gets translated. It is written in a popular style and helps you take a fresh look at delivery processes in development: identify bottlenecks, see the volume of routine tasks, and protect planned work from a pile-up of unplanned “fires”. I would say the book is most useful to managers as food for thought, but I recommend it to engineers as well; it is an easy read.
The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations
Read it right after The Phoenix Project: written by the same authors, it continues and develops the ideas for improving development processes. Again, I recommend it to managers first of all. A Russian edition is coming soon, and we are looking forward to it.
There is also a weekly newsletter that you can’t put in a library, but I recommend it to all engineers:
O'Reilly Systems Engineering and Operations Newsletter
And, as is tradition, a reminder that we have many interesting vacancies. There are none in the IT infrastructure department (the positions there were recently closed), but there is enough work for everyone!