7 habits of successful Site Reliability Engineers (according to New Relic)

Original author: Kevin Casey
Translator's note: This is a translation of an article from New Relic's blog, which over the course of the year has published similar pieces about various IT specializations related to software development and operations. The author is Kevin Casey, an independent journalist and Azbee Award winner who writes for various publications and companies (including Red Hat).

In a recent publication, we examined the rise of the Site Reliability Engineer (SRE) in modern software organizations. But holding the SRE title is one thing; we also wanted to know what it takes to succeed in the role.

Therefore, we decided to study the traits and habits that truly successful SREs have in common. As with most development and operations roles, top-notch technical skills are obviously critical. For an SRE, the specific skills may depend on how a particular organization defines or applies the position: Google's approach to Site Reliability Engineering may require more experience in software engineering and writing code, while at another company operations or quality assurance (QA) skills may be more valuable. Nevertheless, as our research showed, what makes development and operations specialists successful, and what separates the "great" from the "good enough", is often a combination of habits and traits that complement technical expertise.

The seven habits presented below were derived from detailed interviews with New Relic employees Beth Long (Software Engineer) and Jason Qualman (Site Reliability Engineer). Let's take a look:

Habit 1: You analyze each change in the context of a (much) larger picture

Successful software developers understand how their code helps the entire business. SREs have their own version of this trait. “You need someone who really thinks not only about everyday tasks, but also about the bigger picture. A successful SRE can understand and explain things at a higher level,” says Jason. Inside New Relic, we describe such people as “those who constantly analyze every change for possible risks and future impact, not just for today.” What does this change mean for the larger infrastructure?

Habit 2: You are pragmatic and visionary in analysis

The best SREs take a pragmatic approach and evaluate how their work will affect the rest of the system or team. This approach minimizes the likelihood that a change "is thrown over the wall without understanding how it can affect the person sitting on the other side."

“We make decisions at a very low level of the stack. Sometimes they can hurt everyone above. You need to understand how solving a specific problem will affect everyone else along the way,” says Jason.

Habit 3: You are willing to move on when something isn't helping

Part of a pragmatic approach for an SRE is the willingness to discard processes and operations that may seem appropriate but are not actually effective. Beth recalls an example from when New Relic changed its reliability practices:

“A few years ago we went through a stage of active growth and, to counter any instability associated with it, we implemented a Change Acceptance Board (CAB) process [Translator's note: apparently a change advisory board is meant]. It was intended to help us evaluate releases before they launched in production, in order to protect against changes that break something and cause future incidents. The irony is that as the release cycle slowed down, we began to accumulate larger and larger changes, and the effect was the complete opposite of what was intended. These larger changes increased the risk of each release.”

In the end, the CAB process was dropped in favor of smaller, more frequent releases, which led to much better results.

Habit 4: You seize every opportunity to automate

Top-tier SREs successfully cope with the main difficulty of the role: how to increase the reliability of everything they do without slowing down the company's ability to deliver software quickly. The solution is almost always automation. SREs need to proactively look for labor-intensive tasks, bugs, and other work that is handled manually, and address them with new automation or process changes.

“A significant part of this position is thinking about ineffective and time-consuming tasks and eliminating them as quickly as possible. Instead of putting off manual tasks, you say: ‘I will find the time to automate this right now and save everyone from having to do this painful activity,’” explains Jason.

The obsessive focus on automation is not unique to New Relic: for example, The DevOps Handbook has a whole chapter on the paradoxical effects of accepting manual processes. In SRE job descriptions, “automation” and its variants appear more often than any other words. A recent SRE job posting from Procore Technologies, a construction management software company in Los Angeles, has this as the second paragraph of its description: “Automate, automate, automate, and then ... automate!” (Translator's note: although only 4 days have passed since the original publication, the posting mentioned has already been closed; however, you can find many other examples of “automate” in the SRE job descriptions of other companies.)

Habit 5: You can convince the organization to do what is needed

Confidence in championing a specific automation task or other SRE initiative is another attribute that defines the best SREs. You need to be willing to argue why it is critical to automate a process or to change some other part of the work. This is not easy, because it can clash with the culture and pace of many traditional software organizations.

Portland New Relic rally

Good SREs live by their own engineering-specific version of the self-help classic “How to Win Friends and Influence People”. Simply put, their work includes convincing other people to do things they don't initially want to do, for example, getting a software engineer who is focused on product features to think about the problems that may arise when scaling the product over the next several years.

The best SREs need to be effective salespeople, able to sell their colleagues on the long-term benefits of automating a particular process or project, even if it may be difficult in the short term. The bottom line? “You must be able to defend your position and say ‘stop’ or ‘no, we really need to do this now’, which may be difficult in some organizations,” explains Beth.

Habit 6: You expand your skills with new tools and approaches

Because the SRE concept is still new, many SREs previously held other positions. Some have a development background, while others come from traditional operations. Jason and Beth note that the most effective hiring managers are those who do not tie the SRE role to one specific kind of past experience. For example, a traditional QA engineer may also have a good background for an SRE position.

Regardless of your background, there is a good chance that the SRE position will force you to leave your comfort zone and develop new skills. For example, an operations specialist may find it useful to learn a programming language or three, and someone with a development background will need to be willing to think much more thoroughly about operational processes and difficulties than they did in the past. The best SREs embrace this path of learning and skill development.

Habit 7: You trust the process

If there is a guiding philosophy for successful SREs, it can be expressed as follows: you are not chasing a holy grail that will prevent every possible failure. That rarely works. Instead, you work tirelessly to see the big picture, implement automation, encourage healthy patterns, learn new skills and tools, and improve reliability in everything you do. Perfection cannot be achieved, but the constant drive to make things better is the path to follow.

New Relic's American engineers on vacation

P.S. All company photos are taken from Glassdoor.
