DevOps and Security: An Interview with Seth Vargo and Liz Rice

Containers no longer surprise anyone. What does surprise people is a question about container security. It is especially interesting to ask colleagues who run containers and microservices seriously in production: I often see surprised faces and hear a puzzled “What? Why?” It turns out that we already know the technology (and how could we not: it seems that soon even schoolchildren will be building Kubernetes clusters in technology class), but we have not yet learned how to protect its components. Perhaps there was simply no one to teach us.

In this article, and at DevOops, we have speakers who know container security inside out and know which solutions are right from a security standpoint. We went to them for answers to the simplest cloud security questions. Is it time to start educating ourselves?

Participants:

Seth Vargo is a Developer Advocate at Google. He previously worked at HashiCorp, Chef Software, CustomInk and several other startups in Pittsburgh. He is the author of Learning Chef and advocates for reducing inequality in technology.

Liz Rice is a technology evangelist at Aqua Security, a company that provides security solutions for applications deployed in cloud environments and for enterprise container platforms. Liz is very well known in the community and is co-chair of KubeCon.

Interviewer: Oleg Chirukhin, JUG.ru Group editorial team.

Let's start with a question that will frame the rest of our conversation. What does DevOps mean to you? How is it different from the less well-known SRE?
For example, at a previous job, when I was asked to “implement DevOps,” I simply went through the table of contents of the SRE Book, picked out ideas and applied them one by one. Or at least I tried to convey them to others. What do you think: is this the right approach?

When we talk about DevOps, we are talking about closing the chasm that has traditionally existed between developing code (Dev) and operating it afterwards (Ops). Imagine a tall wall. On one side sit the developers, who write code and throw it over the wall. On the other side are the operations people. They catch the application and start running and maintaining it. These people know the details that matter in production, for example which ports need to be opened and where.

Instead of having two separate groups of people with different interests, you want them to talk to each other, decide together what to do next, and agree on the goals they will jointly work toward every day. So DevOps is a cultural change in communication that helps colleagues in different technical teams work for the common good: create business value, deploy software, and keep it running. I think these are very important changes, and they bring new tools with them: if you do not have a culture of communication, there is little point in introducing CI/CD processes.

There is often confusion between DevOps and SRE. Some people believe SRE competes with DevOps; others consider them completely different, non-overlapping concepts. In my opinion, they are not competitors but close friends. Think about the principles behind DevOps: reducing organizational silos, accepting failure as a normal part of the workflow, implementing gradual change, leveraging tooling and automation, and using metrics to measure everything. As you can see, all of this is very abstract: DevOps does not tell us how to reduce organizational silos or implement gradual change; it just says that it would be good to do so.

Now let's look at SRE. Although SRE evolved independently of DevOps, you could say that SRE implements the DevOps principles: if you think of DevOps as an abstract class or interface in programming, SRE is a concrete class that implements that interface. SRE has well-defined approaches to many things: shared ownership, risk sharing, postmortems, collecting and tracking metrics, and more. That is why it is easier for organizations to talk about adopting SRE: it comes with very clear processes and entities.
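The interface analogy can be made literal. The following is a minimal Python sketch of our own, not something from the interview: the method names follow commonly cited DevOps principles (reduce organizational silos, accept failure as normal, implement gradual change), and the error-budget arithmetic in the SRE class is one well-known way SRE turns “accept failure as normal” into a concrete rule. The SLO value in the usage note is an assumption.

```python
from abc import ABC, abstractmethod


class DevOps(ABC):
    """The 'interface': principles without prescribed implementations."""

    @abstractmethod
    def reduce_organizational_silos(self) -> str: ...

    @abstractmethod
    def accept_failure_as_normal(self, slo: float, window_minutes: int) -> float: ...

    @abstractmethod
    def implement_gradual_change(self) -> str: ...


class SRE(DevOps):
    """A 'concrete class': each principle gets a specific practice."""

    def reduce_organizational_silos(self) -> str:
        # Developers and operators share ownership of production.
        return "shared ownership"

    def accept_failure_as_normal(self, slo: float, window_minutes: int) -> float:
        # Error budget: minutes of unavailability the SLO permits in the window.
        return round((1.0 - slo) * window_minutes, 2)

    def implement_gradual_change(self) -> str:
        return "canary releases"
```

With an assumed 99.9% availability SLO over a 30-day window, SRE().accept_failure_as_normal(0.999, 30 * 24 * 60) gives an error budget of 43.2 minutes, while trying to instantiate DevOps directly fails, just as an interface cannot be instantiated.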

Do you think the term “DevOps engineer” is correct? Could it be replaced with something else?

Personally, I do not think there is such a thing as a “DevOps engineer.” As I explain in my article “10 Myths of DevOps,” DevOps is really an ideology: more communication and cooperation between different but closely connected organizational units. Although today it seems sensible and familiar, at first this approach drew both praise and harsh criticism. Since then, many organizations, including Etsy, Facebook and Flickr, have implemented DevOps principles with remarkable success.

Yet none of these organizations hired “DevOps engineers.” Their success can be attributed to the collaboration that emerged between internal teams and their willingness to change existing processes and procedures to achieve a common goal. The role of the so-called “DevOps engineer” was to encourage teamwork between groups and to make sure organizational units regularly shared ideas and goals. We already have people who do this today, and we call them “managers.”

I will add one more thing. When we assign someone to a particular position or role, we begin to expect very specific things from them, so choosing a job title matters. But exact titles vary with local circumstances and the traditions of each organization, so what matters is not the name itself but the balance of roles among all the people who work in the company.

Not long ago I talked with someone who worked at a fairly large organization. They created a team of several employees and called it, say, the infrastructure team. Then that team was renamed, and some time later another team was created and called the infrastructure team. As a result, everyone got confused about who the “infrastructure engineers” were and what their role was.

In my opinion, what matters is not whether a team or position with a specific name exists, but whether the organization clearly understands who is responsible for what. It doesn't matter whether you call them SRE or DevOps: the main thing is to understand what that name means in your particular organization.

Liz, you do consulting. How do you explain DevOps principles to client companies? They sound rather abstract and are in some ways hard to explain. Or perhaps you have developed an approach that helps you convey these ideas to customers?

I look at what the goal is. Many of the people and companies I consult with come to us and say they want to deploy Kubernetes and containers. But before we talk about technology, we need to understand why they are taking these steps. Often the expected benefits come down to a desire to be more agile, meaning that technical teams can ship their work faster, and that is something you can explain in simple terms. At this point it is useful to steer the conversation toward the idea that a missing culture (or habit, if you will) of communication between team members cannot be fixed by throwing new technology at it: communication is an essential resource.

In addition, since I often get involved in improving security, our conversation shifts toward “DevSecOps,” and it turns out that the system should be built so that security is taken care of in the early stages of the process. The old approach was different: first write the application code, then deploy it, then hand it over to operations, and only at the very end does someone start thinking about the security of what is running.

In fact, many security issues are cheaper and easier to solve at the beginning of the process, but that means building a lot into the project from the very start. For example, if you are going to work with containers, you need to make sure your images are built in a reasonably secure way. It is useful to scan them for vulnerabilities, at least to avoid deploying software with already known security problems. You can also configure containers so that the software in them does not run as root by default (running as root is often done for simplicity when building containers, without any real need). If you take such steps, you end up with a more secure application, and within all this DevOps you can talk about a SecOps culture as well.
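To make the “not as root” point concrete, here is a minimal Python sketch of a pre-build check that reads a Dockerfile's text and reports whether the final image would run as root. It is an illustration under simplifying assumptions, not an existing tool: the function names are hypothetical, it only inspects USER instructions (resetting at each FROM, since every build stage starts over), and it cannot see a USER that a base image sets internally, so “no USER found” is treated as root, which is the default for most common base images.

```python
from typing import Optional

ROOT_NAMES = {"root", "0"}


def final_user(dockerfile: str) -> Optional[str]:
    """Return the user from the last USER instruction of the final stage.

    None means no USER was set in that stage, so the base image's
    default applies.
    """
    user = None
    for raw in dockerfile.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        instruction = parts[0].upper()
        if instruction == "FROM":
            user = None  # each build stage starts with the base image's user
        elif instruction == "USER" and len(parts) == 2:
            # "USER app:app" -> "app"; the group part is ignored here
            user = parts[1].strip().split(":")[0]
    return user


def runs_as_root(dockerfile: str) -> bool:
    user = final_user(dockerfile)
    return user is None or user in ROOT_NAMES
```

A check like this is cheap enough to run in CI on every commit, which is exactly the “early in the process” moment described above; it does not replace scanning the built image.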

However, this means that developers, who are not application security specialists, are forced to think not only about application code but also about building their own security mechanisms. What, in your opinion, should be the minimum skill set for a modern software developer or operations engineer?

Whether we like it or not, new rules and requirements keep emerging that at some point we have to comply with: take GDPR, for example. The existence of these regulations means that more and more people need to be security-aware. For example, today you can no longer store user names and passwords as plaintext in database fields; that is no longer considered even remotely acceptable. I would say the industry already has a fairly clear set of “hygiene” requirements that everyone understands.
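What “not in plaintext” means in practice: store a salted, deliberately slow hash and compare in constant time. A minimal sketch using only Python's standard library (PBKDF2-HMAC-SHA256); the iteration count and the “iterations$salt$hash” storage format are our own assumptions, and a dedicated algorithm such as scrypt or Argon2 is a common alternative.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption: tune to your hardware and current guidance


def hash_password(password: str) -> str:
    salt = secrets.token_bytes(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # Store parameters with the hash so they can be raised later.
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Because each password gets its own salt, two users with the same password end up with different stored values, and the database never contains anything that can be read back as the original password.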

Tool and infrastructure vendors themselves have a significant influence on this process: they try to design and change their systems so that users can build secure applications and systems from the very beginning. For example, over the past year many Kubernetes defaults have changed toward greater security, and this is really great. Previously, the API server allowed anonymous access by default, which is not exactly what you expect out of the box. Now there are things like role-based access control, so permissions are in place out of the box, and when you first start Kubernetes you are not open to the whole world: you are protected by default. I think this is a good way to make security available to everyone as part of familiar processes.
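Role-based access control is deny-by-default: a request is allowed only if some rule bound to the requester grants that verb on that resource. The Python sketch below models only that idea and is a deliberate simplification, not the Kubernetes authorizer: it omits namespaces, API groups, resource names and cluster-wide roles. The rule shape loosely mirrors the verbs/resources fields of a Kubernetes Role, and the subject name used for anonymous requests (system:anonymous) is the one Kubernetes uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    verbs: frozenset
    resources: frozenset


@dataclass
class Authorizer:
    # subject -> rules granted to that subject through bindings
    bindings: Dict[str, List[Rule]] = field(default_factory=dict)

    def bind(self, subject: str, rule: Rule) -> None:
        self.bindings.setdefault(subject, []).append(rule)

    def allowed(self, subject: str, verb: str, resource: str) -> bool:
        # Deny by default: a subject with no bindings gets nothing.
        for rule in self.bindings.get(subject, []):
            verb_ok = verb in rule.verbs or "*" in rule.verbs
            resource_ok = resource in rule.resources or "*" in rule.resources
            if verb_ok and resource_ok:
                return True
        return False
```

For example, after binding a read-only rule for pods to “alice,” she can get and list pods but not delete them, and an anonymous request is refused simply because nothing was ever granted to it.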

Personally, I think every software engineer should have at least a basic understanding of security. Resources like OWASP (the Open Web Application Security Project) help, but ultimately engineers need to educate themselves. Not every engineer will have a Ph.D. in cryptography, but if we want our colleagues to take security seriously, we must make it easier for them to make the right decisions. This is where tools like Vault can help: security teams and professionals make the decisions and provide “security as an API.”
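As a rough illustration of “security as an API”: instead of keeping a password in a config file, the application asks Vault for it over HTTP. This sketch uses only the Python standard library and assumes a KV version 2 secrets engine mounted at secret/, a valid Vault token, and a hypothetical secret path myapp/db. It builds the request and parses the KV v2 response shape (the secret sits under data.data), leaving the actual network call to the caller.

```python
import json
import urllib.request


def build_kv2_read(addr: str, token: str, path: str, mount: str = "secret") -> urllib.request.Request:
    """Build a GET request for a KV v2 secret: /v1/<mount>/data/<path>."""
    url = f"{addr.rstrip('/')}/v1/{mount}/data/{path.lstrip('/')}"
    return urllib.request.Request(url, headers={"X-Vault-Token": token}, method="GET")


def parse_kv2_response(body: bytes) -> dict:
    # KV v2 nests the secret under data.data; data.metadata holds version info.
    return json.loads(body)["data"]["data"]
```

In an application this would be used as parse_kv2_response(urllib.request.urlopen(build_kv2_read(addr, token, "myapp/db")).read()), with the address and token supplied by the environment. The point of the design is that access policy, rotation and auditing live on the Vault side, so the developer only has to make one well-defined call.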

If I understand correctly, there is a trend toward turning everything into code. Everything as code: infrastructure as code, processes as code, code as code. What are the consequences?

Before talking about the consequences, we need to talk about the benefits. Code has been around for a long time. Applications have always been “code,” and over that time a rich ecosystem of tools and processes has grown up to support and improve application development (CI/CD, linting, collaborative development tools, and so on). By describing infrastructure as code, processes as code and security as code, we can use that same ecosystem without paying anything extra for it. You can collaborate on infrastructure changes, review policies, and so on. You can test infrastructure changes before deploying them. These are just some of the benefits of turning something into code.

I think the biggest consequences are time and complexity. When you manage something “as code” (for example, with Terraform, Vault, CloudFormation, Deployment Manager, etc.), you sometimes run into inconsistencies between what is written in the code and what is actually happening in the cloud. Complex relationships can be hard to model visually, especially at scale. We may also run into leaky abstractions: for example, a script working through the API may see the current state of things differently from how it is shown to users in the web interface. However, over time complexity decreases and flexibility returns.
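The mismatch between what the code says and what is actually in the cloud is usually called drift. Here is a minimal Python sketch of how a tool might detect it: walk the desired-state description and the observed state side by side and report what is missing, changed, or present but unmanaged. Both dictionaries in the usage note are made-up examples; real tools such as Terraform compute a much richer plan.

```python
from typing import Any, Dict, List, Tuple

Change = Tuple[str, str, Any, Any]  # (kind, path, desired, actual)


def diff_state(desired: Dict[str, Any], actual: Dict[str, Any], prefix: str = "") -> List[Change]:
    """Recursively list differences between desired and observed state."""
    changes: List[Change] = []
    for key in sorted(set(desired) | set(actual)):
        path = f"{prefix}{key}"
        if key not in actual:
            changes.append(("missing", path, desired[key], None))
        elif key not in desired:
            # Exists in the cloud but not in code: created by hand?
            changes.append(("unmanaged", path, None, actual[key]))
        elif isinstance(desired[key], dict) and isinstance(actual[key], dict):
            changes.extend(diff_state(desired[key], actual[key], path + "."))
        elif desired[key] != actual[key]:
            changes.append(("changed", path, desired[key], actual[key]))
    return changes
```

For instance, if the code declares a VM with only port 443 open and a private bucket, but someone opened port 22 and made the bucket public through the web console, the diff reports both as “changed” entries, plus any resource that exists only in the cloud as “unmanaged.”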

Code and other formal models are well suited to machine analysis and machine learning. When exactly will robots replace us? How should we respond?
Can robots configure Kubernetes without humans? In particular, could some professions (such as system administrator, or “software tester with a high level of interactivity,” a socially acceptable term for “manual tester”) simply disappear?

I do not think people will disappear, but I suspect some of the jobs we have today will not exist in the future. I live in Pittsburgh, the world capital of self-driving car development, and I see self-driving cars literally every day. In the future, of course, taxi drivers will be replaced by autonomous driving technology. Perhaps not immediately, but in 10 or more years this future will inevitably come. However, I don't think the people who drive taxis will disappear. Humanity constantly reinvents itself.

I believe the same can be said about machine learning in operations. I look forward to more AI making our systems more stable and resilient, and I think we can choose to fight this approach or embrace it. The role of the traditional system administrator may be replaced by someone who supervises the AI system, but that does not mean the people themselves will disappear. We have seen the same shifts in several industries, where technology and innovation delivered higher speed and accuracy than humans, but in the end someone needs to watch over the robots :).

Imagine that a regular development team builds a web application and deploys it to a Kubernetes cluster. How can we make sure the application is secure? Should we hire a hacker or a black-hat expert to look for security weaknesses, or are there simpler ways?

At Aqua Security we recently released an open source tool called kube-hunter. It tests a Kubernetes cluster for weaknesses that would allow an attack or penetration by running fairly simple checks. You can take this tool and test your own cluster, and you will almost certainly learn something interesting, especially if your installation is based on an older version of Kubernetes that used less secure defaults: you may well have open ports, for example.

We have all heard stories about people who “forgot” to restrict access to their cluster's dashboard, leaving it open to the whole Internet, so the goal of kube-hunter was to let you check your own system and make sure it is protected. In our view, attackers have long been scanning hosts on the Internet for well-known open resources and ports, so a Kubernetes dashboard that was accidentally left unprotected (and is therefore open to everyone) is not hard to find, especially with the tools they are used to.

We want kube-hunter to help ordinary Kubernetes administrators understand whether their deployment has configuration problems, so it is designed specifically for Kubernetes and reports its findings in terms Kubernetes users understand. So instead of telling you that port 6443 is open, it tells you that, say, a particular Kubernetes component is accessible. That makes it easier to decide whether a finding is a misconfiguration that compromises security and whether it needs fixing right away, without being a Kubernetes security expert. We want these checks to be available at any time, without the need for outside experts.
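kube-hunter's own code is not reproduced here. The following is only a hypothetical Python sketch of the reporting idea described above: probe a host's well-known Kubernetes ports and phrase each finding as a component rather than a port number. The port-to-component table reflects common defaults, which a given cluster may change, and the function names are made up for illustration.

```python
import socket
from typing import Dict, Iterable, List

# Common default ports; a specific cluster may use different ones.
KNOWN_PORTS: Dict[int, str] = {
    6443: "Kubernetes API server",
    8080: "Kubernetes API server, legacy insecure port",
    10250: "kubelet API",
    10255: "kubelet read-only API (legacy)",
    2379: "etcd client API",
    8001: "kubectl proxy (default port)",
}


def is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def describe(open_ports: Iterable[int]) -> List[str]:
    """Translate open port numbers into component-level findings."""
    findings = []
    for port in sorted(open_ports):
        component = KNOWN_PORTS.get(port)
        if component:
            findings.append(f"{component} is reachable (port {port})")
        else:
            findings.append(f"unrecognized open port {port}")
    return findings


def scan(host: str) -> List[str]:
    return describe(p for p in KNOWN_PORTS if is_open(host, p))
```

Separating probing (is_open) from interpretation (describe) keeps the Kubernetes-specific knowledge in one table, which is where a real tool would add many more checks than a simple TCP connect.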

Here is an observation: many people no longer care what is inside the containers they run.

Yes, and they have no idea what dependencies were built into those containers. Yet if you plan to use containers to deploy a service, it seems sensible to know what software is inside each image. But this is not a problem with the container approach itself; it is a matter of how the people using it approach security.

It is not that difficult to do everything correctly and securely. Say, as you move to the cloud, you need to start using additional tools, which will almost certainly add extra layers of security.

Clouds have their own specifics. For example, I often see people mistakenly assume that in the cloud, the tools they are used to from the old days will automatically do everything they need. In some ways they are right: you can still use your usual (and always working) set of firewall rules, and that will improve security. But some of the old tools are no longer suitable. For example, if you deploy a container image and scan it for vulnerabilities with a familiar old tool that does not know how to look inside the image, it will simply find nothing, and you will get a (possibly false) sense that everything is fine.
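A container-aware scanner has to look inside the image: read the package inventory from the image's filesystem and match it against vulnerability advisories. The Python sketch below shows only the matching step, with made-up advisory data: the package names are real, but the “fixed in” versions and the EXAMPLE-000x identifiers are placeholders, and version parsing assumes purely numeric dotted versions. Real scanners read OS package databases and use curated vulnerability feeds. The sketch also encodes the false-confidence trap described above: an empty inventory is reported as “unknown,” not as “clean.”

```python
from typing import Dict, List, Tuple

# Hypothetical advisories: package -> (first fixed version, advisory id).
ADVISORIES: Dict[str, Tuple[str, str]] = {
    "openssl": ("1.1.1", "EXAMPLE-0001"),
    "zlib": ("1.2.12", "EXAMPLE-0002"),
}


def parse_version(v: str) -> Tuple[int, ...]:
    # Assumes purely numeric dotted versions such as "1.2.11".
    return tuple(int(part) for part in v.split("."))


def scan_inventory(packages: Dict[str, str]) -> Dict[str, object]:
    if not packages:
        # Nothing could be read from the image: that is not the same as "clean".
        return {"status": "unknown", "findings": []}
    findings: List[str] = []
    for name, version in sorted(packages.items()):
        advisory = ADVISORIES.get(name)
        if advisory and parse_version(version) < parse_version(advisory[0]):
            findings.append(f"{name} {version}: {advisory[1]} (fixed in {advisory[0]})")
    return {"status": "vulnerable" if findings else "clean", "findings": findings}
```

Treating “I could not see inside” as its own result is the important design choice here: a scanner that silently returns an empty list for an image it cannot open is exactly the old-tool behavior that produces false confidence.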

What I really like is that when you move to a microservice architecture, you get a fairly large number of small pieces of functionality spread across containers (whose contents you know intimately), and you can see the traffic between them. Each container can be treated as a kind of black box from which strictly defined behavior is expected. As soon as a container starts behaving unexpectedly or producing strange results, you can react. The better your container images are built and configured, the easier it is to tell whether they are working properly.

And how do you catch anomalies? With something like antivirus heuristics, or neural networks?

We use various security tools throughout the software life cycle. Here we are talking about a runtime enforcer: a component that knows which container, from which image, we are running and how it should behave. It “understands” that, for example, nginx will be running inside, so it loads a preconfigured profile for that software and starts watching for actions that would not be expected from nginx. You can also point it at an image that has no profile yet; it then switches to learning mode and builds a picture of “normal” behavior, against which anomalies become visible. It also tracks which executables generate network traffic and which user IDs are used inside individual containers. Watching such a control and protection component at work is simply amazing.
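The learn-then-enforce idea can be sketched in a few lines of Python. This is a toy model, not Aqua's enforcer: it assumes events already arrive as (executable path, user ID) pairs from some hypothetical monitoring source, records them while in learning mode, and afterwards flags anything outside the learned or preconfigured profile. The nginx paths and user IDs in the usage note are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Event = Tuple[str, int]  # (executable path, user ID) observed in a container


@dataclass
class RuntimeProfile:
    allowed: Set[Event] = field(default_factory=set)
    learning: bool = True

    def observe(self, event: Event) -> bool:
        """Return True if the event is acceptable under the profile."""
        if self.learning:
            self.allowed.add(event)  # build the picture of "normal" behavior
            return True
        return event in self.allowed

    def enforce(self) -> None:
        self.learning = False


def check(profile: RuntimeProfile, events: List[Event]) -> List[Event]:
    """Return the events that deviate from the profile."""
    return [e for e in events if not profile.observe(e)]
```

A preconfigured profile corresponds to RuntimeProfile(allowed={("/usr/sbin/nginx", 0), ("/usr/sbin/nginx", 101)}, learning=False): a shell suddenly started inside that container shows up as a deviation. For an unknown image, you start in learning mode, call enforce() once the workload has exercised its normal paths, and from then on anything new is reported.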

Let's talk about something else. Serverless has been hyped to the max, but there are very few professionals with a deep understanding of these technologies. Almost everyone has close to zero experience running it in production. What is wrong with the serverless idea? What do we need to understand to start using these approaches?

I do not think serverless has been overhyped; rather, I think it is still misunderstood. Today the biggest problem with serverless is choosing where to apply it. Many organizations are trying to replace services that have long worked well on servers with a serverless approach because it is “cheaper,” but cheaper does not always mean better. We need to better understand the trade-offs of moving to serverless, which in fact simply means “someone else's distributed system.” You cannot just take an existing application and make it serverless. Cloud-native serverless applications should be designed that way from the very beginning. They should be “lean” in resource consumption, optimized for the fastest possible startup (which means fast “cold” starts), and built with the realities of distributed systems in mind.

One of the biggest challenges with serverless is “how do I run what I need locally?” The usual approach is to wire serverless applications together using pub/sub, cloud storage, Redis, and some amount of glue code. Because of the tight coupling between these different (cloud) components, it is difficult to test changes anywhere other than production.
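One common way to get some of that verification before production is to hide the cloud services behind small interfaces and substitute in-memory fakes in tests. A minimal Python sketch, assuming a hypothetical thumbnail function triggered by an “uploads” topic; the Broker protocol, the topic names and the dictionary standing in for cloud storage are invented for illustration and are not any cloud provider's SDK.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Protocol


class Broker(Protocol):
    def publish(self, topic: str, message: dict) -> None: ...


class InMemoryBroker:
    """Test double for a cloud pub/sub service: delivers synchronously."""

    def __init__(self) -> None:
        self.subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)
        self.published: List[tuple] = []

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        self.published.append((topic, message))
        for handler in self.subscribers[topic]:
            handler(message)


def make_thumbnail_handler(broker: Broker, store: Dict[str, bytes]) -> Callable[[dict], None]:
    """The 'function' under test: reacts to an upload event."""

    def handle(event: dict) -> None:
        name = event["name"]
        store[f"thumbs/{name}"] = store[name][:16]  # stand-in for real resizing
        broker.publish("thumbnails", {"name": f"thumbs/{name}"})

    return handle
```

The fake does not reproduce cloud behavior such as retries, ordering or at-least-once delivery, so it complements rather than replaces tests against real services; what it does give you is a fast local check of the glue logic itself.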

The hype around serverless is quite unusual, and very interesting. For companies that run servers under serious, real workloads, it will be difficult to move all their applications to the serverless model. But some things can be moved. Container security is built on understanding the software you use, on vulnerability checks, and on continuously monitoring that this software behaves as expected. The same basic principles apply to serverless, and we will see developments here in the near future.

Software tends to grow more complex over time; serverless systems are one example. As a result, we are forced to introduce complexity and then fight it, because it makes systems hard to manage. What can be done about it?

I am a big fan of the approach where changes are described once and then reproduced whenever needed; as I mentioned, this is an essential part of the culture we call DevOps. Yes, overall it looks more complicated, but we have many tools that can take load off developers so they can focus on building applications.

Take, for example, today's most common configuration file format: YAML. You cannot run Kubernetes, or an application in Kubernetes, without writing half a dozen YAML files. There are jokes that you need at least 30,000 lines of YAML to run a simple application. So I think we will soon see tools that spare us from actually writing those 30,000 lines of YAML and simply do it all themselves; we will only give them the code we need to run.
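The kind of tool described here can be sketched as a tiny generator: a few parameters in, a complete Deployment manifest out, with secure defaults built in so nobody has to remember them. This Python sketch is illustrative, not an existing tool; it emits JSON, which is valid YAML and is accepted by kubectl apply -f, so no YAML library is needed. The image name in the example is hypothetical.

```python
import json


def deployment(name: str, image: str, port: int, replicas: int = 2) -> dict:
    """Expand a few parameters into a full apps/v1 Deployment manifest."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                        # Secure defaults applied to every generated workload.
                        "securityContext": {
                            "runAsNonRoot": True,
                            "allowPrivilegeEscalation": False,
                            "readOnlyRootFilesystem": True,
                        },
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(deployment("web", "registry.example.com/web:1.0", 8080), indent=2))
```

Three arguments expand into roughly 40 lines of manifest, and because the selector and the pod labels come from the same variable, a whole class of copy-paste mismatches cannot happen.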

I often hear that Kubernetes is hard for beginners to understand. Do you agree, and how long does it take to reach your level of understanding?

I have been working with Kubernetes for the last two years and agree that it is rather difficult to understand. There are many different kinds of resources and objects that you need to understand thoroughly, including why things are done one way and not another. In addition, Kubernetes uses a declarative approach: your YAML file describes what you want to achieve, and Kubernetes then tries to keep the state of the system in line with that description. This is very convenient for self-healing services, but at first the approach is hard to adopt. You have to learn to understand how the whole system will behave.
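The declarative model boils down to a control loop: compare the desired state with the observed state and act to close the gap, over and over. Here is a deliberately simplified Python sketch of one such loop for a replica count; the Cluster class is a toy stand-in for the real system, not a Kubernetes API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Toy stand-in for the system being controlled."""
    pods: List[str] = field(default_factory=list)
    counter: int = 0

    def start_pod(self, app: str) -> None:
        self.counter += 1
        self.pods.append(f"{app}-{self.counter}")

    def stop_pod(self, pod: str) -> None:
        self.pods.remove(pod)


def reconcile(cluster: Cluster, app: str, desired_replicas: int) -> int:
    """One pass of the control loop; returns how many actions were taken."""
    actual = [p for p in cluster.pods if p.startswith(f"{app}-")]
    actions = 0
    for _ in range(desired_replicas - len(actual)):  # too few: start more
        cluster.start_pod(app)
        actions += 1
    for pod in actual[desired_replicas:]:  # too many: stop the extras
        cluster.stop_pod(pod)
        actions += 1
    return actions
```

Starting from an empty Cluster, reconcile(cluster, "web", 3) starts three pods. If one of them then dies, the next pass starts exactly one replacement, and a pass with nothing to fix does nothing. That idempotence is what makes the self-healing behavior work, and it is also why you reason about the desired state rather than about individual commands.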

Another reason Kubernetes is hard to learn is that it is constantly evolving. Over the past two years it has changed a lot and become much more stable. By the way, Kubernetes has finally reached graduated status in the CNCF, which basically means that the project is mature and that the CNCF recommends companies use it.

As a developer, I wonder: why not generate all these YAML files from a higher-level description language, where we could use static error checking and so on? We could even run that description in a simulation mode, check it for security errors, and then compile it to YAML and use it in Kubernetes.

Yes, there are tools for generating YAML, for general purposes and not just from a security point of view, and there is also a tool that can look for problems in a YAML file. In general, I think there will be more and more such tools, and in two or three years we will not have to deal with YAML directly.
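A checker of this kind can start very small. The following Python sketch walks the containers of a Deployment-like manifest (already parsed into a dictionary, for example with a YAML library) and flags a few well-known risky settings. The rule set is our own illustrative selection and far from complete; it does honor the fact that a pod-level runAsNonRoot applies unless a container overrides it.

```python
from typing import Any, Dict, List


def lint(manifest: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a Deployment-like manifest."""
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    pod_sc = pod_spec.get("securityContext", {})
    problems: List[str] = []
    if pod_spec.get("hostNetwork"):
        problems.append("pod uses the host network namespace")
    for c in pod_spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        sc = c.get("securityContext", {})
        image = c.get("image", "")
        if sc.get("privileged"):
            problems.append(f"{name}: runs privileged")
        # Container-level setting overrides the pod-level one.
        if sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")) is not True:
            problems.append(f"{name}: may run as root (runAsNonRoot not set)")
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: privilege escalation not disabled")
        last_segment = image.rsplit("/", 1)[-1]
        if ":" not in last_segment or image.endswith(":latest"):
            problems.append(f"{name}: image {image!r} has no pinned tag")
    return problems
```

Running a check like this in CI, before kubectl apply, is the “compile-time” counterpart to the runtime checks discussed earlier: it catches risky settings while they are still just text in a pull request.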

By the way, how do you see the future of Kubernetes over, say, the next three years?

Good question. In fact, you need to look not only at Kubernetes itself but also at the rapidly growing landscape of components and projects around it that together form a reliable platform for deploying cloud-native products, including things like security. An excellent example is Istio (and other service mesh options), which recently reached version 1.0. I think it is a good example of a really powerful tool that adds a layer of security and lets you do things that are understandable and necessary, such as canary deployments.
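A canary deployment with a service mesh usually comes down to weighted routing: most traffic goes to the stable version, a small share goes to the new one, and the share grows as confidence grows. The Python sketch below generates an Istio VirtualService with two weighted routes. It assumes a DestinationRule (not shown) already defines subsets named stable and canary, uses the networking.istio.io/v1alpha3 API version that was current for Istio 1.0, and the host name reviews is just an example.

```python
import json


def canary_virtual_service(host: str, canary_percent: int) -> dict:
    """Split traffic for `host` between the stable and canary subsets."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": host},
        "spec": {
            "hosts": [host],
            "http": [{
                "route": [
                    {"destination": {"host": host, "subset": "stable"},
                     "weight": 100 - canary_percent},
                    {"destination": {"host": host, "subset": "canary"},
                     "weight": canary_percent},
                ],
            }],
        },
    }


if __name__ == "__main__":
    # Step the canary up gradually, e.g. 5% -> 25% -> 50% -> 100%.
    print(json.dumps(canary_virtual_service("reviews", 5), indent=2))
```

Because the weights are derived from a single number, they always add up to 100, and rolling forward or back is a matter of regenerating the object with a different percentage.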

I think that in two or three years we will have a fairly solid platform that will cover many different uses.

If you could give readers only one piece of advice, what would it be?

Scan your container images for vulnerabilities. You will be surprised how many people deploy code without knowing what is inside!

Keep learning! Going back to the earlier question about robots taking our jobs: the only way to prevent that is to never get complacent and to keep learning all the time.

Next Sunday, at the DevOops 2018 conference, Seth will give a talk on “Modern security with microservices and the cloud,” and Liz will present “Practical steps for securing your container deployment.” Tickets can be purchased on the official website.
