How to make a bomb from XML

    The oss-security mailing list has hosted a discussion of various XML parsing vulnerabilities. Affected are applications that let their libraries process named and external entities in a DTD embedded in an XML document obtained from an untrusted source; that is, essentially, any application that does not change the parser's default settings.

    Examples of XML bombs follow. If you run applications that process XML, you can check them for these vulnerabilities yourself. The bombs in this post were tested with the xmllint utility that ships with the libxml2 library, but other parsers can be used as well.


    The XML standard allows XML documents to use DTDs to define which combinations of nested tags and attributes are valid. A DTD can either be referenced as an external resource or be defined entirely within the document itself. An example of a document with an embedded DTD:

    Hello, world!

    In addition to elements and attributes, a DTD can define entities. An example of a document that uses a named entity:

    Hello, &target;!

    To validate this document and expand its entities, run:

    $ xmllint --noent --valid hello.xml
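    For readers without xmllint at hand, entity expansion can also be observed with Python's standard library. The sketch below is an illustration rather than the post's original example: the document text, the `greeting` element and the value of the `target` entity are assumptions chosen for the demonstration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with an internal DTD subset (all names are assumptions).
HELLO_XML = """<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
  <!ENTITY target "world">
]>
<greeting>Hello, &target;!</greeting>"""


def expanded_text(document: str) -> str:
    """Parse the document and return the text of its root element.

    The underlying expat parser substitutes internal entities declared
    in the DTD while parsing, so the caller sees the expanded string.
    """
    root = ET.fromstring(document)
    return root.text or ""


if __name__ == "__main__":
    print(expanded_text(HELLO_XML))
```

    Note that ElementTree is a non-validating parser: it expands the entity but does not check the document against the element declarations, so it corresponds to --noent without --valid.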

    Exponential inflation of entities

    Named entities can expand not only into character strings but also into sequences of other entities. The standard prohibits recursion, but it places no limit on the permitted nesting depth. This allows very long text strings to be represented compactly (much as archivers do) and forms the basis of the billion laughs attack, known since 2003.


    Modern XML parsers provide protection against such an attack. For example, libxml2 by default refuses to parse this document, despite its strict compliance with the standard:

    $ xmllint --noent --valid bomb1.xml
    Entity: line 1: parser error : Detected an entity reference loop
    bomb1.xml:8: parser error : Detected an entity reference loop

    To see how much the document inflates when its entities are expanded, you must explicitly disable the protection against this attack:

    $ xmllint --noent --valid --huge bomb1.xml | wc -c

    Adding one more entity, by analogy with those already defined, multiplies the output by roughly the number of references to the previous entity that the new one contains, while the input document grows only by a number of bytes proportional to that number of references. In other words, the size of the output character stream depends exponentially on the size of the input XML document.

    A small XML document can therefore cause disproportionate consumption of resources (such as RAM and processor time) merely to parse it into tags and character strings. This is a typical DoS attack that exploits the large gap between an algorithm's complexity in the typical case and in the worst case.
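    The growth can be estimated with simple arithmetic, without building or parsing any document. The sketch below is a back-of-the-envelope model under stated assumptions: `base_len`, `fanout`, `levels` and `ref_len` are hypothetical parameter names, and the input estimate counts only entity values, ignoring the fixed overhead of the declarations themselves.

```python
def expanded_size(base_len: int, fanout: int, levels: int) -> int:
    """Bytes produced by expanding the top-level entity once.

    base_len: length of the innermost string entity
    fanout:   references to the previous entity inside each new entity
    levels:   number of entities stacked on top of the innermost one
    """
    return base_len * fanout ** levels


def approx_input_size(base_len: int, fanout: int, levels: int,
                      ref_len: int = 5) -> int:
    """Rough size of all entity values in the input document.

    ref_len is an assumed average length of one reference such as "&e1;".
    """
    return base_len + levels * fanout * ref_len


if __name__ == "__main__":
    # Illustrative parameters: a 3-byte string, 9 levels, 10 references each.
    print(approx_input_size(3, 10, 9))  # 453
    print(expanded_size(3, 10, 9))      # 3000000000
```

    Each additional level adds about fanout * ref_len bytes to the input but multiplies the output by fanout, which is the exponential relationship described above.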

    Quadratic inflation of entities

    As we have just seen, some libraries counter the "billion laughs" attack by imposing a strict artificial limit on the depth of the tree of named entities. This limit does prevent an exponential relationship between the size of the input XML file and the size of the output character stream. However, an attacker who wants to exhaust a server's resources with a relatively small XML document does not need an exponential relationship. A quadratic one is quite sufficient, and a single level of named entities is enough to obtain it. We simply repeat one long entity many times:


    $ xmllint --huge --noent --valid bomb2.xml | wc -c

    The --huge option is included in case your version of libxml2 treats the example above as an attack. The library was taught to do so by a commit that, at the time this post was published, had not yet made it into a release.
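    The quadratic relationship can likewise be checked with arithmetic alone. This is a minimal sketch under assumptions: `entity_len`, `references` and `ref_len` are hypothetical parameter names, and the entity is assumed to have a one-letter name, so each reference such as "&a;" takes 3 bytes.

```python
from typing import Tuple


def quadratic_sizes(entity_len: int, references: int,
                    ref_len: int = 3) -> Tuple[int, int]:
    """Return (approximate input size, output size) in bytes.

    One entity of entity_len bytes is declared once and then
    referenced `references` times in the document body.
    """
    input_size = entity_len + references * ref_len
    output_size = entity_len * references
    return input_size, output_size


if __name__ == "__main__":
    # Illustrative parameters: a 50 000-byte entity referenced 50 000 times.
    print(quadratic_sizes(50_000, 50_000))  # (200000, 2500000000)
```

    Doubling both parameters doubles the input but quadruples the output, so the output grows as the square of the input even though the entity tree is only one level deep.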

    External entities

    The XML standard allows entity values to be obtained not only from literal strings but also from external resources, for example over HTTP. An attacker who can feed documents to an XML parser on some intermediary ("zombie") server can use this to scan ports and even to mount DoS attacks against other servers while hiding his own IP address. The following XML file, when parsed by a parser that supports external entities, makes the parser issue three requests to Habr's RSS feeds:


    $ xmllint --noent --noout --load-trace bomb3.xml

    In the example above, you can prevent the parser from fetching entities over the network by passing the --nonet option.

    In the same way, a vulnerable application can be forced to read local files containing sensitive information, such as a database password. Unfortunately, --nonet does not help here:


    $ xmllint --noent --nonet --valid bomb4.xml

    This type of attack is called XXE (XML eXternal Entity). One recent example is a vulnerability in PostgreSQL, CVE-2012-3489.
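    As an illustration of the defensive side, the sketch below uses Python's standard xml.sax module with external entity processing switched off explicitly. It is not taken from the post: `TextCollector` and `text_without_external_entities` are hypothetical names. Since Python 3.7.1 the SAX parser already ignores external general entities by default, so the setFeature calls mainly document the intent.

```python
import io
import xml.sax
from xml.sax.handler import (ContentHandler, feature_external_ges,
                             feature_external_pes)


class TextCollector(ContentHandler):
    """Collects all character data seen in the document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def text_without_external_entities(data: bytes) -> str:
    """Parse data while refusing to resolve external entities."""
    parser = xml.sax.make_parser()
    # Do not fetch external general or parameter entities (files, URLs).
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return "".join(handler.chunks)
```

    With these settings a reference to an entity declared with SYSTEM is skipped instead of being replaced by the contents of the file or URL. Internal entities are still expanded, so this measure alone does not address the inflation attacks described earlier.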


    Now let's talk about preventing such attacks.

    First of all, use library versions that include countermeasures against these and other vulnerabilities. You must also explicitly limit the resources spent on parsing an XML document. With libxml2, for example, this can be done by calling xmlMemSetup() and passing in your own memory management functions, which simply refuse to allocate too much. Access to external resources must be restricted as well, for example by writing your own entity loader.

    There is, however, an opinion that all the measures listed above treat the symptoms rather than the root cause of these vulnerabilities. Indeed, where did the requirement come from to parse an XML document according to the DTD that it references (or contains)? Would it not be more correct to parse the document according to the rules of your own application? After all, you validate HTML form data against regular expressions that live in the code of the form handler, not against ones that arrived together with the form data. Accordingly, an XML parser that does not use DTDs (and is therefore non-validating), with the entities you need predefined, will be enough.
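    One possible way to follow this advice in Python is to refuse any document that carries a DOCTYPE before handing it to the regular parser. The sketch below is an assumption-laden illustration, not a method from the post: `DTDForbidden`, `reject_dtd` and `parse_untrusted` are hypothetical names, and the document is scanned twice for simplicity.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when an untrusted document contains a DOCTYPE declaration."""


def reject_dtd(data: bytes) -> None:
    """Scan data with expat and abort at the first DOCTYPE declaration."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Raising here stops the parse before any entity is expanded.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)


def parse_untrusted(data: bytes) -> ET.Element:
    """Parse data only if it carries no DTD at all."""
    reject_dtd(data)
    return ET.fromstring(data)
```

    A document without a DTD cannot declare entities of its own, so only the five predefined XML entities (such as &amp; and &lt;) remain available. Validation then becomes the job of application code, as the paragraph above suggests.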
