lany March 1, 2015 at 16:00

Evidence of Incorrect Sort Algorithms for Android, Java, and Python

Transfer

Tim Peters developed the Timsort hybrid sorting algorithm in 2002. The algorithm is a clever combination of merge sorting and insertion sorting ideas and is focused on efficient work with real data. Timsort was first developed for Python, but then Joshua Bloch (creator of Java collections, by the way, he noted that most binary search algorithms contain an error ) ported it to Java (methods java.util.Collections.sort and java.util.Arrays .sort). Today, Timsort is the standard sorting algorithm in the Android SDK, Oracle JDK, and OpenJDK. Given the popularity of these platforms, we can conclude that billions are coming from computers, cloud services and mobile devices that use Timsort for sorting.

But back in 2015. After we successfully verified Java implementations of counting sorting and bitwise sorting ( J. Autom. Reasoning 53 (2), 129-139 ) with our formal verification tool called KeY , we searched for a new object to study. Timsort seemed like a good candidate because it is quite sophisticated and widely used. Unfortunately, we could not prove its correctness. The reason for this, upon closer examination, turned out to be simple: there is a bug in the Timsort implementation. Our theoretical studies have shown us where to look for the error (it is curious that the error was already in the Python implementation). This article describes how we have achieved this.

An article with a more complete analysis, as well as several test programs are available on our website .

Content

A bug in the implementation of Timsort on Android, Java and Python
1.1. We reproduce the bug of Timsort in Java
1.2. How does Timsort work in principle?
1.3 Timsort bug step by step
Proof of (incorrect) Timsort
2.1 KeY Verification System
2.2 Correction and its formal specification
2.3 Analysis of KeY results
Proposed bug fixes for Timsort implementations in Python and Android / Java
3.1 Incorrect merge_collapse function (Python)
3.2 Corrected merge_collapse function (Python)
3.3 Incorrect mergeCollapse function (Java / Android)
3.4 Corrected mergeCollapse function (Java / Android)
Conclusion: Which of the following conclusions follow?

1. Bug in the implementation of Timsort on Android, Java and Python

So what is the bug? And why don't you try reproducing it yourself?

1.1 We reproduce the Timsort bug in Java

git clone https://github.com/abstools/java-timsort-bug.git
cd java-timsort-bug
javac *.java
java TestTimSort 67108864

If a bug is present, you will see this result

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 40
at java.util.TimSort.pushRun(TimSort.java:413)
at java.util.TimSort.sort(TimSort.java:240)
at java.util.Arrays.sort(Arrays.java:1438)
at TestTimSort.main(TestTimSort.java:18)

Error Playback Video Recording

1.2 How does Timsort work in principle?

Timsort is a hybrid algorithm that uses insertion sorting and merge sorting.

The algorithm reorders the input array from left to right, finding in it consecutive (non-overlapping) sorted segments ("series", or runs). If the series is shorter than a certain minimum length, it is supplemented by subsequent elements and sorted by the insert. The lengths of the created series are added to the end of the runLen array . When a new series is added to runLen , the mergeCollapse method merges the series until the last three elements in runLen satisfy the following pair of conditions called an “ invariant ”:

runLen [n-2]> runLen [n-1] + runLen [n]
runLen [n-1]> runLen [n]

Here n is the index of the last series in runLen. The idea was that checking this invariant for the last three series ensures that all consecutive triples of the series satisfy it. At the end, all series merge, resulting in a fully sorted input array.

For reasons of efficiency, it is advisable to allocate as little memory as possible under runLen , but it should be enough for all series to fit there deliberately. If the invariant holds for all series, the length of the series will grow exponentially (even faster than the Fibonacci numbers: the length of each series is strictly greater than the sum of the lengths of the two subsequent ones). Since the series do not overlap, they will not need so many to completely sort even a huge input array.

1.3 Timsort bug step by step

The following code snippet shows that the mergeCollapse implementation checks the invariant for the last three series in runLen :

private void mergeCollapse() {
  while (stackSize > 1) {
    int n = stackSize - 2;
    if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1]) {
      if (runLen[n - 1] < runLen[n + 1])
         n--;
      mergeAt(n);
    } else if (runLen[n] <= runLen[n + 1]) {
      mergeAt(n);
    } else {
      break; // инвариант установлен
    }
  }
}

Unfortunately, this is not enough for all series to satisfy the invariant. Suppose runLen contains such a set of lengths before executing mergeCollapse (n = 4):

120, 80, 25, 20, 30

On the first pass of the while loop, series 25 and 20 in length will be combined (since 25 <20 + 30 and 25 <30):

120, 80, 45, 30

On the second pass of the cycle (now n = 3), we determine that the invariant is set for the last three series, because 80> 45 + 30 and 45> 30, so mergeCollapse completes the work. But mergeCollapse did not restore the invariant for the entire array: it is violated in the first three, since 120 <80 + 45.

The test generator on our site allows us to identify this problem. It creates an input array with many short series, while their lengths do not satisfy the invariant, which leads to the fall of Timsort. The true cause of the error is that, due to violation of the invariant, the length of the series grows slower than expected, as a result of which their number exceeds runLen.length and this leads to an ArrayOutOfBoundsException.

2. Proof of (not) correctness of Timsort

We found that the mergeCollapse invariant is broken when we try to formally verify Timsort. Fortunately, we also figured out how to fix it. As a result, we even managed to verify the corrected version, in which the invariant is really observed. But let's not rush. First, let's figure out what formal verification is and how it is performed.

2.1 KeY verification system

KeY is a deductive verification platform for single-threaded Java and JavaCard programs. It allows you to prove the correctness of programs for a given specification. Roughly speaking, the specification consists of preconditions (in terms of KeY - requires ) and postconditions (in terms of KeY - ensures ). Specifications are given before methods that implement them (for example, before the mergeCollapse () mentioned above). The specification of a method is also called its contract .

In the case of sorting, the precondition simply asserts that a non-empty array is supplied to the input, and the postcondition requires that the resulting array is sorted and is a permutation of the input. The KeY system proves that if the method being verified is called with input data that satisfy the preconditions, the method will end normally and the resulting state will satisfy the postcondition. This is also called complete correctness , because the normal completion of the method is proved. As we saw, the java.util.Arrays.sort () method from OpenJDK does not comply with this contract, because for certain input data it ends with an exception.

Additionally, invariants of classes and objects are used to indicate general restrictions on field values. Usually, data consistency or boundary conditions are checked:

/*@ private invariant
  @    runBase.length == runLen.length && runBase != runLen;
  @*/

This invariant says that the lengths of the runBase and runLen arrays must be the same, but at the same time they must be two different arrays. The semantics of invariants implies that each output method should not only provide the postconditions of its contract, but also the invariants of its own “this” object.

KeY uses the Java Modeling Language (JML) as the specification language . The expressions on it are written in the usual Java language, so Java programmers will easily learn it. The main JML extension is quantifiers ( \ forall T x - for any T, \ existsT x - there is T) and, of course, special keywords for contracts. JML specifications are provided in the comments of java files before the code to which they relate. The following is a simple example of a Java method with a JML specification:

/*@ private normal_behavior
  @ requires
  @   n >= MIN_MERGE;
  @ ensures
  @   \result >= MIN_MERGE/2;
  @*/
private static int /*@ pure @*/ minRunLength(int n) {
  assert n >= 0;
  int r = 0;      // Становится 1 если хотя бы один единичный бит был удалён сдвигом
  /*@ loop_invariant n >= MIN_MERGE/2 && r >=0 && r<=1;
    @ decreases n;
    @ assignable \nothing;
    @*/
  while (n >= MIN_MERGE) {
    r |= (n & 1);
    n >>= 1;
  }
  return n + r;
}

The minRunLength () contract contains one precondition: the input parameter must be no less than MIN_MERGE. In this case (and only in this) the method promises to complete normally (do not loop and throw an exception) and return a value that is greater than or equal to MIN_MERGE / 2. Additionally, the method is marked as pure . This means that the method does not modify the heap.

The key point is that the KeY system is able to statically prove the contracts of similar methods for anyinput parameters. How is this possible? KeY performs symbolic execution of the verified method, that is, it executes using symbolic values, so that all possible execution paths are taken into account. But this is not enough, because the symbolic execution of loops with an unlimited number of iterations (such as the loop in mergeCollapse (), where we do not know the value of stackSize) will never end. In order for symbolic execution of cycles to become finite, the logic of invariants is used. For example, the above minRunLength () method contains a loop whose specification is expressed by a loop invariant. The invariant claims that after each iteration the condition is satisfied n >= MIN_MERGE/2 && r >= 0 && r <= 1, and from this it is possible to prove the postcondition of the whole method. Abstract decreases(decreases) is used to prove completion of the cycle. It indicates an expression whose value is non-negative and strictly reduced. The assignable construct lists heap objects that can be changed in a loop. The \ nothing keyword means the heap is not modifiable. And really: the loop only changes the local variable r and the argument n.

In general, formal contracting is not sufficient for contracting methods. It is also necessary to give invariants of cycles. Sometimes it is very difficult to come up with an invariant that is observed in the cycle and at the same time is strong enough to deduce from it the postcondition of the method. Without instrumental support and technology for automated proof of theorems, it is hardly possible to compose the correct cycle invariants in nontrivial programs. In fact, this is where the mistake of the creators of Timsort lies. Under certain conditions, the loop in mergeCollapse violates the following Timsort class invariant (see section 1.3 Timsort bug step by step ):

/*@ private invariant 
  @   (\forall int i; 0<=i && i runLen[i+1] + runLen[i+2]))
  @*/

This invariant states that runLen [i] must be greater than the sum of the next two elements, for any non-negative i less than stackSize-4. The invariant is not restored even after the cycle, so the whole mergeCollapse method does not preserve the class invariant. Therefore, the loop invariant is not as strict as the developers assumed. This is what we discovered during our attempt to verify Timsort with KeY. Without a special tool, it would be almost impossible to detect.

Although JML is very similar to Java, it is a fully-fledged contractual programming language suitable for fully functional verification of Java programs.

2.2 Correction and its formal specification

A simplified version of the contract, which, according to the authors, should be observed in mergeCollapse, is given below.

/*@ requires
  @   stackSize > 0 &&
  @   runLen[stackSize-4] > runLen[stackSize-3]+runLen[stackSize-2]
  @   && runLen[stackSize-3] > runLen[stackSize-2];
  @ ensures
  @   (\forall int i; 0<=i && i runLen[i+1] + runLen[i+2])
  @   && runLen[stackSize-2] > runLen[stackSize-1]
  @*/
private void mergeCollapse()

The two formulas in ensure imply that when mergeCollapse is completed, all series satisfy the invariant given in section 1.2 . We have already seen that this contract is not executed in the current implementation of mergeCollapse ( section 1.3 ). Now we provide a revised version that respects the contract:

private void newMergeCollapse() {
  while (stackSize > 1) {
    int n = stackSize - 2;
    if (n > 0   && runLen[n-1] <= runLen[n] + runLen[n+1] || 
        n-1 > 0 && runLen[n-2] <= runLen[n] + runLen[n-1]) {
      if (runLen[n - 1] < runLen[n + 1])
        n--;
    } else if (n<0 || runLen[n] > runLen[n + 1]) {
      break; // инвариант установлен
    }
    mergeAt(n);
  }
}

The main idea of this version is to verify that the invariant is observed for the last four series in runLen instead of three . As we will see below, this is sufficient for the lengths of all series to satisfy the invariant after the completion of mergeCollapse.

The first step in proving a contract for a fixed version of mergeCollapse is to find a suitable loop invariant. Here is its simplified version:

/*@ loop_invariant
  @  (\forall int i; 0<=i && i runLen[i+1] + runLen[i+2])
  @  && runLen[stackSize-4] > runLen[stackSize-3])
  @*/

Intuitively, the cycle invariant states that all series except, possibly, the last four, satisfy the condition. At the same time, we note that the cycle ends (according to the break statement) only if the last four series also satisfy it. Thus, it can be guaranteed that all series satisfy the invariant.

2.3 Analysis of the results of KeY

When we submit a corrected version of mergeCollapse to KeY along with the contract and the loop invariant, the system performs symbolic execution of the loop and generates verification conditions: formulas, from the truth of which it follows that the mergeCollapse contract is executed. The following (simplified) formula shows the main condition for proving the correctness of the mergeCollapse generated by KeY: The

formula states that the conditions in brackets that are true after the completion of the cycle should follow the post-condition mergeCollapse. It is clear where the expressions in parentheses came from: the break statement, which completes the loop, is executed only when they are all true. We formally proved this formula (and all other verification conditions) using KeY in semi-automatic mode. The following is a sketch of the proof:

Proof. The formula runLen [stackSize-2]> runLen [stackSize-1] from the post-condition mergeCollapse directly follows from n> = 0 ==> runLen [n]> runLen [n + 1] .

Let us prove the following formula:

\ forall int i; 0 <= i && i runLen [i + 1] + runLen [i + 2].

The following options for the value of i are possible:

i <stackSize-4: corresponds to the loop invariant
i = stackSize-4: follows from n> 1 ==> runLen [n-2]> runLen [n-1] + runLen [n]
i = stackSize-3: follows from n> 0 ==> runLen [n-1]> runLen [n] + runLen [n + 1]

This proof shows that the new version of mergeCollapse is completed only when all series satisfy the invariant.

3. Proposed bug fixes for Timsort implementations in Python and Android / Java

Our error analysis (including the mergeCollapse fix ) was sent, reviewed and accepted into the Java bugtracker .

The bug is present at least in the Java version for Android, OpenJDK and OracleJDK: all use the same Timsort code. A bug is also present in Python. The following sections provide the original and revised versions.

As mentioned above, the idea of a fix is very simple: check that the invariant is valid for the last four series in runLen, and not just for three.

3.1 Invalid merge_collapse function (Python)

Timsort for Python (written in C using the Python API) is available in the subversion repository - The algorithm is also described here .

The Timsort version for Java has been ported from CPython. The CPython version contains an error designed to support arrays up to 2 ⁶⁴ elements, also contains an error. However, it is impossible to cause an array overflow error on modern machines: the algorithm allocates 85 elements for runLen, and this is enough (as our analysis shows in the full article) for arrays containing no more than 2 ⁴⁹ elements. For comparison, today's most powerful supercomputer Tianhe-2 has 2 ⁵⁰ bytes of memory.

/* The maximum number of entries in a MergeState's 
 * pending-runs stack.
 * This is enough to sort arrays of size up to about
 *     32 * phi ** MAX_MERGE_PENDING
 * where phi ~= 1.618.  85 is ridiculously large enough, 
 * good for an array with 2**64 elements.
 */
#define MAX_MERGE_PENDING 85
merge_collapse(MergeState *ms)
{
    struct s_slice *p = ms->pending;
    assert(ms);
    while (ms->n > 1) {
        Py_ssize_t n = ms->n - 2;
        if (n > 0 && p[n-1].len <= p[n].len + p[n+1].len) {
            if (p[n-1].len < p[n+1].len)
                --n;
            if (merge_at(ms, n) < 0)
                return -1;
        }
        else if (p[n].len <= p[n+1].len) {
                 if (merge_at(ms, n) < 0)
                        return -1;
        }
        else
            break;
    }
    return 0;
}

3.2 Fixed merge_collapse function (Python)

merge_collapse(MergeState *ms)
{
    struct s_slice *p = ms->pending;
    assert(ms);
    while (ms->n > 1) {
        Py_ssize_t n = ms->n - 2;
        if (     n > 0   && p[n-1].len <= p[n].len + p[n+1].len
            || (n-1 > 0 &&  p[n-2].len <= p[n].len + p[n-1].len)) {
            if (p[n-1].len < p[n+1].len)
                --n;
            if (merge_at(ms, n) < 0)
                return -1;
        }
        else if (p[n].len <= p[n+1].len) {
                 if (merge_at(ms, n) < 0)
                        return -1;
        }
        else
            break;
    }
    return 0;
}

3.3 Invalid mergeCollapse function (Java / Android)

The error is completely similar to the error in Python:

   private void mergeCollapse() {
        while (stackSize > 1) {
            int n = stackSize - 2;
            if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1]) {
                if (runLen[n - 1] < runLen[n + 1])
                    n--;
                mergeAt(n);
            } else if (runLen[n] <= runLen[n + 1]) {
                mergeAt(n);
            } else {
                break; // Инвариант установлен
            }
        }
    }

3.4 Fixed mergeCollapse function (Java / Android)

The fix is similar to the above for Python.
[UPDATE 26/2: we changed the code compared to the original version of the article. The old code was equivalent, but contained redundant checks and differed in style. Thanks to everyone who paid attention!]

   private void newMergeCollapse() {
     while (stackSize > 1) {
       int n = stackSize - 2;
       if (   (n >= 1 && runLen[n-1] <= runLen[n] + runLen[n+1])
           || (n >= 2 && runLen[n-2] <= runLen[n] + runLen[n-1])) {
                if (runLen[n - 1] < runLen[n + 1])
                    n--;
            } else if (runLen[n] > runLen[n + 1]) {
                break; // Инвариант установлен
            }
            mergeAt(n);
        }
    }

4. Conclusion: which conclusions follow from this?

When trying to verify Timsort, we were unable to establish the invariant of the sort object. By analyzing the causes of the failure, we found an error in the Timsort implementation that throws an ArrayOutOfBoundsException for a specific input. We proposed the correction of an erroneous method (without a noticeable decrease in performance) and formally proved that the correction is correct and there is no more error.

From this story, in addition to the error actually found, several observations can be made.

Often, programmers find formal methods impractical and / or far from real-world tasks. This is not so: we found and fixed a bug in the software that billions of users use every day. As our analysis showed, it would be practically impossible to find and fix such a bug without a formal analysis and verification tool. The error lived for many years in the standard library of Java and Python languages. Her manifestations were noticed earlier and even thought they had fixed, but in reality they only achieved that the error began to occur less frequently.
Although it is unlikely that such an error will occur by accident, it is easy to imagine how it can be used to attack. Probably, in the standard libraries of popular programming languages lie other undetected errors. Maybe it’s worth looking into them before they cause harm or are used by intruders?
The reaction of Java developers to our report is in some ways disappointing. Instead of using our corrected (and verified!) Version of mergeCollapse (), they chose to increasethe selected runLen size to a "sufficient" size. As we have shown, it was not at all necessary to do so. But now everyone who uses java.utils.Collection.sort () will be forced to spend more memory. Given the astronomical number of programs using such a basic function, this will lead to noticeable additional energy costs. We can only guess why our decision was not made: perhaps the JDK developers did not bother to read our report in full and therefore did not understand the essence of the correction and decided not to rely on it. In the end, OpenJDK is developing a community consisting largely of volunteers who don't have much time.

Which conclusions follow from this? We would be happy if our work would serve as a starting point for a closer interaction between researchers of formal methods and developers of open software platforms. Formal methods have already been successfully applied on Amazon and Facebook . Modern formal specification languages and formal verification tools are notso confusing and super complex to learn. Their usability and automation are constantly improving. But we need more people who would try, test and use our tools. Yes, it will take some effort to start writing formal specifications and verifying the code, but this is no more difficult than, for example, figuring out how to use the compiler or the project builder. But we are talking about days or weeks, not months or years. Well, how do you accept the challenge?

Best regards,
Stein de Guve , Jurian Roth , Frank de Boer , Richard Bubel and Rainer Henle .

Acknowledgments

This work was partially supported by the project of the seventh framework program of the European Union FP7-610582 ENVISAGE: Engineering Virtualized Services .
This article would not have been written without the support and polite reminders of Amund Tweit ! We also want to thank Beruz Nobact for providing us with a video showing the error.

Envisage logo

Tags: