Quick sort byte array in Java

    For current tasks, it was necessary to sort large arrays of bytes, both signed (unsigned) and unsigned (unsigned). The size of the array in my case was about 10 megabytes, it is not so much, that is, you can use sorting in memory.

    At first I started using java.util.Arrays.sort (byte []) ... Unfortunately, this solution was unacceptable, because:
    - Arrays.sort allows you to sort only signed values ​​... it is very strange that the JDK developers limited themselves to this;
    - Arrays.sort uses a universal method (advanced qsort), but for a number of tasks, such as for the current one, this is not optimal.

    As a result, I drew attention to the so-called sorting by counting, which in this case will be optimal. The implementation also turned out to be very simple.

    Explanation of the sorting method:

    1. Initialize the model, which is an int array [256]. Each i-th element of the model is Ni (the number of elements in the original array equal to i).

    2. Fill the resulting array, while sequentially writing into it every i-th element of the model Ni times.

    All this is enough, very quick sorting in one pass has been made!
    Moreover, this method works for both signed and unsigned sorting, the only difference is that for the unsigned filling of the resulting array is performed by elements in the range from 0 to 255, and for the signed one in the range from -128 to 127.

    Here is the class code ( deleted comments to take up less space, to understand the principle of this code is enough):

    1. public class ByteArraySorter {
    2.     private static final int BYTE_MODEL_SIZE = 256;
    3.     private static final int BYTE_MASK = 0xFF;
    4.     private static final int BYTE_SIGNED_MIN_VALUE = -128;
    5.     private static final int BYTE_SIGNED_MAX_VALUE = 127;
    6.     private static final int BYTE_UNSIGNED_MIN_VALUE = 0;
    7.     private static final int BYTE_UNSIGNED_MAX_VALUE = 255;
    8.  
    9.     public static void sort(byte[] buffer) {
    10.         sort(buffer, BYTE_SIGNED_MIN_VALUE, BYTE_SIGNED_MAX_VALUE);
    11.     }
    12.  
    13.     public static void sortUnsigned(byte[] buffer) {
    14.         sort(buffer, BYTE_UNSIGNED_MIN_VALUE, BYTE_UNSIGNED_MAX_VALUE);
    15.     }
    16.  
    17.     private static void sort(byte[] buffer, int fromValue, int toValue) {
    18.         if (buffer == null) { return; }
    19.  
    20.         int length = buffer.length;
    21.         if (length == 0) { return; }
    22.  
    23.         int[] model = new int[BYTE_MODEL_SIZE];
    24.  
    25.         for (int i = 0; i < length; i++) {
    26.             model[buffer[i] & BYTE_MASK]++;
    27.         }
    28.  
    29.         int index = 0;
    30.  
    31.         for (int i = fromValue; i <= toValue; i++) {
    32.             int toIndex = index + model[i & BYTE_MASK];
    33.  
    34.             while (index < toIndex) {
    35.                 buffer[index] = (byte)i;
    36.                 index++;
    37.             }
    38.         }
    39.     }
    40. }
    * This source code was highlighted with Source Code Highlighter.

    Measurement results for an array of 10 megabytes:

    Arrays.sort (byte []):
    - an array filled with random values: 0.703 sec
    - an array filled with already sorted values: 0.231 sec
    - an array filled with only zeros: 0.060 sec

    ByteArraySort.sort (byte [ ]):
    - an array filled with random values: 0.032 sec
    - an array filled with already sorted values: 0. 032 sec
    - an array filled with only zeros: 0.047 sec

    Each test was run 100 times, the minimum value was taken (the average differs from the minimum only a few milliseconds, so there is an error in th case can be neglected).

    The benefits of sorting by counting are clearly visible.

    Note 1. This type of sorting is inefficient for small arrays, that is, if you need to sort, say, less than 1000 elements (the value 1000 is obtained experimentally, but if anyone has the time and desire, it can be calculated mathematically, more accurately), then it’s better to use other types of sorting. If the array contains more than 1000 elements, then counting sorting has practically no competitors.

    Note 2. This algorithm uses additional memory for the model, namely 256 * 4 = 1024 bytes. Although this is quite a bit, but there is still the potential to receive an OutOfMemoryError, it is advisable to take this into account.

    If someone offers a better option, says what can be improved, or points out possible flaws, I will be very grateful.

    Also popular now: