Published on November 24, 2008

Sorting a petabyte of data took 6 hours 2 minutes.


    Google conducted an experiment to sort 1 PB of data using the MapReduce framework. The data consisted of 10 trillion records of 100 bytes each. The sort ran on 4,000 computers and finished in 6 hours 2 minutes, an unprecedented volume of data for this kind of task.
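
    The figures above can be checked with a little arithmetic. The sketch below is only a back-of-envelope estimate: it assumes decimal units (1 PB = 10^15 bytes) and an even split of work across machines, and the variable names are illustrative.

```python
# Back-of-envelope check of the quoted figures (decimal units assumed).
RECORDS = 10 * 10**12            # 10 trillion records
RECORD_BYTES = 100               # bytes per record
MACHINES = 4_000                 # computers used for the sort
ELAPSED_S = 6 * 3600 + 2 * 60    # 6 hours 2 minutes, in seconds

total_bytes = RECORDS * RECORD_BYTES           # should come to 1 PB
aggregate_rate = total_bytes / ELAPSED_S       # bytes/s across the whole cluster
per_machine_rate = aggregate_rate / MACHINES   # bytes/s per computer (even split assumed)

print(f"total: {total_bytes / 10**15:.0f} PB")
print(f"aggregate: {aggregate_rate / 10**9:.1f} GB/s")
print(f"per machine: {per_machine_rate / 10**6:.1f} MB/s")
```

    Under these assumptions the cluster sorted about 46 GB per second overall, or roughly 11.5 MB per second per computer.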

    During the experiment, Google employees had to solve the problem of storing 1 PB of data. In every sorting run, at least one of the 48,000 hard drives in use failed. As a result, they instructed the Google File System to keep three copies of each file on different hard drives.
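
    To give a sense of scale for the storage side, here is a rough sketch. It assumes the full 1 PB is kept in three copies and spread evenly over all 48,000 drives, which the article does not state explicitly, and it reuses the 4,000 computers mentioned earlier.

```python
# Rough storage estimate; even distribution over drives is an assumption.
DATA_BYTES = 10**15   # 1 PB, decimal units assumed
REPLICAS = 3          # three copies of each file
DRIVES = 48_000       # hard drives in use
MACHINES = 4_000      # computers in the cluster

drives_per_machine = DRIVES // MACHINES   # drives attached to each computer
stored_bytes = DATA_BYTES * REPLICAS      # total bytes written with replication
per_drive_bytes = stored_bytes / DRIVES   # average load per drive

print(f"drives per machine: {drives_per_machine}")
print(f"stored with replication: {stored_bytes / 10**15:.0f} PB")
print(f"average per drive: {per_drive_bytes / 10**9:.1f} GB")
```

    On these assumptions each computer has 12 drives, and three-way replication puts about 62.5 GB on each drive.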

    Sorting a smaller dataset, 1 TB, on 1,000 computers took 68 seconds. With this result Google broke the previous record for a similar amount of data, which stood at 209 seconds on 910 computers.
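
    The two results can be compared per machine as well as by wall-clock time. This sketch assumes that both runs sorted roughly 1 TB (10^12 bytes) and that work was split evenly across computers; the helper name is hypothetical.

```python
# Compare the two runs; ~1 TB in both cases is an assumption.
DATA_BYTES = 10**12  # roughly 1 TB, decimal units

def per_machine_rate(seconds: float, machines: int) -> float:
    """Average bytes sorted per second by one computer."""
    return DATA_BYTES / seconds / machines

google_rate = per_machine_rate(68, 1_000)    # Google's run
previous_rate = per_machine_rate(209, 910)   # previous record

wall_clock_speedup = 209 / 68                       # how much sooner the sort finished
per_machine_speedup = google_rate / previous_rate   # after adjusting for cluster size

print(f"Google: {google_rate / 10**6:.1f} MB/s per machine")
print(f"previous: {previous_rate / 10**6:.1f} MB/s per machine")
print(f"wall-clock speedup: {wall_clock_speedup:.2f}x")
print(f"per-machine speedup: {per_machine_speedup:.2f}x")
```

    Under these assumptions the new run finished about three times sooner, and each computer did a little under three times as much work per second.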

    For comparison, the total volume of photos stored on Facebook is 1 PB, the Large Hadron Collider will produce 15 PB of data per year, and Google processes about 20 PB of data per day.