10,000,000,000,000,000 bytes archived
On October 25, activists and employees of the Internet Archive held a ceremony to mark a milestone: the archive had exceeded 10 petabytes (10^16 bytes). Thanks to this archive and its Wayback Machine, we can see what famous sites looked like many years ago, find saved copies of web pages, or simply restore your site from a “free backup”.
The Internet Archive has made an 80-terabyte sample of its 2011 web crawl available to everyone for research. The WARC files contain about 2.7 billion URIs and include all text content along with everything else that was saved: images, video, Flash, and so on.
Start date: March 09, 2011
End date: December 23, 2011
Number of unique URLs: 2,273,840,159
Number of hosts: 29,032,069
The Heritrix spider first downloaded the 1 million most popular sites according to Alexa (Habr was already among them) and then followed the links.
Another interesting fact was announced at the ceremony: for the first time, the entire literary heritage of a whole nation has been digitized and uploaded to the Internet. That nation is the Balinese. The ceremony was opened by the legendary computer scientist Donald Knuth, who honored the Internet Archive by playing the organ.
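The crawl order described above (start from a fixed list of popular sites, then follow the links found on them) behaves essentially like a breadth-first traversal seeded with that list. The sketch below illustrates only that idea, under simplifying assumptions: `seeded_crawl`, `get_links` and the toy link graph are hypothetical stand-ins, and the real Heritrix crawler is far more elaborate (politeness rules, crawl scope, deduplication, and so on).

```python
from collections import deque


def seeded_crawl(seeds, get_links, max_pages):
    """Hypothetical seeded breadth-first crawl.

    All seed URLs are queued first, so they are fetched before any page
    discovered by following links. get_links(url) stands in for
    "download the page and extract its outgoing links".
    """
    seen = set(seeds)          # URLs already queued, to avoid re-crawling
    frontier = deque(seeds)    # FIFO queue gives breadth-first order
    fetched = []
    while frontier and len(fetched) < max_pages:
        url = frontier.popleft()
        fetched.append(url)    # stand-in for saving the page to the archive
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return fetched


# Toy link graph (hypothetical): the seeds play the role of the top sites.
toy_links = {
    "a.com": ["x.org", "b.com"],
    "b.com": ["y.net"],
    "x.org": ["z.io"],
    "y.net": [],
    "z.io": [],
}

if __name__ == "__main__":
    print(seeded_crawl(["a.com", "b.com"], lambda u: toy_links.get(u, []), max_pages=10))
```

With the toy graph, both seeds are fetched before `x.org`, `y.net` and `z.io`, which are reached only by following links. A FIFO queue is used because it keeps the seed sites at the front of the crawl; a real crawler would typically also split the frontier by host so that no single server is overloaded.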