PetaBox, or where the Internet Archive (archive.org) lives

Not so long ago, on October 25, 2012, the Internet Archive (archive.org) announced that the volume of sites it had archived from the Internet had exceeded 10 petabytes (10,240 terabytes). But how and where is all of this stored?

This short review covers some of the details and shows the storage facility itself. Since Habrastorage is temporarily down, we had to upload the images to the ua-hosting.com.ua server. We hope it holds up under the load; if it does not, please bear with us, and we will re-upload the images later as needed :)

image

To store this much data, PetaBox was developed specifically for the Internet Archive. PetaBox is a storage system from Capricorn Technologies, designed by Internet Archive staff and C. R. Saikley to store and process 1 petabyte of information.

image

Specification:

- Capacity: 650 terabytes / rack;
- Power consumption: 6 kW / petabyte;
- No air conditioning: excess heat is used to heat the building instead.

image

Infrastructure used as of December 2010:

- 4 data centers, 1300 nodes, 11,000 hard drives;
- Wayback Machine: 2.4 petabytes;
- Books / videos / music in the collection: 1.7 petabytes;
- Total storage: 5.8 petabytes.
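
The figures in this list can be put in proportion with a little arithmetic. The sketch below is only an illustration: it assumes data is spread evenly across nodes, uses the article's own convention of 1 PB = 1,024 TB, and treats whatever is left after the two named collections as "other", a label the source does not use.

```python
# Figures quoted in the list above (December 2010).
NODES = 1300
DRIVES = 11_000
WEB_ARCHIVE_PB = 2.4   # archived web pages
MEDIA_PB = 1.7         # books / videos / music
TOTAL_PB = 5.8

TB_PER_PB = 1024       # same convention as "10 petabytes (10,240 terabytes)"


def summarize():
    """Derive a few averages; assumes an even spread across nodes."""
    # Storage not itemized in the list (hypothetical "other" bucket).
    other_pb = round(TOTAL_PB - WEB_ARCHIVE_PB - MEDIA_PB, 1)
    return {
        "drives_per_node": round(DRIVES / NODES, 1),
        "tb_per_node": round(TOTAL_PB * TB_PER_PB / NODES, 2),
        "web_share_pct": round(100 * WEB_ARCHIVE_PB / TOTAL_PB, 1),
        "other_pb": other_pb,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Under these assumptions each node carries about 8.5 drives and roughly 4.57 TB of stored data, and the archived web pages make up about 41% of the total.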

image

History of creation


PetaBox (tm) was designed by Internet Archive staff specifically for the secure storage and processing of 1 petabyte of information. The design goals were as follows:

- Low power consumption: 6 kW per rack, 60 kW for the entire storage cluster;
- High "density" of data placement: 100+ TB / rack;
- Data processing on local computers (800 low-end PCs);
- Ability to run multiple operating systems;
- Ability to fit in standard 19" cabinets / racks;
- Ability to fit in a 20x8x8-foot shipping container;
- Ease of maintenance: one system administrator / petabyte;
- Software for automation of full backup (mirroring);
- Easy to scale;
- Inexpensive design;
- Low cost of storage.
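
The power and density goals are consistent with each other, which a short calculation makes visible. This is a rough sketch, not the designers' own sizing: it takes "100+ TB / rack" at its lower bound and assumes that both the power budget and the 800 PCs are divided evenly among racks.

```python
# Design goals quoted in the list above.
RACK_POWER_KW = 6
CLUSTER_POWER_KW = 60
RACK_CAPACITY_TB = 100   # lower bound of "100+ TB / rack"
LOCAL_PCS = 800


def design_targets():
    """Back-of-the-envelope sizing implied by the stated goals."""
    racks = CLUSTER_POWER_KW // RACK_POWER_KW      # racks the power budget allows
    cluster_tb = racks * RACK_CAPACITY_TB          # total capacity at the lower bound
    return {
        "racks": racks,
        "cluster_tb": cluster_tb,
        "pcs_per_rack": LOCAL_PCS / racks,         # assumes an even split
        "watts_per_tb": CLUSTER_POWER_KW * 1000 / cluster_tb,
    }


if __name__ == "__main__":
    print(design_targets())
```

Ten racks at 100 TB each give about 1,000 TB, which matches the 1 petabyte target, at roughly 60 W per terabyte and 80 PCs per rack.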

image

History


The first 100 TB rack went into operation at the European Archive in June 2004. A second, 80 TB rack was launched in San Francisco the same year. The Internet Archive then founded Capricorn Technologies, a company dedicated exclusively to developing and deploying PetaBox.

image

Between 2004 and 2007, Capricorn Technologies built PetaBox systems for major academic institutions, government agencies, and other businesses. Its largest product used 750-gigabyte drives. In 2007, the Internet Archive's data center stored about 3 petabytes of information using PetaBox technology.

The fourth version of PetaBox is now in use. Its main specifications: 24 disks per 4U unit, 10 such units per rack, all running Ubuntu, for a total of 240 disks of 2 TB each in one rack.
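
The per-rack totals follow directly from those numbers. The sketch below just multiplies them out; the result is raw capacity, before any mirroring or filesystem overhead, which the source does not quantify.

```python
# Fourth-generation PetaBox figures quoted above.
DISKS_PER_UNIT = 24
UNIT_HEIGHT_U = 4
UNITS_PER_RACK = 10
DISK_TB = 2


def rack_summary():
    """Raw per-rack totals; ignores redundancy and formatting overhead."""
    disks = DISKS_PER_UNIT * UNITS_PER_RACK
    return {
        "disks": disks,                                # 240
        "raw_tb": disks * DISK_TB,                     # 480
        "rack_units": UNIT_HEIGHT_U * UNITS_PER_RACK,  # 40
    }


if __name__ == "__main__":
    print(rack_summary())
```

That is 480 TB of raw disk space in 40U of rack height.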

image

The Internet Archive in a container


Finally, it is worth mentioning the shipping container that Sun developed for the Internet Archive. A 20x8x8-foot container can hold the equivalent of the entire US Library of Congress 55 times over!

