Budget petabytes: How to build cheap cloud storage. Part 2
Translation. This is a continuation; for the beginning, see part 1.
Connecting wires: How to assemble a Backblaze storage container
The power wiring diagram for the Backblaze storage container is shown below. Power supply units (PSUs) deliver most of their power at two voltages: 5 V and 12 V. We use two PSUs in the container because 45 drives draw a lot of 5 V power, while high-wattage ATX PSUs deliver most of their power on the 12 V rail. This is no coincidence: ATX PSUs rated at 1500 W and above are designed for powerful 3D video cards that need extra power on the 12 V rail. A single server-grade PSU might be preferable, but two ATX PSUs are cheaper.
PSU1 feeds the 3 front fans and port multiplier panels 1, 2, 3, 4, and 7. PSU2 feeds everything else. (For a detailed list of the special connectors on each PSU, see Appendix A.) To power the port multiplier panels, the power cables run from the PSU through 4 holes in the metal divider plate that holds the fans in the center of the case (near the base of the fans) and then to the underside of the 9 panels. Each port multiplier panel has 2 male Molex connectors on its underside. Hard drives draw the most power during initial spin-up, so if you turn on both PSUs at the same time, there will be a large (14 A) current spike on the 120 V outlet. We recommend that you first turn on PSU1, wait until the drives spin up (and the power consumption drops to reasonable values), and then turn on PSU2.
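The split of drives between the two PSUs follows from the panel assignment above. The sketch below is only an illustration: it assumes the 9 port multiplier panels are numbered 1 through 9 and that each carries 5 drives, so that "everything else" on the second PSU includes panels 5, 6, 8, and 9. The names `PSU1_PANELS` and `drives_per_psu` are made up for this example.

```python
# Assumption: panels are numbered 1..9 and each holds 5 drives.
DRIVES_PER_PANEL = 5
PANELS = range(1, 10)            # 9 port multiplier panels
PSU1_PANELS = {1, 2, 3, 4, 7}    # panels fed by the first PSU, per the text


def drives_per_psu():
    """Count how many drives each PSU has to spin up."""
    psu1 = sum(DRIVES_PER_PANEL for p in PANELS if p in PSU1_PANELS)
    psu2 = sum(DRIVES_PER_PANEL for p in PANELS if p not in PSU1_PANELS)
    return {"PSU1": psu1, "PSU2": psu2}


print(drives_per_psu())  # {'PSU1': 25, 'PSU2': 20}
```

Under these assumptions the first PSU spins up 25 drives and the second 20 (plus whatever else it feeds), which is why switching them on one after the other spreads the spin-up load over time.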
Below is a photo of a partially assembled Backblaze storage container. The metal case at the bottom has screws facing upwards, onto which we fit nylon standoffs (the small white things in the photo below). Nylon helps dampen vibration, which is a critical aspect of server design. The boards shown above the nylon standoffs are several of the 9 SATA port multiplier panels. Each has 1 SATA connector on the bottom, and 5 hard drives plug in vertically on top. All power and SATA cables run under the port multiplier panels. One of the panels in the photo below is completely filled with hard drives to show placement.
Note about drive vibration: the drives vibrate too much if you leave them standing as shown in the photo above, so we wrap an “anti-vibration sleeve” (essentially a rubber band) around each hard drive, between the red metal grid and the drive. This holds the drive tightly in the rubber. We also lay a large (40 cm x 42 cm x 3 mm) piece of foam across the top of the hard drives after all 45 are inserted into the case. The cover is then screwed down on top of the foam to hold the drives securely. In the future, we will devote a whole blog post to vibration.
Below is the wiring diagram for SATA cables.
4 SATA cards are installed in the Intel motherboard: 3 two-port SYBA cards and 1 four-port Addonics card. 9 SATA cables attach to the top of the SATA cards and run alongside the power cables. All 9 SATA cables are 91 cm long, with latching L-shaped connectors on the port multiplier end and straight connectors without latches on the SATA card end.
Note about SATA chipsets: each port multiplier panel contains a Silicon Image SiI3726 chip, which lets 5 drives connect to 1 SATA port. Each SYBA 2-port PCIe SATA card contains a Silicon Image SiI3132, and the Addonics 4-port PCI card contains a Silicon Image SiI3124. We use only 3 of the 4 available ports on the Addonics card, because we have only 9 port multiplier panels. We do not use the SATA ports on the motherboard because, despite Intel's claim that its ICH10 south bridge supports port multipliers, we saw strange results in our performance tests. Silicon Image pioneered port multiplier technology, and its chips work best together.
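The SATA fan-out described above can be checked with a few lines of arithmetic. This is just a sketch of the topology as described in the text; the `SataCard` type and `topology_summary` function are hypothetical names introduced for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SataCard:
    name: str
    chip: str
    ports_total: int
    ports_used: int


# The four SATA cards as described in the text.
CARDS = [
    SataCard("SYBA #1", "SiI3132", 2, 2),
    SataCard("SYBA #2", "SiI3132", 2, 2),
    SataCard("SYBA #3", "SiI3132", 2, 2),
    SataCard("Addonics", "SiI3124", 4, 3),  # one port left unused
]
DRIVES_PER_MULTIPLIER = 5  # one SiI3726 port multiplier panel per host port


def topology_summary(cards):
    """Each used host port drives one port multiplier panel with 5 drives."""
    ports_used = sum(card.ports_used for card in cards)
    return {
        "host_ports_used": ports_used,
        "panels": ports_used,
        "drives": ports_used * DRIVES_PER_MULTIPLIER,
    }


print(topology_summary(CARDS))
# {'host_ports_used': 9, 'panels': 9, 'drives': 45}
```

Nine used host ports, each fanned out five ways, account for all 45 drives without touching the motherboard's own SATA ports.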
Backblaze storage container runs on free software
The Backblaze storage container is not a complete building block until it boots up and is online. The containers run 64-bit Debian 4 Linux with the JFS file system, and they are self-contained appliances: all access to and from them goes through HTTPS. The layer diagram is shown below.
Starting at the bottom, there are 45 hard drives exposed through the SATA controllers. We then use the Linux fdisk utility to create 1 partition per drive. Above this, we combine each group of 15 hard drives into 1 RAID6 volume with 2 parity drives (out of the 15). The RAID6 volume is created with the mdadm utility. The JFS file system sits on top of this, and the only access we allow to this fully self-contained storage building block is through HTTPS, served by a custom layer of Backblaze application logic running in Apache Tomcat 5.5. Taking all this into account, the formatted (usable) space is 87% of the raw capacity of the hard drives. One of the most important points here is that every read and write of data on the Backblaze storage container goes through HTTPS only. There is no iSCSI, no NFS, no SQL, and no Fibre Channel.
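The 87% figure can be roughly reproduced from the RAID6 layout alone. The sketch below assumes the 45 drives are split into 3 arrays of 15 (inferred from the numbers in the text) and counts only parity overhead; it does not model JFS metadata or partitioning overhead. The function name `usable_fraction` is made up for this example.

```python
TOTAL_DRIVES = 45
DRIVES_PER_ARRAY = 15   # one RAID6 volume per 15 drives
PARITY_PER_ARRAY = 2    # RAID6 spends two drives' worth of space on parity


def usable_fraction(total, per_array, parity):
    """Return (array count, data drives, usable share of raw capacity)."""
    if total % per_array:
        raise ValueError("drives do not divide evenly into arrays")
    arrays = total // per_array
    data_drives = arrays * (per_array - parity)
    return arrays, data_drives, data_drives / total


arrays, data_drives, fraction = usable_fraction(
    TOTAL_DRIVES, DRIVES_PER_ARRAY, PARITY_PER_ARRAY
)
print(f"{arrays} arrays, {data_drives} data drives, {fraction:.1%} usable")
# 3 arrays, 39 data drives, 86.7% usable
```

Parity alone leaves 39 of 45 drives' worth of space, about 86.7%, which is consistent with the 87% quoted above.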
Backblaze Storage Container - Building Block
We are extremely pleased with the reliability and excellent performance of the containers, and the Backblaze storage container is a fully self-contained storage server. But the logic of where to store the data and how to encrypt, index, and deduplicate it lives at a higher level (beyond the scope of this post). When you run a datacenter with thousands of hard drives, processors, motherboards, and power supplies, you will have hardware failures; this is unavoidable. Backblaze storage containers are building blocks from which you can construct a larger system that has no single point of failure. Each container by itself is only a large piece of raw storage at a low price; on its own it is not yet a "solution."
Cloud Storage: Next Step
The first step in building cheap cloud storage is to have cheap storage, and above we showed how to build your own. If all you need is cheap storage, that is enough. If you need to build a cloud, there is more work to do.
Building a cloud involves not only installing a lot of hardware but, just as importantly, deploying software to manage that hardware. At Backblaze, we developed software that deduplicates data and “slices” it into blocks; encrypts and transmits it for backup; reassembles, decrypts, restores the deduplicated blocks, and packages the data for recovery; and, finally, monitors and manages the entire cloud storage system. This process is our own technology, which we have been developing for years.
You may have your own system for this process and want to adopt the Backblaze storage container design, or you may just be looking for inexpensive storage that will not be part of a cloud. In both cases, you are free to use the storage container design described above. If you do, we would appreciate a link to Backblaze and welcome any insights, although this is not required. Please note that since we do not sell the design or the storage containers themselves, we provide no support or guarantees.
Coming up in this series: in a few weeks we’ll talk about iPhone vibration sensors, Swiss cheese-like container designs, why electricity costs more than bandwidth, and more about the design of large cloud storage.
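To make the idea of slicing data into blocks and eliminating duplicates a little more concrete, here is a generic, minimal sketch. It is not Backblaze's software, which is proprietary and not described in this post: it assumes fixed-size blocks identified by their SHA-256 digest, uses a tiny block size for illustration, and leaves out encryption and transmission entirely. All names (`slice_blocks`, `deduplicate`, `reassemble`) are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


def slice_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Cut the data into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def deduplicate(data: bytes, block_size: int = BLOCK_SIZE):
    """Store each distinct block once; keep an ordered list of digests."""
    store = {}    # digest -> block contents
    recipe = []   # digests in original order, used to rebuild the data
    for block in slice_blocks(data, block_size):
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)
        recipe.append(digest)
    return store, recipe


def reassemble(store, recipe) -> bytes:
    """Rebuild the original data from the stored blocks."""
    return b"".join(store[digest] for digest in recipe)


payload = b"abcdabcdabcdwxyz"
store, recipe = deduplicate(payload)
assert reassemble(store, recipe) == payload
print(len(recipe), "blocks referenced,", len(store), "blocks stored")
# 4 blocks referenced, 2 blocks stored
```

In this toy run, three identical blocks are stored once, and the ordered list of digests is enough to restore the original data.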
Acknowledgments. We stand on the shoulders of giants.
The design of the Backblaze storage container would not have been possible without an enormous amount of help (usually requested bluntly) from the incredibly smart and generous people who answered our questions, worked with us, and provided key clarifications at critical moments. First, we thank Chris Robertson for the inspiration to build our own storage and for his early work on prototypes; Kurt Schaefer for advice on metalworking and the concept of "furniture" for printed circuit boards; Dominic Giampaolo of Apple Computer for his advice on hard drives, vibration, and certification; Stuart Cheshire of Apple Computer and Nick Tingle of Alcatel-Lucent for tips on low-level networking; Aaron Emigh (EVP & GM, Core Technology) of Six Apart for his help with the initial design; Gary Orenstein for his insight into drive reliability and the storage industry in general; Jonathan Beck for invaluable advice on vibration, fans, cooling, and case design; Steve Smith (Senior Design Manager), Imran Pasha (Director of Software Engineering), and Alex Chervet (Director of Strategic Marketing) of Silicon Image, who helped us debug problems with the SATA protocol and loaned us 10 different SATA cards for testing; James Lee of Chyang Fun Industries in Taiwan for developing SATA boards that simplified our design; Wes Slimick, Richard Crockett, Don Shields, and Robert Knowles of Western Digital for their help debugging Western Digital drive logs; Christa Carey, Jennifer Hurd, and Shirley Evely of Protocase for suggesting hundreds of small improvements to the 3-D case design; Chester Yeung of Central Computer for delivering locally sourced parts quickly and reliably when it really mattered; Mason Lee of Zippy for tips on power supplies and special cables; and Angela Lai for knowing all the right people and making the right introductions.
Finally, we thank the thousands of engineers who put in millions of hours of work so that the container's components are either cheap or completely free: the Intel processor, Gigabit Ethernet, amazingly dense hard drives, Linux, Tomcat, JFS, and so on. We are aware that we stand on the shoulders of giants.
Appendix A. A detailed list of Backblaze storage container components.
From the translator: Appendix A is laid out as a neat table that I cannot reproduce with Habr’s tools. Besides, translating “760 Watt Power Supply”, “Qty”, “Price”, “Total”, and “SATA II Cable” into Russian would, in my opinion, be overkill. Please see the original appendix at the end of the original English post.