Anarchy and data storage: is there a future for SAN?
Translator's note: Not long ago, ZDNet expert Robin Harris published a piece on "why SSDs are out of date" (an adapted version will appear on our blog soon). Another of the publication's experts, Jason Perlow, decided to build on that article and discuss the drawbacks and prospects of networked storage systems (SAN and NAS). He believes the future lies with the approach taken by cloud providers and the use of JBOD.
The material below contains a large number of technical terms whose translation may introduce inaccuracies. If you notice a typo, an error or an inaccuracy in the translation, let us know and we will fix it promptly.

It is important to understand that the SAS and SATA interfaces in use today are nothing more than evolutions of SCSI and ATA, technologies that have been developing for decades. In that time, server technology has moved far ahead, and the demands that data-processing applications place on storage have grown considerably.
Obviously, the industry should be thinking about further modernizing existing storage technologies. However, squeezing more IOPS out of SSDs is largely an academic exercise; from a practical point of view, the total cost of ownership of data storage matters far more.
In large companies the question of how to organize data storage is becoming increasingly acute, especially in regulated industries whose members are required by law to retain data for long periods. On top of that, the information must remain easily accessible, which complicates the use of archiving technologies.
To put it bluntly, SAN and NAS cost a ton of money, accounting for up to a third of all hardware spending in a company's data centers. Yes, the manufacturers of these products have earned an excellent reputation, which is why their systems are trusted with the most critical data, but let's be honest: there is no magic inside those refrigerator-sized boxes full of disks.
Inside is the same SATA and SAS, the same assortment of interfaces and controllers, plus special software responsible for creating logical unit numbers (LUNs). Most of these controllers run proprietary versions of UNIX or even BSD variants, and the user never sees any of it. Once the LUNs are created, a SAN or NAS is a genuine black box to them.
This costly insanity can be brought to an end, but it requires an unconventional, creative approach from large businesses: they need to treat storage the same way that hyperscale providers do. By hyperscale we mean the likes of Amazon Web Services, Microsoft Azure and Google Compute Engine.
Do you think that these companies managed to create cloud storage systems exclusively using EMC and NetApp hardware?
Of course not; it simply would not be feasible. Instead of SAN and NAS, these companies use JBOD, arrays of "Just a Bunch of Disks". With commonly available, affordable hardware, JBOD enclosures and experienced engineers, they have managed to build a far cheaper infrastructure.
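To make the cost argument tangible, here is a minimal back-of-envelope sketch in Python comparing cost per usable terabyte for a bundled array versus JBOD plus commodity servers. Every figure in it (prices, raw capacity, usable fraction) is an illustrative assumption, not a number from this article or from any vendor.

def cost_per_usable_tb(hardware_cost_usd, raw_tb, usable_fraction):
    # Cost per usable TB after RAID/replication and filesystem overhead.
    return hardware_cost_usd / (raw_tb * usable_fraction)

# Hypothetical bundled SAN array: controllers, licensed software, support contract.
san = cost_per_usable_tb(hardware_cost_usd=250_000, raw_tb=200, usable_fraction=0.6)

# Hypothetical JBOD shelves plus commodity x86 file-server heads and 10 GbE NICs.
jbod = cost_per_usable_tb(hardware_cost_usd=80_000, raw_tb=200, usable_fraction=0.6)

print(f"SAN:  ${san:,.0f} per usable TB")   # ~$2,083 with these assumptions
print(f"JBOD: ${jbod:,.0f} per usable TB")  # ~$667 with these assumptions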
There are no standards for building such infrastructures, and the job is further complicated by the long-standing belief that clustered storage requires a fibre-optic (Fibre Channel) fabric, which is still very expensive.
However, thanks to 10 Gigabit and 40 Gigabit Ethernet networks, RDMA-capable Ethernet cards and the arrival of the SMB 3.0 network access protocol, the situation is changing rapidly.
The concept is quite simple: an organization connects a number of file-server "heads" to its existing switched Ethernet infrastructure and attaches multiple JBOD enclosures (from DataOn, Dell or Supermicro, for example) populated with 15K SAS drives and SSDs in a tiered configuration, clustered to those heads over SAS.

These file-server heads are in turn connected to virtualized or physical systems, which access the data over SMB 3.0. The resiliency of such a system is provided by the operating system that manages the storage, not by some secret software baked into the controllers, as is done in SAN and NAS.
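To show what plain SMB 3.0 access looks like from the client side, here is a minimal Python sketch that reads a file from an SMB share using the third-party smbprotocol package (pip install smbprotocol). The server name, share, path and credentials are placeholders invented for the example, not values from the setup described here.

import smbclient  # high-level helper module shipped with the smbprotocol package

# Authenticate once per process against the file-server cluster (placeholder values).
smbclient.register_session("sofs-cluster.example.local",
                           username="storage-admin",
                           password="placeholder-password")

# Open a file on the share over SMB 3.x and read the first 512 bytes as a sanity check.
with smbclient.open_file(r"\\sofs-cluster.example.local\vmstore\disk001.vhdx",
                         mode="rb") as f:
    header = f.read(512)

print(len(header), "bytes read over SMB")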
This scenario uses Microsoft Scale-Out File Server (SoFS), which is built into Windows Server 2012 R2 and relies on the Storage Spaces feature. The hardware here is a DataOn DNS-1660D enclosure combined with Dell R820 rack servers and Mellanox RDMA cards.
The configuration described above is capable of sustaining more than 1 million IOPS. Dell has published a paper on building SoFS JBOD arrays with its PowerVault MD1220. In general, any combination of JBOD enclosures and commodity x86 hardware connected over SAS and 10 Gb/s Ethernet will do the job.
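As a rough sanity check of the million-IOPS figure, the sketch below simply multiplies an assumed per-drive random-read rate by an assumed number of SSDs in the fast tier. Both numbers are illustrative guesses, since the exact drive configuration is not specified here, and in practice the SAS fabric, the RDMA network and the CPUs also cap the result.

# Both values below are assumptions chosen only to illustrate the scaling, not measurements.
ssd_count = 96          # assumed: SSDs spread across several JBOD shelves
iops_per_ssd = 12_000   # assumed: sustained 4K random-read IOPS for one enterprise SAS SSD

aggregate_iops = ssd_count * iops_per_ssd
print(f"Theoretical aggregate: {aggregate_iops:,} IOPS")  # 1,152,000 IOPS with these assumptions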
Microsoft is not the only vendor building architectures around JBOD. There is, for example, Nexenta (based on ZFS from Solaris); on Linux there are HA-LVM and GFS/GFS2, which ship as part of the Red Hat Resilient Storage add-on; the equivalent for Ubuntu Server is called ClusterStack.
The conclusion is that, however reliable and well proven SAN and NAS solutions may be, their hegemony as the way to deliver the highest performance and "resiliency" in storage is coming to an end, and other tools will soon be in widespread use.
Company executives who want to save on storing an ever-growing volume of data may, in the near future, turn to the approach cloud providers already use: JBOD combined with the software-defined storage built into modern server operating systems (there was, by the way, an article about this on Habr), plus cloud-integrated storage (CIS) for applications that are allowed to keep their backups in the cloud.