"Soft + box server" or a complete solution?

Original author: Chad Sakac

Last week I had a discussion with Japanese partners about software-defined storage. We talked about what EMC is doing in this area, and they shared their thoughts on what partners should do. Interestingly, they were entirely focused on the economic models of the "software + general-purpose server" bundle. They even seemed to find differences between these models where I saw none.

And the week before, when I was in Australia, I had many conversations with customers about Hadoop deployment scenarios, in particular about when it makes sense to use Isilon for them. All the customers thought the same way: take a distribution and install it on commodity servers. At first they could not accept the idea that a solution based on Isilon could be better, faster, and cheaper. But in the end they came around.

... And that same week, at the VMUG conference in Sydney, I had interesting discussions about VSAN, ScaleIO, and the Nutanix and SolidFire appliances as compared with purely software solutions.

The sticking point in all these conversations was the same: people find it hard to accept one VERY, VERY simple thing:

Storage systems are (for the most part) built according to the "software + commodity server" scheme. They are simply PACKAGED and SOLD as integrated hardware-software appliances.

Note: "for the most part" means there are categories and architectures that include either unique hardware, or standard hardware used in such a specific way (for example, connecting XtremIO or VMAX3 nodes over InfiniBand) that separating the software from the hardware makes no sense. (By my classification, these are "Type Two" architectures: tightly coupled, horizontally scalable clusters.)

People are fundamentally illogical ... KHAAAN!!! (Here's to you, Leonard Nimoy: live long and prosper in our hearts!)

Ah, people. We are hung up on appearances, fixated on the physical form of things. It is hard for us to think about system architecture in terms of how systems FUNCTION rather than how they are PACKAGED.

Let me illustrate what I mean. Keep reading, and you will see the reality!

Suppose I need to deploy a Hadoop cluster, five petabytes or so. For this I have to think through complex networking, compute infrastructure, and staging (staging here means loading data into the Hadoop cluster from the environment where it was generated or stored; translator's note).

If I were like most people, I would take a distribution, plus servers and network components. I would probably start with a small cluster and then grow it. I would probably use rack-mounted servers, and for a rough estimate of cluster size I would use the standard disks-per-server ratios. And I see that many do exactly this.
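The sizing exercise above can be sketched as a few lines of arithmetic. This is a back-of-the-envelope illustration only; the disk size and disks-per-server figures below are assumptions I picked for the example, not vendor numbers.

```python
# Rough sizing of a ~5 PB Hadoop cluster built the "common" way:
# commodity rack servers with local disks and default HDFS replication.
# All parameters here are illustrative assumptions.

USABLE_DATA_PB = 5        # target usable capacity from the article
HDFS_REPLICATION = 3      # HDFS default replication factor
DISK_TB = 8               # assumed capacity of one local drive
DISKS_PER_SERVER = 12     # typical 2U rack-server drive count

raw_pb = USABLE_DATA_PB * HDFS_REPLICATION          # raw capacity needed
raw_tb = raw_pb * 1000
servers = -(-raw_tb // (DISK_TB * DISKS_PER_SERVER))  # ceiling division

print(f"Raw capacity needed: {raw_pb} PB")   # 15 PB
print(f"Servers needed:      {servers}")     # 157
```

Even this crude estimate shows why the "just add rack servers" approach balloons: triple replication turns 5 PB of data into 15 PB of raw disk before you buy a single switch.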

Now, with that scenario in mind, imagine someone told you: "You know... if you 1) virtualized the Hadoop nodes with vSphere and Big Data Extensions (not for consolidation, but for better manageability); 2) used Cisco UCS instead of rack servers; and 3) instead of stuffing packs of disks into rack servers, used EMC Isilon; if you did these three things, the solution would be twice as fast, twice as compact, and half the total cost of ownership." Well, if someone told you that, you would decide they were dead drunk.

But it turns out the statement above is true.

Of course, not for everyone. But it is true for an interesting case, examined at the link: a real customer whose Hadoop cluster has grown to 5 petabytes. The results at the link come from their own testing (thanks to Dan Beresl and Chris Birdwell for sharing). The customer in question is a huge telecommunications company, and detailed test results, including performance testing, are available to you.


And now an interesting observation: it is hard for many to think of Isilon as HDFS storage (HDFS is the distributed file system within Hadoop; translator's note), because it "looks" like an appliance rather than standard general-purpose servers running the necessary software.

Customers starting out with Hadoop often think: "I need a standard hardware platform."

However, DO NOT judge a book by its cover. "A rose by any other name would smell as sweet."

Take a look at these pictures. Looking at them, you probably think: "standard industry servers." ... Now let me show you their front side:




Looking at these, you probably think: "an appliance."

But in both cases you are looking at the same thing .

Software called OneFS is what gives Isilon its power. Isilon ships as an appliance because that is how customers want to buy it (and have it supported).

In the HDFS case examined at the link, the virtualized configuration based on UCS and Isilon became so fast (2x faster!), so compact (2x more compact!), and so economical (2x cheaper!) thanks to Isilon's software features:

  1. High data availability both within a rack and across racks, without triple data replication. That is SOFTWARE.
  2. No staging, because the same data is accessible both via NFS and via HDFS. That is also SOFTWARE.
  3. Rich capabilities for snapshots and replicas of HDFS objects. And that, too, is SOFTWARE.
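Point 1 is where most of the capacity savings come from, and it is easy to see why with a little arithmetic. OneFS protects data with erasure-coding-style schemes rather than full copies; the exact protection level varies by configuration, so the 16+2 stripe below is an assumption for illustration, not a statement of what this customer ran.

```python
# Illustrative comparison: raw capacity needed to protect 5 PB of data
# under classic HDFS 3x replication vs. an assumed 16+2 erasure-coding
# stripe (erasure coding in the spirit of OneFS protection; the exact
# scheme is configurable and this stripe width is a hypothetical).

DATA_PB = 5

replication_raw = DATA_PB * 3                  # three full copies
ec_data, ec_parity = 16, 2                     # assumed stripe geometry
ec_raw = DATA_PB * (ec_data + ec_parity) / ec_data

print(f"3x replication raw: {replication_raw:.3f} PB")  # 15.000 PB
print(f"Erasure coding raw: {ec_raw:.3f} PB")           # 5.625 PB
```

The gap between 15 PB and roughly 5.6 PB of raw disk for the same 5 PB of data is exactly the kind of "magic" that lives in software, not hardware.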

... in the telecom customer's case there was no magic on the hardware side (although UCS did bring some benefit: denser packing of compute, since dropping local drives made it possible to replace rack servers with blades).

There is a funny side to this whole story. If we told our customers, "Here is a software implementation of HDFS with all the properties above; install it on your own hardware," customers would be even more willing to use OneFS. They would not have to deal with the headaches that local drives bring, and scaling HDFS would be less painful for them.

So why aren't we doing this? The answer is shown in the diagram below:


I constantly have this strange dialogue with customers. They start with "I don't want an appliance" (fine, "here's ScaleIO/VSAN and Isilon as software"), and then end up asking for an appliance :-) It is as predictable as the sun rising in the east. Dialogues like these are why SolidFire recently announced a software version of their product (I bet its sales will be low compared to their appliance), and why Nutanix started with a purely software product (and ended up with appliances). For the same reasons, a successful VSAN launch required special VSAN Ready Nodes, and I suspect that a very substantial share of VSAN consumption will soon go through integrated solutions like VSPEX Blue. That use case will certainly be more popular than "install VSAN on your own hardware." The same goes for ScaleIO.


The answer comes down to this:

  1. Many do not understand that the economics of software and appliance solutions are determined by the benefit to the customer and their business model, not by the cost of the hardware. People realize this when they start building complete solutions with their own hands. Try building a highly available, fault-tolerant Nexenta cluster yourself: the price will match a VNX or NetApp FAS configuration. Try deploying Gluster with Red Hat enterprise support: you will pay as much as for Isilon. Try building a Ceph cluster (again, with enterprise support): you get the price of ECS. Conclusion: the hardware has nothing to do with it (at least not for "third" and "fourth" type architectures; see the link above to Chad Sakac's architecture classification; translator's note).
  2. Even the largest companies usually do not have an internal "bare metal as a service" function: a team responsible for defining commodity hardware standards and support options, and for maintaining an abstract hardware model on which all kinds of software can be deployed. Google, Amazon, and Facebook have such a "bare metal as a service" function; most others do not. Conclusion: support models gravitate toward integrated and converged solutions.

However, don't get me wrong: the idea of hardware-independent software is still important.

Such software can be used in many different ways. If the software exists separately from the hardware, you can get it without the "extra weight" in order to learn, experiment, play, or even use it in production, just without support. One way or another, someone will have to pay for support (whether the source is open or closed).

For all these reasons and many others, all EMC software will gradually become available "without the hardware," and in some cases possibly even as open source.

By the way, none of the above means there is no room for innovation in hardware. Here is the latest Isilon HD node, with and without its decorative panels:


It is easy to see why it is code-named "Colossus." The drives are mounted vertically, in carriers designed for dense packing: Viking at 120 x 2.5" SSD/HDD and Voyager at 60 x 3.5" HDD. Each Isilon node also needs compute, which is housed in the same 4U chassis. As a result, we get up to 376 TB in one chassis, which is very dense and very cool. This node is aimed at hyper-capacity configurations and archival use.
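As a sanity check on that density figure, here is the arithmetic for the Voyager variant. The bay count comes from the text above; the per-drive capacity is my assumption for drives of that era, so the result is an order-of-magnitude check, not the exact 376 TB configuration.

```python
# Back-of-the-envelope density check for the "Colossus" 4U node.
# Bay count is from the article; the HDD size is an assumption.

CHASSIS_U = 4
VOYAGER_BAYS = 60     # 60 x 3.5" HDD bays, per the text
HDD_TB = 6            # assumed 3.5" HDD capacity (illustrative)

chassis_tb = VOYAGER_BAYS * HDD_TB
print(f"{chassis_tb} TB raw in {CHASSIS_U}U "
      f"({chassis_tb / CHASSIS_U:.0f} TB per rack unit)")
```

With 6 TB drives this lands at 360 TB per chassis, in the same ballpark as the stated "up to 376 TB"; larger drives or the Viking carrier would shift the number.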

Just look at all this! Software-defined storage is amazing, and the fact that it can be deployed in so many different ways, right down to your own hardware, is simply wonderful. At the same time, I think that for the foreseeable future the preferred consumption models for software-defined storage will remain appliances and converged/hyperconverged infrastructure.

Your thoughts?

(For more on Isilon and HDFS, see also our blog post "Recipe for 'Fast Data' based on a big data solution"; translator's note.)
