How to make an excellent piece of hardware even better - an overview of the new EMC VNX storage systems


    EMC storage systems are like a good German car: you know you are overpaying a little for the brand, but data safety and responsiveness are guaranteed. The service matches: a premium warranty that includes, for example, spare-parts delivery within 4 hours, with an engineer dispatched on site if necessary.

    A new line of hardware was released relatively recently; its flagship takes up to 1,500 disks, 6 PB in total. Below is a review of it, with a short primer on storage in general, and the story of how an already very good product was made even better.

    First, about the price


    I know the first question will be about the cost. It is high: expect to pay at least thirty thousand dollars. For that money you get a fully fault-tolerant configuration with rich functionality, three years of support, and the updated software discussed below. In other words, this is not a home system, and not even one for small businesses.

    The most important thing you get for this money: the frightening abbreviations DU/DL (Data Unavailable / Data Loss) stop appearing in reports. That is the worst combination there is for service engineers and customers alike, and reducing the risk of downtime or data loss is something a CIO is prepared to fight for by any means.

    Why do you need hybrid storage?


    First, a little history. Tiered storage combines fast disks with cheap but capacious ones in a single array. Such combinations were possible even back when flash drives were only a dream because of their astronomical cost. Hence the myth that this feature was invented to improve performance. Not at all! Originally, tiering was about saving money: rarely used data goes to slow, cheap disks, so you need fewer fast ones - only enough for the "hot" data.

    Then SSDs became a firm part of our lives, and it was only logical to season our "soup" of fast and slow disks with a little SSD. The fastest, "hottest" data - typically just 5-10 percent of the total volume - now lands on them. The updated VNX systems are the third generation of hybrid arrays, and in my opinion this is where they truly excel.
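
    A minimal sketch of the tiering idea in Python (the tier names and access-count thresholds are made up for illustration; nothing here is VNX-specific): each piece of data lands on the cheapest tier its "temperature" allows.

    ```python
    # Toy tier placement by access frequency (accesses per day, invented).
    TIERS = [("ssd", 1000), ("fast_hdd", 50), ("slow_hdd", 0)]  # min heat

    def place(heat_per_day):
        for tier, min_heat in TIERS:
            if heat_per_day >= min_heat:
                return tier

    for name, heat in [("orders_db", 4000), ("mailboxes", 120), ("archive", 2)]:
        print(f"{name:10s} -> {place(heat)}")
    # orders_db -> ssd, mailboxes -> fast_hdd, archive -> slow_hdd
    ```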

    So, on one side we have solutions built entirely from compute and flash modules. They handle hot data at speeds closer to DRAM than to classic HDDs, but per gigabyte of stored data such systems are very expensive. As a rule, they are used where 10-20 terabytes must be constantly "churned" at high speed - heavily loaded DBMSs or VDI, for example. On the other side, systems built on classic mechanical disks cannot work with data nearly as fast, but let you store much more of it (the cost per stored gigabyte can be an order of magnitude lower).

    In most cases, however, you need both at once: large storage capacity and fast access to "hot" data. A typical example is hosting a DBMS alongside volumes for virtualization, mail, archives, and so on. In these cases you either pair an all-flash array with a classic one, or use mixed (hybrid) storage. Today we are talking about the latter - systems that support both flash drives and classic drives side by side.


    The previous-generation system was already very good. It could work with flash modules and did everything needed for highly loaded applications. It might seem hard to improve on that, with further performance gains possible only through new hardware. But no: by making serious changes to the software side of the platform, the developers achieved a simply huge performance gain.

    New generation


    [Image: these are the things]


    First, the new EMC line is designed for large volumes. The flagship takes up to 1,500 drives. Incidentally, one of our clients could actually use the full 6 PB, since they have that much data, but in practice we have not seen deployments larger than 2 PB.

    Everything works faster. The X-Blades (file servers) now have a larger cache and are better optimized. And yes, let's not forget that access to this storage can be both block (FC, iSCSI, FCoE) and file (CIFS, NFS).

    New block deduplication has been added; more on it and other software details below.

    The hardware has been updated too. The block controllers can now run an all-SSD configuration and support up to 32 cores.

    [Image: block controller architecture]

    The main differences from the previous line (more below):
    • MCx multi-core architecture (R5.33): up to 4x the performance of the previous generation, multi-core cache and FAST Cache management, multi-threaded RAID management
    • Updated FAST VP
    • Symmetric Active/Active LUN mode
    • Virtual RecoverPoint appliance (vRPA)
    • Support for Hyper-V ODX and SMB 3.0

    Hardware




    The controller hardware is new: every model except the flagship uses the new DPE platform. Among the major changes is a different backup power module: the battery backup units are now built into the controller on all models except the flagship. The top model still has external batteries, but now there are 4 of them instead of 2, and replacing them is far more convenient.

    The layout of controllers and shelves is the same as before (the VNX8000 is an exception):

    [Image: controller and shelf layout]

    Now take a look at the drives:

    [Image: drive lineup]

    Flash drives now come in two categories: ultra-fast eMLC drives, used both for cache and for tiered storage, and regular MLC drives, used for tiered storage only.

    Software


    The out-of-the-box software functionality has also grown, though that was to be expected.

    More interesting are the genuinely new software features. I was especially pleased with the new approach to allocating processor resources. Previously, each process was assigned a fixed set of cores and ran only on them, so with uneven load across processes (quite common on real systems) the cores were loaded unevenly: one core could sit idle while another ran close to 100%. The new architecture takes a step toward virtualization - dynamic resource sharing and multi-threaded processing.
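
    To make the difference concrete, here is a toy comparison in Python (the workload numbers are invented; this is a sketch of the scheduling idea, not VNX code): with static allocation, half the cores can finish early and idle, while a shared queue keeps every core busy until all work is done.

    ```python
    # 8 cores, two workloads of very different sizes, one time unit per task.
    RAID_TASKS, REPL_TASKS, CORES = 90, 10, 8

    # Old model: cores statically split between the two processes (4 + 4).
    # The replication cores go idle after 2.5 ticks.
    static_ticks = max(RAID_TASKS / 4, REPL_TASKS / 4)    # 22.5 ticks

    # MCx-style model: all cores pull from one shared queue, so no core
    # sits idle while work remains anywhere in the system.
    dynamic_ticks = (RAID_TASKS + REPL_TASKS) / CORES     # 12.5 ticks

    print(f"static: {static_ticks} ticks, dynamic: {dynamic_ticks} ticks")
    ```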

    The cache has been completely redesigned. Previously, memory was split into two non-overlapping areas: a write cache and a read cache. One handled flushing data to disk in optimally sized blocks; the other prefetched the blocks likely to be read next. On the new arrays the cache is reorganized into a single dynamic cache that handles both jobs at once, without partitioning the memory. This not only uses the memory more efficiently but also dramatically improves performance: on typical loads, according to the manufacturer, cache performance goes up by as much as 5x.
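
    A minimal sketch of the unified-cache idea in Python (an invented structure; the real MCx cache is far more sophisticated): a single page map serves reads and buffers writes, and dirty pages are flushed on eviction instead of living in a separate write buffer.

    ```python
    from collections import OrderedDict

    class UnifiedCache:
        """One LRU page map for both reads and writes (toy model)."""
        def __init__(self, capacity):
            self.pages = OrderedDict()            # page_id -> [data, dirty]
            self.capacity = capacity

        def read(self, page_id, backend):
            if page_id in self.pages:
                self.pages.move_to_end(page_id)   # hit: refresh recency
                return self.pages[page_id][0]
            data = backend.read(page_id)          # miss: fetch from disk
            self._insert(page_id, data, False, backend)
            return data

        def write(self, page_id, data, backend):
            # Writes land in the same map, just marked dirty.
            self._insert(page_id, data, True, backend)

        def _insert(self, page_id, data, dirty, backend):
            self.pages[page_id] = [data, dirty]
            self.pages.move_to_end(page_id)
            while len(self.pages) > self.capacity:        # evict coldest page
                old_id, (old_data, old_dirty) = self.pages.popitem(last=False)
                if old_dirty:
                    backend.write(old_id, old_data)       # flush on eviction
    ```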


    FAST Cache has been redesigned. These are flash disks used as an additional cache layer. It now gets more processor time thanks to the dynamic resource allocation, and it runs on the faster eMLC disks. As a result, the developers could afford more aggressive algorithms, which makes cache warm-up much faster. Along the way, the engineers, together with the authors of the classic cache algorithms, went over the drivers and rewrote them along modular lines, which also boosted throughput and noticeably cut the response time of individual requests.
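
    A sketch of the promotion logic in Python (the three-hit threshold is an illustrative assumption, not the documented VNX value): blocks that keep getting hit on the HDD layer are copied up to the flash cache, and a lower threshold means a more aggressive, faster warm-up.

    ```python
    from collections import Counter

    class FastCacheSim:
        """Toy promotion model: hot HDD blocks get promoted to flash."""
        def __init__(self, promote_after=3):
            self.hits = Counter()            # accesses seen per block
            self.flash = set()               # blocks currently in flash
            self.promote_after = promote_after

        def access(self, block):
            if block in self.flash:
                return "flash"               # fast path
            self.hits[block] += 1
            if self.hits[block] >= self.promote_after:
                self.flash.add(block)        # promote: next access is fast
            return "hdd"

    cache = FastCacheSim(promote_after=3)    # lower value = more aggressive
    print([cache.access("block-42") for _ in range(4)])
    # ['hdd', 'hdd', 'hdd', 'flash']
    ```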

    Fully Automated Storage Tiering for Virtual Pools (FAST VP). This is that very mechanism for combining different disks into a single pool. It too was significantly reworked and integrated more tightly with FAST Cache. Notably, FAST VP granularity has been reduced by a factor of 4: data is now relocated in 256 MB slices instead of 1 GB ones.
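
    A quick illustration of what the finer granularity buys (Python, invented numbers): a small hot region drags far less cold data up to the SSD tier when slices are 256 MB instead of 1 GB.

    ```python
    # A hypothetical LUN with a single 300 MiB hot region.
    HOT_MIB = 300

    def ssd_consumed(slice_mib):
        # Hot data moves up in whole slices, so round up to a full slice.
        slices = -(-HOT_MIB // slice_mib)     # ceiling division
        return slices * slice_mib

    print(ssd_consumed(1024))  # old 1 GiB slices   -> 1024 MiB of SSD
    print(ssd_consumed(256))   # new 256 MiB slices ->  512 MiB of SSD
    ```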

    Here it is probably worth returning to the beginning, where I said that mixing disks by itself does not increase performance. Now that SSDs can serve both as a storage tier and as a cache, we can genuinely talk about speed as well as efficient data storage and processing.


    New block deduplication. The principle is as old as the world: data is split into blocks, and identical blocks are stored only once. Duplicate blocks are replaced with references to the stored copy, much like in archive formats. All of this is heavily tuned for virtual environments.
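
    A minimal sketch of the principle in Python (the fixed 8 KB block size and SHA-256 are illustrative choices, not the VNX internals): every block is hashed, and repeats are stored as a reference to the first copy.

    ```python
    import hashlib

    BLOCK = 8 * 1024  # assumed block size for this example

    def dedup(data: bytes):
        store, layout = {}, []                    # hash -> block, block refs
        for i in range(0, len(data), BLOCK):
            block = data[i:i + BLOCK]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)       # keep the first copy only
            layout.append(digest)                 # reference instead of data
        return store, layout

    data = b"A" * BLOCK * 3 + b"B" * BLOCK        # 3 identical blocks + 1
    store, layout = dedup(data)
    print(len(layout), "logical blocks,", len(store), "stored")  # 4 -> 2
    ```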


    On the new arrays, deduplication runs on a schedule: during working hours applications write to the pool without deduplication, and when the load drops, the storage engine analyzes the freshly written data and compacts it by removing duplicates. This plays nicely with FAST Cache: the per-block load on deduplicated blocks goes up while their number goes down, so they naturally migrate into the flash cache. In practice this means that, effectively, more data fits into FAST Cache, which greatly speeds up the array. Two typical workloads benefit the most:
    • Uniform virtual environments, such as VDI.
    • Multiple applications working on a single dataset.


    The estimated savings for such workloads:

    [Image: estimated savings]

    A new principle of working with disks. EMC also changed how the array addresses disks. Previously a disk's address was tied to its physical slot in the shelf; the new generation remembers disks by their unique identifiers, so nothing is bound to a disk's physical location anymore. It looks like a minor change, but consider how much easier it now is to move an array to another site: you no longer have to worry about reinstalling the disks in exactly the same order. An unbalanced back-end load also becomes orders of magnitude easier and faster to fix: rearrange the disks one at a time and you're done, without even powering down the database. And if need be, part of the data can simply be physically pulled from the array and temporarily taken to another site - a kind of hot backup; I think you can work out why that might be useful.
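
    A sketch of the idea in Python (the field names are invented): the array keys its metadata on a disk's serial number rather than on enclosure/slot coordinates, so a disk re-seated in any slot is still recognized as itself.

    ```python
    # Toy model: identify disks by serial number, not by slot position.
    class DiskRegistry:
        def __init__(self):
            self.by_serial = {}              # serial -> role metadata

        def admit(self, serial, slot, role):
            # The old scheme would key on `slot`; here it is incidental.
            self.by_serial[serial] = {"role": role, "slot": slot}

        def reseat(self, serial, new_slot):
            # Moving a disk to any other slot changes nothing but a note.
            self.by_serial[serial]["slot"] = new_slot

    reg = DiskRegistry()
    reg.admit("ZA123456", slot=(0, 3), role="pool-1 member")
    reg.reseat("ZA123456", new_slot=(2, 14))   # still a pool-1 member
    print(reg.by_serial["ZA123456"])
    ```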

    Permanent Sparing, a technology that has proven itself on hi-end systems, has also been added. There is no longer any need to set aside dedicated hot-spare disks: by default the array picks the most suitable unused disk.
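
    A sketch of what "most suitable" might mean (Python; the matching rules here are my illustrative guess, not EMC's actual policy): take an unused disk of the same type that is at least as large as the failed one, preferring the closest fit.

    ```python
    # Toy spare selection: same media type, size >= failed, closest fit.
    def pick_spare(failed, unused):
        candidates = [d for d in unused
                      if d["type"] == failed["type"]
                      and d["size_gb"] >= failed["size_gb"]]
        return min(candidates, key=lambda d: d["size_gb"], default=None)

    unused = [
        {"id": "d7", "type": "SAS", "size_gb": 900},
        {"id": "d8", "type": "SAS", "size_gb": 600},
        {"id": "d9", "type": "NL-SAS", "size_gb": 2000},
    ]
    print(pick_spare({"type": "SAS", "size_gb": 600}, unused))  # -> d8
    ```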


    There are important changes in how the controllers access the disks. Originally on mid-range systems, a single controller worked with a given volume, and the paths through the other controller were used only on failure, when the volume moved over to it. This meant that a workload needing the full power of the system could theoretically get no more than 50% of its resources - those of one controller. The problem was partially solved in the previous two product lines with the ALUA mechanism, which forwards host requests to the volume's owning controller over the cache synchronization interface. Unfortunately, that interface's throughput is far from unlimited, and going straight to the owning controller is still faster; paths came to be classified as optimal and non-optimal instead of active and passive. Roughly speaking, in the ideal case this gives a single volume up to 75% of the array's capability. In the new generation the developers went further: without changing the hardware (!), they switched the array logic to Active-Active. Both controllers now write simultaneously, each to its own region, simply locking that region for the duration of the write.
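
    A sketch of the locking idea in Python (the extent granularity and names are invented): each controller takes a short-lived lock on only the extents it is writing, so both can service the same LUN concurrently as long as their extents don't overlap.

    ```python
    import threading

    class ExtentLocks:
        """Toy Active-Active write locking: lock only what is written."""
        def __init__(self):
            self.locked = set()                   # extent ids held
            self.cv = threading.Condition()

        def acquire(self, extents):
            with self.cv:
                while self.locked & extents:      # block only on real overlap
                    self.cv.wait()
                self.locked |= extents

        def release(self, extents):
            with self.cv:
                self.locked -= extents
                self.cv.notify_all()

    locks = ExtentLocks()

    def write(controller, extents):
        locks.acquire(extents)
        try:
            print(f"{controller} writes extents {sorted(extents)}")
        finally:
            locks.release(extents)

    # Non-overlapping extents: SP-A and SP-B proceed in parallel.
    a = threading.Thread(target=write, args=("SP-A", {0, 1}))
    b = threading.Thread(target=write, args=("SP-B", {7, 8}))
    a.start(); b.start(); a.join(); b.join()
    ```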

    In practice this means that for a single-task workload you can get away with a cheaper, lower-end model. Path load balancing also becomes much simpler. And whereas moving a LUN to the other controller used to cause serious delays in servicing external I/O requests, now there are neither the delays nor the move itself to worry about.


    The array management software has been updated. Outwardly, the interface has not changed much; in my subjective opinion it was and (fortunately) remains the most convenient storage interface around - 99% of tasks take a couple of clicks right in the web UI. Exotic tasks, of course, still require the console.

    The base bundle now includes a very useful new tool for long-term monitoring. It reports on both the file and block sides with a host of metrics and makes it easy to find bottlenecks. Another tool builds on those metrics to plan infrastructure growth, so the administrator doesn't have to calculate by hand what to buy, and how much, next year.
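
    As an illustration of what such planning boils down to (Python; the numbers are made up, and the real tool is surely smarter than a straight line): fit the growth trend and project when the pool runs out.

    ```python
    # Toy capacity projection from monthly "used TB" samples (invented).
    used_tb = [120, 131, 139, 152, 160, 171]     # last six months
    pool_tb = 250

    # Average monthly growth (simple linear trend).
    growth = (used_tb[-1] - used_tb[0]) / (len(used_tb) - 1)  # ~10.2 TB

    months_left = (pool_tb - used_tb[-1]) / growth
    print(f"~{growth:.1f} TB/month; pool full in ~{months_left:.0f} months")
    ```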

    Another very important thing for virtual environments: the arrays still integrate tightly with virtualization systems. In case anyone doesn't know, VMware is a division of EMC (a fairly independent one, but a division nonetheless), hence the joint work of VMware programmers and the storage developers. The result is seamless and extremely deep integration: virtual machines show up in the storage interface (VM resource monitoring at the storage level), typical copy and clone workloads can be offloaded to the array itself, and typical storage administration tasks can be performed from the VMware infrastructure interface.


    Oddly enough, there are similar advantages for Hyper-V (EMC has worked closely with Microsoft for a very long time; the development teams share a long history). The integration is almost as deep: unified management, infrastructure offload. There are Microsoft-specific features too - for example, BranchCache takes the load off network infrastructure with weak communication links. And yes: EMC VNX is the first storage system to support SMB 3.0.

    That's all for now. Feel free to ask questions in the comments or by e-mail at VBolotnov@croc.ru. As for pricing for specific tasks, I can give you a quick estimate with concrete calculations - just ask, no problem.

    Oh, and one more thing: we've already had a hands-on look at our VNX5400 demo unit and sent it off for testing to a customer planning a storage refresh this year. It will be back soon, so if anyone wants to try the real system for themselves, come to our office and visit the EMC Solution Center. Write us and we'll set up a date.

    UPD: In case it's relevant, we also have a program for replacing tape and disk libraries with the new EMC systems.
