Choosing Equipment for Enterprise Cloud Storage
Data is the foundation of any business. If the place where it is stored is not reliable enough or cannot provide continuous access, virtually all of the company's activities are at risk.
Of course, the safety and availability of information can and should be ensured by choosing the right server software and configuring it competently. But the hardware that stores and processes the data is no less important. If it does not meet the company's needs, no software will make the system sufficiently reliable and fault-tolerant.
In this article, we look at one approach to choosing hardware for enterprise cloud storage.
Why the cloud?
Cloud infrastructure has several advantages:
- Fast scaling. Storage capacity and computing power are increased by quickly connecting additional servers and storage. This is especially relevant for companies that expect an irregular load on the cloud.
- Cost reduction. The cloud lets you create a single center where all computing takes place, while increasing disk space only requires purchasing drives, with no need to install new servers.
- Simpler business processes. Unlike local storage, cloud storage implies constant access: you can work with files at any time of day from anywhere. Employees get the information they need more easily and quickly, and remote workplaces become possible.
- Increased fault tolerance. If the data is stored on several servers, it is safer in the event of technical problems than if it were on a single machine.
There are now many public cloud services on the market. For many small and medium-sized companies they are a good choice, especially services that charge only for the resources used, or when the goal is to test a service. However, your own cloud storage also has several advantages. It will come in handy if:
- The company's activities restrict where servers may be located. Russian state institutions, as well as organizations that process personal data, are required by law to store all their information on the territory of the Russian Federation. Renting foreign servers is therefore not an option for them, and entrusting sensitive information to contractors is generally very undesirable. A private storage lets you take full advantage of the cloud without breaking the law.
- Full control over security policies is required. It is impossible to know exactly how data protection works inside Microsoft or Amazon services. Only in your own cloud can you protect information exactly as you see fit.
- The equipment needs to be customized. When renting, you have to work with what the supplier provides. With your own servers, you can configure them for the specific tasks of your business and use the software your IT team knows well.
When acquiring equipment for cloud storage, the question often arises: rent a machine or buy your own. We have already seen in which cases your own server is indispensable. However, you can also build a cloud on third-party services. This is especially relevant for small companies: it is not always possible to set aside the budget for purchasing servers or to set up your own server room, and with rented machines there is no need to spend money on maintenance.
But if you do decide to buy your own servers for cloud storage, keep in mind that replacing the equipment later will be difficult and costly. It is not a problem if capacity turns out to be insufficient: you can always add another machine to the cluster. But if the performance turns out to be excessive, nothing can be done about it. Therefore, approach the choice as follows.
Clearly define the purpose for which the server will be used. In our case, this is file storage, so the drives are of most interest. Pay attention to the following when choosing them:
Capacity. It depends on how many employees will use the storage, what types of files they will upload to the server, how much information is already waiting to be moved to the cloud, and how fast the volume of working data grows each year.
Working with text files, presentations, PDFs and a small number of images requires 10-15 GB per employee on average. Working with large volumes of high-quality images and photographs raises this to at least 50-100 GB, or even more. The needs of staff who process video and audio can reach several terabytes per person. In some cases, for example when using large corporate software packages with versioned projects, the figure can reach 10 terabytes per cloud user. Do not forget to allow capacity for backups and for the company's unforeseen needs.
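The per-employee figures above can be turned into a rough capacity estimate. The following Python sketch is only an illustration: the profile names, the 3 TB value chosen for "several terabytes", and the default growth rate, planning horizon, and reserve are assumptions, not recommendations from this article.

```python
# Per-employee storage in GB, using the upper bounds of the ranges in the text.
# Profile names are illustrative; "media" assumes 3 TB for "several terabytes".
GB_PER_USER = {
    "office": 15,        # text files, presentations, PDFs, a few images
    "graphics": 100,     # large volumes of high-quality images
    "media": 3000,       # video and audio processing (assumed value)
    "versioned": 10000,  # large packages with versioned projects (10 TB)
}


def estimate_capacity_gb(headcount, existing_gb=0.0,
                         annual_growth=0.2, years=3, reserve=0.3):
    """Rough capacity estimate in GB.

    headcount     -- mapping of profile name to number of employees
    existing_gb   -- data already waiting to be moved to the cloud
    annual_growth -- assumed yearly growth of working data (0.2 = 20%)
    years         -- assumed planning horizon
    reserve       -- assumed extra share for backups and unforeseen needs
    """
    base = existing_gb + sum(GB_PER_USER[p] * n for p, n in headcount.items())
    grown = base * (1 + annual_growth) ** years
    return grown * (1 + reserve)


if __name__ == "__main__":
    staff = {"office": 100, "graphics": 10}
    print(f"{estimate_capacity_gb(staff, existing_gb=500):.0f} GB")
```

With 100 office users, no growth, and no reserve, the function returns 1500 GB; replace the growth and reserve parameters with your own measurements as soon as you have them.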
As for RAID controllers, it is better not to use on-board solutions for an enterprise cloud. Their performance may not be enough to serve a large number of requests at a satisfactory speed, so it is better to choose discrete models from the lower and middle price ranges.
Computing power must also be determined. If you are building cloud storage on several servers, it is recommended to choose identical or very similar configurations, which makes load balancing somewhat easier to manage. In general, it is better not to bet on one powerful machine with an expensive processor and RAM, but to buy 2-3 cheaper machines. Why?
If your storage will only receive and serve static files, without the ability to execute them, processor power is not very important. It is therefore better not to chase the number of cores and instead choose a model with a good clock speed. Among inexpensive options, 4-core Intel Xeon E56XX series processors are a reasonable choice, and among more expensive models, machines based on Intel Core i5 can be recommended.
Examples of server models
If you prefer not to build the server yourself but to get ready-to-use equipment right away, consider several models suitable for file storage.
Dell PowerEdge T110. The server is equipped with an Intel Core i3 2120 processor with only two cores, but each runs at a good clock speed of 3.3 GHz, which matters more for our cloud. The initial RAM configuration is modest at 4 GB, but it can be expanded to 32 GB. The server comes in two configurations: without a pre-installed hard drive, or with a 1 TB SATA HDD.
Lenovo ThinkServer RS140. It has a powerful Intel Xeon E3 processor with four cores at 3.3 GHz each. It ships with 4 GB of RAM, plus four more slots for expansion. Two 1 TB SATA hard drives are also included.
HP ProLiant ML10 Gen9. In many ways it is similar to the model described above: the same Intel Xeon E3 and two 1 TB HDDs. The main difference is the RAM: the HP server has two 4 GB modules.
Is there enough space to store files?
Storage capacity is the cornerstone of a file server. Even after estimating the volume of stored data and its growth rate, six months later you may come to the unpleasant conclusion that your forecast was wrong and the data is growing faster than planned.
With virtualized storage you can almost always (given sensible storage planning) expand the virtual machine's disk subsystem or enlarge the LUN. With a physical file server and local disks, your options are much more modest. Even if the server has free slots for additional disks, you may have trouble finding drives that are suitable for your RAID array.
But before solving the problem extensively, by adding hardware, it is worth recalling the technical means that help combat the lack of space.
NTFS Disk Quotas
File system quotas are one of the oldest and most reliable mechanisms for limiting users.
By enabling quotas on a volume, you can limit the total size of the files that each user stores. Files owned by the user at the NTFS level count toward that user's quota. The main drawbacks are that, first, it is not easy to determine which files belong to a particular user and, second, files created by administrators are not counted. In practice the quota mechanism is rarely used; it has been almost completely replaced by File Server Resource Manager, which first appeared in Windows Server 2003 R2.
File Server Resource Manager
This Windows Server component lets you set disk space quotas at the level of a specific folder. If you allocate personal home directories on file servers for employees, as well as dedicated folders for shared department documents, FSRM is the best choice.
Of course, quotas alone will not enlarge the file storage. But users, as a rule, accept a fair (equal) division of resources, and when they run short they are willing to go through a small bureaucratic procedure to get more disk space.
Quotas also protect the server from overflowing if a user accidentally uploads a large amount of data. At the very least, this will not affect other employees or departments.
In addition, FSRM includes a file screening mechanism that filters which files may be stored on the server. If you are sure that mp3 and avi files have no place on the file server, you can use FSRM to prevent them from being saved.
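Before enabling a file screen, it can help to know how much space the unwanted file types already occupy. The Python sketch below is not part of FSRM and does not configure it; it simply walks a directory tree and sums file sizes for the given extensions. The function name and the default extension list are illustrative assumptions.

```python
import os
from collections import Counter


def space_by_extension(root, extensions=(".mp3", ".avi")):
    """Return total bytes per extension for files under root.

    Matching is case-insensitive; files that cannot be read are skipped.
    """
    wanted = {ext.lower() for ext in extensions}
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext not in wanted:
                continue
            try:
                totals[ext] += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # File vanished or access was denied: ignore it.
                continue
    return dict(totals)


if __name__ == "__main__":
    # "." is a placeholder; point it at the shared folder you want to inspect.
    for ext, size in space_by_extension(".").items():
        print(f"{ext}: {size / 1024 ** 2:.1f} MiB")
```

If the totals are small, a screen costs users little; if they are large, it is worth warning the affected departments before the restriction takes effect.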
Ordinary files lend themselves well to NTFS compression, and given the performance of modern processors, the server has enough resources for this operation. If space is short, you can safely enable it for a volume or for individual folders. However, Windows Server 2012 introduced a more advanced mechanism that makes NTFS compression on file servers a thing of the past for most scenarios.
Windows Server 2012 can deduplicate data located on an NTFS volume. This is a fairly advanced and flexible mechanism that combines deduplication using variable-length blocks with effective compression of the stored blocks. The mechanism can use different compression algorithms for different types of data and skips compression where it is not effective. Traditional NTFS compression offers no such subtleties.
In addition, deduplication does not optimize files that users have worked with during the past 30 days (this interval is configurable), so as not to slow down work with frequently changing data.
You can estimate the potential gain in free space with the ddpeval utility. On a typical file server, the savings are 30-50%.
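To illustrate why deduplication saves space, here is a deliberately simplified Python sketch. It is not how Windows Server implements deduplication: it uses fixed-size blocks instead of variable-length ones, applies no compression, and works on in-memory byte strings. The function name and the 4096-byte block size are assumptions made for the example.

```python
import hashlib


def dedup_savings(blobs, block_size=4096):
    """Fraction of bytes saved if identical fixed-size blocks are stored once.

    blobs is an iterable of bytes objects standing in for file contents.
    """
    total = 0
    unique_blocks = {}  # block hash -> block length
    for data in blobs:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            total += len(block)
            digest = hashlib.sha256(block).digest()
            unique_blocks.setdefault(digest, len(block))
    if total == 0:
        return 0.0
    stored = sum(unique_blocks.values())
    return 1 - stored / total


if __name__ == "__main__":
    report_v1 = b"header" * 1000 + b"body-one" * 1000
    report_v2 = b"header" * 1000 + b"body-two" * 1000
    print(f"savings: {dedup_savings([report_v1, report_v2]):.0%}")
```

Two identical copies of the same file yield savings of at least 50% in this model, which is the effect seen on file servers where many users keep their own copies of the same documents. For a real estimate on a real volume, use ddpeval.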
File server performance
As we noted earlier, a file server is not the most resource-demanding service, but the disk and network subsystems should still be configured sensibly.
Linear read and write speeds are not critical for a file server. Any modern hard drive has high linear read/write speeds, but they matter only when a user copies a large file to a local drive or, conversely, uploads one to the server.
If you look at Perfmon statistics, the average read/write speed for 150-200 users is quite low, only a few megabytes per second. Peak values are more interesting. But keep in mind that these peaks are also limited by the speed of the network interface, which for a typical server is 1 Gbit/s (that is, about 100 MB/s of exchange with the disk subsystem).
In normal operation, file access is non-linear: arbitrary blocks are read from and written to the disk. Disk performance in random access operations, that is, maximum IOPS, is therefore more critical.
For 150-200 employees the figures are quite modest: 10-20 input/output operations per second with a disk queue of 1-2.
Any array of standard SATA drives will satisfy these requirements.
For 500-1000 active users, the number of operations jumps to 250-300 per second and the disk queue reaches 5-10. At this queue length, users may notice that the file server is "slowing down".
In practice, achieving 300 IOPS already requires an array of at least 3-4 typical SATA drives.
You should take into account not only the "raw" performance but also the overhead introduced by the RAID controller, the so-called RAID penalty. This topic is clearly explained in the article https://habrahabr.ru/post/164325/ .
To determine the required number of disks, we use the formula:
Total number of Disks required = ((Total Read IOPS + (Total Write IOPS*RAID Penalty))/Disk Speed IOPS)
For example, for RAID-5 with a write penalty of 4, a profile of 50% reads and 50% writes, a disk speed of 75 IOPS, and a target performance of 300 IOPS:
(300*0.5 + (300*0.5*4))/75 = 10 disks.
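The formula above is easy to wrap in a small helper so that different RAID levels and load profiles can be compared. In this Python sketch, only the RAID-5 penalty of 4 comes from the article; the penalties for the other levels are commonly cited values and are included as assumptions, as are the function and parameter names.

```python
import math

# Write penalty per RAID level. Only the RAID-5 value is taken from the text;
# the others are commonly cited figures and are assumptions here.
RAID_WRITE_PENALTY = {
    "raid0": 1,
    "raid1": 2,
    "raid10": 2,
    "raid5": 4,
    "raid6": 6,
}


def disks_required(target_iops, read_share, raid_level, disk_iops):
    """Number of disks needed to deliver target_iops at the front end.

    read_share is the fraction of operations that are reads (0.0 to 1.0);
    disk_iops is the random-access IOPS of a single disk.
    """
    write_share = 1.0 - read_share
    penalty = RAID_WRITE_PENALTY[raid_level]
    backend_iops = (target_iops * read_share
                    + target_iops * write_share * penalty)
    return math.ceil(backend_iops / disk_iops)


if __name__ == "__main__":
    # The example from the text: RAID-5, 50/50 profile, 75 IOPS per disk.
    print(disks_required(300, 0.5, "raid5", 75))
```

For the article's example the function returns 10. Running it with "raid10" and the same profile gives 6, which shows how much the choice of RAID level affects the number of drives for a write-heavy load.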
If you have many active users, you will need a server that holds more disks, or faster disks such as 10,000 RPM SAS drives.
Network Interface Speed
A slow network interface is one of the causes of delays when working with a file server. In 2016, a server with a 100 Mbps network card makes no sense.
A typical server has a 1 Gbit/s network card, but this also limits the disk exchange speed to about 100 MB/s. If the server has several network cards, you can combine (aggregate) them into one logical interface to increase both the performance and the availability of the cloud. The good news is that for a file server ("many clients access the same server"), aggregation works well.
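The relationship between link speed and file transfer rate can be sketched in a few lines of Python. The sketch makes two assumptions that are not stated in the article: roughly 80% of the raw link speed is usable for file data (which gives the "about 100 MB/s" figure for 1 Gbit/s), and each client's traffic is carried by a single link of the aggregated interface, so aggregation raises the total for many clients rather than the speed of one. All names are illustrative.

```python
def usable_mb_per_s(link_gbit, efficiency=0.8):
    """Approximate file transfer rate in MB/s for one link.

    efficiency is an assumed share of the raw bit rate left after
    protocol overhead; 0.8 turns 1 Gbit/s into about 100 MB/s.
    """
    return link_gbit * 1000 / 8 * efficiency


def team_throughput_mb_per_s(link_gbit, links, clients, efficiency=0.8):
    """Approximate total rate for an aggregated interface.

    Assumes each client is served over one link, so no more links
    can be busy than there are active clients.
    """
    busy_links = min(links, clients)
    return usable_mb_per_s(link_gbit, efficiency) * busy_links


if __name__ == "__main__":
    print(usable_mb_per_s(1))                   # one 1 Gbit/s card
    print(team_throughput_mb_per_s(1, 2, 50))   # two cards, many clients
    print(team_throughput_mb_per_s(1, 2, 1))    # two cards, one client
```

Under these assumptions, two aggregated 1 Gbit/s cards roughly double the total throughput for many simultaneous clients, while a single client still sees about the speed of one card.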
Owners of HP servers can use the proprietary HP Network Configuration Utility.
If you are using Windows Server 2012, an easier and more reliable way is to use the built-in NIC Teaming tool.
You can learn more about this setting and the nuances of using it in a Hyper-V environment from this article.